Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/bz1507664 suppress health checks #17155

Conversation

knobunc
Copy link
Contributor

@knobunc knobunc commented Nov 2, 2017

Fix the "supress health checks when only one backing service" logic

This reverts commit 9835dca because it is wrong, then the second commit implements the correct change.

Previously we had attempted to suppress the health checks, but it was looking at the endpoints of the service, not the servers that back the route. The problem is that if we just look at endpoints, then if a route has two backing services, each with one endpoint, we would incorrectly suppress the health checks.

This is the third attempt. I had missed a case in the last one, so now the logic is moved to be computed and recorded by the router controller, but used in the template.

Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)

@knobunc knobunc self-assigned this Nov 2, 2017
@knobunc knobunc added component/routing kind/bug Categorizes issue or PR as related to a bug. labels Nov 2, 2017
@knobunc
Copy link
Contributor Author

knobunc commented Nov 2, 2017

/test

@openshift/networking PTAL

@rajatchopra
Copy link
Contributor

Looks right, finally.

Previously we had attempted to suppress the health checks, but it was looking at the endpoints of the service, not the servers that back the route. The problem is that if we just look at endpoints, then if a route has two backing services, each with one endpoint, we would incorrectly suppress the health checks.

This is the third attempt.  I had missed a case in the last one, so now the logic is moved to be computed and recorded by the router controller, but used in the template.

Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)
@knobunc knobunc force-pushed the bug/bz1507664-suppress-health-checks branch from 0d964b2 to 8cb7b95 Compare November 2, 2017 19:25
@imcsk8
Copy link
Contributor

imcsk8 commented Nov 2, 2017

LGTM

if len(alias.PreferPort) == 0 || endpoint.PortName == alias.PreferPort || endpoint.Port == alias.PreferPort {
for i := range svc.EndpointTable {
endpoint := svc.EndpointTable[i]
if endpoint.PortName == alias.PreferPort || endpoint.Port == alias.PreferPort {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need the len(alias.PreferPort) == 0 to avoid a segfault?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pecameron
No, len(alias.PreferPort) == 0 implies user did not specific any preferred port (http/https), then we don't need to filter any endpoints. That case is already handled in Line 160

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two commits in this PR. The first simply reverts my previous bad PR restoring the previous behavior. It probably makes sense to look at the second commit only.

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 2, 2017
if weight < 0 {
weight = 0
} else if weight > 256 {
weight = 256

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When weight >= 256, don't we want to set it to 0?
Otherwise in Line 918, we count this serviceUnit toward active endpoints which doesn't seem right.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we want to ignore the service unit if the corresponding weight is invalid.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight=0 clearly means user want to ignore the endpoint but weight > 256 means mostly some typo/calculation error and may not mean ignore the endpoint.

It may be useful to log a warning when we are defaulting invalid weights to 0 or 256?

@knobunc
Copy link
Contributor Author

knobunc commented Nov 2, 2017 via email

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pravisankar
Copy link

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 2, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knobunc, pravisankar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@knobunc
Copy link
Contributor Author

knobunc commented Nov 3, 2017 via email

@knobunc knobunc dismissed pecameron’s stale review November 3, 2017 10:37

You commented on the reversion PR... that behavior was working before my most recent changes.

@knobunc
Copy link
Contributor Author

knobunc commented Nov 3, 2017

/test all

@knobunc
Copy link
Contributor Author

knobunc commented Nov 3, 2017

/retest

@knobunc
Copy link
Contributor Author

knobunc commented Nov 3, 2017

/test extended_conformance_gce

@knobunc
Copy link
Contributor Author

knobunc commented Nov 3, 2017

/retest

Flaked on #16972

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2017
@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit fc8725b into openshift:master Nov 3, 2017
@knobunc knobunc deleted the bug/bz1507664-suppress-health-checks branch June 7, 2018 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/routing kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants