Implement a new shared "Drainer" handler. #1517

Merged: 2 commits, Jul 18, 2020

mattmoor
Member

This implements a new `http.Handler` called `Drainer`, which is intended to wrap some inner `http.Handler` business logic with a new outer handler that can respond to Kubelet probes (successfully until told to `Drain()`).

This takes over the webhook's relatively new probe handling and lame duck logic, with one key difference. Previously the webhook waited for a fixed period after SIGTERM before exiting, but the new logic waits for this same grace period AFTER THE LAST REQUEST. So if the handler keeps getting (non-probe) requests, the timer keeps resetting; once it has gone the configured grace period without receiving a request, `Drain()` returns and the webhook exits.
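For readers skimming the description, the behavior can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the `isKubeletProbe` helper, the `probeStatuses` demo, the use of `time.AfterFunc`, and the assumption that probes are recognized by a `kube-probe/` User-Agent prefix are all choices made for this example only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// Drainer is an illustrative sketch of the handler described above.
type Drainer struct {
	Inner       http.Handler
	QuietPeriod time.Duration

	mu    sync.Mutex
	timer *time.Timer   // nil until Drain is first called
	done  chan struct{} // closed once the quiet period elapses
}

// isKubeletProbe is a hypothetical helper. It assumes probes can be
// recognized by their User-Agent prefix.
func isKubeletProbe(r *http.Request) bool {
	return strings.HasPrefix(r.Header.Get("User-Agent"), "kube-probe/")
}

func (d *Drainer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if isKubeletProbe(r) {
		d.mu.Lock()
		draining := d.timer != nil
		d.mu.Unlock()
		if draining {
			// Fail probes once draining so the replica is taken out of rotation.
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
		} else {
			w.WriteHeader(http.StatusOK)
		}
		return
	}
	// A real (non-probe) request restarts the quiet period if we are draining.
	d.mu.Lock()
	if d.timer != nil && d.timer.Stop() {
		d.timer.Reset(d.QuietPeriod)
	}
	d.mu.Unlock()
	d.Inner.ServeHTTP(w, r)
}

// Drain blocks until QuietPeriod has passed with no non-probe requests.
func (d *Drainer) Drain() {
	d.mu.Lock()
	if d.timer == nil {
		done := make(chan struct{})
		d.done = done
		d.timer = time.AfterFunc(d.QuietPeriod, func() { close(done) })
	}
	done := d.done
	d.mu.Unlock()
	<-done
}

// probeStatuses returns the probe response code before and after Drain.
func probeStatuses() (before, after int) {
	d := &Drainer{
		Inner: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}),
		QuietPeriod: 50 * time.Millisecond,
	}
	probe := func() int {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		req.Header.Set("User-Agent", "kube-probe/1.18")
		rec := httptest.NewRecorder()
		d.ServeHTTP(rec, req)
		return rec.Code
	}
	before = probe()
	d.Drain() // no traffic arrives, so this returns after one quiet period
	after = probe()
	return before, after
}

func main() {
	before, after := probeStatuses()
	fmt.Println(before, after) // prints "200 503"
}
```

In a server, `Drain()` would presumably be called from the SIGTERM handling path before shutting the listener down, so that in-flight and late-arriving requests are still served while probes already report failure.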

The goal of this work is to better cope with what we believe to be high tail latencies in the API server noticing that a webhook replica is shutting down.

Related: #1509

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 17, 2020
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 17, 2020
@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2020
Contributor

@vagababov vagababov left a comment

Definitely needs more comments :)

Comment on lines +38 to +43
Inner http.Handler

// QuietPeriod is the duration that must elapse without any requests
// after Drain is called before it may return.
QuietPeriod time.Duration

Contributor

I'd hide those as well and add a ctor

Member Author

Makes it more complicated to have an optional QuietPeriod 🤷

Contributor

just pass 0 as QP? ;-)

Member Author

🤢
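
For context on the thread above, the two shapes being debated might look roughly like this. It is only a sketch under assumptions: `NewDrainer`, `WithQuietPeriod`, `effectiveQuietPeriod`, and the 30 second `defaultQuietPeriod` are hypothetical names and values invented for illustration, and none of them is taken from the PR.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultQuietPeriod is an arbitrary placeholder value for this example.
const defaultQuietPeriod = 30 * time.Second

// Drainer here hides its fields, as the reviewer suggests.
type Drainer struct {
	inner       http.Handler
	quietPeriod time.Duration
}

// Option is a hypothetical functional option for the constructor.
type Option func(*Drainer)

// WithQuietPeriod overrides the default quiet period.
func WithQuietPeriod(p time.Duration) Option {
	return func(d *Drainer) { d.quietPeriod = p }
}

// NewDrainer is the "ctor" shape: the quiet period stays optional
// because callers may simply omit the option.
func NewDrainer(inner http.Handler, opts ...Option) *Drainer {
	d := &Drainer{inner: inner, quietPeriod: defaultQuietPeriod}
	for _, opt := range opts {
		opt(d)
	}
	return d
}

// effectiveQuietPeriod is the "just pass 0" shape: a zero or negative
// value is treated as "use the default".
func effectiveQuietPeriod(configured time.Duration) time.Duration {
	if configured <= 0 {
		return defaultQuietPeriod
	}
	return configured
}

func main() {
	h := http.NotFoundHandler()
	fmt.Println(NewDrainer(h).quietPeriod)
	fmt.Println(NewDrainer(h, WithQuietPeriod(5*time.Second)).quietPeriod)
	fmt.Println(effectiveQuietPeriod(0))
}
```

With exported fields, as in the snippet under review, the zero value of `QuietPeriod` can play the same "use the default" role without any constructor, which may be why the author prefers it.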

@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-pkg-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| network/handlers/drain.go | Do not exist | 96.4% | |
| webhook/webhook.go | 87.0% | 85.9% | -1.0 |

Contributor

@vagababov vagababov left a comment

/lgtm
/approve
/hold
if you wanna do the ctor, if not unhold :)
we can sort it out later

@knative-prow-robot knative-prow-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 18, 2020
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, vagababov

@mattmoor
Member Author

/unhold

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2020
@knative-prow-robot knative-prow-robot merged commit e193c4b into knative:master Jul 18, 2020
@mattmoor mattmoor deleted the drainer branch July 18, 2020 04:26