apiserver: add --shutdown-delay-duration to keep serving until LBs stop sending traffic #74416

Conversation

@sttts
Contributor

commented Feb 22, 2019

This is meant to delay the apiserver shutdown for a defined time duration in order to give the SDN a chance to update changed endpoints.

The reconciler is part of the "master controller", also called the "bootstrap controller". It has a pre-shutdown hook triggered by the stopCh. We delay closing the internalStopCh, which is what triggers the server to stop serving.

Add --shutdown-delay-duration to kube-apiserver in order to delay a graceful shutdown. `/healthz` will keep returning success during this time and requests are served normally, but `/readyz` will return failure immediately. This delay can be used to allow the SDN to update iptables on all nodes and stop sending traffic.
@sttts

Contributor Author

commented Feb 22, 2019

/assign @deads2k

@k8s-ci-robot

Contributor

commented Feb 22, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sttts

Contributor Author

commented Feb 22, 2019

/assign @stewart-yu

@@ -174,5 +181,9 @@ func (s *ServerRunOptions) AddUniversalFlags(fs *pflag.FlagSet) {
"handler, which picks a randomized value above this number as the connection timeout, "+
"to spread out load.")

fs.DurationVar(&s.MinimalShutdownDuration, "minimal-shutdown-duration", s.MinimalShutdownDuration, ""+
"Minimal duration of a graceful shutdown, e.g. to guarantee that all endpoints pointing to this API server "+

@stewart-yu

stewart-yu Feb 23, 2019

Contributor

Can we remove the "" at the end of line 184?

@sttts

sttts Feb 25, 2019

Author Contributor

I am just following the style in this file.

staging/src/k8s.io/apiserver/pkg/server/options/server_run_options.go

@sttts sttts force-pushed the sttts:sttts-apiserver-minimum-shutdown-duration branch from 1ec3faf to 6418b0c Feb 25, 2019

@k8s-ci-robot k8s-ci-robot added size/M and removed size/S labels Feb 25, 2019

@deads2k

Contributor

commented Feb 25, 2019

This lgtm. It helps limit an unnecessary race. It's not foolproof because we cannot be aware of all of our consumers, but it makes it possible to avoid unnecessary dead endpoints.

@kubernetes/sig-api-machinery-misc

@lavalamp

Member

commented Feb 25, 2019

/hold

not that I necessarily disagree, but I think this might be a big enough addition to the surface area that I want to think about it for a second.

@sttts

Contributor Author

commented May 30, 2019

/retest

@logicalhan
Contributor

left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@sttts sttts changed the title from "apiserver: add --shutdown-delay-period to keep serving until LBs stop sending traffic" to "apiserver: add --shutdown-delay-duration to keep serving until LBs stop sending traffic" May 30, 2019

@sttts sttts force-pushed the sttts:sttts-apiserver-minimum-shutdown-duration branch from db581a6 to 77bb860 May 30, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 30, 2019

@logicalhan
Contributor

left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@sttts

Contributor Author

commented May 30, 2019

/retest

@sttts

Contributor Author

commented May 31, 2019

/retest

@sttts sttts force-pushed the sttts:sttts-apiserver-minimum-shutdown-duration branch from 77bb860 to bd1f77a Jul 5, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 5, 2019

@sttts

Contributor Author

commented Jul 5, 2019

Rebased.

@lavalamp @logicalhan please cancel the hold here.

@sttts sttts added the lgtm label Jul 5, 2019

@sttts

Contributor Author

commented Jul 9, 2019

/retest

@sttts sttts force-pushed the sttts:sttts-apiserver-minimum-shutdown-duration branch from bd1f77a to e0d6b98 Jul 9, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 9, 2019

@sttts sttts force-pushed the sttts:sttts-apiserver-minimum-shutdown-duration branch from e0d6b98 to 408f36b Jul 9, 2019

@logicalhan

Contributor

commented Jul 9, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 9, 2019

@logicalhan

Contributor

commented Jul 11, 2019

/hold cancel

@fejta-bot


commented Jul 11, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot


commented Jul 12, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 7e17aeb into kubernetes:master Jul 12, 2019

21 of 23 checks passed

pull-kubernetes-e2e-gce-100-performance Job triggered.
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
cla/linuxfoundation sttts authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.

@k8s-ci-robot

Contributor

commented Jul 12, 2019

@sttts: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-100-performance | 408f36b | link | /test pull-kubernetes-e2e-gce-100-performance |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
