Automated cherry pick of fix for #77733 (NodeLifecycleController is overloading kube-apiserver) into release-1.16 #88959


@mborsz
Member

mborsz commented Mar 9, 2020

Cherry pick of #82489 #82884 #83248 #83320 #83780 #84445 #81167 on release-1.16.

#82489: adding lock to node data map
#82884: eviction processing refactor
#83248: adding fakeGetPodsAssignedToNode
#83320: adding pods to DeletePods parameters
#83780: using pod pointers in node lifecycle controller
#84445: MarkPodsNotReady retry fix
#81167: adding pods lister
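A minimal sketch of the idea behind #82489, written in Python rather than the controller's Go. It assumes the per-node data map is shared between concurrent workers (for example, node monitoring and eviction processing). Every access goes through one lock, and readers get a deep copy, so they can work on a record without holding the lock. `NodeHealthMap` and `NodeHealthData` are hypothetical names, not the controller's actual types.

```python
import copy
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeHealthData:
    """Hypothetical per-node record; the fields are illustrative only."""
    ready: bool = True
    probe_timestamp: float = 0.0


class NodeHealthMap:
    """Node name -> NodeHealthData, with every access guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, NodeHealthData] = {}

    def get_copy(self, node_name: str) -> Optional[NodeHealthData]:
        # Return a deep copy so callers never share mutable state with the map.
        with self._lock:
            entry = self._data.get(node_name)
            return copy.deepcopy(entry) if entry is not None else None

    def set(self, node_name: str, data: NodeHealthData) -> None:
        # Store a copy too, so later edits by the caller cannot bypass the lock.
        with self._lock:
            self._data[node_name] = copy.deepcopy(data)


# Usage: several threads (standing in for controller workers) write concurrently.
health = NodeHealthMap()
workers = [
    threading.Thread(target=health.set, args=(f"node-{i}", NodeHealthData(ready=i % 2 == 0)))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Copying on both read and write costs a little CPU, but it ensures that no reference to the map's internal state ever escapes the lock.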

Although the number of PRs is high, most of them are trivial (#81167 is the main one).
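A minimal sketch of what a pods lister buys here (#81167), assuming the controller previously asked kube-apiserver for a node's pods on demand. The replacement is a local cache, kept up to date from watch events, with a secondary index on node name, so finding the pods of one node is a local lookup rather than an API call. The sketch is in Python for brevity. `PodCacheByNode` and its methods are hypothetical and are not client-go's informer/lister API; the method name only loosely mirrors the `fakeGetPodsAssignedToNode` helper mentioned in #83248.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Set, Tuple

PodKey = Tuple[str, str]  # (namespace, name)


@dataclass(frozen=True)
class Pod:
    namespace: str
    name: str
    node_name: str  # "" means the pod is not scheduled yet


class PodCacheByNode:
    """Local pod cache with a secondary index on node name.

    Fed by watch events (add/update/delete) instead of issuing a LIST to
    kube-apiserver whenever the controller needs the pods of one node.
    """

    def __init__(self) -> None:
        self._pods: Dict[PodKey, Pod] = {}
        self._by_node: DefaultDict[str, Set[PodKey]] = defaultdict(set)

    def _unindex(self, key: PodKey) -> None:
        # Drop any stale index entry before the pod is re-indexed or deleted.
        old = self._pods.get(key)
        if old is not None and old.node_name:
            self._by_node[old.node_name].discard(key)

    def on_add_or_update(self, pod: Pod) -> None:
        key = (pod.namespace, pod.name)
        self._unindex(key)
        self._pods[key] = pod
        if pod.node_name:
            self._by_node[pod.node_name].add(key)

    def on_delete(self, pod: Pod) -> None:
        key = (pod.namespace, pod.name)
        self._unindex(key)
        self._pods.pop(key, None)

    def pods_assigned_to_node(self, node_name: str) -> List[Pod]:
        # Cost is proportional to the pods on this node; no API request is made.
        keys = self._by_node.get(node_name, set())
        return sorted((self._pods[k] for k in keys), key=lambda p: (p.namespace, p.name))


# Usage: populate from (simulated) watch events, then look up one node.
cache = PodCacheByNode()
for pod in (
    Pod("default", "web-1", "node-1"),
    Pod("default", "web-2", "node-1"),
    Pod("kube-system", "dns-1", "node-2"),
    Pod("default", "pending-1", ""),
):
    cache.on_add_or_update(pod)
print([p.name for p in cache.pods_assigned_to_node("node-1")])
```

With this design, a node going NotReady costs the controller a lookup proportional to that node's pods. The cluster-wide cost is paid once, by the watch that keeps the cache current.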

Context: This is a cherry pick of fixes for #77733 (many nodes becoming unhealthy can overload kube-apiserver, making the master unavailable).

The risk of not cherry picking is that if 5-50 nodes (the actual number depends on the number of pods in the system) become unhealthy at the same time (e.g. during a network outage), NodeLifecycleController will overload kube-apiserver and etcd, making all of the nodes unavailable. In my opinion this is a critical issue that we should also address in past releases.
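The following is a back-of-the-envelope model of why a handful of unhealthy nodes can be enough. It rests on three assumptions. First, each unhealthy node makes the controller issue a pod LIST filtered by node name. Second, serving such a LIST costs roughly the total number of pods in the cluster, because the filter is applied after the pods are read. Third, repeated attempts multiply the number of LISTs (`lists_per_node`). The function names and all inputs are hypothetical, not measurements.

```python
def pod_objects_read_via_api(unhealthy_nodes: int, total_pods: int, lists_per_node: int = 1) -> int:
    """Pods read server-side when every unhealthy node triggers `lists_per_node`
    LISTs, each scanning all pods before filtering by node name."""
    return unhealthy_nodes * lists_per_node * total_pods


def pod_objects_read_via_local_index(unhealthy_nodes: int, pods_per_node: int) -> int:
    """Pods touched when the same lookups hit a local cache indexed by node name."""
    return unhealthy_nodes * pods_per_node


# Hypothetical cluster: 5,000 nodes running 30 pods each.
NODES, PODS_PER_NODE = 5_000, 30
TOTAL_PODS = NODES * PODS_PER_NODE

for unhealthy in (5, 50):
    via_api = pod_objects_read_via_api(unhealthy, TOTAL_PODS)
    via_index = pod_objects_read_via_local_index(unhealthy, PODS_PER_NODE)
    print(f"{unhealthy:>2} unhealthy nodes: {via_api:>9,} pods read via API vs {via_index:>5,} via local index")
```

Under these assumptions, 50 unhealthy nodes in a 150,000-pod cluster mean about 7.5 million pod objects read per round of LISTs, versus 1,500 from a local index. The API-side cost scales with the total pod count, which is consistent with the threshold depending on the number of pods in the system.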

For details on the cherry pick process, see the cherry pick requests page.

Fix the problem where a couple of nodes becoming NotReady at the same time could cause master instability or even a complete outage in large enough clusters.
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Mar 9, 2020
@mborsz mborsz changed the title Automated cherry pick of #82489: adding lock to node data map #82884: eviction processing refactor #83248: adding fakeGetPodsAssignedToNode #83320: adding pods to DeletePods parameters #83780: using pod pointers in node lifecycle controller #84445: MarkPodsNotReady retry fix #81167: adding pods lister Automated cherry pick of fix for #77733 (NodeLifecycleController is overloading kube-apiserver) into release-1.16 Mar 9, 2020
@k8s-ci-robot k8s-ci-robot requested review from deads2k and gmarek Mar 9, 2020
@mborsz mborsz force-pushed the mborsz:automated-cherry-pick-of-#82489-#82884-#83248-#83320-#83780-#84445-#81167-upstream-release-1.16 branch from 5d0b630 to 2fab421 Mar 9, 2020
@wojtek-t wojtek-t self-assigned this Mar 9, 2020
@wojtek-t
Member

wojtek-t commented Mar 10, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 10, 2020
@wojtek-t
Member

wojtek-t commented Mar 10, 2020

@kubernetes/patch-release-team - can you please take a look?

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 10, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mborsz, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ef1ba35 into kubernetes:release-1.16 Mar 10, 2020
12 of 13 checks passed
tide Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation krzysied authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.