Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid listing pods in NodeLifecycleController #81167

Merged
merged 1 commit into from Nov 7, 2019

Conversation

@krzysied
Copy link
Contributor

krzysied commented Aug 8, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
Reduces number of pods lists on cluster level.

Which issue(s) this PR fixes:

Fixes #77733

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@krzysied

This comment has been minimized.

Copy link
Contributor Author

krzysied commented Aug 8, 2019

/assign @wojtek-t
/cc @mborsz

@k8s-ci-robot k8s-ci-robot requested a review from mborsz Aug 8, 2019
@k8s-ci-robot k8s-ci-robot added sig/apps and removed needs-sig labels Aug 8, 2019
@k8s-ci-robot k8s-ci-robot requested review from deads2k and k82cn Aug 8, 2019
@krzysied krzysied force-pushed the krzysied:node_controller_list branch 6 times, most recently from 8350bc7 to 032295c Aug 19, 2019
@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Aug 22, 2019
@krzysied krzysied force-pushed the krzysied:node_controller_list branch from 032295c to 23e63c1 Aug 22, 2019
@k8s-ci-robot k8s-ci-robot added size/L and removed size/XL labels Aug 22, 2019
@krzysied krzysied force-pushed the krzysied:node_controller_list branch 2 times, most recently from b1925fc to 3a686d5 Aug 22, 2019
pods, err := nc.getPodsAssignedToNode(node.Name)
if err != nil {
utilruntime.HandleError(fmt.Errorf("unable to list pods of node %v: %v", node.Name, err))
nc.nodesToRetry.Store(node.Name, struct{}{})

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 5, 2019

Member

Why do we need this one? We will enter the "currentReadyCondition != nil" in the next iteration anyway, right?

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 5, 2019

Author Contributor

The problem is that we can handle error when node status transition just happened (Ready -> NotReady). In next iteration the node status will not change (NotReady), so it will not trigger MarkPodsNotReady unless we mark it as "needs retry".

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 6, 2019

Member

Please add a comment - if I'm asking about it it means it's not obvious. Which means people will be asking about it in the future (or will simply remove it).

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 6, 2019

Author Contributor

Comment added.
I also added currentReadyCondition.Status != v1.ConditionTrue && observedReadyCondition.Status == v1.ConditionTrue guard so we ensure that MarkPodsNotReady will be called only once after status transition.

@krzysied krzysied force-pushed the krzysied:node_controller_list branch from c718348 to ecefea9 Nov 5, 2019
@krzysied

This comment has been minimized.

Copy link
Contributor Author

krzysied commented Nov 5, 2019

/test pull-kubernetes-e2e-gce-100-performance
/test pull-kubernetes-kubemark-e2e-gce-big

Copy link
Member

wojtek-t left a comment

Just couple minor comments. Once applied I will take a (hopefully) final pass.

pods, err := nc.getPodsAssignedToNode(node.Name)
if err != nil {
utilruntime.HandleError(fmt.Errorf("unable to list pods of node %v: %v", node.Name, err))
nc.nodesToRetry.Store(node.Name, struct{}{})

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 6, 2019

Member

Please add a comment - if I'm asking about it it means it's not obvious. Which means people will be asking about it in the future (or will simply remove it).

@krzysied krzysied force-pushed the krzysied:node_controller_list branch from ecefea9 to d6c29ed Nov 6, 2019
@krzysied

This comment has been minimized.

Copy link
Contributor Author

krzysied commented Nov 6, 2019

/test pull-kubernetes-e2e-gce
/test pull-kubernetes-integration

Copy link
Member

wojtek-t left a comment

Last 2 very minor comments - other than that LGTM.

// - adds node to evictor queue if the node is not marked as evicted.
// Returns false if the node name was already enqueued.
// - deletes pods immediately if node is already marked as evicted.
// The method returns false, because the node wasn't added to the queue.

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 6, 2019

Member

s/The method returns/Returns/

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 6, 2019

Author Contributor

Done.

@krzysied krzysied force-pushed the krzysied:node_controller_list branch from d6c29ed to beef648 Nov 6, 2019
@krzysied

This comment has been minimized.

Copy link
Contributor Author

krzysied commented Nov 6, 2019

/test pull-kubernetes-e2e-gce-device-plugin-gpu

Copy link
Member

wojtek-t left a comment

OK - one more nit and add one more unit test I mentioned. And will be good to go.

if nc.useTaintBasedEvictions {
nc.processTaintBaseEviction(node, &observedReadyCondition)
} else {
nc.processNoTaintBaseEviction(node, &observedReadyCondition, gracePeriod)
if err := nc.processNoTaintBaseEviction(node, &observedReadyCondition, gracePeriod, pods); err != nil {
utilruntime.HandleError(fmt.Errorf("unable to evict all pods from node %v: %v; queuing for retry in next iteration", node.Name, err))

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 7, 2019

Member

s/ in next iteration//

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 7, 2019

Author Contributor

Done.

pods, err := nc.getPodsAssignedToNode(node.Name)
if err != nil {
utilruntime.HandleError(fmt.Errorf("unable to list pods of node %v: %v", node.Name, err))
if currentReadyCondition.Status != v1.ConditionTrue && observedReadyCondition.Status == v1.ConditionTrue {

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 7, 2019

Member

good catch
that said, it requires some unit test

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 7, 2019

Author Contributor

New case to TestMonitorNodeHealthMarkPodsNotReadyRetry added.

@krzysied krzysied force-pushed the krzysied:node_controller_list branch from beef648 to 3db0b79 Nov 7, 2019
Copy link
Member

wojtek-t left a comment

just one nit

fakeNodeHandler: &testutil.FakeNodeHandler{
Clientset: fake.NewSimpleClientset(&v1.PodList{Items: []v1.Pod{*testutil.NewPod("pod0", "node0")}}),
},
fakeGetPodsAssignedToNode: func() func(c *fake.Clientset) func(string) ([]*v1.Pod, error) {

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 7, 2019

Member

I don't get it - why you need func()?
Why it can't be:

func(c *fake.Clientset) func(string) ([]*v1.Pod, error) {
  // what you currently have?
}

This comment has been minimized.

Copy link
@krzysied

krzysied Nov 7, 2019

Author Contributor

Good catch. I removed unneeded closure.

@krzysied krzysied force-pushed the krzysied:node_controller_list branch from 3db0b79 to 9406e5b Nov 7, 2019
@wojtek-t

This comment has been minimized.

Copy link
Member

wojtek-t commented Nov 7, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 7, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 7, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: krzysied, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8841624 into kubernetes:master Nov 7, 2019
15 checks passed
15 checks passed
cla/linuxfoundation krzysied authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@krzysied krzysied deleted the krzysied:node_controller_list branch Nov 7, 2019
// Node eviction already happened for this node.
// Handling immediate pod deletion.
_, err := nodeutil.DeletePods(nc.kubeClient, pods, nc.recorder, node.Name, string(node.UID), nc.daemonSetStore)
if err != nil {

This comment has been minimized.

Copy link
@tedyu

tedyu Nov 7, 2019

Contributor

Edit: I checked DeletePods().
If the Pod is not found, no error is returned.

Should be fine here.

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Nov 7, 2019

Member

Yup - this should work fine.

@jfbai jfbai mentioned this pull request Dec 13, 2019
k8s-ci-robot added a commit that referenced this pull request Mar 10, 2020
…82884-#83248-#83320-#83780-#84445-#81167-upstream-release-1.16

Automated cherry pick of fix for #77733 (NodeLifecycleController is overloading kube-apiserver) into release-1.16
k8s-ci-robot added a commit that referenced this pull request Mar 11, 2020
…81839-#82489-#82884-#83248-#83320-#83780-#84445-#81167-upstream-release-1.15

Automated cherry pick of fix for #77733 (NodeLifecycleController is overloading kube-apiserver) into release-1.15
whypro pushed a commit to whypro/kubernetes that referenced this pull request Apr 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.