Pod GC controller - use node lister #82365

jkaniuk · 2019-09-05T10:56:39Z

Change-Id: Ie5e97dac8e434fa032af2a2923dd66df63b4286c

What type of PR is this?
/kind cleanup
/sig scalability

What this PR does / why we need it:
The Pod GC controller lists all nodes in the cluster on every loop iteration, which is a costly operation in a large cluster.
This PR changes the controller implementation to use node informer events, only occasionally listing all the nodes if the watch is stale.
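
As a rough illustration of the idea (not the code in this PR), the sketch below keeps a local set of node names up to date from add/delete events, so that each GC loop can answer "does this node exist?" without a full LIST. The `nodeCache` type and its `onAdd`/`onDelete` handlers are hypothetical stand-ins for what a client-go informer and lister provide.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache is a hypothetical, simplified stand-in for an informer-backed
// node lister: it is updated by events instead of being rebuilt by a LIST.
type nodeCache struct {
	mu    sync.RWMutex
	names map[string]struct{}
}

func newNodeCache() *nodeCache {
	return &nodeCache{names: map[string]struct{}{}}
}

// onAdd records a node reported by an "add" event.
func (c *nodeCache) onAdd(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.names[name] = struct{}{}
}

// onDelete forgets a node reported by a "delete" event.
func (c *nodeCache) onDelete(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.names, name)
}

// has answers from local state only; no API call is made.
func (c *nodeCache) has(name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.names[name]
	return ok
}

func main() {
	c := newNodeCache()
	c.onAdd("node-a")
	fmt.Println(c.has("node-a"), c.has("node-b"))
	c.onDelete("node-a")
	fmt.Println(c.has("node-a"))
}
```

The cost per GC loop then depends on the number of pods checked rather than on a fresh listing of every node.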

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2019-09-05T10:56:47Z

Hi @jkaniuk. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jkaniuk · 2019-09-05T10:56:53Z

/hold
/cc @mm4tt
/cc @mborsz

fedebongio · 2019-09-05T20:22:00Z

/assign @caesarxuchao @janetkuo

caesarxuchao · 2019-09-05T23:37:03Z

/ok-to-test

caesarxuchao · 2019-09-06T00:22:07Z

pkg/controller/podgc/gc_controller.go

+		}
+		unknownNames.Insert(name)
+	}
+	if len(unknownNames) > 0 {


If a user keeps creating pods with fake node names, then updateExistingNodes() will always be called.

I'm wondering whether there is a better way to detect that the cached node list is stale.

I've changed the implementation. Now, in the fake node names scenario, there is a single GET (roughly once per hour) for every unique node name.

The implementation has changed once again.
When an orphaned pod is found and, after 40s, there is still no node with a matching name in the informer, a GET call is issued to verify that the node is truly gone (a single call for all pods from the same node, but without caching the result).

I think it's fine - I wouldn't optimize for the somewhat artificial situation that @caesarxuchao describes above. If really needed, we can optimize later.

pkg/controller/podgc/gc_controller.go

wojtek-t

Mostly minor comments at this point - once addressed LGTM.

pkg/controller/podgc/gc_controller.go

pkg/controller/podgc/gc_controller_test.go

staging/src/k8s.io/client-go/util/workqueue/delaying_queue.go

plugin/pkg/auth/authorizer/rbac/bootstrappolicy/testdata/controller-roles.yaml

mm4tt

LGTM assuming you address Wojtek's comments.

I really like this approach!

pkg/controller/podgc/gc_controller.go

wojtek-t

Just two minor nits - once applied will approve

cmd/kube-controller-manager/app/core.go

pkg/controller/podgc/gc_controller_test.go

wojtek-t · 2019-10-24T08:05:48Z

/lgtm
/approve

wojtek-t · 2019-10-24T08:05:57Z

Thanks a lot for working on that!

k8s-ci-robot · 2019-10-24T08:06:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkaniuk, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-controller-manager/OWNERS~~ [wojtek-t]
~~pkg/controller/podgc/OWNERS~~ [wojtek-t]
~~plugin/pkg/auth/authorizer/OWNERS~~ [wojtek-t]
~~staging/src/k8s.io/client-go/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jkaniuk · 2019-10-24T08:36:04Z

/hold cancel

mm4tt · 2019-10-24T08:43:23Z

/lgtm

Nice!

tedyu · 2019-10-24T12:40:45Z

pkg/controller/podgc/gc_controller.go

-		if nodeNames.Has(pod.Spec.NodeName) {
+	}
+	// Check if nodes are still missing after quarantine period
+	deletedNodesNames, quit := gcc.discoverDeletedNodes(existingNodeNames)


If this is lifted ahead of line 168, we can iterate over pods once - checking both deletedNodesNames and existingNodeNames
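
One possible reading of this suggestion, sketched with hypothetical names (`pod`, `classify`) and plain maps in place of the controller's string sets: once the set of confirmed-deleted nodes is known up front, a single pass over the pods can both pick pods to delete and collect unknown node names for quarantine.

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down pod: a name and its spec.nodeName.
type pod struct {
	name, nodeName string
}

// classify walks the pods once. Pods on confirmed-deleted nodes are
// returned for deletion; node names that are neither existing nor
// confirmed deleted are returned once each for quarantine.
func classify(pods []pod, existing, deleted map[string]bool) (toDelete, toQuarantine []string) {
	queued := map[string]bool{}
	for _, p := range pods {
		switch {
		case p.nodeName == "" || existing[p.nodeName]:
			// Unscheduled pod, or its node is still known: nothing to do.
		case deleted[p.nodeName]:
			toDelete = append(toDelete, p.name)
		case !queued[p.nodeName]:
			queued[p.nodeName] = true
			toQuarantine = append(toQuarantine, p.nodeName)
		}
	}
	return toDelete, toQuarantine
}

func main() {
	pods := []pod{
		{"p1", "node-a"},
		{"p2", "node-gone"},
		{"p3", "node-new"},
		{"p4", "node-new"},
		{"p5", ""},
	}
	existing := map[string]bool{"node-a": true}
	deleted := map[string]bool{"node-gone": true}

	toDelete, toQuarantine := classify(pods, existing, deleted)
	fmt.Println(toDelete, toQuarantine) // prints "[p2] [node-new]"
}
```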

k8s-ci-robot added the needs-ok-to-test label Sep 5, 2019

k8s-ci-robot requested review from eparis and gmarek September 5, 2019 10:56

k8s-ci-robot requested review from mborsz and mm4tt September 5, 2019 10:56

k8s-ci-robot added the do-not-merge/hold, sig/api-machinery, and sig/apps labels Sep 5, 2019

k8s-ci-robot assigned caesarxuchao and janetkuo Sep 5, 2019

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Sep 5, 2019

caesarxuchao reviewed Sep 6, 2019

mm4tt reviewed Sep 6, 2019

jkaniuk force-pushed the pod-gc branch 4 times, most recently from b056cc9 to 4207d7b Compare September 9, 2019 16:14

k8s-ci-robot removed the size/L label Sep 9, 2019

wojtek-t assigned wojtek-t and unassigned caesarxuchao and janetkuo Oct 15, 2019

jkaniuk force-pushed the pod-gc branch 3 times, most recently from c2c1987 to 567bf85 Compare October 15, 2019 15:12

k8s-ci-robot added the size/L label and removed the size/XL label Oct 15, 2019

jkaniuk force-pushed the pod-gc branch from 567bf85 to a626937 Compare October 15, 2019 15:22

wojtek-t reviewed Oct 16, 2019

mm4tt reviewed Oct 21, 2019

pkg/util/workqueue/delaying_queue: export contructor with custom clock

638c02f

jkaniuk force-pushed the pod-gc branch from a626937 to 585f18a Compare October 23, 2019 14:14

wojtek-t reviewed Oct 23, 2019

jkaniuk added 2 commits October 23, 2019 16:54

Allow pod-garbage-collector to get nodes

e6e026f

Pod GC controller - use node lister

39883f0

jkaniuk force-pushed the pod-gc branch from 585f18a to 39883f0 Compare October 23, 2019 14:58

k8s-ci-robot added the lgtm label Oct 24, 2019

k8s-ci-robot added the approved label Oct 24, 2019

k8s-ci-robot removed the do-not-merge/hold label Oct 24, 2019

k8s-ci-robot assigned mm4tt Oct 24, 2019

k8s-ci-robot merged commit 2c4cba8 into kubernetes:master Oct 24, 2019

k8s-ci-robot added this to the v1.17 milestone Oct 24, 2019

tedyu reviewed Oct 24, 2019
