Pod GC controller - use node lister #82365

Merged
merged 3 commits into from Oct 24, 2019

Conversation

Contributor

@jkaniuk jkaniuk commented Sep 5, 2019

Change-Id: Ie5e97dac8e434fa032af2a2923dd66df63b4286c

What type of PR is this?
/kind cleanup
/sig scalability

What this PR does / why we need it:
The Pod GC controller lists all nodes in the cluster on every loop iteration, which is costly in a large cluster.
This PR changes the controller to rely on node informer events and to list all nodes only occasionally, when the watch may be stale.
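
To illustrate the idea in the description, here is a minimal sketch of an event-driven node cache. It is not the PR's code and does not use client-go: `nodeCache`, `OnAdd`, `OnDelete`, and `Has` are hypothetical names standing in for what an informer-backed lister would provide. The point it tries to show is that membership checks are answered from memory rather than by listing every node on each loop.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache is a hypothetical stand-in for an informer-backed node lister.
// It is kept current by add/delete events instead of repeated full lists.
type nodeCache struct {
	mu    sync.RWMutex
	names map[string]struct{}
}

func newNodeCache() *nodeCache {
	return &nodeCache{names: make(map[string]struct{})}
}

// OnAdd records a node reported by a watch event.
func (c *nodeCache) OnAdd(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.names[name] = struct{}{}
}

// OnDelete forgets a node reported as deleted by a watch event.
func (c *nodeCache) OnDelete(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.names, name)
}

// Has answers from memory, without any API call.
func (c *nodeCache) Has(name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.names[name]
	return ok
}

func main() {
	cache := newNodeCache()
	cache.OnAdd("node-a")
	cache.OnAdd("node-b")
	cache.OnDelete("node-b")
	fmt.Println(cache.Has("node-a"), cache.Has("node-b")) // prints "true false"
}
```

The trade-off is that such a cache can lag behind the API server, which is why the discussion below turns to how to confirm that a missing node is really gone.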

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 5, 2019
@k8s-ci-robot
Contributor

Hi @jkaniuk. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 5, 2019
@jkaniuk
Contributor Author

jkaniuk commented Sep 5, 2019

/hold
/cc @mm4tt
/cc @mborsz

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Sep 5, 2019
@fedebongio
Contributor

/assign @caesarxuchao @janetkuo

@caesarxuchao
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 5, 2019
}
unknownNames.Insert(name)
}
if len(unknownNames) > 0 {
Member

If a user keeps creating pods with fake node names, then updateExistingNodes() will always be called.

I wonder whether there is a better way to detect that the cached node list is stale.

Contributor Author

I've changed the implementation. In the fake-node-name scenario there is now a single GET (roughly once per hour) for every unique node name.

Contributor Author

The implementation has changed once again.
When an orphaned pod is found and, after 40s, the informer still has no node with a matching name, a GET call is issued to verify that the node is truly gone (a single call covers all pods from the same node, and the result is not cached).

Member

I think it's fine - I wouldn't optimize for the artificial situation that @caesarxuchao describes above. If really needed, we can optimize later.

@jkaniuk jkaniuk force-pushed the pod-gc branch 4 times, most recently from b056cc9 to 4207d7b Compare September 9, 2019 16:14
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 9, 2019
@wojtek-t wojtek-t assigned wojtek-t and unassigned caesarxuchao and janetkuo Oct 15, 2019
@jkaniuk jkaniuk force-pushed the pod-gc branch 3 times, most recently from c2c1987 to 567bf85 Compare October 15, 2019 15:12
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 15, 2019

Member

@wojtek-t left a comment

Mostly minor comments at this point - once addressed LGTM.

Contributor

@mm4tt left a comment

LGTM assuming you address Wojtek's comments.

I really like this approach!

Member

@wojtek-t left a comment

Just two minor nits - once applied will approve

@wojtek-t
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2019
@wojtek-t
Member

Thanks a lot for working on that!

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkaniuk, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 24, 2019
@jkaniuk
Contributor Author

jkaniuk commented Oct 24, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2019
@mm4tt
Contributor

mm4tt commented Oct 24, 2019

/lgtm

Nice!

@k8s-ci-robot k8s-ci-robot merged commit 2c4cba8 into kubernetes:master Oct 24, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 24, 2019
if nodeNames.Has(pod.Spec.NodeName) {
}
// Check if nodes are still missing after quarantine period
deletedNodesNames, quit := gcc.discoverDeletedNodes(existingNodeNames)
Contributor

@tedyu tedyu Oct 24, 2019

If this is lifted ahead of line 168, we can iterate over the pods once, checking both deletedNodesNames and existingNodeNames.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants