New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pod GC controller - use node lister #82365
Conversation
Hi @jkaniuk. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @caesarxuchao @janetkuo |
/ok-to-test |
} | ||
unknownNames.Insert(name) | ||
} | ||
if len(unknownNames) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user keeps creating pods with fake node names, than updateExistingNodes() will always be called.
I'm thinking if there is a better way to detect if the cached node list is stale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the implementation. Currently in fake node names scenario, there is a single GET (per ~hour) for every unique node name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation changed once again.
When orphaned pod is found and after 40s there is still no node with matching name in an informer, GET call is issued to verify that the node is truly gone (single call for all pods from the same node, but without caching the result).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine - I wouldn't optimize for a bt artificial situations that @caesarxuchao describe above. If really needed, we can optimize later.
b056cc9
to
4207d7b
Compare
c2c1987
to
567bf85
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor comments at this point - once addressed LGTM.
plugin/pkg/auth/authorizer/rbac/bootstrappolicy/testdata/controller-roles.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming you address Wojtek's comments.
I really like this approach!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor nits - once applied will approve
/lgtm |
Thanks a lot for working on that! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jkaniuk, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/lgtm Nice! |
if nodeNames.Has(pod.Spec.NodeName) { | ||
} | ||
// Check if nodes are still missing after quarantine period | ||
deletedNodesNames, quit := gcc.discoverDeletedNodes(existingNodeNames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is lifted ahead of line 168, we can iterate over pods once - checking both deletedNodesNames and existingNodeNames
Change-Id: Ie5e97dac8e434fa032af2a2923dd66df63b4286c
What type of PR is this?
/kind cleanup
/sig scalability
What this PR does / why we need it:
Pod GC controller lists the nodes in the cluster in every loop, which is a costly operation in a big cluster.
This PR changes controller implementation to use node informer events, only occasionally listing all the nodes if watch is stale.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: