GC might delete objects during kube-apiserver startup #104342
Comments
/sig api-machinery
/triage accepted
Hi @tkashem, I'm curious about the details of this bug. GC ignores the CR if the CRD is not listed in the apiserver's API discovery doc (see this comment).
@caesarxuchao: remember that multiple processes can restart here. I have not studied the GC code myself, but colleagues told me they had done root cause analysis on some incorrect deletions of child objects. If I understood and recall correctly, the failure scenario was this: a kube-apiserver restarts long after the controller manager has started GC; the parent object (as well as the falsely orphaned child) is served from an aggregated custom apiserver; and the garbage collector queries for the parent during the startup window, when the corresponding APIService has not yet been fully processed. This would have been release 1.18 at the latest, possibly 1.16 or 1.17.
@caesarxuchao yes, I guess there is an inherent race between the CRD being available in discovery and serving the CR. I am not very familiar with the CRD logic; @p0lyn0mial is working on PR #104748 to fix this race.
What happened:
GC may delete objects during startup while the apiserver has not fully initialized yet.
One potential case where it can happen: CRDs are available through informers
404
What you expected to happen:
kube-apiserver should respond with a Retry-After until it has fully initialized. GC will get a 429 with a Retry-After response header in this case.
How to reproduce it (as minimally and precisely as possible):
With an HA cluster it's hard to reproduce.
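Because the race is hard to trigger on a real cluster, the difference between the current and the expected behavior can be illustrated with a toy model. This is only a sketch and not Kubernetes code: `ToyAPIServer`, `owner_is_gone`, and their parameters are hypothetical names invented for the example, and the real garbage collector logic is considerably more involved. The model assumes the server answers its first few requests before it is fully initialized, either with a misleading 404 or with a 429 plus Retry-After.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


class ToyAPIServer:
    """Hypothetical stand-in for kube-apiserver (illustration only)."""

    def __init__(self, objects, startup_requests, gate_until_ready):
        self.objects = set(objects)
        # Number of requests that arrive before the server is "fully initialized".
        self.remaining = startup_requests
        # True models the expected behavior (429 + Retry-After during startup).
        self.gate_until_ready = gate_until_ready

    def get(self, name):
        if self.remaining > 0:
            self.remaining -= 1
            if self.gate_until_ready:
                return Response(429, {"Retry-After": "1"})
            # Resource type not served yet, so an existing object looks missing.
            return Response(404)
        return Response(200 if name in self.objects else 404)


def owner_is_gone(server, owner, max_attempts=5):
    """Toy GC check: True means the dependents would be deleted."""
    for _ in range(max_attempts):
        resp = server.get(owner)
        if resp.status == 429 and "Retry-After" in resp.headers:
            # A real client would sleep for the Retry-After interval here.
            continue
        return resp.status == 404
    # Still no definitive answer: err on the side of not deleting.
    return False


racy = ToyAPIServer({"parent"}, startup_requests=2, gate_until_ready=False)
gated = ToyAPIServer({"parent"}, startup_requests=2, gate_until_ready=True)
print(owner_is_gone(racy, "parent"))   # parent exists, but the startup 404 misleads the check
print(owner_is_gone(gated, "parent"))  # the check retries until the server is ready
```

In the ungated case the first 404 is taken at face value and the child of an existing parent would be collected; in the gated case the check only sees a 404 once the server can actually answer for the resource.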
Anything else we need to know?:
Environment: