-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Establishing logic for CRDs by implementing Installing status to prevent race conditions #62725
Comments
Adding a new status just pushes the problem somewhere else. I'm ok with holding requests for a very short period (order of five-ish seconds). |
With the Another solution would be:
|
But we have to serve the CR during those 5 seconds already to avoid the race. So maybe we still need the |
Automatic merge from submit-queue (batch tested with PRs 64258, 63068). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add Establishing Controller to avoid race between Established condition and CRs actually served **What this PR does / why we need it**: If you create CR shorty after CRD, it can happen that it returns error that CRD doesn't exists, even if it exists and is Established. This implements the Establishing Controller, is used to Establish CRD once we're sure it's ready and CRs are served. For more details, check issue #62725. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #62725 ```release-note Add Establishing Controller on CRDs to avoid race between Established condition and CRs actually served. In HA setups, the Established condition is delayed by 5 seconds. ``` /sig api-machinery /area custom-resources /cc sttts deads2k
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
If you create CR shorty after CRD, it can happen that it returns error that CRD doesn't exists, even if it exists and is Established.
Relevant issue for this problem #57042, where this happened on the cluster with limited resources while deploying Istio. This one describes age
<invalid>
more, but could is relevant to the problem. For earlier research, check out #57042 (comment).What you expected to happen:
CR creation success if CRD is Established.
Anything else we need to know?:
With the current implementation, we have the following steps when creating and establishing the CRD:
NamesAccepted
status.Established
status shortly after (synchronously without roundtripping to the apiserver).After some time, the following happens:
ServeHTTP
). It calls the lister to make sure CRD is Established.However, we think there can be a race condition here, which causes the above described issue.
What can happen, is that the Established becomes true, CR request comes in, the lister is called, but the informer hasn't got the UPDATE event, and therefore, Established is not true in the lister memory cache. In that case, the request is rejected even if Established is set to true, and we get the described issue.
The potential fix could be to change the workflow little bit. Instead of
NamesAccpeted
=>Established
, we want to add another status,Installing
, which would happen afterNamesAccpeted
. By the implementation idea, onceInstallation
is set to true, it can't be turned to false (maybe just when it's time to turn it intoEstablished
), unlike theNamesAccepted
status.Then, we would turn CRD into
Established
in the UPDATE event handler for CR handlers, e.g. in the updateCustomResourceDefinition handler.Also, we want to make
ServeHTTP
for CR to answer the request when the CRD is still inInstalling
, but notEstablished
state, e.g. by changing this part of the code.This can be further improved for HA environments if we get an HA checkpointing algorithm, so we can extend to logic to only set Established when all API servers have seen installing.
I'll be working on fixing this one, so this issue is a placeholder for further discussions on the approach until I open the PR. If you have any suggestion, please let me know.
/cc deads2k sttts
/sig api-machinery
/area custom-resources
The text was updated successfully, but these errors were encountered: