Add Establishing Controller to avoid race between Established condition and CRs actually served #63068
What this PR does / why we need it: If you create a CR shortly after its CRD, the request can fail with an error saying that the CRD doesn't exist, even though it exists and is Established. This PR implements the Establishing Controller, which is used to Establish a CRD once we're sure it's ready and its CRs are being served. For more details, check issue #62725.
Which issue(s) this PR fixes (optional, in
@xmudrii: Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
One of the following labels is required "release-note", "release-note-action-required", or "release-note-none".
Apr 27, 2018
[APPROVALNOTIFIER] This PR is APPROVED
May 29, 2018
15 of 18 checks passed
We need to solve this problem for a number of items, and I'm not sure I want a solution specific to just this problem, especially not one that uses the master count mechanism. I wasn't able to read this, and it also looks like it merged before @yliaog had a chance to review. @sttts for this kind of change I'd like to have either deads2k or myself sign off; adding coordination mechanisms between apiservers is kind of a big deal.
@lavalamp I would love to help with this if I can. I'm fully aware that this is not going to be an easy task and that there's a lot of work behind it, but I would definitely love to help out.
Also, if I can hotfix this for 1.11, I'm willing to do that as soon as possible. What comes to my mind, as @sttts mentioned above, is that a predefined timeout could work. The only downside I see is that non-HA users might be annoyed by the delay.
Let's say we start with 2-3 seconds at most, or even 5, and see how it works. That's not a big delay and shouldn't cause any harm beyond annoying some users, but it could be good as a temporary hotfix.
A possible reason for focusing on CRDs is that they are the resources mostly affected by this race. I did some research, and you may find the details in #57042 helpful.
I had reproduced some of the stuff mentioned in that issue, such as
Issue #63656 may be relevant to this as well.
This definitely needs a better fix, but we hoped this is going to help at least CRDs, as we think they're mostly affected by this.
If I can help with anything, let me know. Maybe it would be best to discuss this on the SIG call, but that's next week, and I'm not sure whether that would be too late if there is something to be fixed before the 1.11 release.
@lavalamp the actual change here was not the coordination mechanism, but a fix for a race that showed up even in single-master setups. The idea with the seconds was actually proposed by @deads2k. I agree that this deserves a larger discussion, and duct-tape is not the right solution.
I would like to see this coordination topic somewhere in our roadmap, or even block further dynamic configuration mechanisms until it is solved (audit is just around the corner; componentconfigs are becoming more and more of a thing). /cc @kubernetes/sig-architecture-misc-use-only-as-a-last-resort
Sounds great to me. Added an agenda item for our next SIG meeting.
On Tue, Jun 12, 2018 at 4:01 AM Dr. Stefan Schimanski wrote:
@lavalamp the actual change here was not the coordination mechanism, but a race that even showed up in single master setup. The idea with the seconds was actually proposed by @deads2k. I agree that this deserves a larger discussion, and duct-tape is not the right solution. I would like to see this coordination topic somewhere in our roadmap, or even block further dynamic configuration mechanism until it is solved (audit is just around the corner; componentconfigs become more and more a thing).