Tolerate temporary errors from etcdserver #11401
There are cases when the etcdserver is temporarily unavailable and the errors that we get back from kube-apiserver reflect that error. It looks like we bail out immediately when these errors happen currently. We should retry until timeout is reached when this sort of errors happen. Signed-off-by: Davanum Srinivas <davanum@gmail.com>
@hickeyma this is ready now!
@dims thanks for looking into the issues here. Is the Kubernetes API supposed to be a leaky abstraction? Are clients expected to work with etcd? I'm asking about intent and mid-term intent. I'm wondering whether this code is something we will need to maintain long term or if this is a short-term situation.
@mattfarina we'll need a KEP upstream. I've asked some folks who were pushing for this earlier to do more in the 1.27 cycle (not 1.26), so until that KEP is discussed/reviewed/approved we will need this. We'll also need this for as long as the versions of Kubernetes supported by Helm have the old-style leaky abstraction.
This seems to be the appropriate stop-gap for this error.
Thanks @dims for tracking these issues down and providing this interim solution. It will be helpful to users.
thanks @hickeyma!
Hey @technosophos @hickeyma. Unfortunately, this fix does not solve the issue. Can you take a look at my new fix? #11426
@dims Can this be backported to 3.2 please?
@sruthiwander not this one! It was reverted; you will need #11426. Also, https://github.com/helm/helm/releases/tag/v3.2.0 is practically ancient. I don't know/think that the helm maintainers will go back that far: https://helm.sh/docs/topics/release_policy/
What this PR does / why we need it:
There are cases when the etcdserver is temporarily unavailable and the
errors that we get back from kube-apiserver reflect that. It looks
like we currently bail out immediately when these errors happen. We
should retry until the timeout is reached when errors of this sort occur.
Fixes #9502
Fixes #7637
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Special notes for your reviewer:
With this patch, temporary errors such as etcdserver leader changes are no longer treated as terminal. We continue to retry until the specified timeout.
Note that there are things that can be done on the k8s side; discussion is going on there as well:
kubernetes/kubernetes#112152
If applicable:
isServiceUnavailable