Increase robustness for kubeadm join / add etcd #2094
Comments
@fabriziopandini note there is already a ticket for learner mode here:
@neolit123 thanks! Nevertheless, I will keep this one open as well, for making the current implementation more robust.
/assign
@fabriziopandini what timeouts are we talking about? The current backoff for AddMember is:
~53 sec. Also, are you sure this is an AddMember issue and not an issue with the client dial?
@neolit123 I was thinking that we can raise this timeout up to 2 minutes (or even more).
All cherry picks are about to merge, but we should keep this open to refactor the etcd client management in a similar way in master. E.g. kubernetes/kubernetes#90645 made it so MemberAdd behaves differently than MemberRemove.
Cherry picks merged. Lowering priority as the remaining refactor is not mandatory for 1.19.
Actually, let me log a new ticket.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version: v1.17.*
What happened?
While executing Cluster API tests, kubeadm join failures were observed in some cases while waiting for the new etcd member to report a healthy state.
xref kubernetes-sigs/cluster-api#2769
What you expected to happen?
Adding a new etcd member should be more resilient, by increasing the timeout/the number of retries for this operation.
How to reproduce it (as minimally and precisely as possible)?
This error happens only occasionally, most probably due to a slow network or slow I/O delaying etcd coming online or, in some cases, a change of the etcd leader.
Anything else we need to know?
Important: if possible, the change should be kept as small as possible and backported.