incubator/etcd scaling and recovery not working #685
Is this still an issue?
As far as I can see no changes were made to the chart, so, yes, it is.
I am experiencing a similar issue.
I have a 3-node etcd cluster on top of a 3-node GKE container cluster on preemptible nodes - by design I expect to lose etcd pods every now and then, have new ones spun up, and have the cluster recover. It's not happening. I don't know etcd well enough to understand and fix the problem; I would, however, like it to work. I am happy to help collect logs/data if someone with more etcd background is interested in taking a stab at a fix.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Seems to always be an issue...
And it's still an issue :-(
(Just copied from my cluster)
I get this in AWS, but not in GKE.
For anyone landing here looking for a workaround: in my case, all it took to get the pods to join the cluster was to delete all of them. When they restart, they join each other in the cluster.
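The "delete all of them" workaround can be sketched as follows. This is a minimal sketch, assuming the chart's StatefulSet is named `etcd` and its pods carry an `app=etcd` label in the `default` namespace - adjust these to your release name and namespace. Note that this recreates the cluster state from scratch on ephemeral storage, so data may be lost.

```shell
# Delete every etcd pod at once (label "app=etcd" is an assumption;
# check your release with `kubectl get pods --show-labels`).
# The StatefulSet recreates all pods, and the fresh pods bootstrap a
# cluster together instead of one pod trying to rejoin a quorum that
# no longer accepts it.
kubectl delete pod -l app=etcd --namespace default

# Wait until the StatefulSet reports all replicas ready again.
kubectl rollout status statefulset/etcd --namespace default
```
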
Workaround for this without losing your data or recreating your whole cluster.
The etcd cluster does not recover if a pod is deleted.
The issue seems to be in rejoining nodes under the same name.
Steps to reproduce:
On the next restart, the pod's log shows only this:
Meanwhile, the logs of the other nodes show something like this:
With scale-down/scale-up there is a similar problem - after scaling down it is not possible to scale back up, since the pod is unable to rejoin.
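Since the rejoin failure comes from the dead pod's stale entry in the member list, a less destructive workaround is to remove that entry before the pod comes back. The following is a hedged sketch using etcd's standard runtime-reconfiguration commands; the pod names (`etcd-0`, `etcd-2`) and the assumption that `etcdctl` is available inside the pods are illustrative, not taken from the chart.

```shell
# From a healthy member, list the current members and note the hex ID
# of the node that died (here assumed to be etcd-2).
kubectl exec etcd-0 -- etcdctl member list

# Remove the stale member entry; it is this old registration that
# prevents a pod re-starting under the same name from rejoining.
kubectl exec etcd-0 -- etcdctl member remove <MEMBER_ID>

# Delete the failed pod; the StatefulSet recreates it, and it can now
# be added to the cluster as a new member.
kubectl delete pod etcd-2
```

The same `member remove` step applies after a scale-down: clearing the removed node's entry is what allows a later scale-up to succeed.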