etcd crashes after restart with panic: unexpected removal of unknown remote peer
#13119
Complete log of the etcd server:
|
Interestingly, one line before the panic, etcd seems to ignore the already removed member:
but it panics right after that.
Exactly the same integration test steps pass just fine with etcd 3.4.15. However, together with the move to etcd 3.5.0, Talos also switched to adding new members as learners and then promoting them, so it is not clear whether learner mode plays some role here.
This can be reproduced by running raftexample (https://github.com/etcd-io/etcd/tree/main/contrib/raftexample) and issuing a delete for a node that doesn't exist:
|
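To make the failure mode concrete, here is a toy model of "remove a peer that is not in the peer table". It is only a sketch and not etcd's code: the class and method names (`ToyTransport`, `remove_peer_strict`, `remove_peer_tolerant`) are hypothetical, and a Python exception stands in for the Go panic. It also does not claim to show how the problem was eventually fixed.

```python
class UnknownPeerError(RuntimeError):
    """Stands in for the panic in this toy model (hypothetical name)."""


class ToyTransport:
    """Hypothetical peer table; not the real etcd transport."""

    def __init__(self, peer_ids):
        self.peers = set(peer_ids)

    def remove_peer_strict(self, peer_id):
        # Strict variant: removing an unknown peer is treated as fatal,
        # which mirrors the behaviour reported in this issue.
        if peer_id not in self.peers:
            raise UnknownPeerError(
                f"unexpected removal of unknown remote peer {peer_id}"
            )
        self.peers.remove(peer_id)

    def remove_peer_tolerant(self, peer_id):
        # Tolerant variant: removing an unknown peer is a no-op.
        # Returns True only if a peer was actually removed.
        if peer_id not in self.peers:
            return False
        self.peers.remove(peer_id)
        return True


if __name__ == "__main__":
    transport = ToyTransport({1, 2, 3})
    transport.remove_peer_strict(3)      # known peer: fine
    try:
        transport.remove_peer_strict(4)  # unknown peer: raises
    except UnknownPeerError as err:
        print(err)
```

If the removal is replayed on every restart, the strict variant fails every time, which would match the crash loop described in this issue. The tolerant variant only shows what idempotent removal would look like in this model.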
I also encountered the same problem. I changed a 3-node cluster to a 1-node cluster and then restarted etcd.service.
I'm also hitting this exact issue when upgrading from 3.4.X to 3.5.0. Here are the steps I followed to reproduce:
|
etcd can't start with --force-new-cluster=true; it fails with panic: unexpected removal of unknown remote peer, and I can't downgrade to 3.4.16.
Same here. I tried to recover the cluster after #13196 (comment) by starting it with
|
See https://github.com/etcd-io/etcd/releases/tag/v3.5.1 This version has a fix for member info getting out of sync: etcd-io/etcd#13119 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The issue seems to be fixed with etcd 3.5.1.
Is this issue fixed by PR 13348?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
This seems to be similar to some old issues, e.g. #3106
etcd version: 3.5.0
The exact steps to reproduce are not clear, but the bug is reproducible with the integration test for Talos (github.com/talos-systems/talos).
The integration test performs multiple steps reconfiguring the 3-node cluster:
Eventually a test reboots a node (which in turn restarts etcd), and one of the members goes into an infinite crash loop with the panic: