cluster-reset fails occasionally when the quorum is lost #2682

Closed
ShylajaDevadiga opened this issue Mar 26, 2022 · 1 comment
Labels
kind/bug Something isn't working

Comments

@ShylajaDevadiga
Contributor

Environmental Info:
RKE2 Version: v1.23.5-rc3+rke2r1, v1.22.8-rc4+rke2r1

Node(s) CPU architecture, OS, and Version:
OS used for testing: Rocky Linux 8.4

Cluster Configuration:
4-node cluster: 3 servers, 1 agent

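Describe the bug:
Running rke2 server --cluster-reset after quorum has been lost occasionally fails with a fatal etcd error instead of resetting the cluster.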

Steps To Reproduce:
1. Create a 3-node cluster in which all nodes are servers.
2. Stop two of the servers to simulate quorum loss.
3. Stop the rke2-server service on the first node.
4. Run cluster-reset on the first node.

sudo systemctl stop rke2-server
sudo rke2 server --cluster-reset
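
The steps above can be scripted roughly as follows. This is only a sketch, not the reporter's actual procedure: the host names `server-2` and `server-3`, the use of `ssh` to reach them, and the `DRY_RUN` switch are assumptions added for illustration. Only the `systemctl stop rke2-server` and `rke2 server --cluster-reset` commands come from the report.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Host names and DRY_RUN are hypothetical.
DRY_RUN="${DRY_RUN:-1}"                              # default: only print commands
OTHER_SERVERS="${OTHER_SERVERS:-server-2 server-3}"  # hypothetical host names

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Stop two of the three servers to simulate quorum loss.
    for host in $OTHER_SERVERS; do
        run ssh "$host" sudo systemctl stop rke2-server
    done
    # On the first node: stop the service, then run cluster-reset.
    run sudo systemctl stop rke2-server
    run sudo rke2 server --cluster-reset
}

reproduce
```

With the default `DRY_RUN=1` the script only prints the four commands it would run. Because the failure is intermittent, several attempts on a fresh cluster may be needed before it shows up.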

Expected behavior:
A cluster-reset message is expected on the console, indicating that the server has been reset.

Actual behavior:
The command exits with a fatal etcd error, "failed to purge wal file", because the WAL directory cannot be found: "error":"open /var/lib/rancher/rke2/server/db/etcd-tmp/member/wal: no such file or directory"

Additional context / logs:

{"level":"info","ts":"2022-03-25T23:56:41.153Z","caller":"embed/etcd.go:276","msg":"now serving peer/client/metrics","local-member-id":"35a3a0e4dc08adcd","initial-advertise-peer-urls":["http://127.0.0.1:2400"],"listen-peer-urls":["http://127.0.0.1:2400"],"advertise-client-urls":["http://127.0.0.1:2399"],"listen-client-urls":["http://127.0.0.1:2399"],"listen-metrics-urls":[]}
{"level":"info","ts":"2022-03-25T23:56:41.153Z","caller":"etcdserver/server.go:744","msg":"starting initial election tick advance","election-ticks":10}
{"level":"fatal","ts":"2022-03-25T23:56:41.153Z","caller":"etcdserver/server.go:874","msg":"failed to purge wal file","error":"open /var/lib/rancher/rke2/server/db/etcd-tmp/member/wal: no such file or directory","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).purgeFile\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.1-k3s1/etcdserver/server.go:874\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.1-k3s1/etcdserver/server.go:2661"}
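
When the failure is buried in a long run of JSON log lines like the ones above, a small filter can pull out just the fatal entries. The helper below is a minimal sketch using plain `grep` and `sed`. The name `fatal_entries` is hypothetical, and it assumes each log entry is a single line in which the `"msg"` field precedes the `"error"` field, as in the log shown here.

```shell
#!/bin/sh
# Hypothetical helper: read JSON log lines on stdin and print "msg: error"
# for every entry whose level is "fatal". Entries without an "error" field
# are skipped by the sed pattern.
fatal_entries() {
    grep '"level":"fatal"' |
        sed -n 's/.*"msg":"\([^"]*\)".*"error":"\([^"]*\)".*/\1: \2/p'
}
```

One possible use is `sudo rke2 server --cluster-reset 2>&1 | fatal_entries`, which for the log above would reduce the output to the "failed to purge wal file" message followed by the path error.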
@ShylajaDevadiga
Contributor Author

Validated cluster-reset as well as cluster-reset-restore functionality on v1.23.5-rc4+rke2r1, v1.22.8-rc5+rke2r1, v1.21.11-rc5+rke2r1

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR Mar 29, 2022