Description
Environmental Info:
RKE2 Version: v1.27.10+rke2r1
This version of RKE2 is embedded in Harvester.
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Describe the bug:
After running rke2 token rotate on a single-node Harvester cluster, the rke2-server service can no longer start, whether only the rke2-server service is restarted or the whole cluster is restarted.
The key error message is:
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Steps To Reproduce:
- Installed RKE2: Install a Harvester cluster (https://github.com/harvester/harvester/releases/tag/v1.3.1) with RKE2 embedded, RKE2 version v1.27.10+rke2r1
- Run rke2 token rotate
harv41:/opt/rke2/bin # ./rke2 --version
rke2 version v1.27.10+rke2r1 (915672bd6cab658edb974d0aedb33ec5a32c239a)
go version go1.20.13 X:boringcrypto
harv41:/opt/rke2/bin # ./rke2 token rotate --token rancher --new-token rancher1
WARNING: Recommended to keep a record of the old token. If restoring from a snapshot, you must use the token associated with that snapshot.
WARN[0000] Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation.
Token rotated, restart rke2 nodes with new token
Restart rke2-server service
harv41:/opt/rke2/bin # systemctl restart rke2-server.service
Job for rke2-server.service failed because the control process exited with error code.
See "systemctl status rke2-server.service" and "journalctl -xeu rke2-server.service" for details.
The rke2-server service loops while starting:
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
Loaded: loaded (/etc/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/rke2-server.service.d
└─override.conf
Active: activating (start) since Tue 2024-06-25 10:40:21 UTC; 3s ago
Docs: https://github.com/rancher/rke2#readme
journalctl log errors:
...
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Defragmenting etcd database"
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.292253Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.334119Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26152960,"current-db-size-in-use":"26 MB"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.56096Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15966208,"current-db-size-bytes":25907200,"current-db-size":"26 MB","current-db-size-in-use-bytes-diff":-253952,"current-db-size-in-use-bytes":25899008,"current-db-size-in-use":"26 MB","took":"268.681194ms"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.561008Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
...
Jun 25 10:40:38 harv41 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
...
Jun 25 10:41:35 harv41 rke2[2479]: time="2024-06-25T10:41:35Z" level=info msg="Defragmenting etcd database"
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.912247Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.953711Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26853376,"current-db-size-in-use":"27 MB"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159331Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15290368,"current-db-size-bytes":26583040,"current-db-size":"27 MB","current-db-size-in-use-bytes-diff":-286720,"current-db-size-in-use-bytes":26566656,"current-db-size-in-use":"27 MB","took":"247.043173ms"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159377Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
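The journal output above is mostly etcd defragmentation noise around a single fatal line per start attempt. As a minimal sketch for isolating that line, the filter below keeps only entries logged at level=fatal. The function name rke2_fatal_lines is a hypothetical helper, not part of rke2, and the usage line assumes the unit name rke2-server.service shown in the status output.

```shell
# Hypothetical helper (not an rke2 command): keep only level=fatal
# lines from rke2 journal output. It reads log text on stdin, so it
# can be tried against saved logs without a running service.
rke2_fatal_lines() {
  # -F treats the pattern as a fixed string rather than a regex.
  grep -F 'level=fatal'
}

# Example usage on the node (assumes systemd and the unit name above):
#   journalctl -u rke2-server.service --no-pager | rke2_fatal_lines
```

Run against the log excerpt above, this should leave only the "Failed to reconcile with temporary etcd" entries, one per start attempt.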
Expected behavior:
rke2 token rotate works correctly, and the rke2-server service restarts afterwards.
Actual behavior:
The Harvester cluster is broken after running rke2 token rotate; it cannot restart.
Additional context / logs: