rke2 token rotate does not work as expected (v1.27.10+rke2r1) #6250

@w13915984028

Description

Environmental Info:
RKE2 Version: v1.27.10+rke2r1

This version of RKE2 is embedded in Harvester.

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

Describe the bug:

After running rke2 token rotate on a single-node Harvester cluster, the rke2-server service fails to start, regardless of whether only the rke2-server service is restarted or the whole cluster is restarted.

The key error message is:

Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"

Steps To Reproduce:

harv41:/opt/rke2/bin # ./rke2 --version
rke2 version v1.27.10+rke2r1 (915672bd6cab658edb974d0aedb33ec5a32c239a)
go version go1.20.13 X:boringcrypto

harv41:/opt/rke2/bin # ./rke2 token rotate --token rancher --new-token rancher1
WARNING: Recommended to keep a record of the old token. If restoring from a snapshot, you must use the token associated with that snapshot.
WARN[0000] Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation. 
Token rotated, restart rke2 nodes with new token

Restart the rke2-server service:

harv41:/opt/rke2/bin # systemctl restart rke2-server.service 
Job for rke2-server.service failed because the control process exited with error code.
See "systemctl status rke2-server.service" and "journalctl -xeu rke2-server.service" for details.


The rke2-server service is stuck in a start loop:

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/etc/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/rke2-server.service.d
             └─override.conf
     Active: activating (start) since Tue 2024-06-25 10:40:21 UTC; 3s ago
       Docs: https://github.com/rancher/rke2#readme


journalctl log errors:
...
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Defragmenting etcd database"
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.292253Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.334119Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26152960,"current-db-size-in-use":"26 MB"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.56096Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15966208,"current-db-size-bytes":25907200,"current-db-size":"26 MB","current-db-size-in-use-bytes-diff":-253952,"current-db-size-in-use-bytes":25899008,"current-db-size-in-use":"26 MB","took":"268.681194ms"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.561008Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
...
Jun 25 10:40:38 harv41 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).


...
Jun 25 10:41:35 harv41 rke2[2479]: time="2024-06-25T10:41:35Z" level=info msg="Defragmenting etcd database"
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.912247Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.953711Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26853376,"current-db-size-in-use":"27 MB"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159331Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15290368,"current-db-size-bytes":26583040,"current-db-size":"27 MB","current-db-size-in-use-bytes-diff":-286720,"current-db-size-in-use-bytes":26566656,"current-db-size-in-use":"27 MB","took":"247.043173ms"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159377Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
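
Most of the journal output above is etcd defragmentation noise. To isolate only the fatal message when reproducing this, a filter along these lines may help. This is a minimal sketch, not part of rke2: the function name rke2_fatal_lines is hypothetical, and it assumes the log line format shown in this report (a trailing msg="..." field).

```shell
# Hypothetical helper (not an rke2 command): read journal text on stdin and
# print only the msg="..." payload of lines logged at level=fatal.
# Assumes the line ends with the closing quote of the msg field, as in the logs above.
rke2_fatal_lines() {
  grep 'level=fatal' | sed -E 's/.*msg="(.*)"$/\1/'
}

# Possible usage on the affected node (assumes access to the systemd journal):
#   journalctl -u rke2-server.service --no-pager | rke2_fatal_lines
```

Run against the logs in this report, it should reduce each restart attempt to the single "bootstrap data already found and encrypted with different token" line.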

Expected behavior:

After rke2 token rotate, the rke2-server service should restart successfully with the new token.

Actual behavior:

The Harvester cluster is broken after running rke2 token rotate; the rke2-server service cannot restart.

Additional context / logs:
