rke2 token rotate does not work as expected (v1.27.10+rke2r1) #6250

Closed
w13915984028 opened this issue Jun 25, 2024 · 4 comments

@w13915984028

Environmental Info:
RKE2 Version: v1.27.10+rke2r1

This version of RKE2 is embedded in Harvester.

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

Describe the bug:

After running rke2 token rotate on a single-node Harvester cluster, the rke2-server service can't start, whether I restart only the rke2-server service or reboot the whole cluster.

The key error message is:

Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"

Steps To Reproduce:

harv41:/opt/rke2/bin # ./rke2 --version
rke2 version v1.27.10+rke2r1 (915672bd6cab658edb974d0aedb33ec5a32c239a)
go version go1.20.13 X:boringcrypto

harv41:/opt/rke2/bin # ./rke2 token rotate --token rancher --new-token rancher1
WARNING: Recommended to keep a record of the old token. If restoring from a snapshot, you must use the token associated with that snapshot.
WARN[0000] Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation. 
Token rotated, restart rke2 nodes with new token

Restart the rke2-server service:

harv41:/opt/rke2/bin # systemctl restart rke2-server.service 
Job for rke2-server.service failed because the control process exited with error code.
See "systemctl status rke2-server.service" and "journalctl -xeu rke2-server.service" for details.


The rke2-server service keeps looping on startup:

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/etc/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/rke2-server.service.d
             └─override.conf
     Active: activating (start) since Tue 2024-06-25 10:40:21 UTC; 3s ago
       Docs: https://github.com/rancher/rke2#readme


journalctl log errors:
...
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Defragmenting etcd database"
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.292253Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.334119Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26152960,"current-db-size-in-use":"26 MB"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.56096Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15966208,"current-db-size-bytes":25907200,"current-db-size":"26 MB","current-db-size-in-use-bytes-diff":-253952,"current-db-size-in-use-bytes":25899008,"current-db-size-in-use":"26 MB","took":"268.681194ms"}
Jun 25 10:40:38 harv41 rke2[1929]: {"level":"info","ts":"2024-06-25T10:40:38.561008Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:40:38 harv41 rke2[1929]: time="2024-06-25T10:40:38Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:40:38 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
...
Jun 25 10:40:38 harv41 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).


...
Jun 25 10:41:35 harv41 rke2[2479]: time="2024-06-25T10:41:35Z" level=info msg="Defragmenting etcd database"
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.912247Z","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
Jun 25 10:41:35 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:35.953711Z","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes":41873408,"current-db-size":"42 MB","current-db-size-in-use-bytes":26853376,"current-db-size-in-use":"27 MB"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159331Z","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/rke2/server/db/etcd-tmp/member/snap/db","current-db-size-bytes-diff":-15290368,"current-db-size-bytes":26583040,"current-db-size":"27 MB","current-db-size-in-use-bytes-diff":-286720,"current-db-size-in-use-bytes":26566656,"current-db-size-in-use":"27 MB","took":"247.043173ms"}
Jun 25 10:41:36 harv41 rke2[2479]: {"level":"info","ts":"2024-06-25T10:41:36.159377Z","caller":"v3rpc/maintenance.go:96","msg":"finished defragment"}
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="etcd temporary data store connection OK"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 10:41:36 harv41 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
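
To pick the fatal message out of a restart loop like the one above, a small filter can help. This is only a sketch: the function name fatal_messages is made up for illustration, and it assumes the logrus-style lines shown in this issue, where the message is the last msg="..." field on the line.

```shell
# Hypothetical helper (not part of rke2): read journal lines on stdin and
# print only the text of messages logged at level=fatal.
fatal_messages() {
  # Keep level=fatal lines, drop everything up to msg=" and the closing quote.
  grep 'level=fatal' | sed -e 's/.*msg="//' -e 's/"$//'
}

# Possible usage on the node (assumes access to the systemd journal):
#   journalctl -u rke2-server.service --no-pager | fatal_messages | sort | uniq -c
```

Counting the distinct messages with sort and uniq -c shows whether every restart attempt fails for the same reason.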

Expected behavior:

The rke2 token rotate works correctly.

Actual behavior:

The Harvester cluster is broken after running rke2 token rotate; it can't restart.

Additional context / logs:

@brandond
Member

It sounds like Harvester has a token configured somewhere that does not match the new value. Can you confirm that, if Harvester is dropping a config file on disk that includes a token, that value is being updated? The token rotate command does NOT modify config files for you.

@w13915984028
Author

Harvester has a configuration file at /oem/harvester.config. I changed it, but still no luck. I will check further whether any other related files exist.

harv41:/opt/rke2/bin # cat /oem/harvester.config 
schemeversion: 1
serverurl: ""
token: rancher1  # changed to the new token and rebooted the cluster; still not working

Regarding this error
Jun 25 10:41:36 harv41 rke2[2479]: time="2024-06-25T10:41:36Z" level=fatal msg="Failed to reconcile with temporary etcd: bootstrap data already found and encrypted with different token"

@brandond How is the token passed to rke2/etcd? Thanks.

@megabreit

@w13915984028 Just an idea: did you also change the token in /oem/90_custom.yaml?
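
Since rke2 token rotate does not edit config files, one way to hunt for leftover copies of the old token is to grep the candidate locations. The sketch below is not an official procedure. The helper name find_token_refs is invented, the pattern assumes a YAML-style "token: VALUE" line and a token without regex metacharacters, and the paths in the usage comment are just the files named in this thread plus the default rke2 config location.

```shell
# Hypothetical helper: list files that still contain a given token value.
# Usage: find_token_refs TOKEN PATH...
find_token_refs() {
  token=$1
  shift
  for path in "$@"; do
    # Skip candidate paths that do not exist on this node.
    [ -e "$path" ] || continue
    # -r recurse into directories, -l print file names only, -E extended regex.
    # The value must end the line (optional quotes and trailing comment allowed),
    # so searching for "rancher" does not also report "rancher1".
    grep -rlE "token:[[:space:]]*[\"']?${token}[\"']?[[:space:]]*(#.*)?\$" "$path" 2>/dev/null
  done
  return 0
}

# Possible usage with the old token from this issue (paths are assumptions):
#   find_token_refs rancher /etc/rancher/rke2/config.yaml \
#     /etc/rancher/rke2/config.yaml.d /oem/harvester.config /oem/90_custom.yaml
```

Any file printed still carries the old value and would need to be updated by hand before restarting rke2-server.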

Contributor

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 24, 2024