
etcd-snapshot save times out in 10 seconds the first try #9985

Closed
1 of 2 tasks
aganesh-suse opened this issue Apr 19, 2024 · 1 comment
Labels: kind/bug (Something isn't working), status/blocker

Issue found on master branch with version v1.29.4-rc1+k3s1

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 server/ 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Testing Steps

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.29.4-rc1+k3s1' sh -s - server
  3. Verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
  4. Perform etcd-snapshot save with S3 details provided:
$ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<bucket> --s3-folder=<folder> --s3-region=<region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug
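Since the report notes that a retry some 30 to 60 seconds after the failed first attempt succeeds, a simple retry wrapper can serve as a stopgap until the fix lands. This is a hedged sketch, not a k3s feature; `retry_with_backoff` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical workaround sketch (not part of k3s): retry a command with a
# fixed delay between attempts, since in this report the first snapshot save
# hits a 10 s client deadline but a later attempt succeeds.
retry_with_backoff() {
  # $1 = max attempts, $2 = delay between attempts in seconds; rest = command
  attempts=$1; delay=$2; shift 2
  n=1
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage against the reproduction command (S3 flags elided as in the report):
# retry_with_backoff 3 30 sudo /usr/local/bin/k3s etcd-snapshot save --s3 ...
```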

Expected behavior:

The snapshot save should succeed on the first attempt. Instead, the first attempt times out after 10 seconds, and a retry roughly 30 to 60 seconds later succeeds.

Reproducing Results/Observations:

  • k3s version used for reproduction:
$ k3s -v
k3s version v1.29.4-rc1+k3s1 (d973fadb)
go version go1.21.9
$ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<bucket> --s3-folder=<folder> --s3-region=<region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug 
time="2024-04-19T18:04:31Z" level=warning msg="Unknown flag --cluster-init found in config.yaml, skipping\n"
time="2024-04-19T18:04:31Z" level=warning msg="Unknown flag --write-kubeconfig-mode found in config.yaml, skipping\n"
time="2024-04-19T18:04:31Z" level=warning msg="Unknown flag --node-external-ip found in config.yaml, skipping\n"
time="2024-04-19T18:04:31Z" level=warning msg="Unknown flag --node-label found in config.yaml, skipping\n"
time="2024-04-19T18:04:31Z" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
time="2024-04-19T18:04:41Z" level=fatal msg="see server log for details: Post \"https://127.0.0.1:6443/db/snapshot\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

Also, the journal logs show that the server was still performing the save during this timeframe, and no error was reported there when the client timed out.
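The failure mode described above (a client-side deadline firing while the server keeps working) can be reproduced in isolation. In this sketch, a 5 s handler sleep stands in for the slow `/db/snapshot` save and curl's `--max-time 2` plays the role of the CLI's 10 s HTTP client timeout; this is an illustration of the mechanism, not the k3s code path, and it assumes python3 and curl are available and port 18099 is free:

```shell
#!/bin/sh
# Start a local HTTP server whose POST handler responds slower than the
# client's deadline allows (assumed-free port 18099).
python3 - <<'EOF' &
import http.server, time

class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        time.sleep(5)                 # server is still working...
        self.send_response(200)
        self.end_headers()
    def log_message(self, *args):     # keep the demo quiet
        pass

http.server.HTTPServer(("127.0.0.1", 18099), SlowHandler).serve_forever()
EOF
SRV=$!
sleep 1  # let the server bind

# ...but the client gives up first, just as the k3s CLI did at 10 s.
curl -s --max-time 2 -X POST http://127.0.0.1:18099/db/snapshot
RC=$?
echo "curl exit code: $RC"  # 28 = operation timed out

kill "$SRV" 2>/dev/null
```

The client reports a timeout even though the server never logged an error, matching the journal-log observation above.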

@aganesh-suse (Author) commented:

Validated on master branch with commit d3b6054

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 server/ 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Testing Steps

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='d3b60543e7df924881854108984593aafb557d3c' sh -s - server
  3. Verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
  4. Perform etcd-snapshot save with S3 details provided:
$ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<bucket> --s3-folder=<folder> --s3-region=<region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug

Expected behavior:

The etcd-snapshot save should succeed and not time out after 10 seconds.

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.29.4+k3s-d3b60543 (d3b60543)
go version go1.21.9
$ sudo /usr/local/bin/k3s etcd-snapshot save --s3 --s3-bucket=<s3-bucket> --s3-region=<s3-region> --s3-access-key=xxxx --s3-secret-key="xxxx" --debug
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --cluster-init found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --write-kubeconfig-mode found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --node-external-ip found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Unknown flag --node-label found in config.yaml, skipping\n"
time="2024-04-22T20:19:49Z" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
time="2024-04-22T20:20:19Z" level=info msg="Snapshot on-demand-ip-172-31-16-180-1713817190 saved."
time="2024-04-22T20:20:19Z" level=info msg="Snapshot on-demand-ip-172-31-16-180-1713817190 saved."

As the log timestamps above show, the save did not time out after 10 seconds; the client waited for the save to complete, and the save succeeded. Closing the bug.
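The elapsed time can be read straight off the timestamps in the validation log above; a small sketch to compute it (GNU `date` assumed for `-d`):

```shell
#!/bin/sh
# Timestamps copied from the validation log output above.
START="2024-04-22T20:19:49Z"
END="2024-04-22T20:20:19Z"

# Convert each timestamp to epoch seconds and subtract (GNU date assumed).
ELAPSED=$(( $(date -u -d "$END" +%s) - $(date -u -d "$START" +%s) ))
echo "save took ${ELAPSED}s"  # 30s: well past the old 10 s client deadline
```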
