[Release-1.28] - etcd snapshot controller thrashes on etcdsnapshotfile management when server is run with --disable-agent #9814

Closed
brandond opened this issue Mar 27, 2024 · 1 comment

Comments

@brandond
Contributor

Backport the fix for the etcd snapshot controller thrashing on etcdsnapshotfile management when the server is run with --disable-agent.

@aganesh-suse

Validated on release-1.28 branch with commit feb211d

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
disable-agent: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server
etcd-snapshot-retention: 2
etcd-snapshot-schedule-cron: "* * * * *"
etcd-s3: true
etcd-s3-access-key: xxxx
etcd-s3-secret-key: xxxx
etcd-s3-bucket: xxxx
etcd-s3-folder: xxxx
etcd-s3-region: xxxx

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='feb211d3ce0c41ee8d02dfc9164bb9c7dd97533c' sh -s - server
  3. Check the journal logs for reconciliation error messages:
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
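The first log check can be turned into a number that is easy to compare between builds. The following is a minimal sketch, not part of the original validation: the function name `count_reconcile_requeues` is hypothetical, and it reads log text on stdin so that it works with either live journal output or a saved log file.

```shell
#!/bin/sh
# Hypothetical helper: count log lines containing the requeue message
# quoted in the testing steps. Reads log text from stdin.
count_reconcile_requeues() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # there are no matches, so "|| true" keeps the function's status at 0.
  grep -c 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing' || true
}

# Usage on a node (needs journal access):
#   sudo journalctl -u k3s --no-pager | count_reconcile_requeues
```

A count of 0 on the patched build and a non-zero count on the unpatched build would match what the grep output in this report shows.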

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.28.8+k3s1 (653dd61a)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
Apr 12 18:16:01 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:16:01Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:16:34 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:16:34Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:17:05 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:17:05Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
Apr 12 18:17:37 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:17:37Z" level=debug msg="Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing"
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 12 18:15:42 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:42Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945693-fa6b85, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 589f2cfb-de3c-48ca-b937-edf509c99b29, UID in object meta: , requeuing"
Apr 12 18:15:43 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:43Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:44 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:44Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 9fae5bc1-2147-48c1-9779-aee5305ed898, UID in object meta: , requeuing"
Apr 12 18:15:46 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:46Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945693-fa6b85, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 27b49a1f-5b6e-40fa-8797-19ccd0e55fe4, UID in object meta: , requeuing"
Apr 12 18:15:47 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:47Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:48 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:48Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945685-42d628, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 20905e61-ae7f-41c9-b695-c4f361a47500, UID in object meta: , requeuing"
Apr 12 18:15:50 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:50Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: bf4466e5-2ed4-4f8a-aa4d-2ef1aec15419, UID in object meta: , requeuing"
Apr 12 18:15:51 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:51Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:52 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:52Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945693-fa6b85, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: cfcc2ed3-d048-4d59-a676-490f12ee1961, UID in object meta: , requeuing"
Apr 12 18:15:54 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:54Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945685-42d628, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: e9ffec4f-ed1a-41dd-a3bd-665657646d79, UID in object meta: , requeuing"
Apr 12 18:15:55 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:55Z" level=error msg="Failed to record snapshots for cluster: nodes \"ip-172-31-16-179\" not found"
Apr 12 18:15:56 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:56Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945685-42d628, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a30b91e5-e9ab-46a6-a436-8267a4f98b85, UID in object meta: , requeuing"
Apr 12 18:15:57 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:57Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945693-fa6b85, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: f2dc8516-49fb-4522-a183-d1995104e316, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 816f76c6-3937-4f53-bcc9-4439731529bf, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945685-42d628': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945685-42d628\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945685-42d628, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 87fdb54e-1d91-43cb-8948-353c31236080, UID in object meta: , requeuing"
Apr 12 18:15:58 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:58Z" level=error msg="error syncing 'local-on-demand-ip-172-31-16-179-1712945693-fa6b85': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-on-demand-ip-172-31-16-179-1712945693-fa6b85\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-on-demand-ip-172-31-16-179-1712945693-fa6b85, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: cf28b383-3d46-49fe-b63d-609df1517dee, UID in object meta: , requeuing"
Apr 12 18:15:59 ip-172-31-16-179 k3s[2629]: time="2024-04-12T18:15:59Z" level=error msg="error syncing 'local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac': handler managed-etcd-snapshots-controller: Operation cannot be fulfilled on etcdsnapshotfiles.k3s.cattle.io \"local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac\": StorageError: invalid object, Code: 4, Key: /registry/k3s.cattle.io/etcdsnapshotfiles/local-etcd-snapshot-ip-172-31-16-179-1712945705-0890ac, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 3526385f-d60b-4c9c-9ccf-f656e8781a61, UID in object meta: , requeuing"
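The replication log above repeats the same few ETCDSnapshotFile names with a different precondition UID each time, which is the thrashing the issue title describes. To make that pattern easier to see, the sketch below groups the "error syncing" lines by snapshot name. The function name `summarize_sync_errors` is hypothetical, and the pattern assumes the message format shown in this log.

```shell
#!/bin/sh
# Hypothetical helper: count "error syncing" log lines per snapshot name.
# Reads log text from stdin and prints "<count> <name>" lines sorted by name.
summarize_sync_errors() {
  grep -o "error syncing '[^']*'" \
    | cut -d "'" -f 2 \
    | sort \
    | uniq -c \
    | awk '{ print $1, $2 }'   # normalize the padding that uniq -c adds
}

# Usage on a node (needs journal access):
#   sudo journalctl -u k3s --no-pager | summarize_sync_errors
```

Several errors against the same name within a few seconds suggest repeated requeues of one resource rather than many distinct failures.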

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.28.8+k3s-feb211d3 (feb211d3)
go version go1.21.8
$ sudo journalctl -xeu k3s | grep 'Failed to reconcile snapshot ConfigMap: no nodes have reconciled ETCDSnapshotFile resources, requeuing'
$ sudo journalctl -xeu k3s | grep error | grep snapshot
Apr 12 17:04:39 ip-172-31-26-137 k3s[2555]: time="2024-04-12T17:04:39Z" level=debug msg="Error encountered attempting to retrieve extra metadata from k3s-etcd-snapshot-extra-metadata ConfigMap, error: configmaps \"k3s-etcd-snapshot-extra-metadata\" not found"
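The one remaining match in the validation output is a level=debug line. It is caught only because `grep error` matches the word "error" inside the message text. A stricter check could filter on the log level field instead. This is a minimal sketch with a hypothetical function name `snapshot_errors`, assuming the `level=...` field format seen in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: print only error-level log lines that mention snapshots.
# Reads log text from stdin; prints nothing when there are no such lines.
snapshot_errors() {
  # Match on the level field rather than the bare word "error", so
  # debug-level messages that contain "error" in their text are excluded.
  grep 'level=error' | grep -i 'snapshot' || true
}

# Usage on a node (needs journal access):
#   sudo journalctl -u k3s --no-pager | snapshot_errors
```

With this filter, the debug message about the missing k3s-etcd-snapshot-extra-metadata ConfigMap is not reported, while lines such as "Failed to record snapshots for cluster" from the replication run still are.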
