
leader-elected etcd controllers not consistently functional when leader election/lease mismatches occur #5866

Closed
Oats87 opened this issue Apr 29, 2024 · 6 comments
Oats87 commented Apr 29, 2024

For the sake of sanity, I am not going to copy-paste the issue description and replace every instance of k3s with rke2, but the same issue as reported in k3s-io/k3s#10046 also applies to RKE2.

This has been one of the major causes of the 0b snapshot issue that we've been having on the Rancher provisioning side.

Also linked is #5008
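
For reference, a minimal sketch of how to see which node currently holds the etcd controllers' lease (the rke2-etcd lease name and kube-system namespace are taken from the repro steps later in this thread):

kubectl get lease rke2-etcd -n kube-system -o jsonpath='{.spec.holderIdentity}'

If the reported holder does not match the node actually doing the snapshot/controller work, that is the kind of leader election/lease mismatch this issue describes.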

brandond commented Apr 29, 2024

The previous cause of the 0b snapshots was entries in the configmap with no extra metadata; it sounds like they can also be caused by snapshots that show up in the etcd-snapshot list command but are not in the configmap?
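
A quick way to cross-check the two sources mentioned above (the configmap name and CLI command are taken from the repro below; run on each server node):

kubectl get configmap rke2-etcd-snapshots -n kube-system -o yaml   # snapshot entries recorded in the configmap
sudo rke2 etcd-snapshot list                                       # snapshots reported by the CLI

Snapshots that appear in the second listing but have no corresponding key in the configmap would match the scenario described here.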

fmoral2 commented Jun 10, 2024

Not able to reproduce this, @Oats87.

My exact steps:

  • SERVER-CP 1
kubectl create configmap -n kube-system rke2-etcd-snapshot-extra-metadata --from-literal=foo=bar
kubectl edit lease rke2-etcd -n kube-system   # change holderIdentity to SERVER-CP 3 (a non-interactive equivalent is sketched at the end of this comment)
sudo rke2 etcd-snapshot save
sudo rke2 etcd-snapshot save
  • SERVER-CP 3
sudo systemctl restart rke2-server.service
kubectl get leases -n kube-system
sudo rke2 etcd-snapshot save
sudo rke2 etcd-snapshot save

Here we can see that even after setting the holderIdentity to the IP of SERVER-CP 3, the lease ends up holding the IP of another server-cp, in this case the second:

$ kubectl get leases -n kube-system
NAME                                   HOLDER                                                                      AGE
apiserver-sadas   apiserver-das-das-das-das-das   26m
apiserver-dsad   apiserver-dsa-abdasc0-da-das-dsa   29m
apiserver-das   apiserver-das-dcd4-da-das-das   26m
kube-controller-manager                ip-172-1
kube-scheduler                         ip-172-1
rke2                                   ip-172-1
rke2-cloud-controller-manager          ip-172-1
rke2-etcd                              ip-172-2
  • SERVER-CP 2
sudo rke2 etcd-snapshot save
  • On all 3 servers, the rke2-etcd-snapshots configmap correctly shows 5 entries:

  kubectl get configmap -n kube-system
NAME                                                   DATA   AGE
chart-content-rke2-canal                               1      12m
chart-content-rke2-coredns                             1      12m
chart-content-rke2-ingress-nginx                       1      12m
chart-content-rke2-metrics-server                      1      12m
chart-content-rke2-snapshot-controller                 1      12m
chart-content-rke2-snapshot-controller-crd             1      12m
chart-content-rke2-snapshot-validation-webhook         1      12m
cluster-dns                                            2      12m
extension-apiserver-authentication                     6      12m
kube-apiserver-legacy-service-account-token-tracking   1      12m
kube-root-ca.crt                                       1      12m
rke2-canal-config                                      7      12m
rke2-coredns-rke2-coredns                              1      12m
rke2-coredns-rke2-coredns-autoscaler                   1      12m
rke2-etcd-snapshot-extra-metadata                      1      5m45s
rke2-etcd-snapshots                                    5      3m43s
rke2-ingress-nginx-controller                          1      11m

Snapshots were taken on all 3 nodes, and all of them show up both in kubectl get configmap -n kube-system and in rke2 etcd-snapshot list on every node.
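
For completeness, a non-interactive sketch of the lease edit from the first step, assuming the spec.holderIdentity field of the coordination.k8s.io Lease object; <server-cp-3-identity> is a placeholder for the holder name of SERVER-CP 3:

kubectl patch lease rke2-etcd -n kube-system --type=merge -p '{"spec":{"holderIdentity":"<server-cp-3-identity>"}}'
kubectl get lease rke2-etcd -n kube-system -o jsonpath='{.spec.holderIdentity}'   # confirm the change took effect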

fmoral2 commented Jun 10, 2024

@brandond any tips?

brandond commented Jun 10, 2024

@fmoral2 you might team up with @VestigeJ as he validated this on k3s: k3s-io/k3s#10046 (comment)

fmoral2 commented Jun 10, 2024

@fmoral2 you might team up with @VestigeJ as he validated this on k3s: k3s-io/k3s#10046 (comment)

I did, yeah, but it seems the same.

brandond commented Jun 11, 2024

@fmoral2 perhaps let's mark this as validated in k3s for now, since that's where the code change occurred. If @Oats87 can provide steps to reproduce on an affected release of RKE2, we can give that a try.

fmoral2 closed this as completed Jun 11, 2024