[2.2.4] etcd/controlplane master not registering on new clusters #21514
Fresh custom cluster. Rancher 2.2.4
k8s 1.13, flannel
Steps: create a new cluster, add the master node.
The master node can't register; one of the pods endlessly loops. The etcd pod shows:

Comments
When k8s 1.14 is used, this error appears in the etcd logs: `tls: bad certificate`.

Can someone please check this? It's a show-stopper, and it's clearly an issue with Rancher.
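A quick way to confirm whether the served etcd certificate and the cluster CA actually disagree (a sketch; it assumes RKE's default certificate directory `/etc/kubernetes/ssl` and the `kube-ca.pem`/`kube-etcd-*.pem` naming):

```bash
# Verify each etcd serving cert on the node against the cluster CA.
# Paths and names assume RKE defaults; adjust if your layout differs.
for cert in /etc/kubernetes/ssl/kube-etcd-*.pem; do
  case "$cert" in *-key.pem) continue ;; esac   # skip private keys
  echo "== $cert"
  openssl verify -CAfile /etc/kubernetes/ssl/kube-ca.pem "$cert"
done
```

If verification fails here, the node's certs were not issued by the CA the rest of the cluster trusts.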
Related: #20909
The error shown usually indicates that nodes are being re-used and data left behind clashes with the newly generated data. If it's reproducible every time, please provide the exact steps you've taken so I can reproduce it on my end (see https://rancher.com/docs/rancher/v2.x/en/contributing/#bugs-issues-or-questions). Adding an etcd + controlplane node to a custom cluster (1.13 + flannel) does not reproduce it.
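For re-used nodes, a minimal cleanup sketch along the lines of Rancher's node cleanup documentation (the paths are the usual RKE/Rancher state locations; treat this as a starting point, not an exhaustive list):

```bash
# Run on the node before re-adding it to a cluster.
# Removes containers, volumes, and on-disk state from the previous cluster.
docker rm -f $(docker ps -qa)            # all containers from the old cluster
docker volume rm $(docker volume ls -q)  # leftover volumes
sudo rm -rf /etc/kubernetes /etc/cni /opt/cni /opt/rke \
            /var/lib/etcd /var/lib/cni /var/lib/rancher
```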
@superseb I suspect the issue lies with the Rancher master. The etcd node is a brand-new fresh VM with a fresh hostname (because of the famous hostname issue... when is that ever going to be fixed?). I tried this 4-5 times now, making sure it's always a new VM with a new disk and a new, different hostname.

Steps:
Ubuntu 18.04 LTS
When did it start to happen (as it doesn't happen on a fresh 2.2.4 Rancher)? What is in the log of the rancher container (possible leads to why it's failing)?
Since the upgrade to 2.2.4. I don't see anything on the rancher master, nor in the other pods.
The same `tls: bad certificate` error appears at the same time on the etcd pod.
@superseb Where are the certs coming from? I assume the issue is the certs provided by the Rancher master? Maybe because we started out with 2.1 or something and upgrades did not fix some issue with them. That would explain why a fresh 2.2.4 setup does not have this problem.
When you create a new cluster and add the node, Rancher starts provisioning the cluster. Logging from this would be helpful, as it will show what part of the provisioning process succeeds and where it fails. Agent logs would also help, as I see connections to Rancher dropping (from the new node in the new cluster, and possibly from other nodes in other clusters).
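One way to capture both sets of logs (a sketch; the container names are placeholders, use whatever `docker ps` shows on each host):

```bash
# On the Rancher server host: provisioning logs for the new cluster.
docker logs -f --tail 200 <rancher-server-container>

# On the node being added: the registration agent's logs.
docker logs -f <rancher-agent-container>
```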
Rancher master logs during cluster creation and node adding (I removed a few i/o timeout and warning lines that are unrelated; it was too much log spam).

Agent pods only show this:
@superseb Same issue with a 2.2.6 master and 2.2.6 agent. This is 100% related to Rancher upgrades from 2.1 or 2.2.
Can you share the output of:
The ca.pem:
The timestamps of the certificates don't match. What is the output from:
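For anyone following along, one way to pull those timestamps out of the on-node certificates (assuming RKE's default `/etc/kubernetes/ssl` directory):

```bash
# Print the notBefore timestamp of every cert in the RKE cert directory,
# so they can be compared against each other and against the CA.
for cert in /etc/kubernetes/ssl/*.pem; do
  case "$cert" in *-key.pem) continue ;; esac   # skip private keys
  printf '%s: ' "$cert"
  openssl x509 -in "$cert" -noout -startdate
done
```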
@superseb You're right, our base image was polluted for unknown reasons. I forgot to check that. |
Deleting my `/etc/kubernetes` directory fixed it.
Thank you! I spent all day before I found it! I believe the Rancher team could improve the provisioning script to detect whether these files already exist and delete them or warn about them.
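Something like this hypothetical pre-flight check is what's being suggested (not part of Rancher; just a sketch of the idea):

```bash
# Refuse to provision a node that still carries state from a previous cluster.
for d in /etc/kubernetes /var/lib/etcd /var/lib/rancher; do
  if [ -e "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
    echo "WARNING: $d is not empty; leftover certs/data will clash with newly generated ones" >&2
    exit 1
  fi
done
```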
`rke cert rotate` did the job for me.
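For clusters managed directly with RKE (v0.2.0 and later), that looks roughly like this; `cluster.yml` is your existing RKE config:

```bash
# Re-issue the service certificates, signed by the existing cluster CA.
rke cert rotate --config cluster.yml

# If the CA itself is suspect, rotate it too (all certs are regenerated).
rke cert rotate --rotate-ca --config cluster.yml
```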
Clearing /etc/kubernetes worked for me, thanks.
Clearing `/etc/kubernetes` worked for me as well.
I've encountered this issue on a fresh VM install using v2.5.8. I checked the SSL certificates and everything was correct and up to date. I cleaned up all the folders and followed the steps here to clean the nodes, and still no luck. The only thing that fixed it was to turn off