Can't join existing cluster JWS not found? #668
It seems that the token has expired; try creating a new one.
I tried that. The token in the error message changes, but the error persists. I can see the tokens when I list them on the master.
@brendandburns Can you connect to the cluster now? Please also check the configmap.
Sorry, I nuked the cluster and rebuilt it from scratch. I will try to reproduce this later today and see if it recurs. Thanks.
I saw this happen on a node with Kubernetes 1.6.8 today. This doesn't seem to be a TTL issue, and the workaround mentioned in #335 does not help. The cluster-info configmap does not have the `jws-kubeconfig-<token-id>` entry.
@arunmk You can try creating a new token and joining with that.
@dixudx creation of a new token did not help, and the new configmap also has the same issue (no jws-kubeconfig). I think there must be something unique to the node, and I am debugging along those lines. Do you know whether any packages etc. are needed on the machine for this token to be created? EDIT: Apologies @dixudx, this command does work on a test node. I had mistaken the token create command for a recreation of the cluster with a new token. I'll try it on the other node of interest and update the thread.
/cc @mattmoyer |
@dixudx on a machine where I don't have much access, the first command fails. The second command yields only the kubeconfig entry, with no JWS.
I just encountered this same problem. This was using kubeadm 1.9.2 on the master, with an existing token passed as an argument.
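For reference, kubeadm bootstrap tokens (like the one passed as an argument above) have the documented form `<token-id>.<token-secret>`: 6 and 16 lowercase alphanumeric characters respectively. The token ID is what appears in the `jws-kubeconfig-<token-id>` key of the cluster-info configmap. A small sketch validating and splitting a token (using the example token that appears later in this thread):

```python
import re

# Documented bootstrap token format: [a-z0-9]{6}.[a-z0-9]{16}
TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")

def split_token(token: str):
    """Return (token_id, token_secret), or raise ValueError if malformed."""
    m = TOKEN_RE.match(token)
    if not m:
        raise ValueError(f"not a valid bootstrap token: {token!r}")
    return m.group(1), m.group(2)

token_id, token_secret = split_token("aacff5.cb1a195970ddba98")
print(token_id)      # aacff5
print(token_secret)  # cb1a195970ddba98
```

If the join token fails this check, the configmap will never contain a matching `jws-kubeconfig-<token-id>` entry in the first place.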
@arunmk Works well on my env:

```
root@server-01:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
root@server-01:~# kubeadm token create --ttl 0
aacff5.cb1a195970ddba98
root@server-01:~# kubectl get configmap cluster-info --namespace=kube-public -o json
{
    "apiVersion": "v1",
    "data": {
        "jws-kubeconfig-aacff5": "eyJhbGciOiJIUzI1NiIsImtpZCI6ImFhY2ZmNSJ9..Xgn2YRa_SM5qHq04vw_8SF-5-6nztGBi-4euSCIz_6Q",
        "kubeconfig": "apiVersion: v1\nclusters:\n- cluster:\n certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1ERXlOVEF6TVRrME5sb1hEVEk0TURFeU16QXpNVGswTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHNmCks4b2R0bmExdDdRSlpxUjBCYnA4aVJqMWQ5OHQwRHRoUVBPOUlMTUt6M3F3c09qWWtXL0ZQc2R0QU1BYmw0Q24KM2lTajJqWW9TTTNLTUJpNUg1QjdhVHM1OEN6Rk85Q3FVVDkyUHFiMkhmVnZYNjdPZ1poZFo3ak9vVUNXM0pjYwprNnFDYjZxWWZSVkdXUkl4cldoQXJOTllyeEttZE01L1ErM3hFclJwdGtDaFVyTi9ETk05ZGhTQlBpRFBocEVoCkgyNy9Xa2JnUm95TThZQ1F5bTZaa204eGR2Zk1DVEV0WHgvdko4U0lERHYyS1orZnRPQWNoRCtsVmxpK2xJZ1MKVmttdVAvU0lNRU5sMDVMeTBqQVlyM004QkNMeUx6bWRiZU1zMlpCWTltMk1kckJ4S2lLRXg3RkZlTG9odGw3NwpEZGVxakhqaklaMW1oTUxqMXY4Q0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFMSjYxY1FzY3FPeDZxeXJrU0lkSmRKbWs4WmsKcVk1TldUa2RBRFFQTFBLSEo0eSt0TEpYVE81dGRKQ0NYT1JpeHJCMk9SQ0k5M2dGMWJwcWhOamlUNE40Rmg0YQpwSUl1RmZkNURRaWVQTWlFdEl2cmVKNEY3ZVZWWENOMitNZ0l6ZFRCZTRHQmczcXIrVk43TGtndEdmZllubEFuClRheVE3MkhnZG40YXlYcUdSQm1lTzBYeExSTTZUck1PeWtPckhkdVdtNVBCbXNmdENzM3IxdGczTmVHNEwyYzAKS1J6TkI5cVVneW9hOTlEMWFZcDNaWVBacDhORFBiR2Rsay9GaTRXOWZFZkIrem5BVXkrVGdqK1VQUjY3SzZpMApvbEkvWG1YemVNaHBEQ2F1Y2RsK2d4emNJeFhSQmxkQkFWREE5a0RNa0xuQ2VEL1FpUEpOZmxuZHBJbz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=\n server: https://192.168.31.100:6443\n name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"
    },
    "kind": "ConfigMap",
    "metadata": {
        "creationTimestamp": "2018-01-25T03:20:16Z",
        "name": "cluster-info",
        "namespace": "kube-public",
        "resourceVersion": "125266",
        "selfLink": "/api/v1/namespaces/kube-public/configmaps/cluster-info",
        "uid": "aa1498ff-017e-11e8-abe2-023e88328eba"
    }
}
root@server-01:~# kubeadm token list
TOKEN                     TTL         EXPIRES   USAGES                   DESCRIPTION   EXTRA GROUPS
aacff5.cb1a195970ddba98   <forever>   <never>   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
```
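For anyone comparing their own configmap against the dump above: the `jws-kubeconfig-aacff5` value is a detached JWS. Its first segment is a base64url-encoded JSON header whose `kid` field must match the token ID, and the empty middle segment is where the detached payload (the kubeconfig) would otherwise go. A quick sketch decoding the header from the example above:

```python
import base64
import json

jws = "eyJhbGciOiJIUzI1NiIsImtpZCI6ImFhY2ZmNSJ9..Xgn2YRa_SM5qHq04vw_8SF-5-6nztGBi-4euSCIz_6Q"
header_b64, payload_b64, signature_b64 = jws.split(".")

# The middle segment is empty because the payload is detached
padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore base64 padding
header = json.loads(base64.urlsafe_b64decode(padded))
print(header)  # {'alg': 'HS256', 'kid': 'aacff5'}
```

A `kid` that does not match the token you are joining with would explain an "unauthorized" or "JWS not found" result even when some `jws-kubeconfig-*` entry exists.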
Can you show the controller-manager log from your master? It looks like the cluster-info configmap was not updated for some reason. Hopefully there are errors in the controller-manager log that will help investigate why that happened.
@brendandburns @binarybana Can you show the controller-manager log from your master?
@bart0sh how do we get the controller-manager logs? I looked at the etcd database using the etcdctl utility and see that on a working machine the token is present, while on the failing machine it isn't. I ran the same etcdctl command on both the working and the failing machine.
There are also numerous errors in the journal logs.
@arunmk you can get the controller-manager logs with kubectl: find the name of its pod using kubectl get pods, then fetch that pod's logs. It would be interesting to see those logs from the moment a new token is created and the configmap is requested.
@bart0sh ah, OK, you were referring to the pod logs. I was thinking of process-related logs in journald. I'll get these out.
Hi, I'm experiencing this same problem, but ONLY when creating the master with --cloud=aws or when I enable it in the config file. This is a config file example:
When I remove 'cloudProvider: aws' on new nodes, or after running 'kubeadm reset', I can join the nodes without errors. Should I open another issue?
@bart0sh looks like the problem got fixed. We suspected that the secret could not be read by the bootstrap-signer.
@arunmk Thank you for the info. Can you tell how you discovered the reason? Which logs did you look at, etc.?
@bart0sh I saw numerous errors in the journalctl logs. That, and the dump of etcd, led us to suspect that the token was not present because it was probably not getting signed (due to those errors). Hence I tried granting rights to allow access for the signing.
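For background on what "getting signed" means here: the bootstrap-signer builds a detached JWS over the cluster-info kubeconfig, with the token ID as the JWS key ID and (as I understand the scheme) the token secret as the HMAC-SHA256 key. The sketch below shows the shape of that operation, not kubeadm's actual code; the kubeconfig string is a placeholder, and the token is the example one from this thread:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # base64url without padding, as used in JWS compact serialization
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

token_id, token_secret = "aacff5", "cb1a195970ddba98"  # example token from this thread
kubeconfig = "apiVersion: v1\nkind: Config\n"          # placeholder payload

header = b64url(json.dumps({"alg": "HS256", "kid": token_id},
                           separators=(",", ":")).encode())
payload = b64url(kubeconfig.encode())
sig = b64url(hmac.new(token_secret.encode(),
                      f"{header}.{payload}".encode(),
                      hashlib.sha256).digest())

# Detached JWS: the payload segment is omitted, leaving two consecutive dots
detached = f"{header}..{sig}"
print(detached)
```

The header segment this produces matches the one visible in the healthy configmap dump earlier in the thread, which is a quick way to sanity-check a `jws-kubeconfig-*` entry against its token ID. If the signer cannot read the token secret, this signing step never runs and the entry never appears.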
/cc @liztio |
@marranz hey, I repro'd your bug and am working on tests / a fix |
Update: that clusterID problem looks a lot like #53538. I'll investigate the solutions mentioned there tomorrow.
Like I speculated yesterday, this is not a bug related to signing. Rather, it's a documentation and error-exposure failure. Here's how I got everything to work:
There's a good writeup by Nate Baker on these requirements. @marranz hope this helps! |
@liztio I don't think this should be closed. The AWS issue is one facet of this, and the issue we hit had no relation to the cloud (an on-prem scenario).
@arunmk ah, my mistake, I thought the other issue was solved (from this comment). |
@liztio no worries. I have a workaround, but it’s quite messy to automate, and we don’t have a root cause yet. Hence I don’t want to close this issue and hope for a cleaner fix. If the issue reported by @brendandburns is fixed by the commit please do mention it and I’ll open a new bug for my issue. |
@arunmk - can you distill the actionable requirements you are looking for? Most of these issues I have seen have to do with the ability to change the hostname.
@timothysc I saw errors of the form mentioned in this comment: #668 (comment). To get around the issue, we used the script mentioned in #668 (comment). I don't have an RCA for it, as it was on a machine where I don't have much access. What I am looking for are:
We can detect this issue and automate the workaround, but we don't want to do it until we understand the root cause. Why does the 'kube-system:bootstrap-signer' not have access to the secret? |
@arunmk have you noticed anything about the environments the issue occurs in? I'm happy to dig in on this, but I need a starting point. |
I think the first issue was a race condition in the controller-manager that has since been fixed, and the other issue conflated here is unrelated to the first comment. I'm closing this now; please file a new issue for the AWS-related bug, and reopen if you can reproduce the JWS-signing bug in a bare-metal environment with no cloud provider, running kubeadm v1.10.
@liztio I was away for a while and could not respond. Let me create a new bug with the required information if this reoccurs. Thanks for the fixes! |
It depends on which version of Kubernetes you are running: for example, on 1.18 you can generate a new token that supports v=5, in two steps.
@liztio @timothysc @devtech0101 this bug has fallen off my radar and I have not heard of any new reports of it. Kubernetes has also moved quite a way from 1.6.8, where this issue was seen, to 1.18 nowadays. So I am fine with closing this issue.
This issue happened for me when kubeadm init, or token create, was not creating the jws-kubeconfig token. I was using Fedora CoreOS, which is restrictive in terms of write rights; see the part of the docs that says "/usr is mounted read-only on nodes". So basically kubeadm wasn't able to write because of insufficient write access.
Trying to join an existing ~20-day-old cluster:

Any ideas? Thanks.

The cluster was created by kubeadm 1.9.0; the current kubeadm is 1.9.2.