
Can't join existing cluster JWS not found? #668

Closed
brendandburns opened this issue Jan 23, 2018 · 37 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@brendandburns

brendandburns commented Jan 23, 2018

Trying to join an existing ~20 day old cluster:

sudo kubeadm join --token <redacted>
[preflight] Running pre-flight checks.
	[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.01.0-ce. Max validated version: 17.03
	[WARNING FileExisting-crictl]: crictl not found in system path
[discovery] Trying to connect to API Server "10.0.0.1:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.1:6443"
[discovery] Failed to connect to API Server "10.0.0.1:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "aec65f" is invalid for this cluster, can't connect
[discovery] Trying to connect to API Server "10.0.0.1:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.1:6443"
[discovery] Failed to connect to API Server "10.0.0.1:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "aec65f" is invalid for this cluster, can't connect

Any ideas?

Thanks

Cluster was created by kubeadm 1.9.0, current kubeadm is 1.9.2

@brendandburns brendandburns changed the title Can't join existing cluster... Can't join existing cluster JWS not found? Jan 23, 2018
@stewart-yu
Contributor

stewart-yu commented Jan 23, 2018

It seems the token has expired. Use kubeadm token create on the master node to create a new valid token.
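
For example, a minimal sketch run on the master (--print-join-command may not exist on older kubeadm releases; if not, pass the new token to kubeadm join by hand):

# mint a fresh bootstrap token and print the full join command for the nodes
sudo kubeadm token create --print-join-command
# confirm the token exists and has not expired
sudo kubeadm token list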

@brendandburns
Author

I tried that. The token in the error message changes, but the error persists.

I can see the tokens when I run kubeadm token list on the master. Anything more I can do to help debug?

@dixudx
Member

dixudx commented Jan 24, 2018

@brendandburns Can you connect to the cluster now?

Please check the cluster-info ConfigMap in the kube-public namespace. It seems this ConfigMap is damaged: the key jws-kubeconfig-aec65f is missing from cluster-info.
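
For example, something along these lines should show whether the signature is there:

kubectl get configmap cluster-info --namespace=kube-public -o yaml
# a healthy ConfigMap has a jws-kubeconfig-<token-id> key next to "kubeconfig" under data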

@brendandburns
Author

Sorry, I nuked the cluster and rebuilt it from scratch. I will try to reproduce this later today and see if it re-occurs.

Thanks
--brendan

@arunmk

arunmk commented Jan 27, 2018

I saw this happen on a node with Kubernetes 1.6.8 today. This doesn't seem to be a TTL issue, and the workaround mentioned in #335 does not help. The cluster-info ConfigMap does not have jws-kubeconfig set. At what step is this value set? We use kubeadm to set up the cluster.

@dixudx
Member

dixudx commented Jan 28, 2018

@arunmk You can try to run kubeadm token create to create a new token.

@arunmk

arunmk commented Jan 28, 2018

@dixudx creating a new token did not help; the new ConfigMap has the same issue (no jws-kubeconfig). I think there is something unique to this node, and I am debugging along those lines.

Do you know if there are any packages etc. needed for this token to be created on the machine?

EDIT: Apologies @dixudx, this command does work on a test node. I mistook the token create command to mean recreation of the cluster with the new token. I'll try on the other node of interest and update the thread.

@timothysc timothysc added kind/bug Categorizes issue or PR as related to a bug. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jan 30, 2018
@timothysc
Member

/cc @mattmoyer

@arunmk

arunmk commented Jan 30, 2018

@dixudx on a machine where I don't have much access, the jws-kubeconfig does not get created. Here is what I do:
kubeadm token create --ttl=0
kubectl -n kube-public get configmap cluster-info -o json

The second command yields only the kubeconfig child under the data field; the jws-kubeconfig-* field is missing. kubeadm token list does show the newly created tokens, but the kube-public ConfigMap does not have them.
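
A compact way to check the same thing, assuming jq is installed, is to list only the data keys:

kubectl -n kube-public get configmap cluster-info -o json | jq -r '.data | keys[]'
# expected: "kubeconfig" plus one jws-kubeconfig-<token-id> entry per signed token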

@binarybana

I just encountered this problem, and kubeadm token create <existing token> DID seem to populate the jws-kubeconfig-<string> field of the cluster-info ConfigMap for me.

This was with kubeadm 1.9.2 on the master, using an existing token as an argument. Interestingly, kubeadm token list was empty prior to running the token creation.

@dixudx
Member

dixudx commented Feb 1, 2018

@arunmk Works well on my env.

root@server-01:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
root@server-01:~# kubeadm token create --ttl 0
aacff5.cb1a195970ddba98
root@server-01:~# kubectl get configmap cluster-info --namespace=kube-public -o json
{
    "apiVersion": "v1",
    "data": {
        "jws-kubeconfig-aacff5": "eyJhbGciOiJIUzI1NiIsImtpZCI6ImFhY2ZmNSJ9..Xgn2YRa_SM5qHq04vw_8SF-5-6nztGBi-4euSCIz_6Q",
        "kubeconfig": "apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1ERXlOVEF6TVRrME5sb1hEVEk0TURFeU16QXpNVGswTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHNmCks4b2R0bmExdDdRSlpxUjBCYnA4aVJqMWQ5OHQwRHRoUVBPOUlMTUt6M3F3c09qWWtXL0ZQc2R0QU1BYmw0Q24KM2lTajJqWW9TTTNLTUJpNUg1QjdhVHM1OEN6Rk85Q3FVVDkyUHFiMkhmVnZYNjdPZ1poZFo3ak9vVUNXM0pjYwprNnFDYjZxWWZSVkdXUkl4cldoQXJOTllyeEttZE01L1ErM3hFclJwdGtDaFVyTi9ETk05ZGhTQlBpRFBocEVoCkgyNy9Xa2JnUm95TThZQ1F5bTZaa204eGR2Zk1DVEV0WHgvdko4U0lERHYyS1orZnRPQWNoRCtsVmxpK2xJZ1MKVmttdVAvU0lNRU5sMDVMeTBqQVlyM004QkNMeUx6bWRiZU1zMlpCWTltMk1kckJ4S2lLRXg3RkZlTG9odGw3NwpEZGVxakhqaklaMW1oTUxqMXY4Q0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFMSjYxY1FzY3FPeDZxeXJrU0lkSmRKbWs4WmsKcVk1TldUa2RBRFFQTFBLSEo0eSt0TEpYVE81dGRKQ0NYT1JpeHJCMk9SQ0k5M2dGMWJwcWhOamlUNE40Rmg0YQpwSUl1RmZkNURRaWVQTWlFdEl2cmVKNEY3ZVZWWENOMitNZ0l6ZFRCZTRHQmczcXIrVk43TGtndEdmZllubEFuClRheVE3MkhnZG40YXlYcUdSQm1lTzBYeExSTTZUck1PeWtPckhkdVdtNVBCbXNmdENzM3IxdGczTmVHNEwyYzAKS1J6TkI5cVVneW9hOTlEMWFZcDNaWVBacDhORFBiR2Rsay9GaTRXOWZFZkIrem5BVXkrVGdqK1VQUjY3SzZpMApvbEkvWG1YemVNaHBEQ2F1Y2RsK2d4emNJeFhSQmxkQkFWREE5a0RNa0xuQ2VEL1FpUEpOZmxuZHBJbz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=\n    server: https://192.168.31.100:6443\n  name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"
    },
    "kind": "ConfigMap",
    "metadata": {
        "creationTimestamp": "2018-01-25T03:20:16Z",
        "name": "cluster-info",
        "namespace": "kube-public",
        "resourceVersion": "125266",
        "selfLink": "/api/v1/namespaces/kube-public/configmaps/cluster-info",
        "uid": "aa1498ff-017e-11e8-abe2-023e88328eba"
    }
}
root@server-01:~# kubeadm token list
TOKEN                     TTL         EXPIRES   USAGES                   DESCRIPTION   EXTRA GROUPS
aacff5.cb1a195970ddba98   <forever>   <never>   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

@bart0sh

bart0sh commented Feb 14, 2018

Can you share the controller-manager log from your master? It looks like the cluster-info ConfigMap was not updated for some reason. Hopefully there are errors in the controller-manager log that will help us figure out why that happened.

@bart0sh

bart0sh commented Feb 17, 2018

@brendandburns @binarybana Can you share the controller-manager log from your master?

@arunmk

arunmk commented Feb 20, 2018

@bart0sh how do we get the controller-manager logs? I looked at the etcd database using the etcdctl utility and can see that on a working machine the token is present, while on the failing machine it isn't. This is the command used:
ETCDCTL_API=3 ./etcdctl get "/registry/configmaps/kube-public/cluster-info"

Working machine:
cluster-info kube-public"6/api/v1/namespaces/kube-public/configmaps/cluster-info*$db43c18c-fbe3-11e7-99fe-001e67f85cfc28B ������܉zn jws-kubeconfig-266b02UeyJhbGciOiJIUzI1NiIsImtpZCI6IjI2NmIwMiJ9..3Fz-FLir0GSBPeKGS5u5FHsm69YA6MULOIKvJUO4CUc�

Failing machine:
cluster-info kube-public"*$78557677-0a90-11e8-b710-94188268af7028B ÈâÓ¶æÑùz¤

@arunmk

arunmk commented Feb 20, 2018

There are also numerous errors in the journal logs as follows:
k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.Secret: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list secrets in the namespace "kube-system".: "role.rbac.authorization.k8s.io \"system:controller:bootstrap-signer\" not found" (get secrets)

@bart0sh

bart0sh commented Feb 20, 2018

@arunmk you can get controller manager logs this way:

kubectl logs <controller-manager pod> --namespace=kube-system

You can find the name of its pod using kubectl get pods:

kubectl get pods --all-namespaces | grep controller-manager

It would be interesting to see its logs when a new token is created and the ConfigMap is requested:

kubeadm token create && kubectl get configmap cluster-info --namespace=kube-public -o json

@arunmk

arunmk commented Feb 21, 2018

@bart0sh ah, OK, you meant the pod logs. I was thinking of process-related logs in journald. I'll get these out.

@marranz

marranz commented Feb 22, 2018

Hi,

I'm experiencing this same problem ONLY when creating the master with the --cloud=aws flag or when I enable it in the config file.

this is a config file example:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 10.1.11.22
networking:
  podSubnet: 192.168.0.0/16
cloudProvider: aws

When I remove 'cloudProvider: aws' on new nodes, or after running 'kubeadm reset', I can join the nodes without errors.

Should I open another issue?

@arunmk

arunmk commented Feb 27, 2018

@bart0sh looks like the problem got fixed. We suspected that system:serviceaccount:kube-system:bootstrap-signer could not read the secret it needs to create a signed token. Hence we applied the attached file, which allowed system:serviceaccount:kube-system:bootstrap-signer to read the secret and sign the token.

k8s-workaround-cr.txt
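
For context, a hedged sketch of the kind of RBAC objects such a workaround could contain; the actual k8s-workaround-cr.txt is not reproduced here, and the rules below are an assumption modeled on the default bootstrap-signer Role that the error message says is missing:

# hypothetical reconstruction - the real workaround file may differ
# (older clusters such as 1.6 may need rbac.authorization.k8s.io/v1beta1)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: system:controller:bootstrap-signer
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: system:controller:bootstrap-signer
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: system:controller:bootstrap-signer
subjects:
- kind: ServiceAccount
  name: bootstrap-signer
  namespace: kube-system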

@bart0sh

bart0sh commented Feb 27, 2018

@arunmk Thank you for the info. Can you tell how you discovered the reason? Which logs did you look at, etc.?

@bart0sh

bart0sh commented Feb 27, 2018

@marranz Can you check whether @arunmk's solution works for you? Do you see anything suspicious in the controller-manager log?

@arunmk

arunmk commented Feb 27, 2018

@bart0sh I saw numerous errors in journalctl logs of the form:
k8s.io/kubernetes/pkg/controller/bootstrap/bootstrapsigner.go:151: Failed to list *v1.Secret: User "system:serviceaccount:kube-system:bootstrap-signer" cannot list secrets in the namespace "kube-system".: "role.rbac.authorization.k8s.io \"system:controller:bootstrap-signer\" not found" (get secrets)

That, and the dump of etcd, led us to suspect that the token was not present because it was probably not getting signed (due to the error above). Hence I tried granting the rights needed for the signer to access the secret.

@timothysc
Member

/cc @liztio

@liztio

liztio commented Mar 30, 2018

@marranz hey, I repro'd your bug and am working on tests / a fix

@timothysc timothysc added this to the v1.11 milestone Apr 3, 2018
@timothysc timothysc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 3, 2018
@liztio

liztio commented Apr 3, 2018

update:
I tracked down the actual signing action to the "controller-manager" pod. And lo and behold, it's crash-looping with cloudProvider: aws:

ubuntu@ip-172-31-80-217:~$ kubectl logs -n kube-system kube-controller-manager-ip-172-31-80-217
I0403 21:11:54.344806       1 controllermanager.go:108] Version: v1.9.6
I0403 21:11:54.348686       1 leaderelection.go:174] attempting to acquire leader lease...
I0403 21:11:54.362171       1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
I0403 21:11:54.362616       1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"e275f07f-3389-11e8-b7b8-12eedf22aef4", APIVersion:"v1", ResourceVersion:"272947", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-172-31-80-217 became leader
I0403 21:11:54.389127       1 aws.go:1000] Building AWS cloudprovider
I0403 21:11:54.389173       1 aws.go:963] Zone not specified in configuration file; querying AWS metadata service
E0403 21:11:55.040927       1 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
W0403 21:11:55.040956       1 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
F0403 21:11:55.041029       1 controllermanager.go:150] error building controller context: no ClusterID Found.  A ClusterID is required for the cloud provider to function properly.  This check can be bypassed by setting the allow-untagged-cloud option

That clusterID problem looks a lot like #53538. I'll investigate the solutions mentioned there tomorrow.
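
For reference, a hedged sketch of how that bypass could be wired through the v1alpha1 kubeadm config (the controllerManagerExtraArgs field is an assumption here; as the next comment explains, tagging the AWS resources is the proper fix):

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
cloudProvider: aws
controllerManagerExtraArgs:
  # hypothetical: silences the "no ClusterID Found" fatal error
  allow-untagged-cloud: "true"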

@timothysc timothysc removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Apr 4, 2018
@liztio

liztio commented Apr 4, 2018

As I speculated yesterday, this is not a bug related to signing. Rather, it's a documentation and error-exposure failure.

Using cloudProvider: aws imposes a number of additional requirements on an environment that aren't present for non-cloud-provider clusters. Failures are hidden away in secret places: the kubelet logs, the controller-manager logs (as seen above), and the API server pod logs.

Here's how I got everything to work:

  • I had to set each node's hostname to a FQDN. On Ubuntu this meant: sudo hostname $(curl 169.254.169.254/latest/meta-data/hostname).
  • Each node needs an AWS tag of the form kubernetes.io/cluster/<cluster-name>, set to either owned or shared (see the sketch below this list).
  • I gave the master an IAM role that lets it manage clusters.

There's a good writeup by Nate Baker on these requirements.
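
A hedged sketch of the first two steps (the instance ID and cluster name are placeholders, and aws ec2 create-tags is just one way to apply the tag; the writeup may do it differently):

# set the hostname to the EC2-internal FQDN
sudo hostname "$(curl -s 169.254.169.254/latest/meta-data/hostname)"

# tag the instance so the AWS cloud provider can identify its cluster
aws ec2 create-tags \
  --resources <instance-id> \
  --tags Key=kubernetes.io/cluster/<cluster-name>,Value=owned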

@marranz hope this helps!

@liztio liztio closed this as completed Apr 4, 2018
@arunmk

arunmk commented Apr 4, 2018

@liztio I don't think this should be closed. The AWS issue is only one facet of this, and the issue we hit had no relation to the cloud (an on-prem scenario).

@liztio

liztio commented Apr 4, 2018

@arunmk ah, my mistake, I thought the other issue was solved (from this comment).

@liztio liztio reopened this Apr 4, 2018
@arunmk

arunmk commented Apr 4, 2018

@liztio no worries. I have a workaround, but it's quite messy to automate, and we don't have a root cause yet. Hence I don't want to close this issue; I'm hoping for a cleaner fix. If the issue reported by @brendandburns is fixed by the commit, please do mention it and I'll open a new bug for my issue.

@timothysc
Member

@arunmk - can you distill the actionable requirements you are looking for? Most of these issues I have seen have to do with the ability to change the hostname.

@arunmk

arunmk commented Apr 9, 2018

@timothysc I saw errors of the form mentioned in this comment: #668 (comment)
Also, when I dumped the config from etcd and used 'kubectl get configmap' on the master, the tokens weren't present. However, kubeadm token list did show the tokens. (Do they pull from different locations in etcd?)

To get around this issue, we used the script mentioned in #668 (comment)

I don't have an RCA for that, as it was on a machine where I don't have much access. What I am looking for is:

  • does anyone know of a root-cause for this issue?
  • is there a better workaround / some other fix related to system settings?

We can detect this issue and automate the workaround, but we don't want to do it until we understand the root cause. Why does the 'kube-system:bootstrap-signer' not have access to the secret?
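
One way to check that last question directly (a minimal sketch; kubectl auth can-i impersonates the service account):

# does the default Role for the signer exist in kube-system?
kubectl -n kube-system get role system:controller:bootstrap-signer

# can the bootstrap-signer service account actually list secrets there?
kubectl auth can-i list secrets -n kube-system \
  --as=system:serviceaccount:kube-system:bootstrap-signer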

@liztio

liztio commented Apr 10, 2018

@arunmk have you noticed anything about the environments the issue occurs in? I'm happy to dig in on this, but I need a starting point.

@timothysc timothysc added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed triaged priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 26, 2018
@timothysc timothysc assigned timothysc and unassigned liztio May 15, 2018
@luxas
Member

luxas commented May 16, 2018

I think the first issue was a race condition in the controller-manager that has since been fixed, and the other issue conflated here is unrelated to the first comment. I'm closing this now; please file a new issue for the AWS-related bug, and reopen if you can reproduce the JWS-signing bug in a bare-metal env with no cloud provider, running kubeadm v1.10.

@luxas luxas closed this as completed May 16, 2018
@arunmk

arunmk commented May 16, 2018

@liztio I was away for a while and could not respond. Let me create a new bug with the required information if this reoccurs. Thanks for the fixes!

@devtech0101

devtech0101 commented Aug 23, 2020

It depends on what version of Kubernetes you are running. For example, on 1.18 you can generate a new token as follows (--v=5 just adds verbose output).

step 1:
sudo kubeadm --v=5 token create --print-join-command (this will update the cluster-info ConfigMap with a JWS entry in its data section)

step 2:
Run the printed join command on the node; it should work:
kubeadm join 172.16.26.136:6443 --token 0l27fp.tegcha916hiwn4lv --discovery-token-ca-cert-hash sha256:058073bb05c1d15ec802288c815e2f1d5fa12f912e6e7da9086f4b7c2e2aa850
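
To double-check that step 1 actually populated the ConfigMap, a quick sanity check along these lines should show the new entry:

kubectl -n kube-public get configmap cluster-info -o yaml | grep jws-kubeconfig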

@arunmk

arunmk commented Aug 23, 2020

@liztio @timothysc @devtech0101 this bug has fallen off my radar and I have not heard any new reports of it. Kubernetes has also moved quite a ways from 1.6.8, when this issue was seen, to 1.18 today. So I am fine with closing this issue.

@m33m33k

m33m33k commented Sep 7, 2020

This issue happened for me when kubeadm init / kubeadm token create was not creating the jws-kubeconfig entry. I was using Fedora CoreOS, which is restrictive in terms of write rights.

The part which says "/usr is mounted read-only on nodes" in this doc
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/ helped me fix the issue.

So basically kubeadm wasn't able to write because of insufficient write access.
