[2.2.4] etcd/controlplane master not registering on new clusters #21514

Closed
superxor opened this issue Jul 13, 2019 · 21 comments

@superxor

Fresh custom cluster. Rancher 2.2.4
k8s 1.13, flannel

Steps: create a new cluster, add a master node.

The master node can't register; one of the pods endlessly loops:

  • docker start kubelet
    Error response from daemon: {"message":"No such container: kubelet"}
    Error: failed to start containers: kubelet
  • sleep 2

etcd pod shows:

WARNING: 2019/07/13 10:19:35 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

@superxor superxor changed the title master not registering on new cluster v2.2.4 [2.2.4] master not registering on new clusters Jul 13, 2019
@slamdev

slamdev commented Jul 13, 2019

When k8s 1.14 is used, this error appears in the etcd logs:

2019-07-13 11:14:24.030026 I | embed: rejected connection from "10.3.1.20:40787" (error "EOF", ServerName "")
2019-07-13 11:14:32.866684 I | embed: rejected connection from "10.3.1.20:58550" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")
2019-07-13 11:14:37.882558 I | embed: rejected connection from "10.3.1.20:58552" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")
2019-07-13 11:14:42.905844 I | embed: rejected connection from "10.3.1.20:58554" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")

@superxor
Author

Can someone please check this? It's a show-stopper.

It's clearly an issue with Rancher: tls: bad certificate
I can reproduce it every time.

@superxor superxor changed the title [2.2.4] master not registering on new clusters [2.2.4] etcd/controlplane master not registering on new clusters Jul 24, 2019
@superxor
Author

related #20909

@superseb
Contributor

The error shown usually indicates nodes being re-used, with data left behind that clashes with the newly generated data.

If it's reproducible every time, please provide the exact steps you've taken so I can use them to reproduce on my end (see https://rancher.com/docs/rancher/v2.x/en/contributing/#bugs-issues-or-questions). Adding an etcd + controlplane node in a custom cluster (1.13 + flannel) does not reproduce it.
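
For reference, a quick way to check a node for leftover state before adding it; the paths here are assumptions based on Rancher/RKE defaults and the rest of this thread, not confirmed for this setup:

# run on the node before registering it; any hits suggest remnants of a previous cluster
docker ps -a | grep -E 'etcd|kubelet|kube-apiserver'
ls -la /etc/kubernetes/ssl /var/lib/etcd /opt/rke 2>/dev/null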

@superxor
Author

superxor commented Jul 24, 2019

@superseb I suspect the issue lies with the rancher-master. The etcd node is a brand-new VM with a fresh hostname (because of the famous hostname issue... when is that ever going to be fixed?).

I tried this 4-5 times now, making sure it's always a new VM with a new disk and a new, different hostname.

steps:

  1. Create a new cluster
  2. Spawn a new VM/node
  3. Run the docker command... (roughly the shape sketched below)
  4. See the etcd pod error about the TLS cert failure
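
For context, the node registration command copied from the Rancher UI roughly has this shape; the server URL, token, checksum and roles below are placeholders, not the actual values used here:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.2.4 \
  --server https://<rancher-url> --token <registration-token> --ca-checksum <checksum> \
  --etcd --controlplane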

Ubuntu 18.04 LTS
docker version
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.8
Git commit: 2d0083d
Built: Thu Jun 27 17:56:23 2019
OS/Arch: linux/amd64
Experimental: false

@superseb
Contributor

When did it start to happen (as it doesn't happen on a fresh 2.2.4 Rancher)? What is in the log of the rancher container (possible leads to why it's failing)?

@superxor
Author

superxor commented Jul 24, 2019

Since the upgrade to 2.2.4.

I don't see anything on the Rancher master, nor in the other pods:

2019/07/24 09:12:29 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.60:44214: i/o timeout
2019/07/24 09:12:29 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.72:49762: i/o timeout
2019/07/24 09:12:30 [INFO] Handling backend connection request [c-q89w5:m-4c0602df0d65]
2019/07/24 09:12:30 [INFO] Handling backend connection request [c-q89w5:m-e5f8240f1893]
2019/07/24 09:12:30 [INFO] Handling backend connection request [c-5q2fz:m-623b23917cdf]
2019/07/24 09:12:30 [INFO] Handling backend connection request [c-5q2fz:m-f48ee2daad85]
2019/07/24 09:12:32 [INFO] Handling backend connection request [c-q89w5:m-b0676695ae93]
2019/07/24 09:12:32 [INFO] Handling backend connection request [c-5q2fz:m-25bd4ee7511c]
2019/07/24 09:12:33 [INFO] Handling backend connection request [c-5q2fz:m-a885899d4c85]
2019/07/24 09:12:34 [INFO] Handling backend connection request [c-5q2fz:m-2b5b7779d339]
2019/07/24 09:12:36 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.26:39580: i/o timeout
2019/07/24 09:12:36 [INFO] Handling backend connection request [c-5q2fz:m-e81840c53f86]

same time on the etcd pod:

2019-07-24 09:12:22.726490 W | auth: simple token is not cryptographically signed
2019-07-24 09:12:22.730662 I | etcdserver: starting server... [version: 3.2.24, cluster version: to_be_decided]
2019-07-24 09:12:22.731466 I | etcdserver/membership: added member fd1bb75040a79e2d [https://10.3.1.20:2380] to cluster 3bf383ae9c8f9b68
2019-07-24 09:12:22.731580 N | etcdserver/membership: set the initial cluster version to 3.2
2019-07-24 09:12:22.731638 I | etcdserver/api: enabled capabilities for version 3.2
2019-07-24 09:12:22.732407 I | embed: ClientTLS: cert = /etc/kubernetes/ssl/kube-etcd-10-3-1-20.pem, key = /etc/kubernetes/ssl/kube-etcd-10-3-1-20-key.pem, ca = , trusted-ca = /etc/kubernetes/ssl/kube-ca.pem, client-cert-auth = true
2019-07-24 09:12:30.221083 I | raft: fd1bb75040a79e2d is starting a new election at term 2
2019-07-24 09:12:30.221215 I | raft: fd1bb75040a79e2d became candidate at term 3
2019-07-24 09:12:30.221258 I | raft: fd1bb75040a79e2d received MsgVoteResp from fd1bb75040a79e2d at term 3
2019-07-24 09:12:30.221306 I | raft: fd1bb75040a79e2d became leader at term 3
2019-07-24 09:12:30.221323 I | raft: raft.node: fd1bb75040a79e2d elected leader fd1bb75040a79e2d at term 3
2019-07-24 09:12:30.221963 I | etcdserver: published {Name:etcd-slamdev-master-t13-20 ClientURLs:[https://10.3.1.20:2379 https://10.3.1.20:4001]} to cluster 3bf383ae9c8f9b68
2019-07-24 09:12:30.222173 I | embed: ready to serve client requests
2019-07-24 09:12:30.223095 I | embed: serving client requests on [::]:2379
WARNING: 2019/07/24 09:12:30 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

@superxor
Author

@superseb Where are the certs coming from? I assume the issue is the certs provided by the Rancher master? Maybe because we started out with 2.1 or something and upgrades did not fix some issue with them. That would explain why a fresh 2.2.4 setup does not have this problem.

@superseb
Contributor

When you create a new cluster and add the node, Rancher starts provisioning the cluster. Logging from this would be helpful, as it shows which part of the provisioning process succeeds and where it fails.

Also, agent logs would help, as I see connections to Rancher dropping (from the new node in the new cluster and possibly from other nodes in other clusters).

@superxor
Author

superxor commented Jul 24, 2019

Rancher master logs during cluster creation and node adding (I removed a few unrelated i/o timeout and warning lines; it was too much log spam):

2019/07/24 11:45:32 [INFO] Provisioning cluster [c-8gf44]
2019/07/24 11:45:32 [INFO] Creating cluster [c-8gf44]
2019/07/24 11:45:35 [INFO] Handling backend connection request [c-8gf44:m-91f491207258]
2019/07/24 11:45:35 [INFO] Handling backend connection request [c-5q2fz:m-3057e8ce2de9]
2019/07/24 11:45:36 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.86:37826: i/o timeout
2019/07/24 11:45:37 [INFO] Handling backend connection request [c-5q2fz:m-4b59508bd6b8]
2019/07/24 11:45:37 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:10800
2019/07/24 11:45:37 [INFO] cluster [c-8gf44] provisioning: Initiating Kubernetes cluster
2019/07/24 11:45:37 [INFO] cluster [c-8gf44] provisioning: [dialer] Setup tunnel for host [10.3.1.20]
2019/07/24 11:45:37 [INFO] cluster [c-8gf44] provisioning: [state] Pulling image [rancher/rke-tools:v0.1.28] on host [10.3.1.20]
2019/07/24 11:45:42 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.37:43978: i/o timeout
2019/07/24 11:45:42 [INFO] Handling backend connection request [c-5q2fz:m-daaf4c2d51eb]
2019/07/24 11:45:42 [INFO] cluster [c-8gf44] provisioning: [state] Successfully pulled image [rancher/rke-tools:v0.1.28] on host [10.3.1.20]
2019/07/24 11:45:43 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.63:41068: i/o timeout
2019/07/24 11:45:43 [INFO] Handling backend connection request [c-5q2fz:m-41c1a245fd95]
2019/07/24 11:45:44 [INFO] cluster [c-8gf44] provisioning: [state] Successfully started [cluster-state-deployer] container on host [10.3.1.20]
2019/07/24 11:45:44 [INFO] Handling backend connection request [c-5q2fz:m-1afe65447b6f]
2019/07/24 11:45:44 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.60:51806: i/o timeout
2019/07/24 11:45:44 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating CA kubernetes certificates
2019/07/24 11:45:44 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.42:37558: i/o timeout
2019/07/24 11:45:44 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
2019/07/24 11:45:44 [INFO] Handling backend connection request [c-5q2fz:m-30da5bbfa251]
2019/07/24 11:45:45 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kube Scheduler certificates
2019/07/24 11:45:45 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kube Proxy certificates
2019/07/24 11:45:45 [INFO] Handling backend connection request [c-5q2fz:m-5fa522286aa8]
2019/07/24 11:45:45 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Node certificate
2019/07/24 11:45:45 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kubernetes API server certificates
2019/07/24 11:45:45 [INFO] Handling backend connection request [c-5q2fz:m-fa0b42cd127d]
2019/07/24 11:45:45 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.119:34756: i/o timeout
2019/07/24 11:45:46 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating admin certificates and kubeconfig
2019/07/24 11:45:46 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kubernetes API server proxy client certificates
2019/07/24 11:45:47 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.83:60424: i/o timeout
2019/07/24 11:45:47 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating etcd-10.3.1.20 certificate and key
2019/07/24 11:45:47 [INFO] cluster [c-8gf44] provisioning: [certificates] Generating Kube Controller certificates
2019/07/24 11:45:47 [INFO] Handling backend connection request [c-q89w5:m-ad044186556d]
2019/07/24 11:45:47 [INFO] cluster [c-8gf44] provisioning: Successfully Deployed state file at [management-state/rke/rke-119787230/cluster.rkestate]
2019/07/24 11:45:48 [INFO] cluster [c-8gf44] provisioning: Building Kubernetes cluster
2019/07/24 11:45:48 [INFO] cluster [c-8gf44] provisioning: [dialer] Setup tunnel for host [10.3.1.20]
2019/07/24 11:45:48 [INFO] cluster [c-8gf44] provisioning: [network] Deploying port listener containers
2019/07/24 11:45:48 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.62:43140: i/o timeout
2019/07/24 11:45:48 [INFO] cluster [c-8gf44] provisioning: [network] Successfully started [rke-etcd-port-listener] container on host [10.3.1.20]
2019/07/24 11:45:49 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.88:49832: i/o timeout
2019/07/24 11:45:49 [INFO] cluster [c-8gf44] provisioning: [network] Successfully started [rke-cp-port-listener] container on host [10.3.1.20]
2019/07/24 11:45:49 [INFO] cluster [c-8gf44] provisioning: [network] Port listener containers deployed successfully
2019/07/24 11:45:49 [INFO] cluster [c-8gf44] provisioning: [network] Running control plane -> etcd port checks
2019/07/24 11:45:50 [INFO] Handling backend connection request [c-5q2fz:m-132e9f7d86bc]
2019/07/24 11:45:50 [INFO] Handling backend connection request [c-q89w5:m-b0676695ae93]
2019/07/24 11:45:50 [INFO] Handling backend connection request [c-5q2fz:m-623b23917cdf]
2019/07/24 11:45:50 [INFO] cluster [c-8gf44] provisioning: [network] Successfully started [rke-port-checker] container on host [10.3.1.20]
2019/07/24 11:45:50 [INFO] cluster [c-8gf44] provisioning: [network] Running control plane -> worker port checks
2019/07/24 11:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.91:44942: i/o timeout
2019/07/24 11:45:51 [INFO] cluster [c-8gf44] provisioning: [network] Successfully started [rke-port-checker] container on host [10.3.1.20]
2019-07-24 11:45:51.343530 I | mvcc: store.index: compact 34577767
2019/07/24 11:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.38:35090: i/o timeout
2019/07/24 11:45:51 [INFO] cluster [c-8gf44] provisioning: [network] Running workers -> control plane port checks
2019-07-24 11:45:51.490473 I | mvcc: finished scheduled compaction at 34577767 (took 136.99168ms)
2019/07/24 11:45:51 [INFO] cluster [c-8gf44] provisioning: [network] Skipping kubeapi port check
2019/07/24 11:45:51 [INFO] cluster [c-8gf44] provisioning: [network] Removing port listener containers
2019/07/24 11:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.58:57148: i/o timeout
2019/07/24 11:45:51 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.24:37620: i/o timeout
2019/07/24 11:45:51 [INFO] cluster [c-8gf44] provisioning: [remove/rke-etcd-port-listener] Successfully removed container on host [10.3.1.20]
2019/07/24 11:45:52 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.98:57630: i/o timeout
2019/07/24 11:45:52 [INFO] cluster [c-8gf44] provisioning: [remove/rke-cp-port-listener] Successfully removed container on host [10.3.1.20]
2019/07/24 11:45:52 [INFO] cluster [c-8gf44] provisioning: [network] Port listener containers removed successfully
2019/07/24 11:45:52 [INFO] cluster [c-8gf44] provisioning: [certificates] Deploying kubernetes certificates to Cluster nodes

2019/07/24 11:45:58 [INFO] cluster [c-8gf44] provisioning: [reconcile] Rebuilding and updating local kube config
I0724 11:45:58.076641 7 http.go:110] HTTP2 has been explicitly disabled
2019/07/24 11:45:58 [INFO] cluster [c-8gf44] provisioning: Successfully Deployed local admin kubeconfig at [management-state/rke/rke-119787230/kube_config_cluster.yml]
2019/07/24 11:45:58 [INFO] Handling backend connection request [c-5q2fz:m-471f71f8e09b]
2019/07/24 11:45:58 [INFO] cluster [c-8gf44] provisioning: [certificates] Successfully deployed kubernetes certificates to Cluster nodes
2019/07/24 11:45:58 [INFO] cluster [c-8gf44] provisioning: [file-deploy] Deploying file [/etc/kubernetes/kube-api-authn-webhook.yaml] to node [10.3.1.20]
2019/07/24 11:45:59 [INFO] cluster [c-8gf44] provisioning: Successfully started [file-deployer] container on host [10.3.1.20]
2019/07/24 11:45:59 [INFO] cluster [c-8gf44] provisioning: Waiting for [file-deployer] container to exit on host [10.3.1.20]
2019/07/24 11:45:59 [INFO] cluster [c-8gf44] provisioning: Waiting for [file-deployer] container to exit on host [10.3.1.20]
2019/07/24 11:45:59 [INFO] cluster [c-8gf44] provisioning: Container [file-deployer] is still running on host [10.3.1.20]
2019/07/24 11:45:59 [INFO] Handling backend connection request [c-q89w5:m-4c0602df0d65]
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: Waiting for [file-deployer] container to exit on host [10.3.1.20]
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: [remove/file-deployer] Successfully removed container on host [10.3.1.20]
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: [/etc/kubernetes/kube-api-authn-webhook.yaml] Successfully deployed authentication webhook config Cluster nodes
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: [reconcile] Reconciling cluster state
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: [reconcile] This is newly generated cluster
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: Pre-pulling kubernetes images
2019/07/24 11:46:00 [INFO] cluster [c-8gf44] provisioning: [pre-deploy] Pulling image [rancher/hyperkube:v1.13.5-rancher1] on host [10.3.1.20]

2019/07/24 11:46:20 [INFO] cluster [c-8gf44] provisioning: [pre-deploy] Successfully pulled image [rancher/hyperkube:v1.13.5-rancher1] on host [10.3.1.20]
2019/07/24 11:46:20 [INFO] cluster [c-8gf44] provisioning: Kubernetes images pulled successfully
2019/07/24 11:46:20 [INFO] cluster [c-8gf44] provisioning: [etcd] Building up etcd plane..
2019/07/24 11:46:20 [INFO] cluster [c-8gf44] provisioning: [etcd] Pulling image [rancher/coreos-etcd:v3.2.24-rancher1] on host [10.3.1.20]
2019/07/24 11:46:20 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.122:41248: i/o timeout
2019/07/24 11:46:20 [INFO] Handling backend connection request [c-5q2fz:m-f48ee2daad85]
2019/07/24 11:46:23 [INFO] cluster [c-8gf44] provisioning: [etcd] Successfully pulled image [rancher/coreos-etcd:v3.2.24-rancher1] on host [10.3.1.20]
2019/07/24 11:46:24 [INFO] Handling backend connection request [c-5q2fz:m-471f71f8e09b]
2019/07/24 11:46:25 [INFO] Handling backend connection request [c-5q2fz:m-132e9f7d86bc]
2019/07/24 11:46:25 [INFO] Handling backend connection request [c-q89w5:m-4c0602df0d65]
2019/07/24 11:46:26 [INFO] cluster [c-8gf44] provisioning: [etcd] Successfully started [etcd] container on host [10.3.1.20]
2019/07/24 11:46:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.124:57546: i/o timeout
2019/07/24 11:46:26 [INFO] cluster [c-8gf44] provisioning: [etcd] Successfully started [rke-log-linker] container on host [10.3.1.20]
2019/07/24 11:46:26 [INFO] Handling backend connection request [c-q89w5:m-10f33b2daa15]
2019/07/24 11:46:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.43:56516: i/o timeout
2019/07/24 11:46:27 [INFO] cluster [c-8gf44] provisioning: [remove/rke-log-linker] Successfully removed container on host [10.3.1.20]
2019/07/24 11:46:27 [INFO] cluster [c-8gf44] provisioning: [etcd] Successfully started etcd plane.. Checking etcd cluster health

2019/07/24 11:46:40 [INFO] error in remotedialer server [400]: read tcp 172.17.0.3:443->10.3.1.119:37006: i/o timeout
2019/07/24 11:46:41 [INFO] Handling backend connection request [c-5q2fz:m-623b23917cdf]
2019/07/24 11:46:42 [ERROR] cluster [c-8gf44] provisioning: [etcd] Failed to bring up Etcd Plane: [etcd] Etcd Cluster is not healthy
2019/07/24 11:46:42 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/07/24 11:46:42 [ERROR] ClusterController c-8gf44 [cluster-provisioner-controller] failed with : [etcd] Failed to bring up Etcd Plane: [etcd] Etcd Cluster is not healthy

Agent pods only show this:

INFO: Using resolv.conf: nameserver 10.3.3.254
INFO: https://rxxxxx/ping is accessible
INFO: rxxxx resolves to 10.3.1.1
time="2019-07-24T11:45:34Z" level=info msg="Rancher agent version v2.2.4 is starting"
time="2019-07-24T11:45:34Z" level=info msg="Option customConfig=map[address:10.3.1.20 internalAddress: roles:[] label:map[]]"
time="2019-07-24T11:45:34Z" level=info msg="Option etcd=false"
time="2019-07-24T11:45:34Z" level=info msg="Option controlPlane=false"
time="2019-07-24T11:45:34Z" level=info msg="Option worker=false"
time="2019-07-24T11:45:34Z" level=info msg="Option requestedHostname=slamdev-master-t14-20"
time="2019-07-24T11:45:34Z" level=info msg="Listening on /tmp/log.sock"
time="2019-07-24T11:45:34Z" level=info msg="Connecting to wss://rancher.adnz.co/v3/connect with token h9vt6wmf6kwv6s7m9qm257gwx8bv5wv5b8zq5qfs984mvmtk976lhw"
time="2019-07-24T11:45:34Z" level=info msg="Connecting to proxy" url="wss://rxxxxxxo/v3/connect"
time="2019-07-24T11:45:34Z" level=info msg="Starting plan monitor"

@superxor
Author

@superseb Same issue on 2.2.6 master and 2.2.6 agent. This is 100% related to Rancher upgrades from 2.1 or 2.2.

2019-07-25 09:11:20.258357 N | etcdserver/membership: set the initial cluster version to 3.2
2019-07-25 09:11:20.258443 I | etcdserver/api: enabled capabilities for version 3.2
WARNING: 2019/07/25 09:11:20 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

@superseb
Contributor

Can you share the output of:

openssl x509 -in /etc/kubernetes/ssl/kube-etcd-10-3-1-20.pem -text -noout
openssl x509 -in /etc/kubernetes/ssl/kube-ca.pem -text -noout
ls -la /etc/kubernetes/ssl
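
As a further check, whether the etcd certificate actually chains to that kube-ca can be verified on the node with openssl (same paths as above):

openssl verify -CAfile /etc/kubernetes/ssl/kube-ca.pem /etc/kubernetes/ssl/kube-etcd-10-3-1-20.pem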

@superxor
Author

Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2119889748694189234 (0x1d6b5c7ea9e8c4b2)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = kube-ca
Validity
Not Before: Jul 24 11:52:40 2019 GMT
Not After : Jul 22 09:10:43 2029 GMT
Subject: CN = kube-etcd
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:b2:3b:92:bc:f8:16:de:b6:fd:2c:c0:7d:ef:6d:
1a:94:e8:39:77:f8:0f:25:c1:bc:54:a8:5e:e8:7c:
e6:dd:d3:8f:ba:e8:bd:f8:cf:2e:f3:b9:5d:86:d1:
c6:98:4d:d1:27:d3:50:94:ec:0f:9b:1c:0e:e1:06:
d0:3a:5b:92:7c:73:60:2d:78:c6:9c:0f:4d:79:50:
7b:28:be:07:eb:77:a5:dc:f4:86:6f:b8:88:50:4b:
c5:be:66:fe:5b:2a:cd:58:79:ae:6b:31:e4:08:6c:
f7:95:03:53:50:e5:e2:2c:81:5d:66:1d:ca:8a:fb:
95:eb:27:3d:ae:82:15:a9:ba:b9:4e:36:0e:1a:c2:
6f:c3:d2:3d:65:00:c3:ec:b6:16:4f:35:27:1b:7f:
65:19:24:86:83:99:1b:a3:fd:cc:b8:30:3d:36:9a:
f3:52:d1:c5:ad:b9:42:e9:53:d2:bb:07:3d:f9:75:
02:80:85:a5:74:fe:31:f4:72:cc:37:e4:f7:5d:e6:
b1:0b:a5:98:fd:00:4e:ce:ac:14:aa:b2:92:bf:79:
b6:f3:90:b5:c2:46:f8:8e:c8:8e:cc:23:2b:76:43:
69:0b:19:5c:e4:86:27:00:6b:f5:da:b4:cd:c4:49:
f8:0e:59:74:6d:42:e3:e8:6e:34:35:e1:64:c1:f3:
bc:b3
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Client Authentication, TLS Web Server Authentication
X509v3 Subject Alternative Name:
DNS:slamdev-master-t17-20, DNS:localhost, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.3.1.20, IP Address:127.0.0.1, IP Address:10.43.0.1
Signature Algorithm: sha256WithRSAEncryption
26:47:d8:ae:1c:89:41:cb:2f:72:c5:c4:24:91:8a:ad:26:18:
39:56:fe:5b:18:54:92:83:3c:54:30:a8:18:1b:ef:bc:22:74:
4b:38:79:e7:d9:26:18:01:4a:2a:0c:77:d3:ac:4b:5b:e9:7b:
93:f6:69:43:c2:48:d2:15:60:13:0d:31:b1:5d:3c:c4:c5:08:
45:d5:89:b7:e2:46:9a:ef:01:3a:42:96:32:aa:21:6f:4b:d7:
13:69:5a:dc:12:42:9c:77:eb:b6:0a:47:7f:03:49:12:c9:32:
e7:81:85:54:8a:8d:04:6d:bf:c0:9f:6a:e0:ba:7d:0b:33:dd:
fe:45:0c:3e:fc:6a:4c:4b:4e:44:49:43:b7:e6:cc:12:f6:af:
94:d2:6a:cb:d9:d5:26:4e:a3:be:3f:5e:45:cc:ea:17:54:94:
5c:4a:f7:d8:2b:15:51:1a:9c:9e:ac:8d:fc:30:1e:e2:98:f9:
00:cf:f3:15:3a:7e:be:6c:1a:14:1f:be:a4:a2:9d:d8:09:d0:
49:10:b0:9a:46:4b:ae:da:59:63:3d:52:56:47:9f:d2:4d:62:
e5:af:27:59:e8:52:64:24:fd:dc:6b:58:23:70:a5:46:a0:8d:
bb:db:65:2b:49:58:df:85:35:4f:c9:f5:51:06:bf:70:09:b3:
81:a7:72:12

the ca.pem

Certificate:
Data:
Version: 3 (0x2)
Serial Number: 0 (0x0)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = kube-ca
Validity
Not Before: Jan 13 13:00:00 2019 GMT
Not After : Jan 10 13:00:00 2029 GMT
Subject: CN = kube-ca
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:eb:d1:3e:ef:3c:21:60:25:02:47:46:d6:bd:41:
44:93:da:3e:b5:1c:fc:d0:c5:d8:f4:fb:05:05:c9:
7b:95:9c:93:ea:4f:5d:a0:e9:61:58:b1:49:fb:09:
d7:0b:1c:84:4f:cc:e1:ab:96:dc:76:bb:12:3f:5a:
e8:d3:46:c6:95:a2:c9:42:2c:a9:37:b2:1d:44:79:
66:f2:73:ce:6b:0a:23:f1:0b:da:85:72:69:bd:23:
8d:f7:a9:17:bc:cd:ee:81:2f:44:be:20:ed:67:6f:
1c:86:5a:e4:8a:96:be:c1:9c:ed:76:2d:88:3c:e3:
3c:a4:67:96:4f:9a:e5:93:a0:f6:89:a3:ee:d6:79:
5f:f7:28:95:b2:29:93:fe:e3:35:8e:ed:54:49:a7:
20:da:66:89:16:ac:a6:02:d5:83:75:9f:6b:7e:11:
0d:d4:3e:0b:f6:2b:ff:1d:64:2f:a6:a3:44:50:77:
eb:24:1f:ff:e9:63:34:ac:77:d8:e2:f6:c5:c7:15:
c2:0f:32:9f:0a:f2:df:5f:87:58:4e:a1:90:10:35:
a4:fd:76:98:49:e6:f7:f9:e5:bf:a8:0c:32:38:cb:
4c:08:0c:7c:fa:2b:50:19:22:cc:b6:96:3b:dc:37:
70:a4:92:2b:a9:93:fc:06:b0:9d:88:60:17:2a:76:
76:db
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Certificate Sign
X509v3 Basic Constraints: critical
CA:TRUE
Signature Algorithm: sha256WithRSAEncryption
92:56:d0:0b:5d:29:81:94:fe:b0:31:8f:45:84:e3:2c:d0:6c:
82:34:f0:67:ad:98:9a:08:8a:da:66:67:b6:24:07:c0:6e:13:
e8:bf:1f:32:ab:25:39:0a:65:c9:a8:4b:2a:bd:d9:6a:01:79:
96:fa:98:33:f0:ba:4b:b4:64:53:ff:24:98:31:e1:99:71:51:
9c:9a:35:25:80:fb:59:01:05:ce:dd:bc:b8:b7:cf:6c:f4:f7:
43:13:b1:13:92:16:b8:c0:9a:c2:ad:62:01:21:bd:9b:81:44:
2d:91:ee:8f:dc:a6:b0:9a:d3:5d:47:e2:d9:ba:00:28:18:bc:
2b:c0:d3:08:11:4c:c4:d5:49:6a:26:25:ca:e3:3f:bc:17:a4:
67:a7:26:64:43:98:e9:d4:71:27:de:77:75:00:f7:46:6c:87:
9e:c3:67:ad:4f:f0:88:1a:a4:91:56:ad:ca:7e:fe:1c:44:fb:
42:7e:cb:21:22:04:0d:fd:08:28:cc:29:9c:66:0e:21:4f:55:
53:93:a8:cf:f8:02:cf:67:ed:0e:2b:71:59:98:b6:c9:b1:28:
d1:37:50:42:a0:41:16:99:85:47:15:d5:10:49:a5:8e:b1:34:
9c:98:85:9b:b2:8b:4a:da:c8:77:18:45:c7:da:a8:77:6f:c4:
fd:ba:eb:ab

root@slamdev-master-t17-20:~# ls -la /etc/kubernetes/ssl
total 112
drwx------ 2 root root 4096 Jul 25 09:10 .
drwxr-xr-x 3 root root 4096 Jul 25 12:41 ..
-rw-r--r-- 1 root root 1679 Jul 25 09:10 kube-apiserver-key.pem
-rw-r--r-- 1 root root 1675 Jul 25 09:10 kube-apiserver-proxy-client-key.pem
-rw-r--r-- 1 root root 1107 Jul 25 09:10 kube-apiserver-proxy-client.pem
-rw-r--r-- 1 root root 1679 Jul 25 09:10 kube-apiserver-requestheader-ca-key.pem
-rw-r--r-- 1 root root 1082 Jul 25 09:10 kube-apiserver-requestheader-ca.pem
-rw-r--r-- 1 root root 1273 Jul 25 09:10 kube-apiserver.pem
-rw-r--r-- 1 root root 1679 Jul 25 09:10 kube-ca-key.pem
-rw------- 1 root root 1017 May 31 04:36 kube-ca.pem
-rw-r--r-- 1 root root 1675 Jul 25 09:10 kube-controller-manager-key.pem
-rw-r--r-- 1 root root 1062 Jul 25 09:10 kube-controller-manager.pem
-rw-r--r-- 1 root root 1675 Jul 25 09:10 kube-etcd-10-3-1-20-key.pem
-rw-r--r-- 1 root root 1269 Jul 25 09:10 kube-etcd-10-3-1-20.pem
-rw------- 1 root root 1675 May 31 04:36 kube-node-key.pem
-rw------- 1 root root 1070 May 31 04:36 kube-node.pem
-rw------- 1 root root 1679 May 31 04:36 kube-proxy-key.pem
-rw------- 1 root root 1046 May 31 04:36 kube-proxy.pem
-rw-r--r-- 1 root root 1675 Jul 25 09:10 kube-scheduler-key.pem
-rw-r--r-- 1 root root 1050 Jul 25 09:10 kube-scheduler.pem
-rw-r--r-- 1 root root 1675 Jul 25 09:10 kube-service-account-token-key.pem
-rw-r--r-- 1 root root 1058 Jul 25 09:10 kube-service-account-token.pem
-rw-r--r-- 1 root root 517 Jul 25 09:10 kubecfg-kube-apiserver-proxy-client.yaml
-rw-r--r-- 1 root root 533 Jul 25 09:10 kubecfg-kube-apiserver-requestheader-ca.yaml
-rw-r--r-- 1 root root 501 Jul 25 09:10 kubecfg-kube-controller-manager.yaml
-rw------- 1 root root 444 May 31 04:36 kubecfg-kube-node.yaml
-rw------- 1 root root 448 May 31 04:36 kubecfg-kube-proxy.yaml
-rw-r--r-- 1 root root 465 Jul 25 09:10 kubecfg-kube-scheduler.yaml

@superseb
Contributor

The timestamps of the certificates don't match. What is the output of ls -la /etc/kubernetes/ssl before you run the docker run command? This at least explains why you get the etcd error. Are you saying that this node has nothing in /etc/kubernetes/ssl and then gets all those files, with different modification times, placed on the node?
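
One way to confirm this on the node, sketched with the paths from this thread, is to snapshot the directory before registering and compare it after the failure:

ls -la /etc/kubernetes/ssl > /tmp/ssl-before.txt 2>&1   # before running the docker run command
# ...run the registration command and wait for the etcd error...
diff /tmp/ssl-before.txt <(ls -la /etc/kubernetes/ssl)   # pre-existing files keep their old timestamps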

@superxor
Author

@superseb You're right, our base image was polluted for unknown reasons. I forgot to check that.
Mea culpa

@Robbie-Cook

Deleting my /etc/kubernetes directory and running the docker run command again worked for me
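
Roughly, assuming the same default path (double-check what you delete on a node that hosts anything else):

sudo rm -rf /etc/kubernetes
# then re-run the node registration command from the Rancher UI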

@norcis

norcis commented Jan 18, 2020

/etc/kubernetes

Thank you! I spent all day before I found it!

I believe the Rancher team could improve the provisioning script to detect whether these files already exist and delete them or warn about them.
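
Something as small as this sketch would surface the problem; it is only an illustration, not part of Rancher's actual agent script:

# hypothetical pre-flight check before the agent writes new certificates
if [ -n "$(ls -A /etc/kubernetes/ssl 2>/dev/null)" ]; then
  echo "WARNING: /etc/kubernetes/ssl already contains files, likely from a previous cluster;" >&2
  echo "clean the node (or remove the old certificates) before registering it." >&2
  exit 1
fi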

@Chrislevi

rke cert rotate did the job for me.
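
For clusters managed directly with the RKE CLI (assuming a cluster.yml in the working directory), that looks like:

rke cert rotate --config cluster.yml   # regenerates service certificates signed by the existing kube-ca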

@MoulderUtes

Clearing /etc/kubernetes worked for me, thanks.

@tinohager

Clearing /opt/rke/etc/kubernetes/ssl (rm -r -f) worked for me.

@travis-south

I've encountered this issue on a fresh VM install using v2.5.8. I checked the ssl directory and everything was correct and up to date. I cleaned up all the folders and followed the steps here to clean the nodes, and still no luck.

The only thing that fixed it was to turn off Authorized Cluster Endpoint in the cluster config. I'm a newbie in k8s and I still don't know why that fixed it.
