Can't provision clusters in Rancher single node docker install with bring your own valid certs #28605

Closed
izaac opened this issue Aug 27, 2020 · 22 comments
Labels: area/cluster, kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement)

Comments

@izaac
Contributor

izaac commented Aug 27, 2020

What kind of request is this:
bug

Steps to reproduce:

  • Perform a Rancher Docker install with bring-your-own valid certs (Option C).
  • After installation, provision a single-node DO RKE cluster.
  • Alternatively, provision any other cluster; even importing clusters fails.

Result:

The node gets created, but the Kubernetes cluster never becomes active:
Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]

Other details that may be helpful:

2020/08/27 21:43:38 [INFO] Provisioning node izaac-do-error1 done
2020/08/27 21:43:38 [INFO] Generating and uploading node config izaac-do-error1
2020/08/27 21:43:38 [INFO] Creating jail for c-xb447
2020/08/27 21:43:38 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:43:38 [INFO] Creating cluster [c-xb447]
2020/08/27 21:43:38 [INFO] Generating and uploading node config
2020/08/27 21:43:43 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:36113
2020/08/27 21:43:43 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:43:43 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:43:43 [INFO] [certificates] Generating CA kubernetes certificates
2020/08/27 21:43:43 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:43:43 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:43:43 [ERROR] cluster [c-xb447] provisioning: [state] can't fetch legacy cluster state from Kubernetes: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:43:44 [INFO] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
2020/08/27 21:43:44 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:43:44 [INFO] [certificates] Generating Kubernetes API server certificates
2020/08/27 21:43:44 [INFO] [certificates] Generating Service account token key
2020/08/27 21:43:44 [INFO] [certificates] Generating Kube Controller certificates
2020/08/27 21:43:45 [INFO] [certificates] Generating Kube Scheduler certificates
2020/08/27 21:43:45 [INFO] [certificates] Generating Kube Proxy certificates
2020/08/27 21:43:45 [INFO] [certificates] Generating Node certificate
2020/08/27 21:43:45 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:43:46 [INFO] [certificates] Generating Kubernetes API server proxy client certificates
2020/08/27 21:43:46 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-224419388/cluster.rkestate]
2020/08/27 21:43:46 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:43:46 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config
2020/08/27 21:43:46 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:43:46 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config, requeuing
2020/08/27 21:44:16 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:44:16 [INFO] Creating cluster [c-xb447]
2020/08/27 21:44:21 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:36629
2020/08/27 21:44:21 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 21:44:21 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:44:21 [INFO] [certificates] Generating Kubernetes API server certificates
2020/08/27 21:44:21 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:44:21 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:44:21 [INFO] [certificates] Generating kube-etcd-164-90-151-38 certificate and key
2020/08/27 21:44:22 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-584146539/cluster.rkestate]
2020/08/27 21:44:22 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:44:22 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:44:22 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:44:22 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:44:22 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:44:22 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:44:22 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020/08/27 21:45:22 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:45:22 [INFO] Creating cluster [c-xb447]
2020/08/27 21:45:27 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:41501
2020/08/27 21:45:27 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 21:45:27 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:45:27 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:45:27 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:45:27 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-310026958/cluster.rkestate]
2020/08/27 21:45:27 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:45:27 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:45:27 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:45:27 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:45:27 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:45:27 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:45:27 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 21:45:54.187620 I | mvcc: store.index: compact 2842
2020-08-27 21:45:54.220104 I | mvcc: finished scheduled compaction at 2842 (took 31.26481ms)
E0827 21:46:00.195739      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
E0827 21:46:34.958150      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020/08/27 21:47:27 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:47:27 [INFO] Creating cluster [c-xb447]
2020/08/27 21:47:32 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:43619
2020/08/27 21:47:32 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 21:47:32 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:47:32 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:47:32 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:47:32 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-020587477/cluster.rkestate]
2020/08/27 21:47:32 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:47:32 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:47:32 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:47:32 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:47:32 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:47:32 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:47:32 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 21:47:57.128777 I | http: TLS handshake error from 127.0.0.1:59720: EOF
2020-08-27 21:48:56.449297 I | http: TLS handshake error from 127.0.0.1:59732: EOF
2020-08-27 21:50:54.193837 I | mvcc: store.index: compact 4115
2020-08-27 21:50:54.219823 I | mvcc: finished scheduled compaction at 4115 (took 24.773859ms)
E0827 21:51:08.061851      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
2020/08/27 21:51:32 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:51:32 [INFO] Creating cluster [c-xb447]
2020/08/27 21:51:37 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:39023
2020/08/27 21:51:37 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 21:51:37 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:51:37 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:51:37 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:51:37 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-883038000/cluster.rkestate]
2020/08/27 21:51:37 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:51:37 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:51:37 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:51:37 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:51:37 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:51:37 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:51:37 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 21:55:54.197389 I | mvcc: store.index: compact 5352
2020-08-27 21:55:54.222819 I | mvcc: finished scheduled compaction at 5352 (took 24.453425ms)
E0827 21:55:56.389234      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020-08-27 21:56:05.166952 I | http: TLS handshake error from 127.0.0.1:59880: EOF
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
2020-08-27 21:56:33.134532 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "range_response_count:1 size:286" took too long (442.0573ms) to execute
2020-08-27 21:56:33.134926 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:453" took too long (348.318345ms) to execute
E0827 21:58:43.598711      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020/08/27 21:59:37 [INFO] Provisioning cluster [c-xb447]
2020/08/27 21:59:37 [INFO] Creating cluster [c-xb447]
2020/08/27 21:59:42 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:41783
2020/08/27 21:59:42 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 21:59:42 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 21:59:42 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 21:59:42 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 21:59:42 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-621949135/cluster.rkestate]
2020/08/27 21:59:42 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 21:59:42 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 21:59:42 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 21:59:42 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 21:59:42 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 21:59:42 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 21:59:43 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 22:00:54.200806 I | mvcc: store.index: compact 6589
2020-08-27 22:00:54.228681 I | mvcc: finished scheduled compaction at 6589 (took 26.207635ms)
E0827 22:00:55.124402      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
2020-08-27 22:05:54.205136 I | mvcc: store.index: compact 7826
2020-08-27 22:05:54.232815 I | mvcc: finished scheduled compaction at 7826 (took 23.74734ms)
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
E0827 22:07:35.829778      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020-08-27 22:07:45.037780 I | http: TLS handshake error from 127.0.0.1:60242: EOF
2020/08/27 22:09:43 [INFO] Provisioning cluster [c-xb447]
2020/08/27 22:09:43 [INFO] Creating cluster [c-xb447]
2020/08/27 22:09:48 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:44715
2020/08/27 22:09:48 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 22:09:48 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 22:09:48 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 22:09:48 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 22:09:48 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-239748066/cluster.rkestate]
2020/08/27 22:09:48 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 22:09:48 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 22:09:48 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 22:09:48 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 22:09:48 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 22:09:48 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 22:09:48 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 22:10:54.212085 I | mvcc: store.index: compact 9053
2020-08-27 22:10:54.237335 I | mvcc: finished scheduled compaction at 9053 (took 24.111214ms)
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
2020-08-27 22:12:19.560694 I | http: TLS handshake error from 127.0.0.1:60334: EOF
E0827 22:12:37.531916      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020-08-27 22:15:54.215923 I | mvcc: store.index: compact 10289
2020-08-27 22:15:54.247905 I | mvcc: finished scheduled compaction at 10289 (took 23.667538ms)
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
E0827 22:19:34.208910      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020/08/27 22:19:48 [INFO] Provisioning cluster [c-xb447]
2020/08/27 22:19:48 [INFO] Creating cluster [c-xb447]
2020/08/27 22:19:53 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:36495
2020/08/27 22:19:53 [ERROR] Cluster c-xb447 previously failed to create
2020/08/27 22:19:53 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/08/27 22:19:53 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/08/27 22:19:53 [INFO] cluster [c-xb447] provisioning: Initiating Kubernetes cluster
2020/08/27 22:19:53 [INFO] cluster [c-xb447] provisioning: Successfully Deployed state file at [management-state/rke/rke-820190169/cluster.rkestate]
2020/08/27 22:19:53 [INFO] cluster [c-xb447] provisioning: Building Kubernetes cluster
2020/08/27 22:19:53 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/08/27 22:19:53 [INFO] cluster [c-xb447] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/08/27 22:19:53 [ERROR] cluster [c-xb447] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-xb447:m-b79jq]
2020/08/27 22:19:53 [ERROR] cluster [c-xb447] provisioning: Removing host [x.x.x.x] from node lists
2020/08/27 22:19:53 [ERROR] cluster [c-xb447] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/08/27 22:19:53 [ERROR] error syncing 'c-xb447': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing
2020-08-27 22:20:54.227505 I | mvcc: store.index: compact 11516
2020-08-27 22:20:54.252853 I | mvcc: finished scheduled compaction at 11516 (took 24.044065ms)
From https://git.rancher.io/charts
 * branch            dev-v2.5   -> FETCH_HEAD
From https://git.rancher.io/partner-charts
 * branch            dev-v2.5   -> FETCH_HEAD
E0827 22:21:36.202796      25 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
2020-08-27 22:22:31.305741 I | http: TLS handshake error from 127.0.0.1:60578: EOF
2020-08-27 22:22:31.305813 I | http: TLS handshake error from 127.0.0.1:60576: EOF
2020-08-27 22:25:54.231449 I | mvcc: store.index: compact 12754
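
The log above repeats the same few errors on every requeue. A minimal sketch for condensing such a log follows. It assumes the output was saved to a file first (for example with `docker logs <container> > rancher.log 2>&1`), and `summarize_errors` is a hypothetical helper name, not part of Rancher. The `%!F(MISSING)` fragments look like Go `fmt` output for a URL-escaped `%2F` that was passed through a format string without an argument, so the sketch maps them back to `/` on that assumption.

```shell
# Sketch: list the distinct [ERROR] messages in a saved Rancher log,
# most frequent first. summarize_errors is a hypothetical helper name.
summarize_errors() {
  # $1: path to the saved log file
  grep -F '[ERROR]' "$1" \
    | sed -E 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9:]{8} ##' \
    | sed 's#%!F(MISSING)#/#g' \
    | sort | uniq -c | sort -rn
  # 1st sed: drop the leading "YYYY/MM/DD HH:MM:SS " timestamp so repeats collapse.
  # 2nd sed: assumption: "%!F(MISSING)" stands for an escaped "/" (%2F).
}
```

Run against the log in this report, this should reduce the output to a handful of lines, with the `can not build dialer` message as the earliest error in each provisioning attempt.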

Environment information

  • Rancher version: master-head (08/27/2020) 5e1b21b
  • Installation option: single

gzrancher/rancher#12756

@izaac izaac added the area/cluster, kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement), and alpha-priority/0 labels Aug 27, 2020
@izaac izaac added this to the v2.5 milestone Aug 27, 2020
@izaac izaac self-assigned this Aug 27, 2020
@cbron
Contributor

cbron commented Aug 28, 2020

Main issue: #28279

@izaac
Contributor Author

izaac commented Sep 8, 2020

Rancher version: master-head (09/08/2020) 76c26fb

I'm still experiencing this error, even when using --privileged:

docker run -d --privileged --name=master --restart=unless-stopped -p 80:80 -p 443:443 -v ./cert.pem:/etc/rancher/ssl/cert.pem -v ./privkey.pem:/etc/rancher/ssl/key.pem rancher/rancher:master-head --no-cacerts

2020/09/08 17:02:02 [INFO] Provisioning cluster [c-hlr5j]
2020/09/08 17:02:02 [INFO] Creating cluster [c-hlr5j]
2020/09/08 17:02:07 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:33147
2020/09/08 17:02:07 [ERROR] Cluster c-hlr5j previously failed to create
2020/09/08 17:02:07 [INFO] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
2020/09/08 17:02:07 [INFO] [certificates] Generating admin certificates and kubeconfig
2020/09/08 17:02:07 [INFO] cluster [c-hlr5j] provisioning: Initiating Kubernetes cluster
2020/09/08 17:02:07 [INFO] kontainerdriver rancherkubernetesengine stopped
2020/09/08 17:02:07 [INFO] cluster [c-hlr5j] provisioning: Successfully Deployed state file at [management-state/rke/rke-088564443/cluster.rkestate]
2020/09/08 17:02:08 [INFO] cluster [c-hlr5j] provisioning: Building Kubernetes cluster
2020/09/08 17:02:08 [INFO] cluster [c-hlr5j] provisioning: [dialer] Setup tunnel for host [x.x.x.x]
2020/09/08 17:02:08 [ERROR] cluster [c-hlr5j] provisioning: Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-hlr5j:m-5szzd]
2020/09/08 17:02:08 [ERROR] cluster [c-hlr5j] provisioning: Removing host [x.x.x.x] from node lists
2020/09/08 17:02:08 [ERROR] cluster [c-hlr5j] provisioning: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x]
2020/09/08 17:02:08 [ERROR] error syncing 'c-hlr5j': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [x.x.x.x], requeuing

Screen Shot 2020-09-08 at 10 09 30 AM

@cbron
Contributor

cbron commented Sep 8, 2020

A new fix is going out for the main issue, and it should resolve the issue above as well. DNS wasn't working, which would explain why the SSH tunnel failed.

@cbron
Contributor

cbron commented Sep 8, 2020

Back in test, but you may want to wait for the main issue to be resolved first.

@izaac
Contributor Author

izaac commented Sep 9, 2020

Rancher version master-head (09/08/2020) 343b8e1

I see core-dns in the local cluster in the docker Rancher install

docker run -d --privileged --name=master --restart=unless-stopped -p 80:80 -p 443:443 -v ./cert.pem:/etc/rancher/ssl/cert.pem -v ./privkey.pem:/etc/rancher/ssl/key.pem rancher/rancher:master-head --no-cacerts

But I'm still experiencing the issue mentioned here #28605 (comment)

@slash1387

slash1387 commented Oct 8, 2020

I am unable to provision a cluster with any combination of API and authorized cluster endpoint on/off via docker run commands. It always gets stuck on the API not being ready. The kube-apiserver logs always show this entry:
authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: error in cryptographic primitive]
We are using a single Docker installation with a self-signed cert. Please fix this ASAP, as Rancher is not usable in smaller setups with a Docker installation :(
Rancher v2.5.1

@izaac
Contributor Author

izaac commented Oct 9, 2020

Re-tested on Rancher version v2.5.1

Coverage: Rancher single node install then RKE DO provisioned cluster.

The only certificate install option that failed for me was Bring Your Own with Valid Certs.
With Let's Encrypt and Bring Your Own Self-Signed I am able to provision the RKE DO downstream cluster.

Symptoms in the BYO-Valid:

[ERROR] error syncing 'c-drn8q': handler cluster-provisioner-controller: Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config, requeuing
Failed to set up SSH tunneling for host [x.x.x.x]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-drn8q:m-q28d7]

Kubernetes version used in Downstream cluster: 1.19.2

@izaac
Contributor Author

izaac commented Oct 9, 2020

Upgraded Rancher from v2.4.8 to v2.5.1 (single install, Let's Encrypt option). I provisioned an RKE DO downstream cluster before the upgrade and a new RKE DO cluster after the upgrade.

I was not able to reproduce the issue reported here: #28605 (comment)
@rene-demonsters could you provide the following info for c-r5h93?

  • Cluster type (DO, Linode, GKE, etc)
  • Or was it the local cluster after upgrading to v2.5.x ?

However, that upgrade issue looks pre-existing: #29131

@izaac
Contributor Author

izaac commented Oct 9, 2020

@superseb dug into this and found that I was using the cert.pem certificate file instead of fullchain.pem for the byo-valid installation option.

After a fresh Rancher v2.5.1 install using the fullchain.pem I was able to provision an RKE DO cluster.

Option C states this clearly in this section of the docs: https://rancher.com/docs/rancher/v2.x/en/installation/other-installation-methods/single-node-docker/#option-c-bring-your-own-certificate-signed-by-a-recognized-ca
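
A quick way to catch this mix-up before starting the container is to count the certificate blocks in the file that gets mounted as /etc/rancher/ssl/cert.pem. The sketch below is a heuristic under stated assumptions: a leaf-only cert.pem holds one block, while a fullchain.pem holds the leaf plus at least one intermediate. `count_certs` and `require_full_chain` are hypothetical helper names, and the paths in the usage comment are placeholders.

```shell
# Sketch: count PEM certificate blocks in a file.
# count_certs and require_full_chain are hypothetical helper names.
count_certs() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Fail when the file looks like a leaf-only certificate (fewer than 2 blocks).
require_full_chain() {
  n=$(count_certs "$1")
  if [ "${n:-0}" -lt 2 ]; then
    echo "$1 contains ${n:-0} certificate(s); expected the full chain" >&2
    return 1
  fi
}

# Usage (placeholder paths; flags mirror the docker run command in this thread):
# require_full_chain /path/to/fullchain.pem && \
#   docker run -d --privileged --restart=unless-stopped -p 80:80 -p 443:443 \
#     -v /path/to/fullchain.pem:/etc/rancher/ssl/cert.pem \
#     -v /path/to/privkey.pem:/etc/rancher/ssl/key.pem \
#     rancher/rancher:v2.5.1 --no-cacerts
```

The count does not validate the chain itself. It only flags the specific mistake described above, where the leaf certificate is mounted without its intermediates.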

@izaac
Contributor Author

izaac commented Oct 9, 2020

Closing: all four certificate installation methods worked in Rancher v2.5.1.

@izaac izaac closed this as completed Oct 9, 2020
@slash1387

slash1387 commented Oct 9, 2020

@izaac what's so hard to reproduce?

  1. Fresh install:
docker run -d --restart=unless-stopped --name=rancher2.5.1 -p 80:80 -p 443:443 --privileged rancher/rancher:v2.5.1
  2. Global -> Add Cluster -> Existing Nodes -> choose all 3 roles (etcd, worker, control-plane).
  3. Execute docker run on 2 nodes; they register and start to set up.
  4. Wait. It stops with ERROR: Cluster health check failed: cluster agent is not ready

not working!

@izaac
Contributor Author

izaac commented Oct 9, 2020

@slash1387 this is a different error, related to custom clusters: #28836. If the cluster doesn't recover after some time, could you please open an issue and provide the logs? Thanks for reporting this.

@izaac izaac reopened this Oct 9, 2020
@izaac izaac closed this as completed Oct 9, 2020
@hrvatskibogmars

I tried v2.5.1. This is what I get when trying to import a cluster (Imported v1.14.9-eks):

Error while applying agent YAML, it will be retried automatically: exit status 1, Unable to connect to the server: remote error: tls: internal error

I can inspect the cluster but the message is still present there.

--

@a14stoner

What helped for me was upgrading Rancher from v2.4.5 to v2.5.9, then adding a new node to the cluster and removing it afterwards.
Now everything works fine and the cluster is up again.

@Kritika2206

> I tried v2.5.1. This is what I get when trying to import a cluster (Imported v1.14.9-eks):
>
> Error while applying agent YAML, it will be retried automatically: exit status 1, Unable to connect to the server: remote error: tls: internal error
>
> I can inspect the cluster but the message is still present there.

Hi [hrvatskibogmars],

Did you find a solution for this TLS issue while importing the cluster?

@hrvatskibogmars

> > I tried v2.5.1. This is what I get when trying to import a cluster (Imported v1.14.9-eks):
> > Error while applying agent YAML, it will be retried automatically: exit status 1, Unable to connect to the server: remote error: tls: internal error
> > I can inspect the cluster but the message is still present there.
>
> Hi [hrvatskibogmars],
>
> Did you find a solution for this TLS issue while importing the cluster?

No, I had to provision a new Rancher cluster from scratch.

@Kritika2206

> > > I tried v2.5.1. This is what I get when trying to import a cluster (Imported v1.14.9-eks):
> > >
> > > Error while applying agent YAML, it will be retried automatically: exit status 1, Unable to connect to the server: remote error: tls: internal error
> > > I can inspect the cluster but the message is still present there.
> >
> > Hi [hrvatskibogmars],
> > Did you find a solution for this TLS issue while importing the cluster?
>
> No, I had to provision a new Rancher cluster from scratch.

Even if I provision a new Rancher instance, I still get the same error. My Rancher runs on a secure domain that uses an AWS ACM certificate, and I am not passing the certificate as a secret in the ingress (this is my scenario).

Thanks for your response.
