Cluster failing to start: Failed to list with EOF and net/http: TLS handshake timeout #2687
Closed
Labels
platform/baremetal (IPI bare metal hosts platform)
Description
Any help would be really appreciated!
- OpenShift: 4.1.18
- Platform: UPI
- Load balancer: HAProxy
- DNS: Dnsmasq (as attached)
- PXE boot: Default (as attached)
Attached:
- Log bundle (zip file) containing logs from the bootstrap and master nodes
log-bundle.tar.zip
Load balancer config file:
haproxy.txt
Dnsmasq config file:
dnsmasq.txt
What happened?
- Generated all Ignition files and ensured that hidden files were cleared.
- Bootstrap starts and waits for etcd
- Masters 0, 1 and 2 start and download their config from the bootstrap node
- Bootstrap reports "successfully committed proposal" for each etcd member (see below)
- Then errors "Failed to fetch discovery .... :6443 ... connection refused" (see below)
- The kube-apiserver (port 6443) on the bootstrap node now seems to restart constantly
All firewalls are disabled on the DNS and load balancer servers.
Could it be:
- Certificate related?
- A race condition where the PC is not fast enough? (It shouldn't be: i9, 8 physical cores.)
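One way to narrow these down is to check how far a connection to port 6443 actually gets: refused at TCP, accepted but stalled during the TLS handshake, or handshake completed. The following is only a minimal sketch for that check. The function name `probe_tls` and the 5 second default timeout are my own choices for illustration, and certificate verification is deliberately switched off, so it says nothing about whether the certificates themselves are valid.

```python
import socket
import ssl


def probe_tls(host, port=6443, timeout=5.0):
    """Return a short label describing how far a TLS connection gets."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "tcp: connection refused"
    except socket.timeout:
        return "tcp: connect timeout"
    except OSError as exc:
        return f"tcp: {exc}"

    # Verification is disabled on purpose: this only checks whether the
    # handshake completes, not whether the certificate is trusted.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return f"tls: handshake ok ({tls.version()})"
    except socket.timeout:
        return "tls: handshake timeout"
    except (ssl.SSLError, OSError) as exc:
        return f"tls: handshake failed ({exc})"
    finally:
        sock.close()


if __name__ == "__main__":
    # Hostnames taken from the logs in this issue.
    for name in ("localhost", "api.test.fritz.box", "api-int.test.fritz.box"):
        print(name, "->", probe_tls(name))
```

Running it from the bootstrap node and from a master should show whether the failure is on the node itself (`localhost`) or only appears when going through the load balancer (`api` / `api-int`).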
Output highlights:
Bootstrap starts:
Masters 0/1/2 download their config and the etcd members are detected
Nov 19 12:48:07 bootstrap bootkube.sh[1379]: Error: unhealthy cluster
Nov 19 12:48:08 bootstrap bootkube.sh[1379]: etcdctl failed. Retrying in 5 seconds...
Nov 19 12:49:42 bootstrap bootkube.sh[1379]: https://etcd-0.test.fritz.box:2379 is healthy: successfully committed proposal: took = 502.844138ms
Nov 19 12:50:20 bootstrap bootkube.sh[1379]: https://etcd-2.test.fritz.box:2379 is healthy: successfully committed proposal: took = 109.785824ms
Nov 19 12:50:25 bootstrap bootkube.sh[1379]: https://etcd-1.test.fritz.box:2379 is healthy: successfully committed proposal: took = 43.563555ms
Nov 19 12:50:26 bootstrap bootkube.sh[1379]: etcd cluster up. Killing etcd certificate signer...
Nov 19 12:50:32 bootstrap bootkube.sh[1379]: e0d21c36b68a594bc7b3260bfe9521df3c3dde39beaa352c5a690a4d33744eb2
Nov 19 12:50:33 bootstrap bootkube.sh[1379]: Starting cluster-bootstrap...
Nov 19 12:51:08 bootstrap bootkube.sh[1379]: Starting temporary bootstrap control plane...
Nov 19 12:51:08 bootstrap bootkube.sh[1379]: [#1] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
Nov 19 12:51:09 bootstrap bootkube.sh[1379]: E1119 12:51:09.175787 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
[core@bootstrap ~]$ watch netstat -plnt
tcp 0 0 0.0.0.0:57839 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::53471 :::* LISTEN -
tcp6 0 0 :::9537 :::* LISTEN -
tcp6 0 0 :::10250 :::* LISTEN -
tcp6 0 0 :::6443 :::* LISTEN -
tcp6 0 0 :::9099 :::* LISTEN -
tcp6 0 0 :::19531 :::* LISTEN -
tcp6 0 0 :::10255 :::* LISTEN -
tcp6 0 0 :::111 :::* LISTEN -
tcp6 0 0 :::10259 :::* LISTEN -
Masters restart and shortly after bootstrap service (6443) restarts:
[core@bootstrap ~]$ journalctl -b -f -u bootkube.service
...
Nov 19 13:24:04 bootstrap bootkube.sh[31694]: E1119 13:24:04.841357 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:06 bootstrap bootkube.sh[31694]: E1119 13:24:06.332132 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:07 bootstrap bootkube.sh[31694]: E1119 13:24:07.440597 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:08 bootstrap bootkube.sh[31694]: E1119 13:24:08.497494 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:09 bootstrap bootkube.sh[31694]: E1119 13:24:09.522265 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:10 bootstrap bootkube.sh[31694]: E1119 13:24:10.597813 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Nov 19 13:24:21 bootstrap bootkube.sh[31694]: E1119 13:24:21.630283 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: net/http: TLS handshake timeout
Nov 19 13:24:33 bootstrap bootkube.sh[31694]: E1119 13:24:33.132800 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: net/http: TLS handshake timeout
Nov 19 13:24:44 bootstrap bootkube.sh[31694]: E1119 13:24:44.168483 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: net/http: TLS handshake timeout
Nov 19 13:24:55 bootstrap bootkube.sh[31694]: E1119 13:24:55.276698 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: net/http: TLS handshake timeout
Nov 19 13:25:06 bootstrap bootkube.sh[31694]: E1119 13:25:06.414705 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: net/http: TLS handshake timeout
Nov 19 13:25:17 bootstrap bootkube.sh[31694]: E1119 13:25:17.438228 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://api.test.fritz.box:6443/api/v1/pods: EOF
Masters 0/1/2:
Nov 19 13:33:51 master0 hyperkube[2876]: E1119 13:33:51.904677 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.008815 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: I1119 13:33:52.031523 2876 eviction_manager.go:229] eviction manager: synchronize housekeeping
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.038710 2876 eviction_manager.go:246] eviction manager: failed to get summary stats: failed to get node info: node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.109972 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.215617 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.304643 2876 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://api-int.test.fritz.box:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster0&limit=500&resourceVersion=0: EOF
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.319586 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.421361 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.523345 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.625008 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.727907 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.799153 2876 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://api-int.test.fritz.box:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster0&limit=500&resourceVersion=0: EOF
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.809021 2876 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://api-int.test.fritz.box:6443/api/v1/services?limit=500&resourceVersion=0: EOF
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.849490 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:52 master0 hyperkube[2876]: E1119 13:33:52.955039 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.056800 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.159608 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.262447 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: I1119 13:33:53.313235 2876 reflector.go:160] Listing and watching *v1.Pod from k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.374617 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.485333 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.600113 2876 kubelet.go:2276] node "master0" not found
Nov 19 13:33:53 master0 hyperkube[2876]: E1119 13:33:53.713770 2876 kubelet.go:2276] node "master0" not found
What you expected to happen?
The cluster starts successfully
Services are established on the masters
How to reproduce it
PXE Boot:
- DNS/DHCP boots the bootstrap and master nodes
- Load balancer
log-bundle.tar.zip