This repository has been archived by the owner on Jan 4, 2022. It is now read-only.

fails to start with a timeout with Kubernetes 1.11 #282

alban opened this issue Jun 30, 2018 · 5 comments


alban commented Jun 30, 2018

To Reproduce:

  • Install Fedora 28 from (GP2 image) on AWS:
    • m4.large
    • Disk: at least 50GiB
    • ssh: ssh -i ~/.ssh/$KEY fedora@$IP
  • Start a kube-spawn Kubernetes cluster on the AWS EC2 instance:
export KUBERNETES_VERSION=v1.9.9 # or other version
export KUBERNETES_VERSION=v1.10.5 # or other version
export KUBERNETES_VERSION=v1.11.0 # or other version

## Workarounds
sudo setenforce 0

## Install dependencies
sudo dnf install -y btrfs-progs git go iptables libselinux-utils polkit qemu-img systemd-container make docker
mkdir go
export GOPATH=$HOME/go
curl -fsSL -O
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xvf cni-plugins-amd64-v0.6.0.tgz
sudo curl -Lo /usr/local/bin/kubectl${KUBERNETES_VERSION}/bin/linux/amd64/kubectl
sudo chmod +x /usr/local/bin/kubectl

## Compile and install
mkdir -p $GOPATH/src/
cd $GOPATH/src/
git clone
cd kube-spawn/
git checkout $KUBE_SPAWN_VERSION
sudo make install

## First attempt to use kube-spawn
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3
sudo -E kube-spawn destroy

## Workaround for "no space left on device":
sudo umount /var/lib/machines
sudo qemu-img resize -f raw /var/lib/machines.raw $((10*1024*1024*1024))
sudo mount -t btrfs -o loop /var/lib/machines.raw /var/lib/machines
sudo btrfs filesystem resize max /var/lib/machines
sudo btrfs quota disable /var/lib/machines

## Start kube-spawn
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3

Then the error message:

Download of complete.
Created new local image 'flatcar'.
Operation completed successfully.
nf_conntrack module is not loaded: stat /sys/module/nf_conntrack/parameters/hashsize: no such file or directory
Warning: nf_conntrack module is not loaded.
loading nf_conntrack module... 
making iptables FORWARD chain defaults to ACCEPT...
setting iptables rule to allow CNI traffic...
Starting 3 nodes in cluster default ...
Waiting for machine kube-spawn-default-worker-fjxan9 to start up ...
Waiting for machine kube-spawn-default-master-5y7clq to start up ...
Waiting for machine kube-spawn-default-worker-2ujr2f to start up ...
Started kube-spawn-default-worker-2ujr2f
Bootstrapping kube-spawn-default-worker-2ujr2f ...
Started kube-spawn-default-master-5y7clq
Bootstrapping kube-spawn-default-master-5y7clq ...
Cluster "default" started
Failed to start machine kube-spawn-default-worker-fjxan9: timeout waiting for "kube-spawn-default-worker-fjxan9" to start
Note: `kubeadm init` can take several minutes
master-5y7clq I0630 14:22:29.999557     380 feature_gate.go:230] feature gates: &{map[]}
              [init] using Kubernetes version: v1.11.0
              [preflight] running pre-flight checks
              [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
              [WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
              [WARNING FileExisting-crictl]: crictl not found in system path
              I0630 14:22:30.050775     380 kernel_validator.go:81] Validating kernel version
              I0630 14:22:30.051083     380 kernel_validator.go:96] Validating kernel config
              [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
              [WARNING Hostname]: hostname "kube-spawn-default-master-5y7clq" could not be reached
              [WARNING Hostname]: hostname "kube-spawn-default-master-5y7clq" lookup kube-spawn-default-master-5y7clq on no such host
              [preflight/images] Pulling images required for setting up a Kubernetes cluster
              [preflight/images] This might take a minute or two, depending on the speed of your internet connection
              [preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
              [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
              [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
              [preflight] Activating the kubelet service
              [certificates] Generated ca certificate and key.
              [certificates] Generated apiserver certificate and key.
              [certificates] apiserver serving cert is signed for DNS names [kube-spawn-default-master-5y7clq kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs []
              [certificates] Generated apiserver-kubelet-client certificate and key.
              [certificates] Generated sa key and public key.
              [certificates] Generated front-proxy-ca certificate and key.
              [certificates] Generated front-proxy-client certificate and key.
              [certificates] Generated etcd/ca certificate and key.
              [certificates] Generated etcd/server certificate and key.
              [certificates] etcd/server serving cert is signed for DNS names [kube-spawn-default-master-5y7clq localhost] and IPs [ ::1]
              [certificates] Generated etcd/peer certificate and key.
              [certificates] etcd/peer serving cert is signed for DNS names [kube-spawn-default-master-5y7clq localhost] and IPs [ ::1]
              [certificates] Generated etcd/healthcheck-client certificate and key.
              [certificates] Generated apiserver-etcd-client certificate and key.
              [certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
              [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
              [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
              [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
              [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
              [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
              [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
              [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
              [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
              [init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
              [init] this might take a minute or longer if the control plane images have to be pulled
              [apiclient] All control plane components are healthy after 42.001677 seconds
              [uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
              [kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
              [markmaster] Marking the node kube-spawn-default-master-5y7clq as master by adding the label "''"
              [markmaster] Marking the node kube-spawn-default-master-5y7clq as master by adding the taints []
              [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-spawn-default-master-5y7clq" as an annotation
              [bootstraptoken] using token: 1o71nu.v7s48wncryhbdmm7
              [bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
              [bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
              [bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
              [bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
              [addons] Applied essential addon: CoreDNS
              [addons] Applied essential addon: kube-proxy
              Your Kubernetes master has initialized successfully!
              To start using your cluster, you need to run the following as a regular user:
              mkdir -p $HOME/.kube
              sudo cp -i /etc/kubernetes/admin.conf
              sudo chown $(id -u):$(id -g) $HOME/.kube/config
              You should now deploy a pod network to the cluster.
              Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
              You can now join any number of machines by running the following on each node
              as root:
              kubeadm join --token 1o71nu.v7s48wncryhbdmm7 --discovery-token-ca-cert-hash sha256:c8ac2337adc7ed01725bed7d78605661dc759257fce213838f1cb89486fe263c
              I0630 14:23:47.569329    1140 feature_gate.go:230] feature gates: &{map[]}
              serviceaccount/weave-net created
              daemonset.extensions/weave-net created
worker-2ujr2f [preflight] running pre-flight checks
              [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
              you can solve this problem with following methods:
              1. Run 'modprobe -- ' to load missing kernel modules;
              2. Provide the missing builtin kernel ipvs support
              [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
              [WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
              [WARNING FileExisting-crictl]: crictl not found in system path
              I0630 14:23:49.919029     449 kernel_validator.go:81] Validating kernel version
              I0630 14:23:49.919338     449 kernel_validator.go:96] Validating kernel config
              [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
              [WARNING Hostname]: hostname "kube-spawn-default-worker-2ujr2f" could not be reached
              [WARNING Hostname]: hostname "kube-spawn-default-worker-2ujr2f" lookup kube-spawn-default-worker-2ujr2f on no such host
              [discovery] Trying to connect to API Server ""
              [discovery] Created cluster-info discovery client, requesting info from ""
              [discovery] Failed to connect to API Server "": token id "aaaaaa" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
              [discovery] Trying to connect to API Server ""
              [discovery] Created cluster-info discovery client, requesting info from ""
              [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server ""
              [discovery] Successfully established connection with API Server ""
              [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
              [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
              [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
              [preflight] Activating the kubelet service
              [tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
              [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-spawn-default-worker-2ujr2f" as an annotation
              This node has joined the cluster:
              * Certificate signing request was sent to master and a response
              was received.
              * The Kubelet was informed of the new secure connection details.
              Run 'kubectl get nodes' on the master to see this node join the cluster.
Failed to start cluster: provisioning the worker nodes with kubeadm didn't succeed

More debug info:

$ kubectl get nodes
NAME                               STATUS    ROLES     AGE       VERSION
kube-spawn-default-master-5y7clq   Ready     master    1m        v1.11.0
kube-spawn-default-worker-2ujr2f   Ready     <none>    46s       v1.11.0
$ machinectl 
MACHINE                          CLASS     SERVICE        OS      VERSION  ADDRESSES
kube-spawn-default-master-5y7clq container systemd-nspawn flatcar 1814.0.0
kube-spawn-default-worker-2ujr2f container systemd-nspawn flatcar 1814.0.0

2 machines listed.
$ df -h /var/lib/machines
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       10G  1.7G  7.8G  18% /var/lib/machines

The third machine does not exist anymore?
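
One way to check this is to compare the machine names kube-spawn announced at start-up against what machinectl still lists. A minimal sketch, assuming the expected names are copied by hand from the start-up output; `missing_machines` is a hypothetical helper, not part of kube-spawn or systemd:

```shell
#!/bin/sh
# Hypothetical helper: print every expected machine name that does not
# appear in the first column of a `machinectl list` style listing.
# Usage: missing_machines "$(machinectl list)" name1 name2 name3
missing_machines() (
  listing=$1
  shift
  for name in "$@"; do
    # awk exits 0 only if some line has the name as its first field.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
      printf '%s\n' "$name"
    fi
  done
)
```

Given the listing above and the three names from the start-up output, this should print only kube-spawn-default-worker-fjxan9.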

alban commented Jun 30, 2018

After a second attempt, it works.
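
Since the failure looks intermittent here, a small retry wrapper is one way to script the second attempt. This is only a sketch: `retry` is a hypothetical helper, and whether a `kube-spawn destroy` is needed between attempts is an assumption this thread does not settle.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, stopping at the
# first success. Usage: retry 2 sudo -E kube-spawn start --nodes=3
retry() (
  attempts=$1
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    # Exit the subshell (and so the function) with success on the first pass.
    "$@" && exit 0
    echo "attempt $n/$attempts failed: $*" >&2
    n=$((n + 1))
  done
  exit 1
)
```

The function body runs in a subshell, so its counters do not leak into the calling shell.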

arcolife commented Nov 12, 2018

I get this timeout just as @alban described, except it's reproducible every time.

$ kube-spawn start
Warning: kube-proxy could crash due to insufficient nf_conntrack hashsize.
setting nf_conntrack hashsize to 131072... 
making iptables FORWARD chain defaults to ACCEPT...
new poolSize to be : 5490739200
Starting 3 nodes in cluster default ...
Waiting for machine kube-spawn-default-worker-naz6fc to start up ...
Waiting for machine kube-spawn-default-master-yz3twq to start up ...
Waiting for machine kube-spawn-default-worker-u5fu6n to start up ...
Failed to start machine kube-spawn-default-master-yz3twq: timeout waiting for "kube-spawn-default-master-yz3twq" to start
Failed to start machine kube-spawn-default-worker-naz6fc: timeout waiting for "kube-spawn-default-worker-naz6fc" to start
Failed to start cluster: starting the cluster didn't succeed


  1. I hit the same timeout every time, whether I destroy the cluster and start again or mount a freshly formatted btrfs volume and redo the whole thing.
  2. The first time I launched kube-spawn, it was with a manually formatted and mounted btrfs volume. That's when it complained that "machine.raw" was not found. So I unmounted it and re-ran, and systemd-nspawn did its job and created a machine.raw. I then tried to re-spawn the cluster; this time it didn't complain about the .raw file, but it timed out regardless.
  3. Even though I've been through the guide, SELinux has been a pita, and I've had to create about a dozen policies and semanage it all. Not the cake I was digging. pfft

For debugging, is there any place this thing logs to?

  • kube-spawn v0.3.0
  • FS:
/dev/loop2     btrfs      40G  1.7G   39G   5% /var/lib/machines


/dev/sda4      btrfs      56G  1.7G   54G   4% /var/lib/machines
  • systemd-container-238-10.git438ac26.fc28.x86_64
  • qemu-img-2.11.2-4.fc28.x86_64
  • machinectl limit to 40G with loopback mount (as evident in the df output above too):
# machinectl show
  • OS: Linux 4.18.17-200.fc28.x86_64 GNU/Linux

arcolife commented Nov 12, 2018

ok nevermind.

all I had to do was:

  1. export KUBERNETES_VERSION=v1.12.0 (I hadn't done this before the create step earlier)
  2. kube-spawn destroy
  3. kube-spawn create (this time, it populated /var/lib/kube-spawn/clusters. It was an empty trail of subdirs earlier.)
  4. kube-spawn start

and it works. jeez
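
The steps above can be sketched as a script that refuses to run the create step while KUBERNETES_VERSION is unset. Both function names are hypothetical, the version pattern (a leading vMAJOR.MINOR.PATCH) is an assumption, and the `sudo -E ... --kubernetes-version` form is borrowed from the original report rather than from the steps above:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument starts with
# vMAJOR.MINOR.PATCH, e.g. v1.12.0.
check_k8s_version() {
  if ! printf '%s\n' "${1:-}" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+'; then
    echo "KUBERNETES_VERSION is unset or malformed: '${1:-}'" >&2
    return 1
  fi
}

# Hypothetical wrapper: destroy, create, start, in that order.
recreate_cluster() {
  check_k8s_version "${KUBERNETES_VERSION:-}" || return 1
  # A failing destroy (e.g. nothing to destroy) does not stop the sequence.
  sudo -E kube-spawn destroy
  sudo -E kube-spawn create --kubernetes-version "$KUBERNETES_VERSION" &&
    sudo -E kube-spawn start
}
```

Calling `recreate_cluster` with the variable unset returns early, before anything is destroyed.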

krnowak commented Nov 12, 2018

Seems to be related to #325.

arcolife commented Nov 12, 2018

> Seems to be related to #325.

Sure, except I didn't destroy it first. I got the timeout right from the start, as per #282 (comment) (so to speak, after creating the cluster),
then resolved the issue with #282 (comment).

Apologies if the order of step 2 in the resolution comment caused confusion.

also I can't reproduce it now. :/

