
unable to apply cluster api stack to bootstrap cluster #81

Closed
bsingarayan opened this issue Dec 27, 2018 · 25 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone
v1alpha1

Comments

@bsingarayan

I am following the exact guidelines shown in
https://github.com/kubernetes-sigs/cluster-api-provider-gcp#getting-started
and cluster creation fails.
[screenshot: 2018-12-27 at 1:48:40 PM]

Please let me know how to proceed. I haven't done anything fancy, just followed the guidelines. Some useful information is below.

bsingarayan@bsingarayan-mbp ~/g/s/s/c/c/c/e/g/out> minikube version
minikube version: v0.32.0

I manually brought up minikube and applied the provider-components.yaml file, and it hit the same issue.
[screenshot: 2018-12-27 at 1:52:16 PM]

Below is the provider-components.yaml file for reference
provider-components.yaml.txt

BTW, I also discussed this issue in the cluster-api Slack channel and was advised to open an issue here.

Thanks

@roberthbailey
Contributor

The command gets further for me than it does for you, although it still doesn't finish:

$ ./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.0"
I0103 09:03:54.902571   52109 machineactuator.go:813] Using the default GCP client
I0103 09:03:54.904024   52109 plugins.go:39] Registered cluster provisioner "google"
I0103 09:03:54.909807   52109 createbootstrapcluster.go:28] Creating bootstrap cluster
I0103 09:17:35.325963   52109 clusterdeployer.go:95] Applying Cluster API stack to bootstrap cluster
I0103 09:17:35.325980   52109 applyclusterapicomponents.go:27] Applying Cluster API Provider Components
I0103 09:17:44.551980   52109 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0103 09:17:44.587990   52109 applycluster.go:37] Creating cluster object test1-p5sm0 in namespace "default"
I0103 09:17:44.617488   52109 clusterdeployer.go:109] Creating master  in namespace "default"
I0103 09:17:44.639946   52109 applymachines.go:37] Creating machines in namespace "default"
I0103 09:47:44.701690   52109 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
F0103 09:47:45.156812   52109 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

@roberthbailey
Contributor

I diff'd my provider components yaml file against yours and other than the expected differences, the only thing I see is the changes merged in #85.

@roberthbailey
Contributor

From your debugging output, it looks like the CRD is successfully created (that's what is defined in the provider components yaml) but the machine(s) cannot be successfully applied to your cluster. What does your machines.yaml look like? And can you verify that you have the correct validation (the disk part is at https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/master/config/crds/gceproviderconfig_v1alpha1_gcemachineproviderspec.yaml#L19-L36)?
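As a quick check, a hedged sketch of how to look for that validation block in the generated provider components (the file path is the one from the getting-started layout used above, and the disks field name is taken from the linked CRD file; adjust both if yours differ):

$ grep -n -A 15 'disks:' cmd/clusterctl/examples/google/out/provider-components.yaml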

@bsingarayan
Author

Thanks, Robert, for taking a look.
I am attaching the machines.yaml and cluster.yaml files. Please take a look.
machines.yaml.txt
cluster.yaml.txt

Let me know what is to be done.

@roberthbailey
Contributor

Those files look the same as mine (diffs only show different project names). I also noticed that you are using 1.12.4 instead of 1.12.0 as the k8s version in minikube but that doesn't seem to change any output that I'm seeing.

What version of kubectl is on your path? I think that there was an issue with kubectl at some point and I stopped upgrading. It looks like I'm still using 1.8.11:

kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.11", GitCommit:"1df6a8381669a6c753f79cb31ca2e3d57ee7c8a3", GitTreeState:"clean", BuildDate:"2018-04-05T17:24:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

@roberthbailey
Contributor

The issue that caused me to pin kubectl was kubernetes-sigs/cluster-api#137. It looks like that may have been fixed in a later release, but I haven't gone back and tried newer ones.
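For reference, a rough sketch of pinning the kubectl client to an older release on macOS (the download URL pattern is the standard kubernetes-release bucket; swap the version, OS, and arch for your setup):

$ curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.11/bin/darwin/amd64/kubectl
$ chmod +x kubectl && mv kubectl /usr/local/bin/kubectl-1.8.11
$ kubectl-1.8.11 version --client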

@bsingarayan
Author

(Quoting @roberthbailey's comment above.) Here is my kubectl version:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

@bsingarayan
Author

Changing the kubectl client version to v1.8.11 and modifying the code to pass --validate=false still doesn't work.
'[apply --kubeconfig /var/folders/0l/mxwq2jtd3w7fkg2mxh5sbl9c0000gn/T/977848574 --validate=false]'

I get the same output
W0103 14:06:18.111008 19461 clusterclient.go:517] BABU - Kubectl args is '[apply --kubeconfig /var/folders/0l/mxwq2jtd3w7fkg2mxh5sbl9c0000gn/T/977848574 --validate=false]'
I0103 14:06:26.865768 19461 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0103 14:06:26.888866 19461 applycluster.go:37] Creating cluster object test1-jtpdx in namespace "default"
I0103 14:06:26.901868 19461 clusterdeployer.go:109] Creating master in namespace "default"
I0103 14:06:26.911950 19461 applymachines.go:37] Creating machines in namespace "default"
I0103 14:36:26.983638 19461 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
F0103 14:36:29.173047 19461 create_cluster.go:64] unable to create master machine: timed out waiting for the condition
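If you'd rather not patch clusterctl, the manual equivalent of that workaround against the bootstrap (minikube) cluster is roughly this, using the file paths from the getting-started guide:

$ kubectl apply --validate=false -f cmd/clusterctl/examples/google/out/provider-components.yaml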

@justinsb
Contributor

justinsb commented Jan 3, 2019

So "timed out waiting for the condition" looks like it means that the machine didn't go ready in time. You might want to try passing --v=2 or --v=4 though I don't think you'll get much more information.

I'm not sure if "Cleaning up bootstrap cluster" means the objects were deleted, but you could check the status of the machine in the default namespace during the 30 minute timeout... Maybe there are some clues there. You could also SSH to the machine using gcloud compute ssh <name> and look at the logs to see if kubeadm init was successful... I personally page through journalctl.
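To make those checks concrete, a hedged sketch (machine name and zone are placeholders, and kubectl has to point at the bootstrap minikube cluster while it is still up):

$ kubectl get machines -n default
$ kubectl describe machine <machine-name> -n default
$ gcloud compute ssh <machine-name> --zone <zone>
# then, on the VM:
$ sudo journalctl --no-pager | less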

@bsingarayan
Author

bsingarayan commented Jan 4, 2019

I ran with --v=10 earlier; the output is below:

I0103 17:00:57.298647 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:07.299615 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:17.299163 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:27.296305 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:37.299580 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.300156 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.303803 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.306923 22785 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0103 17:01:47.306948 22785 minikube.go:58] Running: minikube [delete]
I0103 17:01:48.679814 22785 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
Machine deleted.
F0103 17:01:48.680386 22785 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

I tried to SSH, but the machine doesn't seem to exist.

bsingarayan@bsingarayan-mbp ~/g/s/s/c/p/c/c/clientset> gcloud compute ssh gce-master-njnlj
Unable to find an instance with name [gce-master-njnlj].
For the following instance:

  • [gce-master-njnlj]
    choose a zone:
    [1] asia-east1-a

[49] us-east4-c
[50] us-west1-a
Did not print [5] options.
Too many options [55]. Enter "list" at prompt to print choices fully.
Please enter your numeric choice: 43

ERROR: (gcloud.compute.ssh) Could not fetch resource:

  • The resource 'projects/e2cluster/zones/us-central1-f/instances/gce-master-njnlj' was not found

I also logged into the GCP console, and I don't see any machines in the Compute Engine instances section.
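Side note: the interactive zone prompt can be avoided by setting a default zone or passing it explicitly (us-central1-f is the zone shown in the error above; substitute your own):

$ gcloud config set compute/zone us-central1-f
$ gcloud compute ssh gce-master-njnlj --zone us-central1-f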

@roberthbailey
Contributor

@bsingarayan - either changing the kubectl version or modifying the code to disable validation fixed your initial error and we are now seeing the same issue.

@roberthbailey
Contributor

Found the first issue - I'd pushed a new version of the gcp-provider-controller-manager image and forgot to set its ACLs to publicly readable. You should now see the master machine get created on GCP when running clusterctl with the latest image.
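For anyone hitting a similar pull failure with their own registry, a hedged sketch of making a gcr.io-hosted image publicly readable by granting read access on its backing storage bucket (the project id is a placeholder):

$ gsutil iam ch allUsers:objectViewer gs://artifacts.<project-id>.appspot.com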

@roberthbailey
Contributor

On the GCE VM I see both cluster-api-controller-manager-0 and gcp-provider-controller-manager-0 crash looping because they are unable to dial the API server via the service IP:

$ kubectl --kubeconfig=kubeconfig logs cluster-api-controller-manager-0 -n cluster-api-system -p
2019/01/04 08:43:52 Get https://10.96.0.1:443/api?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
$ kubectl --kubeconfig=kubeconfig get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   8m

kube-proxy is running on the master, but what is interesting is that while I can successfully curl the service IP from the machine itself, I cannot reach it from within a busybox pod running on the master. I also noticed that coredns on the master is crash looping (just a bit more slowly than the cluster API provider pods):

$ kubectl --kubeconfig=kubeconfig logs -n kube-system coredns-576cbf47c7-wxkcg
E0104 08:56:11.889192       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:11.889397       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:11.889831       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.889800       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.890982       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.892157       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

@roberthbailey
Contributor

From my busybox pod I see a timeout trying to reach both the cluster IP and the external IP for the kubernetes service:

# wget https://10.96.0.1:443
Connecting to 10.96.0.1:443 (10.96.0.1:443)
wget: can't connect to remote host (10.96.0.1): Connection timed out
# wget 35.232.180.114:443
Connecting to 35.232.180.114:443 (35.232.180.114:443)
wget: can't connect to remote host (35.232.180.114): Connection timed out
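For reference, the in-pod test above can be reproduced with something along these lines (the image and flags are the usual busybox one-liner, not copied from the original run):

$ kubectl --kubeconfig=kubeconfig run -it busybox --image=busybox --restart=Never -- sh
# then, inside the pod:
# wget https://10.96.0.1:443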

@roberthbailey
Contributor

Since the problem seems to be related to the lack of pod <-> pod network connectivity, I tried installing calico into my cluster, but that doesn't seem to have fixed anything.

@bsingarayan
Author

Your setup looks much better. I updated my sandbox with your commits, and I am running into the error below. The machines are not getting created.

./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.4" --v=4
I0104 12:48:30.209905 30277 machineactuator.go:813] Using the default GCP client
I0104 12:48:30.212017 30277 plugins.go:39] Registered cluster provisioner "google"
I0104 12:48:30.216788 30277 createbootstrapcluster.go:28] Creating bootstrap cluster
I0104 12:48:30.216845 30277 minikube.go:58] Running: minikube [start --bootstrapper=kubeadm --kubernetes-version=v1.12.4]
I0104 12:50:50.260581 30277 minikube.go:62] Ran: minikube [start --bootstrapper=kubeadm --kubernetes-version=v1.12.4] Output: Starting local Kubernetes v1.12.4 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Stopping extra container runtimes...
Starting cluster components...
Verifying kubelet health ...
Verifying apiserver health ...Kubectl is now configured to use the cluster.
Loading cached images from config file.

Everything looks great. Please enjoy minikube!
I0104 12:50:50.263619 30277 clusterdeployer.go:95] Applying Cluster API stack to bootstrap cluster
I0104 12:50:50.263637 30277 applyclusterapicomponents.go:27] Applying Cluster API Provider Components
I0104 12:50:50.263648 30277 clusterclient.go:521] Waiting for kubectl apply...
I0104 12:50:59.034678 30277 clusterclient.go:550] Waiting for Cluster v1alpha resources to become available...
I0104 12:50:59.047032 30277 clusterclient.go:563] Waiting for Cluster v1alpha resources to be listable...
I0104 12:50:59.079902 30277 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0104 12:50:59.094264 30277 applycluster.go:37] Creating cluster object test1-jtpdx in namespace "default"
I0104 12:50:59.103577 30277 clusterdeployer.go:109] Creating master in namespace "default"
I0104 12:50:59.119836 30277 applymachines.go:37] Creating machines in namespace "default"
I0104 12:50:59.140076 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
I0104 12:51:09.144008 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
I0104 12:51:19.148023 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
...
...

I0104 11:54:29.398109 28375 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0104 11:54:29.398138 28375 minikube.go:58] Running: minikube [delete]
I0104 11:54:30.784157 28375 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
Machine deleted.
F0104 11:54:30.785301 28375 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

@babu-selector

I get the error below when using the same clusterctl create command:

I0104 15:26:40.158070 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:26:50.157910 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:27:00.159249 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:27:00.163162 70886 clusterdeployer.go:114] Updating bootstrap cluster object for cluster test1-zc1zl in namespace "default" with master () endpoint
I0104 15:27:00.828483 70886 clusterdeployer.go:119] Creating target cluster
I0104 15:27:00.833525 70886 clusterdeployer.go:206] Getting target cluster kubeconfig.
I0104 15:27:00.833542 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:03.797805 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 255
I0104 15:27:13.801542 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:16.631529 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:23.802087 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:26.283914 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:33.801773 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:36.210662 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:43.802653 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:46.220228 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:53.798289 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:56.150564 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:03.802817 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:06.028489 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:13.801325 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:16.246216 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:23.806084 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:26.095382 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:33.801219 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
I0104 15:28:36.287582 70886 applyaddons.go:25] Applying Addons
I0104 15:28:36.287604 70886 clusterclient.go:521] Waiting for kubectl apply...
I0104 15:28:37.054296 70886 clusterclient.go:526] Waiting for kubectl apply... server not yet available: couldn't kubectl apply: exit status 1, output: unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused

...
...
...

@roberthbailey roberthbailey added this to the v1alpha1 milestone Jan 11, 2019
@roberthbailey roberthbailey added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jan 11, 2019
@girikuncoro

I'm hitting this exact issue, where cluster-api-controller-manager-0 and gcp-provider-controller-manager-0 are crash looping since they are not able to reach the kube API server through the service IP. I think this is mostly a pod network issue; I tried installing flannel with the 192.168.0.0/16 network from the kubeadm docs, and it works. Further investigation is needed to properly fix this issue.

...
I0113 23:04:09.489242   25220 clusterdeployer.go:160] Done provisioning cluster. You can now access your cluster with kubectl --kubeconfig kubeconfig
I0113 23:04:09.495618   25220 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0113 23:04:09.495644   25220 minikube.go:58] Running: minikube [delete]
I0113 23:04:09.955152   25220 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
$ kubectl get po --all-namespaces
NAMESPACE             NAME                                       READY   STATUS    RESTARTS   AGE
cluster-api-system    cluster-api-controller-manager-0           1/1     Running   5          36m
gcp-provider-system   gcp-provider-controller-manager-0          1/1     Running   5          36m
kube-system           coredns-576cbf47c7-4khfr                   1/1     Running   2          36m
kube-system           coredns-576cbf47c7-dhp9x                   1/1     Running   2          36m
...

@babu-selector

(Quoting @girikuncoro's comment above.)

Could you please share the exact steps to work around this issue? It would unblock me and get things going.

Thanks

@girikuncoro

@babu-selector this is the hack that I did:

  1. Create the cluster using clusterctl as usual:
$ ./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.4"
  2. Wait until it reaches the machine-creation step:
...
I0115 05:54:45.615643   43663 applymachines.go:37] Creating machines in namespace "default"
  3. At this stage, clusterctl will just keep waiting (and eventually time out) if you don't do something, since it's waiting for the control plane components to become ready. The control plane node should be up in GCP, so SSH into it. Get the gcloud command from your Google Cloud console.
$ gcloud compute --project ${PROJECT_NAME} ssh --zone ${ZONE_NAME} ${VM_NAME}
  4. From inside the control plane VM, you can see the Cluster API pods crash looping:
$ kubectl --kubeconfig /etc/kubernetes/admin.conf get po --all-namespaces
  5. Get the flannel YAML from the kubeadm docs, replace 10.244.0.0/16 with 192.168.0.0/16, then apply it (see the sketch below):
$ kubectl apply -f flannel.yaml
  6. Wait a few seconds until the flannel CNI is ready; the Cluster API pods should become ready soon (check with kubectl logs as well). The node machine will then be spawned shortly after, and clusterctl will complete.

Hope it's useful for you.
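For completeness, a hedged sketch of steps 5-6 run on the control plane VM (the manifest URL is the one the kubeadm docs pointed at around this time and may have moved since; the sed swaps the default pod CIDR for this cluster's 192.168.0.0/16):

$ curl -LO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
$ sed -i 's#10.244.0.0/16#192.168.0.0/16#g' kube-flannel.yml
$ kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f kube-flannel.yml
$ kubectl --kubeconfig /etc/kubernetes/admin.conf get pods --all-namespaces -w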

@bsingarayan
Author

bsingarayan commented Jan 20, 2019

Thanks a lot @girikuncoro.
Before I could get there, this time the cluster creation fails at the point below.

minikube logs show that the cluster creation is not progressing:

...
...
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810365    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810446    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810473    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810491    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282516    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "sshkeys" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-sshkeys") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282571    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "credentials" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-credentials") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282593    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "certs" (UniqueName: "kubernetes.io/host-path/aeb18636-1e8d-11e9-97c3-080027234dd5-certs") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282691    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-pzxgz" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-default-token-pzxgz") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282729    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "config" (UniqueName: "kubernetes.io/host-path/aeb18636-1e8d-11e9-97c3-080027234dd5-config") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282753    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "machine-setup" (UniqueName: "kubernetes.io/configmap/aeb18636-1e8d-11e9-97c3-080027234dd5-machine-setup") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.684425    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-rhdll" (UniqueName: "kubernetes.io/secret/aed28f73-1e8d-11e9-97c3-080027234dd5-default-token-rhdll") pod "cluster-api-controller-manager-0" (UID: "aed28f73-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:49 minikube kubelet[3169]: W0122 21:35:49.208750    3169 container.go:393] Failed to create summary reader for "/system.slice/run-r2408930bd9cf43b48913f3430e92f6dd.scope": none of the resources are being tracked.
Jan 22 21:35:49 minikube kubelet[3169]: W0122 21:35:49.211586    3169 pod_container_deletor.go:75] Container "10e7bdb53029f365ccec3dbb06ce7617cdadcc6ff148a7863247f6d7f9deb944" not found in pod's containers

Another thing I noticed is that the gcp-provider-controller-manager pod wasn't getting created in minikube.

bsingarayan@bsingarayan-mbp ~/g/s/s/cluster-api-provider-gcp> kubectl get pods -o wide --all-namespaces
NAMESPACE            NAME                                    READY     STATUS    RESTARTS   AGE       IP           NODE
cluster-api-system   cluster-api-controller-manager-0        1/1       Running   0          15m       172.17.0.5   minikube
kube-system          coredns-576cbf47c7-xpjhb                1/1       Running   0          15m       172.17.0.2   minikube
kube-system          coredns-576cbf47c7-zrd68                1/1       Running   0          15m       172.17.0.3   minikube
kube-system          etcd-minikube                           1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-addon-manager-minikube             1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-apiserver-minikube                 1/1       Running   0          15m       10.0.2.15    minikube
kube-system          kube-controller-manager-minikube        1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-proxy-l284r                        1/1       Running   0          15m       10.0.2.15    minikube
kube-system          kube-scheduler-minikube                 1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kubernetes-dashboard-5bff5f8fb8-cpkvc   1/1       Running   0          15m       172.17.0.4   minikube
kube-system          storage-provisioner                     1/1       Running   0          15m       10.0.2.15    minikube
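A few hedged checks for why gcp-provider-controller-manager-0 never shows up (this assumes the controller is deployed as a StatefulSet in provider-components.yaml, which the -0 pod suffix suggests, and uses the namespace seen earlier in this thread):

$ kubectl get statefulsets --all-namespaces
$ kubectl -n gcp-provider-system describe statefulset gcp-provider-controller-manager
$ kubectl -n gcp-provider-system get events --sort-by=.lastTimestamp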

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
