
unable to apply cluster api stack to bootstrap cluster #81

Closed
bsingarayan opened this issue Dec 27, 2018 · 25 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone
v1alpha1

Comments

@bsingarayan

I am following the exact guidelines shown in
https://github.com/kubernetes-sigs/cluster-api-provider-gcp#getting-started
and cluster creation fails.
[screenshot: 2018-12-27 at 1:48:40 PM]

Please let me know how to proceed. I haven't done anything fancy, just followed the guidelines. Some useful information is below.

bsingarayan@bsingarayan-mbp ~/g/s/s/c/c/c/e/g/out> minikube version
minikube version: v0.32.0

I manually brought up minikube and applied the provider-components.yaml file, and it hit the same issue.
[screenshot: 2018-12-27 at 1:52:16 PM]

Below is the provider-components.yaml file for reference
provider-components.yaml.txt

BTW, I also discussed this issue in the cluster-api Slack channel and was advised to open an issue here.

Thanks

@roberthbailey
Contributor

The command gets further for me than it does for you, although it still doesn't finish:

$ ./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.0"
I0103 09:03:54.902571   52109 machineactuator.go:813] Using the default GCP client
I0103 09:03:54.904024   52109 plugins.go:39] Registered cluster provisioner "google"
I0103 09:03:54.909807   52109 createbootstrapcluster.go:28] Creating bootstrap cluster
I0103 09:17:35.325963   52109 clusterdeployer.go:95] Applying Cluster API stack to bootstrap cluster
I0103 09:17:35.325980   52109 applyclusterapicomponents.go:27] Applying Cluster API Provider Components
I0103 09:17:44.551980   52109 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0103 09:17:44.587990   52109 applycluster.go:37] Creating cluster object test1-p5sm0 in namespace "default"
I0103 09:17:44.617488   52109 clusterdeployer.go:109] Creating master  in namespace "default"
I0103 09:17:44.639946   52109 applymachines.go:37] Creating machines in namespace "default"
I0103 09:47:44.701690   52109 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
F0103 09:47:45.156812   52109 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

@roberthbailey
Contributor

I diff'd my provider components yaml file against yours and other than the expected differences, the only thing I see is the changes merged in #85.

@roberthbailey
Contributor

From your debugging output, it looks like the CRD is successfully created (that's what is defined in the provider components yaml) but the machine(s) cannot be successfully applied to your cluster. What does your machines.yaml look like? And can you verify that you have the correct validation (the disk part is at https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/master/config/crds/gceproviderconfig_v1alpha1_gcemachineproviderspec.yaml#L19-L36)?
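As a quick check, a hedged sketch of how to look for that validation block in the generated provider components (the file path is the one from the getting-started layout used above, and the disks field name is taken from the linked CRD file; adjust both if yours differ):

$ grep -n -A 15 'disks:' cmd/clusterctl/examples/google/out/provider-components.yaml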

@bsingarayan
Author

Thanks, Robert, for taking a look.
I am attaching the machines.yaml and cluster.yaml files. Please take a look.
machines.yaml.txt
cluster.yaml.txt

Let me know what is to be done.

@roberthbailey
Contributor

Those files look the same as mine (diffs only show different project names). I also noticed that you are using 1.12.4 instead of 1.12.0 as the k8s version in minikube but that doesn't seem to change any output that I'm seeing.

What version of kubectl is on your path? I think that there was an issue with kubectl at some point and I stopped upgrading. It looks like I'm still using 1.8.11:

kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.11", GitCommit:"1df6a8381669a6c753f79cb31ca2e3d57ee7c8a3", GitTreeState:"clean", BuildDate:"2018-04-05T17:24:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

@roberthbailey
Contributor

The issue that caused me to pin kubectl was kubernetes-sigs/cluster-api#137. It looks like that may have been fixed in a later release, but I haven't gone back and tried newer ones.
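For reference, a rough sketch of pinning the kubectl client to an older release on macOS (the download URL pattern is the standard kubernetes-release bucket; swap the version, OS, and arch for your setup):

$ curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.11/bin/darwin/amd64/kubectl
$ chmod +x kubectl && mv kubectl /usr/local/bin/kubectl-1.8.11
$ kubectl-1.8.11 version --client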

@bsingarayan
Author

(Quoting @roberthbailey's comment above.) Here is my kubectl version:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T06:59:37Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

@bsingarayan
Author

Changing the kubectl client version to v1.8.11 and modifying the code to pass --validate=false still doesn't work.
'[apply --kubeconfig /var/folders/0l/mxwq2jtd3w7fkg2mxh5sbl9c0000gn/T/977848574 --validate=false]'

I get the same output
W0103 14:06:18.111008 19461 clusterclient.go:517] BABU - Kubectl args is '[apply --kubeconfig /var/folders/0l/mxwq2jtd3w7fkg2mxh5sbl9c0000gn/T/977848574 --validate=false]'
I0103 14:06:26.865768 19461 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0103 14:06:26.888866 19461 applycluster.go:37] Creating cluster object test1-jtpdx in namespace "default"
I0103 14:06:26.901868 19461 clusterdeployer.go:109] Creating master in namespace "default"
I0103 14:06:26.911950 19461 applymachines.go:37] Creating machines in namespace "default"
I0103 14:36:26.983638 19461 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
F0103 14:36:29.173047 19461 create_cluster.go:64] unable to create master machine: timed out waiting for the condition
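If you'd rather not patch clusterctl, the manual equivalent of that workaround against the bootstrap (minikube) cluster is roughly this, using the file paths from the getting-started guide:

$ kubectl apply --validate=false -f cmd/clusterctl/examples/google/out/provider-components.yaml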

@justinsb
Contributor

justinsb commented Jan 3, 2019

So "timed out waiting for the condition" looks like it means that the machine didn't go ready in time. You might want to try passing --v=2 or --v=4 though I don't think you'll get much more information.

I'm not sure if "Cleaning up bootstrap cluster" means the objects were deleted, but you could check the status of the machine in the default namespace during the 30 minute timeout... Maybe there are some clues there. You could also SSH to the machine using gcloud compute ssh <name> and look at the logs to see if kubeadm init was successful... I personally page through journalctl.
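To make those checks concrete, a hedged sketch (machine name and zone are placeholders, and kubectl has to point at the bootstrap minikube cluster while it is still up):

$ kubectl get machines -n default
$ kubectl describe machine <machine-name> -n default
$ gcloud compute ssh <machine-name> --zone <zone>
# then, on the VM:
$ sudo journalctl --no-pager | less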

@bsingarayan
Author

bsingarayan commented Jan 4, 2019

I ran with --v=10 earlier; the output is below:

I0103 17:00:57.298647 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:07.299615 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:17.299163 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:27.296305 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:37.299580 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.300156 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.303803 22785 clusterclient.go:577] Waiting for Machine gce-master-njnlj to become ready...
I0103 17:01:47.306923 22785 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0103 17:01:47.306948 22785 minikube.go:58] Running: minikube [delete]
I0103 17:01:48.679814 22785 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
Machine deleted.
F0103 17:01:48.680386 22785 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

I tried to SSH, but the machine doesn't seem to exist.

bsingarayan@bsingarayan-mbp ~/g/s/s/c/p/c/c/clientset> gcloud compute ssh gce-master-njnlj
Unable to find an instance with name [gce-master-njnlj].
For the following instance:

  • [gce-master-njnlj]
    choose a zone:
    [1] asia-east1-a

[49] us-east4-c
[50] us-west1-a
Did not print [5] options.
Too many options [55]. Enter "list" at prompt to print choices fully.
Please enter your numeric choice: 43

ERROR: (gcloud.compute.ssh) Could not fetch resource:

  • The resource 'projects/e2cluster/zones/us-central1-f/instances/gce-master-njnlj' was not found

I also logged into the GCP console, and I don't see any machines in the Compute Engine instances section.
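Side note: the interactive zone prompt can be avoided by setting a default zone or passing it explicitly (us-central1-f is the zone shown in the error above; substitute your own):

$ gcloud config set compute/zone us-central1-f
$ gcloud compute ssh gce-master-njnlj --zone us-central1-f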

@roberthbailey
Contributor

@bsingarayan - either changing the kubectl version or modifying the code to disable validation fixed your initial error and we are now seeing the same issue.

@roberthbailey
Contributor

Found the first issue - I'd pushed a new version of the gcp-provider-controller-manager image and forgot to set its ACLs to publicly readable. You should now see the master machine get created on GCP when running clusterctl with the latest image.
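For anyone hitting a similar pull failure with their own registry, a hedged sketch of making a gcr.io-hosted image publicly readable by granting read access on its backing storage bucket (the project id is a placeholder):

$ gsutil iam ch allUsers:objectViewer gs://artifacts.<project-id>.appspot.com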

@roberthbailey
Contributor

On the GCE VM I see both cluster-api-controller-manager-0 and gcp-provider-controller-manager-0 crash looping because they are unable to dial the API server via the service IP:

$ kubectl --kubeconfig=kubeconfig logs cluster-api-controller-manager-0 -n cluster-api-system -p
2019/01/04 08:43:52 Get https://10.96.0.1:443/api?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
$ kubectl --kubeconfig=kubeconfig get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   8m

kube-proxy is running on the master, but what is interesting is that while I can successfully curl the service IP from the machine itself, I cannot reach it from within a busybox pod running on the master. I also noticed that coredns on the master is crash looping (just a bit more slowly than the cluster API provider pods):

$ kubectl --kubeconfig=kubeconfig logs -n kube-system coredns-576cbf47c7-wxkcg
E0104 08:56:11.889192       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:11.889397       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:11.889831       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.889800       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.890982       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 08:56:42.892157       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

@roberthbailey
Contributor

From my busybox pod I see a timeout trying to reach both the cluster IP and the external IP for the kubernetes service:

# wget https://10.96.0.1:443
Connecting to 10.96.0.1:443 (10.96.0.1:443)
wget: can't connect to remote host (10.96.0.1): Connection timed out
# wget 35.232.180.114:443
Connecting to 35.232.180.114:443 (35.232.180.114:443)
wget: can't connect to remote host (35.232.180.114): Connection timed out
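For reference, the in-pod test above can be reproduced with something along these lines (the image and flags are the usual busybox one-liner, not copied from the original run):

$ kubectl --kubeconfig=kubeconfig run -it busybox --image=busybox --restart=Never -- sh
# then, inside the pod:
# wget https://10.96.0.1:443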

@roberthbailey
Contributor

Since the problem seems to be related to the lack of pod <-> pod network connectivity, I tried installing calico into my cluster, but that doesn't seem to have fixed anything.

@bsingarayan
Author

Your setup looks much better. I updated my sandbox with your commits, and I am running into the error below. The machines are not getting created.

./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.4" --v=4
I0104 12:48:30.209905 30277 machineactuator.go:813] Using the default GCP client
I0104 12:48:30.212017 30277 plugins.go:39] Registered cluster provisioner "google"
I0104 12:48:30.216788 30277 createbootstrapcluster.go:28] Creating bootstrap cluster
I0104 12:48:30.216845 30277 minikube.go:58] Running: minikube [start --bootstrapper=kubeadm --kubernetes-version=v1.12.4]
I0104 12:50:50.260581 30277 minikube.go:62] Ran: minikube [start --bootstrapper=kubeadm --kubernetes-version=v1.12.4] Output: Starting local Kubernetes v1.12.4 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Stopping extra container runtimes...
Starting cluster components...
Verifying kubelet health ...
Verifying apiserver health ...Kubectl is now configured to use the cluster.
Loading cached images from config file.

Everything looks great. Please enjoy minikube!
I0104 12:50:50.263619 30277 clusterdeployer.go:95] Applying Cluster API stack to bootstrap cluster
I0104 12:50:50.263637 30277 applyclusterapicomponents.go:27] Applying Cluster API Provider Components
I0104 12:50:50.263648 30277 clusterclient.go:521] Waiting for kubectl apply...
I0104 12:50:59.034678 30277 clusterclient.go:550] Waiting for Cluster v1alpha resources to become available...
I0104 12:50:59.047032 30277 clusterclient.go:563] Waiting for Cluster v1alpha resources to be listable...
I0104 12:50:59.079902 30277 clusterdeployer.go:100] Provisioning target cluster via bootstrap cluster
I0104 12:50:59.094264 30277 applycluster.go:37] Creating cluster object test1-jtpdx in namespace "default"
I0104 12:50:59.103577 30277 clusterdeployer.go:109] Creating master in namespace "default"
I0104 12:50:59.119836 30277 applymachines.go:37] Creating machines in namespace "default"
I0104 12:50:59.140076 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
I0104 12:51:09.144008 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
I0104 12:51:19.148023 30277 clusterclient.go:574] Waiting for Machine gce-master-rfp84 to become ready...
...
...

I0104 11:54:29.398109 28375 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0104 11:54:29.398138 28375 minikube.go:58] Running: minikube [delete]
I0104 11:54:30.784157 28375 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
Machine deleted.
F0104 11:54:30.785301 28375 create_cluster.go:64] unable to create master machine: timed out waiting for the condition

@babu-selector

I get the error below when using the same clusterctl create command:

I0104 15:26:40.158070 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:26:50.157910 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:27:00.159249 70886 clusterclient.go:574] Waiting for Machine gce-master-nplkn to become ready...
I0104 15:27:00.163162 70886 clusterdeployer.go:114] Updating bootstrap cluster object for cluster test1-zc1zl in namespace "default" with master () endpoint
I0104 15:27:00.828483 70886 clusterdeployer.go:119] Creating target cluster
I0104 15:27:00.833525 70886 clusterdeployer.go:206] Getting target cluster kubeconfig.
I0104 15:27:00.833542 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:03.797805 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 255
I0104 15:27:13.801542 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:16.631529 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:23.802087 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:26.283914 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:33.801773 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:36.210662 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:43.802653 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:46.220228 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:27:53.798289 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:27:56.150564 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:03.802817 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:06.028489 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:13.801325 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:16.246216 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:23.806084 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
E0104 15:28:26.095382 70886 util.go:150] error executing command "gcloud compute ssh --project grand-brand-227020 --zone us-central1-f gce-master-nplkn --command sudo cat /etc/kubernetes/admin.conf -- -q": exit status 1
I0104 15:28:33.801219 70886 clusterdeployer.go:283] Waiting for kubeconfig on gce-master-nplkn to become ready...
I0104 15:28:36.287582 70886 applyaddons.go:25] Applying Addons
I0104 15:28:36.287604 70886 clusterclient.go:521] Waiting for kubectl apply...
I0104 15:28:37.054296 70886 clusterclient.go:526] Waiting for kubectl apply... server not yet available: couldn't kubectl apply: exit status 1, output: unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused
unable to recognize "STDIN": Get https://35.193.173.20:443/api?timeout=32s: dial tcp 35.193.173.20:443: connect: connection refused

...
...
...

@roberthbailey roberthbailey added this to the v1alpha1 milestone Jan 11, 2019
@roberthbailey roberthbailey added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jan 11, 2019
@girikuncoro

I'm hitting this exact issue, where cluster-api-controller-manager-0 and gcp-provider-controller-manager-0 are crash looping since they are not able to reach the kube API server through the service IP. I think this is mostly a pod network issue; I tried installing flannel with the 192.168.0.0/16 network from the kubeadm docs, and it works. Further investigation is needed to properly fix this issue.

...
I0113 23:04:09.489242   25220 clusterdeployer.go:160] Done provisioning cluster. You can now access your cluster with kubectl --kubeconfig kubeconfig
I0113 23:04:09.495618   25220 createbootstrapcluster.go:37] Cleaning up bootstrap cluster.
I0113 23:04:09.495644   25220 minikube.go:58] Running: minikube [delete]
I0113 23:04:09.955152   25220 minikube.go:62] Ran: minikube [delete] Output: Deleting local Kubernetes cluster...
$ kubectl get po --all-namespaces
NAMESPACE             NAME                                       READY   STATUS    RESTARTS   AGE
cluster-api-system    cluster-api-controller-manager-0           1/1     Running   5          36m
gcp-provider-system   gcp-provider-controller-manager-0          1/1     Running   5          36m
kube-system           coredns-576cbf47c7-4khfr                   1/1     Running   2          36m
kube-system           coredns-576cbf47c7-dhp9x                   1/1     Running   2          36m
...

@babu-selector

(Quoting @girikuncoro's comment above.)

Could you please share the exact steps to work around this issue? It would unblock me and get things going.

Thanks

@girikuncoro

@babu-selector this is the hack that I did:

  1. Create the cluster using clusterctl as usual:
$ ./bin/clusterctl create cluster --provider google -c cmd/clusterctl/examples/google/out/cluster.yaml -m cmd/clusterctl/examples/google/out/machines.yaml -p cmd/clusterctl/examples/google/out/provider-components.yaml -a cmd/clusterctl/examples/google/out/addons.yaml --minikube="kubernetes-version=v1.12.4"
  2. Wait until it reaches the machine-creation step:
...
I0115 05:54:45.615643   43663 applymachines.go:37] Creating machines in namespace "default"
  3. At this stage, clusterctl will just keep waiting (and eventually time out) if you don't do something, since it's waiting for the control plane components to become ready. The control plane node should be up in GCP, so SSH into it. Get the gcloud command from your Google Cloud console.
$ gcloud compute --project ${PROJECT_NAME} ssh --zone ${ZONE_NAME} ${VM_NAME}
  4. From inside the control plane VM, you can see the Cluster API pods crash looping:
$ kubectl --kubeconfig /etc/kubernetes/admin.conf get po --all-namespaces
  5. Get the flannel YAML from the kubeadm docs, replace 10.244.0.0/16 with 192.168.0.0/16, then apply it (see the sketch below):
$ kubectl apply -f flannel.yaml
  6. Wait a few seconds until the flannel CNI is ready; the Cluster API pods should become ready soon (check with kubectl logs as well). The node machine will then be spawned shortly after, and clusterctl will complete.

Hope it's useful for you.
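For completeness, a hedged sketch of steps 5-6 run on the control plane VM (the manifest URL is the one the kubeadm docs pointed at around this time and may have moved since; the sed swaps the default pod CIDR for this cluster's 192.168.0.0/16):

$ curl -LO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
$ sed -i 's#10.244.0.0/16#192.168.0.0/16#g' kube-flannel.yml
$ kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f kube-flannel.yml
$ kubectl --kubeconfig /etc/kubernetes/admin.conf get pods --all-namespaces -w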

@bsingarayan
Author

bsingarayan commented Jan 20, 2019

Thanks a lot @girikuncoro.
Before I could get there, this time the cluster creation fails at the point below.

minikube logs show that the cluster creation is not progressing:

...
...
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810365    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810446    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810473    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:46 minikube kubelet[3169]: W0122 21:35:46.810491    3169 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/system.slice/run-rbbe3ccd2b5b64e3da2f9fc7cb63704c7.scope: no such file or directory
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282516    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "sshkeys" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-sshkeys") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282571    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "credentials" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-credentials") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282593    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "certs" (UniqueName: "kubernetes.io/host-path/aeb18636-1e8d-11e9-97c3-080027234dd5-certs") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282691    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-pzxgz" (UniqueName: "kubernetes.io/secret/aeb18636-1e8d-11e9-97c3-080027234dd5-default-token-pzxgz") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282729    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "config" (UniqueName: "kubernetes.io/host-path/aeb18636-1e8d-11e9-97c3-080027234dd5-config") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.282753    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "machine-setup" (UniqueName: "kubernetes.io/configmap/aeb18636-1e8d-11e9-97c3-080027234dd5-machine-setup") pod "gcp-provider-controller-manager-0" (UID: "aeb18636-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:48 minikube kubelet[3169]: I0122 21:35:48.684425    3169 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-rhdll" (UniqueName: "kubernetes.io/secret/aed28f73-1e8d-11e9-97c3-080027234dd5-default-token-rhdll") pod "cluster-api-controller-manager-0" (UID: "aed28f73-1e8d-11e9-97c3-080027234dd5")
Jan 22 21:35:49 minikube kubelet[3169]: W0122 21:35:49.208750    3169 container.go:393] Failed to create summary reader for "/system.slice/run-r2408930bd9cf43b48913f3430e92f6dd.scope": none of the resources are being tracked.
Jan 22 21:35:49 minikube kubelet[3169]: W0122 21:35:49.211586    3169 pod_container_deletor.go:75] Container "10e7bdb53029f365ccec3dbb06ce7617cdadcc6ff148a7863247f6d7f9deb944" not found in pod's containers

Another thing I noticed is that the gcp-provider-controller-manager pod wasn't getting created in minikube.

bsingarayan@bsingarayan-mbp ~/g/s/s/cluster-api-provider-gcp> kubectl get pods -o wide --all-namespaces
NAMESPACE            NAME                                    READY     STATUS    RESTARTS   AGE       IP           NODE
cluster-api-system   cluster-api-controller-manager-0        1/1       Running   0          15m       172.17.0.5   minikube
kube-system          coredns-576cbf47c7-xpjhb                1/1       Running   0          15m       172.17.0.2   minikube
kube-system          coredns-576cbf47c7-zrd68                1/1       Running   0          15m       172.17.0.3   minikube
kube-system          etcd-minikube                           1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-addon-manager-minikube             1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-apiserver-minikube                 1/1       Running   0          15m       10.0.2.15    minikube
kube-system          kube-controller-manager-minikube        1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kube-proxy-l284r                        1/1       Running   0          15m       10.0.2.15    minikube
kube-system          kube-scheduler-minikube                 1/1       Running   0          14m       10.0.2.15    minikube
kube-system          kubernetes-dashboard-5bff5f8fb8-cpkvc   1/1       Running   0          15m       172.17.0.4   minikube
kube-system          storage-provisioner                     1/1       Running   0          15m       10.0.2.15    minikube
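A few hedged checks for why gcp-provider-controller-manager-0 never shows up (this assumes the controller is deployed as a StatefulSet in provider-components.yaml, which the -0 pod suffix suggests, and uses the namespace seen earlier in this thread):

$ kubectl get statefulsets --all-namespaces
$ kubectl -n gcp-provider-system describe statefulset gcp-provider-controller-manager
$ kubectl -n gcp-provider-system get events --sort-by=.lastTimestamp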

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
