
weave-net CrashLoopBackOff for the second node #66

Closed
mikedanese opened this issue Nov 22, 2016 · 42 comments
Labels
kind/support Categorizes issue or PR as a support question. priority/backlog Higher priority than priority/awaiting-more-evidence. state/needs-more-information

@mikedanese

From @avkonst on October 5, 2016 14:12

Is this a request for help?

I think it is an issue with either the software or the documentation, but I am not quite sure. I started with a question on Stack Overflow: http://stackoverflow.com/questions/39872332/how-to-fix-weave-net-crashloopbackoff-for-the-second-node

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

I think it is a bug or request to improve documentation

Kubernetes version (use kubectl version):
1.4.0

Environment:

  • Cloud provider or hardware configuration: Vagrant
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a):
  • Install tools: kubeadm init/join
  • Others:

What happened:

I have got 2 VM nodes. Both can reach each other either by hostname (through /etc/hosts) or by IP address. One has been provisioned with kubeadm as a master, the other as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/) I have added weave-net. The list of pods looks like the following:

vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS             RESTARTS   AGE
kube-system   etcd-vm-master                          1/1       Running            0          3m
kube-system   kube-apiserver-vm-master                1/1       Running            0          5m
kube-system   kube-controller-manager-vm-master       1/1       Running            0          4m
kube-system   kube-discovery-982812725-x2j8y          1/1       Running            0          4m
kube-system   kube-dns-2247936740-5pu0l               3/3       Running            0          4m
kube-system   kube-proxy-amd64-ail86                  1/1       Running            0          4m
kube-system   kube-proxy-amd64-oxxnc                  1/1       Running            0          2m
kube-system   kube-scheduler-vm-master                1/1       Running            0          4m
kube-system   kubernetes-dashboard-1655269645-0swts   1/1       Running            0          4m
kube-system   weave-net-7euqt                         2/2       Running            0          4m
kube-system   weave-net-baao6                         1/2       CrashLoopBackOff   2          2m

CrashLoopBackOff appears for each worker node connected. I have spent several hours playing with network interfaces, but it seems the network is fine. I found a similar question on Stack Overflow, where the answer advised looking into the logs, with no follow-up. So, here are the logs:

vagrant@vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers
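
For anyone hitting the same "Failed to get peers" log: a quick way to narrow it down (a sketch, assuming the 100.64.0.1 service IP used in this setup) is to check what that service IP is supposed to point at, and whether it is reachable from the failing node:

kubectl get svc kubernetes -o wide            # cluster IP of the API server service (100.64.0.1 here)
kubectl get endpoints kubernetes              # should list the master's real IP, not a NAT address like 10.0.2.15
curl -k https://100.64.0.1:443/version        # run on the worker; "connection refused" points at the service routing, not weave itself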

What you expected to happen:

I would expect the weave-net pod to be in the Running state.

How to reproduce it (as minimally and precisely as possible):

I have not done anything special, just followed the Getting Started documentation. If it is essential, I can share the Vagrant project which I used to provision everything. Please let me know if you need it.

Copied from original issue: kubernetes/kubernetes#34101

@mikedanese

From @avkonst on October 5, 2016 14:14

Please let me know if it needs to be reported on the weave-net project.

@mikedanese

From @Bach1 on October 5, 2016 16:12

I reported a similar issue here:
http://stackoverflow.com/questions/39869583/how-to-get-kube-dns-working-in-vagrant-cluster-using-kubeadm-and-weave

@mikedanese

From @avkonst on October 5, 2016 19:44

I wonder if there is any working step-by-step instruction on how to get a Kubernetes cluster up and running on a set of virtual machines? I am happy to downgrade the version of Kubernetes if that is an option. I am totally confused with all of these options: kube-deploy, kube-up, and others I found from 3rd parties.

@mikedanese

From @andreagardiman on October 6, 2016 13:24

I have the same issue. I'm using VirtualBox to run 2 VMs based on a minimal CentOS 7 image. Both VMs are attached to 2 interfaces, a NAT and a host-only network.
The two VMs are able to connect to each other using the host-only network interfaces.

I also tried the instructions for Calico and Canal, and I cannot make them work either.

@mikedanese

From @avkonst on October 6, 2016 21:45

It seems Calico has got a similar issue:

vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS             RESTARTS   AGE
kube-system   calico-etcd-iemkd                       1/1       Running            0          57m
kube-system   calico-node-178gc                       1/2       CrashLoopBackOff   8          56m
kube-system   calico-node-zsym2                       2/2       Running            0          57m
kube-system   calico-policy-controller-6gh7b          1/1       Running            0          57m
kube-system   etcd-vm-master                          1/1       Running            0          56m
kube-system   kube-apiserver-vm-master                1/1       Running            0          58m
kube-system   kube-controller-manager-vm-master       1/1       Running            0          57m
kube-system   kube-discovery-982812725-7jsmb          1/1       Running            0          57m
kube-system   kube-dns-2247936740-xiee1               3/3       Running            0          57m
kube-system   kube-proxy-amd64-iywb7                  1/1       Running            0          57m
kube-system   kube-proxy-amd64-ok9bx                  1/1       Running            0          56m
kube-system   kube-scheduler-vm-master                1/1       Running            0          56m
kube-system   kubernetes-dashboard-1655269645-g4cyd   1/1       Running            0          57m
  info: 1 completed object(s) was(were) not shown in pods list. Pass --show-all to see all objects.

vagrant@vm-master:~$ kubectl logs calico-node-178gc -c calico-node --namespace=kube-system
Waiting for etcd connection...
No IP provided. Using detected IP: 10.0.10.11
Traceback (most recent call last):
  File "startup.py", line 336, in <module>
    main()
  File "startup.py", line 287, in main
    warn_if_hostname_conflict(ip)
  File "startup.py", line 210, in warn_if_hostname_conflict
    current_ipv4, _ = client.get_host_bgp_ips(hostname)
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 134, in wrapped
    "running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to MaxRetryError("HTTPConnectionPool(host='100.78.232.136', port=6666): Max retries exceeded with url: /v2/keys/calico/bgp/v1/host/vm-worker/ip_addr_v4 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f50416bfc50>, 'Connection to 100.78.232.136 timed out. (connect timeout=60)'))",)).  Is etcd running?
Calico node failed to start

What could I try to progress this issue further?

@mikedanese

From @tedstirm on October 12, 2016 0:20

Okay I ran into this exact same issue and here is how I fixed it.

This problem seems to be due to kube-proxy looking at the wrong network interface. If you look at the kube-proxy logs on a worker node you will most likely see something like:

-A KUBE-SEP-4C6YEJQ2VXV53FEZ -m comment --comment default/kubernetes:https -s 10.0.2.15/32 -j KUBE-MARK-MASQ

This is the wrong network interface. The kube-proxy should be looking at the master node's IP address, not the NAT IP address.
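
A quick way to confirm which address the API server has registered, and therefore which address kube-proxy will write into its NAT rules, is to look at the endpoints of the kubernetes service (a sketch; the comment string matches the rule shown above):

kubectl get endpoints kubernetes                       # should show the master's real IP, not 10.0.2.15
sudo iptables-save -t nat | grep 'default/kubernetes'  # on a worker: the rules kube-proxy actually generated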

As far as I know, kube-proxy gets this value from the Kube API Server when starting up. The Kube API Server's documentation states that if the --advertise-address flag isn't set, it will default to --bind-address, and if --bind-address isn't set, it will default to the host's default interface, which in my case and yours seems to be the NAT interface and isn't what we want. So what I did was set the Kube API Server's --advertise-address flag and everything started working. So right after Step 2 and before Step 3 of

Installing Kubernetes on Linux with kubeadm

You will need to update your /etc/kubernetes/manifests/kube-apiserver.json and add the --advertise-address flag to point to your master node's IP address.

So for example: My master node's IP address is 172.28.128.2, which means right after Step 2 I do:

cat <<EOF > /etc/kubernetes/manifests/kube-apiserver.json
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "kube-system",
    "creationTimestamp": null,
    "labels": {
      "component": "kube-apiserver",
      "tier": "control-plane"
    }
  },
  "spec": {
    "volumes": [
      {
        "name": "certs",
        "hostPath": {
          "path": "/etc/ssl/certs"
        }
      },
      {
        "name": "pki",
        "hostPath": {
          "path": "/etc/kubernetes"
        }
      }
    ],
    "containers": [
      {
        "name": "kube-apiserver",
        "image": "gcr.io/google_containers/kube-apiserver-amd64:v1.4.0",
        "command": [
          "/usr/local/bin/kube-apiserver",
          "--v=4",
          "--insecure-bind-address=127.0.0.1",
          "--etcd-servers=http://127.0.0.1:2379",
          "--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota",
          "--service-cluster-ip-range=100.64.0.0/12",
          "--service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--client-ca-file=/etc/kubernetes/pki/ca.pem",
          "--tls-cert-file=/etc/kubernetes/pki/apiserver.pem",
          "--tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--token-auth-file=/etc/kubernetes/pki/tokens.csv",
          "--secure-port=443",
          "--allow-privileged",
          "--advertise-address=172.28.128.2",
          "--etcd-servers=http://127.0.0.1:2379"
        ],
        "resources": {
          "requests": {
            "cpu": "250m"
          }
        },
        "volumeMounts": [
          {
            "name": "certs",
            "mountPath": "/etc/ssl/certs"
          },
          {
            "name": "pki",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/"
          }
        ],
        "livenessProbe": {
          "httpGet": {
            "path": "/healthz",
            "port": 8080,
            "host": "127.0.0.1"
          },
          "initialDelaySeconds": 15,
          "timeoutSeconds": 15
        }
      }
    ],
    "hostNetwork": true
  },
  "status": {}
}
EOF

I am not too sure if this is a valid long-term solution, because if the default kube-apiserver.json changes, then those changes wouldn't be reflected by doing what I am doing. Ideally, I think the user would want some way to set these flags via kubeadm, or maybe the user should be responsible for editing the JSON themselves. Thoughts?

However, it may still be a good idea to update Step 2 of Installing Kubernetes on Linux with kubeadm to at least mention to users that they can update the kube component flags by modifying the JSON manifests found at /etc/kubernetes/manifests/.

@mikedanese

From @errordeveloper on October 12, 2016 9:56

I am looking at this right now. We have a way of reproducing it with https://github.com/davidkbainbridge/k8s-playground, where we have already applied a fix that makes the kubelet pick up the desired IP for pods in host network namespaces; however, somehow the kubernetes service IP still points to 10.0.2.15.

@mikedanese

From @chenzhiwei on October 12, 2016 10:11

I encountered this issue too.

Adding --advertise-address when starting kube-apiserver solved this issue.

@mikedanese

From @errordeveloper on October 12, 2016 10:32

@chenzhiwei I've already started diving into the code, and you saved me the time of working out exactly how this part works, thank you so much! This looks like an easy fix for kubeadm; a PR is coming shortly.

@mikedanese

From @errordeveloper on October 12, 2016 11:14

Ok, so it turns out that this flag is not enough; we still have an issue reaching the kubernetes service IP. The simplest solution to this is to run kube-proxy with --proxy-mode=userspace. To enable this, you can use kubectl -n kube-system edit ds kube-proxy-amd64 && kubectl -n kube-system delete pods -l name=kube-proxy-amd64.

@mikedanese

From @avkonst on October 12, 2016 12:37

I am starting kubeadm with the API advertise address option:
kubeadm init --api-advertise-addresses=$master_address
where my $master_address is not the NAT interface. It is still not enough to resolve this issue.

@mikedanese

From @avkonst on October 12, 2016 12:39

"Ok, so it turns out that this flag is not enough, we still have an issue reaching kubernetes service IP. The simplest solution to this is to run kube-proxy with --proxy-mode=userspace. To enable this, you can use kubectl -n kube-system edit ds kube-proxy-amd64 && kubectl -n kube-system delete pods -l name=kube-proxy-amd64."

@errordeveloper Could you please explain which flag you are setting, and on which command in the getting started tutorial?
What does your last command do?

@mikedanese

From @errordeveloper on October 12, 2016 12:56

@avkonst see kubernetes/kubernetes#34607.

Also, you can do this for now:

jq \
   '.spec.containers[0].command |= .+ ["--advertise-address=172.42.42.1"]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json

@mikedanese

From @errordeveloper on October 12, 2016 12:57

Could you please explain which flag you are setting, and on which command in the getting started tutorial?

We are going to fix this shortly. If you can hop on Slack, I'd be happy to help.

@mikedanese

From @errordeveloper on October 12, 2016 14:49

What does your last command do?

As this thread is getting quite noisy, here is a recap.

First, find out what IP address you want to use on the master; it's probably the one on the second network interface. For this example I'll use IP="172.42.42.1".

Next, run kubeadm init --api-advertise-addresses=$IP.

Now, you want to append --advertise-address to the kube-apiserver command in the static pod manifest; you can do it like this:

jq --arg ip "$IP" \
   '.spec.containers[0].command |= . + ["--advertise-address=" + $ip]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json

And finally, you need to update flags in kube-proxy daemonset and append --proxy-mode=userspace, which can be done like this:

kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace"]' \
  |   kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy-amd64'
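
A couple of optional sanity checks after the three steps above (a sketch; label and pod names may differ slightly between manifest versions):

kubectl get endpoints kubernetes                            # should now show the $IP you advertised
kubectl -n kube-system get pods -o wide | grep weave-net    # both weave-net pods should reach Running 2/2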

@mikedanese

From @errordeveloper on October 12, 2016 14:51

I'm working on docs for this. Also, @davidkbainbridge has provided a Vagrant+Ansible implementation, and the latest fixes are in my fork.

@mikedanese

From @miry on October 12, 2016 19:47

@tedstirm @errordeveloper can you help me with a similar problem? I have multiple eth* interfaces, and I am using 10.136.0.0/16 for the private network on eth1. For now I fixed the weave issue by opening port 443 on eth0 and eth1, but I found the next issue:

I cannot access instances from the cluster (for example, a Redis instance hosted on a separate machine). I found that weave uses the 10.0.0.0/8 network, and I am looking for a way to change this. Do you know how I can change it, for example to 101.2.0.0/16?

@mikedanese

From @tedstirm on October 12, 2016 19:54

@errordeveloper I didn't have to update the --proxy-mode for my kube-proxy. All I had to do was make sure the --advertise-address flag for the Kube API Server was set to a value listed in the --api-advertise-addresses flag. I'm just curious as to why you needed to set --proxy-mode in your setup and I didn't. Any ideas? I looked over the fork and your Vagrant setup looks very similar to mine, except you are using Ubuntu vs my CentOS.

@mikedanese

From @errordeveloper on October 12, 2016 20:14

@miry

I found that weave uses 10.0.0.0/8 network.

No, it uses 10.32.0.0/12.

So for Weave we have 10.32.0.0/12, which expands to 10.32.0.1 - 10.47.255.254.
Kubernetes service range is 10.12.0.0/12, which expands to 10.0.0.1 - 10.15.255.254.
And you have 10.136.0.0/16, which expands to 10.136.0.1 - 10.136.255.254.
I don't see an overlap here, which is what you seem to be suggesting, but please correct me if I misunderstood. I think your problem is different, so please file another issue. Your previous issue #34031 didn't describe the problem; please try to describe it there.

@tedstirm

...your Vagrant setup looks very similar to mine, except you are using Ubuntu vs my Centos.

How many VMs are you running? If it's a single-node setup, you won't notice anything. If it's not, I'm quite curious and would like to take a closer look; could you share your Vagrantfile?

@mikedanese

From @geotro on October 13, 2016 0:6

@errordeveloper, after applying your changes I was able to get kube-dns and weave-net working!

However, kube-scheduler-kubernetes won't run:

# kubectl logs kube-scheduler-kubernetes --namespace=kube-system
unknown flag: --advertise-address
Usage of /usr/local/bin/kube-scheduler:
      --address string                           The IP address to serve on (set to 0.0.0.0 for all interfaces) (default "0.0.0.0")
      --algorithm-provider string                The scheduling algorithm provider to use, one of: DefaultProvider | ClusterAutoscalerProvider (default "DefaultProvider")
      --alsologtostderr value                    log to standard error as well as files
      --failure-domains string                   Indicate the "all topologies" set for an empty topologyKey when it's used for PreferredDuringScheduling pod anti-affinity. (default "kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region")
      --feature-gates value                      A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
...

@mikedanese

From @errordeveloper on October 13, 2016 0:42

@geotro sorry, that was a copy/paste bug. I've updated the comments now; it should have been kube-apiserver.json, not kube-scheduler.json.

@mikedanese

From @geotro on October 13, 2016 0:50

Yeah I figured that out shortly after my comment. I have everything working as expected now. Thank you to everyone who participated in fixing this! :)

@mikedanese

From @errordeveloper on October 13, 2016 0:51

@geotro you are welcome! We should eliminate the need for these workarounds soon, at least as far as we can get without total hacks.

@mikedanese

From @Bach1 on October 13, 2016 9:18

Thank you for the feedback. I'm getting closer to a working solution. With my basic Vagrant 2-node cluster (CentOS) I now have weave up and running. The kube-dns service is, however, still not working.
Steps taken:

  1. kubeadm init on master with api-advertise-addresses flag set.
  2. Update flags in pod manifest and daemonsets per guidelines above.
  3. Join node.
  4. Install weave on master.

kube-dns is still stuck in ContainerCreating state:

NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE
kube-system   etcd-master                      1/1       Running             0          25m
kube-system   kube-apiserver-master            1/1       Running             1          17m
kube-system   kube-controller-manager-master   1/1       Running             0          20m
kube-system   kube-discovery-982812725-zlcw3   1/1       Running             0          20m
kube-system   kube-dns-2247936740-thz2p        0/3       ContainerCreating   0          19m
kube-system   kube-proxy-amd64-79prj           1/1       Running             0          17m
kube-system   kube-proxy-amd64-qroee           1/1       Running             0          16m
kube-system   kube-scheduler-master            1/1       Running             0          25m
kube-system   weave-net-35udy                  2/2       Running             0          15m
kube-system   weave-net-hs2xa                  2/2       Running             0          15m

I experimented with both bridged network settings and private network settings but the problem still persists.

@mikedanese

From @errordeveloper on October 13, 2016 9:21

@Bach1 could you provide the output of kubectl -n kube-system describe pod -l name=kube-dns?

@mikedanese

From @miry on October 13, 2016 9:54

@errordeveloper

Kubernetes service range is 10.12.0.0/12, which expands to 10.0.0.1 - 10.15.255.254.

I thought it should be 100.64.0.0/12.

After updating the API server to use a specific IP address, weave could not connect to 100.64.0.1:443.

DigitalOcean also assigns an Anchor IP to eth0 with mask 10.10.0.0/16:

[master] # ip route | head -5
default via 159.203.176.1 dev eth0
10.10.0.0/16 dev eth0  proto kernel  scope link  src 10.10.0.12 
10.32.0.0/12 dev weave  proto kernel  scope link  src 10.32.0.1 
10.136.0.0/16 dev eth1  proto kernel  scope link  src 10.136.26.23 
159.203.176.0/20 dev eth0  proto kernel  scope link  src 159.203.188.178
[node] # ip route
default via 198.199.82.1 dev eth0 
10.10.0.0/16 dev eth0  proto kernel  scope link  src 10.10.0.17 
10.32.0.0/12 dev weave  proto kernel  scope link  src 10.36.0.0 
10.136.0.0/16 dev eth1  proto kernel  scope link  src 10.136.26.91 
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
198.199.82.0/24 dev eth0  proto kernel  scope link  src 198.199.82.168 

Updated:

So my theory is that 100.64.0.1 somehow points to 159.203.188.178, even when I set the API server address to 10.136.26.23.

@mikedanese

From @Bach1 on October 13, 2016 10:3

@errordeveloper sure, see below. I assume the last CNI-related errors were emitted before weave was installed. The local NAT-interface address is still visible in Node: master/10.0.2.15. Is this an issue?

Name:		kube-dns-2247936740-yi89m
Namespace:	kube-system
Node:		master/10.0.2.15
Start Time:	Thu, 13 Oct 2016 09:46:44 +0000
Labels:		component=kube-dns
		k8s-app=kube-dns
		kubernetes.io/cluster-service=true
		name=kube-dns
		pod-template-hash=2247936740
		tier=node
Status:		Pending
IP:		
Controllers:	ReplicaSet/kube-dns-2247936740
Containers:
  kube-dns:
    Container ID:	
    Image:		gcr.io/google_containers/kubedns-amd64:1.7
    Image ID:		
    Ports:		10053/UDP, 10053/TCP
    Args:
      --domain=cluster.local
      --dns-port=10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Liveness:		http-get http://:8080/healthz delay=60s timeout=5s period=10s #success=1 #failure=1
    Readiness:		http-get http://:8081/readiness delay=30s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  dnsmasq:
    Container ID:	
    Image:		gcr.io/google_containers/kube-dnsmasq-amd64:1.3
    Image ID:		
    Ports:		53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  healthz:
    Container ID:	
    Image:		gcr.io/google_containers/exechealthz-amd64:1.1
    Image ID:		
    Port:		8080/TCP
    Args:
      -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:53 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      -port=8080
      -quiet
    Limits:
      cpu:	10m
      memory:	50Mi
    Requests:
      cpu:		10m
      memory:		50Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-cx6kh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-cx6kh
QoS Class:	Guaranteed
Tolerations:	dedicated=master:NoSchedule
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  8m		8m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned kube-dns-2247936740-yi89m to master
  5m		5m		1	{kubelet master}			Warning		FailedMount	MountVolume.SetUp failed for volume "kubernetes.io/secret/f347bff6-9129-11e6-9232-525400c583ad-default-token-cx6kh" (spec.Name: "default-token-cx6kh") pod "f347bff6-9129-11e6-9232-525400c583ad" (UID: "f347bff6-9129-11e6-9232-525400c583ad") with: Get https://172.42.42.1:443/api/v1/namespaces/kube-system/secrets/default-token-cx6kh: dial tcp 172.42.42.1:443: getsockopt: connection refused
  8m		3m		130	{kubelet master}			Warning		FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-2247936740-yi89m_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-2247936740-yi89m_kube-system(f347bff6-9129-11e6-9232-525400c583ad)\" using network plugins \"cni\": cni config unintialized; Skipping pod"

@mikedanese

From @errordeveloper on October 13, 2016 11:2

@miry

I thought it should be 100.64.0.0/12.

You are right, it is that in the current release, but it changed in master.

After updating the API server to use a specific IP address, weave could not connect to 100.64.0.1:443.
DigitalOcean also assigns an Anchor IP to eth0 with mask 10.10.0.0/16...

Yes, there is an issue with that in some DO regions... you need to pass --service-cidr, but not with the build you are using.

If you are online now, it'd be easy if you could find me on Slack. Thanks.

@mikedanese

From @errordeveloper on October 13, 2016 11:5

@Bach1 just as a sanity check, could you kill the DNS pod, i.e. kubectl -n kube-system delete pod -l name=kube-dns --all? Maybe there is some bug on the CNI end and somehow the config is not getting picked up for this pod...

@mikedanese

From @Bach1 on October 13, 2016 13:22

@errordeveloper Hmm no luck. After deletion:

[root@master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE
kube-system   etcd-master                      1/1       Running             0          3h
kube-system   kube-apiserver-master            1/1       Running             1          3h
kube-system   kube-controller-manager-master   1/1       Running             0          3h
kube-system   kube-discovery-982812725-qpndv   1/1       Running             0          3h
kube-system   kube-dns-2247936740-iffl7        0/3       ContainerCreating   0          2h
kube-system   kube-dns-2247936740-yi89m        0/3       Terminating         0          3h
kube-system   kube-proxy-amd64-7jgeg           1/1       Running             0          3h
kube-system   kube-proxy-amd64-e73eu           1/1       Running             0          3h
kube-system   kube-scheduler-master            1/1       Running             0          3h
kube-system   weave-net-lnvyx                  2/2       Running             0          3h
kube-system   weave-net-oox1r                  2/2       Running             0          3h
[root@master ~]# kubectl -n kube-system describe pod -l name=kube-dns
Name:		kube-dns-2247936740-iffl7
Namespace:	kube-system
Node:		node-1/10.0.2.15
Start Time:	Thu, 13 Oct 2016 11:11:37 +0000
Labels:		component=kube-dns
		k8s-app=kube-dns
		kubernetes.io/cluster-service=true
		name=kube-dns
		pod-template-hash=2247936740
		tier=node
Status:		Pending
IP:		
Controllers:	ReplicaSet/kube-dns-2247936740
Containers:
  kube-dns:
    Container ID:	
    Image:		gcr.io/google_containers/kubedns-amd64:1.7
    Image ID:		
    Ports:		10053/UDP, 10053/TCP
    Args:
      --domain=cluster.local
      --dns-port=10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Liveness:		http-get http://:8080/healthz delay=60s timeout=5s period=10s #success=1 #failure=1
    Readiness:		http-get http://:8081/readiness delay=30s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  dnsmasq:
    Container ID:	
    Image:		gcr.io/google_containers/kube-dnsmasq-amd64:1.3
    Image ID:		
    Ports:		53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  healthz:
    Container ID:	
    Image:		gcr.io/google_containers/exechealthz-amd64:1.1
    Image ID:		
    Port:		8080/TCP
    Args:
      -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:53 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      -port=8080
      -quiet
    Limits:
      cpu:	10m
      memory:	50Mi
    Requests:
      cpu:		10m
      memory:		50Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-cx6kh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-cx6kh
QoS Class:	Guaranteed
Tolerations:	dedicated=master:NoSchedule
No events.

Name:				kube-dns-2247936740-yi89m
Namespace:			kube-system
Node:				master/10.0.2.15
Start Time:			Thu, 13 Oct 2016 09:46:44 +0000
Labels:				component=kube-dns
				k8s-app=kube-dns
				kubernetes.io/cluster-service=true
				name=kube-dns
				pod-template-hash=2247936740
				tier=node
Status:				Terminating (expires Thu, 13 Oct 2016 11:12:07 +0000)
Termination Grace Period:	30s
IP:				
Controllers:			ReplicaSet/kube-dns-2247936740
Containers:
  kube-dns:
    Container ID:	
    Image:		gcr.io/google_containers/kubedns-amd64:1.7
    Image ID:		
    Ports:		10053/UDP, 10053/TCP
    Args:
      --domain=cluster.local
      --dns-port=10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Liveness:		http-get http://:8080/healthz delay=60s timeout=5s period=10s #success=1 #failure=1
    Readiness:		http-get http://:8081/readiness delay=30s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  dnsmasq:
    Container ID:	
    Image:		gcr.io/google_containers/kube-dnsmasq-amd64:1.3
    Image ID:		
    Ports:		53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
    Limits:
      cpu:	100m
      memory:	170Mi
    Requests:
      cpu:		100m
      memory:		170Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
  healthz:
    Container ID:	
    Image:		gcr.io/google_containers/exechealthz-amd64:1.1
    Image ID:		
    Port:		8080/TCP
    Args:
      -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:53 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      -port=8080
      -quiet
    Limits:
      cpu:	10m
      memory:	50Mi
    Requests:
      cpu:		10m
      memory:		50Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cx6kh (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-cx6kh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-cx6kh
QoS Class:	Guaranteed
Tolerations:	dedicated=master:NoSchedule
No events.

@mikedanese

From @miry on October 14, 2016 8:58

I resolved my issue by adding a routing rule so that the node machines use eth1 to reach the Kubernetes service IP range. Example:

echo "100.64.0.0/12 dev eth1" >> /etc/sysconfig/network-scripts/route-eth1
ip route add 100.64.0.0/12 dev eth1
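
To verify the rule took effect, ip route get shows which interface the kernel will now use for the service range (same 100.64.0.0/12 range as above):

ip route get 100.64.0.1      # should report "dev eth1" after the rule is added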

@mikedanese

From @petergardfjall on October 18, 2016 6:24

Despite following the workaround steps ((1) specify the --api-advertise-addresses flag when running kubeadm, (2) specify --advertise-address in the kube-apiserver manifest, (3) append --proxy-mode=userspace to the kube-proxy daemonset), I'm seeing the exact same issue as @Bach1.

That is, after installing the weave pod network, kube-dns is still stuck in ContainerCreating:

# kubectl get pods -n=kube-system
NAME                             READY     STATUS              RESTARTS   AGE
etcd-master                      1/1       Running             0          26m
kube-apiserver-master            1/1       Running             3          20m
kube-controller-manager-master   1/1       Running             0          26m
kube-discovery-982812725-8srv5   1/1       Running             0          27m
kube-dns-2247936740-0q6s1        0/3       ContainerCreating   0          23m
kube-proxy-amd64-8yk5x           1/1       Running             0          16m
kube-proxy-amd64-a6ghz           1/1       Running             0          16m
kube-proxy-amd64-osiin           1/1       Running             0          19m
kube-scheduler-master            1/1       Running             0          26m
weave-net-hjrin                  2/2       Running             0          15m
weave-net-lxrlf                  2/2       Running             0          15m
weave-net-posk8                  2/2       Running             0          15m
# kubectl logs -n=kube-system kube-dns-2247936740-0q6s1 kube-dns
...
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  24m		24m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned kube-dns-2247936740-0q6s1 to master
  24m		15m		158	{kubelet master}			Warning		FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-2247936740-0q6s1_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-2247936740-0q6s1_kube-system(e300fe3a-94f7-11e6-ae73-021f4b4d9b0e)\" using network plugins \"cni\": cni config unintialized; Skipping pod"

@Bach1: did you ever resolve your problem?
Does anyone else have any ideas?
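
For what it's worth, the "cni config unintialized" event means the kubelet has not found a usable CNI configuration yet. A few things worth checking on the affected node (a sketch, assuming the default paths used by the weave-kube manifest of this era):

ls /etc/cni/net.d/                                # weave should have written a config file here (e.g. 10-weave.conf)
ls /opt/cni/bin/                                  # the CNI plugin binaries (including the ones weave installs) should be present
journalctl -u kubelet | grep -i cni | tail -20    # the kubelet re-reads the CNI config periodically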

@mikedanese

From @miry on October 18, 2016 13:50

@petergardfjall but weave-net came up 15m ago and the last DNS event was 25m ago. Try restarting the DNS pod.

@mikedanese

From @errordeveloper on October 18, 2016 14:11

It would be easier if folks could hop on the #kubeadm channel in Slack and we can debug step-by-step. It has become really hard to pick apart individual setup differences in this thread.


@mikedanese

From @petergardfjall on October 19, 2016 5:43

@miry I did restart the pod, with no luck (by the way, when you say restart a pod, I assume you mean deleting the pod and having the replica set replace it, right?).

It is stuck in ContainerCreating, although it appears to have got slightly farther, being able to at least create containers. However:

# kubectl logs -n kube-system kube-dns-2247936740-f8nxa kube-dns
Error from server: Get https://kube-slave2:10250/containerLogs/kube-system/kube-dns-2247936740-f8nxa/kube-dns: dial tcp: lookup kube-slave2 on 10.0.2.3:53: no such host

But I guess I'll follow @errordeveloper's advice and take it to the Slack channel.

@luxas

luxas commented Nov 23, 2016

@errordeveloper @lukemarsden What's the status of this issue?
I guess the core issue is fixed with the latest weave version...

@pires

pires commented Jan 7, 2017

Closing this one. Re-open if the problem persists.

@pires pires closed this as completed Jan 7, 2017
@Chabane

Chabane commented Jan 13, 2017

I have this error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
when I run this task via Ansible:

- name: Set --proxy-mode flag in kube-proxy daemonset 
  shell: "kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ [\"--proxy-mode=userspace\"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'"

Any idea?

@luxas

luxas commented Jan 13, 2017

Is that run from a node? Then you need to use the /etc/kubernetes/kubelet.conf kubeconfig file.
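
In practice that means pointing kubectl at an explicit kubeconfig instead of the default localhost:8080, for example (a sketch):

kubectl --kubeconfig=/etc/kubernetes/kubelet.conf -n kube-system get ds
# or export it for the whole shell / Ansible task:
export KUBECONFIG=/etc/kubernetes/kubelet.conf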

@Chabane

Chabane commented Jan 14, 2017

I don't understand what you mean by 'use the /etc/kubernetes/kubelet.conf'

This is my playbook for the master node:


- name: Ensure kubeadm initialization
  command: "kubeadm init --token {{kubeadmn_token}} --api-advertise-addresses={{kube_master_ip}}"

- name: Set --advertise-address flag in kube-apiserver static pod manifest 
  shell: "jq '.spec.containers[0].command |= .+ [\"--advertise-address={{kube_master_ip}}\"]' /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json && mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json"
 
- name: Set --proxy-mode flag in kube-proxy daemonset 
  shell: "kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ [\"--proxy-mode=userspace\"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'"

- name: Start weave
  command : "start-weave"

I used Vagrant to create 1 master (kubeadm init) & 1 worker (kubeadm join).

@johnharris85

Was this fixed? I still get the CrashLoopBackOff on non-master nodes until I manually add the route (clean / up-to-date install of kubeadm etc.).

@luxas

luxas commented May 29, 2017

@johnharris85 Please open a new issue in that case with more details...
