
weave-net CrashLoopBackOff for the second node #34101

Closed
avkonst opened this Issue Oct 5, 2016 · 56 comments

Comments

@avkonst

avkonst commented Oct 5, 2016

Is this a request for help?

I think it is an issue either with the software or the documentation, but I am not quite sure. I started with a question on Stack Overflow: http://stackoverflow.com/questions/39872332/how-to-fix-weave-net-crashloopbackoff-for-the-second-node

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

I think it is a bug or a request to improve the documentation.

Kubernetes version (use kubectl version):
1.4.0

Environment:

  • Cloud provider or hardware configuration: Vagrant
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a):
  • Install tools: kubeadm init/join
  • Others:

What happened:

I have got 2 VM nodes. They can see each other either by hostname (through /etc/hosts) or by IP address. One has been provisioned with kubeadm as a master, the other as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/) I added weave-net. The list of pods looks like the following:

vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS             RESTARTS   AGE
kube-system   etcd-vm-master                          1/1       Running            0          3m
kube-system   kube-apiserver-vm-master                1/1       Running            0          5m
kube-system   kube-controller-manager-vm-master       1/1       Running            0          4m
kube-system   kube-discovery-982812725-x2j8y          1/1       Running            0          4m
kube-system   kube-dns-2247936740-5pu0l               3/3       Running            0          4m
kube-system   kube-proxy-amd64-ail86                  1/1       Running            0          4m
kube-system   kube-proxy-amd64-oxxnc                  1/1       Running            0          2m
kube-system   kube-scheduler-vm-master                1/1       Running            0          4m
kube-system   kubernetes-dashboard-1655269645-0swts   1/1       Running            0          4m
kube-system   weave-net-7euqt                         2/2       Running            0          4m
kube-system   weave-net-baao6                         1/2       CrashLoopBackOff   2          2m

CrashLoopBackOff appears for each worker node connected. I have spent several hours playing with network interfaces, but it seems the network is fine. I found a similar question on Stack Overflow, where the answer advised looking into the logs, with no follow-up. So, here are the logs:

vagrant@vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers
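
A hedged diagnostic, not part of the original report: from the affected worker node you can check whether the kubernetes service IP shown in the log above is being translated at all.

# Run on the worker node; 100.64.0.1 is the service IP taken from the log above.
curl -k https://100.64.0.1:443/version        # "connection refused" again confirms the service IP is not reachable from this node
sudo iptables-save -t nat | grep 100.64.0.1   # shows whether kube-proxy installed any NAT rules for the service IP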

What you expected to happen:

I would expect the weave-net pod to be in the Running state.

How to reproduce it (as minimally and precisely as possible):

I have not done anything special, just followed the Getting Started documentation. If it is essential, I can share the Vagrant project which I used to provision everything. Please let me know if you need it.

@avkonst

Author

avkonst commented Oct 5, 2016

Please let me know if this needs to be reported on the weave-net project instead.


@avkonst

Author

avkonst commented Oct 5, 2016

I wonder if there is any working step-by-step instruction on how to get a Kubernetes cluster up and running on a set of virtual machines. I am happy to downgrade the Kubernetes version if that is an option. I am totally confused by all of these options: kube-deploy, kube-up, and others I found from third parties.

@agardiman


agardiman commented Oct 6, 2016

I have the same issue. I'm using VirtualBox to run 2 VMs based on a minimal CentOS 7 image. Both VMs are attached to 2 interfaces, a NAT and a host-only network.
The two VMs are able to connect to each other using the host-only network interfaces.

I also tried the instructions for Calico and Canal, and I cannot make them work either.

@avkonst

Author

avkonst commented Oct 6, 2016

It seems Calico has a similar issue:

vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS             RESTARTS   AGE
kube-system   calico-etcd-iemkd                       1/1       Running            0          57m
kube-system   calico-node-178gc                       1/2       CrashLoopBackOff   8          56m
kube-system   calico-node-zsym2                       2/2       Running            0          57m
kube-system   calico-policy-controller-6gh7b          1/1       Running            0          57m
kube-system   etcd-vm-master                          1/1       Running            0          56m
kube-system   kube-apiserver-vm-master                1/1       Running            0          58m
kube-system   kube-controller-manager-vm-master       1/1       Running            0          57m
kube-system   kube-discovery-982812725-7jsmb          1/1       Running            0          57m
kube-system   kube-dns-2247936740-xiee1               3/3       Running            0          57m
kube-system   kube-proxy-amd64-iywb7                  1/1       Running            0          57m
kube-system   kube-proxy-amd64-ok9bx                  1/1       Running            0          56m
kube-system   kube-scheduler-vm-master                1/1       Running            0          56m
kube-system   kubernetes-dashboard-1655269645-g4cyd   1/1       Running            0          57m
  info: 1 completed object(s) was(were) not shown in pods list. Pass --show-all to see all objects.

vagrant@vm-master:~$ kubectl logs calico-node-178gc -c calico-node --namespace=kube-system
Waiting for etcd connection...
No IP provided. Using detected IP: 10.0.10.11
Traceback (most recent call last):
  File "startup.py", line 336, in <module>
    main()
  File "startup.py", line 287, in main
    warn_if_hostname_conflict(ip)
  File "startup.py", line 210, in warn_if_hostname_conflict
    current_ipv4, _ = client.get_host_bgp_ips(hostname)
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 134, in wrapped
    "running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to MaxRetryError("HTTPConnectionPool(host='100.78.232.136', port=6666): Max retries exceeded with url: /v2/keys/calico/bgp/v1/host/vm-worker/ip_addr_v4 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f50416bfc50>, 'Connection to 100.78.232.136 timed out. (connect timeout=60)'))",)).  Is etcd running?
Calico node failed to start

What could I try to progress this issue further?

@tedstirm


tedstirm commented Oct 12, 2016

Okay I ran into this exact same issue and here is how I fixed it.

This problem seems to be due to kube-proxy looking at the wrong network interface. If you look at the kube-proxy logs on a worker node you will most likely see something like:

-A KUBE-SEP-4C6YEJQ2VXV53FEZ -m comment --comment default/kubernetes:https -s 10.0.2.15/32 -j KUBE-MARK-MASQ

This is the wrong network interface. The kube-proxy should be looking at the master node's IP address, not the NAT IP address.
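
A hedged way to inspect the same rules directly on the node, independent of kube-proxy's log verbosity (not part of the original comment):

# On the worker node, dump the NAT rules kube-proxy programmed for the default/kubernetes service.
sudo iptables-save -t nat | grep 'default/kubernetes'
# The -s and --to-destination addresses should be the master's reachable IP,
# not the VirtualBox NAT address 10.0.2.15 shown above.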

As far as I know, kube-proxy gets this value from the Kube API server when starting up. The Kube API server's documentation states that if the --advertise-address flag isn't set, it defaults to --bind-address, and if --bind-address isn't set, it defaults to the host's default interface. In my case and yours that seems to be the NAT interface, which isn't what we want. So what I did was set the Kube API server's --advertise-address flag and everything started working. So right after Step 2 and before Step 3 of

Installing Kubernetes on Linux with kubeadm

You will need to update your /etc/kubernetes/manifests/kube-apiserver.json and add the --advertise-address flag to point to your master node's IP address.

So for example: My master node's IP address is 172.28.128.2, which means right after Step 2 I do:

cat <<EOF > /etc/kubernetes/manifests/kube-apiserver.json
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-apiserver",
    "namespace": "kube-system",
    "creationTimestamp": null,
    "labels": {
      "component": "kube-apiserver",
      "tier": "control-plane"
    }
  },
  "spec": {
    "volumes": [
      {
        "name": "certs",
        "hostPath": {
          "path": "/etc/ssl/certs"
        }
      },
      {
        "name": "pki",
        "hostPath": {
          "path": "/etc/kubernetes"
        }
      }
    ],
    "containers": [
      {
        "name": "kube-apiserver",
        "image": "gcr.io/google_containers/kube-apiserver-amd64:v1.4.0",
        "command": [
          "/usr/local/bin/kube-apiserver",
          "--v=4",
          "--insecure-bind-address=127.0.0.1",
          "--etcd-servers=http://127.0.0.1:2379",
          "--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota",
          "--service-cluster-ip-range=100.64.0.0/12",
          "--service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--client-ca-file=/etc/kubernetes/pki/ca.pem",
          "--tls-cert-file=/etc/kubernetes/pki/apiserver.pem",
          "--tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem",
          "--token-auth-file=/etc/kubernetes/pki/tokens.csv",
          "--secure-port=443",
          "--allow-privileged",
          "--advertise-address=172.28.128.2",
          "--etcd-servers=http://127.0.0.1:2379"
        ],
        "resources": {
          "requests": {
            "cpu": "250m"
          }
        },
        "volumeMounts": [
          {
            "name": "certs",
            "mountPath": "/etc/ssl/certs"
          },
          {
            "name": "pki",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/"
          }
        ],
        "livenessProbe": {
          "httpGet": {
            "path": "/healthz",
            "port": 8080,
            "host": "127.0.0.1"
          },
          "initialDelaySeconds": 15,
          "timeoutSeconds": 15
        }
      }
    ],
    "hostNetwork": true
  },
  "status": {}
}
EOF
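
A hedged way to confirm the change took effect (not part of the original comment): once the kubelet restarts the static pod from the updated manifest, the kubernetes Service endpoint should show the advertised address.

kubectl get endpoints kubernetes
# Expected shape of the output (values will differ):
# NAME         ENDPOINTS          AGE
# kubernetes   172.28.128.2:443   5m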

I am not too sure if this is a valid long-term solution, because if the default kube-apiserver.json changes, those changes wouldn't be reflected by doing what I am doing. Ideally, I think the user would want some way to set these flags via kubeadm, or maybe the user should be responsible for editing the JSON themselves. Thoughts?

However, it may still be a good idea to update Step 2 of Installing Kubernetes on Linux with kubeadm to at least mention to users that they can update the kube component flags by modifying the JSON found at /etc/kubernetes/manifests/.

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

I am looking at this right now. We have a way of reproducing it with https://github.com/davidkbainbridge/k8s-playground, where we have already applied a fix that makes the kubelet pick up the desired IP for pods in the host network namespace; however, somehow the kubernetes service IP still points to 10.0.2.15.

@chenzhiwei

Contributor

chenzhiwei commented Oct 12, 2016

I encountered this issue too.

Adding --advertise-address when starting kube-apiserver solved this issue.

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

@chenzhiwei I've already started diving into the code, and you saved me the time of figuring out exactly how this part works, thank you so much! This looks like an easy fix for kubeadm; a PR is coming shortly.

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

OK, so it turns out that this flag is not enough; we still have an issue reaching the kubernetes service IP. The simplest solution to this is to run kube-proxy with --proxy-mode=userspace. To enable this, you can use kubectl -n kube-system edit ds kube-proxy-amd64 && kubectl -n kube-system delete pods -l name=kube-proxy-amd64.

@avkonst

Author

avkonst commented Oct 12, 2016

I am starting kubeadm with the --api-advertise-addresses option:
kubeadm init --api-advertise-addresses=$master_address
where my $master_address is not the NAT interface. It is still not enough to resolve this issue.

@avkonst

Author

avkonst commented Oct 12, 2016

"Ok, so it turns out that this flag is not enough, we still have an issue reaching kubernetes service IP. The simplest solution to this is to run kube-proxy with --proxy-mode=userspace. To enable this, you can use kubectl -n kube-system edit ds kube-proxy-amd64 && kubectl -n kube-system delete pods -l name=kube-proxy-amd64."

@errordeveloper Could you please explain which flag you are setting, and for which command in the getting started tutorial?
What does your last command do?

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

@avkonst see #34607.

Also, you can do this for now:

jq \
   '.spec.containers[0].command |= .+ ["--advertise-address=172.42.42.1"]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json
@errordeveloper

Member

errordeveloper commented Oct 12, 2016

Could you please explain which flag you are setting, and for which command in the getting started tutorial?

We are going to fix this shortly. If you can hop on Slack, I'd be happy to help.

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

What does your last command do?

As this thread is getting quite noisy, here is a recap.

First, find out what IP address you want to use on the master; it's probably the one on the second network interface. For this example I'll use IP="172.42.42.1".

Next, run kubeadm init --api-advertise-addresses=$IP.

Now, you want to append --advertise-address to the kube-apiserver command in the static pod manifest; you can do it like this:

jq --arg ip "$IP" \
   '.spec.containers[0].command |= . + ["--advertise-address=" + $ip]' \
   /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json

And finally, you need to update the flags in the kube-proxy daemonset and append --proxy-mode=userspace, which can be done like this:

kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace"]' \
  |   kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy-amd64'
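
A hedged way to verify that the proxy mode actually changed after the pods are recreated (not from the original comment; pod names and label values vary between kubeadm versions):

# List the recreated kube-proxy pods, then check which proxier one of them chose at startup.
kubectl -n kube-system get pods -o name | grep kube-proxy
kubectl -n kube-system logs <one-of-the-kube-proxy-pods> | grep -i proxier
# Expect a line mentioning the userspace proxier rather than the iptables one.
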
@errordeveloper

Member

errordeveloper commented Oct 12, 2016

I'm working on docs for this. Also, @davidkbainbridge has provided a Vagrant+Ansible implementation, and the latest fixes are in my fork.

k8s-merge-robot added a commit that referenced this issue Oct 12, 2016

Merge pull request #34607 from errordeveloper/apiserver-adv-addr
Automatic merge from submit-queue

Append first address from `--api-advertise-addresses` to `kube-apiserver` flags

**What this PR does / why we need it**:

We have `--api-advertise-addresses` already, but it's only used for SANs in server certificates we generate. Currently setting this flag doesn't affect what address API server advertises, and this PR fixes that. In particular, this has been an issue for VirtualBox users (see #34101).

**Which issue this PR fixes**: fixes #34101

**Release note**:

```release-note
Make `kubeadm` append first address from `--api-advertise-addresses` to `kube-apiserver` flags as `--advertise-address`
```

@luxas luxas reopened this Oct 12, 2016

@miry

Contributor

miry commented Oct 12, 2016

@tedstirm @errordeveloper can you help me with a similar problem? I have multiple eth* interfaces, and I am using 10.136.0.0/16 for the private network, on eth1. For now I have fixed the weave issue by opening port 443 on eth0 and eth1. But I found the next issue:

I could not access the instances from the cluster (for example Redis, which is hosted on a separate instance). I found that weave uses the 10.0.0.0/8 network, and I am looking for a way to change this. Do you know how I can change it, for example to 101.2.0.0/16?

@tedstirm


tedstirm commented Oct 12, 2016

@errordeveloper I didn't have to update the --proxy-mode for my kube-proxy. All I had to do was make sure the --advertise-address flag for the Kube API server was set to a value listed in the --api-advertise-addresses flag. I'm just curious as to why you needed to set --proxy-mode in your setup and I didn't. Any ideas? I looked over the fork and your Vagrant setup looks very similar to mine, except you are using Ubuntu vs my Centos.

@errordeveloper

Member

errordeveloper commented Oct 12, 2016

@miry

I found that weave uses 10.0.0.0/8 network.

No, it uses 10.32.0.0/12.

So for Weave we have 10.32.0.0/12, which expands to 10.32.0.1 - 10.47.255.254.
Kubernetes service range is 10.12.0.0/12, which expands to 10.0.0.1 - 10.15.255.254.
And you have 10.136.0.0/16, which expands to 10.136.0.1 - 10.136.255.254.
I don't see an overlap here, which is what you seem to be suggesting, but please correct me if I misunderstood. I think your problem is different, so please file another issue. Your previous issue #34031 didn't describe the problem; please try to describe the problem.

@tedstirm

...your Vagrant setup looks very similar to mine, except you are using Ubuntu vs my Centos.

How many VMs are you running? If it's a single-node setup, you won't notice anything. If it's not, I'm quite curious and would like to take a closer look; could you share your Vagrantfile?

@enieuw


enieuw commented Oct 20, 2016

I run a similar setup on Vagrant based on CoreOS boxes (two NICs, eth0 NAT and eth1 private). I had to make two changes to get the networking to work (see the sketch after this list):

  • Set --node-ip on each kubelet process to the private network IP of that node
  • Enable masquerading on the proxy so it responds over the correct interface (--masquerade-all)
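
A hedged sketch of both changes; the file path, the KUBELET_EXTRA_ARGS convention and the example IP are assumptions, so adjust them to your own layout:

# 1. Kubelet: pin --node-ip to the private (eth1) address via a systemd drop-in.
#    This assumes the kubelet unit expands $KUBELET_EXTRA_ARGS, as kubeadm's drop-in does.
cat <<EOF > /etc/systemd/system/kubelet.service.d/20-node-ip.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=172.28.128.11"
EOF
systemctl daemon-reload && systemctl restart kubelet

# 2. kube-proxy: append --masquerade-all to the DaemonSet's container command, for example
#    with `kubectl -n kube-system edit ds <kube-proxy daemonset>`, then delete the existing
#    kube-proxy pods so they are recreated with the new flag.
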
@errordeveloper

Member

errordeveloper commented Oct 26, 2016

Please check out https://github.com/errordeveloper/kubernetes-ansible-vagrant; there is code to handle firewalld rules and SELinux stuff (if you are on CentOS), as well as VirtualBox issues (applies to Ubuntu as well as CentOS). We are planning to provide plain-text documentation soon too. Find me on Slack if you still have problems. It's always a good idea to create a new issue rather than comment on an existing one that seems like it describes your problem.

@raghu67


raghu67 commented Nov 4, 2016

I am seeing a similar issue. Here are the details:
[root@m0062421 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:16:57Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:10:32Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
[root@m0062421 ~]# kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-alpha.0.1534+cf7301f16c0363-dirty", GitCommit:"cf7301f16c036363c4fdcb5d4d0c867720214598", GitTreeState:"dirty", BuildDate:"2016-09-27T18:10:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
[root@m0062421 ~]#

CentOS 7.2. Kernel Version:
[root@m0062421 ~]# uname -a
Linux m0062421.lab.ppops.net 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

These are based on an old version of OpenStack and KVM. If that is relevant, I can find the details.

@avkonst

Author

avkonst commented Nov 6, 2016

@errordeveloper, letting you know that the workaround command is now being rejected:

ubuntu@k8s1:~$ kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=10.32.0.0/12"]' \
  | kubectl apply --validate=false -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy-amd64'
Object 'Kind' is missing in '{
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "command": [
                  "--cluster-cidr=10.32.0.0/12"
                ]
              }
            ]
          }
        }
      }
    }'

What could I do to fix it?

@jamstar


jamstar commented Nov 8, 2016

I am having the same issue as @avkonst. If you leave validation on, you get this error: error validating "STDIN": error validating data: items[0].apiVersion not set;
I included the input from kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace"]' and both kind and apiVersion are clearly there:
{ "kind": "List", "apiVersion": "v1", "metadata": {}, "items": [ { "spec": { "template": { "spec": { "containers": [ { "command": [ "--cluster-cidr=10.32.0.0/12" ] } ] } } } } ] }

It seems kubectl apply -f - does not pick up the outermost portion of the input, which is where kind and apiVersion both are.
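
A hedged alternative that avoids piping a List back through kubectl apply (the DaemonSet name and label are assumptions, so check them first with kubectl -n kube-system get ds --show-labels; --type=json also needs a reasonably recent kubectl):

# Append the flag with a JSON patch instead of re-applying the whole object.
kubectl -n kube-system patch ds kube-proxy --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command/-", "value": "--proxy-mode=userspace"}]'
# Then delete the existing kube-proxy pods so the DaemonSet recreates them with the new flag.
kubectl -n kube-system delete pods -l component=kube-proxy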

@dos1701


dos1701 commented Nov 10, 2016

Hi, I can confirm the problem still exists on a two-node Kubernetes cluster installed with kubeadm.
The nodes are Ubuntu Xenial 16.04, connected to each other using a tinc VPN.
Everything works fine on the tainted master node, but on the second (worker-only) node every CNI network plugin container keeps crashing (I tried Flannel and Weave).
It seems like the network-fabric container is unable to reach the API server on 16.96.0.1.
I tried the proposed workaround but couldn't get it through because of the issue posted by @jamstar, and apparently the first two steps are insufficient.

@miry

Contributor

miry commented Nov 11, 2016

@dos1701 you need to add a route to 16.96.0.1 via your VPN interface.

#34101 (comment)

ramukima added a commit to ramukima/k8s-playground that referenced this issue Nov 13, 2016

daemonset component label for kube-proxy is actually kube-proxy and not kube-proxy-amd64. The last patch for issue kubernetes/kubernetes#34101 is broken. This fix re-establishes that patch.
@tsvenkat


tsvenkat commented Nov 21, 2016

@tedstirm
In my case, my apiserver's advertise-address was correct, but I still had the crashloop issue with the weave pods. Adding "--proxy-mode=userspace" using @errordeveloper's neat jq trick solved the issue for me.

@errordeveloper
Though the issue is solved, I don't have a good understanding of why that solved it. I looked at the kube-proxy documentation (http://kubernetes.io/docs/admin/kube-proxy) and it says:

Which proxy mode to use: 'userspace' (older) or 'iptables' (faster). If blank, look at the Node object on the Kubernetes API and respect the 'net.experimental.kubernetes.io/proxy-mode' annotation if provided. Otherwise use the best-available proxy (currently iptables). If the iptables proxy is selected, regardless of how, but the system's kernel or iptables versions are insufficient, this always falls back to the userspace proxy.

Should we expect any side effects as we are setting it to the "older" option? Thanks.

@clifinger


clifinger commented Nov 22, 2016

Hi everyone,
I just finished a Vagrant box with CentOS 7.2 and kubeadm, with weave-net and the dashboard:
vagrant-centos-kubernetes
I have included fixes for all the issues solved in this thread. Thank you to @petergardfjall and @errordeveloper!

@avkonst maybe replace all occurrences of kube-proxy-amd64 with kube-proxy... just check it with a get pods across all namespaces (see the check sketched below).
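
A hedged way to check which name and labels the proxy DaemonSet actually uses on a given cluster, since they have differed between kubeadm versions:

kubectl -n kube-system get ds --show-labels
kubectl -n kube-system get pods --show-labels | grep kube-proxy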

@mikedanese mikedanese closed this Nov 23, 2016

willauld added a commit to willauld/HA-kube-vagrant that referenced this issue Feb 3, 2017

Fixing a network problem that prevents skyDNS pod from starting
I ran across a problem similar to the one I was having in getting
skyDNS to run properly. It was at:
kubernetes/kubernetes#34101
The problem was that only one of the three containers in the pod would
start up correctly. After reading this thread, there were two things
that I hadn't done in the solution. The first was to start kube-proxy
with --proxy-mode=userspace rather than iptables. I made this change
and retried starting skyDNS. It comes fully up now. I still need to test
that it is working correctly.
@thomasmodeneis


thomasmodeneis commented May 26, 2017

I'm also experiencing the same issue:

kubectl get pods --namespace=kube-system
NAME                                   READY     STATUS             RESTARTS   AGE
etcd-kubernetes-1                      1/1       Running            0          2h
kube-apiserver-kubernetes-1            1/1       Running            0          2h
kube-controller-manager-kubernetes-1   1/1       Running            0          2h
kube-dns-3913472980-79r8q              0/3       Pending            0          2h
kube-proxy-11bfl                       1/1       Running            0          2h
kube-proxy-8qn3z                       1/1       Running            0          1h
kube-proxy-f7ptd                       1/1       Running            0          1h
kube-scheduler-kubernetes-1            1/1       Running            0          2h
tiller-deploy-1651596238-87wvd         0/1       Pending            0          1h
weave-cortex-agent-2343136017-gnt6z    0/1       Pending            0          45m
weave-cortex-node-exporter-9qh20       1/1       Running            0          45m
weave-cortex-node-exporter-g6hlj       1/1       Running            0          45m
weave-cortex-node-exporter-q2zpm       1/1       Running            0          45m
weave-flux-agent-478881469-4dctc       0/1       Pending            0          45m
weave-net-0rlkr                        1/2       CrashLoopBackOff   14         50m
weave-net-35tkz                        1/2       CrashLoopBackOff   14         50m
weave-net-ph8wj                        1/2       CrashLoopBackOff   14         50m
weave-scope-agent-750g6                1/1       Running            0          45m
weave-scope-agent-f6cd5                1/1       Running            0          45m
weave-scope-agent-lwwfl                1/1       Running            0          45m



$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}


$ kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

$ uname -a
Linux kubernetes-1 4.4.0-77-generic #98-Ubuntu SMP Wed Apr 26 08:34:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Any ideas on how to deal with the issue?

@marccarre


marccarre commented Jun 6, 2017

@thomasmodeneis, what do the weave container's logs say?

@luxas

Member

luxas commented Jun 6, 2017

@thomasmodeneis Please open a new issue in the kubeadm repo to track that

@thomasmodeneis


thomasmodeneis commented Jun 7, 2017

@marccarre would you mind sending me the command for getting these logs out, into a gist for example? That would help.
@luxas I will raise a new ticket ASAP.

@marccarre


marccarre commented Jun 12, 2017

@thomasmodeneis either:

  • $ docker logs <weave-kube's container ID> or
  • $ kubectl logs weave-net-0rlkr weave -n kube-system

should do
