"Failed to setup network for pod \ using network plugins \"cni\": no IP addresses available in network: podnet; Skipping pod" #39557

Closed
Reifier opened this Issue Jan 6, 2017 · 12 comments

@Reifier

Reifier commented Jan 6, 2017

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

I tried asking for help on Slack and Stack Overflow, with no results.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

Here is the issue that I think most closely resembles mine, but it is for a different Kubernetes version:
#25281


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.3", GitCommit:"4957b090e9a4f6a68b4a40375408fdc74a212260", GitTreeState:"clean", BuildDate:"2016-10-16T06:36:33Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7+coreos.0", GitCommit:"0581d1a5c618b404bd4766544bec479aedef763e", GitTreeState:"clean", BuildDate:"2016-12-12T19:04:11Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS, m4.large instance
  • OS (e.g. from /etc/os-release):
NAME=CoreOS
ID=coreos
VERSION=1185.5.0
VERSION_ID=1185.5.0
BUILD_ID=2016-12-07-0937
PRETTY_NAME="CoreOS 1185.5.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
  • Kernel (e.g. uname -a):
    Linux ip-10-254-195-139.us-west-1.aws.wflops.net 4.7.3-coreos-r3 #1 SMP Wed Dec 7 09:29:55 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz GenuineIntel GNU/Linux
  • Install tools:
    kubelet
  • Others:

What happened:
I have the Jenkins Kubernetes plugin set up; it schedules containers on the master node just fine, but on the minions there is a problem. My setup follows the standard CoreOS guide:
https://coreos.com/kubernetes/docs/latest/getting-started.html

kubectl describe pod jenkinsminions-162720e18dbccc
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  1m		1m		1	{default-scheduler }				Normal		Scheduled	Successfully assigned jenkinsminions-162720e18dbccc to 10.254.194.102
  1m		1s		43	{kubelet 10.254.194.102}			Warning		FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "jenkinsminions-162720e18dbccc_default" with SetupNetworkError: "Failed to setup network for pod \"jenkinsminions-162720e18dbccc_default(09ae5aea-d46a-11e6-ac8a-026030ef5380)\" using network plugins \"cni\": no IP addresses available in network: podnet; Skipping pod"

What you expected to happen:
Expect the pod to schedule normally without an error.

How to reproduce it (as minimally and precisely as possible):
Follow the standard CoreOS guide and set up just two nodes with Kubernetes v1.4.7+coreos.0. Set up Jenkins with the Kubernetes plugin and try to schedule a container (not on the k8s master).

Anything else do we need to know:
Logs on the kubelet slave server also show the interface error consistently:

Jan 06 23:23:28 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: E0106 23:23:28.272729   25651 nestedpendingoperations.go:253] Operation for "\"kubernetes.io/secret/ca7fc3a9-d209-11e6-82eb-026030ef5380-default-token-x2819\" (\"ca7fc3a9-d209-11e6-82eb-026030ef5380\")" failed. No retries permitted until 2017-01-06 23:25:28.272709989 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/ca7fc3a9-d209-11e6-82eb-026030ef5380-default-token-x2819" (volume.spec.Name: "default-token-x2819") pod "ca7fc3a9-d209-11e6-82eb-026030ef5380" (UID: "ca7fc3a9-d209-11e6-82eb-026030ef5380") with: rename /var/lib/kubelet/pods/ca7fc3a9-d209-11e6-82eb-026030ef5380/volumes/kubernetes.io~secret/default-token-x2819 /var/lib/kubelet/pods/ca7fc3a9-d209-11e6-82eb-026030ef5380/volumes/kubernetes.io~secret/wrapped_default-token-x2819.deleting~019409588: device or resource busy
Jan 06 23:23:28 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: E0106 23:23:28.920477   25651 docker_manager.go:746] Logging security options: {key:seccomp value:unconfined msg:}
Jan 06 23:23:30 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: E0106 23:23:30.190398   25651 docker_manager.go:357] NetworkPlugin cni failed on the status hook for pod 'jenkinsminions-1625fdd8c07460' - Unexpected command output Device "eth0" does not exist.
Jan 06 23:23:30 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]:  with error: exit status 1
Jan 06 23:23:30 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: I0106 23:23:30.338410   25651 reconciler.go:299] MountVolume operation started for volume "kubernetes.io/secret/5d3004b4-d224-11e6-b91e-026030ef5380-default-token-csbm2" (spec.Name: "default-token-csbm2") to pod "5d3004b4-d224-11e6-b91e-026030ef5380" (UID: "5d3004b4-d224-11e6-b91e-026030ef5380"). Volume is already mounted to pod, but remount was requested.
Jan 06 23:23:30 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: I0106 23:23:30.365306   25651 operation_executor.go:802] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/5d3004b4-d224-11e6-b91e-026030ef5380-default-token-csbm2" (spec.Name: "default-token-csbm2") pod "5d3004b4-d224-11e6-b91e-026030ef5380" (UID: "5d3004b4-d224-11e6-b91e-026030ef5380").
Jan 06 23:23:30 ip-10-254-194-102.us-west-1.aws.wflops.net kubelet-wrapper[25651]: E0106 23:23:30.539405   25651 cni.go:255] Error adding network: "cni0" already has an IP address different from 10.2.44.1/24
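
For reference, "no IP addresses available in network: podnet" generally means the CNI IPAM plugin still holds reservations for pods that no longer exist, so the node's pod subnet looks exhausted. A rough way to check this on the affected node, assuming the flannel CNI config delegates to the host-local IPAM plugin with its default state directory (both are assumptions, not confirmed from this setup):

# Each file under the network's state dir is an IP still reserved by host-local IPAM.
ls /var/lib/cni/networks/podnet | wc -l

# Compare against the pod sandbox containers actually running on this node
# ("k8s_POD" is the sandbox-container name prefix used by Docker-based kubelets of this era).
docker ps --filter "name=k8s_POD" --format '{{.Names}}' | wc -l

If the first number is much larger than the second, stale reservations are the likely cause.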
@resouer


resouer commented Jan 9, 2017

What CNI network plugin are you using? Flannel?

@Reifier


Reifier commented Jan 9, 2017

Yes, flannel.

@resouer resouer added the sig/network label Jan 10, 2017

@filipenf


filipenf commented Jan 11, 2017

I had a similar issue while testing Kubernetes with kubeadm. This started to happen after I did a kubeadm reset and then kubeadm init ... again.

Here's what I did to fix it (on the master and slaves):

kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down

(you may need to manually umount filesystems from /var/lib/kubelet before calling rm on that dir)

After doing that, I started docker and kubelet again and re-ran the kubeadm process.
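
For completeness, a minimal sketch of that restart sequence; the kubeadm init flag below is only a placeholder and must match whatever your original cluster used:

systemctl start docker
systemctl start kubelet
# Re-initialize the control plane; --pod-network-cidr must match the CIDR your
# flannel manifest expects (10.244.0.0/16 is only the common default, shown as a placeholder).
kubeadm init --pod-network-cidr=10.244.0.0/16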

@Reifier


Reifier commented Jan 16, 2017

Interesting. As I mentioned, there is no kubeadm involved since this is a CoreOS install.

After leaving the cluster alone for a while, this issue mysteriously resolved itself. On to other things now.

@Reifier Reifier closed this Jan 16, 2017

@ChrisBuchholz


ChrisBuchholz commented Feb 16, 2017

@filipenf: I ran into the same problem, and your workaround seems to fix it. Thanks.

@deedubs


deedubs commented Mar 13, 2017

Hey, we had this a couple of times. I threw this together and run it on a timer: https://github.com/swiftmedical/cni-cleanup
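
For anyone who wants the same effect without pulling in that repo, one rough sketch is a transient systemd timer; /opt/bin/cni-cleanup.sh is a made-up placeholder path for whatever cleanup script you use:

# Schedule a cleanup script to run hourly as a transient systemd unit.
systemd-run --on-calendar=hourly --unit=cni-cleanup /opt/bin/cni-cleanup.sh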

@aysark


aysark commented Nov 7, 2017

In addition to @filipenf's steps, you may also need to run:

ip link delete cni0
ip link delete flannel.1
@wujie1993


wujie1993 commented Feb 4, 2018

@filipenf It works, thanks. :)

@Paxa


Paxa commented Apr 5, 2018

The same issue may happen when using flannel with IPv6 disabled: coreos/flannel#936

@KevinTHU


KevinTHU commented Apr 27, 2018

@Paxa Yes, you are right. I fixed it by re-enabling IPv6, and left a comment on issue coreos/flannel#936 (comment).
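
If IPv6 was turned off via sysctl, re-enabling it looks roughly like this (a sketch assuming the disable_ipv6 sysctls were used; if it was disabled with the ipv6.disable=1 kernel boot parameter instead, that parameter has to be removed and the node rebooted):

# Re-enable IPv6 on all interfaces (reverses a sysctl-based disable).
sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0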

@apetal1


apetal1 commented Sep 10, 2018

Just a quick heads-up on the way we worked around a similar issue (using AWS EKS).
It seems related to ENIs and the number of IPs allowed for each instance type. Apparently we were using t2.micro instances for our tests, which allow only 4 IPs per worker node (check the AWS doc here).
The issue was fixed by switching to t2.medium instances for our EKS workers, which allow 6 IPv4 addresses per network interface.
Not sure if this is related to all the cases above, but I hope it helps some avoid spending hours troubleshooting AWS EKS installations.
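
As a hedged aside, the per-ENI limits for a given instance type can be checked with a recent AWS CLI before sizing workers; the query fields below come from the describe-instance-types output and should be double-checked against your CLI version:

aws ec2 describe-instance-types --instance-types t2.micro t2.medium \
  --query 'InstanceTypes[].[InstanceType,NetworkInfo.MaximumNetworkInterfaces,NetworkInfo.Ipv4AddressesPerInterface]' \
  --output table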

@makanijatin


makanijatin commented Sep 17, 2018

kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down

It worked for me too
