
Flannel (NetworkPlugin cni) error: /run/flannel/subnet.env: no such file or directory #70202

Closed
ghost opened this issue Oct 24, 2018 · 25 comments
Labels
sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@ghost

ghost commented Oct 24, 2018

/kind bug

@kubernetes/sig-contributor-experience-bugs

What happened:
Installed a single-node Kubernetes cluster on CentOS 7 (a VM running on VirtualBox); my application pod (created via a k8s Deployment) won't go into the Ready state.

Pod Event: Warning FailedCreatePodSandBox . . . Kubelet . . . Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox . . . network for pod "companyemployees-deployment-766c7c7767-t7mc5": NetworkPlugin cni failed to set up pod "companyemployees-deployment-766c7c7767-t7mc5_default" network: open /run/flannel/subnet.env: no such file or directory

In addition, it looks like the kubernetes coredns docker container keeps exiting – e.g. docker ps -a | grep -i coredns: 6341ce0be652 k8s.gcr.io/pause:3.1 "/pause" . . . Exited (0) 1 second ago k8s_POD_coredns-576cbf47c7-9bxxg_kube-system_e84afb7a-d7b7-11e8-bafa-08002745c4bc_581

What you expected to happen:
Flannel not to produce the error, and the Pod to go into the Ready state.

How to reproduce it (as minimally and precisely as possible):
Create a simple deployment after building a Docker image and pushing it to a private Docker registry:
kubectl create -f companyemployees-deployment.yaml
deployment yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: companyemployees-deployment
  labels:
    app: companyemployees
spec:
  replicas: 1
  selector:
    matchLabels:
      app: companyemployees
  template:
    metadata:
      labels:
        app: companyemployees
    spec:
      containers:
      - name: companyemployees
        image: localhost:5000/companyemployees:1.0
        ports:
        - containerPort: 9092
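
Checking the pod status and events (the pod name below is a placeholder):

kubectl get pods -l app=companyemployees
kubectl describe pod <companyemployees-pod-name>   # shows the FailedCreatePodSandBox event quoted above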

Anything else we need to know?:
ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
link/ether 08:00:27:45:c4:bc brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
link/ether 08:00:27:21:0f:92 brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
link/ether 02:42:1b:04:1f:7c brd ff:ff:ff:ff:ff:ff
6: veth3f5bcb4@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT
link/ether b2:1f:d4:fb:84:2e brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT
link/ether e6:44:ed:15:dd:97 brd ff:ff:ff:ff:ff:ff

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
Single-node Kubernetes cluster on a CentOS 7 VM running on VirtualBox (VirtualBox is running on Windows 7 Pro)

  • OS (e.g. from /etc/os-release):
    cat /etc/os-release:
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

rpm -q centos-release
centos-release-7-4.1708.el7.centos.x86_64

My team's CentOS image already had Docker, Kubernetes, Flannel, and a private Docker registry on it; it was working, but I recently had issues with it that led me to uninstall Kubernetes, Docker, and Flannel and reinstall them.

Install steps:

Switch to root: su - root

install docker

  1. yum install -y yum-utils device-mapper-persistent-data lvm2
  2. yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  3. yum install docker-ce
  4. systemctl daemon-reload
  5. systemctl enable docker
  6. systemctl start docker
  7. docker run hello-world

install private docker registry

  1. docker pull registry
  2. docker run -d -p 5000:5000 --restart=always --name registry registry
  3. Note: firewalld is not running

install k8s:

  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
  3. swapoff -a
  4. Edit /etc/fstab and comment-out /dev/mapper/centos-swap swap
  5. Add kubernetes repo for yum - edit /etc/yum.repos.d/kubernetes.repo and add
[kubernetes]
	name=Kubernetes
	baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
	enabled=1
	gpgcheck=1
	repo_gpgcheck=1
	gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
		https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  6. yum install -y kubelet kubeadm kubectl
  7. systemctl enable kubelet
  8. systemctl start kubelet
  9. kubeadm init --pod-network-cidr=10.244.0.0/16
  10. k8s config for user – running as root: export KUBECONFIG=/etc/kubernetes/admin.conf

install flannel:

  1. sysctl net.bridge.bridge-nf-call-iptables=1
  2. kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Remove master node taint (to allow scheduling pods on master): kubectl taint nodes --all node-role.kubernetes.io/master-
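
A quick way to verify that flannel came up and wrote the subnet file (the label and daemonset name assume the stock kube-flannel manifest applied above):

kubectl get pods -n kube-system -l app=flannel -o wide   # one Running kube-flannel-ds pod per node
kubectl logs -n kube-system <kube-flannel-pod-name>      # should mention writing the subnet file
cat /run/flannel/subnet.env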

  • Others:
Prior to installing, I uninstalled using the following steps:

Switch to root: su - root

Uninstall k8s
(Although this is the master node, I did this a few times and included draining the node the last time)

  1. kubectl drain mynodename --delete-local-data --force --ignore-daemonsets
  2. kubectl delete node mynodename
  3. kubeadm reset
  4. systemctl stop kubelet
  5. yum remove kubeadm kubectl kubelet kubernetes-cni kube*
  6. yum autoremove
  7. rm -rf ~/.kube
  8. rm -rf /var/lib/kubelet/*

Uninstall docker:

  1. docker rm $(docker ps -a -q)
  2. docker stop (as needed)
  3. docker rmi -f $(docker images -q)
  4. Check that all containers and images were deleted: docker ps -a; docker images
  5. systemctl stop docker
  6. yum remove yum-utils device-mapper-persistent-data lvm2
  7. yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-selinux docker-engine-selinux docker-engine
  8. yum remove docker-ce
  9. rm -rf /var/lib/docker
  10. rm -rf /etc/docker

Uninstall flannel

  1. rm -rf /var/lib/cni/
  2. rm -rf /run/flannel
  3. rm -rf /etc/cni/
  4. Remove interfaces related to docker and flannel:
    ip link
    For each interface for docker or flannel, do the following
    ifconfig <name of interface from ip link> down
    ip link delete <name of interface from ip link>
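
For example, when the only leftover interfaces are docker0, flannel.1, and cni0 (adjust to whatever ip link actually shows):

for iface in docker0 flannel.1 cni0; do
  ip link set "$iface" down 2>/dev/null
  ip link delete "$iface" 2>/dev/null
done
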
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2018
@dims
Member

dims commented Oct 24, 2018

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2018
@prodanlabs

I am running flannel from a binary.
After starting flanneld via systemctl, mk-docker-opts.sh -i is executed automatically to generate the following two environment-variable files:

/run/flannel/subnet.env
/run/docker_opts.env

In flanneld.service, add ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
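
For example, a systemd drop-in that adds that line (the unit name and script path are taken from above; adjust to your install):

mkdir -p /etc/systemd/system/flanneld.service.d
cat << 'EOF' > /etc/systemd/system/flanneld.service.d/10-docker-opts.conf
[Service]
ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
EOF
systemctl daemon-reload
systemctl restart flanneld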

@katanyoussef

Hi,
I got the same error. If someone has an answer, can you please help? I re-did it 3 times with the same result... maybe I'm missing something. Also, coredns is showing ContainerCreating all the time.

@jwatte

jwatte commented Jan 16, 2019

I have the same problem on Ubuntu 18.04.1 with kubelet 1.13 and docker-ce 18.09.
The same setup worked with kubelet 1.12 and docker-ce 18.06.
(Note that kubelet and docker were updated in place and the machine rebooted; downgrading versions goes back to working.)
One question I have: do I need to run flanneld on the node hosting my Kubernetes, even though it's single-node (master==slave)? Ubuntu doesn't have a modern flanneld package to install, and no installation instructions cover this -- apparently just applying the .yml should be enough?

@wborgo

wborgo commented Jan 29, 2019

I have the same problem on CentOS Linux release 7.6.1810 (Core)
kubelet 1.13.2
Docker 18.09.1
Image k8s.gcr.io/coredns:1.2.6

Every time I reboot the server, the file /run/flannel/subnet.env is created after a minute or two.
I tried to change the owner and group to the non-root user:

sudo chown $(id -u):$(id -g) /run/flannel/subnet.env

@jansmets

Does anyone have a possible solution for this chicken-and-egg problem? It's a problem when you want to use jumbo frames.
Thanks

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@discostur

Just got the same problem - fixed it by manually adding the file:

/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

@unmurphy

@discostur thanks, resolved this issue with your answer!

@suryastef

In my case, using CentOS on DO, the file /run/flannel/subnet.env exists, but same issue: /run/flannel/subnet.env: no such file or directory

At first I tried a different subnet when running kubeadm init --pod-network-cidr=192.168.255.0/24

I tried @discostur's solution of changing the file manually, but subnet.env was restored to its original state when I restarted the master

This was only solved by kubeadm reset and using flannel's default network CIDR: kubeadm init --pod-network-cidr=10.244.0.0/16

@discostur

discostur commented Apr 20, 2019 via email

@manukasa

Just got the same problem - fixed it by manually adding the file:

/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Thanks, this worked for us.

@RamanPndy

Just got the same problem - fixed it by manually adding the file:

/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

This solution worked for me, but I have one doubt: what do these values mean, and how does flannel use them?

@ryanjfrizzell

Just got the same problem - fixed it by manually adding the file:
/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

This solution worked for me, but I have one doubt: what do these values mean, and how does flannel use them?

This will get it started, but it won't survive a reboot... still struggling with this myself.

@caseydavenport
Member

The subnet.env file is written out by the flannel daemonset pods and probably shouldn't be modified by hand.

If that file isn't getting written, it suggests another problem preventing the flannel pod from starting up. Are there other logs in the flannel pod? You can check with something like kubectl logs -n kube-system <flannel-pod-name>

Happy to continue discussing, but I'm going to close this since it appears to be a flannel issue rather than a Kubernetes one. Might also be worth raising as a support issue against the flannel repo too: https://github.com/coreos/flannel

/remove-triage unresolved
/remove-kind bug
/close

@k8s-ci-robot k8s-ci-robot removed the triage/unresolved Indicates an issue that can not or will not be resolved. label May 2, 2019
@k8s-ci-robot
Contributor

@caseydavenport: Closing this issue.

In response to this:

The subnet.env file is written out by the flannel daemonset pods and probably shouldn't be modified by hand.

If that file isn't getting written, it suggests another problem preventing the flannel pod from starting up. Are there other logs in the flannel pod? You can check with something like kubectl logs -n kube-system <flannel-pod-name>

Happy to continue discussing, but I'm going to close this since it appears to be a flannel issue rather than a Kubernetes one. Might also be worth raising as a support issue against the flannel repo too: https://github.com/coreos/flannel

/remove-triage unresolved
/remove-kind bug
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@HankTheCrank

I know this is old, but I wanted to comment here as I too had this issue; in my case it was a symptom of a different issue. There was no subnet.env file, but it was not getting created because my flannel daemonset was failing. The error from the pod (kubectl --namespace=kube-system logs <POD_NAME>) showed "Error registering network: failed to acquire lease: node "<NODE_NAME>" pod cidr not assigned". The node was missing a spec for podCIDR, so I ran kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"10.244.0.0/16"}}' for each node and the issue went away.
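
You can check whether a node already has a podCIDR assigned before patching (the node name is a placeholder; empty output means it is missing):

kubectl get node <NODE_NAME> -o jsonpath='{.spec.podCIDR}'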

@sudo-undefined

Thanks @HankTheCrank.
I just deployed kubeadm, kubelet, and kubectl 1.15.7 together with whatever CIDR handling is in Kubernetes 1.17. I've read there were some CIDR changes, but I don't really know what they are, and now I don't have to spend hours figuring it out (at the moment).
The node patch command made flannel come alive on my worker nodes.

@rprasad17088

I also encountered exactly the same problem while creating the rook-ceph-operator pod; setting SELinux enforcement to 0 (setenforce 0) on the worker nodes resolved the issue.

@rjshk013

In my case, myAmazonEKSCNIRole was the culprit. I didn't give the proper OIDC ID in that role. I rechecked the trust relationship section and corrected the values. After that, my pods show Running:
kube-system aws-node-29rwz 1/1 Running 20 74m
kube-system aws-node-ffvc4 1/1 Running 20 74m
kube-system coredns-65ccb76b7c-f96q4 1/1 Running 0 92m
kube-system coredns-65ccb76b7c-x7gdg 1/1 Running 0 92m

@ilmal

ilmal commented Jan 18, 2022

Hope I can be of help!

The solution for me:

My problem was that the flannel-ds pods weren't running on all of my nodes (check that the number of flannel pods in kube-system matches the number of nodes in the cluster).

In my case, two of my nodes had the NoExecute taint, which also blocks flannel pods. If this is the case for you, edit the daemonset and add a toleration for NoExecute. Problem solved!
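
For example, a patch that adds a blanket NoExecute toleration to the daemonset (the daemonset name and namespace depend on which kube-flannel manifest you applied):

kubectl -n kube-system patch daemonset kube-flannel-ds --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/tolerations/-", "value": {"operator": "Exists", "effect": "NoExecute"}}]'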

@pint1022

The solution works for me.

@audioscavenger

audioscavenger commented Feb 17, 2022

Creating /run/flannel/subnet.env manually fixes coredns not starting, but it's only temporary.
My solution for the master/control-plane:

  1. kubeadm init --control-plane-endpoint=whatever --node-name whatever --pod-network-cidr=10.244.0.0/16
  2. kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
  3. restart all
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -tnat --flush
systemctl start kubelet
systemctl start docker

@attila123

Thanks, I just needed a quick solution for a test system running some old k8s. I scripted the workaround, which recreates the missing /run/flannel/subnet.env:

#!/bin/bash

set -x

# See https://github.com/kubernetes/kubernetes/issues/70202
# Run as root (e.g. with sudo)

mkdir -p /run/flannel

cat << EOF > /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

@laurijssen

laurijssen commented Sep 1, 2023

This issue started on 6 nodes after I removed the apparmor daemon from the system (after a while). As soon as I reinstalled apparmor, it worked again.
One node required a kubeadm reset/join and a restart before working again. The flannel pod itself had a permission-denied error obtaining the network interface, so subnet.env did not appear.
I don't know why, as there are no flannel profiles in the apparmor config, but it worked!

@abdelghanimeliani

Creating /run/flannel/subnet.env worked for me, thanks.
