CoreDNS not started with k8s 1.11 and weave (CentOS 7) #998

carlosrmendes · 2018-07-17T19:29:19Z

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version 1.11

Environment:

Kubernetes version (use kubectl version): 1.11
Cloud provider or hardware configuration: aws ec2 with (16vcpus 64gb RAM)
OS (e.g. from /etc/os-release): centos 7
Kernel (e.g. uname -a): 3.10.0-693.17.1.el7.x86_64
Others: weave as cni add-on

What happened?

After kubeadm init, the coredns pods stay in Error:

NAME                                   READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-ljdjp               0/1       Error     6          9m
coredns-78fcdf6894-p6flm               0/1       Error     6          9m
etcd-master                            1/1       Running   0          8m
heapster-5bbdfbff9f-h5h2n              1/1       Running   0          9m
kube-apiserver-master                  1/1       Running   0          8m
kube-controller-manager-master         1/1       Running   0          8m
kube-proxy-5642r                       1/1       Running   0          9m
kube-scheduler-master                  1/1       Running   0          8m
kubernetes-dashboard-6948bdb78-bwkvx   1/1       Running   0          9m
weave-net-r5jkg                        2/2       Running   0          9m

The logs of both pods show the following:
standard_init_linux.go:178: exec user process caused "operation not permitted"

neolit123 · 2018-07-17T19:39:20Z

@kubernetes/sig-network-bugs

@carlosmkb, what is your docker version?

timothysc · 2018-07-17T21:03:37Z

I find this hard to believe, we pretty extensively test CentOS 7 on our side.

Do you have the system and pod logs?

dims · 2018-07-17T21:54:10Z

this one? https://stackoverflow.com/questions/44127247/does-anyone-know-a-workaround-for-no-new-privileges-blocking-selinux-transitions

carlosrmendes · 2018-07-17T23:50:50Z

@dims, that could make sense, I will try it.

@neolit123 and @timothysc

docker version: docker-1.13.1-63.git94f4240.el7.centos.x86_64

coredns pods log: standard_init_linux.go:178: exec user process caused "operation not permitted"
system log journalctl -xeu kubelet:

Jul 17 23:45:17 server.raid.local kubelet[20442]: E0717 23:45:17.679867   20442 pod_workers.go:186] Error syncing pod dd030886-89f4-11e8-9786-0a92797fa29e ("cas-7d6d97c7bd-mzw5j_raidcloud(dd030886-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "cas" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/cas:180328.pvt.01\""
Jul 17 23:45:18 server.raid.local kubelet[20442]: I0717 23:45:18.679059   20442 kuberuntime_manager.go:513] Container {Name:json2ldap Image:registry.raidcloud.io/raidcloud/json2ldap:180328.pvt.01 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-f2cmq ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:18 server.raid.local kubelet[20442]: E0717 23:45:18.680001   20442 pod_workers.go:186] Error syncing pod dcc39ce2-89f4-11e8-9786-0a92797fa29e ("json2ldap-666fc85686-tmxrr_raidcloud(dcc39ce2-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "json2ldap" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/json2ldap:180328.pvt.01\""
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678232   20442 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-6nhgg ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678311   20442 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678404   20442 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)
Jul 17 23:45:21 server.raid.local kubelet[20442]: E0717 23:45:21.678425   20442 pod_workers.go:186] Error syncing pod 9b44aa92-89f7-11e8-9786-0a92797fa29e ("coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:22 server.raid.local kubelet[20442]: I0717 23:45:22.679145   20442 kuberuntime_manager.go:513] Container {Name:login Image:registry.raidcloud.io/raidcloud/admin:180329.pvt.05 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:login-config ReadOnly:true MountPath:/usr/share/nginx/conf/ SubPath: MountPropagation:<nil>} {Name:default-token-f2cmq ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:5,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:22 server.raid.local kubelet[20442]: E0717 23:45:22.679941   20442 pod_workers.go:186] Error syncing pod dc8392a9-89f4-11e8-9786-0a92797fa29e ("login-85ffb66bb8-5l9fq_raidcloud(dc8392a9-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "login" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/admin:180329.pvt.05\""
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678172   20442 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-6nhgg ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678412   20442 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678532   20442 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)
Jul 17 23:45:23 server.raid.local kubelet[20442]: E0717 23:45:23.678554   20442 pod_workers.go:186] Error syncing pod 9b45a068-89f7-11e8-9786-0a92797fa29e ("coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"

chrisohaver · 2018-07-19T15:07:12Z

Found a couple instances of the same errors reported in other scenarios in the past.
Might try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.

sectorsize512 · 2018-07-19T21:27:43Z

Same issue for me. Similar setup CentOS 7.4.1708, Docker version 1.13.1, build 94f4240/1.13.1 (comes with CentOS):

[root@faas-A01 ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                                 READY     STATUS             RESTARTS   AGE
kube-system   calico-node-2vssv                                    2/2       Running            0          9m
kube-system   calico-node-4vr7t                                    2/2       Running            0          7m
kube-system   calico-node-nlfnd                                    2/2       Running            0          17m
kube-system   calico-node-rgw5w                                    2/2       Running            0          23m
kube-system   coredns-78fcdf6894-p4wbl                             0/1       CrashLoopBackOff   9          30m
kube-system   coredns-78fcdf6894-r4pwf                             0/1       CrashLoopBackOff   9          30m
kube-system   etcd-faas-a01.sl.cloud9.ibm.com                      1/1       Running            0          29m
kube-system   kube-apiserver-faas-a01.sl.cloud9.ibm.com            1/1       Running            0          29m
kube-system   kube-controller-manager-faas-a01.sl.cloud9.ibm.com   1/1       Running            0          29m
kube-system   kube-proxy-55csj                                     1/1       Running            0          17m
kube-system   kube-proxy-56r8c                                     1/1       Running            0          30m
kube-system   kube-proxy-kncql                                     1/1       Running            0          9m
kube-system   kube-proxy-mf2bp                                     1/1       Running            0          7m
kube-system   kube-scheduler-faas-a01.sl.cloud9.ibm.com            1/1       Running            0          29m
[root@faas-A01 ~]# kubectl logs --namespace=all coredns-78fcdf6894-p4wbl
Error from server (NotFound): namespaces "all" not found
[root@faas-A01 ~]# kubectl logs --namespace=kube-system coredns-78fcdf6894-p4wbl
standard_init_linux.go:178: exec user process caused "operation not permitted"

sectorsize512 · 2018-07-19T21:43:40Z

just in case, selinux is in permissive mode on all nodes.

sectorsize512 · 2018-07-19T21:45:01Z

And I'm using Calico (not weave as @carlosmkb).

chrisohaver · 2018-07-23T14:34:06Z

[root@faas-A01 ~]# kubectl logs --namespace=kube-system coredns-78fcdf6894-p4wbl
standard_init_linux.go:178: exec user process caused "operation not permitted"

Ah - This is an error from kubectl when trying to get the logs, not the contents of the logs...

carlosrmendes · 2018-07-23T15:43:07Z

@chrisohaver kubectl logs works with other kube-system pods

chrisohaver · 2018-07-23T15:44:33Z

OK - have you tried removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps?

chrisohaver · 2018-07-23T15:55:09Z

... does a kubectl describe of the coredns pod show anything interesting?

Leozki · 2018-07-26T16:54:11Z

Same issue for me.
CentOS Linux release 7.5.1804 (Core)
Docker version 1.13.1, build dded712/1.13.1
flannel as cni add-on

[root@k8s ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY     STATUS             RESTARTS   AGE
kube-system   coredns-78fcdf6894-cfmm7             0/1       CrashLoopBackOff   12         15m
kube-system   coredns-78fcdf6894-k65js             0/1       CrashLoopBackOff   11         15m
kube-system   etcd-k8s.master                      1/1       Running            0          14m
kube-system   kube-apiserver-k8s.master            1/1       Running            0          13m
kube-system   kube-controller-manager-k8s.master   1/1       Running            0          14m
kube-system   kube-flannel-ds-fts6v                1/1       Running            0          14m
kube-system   kube-proxy-4tdb5                     1/1       Running            0          15m
kube-system   kube-scheduler-k8s.master            1/1       Running            0          14m
[root@k8s ~]# kubectl logs coredns-78fcdf6894-cfmm7 -n kube-system
standard_init_linux.go:178: exec user process caused "operation not permitted"
[root@k8s ~]# kubectl describe pods coredns-78fcdf6894-cfmm7 -n kube-system
Name:           coredns-78fcdf6894-cfmm7
Namespace:      kube-system
Node:           k8s.master/192.168.150.40
Start Time:     Fri, 27 Jul 2018 00:32:09 +0800
Labels:         k8s-app=kube-dns
                pod-template-hash=3497892450
Annotations:    <none>
Status:         Running
IP:             10.244.0.12
Controlled By:  ReplicaSet/coredns-78fcdf6894
Containers:
  coredns:
    Container ID:  docker://3b7670fbc07084410984d7e3f8c0fa1b6d493a41d2a4e32f5885b7db9d602417
    Image:         k8s.gcr.io/coredns:1.1.3
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:db2bf53126ed1c761d5a41f24a1b82a461c85f736ff6e90542e9522be4757848
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 27 Jul 2018 00:46:30 +0800
      Finished:     Fri, 27 Jul 2018 00:46:30 +0800
    Ready:          False
    Restart Count:  12
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-vqslm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-vqslm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-vqslm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                 Message
  ----     ------            ----                ----                 -------
  Warning  FailedScheduling  16m (x6 over 16m)   default-scheduler    0/1 nodes are available: 1 node(s) were not ready.
  Normal   Scheduled         16m                 default-scheduler    Successfully assigned kube-system/coredns-78fcdf6894-cfmm7 to k8s.master
  Warning  BackOff           14m (x10 over 16m)  kubelet, k8s.master  Back-off restarting failed container
  Normal   Pulled            14m (x5 over 16m)   kubelet, k8s.master  Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
  Normal   Created           14m (x5 over 16m)   kubelet, k8s.master  Created container
  Normal   Started           14m (x5 over 16m)   kubelet, k8s.master  Started container
  Normal   Pulled            11m (x4 over 12m)   kubelet, k8s.master  Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
  Normal   Created           11m (x4 over 12m)   kubelet, k8s.master  Created container
  Normal   Started           11m (x4 over 12m)   kubelet, k8s.master  Started container
  Warning  BackOff           2m (x56 over 12m)   kubelet, k8s.master  Back-off restarting failed container
[root@k8s ~]# uname
Linux
[root@k8s ~]# uname -a
Linux k8s.master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@k8s ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@k8s ~]# docker --version
Docker version 1.13.1, build dded712/1.13.1

zimnyjakub · 2018-07-27T09:56:03Z

I have the same issue when SELinux is in permissive mode. When I disable it by setting SELINUX=disabled in /etc/selinux/config and reboot the machine, the pod starts up.

Redhat 7.4, kernel 3.10.0-693.11.6.el7.x86_64
docker-1.13.1-68.gitdded712.el7.x86_64
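
For provisioning scripts that need to know which mode a node will boot into, the configured SELinux mode can be read from the config file text. A minimal Python sketch, assuming the usual KEY=value layout of /etc/selinux/config; the function name is hypothetical, and the sketch only reports the configured value, not the mode currently active on the running system.

```python
def selinux_config_mode(config_text):
    """Return the value of SELINUX= from config file text, or None if absent."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip comments and anything that is not a KEY=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


# Illustrative file content, not taken from any node in this thread.
sample = "# example\nSELINUX=permissive\nSELINUXTYPE=targeted\n"
print(selinux_config_mode(sample))
```

As reported in this thread, permissive is not enough to avoid the failure on Docker 1.13; only disabled (after a reboot) made the pods start.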

chrisohaver · 2018-07-27T16:43:49Z

FYI, Also works for me with SELinux disabled (not permissive, but disabled).
Docker version 1.13.1, build dded712/1.13.1
CentOS 7

[root@centosk8s ~]# kubectl logs coredns-78fcdf6894-rhx9p -n kube-system
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
2018/07/27 16:37:31 [INFO] CoreDNS-1.1.3
2018/07/27 16:37:31 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/07/27 16:37:31 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138

lareeth · 2018-07-30T16:06:53Z

We are also experiencing this issue. We provision infrastructure through automation, so requiring a restart to completely disable SELinux is not acceptable. Are there any other workarounds while we wait for this to be fixed?

chrisohaver · 2018-07-30T16:10:53Z

Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
Updating to a more recent version of docker (than 1.13) may also help.

thdrnsdk · 2018-08-01T01:50:50Z

Same issue here
Docker version 1.2.6
CentOS 7
Like @lareeth, we also provision Kubernetes through automation using kubeadm, and requiring a restart to completely disable SELinux is not acceptable.
@chrisohaver, I will try your suggestion, it looks helpful. Thank you!
But as far as I know, CoreDNS options cannot be set in the kubeadm configuration.
Is there no other way?

chrisohaver · 2018-08-01T12:13:13Z

Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
Updating to a more recent version of docker (e.g. to the version recommended by k8s) may also help.

chrisohaver · 2018-08-01T14:17:32Z

I verified that removing "allowPrivilegeEscalation: false" from the coredns deployment resolves the issue (with SELinux enabled in permissive mode).

chrisohaver · 2018-08-01T15:09:25Z

I also verified that upgrading to a version of docker recommended by Kubernetes (docker 17.03) resolves the issue, with "allowPrivilegeEscalation: false" left in place in the coredns deployment, and SELinux enabled in permissive mode.

chrisohaver · 2018-08-01T15:58:20Z

So, it appears as if there is an incompatibility between old versions of docker and SELinux with the allowPrivilegeEscalation directive, which has apparently been resolved in later versions of docker.

There appear to be 3 different work-arounds:

Upgrade to newer version of docker, e.g. 17.03, the version currently recommended by k8s
Or remove allowPrivilegeEscalation=false from the deployment's pod spec
Or disable SELinux
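
The combination described above can be written down as a small pre-flight check. This is a minimal Python sketch, assuming the three conditions reported in this thread (Docker older than 17.03, SELinux not disabled, allowPrivilegeEscalation set to false) are jointly what triggers the failure. The function names and the version parsing are illustrative and not part of any Kubernetes or Docker tooling.

```python
import re


def docker_major_minor(version):
    """Extract (major, minor) from strings such as '1.13.1' or an RPM package name."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("no version number found in %r" % version)
    return int(match.group(1)), int(match.group(2))


def likely_affected(docker_version, selinux_mode, allow_privilege_escalation):
    """True when all three conditions from the thread hold at once."""
    old_docker = docker_major_minor(docker_version) < (17, 3)
    selinux_on = selinux_mode.strip().lower() != "disabled"
    return old_docker and selinux_on and allow_privilege_escalation is False


# Values reported by the original poster.
print(likely_affected("docker-1.13.1-63.git94f4240.el7.centos.x86_64", "permissive", False))
```

Changing any one of the three inputs makes the check return False, which mirrors the three work-arounds.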

Leozki · 2018-08-01T16:09:58Z

@chrisohaver I have resolved the issue by upgrading to newer version of docker 17.03. thx

neolit123 · 2018-08-01T18:15:46Z

thanks for the investigation @chrisohaver 💯

kuznero · 2018-08-10T14:28:24Z

Thanks, @chrisohaver !

This worked:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -
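
To make explicit what the sed step changes, here is a minimal Python sketch of the same text substitution applied to a manifest string. The sample text is a hypothetical, heavily trimmed excerpt whose securityContext values follow the kubelet log earlier in this thread, and the function name is made up for illustration.

```python
OLD = "allowPrivilegeEscalation: false"
NEW = "allowPrivilegeEscalation: true"


def relax_privilege_escalation(manifest_yaml):
    """Return the edited manifest text and the number of occurrences changed."""
    return manifest_yaml.replace(OLD, NEW), manifest_yaml.count(OLD)


# Hypothetical, trimmed excerpt of `kubectl get deployment coredns -o yaml`.
SAMPLE = """\
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
"""

edited, changed = relax_privilege_escalation(SAMPLE)
print(changed)
print(edited)
```

Like the sed expression with the g flag, this replaces every occurrence in the text, so it affects all containers in the manifest that carry the setting.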

neolit123 · 2018-08-10T15:16:10Z

@chrisohaver
do you think we should document this step in the kubeadm troubleshooting guide for SELinux nodes, along the lines of:

`coredns` pods have `CrashLoopBackOff` or `Error` state

If you have nodes that are running SELinux with an older version of Docker you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:

Upgrade to a newer version of Docker - 17.03 is confirmed to work.
Disable SELinux.
Modify the coredns deployment to set allowPrivilegeEscalation to true:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

WDYT? please suggest amends to the text if you think something can be improved.

chrisohaver · 2018-08-10T15:59:59Z

That's fine. We should perhaps mention that there are negative security implications when disabling SELinux, or changing the allowPrivilegeEscalation setting.

The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03)

neolit123 · 2018-08-10T16:57:17Z

@chrisohaver
understood, will amend the copy and submit a PR for this.

mydockergit · 2019-01-23T14:38:36Z

There is also an answer for that on Stack Overflow:
https://stackoverflow.com/questions/53075796/coredns-pods-have-crashloopbackoff-or-error-state

This error

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

is caused when CoreDNS detects a loop in the resolve configuration, and it is the intended behavior. You are hitting this issue:

#1162

coredns/coredns#2087

Hacky solution: Disable the CoreDNS loop detection

Edit the CoreDNS configmap:

kubectl -n kube-system edit configmap coredns

Remove or comment out the line with loop, save and exit.

Then remove the CoreDNS pods, so new ones can be created with new config:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

All should be fine after that.

Preferred Solution: Remove the loop in the DNS configuration

First, check if you are using systemd-resolved. If you are running Ubuntu 18.04, it is probably the case.

systemctl list-unit-files | grep enabled | grep systemd-resolved

If it is, check which resolv.conf file your cluster is using as reference:

ps auxww | grep kubelet

You might see a line like:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

The important part is --resolv-conf - we figure out if systemd resolv.conf is used, or not.

If it is the resolv.conf of systemd, do the following:

Check the content of /run/systemd/resolve/resolv.conf to see if there is a record like:

nameserver 127.0.0.1

If there is 127.0.0.1, it is the one causing the loop.
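
The check can also be scripted. A minimal Python sketch, assuming the resolv.conf content has already been read into a string; the function name is hypothetical. It reports every loopback nameserver, so it also catches entries such as 127.0.0.53, not only 127.0.0.1.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return the nameserver addresses in resolv.conf text that are loopback."""
    found = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line
        # Drop comments introduced by '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        parts = line.split()
        if len(parts) < 2 or parts[0] != "nameserver":
            continue
        try:
            address = ipaddress.ip_address(parts[1])
        except ValueError:
            continue  # not a plain IP address, ignore it
        if address.is_loopback:
            found.append(parts[1])
    return found


print(loopback_nameservers("nameserver 127.0.0.1\nnameserver 1.1.1.1\n"))
```

An empty result means the file given to the kubelet contains no loopback resolver, so the loop would have to come from somewhere else.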

To get rid of it, you should not edit that file, but check other places to make it properly generated.

Check all files under /etc/systemd/network and if you find a record like

DNS=127.0.0.1

delete that record. Also check /etc/systemd/resolved.conf and do the same if needed. Make sure you have at least one or two DNS servers configured, such as

DNS=1.1.1.1 1.0.0.1

After doing all that, restart the systemd services to put your changes into effect:
systemctl restart systemd-networkd systemd-resolved

After that, verify that nameserver 127.0.0.1 is no longer in the resolv.conf file:

cat /run/systemd/resolve/resolv.conf

Finally, trigger re-creation of the DNS pods

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Summary: The solution involves getting rid of what looks like a DNS lookup loop from the host DNS configuration. Steps vary between different resolv.conf managers/implementations.

chrisohaver · 2019-01-23T14:49:48Z

Thanks. It's also covered in the CoreDNS loop plugin readme ...

mengxifl · 2019-03-20T03:36:55Z

I have the same problem, plus another one.
1. CoreDNS cannot reach my DNS servers. The error is:
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:57088->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:38819->172.16.254.1:53: i/o timeout
........

My /etc/resolv.conf only has:
nameserver 172.16.254.1 #this is my dns
nameserver 8.8.8.8 #another dns in net
I ran:

kubectl -n kube-system get deployment coredns -o yaml |
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' |
kubectl apply -f -

Then the pods were rebuilt and show only one error:

[ERROR] plugin/errors: 2 10594135170717325.8545646296733374240. HINFO: unreachable backend: no upstream host

I don't know if that's normal. Maybe.

2. CoreDNS cannot reach my API service. The error is:

kube-dns Failed to list *v1.Endpoints getsockopt: 10.96.0.1:6443 api connection refused

CoreDNS restarts again and again and eventually ends up in CrashLoopBackOff.

So I had to run CoreDNS on the master node. I did that with:

kubectl edit deployment/coredns --namespace=kube-system

and added under spec.template.spec:

nodeSelector:
  node-role.kubernetes.io/master: ""

I don't know if that's normal

Finally, here is my environment:

Linux 4.20.10-1.el7.elrepo.x86_64 /// centos 7

docker Version: 18.09.3

[root@k8smaster00 ~]# docker image ls -a
REPOSITORY                           TAG             IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-controller-manager   v1.13.3         0482f6400933   6 weeks ago     146MB
k8s.gcr.io/kube-proxy                v1.13.3         98db19758ad4   6 weeks ago     80.3MB
k8s.gcr.io/kube-apiserver            v1.13.3         fe242e556a99   6 weeks ago     181MB
k8s.gcr.io/kube-scheduler            v1.13.3         3a6f709e97a0   6 weeks ago     79.6MB
quay.io/coreos/flannel               v0.11.0-amd64   ff281650a721   7 weeks ago     52.6MB
k8s.gcr.io/coredns                   1.2.6           f59dcacceff4   4 months ago    40MB
k8s.gcr.io/etcd                      3.2.24          3cab8e1b9802   6 months ago    220MB
k8s.gcr.io/pause                     3.1             da86e6ba6ca1   15 months ago   742kB

Kubernetes is 1.13.3

I think this is a bug. I hope for an official update or a solution.

chrisohaver · 2019-03-20T13:28:45Z

I have same problem ...

@mengxifl, Those errors are significantly different than the ones reported and discussed in this issue.

[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:57088->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:38819->172.16.254.1:53: i/o timeout

Those errors mean that the CoreDNS pod (and probably all other pods) cannot reach your nameservers. This suggests a networking problem in your cluster to the outside world. Possibly flannel misconfiguration or firewalls.

the coredns cannot found my api service ...
so i have to run coredns on master node

This is also not normal. If I understand you correctly, you are saying that CoreDNS can contact the API from the master node but not other nodes. This would suggest pod to service networking problems between nodes within your cluster - perhaps an issue with flannel configuration or firewalls.
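
Since several unrelated failures have been reported in this thread, a rough triage of the CoreDNS log line helps to pick the right fix. A minimal Python sketch, assuming the three message patterns quoted in this thread are representative; the category names and the function name are made up for illustration.

```python
def classify_coredns_log(line):
    """Map a CoreDNS pod log line to one of the failure types seen in this thread."""
    if "operation not permitted" in line:
        # Container start blocked: the Docker 1.13 / SELinux / allowPrivilegeEscalation case.
        return "exec-blocked"
    if "plugin/loop" in line and "loop detected" in line:
        # The host resolver configuration points back at CoreDNS.
        return "dns-loop"
    if "unreachable backend" in line:
        # Upstream nameservers cannot be reached from the pod network.
        return "upstream-unreachable"
    return "other"


print(classify_coredns_log(
    'standard_init_linux.go:178: exec user process caused "operation not permitted"'
))
```

Only the first category is the subject of this issue; the other two point at host DNS configuration and cluster networking respectively.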

mengxifl · 2019-03-21T06:28:57Z

Thank you for your reply

Maybe I should post my YAML file.

I use
kubeadm init --config=config.yaml

My config.yaml content is:

apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
apiEndpoint:
  advertiseAddress: "172.16.254.74"
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: "v1.13.3"
etcd:
  external:
    endpoints:
    - "https://172.16.254.86:2379" 
    - "https://172.16.254.87:2379"
    - "https://172.16.254.88:2379"
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    certFile: /etc/kubernetes/pki/etcd/client.pem
    keyFile: /etc/kubernetes/pki/etcd/client-key.pem
networking:
  podSubnet: "10.224.0.0/16"
  serviceSubnet: "10.96.0.0/12"
apiServerCertSANs:
- k8smaster00
- k8smaster01
- k8snode00
- k8snode01
- 172.16.254.74
- 172.16.254.79
- 172.16.254.80
- 172.16.254.81
- 172.16.254.85 #Vip
- 127.0.0.1
clusterName: "cluster"
controlPlaneEndpoint: "172.16.254.85:6443"

apiServerExtraArgs:
  service-node-port-range: 20-65535

My flannel YAML is the default:

https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

systemctl status firewalld
All nodes say:
Unit firewalld.service could not be found.

cat /etc/sysconfig/iptables
All nodes say:
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --sport 1:65535 -j ACCEPT
COMMIT

cat /etc/resolv.conf & ping bing.com
All nodes say:
[1] 6330
nameserver 172.16.254.1
nameserver 8.8.8.8
PING bing.com (13.107.21.200) 56(84) bytes of data.
64 bytes from 13.107.21.200 (13.107.21.200): icmp_seq=2 ttl=111 time=149 ms

uname -rs
The master node reports:
Linux 4.20.10-1.el7.elrepo.x86_64

uname -rs
The worker node reports:
Linux 4.4.176-1.el7.elrepo.x86_64

So I don't think the firewall is the issue. Maybe it is flannel, but I use the default config. Or maybe it is the Linux kernel version. I don't know.

OK, I ran
/sbin/iptables -t nat -I POSTROUTING -s 10.224.0.0/16 -j MASQUERADE

on all my nodes and that works for me. Thanks.
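Running the command above more than once inserts a duplicate rule each time. A minimal sketch of an idempotent variant follows, assuming an iptables build that supports the -C (check) option. The function name ensure_pod_masquerade and the IPTABLES override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Add a MASQUERADE rule for the given pod subnet only if it is not already present.
# IPTABLES may be overridden (for example to stub the binary); defaults to /sbin/iptables.
ensure_pod_masquerade() {
    subnet=$1
    ipt=${IPTABLES:-/sbin/iptables}
    # -C exits with status 0 when an identical rule already exists in the chain.
    if "$ipt" -t nat -C POSTROUTING -s "$subnet" -j MASQUERADE 2>/dev/null; then
        echo "rule already present for $subnet"
    else
        "$ipt" -t nat -I POSTROUTING -s "$subnet" -j MASQUERADE &&
            echo "rule added for $subnet"
    fi
}

# Possible usage (as root, on every node):
#   ensure_pod_masquerade 10.224.0.0/16
```

A rule added this way does not survive a reboot unless it is also saved, for example into /etc/sysconfig/iptables on CentOS.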

neolit123 added the priority/needs-more-evidence label Jul 17, 2018

neolit123 mentioned this issue Jul 19, 2018

Do not merge - check CoreDNS image kubernetes/kubernetes#66309

Closed

neolit123 self-assigned this Aug 10, 2018

neolit123 added the priority/important-soon and kind/documentation labels and removed the priority/needs-more-evidence label Aug 10, 2018

neolit123 added this to the v1.12 milestone Aug 10, 2018

chuckha added the lifecycle/active label Aug 15, 2018

neolit123 mentioned this issue Aug 15, 2018

troubleshooting-kubeadm: add guide for fixing stale CoreDNS pods kubernetes/website#9872

Merged

k8s-ci-robot closed this as completed in kubernetes/website#9872 Aug 22, 2018
