CoreDNS not started with k8s 1.11 and weave (CentOS 7) #998
Comments
@kubernetes/sig-network-bugs @carlosmkb, what is your docker version?
I find this hard to believe, we pretty extensively test CentOS 7 on our side. Do you have the system and pod logs?
@dims, that could make sense, I will try. @neolit123 and @timothysc, docker version: docker-1.13.1-63.git94f4240.el7.centos.x86_64. coredns pod logs: […]
Found a couple of instances of the same errors reported in other scenarios in the past.
Same issue for me. Similar setup: CentOS 7.4.1708, Docker version 1.13.1, build 94f4240/1.13.1 (comes with CentOS): […]
Just in case: SELinux is in permissive mode on all nodes.
And I'm using Calico (not Weave as @carlosmkb).
Ah - this is an error from kubectl when trying to get the logs, not the contents of the logs...
@chrisohaver the […]
OK - have you tried removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps?
... does a […]
Same issue for me. […]
I have the same issue when SELinux is in permissive mode. When I disable it in /etc/selinux/config (SELINUX=disabled) and reboot the machine, the pod starts up. Red Hat 7.4, kernel 3.10.0-693.11.6.el7.x86_64
FYI, it also works for me with SELinux disabled (not permissive, but disabled).
We are also experiencing this issue. We provision infrastructure through automation, so requiring a restart to completely disable SELinux is not acceptable. Are there any other workarounds while we wait for this to be fixed?
Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
Same issue here
Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
I verified that removing "allowPrivilegeEscalation: false" from the coredns deployment resolves the issue (with SELinux enabled in permissive mode).
I also verified that upgrading to a version of Docker recommended by Kubernetes (Docker 17.03) resolves the issue, with "allowPrivilegeEscalation: false" left in place in the coredns deployment, and SELinux enabled in permissive mode.
So, it appears there is an incompatibility between old versions of Docker and SELinux with the allowPrivilegeEscalation directive, which has apparently been resolved in later versions of Docker. There appear to be 3 different work-arounds:

- Remove "allowPrivilegeEscalation: false" from the CoreDNS deployment
- Disable SELinux
- Upgrade Docker to the version recommended by Kubernetes (17.03)
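The first workaround just flips a single YAML field. A minimal offline sketch of that edit, assuming a hypothetical local file coredns-deploy.yaml standing in for the securityContext section of the live deployment (on a real cluster the object would be fetched and re-applied with kubectl):

```shell
# Hypothetical stand-in for the securityContext portion of the CoreDNS
# deployment; the live object would normally be fetched with kubectl.
cat > coredns-deploy.yaml <<'EOF'
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
EOF

# Flip the directive in place, mirroring the kubectl | sed | kubectl pipeline
sed -i 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/' coredns-deploy.yaml

# Show the flipped value
grep allowPrivilegeEscalation coredns-deploy.yaml
```

Only the allowPrivilegeEscalation line changes; the rest of the securityContext is left intact.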
@chrisohaver I have resolved the issue by upgrading to a newer version of Docker (17.03). thx
thanks for the investigation @chrisohaver 💯
Thanks, @chrisohaver! This worked:

```shell
kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -
```
@chrisohaver […]
That's fine. We should perhaps mention that there are negative security implications to disabling SELinux or changing the allowPrivilegeEscalation setting. The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03).
@chrisohaver […]
There is also an answer for this on Stack Overflow. This error is raised when CoreDNS detects a loop in the resolver configuration, and it is the intended behavior. You are hitting this issue: […]

Hacky solution: disable the CoreDNS loop detection. Edit the CoreDNS configmap: […] Remove or comment out the line with "loop", then remove the CoreDNS pods so that new ones can be created with the new config: […] All should be fine after that.

Preferred solution: remove the loop in the DNS configuration. First, check if you are using […] If it is, check which […] You might see a line like: […] The important part is […] If it is the […] check the content of […] If there is […] you should not edit that file to get rid of it, but check other places to make it properly generated. Check all files under […] and delete that record. Also check […]

After doing all that, restart the systemd services to put your changes into effect. After that, verify that […] Finally, trigger re-creation of the DNS pods: […]

Summary: the solution involves getting rid of what looks like a DNS lookup loop from the host DNS configuration. Steps vary between different resolv.conf managers/implementations.
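The hacky workaround above boils down to deleting the loop line from the Corefile stored in the coredns configmap. A sketch against a local copy; the Corefile below is an assumed, simplified example of kubeadm's defaults, and on a live cluster you would instead run `kubectl -n kube-system edit configmap coredns` and then delete the CoreDNS pods:

```shell
# Assumed local copy of the Corefile (simplified kubeadm-style defaults)
cat > Corefile <<'EOF'
.:53 {
    errors
    health
    loop
    forward . /etc/resolv.conf
    cache 30
}
EOF

# Drop the loop plugin line. This disables loop detection, treating only
# the symptom; fixing the host resolv.conf is the preferred solution.
sed -i '/^[[:space:]]*loop[[:space:]]*$/d' Corefile

cat Corefile
```

After the edit, the forward and cache plugins remain untouched; only loop detection is gone.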
Thanks. It's also covered in the CoreDNS loop plugin readme...
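The loop that the plugin detects usually shows up as a loopback nameserver (for example 127.0.0.53 from systemd-resolved) in the resolv.conf that kubelet hands to pods, so CoreDNS ends up forwarding queries back to itself. A minimal check against a sample file with hypothetical contents (on a real node you would inspect /etc/resolv.conf, or whatever file kubelet's --resolv-conf flag points at):

```shell
# Sample resolv.conf with the problematic loopback entry (hypothetical contents)
cat > sample-resolv.conf <<'EOF'
nameserver 127.0.0.53
search example.internal
EOF

# Any 127.x.x.x nameserver here means CoreDNS would forward queries to itself
if grep -qE '^nameserver[[:space:]]+127\.' sample-resolv.conf; then
    echo "loopback nameserver found: potential CoreDNS forwarding loop"
fi
```

On systemd-resolved hosts the usual fix is to point kubelet at the real upstream list (e.g. /run/systemd/resolve/resolv.conf) rather than the stub file.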
I have the same problem, and another problem. My /etc/resolv.conf: […]

1. I ran kubectl -n kube-system get deployment coredns -o yaml | […], and after the pod rebuild there is only one error:

[ERROR] plugin/errors: 2 10594135170717325.8545646296733374240. HINFO: unreachable backend: no upstream host

I don't know if that's normal.

2. CoreDNS cannot find my API service. The error is: kube-dns Failed to list *v1.Endpoints […] getsockopt: 10.96.0.1:6443 api connection refused. CoreDNS restarts again and again, and at last goes to CrashLoopBackOff, so I have to run CoreDNS on the master node. I do that with kubectl edit deployment/coredns --namespace=kube-system. I don't know if that's normal.

Lastly, my environment: Linux 4.20.10-1.el7.elrepo.x86_64 (CentOS 7), Docker version 18.09.3, Kubernetes 1.13.3. [root@k8smaster00 ~]# docker image ls -a: […]

I think this is a bug. Expecting an official update or a solution.
@mengxifl, those errors are significantly different from the ones reported and discussed in this issue.
Those errors mean that the CoreDNS pod (and probably all other pods) cannot reach your nameservers. This suggests a networking problem from your cluster to the outside world - possibly a flannel misconfiguration or firewalls.
This is also not normal. If I understand you correctly, you are saying that CoreDNS can contact the API from the master node but not from other nodes. This would suggest pod-to-service networking problems between nodes within your cluster - perhaps an issue with flannel configuration or firewalls.
Thank you for your reply. Maybe I should put up the yaml files I use. My config.yaml content is: […]
My flannel yaml is the default: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
So I don't think the firewall is the issue. Maybe flannel? But I use the default config. Or maybe the Linux version, I don't know. OK, I ran […] on all my nodes and that worked for me. Thanks.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT

Versions
kubeadm version: 1.11

Environment:
- Kubernetes version (kubectl version): 1.11
- Kernel (uname -a): 3.10.0-693.17.1.el7.x86_64

What happened?
After kubeadm init, the coredns pods stay in Error. The logs of both pods show the following:

standard_init_linux.go:178: exec user process caused "operation not permitted"