CoreDNS pods left in crashloopbackoff state after running upgrade #6596

thegreenbear · 2020-08-27T10:29:47Z

Environment:

Cloud provider or hardware configuration: bare-metal
OS: Fedora CoreOS v31.20200517.3.0
Version of Ansible: 2.9.6
Version of Python: 3.7.7

Kubespray version (commit): 39fa950

Network plugin used: calico and flannel

Full inventory with variables:
Relevant bits from k8s-cluster.yml group vars:
kube_version: v1.16.10
kube_network_plugin: calico
dns_mode: coredns

Command used to invoke ansible:
ansible-playbook -b -i $inventory kubespray/upgrade-cluster.yml -vv

Output of ansible run:
See gist
The interesting bit is:

stderr: |-
    W0826 09:01:43.472706  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
    W0826 09:01:43.493193  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
            [WARNING CoreDNSUnsupportedPlugins]: start version '1.6.5' not supported
            [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.5' not supported
    W0826 09:01:48.741707  129030 dns.go:245] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.5' not supported. The existing CoreDNS Corefile configuration has been retained.
  stderr_lines: <omitted>
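
For anyone who wants to spot this condition automatically, here is a minimal sketch that scans the captured kubeadm stderr for the two warnings quoted above. The function name and the choice of marker strings are illustrative assumptions, not part of kubeadm or Kubespray; the wording of the warnings may differ in other kubeadm versions.

```python
import re

# Marker text copied from the kubeadm warnings quoted in this issue.
# Assumption: other kubeadm versions may word these warnings differently.
MIGRATION_FAILED = re.compile(
    r"CoreDNS will not be upgraded|CoreDNS Configuration was not migrated"
)


def coredns_migration_failed(stderr: str) -> bool:
    """Return True if any stderr line reports a failed CoreDNS migration."""
    return any(MIGRATION_FAILED.search(line) for line in stderr.splitlines())


sample = """\
        [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.5' not supported
W0826 09:01:48.741707  129030 dns.go:245] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.5' not supported.
"""

print(coredns_migration_failed(sample))
```

A check like this only tells you that kubeadm gave up on the migration; it says nothing about what state the deployment was left in.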

Anything else do we need to know:
CoreDNS version deployed is 1.6.5

Analysis
It seems that running kubeadm upgrade always attempts to migrate the CoreDNS configuration, whether or not it needs to.
But a given kubeadm version only seems to support migration from certain CoreDNS versions (which would make sense).
When the deployed version is not one of them, kubeadm fails to migrate CoreDNS but will still:

edit the config map
edit the deployment

This results in the following invalid deployment:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  [...]
  template:
    [...]
    spec:
      [...]
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: docker.io/coredns/coredns:1.6.5
      [...]
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile-backup
            path: Corefile-backup

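The container is told to read its config from /etc/coredns/Corefile (the -conf argument), but the ConfigMap volume only projects the Corefile-backup key, to a file named Corefile-backup. No file named Corefile is mounted, so the path given on the command line and the path where the config actually lives are different.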
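
To make the mismatch concrete, here is a minimal sketch that checks a Deployment manifest (already loaded into a Python dict) for this inconsistency. The function name is hypothetical. Because the mountPath is elided in the manifest above, the sketch only compares the file name given to -conf against the paths projected by ConfigMap volumes; that simplification is an assumption.

```python
import posixpath


def corefile_is_mounted(deployment: dict) -> bool:
    """Return False if a container's -conf file is not projected by any ConfigMap volume.

    Assumption: only file names are compared, since the mount path is not
    shown in the manifest from this issue.
    """
    pod_spec = deployment["spec"]["template"]["spec"]
    mounted = {
        item["path"]
        for volume in pod_spec.get("volumes", [])
        for item in volume.get("configMap", {}).get("items", [])
    }
    for container in pod_spec.get("containers", []):
        args = container.get("args", [])
        if "-conf" in args:
            idx = args.index("-conf")
            if idx + 1 < len(args):
                if posixpath.basename(args[idx + 1]) not in mounted:
                    return False
    return True


# The relevant fields of the broken deployment shown above.
broken = {
    "spec": {"template": {"spec": {
        "containers": [{"args": ["-conf", "/etc/coredns/Corefile"]}],
        "volumes": [{"configMap": {"items": [
            {"key": "Corefile-backup", "path": "Corefile-backup"},
        ]}}],
    }}}
}

print(corefile_is_mounted(broken))
```

For the manifest in this issue the check fails, because the only projected file is Corefile-backup while the container asks for Corefile.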

I'm not sure what the right solution is, as upgrading kubeadm/Kubernetes is not always easy in prod environments.
The best I can think of is a workaround that detects this migration failure after kubeadm upgrade has run and then fixes the deployment.
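
One possible shape for the "fix the deployment" step is sketched below. This is not the patch mentioned in this issue; it is an illustration under the assumption that the ConfigMap still contains the original Corefile key (the kubeadm warning says the existing Corefile was retained). The function name is hypothetical, and applying the result to the cluster is left out.

```python
import copy


def restore_corefile_mount(deployment: dict) -> dict:
    """Return a copy of the manifest whose ConfigMap volume projects Corefile again.

    Assumption: the CoreDNS ConfigMap still has a "Corefile" key, so pointing
    the volume item back at it is enough to match the -conf argument.
    """
    fixed = copy.deepcopy(deployment)
    for volume in fixed["spec"]["template"]["spec"].get("volumes", []):
        for item in volume.get("configMap", {}).get("items", []):
            if item.get("key") == "Corefile-backup":
                item["key"] = "Corefile"
                item["path"] = "Corefile"
    return fixed


# The relevant fields of the broken deployment from this issue.
broken = {
    "spec": {"template": {"spec": {
        "containers": [{"args": ["-conf", "/etc/coredns/Corefile"]}],
        "volumes": [{"configMap": {"items": [
            {"key": "Corefile-backup", "path": "Corefile-backup"},
        ]}}],
    }}}
}

fixed = restore_corefile_mount(broken)
print(fixed["spec"]["template"]["spec"]["volumes"][0]["configMap"]["items"])
```

The function works on a copy so the original manifest can still be inspected or diffed against the repaired one before anything is applied.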

I did create such a patch and it is working. But perhaps someone has a more elegant solution in mind?

Cheers,

floryut · 2020-08-27T10:37:27Z

Yes, we got a lot of issues with that. That's why we now check (when PRs are created) that the CoreDNS version is supported by the corefile-migration library bundled with kubeadm.

floryut · 2020-08-27T11:39:34Z

Also, the bug that left the ConfigMap in an erroneous state was fixed on the Kubernetes side: kubernetes/kubernetes#88811

I suggest closing this issue, as it should not happen with recent versions of either Kubespray or Kubernetes.

thegreenbear · 2020-08-28T08:53:34Z

Sounds good. I am happy to see there is a fix for the root cause. Do you know in which Kubernetes version this is fixed? Should I still bother creating a PR with my proposed workaround, with a TODO to remove it once older versions are no longer supported, or do you think it's not worth it? I'm just wondering whether we're the only ones bothered by the issue or not :-)

Cheers,

floryut · 2020-08-28T09:02:41Z

Looks like the fix in Kubernetes landed in 1.19, so pretty recent.

It's nice of you to have a patch, and it might be useful for some people (if they land on this issue while searching), so feel free to paste it here.
But I don't think we would merge it into master, as it will be obsolete really soon (and we have pinned the CoreDNS version since 2.13 to be sure not to hit this error) 😄

floryut · 2020-08-31T07:05:40Z

/close
@thegreenbear feel free to post your patch here, if anyone happens to need it.
Otherwise, tl;dr: the CoreDNS version must be supported by the corefile-migration lib bundled with Kubernetes (since k8s 1.15), or you will end up with weird behavior during upgrade/deploy.

k8s-ci-robot · 2020-08-31T07:05:47Z

@floryut: Closing this issue.

thegreenbear · 2020-08-31T07:13:24Z

Thanks @floryut .
For those interested in a patch, the following works for me: 0001-Edited-kubeadm-upgrade-to-fix-CoreDNS-deployment-whe.txt

thegreenbear added the kind/bug label Aug 27, 2020

k8s-ci-robot closed this as completed Aug 31, 2020
