CoreDNS pods left in crashloopbackoff state after running upgrade #6596

Closed
thegreenbear opened this issue Aug 27, 2020 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@thegreenbear
Contributor

Environment:

  • Cloud provider or hardware configuration: bare-metal
  • OS: Fedora CoreOS v31.20200517.3.0
  • Version of Ansible: 2.9.6
  • Version of Python: 3.7.7

Kubespray version (commit): 39fa950

Network plugin used: calico and flannel

Full inventory with variables:
Relevant bits from k8s-cluster.yml group vars:
kube_version: v1.16.10
kube_network_plugin: calico
dns_mode: coredns

Command used to invoke ansible:
ansible-playbook -b -i $inventory kubespray/upgrade-cluster.yml -vv

Output of ansible run:
See gist
The interesting bit is:

stderr: |-
    W0826 09:01:43.472706  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
    W0826 09:01:43.493193  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
            [WARNING CoreDNSUnsupportedPlugins]: start version '1.6.5' not supported
            [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.5' not supported
    W0826 09:01:48.741707  129030 dns.go:245] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.5' not supported. The existing CoreDNS Corefile configuration has been retained.
  stderr_lines: <omitted>

Anything else do we need to know:
CoreDNS version deployed is 1.6.5

Analysis
It seems that running kubeadm upgrade makes it attempt to migrate the CoreDNS configuration, whether or not a migration is needed.
However, a given kubeadm version seems to support migrating only from certain CoreDNS versions (which would make sense).
In some scenarios, kubeadm cannot migrate CoreDNS, yet it will still:

  • edit the config map
  • edit the deployment

This results in the following invalid deployment:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  [...]
  template:
  [...]
    spec:
      [...]
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: docker.io/coredns/coredns:1.6.5
      [...]
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile-backup
            path: Corefile-backup

The config file path passed to the container command (/etc/coredns/Corefile) does not match the file actually mounted from the ConfigMap (Corefile-backup).
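To make the mismatch concrete, here is a minimal Python sketch that checks a pod spec (already loaded as a dict) for this condition. It is not the patch mentioned in this issue. The function names are hypothetical, and it assumes the ConfigMap volume is mounted at the directory of the `-conf` path, since the volume mount is elided in the manifest above.

```python
import posixpath


def conf_path(container):
    """Return the value following the -conf flag in a container's args, if any."""
    args = container.get("args", [])
    for flag, value in zip(args, args[1:]):
        if flag == "-conf":
            return value
    return None


def unmounted_corefile(pod_spec):
    """Return the -conf path when no configMap item projects a file of that name.

    Assumption: the configMap volume is mounted at the directory of the -conf
    path, so only file names are compared.
    """
    projected = set()
    for volume in pod_spec.get("volumes", []):
        config_map = volume.get("configMap")
        if config_map is None:
            continue
        items = config_map.get("items")
        if not items:
            # Without an items list every key is projected, so the manifest
            # alone cannot tell us whether the file exists.
            return None
        projected.update(item["path"] for item in items)
    for container in pod_spec.get("containers", []):
        conf = conf_path(container)
        if conf is not None and posixpath.basename(conf) not in projected:
            return conf
    return None


# Pod spec fragment reduced from the deployment shown above.
broken_spec = {
    "containers": [{"args": ["-conf", "/etc/coredns/Corefile"]}],
    "volumes": [
        {"configMap": {"items": [{"key": "Corefile-backup", "path": "Corefile-backup"}]}}
    ],
}

print(unmounted_corefile(broken_spec))
```

For the fragment above, the function reports `/etc/coredns/Corefile`, because the only projected file is `Corefile-backup`.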

I'm not sure what the right solution is, as upgrading kubeadm/Kubernetes is not always easy in prod environments.
The best I can think of is a workaround that detects this migration failure after kubeadm upgrade has run and then fixes the deployment.
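The detection step could be sketched as follows, assuming the kubeadm stderr has been captured as a string and that the warning keeps the exact wording shown in the output above. This is only an illustration with hypothetical names, not the patch referred to in this issue.

```python
import re

# Matches the CoreDNSMigration warning quoted in the ansible output above.
MIGRATION_SKIPPED = re.compile(
    r"\[WARNING CoreDNSMigration\]: CoreDNS will not be upgraded: "
    r"start version '(?P<version>[^']+)' not supported"
)


def unsupported_coredns_version(stderr):
    """Return the CoreDNS version kubeadm refused to migrate, or None."""
    match = MIGRATION_SKIPPED.search(stderr)
    return match.group("version") if match else None


# Two of the stderr lines from the run reported in this issue.
sample_stderr = (
    "        [WARNING CoreDNSUnsupportedPlugins]: start version '1.6.5' not supported\n"
    "        [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: "
    "start version '1.6.5' not supported\n"
)

print(unsupported_coredns_version(sample_stderr))
```

A non-None result would be the signal to go and repair the deployment. Matching on log text is brittle, so checking the deployment itself is probably the safer trigger.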

I did create such a patch and it is working. But perhaps someone has a more elegant solution in mind?

Cheers,

@thegreenbear thegreenbear added the kind/bug Categorizes issue or PR as related to a bug. label Aug 27, 2020
@floryut
Member

floryut commented Aug 27, 2020

Yes, we have had a lot of issues with that. That's why we now check (when PRs are created) that the CoreDNS version is supported by the corefile-migration library bundled with kubeadm.

@floryut
Member

floryut commented Aug 27, 2020

Also, the bug that left the ConfigMap in an erroneous state was fixed on the Kubernetes side: kubernetes/kubernetes#88811

I suggest closing this issue, as it should not happen with recent versions of either Kubespray or Kubernetes.

@thegreenbear
Contributor Author

thegreenbear commented Aug 28, 2020 via email

Sounds good. I am happy to see there is a fix to the root cause. Do you know in which Kubernetes version this is fixed? Should I still bother creating a PR with my proposed workaround, with a TODO to remove it once older versions are no longer supported, or do you think it's not worth it? I'm just wondering if we're the only ones bothered by the issue or not :-)

Cheers,

@floryut
Member

floryut commented Aug 28, 2020

Looks like the fix in Kubernetes landed in 1.19, so pretty recent.

It's nice of you to have a patch, and it might be useful for people who land on this issue while searching, so feel free to paste it here.
But I don't think we would merge it in master, as it will be deprecated really soon (and we have pinned the CoreDNS version since 2.13 to be sure not to hit this error) 😄

@floryut
Member

floryut commented Aug 31, 2020

/close
@thegreenbear feel free to post your patch here, in case anyone needs it.
Otherwise, tl;dr: the CoreDNS version must be supported by the corefile-migration library bundled with Kubernetes (since k8s 1.15); otherwise you will end up with weird behavior during upgrade/deploy.

@k8s-ci-robot
Contributor

@floryut: Closing this issue.

@thegreenbear
Contributor Author

thegreenbear commented Aug 31, 2020

Thanks @floryut .
For those interested in a patch, the following works for me: 0001-Edited-kubeadm-upgrade-to-fix-CoreDNS-deployment-whe.txt
