
promote coredns 1.8.4 in kube 1.22 #99751

Closed
12 tasks done
pacoxu opened this issue Mar 4, 2021 · 17 comments
Assignees
Labels
area/dns kind/feature Categorizes issue or PR as related to a new feature. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

pacoxu (Member) commented Mar 4, 2021

As far as I know, before that we should also update https://github.com/coredns/corefile-migration so that kubeadm upgrades keep working.

As 1.21 code freeze is next week, I'm not sure whether we should support it in this release or next.

I also checked the node local dns part.

https://github.com/ldir-EDB0/dnsmasq/blob/master/CHANGELOG#L111-L114

Cleanup of kube-dns:

BTW, is there a plan to remove https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/kube-dns as deprecated? (kube-dns rbac related issue #60897)

What would you like to be added:

Currently, the CoreDNS version used in Kubernetes is 1.7.0.
Newer releases include 1.7.1, 1.8.0, 1.8.3, and 1.8.4.
kubeadm already supports 1.8.0, so I did a cleanup in #

Why is this needed:

I went through the release notes in coredns/coredns. More info can be found in https://github.com/coredns/coredns/blob/master/notes/coredns-1.8.3.md

/cc @prameshj @rajansandeep

Not in 1.22 as it is not merged yet:

@pacoxu pacoxu added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 4, 2021
prameshj (Contributor) commented Mar 4, 2021

Thanks for taking this up! I have reviewed some of the DNS PRs.

BTW, is there a plan to remove https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/kube-dns as deprecated?
Not yet.

chrisohaver (Contributor) commented:

There is a bug in CoreDNS 1.8.3 (and 1.8.1) that causes CoreDNS to fail to start while the k8s API is down. Eventually once the k8s API is up, CoreDNS will start. I don't know if this is severe enough of an issue to hold back until it is fixed. It would at least have the potential to cause a notable delay in CoreDNS starting, due to crashloop backoff. I don't know if it could cause more serious bootstrap type issues (wherein some process relies on CoreDNS forwarding queries out of cluster before the API can start).
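For context on why CoreDNS startup depends on the API server at all: the `kubernetes` plugin performs an initial watch/sync against the API before serving. A typical kubeadm-style Corefile looks roughly like the sketch below (exact defaults vary by release, so treat this as illustrative):

```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```

In the affected versions, a failed initial sync makes the process exit instead of retrying, so the pod crashloops (with backoff) until the API becomes reachable, delaying even the `forward`-only out-of-cluster resolution.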

neolit123 (Member) commented Mar 4, 2021

@pacoxu

As 1.21 code freeze is next week, I'm not sure whether we should support it in this release or next.

i think it's too late for bumping coredns for 1.21 at this point.
i'm basing this statement on contributor / reviewer bandwidth from the past.

BTW, is there a plan to remove https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/kube-dns as deprecated?

the cluster/addons folder used to be owned by SIG Cluster Lifecycle, but it now falls under the GCP cloud provider, since they consume the addon installer for testing. from my POV the public use of addons/* is deprecated.

@chrisohaver

There is a bug in CoreDNS 1.8.3 (and 1.8.1) that causes CoreDNS to fail to start while the k8s API is down. Eventually once the k8s API is up, CoreDNS will start

thanks for the info, this contributes to my vote to delay the update.
(even if it doesn't look like a severe problem)

pacoxu changed the title from "promote coredns 1.8.3" to "promote coredns 1.8.3 (maybe 1.22)" on Mar 5, 2021
pacoxu (Member, Author) commented Mar 5, 2021

neolit123 (Member) commented:

/sig network cluster-lifecycle
/triage accepted

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 5, 2021
pacoxu (Member, Author) commented Mar 11, 2021

/sig network
/area dns

chrischdi (Member) commented Apr 23, 2021

Hi,

I hope I'm not intruding on this issue, but I may have relevant information.

We already planned to adopt coredns 1.8.3. However, we found an issue regarding headless services.

Beginning with coredns 1.8.3 (I did not find any images for 1.8.1 or 1.8.2 to test), coredns still resolves the names of headless services while the pods themselves are not yet marked as ready.

Am I missing something here, or does this look like a bug or a breaking change?

I have a gist here to reproduce it using kind, which also shows the different behaviour between 1.8.0 and 1.8.3.

Edit: I forgot to mention that the description of svc.spec.publishNotReadyAddresses explicitly says that this behaviour could be toggled.

publishNotReadyAddresses indicates that any agent which deals with
endpoints for this Service should disregard any indications of
ready/not-ready. The primary use case for setting this field is for a
StatefulSet's Headless Service to propagate SRV DNS records for its Pods
for the purpose of peer discovery.
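For anyone reproducing this: the toggle lives on the Service spec itself. A minimal headless Service that explicitly opts IN to publishing not-ready addresses might look like the sketch below (names and port are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-peers              # hypothetical name
spec:
  clusterIP: None             # headless: DNS returns pod IPs directly
  publishNotReadyAddresses: true   # opt in to resolving not-ready pods
  selector:
    app: my-statefulset       # hypothetical selector
  ports:
    - name: peer
      port: 7777
```

With the field left at its default (false), not-ready pods should not resolve; the 1.8.3 behaviour described above ignored that default.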

pacoxu (Member, Author) commented Apr 23, 2021

@chrischdi
I will confirm/test it.
The coredns upgrade is not in yet; if this is confirmed, we may have to wait and move to a stable version later.

chrisohaver (Contributor) commented:

Beginning with coredns 1.8.3 (I did not find any images for 1.8.1 or 1.8.2 to test), coredns still resolves the names of headless services while the pods themselves are not yet marked as ready.

Am I missing something here or does this look like a bug or breaking change?

Thanks for identifying it! Yes, it's a bug, introduced with the move from the Endpoints API to the EndpointSlices API.

I opened a PR to fix it. coredns/coredns#4580

chrischdi (Member) commented:

@pacoxu: we are also running the e2e conformance tests. Shouldn't there be a test for this (something like "a headless service should not resolve a pod that is not yet ready")? I'd be happy to contribute :-)

aojea (Member) commented Jun 8, 2021

/cc

BenTheElder (Member) commented:

cluster/ is still necessary for CI for the immediate future; we should continue to minimally maintain it for the purposes of Kubernetes CI signal. It's not really relevant to GCP, we just run it in GCP because:

  • the jobs exist and work
  • we have the credits necessary to do e.g. scale testing
  • there is no comparable CI. Kind alone is insufficient, and we would want different CI to compare against anyhow

I disagree that this is SIG cloud provider. It's SIG everyone that wants e2e coverage beyond kind.

BenTheElder (Member) commented:

It's also important to note that hack/local-up-cluster.sh shares these addons and is very much used for local development by contributors, e.g. @dims and @liggitt; we should not let the CoreDNS addon lapse:

DNS_ADDON=${DNS_ADDON:-"coredns"}

cp "${KUBE_ROOT}/cluster/addons/dns/${DNS_ADDON}/${DNS_ADDON}.yaml.in" dns.yaml

pacoxu (Member, Author) commented Jun 16, 2021

I made a mistake here: I thought that the dns-autoscaler only supported the old kube-dns, but it supports both kube-dns and coredns.

  1. One practice is running the coredns deployment with 2 or 3 replicas, plus the node-local-dns daemonset (this is GA: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/).
  2. Another practice is the coredns deployment plus the dns-autoscaler (this is the one I missed: https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/).
  3. The old practice is the kube-dns deployment plus the dns-autoscaler (this is deprecated).

I am not sure which is the best practice, or whether we should use coredns + node-local-dns + dns-autoscaler together.

BTW, I will look into #100795 later, for the earlier dns-autoscaling case failure.

chrisohaver (Contributor) commented:

I am not sure which is the best practice. Or we should use coredns + node-local-dns + dns auto-scaler.

My 2 cents: I don't know about best practice, but theoretically it does make sense to use the cluster-proportional-autoscaler in conjunction with a coredns + node-local-dns deployment. However, the cores/nodes-per-replica value should be set much higher, since the load of caching and upstream forwarding is distributed to the node-local-dns instances. The amount of load distributed depends on the ratio of in-cluster to out-of-cluster DNS lookups and on the in-cluster cache hit ratio, both of which can vary widely among clusters.

https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/ doesn't provide guidance on what the value should be. It just says "Modify the fields according to your needs."
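For context, the cluster-proportional-autoscaler reads its scaling target from a ConfigMap in linear (or ladder) mode. A sketch of the linear parameters, with deliberately high per-replica values for a node-local-dns setup (the numbers are purely illustrative, not a recommendation):

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: dns-autoscaler
  namespace: kube-system
data:
  # linear mode: replicas = max(ceil(cores / coresPerReplica),
  #                             ceil(nodes / nodesPerReplica)), at least "min"
  linear: |-
    {
      "coresPerReplica": 512,
      "nodesPerReplica": 64,
      "min": 2,
      "preventSinglePointFailure": true
    }
```

As noted above, the right values depend on the in-cluster/out-of-cluster query ratio and cache hit ratio, so any shipped default would only be a starting point.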

pacoxu changed the title from "promote coredns 1.8.4 (maybe 1.22)" to "promote coredns 1.8.4 in kube 1.22" on Aug 19, 2021
thockin (Member) commented Sep 8, 2021

Did this get done?

pacoxu (Member, Author) commented Sep 9, 2021

Yes, I think so.

9 participants