
change default 5s ttl to 30s for coredns to be same with kube-dns/dnsmasq #76238

Merged

merged 1 commit into kubernetes:master on Apr 23, 2019

Conversation

@Dieken (Contributor) commented Apr 7, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

According to https://coredns.io/plugins/kubernetes/, CoreDNS uses a 5s TTL for the kubernetes zone by default, which is inconsistent with the 30s TTL in the dnsmasq-based kube-dns. Although https://github.com/kubernetes/dns/blob/master/docs/specification.md doesn't specify a TTL value, it's better to keep the two the same. This inconsistency confused me a lot when I benchmarked two Kubernetes clusters, one using dnsmasq and one using CoreDNS; I spent a lot of time locating the root cause.

Special notes for your reviewer:

I use kube-dns only for k8s Services; I'm not sure whether this change is appropriate for other kinds of k8s resources, such as Pods.

Does this PR introduce a user-facing change?:

The default TTL for DNS records in the kubernetes zone is changed from 5s to 30s to stay consistent with the old dnsmasq-based kube-dns. The TTL can be customized with `kubectl edit -n kube-system configmap/coredns`.
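For reference, a minimal sketch of the Corefile stanza this change touches, as it would appear in that ConfigMap (zone names follow the stock kubeadm Corefile; treat the exact layout as illustrative, not authoritative):

```
# The kubernetes plugin block serves records for the cluster zones;
# its ttl directive sets the TTL on the answers it returns.
kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods insecure
    ttl 30   # the new default; was 5 before this change
    fallthrough in-addr.arpa ip6.arpa
}
```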
@k8s-ci-robot (Contributor) commented Apr 7, 2019

Hi @Dieken. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Dieken (Contributor, Author) commented Apr 7, 2019

@mattjmcnaughton

@mattjmcnaughton (Contributor) commented Apr 7, 2019

/ok-to-test

@Dieken (Contributor, Author) commented Apr 8, 2019

/retest

No idea why it failed...

@chrisohaver (Contributor) commented Apr 8, 2019

Behavior change aside: which value makes more sense as a default, 5 or 30 seconds? I think a 30-second TTL, especially for headless Services, is probably too long.

@Dieken (Contributor, Author) commented Apr 9, 2019

1. The old dnsmasq-based kube-dns uses a 30s TTL.
2. Generally, a 5s TTL for a DNS record is too short; a short TTL has a huge performance impact on the service and usually puts unnecessarily high load on CoreDNS.
3. For a normal Service with a cluster IP, the cluster IP is usually stable: people just upgrade the corresponding Deployment, and the Service YAML often stays unchanged.
4. If a 30s TTL is a problem for headless Services, a 5s TTL still has the problem; I guess most people will use normal Services or a better service-discovery mechanism that integrates directly with the kubernetes apiserver.

A longer TTL does lead to stale DNS records, but that is a well-known property of DNS; 5s is not a good default TTL for a DNS record.

@neolit123 (Member) left a comment

The PR needs a release note.
Could we settle on something in between 5 and 30 instead?

/priority backlog
/kind cleanup

@Dieken (Contributor, Author) commented Apr 9, 2019

@neolit123 I added a release note to the PR description.

I would like to just use 30s because kube-dns has used it for a long time and there seem to be no big complaints yet. For most use cases, a default of 30s is good enough.

Actually, the default k8s CoreDNS configuration overrides the TTL of external DNS records, which is very bad; dnsmasq correctly respects upstream DNS servers' TTLs.
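For context, the override refers to the `cache` plugin in the stock Corefile. A minimal sketch of the relevant lines, assuming the usual kubeadm defaults:

```
# Names outside the cluster zones are forwarded to the node's resolvers...
forward . /etc/resolv.conf
# ...and cached for up to 30 seconds: the cache plugin caps the TTL of
# cached upstream answers at 30s, overriding longer upstream TTLs.
cache 30
```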

@chrisohaver (Contributor) commented Apr 9, 2019

> The old dnsmasq-based kube-dns uses a 30s TTL.

Yes. But CoreDNS has used TTL = 5 for many k8s releases, I think since 1.9. We actually had a few issues opened saying that the TTL is too long (i.e., that we cache outdated records for too long). For this reason we now allow users to set the TTL to 0 so the records do not enter the cache at all, but we left 5 as the default.
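A minimal sketch of that opt-out, using the kubernetes plugin's `ttl` directive (stanza trimmed for illustration):

```
kubernetes cluster.local in-addr.arpa ip6.arpa {
    ttl 0   # answers carry TTL 0, so resolvers and clients do not cache them
}
```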

> Generally, a 5s TTL for a DNS record is too short; a short TTL has a huge performance impact on the service and usually puts unnecessarily high load on CoreDNS.

But cluster DNS is not a general DNS use case. Cluster DNS provides records for micro-services, which can change much more rapidly than general DNS applications. I don't see how a short TTL has a huge performance impact on the service: I think most services always look up a name, regardless of the TTL value in the response they got last time, i.e. they don't keep a local cache.

> For a normal Service with a cluster IP, the cluster IP is usually stable: people just upgrade the corresponding Deployment, and the Service YAML often stays unchanged.

I agree that cluster IP Services don't change often. This is, however, only one type of record/response; a 30s TTL will also cache negative responses and endpoint records for 30 seconds.

> If a 30s TTL is a problem for headless Services, a 5s TTL still has the problem,

Sure, there is still a problem with 5... but 6x less of a problem.

> I guess most people will use normal Services or a better service-discovery mechanism that integrates directly with the kubernetes apiserver.

What other k8s service discovery options are there? CoreDNS already integrates with the kubernetes apiserver: it keeps a watch open so it can update records quickly when they change. The 5-second TTL already partly nullifies this; a 30-second TTL would leave outdated information in caches even longer.

@Dieken (Contributor, Author) commented Apr 9, 2019

>> The old dnsmasq-based kube-dns uses a 30s TTL.
>
> Yes. But CoreDNS has used TTL = 5 for many releases, I think since 1.9. We actually had a few issues opened saying that the TTL is too long (i.e., that we cache outdated records for too long). For this reason we now allow users to set the TTL to 0 so the records do not enter the cache at all, but we left 5 as the default.

I guess most k8s users are still on k8s < 1.11, which uses kube-dns by default; at my company we only reached k8s 1.11 several months ago :-) Those users will suddenly suffer a performance regression, and unfortunately I'm one of them: I spent a lot of time debugging DNS resolution failures and latency increases under high QPS.

Of course some people want a small TTL, even zero, and some want a longer TTL; there is no right or wrong here. I just hope for no surprises, and I wonder whether a 5s default TTL is good for most people.

>> Generally, a 5s TTL for a DNS record is too short; a short TTL has a huge performance impact on the service and usually puts unnecessarily high load on CoreDNS.
>
> But cluster DNS is not a general DNS use case. Cluster DNS provides records for micro-services, which can change much more rapidly than general DNS applications. I don't see how a short TTL has a huge performance impact on the service: I think most services always look up a name, regardless of the TTL value in the response they got last time, i.e. they don't keep a local cache.

Java has a DNS cache; Node.js doesn't, and many people have complained about that, which is why https://www.npmjs.com/package/dnscache and https://github.com/eduardbcom/lookup-dns-cache were created. Actually, the most important feature of dnsmasq is its DNS cache.

Somebody reported a 100x QPS increase with a local DNS cache compared to no local cache (sorry, I lost the source URL). At my company we see a 3x QPS increase, and it could be higher if we resolved bottlenecks elsewhere.

A local DNS cache really matters for performance.

>> For a normal Service with a cluster IP, the cluster IP is usually stable: people just upgrade the corresponding Deployment, and the Service YAML often stays unchanged.
>
> I agree that cluster IP Services don't change often. This is, however, only one type of record/response; a 30s TTL will also cache negative responses and endpoint records for 30 seconds.

>> If a 30s TTL is a problem for headless Services, a 5s TTL still has the problem,
>
> Sure, there is still a problem with 5... but 6x less of a problem.

>> I guess most people will use normal Services or a better service-discovery mechanism that integrates directly with the kubernetes apiserver.
>
> What other k8s service discovery options are there? CoreDNS already integrates with the kubernetes apiserver: it keeps a watch open so it can update records quickly when they change. The 5-second TTL already partly nullifies this; a 30-second TTL would leave outdated information in caches even longer.

It's said that some Java developers have created homebrew Java-only RPC solutions that integrate directly with the k8s apiserver, but that isn't mainstream usage in the Kubernetes community.

For my company, DNS-based service discovery is good enough. We often update k8s Deployments but rarely update k8s Services, so the service IP is basically stable, and I guess this is also true for most k8s users. That is why I suggest 30s as the default TTL: it's easy to realize the TTL is too long and customize it, but hard to realize the TTL is too short and causing performance issues.

@chrisohaver (Contributor) commented Apr 9, 2019

Well, making it match kube-dns behavior purely for consistency's sake is a valid argument. And after all, it's just the default value, so if it causes problems for someone, they can change it. I don't think users could change it in kube-dns.

@bowei (Member) commented Apr 22, 2019

/approve
/lgtm

cc: @prameshj

@k8s-ci-robot added the lgtm label Apr 22, 2019

@neolit123 (Member) left a comment

/approve
as per the discussion above.
will monitor for potential kubeadm complaints.

@kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot (Contributor) commented Apr 22, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, Dieken, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot commented Apr 23, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments (@fejta-bot, Apr 23, 2019)

@k8s-ci-robot merged commit 888b81b into kubernetes:master Apr 23, 2019

20 checks passed

- cla/linuxfoundation: Dieken authorized
- pull-kubernetes-bazel-build: Job succeeded.
- pull-kubernetes-bazel-test: Job succeeded.
- pull-kubernetes-conformance-image-test: Skipped.
- pull-kubernetes-cross: Skipped.
- pull-kubernetes-dependencies: Job succeeded.
- pull-kubernetes-e2e-gce: Job succeeded.
- pull-kubernetes-e2e-gce-100-performance: Job succeeded.
- pull-kubernetes-e2e-gce-csi-serial: Skipped.
- pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
- pull-kubernetes-e2e-gce-storage-slow: Skipped.
- pull-kubernetes-godeps: Context retired. Status moved to "pull-kubernetes-dependencies".
- pull-kubernetes-integration: Job succeeded.
- pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
- pull-kubernetes-local-e2e: Skipped.
- pull-kubernetes-node-e2e: Job succeeded.
- pull-kubernetes-typecheck: Job succeeded.
- pull-kubernetes-verify: Job succeeded.
- pull-publishing-bot-validate: Skipped.
- tide: In merge pool.
rfranzke added a commit to gardener/gardener that referenced this pull request Jun 20, 2019
rfranzke added a commit to gardener/gardener that referenced this pull request Jun 21, 2019