avoid excessive search lines in CI #1860

Closed
BenTheElder opened this issue Sep 18, 2020 · 17 comments · Fixed by #3178
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@BenTheElder (Member)

There's not a good way to manage this in KIND: it's reasonable to expect the host to have sane DNS, and it's intentional that we use upstream DNS pointed at the host's DNS resolution. In CI, however, we pick up additional search paths by way of running inside another cluster. We're already using FQDNs for interacting with services in that cluster outside of KIND (namely the bazel build cache), so we shouldn't need search paths at all.

Basically we should mitigate this up front:
kubernetes/test-infra#19080 (comment)

/assign
/priority important-soon

xref: #303

@BenTheElder BenTheElder added the kind/bug Categorizes issue or PR as related to a bug. label Sep 18, 2020
@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 18, 2020
@aojea (Contributor) commented Sep 18, 2020

/cc

@aojea (Contributor) commented Sep 18, 2020

CoreDNS config kubernetes/kubernetes#94794 (comment)

@BenTheElder (Member, Author)

@aojea if you check the context link, the issue we had was in the kubelet sooo ...

@aojea (Contributor) commented Sep 19, 2020

Does not forwarding the queries to the upstream DNS server solve the problem?

@BenTheElder (Member, Author) commented Sep 20, 2020 via email

@aojea (Contributor) commented Sep 20, 2020

Sep 01 19:07:11 kind-worker kubelet[611]: E0901 19:07:11.320994     611 dns.go:125] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: volume-expand-4184-7003.svc.cluster.local svc.cluster.local cluster.local test-pods.svc.cluster.local us-central1-b.c.k8s-infra-prow-build.internal c.k8s-infra-prow-build.internal

too many layers 😄

maybe we should add an option to kubelet in addition to the cluster-domain one?
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--cluster-domain string
  Domain for this cluster. If set, kubelet will configure all containers to search this domain in addition to the host's search domains (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's `--config` flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

this will require a KEP but seems very simple to implement, something like:

--cluster-domain-only bool
  If set, kubelet will not add the host's search domains to the pods
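For reference, the existing --cluster-domain knob is already expressed as the clusterDomain field in the kubelet config file, so a new option of this kind would presumably live next to it. A rough sketch is below; the commented-out clusterDomainOnly field is purely hypothetical and only mirrors the flag proposed above:

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  clusterDNS:
    - 10.96.0.10            # cluster DNS service IP (kind's default service subnet)
  clusterDomain: cluster.local
  # hypothetical, does not exist today: config-file mirror of the proposed
  # --cluster-domain-only flag
  # clusterDomainOnly: true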

@aojea (Contributor) commented Sep 20, 2020

🤔 I'm not sure a new flag will require a KEP, because it is not changing current behavior, just adding a new configuration option.

@thockin what do you think?
What would be the best way to avoid (globally) appending the host's resolv.conf search domains to pods?

@BenTheElder (Member, Author) commented Sep 20, 2020 via email

@aojea (Contributor) commented Sep 20, 2020

my point is that this is not only a KIND problem :) , it is possible that you don't want to have your host search domains in your pods, and AFAIK the possibilities that we have now are:

  • kubelet can use a DNS config file that is not the host's global one

    this option is not great because it forces the admins to keep both files in sync; it would require another layer of orchestration/configuration for the cluster

  • using pod dnsConfig https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config

    that seems overkill if we can just tell the kubelet not to copy the search domains from the host
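For context, the per-pod route looks roughly like the sketch below; it works, but it has to be carried by every pod (or pod template) that should skip the host search domains, which is why it feels like overkill here. The nameserver address and search list are placeholders:

  apiVersion: v1
  kind: Pod
  metadata:
    name: dns-example
  spec:
    containers:
      - name: app
        image: registry.k8s.io/pause:3.9
    dnsPolicy: "None"            # ignore the kubelet-generated resolv.conf entirely
    dnsConfig:
      nameservers:
        - 10.96.0.10             # placeholder: the cluster DNS service IP
      searches:                  # only the cluster domains, no host search domains
        - default.svc.cluster.local
        - svc.cluster.local
        - cluster.local
      options:
        - name: ndots
          value: "5"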

@BenTheElder (Member, Author)

This gets a little confusing to talk about. Reminding anyone in the thread that the layering is:

[GKE Node] -- runs a kubelet, has GKE managed DNS config

[[GKE Pod / ProwJob Pod]] -- docker in docker runs here. We could manage this config ourselves in the same script that sets up docker.

[[[kind "node" container"]]] -- container created by kind to be a "node". We want this to respect the "host" DNS locally using docker embedded DNS. "host" here is the prowjob pod. we should improve the prowjob pod to be a better environment for running KIND.

[[[[kind cluster pods]]]] -- not actually relevant here, the problem in kubernetes/test-infra#19080 (comment) was with kubelet, at the level above (kind "node" container)

> my point is that this is not only a KIND problem :) , it is possible that you don't want to have your host search domains in your pods, and AFAIK the possibilities that we have now are

My point was that it's not a KIND problem, it's a "kind inside kubernetes" problem, which is not something reasonable to optimize for in Kubernetes.

> that seems overkill if we can just tell the kubelet not to copy the search domains from the host

In this case we need the GKE kubelet to do that, which we're not going to be able to configure regardless of whatever upstream options are available 🙃, since it's managed. Upstream options to customize DNS for the kubelet already exist, though.

Just for the GKE cluster pods in which we run kind, we want to reduce the searches; we don't need them in the inner cluster nodes.


We should reduce ndots and searches at the prowjob pod level to look more like a typical host. We're already hacking around Kubernetes in abnormal ways to do the docker-in-docker bit, so tweaking DNS there is not a big deal.
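A minimal sketch of that tweak, assuming it lives in the same wrapper script that already sets up docker-in-docker for the prowjob pod (the path, values, and approach are illustrative, not the actual test-infra change):

  #!/usr/bin/env bash
  # Rewrite the prowjob pod's resolv.conf before starting docker/kind so the
  # kind "node" containers inherit a host-like DNS config: keep the outer
  # cluster's nameserver, drop its search domains, and use the default ndots.
  set -o errexit -o nounset -o pipefail
  nameserver="$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)"
  printf 'nameserver %s\noptions ndots:1\n' "${nameserver}" > /etc/resolv.conf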

@aojea (Contributor) commented Sep 21, 2020

ic, I was trying to kill two birds with one stone, but it seems one was not really a bird 😄 and after looking at the other today I'm not sure that would solve it rather than just mitigate it

@BenTheElder BenTheElder changed the title from "avoid excessive search lines in" to "avoid excessive search lines in CI" on Sep 23, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2020
@BenTheElder (Member, Author)

This one is not the worst thing ever, but it should definitely happen eventually. When we finish sorting out the cgroups in the entrypoint I'm going to try to take a moment to port the dind fixes, including CGROUP_PARENT and this.

@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jan 14, 2021
@BenTheElder BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 14, 2021
@KenMacD commented Dec 8, 2022

I'm not sure if this is the same issue, so please tell me if I should open a new one. I've seen similar behaviour when using a host with an entry in the search domains that points to a domain hosted at Cloudflare with DNSSEC enabled.

In these cases a combination of factors leads to things like service DNS lookups failing. Here's what happens:

  • A golang service compiled against musl attempts to look up 'service.ns.svc.cluster.local'
  • Because Kubernetes/kind set ndots to 5, this query is tried against the search domains first
  • service.ns.svc.cluster.local.ns.svc.cluster.local is tested, NXDOMAIN is received
  • service.ns.svc.cluster.local.svc.cluster.local is tested, NXDOMAIN
  • service.ns.svc.cluster.local.cluster.local is tested, NXDOMAIN
  • service.ns.svc.cluster.local.HOSTDOMAIN is tested. Cloudflare does not return an NXDOMAIN

Because of what Cloudflare returns, musl gives up on resolving the domain, meaning services can no longer be found. This would happen with any golang program linked against musl, but with the default ndots=1 it's rarely seen.

Workarounds exist: either use service.ns to look up the name, or add a . to the end of the name to force a direct lookup.
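To illustrate those two workarounds (assuming an nslookup binary is available in the pod):

  # short form: fewer than 5 dots, so the cluster search list resolves it
  # via the svc.cluster.local suffix before any host domain is consulted
  nslookup service.ns
  # trailing dot: an absolute (fully qualified) name, so the search list,
  # including the host's domains, is skipped entirely
  nslookup service.ns.svc.cluster.local.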

@BenTheElder (Member, Author)

You can customize ndots on your pod, but I recommend not using musl in Kubernetes. There's a whole history of DNS resolver issues there that is not specific to KIND.

@KenMacD commented Dec 8, 2022

Thanks @BenTheElder. Yes, some more workarounds are:

  • Not using musl
  • GODEBUG=netdns=go
  • Setting dns_searches = ["."] in containers.conf, though this affects more than just the cluster. I have no reason for my cluster to need the host search domains, so it works for me.
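In case it helps anyone else, that last setting lives in the [containers] table of containers.conf (a sketch for a podman / containers-common setup; the file path, e.g. /etc/containers/containers.conf, is an assumption here):

  [containers]
  # hand containers a root-only search list instead of inheriting the
  # host's resolv.conf search domains
  dns_searches = ["."]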

You mentioned setting ndots on the pod... there's still no way to set that globally on the cluster, is there? Also, I figured that if ndots was specifically set to 5 instead of the default 1 there was probably a good reason; is there?

@BenTheElder (Member, Author)

Even if there were a way to configure ndots globally on the cluster, that configuration would be highly non-standard and the workload would be broken on other clusters, which isn't really in alignment with the spirit of kind enabling conformant cluster testing.

The host searches are awkward; I don't know the best way for kind to handle this. Ideally the host environment should be reasonable. Previously on this issue we were discussing clusters inside clusters, which causes this problem but can be solved by configuring the DNS on the pod in the outer cluster that kind runs within.

It's 5 by default because of service SRV records.
https://dev.to/imjoseangel/tune-up-your-kubernetes-application-performance-with-a-small-dns-configuration-1o46
This has a better explanation of that already written 😅
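For reference, a pod in a kind (or any default Kubernetes) cluster typically ends up with a resolv.conf along these lines (the nameserver IP and namespace vary), which is why any name with fewer than five dots walks the whole search list before being tried as-is:

  nameserver 10.96.0.10
  search default.svc.cluster.local svc.cluster.local cluster.local
  options ndots:5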

@BenTheElder (Member, Author)

The e2e script should adopt #3097
