avoid excessive search lines in CI #1860
Comments
/cc |
CoreDNS config kubernetes/kubernetes#94794 (comment) |
@aojea if you check the context link, the issue we had was in the kubelet, sooo ... |
doesn't forwarding the queries to the upstream DNS server solve the problem?
At the kind node level ...? Kubelet doesn't talk to CoreDNS ...
|
too many layers 😄 maybe we should add an option to kubelet in addition to the
this would require a KEP but seems very simple to implement, something like the sketch below?
|
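Purely as a hypothetical sketch of what such an opt-out could look like (the `excludeHostSearchDomains` field below does not exist in KubeletConfiguration today; the file path is illustrative):

```sh
# Hypothetical sketch only: "excludeHostSearchDomains" is NOT an existing
# KubeletConfiguration field; it just illustrates the kind of option discussed above.
cat <<'EOF' >/var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /etc/resolv.conf
# hypothetical new field: don't copy the host's search domains into pods
excludeHostSearchDomains: true
EOF
```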
🤔 I'm not sure a new flag will require a KEP, because it is not changing current behavior, just adding a new configuration option. @thockin what do you think? What would be the best way to avoid (globally) appending the host resolv.conf additional search domains to pods? |
We can do it without a KEP in this case: we are able to mutate the file from the entrypoint, as mentioned above, so that the node's DNS config is reasonable.
Kubelet also already has the upstream mechanisms to do this IIRC; kubelet can use a DNS config file that is not the host-global one, but as I said, we don't want kind to have special CI-only behavior here.
The problem is that the CI environment's DNS config is unsuitable. The nested cluster is behaving correctly.
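A rough sketch of the entrypoint-style mutation being described, assuming we simply drop the inherited search domains and cap ndots inside the "node" container (this is not the actual kind entrypoint code):

```sh
#!/bin/sh
# Sketch of an entrypoint-style fixup: drop the search domains inherited from
# the CI environment and keep ndots low, so the "node" container's resolv.conf
# looks like a typical host's.
resolv=/etc/resolv.conf
tmp="$(mktemp)"
grep -v '^search ' "$resolv" | grep -v '^options ' > "$tmp"
echo 'options ndots:1' >> "$tmp"
# /etc/resolv.conf is usually a bind mount inside a container, so rewrite it
# in place rather than replacing the file.
cat "$tmp" > "$resolv"
rm -f "$tmp"
```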
|
my point is that this is not only a KIND problem :) , it is possible that you don't want to have your host search domains in your pods, and AFAIK the possibilities that we have now are
this option is not great because it forces the admins to keep both files in sync; it requires another layer of orchestration/configuration for the cluster
that seems overkill if we can just tell the kubelet not to copy the search domains from the host. |
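For reference, a sketch of the existing kubelet-level mechanism with the sync drawback mentioned above (paths are illustrative): hand the kubelet a trimmed copy of resolv.conf instead of the host-global one.

```sh
# Sketch of the existing workaround (paths are illustrative): give the kubelet
# a trimmed resolv.conf. The drawback noted above is that this copy now has to
# be kept in sync with the real host file by something else.
grep -v '^search ' /etc/resolv.conf > /etc/kubernetes/resolv-nosearch.conf

# The kubelet already supports an alternate resolv.conf via the --resolv-conf
# flag, or the resolvConf field in its config file:
echo 'resolvConf: /etc/kubernetes/resolv-nosearch.conf' >> /var/lib/kubelet/config.yaml
```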
This gets a little confusing to talk about. Reminding anyone in the thread that the layering is:
[GKE Node] -- runs a kubelet, has GKE-managed DNS config
[[GKE Pod / ProwJob Pod]] -- docker in docker runs here. We could manage this config ourselves in the same script that sets up docker.
[[[kind "node" container]]] -- container created by kind to be a "node". We want this to respect the "host" DNS locally using docker embedded DNS. "host" here is the prowjob pod. We should improve the prowjob pod to be a better environment for running KIND.
[[[[kind cluster pods]]]] -- not actually relevant here; the problem in kubernetes/test-infra#19080 (comment) was with kubelet, at the level above (the kind "node" container)
My point was that it's not a KIND problem, it's a "kind inside kubernetes" problem which is not something reasonable to optimize for in Kubernetes.
In this case we would need the GKE kubelet to do that, which we're not going to be able to configure regardless of whatever upstream options are available 🙃; it's managed. Upstream options to customize DNS for the kubelet already exist, though. Just for the GKE cluster pods in which we run kind, we want to reduce the searches; we don't need them in the inner cluster nodes. We should reduce ndots and searches at the prowjob pod level to look more like a typical host. We're already hacking around Kubernetes in abnormal ways to do the docker-in-docker bit, so tweaking DNS there is not a big deal. |
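A sketch of what trimming DNS at the prowjob pod level could look like, using the standard dnsPolicy/dnsConfig pod fields (the nameserver value and file name are placeholders, not the real prow configuration):

```sh
# Sketch only (values are placeholders): give the prowjob pod a minimal DNS
# setup so the kind "node" containers inherit something that looks like a
# normal host.
cat <<'EOF' > prowjob-dns-snippet.yaml
spec:
  dnsPolicy: "None"          # don't inherit the outer cluster's search list
  dnsConfig:
    nameservers:
      - 169.254.169.254      # placeholder upstream resolver
    searches: []             # no search domains at all
    options:
      - name: ndots
        value: "1"
EOF
```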
ic, I was trying to kill two birds with one stone, but it seems one was not really a bird 😄, and after looking at the other today I'm not sure that will solve it, only mitigate it
this one is not the worst thing ever, but it should definitely happen eventually. When we finish sorting out the cgroups in the entrypoint I'm going to try to take a moment to port the dind fixes, including CGROUP_PARENT and this.
I'm not sure if this is the same issue, so please tell me if I should open a new one. I've seen similar behaviour when using a host with an entry in the search domain that points to a domain hosted at Cloudflare with DNSSEC enabled. In these cases a combination of factors leads to things like service DNS lookups failing. Here's what happens:
Because of what Cloudflare returns, musl gives up on resolving the domain, meaning services can no longer be found. This would happen with any Go programs linked against musl, but with the default NDOTS=1 it's rarely seen. Workarounds exist to either use |
You can customize ndots on your pod, but I recommend not using musl in Kubernetes; there's a whole history of DNS resolver issues there that is not specific to KIND. |
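For completeness, a sketch of the per-pod ndots override mentioned above, using the standard dnsConfig field (pod name and image are placeholders); dnsPolicy stays ClusterFirst so in-cluster service discovery keeps working:

```sh
# Sketch of per-pod ndots customization (pod name/image are placeholders).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ndots-example
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    options:
      - name: ndots
        value: "1"
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
EOF
```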
Thanks @BenTheElder. Yes, some more workarounds are:
You mentioned setting |
Even if there is a way to configure ndots globally on the cluster, that configuration would be highly non-standard and this workload would be broken on other clusters, which isn't really in alignment with the spirit of kind enabling conformant cluster testing. The host searches issue is awkward; I don't know the best answer for kind to handle this. Ideally the host environment should be reasonable. Previously on this issue we were discussing clusters inside clusters, which causes this issue but can be solved by configuring the DNS on the pod in the outer cluster that kind runs within. It's 5 by default because service SRV
The e2e script should adopt #3097 |
There's not a good way to manage this in KIND. It's reasonable that we expect the host to have sane DNS, and it's intentional that we use upstream DNS pointed at the host DNS resolution, but in CI we have additional search paths by way of running inside another cluster. We're already using FQDNs for interacting with services in that cluster outside of KIND (namely the bazel build cache), so we shouldn't need search paths at all.
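As a small illustration of why fully qualified names sidestep this (the service name below is a placeholder): a name with a trailing dot is treated as fully qualified, so the resolver skips the search list and ndots expansion entirely.

```sh
# Trailing dot = fully qualified: the resolver skips the search list entirely.
# (Service name is a placeholder.)
getent hosts bazel-cache.bazel-cache.svc.cluster.local.
# Without the trailing dot, a short name may be tried against every search domain first:
getent hosts bazel-cache
```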
Basically we should mitigate this up front:
kubernetes/test-infra#19080 (comment)
/assign
/priority important-soon
xref: #303