
some control plane images lack /etc/nsswitch.conf and will not respect /etc/hosts #69195

Closed
BenTheElder opened this Issue Sep 28, 2018 · 23 comments

BenTheElder commented Sep 28, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: My local Docker-based cluster setup broke after #68890. After much debugging, we discovered that the API server could not reach itself on localhost; further debugging showed that DNS lookups for localhost were occurring. Long story short: although we have /etc/hosts entries for localhost, the API server was not respecting them.

What you expected to happen: Our standard control plane deployments should not look up localhost via DNS; they should respect /etc/hosts.

How to reproduce it (as minimally and precisely as possible): Create a cluster and monitor network traffic, then try again after adding a minimal /etc/nsswitch.conf to the API server pod.
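For example, one way to watch for the stray DNS lookups (a sketch; assumes tcpdump is available wherever the API server's traffic is visible):

```sh
# Watch DNS traffic on all interfaces. With a proper /etc/nsswitch.conf
# in place, queries for "localhost" should stop appearing.
tcpdump -l -ni any port 53 | grep -i localhost
```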

Anything else we need to know?: Prior art: gliderlabs/docker-alpine#367 (comment)

This happens because we build with Go's built-in DNS resolution, which respects glibc's nsswitch.conf, but we base some component images (like the API server's) on busybox, which does not ship one.
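As an illustration (hypothetical snippet, not from the issue): a statically linked program like the one below will consult DNS for localhost when /etc/nsswitch.conf is absent, because Go's pure-Go resolver mimicked glibc's DNS-first fallback at the time. Running it with GODEBUG=netdns=1 makes the net package print which resolution path it chose.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// With no /etc/nsswitch.conf in the image, the pure-Go resolver
	// falls back to glibc's default ordering and queries DNS before
	// ever reading /etc/hosts.
	addrs, err := net.LookupHost("localhost")
	fmt.Println(addrs, err)
}
```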

Environment: Kubernetes using kind

  • Kubernetes version (use kubectl version): v1.13.0-alpha.0.1636+e9fe3f77e9b181

/sig release

BenTheElder commented Sep 28, 2018

/assign
cc @ixdy

Thanks a million to @MrHohn, @neolit123, @justinsb, and @liggitt for looking into this with me! This was tricky to resolve.

BenTheElder commented Sep 28, 2018

We should be able to fix this by just adding a minimal nsswitch.conf to the images to make Go's resolver happy.

Alternatively, we could standardize on e.g. debian-base, which already ships one.
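For the first option, something along these lines should be enough to restore files-first ordering (a sketch; the exact contents would be confirmed in the PR):

```
# minimal /etc/nsswitch.conf: check /etc/hosts before DNS
hosts: files dns
```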

dims commented Sep 28, 2018

oh wow. tough one to debug :)

spiffxp commented Sep 28, 2018

/milestone v1.13
/priority important-soon

k8s-ci-robot added this to the v1.13 milestone Sep 28, 2018

BenTheElder commented Sep 28, 2018

For now I'm adding the nsswitch.conf, but after discussing with @ixdy and @Random-Liu, it sounds like switching to debian-base everywhere is the best long-term answer. It will actually save a MB or so on each node 😅 ... and give a more consistent base for component images to build on.

kfox1111 commented Nov 14, 2018

I just got bit by this.

kfox1111 commented Nov 14, 2018

This issue will bite anyone deploying a statically linked Go binary, not just the API server.

Can we get the kubelet to slide in a sane default nsswitch.conf so it won't keep biting people? Similar to what it does for resolv.conf.

BenTheElder commented Nov 14, 2018

We'd need a feature request to sig-node and probably sig-network.

I do like this idea, but nsswitch.conf is used for things other than networking, so injecting one might be slightly problematic. In this case it was fine because we control the images and the binaries.

For now, people deploying Go images unfortunately need to make sure they have a sane nsswitch.conf, or avoid deploying statically linked binaries.
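For anyone shipping their own static Go binary in a minimal image, a build step along these lines (image and binary names are placeholders) sidesteps the problem:

```dockerfile
FROM busybox
# Bake in a one-line nsswitch.conf so Go's resolver checks /etc/hosts first.
RUN echo "hosts: files dns" > /etc/nsswitch.conf
COPY my-static-binary /usr/local/bin/my-static-binary
ENTRYPOINT ["/usr/local/bin/my-static-binary"]
```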

dims commented Nov 14, 2018

For 1.13, we should just document the problem

kfox1111 commented Nov 14, 2018

Would it be difficult to add the file only if it doesn't already exist in the image?

kfox1111 commented Nov 14, 2018

Workaround for 1.12:

```sh
echo "hosts: files dns myhostname" > /etc/kubernetes/nsswitch.conf
```

and add to the kubeadm ClusterConfiguration:

```yaml
apiServerExtraVolumes:
- name: "some-volume"
  hostPath: "/etc/kubernetes/nsswitch.conf"
  mountPath: "/etc/nsswitch.conf"
  writable: false
  pathType: File
controllerManagerExtraVolumes:
- name: "some-volume"
  hostPath: "/etc/kubernetes/nsswitch.conf"
  mountPath: "/etc/nsswitch.conf"
  writable: false
  pathType: File
schedulerExtraVolumes:
- name: "some-volume"
  hostPath: "/etc/kubernetes/nsswitch.conf"
  mountPath: "/etc/nsswitch.conf"
  writable: false
  pathType: File
```
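Once the static pods restart, you can confirm the mount took effect with something like this (the pod name suffix varies per node, so adjust accordingly):

```sh
kubectl -n kube-system exec kube-apiserver-<node-name> -- cat /etc/nsswitch.conf
```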
BenTheElder commented Nov 14, 2018

Yeah, that workaround will work for now. I'm not sure how difficult / expensive it would be for the kubelet to conditionally add this, and it might be a surprising behavior change for arbitrary containers / pods.

kfox1111 commented Nov 14, 2018

It may be surprising, but it's very surprising now too that DNS unexpectedly takes precedence over the hosts file. :(
I'd rather it be secure by default and surprise them that it's safe. I didn't even know glibc had this backwards behavior when nsswitch.conf was missing. I haven't run up against a glibc-based system that didn't have an nsswitch file; all distros add one.
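For reference, glibc's compiled-in fallback for the hosts database when nsswitch.conf is missing is, as far as I can tell (worth double-checking against your glibc version), equivalent to:

```
hosts: dns [!UNAVAIL=return] files
```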

kfox1111 commented Nov 15, 2018

Hmm... etcd is crashing too. I hope there is an equivalent of etcdExtraVolumes...

kfox1111 commented Nov 15, 2018

Never mind on the etcd thing; that was something else. The above YAML tweaks did solve the issue.

nickperry commented Nov 20, 2018

We have just wasted a TON of time on this, when a misconfigured Samsung TV on the other side of the world, plus a lack of rigor in our DDNS policy, broke our Kube 1.12 clusters.

I really don't like that these control plane images are hitting DNS for every lookup of localhost. I am very keen to see #71082 implemented.

kfox1111 commented Nov 20, 2018

@BenTheElder I would argue that 1.12 was a broken release; 1.11 and below did not have the issue. As Kubernetes matures, some of us, especially in more enterprise-like environments, are not upgrading as fast, so issues are being found more slowly. I think backporting the nsswitch.conf fix for the control plane images is appropriate to resolve the regression until a more universal fix can be put in place.

dims commented Nov 21, 2018

@nickperry that sounds like a great WHODUNIT war story :)

BenTheElder commented Nov 21, 2018

@kfox1111 generally I'm not upgrading production workloads particularly quickly either :-)

We can file cherry picks back to 1.12, but the lack of an /etc/nsswitch.conf in the control plane images is not new: e.g. k8s.gcr.io/kube-apiserver:v1.11.1 does not contain an /etc/nsswitch.conf.

IMO, kubeadm setting ClusterFirstWithHostNet on control plane pods was certainly broken, which I questioned, but not just for this reason, and that change was eventually reverted IIRC. /cc @neolit123

Regarding #71082: because Kubernetes wants to be stable, there are processes for proposing and vetting features prior to adding them, particularly for user-facing changes! If you'd like to see this, please discuss it with sig-network and sig-node.
I don't personally have the bandwidth for this right now, and I'm not sure it is even the right solution.

neolit123 commented Nov 21, 2018

> IMO, kubeadm setting ClusterFirstWithHostNet on control plane pods was certainly broken, which I questioned, but not just for this reason, and that change was eventually reverted IIRC.

Yes, the change was reverted.
It also spawned a bit of discussion about a rather complicated workaround:
#71010 (comment)

Given this affects images that are already deployed to GCR, it's unlikely that it can be cherry-picked. The images do support revisions, but that would also mean porting changes to all the bundled tools that use them as hardcoded tags, and it would mean revisions for all patch release images.

Overall, I see this as something that will not be approved by sig-network and the GCR maintainers.

kfox1111 commented Nov 26, 2018

I guess to prevent a future regression, we probably need a test that inserts a localhost.example.com record somewhere during the test.
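Something like this, perhaps (just a sketch; assumes the test runs inside the image under test):

```go
package resolver_test

import (
	"net"
	"testing"
)

// Fail if "localhost" ever resolves to a non-loopback address, which
// would indicate that a DNS answer won out over /etc/hosts.
func TestLocalhostResolvesToLoopback(t *testing.T) {
	addrs, err := net.LookupHost("localhost")
	if err != nil {
		t.Fatal(err)
	}
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip == nil || !ip.IsLoopback() {
			t.Errorf("localhost resolved to non-loopback address %q", a)
		}
	}
}
```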
