Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listen peer URLs accepts domain names incorrectly #6336

Closed
chancez opened this issue Sep 2, 2016 · 13 comments
Closed

Listen peer URLs accepts domain names incorrectly #6336

chancez opened this issue Sep 2, 2016 · 13 comments
Labels
Milestone

Comments

@chancez
Copy link
Contributor

chancez commented Sep 2, 2016

From the docs: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md#--listen-peer-urls etcd should not allow domain names in listen-peer-urls, but I'm finding that it does.

I don't actually know if this causes any problems, but it seems like it shouldn't be allowed regardless if it does work.

Unfortunately, this was found in my setup of etcd 2.3.2 so I didn't test with 3.0.0 nor 2.3.7 since neither are included in the CoreOS image and I didn't have time to check if this is true of the latest releases. However, I believe it does work, as @dghubble's examples in coreos-baremetal set up etcd using the domain name as peer URLs, which he says has been working.

My etcd2.service is located in the gist along with logs. As you can see, I'm getting no validation errors, and etcd proudly proclaims it's listening on a domain name + port.

https://gist.github.com/chancez/2b97ae9d2b0c225e98ac964e740f27bd

@heyitsanthony
Copy link
Contributor

I tried to reproduce this on my machine:

12:01:07 etcd1 | Starting etcd1 on port 5000
12:01:07 etcd1 | 2016-09-02 12:01:07.593722 I | etcdmain: etcd Version: 2.3.2
12:01:07 etcd1 | 2016-09-02 12:01:07.593762 I | etcdmain: Git SHA: ce63f10
12:01:07 etcd1 | 2016-09-02 12:01:07.593767 I | etcdmain: Go Version: go1.6.3
12:01:07 etcd1 | 2016-09-02 12:01:07.593771 I | etcdmain: Go OS/Arch: linux/amd64
12:01:07 etcd1 | 2016-09-02 12:01:07.593776 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
12:01:07 etcd1 | 2016-09-02 12:01:07.593781 W | etcdmain: no data-dir provided, using default data-dir ./infra1.etcd
12:01:07 etcd1 | 2016-09-02 12:01:07.635226 C | etcdmain: listen tcp: lookup what.ever on 75.75.75.75:53: no such host
12:01:07 etcd1 | Terminating etcd1

From the log, etcd seems to be reporting a DNS error (oddly, it is not "no such host") and terminating:

Sep 02 17:19:34 node1.chancez.test etcd2[799]: listen tcp: lookup node1.chancez.test: Temporary failure in name resolution

However, etcd then restarts and can suddenly listen:

Sep 02 17:19:44 node1.chancez.test etcd2[956]: listening for peers on http://node1.chancez.test:2380

What's going on with your DNS? dig node1.chancez.test?

@chancez
Copy link
Contributor Author

chancez commented Sep 2, 2016

@heyitsanthony it's just an A record pointing to 172.17.0.21 from my dnsmasq.

@heyitsanthony
Copy link
Contributor

OK, I see. You're right there's no validation that it's not a domain name. Some cursory googling shows that some people rely on this behavior though. Is there a reason not to support binding to a domain name so long as the resolved IP is available for binding? /cc @xiang90

@chancez
Copy link
Contributor Author

chancez commented Sep 2, 2016

@heyitsanthony it just seems...wrong to me. You don't "bind" to domain names. You bind to an interface, so if its doing dns queries to "figure out" its IP and bind to that, that just seems way too magic and way too likely to be incorrect in practice.

Imagine if the domain name is pointing to the IP of your load balancer, it would cause etcd to incorrectly try to bind using an IP it doesn't have, which seems like a bad thing. Also: this feels like a potential exploitation avenue, maybe @gtank has feelings on that though.

This has always been documented this way, so I don't think breaking people is a huge concern IMO as its been documented this way for a long time. I know this because I was the one who pushed for the docs to explicitly call this out. I'm a bit curious if this ever got validated though..I recall when I asked for the docs to explicitly call out IPs only, that there was code to validate it,..but i might be imagining things.

While I don't think this a feature etcd needs/should have, even if it was entirely a reasonable feature it probably shouldn't be allowed by default. If you wanted this to be a feature, it should be behind another option.. like --allow-listen-domains or something.

@heyitsanthony heyitsanthony added this to the v3.1.0 milestone Sep 2, 2016
@gtank
Copy link
Contributor

gtank commented Sep 6, 2016

Since etcd is security-relevant to several projects, we probably should err on the side of not letting a DNS compromise escalate (for example) to control of a Kubernetes cluster.

If there's a user need for this exposure we can think through it again, but if there isn't then why not eliminate the risk?

@hongchaodeng
Copy link
Contributor

Here's the use case. When GCE allocates a machine, it gives a name "kubernetes-master", etc. When the machine restarts, it could change internal/external IP. Name is the most convenient handle for network access.

@chancez @gtank Any thoughts on this?

@philips philips reopened this Sep 12, 2016
@philips
Copy link
Contributor

philips commented Sep 12, 2016

We should revert #6365. We can make a security note about using DNS in our docs but we just broke existing users without any transition or documentation updates.

@philips
Copy link
Contributor

philips commented Sep 12, 2016

@hongchaodeng do you have a link to the kubernetes upstream issue?

@hongchaodeng
Copy link
Contributor

Here's my opinion:
For 3.0.x we shouldn't break existing feature.
This is security fix and we will need it to prevent user from making mistakes. Let's move the change to 3.1+.

@xiang90
Copy link
Contributor

xiang90 commented Sep 13, 2016

@heyitsanthony Probably we shall print out security warning to start with?

@heyitsanthony
Copy link
Contributor

@xiang90 no one is going to pay attention to those warnings. Let's just revert in 3.0 and do the full change in 3.1? If it's serious enough to warrant a lock out, I don't think it makes sense to bother with a warning.

@xiang90
Copy link
Contributor

xiang90 commented Sep 13, 2016

@heyitsanthony OK. Let's revert this for now. We can discuss about how to "fix" this later.

@xiang90
Copy link
Contributor

xiang90 commented Sep 15, 2016

Move this to 3.2. We only print our warning and communicate we are going to do this in 3.1 for compatibility and ux reasons.

heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jan 26, 2017
heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jan 26, 2017
heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jan 26, 2017
gyuho added a commit to gyuho/etcd that referenced this issue Dec 7, 2017
gyuho added a commit to gyuho/etcd that referenced this issue Dec 8, 2017
denverwilliams added a commit to denverwilliams/kubernetes-linuxkit that referenced this issue Feb 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

6 participants