query is vulnerable to DNS hijacking when hostname is a second-level domain #6

Closed
John-Nagle opened this Issue Apr 4, 2012 · 2 comments

@John-Nagle

Here's a DNS hijacking exploit with dnspython.
The host name of this host is "sitetruth.com", running CentOS 6, 64 bit, Python 2.7 and dnspython 1.9.4.

Here, we look up "noexample.com", which is a nonexistent domain.

>>> import dns.resolver
>>> resolv = dns.resolver.Resolver()
>>> resolv.domain
<DNS name com.>
>>> resolv.query("noexample.com")
<dns.resolver.Answer object at 0x2984b90>
>>> result = resolv.query("noexample.com")
>>> result
<dns.resolver.Answer object at 0x2984e10>
>>> result[0]
<DNS IN A rdata: 64.30.224.112>

64.30.224.112 is "search.com", an ad-heavy search site. We've had a DNS hijacking.

Compare what the Linux "host" command returns:

> host noexample.com
Host noexample.com not found: 3(NXDOMAIN)

So "host" gets it right, and dnspython gets hijacked. This isn't a problem out in DNS, and it is not caused by a bad DNS server. It's local: a client-side problem.

Here's what's going on. Notice "resolv.domain" above, with a value of "com". The host name is "sitetruth.com", so the "domain" of the host is "com". If the initial lookup fails, the "domain" is appended to the query and the query is retried. So the second lookup is for "noexample.com.com", and the query has thus been hijacked to "com.com".
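The chain from host name to hijacked query can be modeled in a few lines. This is a simplified sketch, not dnspython's code: `implicit_domain` and `retry_name` are hypothetical helpers that treat names as dot-separated strings and assume, as described above, that the default domain is the host name minus its first label and is appended to a relative name on retry.

```python
def implicit_domain(hostname: str) -> str:
    """Hypothetical model: default domain = host name minus its first label."""
    parts = hostname.rstrip(".").split(".", 1)
    # A single-label host name leaves only the root domain.
    return parts[1] + "." if len(parts) == 2 else "."


def retry_name(qname: str, domain: str) -> str:
    """Hypothetical model: name tried after a relative lookup fails."""
    suffix = domain.strip(".")
    # Append the domain to the name as typed; the root domain adds nothing.
    return f"{qname}.{suffix}." if suffix else qname + "."


domain = implicit_domain("sitetruth.com")
print(domain)                               # com.
print(retry_name("noexample.com", domain))  # noexample.com.com.
```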

The proprietors of "com.com" have their DNS server set up to accept all queries for any unknown subdomain under "com.com" and divert it to the IP address of "search.com", a fake search engine which returns mostly ads.

The bug is that there should never be a second search with the "domain" appended when "domain" is a top-level domain. If "domain" doesn't have at least 2 components, it should not be used.
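The proposed rule could be expressed as a simple check on the domain before it is used for a retry. A minimal sketch, assuming the domain is an absolute name string; `has_enough_components` is a hypothetical helper, not part of dnspython:

```python
def has_enough_components(domain: str, min_components: int = 2) -> bool:
    """Hypothetical check for the proposed rule: skip a too-short domain."""
    # Count the non-root labels: "com." -> 1, "hostgator.com." -> 2.
    components = [label for label in domain.split(".") if label]
    return len(components) >= min_components


for domain in ["com.", "hostgator.com."]:
    print(domain, has_enough_components(domain))
# com. False
# hostgator.com. True
```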

Most hosts have longer primary names such as "gator123.hostgator.com"; web sites they host are aliased. They don't hit this problem. But hosts whose host name is a primary domain name in ".com" are vulnerable.

It's possible to work around this bug by assigning

resolv.domain = dns.name.from_text("")

This prevents such hijacking.

The same bug exists in glibc's "getaddrinfo". See the glibc report for more detail, and for how this interacts with the relevant RFC.
Ref: "http://sourceware.org/bugzilla/show_bug.cgi?id=13935"

@rthalley
Owner
rthalley commented Apr 7, 2012

I could easily make it so that there was no implicit searching for hosts which had names immediately below a GTLD. But this doesn't really fix the problem.

Consider a host "just-an-example.co.uk". Its implicit search domain is "co.uk.", and this passes the proposed "at least 2 components" rule (which is really a "you need at least three labels" rule, counting the root label). But if the co.uk registry ever registered "uk.co.uk", then the owners of that domain could do the same thing as com.com. A lookup for nonexistent-name.co.uk on just-an-example.co.uk would cause a lookup for nonexistent-name.co.uk.co.uk. For that matter, if 'uk.co.uk' were well behaved but delegated 'co.uk.co.uk' to someone who was not so nice, the same problem would occur.
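The co.uk case can be checked with plain string handling. This sketch assumes, as described above, that the implicit search domain is the host name with its first label stripped; `label_count` is a hypothetical helper that counts the root label too:

```python
def label_count(name: str) -> int:
    """Hypothetical helper: labels in an absolute name, counting the root."""
    return len([label for label in name.split(".") if label]) + 1


hostname = "just-an-example.co.uk"
# Implicit search domain: the host name with its first label stripped.
domain = hostname.split(".", 1)[1] + "."
print(domain, label_count(domain))  # co.uk. 3
# The domain passes the proposed "at least three labels" rule...
print(label_count(domain) >= 3)     # True
# ...yet a failed lookup is still retried outside the local boundary.
print("nonexistent-name.co.uk" + "." + domain)  # nonexistent-name.co.uk.co.uk.
```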

The bit of RFC 1535 that you quote in the glibc bug report is right: you don't want to search outside of your local administrative boundary. The current behavior, and your proposed change, attempt to determine this boundary with a simple rule, but as the example above shows, the boundary is far more complex.

The real problem here is the implicit generation of the search domain by stripping a label off of the host name. That's just not the right thing if your host happens to have the same name as the domain you want to search. Really, we'd have been better off if there had never been implicit searching, and if you wanted it you had to configure it. Dnspython's resolver is trying to behave like the usual libc resolver, and it parses resolv.conf in the same way. If I change its behavior, then it will act differently from how people expect, and things will break.

Possible workarounds (any one of them will do):

  1. Administrators on machines whose hostname is their administrative domain should configure domain or search in /etc/resolv.conf explicitly, instead of allowing domain to default.

  2. Set the dnspython resolver object's search list to []

  3. Set the dnspython resolver object's domain to dns.name.root (this is the same as the suggestion in the bug report to use dns.name.from_text(""))

  4. Use absolute names when invoking the resolver, e.g., instead of dns.resolver.query('some-name.com') do dns.resolver.query('some-name.com.') (note the trailing '.')
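A minimal sketch of why workarounds 3 and 4 help. `names_to_try` is a hypothetical helper that models the expansion described in this thread, not dnspython's real code; it does not model the search list, so workaround 2 is not covered.

```python
from typing import List


def names_to_try(qname: str, domain: str) -> List[str]:
    """Hypothetical model of the names a resolver tries, in order."""
    if qname.endswith("."):
        # Workaround 4: an absolute name is tried as-is and never expanded.
        return [qname]
    candidates = [qname + "."]
    suffix = domain.strip(".")
    if suffix:
        # Only a non-root domain produces a second, possibly hijacked, name.
        candidates.append(f"{qname}.{suffix}.")
    return candidates


print(names_to_try("some-name.com", "com."))   # ['some-name.com.', 'some-name.com.com.']
print(names_to_try("some-name.com", "."))      # ['some-name.com.']  (workaround 3)
print(names_to_try("some-name.com.", "com."))  # ['some-name.com.']  (workaround 4)
```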

@John-Nagle

I agree that there's no great solution. The fundamental problem is that domain resolution needs to know what a TLD is. That is not a well-defined concept, and the situation will get worse if ICANN is allowed to add a vast number of proprietary TLDs. As a practical matter, "com.com", "net.net", and "org.org" do capture unassigned domains, and com.com actually diverts traffic to ads. None of the major ccTLDs seem to do this.

Editing "resolv.conf" is not a good option, because it's usually generated by system administration software on modern systems. On some systems, it's regenerated on every reboot, and when a network interface goes up or down. Changing "domain" is often not an option; "webmin" insists that "domain" in resolv.conf agree with DNS. Also, the C implementation of getaddrinfo ignores "search" with an empty list.

I'd argue that dnspython should mimic the behavior of "host" and "nslookup", which don't have this behavior. That appears to be because they use a different interpretation of "ndots" than "getaddrinfo" does. The manual page for "host" says

"Names with fewer dots are interpreted as relative names and will be searched for in the domains listed in the search or domain directive in /etc/resolv.conf."

By comparison, the manual page for "resolv" says

"ndots:n sets a threshold for the number of dots which must appear in a name given to res_query(3) (see resolver(3)) before an initial absolute query will be made. The default for n is 1, meaning that if there are any dots in a name, the name will be tried first as an absolute name before any search list elements are appended to it."

So, with the default "ndots:1", and given an input domain name with two or more components, "host" doesn't do a relative name search at all, while "getaddrinfo" and dnspython do. Given that the RFC is ambiguous, one could arguably go either way, and the approach used in "host" is less likely to produce unexpected behavior. Looking up local one-component names will still work. That handles most of the real-life use cases.
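The two readings of "ndots" can be sketched as two hypothetical functions. They model the behavior as described in the man pages quoted above, not the actual glibc, "host", or dnspython source; in `host_style`, the final fallback to the absolute name for names with too few dots is an assumption.

```python
from typing import List


def getaddrinfo_style(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Hypothetical model of the resolv.conf reading of ndots."""
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{s.strip('.')}." for s in search if s.strip(".")]
    if name.count(".") >= ndots:
        # Enough dots: absolute query first, then the search list as a fallback.
        return [name + "."] + expanded
    # Too few dots: search list first, absolute name last.
    return expanded + [name + "."]


def host_style(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Hypothetical model of the host(1) reading argued for above."""
    if name.endswith("."):
        return [name]
    if name.count(".") >= ndots:
        # Enough dots: treated as absolute, with no search-list expansion.
        return [name + "."]
    # Relative name: search list first, then the absolute name (assumed).
    return [f"{name}.{s.strip('.')}." for s in search if s.strip(".")] + [name + "."]


print(getaddrinfo_style("noexample.com", ["com"]))  # ['noexample.com.', 'noexample.com.com.']
print(host_style("noexample.com", ["com"]))         # ['noexample.com.']
print(host_style("www", ["sitetruth.com"]))         # ['www.sitetruth.com.', 'www.']
```

Under this model, one-component local names such as "www" are still expanded by both functions, which matches the point that local lookups keep working.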

(I discovered this because my web crawler went off into the fake search site at "nosuchdomain.com.com" for each bad link in ".com". That was due to the lookup in glibc. So I tried using dnspython to catch bad domains, and discovered that it had the same problem. At least with dnspython I could stop that behavior by creating a resolver and setting "domain" to empty.)

@rthalley rthalley closed this Mar 31, 2013