_getaddrinfo with family AF_UNSPEC in resolver_ares.py always queries both ipv4 and ipv6 #1012

Closed
asterite3 opened this issue Aug 13, 2017 · 6 comments · Fixed by #1594

@asterite3 asterite3 commented Aug 13, 2017

  • gevent version: any
  • Python version: any
  • Operating System: Linux (did not test other ones, they are probably affected too if ares works on them)

Description:

When the ares resolver is used, gevent.socket.getaddrinfo is implemented by gevent.resolver_ares.Resolver.getaddrinfo, which just calls gevent.resolver_ares.Resolver._getaddrinfo, a wrapper around ares. When the address family parameter is AF_UNSPEC, it uses ares to issue both an ipv4 and an ipv6 query and waits for both results. The relevant code is (https://github.com/gevent/gevent/blob/27db87a/src/gevent/resolver_ares.py#L199):

        if family == AF_UNSPEC:
            ares_values = Values(self.hub, 2)
            ares.gethostbyname(ares_values, host, AF_INET)
            ares.gethostbyname(ares_values, host, AF_INET6)

This is not exactly how libc's (and hence Python's built-in) getaddrinfo works, and it causes problems when the host is a name defined in /etc/hosts for only one of ipv4 or ipv6, but not both. In that case the standard getaddrinfo immediately returns the address it found without querying a DNS server over the network. This is consistent with the man page for getaddrinfo, which says AF_UNSPEC requests addresses for any family (as far as I understand, that means not necessarily both):

ai_family   This  field  specifies  the  desired  address  family  for  the returned
            addresses.  Valid values for this field include  AF_INET  and  AF_INET6.
            The  value  AF_UNSPEC  indicates that getaddrinfo() should return socket
            addresses for any address family (either IPv4 or IPv6, for example) that
            can be used with node and service.

The code in resolver_ares.py, by contrast, forces ares to resolve for both families, so it sends a DNS request over the network for whichever address family the hostname is not defined for in /etc/hosts.
This causes two problems:

  1. If the server can reach its DNS server, this results in an unnecessary request, which leaks private hostnames (a lot of admins configure servers to use a public DNS such as 8.8.8.8) and adds some extra delay.
  2. If the server has a DNS server configured but cannot reach it (for example, because a firewall blocks outgoing traffic, or because of network or DNS server problems), the function may hang for a long time (until some timeout occurs).
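
The lookup order argued for here can be sketched roughly as follows. This is only an illustration of the idea, not gevent or c-ares code: `lookup_in_hosts`, `resolve_unspec` and the `query_dns` callback are hypothetical names, and the sample hosts text mirrors the VM used in the reproduction in this report.

```python
import socket

# Sample hosts content (assumption: mirrors the VM used in this report).
HOSTS_TEXT = """\
127.0.0.1       localhost
127.0.1.1       ares-dns-test

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
"""


def lookup_in_hosts(hosts_text, name):
    """Return [(family, address), ...] for every hosts line that names `name`."""
    matches = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name.lower() not in (n.lower() for n in names):
            continue
        family = socket.AF_INET6 if ":" in address else socket.AF_INET
        matches.append((family, address))
    return matches


def resolve_unspec(hosts_text, name, query_dns):
    """AF_UNSPEC lookup that only touches the network if the hosts file has no match.

    `query_dns(name, family)` is a hypothetical callback standing in for a
    real DNS query; it should return a list of (family, address) pairs.
    """
    local = lookup_in_hosts(hosts_text, name)
    if local:
        return local  # any family found locally is enough, no DNS traffic
    return query_dns(name, socket.AF_INET) + query_dns(name, socket.AF_INET6)
```

With this ordering, a name that exists in the hosts file for ipv4 only is answered locally and `query_dns` is never called, so there is neither a hostname leak nor a timeout when the DNS server is unreachable.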

A hostname that is configured only for ipv4 is rather common - for example, a fresh Ubuntu install adds an ipv4 record for its hostname, but not an ipv6 one. It is also not uncommon for a program to feed its own hostname to getaddrinfo - this happens when it calls getfqdn with no arguments or with the argument 0.0.0.0, for example when gevent's WSGIServer starts: https://github.com/gevent/gevent/blob/master/src/gevent/pywsgi.py#L1485

Reproducing

I used the following script for testing:

import gevent.socket

# ares-dns-test is a hostname of VM, it is an alias in /etc/hosts for 127.0.0.1 but not ::1
print(gevent.socket.getaddrinfo('ares-dns-test', None))

I tested the issue in two test environments - a VM and a docker container. Here is how it looks on the VM (Ubuntu 17.04):

$ cat /etc/resolv.conf 
nameserver 8.8.8.8
$ cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       ares-dns-test

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Running the test script without ares (the regular resolver is used):

$ strace -f -e connect,sendto python test.py
strace: Process 1014 attached
[pid  1014] connect(6, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid  1014] connect(6, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[(2, 1, 6, '', ('127.0.1.1', 0)), (2, 2, 17, '', ('127.0.1.1', 0)), (2, 3, 0, '', ('127.0.1.1', 0))]
[pid  1014] +++ exited with 0 +++
+++ exited with 0 +++

Using ares:

$ GEVENT_RESOLVER=ares strace -f -e connect,sendto python test.py
connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
sendto(6, "\276L\1\0\0\1\0\0\0\0\0\0\rares-dns-test\0\0\34\0\1", 31, MSG_NOSIGNAL, NULL, 0) = 31
...
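
To confirm which record type that `sendto` call asks for, the payload can be decoded by hand. The following is a minimal sketch using only the standard library; `parse_dns_question` is a hypothetical helper, it handles only an uncompressed single-question query, and the bytes are copied from the strace line above.

```python
import struct

# Payload copied from the strace `sendto` line (octal escapes as printed).
PAYLOAD = b"\276L\1\0\0\1\0\0\0\0\0\0\rares-dns-test\0\0\34\0\1"

# Only the two record types relevant to this report.
QTYPE_NAMES = {1: "A", 28: "AAAA"}


def parse_dns_question(packet):
    """Decode the 12-byte header and the first question of a DNS query."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", packet[:12])
    labels = []
    pos = 12
    while packet[pos] != 0:  # each label is length-prefixed; 0 ends the name
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!2H", packet[pos + 1:pos + 5])
    return {
        "id": ident,
        "recursion_desired": bool(flags & 0x0100),
        "qdcount": qdcount,
        "name": ".".join(labels),
        "qtype": QTYPE_NAMES.get(qtype, str(qtype)),
        "qclass": qclass,
    }


print(parse_dns_question(PAYLOAD))
```

The decoded question is for the name ares-dns-test with query type AAAA (28) and class IN (1), i.e. the ipv6 lookup for a hostname that /etc/hosts defines only for ipv4.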

Wireshark shows:
[screenshot: ares-dns]

Now, if we cut off the network (for docker, bringing down the host's docker0 interface has the same effect):

$ ip r
default via 10.0.2.2 dev enp0s3 
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
$ sudo ip route del default
$ # add wrong default GW to simulate network problem/firewall
$ sudo ip route add default via 10.0.2.99
$ time python test.py 
[(2, 1, 6, '', ('127.0.1.1', 0)), (2, 2, 17, '', ('127.0.1.1', 0)), (2, 3, 0, '', ('127.0.1.1', 0))]

real    0m0.069s
user    0m0.056s
sys     0m0.008s
$ time GEVENT_RESOLVER=ares python test.py 
[(2, 1, 6, '', ('127.0.1.1', 0)), (2, 2, 17, '', ('127.0.1.1', 0)), (2, 3, 0, '', ('127.0.1.1', 0))]

real    0m58.049s
user    0m0.040s
sys     0m0.008s

The same happens with getfqdn and WSGIServer:

$ time GEVENT_RESOLVER=ares python -c 'import gevent.socket; gevent.socket.getfqdn()'

real    0m59.051s                                                                             
user    0m0.044s                                                                              
sys     0m0.012s
$ time GEVENT_RESOLVER=ares python -c 'from gevent.pywsgi import WSGIServer; WSGIServer("0.0.0.0:8000").start()'

real    0m58.071s                                                                             
user    0m0.052s                                                                              
sys     0m0.024s

@jamadden jamadden commented Aug 13, 2017

My man page for getaddrinfo says something different about ai_family:

ai_family      The protocol family that should be used.  When ai_family is set to PF_UNSPEC,
               it means the caller will accept any protocol family supported by 
               the operating system.

I don't see anything in either text that definitively says only one or the other will ever be queried, or indeed spells out behaviour when nameservers are/not reachable.

There is a list of differences in how the c-ares resolver works compared to the native resolver. I would consider a PR that summarizes this into a bullet point for that list. The entry about "ipv6 and ipv4 may be considered and returned in different orders" comes close.

Handling of /etc/hosts has long been one of the biggest differences for the c-ares resolver compared to the native resolver (a few of the bullet points reference that explicitly). Given the nature of the c-ares resolver, there's not much we can do about that. That's why the native resolver remains the default. You could file a bug against the c-ares project if you'd like it to short-circuit and abort lookups in some set of circumstances related to the hosts file and network reachability (but given that it's primarily a DNS library, I don't know how that would be received).

UPDATE: This c-ares bug is relevant and explains why gevent does what it does.


@jamadden jamadden commented Aug 13, 2017

Thank you for the detailed report and analysis, by the way.


@asterite3 asterite3 commented Aug 13, 2017

> UPDATE: This c-ares bug is relevant and explains why gevent does what it does.

Yes, c-ares/c-ares#70 is the reason why passing AF_UNSPEC to ares.gethostbyname instead of calling it twice has its own issues. For example, if a host has a name defined in /etc/hosts for both ipv4 and ipv6, the standard resolver's getaddrinfo returns both addresses, while ares returns only one of them:

# cat /etc/hosts
127.0.0.1       localhost testing
::1     localhost ip6-localhost ip6-loopback testing
# python -c 'import gevent.socket; print gevent.socket.getaddrinfo("testing", None)'
[(10, 1, 6, '', ('::1', 0, 0, 0)), (10, 2, 17, '', ('::1', 0, 0, 0)), (10, 3, 0, '', ('::1', 0, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 0)), (2, 2, 17, '', ('127.0.0.1', 0)), (2, 3, 0, '', ('127.0.0.1', 0))]

With the fix applied (passing AF_UNSPEC to ares instead of making two calls):

# GEVENT_RESOLVER=ares python -c 'import gevent.socket; print gevent.socket.getaddrinfo("testing", None)'
[(2, 1, 6, '', ('127.0.0.1', 0)), (2, 2, 17, '', ('127.0.0.1', 0)), (2, 3, 0, '', ('127.0.0.1', 0))]
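
The numeric tuples above are easier to compare once decoded. Here is a small sketch that does so; the lookup tables hard-code the Linux values seen in this output (AF_INET6 is 10 on Linux but differs on other systems), and `describe` and `families_returned` are hypothetical helper names.

```python
# Linux numeric constants as they appear in the output above (assumption:
# these values are platform-specific, e.g. AF_INET6 is 10 only on Linux).
FAMILIES = {2: "AF_INET", 10: "AF_INET6"}
SOCKTYPES = {1: "SOCK_STREAM", 2: "SOCK_DGRAM", 3: "SOCK_RAW"}
PROTOCOLS = {0: "IPPROTO_IP", 6: "IPPROTO_TCP", 17: "IPPROTO_UDP"}

# Results copied from the two runs above.
NATIVE = [(10, 1, 6, '', ('::1', 0, 0, 0)), (10, 2, 17, '', ('::1', 0, 0, 0)),
          (10, 3, 0, '', ('::1', 0, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 0)),
          (2, 2, 17, '', ('127.0.0.1', 0)), (2, 3, 0, '', ('127.0.0.1', 0))]
ARES_UNSPEC = [(2, 1, 6, '', ('127.0.0.1', 0)), (2, 2, 17, '', ('127.0.0.1', 0)),
               (2, 3, 0, '', ('127.0.0.1', 0))]


def describe(entry):
    """Turn one getaddrinfo tuple into (family, socktype, proto, address) names."""
    family, socktype, proto, _canonname, sockaddr = entry
    return (FAMILIES[family], SOCKTYPES[socktype], PROTOCOLS[proto], sockaddr[0])


def families_returned(entries):
    """Sorted list of the distinct address families present in a result."""
    return sorted({FAMILIES[entry[0]] for entry in entries})


for label, result in (("native", NATIVE), ("ares + AF_UNSPEC", ARES_UNSPEC)):
    print(label, families_returned(result))
```

The native resolver's result contains both AF_INET6 and AF_INET entries, while the single AF_UNSPEC call to ares yields AF_INET only, which is the regression described here.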

The fix applied here is:

diff --git a/src/gevent/resolver_ares.py b/src/gevent/resolver_ares.py
index 196e4c47..cf08af46 100644
--- a/src/gevent/resolver_ares.py
+++ b/src/gevent/resolver_ares.py
@@ -197,9 +197,8 @@ class Resolver(object):
         ares = self.ares
 
         if family == AF_UNSPEC:
-            ares_values = Values(self.hub, 2)
-            ares.gethostbyname(ares_values, host, AF_INET)
-            ares.gethostbyname(ares_values, host, AF_INET6)
+            ares_values = Values(self.hub, 1)
+            ares.gethostbyname(ares_values, host, AF_UNSPEC)
         elif family == AF_INET:
             ares_values = Values(self.hub, 1)
             ares.gethostbyname(ares_values, host, AF_INET)

On the other hand, this change fixes the issues described above, removing the problems from gevent itself and leaving some on the ares side. What do you think? Is it better to wait until ares fixes this or implements getaddrinfo (there is also some discussion in c-ares/c-ares#112 and c-ares/c-ares#94)?


@jamadden jamadden commented Aug 13, 2017

I view the current gevent behaviour as a reasonable compromise until the c-ares issue is fixed. Once it is, we can update and adapt.


@jamadden jamadden commented Feb 7, 2018

FWIW, the new-in-1.3a2 dnspython resolver shares this issue. I think we're in a better place to address that, though, since we're handling /etc/hosts directly.


@therealprof therealprof commented Feb 19, 2018

I've danced around AF_UNSPEC problems for quite some time now and have to agree that always querying for both A and AAAA records is the only sensible way to deal with the funkiness of different OSes and, on Linux, of different libc implementations. I don't think that the Ubuntu default configuration errors leading to potential information leaks mentioned by @asterite3 are really important enough to break properly configured systems. If DNS is blocked you're screwed anyway, regardless of whether you (unnecessarily) try to resolve AAAA records.

@jamadden jamadden added this to the 1.5.0 milestone Mar 23, 2020
@jamadden jamadden self-assigned this Apr 27, 2020
@jamadden jamadden added the Type: Enhancement label Apr 27, 2020