-
-
Notifications
You must be signed in to change notification settings - Fork 180
Skip getaddrinfo if "host" is already an IP address. #302
Conversation
elif type == socket.SOCK_DGRAM: | ||
proto = socket.IPPROTO_UDP | ||
else: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure of this section, it reproduces more of getaddrinfo's own logic than I'm comfortable with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine, though. AFAIK you can't really combine SOCK_STREAM
with anything other than IPPROTO_TCP
, same for UDP. And they only make sense for AF_INET
and AF_INET6
.
return None | ||
|
||
try: | ||
addr = ipaddress.ip_address(host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put an LRU cache around this call?
Also, wrap _ipaddr_infos in an LRU cache.
I've added another patch now, thanks for your comments. |
@@ -533,9 +535,37 @@ def _getaddrinfo_debug(self, host, port, family, type, proto, flags): | |||
logger.debug(msg) | |||
return addrinfo | |||
|
|||
@functools.lru_cache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I don't want lru_cache
internals to hold references to event loops. How about we factor _ipaddr_infos
method into a top-level function in base_events.py
, returning None
or a tuple. This way we'll also share this cache between different loops. I'd also bump the cache size to 1024 or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point; I've added another commit that moves this to module scope and increases the cache size.
Allows loops to share the _ipaddr_infos cache, and prevents the cache from holding references to loops.
LGTM, but please wait for @gvanrossum's final approval. If you're going to commit this yourself, please squash all commits into one and merge by hand. Don't forget to update the CPython repo! |
Thanks so much Yury. I'll squash and commit here once we hear from Guido, but I don't think I have permissions to update CPython. |
return None | ||
else: | ||
af = socket.AF_INET if addr.version == 4 else socket.AF_INET6 | ||
return [(af, type, proto, '', (host, port))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something makes me worry that a recipient of this function could mutate this list and mess up the cached value. Maybe this should return a tuple and the call site could call list() on the result.
My suggested change should be simple enough. I have no other comments so you can commit with the suggested fix after either Yury or myself approves. |
New patch implements @gvanrossum's suggestion: _ipaddr_infos now returns a tuple and the call site calls list() on the result, to prevent inadvertent mutation of the value in the LRU cache. |
Done: _ipaddr_infos is now _ipaddr_info. |
@ajdavis Looks good! Please merge the PR (and I'll sync CPython repo myself). |
@@ -535,7 +559,12 @@ def _getaddrinfo_debug(self, host, port, family, type, proto, flags): | |||
|
|||
def getaddrinfo(self, host, port, *, | |||
family=0, type=0, proto=0, flags=0): | |||
if self._debug: | |||
info = _ipaddr_info(host, port, family, type, proto) | |||
if info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use is not None
here.
Cool to see the solution. LGTM (except maybe the |
Sorry, I didn't read the history of this pull request. _check_resolved_address() already has code to check if a string is a valid IPv4 or IPv6 address. I didn't remember why exactly, but I remember that we choose to not use the ipaddress module for this function. Recently, the call to _check_resolved_address() was made optional (only done in debug mode) for performance reason. But socket.inet_pton() is now used if available, instead of getaddrinfo(), for performance reason. I would prefer to not use ipaddress or use ipaddress in both functions (_check_resolved_address() & _ipaddr_info()). @functools.lru_cache(maxsize=1024) Is it worth to use a cache on such simple function? |
The reason why The most worrisome part of this code (from a quick glance) seems to be that I do like the fact that the ipaddress module never calls getaddrinfo() - |
IIRC Guido didn't want asyncio to depend on ipaddress, which was a relatively new (and provisional) module in the stdlib back then. I think it's OK now. I like the idea of using Here's what I propose:
The function is simple, the instantiation of |
Sounds good. On Sun, Dec 13, 2015 at 12:34 PM, Yury Selivanov notifications@github.com
--Guido van Rossum (python.org/~guido) |
Interesting.
@gvanrossum @Haypo what do you think? |
FWIW inet_pton() does support those (even though the man page on my Mac
I swear I didn't put that there, so it's probably due to some default Mac I don't know if it's acceptable to just ignore the % stuff if we don't have --Guido On Mon, Dec 14, 2015 at 10:06 AM, Yury Selivanov notifications@github.com
--Guido van Rossum (python.org/~guido) |
This is all clear, thanks - I'll send an updated PR in the next day or two. |
@ajdavis Please reflect this in a comment on why we use ipaddress module as a fallback. I was re-reading the issue, and forgot why we shouldn't try to use getaddrinfo/AI_NUMERICHOST until I saw that Guido's message. |
This has certainly been educational. Thanks for the simplifying suggestions; I've uploaded a new patch. |
try: | ||
addr = ipaddress.IPv4Address(host) | ||
except ValueError: | ||
host = host.partition('%')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't modify the host
variable -- it should be returned as-is.
For me too :) I've left a few more comments, but we're almost there. |
# If we happen to make an invalid address look valid, the code | ||
# will fail later on sock.connect or sock.bind. | ||
if af == socket.AF_INET6: | ||
host = host.partition('%')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary, socket.inet_pton
supports zone indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Mac, inet_pton accepts "::1%lo", and on Windows it accepts "::1%0". On Linux I can't get it to accept any address with a zone index, I don't think Linux's inet_pton supports zone indexes. I think we should keep stripping the zone index before passing to inet_pton, in order to reliably check whether the address is resolved or not. But you're right, I should return the original "host" parameter, not the one with zone index stripped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I just checked and couldn't get it to parse zone index on Linux too :( Then it's worth adding a comment there.
New patch. |
if proto not in {0, socket.IPPROTO_TCP, socket.IPPROTO_UDP} or host is None: | ||
return None | ||
|
||
if type & socket.SOCK_STREAM: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment that on Linux, socket.type
can be a bit-mask. That was something new to me, for instance.
LGTM. @gvanrossum, please take a final look. |
elif family == socket.AF_INET6: | ||
host, port = address[:2] | ||
else: | ||
# Even though getaddrinfo with AI_NUMERICHOST would be non-blocking, it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also move this comment ("Even though ...") to line 113, before we use the ipaddress
module? That will explain why we use it instead of AI_NUMERICHOST
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, new patch.
# On Windows, socket.inet_pton() is only available since Python 3.4. | ||
if hasattr(socket, 'inet_pton'): | ||
if family == socket.AF_UNSPEC: | ||
afs = socket.AF_INET, socket.AF_INET6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would make this a list (and also on line 96 below). Tuples ought to be used for "record"-like structures, and only rarely for read-only sequences. Here there's no particular need for the sequence to be read-only, and the cost of creating a 2-item list vs. a 2-item tuple isn't really worth worrying about given all the other things we do here.
Admirable work! I have a few style nits and one platform question. |
Thanks! All looks good now. FWIW I'd prefer not to see your local revision history in the asyncio logs. |
Squashed and committed in 39c135b |
Pushed to CPython in https://hg.python.org/cpython/rev/51df143c098d and https://hg.python.org/cpython/rev/33c7980552a6 |
getaddrinfo takes an exclusive lock on some platforms, causing clients to queue up waiting for the lock if many names are being resolved concurrently. Users may want to handle name resolution in their own code, for the sake of caching, using an alternate resolver, or to measure DNS duration separately from connection duration. Skip getaddrinfo if the "host" passed into create_connection is already resolved. See python/asyncio#302 for details. Patch by A. Jesse Jiryu Davis. --HG-- branch : 3.4
Response to this discussion:
https://groups.google.com/d/topic/python-tulip/-SFI8kkQEj4/discussion