Add a DNSpython resolver #1088
This adds a resolver implementation using dnspython.org.
The hope was that it would be simpler than c-ares (it is, especially once rthalley/dnspython#300 lands, if it does) and also faster than the threaded resolver.
Here's a trivial benchmark, lifted from the test case:
def resolve(res):
    for index in range(100):
        try:
            res.gethostbyname('www.x%s.com' % index)
        except socket.error:
            pass
All times in seconds, smallest is best, hitting Google's public DNS server (8.8.8.8):
So, dnspython is not nearly as fast as the system (blocking) resolver or the c-ares resolver (even on PyPy), but its overhead is comparable to the threaded resolver's.
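For anyone wanting to repeat the measurement, here is a minimal sketch of a timing harness around that loop. It assumes only that the resolver object has a `gethostbyname()` method; `FakeResolver` and `time_resolver` are hypothetical names made up for illustration and are not part of this PR.

```python
import socket
import time


class FakeResolver:
    """Hypothetical stand-in: anything with a gethostbyname() method works."""

    def __init__(self):
        self.calls = 0

    def gethostbyname(self, hostname):
        self.calls += 1
        # Pretend every lookup fails, as most of the made-up names would.
        raise socket.gaierror("no such host: %s" % hostname)


def resolve(res, count=100):
    # Same loop as the benchmark, with the iteration count as a parameter.
    for index in range(count):
        try:
            res.gethostbyname('www.x%s.com' % index)
        except socket.error:
            pass


def time_resolver(res, repeats=3, count=100):
    """Return the smallest wall-clock time (in seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        resolve(res, count)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Taking the minimum of several runs matches the "smallest is best" convention used for the numbers here; swapping `FakeResolver()` for a real resolver instance should be all that is needed.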
dnspython uses a cache of results. If I change the benchmark to just use a range of 20 and call it twice in a row, I get these results:
On PyPy, dnspython is now faster than c-ares. Note that these results are wildly variable and extremely suspect. For example, on one run I had ares at 8.4s and dnspython at 1.4s, but the next run had ares at 2.2s and dnspython at 23.6s. See below for more reliable numbers.
dnspython is just a DNS library, it does not use
This needs some further documentation updates before merging, and I'm not 100% convinced that it's really worth merging (the refactoring, yes; the dnspython resolver, maybe not, based on those numbers, though we need better numbers). Feedback is, as always, extremely welcome.
Excellent question. I think the answer is "no", based on this vmprof trace (where I'm actually running 200 iterations): http://vmprof.com/#/556030b6-df67-4e71-bc38-120621e3cfbe
But there was so much noise from the DNS servers that it was basically pointless to measure anyway. I've set up a caching DNS server locally and am working to get some more reliable numbers.
New numbers! This script is hitting a single DNS server, a local dnsmasq instance configured to have everything cached. The result is numbers that are much more stable. The default backends were used (corecext and corecffi).
(PyPy (vmprof trace))
The number is the time taken by the total of 300 sequential
So if the threaded resolver doesn't work for you, and you can live without access to
Now, the above benchmarks are completely sequential, not demonstrating any parallelism at all. What happens if we batch up 100 greenlets and 100 DNS queries at the same time?
(Note that under PyPy if I let an exception bubble up, the ares process could crash.)
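The shape of that batched test can be sketched with only the standard library. Note the substitution: this uses `concurrent.futures` threads as a stand-in for greenlets so it runs without gevent installed; under gevent the same pattern would use `gevent.spawn` and `gevent.joinall` instead. `StaticResolver`, `resolve_one` and `resolve_batch` are hypothetical names, and errors are caught inside each worker so that nothing bubbles up.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


class StaticResolver:
    """Hypothetical resolver backed by a dict, for demonstration only."""

    def __init__(self, table):
        self.table = table

    def gethostbyname(self, hostname):
        try:
            return self.table[hostname]
        except KeyError:
            raise socket.gaierror("unknown host: %s" % hostname)


def resolve_one(res, hostname):
    # Swallow lookup failures in the worker; report them as None.
    try:
        return res.gethostbyname(hostname)
    except socket.error:
        return None


def resolve_batch(res, hostnames, workers=100):
    """Issue all lookups concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda name: resolve_one(res, name), hostnames))
```

For example, `resolve_batch(StaticResolver({'a.example': '192.0.2.1'}), ['a.example', 'b.example'])` gives one address and one `None`.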
To put that in relative terms, how much faster was the parallel operation? (What percent of the sequential time did the parallel run take?)
So the dnspython resolver appears to scale better than the ares resolver or the threaded resolver.
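The relative figure is just the parallel time divided by the sequential time. A tiny helper, with a hypothetical name and made-up inputs in the usage note (not measurements from this PR):

```python
def percent_of_sequential(sequential_seconds, parallel_seconds):
    """Express the parallel time as a percentage of the sequential time."""
    if sequential_seconds <= 0:
        raise ValueError("sequential time must be positive")
    return 100.0 * parallel_seconds / sequential_seconds
```

With illustrative inputs of 2.0s sequential and 0.5s parallel this returns 25.0, so lower percentages mean the resolver benefits more from running queries concurrently.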
[EDIT: Simplified the tables.]
dnspython provides another valuable feature: a cache that obeys TTLs and that you can reset. System caches are often not quite compliant and are impossible to expire. This is a great addition IMO.
Could you also add the latest production CPython (3.6) to the benchmark? 3.7 is expected to gain an additional 20% in performance from its new method-resolution instruction set, which 3.6 lacks.
A salient point.
If I enable the dnspython cache, PyPy's time goes from 0.18 to 0.07 for parallel, while 2.7 goes from 0.29 to 0.07 and 3.6 goes from 0.21 to 0.05.
Sequential time effectively becomes the same as the parallel time, after the first iteration.
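To illustrate why repeated lookups get so cheap, here is a toy TTL-obeying, resettable cache. This is only a sketch of the idea and is not dnspython's implementation; `TTLCache` and its methods are hypothetical names, and the clock is injectable so expiry can be exercised without sleeping.

```python
import time


class TTLCache:
    """Toy cache: entries expire after their TTL and can be flushed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # hostname -> (expires_at, address)

    def put(self, hostname, address, ttl):
        self._entries[hostname] = (self._clock() + ttl, address)

    def get(self, hostname):
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        expires_at, address = entry
        if self._clock() >= expires_at:
            # The answer has outlived its TTL; drop it and report a miss.
            del self._entries[hostname]
            return None
        return address

    def flush(self):
        # Reset the cache, which system resolver caches often do not allow.
        self._entries.clear()
```

A hit is a dictionary lookup plus one clock read, so after the first iteration the cost of a "query" no longer depends on whether the lookups are issued sequentially or in parallel.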
The dnspython cache appears to respect the TTL of the DNS answers; although it does incur a syscall (
Feb 1, 2018
@jamadden I decided to try to reproduce the results you got here. I had issues with the threaded DNS resolver, as mentioned here, so I moved to ares. In my experience, ares is fast and stable. I wanted to check whether dnspython can do better than ares.
However, dnspython performs horribly on my machine. Here's the result using your benchmarking script:
dnspython is at least 10x slower than all the other resolvers. I tried with dnspython 1.15.0 and also with the GitHub master branch.
Any clue? Thanks.
The earlier script evolved into a repeatable benchmark shipped with gevent sources. I run it against a local dnsmasq server configured to cache everything (I found there was a tremendous degree of variability based on caches):
$ sudo dnsmasq -d --cache-size=100000 --local-ttl=1000000 --neg-ttl=10000000 --max-ttl=100000000 --min-cache-ttl=10000000000 --no-poll --auth-ttl=100000000000
And then I run the benchmark script like so: (you may want to tweak the values of the two constants to determine how long you want the script to run; don't forget to edit
$ GEVENT_RESOLVER_NAMESERVERS=127.0.0.1 python benchmarks/bench_dns_resolver.py --inherit-environ GEVENT_RESOLVER_NAMESERVERS
Running in Python 3.7.0 on my macOS machine this morning with