
Name resolver with IPv6 support #4227

Merged
merged 16 commits into from Jan 25, 2020

Conversation

@elacuesta (Member) commented Dec 11, 2019

Fixes #1031. Based on #1031 (comment).

Added the ability to enable an experimental name resolver with IPv6 support via the DNS_RESOLVER setting, which defaults to the current scrapy.resolver.CachingThreadedResolver class. So far I couldn't find a "native" way to enforce a specific timeout for DNS requests; the approach in f1c1846 does not work (see https://twistedmatrix.com/trac/ticket/9748). Still, I think we can document this fact properly and let users choose, knowing the implications.

To do:

  • IPv6-specific tests (the current test suite works with the new resolver)
  • Docs
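
For project-wide use, the setting can also go in settings.py instead of being passed with -s on every run. A minimal sketch, assuming the class paths named in this PR:

```python
# settings.py — switch to the experimental IPv6-capable resolver.
# The default is "scrapy.resolver.CachingThreadedResolver"; note that,
# per the Twisted ticket linked above, a DNS timeout is not enforced
# by this resolver.
DNS_RESOLVER = "scrapy.resolver.CachingHostnameResolver"
```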

Shell examples:

$ scrapy shell http://ipv6.google.com -s DNS_RESOLVER=scrapy.resolver.CachingHostnameResolver
(...)
2020-01-16 03:58:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://ipv6.google.com> (referer: None)
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x101d635c0>
[s]   item       {}
[s]   request    <GET http://ipv6.google.com>
[s]   response   <200 http://ipv6.google.com>
[s]   settings   <scrapy.settings.Settings object at 0x104627ef0>
[s]   spider     <DefaultSpider 'default' at 0x104b5bef0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response.url
Out[1]: 'http://ipv6.google.com'
$ scrapy shell http://ipv6.google.com
(...)
2020-01-16 03:59:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://ipv6.google.com> (failed 1 times): DNS lookup failed: no results for hostname lookup: ipv6.google.com.
2020-01-16 03:59:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://ipv6.google.com> (failed 2 times): DNS lookup failed: no results for hostname lookup: ipv6.google.com.
2020-01-16 03:59:38 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://ipv6.google.com> (failed 3 times): DNS lookup failed: no results for hostname lookup: ipv6.google.com.
Traceback (most recent call last):
  File "/.../scrapy/venv-scrapy/bin/scrapy", line 11, in <module>
    load_entry_point('Scrapy', 'console_scripts', 'scrapy')()
  File "/.../scrapy/scrapy/cmdline.py", line 145, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/.../scrapy/scrapy/cmdline.py", line 99, in _run_print_help
    func(*a, **kw)
  File "/.../scrapy/scrapy/cmdline.py", line 153, in _run_command
    cmd.run(args, opts)
  File "/.../scrapy/scrapy/commands/shell.py", line 74, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/.../scrapy/scrapy/shell.py", line 46, in start
    self.fetch(url, spider, redirect=redirect)
  File "/.../scrapy/scrapy/shell.py", line 114, in fetch
    reactor, self._schedule, request, spider)
  File "/.../scrapy/venv-scrapy/lib/python3.6/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/.../scrapy/venv-scrapy/lib/python3.6/site-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: ipv6.google.com.
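
For context, the difference between the two runs above comes down to address families: the new resolver relies on the platform's getaddrinfo, which can return IPv6 (AAAA) results, while the default threaded resolver only yields IPv4 addresses. A quick stdlib check, independent of Scrapy (the hostnames are illustrative):

```python
import socket

def resolve_all(hostname):
    """Return (family, address) pairs for a hostname, IPv4 and IPv6 alike."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved address string.
    return sorted({(family.name, sockaddr[0]) for family, _, _, _, sockaddr in infos})

# An IPv6-only host such as ipv6.google.com yields only AF_INET6 entries;
# "localhost" typically resolves in both families.
print(resolve_all("localhost"))
```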

@codecov (bot) commented Jan 15, 2020

Codecov Report

Merging #4227 into master will increase coverage by 0.12%.
The diff coverage is 94.64%.

@@            Coverage Diff             @@
##           master    #4227      +/-   ##
==========================================
+ Coverage   84.06%   84.19%   +0.12%     
==========================================
  Files         166      166              
  Lines        9730     9799      +69     
  Branches     1454     1467      +13     
==========================================
+ Hits         8180     8250      +70     
+ Misses       1296     1295       -1     
  Partials      254      254
Impacted Files                         Coverage         Δ
scrapy/crawler.py                      89.26% <100%>    (-0.36%) ⬇️
scrapy/settings/default_settings.py    98.71% <100%>    (+0.01%) ⬆️
scrapy/resolver.py                     92.06% <93.02%>  (+2.06%) ⬆️
scrapy/http/response/text.py           100% <0%>        (ø) ⬆️
scrapy/http/request/__init__.py        100% <0%>        (ø) ⬆️
scrapy/utils/defer.py                  97.5% <0%>       (+0.2%) ⬆️
scrapy/http/response/__init__.py       94.11% <0%>      (+0.46%) ⬆️
scrapy/pipelines/__init__.py           92.85% <0%>      (+0.54%) ⬆️
scrapy/core/downloader/__init__.py     90.83% <0%>      (+1.52%) ⬆️
... and 1 more

@elacuesta elacuesta marked this pull request as ready for review Jan 18, 2020
@elacuesta elacuesta removed the discuss label Jan 18, 2020
@kmike (Member) commented Jan 25, 2020

Thanks @elacuesta!

@kmike kmike merged commit 8b8df31 into scrapy:master Jan 25, 2020
2 checks passed
@elacuesta elacuesta deleted the name-resolver branch Jan 25, 2020