IPv6 support? Problem running home page example from an IPv6 network #1031
Comments
Scrapy uses ThreadedResolver (see https://github.com/scrapy/scrapy/blob/master/scrapy/resolver.py), which in turn calls the stdlib socket.gethostbyname, and that function doesn't support IPv6. So yes, Scrapy doesn't support IPv6 right now. It looks like Twisted itself supports IPv6, and Scrapy may be able to support it by switching to some other resolver (from twisted.names?), but I haven't checked the details.
Thanks for the explanation!
By the way, https://docs.python.org/2/library/socket.html#socket.gethostbyname suggests using getaddrinfo instead.
Yes, a ThreadedResolver subclass that uses getaddrinfo looks like an option.
An implementation based on socket.getaddrinfo: #1104
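To illustrate the difference discussed above, here is a minimal stdlib-only sketch of address-family-agnostic lookup with socket.getaddrinfo. It is not the code from #1104; the helper name resolve_all and its defaults are made up for this example.

```python
import socket


def resolve_all(host, port=80):
    """Return unique (family, address) pairs for host.

    Unlike socket.gethostbyname, which only returns IPv4 addresses,
    socket.getaddrinfo with AF_UNSPEC may return IPv4 and IPv6 results.
    `resolve_all` is a hypothetical helper, not part of Scrapy or Twisted.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    results = []
    for family, _type, _proto, _canonname, sockaddr in infos:
        entry = (family, sockaddr[0])  # sockaddr[0] is the address string
        if entry not in results:
            results.append(entry)
    return results


if __name__ == "__main__":
    # Numeric literals are used here so the demo needs no DNS access.
    for family, address in resolve_all("127.0.0.1"):
        print(family.name, address)
```

A resolver built on this kind of call would hand back whichever families the name actually resolves to, so an IPv6-only network would still get usable addresses.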
While #1104 is not in the |
Any progress on this? Apart from the address resolution discussed here, Scrapy seems to be able to make IPv6 requests:
@qknight, is this really still an issue? Can you please provide a test case or a log?
@nyov I don't know anything about the current implementation, but your response suggests that it supports IPv6 now. Is there an explicit switch to force Scrapy to use IPv6?
Indeed, this is still an issue. Scrapy disables Twisted's IPv6 support by installing a non-IPv6-aware resolver. The problem is here: Line 289 in 1fd1702
If you don't want to trust the operating system's DNS caching for some reason, you can use the more modern API to install a custom resolver: https://twistedmatrix.com/documents/18.9.0/api/twisted.internet.interfaces.IReactorPluggableNameResolver.html#installNameResolver and, rather than subclassing a resolver within Twisted (you shouldn't need the internal Hope that this helps! |
I'm running into problems while trying to run the example on the scrapy.org home page from the FOSDEM IPv6-only Wi-Fi network. (The scraper works fine from an IPv4 network.)
If both IPv4 and IPv6 are enabled on my computer (OS X Yosemite) and IPv4 is configured via DHCP, so that it falls back to a self-assigned address (169.254.x.x), then I get timeout errors:
If I turn off IPv4 completely, then Scrapy fails with "No route to host" errors:
Note that I can open the blog.scrapinghub.com site in Safari, so the target web site does support IPv6 and the problem seems to be on scrapy's side.