CachingHostnameResolver does not work with reactor.resolve() #4802
Comments
Thanks for the detailed report!
Nice find! At the moment I can't find where I took the inheritance pattern used in #4227 from. This is indeed weird, but I'm going to take the liberty of removing the "bug" label and adding "upstream issue" instead, because we are in fact receiving a class instead of an instance from Twisted (this might warrant an issue in the Twisted tracker).
@elacuesta Thanks for the quick response! I was able to test out your branch and it did fix the original exception for me. I like your general approach and it was better than what I could come up with late last night 😄

However, the next thing I ran into is that the cache doesn't exactly work in its current implementation. The first DNS request from my scraper resolves fine, but the second request to the same domain will just hang indefinitely. I think that returning the cached value from

Here's what I came up with. I haven't tested thoroughly with IPv4 / IPv6 but it seems to work and has me unblocked for now.

from twisted.internet._resolver import HostResolution

...

    def resolveHostName(
        self, resolutionReceiver, hostName, portNumber=0, addressTypes=None, transportSemantics="TCP"
    ):
        cached_addresses = dnscache.get(hostName)
        if cached_addresses:
            # Cache hit: replay the full callback sequence to the receiver.
            resolutionReceiver.resolutionBegan(HostResolution(hostName))
            for address in cached_addresses:
                resolutionReceiver.addressResolved(address)
            resolutionReceiver.resolutionComplete()
            return resolutionReceiver

        @provider(IResolutionReceiver)
        class CachingResolutionReceiver:
            def __init__(self):
                self.addresses = []

            def resolutionBegan(self, resolution):
                resolutionReceiver.resolutionBegan(resolution)

            def addressResolved(self, address):
                resolutionReceiver.addressResolved(address)
                self.addresses.append(address)

            def resolutionComplete(self):
                resolutionReceiver.resolutionComplete()
                # Only cache lookups that produced at least one address.
                if self.addresses:
                    dnscache[hostName] = tuple(self.addresses)

        return self.original_resolver.resolveHostName(
            CachingResolutionReceiver(),
            hostName,
            portNumber,
            addressTypes,
            transportSemantics,
        )
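The replay idea in the snippet above can be tried without Twisted or Scrapy installed. What follows is a minimal sketch under stated assumptions, not the Scrapy implementation: `FakeResolver`, `RecordingReceiver`, `CachingResolver`, the module-level `dnscache` dict, and the address `192.0.2.1` are all hypothetical stand-ins, and the signature is reduced to the receiver and host name. It shows that a cache hit still sends the receiver the whole began / resolved / complete sequence, so a caller waiting on `resolutionComplete()` is not left waiting.

```python
dnscache = {}  # hypothetical stand-in for the DNS cache used in the snippet above


class RecordingReceiver:
    """Hypothetical receiver that records every callback it gets."""

    def __init__(self):
        self.calls = []

    def resolutionBegan(self, resolution):
        self.calls.append("began")

    def addressResolved(self, address):
        self.calls.append(address)

    def resolutionComplete(self):
        self.calls.append("complete")


class FakeResolver:
    """Hypothetical upstream resolver that always yields one fixed address."""

    def __init__(self):
        self.lookups = 0

    def resolveHostName(self, receiver, hostName):
        self.lookups += 1
        receiver.resolutionBegan(hostName)
        receiver.addressResolved("192.0.2.1")
        receiver.resolutionComplete()
        return receiver


class CachingResolver:
    """Simplified caching wrapper that delegates instead of subclassing."""

    def __init__(self, original_resolver):
        self.original_resolver = original_resolver

    def resolveHostName(self, receiver, hostName):
        cached = dnscache.get(hostName)
        if cached:
            # Cache hit: replay the full callback sequence to the caller.
            receiver.resolutionBegan(hostName)
            for address in cached:
                receiver.addressResolved(address)
            receiver.resolutionComplete()
            return receiver

        addresses = []

        class CachingReceiver:
            # Forwards each callback to the caller's receiver and
            # remembers the addresses so they can be cached.
            def resolutionBegan(self, resolution):
                receiver.resolutionBegan(resolution)

            def addressResolved(self, address):
                receiver.addressResolved(address)
                addresses.append(address)

            def resolutionComplete(self):
                receiver.resolutionComplete()
                if addresses:
                    dnscache[hostName] = tuple(addresses)

        return self.original_resolver.resolveHostName(CachingReceiver(), hostName)
```

Resolving the same host twice through `CachingResolver(FakeResolver())` should reach the upstream resolver only once, while both receivers record the same callback sequence.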
Re-tagging as "bug" because of the hanging issue mentioned above.
Description
Hi. Thank you for maintaining this awesome software :)
I am working on a project using scrapy that implements a custom downloader class (link).
I want to resolve IPv6 addresses, and I found the section in the documentation about the DNS_RESOLVER setting that was added in #4227. I tried enabling the new DNS_RESOLVER = "scrapy.resolver.CachingHostnameResolver" and was immediately greeted with this exception.
Steps to Reproduce
This is also reproducible using the bundled FTP downloader:
scrapy startproject scrapy_test
scrapy genspider example mozz.us
Add DNS_RESOLVER = "scrapy.resolver.CachingHostnameResolver" to the settings file
Set the spider's start URL to ftp://mozz.us
scrapy crawl example
Versions
Additional context
This was a tricky one to debug because everything works as expected with the HTTP Agent downloader. This issue only appears when you implement a downloader that depends on calling reactor.resolve() directly without using twisted.internet.endpoints.HostnameEndpoint.
I discovered that in the twisted IHostnameResolver interface, the resolutionReceiver method argument is expected to be an instance of a resolution receiver class, and not a type of a resolution receiver class. So I believe the scrapy code below is incorrect:
scrapy/scrapy/resolver.py
Lines 76 to 80 in 5e99758
The subclass here only works with the Scrapy Agent because the HostnameEndpoint does this weird thing where it defines a class with only static methods, so it can pass the class itself instead of instantiating it.
https://github.com/twisted/twisted/blob/22f949f7ce187513f0c218b73186c8a73baa00b4/src/twisted/internet/endpoints.py#L942-L958
However, there are other places in the twisted reactor where twisted does pass an object instance directly to this method.
https://github.com/twisted/twisted/blob/7e3ce790ca9f004ab386f9ecbba8f505d66cd3bd/src/twisted/internet/_resolver.py#L307
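The class-versus-instance distinction described above can be reproduced in plain Python. This is only an illustrative sketch: `StaticReceiver`, `InstanceReceiver`, `wrap_by_subclassing`, and `wrap_by_delegation` are hypothetical names, not Twisted or Scrapy code. Subclassing the receiver works only when a class is passed in, whereas delegating to it works for a class with static methods and for an instance alike.

```python
class StaticReceiver:
    """Hypothetical receiver with only static methods, passed around as a class."""

    events = []

    @staticmethod
    def addressResolved(address):
        StaticReceiver.events.append(address)


class InstanceReceiver:
    """Hypothetical receiver that has to be instantiated before use."""

    def __init__(self):
        self.events = []

    def addressResolved(self, address):
        self.events.append(address)


def wrap_by_subclassing(receiver):
    # Inheritance pattern: valid only when `receiver` is a class.
    # Passing an instance raises TypeError at class-creation time.
    class Wrapper(receiver):
        pass

    return Wrapper


def wrap_by_delegation(receiver):
    # Delegation pattern: only calls methods on `receiver`, so it does
    # not matter whether a class or an instance was passed in.
    class Wrapper:
        def addressResolved(self, address):
            receiver.addressResolved(address)

    return Wrapper()


if __name__ == "__main__":
    wrap_by_subclassing(StaticReceiver)  # fine: a class was passed
    try:
        wrap_by_subclassing(InstanceReceiver())  # an instance was passed
    except TypeError as exc:
        print("subclassing an instance failed:", exc)
    wrap_by_delegation(InstanceReceiver()).addressResolved("192.0.2.1")
```

Delegation is the approach taken by the caching receiver proposed in the comments, which forwards each callback to the receiver it was given rather than inheriting from it.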