[MRG] Enforce DNS resolution timeout #2496
This is worth a backport on the 1.0, 1.1 and 1.2 branches.
Codecov Report
@@            Coverage Diff             @@
##           master    #2496      +/-   ##
==========================================
+ Coverage   83.46%   83.48%   +0.02%
==========================================
  Files         161      161
  Lines        8780     8779       -1
  Branches     1288     1287       -1
==========================================
+ Hits         7328     7329       +1
+ Misses       1204     1203       -1
+ Partials      248      247       -1
@@ -16,8 +16,7 @@ def __init__(self, reactor, cache_size, timeout):
     def getHostByName(self, name, timeout=None):
         if name in dnscache:
             return defer.succeed(dnscache[name])
-        if not timeout:
-            timeout = self.timeout
+        timeout = (self.timeout,)
dangra
Jan 31, 2017
Member
It means getHostByName callers can't override timeout, which is acceptable, but it is possible to let self.timeout be a tuple and convert it only if it is not one. This way we don't lose access to existing functionality provided by t.i.b.ThreadedResolver.
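A minimal sketch of the suggestion above: accept either a single number or a tuple/list, and wrap only the scalar case. `normalize_timeout` is a hypothetical helper written for illustration; it is not part of Scrapy or Twisted.

```python
def normalize_timeout(timeout):
    """Return `timeout` as a tuple of numbers.

    Hypothetical helper, for illustration only.
    """
    # A tuple or list is passed through unchanged (as a tuple).
    if isinstance(timeout, (tuple, list)):
        return tuple(timeout)
    # A single int/float is wrapped in a 1-tuple, as the patch does.
    return (timeout,)
```

With this, a setting value of `60` would become `(60,)`, while `(1, 3, 11, 45)` would be kept as-is.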
redapple
Jan 31, 2017
Author
Contributor
Sorry, I don't understand.
dangra
Jan 31, 2017
Member
np, it's me being obfuscated :)
the question is: Can DNS_TIMEOUT be a tuple?
dangra
Jan 31, 2017
Member
the question is: Can DNS_TIMEOUT be a tuple?
I guess not without some other major changes at Line 290 in 4ca191e
dangra
Jan 31, 2017
Member
Do I understand that you would like to allow users to pass a tuple of ints/floats as DNS_TIMEOUT setting value?
Yes, that was it.
kmike
Jan 31, 2017
Member
@redapple could you please add a comment above this line which explains why the timeout argument is ignored?
LGTM.
I would vote for Scrapy accepting either a tuple or an int/float. There are still desperate people on this planet who try to do broad crawling, and they could benefit from this functionality. Cutting it off also cuts off DNS retry management. Someday DNS providers will thank you for having this option.
@sibiryakov, I do not understand why accepting a tuple helps with broad crawls (honest question).
@sibiryakov, are you ok with opening a separate new issue to discuss DNS resolution implementation and configuration options that would play nicer for broad crawls?
sure @redapple
Fixes #2461
As @rolando noticed, the DNS_TIMEOUT setting was never actually used by Scrapy with 14<=Twisted<=16.6, because a (1, 3, 11, 45) tuple was always passed to the DNS resolver by Twisted internally.
And Twisted 16.7 actually stops passing this default value, but a tuple (or list) of ints is expected.
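To illustrate why the setting was ignored, here is a small sketch of the timeout selection only, before and after the patch. The class and method names are hypothetical stand-ins; nothing here imports Twisted or performs real DNS lookups.

```python
# The default tuple that the description says was always passed in.
DEFAULT_TIMEOUT = (1, 3, 11, 45)


class OldResolver:
    """Stand-in for the pre-patch logic: use the setting only as a fallback."""

    def __init__(self, timeout):
        self.timeout = timeout  # plays the role of DNS_TIMEOUT

    def effective_timeout(self, timeout=None):
        # A non-empty tuple is truthy, so this fallback is skipped
        # whenever the caller supplies the default tuple.
        if not timeout:
            timeout = self.timeout
        return timeout


class PatchedResolver(OldResolver):
    """Stand-in for the patched logic: always enforce the setting."""

    def effective_timeout(self, timeout=None):
        # The argument is deliberately ignored; the setting is wrapped
        # in a 1-tuple because a tuple (or list) is expected.
        return (self.timeout,)


old = OldResolver(timeout=10).effective_timeout(DEFAULT_TIMEOUT)
new = PatchedResolver(timeout=10).effective_timeout(DEFAULT_TIMEOUT)
```

Under these assumptions, `old` is still the caller's default tuple, while `new` reflects the configured value.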