Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] DNS cache expiration. #2333

Open
wants to merge 7 commits into
base: master
from

Conversation

@ArturGaspar
Copy link
Contributor

commented Oct 18, 2016

Expires entries in the DNS cache. Fixes #905.

@codecov-io

This comment has been minimized.

Copy link

commented Oct 18, 2016

Current coverage is 83.40% (diff: 100%)

Merging #2333 into master will increase coverage by 0.08%

@@             master      #2333   diff @@
==========================================
  Files           161        161          
  Lines          8721       8756    +35   
  Methods           0          0          
  Messages          0          0          
  Branches       1284       1288     +4   
==========================================
+ Hits           7266       7303    +37   
+ Misses         1204       1203     -1   
+ Partials        251        250     -1   

Powered by Codecov. Last update bd4f156...b461e12

@eliasdorneles

This comment has been minimized.

Copy link
Member

commented Oct 18, 2016

Looks good to me!

@eliasdorneles eliasdorneles changed the title DNS cache expiration. [MRG+1] DNS cache expiration. Oct 18, 2016

# TODO: cache misses

dnscache = LocalCache(10000)
class ExpiringCache(LocalCache):

This comment has been minimized.

Copy link
@redapple

redapple Oct 19, 2016

Contributor

This could go into scrapy.utils.datatypes next to LocalCache

This comment has been minimized.

Copy link
@kmike

kmike Oct 19, 2016

Member

+1; also, it deserves its own tests

@redapple

This comment has been minimized.

Copy link
Contributor

commented Oct 19, 2016

Thanks @ArturGaspar ! A long-needed improvement.
Have you considered using 3rd party libraries for this? e.g. cachetools.TTLCache


DNS cache expiration time in seconds.

.. setting:: DNSCACHE_EXPIRATION

This comment has been minimized.

Copy link
@kmike

kmike Oct 19, 2016

Member

Scrapy has HTTPCACHE_EXPIRATION_SECS option; I think it is better to use consistent naming for similar settings.

DNSCACHE_EXPIRATION
-------------------

Default: ``86400``

This comment has been minimized.

Copy link
@kmike

kmike Oct 19, 2016

Member

I'd mention somewhere that it is 1 day

@kmike

This comment has been minimized.

Copy link
Member

commented Oct 19, 2016

@redapple the implementation is simple enough, my vote is not to add a dependency for it

@ArturGaspar ArturGaspar force-pushed the ArturGaspar:dns_caching branch from 2f3a78b to 47f3f78 Oct 24, 2016

@ArturGaspar ArturGaspar force-pushed the ArturGaspar:dns_caching branch from 2cf14a7 to b461e12 Oct 24, 2016

@redapple

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2016

@kmike , what do you think of the PR?
If we merge it, I believe it means we start working towards a 1.3.x series

# For iteritems() and itervalues() above.
return self.keys()
items = iteritems
values = itervalues

This comment has been minimized.

Copy link
@kmike

kmike Nov 14, 2016

Member

I think there is little value in implementing these PY2/PY3 differences. In Python3 items() is not really a generator anyways, it is a special view object which e.g. has a repr with values. I'd have the same methods for Python 2 and Python 3 here - why introduce porting issues ourselves?

This comment has been minimized.

Copy link
@ArturGaspar

ArturGaspar Dec 6, 2016

Author Contributor

items() and values() are inherited from OrderedDict. Starting with Python 3.5, they will bypass the custom __getitem__ (because OrderedDict is implemented in C now) and expose the actual (time, value) tuple.

In Python up to 3.4:

>>> list(d.items())
[('a', 1)]

In Python 3.5:

>>> list(d.items())
[('a', (1481032887.784956, 1))]

@redapple redapple added this to the v1.3 milestone Nov 16, 2016

@sibiryakov

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2016

Guys, each DNS record has a TTL, and I suggest to use it instead of using local setting https://support.google.com/a/answer/48090#TTL. Many DNS servers are respecting DNS record TTL when caching.

@redapple

This comment has been minimized.

Copy link
Contributor

commented Dec 7, 2016

It looks like supporting DNS-server-advertized TTLs would be quite some work as the current implementation uses twisted.internet.base.ThreadedResolver which will only get us one IP address, without any TTL (socket.gethostbyname() is used under the hood)

What about merging this (with a lower default expiration timeout like @dangra would prefer)
and work on a newer implementation (IPv6 compatible, TTL-respecting) as another PR?

Note that there is also twisted.internet.endpoints.HostnameEndpoint (since Twisted 13.2) which uses socket.getaddrinfo. If someone wants to have a look at that.

  • Twisted [13.2.0] now includes a HostnameEndpoint implementation which uses
    IPv4 and IPv6 in parallel, speeding up the connection by using
    whichever connects first (the 'Happy Eyeballs'/RFC 6555 algorithm).
    (#4859)
@kmike

This comment has been minimized.

Copy link
Member

commented Dec 22, 2017

I'm fine with merging a simple expiring cache first, without respecting TTLs.

@kmike kmike removed this from the v1.5 milestone Dec 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.