New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache proxy bypass #3885
Cache proxy bypass #3885
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3885 +/- ##
==========================================
+ Coverage 89.08% 89.32% +0.24%
==========================================
Files 15 15
Lines 1896 1940 +44
==========================================
+ Hits 1689 1733 +44
Misses 207 207
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, so this is a good start! I've left some notes inline in the diff.
requests/structures.py
Outdated
@@ -103,3 +104,94 @@ def __getitem__(self, key): | |||
|
|||
def get(self, key, default=None): | |||
return self.__dict__.get(key, default) | |||
|
|||
|
|||
class TimedCache(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is basically a collections.MutableMapping
! We should make it actually be one, because that gives us a number of handy-dandy convenience methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update to have TimedCache inherit collections.MutableMapping
requests/structures.py
Outdated
:return: the value in the cache, or None | ||
""" | ||
|
||
return self.get(key, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, generally speaking it is better to implement get
in terms of __getitem__
than the other way around. __getitem__
should almost always maintain dict-like behaviour of raising KeyError
if the entry is not present, rather than returning None
. The core concern here is that it needs to be possible to distinguish between "key not present in dictionary" and "key present with a value of None
".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update.
requests/structures.py
Outdated
|
||
found = self._dict.get(key, default) | ||
if not found: | ||
return default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is wrong. Happily, it doesn't matter: if you make this a subclass of MutableMapping
you don't actually need to implement this method at all, we get it for free.
requests/structures.py
Outdated
return default | ||
|
||
occurred, value = found | ||
now = int(time.time()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is a problem.
If possible we'd like to use time.monotonic
instead, as time.time
can go backwards. So I think in this module you'll need to feature detect on import to see whether time.monotonic
is present and to prefer it if it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. To clarify, you're saying if time.monotonic
is not present, it's ok to fall back to time.time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
requests/structures.py
Outdated
:param value: the value to be added to the cache | ||
""" | ||
|
||
now = int(time.time()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note about monotonic time.
requests/structures.py
Outdated
Locates the value at lookup key, | ||
if cache is full, will evict the first | ||
found expired entry, otherwise will evict | ||
the oldest non-expired item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're wrapping this paragraph very short. Feel free to go all the way out to 89 columns!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update.
requests/structures.py
Outdated
Locates the value at lookup key, | ||
if cache is full, will evict the first | ||
found expired entry, otherwise will evict | ||
the oldest non-expired item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're here, can you add a blank line separating the paragraph blurb from the :param:
directives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update.
requests/structures.py
Outdated
if len(self._dict) >= self.maxlen: | ||
replacement_candidate_key = None | ||
max_time_delta = 0 | ||
for cur_key, (occurred, _) in self._dict.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This complete search can probably be avoided by using an OrderedDict
, can't it? urllib3 includes one that works on all the Python versions we support, so you can import that and use it as the base implementation to save ourselves this iterative walk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will replace self._dict
with an OrderedDict
.
requests/utils.py
Outdated
PROXY_BYPASS_CACHE[netloc] = result | ||
return result | ||
else: | ||
return cached_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is that this be reimplemented as a generic decorator, rather than written to specifically handle proxy_bypass
. That way we get a much easier time of testing the timed cache and all the relevant code. =)
tests/test_structures.py
Outdated
@@ -74,3 +74,56 @@ def test_getitem(self, key, value): | |||
@get_item_parameters | |||
def test_get(self, key, value): | |||
assert self.lookup_dict.get(key) == value | |||
|
|||
|
|||
class TestTimedCache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid making this an old-style class by putting (object)
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update.
@Lukasa I see a line like this already in the code base
as a top level definition. I've attempted to mock those, but my attempts so far have had some code smells. I've added a new commit that attempts to mock those in the least smelly way, but it's not ideal. Can you help me iron these out so they aren't so strange? My approaches so far:
Would appreciate any suggestions. |
Your mocking will get a lot easier if you make the |
I think I must have had a typo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this is generally looking really good: substantial progress! I've written some more notes.
requests/structures.py
Outdated
""" | ||
Evicts entries after expiration_secs. | ||
If none are expired and maxlen is hit, | ||
will evict the oldest cached entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have a slightly wider docstring than this! 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
requests/structures.py
Outdated
self._dict = OrderedDict() | ||
|
||
def __repr__(self): | ||
return '<timed-cache maxlen:%d len:%d>' % \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be TimedCache
, and it should also provide the expiration_secs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
requests/structures.py
Outdated
try: | ||
return self.__getitem__(key) | ||
except KeyError: | ||
return default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to implement this: collections.MutableMapping
will do it for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
requests/structures.py
Outdated
|
||
if len(self._dict) >= self.maxlen: | ||
for _ in range(len(self._dict) - self.maxlen + 1): | ||
self._dict.popitem(last=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be more cleanly expressed as:
while len(self._dict) >= self.maxlen:
self._dict.popitem(last=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Updated.
|
||
def clear(self): | ||
"""Clears the cache""" | ||
return self._dict.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one we also technically get for free from MutableMapping
, but in this case we can do it much faster than the generic implementation so I'm happy to have this override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. I've left the override intact.
tests/test_structures.py
Outdated
assert fnc_stub.call_count is 1 | ||
assert len(cache) is 1 | ||
|
||
assert invocation_spy.call_count is 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, so this is a good test, but it mostly tests implementation details. For this, it's enough to just mock out the time and then write a small function that constantly returns an incrementing value, then wrap it in the decorator and use the timer to demonstrate the behaviour.
Generally speaking, avoiding mocks is a good thing wherever we can. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I've re-made the test more along these lines. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, really close now! A couple of minor stylistic notes and one request to expand a test slightly and we're good to go!
requests/structures.py
Outdated
:param expiration_secs: the number of seconds to hold on | ||
to entries | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor nit, we don't need this blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
requests/utils.py
Outdated
|
||
:rtype: bool | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same again, don't need this blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
requests/utils.py
Outdated
Looks for netloc in the cache, | ||
if not found, will call proxy_bypass | ||
for the netloc and store its result | ||
in the cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This again can be flowed out to column 79 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
first_result = some_method(1) | ||
assert first_result is 1 | ||
second_result = some_method(1) | ||
assert second_result is 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also check that a different argument doesn't return a cached value.
@Lukasa thanks for your help with this. As for squashing up all of the commits this PR has accumulated--who should do that? Do the maintainers of this lib do that? Or I can squash locally and push to a new PR (or just force push). |
If you squash locally and force push that's fine, GitHub will work it all out. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed some extra whitespace. 😔 Let's clear all of that up and then we will be good to go. 👍
requests/structures.py
Outdated
""" | ||
Wrap a function call in a timed cache | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one.
requests/structures.py
Outdated
Evicts entries after expiration_secs. If none are expired and maxlen is hit, | ||
will evict the oldest cached entry | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one
requests/structures.py
Outdated
:param key: which entry to look up | ||
:return: the value in the cache, or None | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one
requests/structures.py
Outdated
:param key: the key to search the cache for | ||
:param value: the value to be added to the cache | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one
tests/test_structures.py
Outdated
|
||
|
||
class TestTimedCache(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one
tests/test_structures.py
Outdated
|
||
|
||
class TestTimedCacheManagedDecorator(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one
tests/test_structures.py
Outdated
class TestTimedCacheManagedDecorator(object): | ||
|
||
def test_caches_repeated_calls(self, mocker): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one.
ab9fc8e
to
2d4438d
Compare
Used to alleviate long gethostbyaddr calls Made new TimedCache and decorator to wrap a function with a cache * Entries looked up older than a minute (default amount) are evicted. * When full, evicts the oldest entry
2d4438d
to
c121b98
Compare
Force pushed commit addresses the whitespace issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'm happy with this! As this is a big-ish change in functionality, I'd like to give @sigmavirus24 a chance to review before I merge, but if he doesn't get time I'll merge it Monday. Great work @davidfontenot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll submit these updates in a separate pull request
(self.maxlen, len(self._dict), self.expiration_secs) | ||
|
||
def __iter__(self): | ||
return map(lambda kv: (kv[0], kv[1][1]), self._dict.items()).__iter__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we don't just build a generator here.
map()
is slow andlambda
s are slow. This intentional on Guido's part.- Calling
__iter__
is really poor form. - This would be far simpler and easier to read as
return ((key, value[1]) for key, value in self._dict.items())
return map(lambda kv: (kv[0], kv[1][1]), self._dict.items()).__iter__() | ||
|
||
def __delitem__(self, item): | ||
return self._dict.__delitem__(item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not write this as:
del self._dict[item]
__delitem__
doesn't return anything so there's no need to worry about returning something here. And it would avoid calling the dunder method directly, which I'll repeat, is poor form.
|
||
if now - occurred > self.expiration_secs: | ||
del self._dict[key] | ||
raise KeyError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should raise a KeyError
with the key as an argument, e.g., KeyError(key)
.
while len(self._dict) >= self.maxlen: | ||
self._dict.popitem(last=False) | ||
|
||
return self._dict.__setitem__(key, (now, value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, __setitem__
doesn't return anything. This should just be
self._dict[key] = (now, value)
There's no need to obfuscate any of this.
There were some odd decisions made about the implementation of some of the required methods for MuttableMapping in the TimedCache object. This cleans those up and makes the implementation, ever so slightly, easier to read. See also psf#3885
Added simple cache over
urllib.proxy_bypass
to remedy #2988. Holds on to entries for a minute. Once it reaches its max size, will evict the first entry that is the oldest.Added tests over the structure, did not add a test that
should_bypass_proxies
invokes the cache.This is my first pull request; please let me know if I've overlooked something.
Additionally, when I check out master, run
make coverage
in pipenv, the test named 'TestRequests.test_proxy_error' fails. This is still the only failing test when I use my branch. The error isConnectionError: ('Connection aborted.', BadStatusLine("''",))
. Is there some additional setup I need to do to run the test suite?