
Retries #326

Merged
merged 39 commits 4 months ago

5 participants

Kevin Burke Andrey Petrov Donald Stufft Ian Cordasco Cory Benfield
Kevin Burke

Okay:

  • Adds support for retries, per the spec we laid out (mostly).
  • Adds docs etc.
  • Splits out util.py into submodules - I need to know what you want to do here because as written it would change from urllib3.util.Timeout into urllib3.util.timeout.Timeout which is not good.
  • Adds a new /successful_retry handler to the DummyTestCase, which keys off a test-name header and returns 200 only after the request has been retried once.
  • I believe there are some API changes here.
    • Some subclasses of httplib.HTTPException can actually be raised as connection errors, because they imply the request never actually got sent to the server.
    • urlopen previously would retry on read timeouts, which violated the urlopen contract (as I understood it) of only retrying requests that couldn't possibly have side effects. This code does not retry read timeouts by default.

I am also testing this in two new environments: in the office, which places my IP on a 10.* subnet and I think has weird/different behavior when connecting to TARPIT_HOST than standard wifi networks do, and without an Internet connection, in which case a bunch of tests fail. Also, it's difficult to test some of these exceptional cases because the errors raised depend on the network stack, which (I think) is why the tests are failing on the branch. I'm still looking into it.

Either way I am losing some confidence in the connection timeout tests; getting a network to reliably generate ECONNREFUSED, or not generate it and tie up a connection, is tricky, unless we want to go down a path like this: http://bugs.python.org/file26890/test_timeout.patch

[Edit: This is an implementation of #260]

Kevin Burke

This is a lot! Sorry!

Kevin Burke kevinburke closed this
Kevin Burke kevinburke deleted the branch
Kevin Burke kevinburke reopened this
Kevin Burke

Also open to ideas about how to make this more manageable - looking at the util.py split into smaller modules is a good candidate.

Andrey Petrov shazow commented on the diff
dummyserver/handlers.py
@@ -168,6 +170,21 @@ def encodingrequest(self, request):
def headers(self, request):
return Response(json.dumps(request.headers))
+ def successful_retry(self, request):
Andrey Petrov Owner
shazow added a note

IMO these things are easier to do in socket-level tests (we already have some similar ones). Any particular reason not to do it that way?

I guess I just preferred to use HTTP as server interface, and I am not too solid on the semantics of when to call accept(), close(), whether I have to close the server, whether you can reuse sockets across requests, etc.

There's also a standing issue with DummyServer where the tests hang if an exception is raised during the run.

Andrey Petrov Owner
shazow added a note

There are lots of example tests written. Here's one that does exactly what you want: https://github.com/shazow/urllib3/blob/master/test/with_dummyserver/test_socketlevel.py#L134

Up to you though. We can keep it the way it is if you're happy with it.

Cory Benfield Collaborator
Lukasa added a note

I'm +0.5 on the socket-level tests: I find them substantially easier to follow when I'm coming in to the library because all the logic is in one place. Take that under advisement, though. =)

urllib3/util/response.py
@@ -0,0 +1,70 @@
+# urllib3/util.py
+# Copyright 2008-2014 Andrey Petrov and contributors (see CONTRIBUTORS.txt)
+#
+# This module is part of urllib3 and is released under
+# the MIT License: http://www.opensource.org/licenses/mit-license.php
+
+from base64 import b64encode
+
+from ..packages import six
+
+
+def make_headers(keep_alive=None, accept_encoding=None, user_agent=None,
Andrey Petrov Owner
shazow added a note

Should this be in urllib3.util.request rather than response?

Resolved by b179f65

urllib3/util/retry.py
((118 lines not shown))
+ 1, 2, 4... seconds between retries.
+
+ By default, the backoff factor is 0, which means that urllib3 will not
+ sleep between retry attempts.
+ """
+
+ DEFAULT_METHOD_WHITELIST = set(['HEAD', 'GET', 'PUT',
+ 'DELETE', 'OPTIONS', 'TRACE'])
+ SERVER_ERROR_RESPONSE = xrange(500, 599)
+ NON_200_RESPONSE = xrange(300, 599)
+
+ def __init__(self, total=None, connect=3, read=0, redirects=3,
+ method_whitelist=DEFAULT_METHOD_WHITELIST,
+ codes_whitelist=None, backoff_factor=0):
+
+ # This is kind of ugly; we get in this situation because max in Python
Andrey Petrov Owner
shazow added a note

Why not just use a _Default = object() sentinel for total=_Default instead, like we do in other places?

I suppose, but then you'd still have to convert total to a number so that max(connect_error_count, total) works in Python 3.

>>> max(3, object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: object() > int()

Alternatively, there could be logic that checks the value of total before trying to apply the max function. Though I think I'd prefer just setting total as an integer as it's easier to reason about that way.
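A minimal sketch of the sentinel approach combined with the up-front conversion described here (the default of 10 is an arbitrary placeholder, not urllib3's actual value):

```python
_Default = object()  # module-level sentinel, as used elsewhere in urllib3


class Retry(object):
    def __init__(self, total=_Default, connect=3):
        # Convert the sentinel to an int immediately so that later
        # max(connect, total) comparisons work on Python 3.
        if total is _Default:
            total = 10  # placeholder default
        self.total = total
        self.connect = connect


retry = Retry()
max(retry.connect, retry.total)  # fine: both operands are ints
```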

urllib3/util/retry.py
((152 lines not shown))
+ self.backoff_factor = backoff_factor
+
+ self._total_error_count = 0
+ self._connect_error_count = 0
+ self._read_error_count = 0
+ self._redirect_count = 0
+
+
+ def _compute_backoff(self):
+ """ Formula for computing the current backoff
+
+ :rtype: float
+ """
+ return self.backoff_factor * (2 ** max(self._total_error_count - 1, 0))
+
+ def sleep(self):
Andrey Petrov Owner
shazow added a note

Hmm I feel weird about having an explicit .sleep() function call every time we increment. Any thoughts about making it implicit within the .increment(..)? Could even include an .increment(..., backoff_sleep=True) param for disabling.

I'd prefer this as well, the problem is you need to do the following in order:

  1. increment the counter
  2. check if retries are exhausted. If so, raise from the calling code; the Retry object doesn't and shouldn't contain the state to raise the proper error.
  3. Sleep, only if we haven't raised an exception.

I suppose increment could also check exhaustion, and raise? I believe there is at least one place in urlopen where increment is not immediately followed by a raise or a sleep.
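A rough sketch of what a merged increment() might look like, with hypothetical names, following the raise-before-sleep ordering of the steps above:

```python
import time


class RetriesExhausted(Exception):
    """Stand-in for MaxRetryError in this sketch."""


class Retry(object):
    def __init__(self, total=3, backoff_factor=0):
        self.total = total
        self.backoff_factor = backoff_factor
        self._total_error_count = 0

    def increment(self, error=None, backoff_sleep=True):
        # 1. Increment the counter.
        self._total_error_count += 1
        # 2. Check if retries are exhausted; if so, raise from here.
        if self._total_error_count > self.total:
            raise RetriesExhausted(error)
        # 3. Sleep, only if we haven't raised an exception.
        if backoff_sleep:
            backoff = self.backoff_factor * (
                2 ** max(self._total_error_count - 1, 0))
            if backoff > 0:
                time.sleep(backoff)
```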

Andrey Petrov Owner
shazow added a note

Yes, let's do that. I'm happy to include flags which disable raising/sleeping if necessary, though not sure where that place is.

urllib3/util/retry.py
((237 lines not shown))
+ """ Increment the retry counters
+
+ :param response: A response object, or None, if the server did not
+ return a response.
+ :type response: :class:`~urllib3.response.HTTPResponse`
+ :param error: An error encountered during the request, or None if there
+ was none.
+ :type error: :class:`~urllib3.exceptions.ConnectTimeoutError`,
+ :class:`~urllib3.exceptions.ReadTimeoutError`
+ """
+ self._total_error_count += 1
+
+ # In the event of an unanticipated error, we want to ensure we
+ # increment *something.* Otherwise you can get into an infinite loop, if
+ # we keep catching errors but never incrementing a counter.
+ _incremented = False
Andrey Petrov Owner
shazow added a note

Strange to _-prefix a local var.

+1

fixed in 57c24be

urllib3/util/retry.py
((268 lines not shown))
+ """ Determine whether HTTP response is an error response """
+ return self._is_error_response(request_method, response)
+
+
+ def is_exhausted(self):
+ """ Determine whether we're out of retry attempts
+
+ For each of connection errors, read errors (timeouts or bad status
+ codes), and redirects, compute the maximum of the specified number of
+ errors and the total allowable number of errors (if it was specified),
+ and check it against the number of errors we've seen so far.
+
+ :rtype: bool
+ """
+
+ def _should_apply_total():
Andrey Petrov Owner
shazow added a note

What's the purpose of this method?

Just to make the if statement immediately below it read better. I suppose we could just name the variable _should_apply_total and be done with it.

Andrey Petrov Owner
shazow added a note

Let's avoid making extra closures and function calls when they're not adding anything. Even performance aside, it's just confusing. :)

urllib3/util/secure.py
@@ -0,0 +1,131 @@
+from binascii import hexlify, unhexlify
+from hashlib import md5, sha1
+
Andrey Petrov Owner
shazow added a note

Maybe just call this ssl_.py?

Andrey Petrov
Owner

Thanks for this Kevin! That's a lot of really useful work. :) I really appreciate it.

Some feedback:

  • Adds support for retries, per the spec we laid out (mostly).

I noted a few questions in code comments.

  • Splits out util.py into submodules - I need to know what you want to do here because as written it would change from urllib3.util.Timeout into urllib3.util.timeout.Timeout which is not good.

In the future, a separate change might have been more productive, but we can try to plow through this. Could you add the relevant imports to urllib3/util/__init__.py such that all the existing urllib3.util.Timeout and such continue to work?

  • adds a new /successful_retry handler to the DummyTestCase which keys based on a test-name header and returns 200 only after the request has been retried once.

As I mentioned, I think this would be better done in socket-level tests, unless you're fully satisfied with what you have now.

  • urlopen previously would retry on read timeouts, which violated the urlopen contract (as I understood it) of only retrying things that couldn't possibly be side effecting. this code does not retry read timeouts by default.

Isn't that exactly why we have method_whitelist?

Thanks again for doing this. You're much more thorough than I would have been in my first implementation. :)

P.S. Worth noting that there's a Py3 failure right now.

urllib3/response.py
@@ -50,6 +49,21 @@ def _get_decoder(mode):
return DeflateDecoder()
+def is_fp_closed(obj):
Andrey Petrov Owner
shazow added a note

Why is this here now?

Cory Benfield Collaborator
Lukasa added a note

I'm pretty sure that @shazow is off the grid for the next few days, but in his absence: I think the rationale for util is that they're things that could potentially be used independently of the rest of urllib3. This method fits that definition: I think we want it in util.

urllib3/util/retry.py
((162 lines not shown))
+
+ :rtype: float
+ """
+ return self.backoff_factor * (2 ** max(self._total_error_count - 1, 0))
+
+ def sleep(self):
+ """ Sleep between retry attempts
+
+ The amount of time to sleep for is expressed in `_compute_backoff`. By
+ default, the backoff factor is 0 and urllib3 will not sleep.
+ """
+ backoff = self._compute_backoff()
+ if backoff <= 0:
+ return
+ time.sleep(backoff)
+
Andrey Petrov Owner
shazow added a note

Could you make the line spacing between methods more consistent? I think we use 1 line generally, unless it's at root level.

Done

urllib3/util/retry.py
((252 lines not shown))
+ incremented = False
+ if self._is_connection_error(error):
+ self._connect_error_count += 1
+ incremented = True
+
+ if response and response.get_redirect_location():
+ self._redirect_count += 1
+ incremented = True
+
+ if (self._is_read_error(error)
+ or self._is_error_response(method, response)
+ or not incremented):
+ self._read_error_count += 1
+
+
+ def should_retry_response(self, request_method, response):
Andrey Petrov Owner
shazow added a note

What's the purpose of this method? Looks like it's not doing anything except calling _is_error_response?

I appreciate what you're trying to do with documentation-through-naming but I think this may be overkill, especially for one-liner methods.

urllib3/util/retry.py
((214 lines not shown))
+ :rtype: bool
+ """
+ return (resp is not None
+ and meth in self.method_whitelist
+ and resp.status in self.codes_whitelist)
+
+ def _is_read_error(self, err):
+ """ Check if the server had an error responding to the request
+
+ Even though we didn't get a response back from the server, these
+ exceptions are different than connection errors, because they imply
+ that the remote server accepted the request. The server may have begun
+ processing the request and performed some side effects (wrote data to a
+ database, sent a message, etc).
+ """
+ if isinstance(err, ReadTimeoutError):
Andrey Petrov Owner
shazow added a note

Could just do return isinstance(err, ReadTimeoutError) or isinstance(err, HTTPLIB_READ_EXCEPTIONS)

Or better yet, could rename it to READ_EXCEPTIONS and add ReadTimeoutError to the list, then it's just return isinstance(err, READ_EXCEPTIONS).

At that point, I'd get rid of _is_read_error(...) altogether which would make the increment(..) code more linear to read. I think this is a good idea in general, to get rid of these one-liner helpers unless they add some other value (ie. reused in other places or useful for overriding).

Also I'd add READ_EXCEPTIONS as a static member of the Retry class rather than module-global, so it could be overridden per-instance.
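A sketch of that suggestion; the exception members are illustrative, since the PR's actual HTTPLIB_READ_EXCEPTIONS tuple is not shown in this hunk:

```python
import socket
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3


class ReadTimeoutError(Exception):
    """Stand-in for urllib3's exception in this sketch."""


class Retry(object):
    # A static member rather than a module-global, so subclasses or
    # instances can override the set of read exceptions.
    READ_EXCEPTIONS = (ReadTimeoutError, httplib.IncompleteRead,
                       socket.timeout)

    def _is_read_error(self, err):
        # isinstance accepts a tuple, so the whole check is one line
        # (or could be inlined into increment() entirely).
        return isinstance(err, self.READ_EXCEPTIONS)
```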

test/__init__.py
@@ -11,7 +15,11 @@ def requires_network(test):
"""Helps you skip tests that require the network"""
def _is_unreachable_err(err):
- return hasattr(err, 'errno') and err.errno == errno.ENETUNREACH
+ return hasattr(err, 'errno') and (
+ err.errno == errno.ENETUNREACH
+ # some networks try to resolve 10.* hosts and send back ECONNREFUSED
+ # this means we can't test connect timeouts on those hosts
+ or err.errno == errno.ECONNREFUSED)
Cory Benfield Collaborator
Lukasa added a note

Stylistic comment: can this not be err.errno in (errno.ENETUNREACH, errno.ECONNREFUSED)?

test/test_retry.py
@@ -0,0 +1,70 @@
+import unittest
+
+from urllib3.packages.six import moves
Cory Benfield Collaborator
Lukasa added a note

Is the import of moves really necessary? You're only using xrange from that package, and you're only ever using xrange(4). You can safely avoid the import and use range(): I'm sure the tests can bear the 4-element list you'll construct on Python 2. ;)

test/with_dummyserver/test_connectionpool.py
((13 lines not shown))
+ headers = {'test-name': 'test_read_total_retries'}
+ retry = Retry(read=0, connect=0, total=3, codes_whitelist=[418])
+ resp = self.pool.request('GET', '/successful_retry',
+ headers=headers, retries=retry)
+ self.assertEqual(resp.status, 200)
+
+ def test_retries_wrong_whitelist(self):
+ retry = Retry(read=3, connect=0, codes_whitelist=[202])
+ resp = self.pool.request('GET', '/successful_retry',
+ headers={'test-name': 'test_wrong_whitelist'},
+ retries=retry)
+ self.assertEqual(resp.status, 418)
+
+ def test_retries_odd_whitelist(self):
+ retry = Retry(read=3, connect=0, codes_whitelist=[418])
+ resp = self.pool.request('OPTIONS', '/successful_retry',
Cory Benfield Collaborator
Lukasa added a note

Is there any particular reason this is an OPTIONS request? It leaps out at me as being different but I can't see why it is, it seems irrelevant for the test at hand.

test/with_dummyserver/test_socketlevel.py
@@ -135,7 +137,7 @@ def socket_handler(listener):
sock = listener.accept()[0]
# First request.
# Pause before responding so the first request times out.
- time.sleep(0.002)
+ time.sleep(0.005)
Cory Benfield Collaborator
Lukasa added a note

I'm picking this instance at random, but it applies all over this diff: you changed a lot of these sleep/timeout values, but I'm not sure why. Can you clarify? =)

urllib3/util/retry.py
((281 lines not shown))
+ """
+
+ def _should_apply_total():
+ # _total_specified here checks whether we should be applying the
+ # limit at all.
+ return self._total_specified
+
+ return (
+ (_should_apply_total() and self._total_error_count > self.total)
+ or self._connect_error_count > max(self.connect, self.total)
+ or self._redirect_count > max(self.redirects, self.total)
+ or self._read_error_count > max(self.read, self.total))
+
+
+ @classmethod
+ def from_int(self, raw_retry):
Andrey Petrov Owner
shazow added a note

I don't think we need this. Retry(x) is the same as Retry.from_int(x).
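For context, one place a from_int-style normalizer can earn its keep is in accepting the mixed retries argument (int, None, or Retry object) that urlopen takes; this is a hypothetical sketch, not the PR's actual implementation:

```python
class Retry(object):
    def __init__(self, total=3):
        self.total = total

    @classmethod
    def from_int(cls, retries, default=None):
        # Pass Retry objects through untouched, wrap bare ints,
        # and fall back to a default for None.
        if isinstance(retries, cls):
            return retries
        if retries is None:
            return default if default is not None else cls()
        return cls(total=retries)
```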

urllib3/connectionpool.py
((12 lines not shown))
# Connection broken, discard. It will be replaced next _get_conn().
conn = None
# This is necessary so we can access e below
err = e
- if retries == 0:
+ retries.increment(error=e)
+ if retries.is_exhausted():
+ raise MaxRetryError(self, url, e)
+ retries.sleep()
+
+ except SocketError as e:
+ # Connection broken, discard. It will be replaced next _get_conn().
+ conn = None
+ # This is necessary so we can access e below
+ err = e
+
+ retries.increment(error=e)
+ if retries.is_exhausted():
if isinstance(e, SocketError) and self.proxy is not None:
Cory Benfield Collaborator
Lukasa added a note

If you're going to split these out, this isinstance() check is redundant.

urllib3/util/retry.py
((112 lines not shown))
+
+ :param float backoff_factor:
+ A backoff factor to apply between attempts. urllib3 will sleep for
+ `backoff factor * (2 ^ (number of total retries - 1))` seconds. So
+ if the backoff factor is 1, urllib3 will sleep 1, 2, 4, 8... seconds
+ between retries. If the backoff factor is 0.5, urllib3 will sleep 0.5,
+ 1, 2, 4... seconds between retries.
+
+ By default, the backoff factor is 0, which means that urllib3 will not
+ sleep between retry attempts.
+ """
+
+ DEFAULT_METHOD_WHITELIST = set(['HEAD', 'GET', 'PUT',
+ 'DELETE', 'OPTIONS', 'TRACE'])
+ SERVER_ERROR_RESPONSE = xrange(500, 599)
+ NON_200_RESPONSE = xrange(300, 599)
Cory Benfield Collaborator
Lukasa added a note

Uh, the idea of doing a linear search over 300 values to determine that, for instance, 200 is not an error code gives me a rash. Can this really be the best way of approaching that behaviour?

Andrey Petrov Owner
shazow added a note

Actually, I believe xrange does fancy stuff in __contains__ to avoid linear membership tests. Fairly sure this is fine.

Cory Benfield Collaborator
Lukasa added a note

Really? Ok, fair enough then. =)

Andrey Petrov Owner
shazow added a note

Also I think I was the one who suggested this syntax. >.>

Of course I can't find any docs to prove this, but you can witness it by experimentation.
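A quick way to witness it: CPython 3's range (the xrange equivalent) implements __contains__ with an arithmetic check for ints, so membership tests on huge ranges return instantly instead of scanning. (I believe CPython 2's xrange lacks the optimized __contains__ and falls back to iteration, but these whitelist ranges are small enough not to matter.)

```python
# A range over a trillion values is built instantly, and membership
# is an O(1) arithmetic check for ints rather than a linear scan:
big = range(10**12)
print(10**12 - 1 in big)        # returns immediately
print(200 in range(500, 600))   # likewise: no scan over 100 values
```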

urllib3/util/retry.py
((184 lines not shown))
+ the server may begin processing the request, which can lead to bad
+ consequences and is handled in the read section below. Connection errors
+ in general are retryable because the remote server hasn't received any
+ data.
+ """
+ if err is None:
+ return False
+
+ # This is too difficult of an error to generate reliably and quickly,
+ # cross-Python and cross-platform. In theory you could hit localhost on
+ # a port in the test, but it would be possible for some dev to have
+ # a device listening on that port.
+ if (isinstance(err, SocketError) # pragma: no cover
+ and hasattr(err, 'errno')
+ and (err.errno == errno.ECONNREFUSED
+ or err.errno == errno.ENETUNREACH)):
Cory Benfield Collaborator
Lukasa added a note

Once again, we could probably do err.errno in here.

urllib3/util/retry.py
((203 lines not shown))
+ if isinstance(err, HTTPLIB_CONNECT_EXCEPTIONS):
+ return True
+ return False
+
+ def _is_error_response(self, meth, resp):
+ """ Determine if an HTTP response is an error response.
+
+ Checks the response against the method whitelist and the status code
+ whitelist. If the method and the code are in the whitelist, returns
+ True.
+
+ :rtype: bool
+ """
+ return (resp is not None
+ and meth in self.method_whitelist
+ and resp.status in self.codes_whitelist)
Cory Benfield Collaborator
Lukasa added a note

See my above comment re. linear searches.

Andrey Petrov
Owner

... a Retry object is not safe to use in multiple calls to request() because the state of the object gets modified when an error condition is hit.

I think that's fine to start with. Let's add a note to the documentation that people need to be aware of this. I want to make sure the pattern we're promoting is pool.request(..., retry=Retry(...)).

We could add an easy .clone() or .new() function or something if people want to do...

retry = Retry(...)
pool.request(..., retry=retry.new())
pool.request(..., retry=retry.new())
...

But that can happen in a future PR.

Kevin Burke

I actually ended up fixing this by making the Retry object immutable, i.e. you update the state by creating a new Retry object. See kevinburke@4a1849e...3eebd06 for details. I actually like it a fair amount; it removes a lot of the complexity in the Retry class, but I can back it out if you wish.

I'm going to squash commits at some point, but I'm not sure how you review updates to existing PRs, so I have left it as is for the moment.
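A stripped-down sketch of the immutable approach (the counters are simplified here to a single total):

```python
class Retry(object):
    """Immutable: consuming a retry returns a new object, so a single
    Retry can safely be reused as a template across requests."""

    def __init__(self, total=3):
        self.total = total

    def increment(self):
        # No mutation: hand back a fresh object with one fewer retry.
        return Retry(total=self.total - 1)

    def is_exhausted(self):
        return self.total < 0


template = Retry(total=1)
spent = template.increment().increment()
# template is untouched and can be passed to the next request
```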

urllib3/util/retry.py
((24 lines not shown))
+
+from ..packages import six
+
+from ..exceptions import (
+ ConnectTimeoutError,
+ ReadTimeoutError,
+)
+
+xrange = six.moves.xrange
+
+# How many retries we should set for "total" if you do not specify a total. This
+# limit only applies if you set ``connect`` and ``read`` to higher values than
+# this. In any event, it may be an academic exercise since Python's recursion
+# limit will probably be reached before this one. Users can "edit" this value by
+# passing a ``total`` value to the Retry object.
+MAX_SANE_RETRY = 1000
Andrey Petrov Owner
shazow added a note

-0.5 on this. Seems like superfluous and surprising.

Andrey Petrov Owner
shazow added a note

Couple of options:

  1. None could be a good start (which would semantically mean "no limit").

  2. Same as the current default retry behaviour. I think it's at 5?

urllib3/util/retry.py
((162 lines not shown))
+
+ # This is kind of ugly; we get in this situation because max in Python
+ # 3 needs to compare integers, and it's probably good to assign integers
+ # to totals anyway. Just assigning total to 0 does not work because
+ # total=0 has a semantic meaning separate from total=None; it means
+ # "never retry anything". We can't set it to sys.maxsize because that
+ # ruins the max() checks on totals below. Instead we store the state in
+ # a variable.
+ self._total_specified = total is not None
+ if self._total_specified:
+ self.total = self._validate_retry(total, 'total')
+ else:
+ # If it's None, the bottleneck becomes the specified sub-limits.
+ self.total = MAX_SANE_RETRY
+
+ self.connect = self._validate_retry(connect, 'connect')
Andrey Petrov Owner
shazow added a note

Can we remove the type-checking? I'm happy to allow for unexpected behavior with unexpected input. Garbage in, garbage out. :P

We generally don't do this kind of thing in urllib3 and it just adds a lot of code for little benefit.

Perhaps this is more of a Requests-level thing.

Cory Benfield Collaborator
Lukasa added a note

As a further note, this is another feature that Requests is unlikely to plumb through in this form. We really don't like class-based parameters to our functions, and tuples as arguments aren't much better. =)

Andrey Petrov Owner
shazow added a note

@Lukasa I don't see a way around class-based params for this kind of configuration. I'd advise for requests to keep doing what it's doing, but also allow people to pass in urllib3's Retry/Timeout objects if they need to, which will just be passed down back to urllib3 and do the Right Thing.

urllib3/util/retry.py
((294 lines not shown))
+ """
+ return (response is not None
+ and request_method in self.method_whitelist
+ and response.status in self.codes_whitelist)
+
+ def is_exhausted(self):
+ """ Determine whether we're out of retry attempts
+
+ For each of connection errors, read errors (timeouts or bad status
+ codes), and redirects, compute the maximum of the specified number of
+ errors and the total allowable number of errors (if it was specified),
+ and check it against the number of errors we've seen so far.
+
+ :rtype: bool
+ """
+
Andrey Petrov Owner
shazow added a note

I'm not 100% sure what this code is doing... Shouldn't it just be...

return min(self.total, self.connect, self.read, self.redirects) < 0

?

Andrey Petrov Owner
shazow added a note

I'd go with defaulting to None or the current default (5?).

Something like this:

if self.total is not None and self.total < 0:
    return True

return min(self.connect, self.read, self.redirects) < 0

Well we've said that the actual behavior if both connect and total are specified should be to take the max of the two and apply that. So with the code posted in your comment, Retry(connect=0, total=3) would fail after one connection error, which is not correct.

I suppose the constructor could set self.connect to the max of the specified connect value and the total.

Andrey Petrov Owner
shazow added a note

Well we've said that the actual behavior if both connect and total are specified should be to take the max of the two and apply that.

I don't remember this. What makes sense to me is for total to always be a superset constraint (if defined). Retry(total=3, connect=0) should fail after exactly one connection error; that is correct. Here are the examples that outline this behaviour: #260 (comment)

Similarly, if Retry(total=MAX_INT, connect=0) that does not mean that there will be MAX_INT retries no matter what. First connection error should fail.

The current code reflects this.
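Under that reading (total is a superset constraint, and None means "no limit"), is_exhausted reduces to a min over whichever counters are set; a sketch with decrement-style counters:

```python
class Retry(object):
    def __init__(self, total=None, connect=None, read=None, redirects=None):
        self.total = total
        self.connect = connect
        self.read = read
        self.redirects = redirects

    def is_exhausted(self):
        # None means "no limit" for that counter; any counter that has
        # gone negative (including total) exhausts the whole Retry.
        counts = [c for c in (self.total, self.connect,
                              self.read, self.redirects)
                  if c is not None]
        if not counts:
            return False
        return min(counts) < 0
```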

urllib3/util/retry.py
((117 lines not shown))
+ want to pass a custom set of codes here depending on the logic in your
+ server.
+
+ :param float backoff_factor:
+ A backoff factor to apply between attempts. urllib3 will sleep for::
+
+ backoff factor * (2 ^ (number of total retries - 1))
+
+ seconds. So if the backoff factor is 1, urllib3 will sleep 1, 2, 4,
+ 8... seconds between retries. If the backoff factor is 0.5, urllib3
+ will sleep 0.5, 1, 2, 4... seconds between retries.
+
+ By default, the backoff factor is 0, which means that urllib3 will not
+ sleep between retry attempts.
+
+ :param int observed_errors: The number of errors observed so far. This is
Andrey Petrov Owner
shazow added a note

Maybe call this backoff_count?

Should we be backing off on every type of retry? Certainly not on redirects, right? (I believe this is a bug right now.)

Andrey Petrov Owner
shazow added a note

Oh I thought we were going to merge .increment, .is_exhausted, .should_retry_response, and .sleep into fewer calls...

Still, isn't the current code counting a redirect as an error (in observed_errors)? That's a bug which would inflate the backoff time prematurely. Would probably be good to have a test for this scenario.

The existing code decrements the number of retries for each redirect: https://github.com/shazow/urllib3/blob/master/urllib3/connectionpool.py#L555

Andrey Petrov Owner
shazow added a note

Yes I'm aware of that. The existing code does not have a time.sleep() which gets affected by retries, though.

On redirects, we decrement the redirect counter and the total but don't decrement observed_errors, which means we don't inflate the sleep time.

We could push all of the logic into one function but it would be pretty complex. Here's a summary of the behavior in the class:

  • after a TimeoutError:

    • increment
    • check exhaustion, then raise the timeouterror
    • sleep
  • after a HTTPException:

    • increment
    • check exhaustion, then raise a MaxRetryError with the url and the error as parameters.
    • sleep
  • after a SocketError:

    • increment
    • check exhaustion, then raise a ProxyError or a MaxRetryError depending on the state of self.proxy.
    • sleep
  • after a redirect:

    • increment (Nothing else: the top of the following recursive call has a check for exhaustion.)
  • after a HTTP response was received:

    • increment
    • check exhaustion; if exhausted, return the response
    • sleep

We could push this behavior down into one function. I am concerned about the complexity of determining which error to raise, in the event of exhaustion. Also, in the last case, we don't actually raise an exception, instead increment and then pass through.
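The increment / check-exhaustion / sleep ordering above, collapsed into a single hypothetical loop; the names are illustrative, not urllib3's API:

```python
import time


class MaxRetryError(Exception):
    pass


def urlopen_with_retries(make_request, retries=3, backoff_factor=0):
    errors = 0
    while True:
        try:
            return make_request()
        except (ConnectionError, TimeoutError) as e:
            errors += 1                    # 1. increment
            if errors > retries:           # 2. check exhaustion, raise
                raise MaxRetryError(e)
            time.sleep(backoff_factor *    # 3. sleep only if not raised
                       (2 ** (errors - 1)))


# A request that fails twice and then succeeds:
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError('connection reset')
    return 'ok'
```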

Andrey Petrov
Owner

@kevinburke How are we feeling about this? I'd love to include this in urllib3 v1.8, but not a big rush. :)

Kevin Burke
Andrey Petrov
Owner

No worries. Good things take time. :)

Andrey Petrov
Owner

I feel we need to recruit an extra pair of eyes to look over this. :)

@kevinburke Is there anything left to do here as far as you're concerned?

Kevin Burke

Agreed; it's a lot of code and would be easy to introduce errors. If you're happy with the redirect behavior as is then yes I am happy with the code.

There are reference docs but no "user-guide" docs. These would be nice to have, though I am reluctant to just chuck another section onto the end of the frontpage.

Andrey Petrov
Owner

@kevinburke I'll ping some people. If you know anyone who'd like some practice doing Python code reviews, toss them this way. :)

Kevin Burke

There is one thing I noticed which is kind of not good. This interface couples HTTP methods and status codes for retries, so you can't say something like, "retry all HTTP methods if it's a 503 or 429, but only retry GETs and DELETEs for 502, 504, etc." Which I feel is a valid use case, as 429 generally means the server didn't do any processing and the request is safe to retry.

I don't know a better way around it, besides some kind of ugly map structure.

Andrey Petrov
Owner

Fair point. My advice would be one of the following (in increasing order of difficulty):

  1. Document this fact and make sure it's easy enough for folks to extend their own Retry object with whatever custom logic they want in this scenario.

  2. Add support for passing some kind of callable into codes_whitelist (and maybe rename it accordingly) which takes some params like the response and returns True/False.

  3. ... I had a third one, but now that I write it, it's not as good as the second one. :)
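Option 2 above could look something like this sketch (a hypothetical predicate, not the PR's API; the retryable status lists are illustrative assumptions):

```python
# Idempotent methods that are generally safe to retry on server errors.
IDEMPOTENT = frozenset(['HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'])


def retry_allowed(method, status):
    """Callable whitelist: decide per (method, status) pair."""
    # 429/503 imply the server did no processing: any method may retry.
    if status in (429, 503):
        return True
    # Other retryable statuses: only idempotent methods.
    if status in (500, 502, 504):
        return method in IDEMPOTENT
    return False


print(retry_allowed('POST', 429))   # True
print(retry_allowed('POST', 502))   # False
print(retry_allowed('GET', 502))    # True
```

This expresses exactly the "429 for everything, 502/504 only for safe methods" policy that a pair of independent whitelists cannot.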

Andrey Petrov
Owner

By the way, this branch is not mergeable anymore because it probably conflicts with some other changes due to the big refactor included. Might be a good time to take out the refactor parts from this PR. :)

Kevin Burke

Okay... I've rebased this branch against master, so it passes again and should be able to merge.

The retries=False logic added some new behavior, I believe: if retries is set to False and you get a 303 or similar, this will return the response. However, if you set retries to 1 and you get 2 redirects, urllib3 raises a MaxRetryError. I added a raise_on_redirect parameter to control whether exhausted redirects raise a MaxRetryError or just return the response; I couldn't figure out another way to distinguish the two behaviors.

Per our discussion, I also added a retry_callable parameter which takes a method and a response and is expected to return a boolean. codes_whitelist and method_whitelist call this under the hood.
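The two redirect behaviors being distinguished can be modeled with a small self-contained sketch (hypothetical names, not the PR's code):

```python
class MaxRetryError(Exception):
    pass


def follow_redirects(responses, redirect_budget, raise_on_redirect=True):
    """responses: status codes the server would return, in order."""
    for status in responses:
        if 300 <= status < 400:
            if redirect_budget == 0:
                if raise_on_redirect:
                    raise MaxRetryError("too many redirects")
                return status          # hand the 3xx back to the caller
            redirect_budget -= 1
            continue
        return status


# retries=False -> budget 0, no raise: the 303 is returned as-is.
print(follow_redirects([303, 200], redirect_budget=0,
                       raise_on_redirect=False))       # 303
# retries=1 but two redirects -> MaxRetryError.
try:
    follow_redirects([301, 302, 200], redirect_budget=1)
except MaxRetryError:
    print("raised")
```

Without the flag, a budget of 0 and an exhausted budget of 1 would be indistinguishable at the point where the 3xx arrives, which is the ambiguity described above.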

Kevin Burke kevinburke closed this
Kevin Burke kevinburke reopened this
urllib3/connectionpool.py
@@ -455,15 +463,24 @@ def urlopen(self, method, url, body=None, headers=None, retries=3,
if headers is None:
headers = self.headers
- if retries < 0 and retries is not False:
- raise MaxRetryError(self, url)
+ if retries is _Default:
+ retries = Retry()
+ elif retries is False:
+ retries = Retry(redirects=0, raise_on_redirect=False)
Andrey Petrov Owner
shazow added a note

What do you think about overloading Retry(redirects=False) rather than adding a raise_on_redirect flag?

(I've mixed feelings about this, but worth floating the idea.)

urllib3/connectionpool.py
@@ -455,15 +465,24 @@ def urlopen(self, method, url, body=None, headers=None, retries=3,
if headers is None:
headers = self.headers
- if retries < 0 and retries is not False:
- raise MaxRetryError(self, url)
+ if retries is _Default:
+ retries = Retry()
+ elif retries is False:
+ retries = Retry(redirects=0, raise_on_redirect=False)
+ elif not isinstance(retries, Retry):
+ retries = Retry(total=retries, read=retries, connect=retries,
Andrey Petrov Owner
shazow added a note

This contradicts what the docstring for the retries param says just above.

+ Pass an integer number to retry connection errors that many times,
+ but no other types of errors. Pass zero to never retry.

I believe both are wrong. Shouldn't it just be retries = Retry(total=retries)?
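The coercion being suggested might be sketched like this (illustrative names; `from_arg` is hypothetical, not the PR's API):

```python
class Retry:
    def __init__(self, total=10, redirect=None, raise_on_redirect=True):
        self.total = total
        self.redirect = redirect
        self.raise_on_redirect = raise_on_redirect

    @classmethod
    def from_arg(cls, retries):
        """Normalize whatever the caller passed into a Retry policy."""
        if retries is None:              # default policy
            return cls()
        if retries is False:             # never retry, return 3xx as-is
            return cls(total=0, redirect=0, raise_on_redirect=False)
        if isinstance(retries, cls):     # already a policy object
            return retries
        return cls(total=retries)        # plain int: just a total budget


assert Retry.from_arg(3).total == 3
assert Retry.from_arg(False).raise_on_redirect is False
```

The last branch is the `Retry(total=retries)` reading of an integer argument suggested above, rather than fanning the value out to every per-type budget.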

urllib3/connectionpool.py
@@ -514,32 +533,48 @@ def urlopen(self, method, url, body=None, headers=None, retries=3,
release_conn = True
raise SSLError(e)
- except (TimeoutError, HTTPException, SocketError) as e:
Andrey Petrov Owner
shazow added a note

I'm pretty sure this is a bad merge. We refactored this bit in a recent PR to reduce redundant code.

urllib3/connectionpool.py
((18 lines not shown))
if response.status == 303:
method = 'GET'
log.info("Redirecting %s -> %s" % (url, redirect_location))
- return self.urlopen(method, redirect_location, body, headers,
- retries - 1, redirect, assert_same_host,
- timeout=timeout, pool_timeout=pool_timeout,
- release_conn=release_conn, **response_kw)
+ retries = retries.increment(response=response)
+ if retries.is_exhausted():
Andrey Petrov Owner
shazow added a note

Would be nice to reduce the excessive branching in this block.

Andrey Petrov
Owner

Almost there! Thanks again, Kevin. :)

I need to ponder on the retry_callable design/naming a bit more, but I think we're close.

Andrey Petrov
Owner

I think this PR is going to need some refreshing. Pretty sure #399 affects this.

@kevinburke Do you have any notion on what's left here? Sorry if it fell through my cracks. :(

Let me know if you don't have time to pursue this further.

Kevin Burke

We both thought the code was good, but wanted to get a third pair of eyes due to the high impact of the changes involved.

I blogged about it, posted on Twitter etc but didn't find anyone for a code review. :(

I'll take a look at updating it with the newer changes.

Andrey Petrov
Owner

I'm going to spend some time on this next week and hopefully get it merged. :) If anyone has feedback, this week is the time.

Donald Stufft

I'll take a closer look at this later, but I did just skim it. First thing that stood out to me is it's real weird to me for redirects to be associated with retries. I don't group those things together conceptually at all.

Donald Stufft

To expand, a retry to me means something failed, and I want to attempt to do that thing again because maybe it'll work this time. Generally I expect a retry to have the exact same "inputs" into the "thing" because the problem is "thing" is unreliable. On the other hand, a redirect isn't a failed connection or response; it's just telling you that the thing you're looking for exists somewhere else and to go ask some other URL.

Andrey Petrov
Owner

We treat retries as anything that makes more requests than the first one you originally triggered. Most of the time, you want to be a Good Citizen and avoid spamming, or stay within some quota, or avoid getting banned. It's important to have good accounting of how many requests you're making.

Donald Stufft

Yeah, I grok the reasoning, it just seems weird to me to lump those two together. That doesn't necessarily mean you shouldn't do it, just that it feels weird to at least one person :)

Andrey Petrov
Owner

I welcome any suggestions for nomenclature to reduce the ambiguity. I am fairly set on keeping these things together conceptually. In urllib3 land, we generally prioritize being technically accurate over being conceptually friendly, but being both is great when possible. :)

Andrey Petrov
Owner

/me force-pushes some rebasing to @kevinburke's branch.

kevinburke and others added some commits
Kevin Burke kevinburke Implement retry logic
- The total number of retries for any type of error will be the maximum of the
specified value for the retry type, and the specified total number of retries.
So, if you specify total=5 and connect=3, and there are 4 retries, we will not
raise a MaxRetryError. Alternatively, if you specify total=3 and connect=5, we
will also not raise a MaxRetryError. Please see the tests for a full
description of the functionality of the class.

- Adds a new successful_retry handler to dummyserver/handlers.py which throws
a 418 the first time you call it and a 200 the second time. It uses
a class-level defaultdict to store state; I'm not sure how else to implement
this.

- Adds docs for new features

- Splits out util.py into submodules - I need to know what you want to do here because as written it would change from `urllib3.util.Timeout` into `urllib3.util.timeout.Timeout` which is not good.

- I believe there are some API changes here.
    - some subclasses of httplib.HTTPException can actually be raised as connection errors because they imply the request never actually got sent to the server.
    - urlopen previously would retry on read timeouts, which violated the urlopen contract (as I understood it) of only retrying things that couldn't possibly be side effecting. this code does not retry read timeouts by default.

I am also testing this in two new environments - in the office which places my IP on 10.* subnet and I think has weird/different behavior when connecting to TARPIT_HOST than do standard wifi networks, and without an Internet connection, in which case a bunch of tests fail. Also, it's difficult to test some of these exceptional cases because the errors raised rely on the network stack, which (I think) is why the tests are failing on the branch. I'm still looking into it.

Either way I am losing some confidence in the connection timeout tests; getting a network to reliably generate ECONNREFUSED, or not generate it and tie up a connection, is tricky, unless we want to go down a path like this: http://bugs.python.org/file26890/test_timeout.patch

Ports in find_unused_port from test_support! This will be super useful in the
future.
36378bc
Andrey Petrov Rephrasing comments. 1c35c8e
Kevin Burke kevinburke fix issues from looking at diff 161ca5b
Andrey Petrov Fixing rebase-related bugs. 2c8d3db
Andrey Petrov More rebase fixen. a26ec33
Andrey Petrov Move copypasta to its own module, and bump up @timed limits for pypy. 31061b1
Andrey Petrov Oops missing copypasta module. 59ed5f0
Andrey Petrov Tweaks. 1bc15af
Andrey Petrov More rebase-related fixes. I hate rebasing. 0869677
Kevin Burke

What is left to do? I'm sorry, I am not sure about the current state wrt master.

Owner

@kevinburke No worries. :) Right now I'm butchering your retries PR (now that I made it work again). Will push a separate branch in a bit for review.

Owner

Butchering progress here: #421

Andrey Petrov

Some of these cannot be triggered with the way we're using httplib.

The rest, I really don't want to care about. Still working on a way to prune them by depending more on urllib3 internal state.

Fair enough... just wanted to point out that they could be raised, and some of them implied side effects & some didn't :)

Owner

Yea, we already try to wrap every httplib exception with our own. If any httplib thing gets through, that's a bug in urllib3.

Andrey Petrov
Owner

I don't think the immutable Retry strategy is going to work. (See refactoring in #421 for broken details.)

One big problem is that we have no way to retain Retry state between intra-host retries and inter-host retries in a PoolManager.

That is, we pass in a Retry object to a PoolManager.urlopen, it passes it to a ConnectionPool, which might do some retries and create immutable clones of the Retry object with new state. Now say 3 retries later, there is a cross-host redirect so it bubbles up to the PoolManager layer—all of the immutable Retry state from the ConnectionPool is lost as it enters a new ConnectionPool.

// Note to self: Write a test for this.

Not quite sure how we should deal with this. Bubbling up the Retry state with the response object does not seem like a good idea either. :'(
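The state-loss problem can be shown with a toy illustration (hypothetical classes, not urllib3's): each pool works on immutable clones, so counts accumulated inside one pool never make it back to the object the PoolManager still holds:

```python
class ImmutableRetry:
    def __init__(self, total, observed=0):
        self.total, self.observed = total, observed

    def increment(self):
        # Returns a new object; the original is never mutated.
        return ImmutableRetry(self.total, self.observed + 1)


def pool_urlopen(retry, failures):
    """Stand-in for a ConnectionPool retrying internally."""
    for _ in range(failures):
        retry = retry.increment()   # clones stay local to this pool
    return retry


original = ImmutableRetry(total=10)
inside_pool = pool_urlopen(original, failures=3)
print(inside_pool.observed)   # 3: the pool saw the retries
print(original.observed)      # 0: the PoolManager still holds the original
```

After a cross-host redirect, the PoolManager would hand `original` (with `observed == 0`) to the next pool, so the three retries are forgotten, which is exactly the accounting gap described above.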

Andrey Petrov
Owner

Thinking about it some more, this is actually the same behaviour we always had, so probably okay to keep it. Will document that each cross-host redirect cycle is independent. Will need to make sure we don't get infinite cross-host redirects, though.

Andrey Petrov
Owner

Did some pretty big changes, fairly happy with where it's going. Thoughts appreciated.

Thinking of removing Retries._observed_errors in place of somehow tracking the retry history in the response object and relying on those instead?

Ian Cordasco sigmavirus24 commented on the diff
test/port_helpers.py
@@ -0,0 +1,100 @@
+# These helpers are copied from test_support.py in the Python 2.7 standard
+# library test suite.
+
+import socket
+
+
+# Don't use "localhost", since resolving it uses the DNS under recent
+# Windows versions (see issue #18792).
+HOST = "127.0.0.1"
+HOSTv6 = "::1"
+
+def find_unused_port(family=socket.AF_INET, socktype=socket.SOCK_STREAM):
+ """Returns an unused port that should be suitable for binding. This is

Holy docstring Batman!

Andrey Petrov Owner
shazow added a note

Yea it's just copypasta from the stdlib I think.

urllib3/util/retry.py
((98 lines not shown))
+
+ By default, backoff is disabled (set to 0).
+
+ :param bool raise_on_redirect: Whether, if the number of redirects is
+ exhausted, to raise a MaxRetryError, or to return a response with a
+ response code in the 3xx range.
+ """
+
+ DEFAULT_METHOD_WHITELIST = frozenset([
+ 'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'])
+
+ #: Maximum backoff time.
+ BACKOFF_MAX = 120
+
+ def __init__(self, total=10, connect=None, read=None, redirect=None,
+ _observed_errors=0,

This seems odd to just be hanging out here.

Andrey Petrov Owner
shazow added a note

Fixed.

urllib3/util/timeout.py
((31 lines not shown))
- .. code-block:: python
+ Example usage: ::

More simply, just `Example usage::`; the `::` renders as a single `:` appended to the line it is on.

Ian Cordasco

LGTM. :shipit:

Andrey Petrov
Owner

I hope I don't regret this... :P

Andrey Petrov shazow merged commit c858e19 into from
Andrey Petrov shazow closed this
Ian Cordasco

@shazow do you regret it yet? =P

Andrey Petrov
Owner

We'll see when v1.9 goes out. :)

Kevin Burke

Mind adding more detail on the critical flaws discovered?

Owner

I think I was referring to c08c7c1#diff-0401b3595c5b19c0b003447baaaf35d0R172 but later decided that it wasn't a big deal.

Commits on Jun 25, 2014
  1. Kevin Burke

    Implement retry logic

    kevinburke authored committed
  2. Rephrasing comments.

    authored
  3. Kevin Burke

    fix issues from looking at diff

    kevinburke authored committed
  4. Fixing rebase-related bugs.

    authored
  5. More rebase fixen.

    authored
  6. Oops missing copypasta module.

    authored
  7. Tweaks.

    authored
Commits on Jun 26, 2014
  1. Tests pass.

    authored
  2. Py3.

    authored
  3. Docs.

    authored
Commits on Jun 27, 2014
  1. Retry.count

    authored
Commits on Jun 30, 2014
  1. Coverage full,.

    authored
Commits on Jul 01, 2014
  1. Tweakage.

    authored
  2. Merge pull request #421 from shazow/retries-v2

    authored
    Retries v2: Retries strike back.
  3. Docs.

    authored
  4. CHANGES issues.

    authored
  5. CHANGES issues.

    authored
  6. Docs formatting.

    authored
  7. : :: -> ::

    authored