RFC: Retries #765

florimondmanca · 2020-01-14T22:59:22Z

Retry functionality has already been discussed in #108, and a first draft was tackled in #134, but later abandoned as HTTPX was still at an early stage back then.

I think we're at a good point to start discussing the design of a potential retry functionality? I also remember @StephenBrown2 mentioned it again in one of the threads when we were preparing the 0.11 release, though I just can't find where that was again…

I submit this as a PR so that we can comment and edit the proposal interactively. Contains:

A Markdown document describing a minimum retry functionality, and extension APIs. (The RFC look of it is mostly for fun — nothing formal!). You might want to look at the rendered form.
Proof-of-concept code (doesn't include an implementation for the extension mechanism, though). You can try it out with this script...

import asyncio
import httpx


async def main() -> None:
    url = "http://localhost:8000"
    retries = httpx.Retries(limit=3, backoff_factor=0.2)
    async with httpx.AsyncClient(retries=retries) as client:
        r = await client.get(url)
        print(r)


if __name__ == "__main__":
    asyncio.run(main())

Without even starting a server, trying running the script with:

$ HTTPX_LOG_LEVEL=debug python example.py

You should see output similar to...

DEBUG [2020-01-14 23:49:50] httpx.client - HTTP Request failed - NetworkError(OSError("Multiple exceptions: [Errno 61] Connect call failed ('::1', 8000, 0, 0), [Errno 61] Connect call failed ('127.0.0.1', 8000), [Errno 61] Connect call failed ('fe80::1', 8000, 0, 1)"))
DEBUG [2020-01-14 23:49:50] httpx.client - Retrying in 0 seconds
DEBUG [2020-01-14 23:49:50] httpx.client - HTTP Request failed - NetworkError(OSError("Multiple exceptions: [Errno 61] Connect call failed ('::1', 8000, 0, 0), [Errno 61] Connect call failed ('127.0.0.1', 8000), [Errno 61] Connect call failed ('fe80::1', 8000, 0, 1)"))
DEBUG [2020-01-14 23:49:50] httpx.client - Retrying in 0.2 seconds
DEBUG [2020-01-14 23:49:50] httpx.client - HTTP Request failed - NetworkError(OSError("Multiple exceptions: [Errno 61] Connect call failed ('::1', 8000, 0, 0), [Errno 61] Connect call failed ('127.0.0.1', 8000), [Errno 61] Connect call failed ('fe80::1', 8000, 0, 1)"))
DEBUG [2020-01-14 23:49:50] httpx.client - Retrying in 0.4 seconds
DEBUG [2020-01-14 23:49:50] httpx.client - HTTP Request failed - NetworkError(OSError("Multiple exceptions: [Errno 61] Connect call failed ('::1', 8000, 0, 0), [Errno 61] Connect call failed ('127.0.0.1', 8000), [Errno 61] Connect call failed ('fe80::1', 8000, 0, 1)"))
Traceback (most recent call last):
  File "debug/retries.py", line 45, in <module>
    asyncio.run(main())
  File "/Users/florimond/.pyenv/versions/3.8.0/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/Users/florimond/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "debug/retries.py", line 17, in main
    r = await client.send(request)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1125, in send
    response = await self.send_handling_retries(
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1171, in send_handling_retries
    raise exc from None
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1156, in send_handling_retries
    return await self.send_handling_redirects(
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1194, in send_handling_redirects
    response = await self.send_handling_auth(
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1230, in send_handling_auth
    response = await self.send_single_request(request, timeout)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/client.py", line 1254, in send_single_request
    response = await dispatcher.send(request, timeout=timeout)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/dispatch/connection_pool.py", line 157, in send
    raise exc
  File "/Users/florimond/Developer/python-projects/httpx/httpx/dispatch/connection_pool.py", line 153, in send
    response = await connection.send(request, timeout=timeout)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/dispatch/connection.py", line 42, in send
    self.connection = await self.connect(timeout=timeout)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/dispatch/connection.py", line 62, in connect
    socket = await self.backend.open_tcp_stream(
  File "/Users/florimond/Developer/python-projects/httpx/httpx/backends/auto.py", line 33, in open_tcp_stream
    return await self.backend.open_tcp_stream(hostname, port, ssl_context, timeout)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/backends/asyncio.py", line 197, in open_tcp_stream
    stream_reader, stream_writer = await asyncio.wait_for(  # type: ignore
  File "/Users/florimond/.pyenv/versions/3.8.0/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/florimond/Developer/python-projects/httpx/httpx/utils.py", line 368, in as_network_error
    raise NetworkError(exc) from exc
httpx.exceptions.NetworkError: Multiple exceptions: [Errno 61] Connect call failed ('::1', 8000, 0, 0), [Errno 61] Connect call failed ('127.0.0.1', 8000), [Errno 61] Connect call failed ('fe80::1', 8000, 0, 1)

Happy to discuss!

StephenBrown2 · 2020-01-14T23:09:38Z

Wow, thank you for putting the time and thought into this that it deserves. Definitely better planned than I had in mind!

I look forward to the discussion, but it currently looks very well done and immediately usable, with a very reasonable interface that is small enough to maintain but with the idea to be extensible enough to use in those corner cases some may ask for.

As the squeaky wheel, I certainly give my 👍 to it, even if it only gets merged as it stands now!

tomchristie · 2020-01-15T12:05:25Z

✨ Wow, fantastic stuff! ✨

Very impressive.

Some initial thoughts...

Do we currently have any use-cases/examples of "retry, altering the request"?
We'd need to think carefully about interactions with auth. Altering the request would require re-running any signed auth. Also authentication schemes that use nonces might need re-applying if you wanted to retry after a mid-flight failure? If we've got conservative retry defaults then that might not be an issue so long as we document any constraints it might introduce?
"an error that might imply that the server has already started processing the request (such as WriteTimeout" WriteTimeout would imply that the server probably hasn't started processing the request. What are urllib3's set of retry-able cases in it's default configuration? (I'd guess connection failures only?)
I probably think that either off-by-default or on-but-only-for-connection-failures is probably a reasonable default for retries?
I'd probably try to avoid a RetryableError subclass & heirarchy. We could still have a RETRYABLE_EXCEPTIONS = (...) constant in the codebase to avoid repetition, without inroducing extra type heirarchy?
RetryLimiter/RetrySchedule APIs. My initial feeling here would be that we might prefer not introduce additional API for these subsets? Would it make sense just had a Retry configuration class with a nice default set of defaults, plus a programatic way of implementing an alternate retry flow? (Eg. support retries=<int>, retries=Retry(<more complex config), retires=<Subclass of Retry with custom auth flow>). We should probably tend to prefer keeping the API surface minimal wherever possible, and it's okay for us to push users wanting complex uses towards writing a tiny bit more code to do so.

florimondmanca · 2020-01-15T19:26:41Z

@tomchristie Thanks! Replying to your notes…

Do we currently have any use-cases/examples of "retry, altering the request"?

I mostly included that requirement here based on @sethmlarson's description, but one possible example might be "if HEAD doesn't work, try with GET and strip the output out", or something like that? Advanced stuff in any case, IMO.

If we've got conservative retry defaults then that might not be an issue so long as we document any constraints it might introduce?

Yes, I think if we allow modifying the request we'd need to document the constraints it might introduce w.r.t. anything that relies on request "uniqueness".

WriteTimeout would imply that the server probably hasn't started processing the request

Well, that would only hold for the first .write(), I suppose? I'm not sure on this one though.

What are urllib3's set of retry-able cases in it's default configuration? (I'd guess connection failures only?)

It retries on the following exceptions:

        except (
            TimeoutError,
            HTTPException,
            SocketError,
            ProtocolError,
            BaseSSLError,
            SSLError,
            CertificateError,
        ) as e:

These count as errors that will .increment() the counters of Retry, and trigger a retry if we haven't run out of attempts. So urllib3 retries on connection-related errors (connection couldn't be established, or it was broken), as well as timeouts (but urllib3 doesn't handle write timeouts), and SSL/TLS-related issues.

I probably think that either off-by-default or on-but-only-for-connection-failures is probably a reasonable default for retries?

On-but-only-for-connection-failures by default sounds appealing to me! Would handling other times of failures be deferred to an extension of the default Retries class?

We could still have a RETRYABLE_EXCEPTIONS = (...) constant in the codebase

Yes that's a possible implementation too and probably less intrusive.

Would it make sense just had a Retry configuration class with a nice default set of defaults, plus a programatic way of implementing an alternate retry flow?

Or sure, yes. We can probably get away with just having the Retries.retry_flow() method as an extension entrypoint. Though that might mean that some built-in functionality we'd decide to provide (e.g. handling Retry-After header) would have to be re-implemented if overriding that method. Not sure if/what we could do about that. But I agree starting with just .retry_flow() is a good starting point in terms of API.

florimondmanca · 2020-01-18T13:01:43Z

Closing in favor of #778. Thanks for all the feedback so far! :-)

RFC: Retries

2adadcc

florimondmanca added the enhancement New feature or request label Jan 14, 2020

florimondmanca mentioned this pull request Jan 18, 2020

Add retries #778

Closed

florimondmanca closed this Jan 18, 2020

tomchristie deleted the retries branch April 8, 2020 14:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Retries #765

RFC: Retries #765

florimondmanca commented Jan 14, 2020

StephenBrown2 commented Jan 14, 2020 •

edited

Loading

tomchristie commented Jan 15, 2020 •

edited

Loading

florimondmanca commented Jan 15, 2020

florimondmanca commented Jan 18, 2020

RFC: Retries #765

RFC: Retries #765

Conversation

florimondmanca commented Jan 14, 2020

StephenBrown2 commented Jan 14, 2020 • edited Loading

tomchristie commented Jan 15, 2020 • edited Loading

florimondmanca commented Jan 15, 2020

florimondmanca commented Jan 18, 2020

StephenBrown2 commented Jan 14, 2020 •

edited

Loading

tomchristie commented Jan 15, 2020 •

edited

Loading