Enable async capabilities for HTTP requests #8391
Comments
I’m hitting an issue with this work: HTTP authentication.
The Requests library supports advanced authentication schemes, such as Digest auth or potentially even NTLM through https://github.com/requests/requests-ntlm. Both require multiple HTTP request-response cycles for authentication.
Based on the PR that introduced the Is dropping support for complex authentication an option?
An option could be to use
Personally, I don't object to dropping support for them (I'd forgotten we have such a feature!). But I can't say for sure that nobody uses complex authentication. I guess nobody uses it, but...
Another backward incompatibility: AIOHTTP does not return a A wrapper can be made for the most common bits (status code, url, encoding, content, etc.), but it cannot cover all methods from The current plan for synchronous requests with AIOHTTP is to schedule a coroutine that makes the HTTP request on a new event loop, blocks until the event loop completes and returns the result.
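A minimal sketch of that plan, using only the standard library. The names here are assumptions for illustration: `fetch` is a stand-in for the real AIOHTTP call (it performs no network I/O), and `SimpleResponse` is a hypothetical wrapper that exposes only a couple of the common attributes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SimpleResponse:
    """Hypothetical wrapper exposing only commonly used attributes."""
    status_code: int
    url: str


async def fetch(url: str) -> SimpleResponse:
    # Stand-in for the real async HTTP request (an assumption, no network I/O).
    await asyncio.sleep(0)
    return SimpleResponse(status_code=200, url=url)


def fetch_sync(url: str) -> SimpleResponse:
    # asyncio.run() creates a new event loop, runs the coroutine until it
    # completes, closes the loop, and returns the coroutine's result.
    return asyncio.run(fetch(url))
```

One caveat with this approach: `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so a wrapper like this only works when invoked from plain synchronous code.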
@jayaddison would you be interested in looking into this? httpx may be a good candidate as I believe it looks to keep the interface of Requests. Alternatively, we create a new To François' point above, I don't know how AIO-HTTP has developed w.r.t. authentication in the last three years. A
I’ve been using httpx for a while now, and it’s been an easy switch from
Thanks; I'm tempted - I think I should try to learn more about async programming in Python beforehand, though. I'm also not sure I'm quite there in terms of familiarity with the existing linkchecker. Learning, but not confident that I understand it well enough to make significant changes yet.
The first task about expanding test coverage rings true. I'd suggest measuring performance (throughput and resource usage) too, so that we learn about any regressions. What I'm less certain about and still thinking through is what the best path might be, after further coverage is available:
(my sense of my development experience recently is that I tend to lean more towards debugging, maintenance and fixes, and so that'd lead me towards the second option. but I don't know how much time I can commit, and don't want to get in the way of other approaches)
The test coverage has been expanded quite a bit since I wrote that issue. The goal at the time was to better handle rate limiting (esp. from GitHub), and async provides a nicer way to schedule the next check time than the current approach, where threads wake up to verify nothing needs doing, then sleep some more. I would probably go the second route as well. Because of my findings 2-3 years ago, I decided to opt for the threaded version, which didn’t need to change
There's another candidate that aims to be a drop-in replacement for
The multiplexing aspect of Niquests may come in handy to avoid converting the whole code into async immediately and still make a significant gain.
Is your feature request related to a problem? Please describe.
From #6629, the linkcheck command is currently using threads to concurrently check the status of links in the documentation. Threads are not the most efficient way to concurrently check links: once all threads are busy waiting, the work queue stops being consumed.
Describe the solution you'd like
Using an event loop allows the URL verifier to yield control to another coroutine until it gets a response. That means a single thread is able to send multiple requests concurrently and process the response as they arrive. It also facilitates handling rate-limiting, because a coroutine can be scheduled to run in the future.
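As a rough illustration of the idea, here is a sketch using only `asyncio`. The `check` and `check_later` coroutines are hypothetical stand-ins: they simulate waiting on a server with `asyncio.sleep` instead of making real HTTP requests, and the `"working"` status string is just a placeholder.

```python
import asyncio


async def check(url: str, delay: float = 0.0) -> tuple[str, str]:
    # Hypothetical stand-in for an HTTP request; the sleep simulates waiting
    # for the server and yields control to the event loop.
    await asyncio.sleep(delay)
    return url, "working"


async def check_later(url: str, retry_after: float) -> tuple[str, str]:
    # Rate-limited URL: wait without blocking the thread, then check again.
    await asyncio.sleep(retry_after)
    return await check(url)


async def main() -> list[tuple[str, str]]:
    # All three coroutines make progress concurrently on a single thread.
    return await asyncio.gather(
        check("https://example.org/a", delay=0.02),
        check("https://example.org/b", delay=0.01),
        check_later("https://example.org/c", retry_after=0.03),
    )


results = asyncio.run(main())
```

While one coroutine is waiting, the event loop runs the others, so the delayed check for the rate-limited URL does not hold up the rest of the work. `asyncio.gather` returns the results in the order the coroutines were passed in, regardless of which finished first.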
The first step toward using asynchronous concurrency in linkchecker is to replace uses of the requests library with an async-compatible HTTP library. The aiohttp library has an API pretty similar to that of requests and is well-established and under active development, so it seems like a good choice.
Describe alternatives you've considered
Tried handling rate limits with a PriorityQueue as described in #6629 (comment).
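For readers unfamiliar with the idea, a priority queue keyed on the next allowed check time could look roughly like the sketch below. This is an assumption about the general technique, not necessarily what the linked comment does; `process` and the example URLs are hypothetical, and the actual link check is replaced by appending to a list.

```python
import time
from queue import PriorityQueue


def process(work: PriorityQueue, now=time.time, sleep=time.sleep) -> list[str]:
    """Check URLs in order of their next allowed check time."""
    checked = []
    while not work.empty():
        next_check, url = work.get()
        wait = next_check - now()
        if wait > 0:
            # The earliest entry is not due yet, so nothing else is either.
            sleep(wait)
        checked.append(url)  # stand-in for the real link check
    return checked


work = PriorityQueue()
start = time.time()
# Entries are (next_check_time, url); the smallest time is returned first.
work.put((start + 0.02, "https://example.org/rate-limited"))
work.put((start, "https://example.org/ready"))
order = process(work)
```

The drawback of this design with threads is visible in the `sleep(wait)` call: the worker is blocked for the whole wait, whereas a coroutine would hand control back to the event loop.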
TODO
requests.Response to use an aiohttp.ClientResponse. Both look pretty similar.
aiohttp expects a different input. Consider REQUESTS_CA_BUNDLE, tls_cacerts, auth_info for the linkcheck_auth setting.