[RFC] Consider switching to httpx instead of requests #1368
Comments
I don't know enough about

maybe worth asking the community about this, too

As part of #1394 I used So no need for httpx for now. It would have been cleaner to have a single httpx.Client but adding a new dependency was too heavy compared to the implemented workaround. I'm closing this issue as "wont-fix". Let's revisit httpx usage if we introduce async at some point.
1. requests.Session would be really cool

huggingface_hub is essentially a library for making calls to the Hub. For that we use requests, which is great but has its limitations. requests has the concept of a Session to keep a connection alive when making multiple requests in a row. I ran some experiments this morning and the gain is substantial. The following snippet uses hfh to create a repo, upload 2 (almost empty) files and delete it. It's not perfect but it is quite representative of what users could do.

I tweaked huggingface_hub and ran this snippet with and without requests.Session. Average execution time goes down from 9.3s to 5.5s (-40%). This gain comes from the fact that we don't start a new HTTPS connection at each call.

I also tried a real example: loading weights with diffusers when they are already cached. The discussion was started by @patrickvonplaten yesterday (internal link). => It turns out I save around 1s (from 4.6s to 3.7s), which is again quite substantial. In this particular case, diffusers could also be improved, but it still highlights that huggingface_hub can do better.

Note: besides greatly improving performance, being able to pass a Session object is also useful for users who need to configure their proxy (and headers/user-agents/cookies/...). At the moment, proxies can be configured in hf_hub_download and snapshot_download but not in HfApi.
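To illustrate where the gain comes from, here is a minimal sketch that counts TCP connections with and without connection reuse. It is not the benchmark described above and it does not use requests or huggingface_hub: it relies only on the standard library (http.client against a local http.server), and the names CountingServer, KeepAliveHandler and count_connections are hypothetical helpers made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Hypothetical local server that counts accepted TCP connections."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.connections = 0

    def get_request(self):
        # Called once per accepted TCP connection, not once per HTTP request.
        request = super().get_request()
        self.connections += 1
        return request


class KeepAliveHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets the client keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def count_connections(n_requests: int, reuse: bool) -> int:
    """Send n_requests GET requests and return how many connections were opened."""
    server = CountingServer(("127.0.0.1", 0), KeepAliveHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        if reuse:
            # One connection shared by all requests (what a session does for us).
            conn = http.client.HTTPConnection(host, port)
            for _ in range(n_requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A fresh connection for every request.
            for _ in range(n_requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.connections


if __name__ == "__main__":
    print("without reuse:", count_connections(5, reuse=False))
    print("with reuse:", count_connections(5, reuse=True))
```

Over HTTPS each new connection also pays for a TLS handshake, which is why reusing one connection matters even more for calls to the Hub than this plain-HTTP sketch suggests.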
2. But requests.Session is not thread-safe

I was about to push a draft PR to add an (optional) session parameter everywhere when I realized that doing so is not entirely thread-safe. The issue was raised in 2015 and has been referenced many times since, but it is still not solved. It seems that most problems have been fixed by the underlying urllib3 package, but that's not enough. See for example this issue reported in Dec 2022. I haven't investigated the reason behind it, but I think it's a no-go for us. We use threads to upload/download files (though the Session is less needed in that case) and any user can also make concurrent calls using HfApi (for example, loop over repos and do something). Since it would be nice to still guarantee that HfApi is thread-safe, let's not use requests.Session.
3. Use httpx instead

httpx describes itself as follows: "HTTPX is a fully featured HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2." What's interesting for us is:

- httpx would also be a good start if we want to support some async stuff some day.

4. Caveats and workarounds
Switching to httpx still has some drawbacks:

- httpx.HTTPError and requests.HTTPError are not the same objects (same for Response and Request). It's not much of an issue since huggingface_hub never returns a Request/Response directly. However, I think we should make HfHubHTTPError inherit from both httpx.HTTPError and requests.HTTPError. This way we don't break any try/except implemented by downstream libraries.
- hf.co => huggingface.co)
- stream=True is used once in hfh and its behavior has changed => should be ok

5. What's next?
WDYT about it? Any hard feelings about not doing it? My take is that it's a good idea to do it now, deal with the few issues that will come up, and then we'll be fine to build on top of it.

So basically the plan is:

- requests to httpx without adding features. Keep requests as a dependency.
- httpx, use a session everywhere (called httpx.Client). Also deprecate proxies parameters since configuration can be handled by the user directly
- async capabilities

tagging @sgugger @LysandreJik @patrickvonplaten @lhoestq @julien-c @osanseviero for feedback
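For reference, the caveat in section 4 about inheriting from both httpx.HTTPError and requests.HTTPError can be sketched as follows. This is only an illustration under assumptions: the two base classes below are hypothetical stand-ins so the example runs without either library installed (they mirror the fact that requests.HTTPError derives from IOError and httpx.HTTPError from Exception), and the handler functions are made up to represent downstream code.

```python
class RequestsHTTPErrorStandIn(IOError):
    """Hypothetical stand-in for requests.HTTPError."""


class HttpxHTTPErrorStandIn(Exception):
    """Hypothetical stand-in for httpx.HTTPError."""


class HfHubHTTPError(HttpxHTTPErrorStandIn, RequestsHTTPErrorStandIn):
    """Sketch of a Hub error that both kinds of handlers can catch."""


def failing_call():
    # Pretend a call to the Hub failed.
    raise HfHubHTTPError("404 Client Error")


def legacy_handler(action) -> bool:
    """Downstream code written against requests: returns True if it caught the error."""
    try:
        action()
    except RequestsHTTPErrorStandIn:
        return True
    return False


def new_handler(action) -> bool:
    """Downstream code written against httpx: returns True if it caught the error."""
    try:
        action()
    except HttpxHTTPErrorStandIn:
        return True
    return False


if __name__ == "__main__":
    print("legacy handler caught it:", legacy_handler(failing_call))
    print("new handler caught it:", new_handler(failing_call))
```

With the real classes, constructor arguments and attributes such as request and response would need more care than this sketch shows, since the two libraries initialize their errors differently.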