
More advanced pipelining for hf_transfer to increase download speed even further #32

Open
aikitoria opened this issue Mar 28, 2024 · 7 comments


Is your feature request related to a problem? Please describe.
hf_transfer is very fast for individual files, but for models with many split files, it's not quite as fast as it could be.

Describe the solution you'd like
Currently, it seems like hf_transfer is invoked serially for each file, starting over from scratch with a new set of connections. That results in a timeline similar to this:
[image: timeline of files downloaded one after another, new connections opened for each file]

To fully maximize download speed, it should instead use a shared pool of connections, reused both across small files and across chunks of multiple larger files downloaded in parallel:
[image: timeline with a shared connection pool reused across files and chunks in parallel]
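
A minimal sketch of what that could look like (hypothetical, not hf_transfer's actual implementation; it assumes reqwest, tokio, and futures, and the chunk size / concurrency constants are made up): every file is split into byte-range chunks up front, and all chunks from all files feed one bounded worker pool sharing a single reqwest::Client, so connections and HTTP/2 streams are reused across file boundaries.

```rust
use futures::stream::{self, StreamExt};
use reqwest::Client;

const CHUNK_SIZE: u64 = 10 * 1024 * 1024; // illustrative: 10 MiB per range request
const MAX_PARALLEL: usize = 16;           // illustrative: shared across *all* files

/// Download one byte range; the shared client's pool is reused across calls.
async fn download_chunk(client: &Client, url: &str, start: u64, end: u64) -> reqwest::Result<usize> {
    let body = client
        .get(url)
        .header("Range", format!("bytes={start}-{end}"))
        .send()
        .await?
        .error_for_status()?
        .bytes()
        .await?;
    Ok(body.len()) // a real version would write this at the right file offset
}

/// `files` is a list of (url, total_size) pairs.
async fn download_all(files: &[(String, u64)]) -> reqwest::Result<()> {
    // One client for everything: clones of a reqwest Client share one
    // connection pool, and HTTP/2 is negotiated via ALPN by default.
    let client = Client::builder().build()?;

    // Flatten every (file, chunk) pair into a single work queue so small files
    // and chunks of large files interleave instead of running file-by-file.
    let mut chunks = Vec::new();
    for (url, len) in files {
        let mut start = 0;
        while start < *len {
            chunks.push((url.clone(), start, (start + CHUNK_SIZE - 1).min(*len - 1)));
            start += CHUNK_SIZE;
        }
    }

    stream::iter(chunks)
        .map(|(url, start, end)| {
            let client = client.clone();
            async move { download_chunk(&client, &url, start, end).await }
        })
        .buffer_unordered(MAX_PARALLEL) // at most MAX_PARALLEL requests in flight
        .for_each(|res| async {
            if let Err(e) = res {
                eprintln!("chunk failed: {e}");
            }
        })
        .await;

    Ok(())
}
```

A real implementation would also need retries and ordered writes to the correct file offsets, but the key point is a single shared pool that outlives any one file.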

aikitoria (Author) commented Mar 28, 2024

A simpler alternative might be to launch the next hf_transfer instance when the previous file is about 75% done rather than waiting for 100%. But reusing connections from a pool, ideally with at least HTTP/2 so that the second download onwards can start at full speed immediately, would be better.
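
A hedged sketch of that simpler scheme (illustrative names, not hf_transfer code): each per-file download publishes its completion fraction on a tokio::sync::watch channel, and the driver launches file N+1 as soon as file N crosses the threshold, so the tail of one transfer overlaps the connection setup and ramp-up of the next.

```rust
use tokio::sync::watch;

const START_NEXT_AT: f64 = 0.75; // launch the next file at 75% of the previous one

/// Hypothetical stand-in for the real per-file download; it is assumed to
/// report a completion fraction in [0.0, 1.0] via `progress` as bytes arrive.
async fn download_file(url: String, progress: watch::Sender<f64>) {
    // ... fetch byte ranges, calling progress.send_replace(done / total) ...
    let _ = url;
    let _ = progress.send_replace(1.0);
}

async fn pipeline(urls: Vec<String>) {
    let mut handles = Vec::new();
    for url in urls {
        let (tx, mut rx) = watch::channel(0.0f64);
        handles.push(tokio::spawn(download_file(url, tx)));

        // Hold back the launch loop (not the download itself) until this
        // file is START_NEXT_AT done, then start the next one.
        while *rx.borrow() < START_NEXT_AT {
            if rx.changed().await.is_err() {
                break; // sender dropped: the download already finished (or failed)
            }
        }
    }
    for h in handles {
        let _ = h.await; // wait for all in-flight downloads to complete
    }
}
```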

Wauplin (Contributor) commented Mar 29, 2024

Hi @aikitoria, thanks for the suggestion! Do you have an example repo where such an improvement could have a significant impact? At the moment hf_transfer is optimized for single-file downloads, which is usually enough for model repos where at most a few big files have to be downloaded (and therefore the time to open a connection is negligible in comparison to the download time). The changes you're suggesting are not trivial, so we would need good reasons to implement them :)

I'm also cc-ing @Narsil, who implemented hf_transfer.

Wauplin transferred this issue from huggingface/huggingface_hub on Mar 29, 2024
aikitoria (Author) commented Mar 29, 2024

Sure, here is an example where it would make a difference:

https://huggingface.co/databricks/dbrx-instruct/tree/main

When downloading this on a server with a 10 Gbit/s connection, re-establishing the connections and growing the TCP window again (slow start) takes up a significant portion of the time. We end up with something like this (exaggerated):

[image: bandwidth graph ramping up from zero again for each file (exaggerated)]

aikitoria (Author) commented:

@Wauplin here is another model where this would be very beneficial:

https://huggingface.co/CohereForAI/c4ai-command-r-plus

Narsil (Collaborator) commented Jun 4, 2024

@aikitoria

The reconnect only occurs because of this: https://github.com/huggingface/hf_transfer/blob/main/src/lib.rs#L162
hyperium/hyper#2136 (comment)

The reconnects probably only occur because large chunks take more than 20s to download.
Or maybe you're targeting a server that doesn't support HTTP/2?
Also, I believe your first picture is incorrect: the reconnects don't happen all at the same time, but continuously too (and if the download doesn't exceed the keep-alive, you shouldn't see any reconnects). (Misread your naming.)

Edit: Reflecting on this, since we're cloning the client across tasks, it's very possible that connection reuse doesn't happen, HTTP/2 or not.
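
(For what it's worth, reqwest's documentation says Client already holds its pool behind an Arc, so clone() shares connections across tasks; whether that applies here depends on what hf_transfer actually does at the line linked above. Below is a sketch of the reqwest builder knobs that govern the reconnect behavior discussed here, with purely illustrative values, not a verified fix.)

```rust
use std::time::Duration;

// Illustrative configuration: keep idle pooled connections alive longer, and
// send HTTP/2 keep-alive pings so connections that sit idle while a slow
// chunk finishes elsewhere aren't torn down and reopened.
fn build_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(120))
        .http2_keep_alive_interval(Duration::from_secs(15))
        .http2_keep_alive_while_idle(true)
        .build()
}
```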

aikitoria (Author) commented:

> Or maybe you're targeting a server that doesn't support HTTP/2?

I dunno? Hugging Face hosts the server themselves. I wasn't even aware you could use hf_transfer for anything other than downloading models from Hugging Face.

Narsil (Collaborator) commented Jun 4, 2024

> I wasn't even aware you could use hf_transfer for anything other than downloading models from Hugging Face.

It's just a tool that multiplexes the download over multiple byte ranges (bypassing some server-side rate limits and using all your cores).

Everything I said is a bit wrong: your whole argument is about multiple files, and that still holds. I was only thinking about thread multiplexing within a single file.
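
(To make the byte-range multiplexing described above concrete, here is a self-contained sketch under assumed names; hf_transfer's real logic lives in the src/lib.rs linked earlier. One file of known length is fetched as several concurrent Range requests and reassembled in order.)

```rust
use reqwest::Client;

/// Fetch `url` (with a known length of `total` bytes) as `parts` concurrent
/// byte-range requests and return the pieces in order. Illustrative only.
async fn fetch_in_ranges(
    client: &Client,
    url: &str,
    total: u64,
    parts: u64,
) -> reqwest::Result<Vec<Vec<u8>>> {
    let part_len = (total + parts - 1) / parts; // ceiling division
    let mut tasks = Vec::new();
    for i in 0..parts {
        let start = i * part_len;
        let end = ((i + 1) * part_len - 1).min(total - 1);
        let (client, url) = (client.clone(), url.to_owned()); // clones share the pool
        tasks.push(tokio::spawn(async move {
            client
                .get(&url)
                .header("Range", format!("bytes={start}-{end}"))
                .send()
                .await?
                .error_for_status()?
                .bytes()
                .await
                .map(|b| b.to_vec())
        }));
    }
    let mut out = Vec::with_capacity(tasks.len());
    for t in tasks {
        out.push(t.await.expect("download task panicked")?);
    }
    Ok(out)
}
```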
