Add resumable downloads for llama-server model loading #15963
ericcurtin commented on Sep 13, 2025
- Implement resumable downloads in common_download_file_single function
- Add detection of partial download files (.downloadInProgress)
- Check server support for HTTP Range requests via Accept-Ranges header
- Implement HTTP Range request with "bytes=<start>-" header
- Open files in append mode when resuming vs create mode for new downloads
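A minimal sketch of the Range decision described above. It assumes the caller already knows the size of the partial file and has the Accept-Ranges value from a HEAD request. accepts_byte_ranges, resume_plan and plan_download are hypothetical names for illustration, not functions from llama.cpp's common code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if an Accept-Ranges value lists the "bytes" unit.
// The header is a comma-separated list of range units, or "none".
bool accepts_byte_ranges(const std::string & value) {
    std::size_t pos = 0;
    while (pos <= value.size()) {
        std::size_t comma = value.find(',', pos);
        if (comma == std::string::npos) {
            comma = value.size();
        }
        std::string token = value.substr(pos, comma - pos);
        // Trim surrounding spaces/tabs, then compare case-insensitively.
        const std::size_t b = token.find_first_not_of(" \t");
        const std::size_t e = token.find_last_not_of(" \t");
        token = (b == std::string::npos) ? std::string() : token.substr(b, e - b + 1);
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (token == "bytes") {
            return true;
        }
        pos = comma + 1;
    }
    return false;
}

struct resume_plan {
    std::int64_t start = 0;     // first byte to request (0 = whole file)
    std::string  range_header;  // value for the Range header; empty = send no Range
};

// Resume only when bytes are already on disk AND the server advertised byte ranges.
resume_plan plan_download(std::int64_t partial_size, const std::string & accept_ranges) {
    assert(partial_size >= 0);
    resume_plan plan;
    if (partial_size > 0 && accepts_byte_ranges(accept_ranges)) {
        plan.start        = partial_size;
        plan.range_header = "bytes=" + std::to_string(partial_size) + "-";
    }
    return plan;
}
```

When range_header comes back empty, the caller would send a plain GET and overwrite any partial file instead of appending to it.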
@ngxson @ggerganov PTAL
Pull Request Overview
This PR implements resumable downloads for llama-server model loading by adding support for HTTP Range requests when downloading model files. The implementation detects partial downloads and resumes them when servers support range requests.
Key changes:
- Added HTTP Range request support with server capability detection via Accept-Ranges header
- Implemented partial download detection using .downloadInProgress temporary files
- Modified file handling to use append mode for resumable downloads vs create mode for new downloads
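The partial-file handling could be sketched with std::filesystem as below. Only the .downloadInProgress suffix comes from the PR; the helper names (partial_path_for, existing_partial_bytes, open_partial, finalize_download) and the rename-on-completion step are assumptions about how such code might be structured.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: "<final>.downloadInProgress" next to the final file.
fs::path partial_path_for(const fs::path & final_path) {
    assert(!final_path.empty());
    fs::path p = final_path;
    p += ".downloadInProgress";
    return p;
}

// Bytes already on disk from an interrupted download, or 0 if there is no partial file.
std::uintmax_t existing_partial_bytes(const fs::path & final_path) {
    std::error_code ec;
    const fs::path partial = partial_path_for(final_path);
    if (!fs::is_regular_file(partial, ec)) {
        return 0;
    }
    const std::uintmax_t size = fs::file_size(partial, ec);
    return ec ? 0 : size;
}

// Append when resuming, truncate (create) when starting from scratch.
std::ofstream open_partial(const fs::path & final_path, bool resume) {
    const auto mode = std::ios::binary | (resume ? std::ios::app : std::ios::trunc);
    return std::ofstream(partial_path_for(final_path), mode);
}

// After the last byte arrives, move the partial file into place.
bool finalize_download(const fs::path & final_path) {
    std::error_code ec;
    fs::rename(partial_path_for(final_path), final_path, ec);
    return !ec;
}
```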
Force-pushed from 02492e9 to 22e86f9.
The macOS x86 build is still flaky. It seems to be random luck based on which build server you get; only some of them have the right versions of the tools needed to compile the code.
Force-pushed from 2e485e5 to 10b789d.
Force-pushed from 10b789d to 6749867.
We currently check the ETag header for this, as most of the time (if not always?) the ETag is the hash of the remote file to be downloaded.
Sure, but we don't check between retries, and there's a small chance the content changes in between. That probably wasn't a big deal before: you either pulled the full file or you didn't. With resumable transfers it becomes more relevant, because you could end up with half of one version of the file and half of another.
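One standard HTTP way to guard against exactly this mix is a conditional range request: send the stored ETag in If-Range alongside Range, and the server answers 206 with only the missing bytes if the ETag still matches, otherwise 200 with the whole file. The sketch below assumes that approach; it is not necessarily what this PR implements (the discussion below settles on comparing stored ETags), and resume_headers, resume_action and handle_ranged_response are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

using header_list = std::vector<std::pair<std::string, std::string>>;

// Headers for resuming at `offset`. If-Range is only added for a strong ETag,
// because HTTP does not allow weak ("W/...") validators in If-Range.
header_list resume_headers(std::int64_t offset, const std::string & etag) {
    assert(offset >= 0);
    header_list h;
    h.emplace_back("Range", "bytes=" + std::to_string(offset) + "-");
    if (!etag.empty() && etag.rfind("W/", 0) != 0) {
        h.emplace_back("If-Range", etag);
    }
    return h;
}

// append:  keep the partial file and append this body
// restart: truncate the partial file and download from byte 0
// fail:    give up (or retry later)
enum class resume_action { append, restart, fail };

resume_action handle_ranged_response(int status, std::int64_t offset, const std::string & content_range) {
    switch (status) {
        case 206: {
            // Partial content: only safe to append if it starts exactly where our file ends.
            long long first = -1;
            if (std::sscanf(content_range.c_str(), "bytes %lld-", &first) == 1 && first == offset) {
                return resume_action::append;
            }
            return resume_action::restart;
        }
        case 200:
            // Full body: the server ignored Range, or If-Range did not match (content changed).
            return resume_action::restart;
        case 416:
            // Range not satisfiable (e.g. remote file got shorter): discard the partial file.
            return resume_action::restart;
        default:
            return resume_action::fail;
    }
}
```

With If-Range, a changed remote file comes back as a 200 with the full body, so the client never appends bytes from a different version.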
Force-pushed from 2e1b3f8 to 8da2f1f.
We currently don't check between retries, but you can implement it. My idea is that we can rely on ETag instead of Last-Modified.
The overall idea is: the first time the file is downloaded, the ETag header is pulled via the HEAD request and should be stored somewhere. Then one of 3 cases may happen:
- File download completed --> the next time the user runs the model, the stored ETag is used to verify whether the file is up-to-date
- File download failed --> check the ETag on the next retry. I think this is not yet implemented, so you can try adding it
- File download half-completed --> when resuming, the ETag is used to make sure the remote file content hasn't changed (this case is not yet implemented and should be added in the current PR)
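The three cases above can be sketched as one decision function that is called again, after a fresh HEAD request, on every retry; that also covers the "download failed" case. local_state, decision and decide are hypothetical names, and the rule of only resuming on a matching strong (non "W/") ETag is an assumption meant to keep byte ranges from two different file versions apart.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class download_action { use_cached, resume_partial, download_fresh };

struct local_state {
    std::string  stored_etag;              // ETag saved in the metadata .json (empty if none)
    bool         has_complete_file = false;
    std::int64_t partial_bytes     = 0;    // size of the .downloadInProgress file, 0 if absent
};

struct decision {
    download_action action;
    std::int64_t    start_offset;          // first byte to request from the server
};

static bool is_weak(const std::string & etag) { return etag.rfind("W/", 0) == 0; }

decision decide(const local_state & local, const std::string & remote_etag) {
    assert(local.partial_bytes >= 0);
    const bool same_etag = !remote_etag.empty() && local.stored_etag == remote_etag;

    // Case 1: a finished download whose ETag still matches the remote file.
    if (local.has_complete_file && same_etag) {
        return { download_action::use_cached, 0 };
    }
    // Case 3: a half-finished download of the same remote content.
    if (local.partial_bytes > 0 && same_etag && !is_weak(remote_etag)) {
        return { download_action::resume_partial, local.partial_bytes };
    }
    // Anything else (no ETag, changed ETag, weak ETag): start over from byte 0.
    return { download_action::download_fresh, 0 };
}
```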
SGTM. You probably noticed that this PR also writes the .json immediately, whereas before it was written at the end. We now need to write it first so we can identify what was downloaded last time.
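A sketch of writing the metadata before the transfer starts, assuming a hypothetical "<model>.json" layout with just url and etag fields; the real file may store more, or differently named, keys. Because ETags normally contain double quotes, the values need JSON escaping. json_escape and write_metadata_first are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Minimal JSON string escaping: enough for URLs and ETags (which usually contain quotes).
std::string json_escape(const std::string & s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    char buf[7]; // "\u00XX" plus the terminating NUL
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned char>(c));
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out;
}

// Write "<model>.json" BEFORE the transfer starts, so an interrupted run can later
// tell which remote version the .downloadInProgress bytes belong to.
bool write_metadata_first(const fs::path & model_path, const std::string & url, const std::string & etag) {
    assert(!model_path.empty());
    fs::path meta = model_path;
    meta += ".json";
    std::ofstream f(meta, std::ios::trunc);
    if (!f) {
        return false;
    }
    f << "{\"url\":\"" << json_escape(url) << "\",\"etag\":\"" << json_escape(etag) << "\"}\n";
    f.close();
    return !f.fail();
}
```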
Force-pushed from 68f95b3 to 5bd47a7.
Force-pushed from becdb99 to 3011a70.
@rgerganov @slaren PTAL
@am17an @JohannesGaessler PTAL
Force-pushed from 3011a70 to 49692ce.
All done @ngxson, ready for re-review.
Force-pushed from b319db9 to 4e382e8.
A Windows flake, that's rare:
Force-pushed from ca7d99d to 14af48d.
- Implement resumable downloads in common_download_file_single function
- Add detection of partial download files (.downloadInProgress)
- Check server support for HTTP Range requests via Accept-Ranges header
- Implement HTTP Range request with "bytes=<start>-" header
- Open files in append mode when resuming vs create mode for new downloads

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Force-pushed from 14af48d to b3c2c83.
Seems to work on my side. Btw, I think it would be nice to have a better progress display: the progress restarts from 0 when I resume the download, even though it only downloads the missing part.
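A minimal sketch of a progress counter that accounts for the resume offset, assuming the response to a ranged request reports only the remaining bytes in Content-Length (as a 206 response does). The progress struct and its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// On a resumed request, Content-Length covers only the missing bytes,
// so the real total is offset + content_length.
struct progress {
    std::int64_t offset;          // bytes already on disk before this request
    std::int64_t content_length;  // body length of this response
    std::int64_t received = 0;    // body bytes received so far in this request

    std::int64_t done()  const { return offset + received; }
    std::int64_t total() const { return offset + content_length; }

    double fraction() const {
        assert(total() > 0);
        return static_cast<double>(done()) / static_cast<double>(total());
    }
};
```

Download speed would still be computed from received alone, since only those bytes moved over the network in this run.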
And since we already implemented Range requests here, we should also be able to implement multi-threaded downloading, which should significantly improve download speed. We should look into this in the future.
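For the multi-threaded idea, a first step could be splitting the file into contiguous, inclusive byte ranges, one per worker, each fetched with its own Range header. split_ranges and range_header are hypothetical helpers; writing each chunk at its own offset and retrying failed chunks are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct byte_range {
    std::int64_t first;  // inclusive, as in an HTTP Range header
    std::int64_t last;   // inclusive
};

// Split [0, total) into at most n_parts nearly equal, non-empty ranges.
std::vector<byte_range> split_ranges(std::int64_t total, int n_parts) {
    assert(total >= 0 && n_parts > 0);
    std::vector<byte_range> out;
    const std::int64_t base  = total / n_parts;
    const std::int64_t extra = total % n_parts;  // the first `extra` parts get one more byte
    std::int64_t pos = 0;
    for (int i = 0; i < n_parts && pos < total; ++i) {
        const std::int64_t len = base + (i < extra ? 1 : 0);
        out.push_back({pos, pos + len - 1});
        pos += len;
    }
    return out;
}

std::string range_header(const byte_range & r) {
    return "bytes=" + std::to_string(r.first) + "-" + std::to_string(r.last);
}
```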
Agreed on both points; I'll do them in follow-up PRs.
Btw @ngxson, @doringeman and @npopov-vst, if you have a Windows machine around, I'd appreciate a quick test of this progress bar on Windows:
I hope to port a version of it over to llama-server. I only tested it on Linux/macOS at the time. With this recent PR, there was a Windows fix, thanks to @npopov-vst. I want to be sure it looks fine in Windows terminals.
@ericcurtin Hi, sure. It depends on the current code page; for me the default was 437. Maybe on Windows it is better to explicitly set it to UTF-8, like:
Or use an ASCII character (like #) for the progress bar.
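Both suggestions can be combined in a small sketch: try to switch the Windows console to UTF-8 with the Win32 SetConsoleOutputCP(CP_UTF8) call, and fall back to # when that fails. This is not the code referenced in the comment above (it was not captured here); enable_utf8_console and render_bar are hypothetical names.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical: on Windows, switch console output to UTF-8 so block characters render.
// Elsewhere we simply assume a UTF-8 terminal (an assumption, not a check).
bool enable_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

std::string render_bar(double fraction, int width, bool utf8) {
    assert(width > 0);
    if (fraction < 0.0) fraction = 0.0;
    if (fraction > 1.0) fraction = 1.0;
    const int filled = static_cast<int>(fraction * width);
    // U+2588 FULL BLOCK spelled as explicit UTF-8 bytes, so the source encoding does not matter.
    const std::string full  = utf8 ? "\xE2\x96\x88" : "#";
    const std::string empty = " ";
    std::string bar = "[";
    for (int i = 0; i < width; ++i) {
        bar += (i < filled) ? full : empty;
    }
    bar += "]";
    return bar;
}
```

A caller would invoke enable_utf8_console() once at startup and pass its result to every render_bar() call.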
Want to open a PR, since you're set up to test on Windows? I think it looks prettier without the hashes.
Agree.
I can, but I'm not sure where I should place these changes, since you mentioned in #15988 (comment) that you are going to refactor some parts. I can confirm that setting
should solve all issues.
Don't worry about my refactor, I can do it afterwards :) Most of it will be moving code from A -> B, so it will be pretty easy.