Skip to content

Conversation

ericcurtin
Copy link
Collaborator

  • Implement resumable downloads in common_download_file_single function
  • Add detection of partial download files (.downloadInProgress)
  • Check server support for HTTP Range requests via Accept-Ranges header
  • Implement HTTP Range request with "bytes=-" header
  • Open files in append mode when resuming vs create mode for new downloads

@ericcurtin
Copy link
Collaborator Author

@ngxson @ggerganov PTAL

@ericcurtin ericcurtin requested a review from Copilot September 13, 2025 10:29
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements resumable downloads for llama-server model loading by adding support for HTTP Range requests when downloading model files. The implementation detects partial downloads and resumes them when servers support range requests.

Key changes:

  • Added HTTP Range request support with server capability detection via Accept-Ranges header
  • Implemented partial download detection using .downloadInProgress temporary files
  • Modified file handling to use append mode for resumable downloads vs create mode for new downloads

@ericcurtin ericcurtin force-pushed the resumable branch 2 times, most recently from 02492e9 to 22e86f9 Compare September 13, 2025 11:12
@ericcurtin
Copy link
Collaborator Author

macOS x86 build still flakey, seems like it's random luck based on which build server you get, sometimes it has the right versions of things to compile the code.

@ericcurtin ericcurtin force-pushed the resumable branch 11 times, most recently from 2e485e5 to 10b789d Compare September 13, 2025 14:51
@ngxson
Copy link
Collaborator

ngxson commented Sep 13, 2025 via email

@ericcurtin
Copy link
Collaborator Author

We currently check the ETag header for this, as most of the time (if not all?) ETag is the hash of the remote file to be downloaded.

Sure, but we don't check between retries, there's a minor chance the content changes in between retries. It probably wasn't a big deal before, you pull a full file or you don't. But with resumable transfers it becomes more relevant, because you could have half one version of the file, half a different version of the file.

@ericcurtin ericcurtin force-pushed the resumable branch 4 times, most recently from 2e1b3f8 to 8da2f1f Compare September 13, 2025 20:25
@ngxson
Copy link
Collaborator

ngxson commented Sep 14, 2025 via email

@ericcurtin
Copy link
Collaborator Author

Sure, but we don't check between retries, there's a minor chance the content changes in between retries. It probably wasn't a big deal before, you pull a full file or you don't. But with resumable transfers it becomes more relevant, because you could have half one version of the file, half a different version of the file.
we currently don't check between retries, but you can implement it. my idea is that we can rely on ETag instead of Last-Modified. the overall idea is: the first time the file is downloaded, ETag header is pulled via the HEAD request and it should be stored somewhere. then, one of 3 cases may happen: - file download is completed --> the next time user run the model, the stored ETag is used in to verify if the file is up-to-date - file download is failed --> check ETag in the next retry. I think this is not yet implemented, so you can try adding this - file download is half-completed --> when resume, the ETag is used to make sure remote file content isn't changed (this case is not yet implemented and should be added in the current PR)

SGTM... You probably noticed that we now write the .json immediately also in this PR... Whereas before this write was at the end... We need to write it first now so we can identify what was downloaded last time

@ericcurtin ericcurtin force-pushed the resumable branch 2 times, most recently from 68f95b3 to 5bd47a7 Compare September 14, 2025 11:10
@ericcurtin ericcurtin force-pushed the resumable branch 7 times, most recently from becdb99 to 3011a70 Compare September 15, 2025 10:09
@ericcurtin
Copy link
Collaborator Author

@rgerganov @slaren PTAL

@ericcurtin
Copy link
Collaborator Author

@am17an @JohannesGaessler PTAL

@ericcurtin
Copy link
Collaborator Author

All done @ngxson ready for re-review

@ericcurtin ericcurtin force-pushed the resumable branch 2 times, most recently from b319db9 to 4e382e8 Compare September 16, 2025 16:07
@ericcurtin
Copy link
Collaborator Author

A windows flake, that's rare:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  7  239M    7 18.2M    0     0  33.1M      0  0:00:07 --:--:--  0:00:07 33.4M
 17  239M   17 41.5M    0     0  41.1M      0  0:00:05  0:00:01  0:00:04 41.3M
curl: (18) end of response with 207310552 bytes missing
No marker found, stopped after 1.00 MB.

@ericcurtin ericcurtin force-pushed the resumable branch 3 times, most recently from ca7d99d to 14af48d Compare September 17, 2025 08:12
- Implement resumable downloads in common_download_file_single function
- Add detection of partial download files (.downloadInProgress)
- Check server support for HTTP Range requests via Accept-Ranges header
- Implement HTTP Range request with "bytes=<start>-" header
- Open files in append mode when resuming vs create mode for new downloads

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work on my side. Btw I think it would be nice if we can have a better progress display, as the progress counts back from 0 when I resume the download, even though it only downloads the missing part.

And btw since we already implemented Range here, we should also be able to implement multi-threaded downloading which should significantly improve the download speed. We should look into this in the future.

@ericcurtin
Copy link
Collaborator Author

Agree on both parts, will do in follow on PRs

@ericcurtin ericcurtin merged commit 4ca088b into master Sep 18, 2025
47 of 48 checks passed
@ericcurtin ericcurtin deleted the resumable branch September 18, 2025 15:22
@ericcurtin
Copy link
Collaborator Author

ericcurtin commented Sep 18, 2025

Btw @ngxson, @doringeman and @npopov-vst if you have a Windows machine around, I'd appreciate a quick test of this progress bar on Windows:

llama-run hf://ggml-org/gemma-3-4b-it-qat-GGUF

I hope to just port a version of that over to llama-server. I only tested Linux/macOS at the time. With this recent PR, there was a Windows fix recently thanks to @npopov-vst :

#15988

Want to be sure it looks fine on Windows terminals.

@npopov-vst
Copy link
Contributor

@ericcurtin Hi, sure. So it depends on the current codepage. For me the default was 437:

Screenshot 2025-09-18 204348

Maybe on Windows it is better to explicitly set it to UTF-8 like:

#ifdef _WIN32
    UINT originalOutputCP = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);
#endif

...

#ifdef _WIN32
    SetConsoleOutputCP(originalOutputCP);
#endif

Or use some ASCII character (like #) as a progressbar

@ericcurtin
Copy link
Collaborator Author

ericcurtin commented Sep 18, 2025

@ericcurtin Hi, sure. So it depends on the current codepage. For me the default was 437:

Screenshot 2025-09-18 204348 Maybe on Windows it is better to explicitly set it to UTF-8 like:
#ifdef _WIN32
    UINT originalOutputCP = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);
#endif

...

#ifdef _WIN32
    SetConsoleOutputCP(originalOutputCP);
#endif

Or use some ASCII character (like #) as a progressbar

Wanna open a PR? Since you are set up to test it on Windows. Without the hashes looks prettier I think.

@npopov-vst
Copy link
Contributor

Without the hashes looks prettier I think.

Agree.

Wanna open a PR? Since you are set up to test it on Windows.

I can, but I am not sure where should I place these changes, since you mentioned in #15988 (comment), that you are going to refactor some parts.

I can confirm, that setting SetConsoleOutputCP(CP_UTF8); fixes the progress bar and I am happy to test your changes if needed.
Also It makes sense to add SetConsoleCP(CP_UTF8); in order to properly handle unicode input on Windows. So adding

#ifdef _WIN32
    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);
#endif

should solve all issues.

@ericcurtin
Copy link
Collaborator Author

Without the hashes looks prettier I think.

Agree.

Wanna open a PR? Since you are set up to test it on Windows.

I can, but I am not sure where should I place these changes, since you mentioned in #15988 (comment), that you are going to refactor some parts.

I can confirm, that setting SetConsoleOutputCP(CP_UTF8); fixes the progress bar and I am happy to test your changes if needed. Also It makes sense to add SetConsoleCP(CP_UTF8); in order to properly handle unicode input on Windows. So adding

#ifdef _WIN32
    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);
#endif

should solve all issues.

Don't worry about my refactor, I can do it after :)

Most of my refactoring will be moving code from A -> B, will be pretty easy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants