
Conversation

@angt angt commented Nov 26, 2025

  • Update the common/download interface to be directly usable by tools/run (removing duplicated code).
  • Fix ollama downloads by implementing manual redirect handling (addressing issues with cpp-httplib).
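cpp-httplib can follow redirects on its own (`set_follow_location(true)`), but the CDN redirects used by ollama downloads trip it up, hence the manual handling. The two pieces such handling needs are detecting a redirect status and resolving the `Location` header against the current URL. A minimal sketch, with hypothetical helper names rather than the PR's actual code:

```cpp
#include <cassert>
#include <string>

// Redirect statuses worth following manually (hypothetical helper).
static bool is_redirect(int status) {
    return status == 301 || status == 302 || status == 303 || status == 307 || status == 308;
}

// Resolve a Location header against the current URL. Handles absolute
// URLs and host-relative paths; hypothetical sketch, not the PR's code.
static std::string resolve_location(const std::string & base, const std::string & location) {
    if (location.rfind("http://", 0) == 0 || location.rfind("https://", 0) == 0) {
        return location; // absolute redirect, use as-is
    }
    // host-relative: keep scheme://host from the base URL
    size_t scheme_end = base.find("://");
    size_t host_end   = base.find('/', scheme_end == std::string::npos ? 0 : scheme_end + 3);
    std::string origin = host_end == std::string::npos ? base : base.substr(0, host_end);
    if (!location.empty() && location[0] == '/') {
        return origin + location;
    }
    return origin + "/" + location;
}
```

The caller would loop: issue the request with automatic redirects disabled, and while `is_redirect()` holds, re-issue against `resolve_location()` up to some maximum hop count.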

angt added 4 commits November 26, 2025 22:26
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@github-actions bot added the testing, examples and server labels Nov 26, 2025
angt commented Nov 26, 2025

With this change, we can deprecate the cURL dependency and start shipping releases without it.
The unified HTTP stack will become a reality :)

ericcurtin commented Nov 27, 2025

My reviews are somewhat irrelevant now since I don't have merge rights. I took a quick skim rather than reading every line, and at a glance everything seems reasonably OK. I recommend doing a quick test via:

llama-server -dr gemma3

to be sure...

angt commented Nov 27, 2025

> My reviews are somewhat irrelevant now since I don't have merge rights. I took a quick skim rather than reading every line, and at a glance everything seems reasonably OK. I recommend doing a quick test via:
>
> llama-server -dr gemma3
>
> to be sure...

Here are some runs:

via Docker:

$ ./build/bin/llama-server -dr gemma3
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.021 sec
ggml_metal_device_init: GPU name:   Apple M3
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 19069.67 MB
common_docker_resolve_model: Downloading Docker Model: ai/gemma3:latest
common_download_file_single_online: no previous model file found /Users/angt/Library/Caches/llama.cpp/ai_gemma3_latest.gguf
common_download_file_single_online: trying to download model from https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/04/04a43a22e8d2003deda5acc262f68ec1005fa76c735a9962a8c77042a74a7d19/data?expires=1764234374&signature=m2%2BBuw6sCMTEH4cNizZDs6fsLC8%3D&version=2 to /Users/angt/Library/Caches/llama.cpp/ai_gemma3_latest.gguf.downloadInProgress (etag:"562fa5cb63ae8b96836b09a658443c01-25")...
^C==============================>                   ]  63%  (1503 MB / 2374 MB)

(resume works)
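Resume works because the size of the existing `*.downloadInProgress` file can be turned into a ranged request. A sketch of the idea, with hypothetical helpers rather than the actual download.cpp code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the Range header value used to resume a partial download, given
// the size of the existing *.downloadInProgress file (hypothetical helper).
// "bytes=N-" asks the server for everything from offset N onward.
static std::string make_resume_range(uint64_t existing_bytes) {
    return "bytes=" + std::to_string(existing_bytes) + "-";
}

// A 206 Partial Content reply means the server honored the range and we
// can append to the partial file; a plain 200 means it ignored the range
// and the download must restart from zero.
static bool server_honored_range(int status) {
    return status == 206;
}
```

The etag printed in the logs above guards this: if the server's etag no longer matches the one recorded for the partial file, the remote file changed and resuming would corrupt the result, so the client starts over.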

via HF:

$ ./build/bin/llama-server -hf unsloth/gpt-oss-120b-GGUF
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M3
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 19069.67 MB
common_download_file_single_online: no previous model file found /Users/angt/Library/Caches/llama.cpp/unsloth_gpt-oss-120b-GGUF_gpt-oss-120b-F16.gguf
common_download_file_single_online: trying to download model from https://cas-bridge.xethub.hf.co/xet-bridge-us/68923b51e5822b89fab7a1e7/f7c8b3cdb2bacb3cef00372f4fa3070f250a2d8838cec29653b9b9db8238a583?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251127%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251127T081900Z&X-Amz-Expires=3600&X-Amz-Signature=555418210eb5445a4ca2376a0a0b4c15c8b5cef28cc67e4894397e8c41243075&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27gpt-oss-120b-F16.gguf%3B+filename%3D%22gpt-oss-120b-F16.gguf%22%3B&x-id=GetObject&Expires=1764235140&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2NDIzNTE0MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82ODkyM2I1MWU1ODIyYjg5ZmFiN2ExZTcvZjdjOGIzY2RiMmJhY2IzY2VmMDAzNzJmNGZhMzA3MGYyNTBhMmQ4ODM4Y2VjMjk2NTNiOWI5ZGI4MjM4YTU4MyoifV19&Signature=v02cqmRkLDiiyjkhUGUmvIzg9woZZbAsLWo9RCrqcYg6FB-wZboc4gLBqSvMjHzz8lCEVskrLy3ZrPPV8j%7E%7Ep%7EqqB04dzNIy608mlJUIUXm51Ux9%7EHYqjo9oZGi0ZJgoxNvnH5TU5Nn%7ELyftUrpB53b6BobQyzS66myRjsnEmIkYOyzXbZUBKRfpK9Hu65PUFvfAYEq0rC%7E6x4y8CLHu0eH8oX41UiKykrtZGiRGluYXztE0sBJpWGJERgp2Wv5L1qSX1J2YpgM7CJre0QPsk8xa8ZbP-lgRtvSj%7ELPkR3pwys4NxYillvJdB%7EI1uF-vURScMFi3COObp3OAqn-T6Q__&Key-Pair-Id=K2L8F4GPSG1IFC to /Users/angt/Library/Caches/llama.cpp/unsloth_gpt-oss-120b-GGUF_gpt-oss-120b-F16.gguf.downloadInProgress (etag:"f7c8b3cdb2bacb3cef00372f4fa3070f250a2d8838cec29653b9b9db8238a583")...
^C=========>                                        ]  21%  (13202 MB / 62340 MB)

and for tools/run:

$ ./build/bin/llama-run llama3
common_docker_resolve_model: Downloading Docker Model: library/llama3:latest
common_download_file_single_online: no previous model file found /Users/angt/Library/Caches/llama.cpp/library_llama3_latest.gguf
common_download_file_single_online: 403 on HEAD, assuming GET/Resume is allowed
common_download_file_single_online: trying to download model from https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/6a/6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=66040c77ac1b787c3af820529859349a%2F20251127%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20251127T082042Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=891427f6f6e2b09cafa8869ca3d84993b1c17352b11f342e2b8c033578a77ca0 to /Users/angt/Library/Caches/llama.cpp/library_llama3_latest.gguf.downloadInProgress (etag:)...
^C=>                                                ]   4%  (185 MB / 4445 MB)
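The `403 on HEAD` line above shows a fallback: some blob stores reject HEAD requests while still serving ranged GETs, so the client proceeds anyway instead of failing. Sketched as a tiny predicate (hypothetical, not the PR's code):

```cpp
#include <cassert>

// Whether the HEAD response can be trusted for metadata (etag, size).
// On 403 and other failures we skip it and assume a ranged GET will
// still work, matching the "403 on HEAD, assuming GET/Resume is
// allowed" log line above (hypothetical helper).
static bool head_is_usable(int head_status) {
    return head_status >= 200 && head_status < 300;
}
```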

The output is excessively verbose, and the progress bar is broken when doing multiple downloads. However, it will be easier to improve the code from now on.
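For what it's worth, the single-download bar seen in the logs can be reproduced with a small renderer; with concurrent downloads each one would need its own terminal line, which is roughly the breakage being described. A hypothetical sketch, not the actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Render a fixed-width progress bar like the one in the logs above,
// e.g. "=>        ] 21%" for 21% at width 10 (hypothetical helper).
static std::string render_bar(uint64_t done, uint64_t total, int width = 50) {
    int pct  = total ? int(done * 100 / total) : 0;
    int fill = width * pct / 100;
    std::string bar(fill > 0 ? fill - 1 : 0, '='); // filled portion
    if (fill > 0) bar += '>';                      // arrow head
    bar += std::string(width - fill, ' ');         // remaining space
    return bar + "] " + std::to_string(pct) + "%";
}
```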

CISC commented Nov 27, 2025

Looks like the thread-safety test fails across the board.

angt commented Nov 27, 2025

> Looks like the thread-safety test fails across the board.

Yes, I’m going to check all the red alerts. 😬

@ericcurtin commented:

@angt I don't know if you are interested in this, but it would be preferable if llama-run ran a client and a llama-server in parallel; llama-run/llama-server would benefit from more re-usable code.

angt commented Nov 27, 2025

> @angt I don't know if you are interested in this, but it would be preferable if llama-run ran a client and a llama-server in parallel; llama-run/llama-server would benefit from more re-usable code.

I tried to preserve llama-run’s current behavior, as my main goal was to move toward removing the cURL dependency and to fix the tool when cURL is disabled. But I fully agree that rethinking llama-run could be useful :)

ericcurtin commented Nov 27, 2025

> @angt I don't know if you are interested in this, but it would be preferable if llama-run ran a client and a llama-server in parallel; llama-run/llama-server would benefit from more re-usable code.
>
> I tried to preserve llama-run’s current behavior, as my main goal was to move toward removing the cURL dependency and to fix the tool when cURL is disabled. But I fully agree that rethinking llama-run could be useful :)

A CVE in linenoise ended up scratching my itch, but I'd appreciate it if you built this branch, gave the new llama-run experience a shot, and provided feedback for future PRs:

#17554

@ericcurtin commented:

You might want to abandon the run.cpp changes here (I don't know if the other parts should stay in this PR). The new PR is a much better experience and CVE-free (with the removal of linenoise).

angt commented Nov 28, 2025

@ericcurtin, this PR still resolves some issues with cpp-httplib and improves download.cpp. I believe it can be merged before the complete rewrite of llama-run to address current issues more efficiently.

ericcurtin commented Nov 28, 2025

I've noticed a pattern where my PRs are often asked to wait for your changes to merge first. While I understand the need to avoid conflicts, constantly deferring my work can be discouraging. Could we try prioritizing based on readiness this time?

#16196 (comment)

I have no further changes to make to my pull request at this time:

#17554

In this case there isn't any significant conflict; the above PR makes no changes to the download code.

angt commented Nov 28, 2025

I apologize if you felt that way, but I must point out that the PR you mentioned was opened before yours. 😅

Anyway, I was only suggesting we merge this one along with yours, but more importantly, they don’t solve the same problem.

@angt angt closed this Nov 28, 2025