Copilot AI commented Nov 23, 2025

HuggingFace returns 400 errors for sharded GGUF models (such as Unsloth's UD-Q4_K_XL quants) due to OCI registry limitations. The client was retrying these non-retryable errors, causing ~7-second delays before surfacing a cryptic error message.

Changes

  • Error classification: Extended isRetryableError() to detect permanent errors (sharded GGUF, manifest unknown, auth failures) and prevent retries
  • User guidance: Added enhanceErrorMessage() to wrap known error patterns with explanations and workarounds
  • Documentation: Added "Known Limitations" section to model_pull.md explaining sharded GGUF constraint

Impact

Before:

$ docker model pull hf.co/unsloth/model:UD-Q4_K_XL
Retrying download (attempt 1/3) in 1s...
Retrying download (attempt 2/3) in 2s...
Error: pulling ... failed with status 400: {"error":"...sharded GGUF..."}

After:

$ docker model pull hf.co/unsloth/model:UD-Q4_K_XL
Error: repository contains sharded GGUF

Note: Sharded GGUF models from HuggingFace are not supported due to OCI limitations.
Workaround: Use non-sharded quantization or upload to Docker Hub.
See: https://github.com/ollama/ollama/issues/5245

Original prompt

This section details the original issue you should resolve

<issue_title>Unable to find some Unsloth Quants during docker model pull</issue_title>
<issue_description>When attempting to download some of the quants provided by unsloth (UD-Q4_K_XL), docker model pull is unable to find / match them.

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:UD-Q4_K_XL
Retrying download (attempt 1/3) in 1s...
^C%

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:Q4_K_XL
Retrying download (attempt 1/3) in 1s...
Retrying download (attempt 2/3) in 2s...
^C%

It works fine for other quants in the same repository.

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:Q4_K_M
5ddfa1832cb1: Downloading [>                                                  ]  397.1MB/44.23GB
5ddfa1832cb1: Downloading [>                                                  ]  401.1MB/44.23GB
124c30f2497a: Downloading [>                                                  ]  482.7MB/49.55GB
eb48051961b5: Downloading [>                                                  ]  337.5MB/46.3GB
...

I think this is most likely due to the prefix used for the folder and/or naming scheme: https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-GGUF/tree/main/UD-Q4_K_XL not working nicely with however docker model runner parses and matches the files in the repo.

Note that HF kind of does the same thing on the model card where it's listing quants, it has it listed without the prefix even though it's not reflected in the folder / file structure.

(screenshot: HuggingFace model card quant listing)

</issue_description>

Comments on the Issue (you are @copilot in this section)

@ericcurtin It's being blocked on the HuggingFace side at present because it's a sharded GGUF:

Error: pull model manifest: 400: {"error":"The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245"}

@akoumjian how about we upload this model to Docker Hub so you can pull from there?



Copilot AI and others added 4 commits November 23, 2025 23:33
Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix unable to find Unsloth Quants during docker model pull Improve error handling for HuggingFace sharded GGUF models Nov 23, 2025
Copilot AI requested a review from ericcurtin November 23, 2025 23:42
Successfully merging this pull request may close these issues.

Unable to find some Unsloth Quants during docker model pull
