Copilot AI commented Nov 23, 2025

HuggingFace returns 400 errors for sharded GGUF models (such as Unsloth's UD-Q4_K_XL quants) due to OCI registry limitations. The client was retrying these non-retryable errors, causing ~7-second delays before surfacing a cryptic error message.

Changes

  • Error classification: Extended isRetryableError() to detect permanent errors (sharded GGUF, manifest unknown, auth failures) and prevent retries
  • User guidance: Added enhanceErrorMessage() to wrap known error patterns with explanations and workarounds
  • Documentation: Added "Known Limitations" section to model_pull.md explaining sharded GGUF constraint

Impact

Before:

$ docker model pull hf.co/unsloth/model:UD-Q4_K_XL
Retrying download (attempt 1/3) in 1s...
Retrying download (attempt 2/3) in 2s...
Error: pulling ... failed with status 400: {"error":"...sharded GGUF..."}

After:

$ docker model pull hf.co/unsloth/model:UD-Q4_K_XL
Error: repository contains sharded GGUF

Note: Sharded GGUF models from HuggingFace are not supported due to OCI limitations.
Workaround: Use non-sharded quantization or upload to Docker Hub.
See: https://github.com/ollama/ollama/issues/5245

Original prompt

This section details the original issue you should resolve

<issue_title>Unable to find some Unsloth Quants during docker model pull</issue_title>
<issue_description>When attempting to download some of the quants provided by unsloth (UD-Q4_K_XL), docker model pull is unable to find / match them.

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:UD-Q4_K_XL
Retrying download (attempt 1/3) in 1s...
^C%

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:Q4_K_XL
Retrying download (attempt 1/3) in 1s...
Retrying download (attempt 2/3) in 2s...
^C%

It works fine for other quants in the same repository.

~ ❯ docker model pull hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:Q4_K_M
5ddfa1832cb1: Downloading [>                                                  ]  397.1MB/44.23GB
5ddfa1832cb1: Downloading [>                                                  ]  401.1MB/44.23GB
124c30f2497a: Downloading [>                                                  ]  482.7MB/49.55GB
eb48051961b5: Downloading [>                                                  ]  337.5MB/46.3GB
...

I think this is most likely due to the prefix used for the folder and/or naming scheme: https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-GGUF/tree/main/UD-Q4_K_XL not working nicely with however docker model runner parses and matches the files in the repo.

Note that HF kind of does the same thing on the model card where it's listing quants, it has it listed without the prefix even though it's not reflected in the folder / file structure.

(screenshot: HuggingFace model card quant listing)

</issue_description>

Comments on the Issue (you are @copilot in this section)

@ericcurtin It's being blocked on the HuggingFace side at present because it's a sharded GGUF:

Error: pull model manifest: 400: {"error":"The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245"}

@akoumjian how about we upload this model to Docker Hub so you can pull from there?



Copilot AI and others added 4 commits November 23, 2025 23:33
Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix unable to find Unsloth Quants during docker model pull Improve error handling for HuggingFace sharded GGUF models Nov 23, 2025
Copilot AI requested a review from ericcurtin November 23, 2025 23:42
Successfully merging this pull request may close these issues.

Unable to find some Unsloth Quants during docker model pull
