If you are seeing a max context limit of 131072 when loading any of the new Gemma-4-12B models, this is because the initial config.json released by Google contained bad metadata. While Google and Unsloth have corrected their upstream files, the bad value was permanently baked into the GGUFs of hundreds of downstream finetunes, adapters, and quantizations during a 12-24 hour window (depending on exactly when the quantizer pulled the upstream files before the max_position_embeddings fix was merged).
Because most community quantizers will not re-run their pipelines to re-upload these files, a large number of permanently "broken" Gemma 4 GGUFs are still circulating on Hugging Face.
How to fix your local model right now:
You can patch the metadata yourself without re-downloading the model by using the official gguf Python package:
    gguf-set-metadata "path/to/your/model.gguf" gemma4.context_length 262144 --force
Related discussions:
https://huggingface.co/google/gemma-4-12B-it/discussions/10
https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/discussions/8
Commit that fixed metadata: https://huggingface.co/google/gemma-4-12B-it/commit/5926caa4ec0cac5cbfadaf4077420520de1d5205
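To check whether a local file is affected (or to confirm the patch command above took effect), the context length can be read straight from the GGUF header. The following is a minimal sketch, not the gguf package's own API: it assumes a little-endian GGUF v2 or v3 file, and read_gguf_metadata is a hypothetical helper name. For anything beyond a quick check, the reader that ships with the gguf package is probably the better choice.

```python
import struct

# GGUF scalar value type ids mapped to struct formats (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format string."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")   # element type
        count = _read(f, "<Q")       # number of elements
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path):
    """Return the metadata key/value pairs of a GGUF file as a dict.

    Hypothetical helper: only the header is parsed, tensor data is never read.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF v1 uses a different layout; not handled here")
        _read(f, "<Q")               # tensor count, unused
        kv_count = _read(f, "<Q")
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            vtype = _read(f, "<I")
            meta[key] = _read_value(f, vtype)
        return meta
```

Calling read_gguf_metadata("path/to/your/model.gguf").get("gemma4.context_length") on an affected file should return 131072, and 262144 once the patch has been applied. Files with a large tokenizer vocabulary will take a moment, since the sketch walks every metadata array in pure Python.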
Idea / Feature Request
To prevent endless user confusion regarding truncated context limits with these models, it would be highly beneficial to implement a heuristic check during model loading that combines a log warning with a toggleable override. What do you guys think of this approach?
Informational Log Warning: Emit a high-visibility warning in the server logs when a Gemma 4 model with 131k context is detected: "Warning: This Gemma 4 model reports a context length of 131k, which is a known metadata bug from the initial release. Please use gguf-set-metadata to patch the value to 256k, or use the --patch-gemma4-context flag."
Auto-Patch Command-Line Flag: Provide a --patch-gemma4-context startup flag that lets users override the 131072 value with 262144 during load, without having to permanently modify their multi-gigabyte GGUF files.
Without an override or a clear log warning, users will keep reporting bugs on the assumption that the context truncation is a failure of llama.cpp, completely unaware that their downloaded GGUF file carries permanent metadata damage from the Day 1 release window.
AI Disclosure: Language, formatting.
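To make the proposal concrete, here is a rough sketch of the load-time logic in Python. It is only an illustration of the idea, not llama.cpp code (which is C++), and nothing in it exists today: resolve_context_length, the patch_gemma4_context parameter, and the logger name are all hypothetical. It also assumes that general.architecture reads "gemma4" for these models, which is inferred from the gemma4.context_length key used in the patch command.

```python
import logging

BAD_CONTEXT = 131072     # value written by the buggy Day 1 config
FIXED_CONTEXT = 262144   # value after the upstream fix

log = logging.getLogger("gemma4_context_check")  # hypothetical logger name


def resolve_context_length(metadata, patch_gemma4_context=False):
    """Return the context length to use for a model, given its GGUF metadata.

    metadata is a plain dict of GGUF key/value pairs. The heuristic only
    fires for the assumed "gemma4" architecture reporting exactly 131072.
    """
    arch = metadata.get("general.architecture")
    ctx = metadata.get(f"{arch}.context_length") if arch else None

    if arch != "gemma4" or ctx != BAD_CONTEXT:
        return ctx  # not the known-bad combination, leave untouched

    if patch_gemma4_context:
        log.warning("Overriding Gemma 4 context length %d -> %d for this run; "
                    "the GGUF file itself is not modified.",
                    BAD_CONTEXT, FIXED_CONTEXT)
        return FIXED_CONTEXT

    log.warning("This Gemma 4 model reports a context length of 131k, which is "
                "a known metadata bug from the initial release. Please use "
                "gguf-set-metadata to patch the value to 256k, or use the "
                "--patch-gemma4-context flag.")
    return ctx
```

Keeping the override opt-in means the file's own metadata still wins by default, so the flag only changes behavior for users who have read the warning and decided to apply it.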