PSA: Widespread 131k Context Bug in downstream Gemma-4-12B models #24198

fflurk · 2026-06-05T16:55:25Z

If a new Gemma-4-12B model reports a max context limit of 131072 when you load it, the cause is bad metadata in the initial config.json released by Google. Google and Unsloth have since corrected their upstream files, but the wrong value was permanently baked into the GGUFs of hundreds of downstream finetunes, adapters, and quantizations during a 12-24 hour window (how long depends on exactly when each quantizer pulled the upstream files relative to the merge of the max_position_embeddings fix).

Because most community quantizers will not re-run their pipelines and re-upload these files, a large number of permanently "broken" Gemma 4 GGUFs are still circulating on Hugging Face.

How to fix your local model right now:
You can patch the metadata yourself, without re-downloading the model, by using the gguf-set-metadata tool from the official gguf Python package:

gguf-set-metadata "path/to/your/model.gguf" gemma4.context_length 262144 --force
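
Before patching, it can help to confirm what value the file actually carries. The following is a minimal standard-library sketch, not part of the gguf package: it walks the GGUF header and returns the first key ending in ".context_length". It assumes a little-endian GGUF file of version 2 or later, and the helper name read_context_length is made up for illustration.

```python
import struct

# Scalar GGUF value type ids mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read a metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_context_length(path):
    """Return (key, value) for the first '*.context_length' key, or None."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF v1 headers are not handled by this sketch")
        _read(f, "<Q")              # tensor count (not needed here)
        kv_count = _read(f, "<Q")   # number of metadata key/value pairs
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key.endswith(".context_length"):
                return key, value
    return None
```

Calling read_context_length("path/to/your/model.gguf") on an affected file should return the gemma4.context_length key with the value 131072. The function only reads, so it is safe to run before and after the gguf-set-metadata command above.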

Related discussions:
https://huggingface.co/google/gemma-4-12B-it/discussions/10
https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/discussions/8
Commit that fixed metadata: https://huggingface.co/google/gemma-4-12B-it/commit/5926caa4ec0cac5cbfadaf4077420520de1d5205

Idea / Feature Request

To prevent endless user confusion regarding truncated context limits with these models, it would be highly beneficial to implement a heuristic check during model loading that combines a log warning with a toggleable override. What do you guys think of this approach?

Informational Log Warning: Emit a high-visibility warning in the server logs when a Gemma 4 model with a context length of 131072 is detected:
"Warning: This Gemma 4 model reports a context length of 131072, which is a known metadata bug from the initial release. Please use gguf-set-metadata to patch the value to 262144, or use the --patch-gemma4-context flag."
Auto-Patch Command-Line Flag: Provide a --patch-gemma4-context startup flag that lets users override the 131072 value with 262144 at load time, without having to permanently modify their multi-gigabyte GGUF files.
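
To make the proposal concrete, here is a rough Python sketch of the decision logic only. It is not llama.cpp code, and --patch-gemma4-context does not exist yet. The function name resolve_context_length, the patch_enabled parameter, and the "gemma4" architecture string (inferred from the gemma4.context_length key above) are all assumptions for illustration.

```python
BUGGY_CTX = 131072   # value reported by the affected GGUFs, per this post
FIXED_CTX = 262144   # corrected value, per this post


def resolve_context_length(arch, reported_ctx, patch_enabled=False):
    """Return (effective_ctx, warning) for a model being loaded.

    arch          -- architecture string from the GGUF metadata (assumed "gemma4")
    reported_ctx  -- context length stored in the file
    patch_enabled -- stands in for the proposed --patch-gemma4-context flag
    """
    # Only the specific known-bad combination triggers the heuristic.
    if arch != "gemma4" or reported_ctx != BUGGY_CTX:
        return reported_ctx, None

    warning = (
        f"Warning: This Gemma 4 model reports a context length of {BUGGY_CTX}, "
        "which is a known metadata bug from the initial release. "
        f"Please use gguf-set-metadata to patch the value to {FIXED_CTX}, "
        "or use the --patch-gemma4-context flag."
    )
    # The override is opt-in: without the flag, the file's value is kept.
    effective = FIXED_CTX if patch_enabled else reported_ctx
    return effective, warning
```

Keeping the override opt-in matters because the check cannot tell a stale file from one whose author set 131072 on purpose, so the default behavior stays a warning and nothing more.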

Without an override or a clear log warning, users will continually report bugs assuming the context truncation is a failure of llama.cpp, completely unaware that their downloaded GGUF file has permanent metadata damage from the Day 1 release window.

AI Disclosure: Language, formatting.
