Why are the default sampling parameters not read from the HF repo's generation_config.json file? #24024

lewtun · 2026-06-02T13:07:58Z

Hello, while using llama.cpp to run some HF models locally, I noticed that the sampling parameters appear to be set to the library's default values rather than being read from the model developer's settings in generation_config.json. Here's an example from Qwen: https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/generation_config.json

I am curious why llama.cpp doesn't use the model's defaults (when available), as this behaviour differs from native transformers and from other inference engines like vLLM.

delcenjo · 2026-06-20T17:28:56Z

This is by design. generation_config.json lives in the Hugging Face repo, not in the GGUF file you actually run, and llama.cpp deliberately keeps sampling separate from the model.

- GGUF is self-contained and sampling-agnostic. The HF -> GGUF conversion (convert_hf_to_gguf.py) bakes in architecture, weights and tokenizer metadata, but not runtime decoding defaults. llama.cpp never reads the original HF repo at run time, so it never sees generation_config.json.
- Sampling is a runtime concern. Temperature, top-p, top-k, repetition penalty, etc. belong to the sampler / CLI / server, not to the model, so they fall back to llama.cpp's own defaults unless you pass them.

transformers and vLLM differ because they load straight from the HF repo and pick up generation_config.json there.

What to do: take the values from the model's generation_config.json and pass the matching flags yourself, e.g.

llama-cli   -m model.gguf --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1
llama-server -m model.gguf --temp 0.6 --top-p 0.95 --top-k 20
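
If you do this for several models, the copying can be scripted. The following is a minimal sketch, not part of llama.cpp: it assumes you have downloaded generation_config.json yourself, and the key-to-flag mapping (KEY_TO_FLAG) and the function name flags_from_generation_config are my own illustrative choices. It only handles the keys listed and ignores everything else in the file.

```python
import json
import shlex
import sys

# Assumed mapping from generation_config.json keys to llama.cpp sampling
# flags. Only these keys are translated; anything else is ignored.
KEY_TO_FLAG = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "top_k": "--top-k",
    "min_p": "--min-p",
    "repetition_penalty": "--repeat-penalty",
}


def flags_from_generation_config(config: dict) -> list[str]:
    """Return llama.cpp sampling flags for the mapped keys present in config."""
    flags = []
    for key, flag in KEY_TO_FLAG.items():
        value = config.get(key)
        if value is not None:  # skip keys that are missing or set to null
            flags += [flag, str(value)]
    return flags


if __name__ == "__main__":
    # Usage: python gen_flags.py path/to/generation_config.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(shlex.join(flags_from_generation_config(json.load(f))))
```

Saved under a hypothetical name such as gen_flags.py, a config containing temperature 0.6, top_p 0.95 and top_k 20 prints `--temp 0.6 --top-p 0.95 --top-k 20`, which you can append to the llama-cli or llama-server command line.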
