This is by design.
What to do: take the sampling values from the model's generation_config.json and pass them explicitly on the command line, for example:

    llama-cli -m model.gguf --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1
    llama-server -m model.gguf --temp 0.6 --top-p 0.95 --top-k 20
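If you do this often, the copying step can be scripted. The sketch below is a hypothetical helper, not part of llama.cpp. It reads a local generation_config.json, maps the common Hugging Face sampling keys to the matching llama.cpp flags, and assembles a command line. The key-to-flag mapping is an assumption covering only the usual fields. Other keys (e.g. `do_sample`) are ignored, and the repetition-penalty semantics may not match exactly between transformers and llama.cpp.

```python
import json
import shlex

# Hypothetical mapping from Hugging Face generation_config.json keys
# to llama.cpp sampling flags. Only keys present in the file are emitted.
HF_TO_LLAMA_FLAGS = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "top_k": "--top-k",
    "min_p": "--min-p",
    "repetition_penalty": "--repeat-penalty",
}


def load_config(path: str) -> dict:
    """Read a generation_config.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sampling_flags(config: dict) -> list[str]:
    """Translate the known sampling keys into llama.cpp CLI flags."""
    flags: list[str] = []
    for hf_key, flag in HF_TO_LLAMA_FLAGS.items():
        value = config.get(hf_key)
        if value is not None:
            flags += [flag, str(value)]
    return flags


def build_command(binary: str, model_path: str, config: dict) -> list[str]:
    """Assemble a llama-cli / llama-server invocation using the model's sampling defaults."""
    return [binary, "-m", model_path, *sampling_flags(config)]
```

Usage: `print(shlex.join(build_command("llama-server", "model.gguf", load_config("generation_config.json"))))` prints a command you can paste into a shell. Any flag you pass after it on the command line still overrides these values as usual.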
Hello, while using llama.cpp to run some HF models locally, I noticed that the sampling parameters appear to be set to the library's default values instead of being read from the model developer's settings in generation_config.json. Here's an example from Qwen: https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/generation_config.json

I am curious why llama.cpp doesn't use the model's defaults (when available), as this behaviour differs from e.g. native transformers or other inference engines like vLLM.