spec : refactor params #22397

Merged
ggerganov merged 7 commits into master from gg/spec-refactor-params on Apr 28, 2026

ggerganov commented Apr 26, 2026

Overview

Refactor the speculative decoding parameters.

Additional information

Important

Some old CLI arguments have been replaced; see the table below.

| Old parameter(s) | New parameter(s) |
| --- | --- |
| `--draft`, `--draft-n`, `--draft-max` | `--spec-draft-n-max` or `--spec-ngram-mod-n-max` |
| `--draft-min`, `--draft-n-min` | `--spec-draft-n-min` or `--spec-ngram-mod-n-min` |
| `--spec-ngram-size-n` | `--spec-ngram-*-size-n` or `--spec-ngram-mod-n-match` |
| `--spec-ngram-size-m` | `--spec-ngram-*-size-m` |
| `--spec-ngram-min-hits` | `--spec-ngram-*-min-hits` |
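
For existing launch scripts, the renames above could be applied mechanically. The following is a minimal sketch, not part of this PR: the function name `migrate_spec_flags` is hypothetical, and it assumes the old draft-length flags were meant for a draft model (so it picks the `--spec-draft-*` names rather than the `--spec-ngram-mod-*` ones). The old n-gram flags have no one-to-one replacement, so the sketch only warns about them and leaves them unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite old draft-length flags to their new names.
# Assumption: the draft-model variants (--spec-draft-*) are the intended targets.
migrate_spec_flags() {
  local out=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      --draft|--draft-n|--draft-max)
        out+=("--spec-draft-n-max") ;;
      --draft-min|--draft-n-min)
        out+=("--spec-draft-n-min") ;;
      --spec-ngram-size-n|--spec-ngram-size-m|--spec-ngram-min-hits)
        # The replacement depends on which n-gram variant is in use,
        # so leave the flag as-is and ask for a manual choice.
        echo "warning: $arg must be replaced manually with a --spec-ngram-* variant" >&2
        out+=("$arg") ;;
      *)
        out+=("$arg") ;;
    esac
  done
  # Print the rewritten argument list on one line, space-separated.
  printf '%s\n' "${out[*]}"
}

migrate_spec_flags --draft-max 16 --draft-min 4
# prints: --spec-draft-n-max 16 --spec-draft-n-min 4
```

The values `16` and `4` are placeholders. Flag values pass through untouched, so the output can be reviewed and pasted back into the script.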

Important

Some old llama-server request parameters have been removed:

| Parameter | Description |
| --- | --- |
| `speculative.ngram_size_n` | N-gram size for lookup |
| `speculative.ngram_size_m` | M-gram size for speculative tokens |
| `speculative.ngram_m_hits` | Minimum hits at n-gram/m-gram lookup |
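
Clients that still send these fields may want to locate them before upgrading. The following is a minimal sketch, assuming the request payload is available as JSON text on stdin. The function name `removed_spec_fields` is hypothetical, and it does plain text matching on the quoted key names, not real JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the removed speculative.* request fields
# that appear as quoted keys in the JSON text read from stdin.
removed_spec_fields() {
  grep -oE '"speculative\.(ngram_size_n|ngram_size_m|ngram_m_hits)"' \
    | sort -u \
    | tr -d '"'
}

printf '%s' '{"prompt": "hi", "speculative.ngram_size_n": 3}' | removed_spec_fields
# prints: speculative.ngram_size_n
```

Empty output means none of the three removed keys were found in the payload.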

Requirements

github-actions bot added the testing, examples, and server labels Apr 26, 2026
github-actions bot added the python label Apr 27, 2026
ggerganov marked this pull request as ready for review April 27, 2026 13:41
ggerganov requested review from a team and JohannesGaessler as code owners April 27, 2026 13:41
ggerganov requested a review from ngxson April 27, 2026 13:42

ggerganov commented Apr 27, 2026

With this change, n-gram-based speculative decoding can now be combined with draft-model-based speculative decoding:

llama-server -hf ggml-org/Qwen3.6-27B-GGUF:Q8_0 \
  --spec-draft-hf ggml-org/Qwen3.5-0.8B-GGUF:Q4_0 \
  --spec-type ngram-mod \
  --spec-ngram-mod-n-match 24 \
  --spec-ngram-mod-n-min 48 \
  --spec-ngram-mod-n-max 64

ggerganov merged commit 14e733e into master Apr 28, 2026
49 of 50 checks passed
ggerganov deleted the gg/spec-refactor-params branch April 28, 2026 06:07