Commit cadb762
server: SWA-aware fallback for spec-decode probe (--spec-type dflash on Qwen3.6, gemma-4, …)
`common_context_can_seq_rm()` in `common/common.cpp:1401` does a 2-token
test decode to classify the target context. When that `llama_decode()`
returns nonzero it currently classifies the context as
`COMMON_CONTEXT_SEQ_RM_TYPE_NO`, which causes
`tools/server/server-context.cpp:836` to disable speculative decoding
entirely:
    common_speculative_is_compat: the target context does not support partial sequence removal
    srv load_model: speculative decoding not supported by this context
On SWA-based (sliding-window attention) bodies (Qwen3.6-27B / 35B-A3B,
gemma-4-31B, and the future Eliza-1-27b once the SFT on the Qwen3.5/3.6
backbone lands), the
2-token test fails for reasons unrelated to whether spec-decode
actually works on the body. The downstream code already has the
right idea: lines 2680-2682 enable `do_checkpoint` when
`n_swa > 0`, so spec-decode + checkpoint mode is the supported
path for SWA bodies.
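
The mismatch described above can be sketched in isolation. This is a
simplified model under stated assumptions, not the server code:
`startup_state`, `pre_patch_startup`, and both field names are hypothetical,
and the real conditions at the referenced lines may involve more than `n_swa`.

```cpp
// Hypothetical, simplified model of the pre-patch startup decisions.
struct startup_state {
    bool spec_decode_enabled; // false once the probe classifies the context as _TYPE_NO
    bool do_checkpoint;       // per the commit message: enabled when n_swa > 0
};

// probe_ret: return value of the 2-token test decode (nonzero means failure).
// n_swa:     sliding-window size reported by the model (0 means no SWA).
constexpr startup_state pre_patch_startup(int probe_ret, int n_swa) {
    return startup_state{ probe_ret == 0, n_swa > 0 };
}

// The gap: an SWA model whose probe fails would use checkpoints,
// yet spec-decode has already been switched off.
static_assert(!pre_patch_startup(1, 1024).spec_decode_enabled, "probe failure disables spec-decode");
static_assert( pre_patch_startup(1, 1024).do_checkpoint,       "SWA still selects checkpoint mode");
```

The window size `1024` is an arbitrary illustrative value.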
This patch closes the gap. If the probe classifies the context as
`_TYPE_NO` but the model declares SWA (and the operator did not
opt into `--swa-full`), demote the classification to `_TYPE_FULL`
(use checkpoints) so spec-decode initializes.
All non-SWA bodies are unaffected: the demotion fires only when the probe
returns `_TYPE_NO` and `llama_model_n_swa(model) > 0`.
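
A minimal sketch of the fallback rule, assuming the three conditions stated
in this message. The enum and `apply_swa_fallback` are hypothetical stand-ins
(only `_TYPE_NO` and `_TYPE_FULL` are modelled; any other enum values are
omitted), and `swa_full` stands for whether the operator passed `--swa-full`.
In the server, `n_swa` would come from `llama_model_n_swa(model)`.

```cpp
// Hypothetical mirror of the classification enum; the real one lives in common/.
enum seq_rm_type {
    SEQ_RM_TYPE_NO,   // probe failed: no sequence removal support detected
    SEQ_RM_TYPE_FULL, // per the commit message: use checkpoints
};

// Returns the classification after applying the SWA-aware fallback.
// The override applies only when all three conditions hold; every other
// input passes through unchanged.
constexpr seq_rm_type apply_swa_fallback(seq_rm_type probed, int n_swa, bool swa_full) {
    return (probed == SEQ_RM_TYPE_NO && n_swa > 0 && !swa_full)
        ? SEQ_RM_TYPE_FULL
        : probed;
}

// SWA model, probe failed, no --swa-full: falls back to checkpoints.
static_assert(apply_swa_fallback(SEQ_RM_TYPE_NO, 1024, false) == SEQ_RM_TYPE_FULL, "fallback fires");
// Non-SWA model: classification is left alone.
static_assert(apply_swa_fallback(SEQ_RM_TYPE_NO, 0, false) == SEQ_RM_TYPE_NO, "non-SWA unaffected");
```

Keeping the rule as a pure function of three inputs makes the "non-SWA bodies
are unaffected" claim easy to check case by case.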
Companion to elizaOS/eliza#7635 in the downstream eliza repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d629a37 commit cadb762
1 file changed
Lines changed: 21 additions & 0 deletions
Diff body not captured; the line-number columns show 21 lines added as new lines 836-856, between original lines 835 and 836.