Speculative Decoding Settings don't work with llama-cpp backend

**LocalAI version:**
localai/localai:latest-gpu-nvidia-cuda-12 4.1.3

**Environment, CPU architecture, OS, and Version:**
WSL 2 on Windows

**Describe the bug**
I tried the following model configuration:

"
draft_model: gemma-4-E4B.gguf
n_draft: 8

options:
  - spec_type:draft
"

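With this configuration, speculative decoding is not activated, even though the debug log shows the parameters being passed to the backend. The backend reports `no implementations specified for speculative decoding`: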

Apr 15 22:10:37 DEBUG GRPC: Loading model with options options={{{} [] [] 0x395c882c46a8} 0 [] gemma-4-31b-Q6_K.gguf 20480 92994145 512 false false true false false false false 0 4 0 0 0 0 /models/gemma-4-31b-Q6_K.gguf false 0 false 0 0 false gemma-4-E4B.gguf 0 false false 0 0 0 false 0 0 0 0 0 0 0 true false //models [] [] [spec_type:draft spec_p_min:0.8 draft_gpu_layers:99 use_jinja:true] [] false []}

Apr 15 22:10:45 DEBUG GRPC stderr id="Gemma 31B - Q6_K speculative-127.0.0.1:39857" line="no implementations specified for speculative decoding" caller={caller.file="/build/pkg/model/process.go"

**To Reproduce**
Load a model with the configuration shown above and check the debug log.
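
For reference, a fuller model file consistent with the log lines above might look like the sketch below. The `name`, `backend`, `context_size`, and `parameters.model` keys are assumptions about the rest of the file, which this report does not show. Their values, and the extra `options` entries, are read from the debug output rather than copied from the actual file.

```yaml
# Sketch only: reconstructed from the debug log, not the actual model file.
name: "Gemma 31B - Q6_K speculative"   # assumed from the id in the GRPC stderr line
backend: llama-cpp                      # assumed key; backend named in the issue title
context_size: 20480                     # assumed key; value appears in the loaded options
parameters:
  model: gemma-4-31b-Q6_K.gguf          # main model, as shown in the log

# Speculative decoding settings as reported above
draft_model: gemma-4-E4B.gguf
n_draft: 8

options:
  - spec_type:draft
  - spec_p_min:0.8                      # present in the logged options
  - draft_gpu_layers:99                 # present in the logged options
  - use_jinja:true                      # present in the logged options
```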

**Expected behavior**
The draft model is loaded alongside the main model and speculative decoding is activated.

**Logs**
See above

**Additional context**


Speculative Decoding Settings don't work with llama-cpp backend #9371

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Speculative Decoding Settings don't work with llama-cpp backend #9371

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions