LocalAI version:
localai/localai:latest-gpu-nvidia-cuda-12 4.1.3
Environment, CPU architecture, OS, and Version:
WSL-2 Windows
Describe the bug
Tried with
"
draft_model: gemma-4-E4B.gguf
n_draft: 8
options:
"
and I get the error shown in the log below, despite the parameters seemingly being passed.
Apr 15 22:10:37 DEBUG GRPC: Loading model with options options={{{} [] [] 0x395c882c46a8} 0 [] gemma-4-31b-Q6_K.gguf 20480 92994145 512 false false true false false false false 0 4 0 0 0 0 /models/gemma-4-31b-Q6_K.gguf false 0 false 0 0 false gemma-4-E4B.gguf 0 false false 0 0 0 false 0 0 0 0 0 0 0 true false //models [] [] [spec_type:draft spec_p_min:0.8 draft_gpu_layers:99 use_jinja:true] [] false []}
Apr 15 22:10:45 DEBUG GRPC stderr id="Gemma 31B - Q6_K speculative-127.0.0.1:39857" line="no implementations specified for speculative decoding" caller={caller.file="/build/pkg/model/process.go"
To Reproduce
See the settings above.
Expected behavior
Secondary model loaded and speculative decoding activated.
Logs
See above
Additional context