Misc. bug: Can you disable a backend in llama-server? #17266

@sergeysi779

Description

Name and Version

llama-cli --version
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Vega 3 Graphics (RADV RAVEN2) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon 540X Series (RADV POLARIS12) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
version: 6970 (7f09a68)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

Problem description & steps to reproduce

I have a laptop with an AMD dGPU and an NVIDIA eGPU.
llama.cpp is built with both CUDA and Vulkan backends.
When I run llama.cpp with the device explicitly set via -dev Vulkan1, it still tries to initialize the CUDA backend first:

llama-server --port 10021 --host 0.0.0.0 -ngl 999 --jinja -fa on -dev Vulkan1 --model qwen3/Qwen3-VL-2B-Instruct-UD-Q4_K_XL.gguf -c 8000 -nr
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
ggml_vulkan: Found 2 Vulkan devices:

Is there a way to disable the CUDA backend in such cases?
I tried setting CUDA_VISIBLE_DEVICES="", but it didn't help.
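
For reference, the failed attempt was simply the same command with the variable prepended (reconstructed here, not a verbatim log):

CUDA_VISIBLE_DEVICES="" llama-server --port 10021 --host 0.0.0.0 -ngl 999 --jinja -fa on -dev Vulkan1 --model qwen3/Qwen3-VL-2B-Instruct-UD-Q4_K_XL.gguf -c 8000 -nr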

It's not critical in any way; it's just an annoying ~10 seconds every time a model loads.
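
One thing I have not tried yet: building llama.cpp with backends as dynamic libraries via the GGML_BACKEND_DL CMake option, then deleting the CUDA backend library so it can never be loaded. A rough sketch (untested; the library path and name are my assumption, check the actual build output):

cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CUDA=ON -DGGML_VULKAN=ON
cmake --build build --config Release
rm build/bin/libggml-cuda.so   # assumed file name; remove or rename so it is never found

Would that skip the CUDA probe entirely, or would ggml still attempt to load it at startup?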

First Bad Commit

No response

Relevant log output
