Description
Name and Version
llama-cli --version
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Vega 3 Graphics (RADV RAVEN2) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon 540X Series (RADV POLARIS12) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
version: 6970 (7f09a68)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
I have a laptop with an AMD dGPU and an NVIDIA eGPU.
llama.cpp is built with both the CUDA and Vulkan backends.
When I run llama.cpp with the device explicitly set via -dev Vulkan1, it still tries to initialize the CUDA backend first:
llama-server --port 10021 --host 0.0.0.0 -ngl 999 --jinja -fa on -dev Vulkan1 --model qwen3/Qwen3-VL-2B-Instruct-UD-Q4_K_XL.gguf -c 8000 -nr
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
ggml_vulkan: Found 2 Vulkan devices:
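In case it helps, the Vulkan1 name was double-checked against the device listing (assuming this build exposes the --list-devices option that pairs with -dev):

llama-server --list-devices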
Is there a way to disable the CUDA backend in such cases?
I tried setting CUDA_VISIBLE_DEVICES="", but it didn't help.
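The invocation was roughly the following (a sketch of what I tried; the env var is simply prepended to the same command as above):

CUDA_VISIBLE_DEVICES="" llama-server --port 10021 --host 0.0.0.0 -ngl 999 --jinja -fa on -dev Vulkan1 --model qwen3/Qwen3-VL-2B-Instruct-UD-Q4_K_XL.gguf -c 8000 -nr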
It's not critical in any way; it just adds an annoying ~10 seconds when loading a model.
First Bad Commit
No response