Name and Version
version: 7166 (eec1e33)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --list-devices
Problem description & steps to reproduce
The above command reports the host's free RAM rather than the 32 GB of on-device memory. This causes automatic memory allocation to fail when, for example, using RPC, since it will try to over-allocate to the 5090.
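For reference, a minimal standalone sketch (not the llama.cpp/ggml code) showing what the CUDA runtime itself reports per device via cudaMemGetInfo; on a discrete card such as the 5090 this returns the ~32 GiB of on-device VRAM, not the host's free RAM:

```cpp
// Sketch only: query each CUDA device's free/total memory directly.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);   // on-device memory, not host RAM
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("CUDA%d: %s (%zu MiB total, %zu MiB free)\n",
               i, prop.name, total_b / (1024 * 1024), free_b / (1024 * 1024));
    }
    return 0;
}
```

The "55372 MiB free" in the log below instead matches host RAM, which suggests the UMA code path (ggml_backend_cuda_get_available_uma_memory) is being taken even though this is a discrete GPU.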
First Bad Commit
Relevant log output
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Available devices:
ggml_backend_cuda_get_available_uma_memory: final available_memory_kb: 56701456
CUDA0: NVIDIA GeForce RTX 5090 (32106 MiB, 55372 MiB free)