Name and Version
C:\Users\Admin\Documents\llamacpp>llama-cli --version
version: 4547 (c5d9eff)
built with MSVC 19.29.30158.0 for
C:\Users\Admin\Documents\llamacpp>
Operating systems
Windows
GGML backends
CUDA
Hardware
Tesla P40 + 3090
Models
No response
Problem description & steps to reproduce
I tried both the CUDA 11.7 and the CUDA 12.4 builds. I downloaded llama-b4547-bin-win-cuda-cu12.4-x64.zip and cudart-llama-bin-win-cu12.4-x64.zip and extracted them into the same folder.
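Concretely, both archives get unpacked into the same directory, e.g.:

C:\Users\Admin\Documents\llamacpp>tar -xf llama-b4547-bin-win-cuda-cu12.4-x64.zip
C:\Users\Admin\Documents\llamacpp>tar -xf cudart-llama-bin-win-cu12.4-x64.zip

(Any unzip tool should give the same layout; the point is that the cudart DLLs end up next to llama-server.exe.)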
C:\Users\Admin\Documents\llamacpp>llama-server --list-devices
Available devices:
C:\Users\Admin\Documents\llamacpp>
llama-server doesn't see any devices and doesn't offload layers to the GPU with -ngl. It was working with previous versions. The latest koboldcpp also works (though it doesn't support R1 distilled models, which is why I'm trying to run them with llama.cpp).
CUDA_VISIBLE_DEVICES is not set; setting it doesn't help either.
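For reference, this is roughly the kind of invocation I'm testing with (the CUDA_VISIBLE_DEVICES value and the model path are just placeholders):

C:\Users\Admin\Documents\llamacpp>set CUDA_VISIBLE_DEVICES=0,1
C:\Users\Admin\Documents\llamacpp>llama-server -m model.gguf -ngl 99

It behaves the same with or without the environment variable set.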
First Bad Commit
No response
Relevant log output
C:\Users\Admin\Documents\llamacpp>nvidia-smi
Sat Jan 25 01:39:58 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94 Driver Version: 560.94 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P40 TCC | 00000000:03:00.0 Off | Off |
| N/A 26C P8 9W / 250W | 9MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 WDDM | 00000000:04:00.0 Off | N/A |
| 0% 48C P8 30W / 350W | 635MiB / 24576MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+