Closed
Labels
bug-unconfirmed, low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches), stale
Description
What happened?
When creating a Llama model in Python code, you can specify n_gpu_layers=-1 so that all layers are offloaded to the GPU, as in the Python example below. When starting the llama.cpp server from the Docker image, setting LLAMA_ARG_N_GPU_LAYERS: -1 does not have the same effect: no layers are offloaded (see the log output below).
from llama_cpp import Llama
Llama('path/to/model', chat_format="llama-3", n_ctx=1024, n_gpu_layers=-1, verbose=False)
llamacpp-server:
  image: ghcr.io/ggerganov/llama.cpp:server-cuda@sha256:fe887bd3debd1a55ddd95f067435a38166f15a058bf50fee173517b9831081c8
  ports:
    - 8080:8080
  volumes:
    # TODO: change
    - ./model:/model
  environment:
    # alternatively, you can use "LLAMA_ARG_MODEL_URL" to download the model
    LLAMA_ARG_MODEL: /model/path-to-model.gguf
    LLAMA_ARG_N_GPU_LAYERS: -1
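A possible workaround, sketched here under assumptions this report does not confirm: the Python binding appears to translate -1 into a very large layer count before handing it to llama.cpp, whereas the server seems to use the value as given. If that is right, passing an explicit count at least as large as the model's layer count should offload everything. The helper names `resolve_n_gpu_layers` and `server_env` and the constant `ALL_LAYERS` are hypothetical and exist only for this illustration.

```python
# Assumption: any value >= the model's layer count offloads every layer.
# ALL_LAYERS is a hypothetical sentinel (largest 32-bit signed integer).
ALL_LAYERS = 0x7FFFFFFF


def resolve_n_gpu_layers(requested: int) -> int:
    """Map the Python-binding convention (-1 means "all") to an explicit count."""
    return ALL_LAYERS if requested == -1 else requested


def server_env(model_path: str, n_gpu_layers: int) -> dict[str, str]:
    """Build the environment entries used in the compose file above."""
    return {
        "LLAMA_ARG_MODEL": model_path,
        "LLAMA_ARG_N_GPU_LAYERS": str(resolve_n_gpu_layers(n_gpu_layers)),
    }


if __name__ == "__main__":
    # Print KEY=VALUE lines that could be pasted into an env file.
    for key, value in server_env("/model/path-to-model.gguf", -1).items():
        print(f"{key}={value}")
```

Any smaller value that still exceeds the model's layer count (33 in the log below) should behave the same way if the assumption holds.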
Name and Version
From the prebuilt docker image ghcr.io/ggerganov/llama.cpp:server-cuda@sha256:fe887bd3debd1a55ddd95f067435a38166f15a058bf50fee173517b9831081c8
version: 0 (unknown)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
llamacpp-server-1 | ggml_cuda_init: found 1 CUDA devices:
llamacpp-server-1 | Device 0: Tesla T4, compute capability 7.5, VMM: yes
llamacpp-server-1 | llm_load_tensors: ggml ctx size = 0.14 MiB
llamacpp-server-1 | llm_load_tensors: offloading 0 repeating layers to GPU
llamacpp-server-1 | llm_load_tensors: offloaded 0/33 layers to GPU
llamacpp-server-1 | llm_load_tensors: CPU buffer size = 6282.97 MiB