
Bug: llama.cpp server arg LLAMA_ARG_N_GPU_LAYERS doesn't follow the same convention as llama-cpp-python n_gpu_layers #9556

@mvonpohle

Description

What happened?

If you create a Llama model in Python code, you can specify n_gpu_layers=-1 so that all layers are offloaded to the GPU (see the example below). When starting the llama.cpp server using the Docker image, setting LLAMA_ARG_N_GPU_LAYERS: -1 doesn't have the same effect.

from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU
Llama('path/to/model', chat_format="llama-3", n_ctx=1024, n_gpu_layers=-1, verbose=False)
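
The Python call works because llama-cpp-python normalizes the -1 sentinel itself before calling into the C API, so llama.cpp never sees a literal -1. A minimal sketch of that convention, assuming the 0x7FFFFFFF substitution found in llama-cpp-python's source (verify against your installed version):

def normalize_n_gpu_layers(n_gpu_layers: int) -> int:
    # -1 means "offload everything": substitute INT32_MAX so any
    # model's layer count compares as fully offloaded
    return 0x7FFFFFFF if n_gpu_layers == -1 else n_gpu_layers

By contrast, the docker-compose service below passes -1 straight through the environment variable, and the log output further down shows it being treated as 0 layers: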
llamacpp-server:
  image: ghcr.io/ggerganov/llama.cpp:server-cuda@sha256:fe887bd3debd1a55ddd95f067435a38166f15a058bf50fee173517b9831081c8
  ports:
    - 8080:8080
  volumes:
    # TODO: change
    - ./model:/model
  environment:
    # alternatively, you can use "LLAMA_ARG_MODEL_URL" to download the model
    LLAMA_ARG_MODEL: /model/path-to-model.gguf
    LLAMA_ARG_N_GPU_LAYERS: -1
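
As a workaround, assuming the environment variable accepts plain positive integers the same way the -ngl CLI flag does, passing a value at least as large as the model's layer count (the log below reports 33) should offload the full model:

  environment:
    LLAMA_ARG_MODEL: /model/path-to-model.gguf
    # assumed workaround: any value >= the model's 33 layers offloads
    # everything; a large number like 999 is a common "all layers" convention
    LLAMA_ARG_N_GPU_LAYERS: 999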

Name and Version

From the prebuilt docker image ghcr.io/ggerganov/llama.cpp:server-cuda@sha256:fe887bd3debd1a55ddd95f067435a38166f15a058bf50fee173517b9831081c8

version: 0 (unknown)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

llamacpp-server-1  | ggml_cuda_init: found 1 CUDA devices:
llamacpp-server-1  |   Device 0: Tesla T4, compute capability 7.5, VMM: yes
llamacpp-server-1  | llm_load_tensors: ggml ctx size =    0.14 MiB
llamacpp-server-1  | llm_load_tensors: offloading 0 repeating layers to GPU
llamacpp-server-1  | llm_load_tensors: offloaded 0/33 layers to GPU
llamacpp-server-1  | llm_load_tensors:        CPU buffer size =  6282.97 MiB

Metadata


Labels: bug-unconfirmed, low severity, stale
