Misc. bug: 5090 incorrectly recognized as unified memory. #17536

@matt23654

Description

Name and Version

version: 7166 (eec1e33)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --list-devices

Problem description & steps to reproduce

The above command reports the host's free RAM rather than the 32 GB of on-device memory. This causes automatic memory allocation to fail when, e.g., using RPC, since it tries to over-allocate to the 5090.
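For context, a minimal sketch of the behavior one would expect instead: take the host-RAM (UMA) path only for genuinely integrated GPUs, and report on-device memory for discrete cards via cudaMemGetInfo. This is my assumption of the intended logic, not llama.cpp's actual implementation; the helper name get_free_device_memory is hypothetical, while cudaDeviceProp::integrated, cudaSetDevice, and cudaMemGetInfo are standard CUDA runtime API.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: report free/total memory for `device`.
static bool get_free_device_memory(int device, size_t & free_mem, size_t & total_mem) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    if (prop.integrated) {
        // Truly unified memory: the GPU shares host RAM, so querying host
        // available memory (as the UMA code path does) is appropriate here.
        // That host query is elided in this sketch; we fall through to the
        // device query below.
    }
    // Discrete GPU (e.g. an RTX 5090): report the card's own memory only.
    if (cudaSetDevice(device) != cudaSuccess) {
        return false;
    }
    return cudaMemGetInfo(&free_mem, &total_mem) == cudaSuccess;
}

int main() {
    size_t free_mem = 0, total_mem = 0;
    if (get_free_device_memory(0, free_mem, total_mem)) {
        std::printf("free: %zu MiB, total: %zu MiB\n",
                    free_mem >> 20, total_mem >> 20);
    }
    return 0;
}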

First Bad Commit

#17368

Relevant log output

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Available devices:
ggml_backend_cuda_get_available_uma_memory: final available_memory_kb: 56701456
  CUDA0: NVIDIA GeForce RTX 5090 (32106 MiB, 55372 MiB free)
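Note that 56701456 KiB / 1024 ≈ 55372 MiB, which is exactly the "free" figure printed for CUDA0 and exceeds the card's 32106 MiB total, confirming the value comes from host RAM rather than the device.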
