Name and Version
version: 7166 (eec1e33)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --list-devices
Problem description & steps to reproduce
The above command reports the host's free RAM rather than the 32 GB of on-device memory. This causes automatic memory allocation to fail when, for example, using RPC, since it will try to over-allocate to the 5090.
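For reference, a minimal standalone sketch (not the llama.cpp/ggml code) showing what the CUDA runtime itself reports per device via cudaMemGetInfo; on a discrete card such as the 5090 this returns the ~32 GiB of on-device VRAM, not the host's free RAM:

```cpp
// Sketch only: query each CUDA device's free/total memory directly.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);   // on-device memory, not host RAM
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("CUDA%d: %s (%zu MiB total, %zu MiB free)\n",
               i, prop.name, total_b / (1024 * 1024), free_b / (1024 * 1024));
    }
    return 0;
}
```

The "55372 MiB free" in the log below instead matches host RAM, which suggests the UMA code path (ggml_backend_cuda_get_available_uma_memory) is being taken even though this is a discrete GPU.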
First Bad Commit
Relevant log output
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Available devices:
ggml_backend_cuda_get_available_uma_memory: final available_memory_kb: 56701456
CUDA0: NVIDIA GeForce RTX 5090 (32106 MiB, 55372 MiB free)