
Eval bug: Vulkan llama.cpp > 64GB Graphics card load bug. #16575

@n545454

Description

Name and Version

v1.52.1 (Vulkan llama.cpp, Windows)

Operating systems

Windows

GGML backends

Vulkan

Hardware

AMD Ryzen Strix Halo 365+ with 128 GB unified memory, configured as 96 GB VRAM and 32 GB system RAM.

Models

LM Studio.
oss-120B at max context, for example: it shows only 61 GB (61316) of VRAM usage and 31.3 GB of RAM before the load fails.

Affects any LLM requiring more than 64 GB of VRAM (model weights plus context).

Problem description & steps to reproduce

LM Studio.
AMD Ryzen Strix Halo 365+, 128 GB.

Configured as 96 GB VRAM and 32 GB system RAM.

When loading any LLM, there appears to be a 64 GB limit on the load (as reported by the AMD Adrenalin software). I am assuming the model is read into the 32 GB of system RAM and offloaded in segments to VRAM. However, there appear to be two failures somewhere in the code. There is some logical 64 GB limit: with 96 GB of VRAM it should be able to load well past 64 GB, for example in 16 GB sections.
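
To see whether the 64 GB ceiling is something the driver itself advertises, it may help to check what Vulkan actually reports. Below is a minimal sketch (core Vulkan 1.1 only, error handling omitted, not taken from llama.cpp) that prints each device's memory heap sizes and maxMemoryAllocationSize:

```cpp
// Sketch: print what the Vulkan driver reports for memory heap sizes and the
// per-allocation ceiling (maxMemoryAllocationSize). Vulkan 1.1, no extensions.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceMaintenance3Properties maint3 =
            { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props2 =
            { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
        props2.pNext = &maint3;
        vkGetPhysicalDeviceProperties2(dev, &props2);

        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(dev, &mem);

        const double GiB = 1024.0 * 1024.0 * 1024.0;
        printf("%s\n", props2.properties.deviceName);
        printf("  maxMemoryAllocationSize: %.1f GiB\n",
               maint3.maxMemoryAllocationSize / GiB);
        for (uint32_t i = 0; i < mem.memoryHeapCount; ++i) {
            printf("  heap %u: %.1f GiB%s\n", i, mem.memoryHeaps[i].size / GiB,
                   (mem.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
                       ? " (device local)" : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

If the device-local heap or maxMemoryAllocationSize comes back at around 64 GiB, the ceiling is being imposed below llama.cpp (driver/allocator) rather than by the loader logic.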

Can it be verified that the code stages the model from RAM in 16 GB blocks until the whole model is resident in GPU VRAM? Can this be configured somewhere? I do not see any setting in LM Studio for this.
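
For clarity, by "16 GB blocks" I only mean that no single staging copy or device allocation should ever need to be larger than some fixed chunk. A trivial illustration of that idea (hypothetical sizes, not llama.cpp's actual loader code):

```cpp
// Illustration only: splitting a large load into fixed-size pieces so that no
// single allocation or transfer exceeds a chosen chunk size.
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t GiB        = 1ull << 30;
    const uint64_t total_size = 96 * GiB;  // assumed model weights + context
    const uint64_t chunk_size = 16 * GiB;  // assumed per-chunk ceiling

    for (uint64_t offset = 0; offset < total_size; offset += chunk_size) {
        const uint64_t n = std::min(chunk_size, total_size - offset);
        // In a real loader each piece would get its own device allocation and
        // its own host-to-device copy; nothing needs a single 96 GiB block.
        printf("chunk at %3llu GiB, size %2llu GiB\n",
               (unsigned long long)(offset / GiB),
               (unsigned long long)(n / GiB));
    }
    return 0;
}
```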

I should be able to load above 64 GB, and the context should also reside in VRAM.

llama.cpp is failing to load the model into VRAM above 64 GB; the issue might be in the RAM staging and the transfer to VRAM.

61 GB + 31 GB = 92 GB, but I do not see it transfer the staged data from RAM to VRAM and then start loading into RAM again. If it did, I would know where the problem was.

It should do the RAM→VRAM transfer, then show RAM filling again, and only then a possible failure. Does the load instead fail during the RAM→VRAM transfer once usage goes above 64 GB?

FYI, I can load oss-120B with a context size of 20k: it takes 64425 (62 GB) of VRAM and 11.4 GB of RAM, about 73 GB total with all layers on the GPU, well short of the 128 GB of total memory. I cannot increase the context size to the point where VRAM usage goes over 64 GB; there is some bug here.

The best I have achieved is 67366 MB of VRAM and ~10 GB of RAM at a context of 62557, but I cannot get much beyond that without the load failing. So there may be a context-length loading issue in llama.cpp.
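
Since the Adrenalin/Task Manager counters blur MB and MiB, here is plain arithmetic for the figures quoted above; which unit the counters actually use is an assumption on my part:

```cpp
// Unit check: the same reported number reads differently as binary MiB vs
// decimal MB. The 64 GiB boundary is 65,536 MiB (about 68,719 MB).
#include <cstdio>

int main() {
    const double reported[] = { 61316, 64425, 67366 };
    const double GiB = 1024.0 * 1024.0 * 1024.0;
    for (double v : reported) {
        printf("%8.0f -> %5.1f GiB if MiB, %5.1f GiB if MB\n",
               v, v / 1024.0, v * 1e6 / GiB);
    }
    return 0;
}
```

So whether a reported 67366 is actually past the 64 GiB mark depends entirely on the unit of the counter.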

First Bad Commit

No response

Relevant log output

failure to load model.
