
Eval bug: Vulkan llama.cpp > 64GB Graphics card load bug. #16575

@n545454

Description

Name and Version

v1.52.1 (Vulkan llama.cpp, Windows)

Operating systems

Windows

GGML backends

Vulkan

Hardware

AMD Ryzen Strix Halo 365+ with 128 GB unified memory, configured as 96 GB VRAM and 32 GB system RAM.

Models

LM Studio.
oss-120B at max context, for example: it shows only 61 GB (61316) of VRAM usage and 31.3 GB of RAM before the load fails.

Affects any LLM requiring more than 64 GB of VRAM (model weights plus context).

Problem description & steps to reproduce

LM Studio.
AMD Ryzen Strix Halo 365+, 128 GB.

Configured as 96 GB VRAM and 32 GB system RAM.

When loading any LLM, there appears to be a 64 GB limit on the load (as reported by the AMD Adrenalin software). I am assuming the model is read into the 32 GB of system RAM and offloaded in segments to VRAM. However, there appear to be two failures somewhere in the code. There is some logical 64 GB limit: with 96 GB of VRAM it should be able to load well past 64 GB, for example in 16 GB sections.
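
To see whether the 64 GB ceiling is something the driver itself advertises, it may help to check what Vulkan actually reports. Below is a minimal sketch (core Vulkan 1.1 only, error handling omitted, not taken from llama.cpp) that prints each device's memory heap sizes and maxMemoryAllocationSize:

```cpp
// Sketch: print what the Vulkan driver reports for memory heap sizes and the
// per-allocation ceiling (maxMemoryAllocationSize). Vulkan 1.1, no extensions.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceMaintenance3Properties maint3 =
            { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props2 =
            { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
        props2.pNext = &maint3;
        vkGetPhysicalDeviceProperties2(dev, &props2);

        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(dev, &mem);

        const double GiB = 1024.0 * 1024.0 * 1024.0;
        printf("%s\n", props2.properties.deviceName);
        printf("  maxMemoryAllocationSize: %.1f GiB\n",
               maint3.maxMemoryAllocationSize / GiB);
        for (uint32_t i = 0; i < mem.memoryHeapCount; ++i) {
            printf("  heap %u: %.1f GiB%s\n", i, mem.memoryHeaps[i].size / GiB,
                   (mem.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
                       ? " (device local)" : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

If the device-local heap or maxMemoryAllocationSize comes back at around 64 GiB, the ceiling is being imposed below llama.cpp (driver/allocator) rather than by the loader logic.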

Can it be verified that the code stages the model from RAM in 16 GB blocks until the whole model is resident in GPU VRAM? Can this be configured somewhere? I do not see any setting in LM Studio for this.
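
For clarity, by "16 GB blocks" I only mean that no single staging copy or device allocation should ever need to be larger than some fixed chunk. A trivial illustration of that idea (hypothetical sizes, not llama.cpp's actual loader code):

```cpp
// Illustration only: splitting a large load into fixed-size pieces so that no
// single allocation or transfer exceeds a chosen chunk size.
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t GiB        = 1ull << 30;
    const uint64_t total_size = 96 * GiB;  // assumed model weights + context
    const uint64_t chunk_size = 16 * GiB;  // assumed per-chunk ceiling

    for (uint64_t offset = 0; offset < total_size; offset += chunk_size) {
        const uint64_t n = std::min(chunk_size, total_size - offset);
        // In a real loader each piece would get its own device allocation and
        // its own host-to-device copy; nothing needs a single 96 GiB block.
        printf("chunk at %3llu GiB, size %2llu GiB\n",
               (unsigned long long)(offset / GiB),
               (unsigned long long)(n / GiB));
    }
    return 0;
}
```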

I should be able to load above 64 GB, and the context should also reside in VRAM.

llama.cpp is failing to load the model into VRAM above 64 GB; the issue might be in the RAM staging and the transfer to VRAM.

61 GB + 31 GB = 92 GB, but I do not see it transfer the staged data from RAM to VRAM and then start loading into RAM again. If it did, I would know where the problem was.

It should do the RAM→VRAM transfer, then show RAM filling again, and only then a possible failure. Does the load instead fail during the RAM→VRAM transfer once usage goes above 64 GB?

FYI, I can load oss-120B with a context size of 20k: it takes 64425 (62 GB) of VRAM and 11.4 GB of RAM, about 73 GB total with all layers on the GPU, well short of the 128 GB of total memory. I cannot increase the context size to the point where VRAM usage goes over 64 GB; there is some bug here.

The best I have achieved is 67366 MB of VRAM and ~10 GB of RAM at a context of 62557, but I cannot get much beyond that without the load failing. So there may be a context-length loading issue in llama.cpp.
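
Since the Adrenalin/Task Manager counters blur MB and MiB, here is plain arithmetic for the figures quoted above; which unit the counters actually use is an assumption on my part:

```cpp
// Unit check: the same reported number reads differently as binary MiB vs
// decimal MB. The 64 GiB boundary is 65,536 MiB (about 68,719 MB).
#include <cstdio>

int main() {
    const double reported[] = { 61316, 64425, 67366 };
    const double GiB = 1024.0 * 1024.0 * 1024.0;
    for (double v : reported) {
        printf("%8.0f -> %5.1f GiB if MiB, %5.1f GiB if MB\n",
               v, v / 1024.0, v * 1e6 / GiB);
    }
    return 0;
}
```

So whether a reported 67366 is actually past the 64 GiB mark depends entirely on the unit of the counter.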

First Bad Commit

No response

Relevant log output

failure to load model.
