Description
Name and Version
v1.52.1 (Vulkan llama.cpp, Windows)
Operating systems
Windows
GGML backends
Vulkan
Hardware
AMD Ryzen Strix Halo 365+ with 128 GB unified memory, configured as 96 GB VRAM and 32 GB system RAM.
Models
oss-120B, loaded via LM Studio.
At max context, for example, the load fails with only 61 GB (61316 MB) of VRAM and 31.3 GB of RAM in use.
The same failure occurs with any LLM that needs more than 64 GB of VRAM for model plus context.
Problem description & steps to reproduce
LM Studio on an AMD Ryzen Max+ 365 (Strix Halo) with 128 GB of unified memory,
configured as 96 GB VRAM and 32 GB system RAM.
When loading any LLM there appears to be a 64 GB limit on the load (as reported by the AMD Adrenalin software). I assume the model is staged into the 32 GB of system RAM and offloaded in segments to VRAM, but there seem to be two failures somewhere in the code. There is some artificial 64 GB limit: with 96 GB of VRAM it should be possible to load well past 64 GB, e.g. in 16 GB sections.
Can it be verified that the code stages the model through RAM in 16 GB blocks until the whole model is resident in VRAM? Can this be configured anywhere? I do not see any setting for it in LM Studio.
I should be able to load above 64 GB, and the context (KV cache) should also reside in VRAM.
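As a diagnostic, a small standalone probe along the lines of the sketch below (this is illustrative code against the Vulkan API, not part of llama.cpp) can print the memory heap sizes and `maxMemoryAllocationSize` that the driver actually exposes. If the device-local heap or the allocation limit is reported near 64 GB, the cap would be coming from the driver/firmware configuration rather than from llama.cpp:

```cpp
// Standalone sketch: report Vulkan heap sizes and maxMemoryAllocationSize.
// Build against the Vulkan SDK; not llama.cpp code.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        // maxMemoryAllocationSize is the driver's cap on a single allocation.
        VkPhysicalDeviceMaintenance3Properties maint3 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props2 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2, &maint3 };
        vkGetPhysicalDeviceProperties2(dev, &props2);

        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(dev, &mem);

        printf("%s\n", props2.properties.deviceName);
        printf("  maxMemoryAllocationSize: %.1f GiB\n",
               maint3.maxMemoryAllocationSize / (1024.0 * 1024.0 * 1024.0));
        for (uint32_t i = 0; i < mem.memoryHeapCount; ++i) {
            printf("  heap %u: %.1f GiB%s\n", i,
                   mem.memoryHeaps[i].size / (1024.0 * 1024.0 * 1024.0),
                   (mem.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
                       ? " (device-local)" : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

The `vulkaninfo` tool from the Vulkan SDK reports the same fields if building the probe is inconvenient.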
llama.cpp is failing to load models into VRAM above 64 GB; the issue may be in how the model is staged through RAM and transferred to VRAM.
61 GB + 31 GB = 92 GB, but I never see the RAM contents transferred to VRAM followed by another round of loading into RAM; if I did, I would know where the problem was.
The expected sequence is a RAM-to-VRAM transfer, then RAM filling again, then the next transfer; instead the load fails, possibly during the RAM-to-VRAM transfer once VRAM usage exceeds 64 GB.
FYI: I can load oss-120B with a context size of 20k; it occupies 64425 MB (~62 GB) of VRAM and 11.4 GB of RAM, about 73 GB total, well short of the 128 GB of total memory, with all layers on the GPU. I cannot increase the context size so that VRAM usage goes over 64 GB; there is some bug here.
The best I have achieved is 67366 MB of VRAM, ~10 GB of RAM, and a context of 62557, but anything much beyond that fails to load. So there may be a context-length-related loading issue in llama.cpp.
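For reference on why context length pushes VRAM use up: the KV cache grows linearly with context. The sketch below is a back-of-envelope estimate only; the layer/head/dimension numbers are placeholders, not the actual oss-120B configuration (llama.cpp prints the real values at load time):

```cpp
// Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
// * context_length * bytes_per_element.
// All architecture numbers below are placeholders, NOT real oss-120B values.
#include <cstdio>

int main() {
    const double n_layers   = 36;   // placeholder
    const double n_kv_heads = 8;    // placeholder (grouped-query attention)
    const double head_dim   = 128;  // placeholder
    const double bytes_el   = 2;    // f16 cache
    const double ctxs[] = { 20000, 62557, 131072 };
    for (double n_ctx : ctxs) {
        double bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_el;
        printf("n_ctx %8.0f -> KV cache ~%6.2f GiB\n",
               n_ctx, bytes / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

Under these placeholder numbers the KV cache stays well under the remaining VRAM headroom, which is why the hard stop near 64 GB looks like an allocation limit rather than genuine memory exhaustion.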
First Bad Commit
No response
Relevant log output
failure to load model.