
Vulkan Device memory allocation failed (ErrorOutOfDeviceMemory) #5441

Closed

Eliastrt opened this issue Feb 10, 2024 · 19 comments

@Eliastrt

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.


System: OpenSuse Leap 15.4
GPU: AMD RX580 8GB
Vulkan Instance Version: 1.3.275
VkPhysicalDeviceMemoryProperties:

memoryHeaps: count = 3
    memoryHeaps[0]:
        size   = 8321499136 (0x1f0000000) (7.75 GiB)
        budget = 8310034432 (0x1ef511000) (7.74 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 4133625856 (0xf6622000) (3.85 GiB)
        budget = 4124254208 (0xf5d32000) (3.84 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags:
            None
    memoryHeaps[2]:
        size   = 268435456 (0x10000000) (256.00 MiB)
        budget = 256970752 (0x0f511000) (245.07 MiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT

Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64

GGUF model: mistral-7b-instruct-v0.2.Q6_K.gguf


First of all, thanks to Occam for the new Vulkan implementation of llama.cpp!

I have tried to run llama.cpp according to the "without Docker" instructions:

--> ./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4

Got an error:
ggml_vulkan: Device memory allocation of size 4257734656 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/root/GPT/GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'
main: error: unable to load model

FYI:
If I reduce -ngl to 23 layers, everything works properly, but slowly.

llm_load_tensors: CPU buffer size = 5666.09 MiB
llm_load_tensors: Vulkan0 buffer size = 3925.09 MiB
6.51 tokens per second (the OpenCL version loads the full model on the same GPU and shows ~12 tokens/sec)

@0cc4m (Collaborator) commented Feb 11, 2024

OpenCL does not run the full model on the GPU; it just does matrix multiplications, scalar multiplications, and scalar additions, and all the rest is done by the CPU. That means it uses a little less VRAM, yes. I guess you're using the GPU for a GUI at the same time, so it has other programs occupying its VRAM and doesn't find enough for Vulkan. Try using q5_k or q4_k.

@Eliastrt (Author)

Thanks for the reply.

I have no GUI on this machine. Anyway, 23 layers use ~4GB and run without problems; 24 layers raise this error.

According to the AMD docs: https://gpuopen.com/learn/vulkan-device-memory/

Vulkan implementations can use different heap levels, and it looks like this implementation uses Heaps[1] instead of Heaps[0]. If so, that is the problem, because my vulkaninfo shows 8GB for Heaps[0] and only 4GB for Heaps[1].

@0cc4m (Collaborator) commented Feb 11, 2024

You didn't post the whole vulkaninfo, but I'm pretty sure it's using the right heap. It looks for device-local memory first. The 4GB heap is host-visible memory (RAM). This is used, too, but only for layers you didn't put on the GPU and for staging buffers.
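
For illustration, the usual Vulkan pattern for that kind of selection looks roughly like the following. This is a minimal sketch of the general technique (prefer DEVICE_LOCAL, fall back to host-visible), not llama.cpp's actual code:

#include <vulkan/vulkan.hpp>
#include <stdexcept>

// Sketch only: choose a memory type index for an allocation, preferring
// DEVICE_LOCAL types (VRAM, heap 0 on this system) and falling back to
// HOST_VISIBLE ones (pinned RAM, heap 1 here). type_bits comes from
// vkGetBufferMemoryRequirements for the buffer being backed.
static uint32_t pick_memory_type(vk::PhysicalDevice pdev, uint32_t type_bits) {
    vk::PhysicalDeviceMemoryProperties props = pdev.getMemoryProperties();
    // First pass: device-local memory.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & vk::MemoryPropertyFlagBits::eDeviceLocal)) {
            return i;
        }
    }
    // Fallback: host-visible memory (used for staging and CPU-side layers).
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & vk::MemoryPropertyFlagBits::eHostVisible)) {
            return i;
        }
    }
    throw std::runtime_error("no suitable Vulkan memory type");
}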

@Eliastrt (Author) commented Feb 12, 2024

Full vulkaninfo output in the attachment:
vulkaninfo.txt

FYI, I have tried running with another RX580 8GB just to rule out hardware issues and got the same error.

@0cc4m (Collaborator) commented Feb 12, 2024

Looks perfectly fine. It's using the right memory. Try compiling with LLAMA_VULKAN_DEBUG=1 and LLAMA_VULKAN_VALIDATE=1 (this needs Vulkan validation layers installed). Then run it again in the way that causes it to error and upload the whole output. It's gonna be very verbose.
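
(Assuming those defines are exposed as CMake options of the same name, the build could look like this; a hedged sketch, since the flag plumbing may differ between llama.cpp versions:

mkdir -p build && cd build
cmake .. -DLLAMA_VULKAN=1 -DLLAMA_VULKAN_DEBUG=1 -DLLAMA_VULKAN_VALIDATE=1
cmake --build . --config Release
./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4 2> vulkan_debug.log
)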

@Eliastrt (Author)

Thanks! So, I can't understand this output, but here it is:
test_vulkan_rx580_8GB_part_1.txt
test_vulkan_rx580_8GB_part_2.txt

@0cc4m (Collaborator) commented Feb 12, 2024

So it's just trying to allocate a buffer of size 4257734656 bytes. Your driver reports a max buffer size of 4294967292 bytes, so this should work. Maybe the limit is lower, or it has an issue with memory fragmentation? Try this patch to halve the limit:

diff --git a/ggml-vulkan.cpp b/ggml-vulkan.cpp
index 7834e635..1d1cdc82 100644
--- a/ggml-vulkan.cpp
+++ b/ggml-vulkan.cpp
@@ -1186,6 +1186,8 @@ void ggml_vk_init(ggml_backend_vk_context * ctx, size_t idx) {
         ctx->device.lock()->max_memory_allocation_size = props3.maxMemoryAllocationSize;
     }

+    ctx->device.lock()->max_memory_allocation_size = 2147483646;
+
     ctx->device.lock()->vendor_id = ctx->device.lock()->properties.vendorID;
     ctx->device.lock()->subgroup_size = subgroup_props.subgroupSize;
     ctx->device.lock()->uma = ctx->device.lock()->properties.deviceType == vk::PhysicalDeviceType::eIntegratedGpu;
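
(For reference: 2147483646 bytes is exactly half of the 4294967292-byte limit the driver reports, i.e. just under 2 GiB.)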

@Eliastrt (Author)

Great! It's working now! ~11 tokens/sec

By the way, what are your plans for splitting a model across several GPUs? ;)

@0cc4m (Collaborator) commented Feb 13, 2024

Great! But that's not a real solution yet. Let's keep the issue open until there's a way to lower the limit without code changes. Maybe an environment variable?

0cc4m reopened this Feb 13, 2024
@Eliastrt (Author)

I have found some interesting info in the Vulkan spec:
https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#vkAllocateMemory

"Some platforms may have a limit on the maximum size of a single allocation. For example, certain systems may fail to create allocations with a size greater than or equal to 4GB. Such a limit is implementation-dependent, and if such a failure occurs then the error VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned. This limit is advertised in VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize."

@0cc4m (Collaborator) commented Feb 14, 2024

Yeah, if you look at the patch I gave you, that's what I'm doing in the line above the new one. Your GPU reports a limit of 4GB, but fails when I actually try to allocate (slightly less than) 4GB. 2GB seems fine.

@Eliastrt (Author) commented Feb 14, 2024

I have done some R&D and can give you additional info:

1. For my device, Heap[1] < maxMemoryAllocationSize:

Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64
DEBUG: maxMemoryAllocationSize=4294967292
DEBUG: maxBufferSize=0
DEBUG: MemoryHeap[1] Size=4133625856

2. If I clamp the limit to the size of Heap[1], everything works fine:

vk::PhysicalDeviceMemoryProperties mem_props = ctx->device.lock()->physical_device.getMemoryProperties();
....
if (maintenance4_support) {
    ctx->device.lock()->max_memory_allocation_size = std::min(
        std::min(props3.maxMemoryAllocationSize, props4.maxBufferSize), mem_props.memoryHeaps[1].size
    );
} else {
    ctx->device.lock()->max_memory_allocation_size = std::min(
        props3.maxMemoryAllocationSize, mem_props.memoryHeaps[1].size
    );
}

What are your values for Heap[1] and maxMemoryAllocationSize?

@0cc4m (Collaborator) commented Feb 14, 2024

Yes, that means the actual limit is only slightly below 4GB. But heap 1 is in RAM, it is host-pinned memory, not device memory. It is purely coincidental that its heap size has that effect there. The amount of available host-pinned memory often corresponds to half of the available RAM, but that depends on the driver. The allocation that fails is one for heap 0, for the VRAM.

@Eliastrt (Author)

Yes, you are absolutely right, Heap[1] depends on the installed RAM. One more test shows that, somehow, it is important to fit within Heap[1]:

1. I removed one RAM module, so now the system has only 4GB and MemoryHeap[1] Size=3221225472.
2. I tried to run again with the hardcoded value of 4133625856 (the previous Heap[1] size that ran successfully last time) and got the error: Device memory allocation of size 4126957568 failed.

Now I'm going to install the RAM back and test again. So, somehow, system RAM and Heap[1] are linked to this error.

@0cc4m (Collaborator) commented Feb 14, 2024

Maybe, but I don't see how. You are in an unusual situation, with an old driver and 8GB VRAM + 8GB RAM (right?). An up-to-date mesa might fix it. More RAM might fix it, maybe. But I'm not seeing similar issues on other devices. I'll add an environment parameter soon to reduce the max buffer size, which will allow you to work around this issue. I don't see a better way of fixing this right now.
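
Such an override could look roughly like this; a hedged sketch where the environment variable name and helper function are illustrative assumptions, not necessarily the exact code that landed in ggml-vulkan.cpp:

#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: let an environment variable clamp the Vulkan backend's
// max allocation size below what the driver reports, e.g.
//   GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483646 ./bin/main ...
// The variable name here is an assumption for illustration.
static uint64_t apply_max_alloc_override(uint64_t reported_limit) {
    if (const char * forced = std::getenv("GGML_VK_FORCE_MAX_ALLOCATION_SIZE")) {
        return std::min(reported_limit, (uint64_t) std::strtoull(forced, nullptr, 10));
    }
    return reported_limit;
}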

@Eliastrt (Author)

Just make the buffer size the minimum of the two: heap[1] or maxMemoryAllocationSize?

@0cc4m (Collaborator) commented Feb 15, 2024

No, that is specific to your setup. It really is an allocation to heap 0 that fails. A mesa dev might be able to give more details, but they would likely just tell you to try an up-to-date mesa first.

@Deins (Contributor) commented Feb 24, 2024

I have the same issue with some models on an AMD Radeon RX 7900 XTX with 24GB VRAM on Windows, but I could not find a value that works. I went all the way down to ctx->device.lock()->max_memory_allocation_size = 2147483648 / 32; and then got an error that the limit is lower than the layer size: tensor output.weight is too large to fit in a Vulkan0 buffer (tensor size: 107520000, max buffer size: 67108864). (2147483648 / 32 = 67108864 bytes, which matches the reported max buffer size, while output.weight alone needs 107520000 bytes.)

But it's model-specific: some small models (~4GB) don't work, while some large ones (>10GB) do. It clearly is some kind of allocation limitation that fails to allocate large layers.

Here is the vulkaninfo output, if anyone finds it useful:
vulkaninfo.txt

This issue was closed because it has been inactive for 14 days since being marked as stale.
