
Vulkan Device memory allocation failed (ErrorOutOfDeviceMemory) #5441

Closed

Eliastrt opened this issue Feb 10, 2024 · 19 comments

@Eliastrt

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.


System: OpenSuse Leap 15.4
GPU: AMD RX580 8GB
Vulkan Instance Version: 1.3.275
VkPhysicalDeviceMemoryProperties:

memoryHeaps: count = 3
    memoryHeaps[0]:
        size   = 8321499136 (0x1f0000000) (7.75 GiB)
        budget = 8310034432 (0x1ef511000) (7.74 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 4133625856 (0xf6622000) (3.85 GiB)
        budget = 4124254208 (0xf5d32000) (3.84 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags:
            None
    memoryHeaps[2]:
        size   = 268435456 (0x10000000) (256.00 MiB)
        budget = 256970752 (0x0f511000) (245.07 MiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT

Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64

GGUF model: mistral-7b-instruct-v0.2.Q6_K.gguf


First of all, thanks to Occam for the new Vulkan implementation of llama.cpp!

I have tried to run llama.cpp according to the "without Docker" instructions:

--> ./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4

Got an error:
ggml_vulkan: Device memory allocation of size 4257734656 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/root/GPT/GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'
main: error: unable to load model

FYI:
If I reduce -ngl to 23 layers, everything works properly, but slowly.

llm_load_tensors: CPU buffer size = 5666.09 MiB
llm_load_tensors: Vulkan0 buffer size = 3925.09 MiB
6.51 tokens per second (the OpenCL version loads the full model on the same GPU and shows ~12 tokens/sec)

@0cc4m (Collaborator) commented Feb 11, 2024

OpenCL does not run the full model on the GPU; it just does matrix multiplications, scalar multiplications, and scalar additions, and all the rest is done by the CPU. That means it uses a little less VRAM, yes. I guess you're using the GPU for a GUI at the same time, so it has other programs occupying its VRAM and doesn't find enough for Vulkan. Try using q5_k or q4_k.

@Eliastrt (Author)

Thanks for the reply.

I have no GUI on this machine. Anyway, 23 layers use ~4GB and run without problems; 24 layers raise this error.

According to the AMD docs: https://gpuopen.com/learn/vulkan-device-memory/

Vulkan implementations can use different heap levels, and it looks like this implementation uses Heaps[1] instead of Heaps[0]. If so, that is the problem, because my vulkaninfo shows 8GB for Heaps[0] and only 4GB for Heaps[1].

@0cc4m (Collaborator) commented Feb 11, 2024

You didn't post the whole vulkaninfo, but I'm pretty sure it's using the right heap. It looks for device-local memory first. The 4GB heap is host-visible memory (RAM). This is used, too, but only for layers you didn't put on the GPU and for staging buffers.
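
For illustration, the usual Vulkan pattern for that kind of selection looks roughly like the following. This is a minimal sketch of the general technique (prefer DEVICE_LOCAL, fall back to host-visible), not llama.cpp's actual code:

#include <vulkan/vulkan.hpp>
#include <stdexcept>

// Sketch only: choose a memory type index for an allocation, preferring
// DEVICE_LOCAL types (VRAM, heap 0 on this system) and falling back to
// HOST_VISIBLE ones (pinned RAM, heap 1 here). type_bits comes from
// vkGetBufferMemoryRequirements for the buffer being backed.
static uint32_t pick_memory_type(vk::PhysicalDevice pdev, uint32_t type_bits) {
    vk::PhysicalDeviceMemoryProperties props = pdev.getMemoryProperties();
    // First pass: device-local memory.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & vk::MemoryPropertyFlagBits::eDeviceLocal)) {
            return i;
        }
    }
    // Fallback: host-visible memory (used for staging and CPU-side layers).
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        if ((type_bits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & vk::MemoryPropertyFlagBits::eHostVisible)) {
            return i;
        }
    }
    throw std::runtime_error("no suitable Vulkan memory type");
}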

@Eliastrt (Author) commented Feb 12, 2024

Full vulkaninfo output in the attachment:
vulkaninfo.txt

FYI, I have tried running with another RX580 8GB just to rule out hardware issues and got the same error.

@0cc4m (Collaborator) commented Feb 12, 2024

Looks perfectly fine. It's using the right memory. Try compiling with LLAMA_VULKAN_DEBUG=1 and LLAMA_VULKAN_VALIDATE=1 (this needs Vulkan validation layers installed). Then run it again in the way that causes it to error and upload the whole output. It's gonna be very verbose.
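
(Assuming those defines are exposed as CMake options of the same name, the build could look like this; a hedged sketch, since the flag plumbing may differ between llama.cpp versions:

mkdir -p build && cd build
cmake .. -DLLAMA_VULKAN=1 -DLLAMA_VULKAN_DEBUG=1 -DLLAMA_VULKAN_VALIDATE=1
cmake --build . --config Release
./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4 2> vulkan_debug.log
)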

@Eliastrt (Author)

Thanks! So, I can't understand this output, but here it is:
test_vulkan_rx580_8GB_part_1.txt
test_vulkan_rx580_8GB_part_2.txt

@0cc4m (Collaborator) commented Feb 12, 2024

So it's just trying to allocate a buffer of size 4257734656 bytes. Your driver reports a max buffer size of 4294967292 bytes, so this should work. Maybe the limit is lower, or it has an issue with memory fragmentation? Try this patch to halve the limit:

diff --git a/ggml-vulkan.cpp b/ggml-vulkan.cpp
index 7834e635..1d1cdc82 100644
--- a/ggml-vulkan.cpp
+++ b/ggml-vulkan.cpp
@@ -1186,6 +1186,8 @@ void ggml_vk_init(ggml_backend_vk_context * ctx, size_t idx) {
         ctx->device.lock()->max_memory_allocation_size = props3.maxMemoryAllocationSize;
     }

+    ctx->device.lock()->max_memory_allocation_size = 2147483646;
+
     ctx->device.lock()->vendor_id = ctx->device.lock()->properties.vendorID;
     ctx->device.lock()->subgroup_size = subgroup_props.subgroupSize;
     ctx->device.lock()->uma = ctx->device.lock()->properties.deviceType == vk::PhysicalDeviceType::eIntegratedGpu;
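
(For reference: 2147483646 bytes is exactly half of the 4294967292-byte limit the driver reports, i.e. just under 2 GiB.)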

@Eliastrt (Author)

Great! It's working now! ~11 tokens/sec

By the way, what are your plans for splitting a model across several GPUs? ;)

@0cc4m (Collaborator) commented Feb 13, 2024

Great! But that's not a real solution yet. Let's keep the issue open until there's a way to lower the limit without code changes. Maybe an environment variable?

0cc4m reopened this Feb 13, 2024
@Eliastrt (Author)

I have found some interesting info in the Vulkan spec:
https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#vkAllocateMemory

"Some platforms may have a limit on the maximum size of a single allocation. For example, certain systems may fail to create allocations with a size greater than or equal to 4GB. Such a limit is implementation-dependent, and if such a failure occurs then the error VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned. This limit is advertised in VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize."

@0cc4m (Collaborator) commented Feb 14, 2024

Yeah, if you look at the patch I gave you, that's what I'm doing in the line above the new one. Your GPU reports a limit of 4GB, but fails when I actually try to allocate (slightly less than) 4GB. 2GB seems fine.

@Eliastrt (Author) commented Feb 14, 2024

I have done some R&D and can give you additional info:

1. For my device, Heap[1] < maxMemoryAllocationSize:

Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64
DEBUG: maxMemoryAllocationSize=4294967292
DEBUG: maxBufferSize=0
DEBUG: MemoryHeap[1] Size=4133625856

2. If I clamp the limit to the size of Heap[1], everything works fine:

vk::PhysicalDeviceMemoryProperties mem_props = ctx->device.lock()->physical_device.getMemoryProperties();
....
if (maintenance4_support) {
    ctx->device.lock()->max_memory_allocation_size = std::min(
        std::min(props3.maxMemoryAllocationSize, props4.maxBufferSize), mem_props.memoryHeaps[1].size
    );
} else {
    ctx->device.lock()->max_memory_allocation_size = std::min(
        props3.maxMemoryAllocationSize, mem_props.memoryHeaps[1].size
    );
}

What are your values for Heap[1] and maxMemoryAllocationSize?

@0cc4m (Collaborator) commented Feb 14, 2024

Yes, that means the actual limit is only slightly below 4GB. But heap 1 is in RAM, it is host-pinned memory, not device memory. It is purely coincidental that its heap size has that effect there. The amount of available host-pinned memory often corresponds to half of the available RAM, but that depends on the driver. The allocation that fails is one for heap 0, for the VRAM.

@Eliastrt (Author)

Yes, you are absolutely right, Heap[1] depends on the installed RAM. One more test shows that, somehow, it is important to fit within Heap[1]:

1. I removed one RAM module, so now the system has only 4GB and MemoryHeap[1] Size=3221225472.
2. I tried to run again with the hardcoded value of 4133625856 (the previous Heap[1] size that ran successfully last time) and got the error: Device memory allocation of size 4126957568 failed.

Now I'm going to install the RAM back and test again. So, somehow, system RAM and Heap[1] are linked to this error.

@0cc4m (Collaborator) commented Feb 14, 2024

Maybe, but I don't see how. You are in an unusual situation, with an old driver and 8GB VRAM + 8GB RAM (right?). An up-to-date mesa might fix it. More RAM might fix it, maybe. But I'm not seeing similar issues on other devices. I'll add an environment parameter soon to reduce the max buffer size, which will allow you to work around this issue. I don't see a better way of fixing this right now.
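
Such an override could look roughly like this; a hedged sketch where the environment variable name and helper function are illustrative assumptions, not necessarily the exact code that landed in ggml-vulkan.cpp:

#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: let an environment variable clamp the Vulkan backend's
// max allocation size below what the driver reports, e.g.
//   GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483646 ./bin/main ...
// The variable name here is an assumption for illustration.
static uint64_t apply_max_alloc_override(uint64_t reported_limit) {
    if (const char * forced = std::getenv("GGML_VK_FORCE_MAX_ALLOCATION_SIZE")) {
        return std::min(reported_limit, (uint64_t) std::strtoull(forced, nullptr, 10));
    }
    return reported_limit;
}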

@Eliastrt (Author)

Just make the buffer size the minimum of the two: heap[1] or maxMemoryAllocationSize?

@0cc4m (Collaborator) commented Feb 15, 2024

No, that is specific to your setup. It really is an allocation to heap 0 that fails. A mesa dev might be able to give more details, but they would likely just tell you to try an up-to-date mesa first.

@Deins (Contributor) commented Feb 24, 2024

I have the same issue with some models on an AMD Radeon RX 7900 XTX with 24GB VRAM on Windows, but I could not find a value that works. I went all the way down to ctx->device.lock()->max_memory_allocation_size = 2147483648 / 32; and then got an error that the limit is lower than the layer size: tensor output.weight is too large to fit in a Vulkan0 buffer (tensor size: 107520000, max buffer size: 67108864). (2147483648 / 32 = 67108864 bytes, which matches the reported max buffer size, while output.weight alone needs 107520000 bytes.)

But it's model-specific: some small models (~4GB) don't work, while some large ones (>10GB) do. It clearly is some kind of allocation limitation that fails to allocate large layers.

Here is the vulkaninfo output, if anyone finds it useful:
vulkaninfo.txt

This issue was closed because it has been inactive for 14 days since being marked as stale.
