Vulkan Device memory allocation failed (ErrorOutOfDeviceMemory ) #5441
Comments
OpenCL does not run the full model on the GPU; it just does matrix multiplications, scalar multiplications, and scalar additions, and everything else is done by the CPU. That means it uses a little less VRAM, yes. I guess you're using the GPU for a GUI at the same time, so other programs are occupying its VRAM and Vulkan can't find enough for itself. Try using q5_k or q4_k.
Thanks for the reply. I have no GUI on this machine. Anyway, 23 layers use ~4 GB and run without problems, while 24 layers raise this error. According to the AMD docs (https://gpuopen.com/learn/vulkan-device-memory/), Vulkan implementations can use different heap levels, and it looks like this implementation uses Heaps[1] instead of Heaps[0]. If so, that is the problem, because my vulkaninfo shows 8 GB for Heaps[0] and only 4 GB for Heaps[1].
You didn't post the whole vulkaninfo, but I'm pretty sure it's using the right heap. It looks for device-local memory first. The 4GB heap is host-visible memory (RAM). This is used, too, but only for layers you didn't put on the GPU and for staging buffers.
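For illustration, here is a minimal sketch (assumed, not the actual ggml-vulkan code) of how a Vulkan backend typically prefers a device-local memory type and only falls back to a host-visible one, which is roughly the behaviour described above:

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>

// Prefer a device-local memory type (VRAM, heap 0 on this card); fall back to
// a host-visible type (the RAM-backed heap) only if nothing device-local fits.
static uint32_t find_memory_type(vk::PhysicalDevice physical_device,
                                 uint32_t type_bits,
                                 vk::MemoryPropertyFlags preferred,
                                 vk::MemoryPropertyFlags fallback) {
    vk::PhysicalDeviceMemoryProperties mem_props = physical_device.getMemoryProperties();
    for (int pass = 0; pass < 2; pass++) {
        vk::MemoryPropertyFlags wanted = (pass == 0) ? preferred : fallback;
        for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
            if ((type_bits & (1u << i)) &&
                (mem_props.memoryTypes[i].propertyFlags & wanted) == wanted) {
                return i;
            }
        }
    }
    return UINT32_MAX; // no suitable memory type found
}
```

Weight buffers would ask for vk::MemoryPropertyFlagBits::eDeviceLocal, while staging buffers ask for eHostVisible | eHostCoherent and therefore land in the RAM-backed heap.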
The full vulkaninfo output is in the attachment. FYI, I have tried running with another RX580 8GB just to rule out hardware issues and got the same error.
Looks perfectly fine. It's using the right memory. Try compiling with
Thanks! So, I can't understand this output, but here it is:
So it's just trying to allocate a single buffer of slightly less than 4 GB (the size from the error), and that allocation fails. Try this patch:
diff --git a/ggml-vulkan.cpp b/ggml-vulkan.cpp
index 7834e635..1d1cdc82 100644
--- a/ggml-vulkan.cpp
+++ b/ggml-vulkan.cpp
@@ -1186,6 +1186,8 @@ void ggml_vk_init(ggml_backend_vk_context * ctx, size_t idx) {
ctx->device.lock()->max_memory_allocation_size = props3.maxMemoryAllocationSize;
}
+ ctx->device.lock()->max_memory_allocation_size = 2147483646;
+
ctx->device.lock()->vendor_id = ctx->device.lock()->properties.vendorID;
ctx->device.lock()->subgroup_size = subgroup_props.subgroupSize;
ctx->device.lock()->uma = ctx->device.lock()->properties.deviceType == vk::PhysicalDeviceType::eIntegratedGpu;
Great! It's working now! ~11 tokens/sec. By the way, what are your plans for the feature of splitting a model across several GPUs? ;)
Great! But that's not a real solution yet. Let's keep the issue open until there's a way to lower the limit without code changes. Maybe an environment variable?
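As a rough idea of what that could look like, a sketch only; the variable name GGML_VK_FORCE_MAX_ALLOCATION_SIZE is an assumption here, so check the current ggml-vulkan source for whatever was actually added:

```cpp
#include <cstdlib>
#include <cstdint>
#include <string>

// Sketch: let an environment variable override the limit the driver reports,
// instead of hard-coding 2147483646 as in the patch above.
static uint64_t vk_max_allocation_size(uint64_t reported_limit) {
    const char * env = std::getenv("GGML_VK_FORCE_MAX_ALLOCATION_SIZE"); // assumed name
    if (env != nullptr) {
        return std::stoull(env); // user-forced cap, e.g. 2147483646
    }
    return reported_limit;       // default: trust maxMemoryAllocationSize
}
```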
I have found some interesting info in the Vulkan spec: "Some platforms may have a limit on the maximum size of a single allocation. For example, certain systems may fail to create allocations with a size greater than or equal to 4GB. Such a limit is implementation-dependent, and if such a failure occurs then the error VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned. This limit is advertised in VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize."
Yeah, if you look at the patch I gave you, that is what I'm doing in the line above the new one. Your GPU reports a limit of 4 GB, but fails when I actually try to allocate (slightly less than) 4 GB. 2 GB seems fine.
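For reference, a minimal sketch of how that maintenance3 limit is read with vulkan.hpp; it mirrors the props2/props3 query visible in the patch context above, but the function name is illustrative:

```cpp
#include <vulkan/vulkan.hpp>

// Reads VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize,
// the limit the spec quote above refers to. On this GPU it reports ~4 GB,
// yet an allocation slightly below 4 GB still fails, hence the 2 GB cap.
static vk::DeviceSize query_max_allocation_size(vk::PhysicalDevice physical_device) {
    vk::PhysicalDeviceProperties2 props2;
    vk::PhysicalDeviceMaintenance3Properties props3;
    props2.pNext = &props3;
    physical_device.getProperties2(&props2);
    return props3.maxMemoryAllocationSize;
}
```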
I have done some R&D and can give you additional info:
Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64
vk::PhysicalDeviceMemoryProperties mem_props = ctx->device.lock()->physical_device.getMemoryProperties();
What are your params for Heap[1] and maxAllocationSize?
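To answer that in a reproducible way, something like the following sketch (illustrative, not code from the backend) dumps every heap the driver exposes, i.e. the same data vulkaninfo prints in the memoryHeaps section further down:

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <cstdio>

// Print every memory heap with its size and whether it is device-local.
static void print_memory_heaps(vk::PhysicalDevice physical_device) {
    vk::PhysicalDeviceMemoryProperties mem_props = physical_device.getMemoryProperties();
    for (uint32_t i = 0; i < mem_props.memoryHeapCount; i++) {
        const vk::MemoryHeap & heap = mem_props.memoryHeaps[i];
        const bool device_local =
            static_cast<bool>(heap.flags & vk::MemoryHeapFlagBits::eDeviceLocal);
        std::printf("heap %u: %llu bytes, device_local=%d\n",
                    i, (unsigned long long) heap.size, device_local ? 1 : 0);
    }
}
```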
Yes, that means the actual limit is only slightly below 4 GB. But heap 1 is in RAM; it is host-pinned memory, not device memory. It is purely coincidental that its size lines up here. The amount of available host-pinned memory often corresponds to half of the available RAM, but that depends on the driver. The allocation that fails is one for heap 0, the VRAM.
Yes, you are absolutely right. Heap[1] depends on the installed RAM. One more test shows that somehow it is important to fit within Heap[1]:
Now I'm going to install the RAM back and test it again. So, somehow system RAM and Heap[1] are linked to this error.
Maybe, but I don't see how. You are in an unusual situation, with an old driver and 8GB VRAM + 8GB RAM (right?). An up-to-date mesa might fix it. More RAM might fix it, maybe. But I'm not seeing similar issues on other devices. I'll add an environment parameter soon to reduce the max buffer size, which will allow you to work around this issue. I don't see a better way of fixing this right now.
Just make the buffer size the minimum of the two: heap[1] size or maxMemAllocationSize?
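A sketch of what that proposal would amount to; as the next reply explains, it is not a general fix, since the failing allocation is on heap 0:

```cpp
#include <algorithm>
#include <cstdint>

// The clamp proposed here: never allocate more than either the reported
// maxMemoryAllocationSize or the size of heap 1 (the host-visible heap).
static uint64_t proposed_max_buffer_size(uint64_t max_memory_allocation_size,
                                         uint64_t heap1_size) {
    return std::min(max_memory_allocation_size, heap1_size);
}
```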
No, that is specific to your setup. It really is an allocation to heap 0 that fails. A mesa dev might be able to give more details, but they would likely just tell you to try an up-to-date mesa first.
I have the same issue with some models. But it's model-specific: some small models (~4 GB) don't work, while some large ones (>10 GB) do. Here is the vkinfo output if anyone finds it useful.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
System: OpenSuse Leap 15.4
GPU: AMD RX580 8GB
Vulkan Instance Version: 1.3.275
VkPhysicalDeviceMemoryProperties:
memoryHeaps: count = 3
memoryHeaps[0]:
size = 8321499136 (0x1f0000000) (7.75 GiB)
budget = 8310034432 (0x1ef511000) (7.74 GiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 1
MEMORY_HEAP_DEVICE_LOCAL_BIT
memoryHeaps[1]:
size = 4133625856 (0xf6622000) (3.85 GiB)
budget = 4124254208 (0xf5d32000) (3.84 GiB)
usage = 0 (0x00000000) (0.00 B)
flags:
None
memoryHeaps[2]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 256970752 (0x0f511000) (245.07 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 1
MEMORY_HEAP_DEVICE_LOCAL_BIT
Vulkan0: AMD RADV POLARIS10 | uma: 0 | fp16: 0 | warp size: 64
GGUF model: mistral-7b-instruct-v0.2.Q6_K.gguf
First of all, thanks to Occam for the new Vulkan implementation of llama.cpp!
I have tried to run llama.cpp according to the "without Docker" instructions:
--> ./bin/main -m "PATH_TO_mistral-7b-instruct-v0.2.Q6_K.gguf" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4
Got an error:
ggml_vulkan: Device memory allocation of size 4257734656 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/root/GPT/GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'
main: error: unable to load model
FYI:
If I reduce -ngl to 23 layers, then everything works properly, but slowly.
llm_load_tensors: CPU buffer size = 5666.09 MiB
llm_load_tensors: Vulkan0 buffer size = 3925.09 MiB
6.51 tokens per second (the OpenCL version loads the full model on the same GPU and shows ~12 tokens/sec)