
Crash due to unhandled exception from ggml_vk_allocate in llama_kv_cache_init #1870

Closed
realKarthikNair opened this issue Jan 24, 2024 · 14 comments
Labels: bug (Something isn't working), chat (gpt4all-chat issues)

Comments

@realKarthikNair (Contributor) commented Jan 24, 2024

System Info

GPT4All version : 2.6.1

  1. OS, kernel, and Python
karthik@fedora:~$ cat /etc/fedora-release | cut -c -17 && uname -sr && python --version && python3.10 --version
Fedora release 39
Linux 6.6.13-200.fc39.x86_64
Python 3.12.1
Python 3.10.13
  2. Memory info
karthik@fedora:~$ free
               total        used        free      shared  buff/cache   available
Mem:        15633228     2679428     4617172       38508     8723496    12953800
Swap:       52236280      683776    51552504
karthik@fedora:~$ swapon
NAME           TYPE       SIZE   USED PRIO
/dev/nvme0n1p6 partition   20G     0B    1
/dev/zram0     partition 29.8G 666.8M  100
  3. GPU and CPU info
karthik@fedora:~$ inxi
CPU: 8-core AMD Ryzen 7 7840HS w/ Radeon 780M Graphics (-MT MCP-)
speed/min/max: 709/400/5137:5293:5608:6080:5449:5764:5924 MHz
Kernel: 6.6.13-200.fc39.x86_64 x86_64 Up: 1h 21m Mem: 2.53/14.91 GiB (17.0%)
Storage: 1011.17 GiB (18.0% used) Procs: 413 Shell: Bash inxi: 3.3.31
karthik@fedora:~$ 
karthik@fedora:~$ nvidia-smi
Wed Jan 24 20:38:10 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P0              19W / 105W |      7MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         

Information

  • The official example notebooks/scripts
  • My own modified scripts

Reproduction

  1. Try to load the Wizard 1.2 model
  2. The app is unable to allocate memory, so it crashes
karthik@fedora:~$ ./gpt4all/bin/chat 
[Warning] (Wed Jan 24 20:34:29 2024): Could not find the Qt platform plugin "wayland" in ""
[Warning] (Wed Jan 24 20:34:29 2024): Could not connect "org.freedesktop.IBus" to globalEngineChanged(QString)
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5a42a64c-864a-458f-beec-508935f6c28e.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5c19a817-c6ab-4891-8f53-b0d65e3c6eed.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chats took: 3 ms
[Warning] (Wed Jan 24 20:34:32 2024): ERROR: Previous attempt to load model resulted in crash for `wizardlm-13b-v1.2.Q4_0.gguf` most likely due to insufficient memory. You should either remove this model or decrease your system RAM usage by closing other applications. id "1ef2661f-f65a-4cba-a2d6-31bc4d85b5a2"
Error allocating memory ErrorOutOfDeviceMemory
[Warning] (Wed Jan 24 20:34:43 2024): Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error allocating vulkan memory.
Aborted (core dumped)

Expected behavior

  1. Try to load the Wizard 1.2 model
  2. The Wizard 1.2 model loads and I can interact with the chatbot in the GPT4All GUI, which is what happens on the same system on Windows

Smaller models do work on Linux, with and without the GPU, and throw no such errors as above.

@cebtenzzre (Member)

It'd be helpful if you could build GPT4All from source as described here with your build configuration set to Debug, open the directory containing the built chat binary and run gdb chat, then use run to start it and thread apply all bt after the crash to get the backtrace - then post the output.
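
A sketch of that session (assuming the Debug build's chat binary is in the current directory):

$ gdb chat
(gdb) run
... reproduce the crash in the UI; gdb stops when the process aborts ...
(gdb) thread apply all bt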

@cebtenzzre cebtenzzre added bug Something isn't working chat gpt4all-chat issues labels Jan 24, 2024
@realKarthikNair (Contributor, Author)

@cebtenzzre here is the output of thread apply all bt

Please let me know if anything else is required

gdb.txt

@cebtenzzre (Member) commented Jan 24, 2024

Could you please run gdb chat again but use break ggml-vulkan.cpp:399, run, and then bt when it stops?
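
That is, roughly (a sketch; the breakpoint is set before starting the program):

$ gdb chat
(gdb) break ggml-vulkan.cpp:399
(gdb) run
... when the breakpoint is hit ...
(gdb) bt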

It would also be useful to have the output of vulkaninfo --summary.

@realKarthikNair (comment marked as outdated)

@realKarthikNair (Contributor, Author)

Here is the output of karthik@fedora:~$ vulkaninfo --summary > Temp/vulkan_summary.txt

vulkan_summary.txt

@realKarthikNair (Contributor, Author)

As you can see, my system has both an integrated Radeon GPU and a dedicated Nvidia one.

I have tried GPT4All Chat with all of these system combinations on Linux:

  • default (hybrid mode)
  • integrated only (dedicated GPU turned off)
  • dGPU only (MUX switch set to the Nvidia GPU)

The issue can be reproduced in all cases.

@realKarthikNair (comment marked as outdated)

@cebtenzzre (Member) commented Jan 25, 2024

Sorry, that should be ggml-kompute.cpp:409 - ggml-vulkan.cpp:399 was the file and line number from 2.6.1, but you're obviously building from the latest main branch.

Which GPU do you have selected in the UI?

@realKarthikNair (Contributor, Author)

It is showing No source file named ggml-kompute.cpp.

(gdb) break ggml-kompute.cpp:409
No source file named ggml-kompute.cpp.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) Quit
(gdb) 
[1]+  Stopped                 gdb chat
karthik@fedora:~/.../bin$ locate ggml-kompute.cpp
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Debug/llmodel/CMakeFiles/ggml-mainline-avxonly.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Debug/llmodel/CMakeFiles/ggml-mainline-default.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Release/llmodel/CMakeFiles/ggml-mainline-avxonly.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Release/llmodel/CMakeFiles/ggml-mainline-default.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-kompute.cpp
karthik@fedora:~/.../bin$ 
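
(Side note, as a sketch: answering y would instead make the breakpoint pending, since ggml-kompute.cpp is compiled into a shared library that is only loaded at runtime; gdb resolves the breakpoint once that library loads.)

(gdb) break ggml-kompute.cpp:409
No source file named ggml-kompute.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (ggml-kompute.cpp:409) pending.
(gdb) run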

@realKarthikNair (Contributor, Author)

Which GPU do you have selected in the UI?

It was RTX 4060

I tried with the CPU now and Wizard 1.2 is working.

Just noticed that on Windows, too, it's working via the CPU instead of the RTX 4060 with 8 GB VRAM.

My apologies for wasting your time.

Should I close the issue?

@cebtenzzre (Member)

We still need to at least fix the fallback to CPU on Linux. Could you join the Discord and ping me? It seems like you're not getting the debug symbols that you should be getting - it may be more straightforward to just build with cmake instead.
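
A minimal sketch of such a build (the directory layout and binary path are illustrative, not exact):

$ cd gpt4all/gpt4all-chat
$ cmake -B build -DCMAKE_BUILD_TYPE=Debug
$ cmake --build build -j
$ gdb build/bin/chat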

@cebtenzzre cebtenzzre reopened this Jan 25, 2024
@realKarthikNair (Contributor, Author)

Sure, just did.

@cebtenzzre cebtenzzre added the awaiting-release issue is awaiting next release label Jan 31, 2024
@cebtenzzre (Member)

Marking as fixed in the next release because of 6db5307 - testing would be appreciated. It seems like this can be reproduced easily by setting n_ctx to something really high; Mistral models will let you do this on the latest main, while many others are limited to 4096 now.

@cebtenzzre cebtenzzre changed the title model memory allocation error on Linux while works on Windows (same machine) Crash due to unhandled exception in ggml_vk_allocate in llama_kv_cache_init Jan 31, 2024
@cebtenzzre cebtenzzre changed the title Crash due to unhandled exception in ggml_vk_allocate in llama_kv_cache_init Crash due to unhandled exception from ggml_vk_allocate in llama_kv_cache_init Jan 31, 2024
@cebtenzzre (Member)

Fixed in v2.6.2.

@cebtenzzre cebtenzzre removed the awaiting-release issue is awaiting next release label Feb 1, 2024