
Crash due to unhandled exception from ggml_vk_allocate in llama_kv_cache_init #1870

Closed
realKarthikNair opened this issue Jan 24, 2024 · 14 comments
Labels: bug (Something isn't working), chat (gpt4all-chat issues)

Comments

@realKarthikNair (Contributor) commented Jan 24, 2024

System Info

GPT4All version : 2.6.1

  1. OS, kernel, and Python
karthik@fedora:~$ cat /etc/fedora-release | cut -c -17 && uname -sr && python --version && python3.10 --version
Fedora release 39
Linux 6.6.13-200.fc39.x86_64
Python 3.12.1
Python 3.10.13
  2. Memory info
karthik@fedora:~$ free
               total        used        free      shared  buff/cache   available
Mem:        15633228     2679428     4617172       38508     8723496    12953800
Swap:       52236280      683776    51552504
karthik@fedora:~$ swapon
NAME           TYPE       SIZE   USED PRIO
/dev/nvme0n1p6 partition   20G     0B    1
/dev/zram0     partition 29.8G 666.8M  100
  3. GPU and CPU info
karthik@fedora:~$ inxi
CPU: 8-core AMD Ryzen 7 7840HS w/ Radeon 780M Graphics (-MT MCP-)
speed/min/max: 709/400/5137:5293:5608:6080:5449:5764:5924 MHz
Kernel: 6.6.13-200.fc39.x86_64 x86_64 Up: 1h 21m Mem: 2.53/14.91 GiB (17.0%)
Storage: 1011.17 GiB (18.0% used) Procs: 413 Shell: Bash inxi: 3.3.31
karthik@fedora:~$ 
karthik@fedora:~$ nvidia-smi
Wed Jan 24 20:38:10 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P0              19W / 105W |      7MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         

Information

  • The official example notebooks/scripts
  • My own modified scripts

Reproduction

  1. Try to load the Wizard 1.2 model
  2. The app is unable to allocate memory, so it crashes
karthik@fedora:~$ ./gpt4all/bin/chat 
[Warning] (Wed Jan 24 20:34:29 2024): Could not find the Qt platform plugin "wayland" in ""
[Warning] (Wed Jan 24 20:34:29 2024): Could not connect "org.freedesktop.IBus" to globalEngineChanged(QString)
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5a42a64c-864a-458f-beec-508935f6c28e.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5c19a817-c6ab-4891-8f53-b0d65e3c6eed.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chats took: 3 ms
[Warning] (Wed Jan 24 20:34:32 2024): ERROR: Previous attempt to load model resulted in crash for `wizardlm-13b-v1.2.Q4_0.gguf` most likely due to insufficient memory. You should either remove this model or decrease your system RAM usage by closing other applications. id "1ef2661f-f65a-4cba-a2d6-31bc4d85b5a2"
Error allocating memory ErrorOutOfDeviceMemory
[Warning] (Wed Jan 24 20:34:43 2024): Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error allocating vulkan memory.
Aborted (core dumped)

Expected behavior

  1. Try to load the Wizard 1.2 model
  2. The Wizard 1.2 model loads and I can interact with the chatbot in the GPT4All GUI, which is what happens on the same system on Windows

Smaller models do work on Linux, with and without the GPU, and throw no such errors as above.

@cebtenzzre (Member)

It'd be helpful if you could build GPT4All from source as described here with your build configuration set to Debug, open the directory containing the built chat binary and run gdb chat, then use run to start it and thread apply all bt after the crash to get the backtrace - then post the output.
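
A sketch of that session (assuming the Debug build's chat binary is in the current directory):

$ gdb chat
(gdb) run
... reproduce the crash in the UI; gdb stops when the process aborts ...
(gdb) thread apply all bt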

@cebtenzzre cebtenzzre added bug Something isn't working chat gpt4all-chat issues labels Jan 24, 2024
@realKarthikNair (Contributor, Author)

@cebtenzzre here is the output of thread apply all bt

Please let me know if anything else is required

gdb.txt

@cebtenzzre (Member) commented Jan 24, 2024

Could you please run gdb chat again but use break ggml-vulkan.cpp:399, run, and then bt when it stops?
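
That is, roughly (a sketch; the breakpoint is set before starting the program):

$ gdb chat
(gdb) break ggml-vulkan.cpp:399
(gdb) run
... when the breakpoint is hit ...
(gdb) bt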

It would also be useful to have the output of vulkaninfo --summary.

@realKarthikNair (comment marked as outdated)

@realKarthikNair (Contributor, Author)

Here is the output of karthik@fedora:~$ vulkaninfo --summary > Temp/vulkan_summary.txt

vulkan_summary.txt

@realKarthikNair (Contributor, Author)

As you can see, my system has both an integrated Radeon GPU and a dedicated Nvidia one.

I have tried GPT4All Chat with all of these system combinations on Linux:

  • default (hybrid mode)
  • integrated only (dedicated GPU turned off)
  • dGPU only (MUX switch set to the Nvidia GPU)

The issue can be reproduced in all cases.

@realKarthikNair (comment marked as outdated)

@cebtenzzre (Member) commented Jan 25, 2024

Sorry, that should be ggml-kompute.cpp:409 - ggml-vulkan.cpp:399 was the file and line number from 2.6.1, but you're obviously building from the latest main branch.

Which GPU do you have selected in the UI?

@realKarthikNair (Contributor, Author)

It is showing No source file named ggml-kompute.cpp.

(gdb) break ggml-kompute.cpp:409
No source file named ggml-kompute.cpp.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) Quit
(gdb) 
[1]+  Stopped                 gdb chat
karthik@fedora:~/.../bin$ locate ggml-kompute.cpp
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Debug/llmodel/CMakeFiles/ggml-mainline-avxonly.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Debug/llmodel/CMakeFiles/ggml-mainline-default.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Release/llmodel/CMakeFiles/ggml-mainline-avxonly.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/build-gpt4all-chat-Desktop-Release/llmodel/CMakeFiles/ggml-mainline-default.dir/llama.cpp-mainline/ggml-kompute.cpp.o
/home/karthik/Temp/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-kompute.cpp
karthik@fedora:~/.../bin$ 
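
(Side note, as a sketch: answering y would instead make the breakpoint pending, since ggml-kompute.cpp is compiled into a shared library that is only loaded at runtime; gdb resolves the breakpoint once that library loads.)

(gdb) break ggml-kompute.cpp:409
No source file named ggml-kompute.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (ggml-kompute.cpp:409) pending.
(gdb) run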

@realKarthikNair (Contributor, Author)

Which GPU do you have selected in the UI?

It was RTX 4060

I tried with the CPU now and Wizard 1.2 is working.

Just noticed that on Windows, too, it's working via the CPU instead of the RTX 4060 with 8 GB VRAM.

My apologies for wasting your time.

Should I close the issue?

@cebtenzzre (Member)

We still need to at least fix the fallback to CPU on Linux. Could you join the Discord and ping me? It seems like you're not getting the debug symbols that you should be getting - it may be more straightforward to just build with cmake instead.
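
A minimal sketch of such a build (the directory layout and binary path are illustrative, not exact):

$ cd gpt4all/gpt4all-chat
$ cmake -B build -DCMAKE_BUILD_TYPE=Debug
$ cmake --build build -j
$ gdb build/bin/chat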

@cebtenzzre cebtenzzre reopened this Jan 25, 2024
@realKarthikNair (Contributor, Author)

Sure, just did.

@cebtenzzre cebtenzzre added the awaiting-release issue is awaiting next release label Jan 31, 2024
@cebtenzzre (Member)

Marking as fixed in the next release because of 6db5307 - testing would be appreciated. It seems like this can be reproduced easily by setting n_ctx to something really high; Mistral models will let you do this on the latest main, while many others are limited to 4096 now.

@cebtenzzre cebtenzzre changed the title model memory allocation error on Linux while works on Windows (same machine) Crash due to unhandled exception in ggml_vk_allocate in llama_kv_cache_init Jan 31, 2024
@cebtenzzre cebtenzzre changed the title Crash due to unhandled exception in ggml_vk_allocate in llama_kv_cache_init Crash due to unhandled exception from ggml_vk_allocate in llama_kv_cache_init Jan 31, 2024
@cebtenzzre (Member)

Fixed in v2.6.2.

@cebtenzzre cebtenzzre removed the awaiting-release issue is awaiting next release label Feb 1, 2024