GUI won't start on Windows (unhandled exception in ggml_vk_available_devices) #1477

Closed
1 of 10 tasks
ADD-eNavarro opened this issue Oct 6, 2023 · 26 comments · Fixed by nomic-ai/llama.cpp#12
Labels
bug Something isn't working chat gpt4all-chat issues

Comments

@ADD-eNavarro

System Info

Hi, I'm running GPT4All on Windows Server 2022 Standard, AMD EPYC 7313 16-Core Processor at 3GHz, 30GB of RAM.
This computer also happens to have an A100, I'm hoping the issue is not there!
GPT4All was working fine until the other day, when I updated to version 2.4.9 and all of a sudden it wouldn't start. No feedback whatsoever, it just doesn't start.
I've downloaded the 2.5 pre-release today but I'm still having the same issue. Here's the event viewer record detail:
Error GPT4All pre-release.txt

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • backend
  • bindings
  • python-bindings
  • chat-ui
  • models
  • circleci
  • docker
  • api

Reproduction

GPT4All just doesn't start, even with admin privileges granted.

Expected behavior

Should start!

@cebtenzzre
Member

It would be really helpful if you could build GPT4All from source in Debug mode, and run it under either the Visual Studio debugger, or windbg, in order to get the call stack. Unfortunately, the binaries we publish are stripped Release builds with very little information to assist debugging.

@ADD-eNavarro
Author

That won't be easy. I'm not much of a developer, and C++ is not among the languages I know well. Also, my company imposes security constraints on installing or running third-party code (I had to ask permission and wait a week just to have the program installed). All in all, I don't see myself doing that.
Any volunteers?

@cebtenzzre cebtenzzre added bug Something isn't working chat gpt4all-chat issues labels Oct 10, 2023
@cosmic-snow
Collaborator

You mean 2.4.19 not 2.4.9, right?

First of all, one thing you can try is renaming your settings file, which is located at C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.

If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.:

[General]
device=CPU
...

Close the program before you do that and restart it afterwards.
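The manual edit described above can also be sketched in code. This is only an illustration, assuming the settings file is a plain INI file with a `[General]` section and a `device` key as shown in the comment; the helper name `forceCpuDevice` is hypothetical and is not part of GPT4All.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of GPT4All): returns a copy of the INI text
// in which the [General] section contains exactly one line, device=CPU.
std::string forceCpuDevice(const std::string &ini)
{
    std::istringstream in(ini);
    std::ostringstream out;
    std::string line;
    bool inGeneral = false; // true while reading lines inside [General]
    bool written = false;   // true once device=CPU has been emitted

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate Windows CRLF line endings
        if (!line.empty() && line.front() == '[') {
            // Leaving [General] without having seen device=: add it now.
            if (inGeneral && !written) {
                out << "device=CPU\n";
                written = true;
            }
            inGeneral = (line == "[General]");
        } else if (inGeneral && line.rfind("device=", 0) == 0) {
            if (written)
                continue; // drop duplicate device= lines
            line = "device=CPU";
            written = true;
        }
        out << line << '\n';
    }
    if (!written) {
        // Either the file ended inside [General] or the section was missing.
        if (!inGeneral)
            out << "[General]\n";
        out << "device=CPU\n";
    }
    return out.str();
}
```

A caller would read GPT4All.ini into a string, pass it through `forceCpuDevice`, and write the result back while the program is closed. Keeping a backup copy first, as suggested above, is still a good idea.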

@ADD-eNavarro ADD-eNavarro changed the title GPT4All not starting after update to version 2.4.9 GPT4All not starting after update to version 2.4.19 Oct 10, 2023
@ADD-eNavarro
Author

You mean 2.4.19 not 2.4.9, right?

Yes, sorry, already updated the issue title.

First of all, one thing you can try is renaming your settings file, which is located at C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.

Changed the extension, no success: GPT4All still won't start.

If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.:

[General]
device=CPU
...

Close the program before you do that and restart it afterwards.
I didn't need to close the program for obvious reasons. Added the device configuration, still won't start :(

@cebtenzzre
Member

cebtenzzre commented Oct 10, 2023

I uploaded a debug build of the installer to the releases page; it's called gpt4all-installer-win64-v2.5.0-pre1-debug.exe. If you install that, the output of Event Viewer will at least have some meaning to us. WinDbg would be even better:

  1. Download the Windows SDK
  2. Install it, clearing all checkboxes except for "Debugging Tools for Windows", which is the only one you would need
  3. Start WinDbg (X64)
  4. File > Open Executable, navigate to C:\Program Files\gpt4all\bin\chat.exe
  5. If it stops at ntdll!LdrpDoDebuggerBreak, press the F5 key to continue
  6. If it stops again, go to View > Call Stack, which will hopefully have useful information about the crash

@ADD-eNavarro
Author

Here's the result of following your instructions with Windbg:

image

@cebtenzzre
Member

cebtenzzre commented Oct 18, 2023

Here's the result of following your instructions with Windbg:

Can you continue past that with F5? I think that's just another bug in Windows breakpoint handling, not an actual issue with the code. You should be able to continue until you get a call stack with lines other than ntdll!... in it.

@ADD-eNavarro
Author

Hope this is what you need:
image

@cebtenzzre
Member

cebtenzzre commented Oct 18, 2023

Hope this is what you need:

Yes, that is very helpful, thanks.

edit: Could you please try to get info for the exception by running the .exr -1 command after windbg stops at that point?

@cebtenzzre cebtenzzre changed the title GPT4All not starting after update to version 2.4.19 GUI won't start on Windows (unhandled exception in llmodel_threadCount) Oct 18, 2023
@cebtenzzre cebtenzzre changed the title GUI won't start on Windows (unhandled exception in llmodel_threadCount) GUI won't start on Windows (unhandled exception in ggml_vk_available_devices) Oct 18, 2023
@ADD-eNavarro
Author

Sure thing, here it goes:
image

@cebtenzzre
Member

Unfortunately, I'm not sure how to get the exception message with WinDbg. Here's another option:

I uploaded a console-enabled build (gpt4all-installer-win64-v2.5.0-pre2-debug-console.exe ) to the pre-release.

It would be helpful if you could start chat.exe via the command line: install that version, use "Open File Location" on the shortcut to find chat.exe, shift-right-click in the folder and open a PowerShell or Command Prompt window there, and run .\chat (PowerShell) or chat (Command Prompt).

If there is any console output, please post it here.

@ADD-eNavarro
Author

Morning!

Got this:
image

I'm afraid all three options result in the process stopping without further message:
image

@ADD-eNavarro
Author

So, are we out of luck, @cebtenzzre ?

@cebtenzzre
Member

Unless you can debug it with Visual Studio (which I know will provide the exception information), I'm not sure what else to do.

@H4CKS4F3

H4CKS4F3 commented Nov 7, 2023

Just a suggestion for debugging this: what about using procdump (from Microsoft) to help capture the stack trace? Something like: procdump -mm -x . chat.exe (assuming procdump v11 and that it's in the current path). The -mm switch writes a minidump, which captures the basic process details. You can then use something like WinDbg (or other tools) to debug it. Again, just a thought to help capture the instant it crashes.

@ADD-eNavarro
Author

@H4CKS4F3 , WinDbg was already used, if you read back a little.
I gave procdump a try. Here are the two files: the first one with -mm and, since I couldn't see anything in there, the second one without the minidump parameter.
dump.dmp
dump2.dmp

@H4CKS4F3

H4CKS4F3 commented Nov 8, 2023

@ADD-eNavarro run the following and attach the dump. Since procdump does not write a dump on unhandled exceptions by default, the actual exception was lost in the minidump.

procdump -mm -e -x . chat.exe

@ADD-eNavarro
Author

Here's the result of that last procdump run:
dump3_231109_073726.dmp

@cebtenzzre
Member

Now we're getting somewhere:

KERNELBASE!RaiseException+6c
VCRUNTIME140!_CxxThrowException+90 [D:\a\_work\1\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
llmodel+ba4dc
0x0000002f`b14fd2b8

Unfortunately, I no longer have a copy of the debug info for that build of GPT4All, so I can't resolve llmodel+ba4dc to anything specific.

Here is a newer build that you can install and run the same procdump command on: gpt4all-installer-win64-v2.5.2.r8.gd4ce9f4-debug-console.exe

I'll keep that build tree in a separate folder so I'll be able to debug it when you reply.

@ADD-eNavarro
Author

New dump:
dump4_231110_094643.dmp

@cebtenzzre
Member

Here is the call stack when the exception is thrown:

KERNELBASE!RaiseException+0x6c
VCRUNTIME140D!_CxxThrowException+0x120
llmodel!vk::detail::throwResultException+0x29c
llmodel!vk::resultCheck+0x23
llmodel!vk::Instance::enumeratePhysicalDevices<std::allocator<vk::PhysicalDevice>,vk::DispatchLoaderDynamic>+0xf7
llmodel!kp::Manager::listDevices+0x38
llmodel!ggml_vk_available_devices+0xf6
llmodel!LLModel::availableGPUDevices+0x4f
chat!MySettings::MySettings+0x74
chat!MyPrivateSettings::MyPrivateSettings+0x14
chat!`anonymous namespace'::Q_QGS_settingsInstance::innerFunction+0x36
chat!QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance>::Holder<`anonymous namespace'::Q_QGS_settingsInstance>+0x1c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::instance+0x4c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::operator()+0x24
chat!MySettings::globalInstance+0x12
chat!main+0x12f
chat!invoke_main+0x39
chat!__scrt_common_main_seh+0x12e
chat!__scrt_common_main+0xe
chat!mainCRTStartup+0xe
kernel32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b

It's caused by VK_ERROR_DEVICE_LOST:
Capture

So it looks like we need to catch Vulkan exceptions from komputeManager()->listDevices() and ignore them. It seems like there is some issue with your GPU driver that prevents Vulkan from being used.
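The idea of catching the exception and ignoring it can be sketched as follows. This is a minimal illustration and not the actual patch: `GpuDevice`, `DeviceEnumerationError`, and `availableDevicesOrEmpty` are hypothetical stand-ins so that the example compiles without the Vulkan headers. In the real call stack the exception is raised inside the Vulkan C++ bindings, underneath kp::Manager::listDevices.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU device record; not the real GPT4All type.
struct GpuDevice {
    std::string name;
};

// Hypothetical stand-in for the exception type thrown when device
// enumeration fails (for example with VK_ERROR_DEVICE_LOST).
struct DeviceEnumerationError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls the supplied enumeration function and treats a failure as
// "no usable GPU" instead of letting the exception escape to main().
std::vector<GpuDevice> availableDevicesOrEmpty(
    const std::function<std::vector<GpuDevice>()> &listDevices)
{
    try {
        return listDevices();
    } catch (const DeviceEnumerationError &) {
        // The GPU backend is unusable on this machine: report no devices
        // so the caller can fall back to the CPU.
        return {};
    }
}
```

With a wrapper like this, a settings constructor that asks for the list of GPUs at startup would receive an empty list on a machine with a broken driver, and the application could still start in CPU-only mode.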

@ADD-eNavarro
Author

Anything I can do then?

@H4CKS4F3

From my perspective, unless you can suggest a patch, it looks like you'll need to wait for the developers to do something. One thing I'd suggest is updating drivers, since this seems to be a driver issue. I was actually suffering from this issue too, but "something changed" and it started working again. Maybe I updated drivers, but I can't be certain. I have an NVIDIA card, so I may have updated the driver + CUDA.

cebtenzzre added a commit to nomic-ai/llama.cpp that referenced this issue Dec 1, 2023
Sometimes Vulkan is not available due to VK_ERROR_INITIALIZATION_FAILED
or VK_ERROR_DEVICE_LOST. Ignore the exception instead of crashing.

Fixes nomic-ai/gpt4all#1477
@ADD-eNavarro
Author

ADD-eNavarro commented Dec 11, 2023

Following @H4CKS4F3's advice, we've updated CUDA to version 12.3.1, which updated the NVIDIA drivers from 545.84 to 546.12.
Other changes that came along were:
Nsight Compute, 2023.3.1 -> 2023.3.1
Nsight Visual Studio Edition, 2023.3.0.23xxx -> 2023.3.1.23311

But GPT4All still doesn't start. So maybe it's not the drivers.

@neural-oracle

This comment was marked as off-topic.

@cebtenzzre
Member

i had exactly this problem

Different issue. OP experienced a crash caused by a bad interaction with a non-functional Vulkan driver.

@nomic-ai nomic-ai locked as resolved and limited conversation to collaborators May 21, 2024