Kernel panic when running CUDA based unit-tests #7052
Comments
I can also reproduce it to some extent: it crashes about every 10 minutes when running the test suite.
A significant recent change in WSL2 is that WSLg is now enabled by default in Insider builds. You can try disabling it by adding this to your .wslconfig
and running
@blackliner could you also share
These are interesting, but I'm curious to know whether something started crashing before it:
Of course it's not happening anymore, but I'll keep my eyes open!
For completeness, here is the current dmesg, maybe it helps:
These last ones all appear to be normal initialization; I see the same behavior here:
No problem. If it happens to reproduce, let me know.
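For anyone trying to catch the next crash, here is a minimal sketch of scanning saved dmesg output for the first line that looks like a crash rather than normal initialization. The function name `first_crash_line` and the marker list are assumptions made for illustration; they do not come from this thread or from any WSL tooling, and the list is certainly not exhaustive.

```python
import re
import sys

# Hypothetical marker list: strings the Linux kernel commonly prints when
# something goes wrong. Matching is case-sensitive, and "BUG:" is anchored
# on a word boundary so that "debug:" does not match.
CRASH_MARKERS = re.compile(
    r"Kernel panic|\bBUG:|\bOops\b|general protection fault|Call Trace"
)


def first_crash_line(dmesg_text):
    """Return (line_number, line) for the first suspicious line, or None."""
    for number, line in enumerate(dmesg_text.splitlines(), start=1):
        if CRASH_MARKERS.search(line):
            return number, line
    return None


if __name__ == "__main__":
    # Reads dmesg output from standard input.
    hit = first_crash_line(sys.stdin.read())
    if hit is None:
        print("no crash markers found")
    else:
        print(f"line {hit[0]}: {hit[1]}")
```

Assuming the script is saved as `scan_dmesg.py` (a hypothetical file name), it could be run as `dmesg | python3 scan_dmesg.py`. Since a kernel panic takes the whole VM down, the dmesg output would need to be captured somewhere that survives the crash for this to be useful.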
Maybe something interesting: it locked up again, but I was not ... But this is the dmesg output after that, which may be relevant:
Got him!
This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request. Thank you!
Windows Build Number
21390
WSL Version
Kernel Version
5.10.16.3-microsoft-standard-WSL2
Distro Version
Ubuntu 18.04
Other Software
Nvidia driver 470.76
Repro Steps
run unit tests that utilize CUDA hardware acceleration. Running
ctest -j 200
(yes, 200, to stress it a bit, and since some tests sleep, it is actually faster ;-) ) on about 390 unit tests (proprietary code, cannot share), some of them CUDA-based executables. CPU utilization (3900X) and GPU utilization (2080Ti) spiked to 100%, then it crashed.
I just installed the new Nvidia driver and updated to the newest Insider build, and never had this issue before. It could be unrelated to CUDA, but I suspect the new driver + Windows build combination, which seems to have fixed some past CUDA issues but introduced this one.
Expected Behavior
not to crash WSL2
Actual Behavior
getting a
[process exited with code 1]
while running unit tests in Windows Terminal.
Diagnostic Logs
From the event viewer:
Virtual Machine has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0. PreOSId: 0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID B8266772-F64F-44A8-B874-03468D05FC40)
Guest message: