With integrated GPU cycles free to burn on my Ubuntu system, I decided to run https://github.com/gcp/leela-zero with this driver. In the middle of the self-tuning process it hung. The screen froze for about 6 seconds, and shortly afterwards dmesg showed the following messages:
[ 6583.292439] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in leelaz [9165], reason: Hang on rcs0, action: reset
[ 6583.292441] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 6583.292442] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 6583.292442] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 6583.292442] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 6583.292443] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 6583.292448] i915 0000:00:02.0: Resetting rcs0 after gpu hang
I copied the contents of /sys/class/drm/card0/error to https://gist.github.com/eaglgenes101/2c30c93d15953c75d476c2486a0b608a .
When I run the self-tuning process with Beignet instead, it runs without hanging. However, it repeatedly prints "Beignet: "Work group size exceed Kernel's work group size."", and these messages become more frequent as tuning goes on.
Where should I start digging into this problem? What further information should I provide to help solve this crash?