GPU hang on Intel Core i7-7500U CPU #35

@eaglgenes101

With integrated GPU cycles to spare on my Ubuntu system, I decided to run https://github.com/gcp/leela-zero with this driver. Partway through the self-tuning process the GPU hung: the screen froze for about 6 seconds, and shortly afterwards the following messages appeared in dmesg:

[ 6583.292439] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in leelaz [9165], reason: Hang on rcs0, action: reset
[ 6583.292441] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 6583.292442] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 6583.292442] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 6583.292442] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 6583.292443] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 6583.292448] i915 0000:00:02.0: Resetting rcs0 after gpu hang

I pasted the contents of /sys/class/drm/card0/error into this gist: https://gist.github.com/eaglgenes101/2c30c93d15953c75d476c2486a0b608a
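For anyone comparing this against other hang reports, the fields in the GPU HANG line of the dmesg output above (error code, process, PID, engine, action) can be pulled out mechanically. Below is a minimal Python sketch; the regular expression is inferred only from the single line quoted in this report, and the names `HANG_RE` and `parse_gpu_hang` are made up for illustration, so other kernel versions may need a different pattern.

```python
import re

# Pattern inferred from the one "GPU HANG" line quoted in this report.
# This is an assumption about the format, not a documented kernel interface.
HANG_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG dmesg line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["timestamp"] = float(info["timestamp"])  # seconds since boot
    info["pid"] = int(info["pid"])
    return info


# The line from the dmesg output in this report.
SAMPLE = (
    "[ 6583.292439] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in leelaz [9165], "
    "reason: Hang on rcs0, action: reset"
)

if __name__ == "__main__":
    print(parse_gpu_hang(SAMPLE))
```

Feeding it the saved output of `dmesg` line by line and keeping the non-None results would give one record per hang, which makes it easier to see whether the same process and engine are involved each time.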

When I run the self-tuning process with Beignet instead, it completes without hanging, but it repeatedly prints the message "Beignet: "Work group size exceed Kernel's work group size."", and the message appears more and more frequently as tuning goes on.

Where should I start digging into this problem, and what further information should I provide to help diagnose the hang?
