Random crashes in ca. 1min to 1h on Intel laptop with igpu driver (Iris) calling abort() / SIGABRT #25

Closed
gergo-salyi opened this issue Sep 10, 2022 · 3 comments

@gergo-salyi

Hey, first of all thanks for the recent patches.

I was hoping they would fix the random crashes of mpvpaper I have had over the past half year, but sadly nothing changed for me.

cpu/gpu: Intel i5-1035G4 (10th gen mobile)
display: external HDMI monitor to laptop
system: Arch Linux
wm: Sway

mpvpaper: reproduced both on 1.2.1 and current master branch
mesa: reproduced both on 22.1.7-1 and current main branch
kernel: reproduced both on 5.19.7.arch1-1 and current drm-tip

In the coredumps, every crash traces back as abort() < iris_dri.so < libEGL_mesa.so < render() at ../src/main.c:142:

(gdb) bt
#0  0x00007f25f5c1e4dc in  () at /usr/lib/libc.so.6
#1  0x00007f25f5bce998 in raise () at /usr/lib/libc.so.6
#2  0x00007f25f5bb853d in abort () at /usr/lib/libc.so.6
#3  0x00007f25dd926613 in _iris_batch_flush(iris_batch*, char const*, int) (batch=0x5568aa8f8910, file=<optimized out>, line=<optimized out>)
    at ../mesa-main/src/gallium/drivers/iris/iris_batch.c:1121
#4  0x00007f25dd8fc4d7 in iris_fence_flush(pipe_context*, pipe_fence_handle**, unsigned int) (ctx=0x5568aa8f83f0, out_fence=0x7ffff78af8d8, flags=<optimized out>)
    at ../mesa-main/src/gallium/drivers/iris/iris_fence.c:267
#5  0x00007f25dd0dfbba in tc_flush(pipe_context*, pipe_fence_handle**, unsigned int) (_pipe=0x7f25d4679010, fence=0x7ffff78af8d8, flags=1)
    at ../mesa-main/src/gallium/auxiliary/util/u_threaded_context.c:3157
#6  0x00007f25dcce469a in st_flush (flags=1, fence=0x7ffff78af8d8, st=0x5568aa9198b0) at ../mesa-main/src/mesa/state_tracker/st_cb_flush.c:60
#7  st_context_flush(st_context_iface*, unsigned int, pipe_fence_handle**, void (*)(void*), void*)
    (stctxi=0x5568aa9198b0, flags=2, fence=0x7ffff78af8d8, before_flush_cb=0x7f25dcb35490 <notify_before_flush_cb(void*)>, args=0x7ffff78af8e0)
    at ../mesa-main/src/mesa/state_tracker/st_manager.c:808
#8  0x00007f25dcb34e6e in dri_flush(__DRIcontext*, __DRIdrawable*, unsigned int, __DRI2throttleReason)
    (cPriv=<optimized out>, dPriv=<optimized out>, flags=<optimized out>, reason=<optimized out>) at ../mesa-main/src/gallium/frontends/dri/dri_drawable.c:522
#9  0x00007f25dedf8e3b in dri2_wl_swap_buffers_with_damage (disp=0x5568aa8415a0, draw=0x5568aaa3cbd0, rects=<optimized out>, n_rects=<optimized out>)
    at ../mesa-main/src/egl/drivers/dri2/platform_wayland.c:1592
#10 0x00007f25dede6ae8 in dri2_swap_buffers (disp=0x5568aa8415a0, surf=0x5568aaa3cbd0) at ../mesa-main/src/egl/drivers/dri2/egl_dri2.c:2042
#11 0x00007f25dedd5825 in eglSwapBuffers (dpy=<optimized out>, surface=0x5568aaa3cbd0) at ../mesa-main/src/egl/main/eglapi.c:1421
#12 0x00005568a9bcbc0c in render (output=0x5568aa83c150) at ../src/main.c:142
#13 0x00007f25f5b8f536 in  () at /usr/lib/libffi.so.8
#14 0x00007f25f5b8c037 in  () at /usr/lib/libffi.so.8
#15 0x00007f25f5fee645 in  () at /usr/lib/libwayland-client.so.0
#16 0x00007f25f5feee03 in  () at /usr/lib/libwayland-client.so.0
#17 0x00007f25f5feeffc in wl_display_dispatch_queue_pending () at /usr/lib/libwayland-client.so.0
#18 0x00005568a9bcb5db in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:1019

I attach the coredump and the gdb debugging session with (gdb) thread apply all bt full
coredump-gdb-analysed.txt
core.mpvpaper.1000.6f0499264a6a4d699984290d77c2b318.11720.1662816696000000.gz

I would like to ask your opinion: is this more likely an mpvpaper issue or a Mesa issue?
If the latter, do you think I should go ahead and report it on Mesa issues as an Intel igpu driver (Iris) bug?
(Potentially pointing the Mesa devs to mpvpaper to reproduce it?)

@GhostNaN
Owner

Considering it crashed at:

mpvpaper/src/main.c

Lines 141 to 144 in 666f4c9

    // Display frame
    if (!eglSwapBuffers(egl_display, egl_surface))
        cflp_error("Failed to swap egl buffers 0x%X", eglGetError());
}

Since it aborted there instead of just handing back an error,
I want to say mpvpaper didn't do anything wrong here.
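
To illustrate why the error check above never gets a chance to run, here is a minimal sketch. It assumes nothing about Mesa internals: `fake_swap_buffers` is a hypothetical stand-in for a library call that either returns a failure value or calls abort(), and `child_killed_by_sigabrt` is a hypothetical helper that runs it in a child process to observe how the process ends.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library call such as eglSwapBuffers():
 * a "soft" failure returns 0 to the caller, a "hard" failure calls abort(). */
static int fake_swap_buffers(int hard_failure)
{
    if (hard_failure)
        abort();   /* raises SIGABRT; control never returns to the caller */
    return 0;      /* failure reported through the return value */
}

/* Runs the call in a child process and reports how the child ended:
 *   1  -> child was killed by SIGABRT (the caller's check never ran)
 *   0  -> the caller's error path ran and exited with code 42
 *  -1  -> anything else (fork/waitpid failure, unexpected exit) */
static int child_killed_by_sigabrt(int hard_failure)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Same shape as the check in render(): react to a false return. */
        if (!fake_swap_buffers(hard_failure))
            _exit(42);   /* the application's own error handling */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 42)
        return 0;
    return -1;
}
```

With a soft failure the caller's error path runs as intended. With a hard failure the process dies from SIGABRT inside the callee, which matches the shape of the backtrace above, where abort() sits below eglSwapBuffers.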

Also, it went from having a pointer to the display, to not?
#10 0x00007f25dede6ae8 in dri2_swap_buffers (disp=0x5568aa8415a0, surf=0x5568aaa3cbd0) at ../mesa-main/src/egl/drivers/dri2/egl_dri2.c:2042
#11 0x00007f25dedd5825 in eglSwapBuffers (dpy=<optimized out>, surface=0x5568aaa3cbd0) at ../mesa-main/src/egl/main/eglapi.c:1421

But I'm not a graphics expert, so go ahead and take this upstream.
Even if it is mpvpaper, they might be able to guide us in the right direction.

Thank you for your excellent debug logs and effort!

@gergo-salyi
Author

I'm closing this because, since mpvpaper commit 781320f, this crash has become extremely rare (it has happened only once for me since) and is thus not reproducible.

(For the record: the crash likely happened when two I915_GEM_EXECBUFFER2 ioctls from the Iris driver came, by random chance, within ~5 µs of each other, causing one of them to fail with errno ENOSPC, which Mesa chooses to abort on. The root cause of this event remains unknown and could be anywhere in the whole application + libraries. The Mesa issue I reported is here, although the Mesa devs were not really interested.)
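
As a rough illustration of that failure path, here is a minimal sketch of an ioctl wrapper that retries on EINTR/EAGAIN (the same retry idea libdrm's drmIoctl uses) and hands every other error back as a negative errno. This is not Mesa's code; the name `ioctl_retry` is hypothetical. An error such as ENOSPC is not retried, so it reaches the caller, and what happens next (log, recover, or abort) is the caller's policy.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical helper: issue an ioctl, retrying only when the call was
 * interrupted (EINTR) or asked to be tried again (EAGAIN).
 * Returns 0 on success or a negative errno value on failure. */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int ret;

    do {
        ret = ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    if (ret == -1)
        return -errno;   /* e.g. -ENOSPC would be passed up to the caller */
    return 0;
}
```

For example, calling it on an invalid file descriptor returns -EBADF straight away, without retrying.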

Moreover, a week ago I saw a similar backtrace from a vanilla mpv crash. This was, and is, likely not mpvpaper's fault.

@GhostNaN
Owner

GhostNaN commented Dec 2, 2022

Thank you for your investigation into this issue!
