This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Freezes and eventual crash when copy pasting in Xwayland apps #2425

Closed
luispabon opened this issue Oct 8, 2020 · 6 comments · Fixed by #2427

@luispabon

Please fill out the following:

  • Sway Version: 1.5 / wlroots 0.11 / xwayland 1.20.8-2ubuntu2.4 (ubuntu 20.04)

  • Debug Log:
    Nothing relevant on the debug logs:

(EE) failed to read Wayland events: Connection reset by peer
Gdk-Message: 13:07:05.395: Error reading events from display: Broken pipe
Gdk-Message: 13:07:05.396: Error reading events from display: Broken pipe
Gdk-Message: 13:07:05.396: Error reading events from display: Broken pipe
Gdk-Message: 13:07:05.399: Error reading events from display: Broken pipe
Gdk-Message: 13:07:05.399: Error reading events from display: Broken pipe
Gdk-Message: 13:07:05.400: Error reading events from display: Broken pipe
goa-daemon-Message: 13:07:15.479: goa-daemon version 3.36.0 exiting
coredumpctl gdb sway
           PID: 3225 (sway)
           UID: 1000 (luis)
           GID: 1000 (luis)
        Signal: 11 (SEGV)
     Timestamp: Thu 2020-10-08 13:07:04 BST (50min ago)
  Command Line: sway -d
    Executable: /usr/bin/sway
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (luis)
       Boot ID: 1a1ee786aad5414ba05de08bc9f0c995
    Machine ID: 58be819f85264bfcabcaa2ee090f85a0
      Hostname: luis-xps
       Storage: /var/lib/systemd/coredump/core.sway.1000.1a1ee786aad5414ba05de08bc9f0c995.3225.1602158824000000000000.lz4
       Message: Process 3225 (sway) of user 1000 dumped core.
                
                Stack trace of thread 3225:
                #0  0x00007f23b011259c xwm_selection_flush_source_data (libwlroots.so.6 + 0x7259c)
                #1  0x00007f23b0112a77 xwm_data_source_read (libwlroots.so.6 + 0x72a77)
                #2  0x00007f23b015e65a wl_event_loop_dispatch (libwayland-server.so.0 + 0xa65a)
                #3  0x00007f23b015cbd5 wl_display_run (libwayland-server.so.0 + 0x8bd5)
                #4  0x0000557177779da3 main (sway + 0x13da3)
                #5  0x00007f23afe570b3 __libc_start_main (libc.so.6 + 0x270b3)
                #6  0x0000557177779f0e _start (sway + 0x13f0e)

                Stack trace of thread 3312:
                #0  0x00007f23afe1d376 futex_wait_cancelable (libpthread.so.0 + 0x10376)
                #1  0x00007f23ae12c62b n/a (iris_dri.so + 0x47c62b)
                #2  0x00007f23ae12c23b n/a (iris_dri.so + 0x47c23b)
                #3  0x00007f23afe16609 start_thread (libpthread.so.0 + 0x9609)
                #4  0x00007f23aff52293 __clone (libc.so.6 + 0x122293)

                Stack trace of thread 3313:
                #0  0x00007f23afe1d376 futex_wait_cancelable (libpthread.so.0 + 0x10376)
                #1  0x00007f23ae12c62b n/a (iris_dri.so + 0x47c62b)
                #2  0x00007f23ae12c23b n/a (iris_dri.so + 0x47c23b)
                #3  0x00007f23afe16609 start_thread (libpthread.so.0 + 0x9609)
                #4  0x00007f23aff52293 __clone (libc.so.6 + 0x122293)

                Stack trace of thread 3314:
                #0  0x00007f23afe1d376 futex_wait_cancelable (libpthread.so.0 + 0x10376)
                #1  0x00007f23ae12c62b n/a (iris_dri.so + 0x47c62b)
                #2  0x00007f23ae12c23b n/a (iris_dri.so + 0x47c23b)
                #3  0x00007f23afe16609 start_thread (libpthread.so.0 + 0x9609)
                #4  0x00007f23aff52293 __clone (libc.so.6 + 0x122293)

                Stack trace of thread 3315:
                #0  0x00007f23afe1d376 futex_wait_cancelable (libpthread.so.0 + 0x10376)
                #1  0x00007f23ae12c62b n/a (iris_dri.so + 0x47c62b)
                #2  0x00007f23ae12c23b n/a (iris_dri.so + 0x47c23b)
                #3  0x00007f23afe16609 start_thread (libpthread.so.0 + 0x9609)
                #4  0x00007f23aff52293 __clone (libc.so.6 + 0x122293)

Core was generated by `sway -d'.
Program terminated with signal SIGSEGV, Segmentation fault.

bt

(gdb) bt full
#0  0x00007f23b011259c in xwm_selection_flush_source_data (transfer=transfer@entry=0x55717a45c7e0) at ../xwayland/selection/outgoing.c:40
        length = <optimised out>
#1  0x00007f23b0112a77 in xwm_data_source_read (fd=<optimised out>, mask=4, data=0x55717a45c7e0) at ../xwayland/selection/outgoing.c:134
        transfer = 0x55717a45c7e0
        xwm = 0x0
        p = <optimised out>
        current = 0
        available = 65536
        len = 0
#2  0x00007f23b015e65a in wl_event_loop_dispatch () from /lib/x86_64-linux-gnu/libwayland-server.so.0
No symbol table info available.
#3  0x00007f23b015cbd5 in wl_display_run () from /lib/x86_64-linux-gnu/libwayland-server.so.0
No symbol table info available.
#4  0x0000557177779da3 in main (argc=2, argv=0x7ffd3274b858) at ../sway/main.c:410
        verbose = 0
        debug = 1
        validate = 0
        allow_unsupported_gpu = 0
        long_options = {{name = 0x5571777c646b "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x5571777c9e49 "config", has_arg = 1, flag = 0x0, 
            val = 99}, {name = 0x5571777c6470 "validate", has_arg = 0, flag = 0x0, val = 67}, {name = 0x5571777c6479 "debug", has_arg = 0, flag = 0x0, 
            val = 100}, {name = 0x5571777c63cf "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x5571777c5543 "verbose", has_arg = 0, flag = 0x0, 
            val = 86}, {name = 0x5571777c647f "get-socketpath", has_arg = 0, flag = 0x0, val = 112}, {name = 0x5571777c648e "unsupported-gpu", has_arg = 0, 
            flag = 0x0, val = 117}, {name = 0x5571777c649e "my-next-gpu-wont-be-nvidia", has_arg = 0, flag = 0x0, val = 117}, {name = 0x0, has_arg = 0, 
            flag = 0x0, val = 0}}
        config_path = 0x0
        usage = 0x5571777c6810 "Usage: sway [options] [command]\n\n  -h, --help", ' ' <repeats 13 times>, "Show help message and quit.\n  -c, --config <config>  Specify a config file.\n  -C, --validate         Check the validity of the config file, th"...
        c = <optimised out>
  • Description:

This one is pretty hard to reproduce. It happens when I've been in a sway session for a couple of days - cannot be reliably reproduced on a brand new session.

  • Before the crash, I hit ctrl+c to copy something and ctrl+v to paste it into an Xwayland app.
  • The whole thing (sway + apps) seems to freeze in place for a good 5-10 secs.
  • Copy and paste between Wayland-native apps still works. It does not work between xwayland/xwayland or wayland/xwayland.
  • This keeps happening until I go into a terminal and run killall xwayland.
  • I restart any of my Xwayland apps and try to copy anything in it via ctrl+c.
  • sway freezes for a couple of seconds, then crashes.
  • I can log back into sway without rebooting and everything's a-ok.
@Xyene
Member

Xyene commented Oct 9, 2020

This seems to be very similar to swaywm/sway#4007 (possibly the same).

@luispabon
Author

luispabon commented Oct 9, 2020

Looks very similar indeed. I don't see anybody reporting crashes there, though, so maybe there's an extra layer here. A lot of people are commenting about Qt apps; I use none.

@Xyene
Member

Xyene commented Oct 11, 2020

There are some backtraces in that thread as well, but the cause of the one here is a bit clearer:

#1  0x00007f23b0112a77 in xwm_data_source_read (fd=<optimised out>, mask=4, data=0x55717a45c7e0) at ../xwayland/selection/outgoing.c:134
        transfer = 0x55717a45c7e0
        xwm = 0x0 <-----------------------------
        p = <optimised out>
        current = 0
        available = 65536
        len = 0

The selection has no associated Xwayland instance. I assume this is because when you killed Xwayland, wlroots cleaned it up and freed all associated memory, but didn't unregister the fd from the event loop. It becomes "readable" again (but with 0 bytes to read, oh well), tries accessing the now-freed xwm, and finally crashes.
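
The "readable again, but with 0 bytes to read" behaviour can be reproduced outside wlroots. The sketch below is a hedged illustration in Python (Linux only, since it uses `select.epoll`), not wlroots code. A pipe stands in for the transfer fd, and `os.dup` stands in for registrations that were left behind, on the assumption that the event loop watches its own duplicate of each descriptor. The function name `leftover_wakeups` is made up for this example. The hangup it reports would line up with `mask=4` in frame #1 if 4 is libwayland's `WL_EVENT_HANGUP`, which I believe it is.

```python
import os
import select


def leftover_wakeups(extra_copies=3):
    """Show that registrations on duplicated fds outlive close() of the original.

    Returns (copies, events, data): the duplicate fd numbers, what epoll
    reported for them, and what a read on one of them returned.
    """
    read_end, write_end = os.pipe()  # stand-in for the transfer pipe
    loop = select.epoll()
    loop.register(read_end, select.EPOLLIN)

    # Assumption: each leftover registration watches its own dup of the fd.
    copies = [os.dup(read_end) for _ in range(extra_copies)]
    for fd in copies:
        loop.register(fd, select.EPOLLIN)

    # Teardown that only knows about one registration.
    loop.unregister(read_end)
    os.close(read_end)

    # The writing side goes away, so the pipe reaches end-of-file.
    os.close(write_end)

    events = loop.poll(0)         # non-blocking: collect pending events
    data = os.read(copies[0], 1)  # end-of-file shows up as an empty read

    for fd in copies:
        os.close(fd)
    loop.close()
    return copies, events, data


if __name__ == "__main__":
    copies, events, data = leftover_wakeups()
    for fd, mask in events:
        print(fd, "hangup" if mask & select.EPOLLHUP else hex(mask))
    print("read returned", data)
```

Every leftover duplicate still wakes the loop, and the read yields nothing. That is harmless on its own; the crash needs a callback that then dereferences state which has already been freed.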

Moving this over to wlroots, since the crash happens there.

@Xyene Xyene transferred this issue from swaywm/sway Oct 11, 2020
@Xyene Xyene added the xwayland label Oct 11, 2020
Xyene added a commit to Xyene/wlroots that referenced this issue Oct 11, 2020
Fixes swaywm#2425.

wlroots can only handle one outgoing transfer at a time, so it keeps a
list of pending selections. The head of the list is the currently-active
selection, and when that transfer completes and is destroyed, the next
one is started.

The trouble is when you have a transfer to some app that is misbehaving.
fcitx is one such application. With really large transfers, fcitx will
hang and never wake up again. So, you can end up with a transfer list
that looks like this:

| T1: started | T2: pending | T3: pending | T4: pending |

The file descriptor for transfer T1 is registered in libwayland's epoll
loop. The rest are waiting in wlroots' list.

As a user, you want your clipboard back, so you `pkill fcitx`. Now
Xwayland sends `XCB_DESTROY_NOTIFY` to let us know to give up. We clean
up T4 first.

Due to a bug in wlroots code, we register the (fd, transfer data
pointer) pair for T1 with libwayland *again*, despite it already being
registered. We do this 2 more times as we remove T3 and T2.

Finally, we remove T1 and `free` all the memory associated with it,
before `close`-ing its transfer file descriptor.

However, we still have 3 copies of T1's file descriptor left in the
epoll loop, since we erroneously added them as part of removing T2/3/4.
When we `close` the file descriptor as part of T1's teardown, we
actually cause the epoll loop to wake up the next time around, saying
"this file descriptor has activity!" (it was closed, so `read`-ing would
normally return 0 to let us know of EOF).

But instead of returning 0, it returns -1 with `EBADF`, because the file
descriptor has already been closed. And finally, as part of error-handling
this, we access the transfer pointer, which was `free`'d. And we crash.
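
The sequence in this commit message can be replayed with a small model. This is a hedged Python sketch, not the wlroots code: `Transfer`, `start`, `destroy`, and `replay` are hypothetical names, the event loop is reduced to a plain list of registrations, and the `only_if_head` guard is one plausible shape for a fix rather than the patch that was actually merged.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    name: str
    started: bool = False
    freed: bool = False


def start(transfer, registrations):
    # Registering never checks for an existing entry, as in the bug described.
    registrations.append(transfer.name)
    transfer.started = True


def destroy(queue, registrations, transfer, only_if_head):
    was_head = queue[0] is transfer
    queue.remove(transfer)
    if transfer.started:
        # Teardown removes a single registration: the one it knows about.
        registrations.remove(transfer.name)
    transfer.freed = True
    # Unguarded behaviour: (re)start the head after *any* removal.
    # Guarded behaviour: only when the removed transfer was the active head.
    if queue and (was_head or not only_if_head):
        start(queue[0], registrations)


def replay(only_if_head):
    """Replay T1 started, T2..T4 pending, then destroy T4, T3, T2, T1."""
    queue = [Transfer(f"T{i}") for i in range(1, 5)]
    registrations = []
    start(queue[0], registrations)
    for transfer in reversed(list(queue)):
        destroy(queue, registrations, transfer, only_if_head)
    # Whatever is left refers to transfers that have all been freed.
    return registrations


if __name__ == "__main__":
    print("unguarded:", replay(only_if_head=False))
    print("guarded:  ", replay(only_if_head=True))
```

With the guard off, the replay ends with three registrations that still name T1 after it has been freed, matching the "3 copies" described in the message. With the guard on, none remain.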
emersion pushed a commit that referenced this issue Oct 11, 2020
Fixes #2425.

@luispabon
Author

Great stuff, thank you 👍

@luispabon
Author

Just to report back: I've had a couple of Xwayland copy/paste freezes followed by killing xwayland, and have had no crashes 👍

neon64 pushed a commit to neon64/wlroots that referenced this issue Oct 15, 2020
Fixes swaywm#2425.

@luispabon
Author

luispabon commented Dec 1, 2020

I've had another crash today. I've opened a separate issue on the sway tracker, as I don't know where the problem belongs or whether it's a regression of this earlier fix.

New issue at https://github.com/swaywm/sway/issues/5852
