Skip to content
This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Fix #2161 and Sway#5294 #2162

Closed
wants to merge 1 commit into from
Closed

Fix #2161 and Sway#5294 #2162

wants to merge 1 commit into from

Conversation

alejor
Copy link

@alejor alejor commented May 6, 2020

Add a check for surface != NULL inside xdg_pointer_grab_enter()

Add a check for surface != NULL inside xdg_pointer_grab_enter()
@tchebb
Copy link
Contributor

tchebb commented May 7, 2020

I think this fix will work, but there's an underlying architectural issue that it doesn't address. Here's what's happened, as far as I can tell:

Prior to #183, compositors used two functions to tell wlroots about pointer movement between surfaces:

  1. wlr_seat_pointer_enter(), called when the pointer moved over a new surface, caused wlroots to mark that surface as focused, notify it that the pointer had entered it, and notify the previously-focused surface (queried from internal state) that the pointer had left it.
  2. wlr_seat_pointer_clear_focus(), called when the pointer left a surface but did not immediately enter a new one, caused wlroots to do an abridged version of the above: it notified the previously-focused surface that the pointer had left it and unmarked that surface as focused, but it did not send any entry notifications (since no surface was entered).

wlr_seat_pointer_clear_focus() was, and still is, internally implemented by calling wlr_seat_pointer_enter() with a NULL surface argument. However, this seems intended to have been purely an implementation detail: wlr_seat_pointer_enter()'s documentation did not specify that you can pass a NULL surface, no external clients did, and the very existence of wlr_seat_pointer_clear_focus() indicates that wlr_seat_pointer_enter() was not intended to be used for clearing focus.

Eventually, #183 came along with commit 17b134e ("wlr-seat: pointer grab interface"). In order for wlroots to support pointer grabs (used for xdg_shell popups and, since #262, drag-and-drop), this commit added an extra layer of indirection. Now, instead of calling wlr_seat_pointer_enter() directly, clients were supposed to call a new function wlr_seat_pointer_notify_enter(), which calls different enter hooks depending on the type of grab that's active. If no grab is active, it just delegates back to wlr_seat_pointer_enter(). Clients switched to the new function, and things seemed good.

However, there was an oversight: #183 added indirected versions of wlr_seat_pointer_enter() and several other functions, but it left wlr_seat_pointer_clear_focus() alone. Clients continued to call that function directly to clear focus regardless of whether a grab was active. What this meant was that grab hooks were not notified when the pointer left a surface without immediately entering a new one (or when focus was cleared due to special circumstances).

This brings us, finally, to the commit that @Xyene identified as the cause of swaywm/sway#5294. In swaywm/sway#5220, @nickdiego noticed weird behavior of drag-and-drop grabs caused by—surprise!—missing events when the pointer left a surface. In order to fix this, they wrote swaywm/sway#5222, adding a call to wlr_seat_pointer_notify_enter() with a NULL surface argument. This was the first time (at least in sway) the function was called that way, and although both wlr_seat_pointer_enter() (as discussed previously) and the drag-and-drop enter hook (drag_handle_pointer_enter()) can gracefully handle a NULL surface, the xdg_shell popup enter hook (xdg_pointer_grab_enter()) cannot and segfaults.

So, how should we fix it? One approach is the one taken by this commit, which is to ensure that all grab enter hooks can treat a NULL surface as a focus clear. I think this is a fine approach, but there are several other things I'd also want to see if we choose it:

  1. The fact that wlr_seat_pointer_enter() can take a NULL surface should be explicitly documented in wlr_seat.h.
  2. All wlr_seat_pointer_clear_focus() calls in clients should be replaced with wlr_seat_pointer_notify_enter(wlr_seat, NULL, 0, 0) calls. Note that input/seatop_default: properly notify pointer leave sway#5222 didn't do this: it added the latter but did not remove the former. This is wrong, IMO, since a grab may wish to fully override the default behavior.
  3. For consistency, wlr_seat_pointer_clear_focus() should perhaps be deprecated entirely in favor of wlr_seat_pointer_enter(wlr_seat, NULL, 0, 0).

The other approach we could take is to explicitly prohibit passing a NULL surface to wlr_seat_pointer_notify_enter() and add a new indirected function, wlr_seat_pointer_notify_clear_focus(). If we take this path, we'd need to

  1. Plumb up wlr_seat_pointer_notify_clear_focus() like all the other indirected functions, and add implementations of it to existing grabs (i.e. xdg_shell popup and drag-and-drop).
  2. Replace all calls to wlr_seat_pointer_clear_focus() in clients with calls to wlr_seat_pointer_notify_clear_focus(),
  3. Add an assert(surface) call to wlr_seat_pointer_notify_enter() to catch misbehaving clients early.

I personally like the second approach a bit better, and I'll probably attempt to write a PR implementing it this weekend barring any objections. As a stopgap, I think we should revert swaywm/sway#5222, since a rare drag-and-drop bug is preferable to a segfault every time your cursor leaves a surface while a popup is up.

@emersion
Copy link
Member

emersion commented May 7, 2020

I agree with @tchebb. I also slightly prefer the second approach because the x/y parameters don't make sense for leave events. Since we're going to break the API, it would be nice to rename clear_focus to leave to align the wording with the Wayland protocol.

@emersion
Copy link
Member

emersion commented May 7, 2020

Also see #1506

@alejor
Copy link
Author

alejor commented May 7, 2020

@tchebb, I agree absolutely with you, Indeed I just try to fix it as sway is my daily driver and any issue that can crash it makes me feeling bad, but I don't know any clue about the architecture. I already known that was a ugly workaround, but any is better than get a segfault and loss the daily work. Thanks for your advice and hope you can figure out the best way to tackle this. Cheers!

@nickdiego
Copy link

Wow! thanks for the in-depth analysis and sorry for not being able to look into this regression introduced by my change. Quite busy week here :/ TBH, I hesitated a bit before sending that patch without any kind of automated test / CI to "ensure" it was regressions-free.

The other approach we could take is to explicitly prohibit passing a NULL surface to wlr_seat_pointer_notify_enter() and add a new indirected function, wlr_seat_pointer_notify_clear_focus().

IMHO, this sounds better, had even asked about such a function to @emersion as I was a bit surprised it didn't exist yet :P

Thanks for looking into it. I still have several DND issues on the pipeline to report / try to fix :( hope to have some time for it in the coming weeks

@tchebb
Copy link
Contributor

tchebb commented May 21, 2020

I've posted #2217 and swaywm/sway#5368 to implement the second solution I proposed above.

@emersion
Copy link
Member

emersion commented Jun 8, 2020

Superseded by #2217

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants