Fix crashes on exit caused by wlroots listener checks #8578
On second thought, it might be more useful to remove the bulk of the listeners in Destroy listeners are still needed for a bunch of other components that aren't directly created in
If we could have a 1:1 mapping between what is set up in One thing is what is easiest to implement, another is what is easiest to maintain and catch errors within in the future.
8ea9677 to 307ce0c
IMO components that have This is sometimes hard to distinguish though; I'm currently handling these by making the I think I'll try to get this PR far enough to exit sway without crashes with this method, keeping as much as possible in separate commits for reviewability, and we'll sort out what the best path forward is during review.
adfe709 to 8ee1fc7
The current state makes sway exit correctly from a desktop containing a few konsole windows and a firefox window. I'm not completely sure about the correct handling of all of these cases, and I'm sure there are a lot more objects with listeners that need to be removed before destroy, but those don't trigger on a regular exit in my setup. I've already started writing a script to search for all objects in wlroots that have listener checks and their corresponding usages in sway, but that might take a while. In the meantime, I think it's better to mark this as reviewable now and see where we go afterwards.
8ee1fc7 to 6df3682
Addendum to this: this works fine for The current implementation has this 1:1 mapping for objects created in
6df3682 to ca19d20
Diff to state before last force-push:

diff --git a/sway/server.c b/sway/server.c
index 683ce169..79c8f542 100644
--- a/sway/server.c
+++ b/sway/server.c
@@ -458,6 +458,7 @@ bool server_init(struct sway_server *server) {
 void server_fini(struct sway_server *server) {
 	// remove listeners
+	wl_list_remove(&server->renderer_lost.link);
 	wl_list_remove(&server->new_output.link);
 	wl_list_remove(&server->layer_shell_surface.link);
 	wl_list_remove(&server->xdg_shell_toplevel.link);
@@ -474,14 +475,12 @@ void server_fini(struct sway_server *server) {
 	wl_list_remove(&server->xdg_activation_v1_request_activate.link);
 	wl_list_remove(&server->xdg_activation_v1_new_token.link);
 	wl_list_remove(&server->request_set_cursor_shape.link);
-#if WLR_HAS_XWAYLAND
-	wl_list_remove(&server->xwayland_surface.link);
-	wl_list_remove(&server->xwayland_ready.link);
-#endif
 	input_manager_finish(server);
 	// TODO: free sway-specific resources
 #if WLR_HAS_XWAYLAND
+	wl_list_remove(&server->xwayland_surface.link);
+	wl_list_remove(&server->xwayland_ready.link);
 	wlr_xwayland_destroy(server->xwayland.wlr_xwayland);
 #endif
 	wl_display_destroy_clients(server->wl_display);
ca19d20 to 8932123
I found an interesting crash related to the The error occurs while sway is trying to destroy the old renderer after a GPU reset and also exists before this PR. Call Trace: wlroots Detailed Stack Tracewlroots
We can fix this in sway if we can break the stack without causing issues elsewhere by either:
The first one is cleaner and is hopefully fine, but I'm not sure if there will be a bunch of things failing in between the renderer being lost and the idle callback running.
These seem like sensible approaches. I'll look into it and try to see if the first approach works without complications. IMO this could be done in a separate PR, since it doesn't really touch the exit logic and is more than just removing listeners. EDIT: implemented this as a separate commit in this PR
8932123 to 9122400
9122400 to e53f77a
I just implemented this in the latest push. I haven't fully tested it yet since I don't know of a way to force a GPU reset or renderer loss. I'm still searching if there's a simple way to do this (for info: I'm using amdgpu on this laptop), otherwise I'll hack something in to make sway think a GPU reset happened to test this. I've already tried
2126919 to 9f0e3c0
Rebased onto latest master. I've also tested the new delayed renderer recreation by faking the return value of @kennylevinsen / @llyyr: could you look at the new renderer lost handling and confirm that nothing looks out of the ordinary? Otherwise I think this PR is good to go. I've been running this for weeks now and haven't had any more crashes on exit. There might be a few listeners still somewhere in sway, but for daily use this should be fine, and the rest can be handled when we come across them (IMO anything is better than the current state of master where sway just crashes on exit).
9f0e3c0 to 95117e8
Rebased again onto latest master. I also did a quick grep over the codebase and from a first look it seems like every remaining
I've also tested this a bit and haven't seen any exit crashes |
emersion
left a comment
Thanks a lot for fixing this! Overall looks good to me, a few minor comments below.
sway/input/text_input.c
Outdated
void sway_input_method_relay_finish(struct sway_input_method_relay *relay) {
	// return early if finish was already called
	// can be called due to seat or manager protocol object being destroyed
	if (!relay->input_method_new.link.prev) return;
Hm, this sounds like a workaround.
Since server.text_input and server.input_method are only accessed in sway_input_method_relay_init(), we could instead do something like:
static void relay_handle_text_input_manager_destroy(struct wl_listener *listener, void *data) {
	struct sway_input_method_relay *relay = wl_container_of(listener, relay,
		text_input_manager_destroy);
	wl_list_remove(&relay->input_method_new.link);
	wl_list_remove(&relay->input_method_manager_destroy.link);
	wl_list_init(&relay->input_method_new.link);
	wl_list_init(&relay->input_method_manager_destroy.link);
}

Perhaps this can be extracted to a helper function sway_input_method_relay_finish_input_method() and re-used in sway_input_method_relay_finish().
I wrote this before I knew wl_list_init() could be used to make this work.
Thanks, I'll look into this.
Shouldn't relay_handle_text_input_manager_destroy() remove the listeners for text_input_new and text_input_manager_destroy, rather than the input method listeners?
Yes, my bad, my suggestion has a typo!
changed this to remove the text_input listeners instead in 8e75dad.
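To illustrate the wl_list_remove() followed by wl_list_init() pattern from this review thread, here is a minimal sketch. It assumes a local `struct list` stand-in that mirrors libwayland's `wl_list` behaviour (removal leaves NULL pointers behind), so it compiles without wayland-server.h. The `relay` struct, `relay_finish_text_input()` and `demo_double_finish()` are hypothetical names for illustration only, not sway's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct wl_list; an assumption so the sketch is self-contained. */
struct list { struct list *prev, *next; };

static void list_init(struct list *l) { l->prev = l; l->next = l; }

static void list_insert(struct list *head, struct list *elm) {
	elm->prev = head;
	elm->next = head->next;
	head->next = elm;
	elm->next->prev = elm;
}

/* Like wl_list_remove(): unlinks elm and resets its pointers to NULL,
 * so calling it twice in a row would dereference NULL. */
static void list_remove(struct list *elm) {
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->next = NULL;
	elm->prev = NULL;
}

static int list_empty(const struct list *l) { return l->next == l; }

/* Hypothetical relay holding two listener links on one manager. */
struct relay {
	struct list text_input_new;
	struct list text_input_manager_destroy;
};

/* Safe to call repeatedly: after removal each link is re-initialised to
 * point at itself, so a later list_remove() only touches the link. */
static void relay_finish_text_input(struct relay *relay) {
	list_remove(&relay->text_input_new);
	list_remove(&relay->text_input_manager_destroy);
	list_init(&relay->text_input_new);
	list_init(&relay->text_input_manager_destroy);
}

/* Returns 1 if both signal lists are empty after finishing twice. */
int demo_double_finish(void) {
	struct list new_signal, destroy_signal;
	struct relay relay;
	list_init(&new_signal);
	list_init(&destroy_signal);
	list_insert(&new_signal, &relay.text_input_new);
	list_insert(&destroy_signal, &relay.text_input_manager_destroy);

	relay_finish_text_input(&relay); /* e.g. the manager was destroyed */
	relay_finish_text_input(&relay); /* e.g. the seat is destroyed later */

	return list_empty(&new_signal) && list_empty(&destroy_signal);
}
```

Re-initialising the link avoids the need for a guard such as checking `link.prev` for NULL before removing, which is what the earlier revision of the code did.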
95117e8 to fe79c1a
Fixed the nits from @emersion's review and rebased onto latest master. A quick test run seems to suggest everything is still working as before. EDIT: it seems something else was merged in the meantime, rebased again and updated the commit references in the comments above.
This fixes a crash in wlroots listener checks. See swaywm#8509.
fe79c1a to 4b057c3
sway/input/text_input.c
Outdated
	struct sway_input_method_relay *relay = wl_container_of(listener, relay,
		input_method_manager_destroy);

	sway_input_method_relay_finish(relay);
Probably we need to do the same here? ie, the input method manager may get destroyed before the text input manager?
done in 8e75dad.
include/sway/input/text_input.h
Outdated
	struct sway_input_method_relay *relay);

void sway_input_method_relay_finish(struct sway_input_method_relay *relay);
void sway_input_method_relay_finish_input_method(struct sway_input_method_relay *relay);
Could we keep this static inside text_input.c, since it's not called elsewhere?
yes, I'll make it static
Sorry, two more comments :P
sway_input_method_relay can be destroyed from two sources: either the seat is destroyed, or the manager protocol objects are destroyed due to compositor exit. This fixes a crash in wlroots listener checks. See swaywm#8509.
Change begin_destroy to remove event listeners before the final destroy, since otherwise event listeners would be removed twice, which crashes. This fixes a crash in wlroots listener checks. See swaywm#8509.
Destroying the wlr_renderer in a callback to its own renderer_lost event is unsafe due to wl_signal_emit*() still accessing it after it was destroyed. Delegate recreation of renderer to an idle callback and ensure that only one such idle callback is scheduled at a time by storing the returned event source.
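The deferral described in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions: `loop_add_idle()` and `loop_dispatch_idle()` are local stand-ins for an event loop's idle mechanism (in libwayland the corresponding call is wl_event_loop_add_idle(), which returns a `struct wl_event_source *`), and the `server` struct with its `recreating_renderer` and `renderer_generation` fields is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*idle_func)(void *data);

/* Stand-in for an idle event source; an assumption for this sketch. */
struct idle_source { idle_func func; void *data; int pending; };

struct server {
	struct idle_source idle_slot;            /* storage for the stand-in loop */
	struct idle_source *recreating_renderer; /* non-NULL while a recreate is scheduled */
	int renderer_generation;                 /* bumped on each recreation */
};

/* Stand-in: schedule func to run once the current dispatch has unwound. */
static struct idle_source *loop_add_idle(struct server *s, idle_func func, void *data) {
	s->idle_slot.func = func;
	s->idle_slot.data = data;
	s->idle_slot.pending = 1;
	return &s->idle_slot;
}

/* Stand-in: run the pending idle callback, if any. */
static void loop_dispatch_idle(struct server *s) {
	if (s->idle_slot.pending) {
		s->idle_slot.pending = 0;
		s->idle_slot.func(s->idle_slot.data);
	}
}

/* Runs outside the signal emission, so replacing the old renderer here
 * cannot pull it out from under the code that emitted the lost event. */
static void do_renderer_recreate(void *data) {
	struct server *s = data;
	s->recreating_renderer = NULL; /* allow the next loss to schedule again */
	s->renderer_generation++;      /* placeholder for destroy + create */
}

/* Called from the lost-renderer listener: only schedules the work. */
static void handle_renderer_lost(struct server *s) {
	if (s->recreating_renderer != NULL) {
		return; /* a recreate is already pending */
	}
	s->recreating_renderer = loop_add_idle(s, do_renderer_recreate, s);
}

/* Returns 1 if two loss events result in exactly one deferred recreation. */
int demo_renderer_lost(void) {
	struct server s = {0};
	handle_renderer_lost(&s);
	handle_renderer_lost(&s); /* ignored, already scheduled */
	int before = s.renderer_generation;
	loop_dispatch_idle(&s);
	return before == 0 && s.renderer_generation == 1
		&& s.recreating_renderer == NULL;
}
```

Storing the returned source is what lets the handler detect that a recreation is already queued, so repeated loss events before the idle callback runs do not stack up.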
4b057c3 to 270ca3a
no problem :) My last push changed the
emersion
left a comment
Thank you!
This PR fixes sway crashing on exit due to event listeners not being removed from wlroots objects on exit.
Listeners fixed:
- server_init(), reworked handle_renderer_lost() to use an idle callback
- wlr_backend.new_input, wlr_virtual_keyboard, wlr_virtual_pointer, wlr_keyboard_shortcuts_inhibit, wlr_transient_seat_manager
- wlr_idle_inhibit_manager
- wlr_text_input_manager and wlr_input_method_manager
- wlr_scene_buffer.output_enter and wlr_scene_buffer.output_leave

If I haven't missed any, this fixes removing listeners at exit for all wlroots objects with listeners that sway uses. Other wlroots objects not touched in this PR already have some form of listener remove logic. I haven't individually tested those, but they do not lead to crashes on my system.
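As a rough illustration of why leftover listeners lead to a crash on exit, the sketch below models a destroy-time listener check. It assumes the check amounts to asserting that a signal's listener list is empty before the object is freed; here the condition is returned instead of asserted so the example can run to completion. `struct list`, `fake_manager`, `fake_server` and the function names are all hypothetical stand-ins, not wlroots or sway API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct wl_list; an assumption so the sketch is self-contained. */
struct list { struct list *prev, *next; };

static void list_init(struct list *l) { l->prev = l; l->next = l; }

static void list_insert(struct list *head, struct list *elm) {
	elm->prev = head;
	elm->next = head->next;
	head->next = elm;
	elm->next->prev = elm;
}

static void list_remove(struct list *elm) {
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->next = NULL;
	elm->prev = NULL;
}

static int list_empty(const struct list *l) { return l->next == l; }

/* Hypothetical object exposing one signal, similar to a protocol manager. */
struct fake_manager { struct list new_thing_listeners; };

/* Hypothetical compositor state holding one listener link. */
struct fake_server { struct list new_thing; };

/* Reports whether destroying the manager would pass the listener check.
 * A real check would assert this condition and abort on failure. */
static int manager_destroy_allowed(const struct fake_manager *m) {
	return list_empty(&m->new_thing_listeners);
}

/* Returns 1 if the check fails with a listener attached and passes
 * once the owner has removed it. */
int demo_listener_check(void) {
	struct fake_manager manager;
	struct fake_server server;
	list_init(&manager.new_thing_listeners);

	/* init path: the compositor attaches its listener */
	list_insert(&manager.new_thing_listeners, &server.new_thing);
	int leaked = !manager_destroy_allowed(&manager);

	/* fini path: the listener is removed before the object is destroyed */
	list_remove(&server.new_thing);
	int clean = manager_destroy_allowed(&manager);

	return leaked && clean;
}
```

Keeping each removal in the fini path paired with the attach in the init path makes it easier to see at a glance that every listener added at startup is removed before teardown.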