Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: atomically apply window state changes #2911

Closed
wants to merge 5 commits into from
Closed

Conversation

m-col
Copy link
Member

@m-col m-col commented Oct 18, 2021

This is a work in progress draft for some collaborative work with @jwijenbergh.

There are two main changes across these 5 commits:

  • a refactor of Window.place() and its use within the codebase. This becomes the single, exclusive point of entry for making geometric state changes on windows. Window.{x,y,width,height} are no longer intended to be used as direct setters, though they will continue to work (logging deprecation warnings). To reduce code duplication, all of its arguments have become kwargs so you can do e.g. window.place(borderwidth=4), and that would be the canonical way to change borderwidth. This makes it much easier to control state changes on windows and group them for...
  • atomic state changes on windows and groups of windows. Higher-level operations such as changing groups are performed within the Core.masked() context manager so that configure requests sent to windows can be saved in a pending queue, and only when the final window is acked are the state changes applied simultaneously on all involved windows.

The (intended) effect is that every frame is rendered perfectly, addressing #2544. Noticably this means windows will not appear with the wrong geometry for a frame before fixing themselves, and adding/removing windows from a tiled layout results in the tiled layout changing atomically without any windows lagging their geomtry change for very short (but noticable) amount of time.

Current issues that need resolving:

  • when changing to a group, it seems as though the old buffer is being rendered for windows on the now-current group. This is the most noticable in terminals which might only commit new buffers when things are printed, so until a key is pressed or they are clicked, they render old content.
  • even before these changes, firefox displays a single pixel window before fixing itself (that's their doing, I believe). Since WL: Set tiled edge xdg_surface state when mapping #2903 was merged, it doesn't seemt to fix itself with this patch also applied.

`Core.masked` is a context manager that lets us apply a series of
changes to windows under a set of temporary conditions, for example to
ignore mouse-enter events when changing group in the case of the x11
backend. This commit uses this mechanism for a similar reason for the
wayland backend: to perform a series of operations which we can gather
into pending state changes. We then listen to the ack-configure event on
windows to wait until all windows that were involved in that state
change have acked. When the last one acks (or when 200ms have passed,
whichever comes first), we can then apply all of those state changes
simultaneously.
These windows are not ordered and can't be pending more than once so a
set is better form.
This commit does a few things with the goal of having `Window.place()`
as the one and *only* way to change a window's x, y, width or height.
This lets the backend have a greater degree of control over window
reconfiguring, and is required for truly atomic state changes in the
Wayland backend, as well as correctly sending configure notifys to
windows in the X11 backend.

Changes:

- All arguments to `Window.place` are optional keyword args.
- The X11 `Window.place` correctly sends configure notify events when
  the window is moved but not resized.
- The Wayland `Window.place` only sends a resize request to the client
  when either `width` or `height` is passed.
- `Window._enablefloating` and `Window._reconfigure_floating`, which
  exist in both backends, have been collapsed into one (the latter).
- All instances where `Window.{x,y,width,height}` were set directly have
  been adjusted to instead pass the new value into `Window.place()`.
- All instances where `bordercolor=None` was passed to `Window.place`
  have been removed.
- All instances where the arguments passed to `Window.place` were
  directly retrieved from that window's property have been removed (See
  e.g. the change in scratchpad.py).

The intended usage is now to do something like `place(x=x, y=y)` or
`place(width=width, height=height)` if these are the only changes,
rather than directly setting those properties. Note that though these
properties should eventually become read-only by removing the setters,
the setters remain usable, logging a deprecation notice.
While this makes little difference in regular usage, this ensures that
tests do not fail due to pending geometry-related events waiting on the
core's event queue.
If a window is being resized or moved, geometry changes are added to the
queue of pending changes so that they can all be applied atomically.
This commit includes border width and colors in this process, if they
were provided to `Window.place()`.

This prevents border changes happening out of step with other layout
changes, which results in a strange state for a small number of frames
while waiting for windows to ack, where a window's border has changed
(possibly causing it to shift diagonally) but the remaining layout
changes are still pending.
@m-col m-col marked this pull request as draft October 18, 2021 21:40
@jwijenbergh
Copy link
Contributor

jwijenbergh commented Oct 18, 2021

This is a work in progress draft for some collaborative work with @jwijenbergh.

There are two main changes across these 5 commits:

* a refactor of `Window.place()` and its use within the codebase. This becomes the single, exclusive point of entry for making geometric state changes on windows. `Window.{x,y,width,height}` are no longer intended to be used as direct setters, though they will continue to work (logging deprecation warnings). To reduce code duplication, all of its arguments have become kwargs so you can do e.g. `window.place(borderwidth=4)`, and that would be the canonical way to change borderwidth. This makes it much easier to control state changes on windows and group them for...

* atomic state changes on windows and groups of windows. Higher-level operations such as changing groups are performed within the `Core.masked()` context manager so that configure requests sent to windows can be saved in a pending queue, and only when the final window is acked are the state changes applied simultaneously on all involved windows.

The (intended) effect is that every frame is rendered perfectly, addressing #2544. Noticably this means windows will not appear with the wrong geometry for a frame before fixing themselves, and adding/removing windows from a tiled layout results in the tiled layout changing atomically without any windows lagging their geomtry change for very short (but noticable) amount of time.

Thanks for the detailed pr! I am currently running this and it's really nice to see my terminals not have the dimension 'flicker'.

Current issues that need resolving:

* when changing to a group, it seems as though the old buffer is being rendered for windows on the now-current group. This is the most noticable in terminals which might only commit new buffers when things are printed, so until a key is pressed or they are clicked, they render old content.

Could you give me a reliably way to reproduce this (if there is one). I am currently trying it out and I can't seem to get in this state

* even before these changes, firefox displays a single pixel window before fixing itself (that's their doing, I believe). Since [WL: Set tiled edge xdg_surface state when mapping #2903](https://github.com/qtile/qtile/pull/2903) was merged, it doesn't seemt to fix itself with this patch also applied.

Firefox works for me. I observe that firefox still goes to a default dimension for a second (like half my screen) and then goes to the whole screen in a layout with it as the only window. Is this something you observe or is firefox completely broken for you now?
The same holds with emacs for me. If I run emacs with emacsclient (making startup very fast as compared to normal emacs) it still happens but it goes to the correct dimension much quicker than with my normal emacs config (which takes 1-2 seconds to load).
I will see if I can correct this behaviour.

Edit: just to add, with sway, the applications also kind of have the whole "dimension is loading" thing, but the border is drawn in the correct dimension the whole time, just the (visible) window inside the border takes a second to fully correct to the dimension of the border around it

@m-col
Copy link
Member Author

m-col commented Oct 18, 2021

Firefox works for me. I observe that firefox still goes to a default dimension for a second (like half my screen) and then goes to the whole screen in a layout with it as the only window. Is this something you observe or is firefox completely broken for you now?

Here is a video of it happening. Note the firefox window opens on a different group from the current, and it is launched via a keybinding. Then I change to that group, show the small window highlighted by some double borders, and then when I open a foot window firefox fixes itself. Now, the interesting thing is that it only happens to this firefox profile, and if I launch firefox with a different profile it works fine! No clue what is up with that, or if it's their or our bug.

recording.mp4

Could you give me a reliably way to reproduce this (if there is one). I am currently trying it out and I can't seem to get in this state

The problem with this is that it doesn't seem very reliable, so I will play around with things to see if I can find a pattern. The 200ms cut-off thing is literally a race so maybe it happens when foot is too slow or something. Here is another video showing this. Note the block cursor at the bottom left corner of the window. When changing group goes fine, it's filled (foot is correctly responding to the activated state), but sometimes it is empty. Near the end of the video I type a key, and foot receives this correctly and the next rendered buffer has this key and the block cursor, so it does have keyboard focus correctly.

foot.mp4

@m-col
Copy link
Member Author

m-col commented Oct 18, 2021

I am getting occasional problems with dragging floating windows around, and also sometimes segfaults that crash Qtile back to TTY. Nothing reliable yet though so not much else to report.

@jwijenbergh
Copy link
Contributor

Here is a video of it happening. Note the firefox window opens on a different group from the current, and it is launched via a keybinding. Then I change to that group, show the small window highlighted by some double borders, and then when I open a foot window firefox fixes itself. Now, the interesting thing is that it only happens to this firefox profile, and if I launch firefox with a different profile it works fine! No clue what is up with that, or if it's their or our bug.

Now that is really weird indeed. I do not seem to have this issue :(. I've had weird firefox profiles before. Does it work with that profile on e.g. sway?

I am getting occasional problems with dragging floating windows around, and also sometimes segfaults that crash Qtile back to TTY. Nothing reliable yet though so not much else to report.

Hmmmmmm, this is funny because I just had a segfault suddenly when reading that and opening a foot terminal, are you hacking me 👀

The problem with this is that it doesn't seem very reliable, so I will play around with things to see if I can find a pattern. The 200ms cut-off thing is literally a race so maybe it happens when foot is too slow or something. Here is another video showing this. Note the block cursor at the bottom left corner of the window. When changing group goes fine, it's filled (foot is correctly responding to the activated state), but sometimes it is empty. Near the end of the video I type a key, and foot receives this correctly and the next rendered buffer has this key and the block cursor, so it does have keyboard focus correctly.

Interesting thanks, I hope I can get to the bottom of this. I will do some debugging and try various things this week and I will report back when I have some news.

@m-col
Copy link
Member Author

m-col commented Oct 18, 2021

Firefox runs and is rendered perfectly on sway with this profile so I guess it is a bug here. If I return the surface.set_tiled(EDGES_TILED) line to Window.__init__ then Firefox does fix itself, but it still is rendered for a moment as a tiny point + borders. It doesn't in sway, so maybe its map state is being enabled too soon, before it is acked or something. Of course, adding that line back causes signal-desktop to never create a window. Clearly sway is doing something right and we need to follow that!

Hmmmmmm, this is funny because I just had a segfault suddenly when reading that and opening a foot terminal, are you hacking me eyes

At least the segfaults are reproducible across machines then 😆 I did just get one when trying to drag a nested session launched via the wephyr script.

@m-col
Copy link
Member Author

m-col commented Oct 18, 2021

I think maybe using ack_configure is not the way to go and instead we should be looking at the xdg_surface's configure_serial upon commit. I'm not entirely sure what difference this might have, but sway seems to do this. It's likely I've misunderstood the ack-configure thing.

Also a change that may or may not be important: we set up some listeners in Window.__init__ that sway doesn't set up until the window requests the map for the first time. We have some listeners set up there in Window._on_map but maybe some of the others need to move there too, otherwise we might be doing some things too early.

@jwijenbergh
Copy link
Contributor

jwijenbergh commented Oct 19, 2021

Firefox runs and is rendered perfectly on sway with this profile so I guess it is a bug here. If I return the surface.set_tiled(EDGES_TILED) line to Window.__init__ then Firefox does fix itself, but it still is rendered for a moment as a tiny point + borders. It doesn't in sway, so maybe its map state is being enabled too soon, before it is acked or something. Of course, adding that line back causes signal-desktop to never create a window. Clearly sway is doing something right and we need to follow that!

Okay that clears it up. I will look into it

At least the segfaults are reproducible across machines then laughing I did just get one when trying to drag a nested session launched via the wephyr script.

Hopefully we find a good way to reproduce the crash, race conditions can be very hard to debug :(

I think maybe using ack_configure is not the way to go and instead we should be looking at the xdg_surface's configure_serial upon commit. I'm not entirely sure what difference this might have, but sway seems to do this. It's likely I've misunderstood the ack-configure thing.

Also a change that may or may not be important: we set up some listeners in Window.__init__ that sway doesn't set up until the window requests the map for the first time. We have some listeners set up there in Window._on_map but maybe some of the others need to move there too, otherwise we might be doing some things too early.

I also am not sure what the difference is.
I am reading up on it and it's all quite complicated for me. Currently going through the wayland book so that might help me here.

Edit: Looking at e.g. river, they also set the ack_configure event on window map instead of in e.g. init:
https://github.com/ifreund/river/blob/5b8eab569c0860d48642bf096eccaa02244085e8/river/XdgToplevel.zig#L178

@m-col
Copy link
Member Author

m-col commented Oct 19, 2021

Something that we do not do at all, which we could if it helps, is to remove listeners when unmapping and return them again upon mapping, rather than just setting them up at the first map event and then leaving them.

@jwijenbergh
Copy link
Contributor

Something that we do not do at all, which we could if it helps, is to remove listeners when unmapping and return them again upon mapping, rather than just setting them up at the first map event and then leaving them.

Yeah I noticed this in the river code. We can make a separate pr to clean up these listeners or do it here?

@m-col
Copy link
Member Author

m-col commented Oct 19, 2021

Up to you - if it makes it easier to do it piece by piece then that's fine.

@m-col
Copy link
Member Author

m-col commented Oct 23, 2021

Looks like I can't give you access to edit this branch directly, so feel free to create a PR if you have updates and we can close this one in favour of that one.

@m-col m-col self-assigned this Oct 23, 2021
@m-col m-col linked an issue Oct 23, 2021 that may be closed by this pull request
@jwijenbergh
Copy link
Contributor

Looks like I can't give you access to edit this branch directly, so feel free to create a PR if you have updates and we can close this one in favour of that one.

Gotcha! I have been procrastinating doing other issues, but I will get to this soon.

@m-col
Copy link
Member Author

m-col commented Oct 23, 2021

No rush whatsoever, in your own time!

@jwijenbergh
Copy link
Contributor

Sorry for taking so long on this. I have actually been making a toy wayland compositor in c++ in my free time as I want to more thoroughly understand wayland. I will get back to this when I have a good understanding of what the best way to do this is

@m-col
Copy link
Member Author

m-col commented Nov 28, 2021

No worries at all 😄 sounds great!

@m-col
Copy link
Member Author

m-col commented Jan 15, 2022

Another relevant issue: #2911

@github-actions
Copy link

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions
Copy link

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@jwijenbergh
Copy link
Contributor

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

go away bot, I want to pick this up eventually :D

@github-actions
Copy link

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@jwijenbergh
Copy link
Contributor

@m-col should we close this or do you think we can rebase this on top of 0.16 pr?

@github-actions github-actions bot closed this Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Imperfect rendered frames due to no configure syncing with clients
3 participants