This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

DRM Explicit fencing #894

Open
ascent12 opened this issue Apr 23, 2018 · 5 comments

Comments

@ascent12
Member

ascent12 commented Apr 23, 2018

Another one of these new shiny graphics-related features I eventually want to get around to.
Basically, the purpose is to stop slow clients from stalling the graphics pipeline by controlling synchronization ourselves. It may also help with multi-GPU.

I'm not sure what impact this is going to have on the codebase, and how it may apply to non-DRM backends, if at all.

The zwp_linux_explicit_synchronization_v1 protocol is used for this, although it had not been merged at the time of writing.
EGL_ANDROID_native_fence_sync is related.
It's also related to Vulkan Fences/Semaphores.

Extra links:
https://www.collabora.com/news-and-blog/blog/2016/09/13/mainline-explicit-fencing-part-1/
https://www.collabora.com/news-and-blog/blog/2016/10/18/mainline-explicit-fencing-part-2/
https://www.collabora.com/news-and-blog/blog/2017/01/26/mainline-explicit-fencing-part-3/


wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/894

@emersion
Member

Here's the protocol proposal: https://patchwork.freedesktop.org/patch/177866/

@emersion
Member

The protocol has been merged: https://cgit.freedesktop.org/wayland/wayland-protocols/commit/?id=19ec5dcc4b1d57ce5dd8e400b5e9b4ec5791ff06

Note that we can implement this protocol before actually supporting DRM fences, because it provides a solution to a deficiency of the wl_buffer.release event (see https://gitlab.freedesktop.org/wayland/wayland/issues/46).

@ascent12
Member Author

ascent12 commented Feb 19, 2019

After a lot more investigation into how this would be implemented, it turns out the nature of this problem is quite different from what I thought it was.
The way this would ideally be implemented, it's far less of a renderer issue and more of a state-tracking issue.

Just so everyone is on the same page, a dma-fence is a kernel object which can signal the completion of asynchronous GPU operations. They are exposed to userspace using "sync files" which can be made up of multiple fences. Fortunately some kernel documentation for this exists:
https://github.com/torvalds/linux/blob/master/Documentation/sync_file.txt
Userspace API header:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/sync_file.h
An important thing to note is that these can be used with poll(2)-like APIs.
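That poll(2) interaction can be sketched in C. This is a minimal sketch, not wlroots code: `fence_is_signalled` is a hypothetical helper name, and the fd is assumed to be a sync_file fd obtained from the kernel. Such an fd becomes readable (POLLIN) once every fence inside it has signalled, so polling with a zero timeout gives a non-blocking readiness check:

```c
#include <poll.h>

/* Sketch: a sync_file fd becomes readable once all fences in it have
 * signalled, so a zero-timeout poll answers "is this buffer ready?"
 * without blocking. Returns 1 if signalled, 0 if not, -1 on error. */
static int fence_is_signalled(int fence_fd)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0); /* timeout 0: never block */
    if (ret < 0) {
        return -1;
    }
    return ret > 0 && (pfd.revents & POLLIN) ? 1 : 0;
}
```

A compositor can call something like this at draw time and fall back to an older buffer when it returns 0, which is exactly the decision the "good" implementation below has to make.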

Now when it comes to implementing zwp_linux_explicit_synchronization, there are two parts to the issue.

Passing a fence to the client

This part isn't that tricky, but you still need to be a little careful.
When using atomic DRM, there is a property called OUT_FENCE_PTR, which returns a sync file indicating when the pageflip is complete. This just needs to be sent to the client.

However, you need to be careful if you're using the client's buffer multiple times (e.g. spanning 2 outputs) or if you're going to hold onto the client's buffer longer than one frame. sync-files can be merged together, so it's not really that hard to do.
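For illustration, merging is done with the SYNC_IOC_MERGE ioctl from the uapi header linked above; `merge_sync_files` is a hypothetical wrapper, not wlroots code. The resulting fd signals only once both inputs have signalled, which is what you want before releasing a buffer that was scanned out on several outputs:

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/* Sketch: merge two sync_file fds into a new one that signals only
 * when both inputs have signalled. Returns the merged fd, or -1 on
 * error. The inputs stay valid and still need to be closed by the
 * caller. */
static int merge_sync_files(const char *name, int fd1, int fd2)
{
    struct sync_merge_data data = {0};
    strncpy(data.name, name, sizeof(data.name) - 1);
    data.fd2 = fd2;
    if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0) {
        return -1;
    }
    return data.fence;
}
```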

Also note: You don't actually have to implement this to conform to the protocol. You can just use the immediate_release event, and it'll act like a wl_buffer.release. As of writing, Weston only uses immediate releases, and an initial implementation of this protocol could be done this way.

Receiving a fence from the client

Now this is where things get complicated.
There are two main ways I can think this can be implemented.

The simple way

  1. We get the fence from the client
  2. We import the fence with EGL and use eglWaitSyncKHR on it (or whatever the Vulkan equivalent is).
  3. Done; everything else works as normal.

This is how Weston currently implements it, and would be considered the bare minimum for supporting explicit-sync.

This adds some advantage for tracing, but doesn't really provide the real advantages of explicit-sync. A slow client is still capable of slowing the compositor's drawing down, and potentially missing vblank.

The good, but complicated way

This is where we actually make rendering decisions based on the status of the fence.
For example, at drawing time, if a fence has not been signalled, we instead choose to use the client's old buffer. We shouldn't ever be blocked on the client completing and can guarantee a more stable framerate.

However, this violates one of our fundamental assumptions:
A client may not have finished rendering at the time it calls wl_surface.commit.
So instead of any wl_surface state simply being "pending" and "committed", it would now be "pending", "committed", and "ready".

Consider this situation:

  • It's time for the compositor to draw.
  • The client's fence has not been signalled.
  • We look at the previous commit state.
  • That also contains a fence which has not been signalled.
  • We now have no valid buffer.

To solve this, instead of the current design of pending and current in wlr_surface, it actually needs to be implemented as a queue, where we make sure to keep at least one good commit state with a ready buffer.
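A minimal sketch of that queue logic, with all names hypothetical and a plain int standing in for a full surface state: commits are appended in order, and at draw time we apply the leading run of commits whose fences have signalled, so the surface always keeps the newest ready state and never regresses to having no valid buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define COMMIT_QUEUE_LEN 8

/* Hypothetical per-surface commit queue entry. `ready` stands in for
 * "the acquire fence attached to this commit has signalled". */
struct commit {
    int buffer_id; /* stand-in for the full committed surface state */
    bool ready;
};

struct commit_queue {
    struct commit entries[COMMIT_QUEUE_LEN];
    size_t len;
    int current_buffer; /* last state actually applied; -1 if none */
};

static bool commit_queue_push(struct commit_queue *q, int buffer_id, bool ready)
{
    if (q->len == COMMIT_QUEUE_LEN) {
        return false; /* full; a real impl would coalesce or wait */
    }
    q->entries[q->len++] = (struct commit){ .buffer_id = buffer_id, .ready = ready };
    return true;
}

/* At draw time: apply the leading run of ready commits, in commit
 * order, and drop them from the queue. Commits behind an unready one
 * must wait, so state is never applied out of order. Returns the
 * buffer to render this frame; the previous one is kept if nothing
 * new is ready. */
static int commit_queue_flush(struct commit_queue *q)
{
    size_t applied = 0;
    while (applied < q->len && q->entries[applied].ready) {
        q->current_buffer = q->entries[applied].buffer_id;
        applied++;
    }
    for (size_t i = applied; i < q->len; i++) {
        q->entries[i - applied] = q->entries[i]; /* shift the rest down */
    }
    q->len -= applied;
    return q->current_buffer;
}
```

The key property is that an unready commit never blocks the compositor: flush simply returns the last applied buffer, and the queued state is picked up on a later frame once its fence signals.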

We also need to store ALL state tied to the surface commit, not just the buffer. Many protocol extensions affect rendering, so they need to be tied to the queue too. I don't particularly want to manage a queue of states inside of every single wl_surface-related protocol we implement, so that's why I proposed #1546, where we create a central place where state can be managed properly.

I believe my proposal should work for this, but there is still one pretty large nightmare protocol that still needs to be sorted: wl_subsurface. Queues blocking other queues and all sorts of shit.
I haven't finalized a design which I know will work for this yet, but I'm trying some implementations.

Anyway, that's my brain-dump about this protocol and hopefully justifies the changes I want to make in #1546.
As it turns out, if we implement it the "good" way, we don't even need to change the renderer. We'll just poll the sync-files, and buffers will be guaranteed to be ready by the time the renderer receives them.

I asked a question about this on wayland-devel, so you can see the thread for some more info: https://lists.freedesktop.org/archives/wayland-devel/2019-February/040066.html
I also had an off-list discussion with Daniel, where he confirms the necessity for a commit/buffer queue.

@emersion
Member

emersion commented Apr 20, 2019

Here's a test target for a client blocking the whole pipeline: https://gitlab.collabora.com/daniels/texture-atlas-test/blob/master/obnoxious-fbo-load.c#L92

Another one: https://github.com/ascent12/compositor-killer

(Warning: don't try it at home)

@emersion
Member

As a side note, explicit synchronization should also help with e.g. buggy Vulkan programs that mess up synchronization and submit jobs that never complete. Here's an example: https://gitlab.freedesktop.org/drm/amd/-/issues/702#note_650470
