This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

DRM Explicit fencing #894

Open
ascent12 opened this issue Apr 23, 2018 · 5 comments

Comments

@ascent12
Member

ascent12 commented Apr 23, 2018

Another one of these new shiny graphics-related features I eventually want to get around to.
Basically, the purpose is to stop slow clients from stalling the graphics pipeline by controlling synchronization ourselves. It may also help with multi-GPU.

I'm not sure what impact this is going to have on the codebase, and how it may apply to non-DRM backends, if at all.

The zwp_linux_explicit_synchronization_v1 protocol is used for this, although it had not been merged at the time of writing.
EGL_ANDROID_native_fence_sync is related.
It's also related to Vulkan Fences/Semaphores.

Extra links:
https://www.collabora.com/news-and-blog/blog/2016/09/13/mainline-explicit-fencing-part-1/
https://www.collabora.com/news-and-blog/blog/2016/10/18/mainline-explicit-fencing-part-2/
https://www.collabora.com/news-and-blog/blog/2017/01/26/mainline-explicit-fencing-part-3/


wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/894

@emersion
Member

Here's the protocol proposal: https://patchwork.freedesktop.org/patch/177866/

@emersion
Member

The protocol has been merged: https://cgit.freedesktop.org/wayland/wayland-protocols/commit/?id=19ec5dcc4b1d57ce5dd8e400b5e9b4ec5791ff06

Note that we can implement this protocol before actually supporting DRM fences, because it provides a solution to a deficiency of the wl_buffer.release event (see https://gitlab.freedesktop.org/wayland/wayland/issues/46).

@ascent12
Member Author

ascent12 commented Feb 19, 2019

After a lot more investigation into how this would be implemented, it turns out the nature of this problem is quite different from what I thought it was.
The way this would ideally be implemented, it's far less of a renderer issue and more of a state-tracking issue.

Just so everyone is on the same page, a dma-fence is a kernel object which can signal the completion of asynchronous GPU operations. They are exposed to userspace using "sync files" which can be made up of multiple fences. Fortunately some kernel documentation for this exists:
https://github.com/torvalds/linux/blob/master/Documentation/sync_file.txt
Userspace API header:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/sync_file.h
An important thing to note is that these can be used with poll(2)-like APIs.
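That poll(2) interaction can be sketched in C. This is a minimal sketch, not wlroots code: `fence_is_signalled` is a hypothetical helper name, and the fd is assumed to be a sync_file fd obtained from the kernel. Such an fd becomes readable (POLLIN) once every fence inside it has signalled, so polling with a zero timeout gives a non-blocking readiness check:

```c
#include <poll.h>

/* Sketch: a sync_file fd becomes readable once all fences in it have
 * signalled, so a zero-timeout poll answers "is this buffer ready?"
 * without blocking. Returns 1 if signalled, 0 if not, -1 on error. */
static int fence_is_signalled(int fence_fd)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0); /* timeout 0: never block */
    if (ret < 0) {
        return -1;
    }
    return ret > 0 && (pfd.revents & POLLIN) ? 1 : 0;
}
```

A compositor can call something like this at draw time and fall back to an older buffer when it returns 0, which is exactly the decision the "good" implementation below has to make.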

Now when it comes to implementing zwp_linux_explicit_synchronization, there are two parts to the issue.

Passing a fence to the client

This part isn't that tricky, but you still need to be a little careful.
When using atomic DRM, there is a property called OUT_FENCE_PTR, which returns a sync file indicating when the pageflip is complete. This just needs to be sent to the client.

However, you need to be careful if you're using the client's buffer multiple times (e.g. spanning 2 outputs) or if you're going to hold onto the client's buffer longer than one frame. sync-files can be merged together, so it's not really that hard to do.
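For illustration, merging is done with the SYNC_IOC_MERGE ioctl from the uapi header linked above; `merge_sync_files` is a hypothetical wrapper, not wlroots code. The resulting fd signals only once both inputs have signalled, which is what you want before releasing a buffer that was scanned out on several outputs:

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/* Sketch: merge two sync_file fds into a new one that signals only
 * when both inputs have signalled. Returns the merged fd, or -1 on
 * error. The inputs stay valid and still need to be closed by the
 * caller. */
static int merge_sync_files(const char *name, int fd1, int fd2)
{
    struct sync_merge_data data = {0};
    strncpy(data.name, name, sizeof(data.name) - 1);
    data.fd2 = fd2;
    if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0) {
        return -1;
    }
    return data.fence;
}
```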

Also note: You don't actually have to implement this to conform to the protocol. You can just use the immediate_release event, and it'll act like a wl_buffer.release. As of writing, Weston only uses immediate releases, and an initial implementation of this protocol could be done this way.

Receiving a fence from the client

Now this is where things get complicated.
There are two main ways I can think this can be implemented.

The simple way

  1. We get the fence from the client
  2. We import the fence with EGL and use eglWaitSyncKHR on it (or whatever the Vulkan equivalent is).
  3. Done; everything else works as normal.

This is how Weston currently implements it, and would be considered the bare minimum for supporting explicit-sync.

This adds some advantage for tracing, but doesn't really provide the real advantages of explicit-sync. A slow client is still capable of slowing the compositor's drawing down, and potentially missing vblank.

The good, but complicated way

This is where we actually make rendering decisions based on the status of the fence.
For example, at drawing time, if a fence has not been signalled, we instead choose to use the client's old buffer. We shouldn't ever be blocked on the client completing and can guarantee a more stable framerate.

However, this violates one of our fundamental assumptions:
A client may not have finished rendering at the time it calls wl_surface.commit.
So instead of any wl_surface state simply being "pending" and "committed", it would now be "pending", "committed", and "ready".

Consider this situation:

  • It's time for the compositor to draw.
  • The client's fence has not been signalled.
  • We look at the previous commit state.
  • That also contains a fence which has not been signalled.
  • We now have no valid buffer.

To solve this, instead of the current design of pending and current in wlr_surface, it actually needs to be implemented as a queue, where we make sure to keep at least one good commit state with a ready buffer.
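A minimal sketch of that queue logic, with all names hypothetical and a plain int standing in for a full surface state: commits are appended in order, and at draw time we apply the leading run of commits whose fences have signalled, so the surface always keeps the newest ready state and never regresses to having no valid buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define COMMIT_QUEUE_LEN 8

/* Hypothetical per-surface commit queue entry. `ready` stands in for
 * "the acquire fence attached to this commit has signalled". */
struct commit {
    int buffer_id; /* stand-in for the full committed surface state */
    bool ready;
};

struct commit_queue {
    struct commit entries[COMMIT_QUEUE_LEN];
    size_t len;
    int current_buffer; /* last state actually applied; -1 if none */
};

static bool commit_queue_push(struct commit_queue *q, int buffer_id, bool ready)
{
    if (q->len == COMMIT_QUEUE_LEN) {
        return false; /* full; a real impl would coalesce or wait */
    }
    q->entries[q->len++] = (struct commit){ .buffer_id = buffer_id, .ready = ready };
    return true;
}

/* At draw time: apply the leading run of ready commits, in commit
 * order, and drop them from the queue. Commits behind an unready one
 * must wait, so state is never applied out of order. Returns the
 * buffer to render this frame; the previous one is kept if nothing
 * new is ready. */
static int commit_queue_flush(struct commit_queue *q)
{
    size_t applied = 0;
    while (applied < q->len && q->entries[applied].ready) {
        q->current_buffer = q->entries[applied].buffer_id;
        applied++;
    }
    for (size_t i = applied; i < q->len; i++) {
        q->entries[i - applied] = q->entries[i]; /* shift the rest down */
    }
    q->len -= applied;
    return q->current_buffer;
}
```

The key property is that an unready commit never blocks the compositor: flush simply returns the last applied buffer, and the queued state is picked up on a later frame once its fence signals.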

We also need to store ALL state tied to the surface commit, not just the buffer. Many protocol extensions affect rendering, so they need to be tied to the queue too. I don't particularly want to manage a queue of states inside of every single wl_surface-related protocol we implement, so that's why I proposed #1546, where we create a central place where state can be managed properly.

I believe my proposal should work for this, but there is still one pretty large nightmare protocol that still needs to be sorted: wl_subsurface. Queues blocking other queues and all sorts of shit.
I haven't finalized a design which I know will work for this yet, but I'm trying some implementations.

Anyway, that's my brain-dump about this protocol and hopefully justifies the changes I want to make in #1546.
As it turns out, if we implement it the "good" way, we don't even need to change the renderer. We'll just poll the sync-files, and buffers will be guaranteed to be ready by the time the renderer receives them.

I asked a question about this on wayland-devel, so you can see the thread for some more info: https://lists.freedesktop.org/archives/wayland-devel/2019-February/040066.html
I also had an off-list discussion with Daniel, where he confirms the necessity for a commit/buffer queue.

@emersion
Member

emersion commented Apr 20, 2019

Here's a test target for a client blocking the whole pipeline: https://gitlab.collabora.com/daniels/texture-atlas-test/blob/master/obnoxious-fbo-load.c#L92

Another one: https://github.com/ascent12/compositor-killer

(Warning: don't try it at home)

@emersion
Member

As a side note, explicit synchronization should also help with e.g. buggy Vulkan programs that mess up synchronization and submit jobs that never complete. Here's an example: https://gitlab.freedesktop.org/drm/amd/-/issues/702#note_650470
