This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Add support for GL_EXT_robustness #2705

Open
emersion opened this issue Feb 1, 2021 · 13 comments

Comments

@emersion
Member

emersion commented Feb 1, 2021

Would allow us to recover from GPU resets. We'll need to trash all of our renderer state and re-create it.

See:


wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/2705

@emersion
Member Author

On amdgpu, a manual GPU reset can be triggered with /sys/kernel/debug/dri/0/amdgpu_gpu_recover.

@zzag

zzag commented Apr 21, 2021

You will have to trash renderer state and also client buffer textures. This means that you will need to keep a wl-shm client buffer referenced even after uploading data to the opengl texture.

@emersion
Member Author

emersion commented Apr 21, 2021

Support for robustness could look like this:

  • In wlr_renderer_bind_buffer, check glGetGraphicsResetStatusEXT. If it doesn't return GL_NO_ERROR, poll it at some sleep interval, with a timeout, until the GPU is ready again.
  • Re-initialize the renderer's internal state: EGL contexts, shaders, and so on.
  • Destroy all wlr_gles2_buffer objects.
  • Re-import all wlr_texture from their original wlr_buffer, if any (needs "Cache and re-use DMA-BUF textures", #2851). Otherwise make the texture "inert": destroy the GL texture but keep the wlr_texture alive to avoid crashing compositors.
  • Emit a wlr_renderer.events.reset event so that compositors can re-upload any texture that they created, and re-create their own GL state if any.
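The first step of the list above (poll the reset status with a sleep interval and a timeout) could be sketched roughly as follows. This is only an illustration, not wlroots code: `reset_status_fn`, `wait_for_gpu_ready` and `fake_reset_status` are hypothetical names, and the status query is passed in as a callback so the sketch does not need a live GL context. In a real renderer the callback would wrap glGetGraphicsResetStatusEXT.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: a real renderer would wrap
 * glGetGraphicsResetStatusEXT() from GL_EXT_robustness here. */
typedef unsigned int (*reset_status_fn)(void);

#define STATUS_NO_ERROR 0u /* same value as GL_NO_ERROR */

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms) {
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};
	nanosleep(&ts, NULL);
}

/* Poll the reset status until it reports no error or the timeout expires.
 * Returns true if the GPU became usable again within timeout_ms. */
static bool wait_for_gpu_ready(reset_status_fn get_status,
		long interval_ms, long timeout_ms) {
	long waited_ms = 0;
	while (get_status() != STATUS_NO_ERROR) {
		if (waited_ms >= timeout_ms) {
			return false;
		}
		sleep_ms(interval_ms);
		waited_ms += interval_ms;
	}
	return true;
}

/* Stand-in for the driver, for illustration only: reports a reset for the
 * first few polls, then reports that the GPU is healthy again. */
static int fake_polls_left = 3;

static unsigned int fake_reset_status(void) {
	if (fake_polls_left > 0) {
		fake_polls_left--;
		return 0x8253; /* GL_GUILTY_CONTEXT_RESET_EXT */
	}
	return STATUS_NO_ERROR;
}
```

Calling `wait_for_gpu_ready(fake_reset_status, 1, 100)` returns true once the stand-in stops reporting a reset. The remaining steps (re-creating the EGL context, re-importing textures, emitting the event) would only run after this returns true.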

@emersion
Member Author

> This means that you will need to keep a wl-shm client buffer referenced even after uploading data to the opengl texture.

Indeed. We can't re-upload the buffer after we've released the buffer, because the client might be in the process of rendering to it, so its contents can be garbage.
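A minimal sketch of what "keeping the buffer referenced" means, using simplified stand-in types rather than the real wlroots structs (`client_buffer`, `texture` and every function below are hypothetical). The idea is similar in spirit to wlroots' buffer locking: the texture holds a lock on its source buffer for its whole lifetime, so the release to the client is delayed until the texture is destroyed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a wl_shm client buffer. */
struct client_buffer {
	int locks;     /* number of compositor-side users */
	bool released; /* true once wl_buffer.release would have been sent */
};

/* Hypothetical texture that remembers where its pixels came from. */
struct texture {
	struct client_buffer *source;
};

static void buffer_lock(struct client_buffer *buf) {
	buf->locks++;
}

static void buffer_unlock(struct client_buffer *buf) {
	if (--buf->locks == 0) {
		/* From here on the client may overwrite the contents. */
		buf->released = true;
	}
}

/* Upload: take a lock that lives as long as the texture does. */
static void texture_upload(struct texture *tex, struct client_buffer *buf) {
	buffer_lock(buf);
	tex->source = buf;
}

/* After a GPU reset, re-uploading is only safe while the lock is held. */
static bool texture_can_reupload(const struct texture *tex) {
	return tex->source != NULL && !tex->source->released;
}

static void texture_destroy(struct texture *tex) {
	if (tex->source != NULL) {
		buffer_unlock(tex->source);
		tex->source = NULL;
	}
}
```

The cost of this design is that the client does not get its buffer back until it attaches a new one, which is the non-immediate release discussed in the following comments.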

@emersion
Member Author

> Emit a wlr_renderer.events.reset event so that compositors can […] re-create their own GL state if any

This might be slightly more complicated: we need compositors to destroy their old GL state before we destroy the EGL context, and re-create it after we've established a new EGL context. We might need two events (meh API), or to keep the old EGL context alive up to wlr_renderer.events.reset.
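The two-event variant could look roughly like the sketch below. None of these names exist in wlroots: `reset_listener`, `context_lost` and `context_restored` are made up for illustration, and the EGL teardown and setup are empty placeholders that only record the order in which things happen.

```c
#include <stddef.h>

/* Hypothetical listener pair; wlroots has no such callbacks. */
struct reset_listener {
	void (*context_lost)(void *data);     /* old context still alive */
	void (*context_restored)(void *data); /* new context established */
	void *data;
};

/* Event log, used only to make the ordering observable in this sketch. */
static char event_log[16];
static size_t event_count;

static void log_event(char c) {
	if (event_count < sizeof(event_log) - 1) {
		event_log[event_count++] = c;
	}
}

/* Placeholders for the renderer's own EGL teardown and setup. */
static void destroy_egl_context(void) { log_event('D'); }
static void create_egl_context(void) { log_event('C'); }

/* Compositors drop their GL state first, then the renderer swaps the
 * context, then compositors re-create their GL state. */
static void renderer_handle_reset(const struct reset_listener *listeners,
		size_t n) {
	for (size_t i = 0; i < n; i++) {
		listeners[i].context_lost(listeners[i].data);
	}
	destroy_egl_context();
	create_egl_context();
	for (size_t i = 0; i < n; i++) {
		listeners[i].context_restored(listeners[i].data);
	}
}

/* Example compositor callbacks. */
static void compositor_lost(void *data) { (void)data; log_event('l'); }
static void compositor_restored(void *data) { (void)data; log_event('r'); }
```

The single-event alternative mentioned above would instead keep the old context alive until the one event fires, at the price of having two contexts around for a short time.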

@zzag

zzag commented Apr 21, 2021

> Indeed. We can't re-upload the buffer after we've released the buffer, because the client might be in the process of rendering to it, so its contents can be garbage.

Beware though, there are applications in the wild that assume the compositor will release a shm buffer right after uploading its data to an OpenGL texture, the most prominent example being Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1693472

@emersion
Member Author

Oh, wow. That's pretty gross.

@dnkl

dnkl commented May 6, 2021

> This means that you will need to keep a wl-shm client buffer referenced even after uploading data to the opengl texture.

Is there no way around this? I completely agree that clients cannot assume a buffer is released immediately. But it does enable very nice optimizations when it is released immediately, especially considering rendering is done on the CPU.

Being able to recover from GPU resets is of course an extremely nice thing to have. It just seems somewhat weird to incur this kind of performance penalty for something that should, ideally, never happen.

But I guess one could argue that performance critical applications should use GL, not shm...

@emersion
Member Author

emersion commented May 6, 2021

One way around this would be to add a protocol to let the compositor ask the client to submit a new frame. wl_surface.frame isn't enough, because (1) it is requested by the client and (2) it doesn't require the client to submit a new buffer (e.g. if nothing changed on screen, there is no need to redraw).

Non-immediate release shouldn't have too much of a CPU usage cost, I think? It does have a memory cost though.

@kennylevinsen
Member

How about sending a configure to all surfaces? That is likely to provoke a new frame.

Granted, it's not guaranteed to work, but clients might not use a dedicated recovery protocol either, so a backup plan might be in order...

@dnkl

dnkl commented May 6, 2021

> Non-immediate release shouldn't have too much of a CPU usage cost, I think? It does have a memory cost though.

Clients can choose between a couple of strategies I think.

  1. Re-render the new frame from scratch. Potentially very expensive.
  2. memcpy() the old frame, then apply current frame's damage.
  3. Try to re-apply the damage from the old frame, before applying the new frame's damage.

("damage" referring to the client's internal damage tracking)

For reference, foot currently does 2), but I'm going to test 3). I should have some performance numbers after that (see https://codeberg.org/dnkl/foot/issues/478).
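To make the difference between strategies 2) and 3) concrete, here is a rough sketch. It is not foot's code: `row_damage` and both functions are hypothetical, damage is simplified to runs of full rows, and "re-applying" old damage is modeled as copying those rows from the previous buffer, which is only one way a client might do it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical damage record: the full rows [first_row, first_row + rows). */
struct row_damage {
	size_t first_row;
	size_t rows;
};

/* Strategy 2: copy the whole previous frame into the new buffer. */
static void prepare_copy_all(uint32_t *dst, const uint32_t *prev,
		size_t width, size_t height) {
	memcpy(dst, prev, width * height * sizeof(*dst));
}

/* Strategy 3: bring over only what changed in the previous frame.
 * This is only correct if dst already holds the frame before prev,
 * as is the case when two buffers are used alternately. */
static void prepare_replay_damage(uint32_t *dst, const uint32_t *prev,
		size_t width, const struct row_damage *prev_damage, size_t n) {
	for (size_t i = 0; i < n; i++) {
		size_t offset = prev_damage[i].first_row * width;
		memcpy(dst + offset, prev + offset,
			prev_damage[i].rows * width * sizeof(*dst));
	}
}
```

After either preparation step the buffer matches the previous frame, and the client then applies the current frame's damage on top. Strategy 3 touches far less memory when the previous frame changed only a few rows.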

> One way around this would be to add a protocol to let the compositor ask the client to submit a new frame.

> How about sending a configure to all surfaces? That is likely to provoke a new frame.

Both alternatives crossed my mind as well. I think a new protocol would be more robust? But like @kennylevinsen said, perhaps a configure event can be used as a fallback mechanism for clients not implementing the new protocol?

@emersion
Member Author

emersion commented May 6, 2021

> How about sending a configure to all surfaces? That is likely to provoke a new frame.

Yeah, but some clients might just realize the configure event doesn't change anything, and ack it without attaching a new buffer.

For clients that don't support the "please redraw" protocol, we could always hold the wl_buffer longer, just as described above. The protocol could be designed with a per-surface addon object, so that the compositor knows exactly which surface can be redrawn on-demand.
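The per-surface decision described here could be sketched as below. Everything in it is hypothetical: no such protocol exists yet, and `surface_state`, `has_redraw_addon` and the `reset_action` values are invented names for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-surface bookkeeping in the compositor. */
struct surface_state {
	bool has_redraw_addon; /* client created the "please redraw" object */
	bool buffer_held;      /* compositor still holds the last wl_buffer */
};

enum reset_action {
	RESET_REQUEST_REDRAW,       /* ask the client for a new frame */
	RESET_REUPLOAD_HELD_BUFFER, /* re-upload from the buffer we kept */
	RESET_INERT_TEXTURE,        /* nothing to recover the contents from */
};

/* On commit: may the shm buffer be released right after the upload? */
static bool can_release_after_upload(const struct surface_state *s) {
	return s->has_redraw_addon;
}

/* After a GPU reset: how to get this surface's contents back. */
static enum reset_action action_after_reset(const struct surface_state *s) {
	if (s->has_redraw_addon) {
		return RESET_REQUEST_REDRAW;
	}
	if (s->buffer_held) {
		return RESET_REUPLOAD_HELD_BUFFER;
	}
	return RESET_INERT_TEXTURE;
}
```

With this split, only clients that opt in get immediate release, and all other clients pay the memory cost of a held buffer.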

> (3) Try to re-apply the damage from the old frame, before applying the new frame's damage.

This seems like the best solution.

@dnkl

dnkl commented May 8, 2021

Here are some initial numbers from the foot PR that implements (3). While the PR still needs more testing, I believe the performance numbers are close to what we'll see in the end:

Sway 1.6, wlroots 0.13.0
Terminal size: 135x67 cells
Surface size: 953x1024
(CPU: i5-8250U CPU @ 1.60GHz 4/8 cores/threads, 6MB L3)

Times are in microseconds (µs).

Numbers in parentheses are the time taken to “prepare” the buffer before applying the current frame’s damage (hence it’s always zero in the “Immediate release” column).

Not covered here: ignoring the old buffer content and instead re-rendering the entire frame (on this setup that takes roughly 3000-5000 µs).

Edit: since this wasn't mentioned anywhere, the table shows the average rendering time for a single frame.

| Benchmark | Immediate release [1] | Damage tracking [2] | Copy last frame [3] |
|---|---|---|---|
| typing [4] | 143.6 ±40.1 (0.0 ±0.0) | 181.6 ±35.3 (35.2 ±3.2) | 1404.2 ±135.2 (1212.2 ±120.7) |
| cursor movement [5] | 165.9 ±39.9 (0.0 ±0.0) | 231.5 ±33.8 (74.2 ±5.2) | 1246.6 ±113.9 (1094.6 ±103.4) |
| scrolling [6] | 540.6 ±106.7 (0.0 ±0.0) | 854.5 ±91.1 (350.6 ±29.1) | 1677.8 ±282.8 (1082.8 ±213.2) |

Observations:

  • double buffering damage is a very clear improvement over doing a dumb copy of the last frame (which in turn is a huge improvement over re-rendering the frame from scratch).
  • scrolling is a fairly expensive operation. This is exacerbated when we’re forced to do it twice (350µs vs. 35-75µs).
  • a simple memcpy() is a fairly expensive operation on buffers as large as these. Also, in addition to taking time, it can easily thrash the cache, slowing things down further.
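The relative cost of each strategy can be read off the mean frame times with a small calculation. The sketch below only restates the means from the table (the ± spreads are ignored); `bench_row`, `slowdown` and `print_slowdowns` are names made up for this example.

```c
#include <stdio.h>

/* Mean frame times in microseconds, copied from the table above. */
struct bench_row {
	const char *name;
	double immediate; /* immediate release */
	double damage;    /* damage tracking */
	double copy;      /* copy last frame */
};

static const struct bench_row rows[] = {
	{ "typing", 143.6, 181.6, 1404.2 },
	{ "cursor movement", 165.9, 231.5, 1246.6 },
	{ "scrolling", 540.6, 854.5, 1677.8 },
};

/* How many times slower a strategy is than the immediate-release baseline. */
static double slowdown(double time_us, double baseline_us) {
	return time_us / baseline_us;
}

/* Print one line per benchmark with both slowdown factors. */
static void print_slowdowns(void) {
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		printf("%-16s damage tracking %.2fx, copy last frame %.2fx\n",
			rows[i].name,
			slowdown(rows[i].damage, rows[i].immediate),
			slowdown(rows[i].copy, rows[i].immediate));
	}
}
```

On these means, damage tracking comes out at roughly 1.3x to 1.6x the immediate-release time, while copying the last frame is roughly 3x to 10x.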

Footnotes

  1. no double buffering, foot re-uses previous frame’s buffer

  2. foot re-applies last frame’s damage before applying current frame’s damage

  3. foot copies the old buffer (all of it) before applying current frame’s damage

  4. running cat - in the shell, at the bottom of the screen, typing a single letter at a time

  5. large C file in vim, moving cursor with arrow keys without scrolling the content

  6. large C file in vim, scrolling content by holding down arrow key
