Add opaque region or isOpaque hint #1871

Closed
rmader opened this issue Jun 23, 2021 · 13 comments

@rmader

rmader commented Jun 23, 2021

This is a follow-up to #1425 / #1474

IIUC the newly introduced GPUSwapChainAlphaMode=opaque hint can have overhead depending on the implementation, as the implementation has to e.g. fill in alpha values. As already pointed out in #1425 (comment), many OSs, including macOS[1], Android, Wayland[2], X11[3] and maybe Windows, allow clients to provide a hint of the form "please composite this assuming the alpha channel is actually all 1, with undefined results if it's not." These are usually realized either as a region (Wayland/X11) or as a single boolean for a surface (macOS).

Such an opt-in flag should make it possible to save a canvas-sized blit on many implementations by skipping any blending. Most importantly, it would move the responsibility to the client: provide content in an easy-to-optimize way and you become faster.
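To illustrate why such a hint lets a compositor skip blending, here is a minimal CPU sketch, assuming premultiplied-alpha "source-over" compositing. The names `sourceOver` and `copyOpaque` are hypothetical and not part of any compositor API; they only model the arithmetic per pixel.

```typescript
// One pixel, premultiplied alpha, components in [0, 1].
type Rgba = [number, number, number, number];

// Standard premultiplied "source-over": needs to read the destination.
function sourceOver(src: Rgba, dst: Rgba): Rgba {
  const k = 1 - src[3];
  return [
    src[0] + dst[0] * k,
    src[1] + dst[1] * k,
    src[2] + dst[2] * k,
    src[3] + dst[3] * k,
  ];
}

// What a compositor may do when told the surface is opaque:
// a plain copy that never reads the destination (assumption: alpha is treated as 1).
function copyOpaque(src: Rgba): Rgba {
  return [src[0], src[1], src[2], 1];
}

const pageBehind: Rgba = [1, 1, 1, 1];

// Hint is honest (alpha really is 1): both paths agree.
const opaquePixel: Rgba = [0.2, 0.4, 0.6, 1];
const blended = sourceOver(opaquePixel, pageBehind);
const copied = copyOpaque(opaquePixel);

// Hint is wrong (alpha is 0.5): the two paths diverge, which is the
// "undefined results" case discussed in this thread.
const translucentPixel: Rgba = [0.1, 0.2, 0.3, 0.5];
const blendedWrong = sourceOver(translucentPixel, pageBehind);
const copiedWrong = copyOpaque(translucentPixel);
```

When alpha is 1 the blend factor `k` is 0, so the destination contributes nothing and the copy is equivalent.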

As this kind of hint has been well established in OS compositors and is available to native clients, I think we should have an equivalent in WebGPU as well.

cc: @kvark, @magcius


1: https://developer.apple.com/documentation/appkit/nsview/1483558-isopaque
2: https://wayland.freedesktop.org/docs/html/apa.html#protocol-spec-wl_surface -> set_opaque_region
3: https://specifications.freedesktop.org/wm-spec/wm-spec-latest.html#idm46291029692400

@magcius

magcius commented Jun 23, 2021

If the user sets GPUSwapChainAlphaMode = opaque, then we should be able to set the system compositing flag as well, assuming the surface is using the system compositor. The only difference is that the "undefined results" isn't good for the web. e.g. somebody authors web content on Windows, which has a "composite as opaque" flag. If they don't author content correctly and don't notice, that content then starts behaving differently on other platforms, which they may not be able to immediately test.

Thankfully, we can fix this by running an extra pass that clears alpha to 1.0 before present. As far as I'm aware, this should be relatively cheap on all supported targets. No blending is done, and the number of blits is the same. It's still a massive performance improvement for platforms that support the "assume opaque" flag.
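As a rough CPU stand-in for the alpha-clearing pass described above, assuming a tightly packed RGBA8 framebuffer (the function name `clearAlphaToOne` is hypothetical; a real implementation would run on the GPU):

```typescript
// Force the alpha byte of every RGBA8 pixel to 255 (i.e. 1.0), leaving RGB untouched.
function clearAlphaToOne(rgba8: Uint8Array): void {
  for (let i = 3; i < rgba8.length; i += 4) {
    rgba8[i] = 255;
  }
}

// Two pixels with alpha 0 and 128.
const frame = new Uint8Array([10, 20, 30, 0, 40, 50, 60, 128]);
clearAlphaToOne(frame);
// frame is now [10, 20, 30, 255, 40, 50, 60, 255]
```

After this pass the content is safe to composite as opaque regardless of what the application wrote into the alpha channel.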

Am I missing something?

@jrmuizel

Isn't the extra pass going to be an entire framebuffer's worth of memory writes? That's not particularly cheap on low-end Intel GPUs.
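For a sense of scale, a back-of-the-envelope estimate of those writes, assuming the worst case where the pass touches every whole pixel of a 4-bytes-per-pixel target. The function names and the 1920x1080 at 60 fps figures are illustrative assumptions, not measurements from this thread.

```typescript
// Bytes written by one full-framebuffer pass (worst case: whole pixels are rewritten).
function alphaClearBytesPerFrame(width: number, height: number, bytesPerPixel = 4): number {
  return width * height * bytesPerPixel;
}

// Sustained write traffic if the pass runs once per presented frame.
function alphaClearBytesPerSecond(width: number, height: number, fps: number, bytesPerPixel = 4): number {
  return alphaClearBytesPerFrame(width, height, bytesPerPixel) * fps;
}

const perFrame = alphaClearBytesPerFrame(1920, 1080);       // 8294400 bytes (about 8.3 MB)
const perSecond = alphaClearBytesPerSecond(1920, 1080, 60); // 497664000 bytes (about 0.5 GB/s)
```

Whether that is significant depends on the memory bandwidth of the device, which this sketch does not model.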

@kvark
Contributor

kvark commented Jun 23, 2021

@rmader thank you for filing this! Issues with proper references are the best :)
I read it twice in order to understand the actual proposal, so I'm going to re-state it here.

The proposal is to add a hint API allowing the implementation (on some platforms) to reduce potential overhead when filling out the alpha with 1. It could be a boolean "it's already 1, I promise!", or some sort of a region that is promised to have alpha of 1.

The general portability concern would apply, like @magcius noted. We aren't going to be checking if the hint is correct, so we'd end up with non-portable behavior if the hint is wrong.

@jrmuizel

Isn't the extra pass going to be an entire framebuffer's worth of memory writes? That's not particularly cheap on low-end Intel GPUs.

In the last lengthy discussion, #1425 (comment) confirms that the performance overhead is not a big concern, based on https://bugs.chromium.org/p/chromium/issues/detail?id=1045643#c11 investigation:

Experiment 2: https://chromium-review.googlesource.com/2287369
Clear the alpha channel at the end of the frame. Attempted to do this both against multisampled renderbuffer and resolved texture; same result.
Had rendering artifacts around the pinball in this example.
Result: 99%

@rmader
Author

rmader commented Jun 23, 2021

@kvark: thanks :) and yes, your recap sounds right to me.

Concerning the alpha clear: if that is really super cheap and 100% correct on different architectures (we may even care about software implementations?), well, then we are good. I find the evidence for that rather small so far and wonder why OS compositors AFAIK haven't adopted such an approach. From a Mutter dev perspective that would be great news of course :)
I guess it would be good to have some input from GPU vendors here.

The only difference is that the "undefined results" isn't good for the web.

I'm not familiar with web standard development, but if the flag is opt-in and well documented, would it really be too bad for something as complex as WebGPU? I'd imagine there are plenty of ways to do things wrong and still accidentally get good results :/

@Kangz
Contributor

Kangz commented Jun 23, 2021

Another idea discussed a long time ago was to have a special texture format that you can only render to that's rgbx8unorm. When using it we would put alpha-false in the write mask such that alpha is guaranteed to stay what it was at the beginning of the pass (and it would start at 1). This way there is no overhead for the compositor at all. However it is more spec complexity and less flexibility for the application.
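A small CPU model of the write-mask idea, assuming flag values that mirror WebGPU's `GPUColorWrite` bits (RED = 0x1, GREEN = 0x2, BLUE = 0x4, ALPHA = 0x8, ALL = 0xF). The `ColorWrite` object and `writePixel` helper are hypothetical stand-ins, not browser API.

```typescript
// Local mirror of the color write flags (assumed to match GPUColorWrite).
const ColorWrite = { RED: 0x1, GREEN: 0x2, BLUE: 0x4, ALPHA: 0x8, ALL: 0xf } as const;

// Write one RGBA8 pixel, honoring the per-channel write mask.
function writePixel(
  dst: Uint8Array,
  offset: number,
  src: [number, number, number, number],
  writeMask: number,
): void {
  const bits = [ColorWrite.RED, ColorWrite.GREEN, ColorWrite.BLUE, ColorWrite.ALPHA];
  for (let c = 0; c < 4; c++) {
    if (writeMask & bits[c]) {
      dst[offset + c] = src[c];
    }
  }
}

// The target starts with alpha = 255, as the pass would in this proposal.
const target = new Uint8Array([0, 0, 0, 255]);

// "rgbx"-style mask: everything except alpha.
const rgbOnly = ColorWrite.ALL & ~ColorWrite.ALPHA; // 0x7

// The shader tries to write alpha = 0, but the mask discards it.
writePixel(target, 0, [200, 100, 50, 0], rgbOnly);
// target is now [200, 100, 50, 255]
```

Because alpha can never be overwritten, no clear is needed before handing the texture to the compositor.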

I'm not familiar with web standard development, but if the flag is opt-in and well documented, would it really be too bad for something as complex as WebGPU? I'd imagine there are plenty of ways to do things wrong and still accidentally get good results :/

One promise of the Web is effortless portability where your page will work exactly the same on your system as other systems. When practical we try to keep this property in WebGPU. This prevents writing code that works on one browser and breaks on others (see all these "works best in XXX" pages).

@kvark
Contributor

kvark commented Jun 23, 2021

The general approach to building a web API here is to minimize the chances that something works on one platform but doesn't work on another. If there is a failure, and it's platform-specific, it should happen as early as possible. E.g. if your program requests higher-than-base limits for the device, it will fail to request the logical device on some platforms. So this is a bold and explicit failure, done early.
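The "fail early and explicitly" pattern can be sketched as follows. This is a simplified model, not the WebGPU API: `requestLogicalDevice` and the `Limits` type are hypothetical, every limit is treated as a maximum, and the numeric values are only illustrative.

```typescript
type Limits = Record<string, number>;

// Reject up front if any requested limit exceeds what the platform supports,
// instead of letting the program fail later in a platform-specific way.
function requestLogicalDevice(supported: Limits, required: Limits): { limits: Limits } {
  for (const [name, value] of Object.entries(required)) {
    const max = supported[name];
    if (max === undefined || value > max) {
      throw new Error(`required limit "${name}" = ${value} is not supported by this adapter`);
    }
  }
  return { limits: { ...required } };
}

const adapterLimits: Limits = { maxTextureDimension2D: 8192 };

// Succeeds: within what the adapter offers.
const device = requestLogicalDevice(adapterLimits, { maxTextureDimension2D: 4096 });

// Fails immediately: the request is above the adapter's limit.
let earlyFailure = "";
try {
  requestLogicalDevice(adapterLimits, { maxTextureDimension2D: 16384 });
} catch (e) {
  earlyFailure = (e as Error).message;
}
```

The application learns about the platform difference at device creation time rather than through subtly different rendering.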

@kainino0x
Contributor

The general portability concern would apply, like @magcius noted. We aren't going to be checking if the hint is correct, so we'd end up with non-portable behavior if the hint is wrong.

We already have this problem with non-opaque canvases (in WebGPU and WebGL): if you output pixels with R>A or G>A or B>A then you get undefined compositing results (not undefined web-observable behavior, notably). I think it would be palatable to extend this to opaque canvases.
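The premultiplied-alpha condition mentioned here can be stated as a one-line check. The helper name `isValidPremultiplied` is hypothetical and components are assumed to be normalized to [0, 1].

```typescript
// A premultiplied color is well-formed only if no color channel exceeds alpha.
function isValidPremultiplied(r: number, g: number, b: number, a: number): boolean {
  return r <= a && g <= a && b <= a;
}

const wellFormed = isValidPremultiplied(0.5, 0.2, 0.1, 0.5); // true
const illFormed = isValidPremultiplied(0.8, 0.2, 0.1, 0.5);  // false: R > A
```

Pixels that fail this check are the ones for which compositing results are already undefined on non-opaque canvases.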

My understanding here: #1425 (comment) is that it would benefit macOS and Android.

@kainino0x kainino0x added this to Needs Discussion in Main Jul 17, 2021
@kainino0x
Contributor

@jdashg and I chatted about this and we think we should seriously consider the special-storeOp solution before this one.

Originally posted by @kainino0x in #1425 (comment)

@kvark had an intriguing idea on chat:

yeah, that's a bit of an issue.
I wonder if storeOp = "present" could be a thing

I think how this would work is, if the render target is a swap chain texture, and the app submit()s work using storeOp: "present", then we would inject a clear if needed and early-detach the swap chain texture (so it can't be accessed anymore). If a canvas texture didn't use storeOp: "present", the browser would potentially inject a whole extra render pass to clear the alpha channel (and we could warn if this occurs). Browsers could also choose a simpler implementation where they just always do that instead of the optimized injected clear.
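The decision described above could be modeled roughly like this. It is only a sketch of the idea floated in this thread: `storeOp: "present"` is a proposal here, and `planPresent`, `PresentPlan`, and the reading of "if needed" as "the canvas is configured as opaque" are assumptions made for illustration.

```typescript
type PresentPlan =
  | "no-clear-needed"                 // canvas is not opaque, alpha is used as written
  | "inject-clear-and-detach-early"   // cheap path: clear folded into the app's pass
  | "extra-clear-pass-with-warning";  // fallback: browser adds a whole extra pass

// Decide how the browser would make an opaque canvas safe to present.
function planPresent(canvasIsOpaque: boolean, usedPresentStoreOp: boolean): PresentPlan {
  if (!canvasIsOpaque) {
    return "no-clear-needed";
  }
  return usedPresentStoreOp
    ? "inject-clear-and-detach-early"
    : "extra-clear-pass-with-warning";
}

const optimized = planPresent(true, true);   // "inject-clear-and-detach-early"
const fallback = planPresent(true, false);   // "extra-clear-pass-with-warning"
const untouched = planPresent(false, false); // "no-clear-needed"
```

A browser choosing the simpler implementation mentioned above would always take the fallback branch for opaque canvases.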

@Kangz
Contributor

Kangz commented Jul 19, 2021

we would inject a clear if needed

Would that clear be at the beginning of the render pass (at which point it becomes something kind of observable through alphaBlend) or is it a fullscreen quad with writeMask=alpha at the end of the render pass?

@magcius

magcius commented Jul 19, 2021

Would storeOp: "present" be an optional way to speed up performance, or would it be required to present? If required, would it be required everywhere, or just if you have compositingMode: "opaque" set on your swap chain config?

@kainino0x
Contributor

Would that clear be at the beginning of the render pass (at which point it becomes something kind of observable through alphaBlend) or is it a fullscreen quad with writeMask=alpha at the end of the render pass?

Fullscreen quad.

Would storeOp: "present" be an optional way to speed up performance, or would it be required to present? If required, would it be required everywhere, or just if you have compositingMode: "opaque" set on your swap chain config?

Optional. Browsers might issue a warning in some cases.

@kainino0x
Contributor

I think this can be closed in favor of #1988 which discusses several possible solutions to this problem.

@kainino0x
Contributor

(It says "get some implementation experience" - but once we do, we can finish resolving it in that issue.)

@kainino0x kainino0x moved this from Needs Discussion to Specification Done in Main Jan 19, 2022
@kainino0x kainino0x moved this from Specification Done to Needs Discussion in Main Jan 19, 2022
@kainino0x kainino0x moved this from Needs Discussion to No Action in Main Jan 19, 2022