-
Notifications
You must be signed in to change notification settings - Fork 26.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671
Comments
The initial reason for using libtess2 is that it is easy to start with an engine that can render everything that Flutter supports without needing to add special cases. I don't think we're necessarily married to it as an implementation strategy. Certainly, libtess2 can be very expensive in terms of CPU utilization. I think we have a good path forward for strokes and convex fills, we're in the process of switching those over to simpler routines and leaning more on the GPU to determine geometry. For concave / complex polygons, in my experience this occur rarely but are very expensive to tessellate. We're certainly open to solutions, provided these solutions are fairly general purpose and don't require a combinatorial explosion of shaders |
For the linked code, it seems like it requires some analytic AA solution. If that's not the case we should explore it more. If it is the case we'd have to figure out some efficient and usable way of doing the AA. |
In terms of concrete implementation, my understanding is that stencil-then-cover means using a derivative of this trick. Impeller defers antialiasing to pass resolve time, and so AA shouldn't be a concern. As Jonah mentioned, for all convex fills, we're adding specialized paths that don't involve running a generic tessellation algorithm. Concave paths is where "stencil then cover" shines. In the ideal case, we get to trade off running a poorly scaling tessellation algorithm on the CPU for an extra draw call and a bit of stencil overdraw. Clearly a good deal! What's not clear to me is how we can use a technique like this to efficiently implement the
One potential way to eke out gains for |
The problem is that if the geometry supplied is a quad, MSAA will only see the edges of the quad - the stencil and cover techniques would be in a fragment stage |
The particular variation I'm describing doesn't have this problem. We first "stencil" by carving out the path on the stencil buffer (with no color attachment) in a way that requires no tessellation, and then "cover" by drawing a quad over the path coverage with a color attachment, culling fragments that lie outside the path just like a regular Impeller clip does. Both the stencil and color are multisample in tile memory. |
I think we could do that in compute, and I think there's probably a way to adapt this similarly for nonZero but need to mull this over. |
1. PrepareSplit the polygon into triangles, just like mentioned in the article 2. StencilBefore each stencil operation, ensure that the value on the stencil buffer is 0. We can observe that some of these triangles are counterclockwise while others are clockwise. Suppose we set the counterclockwise to be the front and the clockwise to be the back, then we can set the front stencil pass op to 3. CoverDraw a rectangle whose size is the size of the RenderTarget, when drawing, we need to set the compare op of the stencil test to NotEqual, and the stencil ref value to 0, so that the polygon can be drawn. What is the biggest difficulty for Impeller to implement this method?One reason I think Impeller doesn't work this way right now is Impeller's Clip implementation. Now Impeller's Clip implementation requires a stencil buffer, and all bits of the stencil buffer need to be used. The stencil then cover will also use all the bits of the stencil buffer, so in the current implementation of Impeller, it is difficult to introduce the algorithm of stencil then cover Does Skia use stencil then cover, how does it solve the conflict with clip implementation?Skia's gpu acceleration has two implementations, ganesh and graphite. In Ganesh, if the stenci buffer is enabled and the stencil buffer participates in the clip, only the highest bit in the stencil buffer value will be used to represent the clip, while the rest of the lower bits can be used for the stencil then cover. For the specific algorithm, please see section 3.4 of this article: https://developer.download.nvidia.cn/devzone/devcenter/gamegraphics/files/opengl/gpupathrender.pdf In Graphite, clip is implemented using depth test and depth buffer, so there is no conflict anymore. |
Oh derp, this was the insight I was missing while visualizing the triangle fan in my mind. Yes, it's easy to implement
I more or less agree with this. A 255 clip stack limit is reasonable, but 8 bits isn't going to be enough if we get into the business of segmenting the stencil buffer to play multiple counting roles. And we can't reach for larger 16 bit stencils.
Yes, this makes sense. It's quite likely we'll need a depth buffer anyhow for the purpose of draw order optimization -- even if we didn't reverse opaque draw calls, some GPUs don't perform HSR if depth testing is disabled. Clips are hard barriers for reversing the draw order, and so cramming both jobs into the depth buffer would be possible without loss of effectiveness. I'll append a depth-based design to #114402 that includes clipping in the depth buffer (right now it only has a stencil-based proposal, but again, segmenting the stencil buffer isn't really practical for this). |
What if we just used a separate render pass for this? |
This would let us repurpose the stencil buffer, but we'd have to store an extra texture and add an additional texture sampling draw call on the original pass. |
I think utilizing the depth buffer for clipping and repurposing the stencil buffer for filling will be as good or better than whatever we could come up with in compute for tessellating fills. |
One major concern with adding a depth buffer is that Impeller sometimes needs to segment rendering of a single SaveLayer into multiple passes (this happens when a backdrop filter or advanced blend is present). Today, when we encounter these scenarios, we take a hit of allocating/storing the multisample stencil buffer on the device heap -- the stencil is 8 bit, and so this is the same storage price as the RGBA resolve texture. If we add a depth buffer to the mix, this cost will 3x if the depth buffer is 16 bits and 5x if the depth buffer is 32 bits. Given our use of MSAA, if we can't find a way around this without extra render passes, I think it's going to be hard to find a short term solution that's not substantially heavier than just tessellating triangle meshes like we're doing now (whether that be on the CPU or on the GPU with a compute shader). Multi-pass is kinda not the common case but also isn't rare. It might be worth having a compromise solution where we only track a depth buffer and perform stencil-then-cover when we know that multi-pass isn't needed, for example. |
The sad thing about this is that apps suddenly mysteriously drop off in performance because of a one line code change. We could still do it, but it makes users really frustrated. |
Since this is more of a discussion than an issue or proposal. I'm closing this. |
I'm reopening this issue because I think that the current tessellation performance is not good enough and there is more to discuss here. For complex animated geometry, we're currently bottlenecked on libtess2. The generic tessellation algorithm is not parallelizable so we cannot use compute. We could attempt to use worker tasks, but given the limited fast threads on Android and the fact that tessellation can easily account for > 50% of CPU time, I think there is little to gain here. In contrast, triangle generation for STC is trivially parallelizable. We just have the issue of storage of depth buffers with backdrop filters. So dumb question, rather than storing the entire depth buffer, can we instead store the depth affecting clip commands and then replay them with a fresh depth buffer? FYI @bdero @dnfield |
I don't think this is such a bad idea to try. So long as we continue not tracking draw order and feel HSR coverage good enough (wonderful on iOS, but I haven't confirmed HSR is actually working on any Android devices), dedicating the depth buffer to clips and the stencil buffer to StC seems feasible. EntityPass already internally tracks a clip coverage stack, and so keeping a running record of the entities needed to restore the clip state in the depth buffer isn't a big deal. We should also double-check/document the availability of different depth/stencil formats across Flutter's target devices. Last time I checked, we'll need to support at least S8D32 and S8D24. |
We have landed a change that removes stencil storage and replays clips as I described earlier in this issue: #137448 This technically unblocks the foundational work to begin using StC |
#47987) Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any HAL bugs with handling the attachment.
…il buffer." (#49832) Reverts #47987 Initiated by: jonahwilliams This change reverts the following previous change: Original Description: Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any HAL bugs with handling the attachment.
…l buffer. (#49838) Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any bugs with handling the attachment.
…dering. (#50856) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432
…dering. (flutter#50856) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432
…dering. (#51209) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempt: - #50856 - reverted with #51191 - fixed with #51198
… path rendering. (#51209)" (#51217) Reverts #51209 Initiated by: bdero Reason for reverting: Golden breakages on [engine roll](flutter/flutter#144647) https://flutter-gold.skia.org/search?issue=144647&crs=github&patchsets=5&corpus=flutter Original PR Author: bdero Reviewed By: {jonahwilliams, chinmaygarde} This change reverts the following previous change: Original Description: Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempt: - #50856 - reverted with #51191 - fixed with #51198
…dering. (flutter#51209) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempts: 1. flutter#50856 - reverted with flutter#51191 - fixed with flutter#51198 2. flutter#51209 - reverted with flutter#51217 - fixed with flutter#51218
…dering. (#51219) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempts: 1. #50856 - reverted with #51191 - fixed with #51198 2. #51209 - reverted with #51217 - fixed with #51218
StC was enabled with flutter/engine#51219 |
This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of |
Recently, while reading the code of Skia's new GPU backend "Graphite", I noticed that they did not use a technology like "libtess2" to triangulate paths, but instead used "stencil then cover" or its variants. Details can be found in their code (https://github.com/google/skia/blob/5d8b4bc3c7133e38dcae7b76cd708ed4649c13d1/src/gpu/graphite/Device.cpp#L1103-L1121) and in their implementation of the TessellateWedgesRenderStep (https://github.com/google/skia/blob/main/src/gpu/graphite/render/TessellateWedgesRenderStep.cpp).
In my opinion, "stencil then cover" has many benefits, such as not requiring triangulation techniques like libtess2, thus saving CPU time. Additionally, it can use a single render pass for convex paths without triangulation, and it's easier to design methods for parallel drawing of curves using GPUs without pre-computing vertices through CPU or compute shaders.
Of course, "stencil then cover" also has drawbacks, such as the need for 2-pass or 3-pass when drawing a path. However, since there is no switching of the RenderTarget during the process, it doesn't seem to have a significant impact on performance.
I'm curious as to why Impeller didn't choose the technical solution of "stencil then cover"? @chinmaygarde @bdero, would you mind enlightening me? Thank you!
The text was updated successfully, but these errors were encountered: