[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671

ColdPaleLight · 2023-03-29T07:18:40Z

Recently, while reading the code of Skia's new GPU backend "Graphite", I noticed that they did not use a technology like "libtess2" to triangulate paths, but instead used "stencil then cover" or its variants. Details can be found in their code (https://github.com/google/skia/blob/5d8b4bc3c7133e38dcae7b76cd708ed4649c13d1/src/gpu/graphite/Device.cpp#L1103-L1121) and in their implementation of the TessellateWedgesRenderStep (https://github.com/google/skia/blob/main/src/gpu/graphite/render/TessellateWedgesRenderStep.cpp).

In my opinion, "stencil then cover" has many benefits, such as not requiring triangulation techniques like libtess2, thus saving CPU time. Additionally, it can use a single render pass for convex paths without triangulation, and it's easier to design methods for parallel drawing of curves using GPUs without pre-computing vertices through CPU or compute shaders.

Of course, "stencil then cover" also has drawbacks, such as the need for 2-pass or 3-pass when drawing a path. However, since there is no switching of the RenderTarget during the process, it doesn't seem to have a significant impact on performance.

I'm curious as to why Impeller didn't choose the technical solution of "stencil then cover"? @chinmaygarde @bdero, would you mind enlightening me? Thank you!

jonahwilliams · 2023-05-09T21:38:49Z

The initial reason for using libtess2 is that it is easy to start with an engine that can render everything that Flutter supports without needing to add special cases. I don't think we're necessarily married to it as an implementation strategy.

Certainly, libtess2 can be very expensive in terms of CPU utilization. I think we have a good path forward for strokes and convex fills, we're in the process of switching those over to simpler routines and leaning more on the GPU to determine geometry.

For concave / complex polygons, in my experience this occur rarely but are very expensive to tessellate. We're certainly open to solutions, provided these solutions are fairly general purpose and don't require a combinatorial explosion of shaders

dnfield · 2023-05-09T22:28:56Z

For the linked code, it seems like it requires some analytic AA solution. If that's not the case we should explore it more. If it is the case we'd have to figure out some efficient and usable way of doing the AA.

bdero · 2023-05-10T14:08:19Z

In terms of concrete implementation, my understanding is that stencil-then-cover means using a derivative of this trick. Impeller defers antialiasing to pass resolve time, and so AA shouldn't be a concern.

As Jonah mentioned, for all convex fills, we're adding specialized paths that don't involve running a generic tessellation algorithm.

Concave paths is where "stencil then cover" shines. In the ideal case, we get to trade off running a poorly scaling tessellation algorithm on the CPU for an extra draw call and a bit of stencil overdraw. Clearly a good deal!

What's not clear to me is how we can use a technique like this to efficiently implement the nonZero winding mode, which is Flutter's default and thus the common case:

evenOdd (not Flutter's default) is trivial to implement with this trick. It would even work for self-intersecting and multi-contour paths. Performing the stencil inversion is key to erasing all of the geometry that lies outside the path. But for the other winding modes (including nonZero, which is the only other winding mode supported by Flutter's API), even if we dedicated multiple bits in the stencil for counting the overlap, a workable solution to cancel out parts of the geometry fan that lie outside the path isn't obvious to me.

One potential way to eke out gains for nonZero concave paths would be to "stencil then cover" only for cases where we can determine that a nonZero path would draw the same if it was evenOdd, and continue falling back to libtess2 for rarer cases. For example, if the path has one contour and doesn't self-intersect (an expensive check), we can be sure it'll behave the same as evenOdd -- but then we're back in the business of computing path properties that scale poorly, diminishing some of the benefit.

dnfield · 2023-05-10T20:13:43Z

AA shouldn't be a concern

The problem is that if the geometry supplied is a quad, MSAA will only see the edges of the quad - the stencil and cover techniques would be in a fragment stage

bdero · 2023-05-10T20:53:18Z

The problem is that if the geometry supplied is a quad, MSAA will only see the edges of the quad - the stencil and cover techniques would be in a fragment stage

The particular variation I'm describing doesn't have this problem.

We first "stencil" by carving out the path on the stencil buffer (with no color attachment) in a way that requires no tessellation, and then "cover" by drawing a quad over the path coverage with a color attachment, culling fragments that lie outside the path just like a regular Impeller clip does. Both the stencil and color are multisample in tile memory.

dnfield · 2023-05-10T22:59:11Z

I think we could do that in compute, and I think there's probably a way to adapt this similarly for nonZero but need to mull this over.

ColdPaleLight · 2023-05-11T03:42:30Z

What's not clear to me is how we can use a technique like this to efficiently implement the nonZero winding mode, which is Flutter's default and thus the common case

1. Prepare

Split the polygon into triangles, just like mentioned in the article

2. Stencil

Before each stencil operation, ensure that the value on the stencil buffer is 0.

We can observe that some of these triangles are counterclockwise while others are clockwise. Suppose we set the counterclockwise to be the front and the clockwise to be the back, then we can set the front stencil pass op to IncreaseWrap, and the back stencil pass op to DecreaseWrap. Then draw all the triangles. In the end, all values not equal to 0 in the stencil buffer are the area of the entire polygon

3. Cover

Draw a rectangle whose size is the size of the RenderTarget, when drawing, we need to set the compare op of the stencil test to NotEqual, and the stencil ref value to 0, so that the polygon can be drawn.

What is the biggest difficulty for Impeller to implement this method?

One reason I think Impeller doesn't work this way right now is Impeller's Clip implementation. Now Impeller's Clip implementation requires a stencil buffer, and all bits of the stencil buffer need to be used. The stencil then cover will also use all the bits of the stencil buffer, so in the current implementation of Impeller, it is difficult to introduce the algorithm of stencil then cover

Does Skia use stencil then cover, how does it solve the conflict with clip implementation?

Skia's gpu acceleration has two implementations, ganesh and graphite.

In Ganesh, if the stenci buffer is enabled and the stencil buffer participates in the clip, only the highest bit in the stencil buffer value will be used to represent the clip, while the rest of the lower bits can be used for the stencil then cover. For the specific algorithm, please see section 3.4 of this article: https://developer.download.nvidia.cn/devzone/devcenter/gamegraphics/files/opengl/gpupathrender.pdf

In Graphite, clip is implemented using depth test and depth buffer, so there is no conflict anymore.

bdero · 2023-05-11T05:31:54Z

We can observe that some of these triangles are counterclockwise while others are clockwise.

Oh derp, this was the insight I was missing while visualizing the triangle fan in my mind. Yes, it's easy to implement nonZero with this method as well, assuming we can dedicate enough stencil bits to it. Impeller doesn't support separate stencil operations for backfaces and frontfaces at the moment, but we can and should add this, as all graphics backends we'll be targeting indeed support this.

Now Impeller's Clip implementation requires a stencil buffer, and all bits of the stencil buffer need to be used. The stencil then cover will also use all the bits of the stencil buffer, so in the current implementation of Impeller, it is difficult to introduce the algorithm of stencil then cover

I more or less agree with this. A 255 clip stack limit is reasonable, but 8 bits isn't going to be enough if we get into the business of segmenting the stencil buffer to play multiple counting roles. And we can't reach for larger 16 bit stencils.

In Graphite, clip is implemented using depth test and depth buffer, so there is no conflict anymore.

Yes, this makes sense. It's quite likely we'll need a depth buffer anyhow for the purpose of draw order optimization -- even if we didn't reverse opaque draw calls, some GPUs don't perform HSR if depth testing is disabled. Clips are hard barriers for reversing the draw order, and so cramming both jobs into the depth buffer would be possible without loss of effectiveness.

I'll append a depth-based design to #114402 that includes clipping in the depth buffer (right now it only has a stencil-based proposal, but again, segmenting the stencil buffer isn't really practical for this).

dnfield · 2023-05-11T16:14:13Z

What if we just used a separate render pass for this?

bdero · 2023-05-11T16:32:06Z

What if we just used a separate render pass for this?

This would let us repurpose the stencil buffer, but we'd have to store an extra texture and add an additional texture sampling draw call on the original pass.

dnfield · 2023-05-11T16:53:43Z

I think utilizing the depth buffer for clipping and repurposing the stencil buffer for filling will be as good or better than whatever we could come up with in compute for tessellating fills.

bdero · 2023-05-11T20:29:29Z

One major concern with adding a depth buffer is that Impeller sometimes needs to segment rendering of a single SaveLayer into multiple passes (this happens when a backdrop filter or advanced blend is present). Today, when we encounter these scenarios, we take a hit of allocating/storing the multisample stencil buffer on the device heap -- the stencil is 8 bit, and so this is the same storage price as the RGBA resolve texture. If we add a depth buffer to the mix, this cost will 3x if the depth buffer is 16 bits and 5x if the depth buffer is 32 bits.

Given our use of MSAA, if we can't find a way around this without extra render passes, I think it's going to be hard to find a short term solution that's not substantially heavier than just tessellating triangle meshes like we're doing now (whether that be on the CPU or on the GPU with a compute shader).

Multi-pass is kinda not the common case but also isn't rare. It might be worth having a compromise solution where we only track a depth buffer and perform stencil-then-cover when we know that multi-pass isn't needed, for example.

dnfield · 2023-05-11T20:32:51Z

perform stencil-then-cover when we know that multi-pass isn't needed, for example.

The sad thing about this is that apps suddenly mysteriously drop off in performance because of a one line code change. We could still do it, but it makes users really frustrated.

chinmaygarde · 2023-09-08T18:52:48Z

Since this is more of a discussion than an issue or proposal. I'm closing this.

jonahwilliams · 2023-09-17T18:29:16Z

I'm reopening this issue because I think that the current tessellation performance is not good enough and there is more to discuss here. For complex animated geometry, we're currently bottlenecked on libtess2. The generic tessellation algorithm is not parallelizable so we cannot use compute. We could attempt to use worker tasks, but given the limited fast threads on Android and the fact that tessellation can easily account for > 50% of CPU time, I think there is little to gain here.

In contrast, triangle generation for STC is trivially parallelizable. We just have the issue of storage of depth buffers with backdrop filters. So dumb question, rather than storing the entire depth buffer, can we instead store the depth affecting clip commands and then replay them with a fresh depth buffer? FYI @bdero @dnfield

bdero · 2023-09-18T00:17:53Z

So dumb question, rather than storing the entire depth buffer, can we instead store the depth affecting clip commands and then replay them with a fresh depth buffer?

I don't think this is such a bad idea to try. So long as we continue not tracking draw order and feel HSR coverage good enough (wonderful on iOS, but I haven't confirmed HSR is actually working on any Android devices), dedicating the depth buffer to clips and the stencil buffer to StC seems feasible. EntityPass already internally tracks a clip coverage stack, and so keeping a running record of the entities needed to restore the clip state in the depth buffer isn't a big deal.
I suspect that In the common case (with perhaps a handful of large/mid sized coverage area clips), having to redraw the clip collection would require order(s) of magnitude less energy on the GPU than having to store and restore the absolute ocean liner that is a 32bit x4 multisample depth buffer.

We should also double-check/document the availability of different depth/stencil formats across Flutter's target devices. Last time I checked, we'll need to support at least S8D32 and S8D24.

jonahwilliams · 2023-11-01T00:30:55Z

We have landed a change that removes stencil storage and replays clips as I described earlier in this issue: #137448

This technically unblocks the foundational work to begin using StC

#47987) Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any HAL bugs with handling the attachment.

…il buffer." (#49832) Reverts #47987 Initiated by: jonahwilliams This change reverts the following previous change: Original Description: Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any HAL bugs with handling the attachment.

…l buffer. (#49838) Part of flutter/flutter#138460. In preparation for [draw order optimization](flutter/flutter#114402) and [StC](flutter/flutter#123671). Use a transient depth+stencil texture instead of a stencil-only texture. Doing this in isolation to detect/weed out any bugs with handling the attachment.

…dering. (#50856) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432

…dering. (flutter#50856) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432

…dering. (#51209) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempt: - #50856 - reverted with #51191 - fixed with #51198

… path rendering. (#51209)" (#51217) Reverts #51209 Initiated by: bdero Reason for reverting: Golden breakages on [engine roll](flutter/flutter#144647) https://flutter-gold.skia.org/search?issue=144647&crs=github&patchsets=5&corpus=flutter Original PR Author: bdero Reviewed By: {jonahwilliams, chinmaygarde} This change reverts the following previous change: Original Description: Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempt: - #50856 - reverted with #51191 - fixed with #51198

…dering. (flutter#51209) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempts: 1. flutter#50856 - reverted with flutter#51191 - fixed with flutter#51198 2. flutter#51209 - reverted with flutter#51217 - fixed with flutter#51218

…dering. (#51219) Turn the page, wash your hands. Addresses the following issues: * flutter/flutter#143077 * flutter/flutter#137714 * flutter/flutter#138460 * flutter/flutter#123671 * flutter/flutter#141961 * flutter/flutter#134432 Previous attempts: 1. #50856 - reverted with #51191 - fixed with #51198 2. #51209 - reverted with #51217 - fixed with #51218

bdero · 2024-03-12T18:53:09Z

StC was enabled with flutter/engine#51219

github-actions · 2024-03-26T19:00:41Z

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

ColdPaleLight added the e: impeller Impeller rendering backend issues and features requests label Mar 29, 2023

danagbemava-nc added the engine flutter/engine repository. See also e: labels. label Mar 30, 2023

chinmaygarde added the P3 Issues that are less important to the Flutter project label Apr 1, 2023

flutter-triage-bot bot added team-engine Owned by Engine team triaged-engine Triaged by Engine team labels Jul 8, 2023

bdero mentioned this issue Jul 21, 2023

[Impeller] Optimize draw order to minimize overdraw #114402

Open

chinmaygarde closed this as not planned Won't fix, can't repro, duplicate, stale Sep 8, 2023

jonahwilliams reopened this Sep 17, 2023

This was referenced Sep 18, 2023

Regression in 3.13.3: Lagging in iOS 16.6 iPhone high frame rates devices #134309

Closed

[Impeller] Tessellation overhead causes regression on some animations. #134432

Closed

jonahwilliams mentioned this issue Sep 30, 2023

[Impeller] Stencil then cover experiment flutter/engine#46442

Closed

jonahwilliams mentioned this issue Nov 1, 2023

[Impeller] Depth/Stencil changes for improved rendering performance of complex geometry. #137714

Closed

bdero mentioned this issue Nov 13, 2023

[Impeller] Switch from transient stencil-only to depth+stencil buffer. flutter/engine#47987

Merged

bdero mentioned this issue Nov 15, 2023

[Impeller] Perform clipping on the depth buffer. #138460

Closed

auto-submit bot mentioned this issue Jan 17, 2024

Reverts "[Impeller] Switch from transient stencil-only to depth+stencil buffer." flutter/engine#49832

Merged

bdero mentioned this issue Jan 18, 2024

[Impeller] Reland: Switch from transient stencil-only to depth+stencil buffer. flutter/engine#49838

Merged

bdero mentioned this issue Feb 22, 2024

[Impeller] Enable depth buffer clipping & Stencil-then-Cover path rendering. flutter/engine#50856

Merged

zanderso mentioned this issue Feb 29, 2024

[Impeller] Android Stable #144398

Open

5 tasks

bdero mentioned this issue Mar 5, 2024

[Impeller] Enable depth buffer clipping & Stencil-then-Cover path rendering. flutter/engine#51209

Merged

auto-submit bot mentioned this issue Mar 5, 2024

Reverts "[Impeller] Enable depth buffer clipping & Stencil-then-Cover path rendering. (#51209)" flutter/engine#51217

Merged

bdero mentioned this issue Mar 6, 2024

[Impeller] Enable depth buffer clipping & Stencil-then-Cover path rendering. flutter/engine#51219

Merged

bdero self-assigned this Mar 12, 2024

bdero closed this as completed Mar 12, 2024

github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671

[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671

ColdPaleLight commented Mar 29, 2023 •

edited by dnfield

jonahwilliams commented May 9, 2023

dnfield commented May 9, 2023

bdero commented May 10, 2023

dnfield commented May 10, 2023

bdero commented May 10, 2023

dnfield commented May 10, 2023

ColdPaleLight commented May 11, 2023

bdero commented May 11, 2023

dnfield commented May 11, 2023

bdero commented May 11, 2023

dnfield commented May 11, 2023

bdero commented May 11, 2023

dnfield commented May 11, 2023

chinmaygarde commented Sep 8, 2023

jonahwilliams commented Sep 17, 2023

bdero commented Sep 18, 2023

jonahwilliams commented Nov 1, 2023

bdero commented Mar 12, 2024

github-actions bot commented Mar 26, 2024

[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671

[Impeller] Why did Impeller opt for "triangulation then draw" over "stencil then cover"? #123671

Comments

ColdPaleLight commented Mar 29, 2023 • edited by dnfield

jonahwilliams commented May 9, 2023

dnfield commented May 9, 2023

bdero commented May 10, 2023

dnfield commented May 10, 2023

bdero commented May 10, 2023

dnfield commented May 10, 2023

ColdPaleLight commented May 11, 2023

1. Prepare

2. Stencil

3. Cover

What is the biggest difficulty for Impeller to implement this method?

Does Skia use stencil then cover, how does it solve the conflict with clip implementation?

bdero commented May 11, 2023

dnfield commented May 11, 2023

bdero commented May 11, 2023

dnfield commented May 11, 2023

bdero commented May 11, 2023

dnfield commented May 11, 2023

chinmaygarde commented Sep 8, 2023

jonahwilliams commented Sep 17, 2023

bdero commented Sep 18, 2023

jonahwilliams commented Nov 1, 2023

bdero commented Mar 12, 2024

github-actions bot commented Mar 26, 2024

ColdPaleLight commented Mar 29, 2023 •

edited by dnfield