Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert rectangle from prim to brush shader and support segments. #2146

Merged
merged 1 commit into from Dec 4, 2017

Conversation

@glennw
Copy link
Member

glennw commented Dec 1, 2017

Add support for brushes to be drawn as a list of segments that
are part of a larger primitive. Initially, these segments are
nine-patches only, although they may be expanded in the future
to work with generic segment rects.

This allows large brush primitives to be split into a number
of segments that can be split among the opaque and alpha passes,
which means that large rectangles are often rendered with the
inner rectangle as part of the opaque pass. Additionally, edge
segments are rendered in the opaque pass when the primitive does
not have a rotation or perspective transform applied.

When a rectangle is split, the clip tasks are generated only
where required for individual segments. This can significantly
reduce the amount of allocated area for clip masks, and also
the amount of GPU time spent drawing clip masks.

Previously, rectangle splitting could only work on LocalClip
items specified as part of the item, which meant that rectangles
in Gecko would never be split. Now, we calculate splitting
during prepare_prim_for_render, consulting the current clip chain
for the primitive. This means that we get the benefits of rectangle
splitting in Gecko, which doesn't use LocalClip API.

Having rectangle splitting handled as segments also means that we
don't add extra top-level primitives that take extra time to do
visibility culling on, saving some CPU time.

Future work:

  • Support more cases of primitive splitting.
  • Port images and gradients to be brushes so they automatically get splitting.

This change is Reviewable

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

For an example of what this changes, consider the rounded-rects reftest in wrench:

rr1

Previously, the clip mask render target looked like this, and each of the primitives was drawn completely in the alpha pass:

rr4

Now, there is a clip mask rendered for each clip region of interest, so we render a lot less clip mask pixels:

rr2

And, when we draw the main scene, the green segments are drawn in the opaque pass, while the red segments are drawn in the alpha pass:

rr3

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

r? @kvark and/or @nical and/or @mrobinson

(I still need to do a gecko try for this and fix any failures, but this is ready for review). Anecdotally, this makes clip mask generation time negligible on most real sites I tested on (exception where there are large inset box shadows, which still need work).

@kvark
Copy link
Member

kvark commented Dec 1, 2017

It's been a long way... the ideas we discussed in Hawaii took quite a bit to finally land. Amazing work!
I'm happy with the changes, anticipating to see how Gecko is running blazing fast :)
Left my feedback for your consideration.


Reviewed 18 of 18 files at r1, 1 of 1 files at r2, 3 of 7 files at r3.
Review status: 18 of 22 files reviewed at latest revision, 19 unresolved discussions.


webrender/res/brush.glsl, line 40 at r1 (raw file):

    int clip_address;
    int z;
    int segment_kind;

the flags there were one per edge, given the need to apply AA on some of the edges and not the others. Is this not the case any more?


webrender/res/brush.glsl, line 93 at r1 (raw file):

    vec2 p0 = brush_prim.local_rect.p0;
    vec2 p1 = brush_prim.local_rect.p0 + brush_prim.offsets.xy;

need some comments on what p0-3 are exactly


webrender/res/brush.glsl, line 102 at r1 (raw file):

            break;

        case SEGMENT_TOP_LEFT:

I wonder if it would make sense to separate X and Y here, i.e. each can be computed independently and needs 2 bits (0 - whole primitive, 1 - left/top, 2 - right/bottom, 3 - center). They could even share the same helper routine with a switch statement.


webrender/res/brush.glsl, line 139 at r1 (raw file):

    switch (brush_prim.aa_kind) {
        case AA_KIND_SEGMENT:
            break;

edge_aa_segment_mask seems to be uninitialized by this point?


webrender/src/border.rs, line 412 at r1 (raw file):

                    &inner_rect,
                    Some(&[
                        BrushSegmentKind::MidLeft,

why are we only including the middle here?


webrender/src/box_shadow.rs, line 304 at r1 (raw file):

                        },
                        None,
                        BrushAntiAliasMode::Default,

I think Primitive/Segment distinction would be cleaner for this enum than Default/Segment


webrender/src/prim_store.rs, line 256 at r1 (raw file):

    pub bottom_right_offset: LayerVector2D,
    pub segments: [BrushSegment; 9],
    pub enabled_segments: u16,

I see that you are concerned with this structure size for the fact of wrapping Option<Box<_>> to store it and pass it around. I think it's much bigger than it needs to be. Technically we only need a bit mask (enabled_segments) and 4 points to reconstruct all the needed segments wherever they are needed. This would be so small it wouldn't require a box.


webrender/src/prim_store.rs, line 263 at r1 (raw file):

        outer_rect: &LayerRect,
        inner_rect: &LayerRect,
        valid_segments: Option<&[BrushSegmentKind]>,

How about having a bitmask of the brush segment kinds, and accept the bitmask here directly instead of the slice?


webrender/src/prim_store.rs, line 274 at r1 (raw file):

                let mut enabled_segments = 0;

                for segment in valid_segments {

this is begging for fold() :)


webrender/src/prim_store.rs, line 1487 at r1 (raw file):

                let brush = &mut self.cpu_brushes[metadata.cpu_prim_index.0];

                if brush.segments.is_none() && metadata.local_rect.size.area() > 256.0 * 256.0 {

let's have the constant lower and named somewhere


webrender/src/prim_store.rs, line 1493 at r1 (raw file):

                            let local_clips = clip_store.get_opt(&clip.clip_sources).expect("bug");
                            for &(ref clip, _) in &local_clips.clips {
                                if let &ClipSource::RoundedRectangle(ref rect, ref radii, mode) = clip {

should put the ClipMode::Clip right in here


webrender/src/prim_store.rs, line 1501 at r1 (raw file):

                                        );
                                        if brush.segments.is_some() {
                                            break;

what happens with others in local_clips.clips? do we just ignore them here? I imagine we should only be splitting with a single clip


webrender/src/prim_store.rs, line 1511 at r1 (raw file):

                match brush.segments {
                    Some(ref mut segments) => {

let's rename it to segment_desc or something... too many segments :)


webrender/src/prim_store.rs, line 1513 at r1 (raw file):

                    Some(ref mut segments) => {
                        for (i, segment) in segments.segments.iter_mut().enumerate() {
                            segment.clip_task_id = if i <= BrushSegmentKind::BottomLeft as usize {

why is the comparison made with BottomLeft?


webrender/src/prim_store.rs, line 1559 at r1 (raw file):

            }
            _ => {
                let clip_task = RenderTask::new_mask(

looks to be the same code as the arm above. Can we merge?


webrender/src/prim_store.rs, line 1904 at r1 (raw file):

    radii: &BorderRadius
) -> Option<Box<BrushSegmentDescriptor>> {
    let inner = match extract_inner_rect_safe(local_clip_rect, radii) {

another thing we could possibly do: if we know the primitive is axis-aligned, we should allow T-junctions and thus generate 3 opaque parts instead of 5


webrender/src/prim_store.rs, line 1905 at r1 (raw file):

) -> Option<Box<BrushSegmentDescriptor>> {
    let inner = match extract_inner_rect_safe(local_clip_rect, radii) {
        Some(rect) => rect,

nit: just map this result instead of the early return


webrender/src/tiling.rs, line 429 at r1 (raw file):

                            let needs_blending = !prim_metadata.opacity.is_opaque ||
                                                 segment.clip_task_id.is_some() ||
                                                 (!is_inner && transform_kind == TransformedRectKind::Complex);

why do we need blending for non-centric opaque segments?


webrender/src/tiling.rs, line 2157 at r1 (raw file):

        }
    }
}

nit: newline


Comments from Reviewable

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

Review status: 14 of 22 files reviewed at latest revision, 19 unresolved discussions.


webrender/res/brush.glsl, line 40 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

the flags there were one per edge, given the need to apply AA on some of the edges and not the others. Is this not the case any more?

We can derive the correct flags from the segment kind.


webrender/res/brush.glsl, line 93 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

need some comments on what p0-3 are exactly

Done.


webrender/res/brush.glsl, line 102 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I wonder if it would make sense to separate X and Y here, i.e. each can be computed independently and needs 2 bits (0 - whole primitive, 1 - left/top, 2 - right/bottom, 3 - center). They could even share the same helper routine with a switch statement.

Yep, that would probably be tidier - OK if we do that as a follow up?


webrender/res/brush.glsl, line 139 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

edge_aa_segment_mask seems to be uninitialized by this point?

Done.


webrender/src/border.rs, line 412 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

why are we only including the middle here?

This matches the previous code - the parts that add the top and bottom rectangles handle adding the corners, so we only need to do that in one place.


webrender/src/box_shadow.rs, line 304 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I think Primitive/Segment distinction would be cleaner for this enum than Default/Segment

Done.


webrender/src/prim_store.rs, line 256 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I see that you are concerned with this structure size for the fact of wrapping Option<Box<_>> to store it and pass it around. I think it's much bigger than it needs to be. Technically we only need a bit mask (enabled_segments) and 4 points to reconstruct all the needed segments wherever they are needed. This would be so small it wouldn't require a box.

I did originally implement it this way. I changed to a local_rect representation for a couple of reasons. First, we need to store an individual clip task ID for each segment, so we need an array anyway (although in the future we could perhaps support allocating a range of render task IDs with a guaranteed contiguous range). Second, we need to transform these local rects on every scroll frame (to get the new device rect for the clip task) and so I thought it might be worth calculating the local rects once and storing them. We could definitely profile the difference though - there's pros and cons of each method, I think.


webrender/src/prim_store.rs, line 263 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

How about having a bitmask of the brush segment kinds, and accept the bitmask here directly instead of the slice?

I was using bitmasks originally for that - probably a good idea to switch back to them. Are you ok with doing that another time?


webrender/src/prim_store.rs, line 274 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

this is begging for fold() :)

Done.


webrender/src/prim_store.rs, line 1487 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

let's have the constant lower and named somewhere

Changed it to 128x128 and made it a constant.


webrender/src/prim_store.rs, line 1493 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

should put the ClipMode::Clip right in here

Done.


webrender/src/prim_store.rs, line 1501 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

what happens with others in local_clips.clips? do we just ignore them here? I imagine we should only be splitting with a single clip

Yes, you're right - I had originally thought we'd handle this by detecting the case and generating a clip mask for each segment in that case. But for now, I've just done what you suggested and limited it to only one rounded rect clip. We can optimize this in the future by handling more cases.


webrender/src/prim_store.rs, line 1511 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

let's rename it to segment_desc or something... too many segments :)

Done.


webrender/src/prim_store.rs, line 1513 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

why is the comparison made with BottomLeft?

Due to the ordering of the enum, and selecting clip masks only for the corners. I added a comment in the code to make this clearer.


webrender/src/prim_store.rs, line 1559 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

looks to be the same code as the arm above. Can we merge?

Done.


webrender/src/prim_store.rs, line 1904 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

another thing we could possibly do: if we know the primitive is axis-aligned, we should allow T-junctions and thus generate 3 opaque parts instead of 5

True - although then we will need to rebuild this if the transform changes kind due to property animation. Might be worth considering in the future though.


webrender/src/prim_store.rs, line 1905 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: just map this result instead of the early return

Done.


webrender/src/tiling.rs, line 429 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

why do we need blending for non-centric opaque segments?

For edge segments that have a transform, we need blending enabled (when not axis-aligned) in order to allow edge AA blending.


webrender/src/tiling.rs, line 2157 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: newline

Done.


Comments from Reviewable

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

@kvark OK, all review comments addressed, I think. Gecko try had a few failures, so investigating those now.

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

Hmm, a lot of failures in the last try run, but they all look like they are probably caused by the same bug. I'll try to set up a repro in wrench and then do another try run after that.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=14f0c4323c9d3b0845dc7db305c9b8b2473c1834

@kvark
Copy link
Member

kvark commented Dec 1, 2017

:lgtm: (minus the failures)


Reviewed 2 of 7 files at r3, 6 of 6 files at r4.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


webrender/res/brush.glsl, line 102 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

Yep, that would probably be tidier - OK if we do that as a follow up?

Yes, I thought I'd just share the idea while reviewing. Perfect for a follow-up :)
Would be nice to add a TODO/TRY comment near the code to not forget.


webrender/res/brush.glsl, line 139 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

Done.

Ah, so this is going to be a regression then? Until we implement this TODO


webrender/src/prim_store.rs, line 256 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

I did originally implement it this way. I changed to a local_rect representation for a couple of reasons. First, we need to store an individual clip task ID for each segment, so we need an array anyway (although in the future we could perhaps support allocating a range of render task IDs with a guaranteed contiguous range). Second, we need to transform these local rects on every scroll frame (to get the new device rect for the clip task) and so I thought it might be worth calculating the local rects once and storing them. We could definitely profile the difference though - there's pros and cons of each method, I think.

Ok, I see. Thanks for the info! We can revisit this later.


webrender/src/prim_store.rs, line 263 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

I was using bitmasks originally for that - probably a good idea to switch back to them. Are you ok with doing that another time?

sure


webrender/src/prim_store.rs, line 274 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

Done.

beautiful, thanks!


webrender/src/prim_store.rs, line 1904 at r1 (raw file):

Previously, glennw (Glenn Watson) wrote…

True - although then we will need to rebuild this if the transform changes kind due to property animation. Might be worth considering in the future though.

Yes, as with other cases we discussed before - if there is a transform property we should just treat it as transformed all the time.


Comments from Reviewable

if metadata.prim_kind == PrimitiveKind::Brush {
let brush = &mut self.cpu_brushes[metadata.cpu_prim_index.0];

if brush.segment_desc.is_none() && metadata.local_rect.size.area() > MIN_BRUSH_SPLIT_AREA {

This comment has been minimized.

@mrobinson

mrobinson Dec 1, 2017

Member

I think it might make this a bit more readable to create a helper here that's called something like "can_create_nine_patch" and then another one to find the selected clip.

@mrobinson
Copy link
Member

mrobinson commented Dec 1, 2017

This is really cool.

@glennw
Copy link
Member Author

glennw commented Dec 1, 2017

OK, rebased and fixed one bug in the Gecko reftests. Doing a gecko try run shortly to see if that solves all the failures or if there's another bug.

@kvark
Copy link
Member

kvark commented Dec 1, 2017

r=me when grass is green!

@glennw
Copy link
Member Author

glennw commented Dec 2, 2017

There is one test failure remaining.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=57423ca51e3c54d8baba954ff5832d48f5b05c26

I'll look into this on Monday and post an updated patch for that.

@glennw
Copy link
Member Author

glennw commented Dec 2, 2017

Latest gecko try is all green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c18afcae2b73f63391130e4998b9aee3fe1c6d82

The last commit fixes a bug when the selected clip and the primitive are in a different (but still compatible) coordinate space. In this case (which the scroll node indices are different), calculate and apply a relative offset between the spaces. This is safe since we know that the transforms are in a compatible space, so no rotation / skew / perspective is involved.

r? on the last commit @kvark or @mrobinson

Once I get r= I'll squash these commits before merge.

// node and the primitive, we need to get the clip rect in the
// local space of the primitive, in order to generate correct
// local segments. Since we know that the coordinate systems
// are compatable from above, there's no need to do a full

This comment has been minimized.

@pcwalton

pcwalton Dec 2, 2017

Collaborator

typo: "compatible"

@glennw glennw force-pushed the glennw:b10 branch from 57a8ccd to b57d1f2 Dec 3, 2017
@@ -520,7 +520,8 @@ impl ClipScrollNode {
state.nearest_scrolling_ancestor_viewport
.translate(&info.origin_in_parent_reference_frame);

if !info.resolved_transform.preserves_2d_axis_alignment() {
if !info.resolved_transform.preserves_2d_axis_alignment() ||
info.resolved_transform.has_perspective_component() {

This comment has been minimized.

@mrobinson

mrobinson Dec 3, 2017

Member

Hrm...does this work well with 90 degree rotations?

This comment has been minimized.

@glennw

glennw Dec 3, 2017

Author Member

Ah, good point! I updated the patch to do a full relative transform calculation and transform the rect. That code path almost never gets hit in reality, so the performance difference should be negligible.

@glennw glennw force-pushed the glennw:b10 branch from b57d1f2 to e64ff36 Dec 3, 2017
@kvark
Copy link
Member

kvark commented Dec 4, 2017

:lgtm:


Reviewed 2 of 4 files at r5, 6 of 7 files at r7, 1 of 1 files at r9.
Review status: all files reviewed at latest revision, 9 unresolved discussions.


wrench/reftests/transforms/segments-bug-ref.yaml, line 1 at r9 (raw file):

---

would be nice to have a more descriptive name of the reftest an/or some comments on what it's testing


Comments from Reviewable

@glennw glennw force-pushed the glennw:b10 branch from e64ff36 to 8e7ad26 Dec 4, 2017
@glennw
Copy link
Member Author

glennw commented Dec 4, 2017

@kvark OK, updated the reftest with some comments describing the test. r=you one squashed?

Add support for brushes to be drawn as a list of segments that
are part of a larger primitive. Initially, these segments are
nine-patches only, although they may be expanded in the future
to work with generic segment rects.

This allows large brush primitives to be split into a number
of segments that can be split among the opaque and alpha passes,
which means that large rectangles are often rendered with the
inner rectangle as part of the opaque pass. Additionally, edge
segments are rendered in the opaque pass when the primitive does
not have a rotation or perspective transform applied.

When a rectangle is split, the clip tasks are generated only
where required for individual segments. This can significantly
reduce the amount of allocated area for clip masks, and also
the amount of GPU time spent drawing clip masks.

Previously, rectangle splitting could only work on LocalClip
items specified as part of the item, which meant that rectangles
in Gecko would never be split. Now, we calculate splitting
during prepare_prim_for_render, consulting the current clip chain
for the primitive. This means that we get the benefits of rectangle
splitting in Gecko, which doesn't use LocalClip API.

Having rectangle splitting handled as segments also means that we
don't add extra top-level primitives that take extra time to do
visibility culling on, saving some CPU time.

Future work:
 - Support more cases of primitive splitting.
 - Port images and gradients to be brushes so they automatically get splitting.
@glennw glennw force-pushed the glennw:b10 branch from 8e7ad26 to b562ddf Dec 4, 2017
@glennw
Copy link
Member Author

glennw commented Dec 4, 2017

(squashed into one commit - I think this is ready to go now).

@kvark
Copy link
Member

kvark commented Dec 4, 2017

@glennw thank you!
@bors-servo r+

@bors-servo
Copy link
Contributor

bors-servo commented Dec 4, 2017

📌 Commit b562ddf has been approved by kvark

@bors-servo
Copy link
Contributor

bors-servo commented Dec 4, 2017

Testing commit b562ddf with merge d1feb4f...

bors-servo added a commit that referenced this pull request Dec 4, 2017
Convert rectangle from prim to brush shader and support segments.

Add support for brushes to be drawn as a list of segments that
are part of a larger primitive. Initially, these segments are
nine-patches only, although they may be expanded in the future
to work with generic segment rects.

This allows large brush primitives to be split into a number
of segments that can be split among the opaque and alpha passes,
which means that large rectangles are often rendered with the
inner rectangle as part of the opaque pass. Additionally, edge
segments are rendered in the opaque pass when the primitive does
not have a rotation or perspective transform applied.

When a rectangle is split, the clip tasks are generated only
where required for individual segments. This can significantly
reduce the amount of allocated area for clip masks, and also
the amount of GPU time spent drawing clip masks.

Previously, rectangle splitting could only work on LocalClip
items specified as part of the item, which meant that rectangles
in Gecko would never be split. Now, we calculate splitting
during prepare_prim_for_render, consulting the current clip chain
for the primitive. This means that we get the benefits of rectangle
splitting in Gecko, which doesn't use LocalClip API.

Having rectangle splitting handled as segments also means that we
don't add extra top-level primitives that take extra time to do
visibility culling on, saving some CPU time.

Future work:
 - Support more cases of primitive splitting.
 - Port images and gradients to be brushes so they automatically get splitting.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2146)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Dec 4, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: kvark
Pushing d1feb4f to master...

@bors-servo bors-servo merged commit b562ddf into servo:master Dec 4, 2017
4 of 5 checks passed
4 of 5 checks passed
code-review/reviewable 1 file, 9 discussions left (glennw, kvark, mrobinson, pcwalton)
Details
Taskcluster (pull_request) TaskGroup: success
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@glennw
Copy link
Member Author

glennw commented Dec 5, 2017

@jrmuizel @staktrace It'd be great to get this into a Gecko update this week, if possible. There's a heap of possible optimizations to come based on this, but this is a start that provides quite a significant win on most real sites already.

@glennw glennw deleted the glennw:b10 branch Dec 5, 2017
@jrmuizel
Copy link
Contributor

jrmuizel commented Dec 5, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.