Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate local clipping rectangle per-primitive run #2200

Merged
merged 2 commits into from Dec 25, 2017

Conversation

@mrobinson
Copy link
Member

mrobinson commented Dec 8, 2017

Instead of using the hierarchical nature of the ClipScrollTree to calculate local clipping rectangles for primitives, do it when we are about to render a particular primitive run. This will allow us to handle custom clip chains while maintaining local clipping rectangle optimizations. It also allows us to greatly simplify a lot of shader code that used to deal with both clipping and scrolling nodes, as the local clipping rectangle is now bundled into the primitives local_clip_rect.

TransformOrOffset is the main struct that handles the efficient transformation of clipping rectangles between compatible coordinate systems. It will avoid doing expensive matrix operations as much as possible, which should make these calculations cheaper -- important now that they are done per-primitive run.


This change is Reviewable

@mrobinson mrobinson requested review from kvark and glennw Dec 8, 2017
@mrobinson
Copy link
Member Author

mrobinson commented Dec 8, 2017

@mrobinson
Copy link
Member Author

mrobinson commented Dec 8, 2017

Looks like there are two unexpected passes and one test which also seems to be crashing in other, unrelated try pushes. It's WebRender-related though, so I'll do some more investigation into that particular panic.

@mrobinson mrobinson force-pushed the mrobinson:intersecting-clip-chains-1 branch from 433cba6 to b210343 Dec 8, 2017
@mrobinson
Copy link
Member Author

mrobinson commented Dec 8, 2017

The Talos panic does not seem to be related to this PR and is being tracked here: https://bugzilla.mozilla.org/show_bug.cgi?id=1423203#c10

Copy link
Member

kvark left a comment

Looks like a great change! I've got a few concerns, plus questions about the description:

Instead of using the hierarchical nature of the ClipScrollTree to calculate local clipping rectangles for primitives, do it when we are about to render a particular primitive run.

Wouldn't this prevent us from culling the primitive runs earlier?

This will allow us to handle custom clip chains while maintaining local clipping rectangle optimizations.

How did the old (hierarchical update) approach prevent the custom clip chains?

node.scroll_offset = offsets.zw;

vec4 misc = TEXEL_FETCH(sClipScrollNodes, uv0, 0, ivec2(6, 0));
vec4 misc = TEXEL_FETCH(sClipScrollNodes, uv0, 0, ivec2(4, 0));
node.is_axis_aligned = misc.x == 0.0;

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

looks like reference_frame_relative_scroll_offset and scroll_offset are left uninitialized?

This comment has been minimized.

@mrobinson

mrobinson Dec 15, 2017

Author Member

Removed these.

@@ -122,9 +122,12 @@ pub struct ClipScrollNode {
/// The axis-aligned coordinate system id of this node.
pub coordinate_system_id: CoordinateSystemId,

pub coordinate_system_relative_transform: TransformOrOffset,

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

Hmm, this needs a comment.
If this is relative to the current coordinate system ID, then the transform could be axis-alinged rotation/scale? what else?

This comment has been minimized.

@mrobinson

mrobinson Dec 15, 2017

Author Member

I added a comment here.

/// A linear ID / index of this clip-scroll node. Used as a reference to
/// pass to shaders, to allow them to fetch a given clip-scroll node.
pub node_data_index: ClipScrollNodeIndex,

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

nit: extra new line?

This comment has been minimized.

@mrobinson

mrobinson Dec 15, 2017

Author Member

Removed.

source_transform: source_transform.unwrap_or(PropertyBinding::Value(identity)),
source_perspective: source_perspective.unwrap_or(identity),
origin_in_parent_reference_frame,
uninvertible: false,

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

nit: perhaps make it positive-meaning, like "can_invert" or "is_invertible"

This comment has been minimized.

@mrobinson

mrobinson Dec 15, 2017

Author Member

Changed this one.

/// A transformation with an inverse. If the inverse isn't present, this isn't a 2D
/// transformation, which means we need to fall back to using inverse_rect_footprint.
/// Since this operation is so expensive, we avoid it for the 2D case.
Transform {

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

nice type system usage!

TransformOrOffset::Transform { inverse: Some(inverse), .. } =>
inverse.transform_rect(&rect),
TransformOrOffset::Transform { transform, inverse: None } =>
transform.inverse_rect_footprint(rect),

This comment has been minimized.

@kvark

kvark Dec 8, 2017

Member

looks very clean, this one!

@glennw
Copy link
Member

glennw commented Dec 9, 2017

Reviewed 27 of 27 files at r1.
Review status: all files reviewed at latest revision, 9 unresolved discussions.


webrender/res/prim_shared.glsl, line 167 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

looks like reference_frame_relative_scroll_offset and scroll_offset are left uninitialized?

I think those fields are unused now, so can probably be removed from both the CPU and GPU versions of the ClipScrollNodeData.


webrender/src/prim_store.rs, line 1240 at r1 (raw file):

        prim_index: PrimitiveIndex,
        prim_context: &PrimitiveContext,
        local_clip_rect: LayerRect,

Should this be in the PrimitiveContext? Not sure...


webrender/src/prim_store.rs, line 1578 at r1 (raw file):

                        coordinate_system_id: prim_coordinate_system_id,
                    },
                    local_clip_rect: LayerRect::zero(), // This should be unused.

Could you expand on that comment?


webrender/src/util.rs, line 372 at r1 (raw file):

}

/// An enum that ensures tries to avoid expensive transformation matrix calculations

nit: grammar


Comments from Reviewable

Copy link
Member

glennw left a comment

This is a great change! I left a few additional minor comments in reviewable. Apart from that, I just have one other concern.

in the case of property animation on transforms, it would seem we can get in a situation where the local rect in the primitive changes. But since we don't invalidate the GPU cache request, we won't update the cache with the new local clip rect. So, we would need to detect if the local clip rect for a primitive run has changed since the last frame (which should be quite rare), and then invalidate the GPU blocks for the primitives that make up that run. Does that sound right?

@bors-servo
Copy link
Contributor

bors-servo commented Dec 15, 2017

The latest upstream changes (presumably #2221) made this pull request unmergeable. Please resolve the merge conflicts.

@mrobinson mrobinson force-pushed the mrobinson:intersecting-clip-chains-1 branch from b210343 to 632345b Dec 15, 2017
@mrobinson
Copy link
Member Author

mrobinson commented Dec 15, 2017

Okay. Sorry for the delay on the new patch. After talking with Glenn about this, I finally have a new version which adds an entirely new texture for the local clips rects (now called "clip chain rects"). The primitive rectangle is calculated early by intersecting the local clip rect with the appropriate clip chain rect when the primitive is loaded in the shader. This ensures that that they clip chain rectangles are up to date even with property binding updates, but prevents a lot of churn in the GPU cache. The addition of the new texture is unfortunate, but this seemed like the lesser hack of all the things considered.

@mrobinson
Copy link
Member Author

mrobinson commented Dec 15, 2017

@kvark
Copy link
Member

kvark commented Dec 15, 2017

this ensures that that they clip chain rectangles are up to date even with property binding updates, but prevents a lot of churn in the GPU cache.

I haven't reviewed the new code, but this sounds suspicious. We shouldn't be moving away from GPU cache back in the land of free-for-all textures just because some use cases are slow to update. I'd rather have us focused on getting the GPU cache performance solid, since it is simpler and has greater positive impact on the rest of the code. If the clip rect updates are slow, we may try the following things:

  • enable scattered GPU updates (in RendererOptios)
  • change the GPU cache allocation logic so that particular types (e.g. clip rectangles) or sub-types (e.g. animated clip rectangles) are grouped together in rows, so that the updates are efficient
  • play with row size
  • try sub-row updates according to simple heuristics
  • (possibly, other ways)
@glennw
Copy link
Member

glennw commented Dec 15, 2017

The issue it prevents in the GPU cache is that when we invalidate the item for a text run, that involves re-uploading the entire set of glyphs for a text run, which involves a lot of extra work (we found about an extra 0.5ms in compositor CPU time, but there is also the backend time in re-copying those glyphs across).

We could have an extra entry in the GPU cache, but that would involve adding an extra i32 to all the instance structures, to make room for a GPU cache address. With this system, we can use a spare u16 in each instance to index a fixed size texture.

It's not ideal, but I think it's probably the best solution for now. In the future, once we are able to compress the primitive instance structures further, it probably makes sense to remove this and use a normal GPU cache entry instead?

@glennw
glennw approved these changes Dec 18, 2017
@glennw
Copy link
Member

glennw commented Dec 18, 2017

This looks good to me. The try shows two new passes. @kvark Anything you want to check over on this or is it ready to go?

@bors-servo
Copy link
Contributor

bors-servo commented Dec 18, 2017

The latest upstream changes (presumably #2223) made this pull request unmergeable. Please resolve the merge conflicts.

@kvark
Copy link
Member

kvark commented Dec 18, 2017

:lgtm:


Reviewed 4 of 11 files at r2.
Review status: 21 of 28 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


Comments from Reviewable

@glennw
Copy link
Member

glennw commented Dec 18, 2017

@mrobinson This just needs a rebase, then r=glennw,kvark.

@mrobinson mrobinson force-pushed the mrobinson:intersecting-clip-chains-1 branch from 632345b to 9faaada Dec 20, 2017
@mrobinson
Copy link
Member Author

mrobinson commented Dec 20, 2017

Okay. I've rebased this PR, but it seems that the newly added test (reftests/boxshadow/overlap1.yaml) is failing with my PR, so I'll investigate this after Christmas.

@bors-servo
Copy link
Contributor

bors-servo commented Dec 20, 2017

The latest upstream changes (presumably #2242) made this pull request unmergeable. Please resolve the merge conflicts.

mrobinson added 2 commits Dec 4, 2017
Instead of using the hierarchical nature of the ClipScrollTree to
calculate local clipping rectangles for primitives, do it when we are
about to render a particular primitive run. This will allow us to
handle custom clip chains while maintaining local clipping rectangle
optimizations. It also allows us to greatly simplify a lot of shader
code that used to deal with both clipping and scrolling nodes, as the
local clipping rectangle is now bundled into the primitives
local_clip_rect.

TransformOrOffset is the main struct that handles the efficient
transformation of clipping rectangles between compatible coordinate
systems. It will avoid doing expensive matrix operations as much as
possible, which should make these calculations cheaper -- important now
that they are done per-primitive run.
It's unused now.
@mrobinson mrobinson force-pushed the mrobinson:intersecting-clip-chains-1 branch from 9faaada to 7ad7225 Dec 25, 2017
@mrobinson
Copy link
Member Author

mrobinson commented Dec 25, 2017

Looks like the failing test is just a subpixel difference, so I've updated the png reference file and will push the rebased branch.

@mrobinson
Copy link
Member Author

mrobinson commented Dec 25, 2017

@bors-servo r=glennw,kvark.

@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2017

📌 Commit 7ad7225 has been approved by glennw,kvark.

bors-servo added a commit that referenced this pull request Dec 25, 2017
…kvark.

Calculate local clipping rectangle per-primitive run

Instead of using the hierarchical nature of the ClipScrollTree to calculate local clipping rectangles for primitives, do it when we are about to render a particular primitive run. This will allow us to handle custom clip chains while maintaining local clipping rectangle optimizations. It also allows us to greatly simplify a lot of shader code that used to deal with both clipping and scrolling nodes, as the local clipping rectangle is now bundled into the primitives local_clip_rect.

TransformOrOffset is the main struct that handles the efficient transformation of clipping rectangles between compatible coordinate systems. It will avoid doing expensive matrix operations as much as possible, which should make these calculations cheaper -- important now that they are done per-primitive run.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2200)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2017

Testing commit 7ad7225 with merge 3db4c46...

@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: glennw,kvark.
Pushing 3db4c46 to master...

@bors-servo bors-servo merged commit 7ad7225 into servo:master Dec 25, 2017
3 of 4 checks passed
3 of 4 checks passed
code-review/reviewable 12 files, 6 discussions left (glennw, kvark, mrobinson)
Details
Taskcluster (pull_request) TaskGroup: success
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details
@mrobinson mrobinson deleted the mrobinson:intersecting-clip-chains-1 branch Dec 25, 2017
@staktrace
Copy link
Contributor

staktrace commented Dec 28, 2017

@mrobinson This PR seems to have introduced or exposed a crash in WR, which manifests as the talos-g1 test failing. There's a backtrace from the log at https://bugzilla.mozilla.org/show_bug.cgi?id=1426116#c7

@mrobinson
Copy link
Member Author

mrobinson commented Jan 2, 2018

@staktrace I've posted a fix for this issue here: #2249

Sorry for the delay, it was due to the holidays.

@staktrace
Copy link
Contributor

staktrace commented Jan 6, 2018

Confirmed that the fix worked, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.