New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make picture caching more performant, and fix some bugs. #3446

Merged
merged 2 commits into from Dec 28, 2018

Conversation

Projects
None yet
4 participants
@gw3583
Copy link
Collaborator

gw3583 commented Dec 24, 2018

One the remaining issues with enabling picture caching is that
it was a noticeable performance regression on sites that don't
benefit from picture caching, due to the extra render targets
and blits. Instead, we now draw to the main framebuffer and
blit those results into tiles for caching, as appropriate.

There is still a bit of work to be done here (specifically,
detecting when a tile isn't opaque, and disabling the cache blit
if we decide the tile is changing too often), but this is in
general a much faster solution than before. It's also faster
on pages that do benefit from picture caching, removing
one extra blit step during cache creation.

This also changes how we handle the way gecko supplies different
display lists with different local coordinate systems. Now, we
cache based on the world rect of the tile, and verify the content
with the new ComparableVec type. This is much more robust and
fixes a number of existing bugzilla bugs related to picture caching.

This patch also removes much of the hashing required, which is a
performance win on the sites I tested (even with the vec comparison
that we now use instead). We do currently update prim deps each
scroll frame, which can be improved / optimized in the future.

Finally, this implementation only allocates and stores tiles for
the current viewport, which is much better for performance and
scaling of content where the picture rect is very large.


This change is Reviewable

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Dec 24, 2018

This is not quite ready for merge yet - I need to tidy it up a bit and add a couple small remaining bits to the patch (mostly mentioned in TODO comments). But it makes picture caching much faster than it was, and fixes a number of visual glitches that are bugzilla bugs. It seems like a much more solid approach, even though the current implementation is quite messy.

@gw3583 gw3583 force-pushed the gw3583:dc19 branch from ab321b6 to 88622ce Dec 24, 2018

@kvark

This comment has been minimized.

Copy link
Member

kvark commented Dec 24, 2018

Could you clarify this part please?

Now, we cache based on the world rect of the tile, and verify the content
with the new ComparableVec type.

@kvark
Copy link
Member

kvark left a comment

Left out my questions/concerns ^ and below:

Reviewed 10 of 10 files at r1.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @gw3583)


webrender/src/batch.rs, line 994 at r1 (raw file):

                                // main picture primitive list, and draw them first.
                                if let Some(_) = tile_cache.dirty_region {
                                    self.add_pic_to_batch(

Why aren't we filtering the primitives in this list over the dirty_region?


webrender/src/batch.rs, line 1019 at r1 (raw file):

                                for i in &tile_cache.tiles_to_draw {
                                    let tile = &tile_cache.tiles[*i as usize];

Why aren't we checking for is_visible and in_use any more? If those are assumed to be true, let's assert on that.


webrender/src/clip.rs, line 1387 at r1 (raw file):

        for clip_chain_id in &self.clips {
            let node = clip_store.get_clip_chain(*clip_chain_id);

this would read better if there wasn't node and clip_node, the former needs to be more specific


webrender/src/clip.rs, line 1397 at r1 (raw file):

                if let Some(world_rect) = map_local_to_world.map(&local_rect) {
                    clip_rect = match clip_rect.intersection(&world_rect) {

nit: could just use ? at the end


webrender/src/frame_builder.rs, line 297 at r1 (raw file):

        );

        // If we had any retained tiles from the last scene that were not picked

what makes it no longer necessary to do this?


webrender/src/picture.rs, line 162 at r1 (raw file):

    /// to uniquely describe the content of the primitive template, while
    /// the other parameters describe the clip chain and instance params.
    prims: ComparableVec<PrimitiveDescriptor>,

nice privatization 👍


webrender/src/picture.rs, line 221 at r1 (raw file):

    map_local_to_world: SpaceMapper<LayoutPixel, WorldPixel>,
    /// A list of tiles to draw during batching.
    pub tiles_to_draw: Vec<usize>,

is this a tile index?


webrender/src/picture.rs, line 257 at r1 (raw file):

            tiles: Vec::new(),
            map_local_to_world: SpaceMapper::new(
                ROOT_SPATIAL_NODE_INDEX,

why aren't we mapping to spatial_node_index here? are all the tiles now rasterized in world space?


webrender/src/picture.rs, line 310 at r1 (raw file):

        ).expect("bug: unable to get scroll transform");
        let scroll_offset = WorldVector2D::new(
            scroll_transform.m41,

hmm, but this isn't necessarily a "scroll" transform, right? I can be just origin offset, or regular translation transform, etc


webrender/src/picture.rs, line 488 at r1 (raw file):

                        py / frame_context.device_pixel_scale.0,
                    ),
                    WorldSize::new(

we already have this computed in self.world_tile_size


webrender/src/picture.rs, line 534 at r1 (raw file):

            }

            if self.needs_update {

this is never false at this point. Is it intentional?


webrender/src/picture.rs, line 691 at r1 (raw file):

    }

    /// Update the dependencies for each tile for a given primitive instance.

let's have it updated instead of removed ;)


webrender/src/picture.rs, line 2157 at r1 (raw file):

            let surface_rect = state.current_surface().rect;

            // Sometimes, Gecko supplies a huge picture rect. This is typically

how are we handling this case now?


webrender/src/renderer.rs, line 3513 at r1 (raw file):

                );

                // Modify the src/dest rects since we are blitting from the framebuffer

we are blitting from the famebuffer, so I'd assume only the source rect needs to be Y-flipped. Why are we changing both source and destination rects here?


webrender/src/util.rs, line 896 at r1 (raw file):

pub struct ComparableVec<T> {
    /// The items to be stored and compared
    pub items: Vec<T>,

it doesn't seem safe here to expose items publicly. I'd expect something like

pub fn items(&self) -> &[T] {
  &self.items[.. self.current_index]
}

Alternatively, if items isn't valid when is_same == false, we should do:

pub fn items(&self) -> Option<&[T]> {
    if self.is_same && self.prev_len == self.items.len() {
        Some(&self.items)
    } else {
        None
    }
}

webrender/src/util.rs, line 900 at r1 (raw file):

    current_index: usize,
    /// The previous length of the array
    prev_len: usize,

it looks like the prev_len and is_same are tied together, i.e the former stops making any sense as soon as is_same == false.
Perhaps, this could be refactored into Option<usize>?


webrender/src/util.rs, line 966 at r1 (raw file):

        // Increment where the next item will be pushed.
        self.current_index += 1;

we can even go further and make current_index irrelevant for when is_same==false by just shrinking the vector to size, since current_index isn't much useful when is_same == false.

@gw3583
Copy link
Collaborator

gw3583 left a comment

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @kvark)


webrender/src/batch.rs, line 994 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

Why aren't we filtering the primitives in this list over the dirty_region?

We do, via the dirty world rect being set small enough to enclose the primitives we want to draw. In the future, we can probably make this more efficient by building an index buffer of visible primitives instead and just walking those during batching, rather than walking each primitive and checking if its visible bounding rect was set.


webrender/src/batch.rs, line 1019 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

Why aren't we checking for is_visible and in_use any more? If those are assumed to be true, let's assert on that.

These fields are no longer required, as we only create tiles around the visible viewport, rather than for the entire picture.


webrender/src/clip.rs, line 1387 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

this would read better if there wasn't node and clip_node, the former needs to be more specific

Fixed


webrender/src/clip.rs, line 1397 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: could just use ? at the end

Fixed


webrender/src/frame_builder.rs, line 297 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

what makes it no longer necessary to do this?

Because we only keep tiles for the visible viewport, and the eviction policy is set to Eager.


webrender/src/picture.rs, line 162 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nice privatization 👍

👍


webrender/src/picture.rs, line 221 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

is this a tile index?

Switched it to a new type - TileIndex


webrender/src/picture.rs, line 257 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

why aren't we mapping to spatial_node_index here? are all the tiles now rasterized in world space?

Currently, yes. Although - this is more about getting a stable prim origin in the descriptor (they are stored as world position - tile world position, to get a stable position regardless of the world location of the tile).


webrender/src/picture.rs, line 310 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

hmm, but this isn't necessarily a "scroll" transform, right? I can be just origin offset, or regular translation transform, etc

It only supports scroll transforms - the tile cache is created by selecting a spatial node that refers to a ScrollFrame. I could perhaps add an assertion check in the tile cache code that the caller has selected a scroll frame.


webrender/src/picture.rs, line 488 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

we already have this computed in self.world_tile_size

Indeed, fixed.


webrender/src/picture.rs, line 534 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

this is never false at this point. Is it intentional?

Yea, it's an optimization I'm leaving to do as a follow up. This comment elsewhere in the code discusses:

        // TODO(gw): We don't actually need to update the prim dependencies each frame.
        //           For common cases, such as only being one main scroll root, we could
        //           detect this and skip the dependency update on scroll frames.

webrender/src/picture.rs, line 691 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

let's have it updated instead of removed ;)

Reviewable diff is a bit weird here - did you mean the fn doc comment? I restored that if that's what you mean here :)


webrender/src/picture.rs, line 2157 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

how are we handling this case now?

We only create tiles around the visible viewport, so we can implicitly handle this case correctly now :)


webrender/src/renderer.rs, line 3513 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

we are blitting from the famebuffer, so I'd assume only the source rect needs to be Y-flipped. Why are we changing both source and destination rects here?

I think we need to adjust where we are copying from in the source, and flip upside-down in the dest. Actually I just copy-pasted code from elsewhere in renderer.rs that does this same logic and it seemed to work :)


webrender/src/util.rs, line 896 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

it doesn't seem safe here to expose items publicly. I'd expect something like

pub fn items(&self) -> &[T] {
  &self.items[.. self.current_index]
}

Alternatively, if items isn't valid when is_same == false, we should do:

pub fn items(&self) -> Option<&[T]> {
    if self.is_same && self.prev_len == self.items.len() {
        Some(&self.items)
    } else {
        None
    }
}

Good point, done.


webrender/src/util.rs, line 900 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

it looks like the prev_len and is_same are tied together, i.e the former stops making any sense as soon as is_same == false.
Perhaps, this could be refactored into Option<usize>?

I think this is still needed for the following case. In frame 1, you push items A, B, C. Then, in frame 2, you push items A, B. In this case, is_same will be true, since all items pushed thus far are equal. However, if you call is_valid at this point, you want it to return false, since the overall lists are not equal. Or perhaps I'm misunderstanding your suggestion - would it handle this case?


webrender/src/util.rs, line 966 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

we can even go further and make current_index irrelevant for when is_same==false by just shrinking the vector to size, since current_index isn't much useful when is_same == false.

See above comment.

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Dec 25, 2018

@kvark Thanks for the review. I think that addresses the comments, although you might have some follow ups to some of those answers before this is ready for merge.

Make picture caching more performant, and fix some bugs.
One the remaining issues with enabling picture caching is that
it was a noticeable performance regression on sites that don't
benefit from picture caching, due to the extra render targets
and blits. Instead, we now draw to the main framebuffer and
blit those results into tiles for caching, as appropriate.

There is still a bit of work to be done here (specifically,
detecting when a tile isn't opaque, and disabling the cache blit
if we decide the tile is changing too often), but this is in
general a much faster solution than before. It's also faster
on pages that *do* benefit from picture caching, removing
one extra blit step during cache creation.

This also changes how we handle the way gecko supplies different
display lists with different local coordinate systems. Now, we
cache based on the world rect of the tile, and verify the content
with the new ComparableVec type. This is much more robust and
fixes a number of existing bugzilla bugs related to picture caching.

This patch also removes much of the hashing required, which is a
performance win on the sites I tested (even with the vec comparison
that we now use instead). We do currently update prim deps each
scroll frame, which can be improved / optimized in the future.

Finally, this implementation only allocates and stores tiles for
the current viewport, which is much better for performance and
scaling of content where the picture rect is very large.

@gw3583 gw3583 force-pushed the gw3583:dc19 branch from 4f99dac to 8f2ee43 Dec 27, 2018

@kvark

kvark approved these changes Dec 27, 2018

Copy link
Member

kvark left a comment

:lgtm:

Reviewed 9 of 9 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @gw3583)


webrender/src/picture.rs, line 257 at r1 (raw file):

Previously, gw3583 (Glenn Watson) wrote…

Currently, yes. Although - this is more about getting a stable prim origin in the descriptor (they are stored as world position - tile world position, to get a stable position regardless of the world location of the tile).

I wonder if (in the future) we can make it so the cache origin is always the root of the coordinate space. This would be the world position for the root scroll frame, and yet we'd be able to cache transformed surfaces in their local space while still re-using the interned primitives.


webrender/src/util.rs, line 900 at r1 (raw file):

Previously, gw3583 (Glenn Watson) wrote…

I think this is still needed for the following case. In frame 1, you push items A, B, C. Then, in frame 2, you push items A, B. In this case, is_same will be true, since all items pushed thus far are equal. However, if you call is_valid at this point, you want it to return false, since the overall lists are not equal. Or perhaps I'm misunderstanding your suggestion - would it handle this case?

Yes, your case makes sense. However, I'm not talking about is_valid here, but only about is_same and prev_len. If we find out that the primitives are different from the last time, we can just set the actual length of the list to current_index and never look up prev_len again. Hence, I thought we can have prev_len: Option<usize> that turns None as soon as we encounter a different primitive.

This is somewhat minor and fine for me to follow-up on ;)

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Dec 27, 2018

Rebased and added a try run to make sure this doesn't cause any issues with existing code paths (looks good so far). I'm working on the remaining bits and pieces for this as a follow up commit.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5f13a55c62b101c120cfe3bbb07625e0df3bbd9&selectedJob=219010851

@gw3583 gw3583 changed the title [WIP] Make picture caching more performant, and fix some bugs. Make picture caching more performant, and fix some bugs. Dec 27, 2018

@Darkspirit

This comment has been minimized.

Copy link

Darkspirit commented Dec 28, 2018

Login and click to open a tweet.

RUST_BACKTRACE=1 mozregression --repo try --launch a5f13a55c62b101c120cfe3bbb07625e0df3bbd9 --pref gfx.webrender.all:true gfx.webrender.picture-caching:true -B debug -a https://www.twitter.com

1:37.78 INFO: Hit MOZ_CRASH(bug: no world clip rect) at libcore/option.rs:1008

Edit: Same when scrolling around on https://forum.xda-developers.com/redmi-note-3/development. (Another new regression: The shadow above the breadcrumb there may flicker now.)

@Darkspirit

This comment has been minimized.

Copy link

Darkspirit commented Dec 28, 2018

Still partly invisible:
mozregression --repo try --launch a5f13a55c62b101c120cfe3bbb07625e0df3bbd9 --pref gfx.webrender.all:true gfx.webrender.picture-caching:true -a https://bug1515540.bmoattachments.org/attachment.cgi?id=9032606

@Darkspirit

This comment has been minimized.

Copy link

Darkspirit commented Dec 28, 2018

New regression first appeared in this try build: Zoom in on https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Graphics%3A%20WebRender so that you have horizontal and vertical scrollbars: They're flickering or are even invisible then.

@gw3583 gw3583 force-pushed the gw3583:dc19 branch from 8f2ee43 to 8845e44 Dec 28, 2018

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Dec 28, 2018

@Darkspirit Thanks for taking a look!

I pushed an update which fixes the crash / panic you were hitting. The scrollbars (and I think other artifact) are expected until I land the follow up TODOs for detecting opaque tiles and splitting the batches based on this.

I think that this is fine to merge as long as it doesn't have any bad effect on the non-picture caching path, since they are a known issue that I'm working on now as a follow up.

The try run looks OK - none of those issues are ones that occur on the non-picture caching path, is that right?

Thanks!

@kvark

This comment has been minimized.

Copy link
Member

kvark commented Dec 28, 2018

@bors-servo delegate=Darkspirit

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Dec 28, 2018

✌️ @Darkspirit can now approve this pull request

@Darkspirit

This comment has been minimized.

Copy link

Darkspirit commented Dec 28, 2018

Haven't seen other problems. Your latest commit doesn't have a try build, but we'll get one in wr-future-update and I'll test that too.

@bors-servo r+

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Dec 28, 2018

📌 Commit 8845e44 has been approved by Darkspirit

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Dec 28, 2018

⌛️ Testing commit 8845e44 with merge b4dfe9c...

bors-servo added a commit that referenced this pull request Dec 28, 2018

Auto merge of #3446 - gw3583:dc19, r=Darkspirit
Make picture caching more performant, and fix some bugs.

One the remaining issues with enabling picture caching is that
it was a noticeable performance regression on sites that don't
benefit from picture caching, due to the extra render targets
and blits. Instead, we now draw to the main framebuffer and
blit those results into tiles for caching, as appropriate.

There is still a bit of work to be done here (specifically,
detecting when a tile isn't opaque, and disabling the cache blit
if we decide the tile is changing too often), but this is in
general a much faster solution than before. It's also faster
on pages that *do* benefit from picture caching, removing
one extra blit step during cache creation.

This also changes how we handle the way gecko supplies different
display lists with different local coordinate systems. Now, we
cache based on the world rect of the tile, and verify the content
with the new ComparableVec type. This is much more robust and
fixes a number of existing bugzilla bugs related to picture caching.

This patch also removes much of the hashing required, which is a
performance win on the sites I tested (even with the vec comparison
that we now use instead). We do currently update prim deps each
scroll frame, which can be improved / optimized in the future.

Finally, this implementation only allocates and stores tiles for
the current viewport, which is much better for performance and
scaling of content where the picture rect is very large.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3446)
<!-- Reviewable:end -->
@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Dec 28, 2018

☀️ Test successful - status-appveyor, status-taskcluster
Approved by: Darkspirit
Pushing b4dfe9c to master...

@bors-servo bors-servo merged commit 8845e44 into servo:master Dec 28, 2018

3 of 4 checks passed

code-review/reviewable 6 files, 2 discussions left (gw3583, kvark)
Details
Taskcluster (pull_request) TaskGroup: success
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details

@bors-servo bors-servo referenced this pull request Dec 28, 2018

Merged

Redesigned clip and scroll API #3251

2 of 6 tasks complete

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Dec 29, 2018

Bug 1516650 - Update webrender to commit b4dfe9c4f98fdeca3814976cd075…
…bde8ed409123 (WR PR #3446). r=kats

servo/webrender#3446

Differential Revision: https://phabricator.services.mozilla.com/D15480

--HG--
extra : moz-landing-system : lando

mykmelez pushed a commit to mykmelez/gecko that referenced this pull request Dec 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment