New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce structure interner, and use for clip interning. #3075

Merged
merged 3 commits into from Sep 19, 2018

Conversation

Projects
None yet
4 participants
@gw3583
Copy link
Collaborator

gw3583 commented Sep 18, 2018

This patch introduces a new data structure, that allows generic
structures to be interned. This works similarly to how normal
string interning works, however it is specialized to the
thread model for WR (explained at the top of intern.rs).

The effect of this change is that clip nodes are de-duplicated
and persisted between both frames and display lists.

This has three primary benefits:

  • Since they are de-duplicated, the handle for an interned
    structure uniquely identifies it. This is very useful for
    future use where we want to be able to quickly and cheaply
    compare if the contents of a cached picture matches that
    of a new display list.
  • Since they are persisted between display lists, the GPU
    cache handles for the nodes remain valid. This means far
    fewer GPU cache update patches for types that are interned.
  • Since they are de-duplicated by content hash value, there
    are fewer clips overall used by the frame builder.

The plan in the future is to extend this to other primitive types,
as well as gradient stops, text runs etc. This will allow us to
very quickly check if a cached picture remains valid, even in the
presence of a completely new display list.

This adds a small amount of overhead to the scene builder thread,
(extra hashing) but reduces the CPU time in the render backend and
compositor threads, which is also a good tradeoff.


This change is Reviewable

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 18, 2018

r? @kvark

This does seem to give a noticeable reduction in the number of GPU cache rows on several pages that I tested on. Of course, the major wins wouldn't come until we use it for more than just clips.

The main remaining bit of work (apart from review comments) is that I haven't hooked up the interner and data store to the capture / replay code yet.

Try run looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=32256ade54c961eabec197c380bb2eeda194f8d1&selectedJob=199879097

(lots of blue and orange, but they are unrelated as far as I can tell - still some jobs pending though)

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 18, 2018

And also r? @nical too, since this touches scene builder thread and such things.

@nical
Copy link
Collaborator

nical left a comment

Looks good to me, just want to make sure this doesn't interfere with my world domination plans concerning the hit tester, see comment.

);
self.hit_tester = Some(frame_builder.create_hit_tester(&self.clip_scroll_tree));
self.hit_tester = Some(frame_builder.create_hit_tester(

This comment has been minimized.

@nical

nical Sep 18, 2018

Collaborator

My only worry is creating a dependency between frame building and building the hit tester. We really need to be able to build the latter independently. Here we are first mutably passing the clip data store to the frame builder and then using it to build the hit tester.
Upon inspecting the code that mutates the data store, it looks like only GPU thingies are modified so it should be safe to create the hit tester independently, but I'd feel better if we moved the line that creates the hit tester above the FrameBuilder::build call to make sure dependencies don't creep in.

This comment has been minimized.

@gw3583

gw3583 Sep 18, 2018

Collaborator

Yep, sounds good.

pub struct Interner<S : Eq + Hash + Clone + Debug, M> {
// Uniquely map an interning key to a handle
map: FastHashMap<S, Handle<M>>,
// List of free slots in the data store for re-used.

This comment has been minimized.

@nical

nical Sep 18, 2018

Collaborator

nit: "for re-use"?

@kvark
Copy link
Member

kvark left a comment

Reviewed 17 of 17 files at r1.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @nical and @gw3583)


webrender/src/batch.rs, line 1778 at r1 (raw file):

        clip_scroll_tree: &ClipScrollTree,
        transforms: &mut TransformPalette,
        clip_data_store: &ClipDataStore,

does it make sense to merge clip_data_store and clip_store?


webrender/src/clip.rs, line 743 at r1 (raw file):

// comparison of clip node equality by handle, and also allows
// the uploaded GPU cache handle to be retained between display lists.
// TODO(gw): Maybe we should consider constructing these directly

perhaps, this would let us not have both Au and non-Au versions of the clip structs


webrender/src/clip.rs, line 748 at r1 (raw file):

#[cfg_attr(feature = "capture", derive(Serialize))]
#[cfg_attr(feature = "replay", derive(Deserialize))]
pub enum ClipItemKey {

I agree that names are missing here


webrender/src/clip.rs, line 835 at r1 (raw file):

#[cfg_attr(feature = "capture", derive(Serialize))]
#[cfg_attr(feature = "replay", derive(Deserialize))]
pub enum ClipItem {

I do wonder if we can make this more generic, i.e. just have <T> all the way down, where T=Au or T=f32 depending on the requirements?


webrender/src/intern.rs, line 36 at r1 (raw file):

 is no longer referencing.

 Items in the data store are stored in a traditional

one way to use a freelist here is to have it built into Item<T> like this:

enum Item<T> {
  Valid { epoch: Epoch, data: T },
  Free { next: usize },
}

We could easily hide the enumeration tag by using NonZeroU64:

struct Epoch(NonZeroU64);

This way we'd also not need the INVALID, I think.

Perhaps, this approach would improve data locality and make the code more idiomatic/typed?


webrender/src/intern.rs, line 45 at r1 (raw file):

#[cfg_attr(feature = "replay", derive(Deserialize))]
#[derive(Debug, Copy, Clone, PartialEq)]
struct Epoch(u64);

let's document what the epoch applies to: scene builder, individual interned item, etc


webrender/src/intern.rs, line 84 at r1 (raw file):

}

// The data item is stored with an epoch, for validating

let's have all those internal comments as doc comments (///), now that cargo doc is able to produce private documentation that would be useful


webrender/src/intern.rs, line 100 at r1 (raw file):

    items: Vec<Item<T>>,
    _source: PhantomData<S>,
    _marker: PhantomData<M>,

what is the marker for?


webrender/src/intern.rs, line 126 at r1 (raw file):

                        epoch: update_list.epoch,
                    };
                    if self.items.len() == update.index {

what is guaranteeing that the index isn't greater than the length?


webrender/src/intern.rs, line 143 at r1 (raw file):

    // Retrieve an item from the store via handle
    pub fn get(&self, handle: &Handle<M>) -> &T {

nit: consider impl ops::Index


webrender/src/intern.rs, line 174 at r1 (raw file):

    updates: Vec<Update<S>>,
    // The current epoch for the interner.
    current_epoch: Epoch,

that's quite a few epochs, epochs everywhere!


webrender/src/intern.rs, line 195 at r1 (raw file):

    pub fn intern(
        &mut self,
        data: &S,

if data here is the same as on the data store side, let's use the same generic T for it to avoid confusion


webrender/src/intern.rs, line 254 at r1 (raw file):

        let current_epoch = self.current_epoch.0;

        // First, run a GC step. Walk through the handles, and

do we really need the "used for some time" semantics here? I thought just removing ones not used for each incoming scene build would be sufficient.


webrender/src/intern.rs, line 284 at r1 (raw file):

        // Begin the next epoch
        self.current_epoch = Epoch(self.current_epoch.0 + 1);

the method itself does much more than just getting the updates, so we should probably rename it to something other than get_updates in order to indicate that we are bumping the epoch and GC-cleaning


webrender/src/intern.rs, line 284 at r1 (raw file):

        // Begin the next epoch
        self.current_epoch = Epoch(self.current_epoch.0 + 1);

hm, I am worried a bit that the interner keeps its own epoch. Why isn't the epoch really driven by the scene builder, so that one build = one epoch?


webrender/src/render_backend.rs, line 1305 at r1 (raw file):

            }

            // TODO(gw): Work out serializing the clip interner / store.

isn't this done already?


webrender_api/src/units.rs, line 222 at r1 (raw file):

}

pub trait AuRectHelpers {

we have 4 traits that could pretty much be trait AuHelpers<T>

@gw3583
Copy link
Collaborator

gw3583 left a comment

Reviewable status: 8 of 17 files reviewed, 17 unresolved discussions (waiting on @kvark, @nical, and @gw3583)


webrender/src/batch.rs, line 1778 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

does it make sense to merge clip_data_store and clip_store?

Not currently, since the clip_store is reccycled per frame, while the clip_data_store is persistent for the Document lifetime. It's likely we can tidy this up a bit in the future though.


webrender/src/clip.rs, line 743 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

perhaps, this would let us not have both Au and non-Au versions of the clip structs

Done.


webrender/src/clip.rs, line 748 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I agree that names are missing here

Would you prefer this is fixed up in this patch, or done at some later time?


webrender/src/clip.rs, line 835 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I do wonder if we can make this more generic, i.e. just have <T> all the way down, where T=Au or T=f32 depending on the requirements?

Perhaps - although in some cases (e.g the box-shadow) it's quite different between the structs.


webrender/src/intern.rs, line 36 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

one way to use a freelist here is to have it built into Item<T> like this:

enum Item<T> {
  Valid { epoch: Epoch, data: T },
  Free { next: usize },
}

We could easily hide the enumeration tag by using NonZeroU64:

struct Epoch(NonZeroU64);

This way we'd also not need the INVALID, I think.

Perhaps, this approach would improve data locality and make the code more idiomatic/typed?

The complexity in this case is that the items live in the data store (frame builder thread), while the free-list itself lives in (and is managed by) the interner in the scene builder thread.


webrender/src/intern.rs, line 45 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

let's document what the epoch applies to: scene builder, individual interned item, etc

Done.


webrender/src/intern.rs, line 84 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

let's have all those internal comments as doc comments (///), now that cargo doc is able to produce private documentation that would be useful

Done.


webrender/src/intern.rs, line 100 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

what is the marker for?

To ensure a Handle from an unrelated data store type cannot be used.


webrender/src/intern.rs, line 126 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

what is guaranteeing that the index isn't greater than the length?

The next_index field in the scene builder size is the exactly length of the array in the data store side - the free-list management ensures we only ever insert into an existing slot or append at the end of the array.


webrender/src/intern.rs, line 143 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: consider impl ops::Index

Done


webrender/src/intern.rs, line 174 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

that's quite a few epochs, epochs everywhere!

Yep - this could be cleaned up a bit when we move epoch into the scene builder, as you suggested above.


webrender/src/intern.rs, line 195 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

if data here is the same as on the data store side, let's use the same generic T for it to avoid confusion

It's not the same - the S is the source data type, the T is the target data type that implements From<S>. For example, the source type for clips is just the key, but the target type includes a persisted GPU cache handle.


webrender/src/intern.rs, line 254 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

do we really need the "used for some time" semantics here? I thought just removing ones not used for each incoming scene build would be sufficient.

That would be sufficient for correctness - what I'm trying to handle here is the case where JS is removing / adding elements, and so it's useful to have them exist for sometime after they were removed in case they are re-added. It's debatable how common this actually is though!


webrender/src/intern.rs, line 284 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

the method itself does much more than just getting the updates, so we should probably rename it to something other than get_updates in order to indicate that we are bumping the epoch and GC-cleaning

Naming is hard! Made it slightly better.


webrender/src/intern.rs, line 284 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

hm, I am worried a bit that the interner keeps its own epoch. Why isn't the epoch really driven by the scene builder, so that one build = one epoch?

That's a good idea, we should do this. Are you ok with doing this as I add the next interned data structure type in a follow up, or would you prefer to do it now?


webrender/src/render_backend.rs, line 1305 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

isn't this done already?

I haven't worked out the saving of the scene-side structure yet - will implement that today.


webrender_api/src/units.rs, line 222 at r1 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

we have 4 traits that could pretty much be trait AuHelpers<T>

Indeed!

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 18, 2018

@kvark Thanks for the review! Still need to implement the capture functionality, and any follow ups from the review that remain above.

@kvark

kvark approved these changes Sep 19, 2018

Copy link
Member

kvark left a comment

:lgtm:

Reviewed 9 of 9 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @nical and @kvark)


webrender/src/clip.rs, line 748 at r1 (raw file):

Previously, gw3583 (Glenn Watson) wrote…

Would you prefer this is fixed up in this patch, or done at some later time?

fine to be left out of this PR :)

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 19, 2018

@kvark Added one more commit with the capture / replay integration. I think it's possible to have races where the interner / data store get out of sync, but maybe it's really unlikely in practice. What do you think? (I'm going to do some testing locally to see if I can ever get them out of sync).

@gw3583 gw3583 force-pushed the gw3583:intern9 branch from 94a05a1 to f3e3b4e Sep 19, 2018

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 19, 2018

@kvark I did some testing with the simple capture / replay integration and it does seem to work. I couldn't reproduce any out of sync issues, so it may be fine as is.

I also rebased it, so I think this is ready to go once you check over the last commit.

Kicked off a try run (pending) too, to make sure I didn't break anything with the review comment fixes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8027b1a08ffbc4c0e8fc4dbed58b79a9ec2f59c

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Sep 19, 2018

☔️ The latest upstream changes (presumably #3081) made this pull request unmergeable. Please resolve the merge conflicts.

@gw3583 gw3583 force-pushed the gw3583:intern9 branch from f3e3b4e to b73da90 Sep 19, 2018

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 19, 2018

(rebased again)

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 19, 2018

Try run still looks good 🎉

@gw3583 gw3583 referenced this pull request Sep 19, 2018

Merged

Remove BorderWidthsAu #3084

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Sep 19, 2018

☔️ The latest upstream changes (presumably #3080) made this pull request unmergeable. Please resolve the merge conflicts.

gw3583 added some commits Sep 18, 2018

Introduce structure interner, and use for clip interning.
This patch introduces a new data structure, that allows generic
structures to be interned. This works similarly to how normal
string interning works, however it is specialized to the
thread model for WR (explained at the top of intern.rs).

The effect of this change is that clip nodes are de-duplicated
and persisted between both frames and display lists.

This has three primary benefits:
 * Since they are de-duplicated, the handle for an interned
   structure uniquely identifies it. This is very useful for
   future use where we want to be able to quickly and cheaply
   compare if the contents of a cached picture matches that
   of a new display list.
 * Since they are persisted between display lists, the GPU
   cache handles for the nodes remain valid. This means far
   fewer GPU cache update patches for types that are interned.
 * Since they are de-duplicated by content hash value, there
   are fewer clips overall used by the frame builder.

The plan in the future is to extend this to other primitive types,
as well as gradient stops, text runs etc. This will allow us to
very quickly check if a cached picture remains valid, even in the
presence of a completely new display list.

This adds a small amount of overhead to the scene builder thread,
(extra hashing) but reduces the CPU time in the render backend and
compositor threads, which is also a good tradeoff.

@gw3583 gw3583 force-pushed the gw3583:intern9 branch from b73da90 to 7cb78ff Sep 19, 2018

@gw3583

This comment has been minimized.

Copy link
Collaborator

gw3583 commented Sep 19, 2018

Rebased again - just removed the modified box shadow reftest image, which is no longer different with this change after @pcwalton 's box shadow border radius fix.

@kvark

kvark approved these changes Sep 19, 2018

Copy link
Member

kvark left a comment

I'd like to hear more about a case where the interner gets out of sync with the data store. I think that, as long as our communication channel is one-way (going through WR pipeline of API -> Backend(A) -> SceneBuilder -> Backend(B) -> Renderer), and there is only one channel, there should be no data-races.

With this in mind, :lgtm:

Reviewed 5 of 6 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @nical and @gw3583)


webrender/src/scene_builder.rs, line 257 at r3 (raw file):

    #[cfg(feature = "capture")]
    fn save_scene(&mut self, config: CaptureConfig) {

note the naming is inconsistent with load_scenes

@kvark

This comment has been minimized.

Copy link
Member

kvark commented Sep 19, 2018

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Sep 19, 2018

📌 Commit 7cb78ff has been approved by kvark

@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Sep 19, 2018

⌛️ Testing commit 7cb78ff with merge f17f6a4...

bors-servo added a commit that referenced this pull request Sep 19, 2018

Auto merge of #3075 - gw3583:intern9, r=kvark
Introduce structure interner, and use for clip interning.

This patch introduces a new data structure, that allows generic
structures to be interned. This works similarly to how normal
string interning works, however it is specialized to the
thread model for WR (explained at the top of intern.rs).

The effect of this change is that clip nodes are de-duplicated
and persisted between both frames and display lists.

This has three primary benefits:
 * Since they are de-duplicated, the handle for an interned
   structure uniquely identifies it. This is very useful for
   future use where we want to be able to quickly and cheaply
   compare if the contents of a cached picture matches that
   of a new display list.
 * Since they are persisted between display lists, the GPU
   cache handles for the nodes remain valid. This means far
   fewer GPU cache update patches for types that are interned.
 * Since they are de-duplicated by content hash value, there
   are fewer clips overall used by the frame builder.

The plan in the future is to extend this to other primitive types,
as well as gradient stops, text runs etc. This will allow us to
very quickly check if a cached picture remains valid, even in the
presence of a completely new display list.

This adds a small amount of overhead to the scene builder thread,
(extra hashing) but reduces the CPU time in the render backend and
compositor threads, which is also a good tradeoff.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3075)
<!-- Reviewable:end -->
@bors-servo

This comment has been minimized.

Copy link
Contributor

bors-servo commented Sep 19, 2018

☀️ Test successful - status-appveyor, status-taskcluster
Approved by: kvark
Pushing f17f6a4 to master...

@bors-servo bors-servo merged commit 7cb78ff into servo:master Sep 19, 2018

3 of 4 checks passed

code-review/reviewable 3 discussions left (gw3583, nical)
Details
Taskcluster (pull_request) TaskGroup: success
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details

@gw3583 gw3583 deleted the gw3583:intern9 branch Sep 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment