Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and improve the texture cache, implement new allocator. #1572

Merged
merged 9 commits into from Aug 15, 2017

Conversation

@glennw
Copy link
Member

glennw commented Aug 14, 2017

Refactor and improve the texture cache implementation.

  • All textures using the sColor* samplers are now array textures. This will allow us to simplify the large image tiling code in the future.

  • All texture updates are performed via PBO, to eliminate CPU driver stalls when updating textures.

  • The texture cache now acts more like a traditional cache, where the cache is a fixed size, and can evict items as required.

  • The allocator is switched to a slab style allocator, allowing deterministic behavior when allocating and freeing.

  • The slab allocator does not suffer from fragmentation like the previous allocator, and has no coalesce requirements.

  • The eviction policy is a lot smarter than the previous implementation. Items are evicted from the cache as required.

  • Since the shared cache is a fixed / bounded size, any allocations that are too large for the cache are created as standalone textures.

  • In the rare / edge case of the fixed sized cache being full, and the cache not being able to evict items, these items also fallback to the standalone texture allocator.

  • Add support for array textures that don't have FBOs.

  • Simplify render targets (always array textures).

  • Remove resize_texture which is now unused.

  • Remove support for texture upload without PBO.

  • Add support for weak handles to freelist.

  • Update gleam to 0.4.8 (allows upload of array textures via PBO).


This change is Reviewable

gw3583 added 7 commits Aug 14, 2017
Simplify render targets (always array textures).
Remove resize_texture which is now unused.
Support uploading all texture types via PBO.
Remove support for texture upload without PBO.
* The texture cache now acts more like a traditional cache,
  where the cache is a fixed size, and can evict items as required.
* The allocator is switched to a slab style allocator, allowing
  deterministic behavior when allocating and freeing.
* The slab allocator does not suffer from fragmentation like the
  previous allocator, and has no coalesce requirements.
* The eviction policy is a lot smarter than the previous implementation.
  Items are evicted from the cache as required, and not before.
* Since the shared cache is a fixed / bounded size, any allocations
  that are too large for the cache are created as standalone textures.
* In the rare / edge case of the fixed sized cache being full, and
  the cache not being able to evict items, these items also fallback
  to the standalone texture allocator.
* All textures using the sColor* samplers are now array textures.
  This will allow us to simplify the large image tiling code in
  the future.
* All texture updates are performed via PBO, to eliminate CPU
  driver stalls when updating textures.
@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

r? @kvark

I'm so sorry for the size of this patch. It really snowballed. Let me know if you want to discuss details on IRC / Vidyo etc.

I'm happy with the end result though :)

I've done a fair amount of manual testing in both Servo and Gecko. It also passes the Gecko sanity reftests, the Servo WPT/CSS tests and the Wrench reftests.

@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

@staktrace @sotaroikeda @JerryShih I've run the gecko sanity tests locally with this, and they pass. It doesn't change any external interfaces, but it does expect that external textures which use the normal texture2D sampler switch over to place them into the texture2DArray sampler. It doesn't affect anything that goes via the TextureRect or TextureExternal image types. Do you foresee any issues with this? I'll ping you later during the week with some questions I have on running a try build with this change.

This was referenced Aug 14, 2017
@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

@kvark PS: The texture_cache.rs file is different enough now that it probably is easier to just review it as a new file rather than trying to diff it...

@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

To get a feel for how the cache works, the image below may help. This is what the cache looks like after browsing a heap of wikipedia and mozilla pages in Servo.

cache

You can see that within each layer of a texture array, there are number of "regions". Each region acts as a slab allocator with a fixed size, that allows fast and deterministic allocating and freeing.

We'll probably want to tweak the slab sizes, but the initial implementation seems to work well on all the pages I've tried so far.

@kvark
Copy link
Member

kvark commented Aug 14, 2017

Overall, I think moving to slab allocation is a great step forward! The amount of code is making it rather difficult to get a full picture even when using Reviewable, but I got a general idea, and I like it (despite the fact I was cleaning up the coalescing not so long ago) ;)

In the rare / edge case of the fixed sized cache being full, and the cache not being able to evict items, these items also fallback to the standalone texture allocator.

Hmm, not sure about this one. I feel like we should not be hard-coding the texture cache size. If there is a workload that needs a little more than we have, we'll see consistent drop in performance there, where optimally I'd like our performance curve to be smooth.

So, instead of allocating a standalone texture, can we just allocate a new texture page to be added to the cache (and reused later on)?

To get a feel for how the cache works, the image below may help.

I love those, thank you! Although, perhaps it would be visually more appealing to render them Y-flipped, so that all the textures look exactly as they are on screen.


Reviewed 32 of 34 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 17 unresolved discussions.


webrender/res/debug_font.fs.glsl, line 10 at r2 (raw file):

void main(void)
{
#ifdef SERVO_ES2

is that check intended for removal?


webrender/src/device.rs, line 1644 at r2 (raw file):

        let (gl_format, bpp, data, data_type) = match self.textures.get(&texture_id).unwrap().format {
            ImageFormat::A8 => {
                if cfg!(any(target_arch="arm", target_arch="aarch64")) {

is this path intended for removal as well?


webrender/src/freelist.rs, line 94 at r2 (raw file):

    }

    pub fn upsert(&mut self,

please add a comment explaining what upsert means in this context


webrender/src/glyph_cache.rs, line 14 at r2 (raw file):

    pub texture_cache_handle: TextureCacheHandle,
    pub glyph_bytes: Arc<Vec<u8>>,
    pub width: u32,

should it use euclid primitives, perhaps? I.e. TypedSize<u32> and TypedPoint2D<f32>


webrender/src/glyph_rasterizer.rs, line 161 at r2 (raw file):

            match glyph_key_cache.entry(key.clone()) {
                Entry::Occupied(mut entry) => {
                    if let &mut Some(ref mut glyph_info) = entry.get_mut() {

nit: probably easier to have if let Some(...) = *entry.get_mut()


webrender/src/glyph_rasterizer.rs, line 273 at r2 (raw file):

                    let glyph_bytes = Arc::new(glyph.bytes);
                    let mut texture_cache_handle = TextureCacheHandle::new();
                    texture_cache.request(&mut texture_cache_handle, gpu_cache);

shouldn't we check for the request return value?


webrender/src/glyph_rasterizer.rs, line 274 at r2 (raw file):

                    let mut texture_cache_handle = TextureCacheHandle::new();
                    texture_cache.request(&mut texture_cache_handle, gpu_cache);
                    texture_cache.update(

is update always following request?


webrender/src/internal_types.rs, line 100 at r2 (raw file):

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum RenderTargetMode {

I feel like we'll eventually (not now!) need to have a separate TargetView type (instead of this enumeration) that would capture the internal FBOs (per slice) and the depth attachment.


webrender/src/renderer.rs, line 2417 at r2 (raw file):

        let mut size = 512;
        let fb_width = framebuffer_size.width as i32;
        let num_textures: i32 = self.cache_texture_id_map

shouldn't this be num_layers?


webrender/src/resource_cache.rs, line 108 at r2 (raw file):

}

pub struct ResourceClassCache<K,V> {

is there any point in this struct now as oppose to just typedeffing to FastHashMap?


webrender/src/resource_cache.rs, line 144 at r2 (raw file):

    where for<'r> F: Fn(&'r &K) -> bool
    {
        let resources_to_destroy = self.resources.keys()

I think it can be easily done on the caller site now that we have Rust-1.19 (used by Gecko) and all those pretty retain methods in libstd


webrender/src/texture_cache.rs, line 82 at r2 (raw file):

// cache or a standalone texture.
#[derive(Debug)]
struct CacheEntry {

nice struct!


webrender/src/texture_cache.rs, line 231 at r2 (raw file):

    // Request an item in the texture cache. All images that will
    // be used on a frame *must* have request() called on their

do we have any detection of a forgotten request?


webrender/src/texture_cache.rs, line 404 at r2 (raw file):

        // Sort by access time so we remove the oldest ones first.
        eviction_candidates.sort_by_key(|handle| {

I think this procedure can be done simpler and a bit faster without temporary arrays:

  • sort standalone_entry_handles in descending order of last_access. Given that it's likely to be temporarily coherent, a quicksort-like method would be efficient at skipping the work on already sorted parts
  • check 32th from the back for early return
  • pop 32 items, calling free() for each

Although, this doesn't seem to be a performance critical place here, so we can look into it later. The current code is correct.


webrender/src/texture_cache.rs, line 477 at r2 (raw file):

                let entry = self.entries.free(handle);
                if let Some(region) = self.free(entry) {
                    found_matching_slab = found_matching_slab ||

can we use |= operator?


webrender/src/texture_cache.rs, line 870 at r2 (raw file):

        // allocate from.
        let slab_size = SlabSize::new(width, height);
        let slab_size_bytes = slab_size.get_size();

I thought that get_size() returns the width of a slab region, so it seems confusing to see _bytes here


webrender/src/texture_cache.rs, line 906 at r2 (raw file):

                region.init(slab_size);
                if let Some(location) = region.alloc() {
                    entry_kind = Some(EntryKind::Cache {

nit: could be entry_kind = region.alloc().map(|location| {...});


Comments from Reviewable

@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

Review status: 29 of 34 files reviewed at latest revision, 17 unresolved discussions.


webrender/res/debug_font.fs.glsl, line 10 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

is that check intended for removal?

Yep, not needed now that we require es3.


webrender/src/device.rs, line 1644 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

is this path intended for removal as well?

Yep - es3 supports alpha texture format natively.


webrender/src/freelist.rs, line 94 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

please add a comment explaining what upsert means in this context

Done.


webrender/src/glyph_cache.rs, line 14 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

should it use euclid primitives, perhaps? I.e. TypedSize<u32> and TypedPoint2D<f32>

Done.


webrender/src/glyph_rasterizer.rs, line 161 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: probably easier to have if let Some(...) = *entry.get_mut()

Done.


webrender/src/glyph_rasterizer.rs, line 273 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

shouldn't we check for the request return value?

In this case (a newly rasterized glyph), we know that we will definitely want to update.


webrender/src/glyph_rasterizer.rs, line 274 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

is update always following request?

Only if request returns true, or the caller knows it wants to update the image data.


webrender/src/internal_types.rs, line 100 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I feel like we'll eventually (not now!) need to have a separate TargetView type (instead of this enumeration) that would capture the internal FBOs (per slice) and the depth attachment.

Sounds good.


webrender/src/renderer.rs, line 2417 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

shouldn't this be num_layers?

Done.


webrender/src/resource_cache.rs, line 108 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

is there any point in this struct now as oppose to just typedeffing to FastHashMap?

Probably not - we could perhaps do as a follow up though?


webrender/src/resource_cache.rs, line 144 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I think it can be easily done on the caller site now that we have Rust-1.19 (used by Gecko) and all those pretty retain methods in libstd

Yup, we could do that as a follow up when we remove the resource class cache struct.


webrender/src/texture_cache.rs, line 231 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

do we have any detection of a forgotten request?

Yes, get() asserts that the last_frame matches the current frame.


webrender/src/texture_cache.rs, line 404 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I think this procedure can be done simpler and a bit faster without temporary arrays:

  • sort standalone_entry_handles in descending order of last_access. Given that it's likely to be temporarily coherent, a quicksort-like method would be efficient at skipping the work on already sorted parts
  • check 32th from the back for early return
  • pop 32 items, calling free() for each

Although, this doesn't seem to be a performance critical place here, so we can look into it later. The current code is correct.

Sounds good!


webrender/src/texture_cache.rs, line 477 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

can we use |= operator?

Done.


webrender/src/texture_cache.rs, line 870 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

I thought that get_size() returns the width of a slab region, so it seems confusing to see _bytes here

Done.


webrender/src/texture_cache.rs, line 906 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

nit: could be entry_kind = region.alloc().map(|location| {...});

Done.


Comments from Reviewable

@glennw
Copy link
Member Author

glennw commented Aug 14, 2017

@kvark Thanks for the review. I think all comments / questions addressed - I still haven't run on the gecko try build yet. I agree with you about the resizing - in the case where we evict everything not used on this frame and still can't fit in the cache, we could resize and add more layers. Are you happy for this to be done as a follow up?

@kvark
Copy link
Member

kvark commented Aug 15, 2017

Yes, I'm happy with it. I assume you are going to test Gecko before merging?
r=me then


Review status: 29 of 34 files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@glennw
Copy link
Member Author

glennw commented Aug 15, 2017

Yup, I'll test against Gecko today. Thanks!

@JerryShih
Copy link
Contributor

JerryShih commented Aug 15, 2017

I've run the gecko sanity tests locally with this, and they pass. It doesn't change any external interfaces, but it does expect that external textures which use the normal texture2D sampler switch over to place them into the texture2DArray sampler. It doesn't affect anything that goes via the TextureRect or TextureExternal image types. Do you foresee any issues with this? I'll ping you later during the week with some questions I have on running a try build with this change.

I think there is no problem if we try to use texture2DArray for texture2D sampler with your fix.

@glennw
Copy link
Member Author

glennw commented Aug 15, 2017

OK, gecko test run results here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=727e26960bc131783b36d0fe4cd6dc143af414b3

I think that's all done correctly and tested OK - but I'll get @staktrace to confirm since I haven't tried this before.

If that's all OK, and with the sign-off from @JerryShih above, this should be good to merge.

@glennw
Copy link
Member Author

glennw commented Aug 15, 2017

@staktrace The required changes in gecko are:

  • Update gleam to 0.4.8.
  • Remove the cache_expiry_frames field when initializing WR, it's no longer needed.
@staktrace
Copy link
Contributor

staktrace commented Aug 15, 2017

@glennw The try push looks good, thanks! And also thanks for listing the changes required in gecko, that's perfect!

@kvark
Copy link
Member

kvark commented Aug 15, 2017

Ok, sounds like a fairy tale, :shipit:
@bors-servo r+

@bors-servo
Copy link
Contributor

bors-servo commented Aug 15, 2017

📌 Commit 63ec62c has been approved by kvark

@bors-servo
Copy link
Contributor

bors-servo commented Aug 15, 2017

Testing commit 63ec62c with merge c98d889...

bors-servo added a commit that referenced this pull request Aug 15, 2017
Refactor and improve the texture cache, implement new allocator.

Refactor and improve the texture cache implementation.

* All textures using the sColor* samplers are now array textures. This will allow us to simplify the large image tiling code in the future.
* All texture updates are performed via PBO, to eliminate CPU driver stalls when updating textures.

* The texture cache now acts more like a traditional cache, where the cache is a fixed size, and can evict items as required.
* The allocator is switched to a slab style allocator, allowing deterministic behavior when allocating and freeing.
* The slab allocator does not suffer from fragmentation like the previous allocator, and has no coalesce requirements.
* The eviction policy is a lot smarter than the previous implementation. Items are evicted from the cache as required.
* Since the shared cache is a fixed / bounded size, any allocations that are too large for the cache are created as standalone textures.
* In the rare / edge case of the fixed sized cache being full, and the cache not being able to evict items, these items also fallback to the standalone texture allocator.

* Add support for array textures that don't have FBOs.
* Simplify render targets (always array textures).
* Remove resize_texture which is now unused.
* Remove support for texture upload without PBO.
* Add support for weak handles to freelist.
* Update gleam to 0.4.8 (allows upload of array textures via PBO).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/1572)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Aug 15, 2017

☀️ Test successful - status-travis
Approved by: kvark
Pushing c98d889 to master...

@bors-servo bors-servo merged commit 63ec62c into servo:master Aug 15, 2017
2 of 3 checks passed
2 of 3 checks passed
code-review/reviewable 5 files, 1 discussion left (glennw, kvark)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@glennw glennw deleted the glennw:tc-tidy3 branch Aug 15, 2017
@jrmuizel jrmuizel mentioned this pull request Aug 17, 2017
20 of 24 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.