Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upOff-screen clip mask generation #556
Conversation
| @@ -314,7 +314,7 @@ dependencies = [ | |||
|
|
|||
| [[package]] | |||
| name = "ipc-channel" | |||
| version = "0.6.0" | |||
| version = "0.5.1" | |||
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
kvark
Nov 11, 2016
Author
Member
That's actually not me, it's Glenn - bf3ac6f
not sure how it got in my PR though
This comment has been minimized.
This comment has been minimized.
glennw
Nov 11, 2016
Member
There's a bug in ipc-channel 0.6 preventing us from landing the update in Servo, so we had to revert the update in WR (for now).
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.
| // ResourceCache allocates/load the actual data | ||
| // will be simplified after the TextureCache upgrade | ||
| pub image: Option<ImageMask>, | ||
| pub device_rect: DeviceRect, |
This comment has been minimized.
This comment has been minimized.
kvark
Nov 12, 2016
Author
Member
One problem that I see with this change is the computation of device_rect. Since layers are supposed to move, this rectangle needs to be re-computed, but there is not tracking of the layers at this point. Is there an example where the layers state is being cached and updated upon any movement? Alternatively, I can re-compute the rectangle every frame.
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
Doesn't this get re-computed every frame anyway, since the generate() call is done during cull_layers()?
This comment has been minimized.
This comment has been minimized.
kvark
Nov 14, 2016
Author
Member
generate() is currently only done if metadata.clip_cache_info.is_none()
| @@ -314,7 +314,7 @@ dependencies = [ | |||
|
|
|||
| [[package]] | |||
| name = "ipc-channel" | |||
| version = "0.6.0" | |||
| version = "0.5.1" | |||
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.
| @@ -18,7 +18,7 @@ byteorder = "0.5" | |||
| euclid = "0.10" | |||
| fnv="1.0" | |||
| gleam = "0.2" | |||
| ipc-channel = "0.6" | |||
| ipc-channel = "0.5" | |||
This comment has been minimized.
This comment has been minimized.
| int render_task_index; | ||
| int layer_index; | ||
| int data_index; | ||
| int pad; |
This comment has been minimized.
This comment has been minimized.
| * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ | ||
|
|
||
| void main(void) { | ||
| oFragColor = vec4(1, 1, 1, 1); |
This comment has been minimized.
This comment has been minimized.
| #[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)] | ||
| pub struct ClipAddressRange { | ||
| pub start: GpuStoreAddress, // start GPU address | ||
| pub count: u32, // number of items, not bytes |
This comment has been minimized.
This comment has been minimized.
| @@ -36,7 +36,7 @@ pub struct RenderBackend { | |||
| next_namespace_id: IdNamespace, | |||
|
|
|||
| resource_cache: ResourceCache, | |||
| dummy_resources: DummyResources, | |||
| _dummy_resources: DummyResources, | |||
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| &projection); | ||
| } | ||
|
|
||
| // Draw the clip items into the tiled alpha mask. | ||
| if true { |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| @@ -567,11 +645,14 @@ struct CompileTileContext<'a> { | |||
| struct RenderTargetContext<'a> { | |||
| layer_store: &'a [StackingContext], | |||
| prim_store: &'a PrimitiveStore, | |||
| resource_cache: &'a ResourceCache, | |||
| _clip_stack: &'a ClipRegionStack, | |||
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@kvark What was the reason for having to cover the entire tile with each clip primitive again? I think we'll probably need to find a solution that doesn't require this - I think the performance overhead will probably be too high otherwise. I'll pull the branch today and do some performance tests though to check that. |
|
The name became slightly misleading. The clip tile is not the tile a clip belongs to, it's actually just representing the screen space boundary of thr intersection between all clip instances, so the performance is not expected to suffer.
|
| return cci; | ||
| } | ||
|
|
||
| // The transformed vertex function that always covers the whole tile with the primitive |
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
Let's specify that the whole tile means the intersection of the clip instances here.
This comment has been minimized.
This comment has been minimized.
|
Doing some testing with this today, and ran into something which I assume is a bug - unless I'm misunderstanding something: I have a test case that contains a single div, with the follow style:
With some logging to show what the renderer is submitting for clip cache draws, I see:
Which suggests that it's issuing two clear draw calls for the cache render, and also a clear on the main render target. I was expecting to see 1 clear + 1 rect in the render target, and no clip items in the main frame buffer target. (I ensured the div is positioned completely inside one tile, just to rule out any bugs related to crossing tile boundaries). |
|
Ah, those extra clears are related to the dummy mask item - I guess that always gets created at the moment, even if not needed? |
|
Yeah, and it's also incomplete/buggy. I've been looking into fixing it, hopefully done by tomorrow. Note: it should be just one draw, I'll have a closer look tomorrow about the rest.
|
|
This looks promising! Added a few comments, feel free to reply here or perhaps they might be easier to discuss on IRC / Vidyo during the week. |
| @@ -49,7 +52,6 @@ uniform sampler2DArray sCache; | |||
| uniform sampler2D sLayers; | |||
| uniform sampler2D sRenderTasks; | |||
| uniform sampler2D sPrimGeometry; | |||
| uniform sampler2D sClips; | |||
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
We could also remove Clips from the TextureSampler enum, and associated code.
|
|
||
| float do_clip() { | ||
| // anything outside of the mask is considered transparent | ||
| bvec4 inside = lessThanEqual( |
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
It's unfortunate to add the cost of this conditional to every shader - is it actually needed? Shouldn't we always be able to just sample from the clip mask?
This comment has been minimized.
This comment has been minimized.
kvark
Nov 14, 2016
Author
Member
We'd have to clamp to UV borders, and even then - we don't currently guarantee that those borders contain opaque data. So - possible, but non-trivial.
I don't see it as a big deal though, just a few comparisons with masking the fetch result.
| vec2 final_pos = tile.screen_origin_task_origin.zw + | ||
| tile.size_target_index.xy * aPosition.xy; | ||
|
|
||
| gl_Position = uTransform * vec4(final_pos, 0, 1); |
This comment has been minimized.
This comment has been minimized.
| vec2 source_uv = clamped_mask_uv * vClipMaskUvRect.zw + vClipMaskUvRect.xy; | ||
| float clip_alpha = texture(sMask, source_uv).r; //careful: texture has type A8 | ||
|
|
||
| oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha)); |
This comment has been minimized.
This comment has been minimized.
|
|
||
| float clip_alpha = rounded_rect(local_pos); | ||
|
|
||
| oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha)); |
This comment has been minimized.
This comment has been minimized.
| @@ -11,6 +11,7 @@ void main(void) { | |||
| vec2 local_pos = vPos; | |||
| #endif | |||
|
|
|||
| alpha = min(alpha, do_clip(local_pos)); | |||
| //alpha = min(alpha, do_clip(local_pos)); | |||
| alpha = min(alpha, do_clip()); | |||
| oFragColor = vColor * vec4(1, 1, 1, alpha); | |||
This comment has been minimized.
This comment has been minimized.
| } | ||
|
|
||
| pub struct ClipRegionStack { | ||
| layers: HashMap<StackingContextIndex, LayerInfo>, |
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
It would be good if we could get rid of this layers hash map - the LayerInfo struct doesn't really seem to store anything that's not already stored in the main Layer Store. Could we instead just pass in a borrow of the Layer Store when it's needed in generate(), including the current layer id? That way, we guarantee that there is one source of truth for the stacking context transform, that includes scroll offsets etc.
This comment has been minimized.
This comment has been minimized.
|
|
||
| pub struct ClipRegionStack { | ||
| layers: HashMap<StackingContextIndex, LayerInfo>, | ||
| image_masks: Vec<ImageMask>, |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| @@ -5,7 +5,7 @@ | |||
| use renderer::MAX_VERTEX_TEXTURE_WIDTH; | |||
| use std::mem; | |||
|
|
|||
| #[derive(Debug, Copy, Clone)] | |||
| #[derive(Debug, Copy, Clone, Eq, Hash, PartialEq)] | |||
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
I'm unsure about this - maybe it's necessary but it just doesn't seem quite right. Perhaps in the cache key we could store the ImageKey instead of the actual GpuStoreAddress?
This comment has been minimized.
This comment has been minimized.
kvark
Nov 14, 2016
Author
Member
we may be able to store ImageKey instead of the gpu address in the cache key, but there is also ClipAddressRange in there that uses GpuStoreAddress
| let image = match source { | ||
| &PrimitiveClipSource::NoClip => None, | ||
| &PrimitiveClipSource::Complex(rect, radius) => { | ||
| let address = clip_store.alloc(CLIP_DATA_GPU_SIZE); |
This comment has been minimized.
This comment has been minimized.
glennw
Nov 14, 2016
Member
Ideally we don't want to be calling alloc() on gpu stores during generate. The reason is that the gpu stores are not re-created during scrolling, so this will end up growing the gpu stores during scrolling. What most of the primitives do is a gpu store alloc when the prim is first added (if needed), and then just get a (mutable) reference to that address when they need to update that data. I think that would work OK in this case?
This comment has been minimized.
This comment has been minimized.
kvark
Nov 14, 2016
Author
Member
Ok, so this appears to be complicating things a bit (implementing now). The old generate logic is going to be split into 3 parts, starting with the GPU store allocation (part 1), and proceeding to actually filling the GPU stores during the primitives generation, if needed (part 2), and followed by the update of the device rectangle (part 3 - again, if needed).
|
|
|
@glennw thank you for the fantastic review! |
|
Fixed the dummy task now some more and confirmed the mozilla test suite passes for servo. |
|
|
|
@kvark Thanks! I'll take another look at this today. I did some testing, and in general the performance seems quite good. A good page to do some testing with is https://github.com/servo/servo - the performance goes off a cliff as you scroll around halfway down the page. The GPU profiler shows that it is spending the majority of its time generating clip masks, so that would be worth investigating. There also seems to be some clipping artifacts when scrolling on some of the buttons. It's possible those slowdowns / artifacts are unrelated to this change, but I haven't noticed them previously. |
|
Left a couple of minor comments. Let's look into the GH performance issue too and find out what is causing that to be slow. Then we can either add a follow up task to fix that, or fix it now if simple. Then this should be ready to merge. |
| @@ -156,8 +157,13 @@ impl AlphaBatchHelpers for PrimitiveStore { | |||
| PrimitiveKind::TextRun | | |||
| PrimitiveKind::Image | | |||
| PrimitiveKind::Gradient | | |||
| PrimitiveKind::BoxShadow => true, | |||
|
|
|||
| PrimitiveKind::BoxShadow => { | |||
This comment has been minimized.
This comment has been minimized.
glennw
Nov 15, 2016
Member
Let's do this test outside the match statement, so it handles borders too.
| if let Some(mask_task) = RenderTask::new_mask(self.rect, clip_info) { | ||
| current_task.children.push(mask_task); | ||
| } else { | ||
| // The primitive has clip items but their intersection is empty |
This comment has been minimized.
This comment has been minimized.
glennw
Nov 15, 2016
Member
Ideally we'd catch this earlier, and not even mark the primitive as visible in this case. But we can do that as a follow up (let's just add a TODO comment here about it).
|
@glennw I checked the servo page and haven't found anything outstanding in terms of what the implementation is doing. There is quite a large of an area associated with the clips (~80% screen space), so that's where the slowdown can come from. Frankly, I do not observe the "going off the cliff" effect, the FPS drops from 55 to 51 when scrolled to the middle of the document. Given that the rounded corner radius is typically about 2-3px, up to 10, and the rectangles are pretty big, I can think of a way to drastically reduce the time spent on clip generation:
I'm going to rebase and resolve the last concerns. |
9f1255e
to
2942ca4
…tiple rounded cornered rectangles. Removal of the _clip shader variants.
|
@bors-servo r+ |
|
|
|
|
Off-screen clip mask generation This is the second major step towards #498 Edit: now actually includes the third step as well (removal of `*_clip` shaders). The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new `ClipBatcher`. It supports everything that we currently support, plus the actual handling of arbitrary number of clips. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/556) <!-- Reviewable:end -->
kvark commentedNov 11, 2016
•
edited
This is the second major step towards #498
Edit: now actually includes the third step as well (removal of
*_clipshaders).The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new
ClipBatcher.It supports everything that we currently support, plus the actual handling of arbitrary number of clips.
This change is