Switch primitives to be stored in SoA style in vertex textures. #448
r? @pcwalton I apologize for the size of this patch. If it's too difficult to review, let me know and I'll split it up tomorrow into smaller patches. I've run this locally against WPT and CSS tests successfully. I've also tested on my Linux laptop and Mac Mini. Some timings are pasted below.
|
Some representative timings from my laptop. Note that this change doesn't actually do any incremental updates yet, so there are some good-sized wins still to come from follow-ups to this patch. https://en.wikipedia.org https://news.ycombinator.com https://reddit.com https://github.com/servo/servo
|
I'd be happy to review this gigantic change as well. |
|
|
The primary goals of this patch are:
(1) Store primitives in SoA style. This makes some passes, such as
culling primitives, much faster due to better cache coherency.
(2) Store primitives in flat arrays, for direct upload to GPU. This
reduces the amount of redundant copying on the CPU during
frame construction.
(3) Decouple primitive information from geometry instances. This allows
us to improve occlusion culling, by submitting segments of primitives
without any extra CPU overhead of copying primitive data.
(4) Allow incremental updates of the GPU SoA arrays during scrolling and
zoom. This isn't implemented yet, but the framework is in place and
will allow significant backend/compositor thread improvements during
scrolling and zoom frames.
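As a rough illustration of goals (1) and (2), the sketch below keeps primitive bounding rectangles in their own flat array, so a culling pass reads only the data it needs. The names `Rect`, `PrimitiveStore`, `add`, and `cull` are hypothetical and are not taken from the patch.

```rust
/// Axis-aligned rectangle; a hypothetical stand-in for the real geometry type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl Rect {
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// SoA layout: one flat array per field, all indexed by primitive index.
#[derive(Default)]
struct PrimitiveStore {
    bounds: Vec<Rect>,     // the only array the culling pass reads
    colors: Vec<[f32; 4]>, // never touched while culling
}

impl PrimitiveStore {
    /// Appends one primitive and returns its index.
    fn add(&mut self, bounds: Rect, color: [f32; 4]) -> usize {
        self.bounds.push(bounds);
        self.colors.push(color);
        self.bounds.len() - 1
    }

    /// Returns the indices of primitives whose bounds overlap `viewport`.
    fn cull(&self, viewport: &Rect) -> Vec<usize> {
        self.bounds
            .iter()
            .enumerate()
            .filter(|(_, b)| b.intersects(viewport))
            .map(|(i, _)| i)
            .collect()
    }
}

fn main() {
    let mut store = PrimitiveStore::default();
    store.add(Rect { x0: 0.0, y0: 0.0, x1: 10.0, y1: 10.0 }, [1.0, 0.0, 0.0, 1.0]);
    store.add(Rect { x0: 500.0, y0: 500.0, x1: 510.0, y1: 510.0 }, [0.0, 1.0, 0.0, 1.0]);
    let viewport = Rect { x0: 0.0, y0: 0.0, x1: 100.0, y1: 100.0 };
    println!("visible primitives: {:?}", store.cull(&viewport));
}
```

Because `bounds` is contiguous, the loop in `cull` walks memory linearly, and a flat array like `colors` could in principle be handed to the GPU without repacking.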
There are also a number of other changes that have been rolled into this
patch since it made sense to fix them at the same time. In particular:
* Blend/composite batches are stored as ints, avoiding CPU float packing overhead.
* Border segment rectangles are calculated in the vertex shader rather than on the CPU.
* Gradient colors use vertex shader interpolators for axis-aligned gradients, reducing FS overhead.
* Removed separate Text/TextRun shader types, resulting in better batching and fewer draw calls.
* Angle gradients support arbitrary stop count in primitive data.
* Removed some unused interpolators from the border shader.
* Axis-aligned gradient segments are calculated in the vertex shader, reducing CPU overhead.
* Reduced size of packed image structure - UV type is stored in sign of UV field.
* Reduced size of packed glyph structure - color stored per run, rect calculated from UVs.
* Moved some utility types from tiling.rs to util.rs.
* Removed clip cases from batch enum; the needs_clipping flag is stored in the batch key.
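The last bullet can be pictured with a small sketch: instead of one enum variant per clipped/unclipped case, the batch key carries a flag, and primitives that share a key land in the same batch. `BatchKind`, `BatchKey`, and `build_batches` are hypothetical names for illustration; only the `needs_clipping` flag is mentioned in the description.

```rust
use std::collections::HashMap;

/// Hypothetical primitive kinds; no clipped variants are needed.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum BatchKind {
    Rectangle,
    Image,
}

/// Hypothetical batch key: the clip state is a flag, not an enum case.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BatchKey {
    kind: BatchKind,
    needs_clipping: bool,
}

/// Groups primitive indices by key; each map entry is one batch.
fn build_batches(prims: &[(usize, BatchKey)]) -> HashMap<BatchKey, Vec<usize>> {
    let mut batches: HashMap<BatchKey, Vec<usize>> = HashMap::new();
    for &(index, key) in prims {
        batches.entry(key).or_default().push(index);
    }
    batches
}

fn main() {
    let plain = BatchKey { kind: BatchKind::Rectangle, needs_clipping: false };
    let clipped = BatchKey { kind: BatchKind::Rectangle, needs_clipping: true };
    let image = BatchKey { kind: BatchKind::Image, needs_clipping: false };
    let batches = build_batches(&[(0, plain), (1, clipped), (2, plain), (3, image)]);
    println!("{} batches", batches.len());
}
```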
|
Review notes:
Some thoughts about the change:
Other observations through the code:
I've yet to check |
|
@kvark Thanks for taking a look, comments below:
Thanks for looking over the changes! :) |
|
@glennw makes sense, thanks for the answers! |
|
Partial review |
vOffsets[i] = gradient.offsets[i];
int stop_index = int(prim.user_data.x);

for (int i=0 ; i < vStopCount ; ++i) {
tr_inner.x - tl_inner.x,
border.widths.y);
break;
}
TextureId(0),
TextureId(0),
TextureId(0),
TextureId(0),
}

#[derive(Debug, Clone)]
pub struct TextRunPrimitiveCpu {
glennw (Author, Member), Oct 20, 2016:
Structs get split SoA style into portions that get pushed directly into a vertex texture (the GPU version) and CPU-side-only portions (typically used to look up resource lists, etc.).
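A minimal sketch of that split, under stated assumptions: `TextRunPrimitiveCpu` is the struct name shown in the diff, but its fields here, the `TextRunPrimitiveGpu` type, the `gpu_data` array, and `add_text_run` are all hypothetical and only illustrate the idea.

```rust
/// GPU half (hypothetical): fixed-size, float-only data that can be copied
/// straight into the buffer backing a vertex texture.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TextRunPrimitiveGpu {
    color: [f32; 4],
}

/// CPU half: data only the CPU needs. The field names are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct TextRunPrimitiveCpu {
    font_key: u32,
    glyph_count: usize,
}

/// Both halves live in parallel arrays that share the same index.
#[derive(Default)]
struct PrimitiveStore {
    cpu_text_runs: Vec<TextRunPrimitiveCpu>,
    gpu_data: Vec<[f32; 4]>, // flat array, ready for upload as-is
}

impl PrimitiveStore {
    /// Stores each half in its own array and returns the shared index.
    fn add_text_run(&mut self, cpu: TextRunPrimitiveCpu, gpu: TextRunPrimitiveGpu) -> usize {
        let index = self.cpu_text_runs.len();
        self.cpu_text_runs.push(cpu);
        self.gpu_data.push(gpu.color);
        index
    }
}

fn main() {
    let mut store = PrimitiveStore::default();
    let index = store.add_text_run(
        TextRunPrimitiveCpu { font_key: 7, glyph_count: 12 },
        TextRunPrimitiveGpu { color: [0.0, 0.0, 0.0, 1.0] },
    );
    println!("text run {} uses font {}", index, store.cpu_text_runs[index].font_key);
}
```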
Clip {
rect: ClipRect {
rect: rect,
padding: [0.0, 0.0, 0.0, 0.0],
self.gpu_data16.push(InstanceRect {
rect: rect,
});
}
pcwalton (Collaborator), Oct 20, 2016:
nit: Could just be
self.gpu_data16.extend(instance_rects.into_iter().map(|rect| InstanceRect {
    rect: rect,
}))
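For reference, a self-contained sketch showing that the suggested `extend` form produces the same result as an explicit push loop. `InstanceRect` here is a simplified stand-in with an assumed `[f32; 4]` field, and `push_loop` and `extend_map` are hypothetical helper names.

```rust
/// Simplified stand-in for the real type; the field type is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
struct InstanceRect {
    rect: [f32; 4],
}

/// Explicit loop, in the style of the code under review.
fn push_loop(dst: &mut Vec<InstanceRect>, rects: Vec<[f32; 4]>) {
    for rect in rects {
        dst.push(InstanceRect { rect: rect });
    }
}

/// Iterator form from the review comment; `extend` can use the iterator's
/// size hint to reserve capacity up front.
fn extend_map(dst: &mut Vec<InstanceRect>, rects: Vec<[f32; 4]>) {
    dst.extend(rects.into_iter().map(|rect| InstanceRect { rect: rect }));
}

fn main() {
    let rects = vec![[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]];
    let mut a = Vec::new();
    let mut b = Vec::new();
    push_loop(&mut a, rects.clone());
    extend_map(&mut b, rects);
    println!("same result: {}", a == b);
}
```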
clip_data[1] = GpuBlock32::from(clip.top_left.clone());
clip_data[2] = GpuBlock32::from(clip.top_right.clone());
clip_data[3] = GpuBlock32::from(clip.bottom_left.clone());
clip_data[4] = GpuBlock32::from(clip.bottom_right.clone());
TextureCoordKind::Pixel => {
uv1.x = -uv1.x;
}
}
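This snippet appears to be the "UV type is stored in sign of UV field" change from the description. A minimal sketch of the idea follows, assuming the coordinate is strictly positive so that its sign is free to carry the kind. The `Normalized` variant and the `encode_uv1_x` / `decode_uv1_x` helpers are assumptions for illustration, not code from the patch.

```rust
/// Only `Pixel` appears in the diff; `Normalized` is an assumed second kind.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TextureCoordKind {
    Normalized,
    Pixel,
}

/// Packs the kind into the sign of the coordinate (hypothetical helper).
/// Assumes `x > 0.0`; a zero coordinate could not carry the flag reliably.
fn encode_uv1_x(x: f32, kind: TextureCoordKind) -> f32 {
    debug_assert!(x > 0.0);
    match kind {
        TextureCoordKind::Normalized => x,
        TextureCoordKind::Pixel => -x,
    }
}

/// Recovers the coordinate and the kind, as a shader-side decode might.
fn decode_uv1_x(packed: f32) -> (f32, TextureCoordKind) {
    if packed < 0.0 {
        (-packed, TextureCoordKind::Pixel)
    } else {
        (packed, TextureCoordKind::Normalized)
    }
}

fn main() {
    let packed = encode_uv1_x(256.0, TextureCoordKind::Pixel);
    println!("{:?}", decode_uv1_x(packed));
}
```

The saving is that no separate field is needed for the kind, at the cost of a branch (or a sign test) when the value is unpacked.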
*dest = GpuBlock32::from(GradientStop {
offset: 1.0 - src.offset,
color: src.color,
padding: [0.0, 0.0, 0.0],
TransformedRectKind::Complex
};

/*
pcwalton (Collaborator), Oct 20, 2016:
I prefer not to have commented-out code, but I understand why you did it here. I think it'd be good to add a FIXME at the top saying what the commented-out code does and when to re-enable it.
nit: Instead of repeating
Can we add a |
|
This looks good to me with those nits addressed! |
|
@bors-servo r=pcwalton |
glennw commented Oct 18, 2016 (edited by larsbergstrom)