Larger-than-ideal B_Solid times on github on my integrated GPU (Intel HD Graphics 530) #2178
mstange commented on Dec 6, 2017
|
|
|
Given that the opaque overdraw is 1.0 and the transparent overdraw is 0.21, the large GPU time of "B_Solid" means we are hitting the rasterizer/depth-test bottleneck in the fixed-function part of the hardware. This can happen when we draw too many opaque rectangles whose pixels are mostly discarded and never reach the fragment shaders. A similar situation is described in https://bugzilla.mozilla.org/show_bug.cgi?id=1419863

Investigation of a RenderDoc capture shows that we have 3 large full-screen-ish rectangles: one for the whole screen, one for the content plus the scroll bar, and one for the content. Interestingly, each of those is rendered twice (resulting in 6 full-screen rects).

Hopefully, this can be addressed on the Gecko side by looking at the background setting and discarding it if the content is fully opaque. If not, we'll need to come up with better culling heuristics in WR.
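To make the culling idea concrete, here is a minimal sketch of one possible heuristic: drop any opaque rect that is fully covered by a single opaque rect painted after it. This is not WebRender's actual code. The `Rect` type, the `cull_occluded` function, and the pixel coordinates are all hypothetical and only approximate the three rects seen in the capture.

```rust
/// Hypothetical axis-aligned rect in device pixels (not a WebRender type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl Rect {
    /// True if `other` lies entirely inside `self` (edges may touch).
    fn contains(&self, other: &Rect) -> bool {
        self.x0 <= other.x0 && self.y0 <= other.y0 && self.x1 >= other.x1 && self.y1 >= other.y1
    }
}

/// `rects` is in back-to-front paint order and every rect is assumed opaque.
/// A rect is dropped when any single rect painted after it fully covers it.
fn cull_occluded(rects: &[Rect]) -> Vec<Rect> {
    rects
        .iter()
        .enumerate()
        .filter(|&(i, r)| !rects[i + 1..].iter().any(|above| above.contains(r)))
        .map(|(_, r)| *r)
        .collect()
}

fn main() {
    // Assumed sizes for illustration only.
    let screen = Rect { x0: 0, y0: 0, x1: 3840, y1: 2160 };
    let content_and_scrollbar = Rect { x0: 0, y0: 80, x1: 3840, y1: 2160 };
    let content = Rect { x0: 0, y0: 80, x1: 3820, y1: 2160 };

    // Each rect is submitted twice, as in the capture.
    let submitted = [
        screen,
        screen,
        content_and_scrollbar,
        content_and_scrollbar,
        content,
        content,
    ];
    let kept = cull_occluded(&submitted);
    println!("kept {} of {} rects", kept.len(), submitted.len());
}
```

With these assumed sizes, a containment test like this only removes the duplicated rects. The larger rects still sit underneath the smaller ones, so most of their pixels are still rejected by the depth test.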
|
Note that something needs to split the container rectangles to avoid overdraw. Whether it's WR or Gecko is still to be figured out. Another thing that would help is using a separate document for the chrome stuff.
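As a rough illustration of what splitting a container rectangle could look like, the sketch below subtracts an opaque inner rect from an outer one and returns up to four non-overlapping pieces covering only the uncovered area. The `Rect` type, the `subtract` function, and the sizes are hypothetical; this is one simple way to do it, not necessarily what WR or Gecko would do.

```rust
/// Hypothetical axis-aligned rect in device pixels (not a WebRender type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl Rect {
    fn is_empty(&self) -> bool {
        self.x1 <= self.x0 || self.y1 <= self.y0
    }

    fn area(&self) -> i64 {
        i64::from(self.x1 - self.x0) * i64::from(self.y1 - self.y0)
    }
}

/// Returns the parts of `outer` not covered by `inner` as up to four
/// non-overlapping rects: a top band, a bottom band, and left/right strips.
fn subtract(outer: Rect, inner: Rect) -> Vec<Rect> {
    // Clip `inner` to `outer` to get the covered region ("hole").
    let hx0 = inner.x0.max(outer.x0);
    let hy0 = inner.y0.max(outer.y0);
    let hx1 = inner.x1.min(outer.x1);
    let hy1 = inner.y1.min(outer.y1);
    if hx0 >= hx1 || hy0 >= hy1 {
        // No overlap: nothing to cut out.
        return vec![outer];
    }
    let pieces = [
        Rect { x0: outer.x0, y0: outer.y0, x1: outer.x1, y1: hy0 }, // top band
        Rect { x0: outer.x0, y0: hy1, x1: outer.x1, y1: outer.y1 }, // bottom band
        Rect { x0: outer.x0, y0: hy0, x1: hx0, y1: hy1 },           // left strip
        Rect { x0: hx1, y0: hy0, x1: outer.x1, y1: hy1 },           // right strip
    ];
    pieces.iter().copied().filter(|r| !r.is_empty()).collect()
}

fn main() {
    // Assumed sizes for illustration only.
    let screen = Rect { x0: 0, y0: 0, x1: 3840, y1: 2160 };
    let content = Rect { x0: 0, y0: 80, x1: 3820, y1: 2160 };

    let pieces = subtract(screen, content);
    let remaining: i64 = pieces.iter().map(Rect::area).sum();
    println!("{} pieces, {} of {} pixels left to fill", pieces.len(), remaining, screen.area());
}
```

With these assumed sizes, the full-screen background shrinks to a top band and a narrow right strip, so the pixels under the content are never submitted in the first place.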
|
Briefly recounting what I've been discussing on IRC with @kvark. That very roughly looks like ~2ms in the opaque pass to me. On an integrated GPU, at ~4k resolution, that seems like a possible / reasonable amount of time to fill a rect that large. It'd be great if we find that we are suffering costs from rejecting those extra large quads (there's probably some cost at least), but I'd be surprised if we saw a massive GPU time win over what we're currently seeing for that resolution and that GPU. One possible improvement to this is using partial raster / present - it's certainly the case that on almost all websites you view at 4k resolution, the sides are largely empty. So, I think it makes sense to do some measurements to see if we're paying much cost for rejecting those extra quads, but I'm not super-surprised that it takes ~2ms to fill a 4k rect with the opaque pass.
|
Following up on that, if we do the following, which are all relatively straightforward:
I expect that we'd probably be at ~3.5 - 4ms for this page at 4k resolution, which seems quite close to optimal. Thoughts?
|
Oh, and also using dual-source blending for the subpixel text rendering should give a significant win for this page, perhaps another 0.5ms off the profile.
|
@glennw raised a concern that maybe it's not about rasterization/depth-test but rather the sheer number of pixels we draw in the opaque pass. We decided to benchmark it separately for a baseline, which corresponds to an ideal case where we only draw every pixel once and the pixel shader does nothing. I made this little tool: https://github.com/kvark/gl-bench

I'm getting this on my Intel 520 with Linux/GLX:
A true 4K screen has 2.25 times as many pixels, so I'd estimate it would spend 1.26 ms just to fill it on that GPU.
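The extrapolation above is a linear scaling by pixel count, which can be sketched as follows. The benchmark output itself is not reproduced here, so the baseline time is back-derived from the two numbers that are stated (2.25 and 1.26 ms). The 2560x1440 benchmark resolution is an assumption, chosen only because it gives exactly the 2.25 ratio against 3840x2160. The function names are hypothetical.

```rust
/// Pixel count of a `width` x `height` surface.
fn pixels(width: u32, height: u32) -> f64 {
    f64::from(width) * f64::from(height)
}

/// Scale a measured fill time linearly by pixel count.
/// Assumes fill cost is proportional to the number of pixels written.
fn scale_fill_ms(base_ms: f64, base_pixels: f64, target_pixels: f64) -> f64 {
    base_ms * target_pixels / base_pixels
}

fn main() {
    let uhd = pixels(3840, 2160);
    // ASSUMPTION: the benchmark ran at 2560x1440; the thread only gives the ratio.
    let assumed_base = pixels(2560, 1440);

    let ratio = uhd / assumed_base;
    // Baseline implied by the stated 1.26 ms estimate, not a quoted measurement.
    let implied_base_ms = 1.26 / ratio;

    println!("pixel ratio: {:.2}", ratio);
    println!("implied baseline fill: {:.2} ms", implied_base_ms);
    println!(
        "scaled to 4K: {:.2} ms",
        scale_fill_ms(implied_base_ms, assumed_base, uhd)
    );
}
```

Under that assumption, the implied baseline is about 0.56 ms for a single full-screen fill at the benchmarked resolution.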
|
Thanks @kvark for setting that up! This is quite promising - it may well mean that my assumption that we aren't paying any real z-reject cost was wrong. We should further investigate what happens if we remove those extra full-pass opaque quads (even just removing them manually as a test case to benchmark).
|
We started filling in the data at https://github.com/servo/webrender/wiki/GPU-fill-rate-baseline
|
I've added the data from my machine. I used RDM to switch to a non-retina resolution (2880x1800 at native 1:1 scale) so that the megapixel arithmetic worked out, because the benchmark is not HiDPI-aware. I also used gfxCardStatus to pin my machine to the integrated GPU.
|
The numbers show that the cost of rejecting a quad is roughly 7% of the cost of drawing it, and the cost of drawing it is significantly smaller than what we spend in the opaque pass. This means we don't yet understand where all the time has gone.
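To put the 7% figure in perspective, here is a back-of-the-envelope sketch of the pass cost measured in units of one full-screen fill. It assumes the simplest possible model: one of the six full-screen rects is actually drawn, the other five are fully rejected, and each rejection costs a fixed 7% of a draw. The function name and the model are assumptions, not measurements.

```rust
/// Cost of a pass in units of "one full-screen fill", assuming each rejected
/// full-screen quad costs `reject_ratio` of a drawn one.
fn relative_pass_cost(drawn_quads: u32, rejected_quads: u32, reject_ratio: f64) -> f64 {
    f64::from(drawn_quads) + f64::from(rejected_quads) * reject_ratio
}

fn main() {
    // ASSUMPTION: of the six full-screen rects, one is drawn and five are rejected.
    let cost = relative_pass_cost(1, 5, 0.07);
    println!("opaque pass costs about {:.2}x a single fill", cost);
}
```

Under this model the five rejected quads add only about a third of one fill, which is consistent with the conclusion that rejection alone does not account for the time spent in the opaque pass.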
|
FWIW, the Bugzilla bug is marked "resolved fixed". I'm unsure from the discussion whether it is the same issue, but this may be able to be closed.
|
Thank you!
