Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PBO copies are slow on Angle #2110

Closed
kvark opened this issue Nov 27, 2017 · 17 comments
Closed

PBO copies are slow on Angle #2110

kvark opened this issue Nov 27, 2017 · 17 comments

Comments

@kvark
Copy link
Member

@kvark kvark commented Nov 27, 2017

It appears that Windows/Angle spends tons of time in update_texture_from_pbo: https://perfht.ml/2zJnkg4
Angle decided to defer mapping and filling an actual D3D11 staging buffer up until this call. Since the copy itself is done in a draw call, it has to stall the GPU until the copy is done. This is one issue, but the actual current one is different: Map itself is waiting. I suppose it's waiting for the GPU considering the texture is still in use, which implies our PBO orphaning doesn't work as expected (see Device::orphan_pbo).

cc @jrmuizel @glennw

@jrmuizel
Copy link
Contributor

@jrmuizel jrmuizel commented Nov 27, 2017

This is showing up during motionmark bouncing circles.

@jrmuizel
Copy link
Contributor

@jrmuizel jrmuizel commented Nov 27, 2017

I should also note that to reproduce this slowness I need to run the test case in ramp mode first and then reload with constant complexity. i.e.
Constant complexity @ 400 => 6ms compositor time
Ramp => up to 11ms compositor time
Constant complexity @ 400 => 16ms compositor time

So it does seem like this problem is being made worse through some invisible persistent state (the webrender texture cache?, the ANGLE buffer cache?)

@kvark
Copy link
Member Author

@kvark kvark commented Nov 27, 2017

@jrmuizel we should have a separate issue for the (suspected) cache pollution

@kvark kvark changed the title PBO orphaning doesn't work on Angle PBO copies are slow on Angle Nov 28, 2017
@kvark
Copy link
Member Author

@kvark kvark commented Nov 28, 2017

Thanks to Angle team I got some answers! This is not about orphaning, but rather about some texture formats not being supported by the fast path of buffer->texture copies: https://cs.chromium.org/chromium/src/third_party/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp?type=cs&q=supportsFastCopyBufferToTexture&sq=package:chromium&l=3006

In particular, Angle doesn't like our RGB8 and A8.

@kvark kvark self-assigned this Nov 28, 2017
@glennw
Copy link
Member

@glennw glennw commented Nov 28, 2017

Awesome, thanks for following this up!

@jrmuizel
Copy link
Contributor

@jrmuizel jrmuizel commented Nov 28, 2017

@glennw
Copy link
Member

@glennw glennw commented Nov 28, 2017

@jrmuizel Are you using the Gecko profiler for this or something else?

@glennw
Copy link
Member

@glennw glennw commented Nov 28, 2017

@kvark From your investigations of angle, would it be worth considering alternative update strategies (e.g. render target with float points, or perhaps map/unmap or something else?)

@jrmuizel
Copy link
Contributor

@jrmuizel jrmuizel commented Nov 28, 2017

@glennw
Copy link
Member

@glennw glennw commented Nov 28, 2017

When I originally set up the GPU cache, the idea was to be able to use an unsynchronized map to update the texture, on platforms where that made sense (i.e. a persistent pointer into the texture data). It does two things to enable this:

  • Items are never evicted that are newer than an arbitrary amount of frames (currently set to 10).
  • When items are invalidated for update, the old location is orphaned - that is, a new location is selected for the updated data, and the old location is left to be evicted by the cache by the eviction policy mentioned above.

In theory, this means we don't need any synchronization here - these two policies should guarantee that we never write to a location that is < 10 frames old, which should thus guarantee that the GPU is never reading incorrect data.

Admittedly, I've never actually tested this in practice - I intended to revisit it at a later time. But perhaps we can signal to ANGLE that it doesn't need to do any blocking and see how it goes?

Another possibility (more of a temporary quick fix / hack) could be to round-robin a series of backing textures for the GPU cache, and update / upload the entire texture each frame. This sounds bad, but that data is actually quite small, and may be perfectly fine as an interim solution, if most of the time is spend blocking on a GPU fence or similar.

@kvark @jrmuizel These might be worth pursuing if we're not able to get the current path running well on ANGLE.

@kvark
Copy link
Member Author

@kvark kvark commented Nov 28, 2017

@jrmuizel

The profile shows us hitting the fast path not missing it.

Damn, that is correct. Back to square one!

@glennw

From your investigations of angle, would it be worth considering alternative update strategies (e.g. render target with float points, or perhaps map/unmap or something else?)

Will see after this issue is resolved.

the idea was to be able to use an unsynchronized map to update the texture

TBH, that sounds perfect :) We definitely need to check out persistent mapping for the GPU cache texture. However, this is quite specific (GPU cache only, not the texture cache or other data) and will probably need to wait till we resolve the outstanding performance issues first (assuming we can resolve the Angle problem without rewriting the upload path too much).

@kvark
Copy link
Member Author

@kvark kvark commented Nov 28, 2017

Had a long discussion today with Angle peers. Ended up filing an upstream issue - https://bugs.chromium.org/p/angleproject/issues/detail?id=2268

TL;DR:
They don't expect us to use PBOs at all, and STREAM_DRAW on them in particular. Using STATIC_DRAW is our best hope for a minimally-intrusive fix. They'd prefer us just calling glTexSubImage2D directly on the GPU textures...

@glennw
Copy link
Member

@glennw glennw commented Nov 28, 2017

We can quite easily switch to using glTexSubImage2D. That should only be a few lines to change. However, that definitely introduces stalls on Intel drivers (at least on Linux), so we should support both paths and select one based on the GL renderer. We could start by doing a local test on ANGLE to see how much this improves things?

@kvark
Copy link
Member Author

@kvark kvark commented Nov 28, 2017

Yes, that's my plan :) We'll definitely need separate paths on Windows/Angle versus the world :)

@kvark kvark added the bugzilled label Nov 29, 2017
bors-servo added a commit that referenced this issue Dec 4, 2017
Texture update strategies

~~Fixes  #2110, hopefully: still to be benchmarked on Windows.~~
Provide the settings for Gecko to fix ^^

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=12fecf240aa3440764cd64205b17bfcfe7eb252c

r? @glennw

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2147)
<!-- Reviewable:end -->
@kvark
Copy link
Member Author

@kvark kvark commented Dec 4, 2017

Oh, didn't mean to close this just yet.

@kvark kvark reopened this Dec 4, 2017
bors-servo added a commit that referenced this issue Dec 8, 2017
Scattered GPU cache updates

Fixes #2110

This PR introduces the GPU cache uploading via shader writes. It minimizes the amount of data we need to transfer to the GPU in order to reduce the stalls and make us scale better with content (see #2132 (comment)).

Note: in terms of efficiency, rasterizing lines would be much better than points. Leaving this for the follow up. Doing so would require us to upload the source data into a texture as opposed to a buffer, and slightly (but non-critically) complicate things.

WIP, because:
  - benchmarking is TODO
  - to be squashed eventually

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f583b2deae1c7312437638bdda6ddc65693d53ed

r? @glennw

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2162)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants
You can’t perform that action at this time.