Minutes 2018 08 27

GPU Web 2018-08-27

Chair: Dzmitry

Scribe: Ken

Location: Google Meet

Minutes from last meeting

TL;DR

Fields in sampler objects
- Lod bias and normalized coordinates not on all APIs. consensus to take them out
- MIRROR_CLAMP_TO_EDGE not in Vulkan and expect to be deprecate in the future, take it out.
Occlusion queries
- Consensus single query can be present at any time, started and ended in the render pass.
- Ok to skip the staging stage of Vk / D3D and write directly in a query buffer.
- DM will make a PR.
More on passes:
- Clarified the synchronization in that model. UAV in render passes are still racy
- D3D12 won’t be able to dispatch in render passes.
(segway from passes) different ways to clear textures
- loadOpClear
- Clear attachment in the renderpass -> would need to be emulated in Metal
- Clear outside renderpass. Not in Metal.
- Need more investigation.

Tentative agenda

Occlusion queries
More on copies
More on passes
Agenda for next meeting

Attendance

Apple
- Myles C. Maxfield
- Thomas Denney
Google
- Corentin Wallez
- Dan Sinclair
- Kai Ninomiya
- Ken Russell
Intel
- Aleksandar Stojilijkovic
- Yunchao He
Microsoft
- Chas Boyd
- Rafael Cintron
Mozilla
- Dzmitry Malyshau

Status

Google running Dawn unit tests on infrastructure

Fields in sampler objects

https://github.com/gpuweb/gpuweb/pull/73
MM: LOD bias is not in Metal
normalized_coordinates bool is not in D3D
We could either not include those things, or emulate them
- Pretty easily emulated
- Trick: if a webGPU programmer sets one of these in a sampler where the platform doesn’t support it in the sampler, then at point when recording drawing commands, would have to upload some info to shader.
- In Vulkan: would have to be push constant. D3D; root constant. Metal: SetBytes (acts like upload).
- Just need to think if we should emulate these, or leave them out.
DM: my initial reaction: leave them out because they’re not critical.
CW: all of number of root constants, number of push constants and number of inlined buffers in Metal are pretty small. Would be pretty scary to depend on autogenerating these.
MM: don’t have any plans to expose root or push constants through the API. Only use for them would be this sort of invisible data. Think there would probably not be that much use for it.
CW: thought we had push constants. Put on agenda for next week?
DM: don’t see a strong reason why we shouldn’t have push constants since all APIs have a way to do small quick uploads. Agenda for next week? Also, how about uploading normalized coordinates? Query size first?
MM: yes. Either that, or call shading language function to query texture’s size.
DM: not as easy as LODBias isue.
DM: also since we don’t have a notion of pipeline layout we don’t have an ahead of time place where those constants are known to be needed, create root signature, etc.
MM: unless anyone else speaks up we can take them out.
DM: Google and Mozilla have approved the PR.
DM: MIRROR_CLAMP_TO_EDGE?
MM: assumed that would be widely available. Hard to know which Vulkan features are widely available.
DM: there’s a database; you can query it to see. Available on ~87% of devices. Most of those not supporting it are Android GPUs.
- http://vulkan.gpuinfo.org/listreports.php?extension=VK_KHR_sampler_mirror_clamp_to_edge&option=not
MM: Google’s call about whether it should be required or not?
KN: don’t know what the use case is of this feature. Easier to say we shouldn’t require it. ~85% isn’t that bad, but it’s not 100% either.
MM: I’ve never seen anyone use this thing.
CB: what feature?
MM: addressing mode of MIRROR_CLAMP_TO_EDGE for texture sampling. (-1,1) -> (0,1), but everything else gets clamped.
CB: used to be called MIRROR_ONCE in D3D.
MM: let me take it out for now.
CB: expect future hardware will be deprecating this anyway.

Occlusion queries

https://github.com/gpuweb/gpuweb/issues/72
DM: was only able to write this down. The others are optional in Vulkan. May want to add them separately.
For occlusion queries: all 3 APIs provide mechanisms for querying data. vulkan has the most wide spectrum of options on data collection and querying. Metal is the most restrictive. Our API would naturally lean toward the Metal model.
In Metal, user has to specify buffer when starting a pass. Since it’s the occlusion query it’s only for render passes. They would change the mode of the pass to start counting at a specific offset of this buffer. Disable counting later. Only a single query can be active at any time.
Last rule is common across all APIs. Model will map pretty well across all of them.
If we don’t go the Metal route – in Vulkan you can have multiple queries in multiple pools and begin/end queries in different places – you don’t have a single buffer you can specify at render pass creation. Unless all queries belong to a single buffer.
D3D12 is very close to Vulkan. Simpler resolve mechanics (results only 64-bits, only command buffer path to get the contents). So, not easy to map to Vulkan.
Since we already have render passes, I suggest we follow Metal’s model.
CW: so we do a copy from query pool to buffer after every renderpass and it prevents batching.
DM: in Vulkan you can follow up with commands that query the results.
DM; given that a single pass can have multiple queries you can enable/disable that would be batching. All the queries you’ve made will be copied over after the render pass.
DM: will make a pull request
RC: question: where is the query started and ended? inside pass, or outside?
DM: inside. From here on, everything should be counted at this specific address at this buffer you spec’d at the beginning of the pass. Later, in pass, say “stop counting”. Can be issued multiple times in the same render pass, as long as you stop counting at old offset before beginning new offset.
RC: so ending the query is D3D12’s resolve?
CW: no it is more like D3D12’s EndQuery. Resolve is for copying from a staging “query heap” to a buffer. The current proposal is to skip that in WebGPU and directly write in the buffer like Metal
DM: yes, we resolve everything at end of pass. Use queries from descriptor heap that we manage in the D3D12 backend.
RC: makes sense. In your issue you said MSFT please confirm. Synchronous path: doesn’t seem to be one. But you could use fence on queue and wait for it to finish. Then all command buffers you enqueued will have finished.
DM: in both Metal and D3D12, to get query results, supposed to be available at that time. In Vulkan they can be unavailable. This is quite a complication on the API side we don’t want to expose.
RC: will double-check.
MM: what’s the next step?
DM: for queries, will make PR. Also for passes.

More on copies

https://github.com/gpuweb/gpuweb/issues/69
DM: where did we stop?
CW: biggest concern was aesthetics from Jeff but he isn’t here. Delay?
DM: will delay until the next meeting
DM: my concerns: had a few on alignment. Would either have to encode in API or not. Haven’t given much thought since last call.

More on passes

https://github.com/gpuweb/gpuweb/issues/70
DM: Rafael reviewed proposal (thanks). Believe CW resolved your concerns; agree with those answers. Seems to be consensus.
RC: have some q’s. Seem to have agreed on: render passes where you can start/stop. In render pass you can only draw things. No copying, dispatching. Compute passes where the only thing you can do is dispatch thing. Anything besides drawing/dispatching can happen between passes. ?
DM: yes.
RC: it’s up to the webgpu api to put the correct barriers for copies, etc. in place between passes.
DM: correct. No synchronization needed inside render pass. Only UAV barriers needed inside compute passes. Can be any kind of sync between passes.
RC: but the API can detect that it would have needed a barrier in a render pass. For example, trying to sample a render target you just rendered to.
DM: that will be usage.
(lost connection)
DM: if we see conflict between read-only and read-write…
MM: in general Rafael you’re right.
RC: so you can read/write to/from UAVs inside render pass, woe to you if you write to it and read from it in next draw.
MM: yes. If programmer hits this they should have broken the pass.
RC: little concerned about the UAV thing but overall fine by me.
RC: since last meeting the D3D team sent a new spec and they removed the ability to dispatch inside render passes. And there are no other kinds of passes. Hope we don’t have to police UAV reads/writes inside a render pass. Would perhaps need errors for these in the future.
CB: app has to manage these with barriers.
RC: our API won’t have barriers.
MM: when a webgpu dev wants barriers they end the pass and begin a new one.
DM: not a good workaround for us to sync anything inside a render pass. Would have to break the render pass and that’s a heavy operation. Don’t want to introduce this perf cliff.
CW: no API has barriers in render passes so we can’t really have them
RC: it’s a unit of synchronization: you have your render targets and will only write to them. If you try to clear something in middle of render pass also not allowed?
DM: clearing render targets is a topic we haven’t investigated. in vulkan you can have two / three different clears. Beginning render pass: this target should be cleared (common to all APIs). Does D3D12 pass proposal have a similar pass-clear operation?
RC: yes it does. Per-render-target: discard, preserve, clear. Could say, all clears have to happen at beginning. Discard probably not a good one for us b/c don’t want uninit memory.
DM: we discussed discard but don’t remember. Can we provide a perf advantage, i.e. detect contents are fully written?
CW: we can use discard as a lazy … if the next render target clears then we can do …
DM: my worry: if we detect this draw call covers the entire render target, then something goes wrong in the driver. Then you get contents of video memory exposed to users. Asking for trouble if we do the “don’t care” semantic.
RC: aside from full render-target clears we can’t detect whether a draw call affects the whole render target. You can move the vertices in the vertex shader, etc.
DM: we can know if it’s a blit/copy op.
MM: we’re discussing two different things; load-stores on render pass, and “blit filling with this data”. Let’s split those discussions.
DM: OK, don’t have blits inside render passes anyway. Don’t-care discussion…
MM: what’s the perf benefit of don’t-care? Propose to eliminate it.
CW: don’t care is important on tilers; avoids full-framebuffer write.
DM: should talk later. Agree with Myles.
RC: Agree with Myles too
DM: within render pass you can say: clear rectangles of rendertargets. Don’t think other APIs have that.
MM: Metal doesn’t have that. Would have to implement it by ending the render pass. Think we should not do that.
DM: have to end RP?
MM: if we want to implement it in Metal we have to either draw full-screen quad or do blit. Have to change shader.
CW: would just switch pipelines.
MM: OK. Would have to compile a pipeline.
DM: user can do it themselves.
MM: we don’t need 3 different ways.
DM: 3rd way of clearing: outside of pass, say “image has to be cleared”. Not in Metal. It’s in D3D12. (Can get a view, and clear the view). Not clear if and how we should expose it. Little investigation to be done.
MM: in Metal there’s no way to fill a texture with a value. But can fill a buffer and copy buffer to texture. 2 commands instead of one.
- Note that each byte has to be set to the same value — so this only works for grey values
DM: assume this will be slower than starting/ending render pass, but interesting idea.

Agenda for next meeting

Continue discussing copies
“Don’t care” mechanics
Hopefully, some of the IDL pull requests.
KN: one week from now is Labor Day in the US.
- DM: cancel next time?
- CW: yes, and can work offline on pull requests.
Next meeting is September 10.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2018 08 27

GPU Web 2018-08-27

Minutes from last meeting

TL;DR

Tentative agenda

Attendance

Status

Fields in sampler objects

Occlusion queries

More on copies

More on passes

Agenda for next meeting

Clone this wiki locally