GPU Web 2022 02 23

GPU Web 2022-02-23

Chair: KG

Scribe: Ken / Kelsey / Kai

Location: Google Meet

Previous meeting: https://docs.google.com/document/d/1-F2vNAbrHCVdLxo9L-WlFhfNr2XqShrZ6MsgWUPFr6Y/edit

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
~~Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.~~
Limit for the maximum buffer size #1371
[Process] Extensions directory in gpuweb/gpuweb #2301
[Process] Requirements for additional functionality #2424
mapSync on Workers - and possibly on the main thread #2217
Streaming implementations and indirect draws/dispatches #2189
Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388
Agenda for next meeting

Attendance

WIP, it is the list of all the people invited to the meeting. In bold the people that have been seen in the meeting:

Apple
- Dean Jackson
- Myles C. Maxfield
- Robin Morisset
Google
- Austin Eng
- Ben Clayton
- Brandon Jones
- Corentin Wallez
- David Neto
- Gregg Tavares
- Kai Ninomiya
- Ken Russell
- Loko Kung
- Shannon Woods
- Shrek Shao
- Ryan Harrisson
Intel
- Brandon Jones
- Bryan Bernhart
- Hao Li
- Jiajia Qin
- Jiawei Shao
- Jie Chen
- Shaobo Yan
- Yang Gu
- Yunchao He
- Zhaoming Jiang
Kings Distributed Systems
- Daniel Desjardins
- Hamada Gasmallah
- Wes Garland
Microsoft
- Jesse Natalie
- Rafael Cintron
Mozilla
- Dzmitry Malyshau
- Jim Blandy
- Kelsey Gilbert
Unity
- Brendan Duncan
- Dave Shreiner
Andrew Varga
Connor Fitzgerald
Eduardo H.P. Souza
Francois Daoust
Francois Guthmann
Jeremy Sachs
Joshua Groves
Matijs Toonen
Mehmet Oguz Derin
Michael Shannon
Nathan Johnson
Pelle Johnsen
Rick Battagline
Rob Conde
Timo de Kort

Administrivia

Next week - APAC meeting

CTS Update

KN: few test fixes.
KR: WGSL team sent out test inventory - good to know the scope of the work - about 10% of the needed tests have been written, but optimistic we can burn down this list

“Tacit Resolution” Queue

Fake transfers for getMappedRange-created ArrayBuffers? #2072
- KN: discussed this weeks ago, forgot about it.
- When you create AB by mapping GpuBuffer you get AB object. Looks regular though it's pointing to special memory behind the scenes. Need to detach upon unmap. If passed across threads, can't do this - would create a race.
- Proposal here - instead of disallowing transfers - when you transfer them, it creates a copy, transfers that, and detaches the original. And issues a warning.
- Don't have to create weird types of ABs that can't be transferred. Don't even know if we can do that in TC39.
- KG: makes sense to me. Let's do it.

Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.

Myles and Brandon OOO

Limit for the maximum buffer size #1371

KN: a bit blocked on discussions with Apple.
KG: ideally wanted to discuss - think there's a consensus on adding this to limits. Think that includes consensus from Myles.
KR: thought we came to agreement last time that we'd add the max buffer size to the limits.
KG: if someone wanted to write a PR, think we can move in that direction.
KR: yes, let's please do that instead of waiting for permission.
KG: ok, yes, let's do that.

[Process] Extensions directory in gpuweb/gpuweb #2301

KN: these would be extensions we don't expect to put into the spec at any point. e.g. Pipeline Statistics - don't think these need to be in production applications. Would like a common place to develop extensions like that, even if not exposed by default.
KN: think this addresses concerns about putting these in an official place, but Apple had concerns about whether we the group work on stuff like this or require them to be developed somewhere else.
JB: what's the value in making these easy to find?
KN: people might still use these in local development. Pipeline stats can give you interesting results. Single place says - if you implement pipeline stats, there's a standard place to get the spec for it. Then users could find it - not browser-specific.
KN: fair that this isn't critically necessary. But there will be a lot of these. Better if they aren't scattered.
JB: would each of these refer to a specific commit of the spec so they know what it's defined relative to? Keep up to date as the spec evolves?
KN: strive to keep them up to date. Suspect they won't easily get out of date. After 1.0, see whether inconsistent with any breaking changes. If get out of sync, people will discover. Not trying to be strict about every detail.
KN: was interesting input from W3C: some groups have done things like this, exploratory work that might end up on standards track. Linked to WebRTC and MediaCapture extensions. In our case, could end up being standards track if someone discovers they need it. Also exploratory work like ray tracing.
KG: as long as it's useful for future API development, useful to put it in the repo. Myles' perspective: if not valuable to the API, we shouldn't keep that around.
KN: hard to guess. Right now, don't think we'll put pipeline stats in the API, but it's possible.
KG: think easiest thing is what we do for webgpu-native: another repo that someone maintains that isn't this group.
KN: having single separate repo would work.
JB: cost of deleting things isn't high. Everything's in version control.

[Process] Requirements for additional functionality #2424

KN: probably not much to talk about since Myles isn't here. We posted some thoughts on it; nothing further since.

mapSync on Workers - and possibly on the main thread #2217

KN: not sure how much I want to push on this for 1.0, but some things to consider.
KN: lot of people who'd like to be able to do sync readbacks. Don't know if we want to design it in. Lot of people porting from WebGL - very difficult to change their code or public API to deal with problems like this.
BD: we're one of those people at Unity. Because Unity API's GraphicsDevice was old, and synchronous. Async API available, but large codebase of existing projects designed around synchronous readbacks of textures & buffers.
KR: using WebGL 2.0 synchronous APIs? getBufferSubData?
BD: not really in hot loops. But there are texture readbacks. People do get lazy, and do e.g. reading storage buffers into the CPU to do fake instancing, etc. Bunch of projects around compute shaders that do that. (Not in WebGL 2). Other backends like Vulkan, Metal to this.
KR: A major difference between Web and native is that WebGPU is going to be sandboxed in another process. No way to make it fast - milliseconds, not microseconds. This is why the API was designed to be asynchronous. You can use await in JavaScript, realize it’s undesirable in C++.... Meet team ran into this: readPixels on every frame. Trying to get them to use the “async” readback that Kelsey and Kai developed in WebGL.
KR: We can put it in, but you’re going to see minimum 1ms roundtrips for these things. Maybe better to just not put these in now, and let application developers find workarounds for now.
BD: Can’t implement these functions at all as designed. Can do that in the framework in some places, but not functions [exposed to application], testing, …
KR: For testing, Emscripten Asyncify. Won’t perform well, bloats code size, but can work for tests.
MS: similar issue with public facing API that requires sync readback. No way to remove without breaking compat with non-web builds. We know it's slow, but lot better than asyncifying everything. We just removed asyncify from our prod builds because of perf impact.
JB: background q - WebGL 2 also isolates graphics in its own process. What is WebGL doing that WebGPU isn't?
KG: it's slow.
MS: we lose perf, too, in sync buffer readbacks. Sometimes can't work around it.
KR: what if we implemented it behind a flag? Could folks help us characterize the performance?
BD: we could test it.
KG: think it's something we'll have to ask people to do the other way for V1. Think this is the time we're going to try to do this right and ask people to do the porting. Experimentation is valuable though, for example if Dawn wanted to try it. Would be nice to find a different way to satisfy this.
KN: one thing to consider: sometimes may be able to start your map early and not wait on it until later. Could do better than sync - closer to WebGL where you can give it a hint to do the readback early, but make it explicit. Maybe changes to the API if we wanted to do this in the future? mapAsync and say "now I want to map sync"? Maybe don't need to make changes for it, can be side-by-side. Similar things for pipeline creation - start it, wait, then say we need it synchronously later. API design pattern for both of these?
KG: I'd be more encouraged about doing this if I hadn't just run into a product that was doing it wrong and unwilling to make changes on their side. Experimentation welcome, but gun-shy about this.
KN: you can always do this a different way - write into a canvas, then read back from the canvas. Can't stop synchronous canvas readbacks. Could say - you won't get your results from the canvas readback, but think that'd be too disruptive. The spec'd say - readback from canvas doesn't happen until some point later. Don't want to do that, would probably break things people do with canvases. That's the biggest reason I'm hesitant to say that we won't do this - people will probably do workarounds like that.
KG: I'd rather figure out how to stop those working than add this at this point.
KN: understood. Lot of things to prevent. WebGPU canvas readback; 2D / WebGL canvas interop; a lot to break.
KR: most of these interop cases should be async and must not break.
KN: right. So only the readback case is changing in some way - changing behavior of other specs.
KG: when you use this canvas as a source, this is what happens. Ill specified? There have been questions about whether ops pull from the front or back buffer in WebGL. If front buffer - it's what rAF picked up last time.
KN: not just impl details. We can do that for WebGPU, but for any other canvas that uses the WebGPU texture, and any other canvas that uses those. Strange propagation through lots of other canvases.
KG: no - saying drawImage picks up the last frame.
KN: don't think that'd work. People rely on things like that working.
MS: if you do this in JS I don't see a reason to have a sync API. When in Wasm, needs to be a standard way of doing this that isn't asyncify the whole codebase.
KG: isn't answer Emscripten bindings? WebGL Emscripten bindings weren't written in Wasm. Has a cost, but it's fine.
KN: can't pop out to async JS.
MS: that's the issue. 2 places where it hurts: mapAsync, and submitWorkDone. Both of these are sore points in writing WebGPU code in Wasm. Can't wait on it - have to come up with a sync wait on it. For me, it's a bunch of 0-tick callbacks and proxying everything to another thread. Not great way to do it. If a way to address these two pain points in Wasm would be great.
KG: understood. In some way, there's no way around this. If you have a tight loop of Wasm code, because of our rules, if we don't yield to the event loop we can't give you results back. Said the timeline doesn't work that way. Then no API we can give you unless we change how that timeline works. Can things be considered "done" before you return to the event loop?
MS: in general - in Wasm you don't return to the event loop until next frame. So, can't map buffers, and can't wait on submitted work, until the next frame. Breaks a lot of stuff.
KG: we'll keep thinking about it.
KN: if we have a function that creates 1 ms janks, and people use and complain about it - it's still better, 90% of people needed it and will take the perf tradeoff if they wanted.
KG: not comfortable with doing this given that I wasn't able to convince a team at a member company here to follow the best practices for the existing API (WebGL 2.0). Soured me on this notion, unfortunate.
KN: is it slower in your browser than others?
KG: it is.
KN: WebGL readbacks?
KG: yes. Some things we can do to make it better, but I have other things I have to work on. Still disabled in Firefox.
KN: maybe put it off until we can try to get matching performance.
KG: considering reverse origin trial. If some origin starts doing poorly performing things we can push them to fix their code. Right now people are choosing a different browser for these things.
KN: we do interventions for things like this - slightly different.
KN: let's just have Wasm solve this async problem.
KG / MS: would be nice.
KN: some progress has been made, but not sure if any in the past 2 years.

Streaming implementations and indirect draws/dispatches #2189

KR: there's something in ANGLE's Metal backend where it enqueues an empty command buffer and fills it in later. Effectively goes back in time. Would this not work?
KN: issue - may have to split at the beginning of each render pass. Split one WebGPU cmdbuf into many Metal cmdbufs. Think there's a way to reduce the number of splits - that's the way I'd go. But imp't to Apple.
KG: if you had a transition that you inserted, converting something writable to being indirect source - happened outside render pass - that's when we would do it.
KN: yes, force users to split their own cmdbufs.
KG: or even offer to let them batch these. Goes back to voluntary barriers. Is there a reason we'd want to give people the option to eagerly transition things? This would be an example.
KN: can be in the command encoder. Switch this buffer from write to read.
KG: would be cool to put this in the compute pass.
KN: can happen with draw calls too. Can be in the compute pass, or top-level encoder for render. Maybe going in that direction's worthwhile? A bit breaking, but don't think it's a huge barrier for people.
KG: breaking API or precedent?
KN: the API. And introducing a possibly complex concept.
KG: will try to write this down - it'd be voluntary, wouldn't break the API, just tell you I'll be doing indirect calls from this.
KR: not to be contrary but if it's optional then most developers won't get it right.
KN: can imagine some engines would have trouble - wouldn't know they have to do this before starting their render pass. But it's an option. They can always split their command buffers before the render pass so they can insert it.

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388

KN: nobody satisfied with current state, but nobody has a better idea. Everyone's resigned to this fate except Myles. :)
KG: one thing that has changed since it was first discussed - more common today than 2 years ago to get adapters that let you map CPU read and host read/device use - used to be UMA archs only, and some AMD cards - but has changed now.
KN: right. Intel doesn't even have some of these options (host-coherent + device-coherent?). Think we could do this on Intel regardless.
KG: if something we can't support - don't want to fragment the ecosystem by making you write 2 paths. If things have changed - still do need that.
KN: we don't have solution for doing this underneath the hood of the application. Can do it with an extension. Would like to. Maybe we should do it for 1.0. Would need separate code path for application.
KR: WebGL doesn't have the ability to optimize for this and performance in this area is basically fine. I think WebGPU will also perform fine in general without this optimization, and since applications will have to add a new code path to take advantage of it, think this should be pushed out to post-V1.

Agenda for next meeting

Next meeting: Wednesday March 2 APAC 5pm-6 (America/Los_Angeles)
(Tacit resolution queue) Implicit pipeline layouts #2550 #2470
- Editors (now just Kai/Brandon) don’t feel it’s worth the effort to remove GPUPipelineLayout object. Would still like to introduce partially-automatic pipeline layouts, but don’t feel it’s necessary for V1. Want to make one breaking change: make automatic layout explicit (“auto”) instead of implicit (optional dictionary member).
Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.
(Jiawei) Expose optional texture capabilities in WebGPU #2630
[Process] Extensions directory in gpuweb/gpuweb #2301
[Process] Requirements for additional functionality #2424

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Web 2022 02 23

GPU Web 2022-02-23

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

Discuss the WebGPU Adapter Identifier explainer before reaching out to the PING.

Limit for the maximum buffer size #1371

[Process] Extensions directory in gpuweb/gpuweb #2301

[Process] Requirements for additional functionality #2424

mapSync on Workers - and possibly on the main thread #2217

Streaming implementations and indirect draws/dispatches #2189

Cannot upload/download to UMA storage buffer without an unnecessary copy and unnecessary memory use #2388

Agenda for next meeting

Clone this wiki locally