
GPU Web 2023-11-29 Atlantic-time

Note that unless stated otherwise this is a GPU for the Web community group meeting and not a working group meeting.

Chair: CW

Scribe: KG, CW

Location: Google Meet

Tentative agenda

  • Administrivia
    • Welcome Jim as spec editor!
  • CTS Update
  • “Tacit Resolution” Queue
  • WebGPU Compatibility Mode (remaining issues) #4266
    • Add Compat Limits to Proposal #4397
  • The bgra8unorm-storage feature should most likely only enable write-only access #4377
  • Dual Source Blending Investigation + Feature Proposal #4283
  • Problem measuring time deltas with writeTimestamp #3952
  • requiredLimits must be an ordinary JS object #4277
  • Allow GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.MAP_READ buffer usage #4383
  • Agenda for next meeting

Attendance

The list below is everyone who was invited to the meeting; the people who attended are shown in bold:

  • Apple
    • Dan Glastonbury
    • Mike Wyrzykowski
  • ByteDance
    • Yunchao He
  • Cocos
    • Huabin Ling
    • Zeqiang Li
    • Zhenglong Zhou
  • Google
    • Austin Eng
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • Dan Sinclair
    • David Neto
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Shannon Woods
    • Shrek Shao
    • Stephen White
    • Ryan Harrison
  • Intel
    • Brandon Jones
    • Bryan Bernhart
    • Hao Li
    • Jiajia Qin
    • Jiawei Shao
    • Jie Chen
    • Shaobo Yan
    • Yang Gu
    • Zhaoming Jiang
  • Microsoft
    • Jesse Natalie
    • Rafael Cintron
  • Mozilla
    • Erich Gubler
    • Jim Blandy
    • Kelsey Gilbert
    • Teodor Tanasoaia
  • Nvidia
    • Anders Leino
    • Thomas Meier
  • Snap
    • Corvyn Kusuma
    • Dhritiman Sagar
    • Sumant Hanumante
  • Unity
    • Brendan Duncan
    • Dave Shreiner
  • Connor Fitzgerald
  • Deyu Tian
  • Francois Daoust
  • James Moir
  • Mehmet Oguz Derin
  • Michael Shannon
  • Rajesh Malviya
  • Rob Conde
  • Russell Campbell

Administrivia

  • Corentin: Welcome Jim as spec editor!

CTS Update

  • Kai: More work going on with compat testing. Iterating on the cache for shader execution test cases.
  • Corentin: Jiawei is doing RW storage texture tests.

“Tacit Resolution” Queue

  • Empty

WebGPU Compatibility Mode (remaining issues) #4266

  • Add Compat Limits to Proposal #4397
  • Gregg: The only big question is the 3D texture limit, whether we can keep it at 1k instead of dropping to just 256; otherwise I don't think there's controversy.
  • Gregg: The choice is basically between the spec limits and what's actually available in hardware, for example for the 3D texture limit. Also maxStorageTexturesPerShaderStage, which can in theory be 0, but no device actually does that.
  • Corentin: Please take a look offline.
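
For reference, a minimal sketch (not from the minutes) of how to inspect the two limits debated above on a given adapter; the limit names are the standard WebGPU ones:

```ts
// Sketch only: prints the adapter's values for the limits discussed above.
const adapter = await navigator.gpu.requestAdapter();
if (adapter) {
  // The compat proposal debates 256 (the bare minimum) vs. 1k for 3D textures.
  console.log("maxTextureDimension3D:", adapter.limits.maxTextureDimension3D);
  // Can in theory be 0, but no known device reports 0.
  console.log("maxStorageTexturesPerShaderStage:",
              adapter.limits.maxStorageTexturesPerShaderStage);
}
```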

The bgra8unorm-storage feature should most likely only enable write-only access #4377

  • Teo: This was an oversight when we landed the read-only/read-write storage texture PR: in the investigation we only looked at read-write storage textures, and read-only storage textures were added afterwards. The general case works for all formats except this one, which requires an additional capability compared to the others. So we should only allow write-only access for that texture format.
  • Corentin: We have consensus to allow only write-only access.
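
A minimal sketch (not from the minutes) of what the resolution above means for apps: with bgra8unorm-storage enabled, a bgra8unorm storage texture is declared write-only both in the bind group layout and in WGSL:

```ts
// Sketch: assumes a browser/adapter that exposes "bgra8unorm-storage".
const adapter = await navigator.gpu.requestAdapter();
if (!adapter?.features.has("bgra8unorm-storage")) throw new Error("feature unavailable");
const device = await adapter.requestDevice({ requiredFeatures: ["bgra8unorm-storage"] });

// Per the resolution above, only write-only access is allowed for this format.
const bgl = device.createBindGroupLayout({
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.COMPUTE,
    storageTexture: { access: "write-only", format: "bgra8unorm" },
  }],
});

// Matching WGSL declaration, using write access only.
device.createShaderModule({ code: `
  @group(0) @binding(0) var out_tex : texture_storage_2d<bgra8unorm, write>;
  @compute @workgroup_size(1) fn main() {
    textureStore(out_tex, vec2u(0, 0), vec4f(0, 0, 0, 1));
  }
`});
```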

Dual Source Blending Investigation + Feature Proposal #4283

  • Corentin: Only thing is to bikeshed the WGSL syntax, see Jim's comment https://github.com/gpuweb/gpuweb/issues/4283#issuecomment-1758400763
  • CW: Most people seem satisfied with #2 and #3 (@location(0) @blend_source(0) and @location(0) @blend_src(0) respectively)
  • TT: I would prefer #3 as well, since it’s most similar to what we do on API side.
  • KN: What would make it required?
  • CW: Only when there’s one that's blend_src(1) as well.
  • KG: What happens if the WGSL doesn't write to the blend_src?
  • TT: The question is what if you use the src1 factor but don't specify it in the shader?
  • KG: Grammar-wise, blend_src is optional; when would blend_src(0) be required?
  • KN: We would require you to write @blend_src(0) if you also have a @blend_src(e.g. 1) on the same location. (but otherwise @blend_src(0) is implied)
  • TT: Would like it to not be optional since they are often written together.
  • CW: Consensus: the attribute is blend_src(N); if blend_src(N) is used for a location, all other outputs for the same location must also have a blend_src(M).
  • CW: What about question of setting blend_src on API side, but not writing blend_src(0) in WGSL?
  • TT: I remember looking at what happens in the backend APIs, and it is undefined behavior to use src1 if it is not written in the shader.
  • KN: We already have rules like that for alpha / color.
  • CW: Ok let's validate it out.
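
A hedged sketch of the resolved direction (the attribute name and the src1 blend factors follow the proposal in #4283 and were not yet final at the time of the meeting):

```ts
// Sketch: assumes the dual-source-blending feature and the @blend_src
// attribute as resolved above.
const wgsl = `
  struct FragOut {
    // If any output at a location uses @blend_src, all outputs at that
    // location must carry a @blend_src attribute.
    @location(0) @blend_src(0) color : vec4f,
    @location(0) @blend_src(1) blend : vec4f,
  }
  @fragment fn fs() -> FragOut {
    return FragOut(vec4f(1.0), vec4f(0.5));
  }
`;

// On the API side, the second source is consumed via the src1 blend factors;
// per the discussion above, using them without @blend_src(1) in the shader
// would be a validation error.
const blend: GPUBlendState = {
  color: { srcFactor: "one", dstFactor: "src1", operation: "add" },
  alpha: { srcFactor: "one", dstFactor: "one-minus-src1-alpha", operation: "add" },
};
```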

Problem measuring time deltas with writeTimestamp #3952

  • Kai: I've been monologuing on the issue, trying to understand what we can guarantee. In practice the timestamps must be working in Vulkan / D3D. The issue covers writeTimestamp but also timestampWrites in passes (implemented with regular writeTimestamps on Vulkan / D3D). There must be an expectation that this works or the APIs would be unusable; basically, the timestamps don't get pushed too far in the command buffer? I don't see a way to synchronize them explicitly, though. We should reach out to D3D/Vulkan people to understand it better.
  • Kelsey: Good investigation, but we don't need timestamps to be perfect for devs; they already deal with the imprecision. It's still valuable to devs, and valuable for us to expose, since devs use it in the native APIs and it's enough for them there.
  • Kai: Agreed, but I would like us to be clear about the guarantees we provide so developers don't have to redo the investigation we did. They mostly should work, so that's OK. We do have an issue where writeTimestamp doesn't work outside of passes on Metal.
  • Corentin: Metal can reorder encoders under the as-if rule, except that timestamps are real.
  • Kai: We couldn't make writeTimestamp work as expected on Metal, but we removed it from the API, so what we have in the API now seems fine. I would like to reintroduce writeTimestamp, but I'm not 100% sure how to do it.
  • CW: Seems like we solved this issue, and can do an investigation for writeTimestamp offline (for Metal) and come back to writeTimestamp later?
  • KN: I think we can close this issue and look at writeTimestamp later; it's not that important. Although devs do care about it?
  • [coming back to this later]
  • RC: Despite its name, the only things in D3D12 command queues that have an explicit ordering are pixel shader operations, buffer copies and timestamps. Other than those, the GPU/driver is free to reorder whatever. Adding timestamps, therefore, adds ordering and may make things slower. If developers want to analyze their workloads, they should really be using tools like RenderDoc, PIX and others on their local machine. Using timestamps in the wild is subject to other randomness on the system. D3D has APIs that let you turn this off, though we shouldn’t make these be a web standard.
  • RC: Link to a doc: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device6-setbackgroundprocessingmode
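
For context, a minimal sketch of the pass-boundary timestamp path discussed above (timestampWrites plus resolveQuerySet; writeTimestamp itself is not shown since it was removed from the API):

```ts
// Sketch: assumes an adapter that exposes the "timestamp-query" feature.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice({ requiredFeatures: ["timestamp-query"] });

const querySet = device.createQuerySet({ type: "timestamp", count: 2 });
const resolveBuffer = device.createBuffer({
  size: 2 * 8, // two 64-bit timestamps
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass({
  timestampWrites: {
    querySet,
    beginningOfPassWriteIndex: 0, // written when the pass begins
    endOfPassWriteIndex: 1,       // written when the pass ends
  },
});
// ... dispatch work ...
pass.end();
encoder.resolveQuerySet(querySet, 0, 2, resolveBuffer, 0);
device.queue.submit([encoder.finish()]);
```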

requiredLimits must be an ordinary JS object #4277

  • Kai: We thought you could pass the adapter's limits object (GPUSupportedLimits) as requiredLimits in the device descriptor, but that turns out not to work and results in a device that uses only the default limits. We could fix this, but we don't have to; it's just surprisingly difficult to debug.
  • Kai: There are a bunch of possibilities. We could special-case GPUSupportedLimits there, or search for properties on the object. The editors found 6 options.
  • Kai: Does somebody care a lot, or should the editors just pick one and fix it?
  • Jim: This is surprisingly more complicated than it seems.
  • JB: [...] When you move a feature into core, it’s still in the feature list, still advertised, we just require it to be present. That’s really just an expectations change, it shouldn’t change behavior of API, right?
  • KN: Idea of moving it to core is that we enable it even if you don’t request it.
  • JB: So that always [has the potential to] break things.
  • JB: Absent that caveat, it seems like #5 is nice behavior.
  • KN: Generally speaking, we considered adding features to core as “non-breaking”
  • KN: Copying functionality from extension into core should then also be considered “non-breaking”, but it’s a little different.
  • KN: [...] So we never introduce validation, but it’ll be observable. Same as adding anything to the platform. So we just have to guess how likely it is to break.
  • JB: So we would never add feature-gated validation, and so you’re saying that this would be that, and so we shouldn’t do that?
  • CW: That sounds correct.
  • JB: I understand why we wouldn’t add FGV.
  • KN: So the limit we ignore is [...]
  • JB: How does option 4 fare?
  • KN: 4 is my favorite option
  • BJ: The difference between 4 and [?] is that we would [always be able to detect?]. Even though features introduce new limits, they wouldn't introduce new validation failures.
  • KN: That needs to be fleshed out but I think that’s correct.
  • KN: In 4, if we see a limit bytesPerAlignment, we will basically check that it's 64, and we do that regardless of whether you ask for the feature. If you ask for the feature, you can set it lower.
  • TT: [...] If an implementation doesn't yet implement it, it'll fail on that implementation.
  • BJ: If you have a limit associated with a feature, we would effectively treat that as a typo, and we would reject it the same way. In the case of the typo, we say "you did something wrong"; if we know about the limit but the feature isn't turned on, we treat it as if we don't understand the limit. It's only if the feature is enabled that we recognize it at all.
  • CW: Seems like we're going for option 4.
  • KN: Editors will flesh it out and get it done.
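
A sketch of the pitfall this issue describes and a workaround that works today (the explanation of the record<> conversion is an inference from the issue, not quoted from the meeting):

```ts
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("no adapter");

// Pitfall: `requiredLimits: adapter.limits` looks right, but GPUSupportedLimits
// exposes its values via prototype getters and the record<DOMString, GPUSize64>
// conversion only reads own enumerable properties, so the dictionary comes out
// empty and the device silently gets the default limits.

// Workaround until #4277 is resolved: copy the limits you need into a plain
// JS object.
const device = await adapter.requestDevice({
  requiredLimits: {
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  },
});
```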

Allow GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.MAP_READ buffer usage #4383

  • CW: One reason: we currently require devs to jump through hoops to use these.
  • CW: Also, on Compat, query results are only available on the CPU.
  • SW: Right now we immediately turn around and upload them and that’s suboptimal.
  • CW: Two reasons why we want QR w/ MREAD. 1) It’s less code for people who just want to read this. 2) For compat, we may want to disallow resolving into a non-mappable buffer, so you don’t have to stall to reupload them.
  • SW: We thought we’d be able to support both. If you ask for [...]
  • KN: You hit the fastpath if buffer usage is QRES and MREAD. [...] Sounds complicated. Resolves are also those[?]. I would prefer to just issue a warning if you use it with any usage other than map-read on compat.
  • KG: Historically we didn’t allow both of these together, and we haven’t talked about this yet. [...]
  • CW: We historically separated upload and download, and we still need to get back around to UMA optimizations. It does open the Pandora's box of MAP+other usages.
  • BJ: So to be clear, we’re saying we’d allow query-res+map-read, but not e.g. vertex+query-res+map-read.
  • CW: Yes, I mean eventually we want that probably, but for now yes.
  • BJ: I think the compat argument is more compelling than the hoop-jumping. I think when devs are doing this stuff, adding [just one more hoop to jump through] isn’t too onerous. So it is an ergo improvement, but I don’t think that should be compelling to us.
  • KN: Especially for query results, I think [ergo-wise] an async getBufferData would solve this nicely.
  • KG: +1, I think getBufferData would let us [satisfy the ergonomic issue, both here and elsewhere].
  • KG: query-resolve isn’t a high-bandwidth thing, so eating an extra copy isn’t that bad.
  • [...]
  • KN: In a compat implementation, could special-case query-res+copy-src+nothing-else such that it didn’t immediately re-upload.
  • KN: Could add a way to asynchronously get buffer data but only from query-resolve.
  • KG: Why not from any COPY_SRC? COPY_SRC can be a copy from-anywhere to-anywhere. The presence of COPY_SRC on a QUERY_RESOLVE buffer doesn't mean that you need to upload the data or otherwise get it ready to be a gpu-side COPY_SRC. It can be a cpu-side src, and an enqueued copy [to a gpu-side resource] would then be an upload, as is normal for e.g. a MAP_READ/MAP_WRITE+COPY_SRC buffer.
  • KG: I think we all need to think a bit more offline. I do suspect that doing the more general implementation work here is not harder than doing these proposed special-cases.
  • KG: Also, is this M2?
  • CW: I think this is M1, because of the compat aspect.
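
For reference, a sketch of today's hoop-jumping path versus the combined usage under discussion (the QUERY_RESOLVE | MAP_READ combination is a proposal in #4383, not valid today; `device`, `querySet`, and `queryCount` are assumed to exist):

```ts
// Today: resolve into a QUERY_RESOLVE|COPY_SRC buffer, copy into a separate
// MAP_READ buffer, then map that buffer on the CPU.
const resolveBuffer = device.createBuffer({
  size: 8 * queryCount, // one 64-bit value per query
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});
const readbackBuffer = device.createBuffer({
  size: 8 * queryCount,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});

const encoder = device.createCommandEncoder();
encoder.resolveQuerySet(querySet, 0, queryCount, resolveBuffer, 0);
encoder.copyBufferToBuffer(resolveBuffer, 0, readbackBuffer, 0, 8 * queryCount);
device.queue.submit([encoder.finish()]);

await readbackBuffer.mapAsync(GPUMapMode.READ);
const results = new BigUint64Array(readbackBuffer.getMappedRange());

// Proposed (#4383): resolve directly into a mappable buffer, skipping the
// extra buffer and copy.
// const buffer = device.createBuffer({
//   size: 8 * queryCount,
//   usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.MAP_READ,
// });
```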

Agenda for next meeting
