
Minutes 2021 10 11


GPU Web 2021-10-11

Chair: Corentin

Scribe: Myles, Ken

Location: Google Meet

Tentative agenda

  • CTS updates
  • Milestone triage (timebox 15m) #2073
  • Consider making ShaderModule compilation errors special #2119
  • Refactor API for load/store OP and read-only attachments #2131
  • Consider removing GPUCommandBuffer.executionTime #2156
  • Remove COPY buffer usages #2172
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Brandon Jones
    • Corentin Wallez
    • Gregg Tavares
    • Ken Russell
    • Loko Kung
  • Microsoft
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Mehmet Oguz Derin

Administrivia

  • Doodle poll to find a new time for this meeting; Corentin has a conflict, but could move it with difficulty
  • Jeff Gilbert, could you ask the same of the WGSL meeting attendees?
    • JG: sure, will do.
  • TAG review: one TAG member reached out privately; said WebGPU was pretty big and that they should have looked at it a couple of weeks ago. Review coming soon.
    • GT: still a lot of areas in the spec saying "missing", "needs spec", etc. How will TAG be able to do their work?
    • CW: TAG's not expert on details - don't need e.g. texture sampling description. They're interested in integration with JS, web platform, etc. Think spec's in good shape in these areas. If you think there are TODOs needed for the TAG to do their work please highlight them.

CTS updates

  • CW: not much this week; regular contributors from Google/Intel not on CTS this week.

Milestone triage (timebox 15m) #2073

  • #2171 specify cube texture filtering as seamless
    • CW: we all agree this is V1, just need specification of it.
  • V1 triage
  • #150 perf benefit of GPUBufferUsage - skipping for this week
  • #168 - usage bit for view with different formats when creating texture
    • Did investigation, haven't spec'd anything yet
    • MM: think post-V1's OK.
    • CW: anything painful about RGB / sRGB?
    • MM: I made a big table...every API allows sRGB reinterpretation.
    • MM: first pass - simple version that's obviously correct. Complex bits post-V1.
    • CW: for Vulkan, still need a flag - RGB/sRGB reinterpretation can be expensive. Need to discuss that further.
    • CW: worth putting this issue on an agenda.
  • #234 shader programming model: use little-endian layouts for scalar data.
    • CW: already in the spec?
    • JG: needs-specification.
    • CW: moving to WGSL.
    • KR: TypedArrays are specified to match the native endianness (see the sketch after this list).
    • <discussion about native vs. little endian>
    • MM: We need to write something in the spec. Either little endian or native endian. Noop for Metal.
    • JG: Don't think we can do native-endian: even if the host is big-endian, we would not be able to run the GPU big-endian.
    • MM: If device and host don't match you could say "whoops you don't get WebGPU".
    • GT: think it should match. If it doesn't, then any WebGPU code that doesn't check and byte-swap its data will be incorrect.
    • JG: I don't think we can win here, aside from picking a working combination and saying "you don't get WebGPU" for all others. Don't think we want to polyfill, we could but don't want to.
    • GT: I think it's either the browser says "if they don't match you don't get WebGPU", or you push the burden onto every developer.
    • KR: AFAIK there has been no situation in the past where the GPU endianness is different from the CPU endianness. At least since APIs started adding batched data paths (vs. immediate-mode vertex submission).
    • KR: When discussing TypedArrays in TC39 and Khronos, pointed at Nintendo HW that had a big-endian CPU and a big-endian GPU. So TypedArrays keep the native endianness. So think we should follow the same in WebGPU. If for some reason a system doesn't have matching endianness, then we can expose no adapter.
    • JG: there are probably systems where this is true - where endianness differs between CPU and GPU - and we don't want to discard them. Would like to do what our underlying APIs do. That should be enough. We don't need to forbid these. If people want to handle it, they can.
    • MM: sounds like there's discussion to be had. Mark Needs-Discussion to triage.
    • CW: OK, sounds good. Vulkan says the endianness matches between host CPU and GPU.
    • JG: even in SPIR-V?
    • CW: yes, that's what Vk says.
  • #244 Changing ArrayBuffers to ArrayBuffer + srcOffset.
    • CW: Q from Google folks about this. Is GC still an issue?
    • KR: pretty sure partners like Unity will want this. They make a lot of graphical API calls and don't want to have to fake up a temporary ArrayBufferView for every call out.
    • JG: we have had this discussion multiple times before with the web platform folks.
    • Discussion.
    • CW: come back to this issue, need at least a benchmark from years ago with WebGL to shed more light on this issue.
  • #323 out-of-bounds indirect dispatch handling
    • CW: dispatch size in indirect dispatch is too big compared to limits?
    • DM: not sure this is in the spec already.
    • CW: I'll check, and close the issue if not the case, otherwise Needs-Specification.
    • MM: what was the resolution?
    • CW: we agreed we need it, need to no-op the draw call if it's out of bounds.
    • MM: OK. If the browser's responsible for no-op'ing these, the browser has to execute a compute shader before the dispatch?
    • CW: yes.
    • DM: before any indirect dispatch call.
    • MM: yes.
    • MM: was going to open an issue, believe you can have 2 compute passes in a single command buffer. We have to have rules: it's possible to have rules which invalidate the dispatch…
    • CW: you mean, you can't both produce/consume storage indirect buffer?
    • MM: you're not allowed to produce/consume in the same command buffer. Need to be able to insert command buffer to perform this check.
    • MM: getting off-topic, we can discuss when I open the next issue.
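
A minimal sketch of the TypedArray-vs-DataView behavior KR refers to in the #234 endianness discussion above: TypedArrays always use the host CPU's native byte order, while DataView lets the caller pick the byte order explicitly. The buffer and values below are illustrative only.

```ts
// TypedArrays write in the host CPU's native byte order; DataView can force a
// specific byte order. This is what "little-endian vs. native-endian" hinges on.
const scratch = new ArrayBuffer(4);

// Native byte order: on a big-endian host these bytes land big-endian.
new Float32Array(scratch)[0] = 1.0;

// Explicit little-endian, regardless of the host CPU.
new DataView(scratch).setFloat32(0, 1.0, /* littleEndian */ true);

// If WGSL mandated little-endian layouts on a big-endian host, data written
// through TypedArrays would need a byte swap (as in the DataView path) before
// the GPU consumed it.
```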

Consider making ShaderModule compilation errors special #2119

  • MM: don't think we reached conclusion last week. I was trying to understand the problem.
  • CW: everyone up to speed on the problem?
  • DM: I understand the problem but don't understand how much of a problem it is. ShaderModule creation in Dawn is parsing WGSL. Not that much work.
  • CW: parsing, validating and reflecting.
  • MM: we added a new term for it - "we don't know what happened" in WGSL - the results of the platform's compilation failure?
  • CW: can happen at pipeline creation time.
  • MM: happens when platform compiles module. Want this during createShaderModule.
  • CW: drivers do register allocation at pipeline creation time.
  • MM: I see.
  • CW: we can have "unexpected / unknown" errors. Does it change this?
  • CW: means we can have expensive work done during shader module creation time.
  • MM: yes that's what I meant.
  • CW: makes sense? (Shouldn't decide this without Kai.) Add a third kind of validation, something that captures shader module creation errors, and then pushErrorScope should capture shader compilation errors.
  • MM: think queueing up errors isn't hard. Agree with DM's thesis. If program wants to compile 100 shaders at once, might need to delay one program's reported errors
  • KR: If we want to agree on a consistent timeline for the errors, this is the issue. We want shader module creation to be on multiple threads, and delaying the shader module errors isn't a problem. But blocking other errors on it is a problem. Want to report out of order.
  • MM: If you can queue up compilation errors, why not queue other errors.
  • KR: Because the work on the device could be happening right now, and not multithreaded. It is not just shader parsing, but device-level work to create an MTLLibrary, for example. Other errors could be queued up for arbitrary amounts of time. Shader compilation is super long in WebGL, most likely the same for WebGPU. Would like to reorder errors. In WebGL, applications adopt KHR_parallel_shader_compile super easily and it helps a lot.
  • JG: If you are only using error scope to get errors synchronously...
  • CW: yes you get the errors async via shader module compilation info. But if you report these in ErrorScopes for draw, etc., then those errors can be delayed for arbitrarily long periods of time.
  • MM: is the complaint that "i don't want to write this code" or "it will harm applications"?
  • CW: will harm applications. they compile 100 shaders, takes 10 seconds, then do other init, and their errors will be delivered 10 seconds late.
  • MM: in this program you're describing, they pushErrorScope, do 100 compilations, do a draw, then popErrorScope, and they get their errors 10 seconds late?
  • CW: yes.
  • JG: can we filter based on things? Maybe want it for your draw calls but not shaders? Use different ErrorScopes for this?
  • CW: we do have mechanisms for filtering ErrorScopes. 'validation' is one such filter; you write into it the things you want filtered.
  • JG: what if we had a compilation error filter?
  • CW: would address this concern...path in a good direction, but wouldn't address the nesting problem.
  • MM: so you'd still need a queueing mechanism...I don't think this is a problem, but also adding a new filter isn't a problem.
  • CW: think that would help solve the problem. Ties into the ordering of Promises on the Device. Maybe we should just say: I don't care about reporting errors, I'll just look at CompilationInfo.
  • DM: why are we trying to solve the shader compilation? Buffer creation, for example, can be slow. Allocating memory can be slow.
  • KR: From experience it is far faster to do any of these things than to do shader compilation. Some shader compilation can be multi-second and dwarf any other operation you do on the GPU. Found 20-30 second pauses in some content in WebGL on Unity exports. We need to decouple shader module creation from the rest of the timeline.
  • MM: It is true that buffer creation can be slow, AND that shader creation is (usually) slower.
  • JG: Think that KR is bringing up that sometimes shaders take a long time to compile. Seen some "real" applications where shaders take 10 seconds.
  • DM: Nobody will compile shaders at shader module creation. Only Apple might compile MTLLibrary at createShaderModule time (when the hint is present). So it should be pretty fast compared to pipeline creation.
  • KR: We can't know that we won't be able to do more at createShaderModule time.
  • JG: Think we should just have the filter.
  • KR: We should have reordering.
  • DM: Reordering is fine, we should say that reordering cannot be done for dependent operations.
  • DM: The new error filter for compilation is too much because there would be two asynchronous error-reporting mechanisms, for pipelines and for shader modules.
  • CW: createShaderModuleAsync?
  • JG: discussed problems with that in the past.
  • MM: would be in addition, though. The previous problems were with a replacement API - this one would have to be a sibling function.
  • CW: yes. (1) would be same as pipeline creation. (2) … (both async?)
  • MM: agree, ship it.
  • CW: and Promise is already resolved. Have CompilationInfo.
  • GT: if you have 2 Promises and wait for them out of order you'll never get the results…
  • JG: Promise order's well defined.
  • MM: he's talking about poorly written code.
  • GT: can design an API that makes code hard to break. If you're saying, I wait for 2 Promises and can't guarantee the order, need Promise.all()
  • JG: if you run into a situation where we need that … we can spec it.
  • CW: createShaderModuleAsync, we could say things are resolved in order...gives you consistent ordering. This is an option.
  • CW: I'll write something for createShaderModuleAsync. JG, would be interesting to see what the shader module creation filter looks like.
  • MM: excited to see both these proposals.
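
A rough sketch of the ordering concern above, assuming the draft API of the time: pushErrorScope/popErrorScope with the 'validation' filter and GPUShaderModule.compilationInfo(). device is an existing GPUDevice, shaderSources is a placeholder array of WGSL strings, and createShaderModuleAsync is only the hypothetical sibling entry point CW offered to write up.

```ts
// If shader compilation errors report through the same error scope as other
// validation errors, this popErrorScope() can be held up by slow compilations.
device.pushErrorScope('validation');
const modules = shaderSources.map((code) => device.createShaderModule({ code }));
device.createBuffer({ size: 16, usage: GPUBufferUsage.UNIFORM }); // unrelated work
const error = await device.popErrorScope(); // may wait on all 100 compilations

// Alternative discussed: skip error scopes for shaders and read per-module
// diagnostics instead (compilationInfo() as drafted at the time).
for (const module of modules) {
  const info = await module.compilationInfo();
  for (const msg of info.messages) console.log(msg.type, msg.message);
}

// Hypothetical createShaderModuleAsync: the returned promise would resolve once
// compilation finishes, decoupling shader errors from the scope above.
// const asyncModule = await device.createShaderModuleAsync({ code: wgslSource });
```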

Refactor API for load/store OP and read-only attachments #2131

  • DM: I think this is editorial. Do you want to discuss it?
  • CW: maybe just explain what's going on?
  • DM: not changing semantics. Just the way user specifies the same info.
  • JG: that's a real change. Editorial = things slightly changed.
  • DM: identified a few issues with the current way load/store ops are specified, specifically for depth/stencil, and how they interact with RO D/S.
    • Need to specify ops for both depth and stencil, although the view may only have one of them.
    • If your depth is RO, load/store ops are technically redundant. Neither has sane values: neither 'discard' nor 'store' for the store op reflects what will be done.
    • Opportunity to rethink these 3 things for less confusion. Haven't figured out exact solution yet.
  • CW: contributions welcome. Lots of design possibilities. None of the ones we have here are particularly obviously the right solution.
  • CW: slightly agree that this isn't purely editorial. Modest restructuring. Let's make progress offline. Discussion here = inefficient.
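
For reference, a sketch of the depth/stencil attachment shape #2131 wants to rethink, using the field names from the draft spec of this period (depthLoadValue / depthStoreOp / depthReadOnly and the stencil equivalents). encoder and depthTexture are assumed to already exist, and the exact names may not match the eventual design.

```ts
// Shape under discussion: load/store ops must be given for both aspects even if
// the view has only one, and they are redundant when the aspect is read-only.
const pass = encoder.beginRenderPass({
  colorAttachments: [/* ... */],
  depthStencilAttachment: {
    view: depthTexture.createView(),
    depthLoadValue: 'load',    // required, but moot when depthReadOnly is true
    depthStoreOp: 'store',     // neither 'store' nor 'discard' describes read-only
    depthReadOnly: true,
    stencilLoadValue: 0,       // must be supplied even for a depth-only format
    stencilStoreOp: 'store',
    stencilReadOnly: true,
  },
});
```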

Consider removing GPUCommandBuffer.executionTime #2156

  • Deferred until next time.

Remove COPY buffer usages #2172

  • CW: do we really need these usages? Kai & other folks at Google think these can be removed and they can be implicit.
  • CW: if you use MAP_READ, the buffer can be used as a copy destination (?) - for many reasons, impls will request the copy usages anyway: zero-init, workarounds, etc.
  • CW: trivial extension for textures.
  • CW: please comment on the PR.
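
A small sketch of the change proposed in the PR, as summarized above. device is an existing GPUDevice, the size is arbitrary, and the second form is hypothetical pending the outcome of #2172.

```ts
// Today: the copy usage must be spelled out so the buffer can be the destination
// of a copyBufferToBuffer before being mapped for readback.
const readback = device.createBuffer({
  size: 256,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// Proposal (hypothetical): MAP_READ alone would imply the copy-destination usage,
// since implementations request copy usages internally anyway (zero-initialization,
// workarounds, etc.).
// const readback2 = device.createBuffer({ size: 256, usage: GPUBufferUsage.MAP_READ });
```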

Agenda for next meeting

  • Refactor API for load/store OP and read-only attachments #2131
  • ...same agenda as this meeting? …
  • We'll have proposals for ShaderModule creation.
  • Might have progress on RO attachment, load/store op.
  • Need to talk about executionTime.
  • MM: as long as people do their homework so we can make progress.
  • Administrivia: move Austin's samples into the gpuweb org?