Skip to content

GPU Web 2023 11 01

Corentin Wallez edited this page Nov 7, 2023 · 1 revision

GPU Web 2023-11-01

Note that unless stated otherwise this is a GPU for the Web community group meeting and not a working group meeting.

Chair: Jim Blandy

Scribe: Ken Russell

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • Investigation: multi-draw-indirect #4349
  • HTMLCanvasElement copyExternalImageToTexture under specified? #4255
    • Using CopyExternalImageToTexture() to copy from webgl canvas might get blank result discussion#4356
  • Reference High Resolution Time spec #4175
  • Support for arrays of textures #822
  • Agenda for next meeting

Attendance

WIP, it is the list of all the people invited to the meeting. In bold the people that have been seen in the meeting:

  • Apple
    • Dan Glastonbury
    • **Mike Wyrzykowski **
    • Myles C. Maxfield
  • ByteDance
    • Yunchao he
  • Cocos
    • Huabin Ling
    • Zeqiang Li
    • Zhenglong Zhou
  • Google
    • Austin Eng
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • Dan Sinclair
    • David Neto
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Loko Kung
    • Shannon Woods
    • Shrek Shao
    • Stephen White
    • Ryan Harrison
  • Intel
    • Hao Li
    • Jiawei Shao
    • Shaobo Yan
    • Zhaoming Jiang
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Jim Blandy
  • Rob Conde

Administrivia

CTS Update

  • KN: Chrome team's been having trouble with CTS -> Chromium rolls. Slowness, flakiness, major timeouts. We're working on fixing this stuff. Declared an emergency - "Code Yellow".
    • Fixed a lot of our problems
    • Also fixed some performance problems in the CTS
    • Should benefit other vendors as well.
  • JB: we're also working on getting some of our internal CTS problems under control.
  • KR: want to highlight some changes that came to my attention
    • Some incoming constant data being mutated during a test, causing flakes in subsequent tests (Loko). TypeScript's now updated to prevent many cases like this from happening (Kai).
    • Some shader execution tests were waiting for async pipeline creation before continuing to the next pipeline. Sped up some tests by ~4x. (Brandon)
    • Many others:
    • KN: others: device lost tests. Bit different from the others. Most tests reuse a device, these don't. Caused a lot of devices to be created - interesting stress test of implementations. Had flakes in D3D with hangs acquiring devices. Reduced the flake rate by making sure these tests run single-threaded rather than in parallel with other copies of themselves (Austin). But might come up again.
    • KR: highlighting that Gregg found device leaks / command queue leaks in the validation tests.
    • KN: might still be flakes here - device or queue leaks.
  • JB: this is all fantastic. Hats off. We're looking at this exact area too. Didn't help that Moz's impl had TODO for device destroy. :D

“Tacit Resolution” Queue

  • Empty

Investigation: multi-draw-indirect #4349

  • KN: feature request to add multi-draw indirect. Adds multi-draw and multi-draw-indexed indirect. Allows them to be issue multiple indirect calls at once. Either a fixed number, or the # of draws can be pulled from another indirect buffer. 2 features in Vulkan, but 1 feature in other backends. Made sense to bundle them into 1 feature.
  • KN: wanted to pull this into Milestone 1 because it's a big feature gap between WebGPU & native backend APIs. They can get a lot of perf by not bouncing back to CPU. Other vendors' feedback on which milestone?
  • JB: my impression is that M1 is what we already shipped, and that we're landing stuff as we go.
  • KN: M0 is what we shipped. M1 is the one we're currently adding things to the spec. M2 is things we're holding off on adding to the spec. Since this is an optional feature I think it's OK to add to M1. But large enough that we might want more impl experience with it before we do that. Don't know impl status of this, or prototyped in any impl. Not surprised if it's in wgpu already.
  • JB: don't know whether it's in wgpu. In general Moz is fine with considering this for M1, but would like impl experience before we land something.
  • KN: wgpu does have both of these features btw. Just checked.
  • KR: think wgpu already has hardening in place?
  • JB: we're actually not even shipping indirect draws because we don't have the needed compute shader hardening, but it's a high priority item. Widely used, imp't for perf.
  • KR: do we know whether Chromium's implementation of indirect validation's implemented in WGSL so it'd be easier for other vendors to pick up the impl if they wanted it?
  • KN: think it is, but not sure there's a lot of code for that.
  • JB: sounds good.
  • JB: sounds like, we're interested in seeing spec language & impl reports. OK to start talking about standardization at that point. Consensus to move forward.
  • RobC: nothing to mention, glad people are on board. Handling of max draw count? Was objecting to myself about it. Other thoughts? Max that depends on a feature. There's a thought about how you could eliminate the max, not sure if it's based on good wisdom.
  • KN: need a chance to read what you wrote.
  • RobC: spec now says the default's "1" even when you don't have the feature. kvark suggested getting rid of it, having implicit limit. There'll be some max binding buffer size. Max draw count past that doesn't make sense. If that's the implicit limit & most impls follow that rule, that could be an implicit max. (more discussion, didn't catch it all)
  • KN: should see what actual devices have. Think we can add a limit with the feature. Haven't done this yet. Thoughts: feature request, wouldn't be allowed to request higher limit unless you also enable the feature. Straightforward to implement with current spec rules. Think we could have a limit. Nice to not have a limit though.
  • KN: anyone going to try implementing it?
  • JB: FF definitely not until 2024.
  • RobC: haven't done impl work on this. Now that indirect-first-instance is now fixed in Chrome, we could polyfill this until we get this. So our team isn't blocked on this.
  • KN: this probably means this is effectively M2 until it's prototyped. But can leave in M1 for now, optimistically. Have to triage milestones more.
  • JB: M2's fine.

HTMLCanvasElement copyExternalImageToTexture under specified? #4255

  • GT: not sure what happens if canvas/texture size don't match. In WebGL, canvas width/height aren't the size of the texture, it's drawingBufferWidth/Height, and these may not match.
  • KN: tried to solve this on HTML spec side, had hard time. Couple of years ago. Lot of back-and-forth with annevk. How sizes work for canvas, WebGL, ImageBitmap rendering, etc. Don't remember where that left off. Not sure what the behavior is in WebGL or if it's consistent.
  • GT: So - hard to spec, need to write tests and tell people to pass them?
  • KR: No way to trigger the behavior where width!=drawingBufferWidth with CSS.
  • GT: Make a canvas larger than the GL max texture size and it should always trigger this behavior. Doesn’t have to be added to the document.
  • KR: No concern with Shaobo’s suggestion to use drawingBufferWidth/Height as the width/height for the copy.
  • GT: not hard to make a canvas large. Just zoom out 6x. If multiplying by devicePixelRatio you'll get a 5X canvas.
  • SY: as developer, I have WebGL canvas, want to copy into WebGPU. If I render, I might get different content size than canvas width/height. Want to see this fixed. Maybe drawingBufferWidth/Height is a good option.
  • KR: seems possible to write a fairly robust test for this.
  • JB: what if we copy from a WebGPU canvas? Similar problems?
  • KN: no, WebGPU doesn't have this behavior where drawing buffer you get can be smaller than what you asked for. It fails instead.
  • JB: is the next step to write tests?
  • GT: write spec, then write tests. :)
  • KN: probably good idea to also write tests for WebGL & 2D canvas, make sure they also do the same thing. Can go into WebGL CTS b/c both are interactions with WebGL. No guarantee that HTML people will like it.
  • KR: can this not be in the WebGPU spec?
  • KN: it can go there. WebGL -> 2D canvas though, should be in HTML spec. Our spec'd then pick it up automatically. If defined there, might not need to special case.
  • KR: that's CanvasRenderingContext2D.drawImage's behavior right?
  • KN: yes.
  • GT: you can pass canvas to toDataURL, toBlob, createImageBitmap, …
  • KN: it's something we do need a definition for ultimately. Also where you can encode canvas to video stream. Should have definition for this. But this case, think it's OK to special case in our spec.
  • GT: other question: thought videos can change size on the fly.
    • KN: yes.
    • GT: how does that work with importExternalTexture? No width/height provided to WebGPU.
    • KN: it captures particular frame, which has well-defined size. Best-guess situation? requestVideoFrameCallback can snapshot width/height at the right time? Preferable, add width/height properties to the external texture object.
    • SY: in external texture impl details, it expects from WebCodecs video frame or HTMLVideoElement. Think we use the video frame size, should resolve this problem.
    • KN: external texture's size is somewhat well defined but no way for app to check what it is. We have width/height properties on GPUTexture, just not GPUExternalTexture.
    • JB: probably should table the video discussion, sounds complex, but file issue & put on future agendas.
  • Using CopyExternalImageToTexture() to copy from webgl canvas might get blank result discussion#4356
    • SY: other issue, from our experience: user can use copyExternalImageToTexture from WebGL canvas, get black results. WebGL spec, when you copy from WebGL canvas w/o preserveDrawingBuffer behavior's undefined. (KR: not accurate - only if you do work across different requestAnimationFrame callbacks.) Seems like surprising behavior esp. if they don't know WebGL. Could we add a note to warn people when copying from WebGL canvas?
    • SY: if we can copy WebGL canvas contents before presenting, might get expected results. Any spec that defines well when is "before presenting"? In Chrome we could use rendering to WebGL canvas & copy to WebGPU texture in same render loop - that works. preserveDrawingBuffer may slow performance.
    • KR: This is well-defined in the WebGL spec. Having a warning note in the WebGPU spec about how to use this reliably, and noting the performance impact, would be fine. But WebDL’s behavior is well-defined.
    • KN: we also tried to define this strongly in the WebGPU spec. But we have different guarantees than WebGL. If you're in the same Task, then rendering's guaranteed to still be there. Think we should put a note in the spec. We should also put a warning in place if this happens, if you copy from a WebGL canvas that was already presented+cleared.
    • JB: are warnings called out in the WebGPU spec?
    • KN: we call some out, but they're non-normative.
    • JB: AIs: Gregg: spec work/tests. Proposal for the "warning note" in the spec, recommended warnings in impls when working with already-presented canvases. Anything else?

Reference High Resolution Time spec #4175

  • KN: I was assigned to discuss this. We have timestamp queries in WebGPU. Haven't shipped yet, would like to. Requires figuring out coarsening. Bug suggests we point to the W3C high-res time spec to prescribe particular behavior. That spec says that timestamps should be coarsened to 100 us (microseconds) if non-origin-isolated and 5 us if isolated. Seem reasonable to me. But one thing: our timestamps aren't running in the isolated context, but on the GPU. Typically shared between isolated contexts. Even if not (separate GPU process per isolated process), GPUs don't isolate things as well as CPUs nowadays. Tentatively thinking we should say: always behave as if non-cross-origin-isolated, coarsen to 100 us.
  • JB: makes solid sense.
  • JB: security paper we were alerted to last month, using GPU for timing attacks through GPU cache. One takeaway: high res clocks on the GPU. If people can construct their own clocks it may not matter if we coarsen them. But, these are measuring something different, hard to roll your own shader for? The fact that you can get high-res timings from compute shaders probably doesn't make it less necessary to coarsen these timestamps.
  • KN: wouldn't be confident. Might be possible to get 2 pcs of code to run on the GPU at the same time. Probably 2 cores, separated by buses. 2 things running at the same time using the same storage area - 2 draw calls in same render pass. They can run at the same time, so interference is possible. But not confident you can build a timer with that.
  • JB: both Moz & Goog's response to this was - this was not the attack people will use. Exotic. There are less exotic attacks. No action in response to this report. If not addressing this disclosure, is it imp't for us to coarsen this timer at all? What are the security implications?
  • KN: 1) don't know 2) we know it's relatively easy to coarsen these. These are much easier to use. You can do high-res time on the GPU with these. Dubious to do this any other way. And easy for us to fix, think we should.
  • KR: There was an attack reported a few years ago - the first real, serious attack we couldn’t plug - that executed Rowhammer on the GPU with WebGL. It required getting high precision timestamps from the timestamp queries. They could see cache miss rates. They could get a layout of physical memory on the GPU, target a particular bit to flip with rowhammer, and trick the page table into doing something insidious. Their attack was limited to a particular device with a very clearly defined TLB design. Rowhammer mitigations in newer RAM also avoided this.
  • JB: so very fine timestamps can help you probe the GPU architecture, so worth coarsening them.
  • KN: another attack suggested was the one jyasskin linked - "clock around the clock". Tries to see how fast the chip's clock goes. Uses for fingerprinting. Another thing possibly could apply here - timestamp clocks are implemented in ticks, then we convert to real-time format. Some designs might have actual clocks.
  • JB: upshot: we're going to use non-isolated coarsening for all GPU work.
  • KN: yes, that's my suggestion, can loosen it if we want too.
  • JB: sounds good.

Support for arrays of textures #822

  • JB: quick summary of learnings / next steps.
  • KR: haven't thought about this for a month, but Gregg pointed out that we need this for feature parity for WebGL. Can't expand out arrays of textures into single individual texture bindings.
  • JB: is this easier for textures?
  • KN: think it's a hardware feature.
  • GT: think you'll need arrays of samplers too for feature parity with WebGL, since in WebGL they're combined.
  • JB: next steps - figure out what this looks like in backends.
  • KN: not sure how you do this in Vk. Vk has dynamic indexing features for various things… Oh: Vk spec says, shaderSampledImageArrayDynamicIndexing provides both samplers and textures. So yes, we should bundle these together too.
  • JB: for WebGL parity, sounds important, so let's pick this up in the next meeting.

Agenda for next meeting

  • Agenda
    • WebGPU Compat topics
    • Support for arrays of textures #822
    • Problem measuring time deltas with writeTimestamp #3952
  • Please add more items!
  • KN: we'll send along a list of compat issues to discuss soon.
    • JB: Teo will be back next week, he's already up to speed so this'll be faster.
Clone this wiki locally