Minutes 2021 02 08

Corentin Wallez edited this page Feb 9, 2021 · 1 revision

GPU Web 2021-02-08

Chair: Corentin

Scribe: Ken

Location: Google Meet

Tentative agenda

  • PR burndown
    • Rename GPUVertexFormat members to match GPUTextureFormat #1322
    • [FYI] Rename GPUExtent3DDict.depth to .depthOrArrayLayers #1390
    • [FYI?] Reconsider GPULoadOp "clear"/"zero" #1377
    • Add GPURequestAdapterOptions.majorPerformanceCaveat #1302
    • Remove GPUDevice.{adapter,features,limits} #1309
  • [FYI?] Require storeOp when resolveTarget is specified? #1376
  • Proposal for a GPUVideoTexture object. #1415
    • Context: Proposal for importing Web platform images in WebGPU #1154
    • Context: Investigation: Import VideoFrame from WebCodec to WebGPU #1380
  • [stretch] Require [SecureContext] for using WebGPU. #1363
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • Dan Sinclair
    • Kai Ninomiya
    • Ken Russell
    • Shrek Shao
  • Kings Distributed Systems
    • Hamada Gasmallah
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Andrew Varga
  • Dominic Cerisano
  • Michael Shannon
  • Timo de Kort

Administrivia

  • Seems there are several times when a lot of people are available for the VF2F

PR burndown

  • Rename GPUVertexFormat members to match GPUTextureFormat #1322
    • Removes concepts of “char”, “short” etc.
    • KN: Makes the component types match. Vertex formats use names “uchar”, “ushort”, “half”, etc. Naming’s inconsistent with everything else. Component types now match texture formats. Makes them match more closely with WGSL.
    • JG: we have rough consensus even if not everyone loves it.
    • CW: only question is the … one?
    • KN: I’m the only one who wants X1.
    • JG: I’d like X1.
    • MM: I’d rather not have X1 because of shading languages. Those don’t have float_x1.
    • KN: they don’t match closely with WGSL. I think the example I gave 10 days ago is not a negligible problem. Format uint32, you’ll wonder “how many of them”? Doesn’t tell you there’s only 1.
    • JG: the prior art is that there’s no x1. “float”, then “vec2 float”.
    • KN: then you’ll know it’s the format for the entire vertex.
    • BJ: equivalent call in OpenGL is where you pass in “float”, and then another where there’s 3 of them. If they see “float” here, I can understand it could be confusing. I see Kai’s point.
    • KN: I agree but don’t feel too strongly.
    • CW: can we vote with thumbs up/down on this issue?
    • JG: could accept this PR today, see if we want to change it later.
    • MM: I think that’s a good idea.
    • CW: let’s do that.
  • [FYI] Rename GPUExtent3DDict.depth to .depthOrArrayLayers #1390
    • Had some feedback from Yunchao. Depth for 2D array and depth for 3D is confusing. Our texture dimension, 2D, is actually 2D array. Change is to modify the name “depth” to “depthOrArrayLayers”. Prior art from D3D12. Most users will use array overload not dict overload. In most cases they won’t have to type it out.
    • MM: I think this is an improvement.
    • BJ: in one of my earliest WebGPU attempts I ran into this problem, and thought the D3D12 solution would have fixed it.
    • KN: we’ve received this feedback from a bunch of different places.
    • CW: sounds good.
  • [FYI?] Reconsider GPULoadOp "clear"/"zero" #1377
    • CW: from Kai as well. Common case for LoadOp: put zeros in the thing.
    • JG: I don’t think this is an improvement. Case where we’re trying to make the best case obvious, and adding something to the API for it. Think we should just make it a best practice. Think there should be no difference between what you clear to, and loading.
    • MM: don’t think I understand. Load value “load” - why are they writing that?
    • KN: it seems easier. Clear as an operation, load doing nothing, when it’s the opposite.
    • CW: more complicated than this. On desktop GPUs, to achieve fastest performance, want LoadOp=Load instead of Clear. Clear is a separate operation. Then they load from memory again. We see this in mipmap generation benchmarks. We should have a DontCare but that’s a portability hazard.
    • JG: surprised that’s what you’re seeing - wouldn’t expect that behavior nowadays.
    • CW: that’s what we measured. BabylonJS complained mipmap generation was faster in WebGL. Changing LoadOp from Clear to Load sped it up.
    • MM: fastest would be to add mipmap generation API. :)
    • CW: what about “zero”?
    • KN: in general suggest people prefer “zero” because overhead of load on mobile is higher than on desktop. Trying to get more perf out of less hardware. Useful to push people in that direction. Not sure it’s the way to do it or not.
    • MM: so people like typing keywords instead of numbers?
    • JG: they’d have a keyword that’s fast in addition to the slow one.
    • KN: look at what options you have. Load + whatever value you want, it’s zero and whatever you want.
    • MM: the API design indicates that the thing that appears to be the default option is slow...want to give them an option that’s faster? Seems legit.
    • CW: have we seen people do this wrong?
    • KN: 2 examples so far.
    • CW: guess we could have them write “zero” instead of (0,0,0,0).
    • JG: they’ll think “zero” is faster than choosing a clear value. Any place where you say “zero” is the same thing as clearing to all zeros could just be “expect clear to be faster”.
    • KN: right, don’t want them clearing to 0 and then drawing a quad to get a different color.
    • DM: requiring people to specify a color seems awkward. Have the behavior where they don’t do anything with the resource.
    • MM: also seems a reasonable choice.
    • KN: so change the default to zero?
    • CW: if you don’t define it, it’s zero.
    • MM: difference between “not specified” and “specified with zero”.
    • JG: don’t think this is a great idea.
    • KN: despite other problem, have seen people accidentally clearing render passes.
    • DM: not suggesting to have it unspecified. Think it should be required. Good to have an option reflecting “I don’t care” semantics.
    • CW: “I don’t care” so give me all zeros?
    • BJ: could you call that “invalidate” or does that have too many connotations from OpenGL?
    • CW: you start your rendering with a value in WebGPU. Either 0, or the value specified.
    • JG: this is definitely not critical for MVP.
    • KN: if no clear consensus, let’s put it off.
  • Add GPURequestAdapterOptions.majorPerformanceCaveat #1302
    • CW: 10 min timebox. (Until 30 min past the hour)
    • KN: we thought maybe we should be more explicit. WebGL’s said “software rendering” - wanted to put more behind that option. In that thing - was an issue compositing WebGL with a software compositor. Canvas compositing is less central to WebGPU than WebGL. People not using canvases at all. Do we want to make it more specific and say “this is software”? “Disallow software rendering”?
    • JG: my opinion’s changed on this in past couple of months. Ran into this with WebGL. Disabled all majorPerfCaveat in Firefox. Will never fail you. Frankly that seems to be a less bad solution than before, as far as my users are concerned. Complex issue; intersection between user sovereignty and choice, and application’s ability to curate for its users. Apps want to say, if we’re running on software, we want to know, give you faster degraded solution. Lot of users don’t want that. With majorPerfCaveat they weren’t able to override it. In FF right now, can ask what adapter you’re on, and if they’re software, use it like one that’s just really really slow. Does require you to keep a database of slow adapters. Bit of a burden. Smaller apps don’t have this. Would like to give them a speed estimate. But some users want the fully featured thing even if it’s slower.
    • KN: makes sense. Lot of users will only have software WebGPU, but hardware WebGL. Many 3D apps won’t degrade in feature set from WebGPU -> WebGL. Want to give them choice, but not so many knobs they can’t figure out what to do.
    • KR: Does WebGPU guarantee that the swapchain texture written in a RAF gets composited with the changes of the DOM in the same RAF? A: Yes.
    • KN: expect more users to not even use a canvas.
    • JG: compute-only use cases.
    • KN: doesn’t make sense to tell all WebGPU users about a perf caveat just because of a compositing issue.
    • CW: also no default framebuffer craziness. You render to a texture separate from everything. Some issues disappear as a consequence.
    • CW: allowSoftware seems good. Want WebGPU available everywhere. Also want to handle users not wanting software WebGPU, instead hardware WebGL.
    • MM: do you expect to ship software WebGPU?
    • CW: yes, we have a software Vulkan backend.
    • DM: Firefox plans to do so too.
    • MM: any preference for apps falling back from e.g. hardware WebGPU -> hardware WebGL? Hardware WebGPU -> software WebGPU?
    • JG: hardware WebGL.
    • KN: agree.
    • DM: more work for them to maintain 2 code paths.
    • JG: agree. Unity / Three may have same feature set, but many other users probably won’t maintain them.
    • MS: we have no plans to maintain both at all.
    • MM: even if you get software rendering?
    • MS: yeah, we don’t intend to maintain a WebGL renderer. Too much work since we’re writing from scratch.
    • MM: if user goes to your site, would you rather they get software WebGPU or an error message?
    • MS: we would prefer to fall back to software; we run on machines without GPUs. We’re in the CAD/CAM/Simulation space.
    • CW: ok, we need the knob. Opt-in or opt-out?
    • JG: think it’s easiest to say “software or not”. There are machines running in software that will be faster than others in hardware. Should worry more about performance.
    • KR: can we agree on “allowSoftware”?
    • JG: maybe...there are still situations where users would want to override it.
    • KN: JG can you suggest something?
    • JG: yes.
  • Remove GPUDevice.{adapter,features,limits} #1309
    • Skipping this for today.
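
For the GPUVertexFormat rename (#1322) discussed above, the naming scheme can be sketched as "component type + bit width + optional xN count". This is a minimal sketch, not spec text: `vertexFormatName` and the `explicitX1` flag are hypothetical helpers for illustration, and the `x1` spelling is only a guess at what the "X1" variant debated above would look like. The old-to-new pairs follow the renaming as it is understood from the PR.

```typescript
// Hypothetical helper (not part of WebGPU): builds a vertex format name in the
// texture-format-like style discussed for #1322.
type ComponentType = "uint" | "sint" | "unorm" | "snorm" | "float";

function vertexFormatName(
  type: ComponentType,
  bits: 8 | 16 | 32,
  count: 1 | 2 | 3 | 4,
  explicitX1: boolean = false, // assumption: models the debated "X1" suffix
): string {
  const base = `${type}${bits}`;
  if (count === 1) {
    // Without the suffix, a single component reads as "float32" and the
    // reader has to know that this means exactly one component.
    return explicitX1 ? `${base}x1` : base;
  }
  return `${base}x${count}`;
}

// A few old-style names from the discussion and their renamed equivalents.
const renamed: Record<string, string> = {
  uchar2: vertexFormatName("uint", 8, 2),
  ushort2: vertexFormatName("uint", 16, 2),
  half2: vertexFormatName("float", 16, 2),
  float: vertexFormatName("float", 32, 1),
  float3: vertexFormatName("float", 32, 3),
};

console.log(renamed);
```

The single-component case is where the two positions differ: one side prefers the bare name to match shading languages, the other prefers an explicit count so the format reads as describing the whole attribute.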

[FYI?] Require storeOp when resolveTarget is specified? #1376

Proposal for a GPUVideoTexture object. #1415

  • Context: Proposal for importing Web platform images in WebGPU #1154
    • AE: one of the biggest issues - copyImageBitmapToTexture is async. Not suitable if you want to integrate video in WebGPU. You always miss your rAF callback. You’re one frame behind, or you drop frames.
    • AE: also in Chrome’s implementation, creating ImageBitmap creates real backed resources. One copy. Then copying that into GPU texture is another copy. Today, it’s 2 copies. Not sure how other browsers are implemented. Some significant amount of work to do something that replaces ImageBitmap’s store if the video changes.
    • KR: We have done simple video import tests and the performance of ImageBitmap importing is not enough. The previous idea of using ImageBitmap proves to be insufficient.
    • MM: what’s the motivation for MVP?
    • KR: we need parity with WebGL.
    • CW: people use WebGL for this today. Lot of people have come to us, say WebGL has this, why doesn’t WebGPU? (Efficient video import.)
    • KR: this is a major feature and application of WebGL.
    • JG: like having efficient video uploads in WebGL, blocked some Firefox out-of-process work until we had it.
    • MM: OK.
    • JG: maybe helps: one major user is Facebook 360 video, which relies on this being fast. If you don’t have efficient uploads here, it chunks, people don’t like your browser.
    • KR:
    • MM: ok, you convinced me!
    • RC: is this for regular videos and/or WebRTC videos?
    • CW: conceptually it’s both.
    • RC: is the current proposal about importing straight NV12? “Fat” sampler?
    • CW: there are 2, WebGPU should have both. Fat sampler #1415. #1154, provide a generic way to import other kinds of images (video, etc.) into WebGPU. HTMLImageElement, also Canvas.
    • RC: OK.
    • CW: think we should have both. Video textures should be faster than import + sampling, or at least as good.
    • JG: instead of working on both, could we satisfy all use cases with MVP version of fat sampling? If you want to blit it you can, sample from it you can? Would satisfy needs for video.
    • CW: what about canvas?
    • JG: for color, canvas-canvas uploads aren’t fast in Firefox today.
    • KN: canvas uploads are typically not nearly as many pixels as video-to-canvas.
    • RC: BabylonJS team has given me feedback about slow canvas2d uploads. Their priorities: 1) WebRTC uploads 2) canvas 3) traditional uploads.
    • JG: having gpu-gpu copies for canvas isn’t that big a deal in Firefox because our canvas2d is software anyway.
    • CW: it’s not webrender?
    • JG: no.
    • CW: I thought the videotexture would be a hard sell because it’s complicated. Thought to do importTexture first, then videotexture.
    • JG: had discussion about this in webgl - treating everything as “external” samples, including canvas.
    • KN: what if canvases could be in YUV color space?
    • MM: that sounds like magical fantasy land features.
    • KN: it’s all transparent. Canvas 2Ds are all color managed.
    • CW: either importTexture or import video GPU texture - either of these will be complicated.
    • KR: Think it is important to do the import of all of the DOM sources first to cover what WebGL can do today instead of over-specializing for video.
    • JG: I’d love to hook up canvas as an input to the fat sampler, see what the perf is.
    • MM: that’s a stance we can get on board with. Do the simple thing first.
    • CW: what’s the simple thing?
    • JG: fat sampler.
    • KR: don’t think that’s simple. fat sampler’s more complicated to implement than one-copy.
    • JG: think it’s easier to inject stuff in shader than do the blit.
    • CW: you need some of the spaghetti code. In Dawn, importing the video texture - it’s a massive amount of work. Not sure what it would look like in wgpu. Spaghetti goop you have to do anyways for video texture.
    • MM: one benefit of Jeff’s suggestion is getting a formal API in the middle of the spaghetti goop.
    • JG: <things>
    • KR: We’ve had some extensive discussions with folks on the ANGLE team. Looking at WebGL for 0-copy import is very thorny. Spanning across the different texture types from Canvas vs the Camera is very complex. Today there’s a runtime switch statement in the shader to determine which texture unit to sample from. I’m fairly sure it will be tricky in all of the graphics APIs.
    • RC: +1 on one-copy as good as WebGL before doing zero-copy fat sampler. I’ve gotten feedback about NV12 as well - if there’s any scaling, etc. need to do filtering ourselves.
    • MM: Would the one copy solution also work for image elements?
    • RC: Yes it would, and for 2D canvas I think it has to be 1-copy way. For canvas, if you import and then change the canvas, we’d have to specify what happens.
    • MM: makes sense to have video be a separate code path than image elements. Worrying about the latter first, then the former, sounds good.
    • JG: I’m worried about maintaining fast paths on both sides of the API. I’d rather like to maintain the fast path on just one. It’s a higher maintenance burden to do both.
    • KR: Zero-copy is still very much investigation. It’s never been brought to fruition, even in WebGL. Our Vulkan experts have studied the zero-copy WebGPU proposal and have said it won’t work as is.
    • CW: The only part you don’t need spaghetti for the video texture is YUV conversion. That would be in the shader.
    • RC: how many copies would slow-path be for this 1-copy upload?
    • JG: At worst, we do GPU decoding of the video and read it back to the CPU… then to RGB, and if you ask for y-flip or premultiply, we would do another CPU copy before you imported it. Part of it is because we never implemented those fast paths in FF.
    • CW: If for the fat sampler you optimize it, it would also work for the importTexture copy.
    • JG: You could build one on the other for sure, but that’s easier said than done. Still a lot of maintenance burden.
    • MM: From what I heard, the strongest argument against having only one of the ideas was that it is complicated and we didn’t really know how it worked. It sounds to me that if Jeff thinks this is the way forward, you have some homework to do on this.
    • JG: I’ll try to implement it for WebGL.
    • KR: We have experience having done this in WebGL, and have implemented the blitting routines. We’re going down a risky route if we try to do only the zero copy path.
    • JG: I would like us not to consider customers that are already doing things with WebGPU.
    • CW: I think Ken is saying that the one-copy path is proven in WebGL, and we have experience making it work.
    • JG: I agree, and my experience is that it sucks.
    • CW: The zero-copy path is an engineering/research topic. We’ve tried before and failed.
    • JG: Okay I’ll do my homework and see how it works.
    • MM: I don’t have implementation experience, but my intuition is that it would work fine.
    • KR: People have experimented with this in ANGLE, Vulkan, etc. and have found that it’s not so easy to span across all of the different texture formats used by videos, canvases, etc.
    • JG: We have time till MVP to decide on something.
    • CW: for our origin trial we can put in whatever we want because the group hasn’t decided, and then figure out something for MVP.
    • KR: think we should even consider removing the current copyImageBitmapToTexture, and doing importTexture as the baseline way to get DOM data into WebGPU.
    • CW: I think it would still be nice to agree on what the shape of the two options would be. We think we need both for WebGPU. If we can agree on the shape of them, it derisks everyone’s prototype.
    • MM: I think it would be good to know how long we should delay for Jeff’s research.
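
To make the "fat sampler" idea above more concrete, here is a rough CPU-side sketch of the per-sample work such a sampler would have to do in the shader when reading NV12 directly: fetch luma from one plane, fetch the shared chroma pair from the other, then convert to RGB. It is illustration only and not taken from #1415. `sampleNV12` is a hypothetical name, and the tightly packed layout (stride equal to width, even dimensions) and full-range BT.601 coefficients are assumptions; real video frames carry their own stride, range, and color matrix.

```typescript
// Illustrative only: CPU-side version of the lookup + YUV-to-RGB conversion
// that a "fat sampler" would inject into the shader.
type RGB = [number, number, number];

function clamp01(v: number): number {
  return Math.min(1, Math.max(0, v));
}

// Assumes tightly packed NV12: width*height luma bytes, followed by
// interleaved Cb/Cr pairs at half resolution in both dimensions.
function sampleNV12(
  data: Uint8Array,
  width: number,
  height: number,
  x: number,
  y: number,
): RGB {
  // Plane 0: one luma byte per pixel.
  const luma = data[y * width + x] / 255;
  // Plane 1: one Cb/Cr pair shared by each 2x2 block of pixels.
  const chromaBase = width * height + (y >> 1) * width + (x >> 1) * 2;
  const cb = (data[chromaBase] - 128) / 255;
  const cr = (data[chromaBase + 1] - 128) / 255;
  // Full-range BT.601 conversion (assumption; see lead-in).
  return [
    clamp01(luma + 1.402 * cr),
    clamp01(luma - 0.344136 * cb - 0.714136 * cr),
    clamp01(luma + 1.772 * cb),
  ];
}

// 2x2 frame: top row white, bottom row black, neutral chroma.
const frame = new Uint8Array([255, 255, 0, 0, 128, 128]);
console.log(sampleNV12(frame, 2, 2, 1, 0), sampleNV12(frame, 2, 2, 0, 1));
```

A one-copy import would instead run this conversion once per frame in a blit and hand the application an ordinary RGBA texture, which is the trade-off debated above.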

Agenda for next meeting

  • US Holiday next Monday
  • Meet February 22