Skip to content

Minutes 2018 08 20

Corentin Wallez edited this page Jan 4, 2022 · 1 revision

GPU Web 2018-08-20

Chair: Corentin

Scribe: Ken

Location: Google Meet

TL;DR

  • Can move forward with the CLA
  • Issue 68 cubemaps
    • Textures are only 1/2/3D maybe arrays. Cubeness is set on the texture view.
    • Concern that always setting the “cube compatible” bit on Vulkan will drop perf
      • Reach out to Khronos IHVs to understand why.
  • Issue 69 copy investigation and proposal
    • Copy views are only used for copies, cannot be passed to shaders
    • Discussion whether WebGPU should present all the constraints of native APIs or emulated if needed, or have fast and slow paths.
    • APIs all have different constraints on parameter alignment. Investigate more?
    • Concern that “copy views” are not pretty, comment on the issue!
    • Will origin3D, extent3D be dictionnaries or sequences?
    • Concern that emulated copies would cause issues if we have multiple queue types.
  • Issue 70 pass proposal
    • People are happy about it, will ask Rafael’s opinion

Tentative agenda

  • Passes
  • Cubemaps
  • Agenda for next meeting

Attendance

  • Apple
    • Myles C. Maxfield
  • Google
    • Corentin Wallez
    • Kai Ninomiya
    • Ken Russell
  • Intel
    • Yunchao He
  • Microsoft
    • Chas Boyd
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Yandex
    • Kirill Dmitrenko
  • Elviss Strazdiņš
  • Joshua Groves

Administrative updates

  • Intel basically agreed to contribute under the CLA!
  • Meeting time has changed to Mondays 12 PM Pacific time.
  • DM: I’m doing an investigation into occlusion / timestamp queries, if that’s interesting.
    • CW: interesting, might be hard, but might be easy.
    • DM: was hoping to defer it but saw that some software (like Dolphin and PS3 emulators) use them.
    • CW: were you going to port these to WebGPU?
    • DM: no, gfx-rs.
    • MM: in Metal, occlusion and performance queries are different enough that they might need to be implemented differently.
    • CW: DM volunteered to investigate; looking forward to results. :)

Cubemaps

  • https://github.com/gpuweb/gpuweb/issues/68
  • DM: Is there any overhead from telling Vulkan that you’re cube-compatible?
  • CW: there’s a bit set on the open-source Intel driver but don’t know what it does. Only seems to be used at the memory allocation point. Maybe slight memory increase.
  • DM: may affect allocation alignment but not much else?
  • CW: SwiftShader allocates one more pixel around each cube map face for seamless cube maps. Hope hardware doesn’t do that.
  • DM: must be a reason why the flag is there in the first place.
  • CW: maybe someone voiced a concern on some hardware. Is it important enough for us to add the bit?
  • DM: test on AMD, NVIDIA?
  • MM: seems something we can validate through testing.
  • CW: Google should reach out to Khronos and ask if anyone needs the bit. Do simpler solution if possible.
  • DM: Apple doesn’t have this in Metal I think.
  • CW: gist of idea: in native APIs you don’t have concept of “cube map” when creating resource. Can sort of in Metal, but then “reinterpret_cast” the resource later. You have 2D array textures and view them as a cubemap. 1D, 2D, 3D arrays of them at texture creation time. Set the cube map bit only on texture cubes and not on textures themselves.
    • Was fine for D3D12, was OK with Mozilla. Concerns from Myles?
  • MM: only concern is if we’re reinterpreting a texture as a cube map, have to do some validation to make sure the format of the cube map was the format of the original texture.
  • CW: so validate that there are 6 slices and whatever other validation Metal needs?
  • MM: yes
  • MM: how do mipmaps work?
  • CW: a separate dimension, orthogonal to the arrays.
  • MM: a cubemap can be mipmapped?
  • CW: yes.
  • MM: doing the simple thing until we discover that it’s too simple sounds good
  • CW: good. Then can write conformance tests and find bugs.

Copies

  • https://github.com/gpuweb/gpuweb/issues/69
  • CW: Buffer-to-buffer copies: everyone does the same thing to access the size.
  • CW: texture-to-buffer and vice versa: everyone explains how to view it as a linear texture in compatible format. Copy between them, one is linear and in buffer memory.
    • Can set the size of the copy you want to do.
    • Pretty clear that all APIs to this the same; D3D12 a bit different
    • Copy among these resources: you give it two resource views specifying what subset to copy. Specify row pitch etc. Specify level, mipmap level, maybe slice, etc., or aspect (e.g. depth/stencil from depth+stencil). Bunch of constraints w.r.t alignment.
  • MM: does this proposal include using these views for any arbitrary purpose? Attaching them to a shader and sampling from them?
  • CW: no, only for copies. Called a view because, when there’s a texture involved, the buffer is seen as a texture to copy to another texture. D3D has the same concept with placed sub-resource footprint. Only used for copies.
  • DM: you want to enforce that row pitch is 256?
  • CW: no way around it unless we emulate things on D3D with compute shaders.
  • DM: only D3D which would suffer from that path. Not necessarily compute shaders but multiple transfers.
  • CW: multiple transfers: would have to do one per row. Would be a lot, e.g. for a 2Kx2K texture. 2Kx2K-1 texture would be problem.
  • DM: major inconvenience for users. Don’t mind either way. We have all the logic for handling row pitches.
  • MM: 3 options. (1) API that has no restrictions and code bifurcation inside it with hidden performance cliff. (2) Only have one API with restrictions and force developers to use it. If dev wants to have an unrestricted API they can write their own compute shader. (3) Two APIs, fast mode and slow mode.
  • DM: 3rd one would be replaced by backend giving you a hint about what’s the fast path. That’s what Vulkan does right now.
  • CW: explicit fast/slow path sounds interesting. Concerned about one path that works everywhere and does everything. Looks like a copy operation and that it would go on the copy queue, but if it’s implemented via compute shader then it has to offload it to the compute queue internally.
  • JG: would rather just expose the restriction.
  • CW: but it makes an API that’s fairly constrained. Not a fan of exposing constraints to backend.
  • JG: slow paths are a huge pain to maintain.
  • MM: what do DX developers do today?
  • CW: they align stuff.
  • MM: so they make buffers bigger than necessary and ensure they’re aligned?
  • CW: yes but it’s not over-allocating.
  • DM: it could be bigger: up to 256 times bigger if it’s a 1x1 texture.
  • DM: no strong opinion. Maybe don’t expose the value from the backend but just spec it. Would work well on all the backends.
  • MM: unfortunate. Developers read tutorials, not spec.
  • JG: tutorial would then say what the alignment is.
  • MM: seems to be sort of difficult.
  • CW: lots of tiny alignment constraints coming from all APIs. DM found one in features for Metal. Can’t remember exactly. “Buffer alignment for copying distinct texture to a buffer”. Metal has this but hidden deep inside a table. Is this validated and the driver crashes?
  • DM: don’t remember the exact validation error, but pretty sure you get one.
  • CW: Metal has the same constraint.
  • DM: this isn’t raw alignment, just buffer alignment. The one D3D has is 512.
  • CW: for buffer alignment, we found a way to go past that in the D3D12 backend. Can split the copy into two copies and it works out.
  • DM: can you do the same with Metal?
  • CW: assume so.
  • DM: Metal has a lot of restrictions on buffer offset. Could fix it here but will run into it again. Texel buffer offset for example.
  • CW: on NVIDIA everything’s aligned to 256. Uniform buffer offsets, etc. will all be 256.
  • CW: maybe we should investigate more and come back to this when we have a start of a CTS which fails on every platform.
  • DM: more people will read investigation and comment on it.
  • JG: from API standpoint, don’t like copy views – origin of respective texture, and you pass in the size of the copy.
  • DM: the fact that the origin and size are in different structs?
  • JG: yes. It’s weird. Like D3D’s way of doing it.. Here’s source rect, then specify destination.
  • CW: in D3D12 it’s unfortunate that there’s a destination position but no source or origin.
  • (... never mind …)
  • CW: source box updated at the destination origin. (?)
  • JG: weird that name is CopyView, but views are all unsized. It’s most of a view, and we make a full view out of it once we know the size.
  • CW: disagree – you have a view with a size (esp. for buffers, raw pitch and image height) – you have a linear texture of given size. Copy can be subset of that.
  • JG: BufferCopyView can incidentally detail a size, but not actually a full size. Might have depth slices. Max bounding size of row, image, etc.
  • CW: can you comment on the issue?
  • JG: will do.
  • JG: wrt origin, origin3d and extent3d: do we want to use dictionaries for these throughout the API? Don’t have a good answer.
  • CW: thought we went with string enums for v1.
  • JG: WebGPU Origin3D – string of integers – should it be a sequence of 3 things? Don’t like the strings, and don’t like creating a dictionary. JS doesn’t make it easy to do nice type conversions.
  • MM: what’s the other option? Array of 3 elements?
  • JG: that’s one. Inline the args, another option (which is also gross). Might be picking least gross way to do it.
  • CW: my understanding of WebGPU copies; people not opposed but not pretty right now. Either be restrictive, or be unrestrictive and do compute behind the app’s back.
  • MM: compute behind the app’s back and doing the work pass idea from a couple of weeks ago would be pretty straightforward.
  • JG: unless we expose different hardware queues, and have a blit queue which would be dependent on compute being supported.
  • MM: converting to compute has some issues.
  • JG: there are definitely issues.
  • DM: don’t have to use compute. Can use transfers in D3D12, for example.
  • CW: there’s the concern that in Vulkan if you want an image-only queue (...)
  • CW: would like for subset of passes to map to queues.
  • MM: is that restriction – where some kinds of queues in Vulkan only support some sizes and alignments – something that the queue or the spec advertises?
  • CW: something the driver imposes on the transfer-only queue. But the granularity has to be (1, 1, 1) on something that supports compute or render. Only applies to blit queues.
  • CW: nice segue into command buffer proposal.

Passes

  • https://github.com/gpuweb/gpuweb/issues/70
  • DM: think this is nicer than earlier proposal. Has command buffers like group wanted. Users encode render and compute passes into command buffers, and issue transfers against command buffers themselves.
    • Nice: compute + render passes can share API for resource binding.
    • Every command on command buffer itself (either render pass, compute pass, transfer operation) is the unit of synchronization the WebGPU API considers. Doesn’t sync on calls within pass, except for UAVs.
  • MM: can i have a render pass open at the same time as a compute pass?
  • DM: when you open a render pass that borrows the command buffer.
  • MM: so it’s modal.
  • DM: yes. When you start a pass you get an encoder and borrow the command buffer.
  • CW: it’s similar to Metal. No BlitEncoder, and that’s on the command buffer itself.
  • MM: sounds great!
  • DM: was wondering if the fact that we don’t want to expose transfer passes is a concern.
  • CW: know Rafael had some concerns with requiring people to do compute passes when it’s extra typing not really needed compared to D3D.
  • MM: yes but this is less typing than Metal.
  • CW: I like this proposal. Think everyone else does too. Let’s see what Rafael says when he’s back from vacation.
  • MM: is there IDL?
  • DM: not yet but because people like it I’ll write IDL.
  • CW: off-topic: the IDL is out of date with respect to our discussions. I’ll make a pass for string enums, but please make a pass over it and look for things you care about.
  • DM: does Dawn use an IDL?
  • CW: Dawn uses a JSON file which generates IDL. It’s ahead of the IDL in some places but behind in some others. We still use the Builder pattern in some places for example. But should strive to make the IDL what we’re discussing.
  • MM: we can start drawing triangles now.
  • CW: yes, we’re trying to draw a triangle in Chrome using the IDL.
  • DM: have you figured out the swap chain issue?
  • CW: no. Still integrating Dawn into Chrome.
  • DM: is NXT being renamed?
  • CW: yes.
  • DM: What about Skia on NXT?
  • CW: it’ll be renamed.
  • MM: seems early to be writing Skia on top of this thing that doesn’t exist yet.
  • CW: true, but good to be able to show that things run and are comparable to Skia’s native Vulkan backend.

Agenda for next meeting

  • Everyone, look at the IDL, in particular your favorite parts, and try to bring it up to date with the current discussions.
  • Occlusion queries
  • Copies (maybe)
  • Command buffers, if Rafael’s back
Clone this wiki locally