Skip to content

Minutes 2023 02 08

Corentin Wallez edited this page Feb 28, 2023 · 1 revision

GPU Web 2023-02-08

Note that unless stated otherwise this is a GPU for the Web community group meeting and not a working group meeting.

Chair:

Scribe:

Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • “Tacit Resolution” Queue
  • maxBuffersPlusVertexBuffersForVertexStage limit #2749
  • It should be possible to unbind a bind group / vertex buffer #3787
  • Defaults for depthWriteEnabled and depthCompare may be surprising #3798
  • Support STORAGE usage on BGRA8Unorm as extension instead of core feature #3799
  • Who actually wants multiple timestampWrites with the same TimestampWrite.location? #3808
  • Agenda for next meeting

Attendance

  • Apple
    • Mike Wyrzykowski
    • Myles C. Maxfield
  • Google
    • Ben Clayton
    • Brandon Jones
    • Corentin Wallez
    • Gregg Tavares
    • Kai Ninomiya
    • Ken Russell
    • Shrek Shao
  • Microsoft
    • Rafael Cintron
  • Mozilla
    • Jim Blandy
    • Kelsey Gilbert

Administrivia

  • Agenda for the F2F - please add more items!
  • What are the exact times for the F2F (for remote attendees?)
    • 8:30 - 17:00 (or 18:00?) California time
    • Try to arrive a little early because it can be a hike to get to the meeting room.
    • Meet at the gate at 8:30.
  • Have added this to Agenda for the F2F

CTS Update

  • KN: nothing major new from last week. Test development still ongoing - our team's contributing.
  • JB: started integrating CTS into our CI. wpt driver working really well for us. Expecting to see extremely disappointing :) but, still, test results soon.
  • KN: glad that worked well for you. Didn't work well for us. We use wpt harness only for reftests. Our tests don't run in consistent amounts of time. Don't know how to split up the tree so each chunk runs in <6 seconds, and we don't have a heartbeat mechanism (not sure wpt has it generally). Now use the same harness we use for our WebGL tests. I happened to write up some of the issues yesterday:
  • JB: still up for discussion. Erich just found it straightforward to run with wpt.
  • KN: if you see wpt issues you may be able to fix them. We couldn't though.
  • KN: working on code coverage for Dawn/Tint based on CTS. Planning to search for things missing from the test suite.
  • KN: Ben wrote a tool a few weeks ago - lines of coverage in Dawn/Tint, running CTS against our Node bindings.
  • KN: Finally also got webGPU tests running in Chrome's code coverage infrastructure, so these show up on our dashboard, and shows us Blink, browser, etc. coverage. Will be filing many CTS issues about missing test coverage.
  • JB: have percentages?
  • KN: yes, can link to dashboard.
  • KN: Dawn repo: coverage went from 8% -> 59%. About 50% coverage from the actual tests.
  • Blink code: 0% -> 76%.
  • CW: and Chrome as a whole went from 69% to 71%. :)
  • CW: trying to identify high priority stuff before shipping in Chrome. CTS is endless bucket.

“Tacit Resolution” Queue

  • WebGPU: Forbid null code point '\0' (U+0000) within strings #3576
    • Not changing anything. Current spec can be implemented on top of C strings by replacing null character with Unicode replacement character.
    • Gregg implemented null replacement pass. Passes the test.
    • MM: q about using object replacement character going forward.
    • KN: only for entry points, names. Has to be something we can put there to make it invalid. Could be a space.

maxBuffersPlusVertexBuffersForVertexStage limit #2749

  • CW: needs to take max bind groups into consideration?
  • KN: decided to add this limit because we were hoping to use argument buffers in Metal implementation, for example. No longer have to do what Dawn does, split bind groups into separate bindings. Arg buffer impl strategy - 1 BG per arg buffer - bind BGs into same binding space as vertex buffers. Earlier, all buffers go into same binding space.
  • KN: didn't decide to add a combined limit for those. This, we do expect to stay around. Rather than statically splitting Metal's fixed limit of 31, thought was - have these 2 limits add up to more than 31, and let apps use more than half for one thing at a particular time. In a same render pass though - breaks Apple's impl strategy of binding things immediately into the Metal render pass. Need to immediately choose a slot, 0-30. Wrote up a bunch of options. Myles had a particular proposal, option (3) here.
  • MM: don't think this breaks anything. We think this limit's implementable and is helpful. If we add this limit, it means people will want something else also - unbinding. But if we don't do unbinding, we can still do this.
  • KR: what about other APIs where you can't interchange these?
  • MM: they have different namespaces for VBs and BGs. The way this works - Vk limit for max VBs would be 8 (for example), max BGs 4 (for example), and maxBuffersPlusVertexBuffersForVertexStage 12. On Metal - max VBs 31, max BGs 31, and maxBuffersPlusVertexBuffersForVertexStage also 31.
  • KN: Metal binding API - as though you're always using vertex pulling. VBs bound in as buffers to the shader. Probably not what happens on all hardware, but that's the binding model.
  • MM: another issue with unbinding, but think we can close this.
  • KN: I have serious concerns about an unbinding mechanism. What Brandon raised last week. Don't want an API with this statefulness. Consumer of render pass has to go and unbind everything, just in case. This is one of the single worst things about OpenGL. The state's simpler than GL, but it's still painful and slow. Don't think that should be our solution to the problem, so think we should look at our solution to this issue. Another impl strategy. Think the options are fine. I know you don't like that as your impl strategy but I think they're fine.
  • MM: reason I don't' want to allocate statically - no way for us to know what the limit should be. Allow for 20 BGs and 10 VBs? Or 10 BGs and 20 VBs? Also, think apps will want to create a lot of bind groups and put a single resource in each bind group. Non-portable, but any use of limits above the base limits is already non-portable. Don't agree with the conclusion that we shouldn't do it.
  • KN: for example with features - we don't add features we expect to implement only on Metal. Don't want to expose limits that are only that high on Metal, either. May not happen here - Vulkan impls have such high limits too. D3D has lower limits. Don't want to expose Metal parameters that are higher than any D3D impl will have. If any reasonable value supported by all the backends - should be the highest that any of the backends go. It interacts with the bucketing strategy.
  • MM: sounds right - if there are enough devices that have a high limit to create a bucket for them, we should.
  • KN: max VBs on D3D12 is always 16. Never changes. max BGs more complex. Only one thing put into the root descriptor.
  • MM: my comment earlier about lots of BGs - doesn't apply to VBs. I think a static limit for VBs makes more sense than for BGs.
  • CW: have had people complain. my glTF vertex streams have 15, and not enough VBs. Have to repack, interleave attributes. VB limit is already painful. Once they understand the concept, have found developers don't mind. Our max is always 4, and people haven't complained.
  • KN: not arduous to have some BG to be a catch-all. You have to set up your pipeline so it connects to those bindings. Can't make a last-minute decision - pipeline has to expect it. I don't see that as a major concern. Picking 16 for VBs and 15 for BGs - IMO fine.
  • CW: checking dynamically is scary. At pipeline compilation time - it's free to add. Additional context people might run into. Free in terms of testing, impl, complexity in the spec. Devs won't run into it.
  • MM: you're right that this would be validated at draw call time. It's a single assembly instruction. Every time you bind something you max it, and at draw call time you make sure it's less than a constant.
  • KN: not concerned about the validation cost. This is a real problem - I understand the motivation for unbinding. You're in a state and you can't add more BGs because too many VBs. Don't want that problem and don't want to add unbinding. To not have these problems, I don't think we should add this limit. Our team was only thinking about this as pipeline validation. Biased - in our impl we don't benefit from the draw-time validation. Not going back on our previous decision - it's a different decision than we thought it was originally.
  • BJ: adding draw-time validation is never desirable. Needs to happen in some situations. More worried about the developer experience. Have seen so many situations, esp. in OpenGL. Building rendering library, have some utility that renders text. Don't want to know/care how it does it. I pass in a render pass, some coordinates. When it returns, do I need to look up on my device - what are the max numbers for this limit? Can't use the VBs because too many BGs. Calling any third-party library, have to reset a lot of state. This would be the first instance of this problem showing up in WebGPU. The more I think about it I'm more willing to fight against it vocally. A code smell that will seriously negatively impact developers trying to build composable apps out of libraries, which is the web way.
  • MM: it's not like render passes are stateless. Resources can conflict with each other. You can create an error render pass easily, if you ignore the first half of the draw calls.
  • KG: can we talk about solutions? One actual issue, you want to be able to know what the state is even if you hand the stateful object to someone else. Save/restore for this? Reflection? Unbind all bind groups? Suggested unbind to get back to original state.
  • BJ: we somewhat have that encapsulation if you put everything in a RenderBundle. Don't want to invent a new mechanism. Do third-party libraries need to put all their stuff into a RenderBundle? And consider the state fully reset when you come back? Feels heavyweight. I struggle with the necessity to add that cognitive overhead to the API. Have already pointed out - there's a reason for a higher vertex buffer limit. Bind groups don't seem as important. Not convincing to me to add this cognitive overhead to support the higher bind group limit on one platform.
  • KN: Proposal: let's not add this limit yet. Put it in post-V1. Until that point, have to statically decide on the two limits. See what people say.
  • KG: I really want to be able to unbind things.
  • CW: we can discuss unbinding independently of this issue. There's post-V1-polish issue 2043. Dealing with holes in pipeline layout. We can get to unbinding discussion.
  • KG: isn't this the next issue?
  • CW: I was going to suggest skipping over this.
  • KG: I'd like to have unbinding in V1 so sophisticated middleware can reset state.

It should be possible to unbind a bind group / vertex buffer #3787

  • BJ: what's a scenario where unbinding is beneficial?
  • KG: if I write middleware where it binds things and needs to be able to unbind them to reset state
  • BJ: middleware invisible -> need to be able to query current state and reset it.
  • KN: unbinding not necessary for this. If you're not overwriting another binding, but adding one - only thing you can do to surrounding code is make something valid that was invalid. Outer code will always look at bindings it put there. As long as you didn't overwrite them, doesn't matter that you added new ones. Overriding is a bigger concern. Unbinding doesn't fix it.
  • CW: we've spent a good deal of time on this issue. Let's take more time and come back to it, perhaps at the F2F
  • MM: wasn't planning on Kai to remove the limit we'd agreed on. Have to discuss with the team.
  • KN: only came up with that idea on the fly.

Defaults for depthWriteEnabled and depthCompare may be surprising #3798

  • KG: we discussed depth writing in prev meeting and resolved on it. Only discussing depth compare. Resolution was - depth write enabled and depth clear -> required, and post-V1 can add new defaults.
  • MM: thought it'd switch just before V1.
  • KG: some intermediate time where it's broken deliberately.
  • CW: thought that was only for depth clear value, but OK.
  • MM: same resolution?
  • KG: tentatively transition to "less", but required.
  • MM: depth compare - why's ALWAYS bad?
  • KG: makes depth buffer write-only.
  • KN: OpenGL's default is also LESS.
  • BJ: if you have a depth buffer you expect to compare against it. Since haven't decided on depth clear, default here is tightly tied to it.
  • MM: agree, if default's 1.0…makes sense to me. Make depth compare required temporarily
  • CW: group's OK doing this for depth be
  • KG: defaults we chose probably because they were Metal's. Neither bad nor good. Not behavior …
  • KN: think the decisions were deliberate. Not sure about the heuristics. Maybe not the fastest path, but the "do-nothing-est" path, in terms of the hardware. May partially predate the change - no difference between depth test disabled, and having default value. Clear value different - 0.0 is an obvious default generally, but not for depth specifically. And depth testing disabled by default - like using ALWAYS. Think these were deliberate decisions. And - I don't want to be on record as saying that I want to change these to 1.0 and LESS in the future. I don't think those are great defaults. Yes, they're OpenGL's defaults, but don't think they're the best default selection. I plan on discussing this more maybe before V1, but has to be a future discussion.
  • MM: philosophy of defaults. I don't think we just took metal's defaults. My philosophy: the audience for this isn't the experts. Experts will set all of them. Audience for these is non-experts. Defaults should be set up for them. Get something on the screen that's sensible. "Black screen" problem's the worst, and that's where I see the defaults being valuable.
  • CW: would like to go through the next agenda item. Corridor convos about philosophy at the F2F.
  • BJ: agree with Myles' sentiment. Also there are cases - e.g. Dzmitry saying, don't sign up for extra work - but think that was in the context of blending. You can turn blending on and it can affect your framerate, but you might not know.

Who actually wants multiple timestampWrites with the same TimestampWrite.location? #3808

  • CW: group agreed - beginning of pass write, and end of pass write, for timestamps. What landed in the spec - a vector of locations. Some confusion between them. Can probably fix by having beginning of pass write and end of pass write, instead of what ended up in the spec.
  • MM: works for me. My original post had two possibilities, I like the bottom one most because it's simple. Can be extended later, and they can copy the results wherever they want them. Think a single query set is simplest.
  • KG: to expand this later, OK to add a new entry point
  • CW: slight preference for the first one - no loss of generality.
  • KN: same
  • CW: let editors talk about this?
  • MM: won't argue - we can go with the frist one.
  • BJ: TBH I prefer the second one. 🙂
  • KN: we can figure it out. Thought we didn't have this validation already, but clearly the PR already landed.

Support STORAGE usage on BGRA8Unorm as extension instead of core feature #3799

  • CW: Jiawei tried implementing BGRA8Unorm storage textures on Vulkan with the workaround we discussed here. Probably missed something, because Teo and Jiawei agree it's not implementable in core Vulkan. And requiring this cap loses a lot of devices. Strong suggestion is to make this an optional feature again.
  • KG: thought this was polyfillable?
  • CW: polyfill not implementable because we missed a requirement that you need to create the BGRA8Unorm texture - were going to do texture format reinterpretation, use it as a view, swizzle - but BGRA8Unorm might not support STORAGE usage. You still need that. A lot of drivers don't support that. Maybe can completely polyfill BGRA8Unorm with RGBA8Unorm by swizzling all the time - heavyweight, and breaks a lot of things. Swap chain textures are actual BGRA8Unorm textures, so can't polyfill.
  • KG: BGRA8 storage is fastest on Metal?
  • MM: No. RGBA and BGRA on Metal - in the Metal API - neither's faster. Only place where BGRA's preferred - not in Metal, but in the compositor. Request here - BGRA should be able to be delivered to the WindowServer. Not that sampling either's faster.
  • KG: ok, high level - perf's better if you use BGRA for presentation.
  • MM: yes. When you configure GPUCanvasContext - the format you request is BGRA. The request is that GPUCanvasContext vends BGRA back buffers.
  • KG: and that you get storage usage.
  • MM: yes, but less strong preference. Strong preference - if we want people to use BGRA for presentation, want it to be as functional as possible. But if people have to pingpong between BGRA/RGBA b/c of Vulkan, not as big a deal as being able to produce BGRA textures.
  • KG: ok, so it's OK to say you can only do storage BGRA with a feature.
  • GT: does it make sense that you can't pass STORAGE into configure? Someone on Windows, get BGRA, pass preferred in to storage texture creation, and it fails. If you couldn't pass the STORAGE flag into configure, problem goes away.
  • CW: have this problem already today on some platforms. Problem's on Vulkan, and maybe D3D12. Can give more details offline.

Agenda for next meeting

  • F2F!
  • Please add more agenda items.
Clone this wiki locally