Skip to content

Minutes 2019 11 11

Corentin Wallez edited this page Jan 4, 2022 · 1 revision

GPU Web 2019-11-11

Chair: Corentin

Scribe: Dean

Location: Google Meet

TL;DR

  • Clarification to the spec/CTS process, the owner of an issue can change at each step
  • Google has a demo of Google Earth in WASM running using WebGPU
  • GPUUploadPass #45 #426, #481, #491
    • Agreement that this is a good direction for exploration,
    • Confusion about the two types of calls (setSubdata-like and mapping-like) being overload
    • Discussion about allowing some forms of races, but consensus we first need to have a race-free API
    • Alternative idea of returning an ArrayBuffer along with a finisher function.
    • Concern about applications being able to exhaust staging memory, we need to handle that case.
  • PR burndown: merged #488 #489 and #490 (#489 later reverted)

Tentative agenda

  • CTS and spec process
  • Immediate uploads and friends #426, #481, #491
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Justin Fan
    • Myles C. Maxfield
  • Google
    • Austin Eng
    • Corentin Wallez
    • Ken Russell
    • Shrek Shao
    • Ryan Harrisson
  • Intel
    • Yunchao He
  • Microsoft
    • Damyan Pepper
    • Rafael Cintron
  • Mozilla
    • Jeff Gilbert
  • Mehmet Oguz Derin
  • Timo de Kort

Admin

  • Please answer the call for getting a time to meet with Khronos (FIXME: insert link)
    • Apple has not responded
    • Do this by Thursday
  • The W3C wants to send an “Advanced Notice of Working Group Creation” to their members.
    • Please remember to ping your lawyers as soon as possible for review of the draft charter
    • RC: The Microsoft lawyers have reviewed the draft and said it was ok.
    • JG: Same with Mozilla
    • DJ: Apple has ok-ed it too.
  • The Chrome Dev Summit is happening right now.
    • CW: We’re showing a demo of Google Earth
    • Took Google Earth native, ported to run on top of Dawn, emscriptened to wasm, then runs inside Google Chrome with the WebGPU feature enabled.
    • CW: We have some rendering issues on Windows and Linux. I’ll look into sharing the demo.

CTS and spec process

  • CW: Justin, can you comment?
  • JF: No, not yet. We’re not expecting things to pass yet, just looking for more tests.
  • JF: We’ve been adding tasks to the specification tracker. Anyone can pick them up. I put on the tasks that I intend to do, but that doesn’t imply ownership.
  • JF: Dzmitry has set up a table of contents for the spec.
  • CW: Any other change in process?
  • JF: No. [Follow-up at end of meeting]

GPUUploadPass #45 #426, #481, #491

  • CW: Almost a rework of something that came earlier. We haven’t made any process on immediate upload.
  • MM: My biggest question is why is this a pass, rather than a new object? It operates on the CPU timeline, not the GPU timeline. It could just be a naming thing.
  • CW: The FAQ mentions this (slightly). I’ll try to speak for him, but I don’t think he is particularly attached to the name “pass”.
  • CW: It helps that the object is separate from the queue - it has a begin and end, which gives you a spot to detach the buffers that are being used.
  • MM: It seems the biggest difference between this and setSubData is additional validation rules about when you can open/close one of these wrappers.
  • MM: And there is a requirement that when you use the feature, the object is not in use by the GPU.
  • CW: No, you can upload to an in-use buffer. It gets queued up. It is very similar to setSubData.
  • ??: It lets you have a zero copy solution. No staging.
  • CW: It gives you a place to add more immediate stuff. Allows you to bunch operations together, so the array buffer detaching works as you want it.
  • MM: Why is this better than the more explicit form of opening, uploading, closing, etc.
  • CW: You know the buffer is closed when you use it.
  • RC: It lets you re-use buffers more aggressively.
  • CW: Again, DM in the FAQ says he’s not tied to the naming or the type.
  • RC: There is also the option of user-data, where you can give it a buffer if you like.
  • CW: Right, you can also ask for the staging space from the browser.
  • MM: When would you want to use user-data?
  • CW: Small upload, and you don’t want to create garbage objects temporarily.
  • RC: Fetch API, gives ArrayBuffer. You use it directly.
  • CW: There are two overloads. One upload buffer, when you give a size. Gets detached on finish. The second is “write this data to the buffer now”, where it doesn’t get detached.
  • RC: In that case, does “finish” do anything?
  • CW: No.
  • RC: So if I give uploadBuffer my own buffer, I have to leave the data in there until I’ve called finish?
  • MM: I think this is a good direction to go. Stylistically, I think there should be two different calls for “create a buffer” vs “i give you a buffer”, and only one of those should have a finish, and it shouldn’t be called pass.
  • DP: Is there a way to signal that you are done with an ArrayBuffer?
  • JG: ArrayBuffers can be removed.
  • DP: But there is no way to lock a buffer for writing? You have to call finish on the API.
  • MM: Yes.
  • DP: I think this means it should be a single pass.
  • AE: We could have a wrapper around ArrayBuffer that calls finish. These could all be queue operations.
  • CW: e.g. gpuBuffer.getObjectForStaging, gives you a buffer and a completion call?
  • MM: yes
  • CW: I think it breaks the staging conveyor belt.
  • DP: I don’t think it does. It is up to WebGPU itself to know when to recycle the memory.
  • MM: The key piece that isn’t being stated is that it should be possible for the ArrayBuffer to point to GPU memory.
  • RC: I see Corentin’s point about using a ring buffer and deciding which bits to upload.
  • CW: So would it point into a staging buffer, or directly into the GPU memory?
  • MM: I think you can do either, depending on architecture.
  • RC: We’d have some kind of JS validation at that point.
  • CW: So if you ever used the buffer when it was the wrong state, you’d get an error.
  • MM: I want an implementation to be able use the real buffer if possible, and would have an upload flag that if true, would make it unable to be used in an operation.
  • MM: Are you saying that the API here is complicated enough, that they’d be better off using mapping directly?
  • CW: I think so yes.
  • CW: Suppose I’m using a ring buffer. Do we expect implementations to track the regions that are in use? That sounds expensive.
  • MM: I think you’re asking for a new API that is based on ring buffers.
  • JG: Doesn’t sound particularly expensive to me. I am worried about the heuristics.
  • DP: The tracking is designed to detect bugs?
  • CW: To detect races.
  • DP: What is the harm in a race? security?
  • CW: It’s for portability. e.g. someone only tries on Intel where there is UMA.
  • JG: The Web doesn’t have the same portability concerns now that we have SharedArrayBuffers.
  • JG: Could be an extension.
  • MM: The example Corentin gave sounds like a deal-breaker.
  • JG: Which example?
  • CW: The example of using a discrete GPU. I don’t need to worry about the copy. Everything works. Now I’m on an integrated GPU, and I’m writing directly to the memory in use.
  • JG: We opened this Pandora’s Box with Shared Array Buffers. Maybe investigate something like D3D12’s two tiered heaps.
  • MM: In that world, we’d still need to come up with a non-racy solution. So let’s do that, and worry about the racy condition later.
  • MM: I’m not sure a validation mode would help too much here. Many devs wouldn’t use it.
  • JG: If there is a flag in dev-tools to do more validation, then it is the kind of thing you’d do if you received a bug report from an app in the wild.
  • MM: That’s not quite the way the WebKit team works. If a bug comes in that works in Chrome but not in WebKit, we can’t always yell at the site to fix their content.
  • JG: In Mozilla we have a sliding scale - it depends on the site.
  • MM: But if it is an important site, you absolutely need to fix it.
  • JG: I think we just disagree on the approach to interoperability.
  • JG: Can we construct a design that would allow us to do checks (with extra cost)?
  • CW: I think we all agree that we definitely need a non-racy version.
  • JG: Agreed, but we should decide on what we’re trying to solve. I don’t see a huge benefit of these things over “createBufferMapped”.
  • CW: It is because the constants are better. Less overhead.
  • JG: It seems to me that createBufferMapped (MISSED)
  • CW: It is about using a memory allocator vs a staging conveyor belt.
  • JG: Disagree. If it had a pool object, it would make sense.
  • CW: If you createBufferMapped, you have one copy on the client, then another into the staging buffer. Two problems, it can be allocated in a general allocator. Whereas uploadBuffer just bumps a ring pointer. Also, it allocates a bunch of temporary JS objects. GPUBuffer is not a simple object - it has a lot of state tracking.
  • JG: I want to think about it more.
  • RC: CW, how do you think this can be written as a ring buffer?
  • CW: Everything you do in the pass will be pipelined before any other GPU operations. It’s sort-of like an implied fence - everything will be ordered. You can’t be doing anything else while this mapping pass is happening.
  • CW: Trying to collect feedback. We seem to agree that we want a non-racy version, before a racy version.
  • CW: from myles…
  • MM: I said them before. Like the direction, just some stylistic concerns.
  • CW: Would you like to add your comments in the issue?
  • MM: I will
  • RC: If we return a finisher, then it means we can’t use a ring buffer.
  • DP: It can be a ring buffer. The guarantee is that they are executed in order.
  • DP: How does it block? What happens when you run out of memory?
  • CW: You allocate more. That’s what we do in our setSubData.
  • DP: Is there a way I could write my code to avoid ever allocating more data?
  • AE: I suggested uploadBuffer and uploadBufferAsync.
  • CW: You could use createBufferMapped in that case.
  • RC: Is this a replacement for cBM?
  • CW: No, I don’t think it should.
  • DP: e.g. scenario - you want to fill your GPU memory with textures. You don’t want to fill up your system memory, so you’d use a ring buffer.
  • MM: I think for that you’d use buffer mapping.
  • DP: I’d prefer the simple API in that case.
  • MM: Would you prefer only a single API?
  • DP: Yes.
  • DP: Ideally though I’d like to avoid cliffs where things stop working.
  • CW: Lots to think about. To repeat, I think this API should be in addition to buffer mapping.

PR Burndown:

  • CW: Probably follow the same path as buffer uploads??
  • JG: No preference.
  • CW: If there are no objections, I’d like to merge it.
  • CW: We are implementing it in Chromium.
  • MM: That’s fine.

Resolution: Merge this.

  • CW: We used to have the limits commented out.
  • MM: What is the point of an empty object?
  • JG: People want to detect things that are higher than the default limits.
  • RC: Why does it now return an object?
  • JG: I think that’s a WebIDL thing.
  • CW: I think it is because you can’t have a dictionary as an attribute.

Resolution: Merge this.

  • CW: The rationale is that multi-queue will mean we need a default queue, so this would mean that is just an API addition, not changing existing things.
  • RC: Seems fine. Should just be an attribute.
  • CW: “defaultQueue” or “mainQueue”?
  • RC: I’m fine with “defaultQueue”.
  • JG: Yep.
  • JF: I like defaultQueue too.

Resolution: Merge this with defaultQueue as an attribute.

Editing Workflow

  • JF: Each time a checked issue enters a new stage in the issue tracker, the assignee can change.

Agenda for next meeting

  • Keep going with buffer uploads. Please keep discussing on GitHub.
Clone this wiki locally