Minutes 2018 12 03

GPU Web 2018-12-03

Chair: Corentin

Scribe: Ken

Location: Google Meet

TL;DR

Buffer mapping #138
- Folks present are in favor of buffer mapping proposal 1
- WebGL experience is that mapWriteSync would be preferable to setSubData
- Immediate data uploads are at the synchronization level of queue submission
- GC observability could be solved by making mapped memories keep a ref to the buffer so they are all collected together, or the app calls WebGPUBuffer.destroy
Push constants #75 / Dynamic buffer offsets #116
- No concerns with the dynamic buffer proposal.
- Talk more push constants vs. dynamic buffer offset when we have a benchmark
Plans for a spec
- Dean will make a skeleton of spec and move the IDL there.
- Then explainers should be coalesced there.
- Will discuss spec editor responsibilities at the F3F
PR burndown
- PR #111 accepted
  - Discussion on whether web pages should receive monitor change events.
- PR #130, #133 and #135 accepted
- No opinion on #139

Tentative agenda

Buffer mapping #138
Push constants / Dynamic buffer offsets
Multi-queue (stretch)
PR burndown
Agenda for next meeting

Attendance

Apple
- Dean Jackson
- Justin Fan
- Myles C. Maxfield
Google
- Austin Eng
- Corentin Wallez
- Dan Sinclair
- James Darpinian
- Kai Ninomiya
- Ken Russell
Microsoft
- Chas Boyd
- Rafael Cintron
Yandex
- Kirill Dmitrenko

Administrative

CW: Please register to the F2F

Buffer mapping #138

CW: mapping sub-regions of the buffer would be helpful in multi-process impls (like Chrome)
- Another option is to map the whole buffer. Most similar to what we had just before explicit barriers were removed
MM: reviewed it. In general proposal 1 sounds good to Apple. There is a need to do immediate uploads. Should be possible to make a single call and upload data. Think they can co-exist as long as we spec how they may or may not interact.
CW: agree. That’s why there are 2 proposals: both buffer mapping and immediate data upload. (Either setSubData or mapWriteSync.)
MM: proposal 1: interested in creation flags. MAP_READ|MAP_WRITE capabilities, can you talk about that?
CW: in WebGPU, buffers should describe how they’ll be used, because that helps us put them in the right heap. For mapping, the reason these are separate:
- 1. The reason there’s a MAP_ flag at all: when it’s set we know to allocate the buffer in GPU-visible, CPU-visible memory if possible on discrete GPU systems.
- 1. These two flags are separate because they help avoid data races. Something I missed: we can probably limit buffers to be either both MAP_READ or MAP_WRITE but not both at the same time.
MM: these creation parameters won’t ever change?
CW: correct
MM: what’s the downside of marking everything either one or the other?
CW: when you have a vertex buffer you want to be device local. In GPU RAM, and only GPU RAM. When you’re going to write something once from the CPU (or not at all), and read or write it many times from the GPU. Then you want that allocated in GPU RAM, and you don’t put the map flags.
MM: would the immediate uploads work with such a buffer?
CW: yes. (Not sure if this is in the proposal.) The buffer must have been created with the usage TRANSFER_DESTINATION and not the MAP_ flags. WebGPU impl will stage a copy for you.
MM: on some impls this will need a staging buffer?
CW: I think on every impl setSubData will require a staging buffer. This seems OK; people will understand that this is not the most efficient path.
MM: general feedback: sounds pretty good. Would like us to be able to move buffers between the various heaps they should be in. Sounds like we would potentially retain that ability with this API?
CW: what does moving buffers between heaps mean?
MM: not exactly sure: basically, residency and eviction. We can’t pin pages on the GPU at any particular location for an arbitrary period of time. But not sure that this affects this API.
CW: we thought about this for Dawn a bit. All these APIs should be OK. Can do magic behind the app’s back. Biggest change for migrating buffers: always create buffers with TRANSFER_SOURCE and _DEST flags.
MM: two ways to do it. First way, how residency works: draw call requires resident resources, and call is delayed until they are resident. Other way: make sure the buffer can be used by the app no matter where it is. Still have freedom to move things around. Basically agree with you.
CW: having buffers be able to move around the GPU means you have to record at command submit time and not command record time. However, should we choose to record at command recording time, you just can’t move the buffer, but that’s an impl detail.
MM: need to figure out how much flexibility we need. Could be fine; probably is.
CW: my understanding is you’re in favor of setSubData. Do you like buffer mapping proposal 1 better than proposal 2?
MM: we agree that it’s common / valuable to allow authors to map only parts of a buffer.
CW: what about immediate data proposal 1, mapWriteSync and it does an allocation on the buffer side where it puts the data?
MM: we need one of them, don’t have a signal about which one is better.
KR: Vote for MapWriteSync. Thought about this many times for WebGL. At least in Chromium having the mapSync semantics returning shared memory allows removing one copy.
MM: sounds fine. Only other concern is that we need to be clear in some doc about which operations will see previous contents of buffers and which will see new contents. Hope it would be same for immediate uploads as well as then-able async mapping.
CW: failed to discuss this in proposal, but setSubData, mapWriteSync and map following mapWriteAsync will be as if they’re queue submits that set the content of the buffer to a certain value. The ordering of these operations is the same as queue submits between each other.
MM: which queue?
CW: assuming no multi-queue. If multiple queues, things become more complicated. In a single-queue world, all these operations become visible and are ordered the same way submits are between each other.
MM: interaction between buffers and GC: presumably GC shouldn’t collect buffers that are mapped?
CW: yes, this was brought up as an issue. Either all WebGPU mapped memory are garbage as well, so they’re all collected together, or the app wants to destroy the buffer now and calls buffer.destroy(). This would break links and cause everything to be collected faster. The GC is not discoverable.
KN: you can have a reference from WebGPUMappedMemory to buffer, preventing collection. But what if you have just a ref to the ArrayBufferView? Would it have an internal reference back to the WebGPUMappedMemory?
MM: does it really have a pointer to the underlying device’s memory? Because then it’s a magic object.
KN: think we’re trying to enable that.
CW: same for the ArrayBuffer. Would get neutered and unlinked.
KN: but then you can observe GC.
CW: ArrayBuffer should keep ref to WebGPUMappedMemory.
KN: when you lose links to all this stuff is everything unmapped? If you lose a WebGPUMappedMemory?
CW: the buffer is not unmapped.
KR: All Javascript engines should be able to make hidden references from the ArrayBuffer to the WebGPUMappedMemory/WebGPUBuffer.
RC: what if you lose all references to ArrayBuffer, WebGPUMappedMemory and buffer, but it’s still in the command buffer and you submit the cmdbuf after it’s been GC’d? Does cmdbuf hold reference to buffer? Presume it would.
MM: yes.
KN: think so. You can still delete a buffer if it’s being held by other things.
RC: what if you explicitly destroy something? Will it actually get destroyed when last ref to cmdbuf is released?
KN: think we want delete to immediately release memory and invalidate everything referring to it.
CW: interesting point. cmdbufs keep buffers alive. In Chrome will be a challenge.
JD: if you delete a buffer being used for memory does that fail?
KN: if you submit and then delete, delete happens after cmdbuf is submitted. Not quite immediate. Use then delete: OK. Delete then use: error.
MM: so “as if” queue submit happened?
CW: yes.
KN: we do need to have that term. Have several things in the spec phrased in terms of that concept.
CW: we can make more progress on the issue itself. Proposals are a bit underspecified.

Push constants #75 / Dynamic buffer offsets #116

CW: investigation is #116
MM: devshgraphicsprogramming@ said I was measuring the wrong thing in my microbenchmark. Said many different architectures will give the same GPU side behavior for both root contants and values in buffers.
CW: I didn’t see that comment.
MM: not much to say. If we’re still blocked on me writing benchmarks i’m behind on that, can do by the next call. We were discussing diff between push constants and dynamic buffer offsets. There’s a diff between not having push constants, and what I proposed which is to still have the API but fall back to using buffers. Dichotomy between push constants and dynamic buffer offsets is not the right dichotomy.
CW: they kind of fulfill the same purpose. Without allocating, you can change the offset into a buffer. devsh@ was saying, push constants would be used as offsets into arrays. Different but similar enough.
MM: maybe i should make a proposal for dynamic buffer offsets.
CW: I did, issue 116.
CW: says: dynamic buffers are implementable on the API. Metal doesn’t care because you pass the offset every time. D3D has root constant buffer views and UAVs. Would take root table space on D3D which is a cost. On WebGPU when you set a bind group would pass offsets too. Would be a limit.
MM: what’s the WebGPUBindingType?
CW: when you create a BindGroupLayout, says whether it’s a uniform buffer, sampler, storage buffer, etc.
CW: have to make it a different binding type because it translates differently on Vk and D3D12.
MM: if you are OK with adding the two new binding types it’s OK with me.
CW: let’s record: there are no concerns about dynamic buffer offsets. Google still thinks push constants are more powerful than this.
MM: is there a world in which we’d have both? Because if we might have both then we should go with what we agree on already.
CW: probably OK. Push constants and dynamic buffers both eat into D3D12 root table space. Would be unfortunate to have combined limit…
MM: we already have this. Can only set so many different bind groups.
CW: this number is basically 4 because of Vulkan. Unclear if we can fit push constant, 4 dynamic buffer offsets and 4 root table descriptors.
CW: there’s enough space in the root table for the push constants and all the bind groups, or dynamic buffer offsets + all the bind groups. Need to see if we have space for all 3.
CW: can talk about this next week

Multi-queue (stretch)

Let’s skip this week

Plans for a spec

CW: Intel’s made some changes but only in the IDL. Clearly should go into a spec.
CW: at F2F let’s try to finalize the IDL enough that we can take a snapshot and call it WebGPU Alpha 1.0, so that multiple browsers will prototype the IDL. Then we can write a spec.
DJ: think we can start parts of it now. The parts we do agree on, we could start writing stuff for. The sooner we have samples, the better. Even if IDL changes, changing spec / samples won’t be too bad. I’m semi-volunteering to start but may not be best person.
CW: RC mentioned maybe we can have a spec which contains just the IDL for now. Then we can start typing more stuff into it.
RC: thought was that we can have the Bikeshed file with just the IDL in it, and put comments, etc. Coalesce various explainers, etc. Think we should have a file which has a place to put comments.
MM: often there’s a spec editor. Do we want something like that?
DJ: someone can volunteer to start doing the work, but probably wouldn’t have a dedicated editor.
CW: someone should volunteer to make a barebones Bikeshed file with IDL and a no-op paragraph, and set up the build system.
DJ: I’ll volunteer for that.
CW: can discuss responsibilities at the F2F.

PR burndown

8 PRs: https://github.com/gpuweb/gpuweb/pulls
#111 Add an optional WebGPUDevice property to WebGPUSwapChainDescriptor
- Sounded like Justin and Kai were in agreement
- JF: if we want SwapChainDescriptor then it’s OK to use the same descriptor arg to configure things. We could require this during getContext.
- CW: think this was an agreement during the F2F. Sounds good. Only used when doing getContext.
- JF: not providing any ability for developer to switch devices at run time? If user drags the application to a different display backed by a different hardware device, what happens? Will we allow the developer to make that change on their own, or do that switch for them?
- CW: is there precedent in web platform for notifying on display change?
- KN: not sure. devicePixelRatio probably changes.
- KD: it changes but there’s no notification.
- MM: OS migrates things, or doesn’t.
- JF: would be ideal to intercept this and let app respond.
- KN: right now, if you have two devices for internal GPU and external GPU. With WebGL, we won’t do anything fancy, just keep rendering on internal GPU and display on external GPU. Here, we have an object representing external device, but swap chain from canvas is sort of abstract - don’t think it has to be associated with one display - but only makes sense to switch which device it’s on if we gave the app the ability to decide which device was running on. Would like app to be able to recreate swap chain.
- MM: app already can’t force which GPU it’s on. Just making requests like low/high power.
- KN: wouldn’t like to ask app to do it.
- CW: can we say, for MVP, we let the browser do it for you (blit to the screen)?
- KN: I don’t think we can lose context (the app’s device, etc.) when you move the window to a different monitor.
- MM: I guess it’s an exercise left to the reader on how to migrate without losing context.
- JF: I believe Kai that it doesn’t make sense to ask the developer to change GPUs if we don’t give them control over what GPU they’ll be on.
- MM: could we do auto migration if there’s a buffer that’s mapped?
- CW: you can always fake the mapping of a buffer, for example. Probably.
- KN: if mapped buffer is pointing at driver memory then we can’t.
- CW: could migrate at JS task boundary.
- MM: that’s the only time you could do it, cause that’s when you get the OS notification.
- KN: if it’s still mapped at migration time do you just kill the mapping?
- MM: magic ArrayBuffer- do whatever you want?
- CW: seems like a large rabbit hole.
- RC: dragging windows from one window to another shouldn’t lose context. But there are things that should lose context: for example, requesting high power, or losing the last high power request. Will need this for WebVR/WebXR to get good performance.
- KN: agree. Have been talking about this for WebGL: more aggressively losing contexts, and enforcing that apps handle these things more robustly. But don’t think this is a case where we should do this.
- JF: is this also a case where attaching/detaching eGPUs?
- KN: in that case we have no choice but to lose the context.
- MM: if there will be >0 times we’ll auto-migrate, then for each potential option we have to do a calculus about what we want.
- CW: if you unplug discrete adapter, that’s a context loss.
- KN: would technically be an impl detail. If you had a virtual device where you could migrate between real devices, like in OpenGL on Mac, that should look like a single device to the WebGPU app. For them, it’s not a context loss. But there are loss situations: unplugging eGPU, or on battery on power.
- CW: let’s move to a different PR.
#130 Make requestAdapter powerPreference optional
- CW: you get the adapter of the browser’s choosing.
- KN / KR: seems fine.
- DJ: after discussion agreed.
#133 Add multisample state to WebGPURenderPipelineDescriptor
- CW: only comment there was Myles wanted sample count in descriptor inlined in render pipeline descriptor. When we add more stuff we can reintroduce MS descriptor.
- RC: sure.
#135 Add APIs to set viewport and scissor rect for render pass
- MM: at some point we’ll have setViewports?
- CW: yes, and setViewport will just be setViewports of the first one.
#139 Rename Binding to BindGroupBinding
- Lots of “no opinion” votes (MM / CW / RC)
- KN votes to change to BindGroupBinding
- Have to wait for DM’s response

Agenda for next meeting

Push constants (again)
Compressed textures
- Not part of core spec -- an extension -- but good to talk about
Dynamic buffer offsets alignment restrictions - update the investigation?
Buffer mapping again - CW will have proposal with more spec-like text
Can leave agenda open for more topics

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minutes 2018 12 03

GPU Web 2018-12-03

TL;DR

Tentative agenda

Attendance

Administrative

Buffer mapping #138

Push constants #75 / Dynamic buffer offsets #116

Multi-queue (stretch)

Plans for a spec

PR burndown

Agenda for next meeting

Clone this wiki locally