
Make it easier to upload data into buffers correctly #418

Closed

Conversation

@litherum (Contributor) commented Aug 20, 2019

setSubData() was removed in de471bf. We should consider adding it back for two reasons, neither of which were stated during the discussion when it was removed:

  1. Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

  2. Without it, we would have an API that’s easier to use incorrectly than correctly.

It is common practice in WebGL and native 3D graphics APIs to upload data for the frame whose commands are currently being recorded. Indeed, WebGPU has an implicit present after rAF() returns, so being able to upload your uniforms (or whatever) to a buffer before rAF() returns is important.

Using mapWriteAsync() requires you to wait on a promise, so if you do the naive thing and just wait on the promise inside rAF(), you’ll miss your present. Instead, without setSubData(), the easiest way to upload data for the current frame would be to call createBufferMapped() in each rAF(), which is unfortunate because it would cause runaway video memory growth. For small apps, this growth probably wouldn’t manifest as a performance problem, but it would manifest as a battery life problem. The correct way to do it with mapWriteAsync() is for the application to create a pool of previously-mapped buffers, and manage the pool’s lifetime, state tracking, growth, etc. I’d expect the big engines to get this right, but I would expect most WebGPU applications to just do the simple bad thing of calling createBufferMapped() in each rAF(), which is an anti-pattern. If we gave them setSubData(), it’s more likely that more apps would actually do the right thing.
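
For concreteness, here is a minimal sketch of the pooling approach described above, written in terms of the createBufferMapped()/mapWriteAsync() API as it stood at the time; the pool class and its method names are hypothetical:

// Hypothetical UploadPool: recycles staging buffers across frames instead
// of allocating a fresh one in every rAF(). Assumes the historical API in
// which createBufferMapped() returns [GPUBuffer, ArrayBuffer] and
// mapWriteAsync() resolves to an ArrayBuffer.
class UploadPool {
  constructor(device, size) {
    this.device = device;
    this.size = size;
    this.ready = []; // [buffer, mapping] pairs, mapped and ready to fill
  }
  acquire() {
    if (this.ready.length) return this.ready.pop();
    return this.device.createBufferMapped({
      size: this.size,
      usage: GPUBufferUsage.COPY_SRC | GPUBufferUsage.MAP_WRITE,
    });
  }
  recycle(buffer) {
    // Re-map asynchronously; the buffer rejoins the pool once the promise
    // resolves, typically a frame or two later.
    buffer.mapWriteAsync().then((mapping) => this.ready.push([buffer, mapping]));
  }
}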

(You may say: “Why don’t we just make createBufferMapped() smart enough to recycle buffers?” However, this requires there to be a garbage collection every few frames, which would cause more problems than it would solve. Garbage collections often don’t occur until the system is running low on memory, which is way too late to be able to recycle these buffers reasonably.)

I’m just worried about a potential proliferation of WebGPU websites that have pathological memory growth, because the wrong thing is easier to write than the correct thing. The problem is potentially large: roughly every WebGPU application is going to want to upload a new set of uniforms for each frame, so potentially every app is going to be tempted to do the wrong thing.



@kdashg (Contributor) commented Aug 21, 2019

This is only a problem because we forced mapping through a promise, so I think we should consider not doing that.

@kdashg (Contributor) commented Aug 21, 2019

Functionally speaking, this is the same as createBufferMapped with an additional expectation that "the browser will do the right thing" without the app being clear about precisely what it wants the browser to do, which is an antipattern I hope to prevent.

@kdashg (Contributor) left a comment

We'll revisit this in a meeting, but I'll be clear that I am unconvinced on this.

@litherum (Contributor Author) commented Aug 21, 2019

> This is only a problem because we forced mapping through a promise, so I think we should consider not doing that.

What do you propose instead? Preserving portability / safety properties is more important than avoiding runaway memory growth.

> without the app being clear about precisely what it wants the browser to do

It's more important that we avoid creating an ecosystem of poorly performing / misbehaving apps than that our APIs adhere to a philosophy.

@magcius commented Aug 21, 2019

Could we replace mapWriteAsync returning a Promise with it taking a callback that is guaranteed to execute before any submitted queue bundles are executed? I agree that mapWriteAsync, as currently implemented, is difficult to use correctly.
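
For illustration, a sketch of the callback shape being suggested (this signature is hypothetical, not part of any draft at the time):

// Hypothetical callback variant of mapWriteAsync(): the callback is
// guaranteed to run before any subsequently submitted queue work executes,
// so data written here is visible to this frame's commands.
buffer.mapWriteAsync((mapping) => {
  new Float32Array(mapping).set(uniformData);
  buffer.unmap();
});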

@kainino0x (Contributor) commented Aug 21, 2019

> createBufferMapped() in each rAF(), which is unfortunate because it would cause runaway video memory growth.

> However, this requires there to be a garbage collection every few frames,

There's something extremely critical missing here: apps would createBufferMapped AND destroy the resulting buffer (EDIT: link to doc where we show this). Not doing so would constitute a memory leak since garbage collectors are not generally expected to be able to track pressure on GPU memory or separately track the GPU memory budget. (It works the same way in WebGL today: if you don't use delete*, the GC will collect your GPU memory resources, but you cannot expect it to do so promptly, and are more likely to run into future failures to allocate GPU memory.)
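
A minimal sketch of the create-then-destroy pattern being described, using the historical createBufferMapped() API (the helper name and signature are ours):

// Hypothetical per-frame upload helper. The staging buffer is explicitly
// destroyed after submission, so its GPU memory is reclaimed promptly
// instead of waiting on the garbage collector.
function uploadToBuffer(device, queue, dstBuffer, dstOffset, bytes) {
  const [staging, mapping] = device.createBufferMapped({
    size: bytes.byteLength,
    usage: GPUBufferUsage.COPY_SRC,
  });
  new Uint8Array(mapping).set(bytes);
  staging.unmap();

  const enc = device.createCommandEncoder();
  enc.copyBufferToBuffer(staging, 0, dstBuffer, dstOffset, bytes.byteLength);
  queue.submit([enc.finish()]);
  staging.destroy(); // explicit destroy reclaims GPU memory promptly
}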

It may be good to update the arguments in the first post with this in mind.

@Kangz (Contributor) commented Aug 21, 2019

> Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

For context, Babylon.js uses setSubData because that was the only upload path in Chromium's initial implementation of WebGPU they were working with. We recently implemented an efficient path for CreateBufferMapped and I think they will be using it soon. Once they do we'll remove setSubData to match the current WebGPU IDL (or keep it if this lands).

I'd be happy to have setSubData back as the "easy but no guarantee it is optimized" solution for data uploads.

@kvark (Contributor) commented Aug 21, 2019

@litherum

> Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

Babylon.js also uses WebGL to generate mipmaps. Google demos might just not have been updated. I mean, it's totally reasonable for early adopters to cut some corners, we shouldn't take their use as idiomatic just yet.

Also, speaking of the pool of apps: wgpu has a moderate amount of user code, some using an equivalent of setSubData (in user space), and some not. The question of "how do I update my data?" does arise from time to time, but what I like about the API today is that the answer doesn't involve any new functions or primitives; it's about using the existing createBufferMapped and transfer commands.

> Without it, we would have an API that’s easier to use incorrectly than correctly.

This is referring to the fact it's hard to properly release the data for the implementation, leading to the growth of driver-allocated memory, right?

I think I would agree that setSubData makes it easier to use the API and reduces the dependency on garbage collection (in practice for the apps, not theory). At the same time, it:

  • makes the spec more complex: suddenly there are many ways to upload/update the data. It is also an implicit way to access the submission queue, which could be error-prone ([1]).
  • could be worse than createBufferMapped in performance, since it requires at least one data copy if the data isn't originally in an ArrayBuffer. createBufferMapped can be used without copies (for example, when creating a new vertex or index buffer from the ground up; see the sketch below).
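
A sketch of that copy-free path, again using the historical createBufferMapped() API (the data here is illustrative):

// Creating a brand-new vertex buffer: vertex data is written directly into
// the mapping, with no intermediate staging copy.
const vertexData = new Float32Array([0, 1, 0,  -1, -1, 0,  1, -1, 0]);
const [vertexBuffer, mapping] = device.createBufferMapped({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX,
});
new Float32Array(mapping).set(vertexData);
vertexBuffer.unmap();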

Solution?

The problem you are trying to solve, the way I see it, is that createBufferMapped is not the best API for cases where the user needs to upload some data to the GPU. createBufferMapped doesn't imply when it's going to be used and when destroyed, so the implementation can't efficiently allocate temporary storage for it. What would be great to expose is:

  1. the fact that storage is temporary
  2. the fact that we are using it for transfers right now

Ideally, that would lead to an API along these lines:

let pass = device.getQueue().beginUpload();
pass.updateBuffer(buffer, offset, myData);
pass.updateTexture(texture, myData);
// alternatively, give the user memory to fill out
let data = pass.updateBuffer(buffer, offset, size);
fillMyData(data);
// no submissions are allowed while this pass is open
pass.finish();

If you find this similar to #154 - it is. I believe exposing it via the queue upload pass solves the major drawback of #154 (not knowing when the command buffer is submitted).

Appendix

[1] We had some user code that iterated over entities and, for each one, tried to update a vertex buffer's data in order to draw:

let pass = commandEncoder.beginRenderPass(...);
let vertexBuffer = device.createBuffer(...);
for (const entity of entities) {
  vertexBuffer.setSubData(...);
  pass.setVertexBuffer(vertexBuffer);
  pass.draw(...);
}

This is where implicit queue access backfires - it's confusing for users to realize that setSubData happens on a different timeline than the one they are assuming.

@litherum (Contributor Author):

@kvark This is really interesting, thanks for the proposal!

Am I understanding correctly that you are proposing this instead of mapWriteAsync()?

Another question: On UMA devices, does this proposal allow for zero-copy semantics? I believe that mapWriteAsync() does.

@kvark (Contributor) commented Aug 22, 2019

@litherum sorry, I didn't mean to hijack the PR discussion! And I'm not sure if the proposal can replace mapWriteAsync. If there is interest in the group, we can iterate on it and discuss it more.

@kainino0x (Contributor):

On UMA hardware + multi-process browser there's no way to do a synchronous zero-copy upload, so not really. (Even making all COPY_DST buffers mappable and persistently mapped across processes wouldn't work because we have to wait for a fence before it's OK to hand the buffer to the application.)

@magcius commented Aug 26, 2019

FYI, I have been using an API inspired by #154 (well, actually more similar to #45) in my prototyping. While not as flexible as mapWriteAsync, it feels nicer to use in my experience.

The big drawback is that users will misunderstand the API, and try to emulate WebGL/D3D11 semantics by uploading in between each draw, with each draw in a separate command encoder. mapWriteAsync doesn't allow for "overlapping" writes, per se, since it's assumed to execute before the first submitted command buffer (I believe?). It might be possible to fix this by saying that uploads are always scheduled before command buffer execution, at the cost of some flexibility with regards to upload timing.

@Kangz (Contributor) commented Aug 27, 2019

Getting back to @litherum's original comment about runaway memory growth: that's the reason we have GPUBuffer.destroy. I think it is a valid approach to use CreateBufferMapped every frame, if a bit inefficient, as long as the buffers are destroyed after the submission.

> [1] We had some user code that iterated over entities and, for each one, tried to update a vertex buffer's data in order to draw

That seems like it will be a recurring problem for people new to explicit graphics APIs, and it will require good explainers of how to structure your frame. The same issue could happen with the "upload pass", with users making small upload passes between submissions, one draw each. We'll need to educate developers on how to wield WebGPU.

With that said, I think setSubData is a nice middle ground because it is extremely easy to understand (as long as you know it is immediate with respect to queue submission), and the extra copy is OK: if you want the most performance, you should use CreateBufferMapped/Async.

@litherum (Contributor Author) commented Aug 27, 2019

> that's the reason we have GPUBuffer.destroy

Right, I know it’s possible to do data transfers correctly with the existing API. However, the existing API is easier to use incorrectly than correctly: it’s easier to forget to call destroy() than to remember to call it. Manual malloc/free-style memory management is an anti-pattern on the Web.

@magcius commented Aug 27, 2019

The createBufferMapped() solution also creates new garbage every frame, which makes GC pauses more frequent. For an MVP this is fine, but I would really like to see a solution that does not introduce per-frame garbage.

@litherum (Contributor Author):

> On UMA hardware + multi-process browser there's no way to do a synchronous zero-copy upload

Right; there is no solution where we can always do zero-copy. However, there are solutions where we can sometimes do zero-copy. I believe mapWriteAsync() is one of them. I’m wondering if @kvark’s idea is also one of them.

@Kangz (Contributor) commented Aug 28, 2019

I think @kvark's solution structured like below could theoretically do zero-copy if the browser has enough "zero-copy staging" available in the content process.

let pass = queue.beginUploadPass();
let stagingData = pass.uploadBuffer(buffer, offset, size);
fill(stagingData);
pass.endPass();
// Queue submit disallowed before this point.

It is pretty complicated, and the browser would need a lot of heuristics to request the correct amount of zero-copy staging in the content process. It is basically like implementing the RingBufferOfBuffers in the browser, but because of the heuristics I think it is better to let applications handle it themselves.

@litherum (Contributor Author) commented Sep 2, 2019

> I think @kvark's solution structured like below could theoretically do zero-copy

Is that really true? Passes must execute as if they were executed in queue-submission order. At recording time, the browser has no idea whether that resource will be used for another purpose before the currently-recording commands are submitted. If we “execute” the updateBuffer() command at recording time, we clobber whatever was in the buffer too early.

Or does this proposal also include relaxing the constraint that passes execute as if in order?

@Kangz (Contributor) commented Sep 2, 2019

The important part is that beginUploadPass blocks queue submits until it is finished; this means we know exactly the ordering of its commands with respect to other command buffers (they happen before).

That said I'd rather have setSubData:

> It is pretty complicated, and the browser would need a lot of heuristics to request the correct amount of zero-copy staging in the content process. It is basically like implementing the RingBufferOfBuffers in the browser, but because of the heuristics I think it is better to let applications handle it themselves.

@litherum (Contributor Author) commented Sep 6, 2019

> beginUploadPass blocks queue submits until it is finished

This is pretty unfortunate. It means this pass has different semantics than any of the other passes. It seems to be a harder-to-use version of setSubData.

@kvark (Contributor) commented Sep 6, 2019

@litherum it does have different semantics, indeed. setSubData() can be trivially implemented on top. This special pass is more powerful and flexible - it can support texture uploads, which are quite handy and not covered by setSubData().
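
As a sketch of "trivially implemented on top" (using the hypothetical beginUpload()/updateBuffer()/finish() names from the proposal above, and assuming a device back-reference that GPUBuffer did not actually expose):

// Hypothetical polyfill of setSubData() over the proposed upload pass.
GPUBuffer.prototype.setSubData = function (offset, data) {
  const pass = this.device.getQueue().beginUpload(); // assumed back-reference
  pass.updateBuffer(this, offset, data);
  pass.finish();
};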

@grovesNL (Contributor) commented Oct 6, 2019

> I'd be happy to have setSubData back as the "easy but no guarantee it is optimized" solution for data uploads.

Doesn't this cause some of the same issues originally discussed, though? For example, implementations will end up spending a lot of effort optimizing setSubData, because that's the easy method most people will use by default anyway.

If there's some agreement that the current approach is too difficult for users to use correctly, maybe we should iterate on the design further instead of adding this helper directly into implementations. I think some of the ideas above seem like a good start in this direction, i.e. considering approaches which don't require the use of Promises, or variations on #154.

@litherum changed the title from "Add GPUBuffer.setSubData() back" to "Make it easier to upload data into buffers correctly" on Oct 7, 2019
@kdashg (Contributor) commented Oct 17, 2019

I think this is reasonably straightforward to polyfill, so long as destroy before submit doesn't cause the submit to fail:

GPUCommandEncoder.copyDataToBuffer = function(src_data, src_offset,
                                              dst_buffer, dst_offset, size)
{
   size = size || (src_data.byteLength - src_offset);
   
   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   map.set(src_data, src_offset);
   upload_buf.unmap();
   
   this.copyBufferToBuffer(upload_buf, 0, dst_buffer, dst_offset, size);
   upload_buf.destroy();
};

@kainino0x (Contributor):

Unfortunately, I think the resolution on destroy at the F2F makes the submit fail.

This would be quite nice though.

@kdashg (Contributor) commented Oct 18, 2019

Alright, well:

GPUQueue.copyDataToBuffer = function(src_data, src_offset,
                                     dst_buffer, dst_offset, size)
{
   size = size || (src_data.byteLength - src_offset);
   
   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   map.set(src_data, src_offset);
   upload_buf.unmap();

   const enc = this.device.createCommandEncoder();
   enc.copyBufferToBuffer(upload_buf, 0, dst_buffer, dst_offset, size);
   const cbuf = enc.finish();
   this.submit([cbuf]);
   
   upload_buf.destroy();
};

@kainino0x (Contributor) commented Oct 18, 2019

Looks like a good update to the old version (the first, simpler one).

Although GPUQueue.device doesn't exist - we could consider adding it.

Separately, I actually liked the version that puts it on the encoder. Aside from the destroy not working, it's a pretty nice solution (IMO) for doing mid-command-buffer buffer updates like Dzmitry wanted to do.

Not sure whether it's a good idea, but I think we could make destroy() compatible with that usage.

@litherum (Contributor Author):

Discussed on the 2019-10-21 call

@Kangz (Contributor) commented Oct 25, 2019

As promised (but after the homework deadline) I made a proposal for buffer partitioning. I wasn't able to make it work nicely.

Overall I think @jdashg's interface, as a GPUQueue.writeToBuffer(buffer, start, data), is really nice to use and better than GPUBuffer.setSubData(start, data), because it makes it clear what the ordering is with respect to other operations: it is the GPU that updates the buffer, and not the buffer that updates itself out of thin air.

writeToBuffer is a convenience API that will need some work to be optimized, but I think it's worth it: it makes it really easy to just put data on the GPU, and because it is immediate it is easier to optimize than the alternatives.
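
A usage sketch of that shape (the signature is the one proposed above, not shipped API):

// Hypothetical queue-level write: ordered like a queue operation, so any
// subsequently submitted command buffers observe the new contents.
const queue = device.getQueue();
const uniforms = new Float32Array(16); // e.g. a 4x4 matrix, filled per frame
queue.writeToBuffer(uniformBuffer, 0, uniforms);
queue.submit([commandBuffer]); // this submission sees the updated data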

@kvark (Contributor) commented Oct 25, 2019

@Kangz the good thing about writeToBuffer is that theoretically it can write directly into the buffer. However, the main benefit of this function is being able to stage memory uploads in a linear fashion, without involving any generic allocators and arbitrary lifetimes. In that sense, the same infrastructure could be used for texture uploads.

@kainino0x (Contributor):

Should we move copyImageBitmapToTexture onto the GPUQueue?

@hjlld commented Jan 3, 2020

@jdashg ArrayBuffer does not have a set method; it should be

   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   let view = new Float32Array( map );
   view.set(src_data, src_offset);
   upload_buf.unmap();

or create a new DataView(map) and use dataView.setFloat32() in a loop to fill the ArrayBuffer.

@Kangz (Contributor) commented Feb 26, 2020

Should we close this in favor of #509?

@litherum (Contributor Author):

> Should we close this in favor of #509?

Okay

@litherum closed this Feb 26, 2020