
Make it easier to upload data into buffers correctly #418

Closed

Conversation

@litherum (Contributor) commented Aug 20, 2019

setSubData() was removed in de471bf. We should consider adding it back for two reasons, neither of which were stated during the discussion when it was removed:

  1. Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

  2. Without it, we would have an API that’s easier to use incorrectly than correctly.

It is common practice in WebGL and native 3D graphics APIs to upload data for the frame whose commands are currently being recorded. Indeed, WebGPU has an implicit present after rAF() returns, so being able to upload your uniforms (or whatever) to a buffer before rAF() returns is important.

Using mapWriteAsync() requires you to wait on a promise, so if you do the naive thing and just wait on the promise inside rAF(), you’ll miss your present. Instead, without setSubData(), the easiest way to upload data for the current frame would be to call createBufferMapped() in each rAF(), which is unfortunate because it would cause runaway video memory growth. For small apps, this growth probably wouldn’t manifest as a performance problem, but it would manifest as a battery life problem. The correct way to do it with mapWriteAsync() is for the application to create a pool of previously-mapped buffers, and manage the pool’s lifetime, state tracking, growth, etc. I’d expect the big engines to get this right, but I would expect most WebGPU applications to just do the simple bad thing of calling createBufferMapped() in each rAF(), which is an anti-pattern. If we gave them setSubData(), it’s more likely that more apps would actually do the right thing.
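
For concreteness, here is a minimal sketch of the pooling approach described above, written in terms of the createBufferMapped()/mapWriteAsync() API as it stood at the time; the pool class and its method names are hypothetical:

// Hypothetical UploadPool: recycles staging buffers across frames instead
// of allocating a fresh one in every rAF(). Assumes the historical API in
// which createBufferMapped() returns [GPUBuffer, ArrayBuffer] and
// mapWriteAsync() resolves to an ArrayBuffer.
class UploadPool {
  constructor(device, size) {
    this.device = device;
    this.size = size;
    this.ready = []; // [buffer, mapping] pairs, mapped and ready to fill
  }
  acquire() {
    if (this.ready.length) return this.ready.pop();
    return this.device.createBufferMapped({
      size: this.size,
      usage: GPUBufferUsage.COPY_SRC | GPUBufferUsage.MAP_WRITE,
    });
  }
  recycle(buffer) {
    // Re-map asynchronously; the buffer rejoins the pool once the promise
    // resolves, typically a frame or two later.
    buffer.mapWriteAsync().then((mapping) => this.ready.push([buffer, mapping]));
  }
}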

(You may say: “Why don’t we just make createBufferMapped() smart enough to recycle buffers?” However, this requires there to be a garbage collection every few frames, which would cause more problems than it would solve. Garbage collections often don’t occur until the system is running low on memory, which is way too late to be able to recycle these buffers reasonably.)

I’m just worried about a potential proliferation of WebGPU websites that have pathological memory growth, because the wrong thing is easier to write than the correct thing. The problem is potentially large: roughly every WebGPU application is going to want to upload a new set of uniforms for each frame, so potentially every app is going to be tempted to do the wrong thing.



@kdashg (Contributor) commented Aug 21, 2019

This is only a problem because we forced mapping through a promise, so I think we should consider not doing that.

@kdashg (Contributor) commented Aug 21, 2019

Functionally speaking, this is the same as createBufferMapped with an additional expectation that "the browser will do the right thing" without the app being clear about precisely what it wants the browser to do, which is an antipattern I hope to prevent.

@kdashg (Contributor) left a comment

We'll revisit this in a meeting, but I'll be clear that I am unconvinced on this.

@litherum (Contributor Author) commented Aug 21, 2019

> This is only a problem because we forced mapping through a promise, so I think we should consider not doing that.

What do you propose instead? Preserving portability / safety properties is more important than avoiding runaway memory growth.

> without the app being clear about precisely what it wants the browser to do

It's more important that we avoid creating an ecosystem of poorly performing / misbehaving apps than that our APIs adhere to a philosophy.

@magcius commented Aug 21, 2019

Could we replace mapWriteAsync returning a Promise with it taking a callback that is guaranteed to execute before any submitted queue bundles are executed? I agree that mapWriteAsync, as currently implemented, is difficult to use correctly.
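
For illustration, a sketch of the callback shape being suggested (this signature is hypothetical, not part of any draft at the time):

// Hypothetical callback variant of mapWriteAsync(): the callback is
// guaranteed to run before any subsequently submitted queue work executes,
// so data written here is visible to this frame's commands.
buffer.mapWriteAsync((mapping) => {
  new Float32Array(mapping).set(uniformData);
  buffer.unmap();
});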

@kainino0x (Contributor) commented Aug 21, 2019

> createBufferMapped() in each rAF(), which is unfortunate because it would cause runaway video memory growth.

> However, this requires there to be a garbage collection every few frames,

There's something extremely critical missing here: apps would createBufferMapped AND destroy the resulting buffer (EDIT: link to doc where we show this). Not doing so would constitute a memory leak since garbage collectors are not generally expected to be able to track pressure on GPU memory or separately track the GPU memory budget. (It works the same way in WebGL today: if you don't use delete*, the GC will collect your GPU memory resources, but you cannot expect it to do so promptly, and are more likely to run into future failures to allocate GPU memory.)
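
A minimal sketch of the create-then-destroy pattern being described, using the historical createBufferMapped() API (the helper name and signature are ours):

// Hypothetical per-frame upload helper. The staging buffer is explicitly
// destroyed after submission, so its GPU memory is reclaimed promptly
// instead of waiting on the garbage collector.
function uploadToBuffer(device, queue, dstBuffer, dstOffset, bytes) {
  const [staging, mapping] = device.createBufferMapped({
    size: bytes.byteLength,
    usage: GPUBufferUsage.COPY_SRC,
  });
  new Uint8Array(mapping).set(bytes);
  staging.unmap();

  const enc = device.createCommandEncoder();
  enc.copyBufferToBuffer(staging, 0, dstBuffer, dstOffset, bytes.byteLength);
  queue.submit([enc.finish()]);
  staging.destroy(); // explicit destroy reclaims GPU memory promptly
}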

It may be good to update the arguments in the first post with this in mind.

@Kangz (Contributor) commented Aug 21, 2019

> Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

For context, Babylon.js uses setSubData because that was the only upload path in Chromium's initial implementation of WebGPU they were working with. We recently implemented an efficient path for CreateBufferMapped and I think they will be using it soon. Once they do we'll remove setSubData to match the current WebGPU IDL (or keep it if this lands).

I'd be happy to have setSubData back as the "easy but no guarantee it is optimized" solution for data uploads.

@kvark (Contributor) commented Aug 21, 2019

@litherum

> Babylon.js uses it, and the Google WebGPU demos use it, so it must be useful. These demos represent a significant percentage (> 50%?) of the public WebGPU examples we know about.

Babylon.js also uses WebGL to generate mipmaps. Google demos might just not have been updated. I mean, it's totally reasonable for early adopters to cut some corners, we shouldn't take their use as idiomatic just yet.

Also, speaking of the pool of apps: wgpu has a moderate amount of user code, some using an equivalent of setSubData (in user space), and some not. The question of "how do I update my data?" does arise from time to time, but what I like about the API today is that the answer doesn't involve any new functions or primitives; it's about using the existing createBufferMapped and transfer commands.

> Without it, we would have an API that’s easier to use incorrectly than correctly.

This is referring to the fact it's hard to properly release the data for the implementation, leading to the growth of driver-allocated memory, right?

I think I would agree that setSubData makes it easier to use the API and reduces the dependency on garbage collection (in practice for the apps, not theory). At the same time, it:

  • makes the spec more complex: suddenly there are many ways to upload/update the data. It is also an implicit way to access the submission queue, which could be error-prone ([1]).
  • could be worse than createBufferMapped in performance, since it requires at least one data copy if the data isn't originally in an ArrayBuffer. createBufferMapped can be used without copies (for example, when creating a new vertex or index buffer from the ground up; see the sketch below).
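
A sketch of that copy-free path, again using the historical createBufferMapped() API (the data here is illustrative):

// Creating a brand-new vertex buffer: vertex data is written directly into
// the mapping, with no intermediate staging copy.
const vertexData = new Float32Array([0, 1, 0,  -1, -1, 0,  1, -1, 0]);
const [vertexBuffer, mapping] = device.createBufferMapped({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX,
});
new Float32Array(mapping).set(vertexData);
vertexBuffer.unmap();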

Solution?

The problem you are trying to solve, the way I see it, is that createBufferMapped is not the best API for cases where the user needs to upload some data to the GPU. createBufferMapped doesn't imply when it's going to be used and when destroyed, so the implementation can't efficiently allocate temporary storage for it. What would be great to expose is:

  1. the fact that storage is temporary
  2. the fact that we are using it for transfers right now

Ideally, that would lead to an API along these lines:

let pass = device.getQueue().beginUpload();
pass.updateBuffer(buffer, offset, myData);
pass.updateTexture(texture, myData);
// alternatively, give the user memory to fill out
let data = pass.updateBuffer(buffer, offset, size);
fillMyData(data);
// no submissions are allowed while this pass is open
pass.finish();

If you find this similar to #154 - it is. I believe exposing it via the queue upload pass solves the major drawback of #154 (not knowing when the command buffer is submitted).

Appendix

[1] We had some user code that iterated over entities and, for each one, tried to update a vertex buffer's data in order to draw:

let pass = commandEncoder.beginRenderPass(...);
let vertexBuffer = device.createBuffer(...);
for (const entity of entities) {
  vertexBuffer.setSubData(...);
  pass.setVertexBuffer(vertexBuffer);
  pass.draw(...);
}

This is where implicit queue access backfires - it's confusing for users to realize that setSubData happens on a different timeline than the one they are assuming.

@litherum (Contributor Author):

@kvark This is really interesting, thanks for the proposal!

Am I understanding correctly that you are proposing this instead of mapWriteAsync()?

Another question: On UMA devices, does this proposal allow for zero-copy semantics? I believe that mapWriteAsync() does.

@kvark (Contributor) commented Aug 22, 2019

@litherum sorry, I didn't mean to hijack the PR discussion! And I'm not sure if the proposal can replace mapWriteAsync. If there is interest in the group, we can iterate on it and discuss it more.

@kainino0x (Contributor):

On UMA hardware + multi-process browser there's no way to do a synchronous zero-copy upload, so not really. (Even making all COPY_DST buffers mappable and persistently mapped across processes wouldn't work because we have to wait for a fence before it's OK to hand the buffer to the application.)

@magcius commented Aug 26, 2019

FYI, I have been using an API inspired by #154 (well, actually more similar to #45) in my prototyping. While not as flexible as mapWriteAsync, it feels nicer to use in my experience.

The big drawback is that users will misunderstand the API, and try to emulate WebGL/D3D11 semantics by uploading in between each draw, with each draw in a separate command encoder. mapWriteAsync doesn't allow for "overlapping" writes, per se, since it's assumed to execute before the first submitted command buffer (I believe?). It might be possible to fix this by saying that uploads are always scheduled before command buffer execution, at the cost of some flexibility with regards to upload timing.

@Kangz (Contributor) commented Aug 27, 2019

Getting back to @litherum's original comment about runaway memory growth: that's the reason we have GPUBuffer.destroy. I think it is a valid approach to use CreateBufferMapped every frame, if a bit inefficient, as long as the buffers are destroyed after the submission.

> [1] We had some user code that iterated over entities and, for each one, tried to update a vertex buffer's data in order to draw

That seems like it will be a recurring problem for people new to explicit graphics APIs, and it will require good explainers of how to structure your frame. The same issue could happen with the "upload pass", with users making small upload passes between submissions, one draw each. We'll need to educate developers on how to wield WebGPU.

With that said, I think setSubData is a nice middle ground because it is extremely easy to understand (as long as you know it is immediate with respect to queue submission), and the extra copy is OK: if you want the most performance, you should use CreateBufferMapped/Async.

@litherum (Contributor Author) commented Aug 27, 2019

> that's the reason we have GPUBuffer.destroy

Right, I know it’s possible to do data transfers correctly with the existing API. However, the existing API is easier to use incorrectly than correctly: it’s easier to forget to call destroy() than to remember to call it. Manual malloc/free-style memory management is an anti-pattern on the Web.

@magcius commented Aug 27, 2019

The createBufferMapped() solution also creates new garbage every frame, which makes GC pauses more frequent. For an MVP this is fine, but I would really like to see a solution that does not introduce per-frame garbage.

@litherum (Contributor Author):

> On UMA hardware + multi-process browser there's no way to do a synchronous zero-copy upload

Right; there is no solution where we can always do zero-copy. However, there are solutions where we can sometimes do zero-copy. I believe mapWriteAsync() is one of them. I’m wondering if @kvark’s idea is also one of them.

@Kangz (Contributor) commented Aug 28, 2019

I think @kvark's solution structured like below could theoretically do zero-copy if the browser has enough "zero-copy staging" available in the content process.

let pass = queue.beginUploadPass();
let stagingData = pass.uploadBuffer(buffer, offset, size);
fill(stagingData);
pass.endPass();
// Queue submit disallowed before this point.

It is pretty complicated, and the browser would need a lot of heuristics to request the correct amount of zero-copy staging in the content process. It is basically like implementing the RingBufferOfBuffers in the browser, but because of the heuristics I think it is better to let applications handle it themselves.

@litherum (Contributor Author) commented Sep 2, 2019

> I think @kvark's solution structured like below could theoretically do zero-copy

Is that really true? Passes must execute as if they were executed in queue-submission order. At recording time, the browser has no idea whether that resource will be used for another purpose before the currently-recording commands are submitted. If we “execute” the updateBuffer() command at recording time, we clobber whatever was in the buffer too early.

Or does this proposal also include relaxing the constraint that passes execute as if in order?

@Kangz (Contributor) commented Sep 2, 2019

The important part is that beginUploadPass blocks queue submits until it is finished; this means we know exactly the ordering of its commands with respect to other command buffers (they happen before).

That said I'd rather have setSubData:

> It is pretty complicated, and the browser would need a lot of heuristics to request the correct amount of zero-copy staging in the content process. It is basically like implementing the RingBufferOfBuffers in the browser, but because of the heuristics I think it is better to let applications handle it themselves.

@litherum (Contributor Author) commented Sep 6, 2019

> beginUploadPass blocks queue submits until it is finished

This is pretty unfortunate. It means this pass has different semantics than any of the other passes. It seems to be a harder-to-use version of setSubData.

@kvark (Contributor) commented Sep 6, 2019

@litherum it does have different semantics, indeed. setSubData() can be trivially implemented on top. This special pass is more powerful and flexible - it can support texture uploads, which are quite handy and not covered by setSubData().
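
As a sketch of "trivially implemented on top" (using the hypothetical beginUpload()/updateBuffer()/finish() names from the proposal above, and assuming a device back-reference that GPUBuffer did not actually expose):

// Hypothetical polyfill of setSubData() over the proposed upload pass.
GPUBuffer.prototype.setSubData = function (offset, data) {
  const pass = this.device.getQueue().beginUpload(); // assumed back-reference
  pass.updateBuffer(this, offset, data);
  pass.finish();
};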

@grovesNL (Contributor) commented Oct 6, 2019

> I'd be happy to have setSubData back as the "easy but no guarantee it is optimized" solution for data uploads.

Doesn't this cause some of the same issues originally discussed, though? For example, implementations will end up spending a lot of effort optimizing setSubData, because that's the easy method most people will use by default anyway.

If there's some agreement that the current approach is too difficult for users to use correctly, maybe we should iterate on the design further instead of adding this helper directly into implementations. I think some of the ideas above seem like a good start in this direction, i.e. considering approaches which don't require the use of Promises, or variations on #154.

@litherum changed the title from "Add GPUBuffer.setSubData() back" to "Make it easier to upload data into buffers correctly" on Oct 7, 2019
@kdashg (Contributor) commented Oct 17, 2019

I think this is reasonably straightforward to polyfill, so long as destroy before submit doesn't cause the submit to fail:

GPUCommandEncoder.copyDataToBuffer = function(src_data, src_offset,
                                              dst_buffer, dst_offset, size)
{
   size = size || (src_data.byteLength - src_offset);
   
   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   map.set(src_data, src_offset);
   upload_buf.unmap();
   
   this.copyBufferToBuffer(upload_buf, 0, dst_buffer, dst_offset, size);
   upload_buf.destroy();
};

@kainino0x (Contributor):

Unfortunately, I think the resolution on destroy at the F2F makes the submit fail.

This would be quite nice though.

@kdashg (Contributor) commented Oct 18, 2019

Alright, well:

GPUQueue.copyDataToBuffer = function(src_data, src_offset,
                                     dst_buffer, dst_offset, size)
{
   size = size || (src_data.byteLength - src_offset);
   
   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   map.set(src_data, src_offset);
   upload_buf.unmap();

   const enc = this.device.createCommandEncoder();
   enc.copyBufferToBuffer(upload_buf, 0, dst_buffer, dst_offset, size);
   const cbuf = enc.finish();
   this.submit([cbuf]);
   
   upload_buf.destroy();
};

@kainino0x (Contributor) commented Oct 18, 2019

Looks like a good update to the old version (the first, simpler one).

Although GPUQueue.device doesn't exist - we could consider adding it.

Separately, I actually liked the version that puts it on the encoder. Aside from the destroy not working, it's a pretty nice solution (IMO) for doing mid-command-buffer buffer updates like Dzmitry wanted to do.

Not sure whether it's a good idea, but I think we could make destroy() compatible with that usage.

@litherum (Contributor Author):

Discussed on the 2019-10-21 call

@Kangz (Contributor) commented Oct 25, 2019

As promised (but after the homework deadline) I made a proposal for buffer partitioning. I wasn't able to make it work nicely.

Overall I think @jdashg's interface, as a GPUQueue.writeToBuffer(buffer, start, data), is really nice to use and better than GPUBuffer.setSubData(start, data), because it makes it clear what the ordering is with respect to other operations: it is the GPU that updates the buffer, and not the buffer that updates itself out of thin air.

writeToBuffer is a convenience API that will need some work to be optimized, but I think it's worth it: it makes it really easy to just put data on the GPU, and because it is immediate it is easier to optimize than the alternatives.
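
A usage sketch of that shape (the signature is the one proposed above, not shipped API):

// Hypothetical queue-level write: ordered like a queue operation, so any
// subsequently submitted command buffers observe the new contents.
const queue = device.getQueue();
const uniforms = new Float32Array(16); // e.g. a 4x4 matrix, filled per frame
queue.writeToBuffer(uniformBuffer, 0, uniforms);
queue.submit([commandBuffer]); // this submission sees the updated data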

@kvark (Contributor) commented Oct 25, 2019

@Kangz the good thing about writeToBuffer is that theoretically it can write directly into the buffer. However, the main benefit of this function is being able to stage memory uploads in a linear fashion, without involving any generic allocators and arbitrary lifetimes. In that sense, the same infrastructure could be used for texture uploads.

@kainino0x (Contributor):

Should we move copyImageBitmapToTexture onto the GPUQueue?

@hjlld commented Jan 3, 2020

@jdashg ArrayBuffer does not have a set method; it should be

   const [upload_buf, map] = this.device.createBufferMapped({
      size: size,
      usage: GPUBufferUsage.COPY_SRC,
   });
   let view = new Float32Array( map );
   view.set(src_data, src_offset);
   upload_buf.unmap();

or create a new DataView(map) and use dataView.setFloat32() in a loop to fill the ArrayBuffer.

@Kangz (Contributor) commented Feb 26, 2020

Should we close this in favor of #509?

@litherum (Contributor Author):

> Should we close this in favor of #509?

Okay

@litherum closed this Feb 26, 2020