This repository has been archived by the owner on Jun 18, 2021. It is now read-only.

Mapping buffers synchronously #9

Closed
nical opened this issue May 19, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@nical

nical commented May 19, 2019

Let's consider a scenario where one would need to map a transfer buffer and fill it with data needed for this frame. In addition, let's take it for granted that we don't want to allocate a new buffer each frame (let me know if this constraint isn't reasonable, but since it's recommended practice in vulkan and friends, I'd like to figure out a way to fulfill it).

The two ways I see to map buffers are create_buffer_mapped and map_read/write_async.

The former obviously only works when you are creating the buffer, so it is not enough when your goal is to reuse a finite set of buffers.

The latter (map_read/write_async) has interesting constraints:

  • The callback doesn't fire until the buffer is ready to be mapped without stalling.
  • The callback gives you the mapped slice of data which can't escape the callback.

@kvark told me about a neat trick to ensure a buffer is ready to be mapped: once a buffer has been used in a frame, call map_write_async with a callback that notifies your rendering system that the buffer can be mapped.

Doing that is actually a little convoluted because of rust's unforgiving borrow checker:

  • You have to use channels to communicate from the callback to your pool of transfer buffers.
  • You can't move the buffer into the callback because the buffer is borrowed for the duration of the method that you give the callback to.
  • Once the callback fires you get the mapped slice you asked for, but you don't want the mapped data now; you'll want it the next time you dequeue this buffer from your pool. So you'll have to map that buffer again using map_write_async.
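
The channel part of this workaround can be sketched in plain Rust. This is a minimal illustration of the shape of the problem, not wgpu code: `BufferId` and `notify_when_mappable` are hypothetical stand-ins, and the closure simulates the callback that would be handed to map_write_async.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for a buffer handle tracked by the pool.
#[derive(Debug, PartialEq)]
struct BufferId(u32);

// The callback cannot borrow the pool (it must be 'static), so it has to
// send a message through a channel instead of touching the pool directly.
fn notify_when_mappable(id: BufferId, sender: mpsc::Sender<BufferId>) {
    // In real code this closure would be handed to map_write_async and
    // would fire once the GPU is done with the buffer.
    let callback = move || {
        sender.send(id).unwrap();
    };
    callback();
}
```

The pool then drains the receiving end of the channel at the start of each frame to discover which buffers became mappable.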

Even if the callback trick gives you the guarantee that the buffer is mappable, I am very uneasy about relying on it firing synchronously inside of map_write_async, because that relies on an implementation detail which could change without affecting the public API.
In addition, for soundness reasons the callback cannot borrow its environment, which makes it very inconvenient when you know it will fire synchronously.

Proposed API change

Instead (or in addition to) being able to map the buffer asynchronously, we could asynchronously request a mappable buffer. The API could look like:

    pub fn request_write_mappable_buffer_async<F>(self, start: u32, size: u32, callback: F)
    where
        F: FnOnce(WriteMappableBuffer);

WriteMappableBuffer would be able to give you synchronous access to its data, for example through a map<T>(&mut self) -> &mut [T] method or a method that takes a closure which can borrow its environment.
In order to get back a wgpu::Buffer that can be used in a command queue, one would call WriteMappableBuffer::finish(self) -> wgpu::Buffer.
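
Under this proposal, the per-frame flow might look roughly like the following (all of these names are hypothetical, following the sketch above; `pool_sender`/`pool_receiver` are assumed to be the two ends of a channel owned by the buffer pool):

```rust
// End of frame: hand the buffer back to wgpu; the callback fires once
// the GPU is done with it, and the WriteMappableBuffer owns the buffer.
buffer.request_write_mappable_buffer_async(0, size, |mappable| {
    pool_sender.send(mappable).unwrap();
});

// Some later frame: dequeue a mappable buffer and fill it synchronously.
let mut mappable = pool_receiver.recv().unwrap();
mappable.map::<u8>().copy_from_slice(&frame_data);

// Turn it back into a buffer usable in a command queue.
let buffer: wgpu::Buffer = mappable.finish();
```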

Note how request_write_mappable_buffer_async consumes the buffer. This ensures that:

  • the buffer isn't used while the request is in flight
  • the mappable buffer we get in the callback owns the underlying buffer, so we are free to send it to our pool.

In addition, create_buffer_mapped could be replaced with create_mappable_buffer, returning a mappable buffer, for consistency.

Please don't attach any importance to the names in this proposal; I agree they aren't very juicy and I trust you can find more suitable ones if need be.
Also I'm not sure what makes most sense between asking for a mappable buffer (where mapping would happen when you ask for a slice of the buffer) and asking for a mapped buffer (which holds on to a pointer to the already-mapped data). This distinction is, I am sure, very important for wgpu's implementation, but either way is fine from the point of view of the user of this API.

The important part of this proposal is that (in my opinion) rust makes the current async API very hard to work with, whereas something closer to the proposed API would let wgpu express buffers that can be mapped synchronously even if obtaining them happens through asynchronous means.

@nical
Author

nical commented May 19, 2019

I anticipate concerns about the device potentially being dropped while the mappable/mapped buffer exists.

An API like

let slice = device.write_map_buffer(mappable_buffer, range);
// or
let slice = mappable_buffer.write_map_buffer(range, &device);
// or
let mapped_buffer = device.write_map_buffer(mappable_buffer, range);
let slice = &mapped_buffer.data[..];
// etc.

to ensure that the mapped buffer does not outlive the device would work as well.

@kvark kvark added the enhancement New feature or request label May 20, 2019
@kvark
Member

kvark commented May 20, 2019

Wow, thank you for the proposal!

let's take it for granted that we don't want to allocate a new buffer each frame (let me know if this constraint isn't reasonable, but since it's recommended practice in vulkan and friends, I'd like to figure out a way to fulfill it).

This is not something to take for granted. Let me explain.

In Vulkan/D3D12 resources are "placed" in the allocated memory. Creation (or "placement") of a new resource is light, could easily be done many times per frame. Allocating memory is not light, and that's what nobody should do on a per-frame basis. This is an important distinction, since WebGPU doesn't expose the action of memory allocation.

Buffers are very flexible for several reasons:

  • all the low-level APIs guarantee that if one can create a buffer with some GPU usage, they can also make it mappable to the host
  • buffers are described by simply an array of bytes, unlike textures, where there are things like dimensions, tiling, format, etc
  • a memory allocation for a buffer is not distinguishable from just a large buffer, from which we sub-allocate

For these reasons, it's very easy for WebGPU implementation to manage buffer allocations internally and make them virtually instant from the user perspective, to the point where it's completely fine to create them per-frame. That's why the current rough and effective way to update data is to go through create_buffer_mapped.
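
For reference, the per-frame idiom kvark describes looked roughly like this in the wgpu-rs of that era (a sketch from memory of the examples of the time; exact signatures varied between versions):

```rust
// Create a fresh, already-mapped staging buffer each frame; the
// implementation sub-allocates from larger memory blocks, so this is cheap.
let staging = device
    .create_buffer_mapped(vertex_data.len(), wgpu::BufferUsage::COPY_SRC)
    .fill_from_slice(&vertex_data);
encoder.copy_buffer_to_buffer(&staging, 0, &vertex_buffer, 0, byte_len);
```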

@nical
Author

nical commented May 20, 2019

For these reasons, it's very easy for WebGPU implementation to manage buffer allocations internally and make them virtually instant from the user perspective, to the point where it's completely fine to create them per-frame.

That would be (or is, if this is already part of the implementation) great. It means I don't have to maintain a pool of staging buffers, and I appreciate having one less thing to deal with.

I think that the overall direction of the proposed changes is still worth considering, since I believe it would be a real ergonomic improvement when mapping an already created buffer is needed (even if not in the context of staging buffers).

If you are busy but still think this is worth investigating, I'd be OK with spending some time standing up a prototype (unless you already have other plans around this part of the API).

@kvark
Member

kvark commented May 20, 2019

@nical in the near future we are going to be using Rendy for descriptor and memory allocation, and that's going to give us pooling and re-use.

If you are busy but still think this is worth investigating, I'd be OK with spending some time standing up a prototype (unless you already have other plans around this part the API).

I'd like to see a bit more details about the proposal. What struct is this method on? How does one polyfill create_buffer_mapped with it, etc. If you can make a draft PR with empty implementations (or not? :)) that would be great.

rukai pushed a commit to rukai/wgpu-rs that referenced this issue Jun 16, 2019
9: Restructure the repo to host multiple crates r=grovesNL a=kvark

As a follow-up, I'll rename this repository to just `wgpu`.

Co-authored-by: Dzmitry Malyshau <kvark@mozilla.com>
@dhardy
Contributor

dhardy commented Nov 20, 2019

For these reasons, it's very easy for WebGPU implementation to manage buffer allocations internally and make them virtually instant from the user perspective, to the point where it's completely fine to create them per-frame. That's why the current rough and effective way to update data is to go through create_buffer_mapped.

Wow, I have to read through several issue threads to get this information. A little bit of extra API documentation would go a long way to making WGPU more accessible!

@kvark
Member

kvark commented Oct 4, 2020

This has now been resolved, as the upstream API separates mapping acquisition from waiting. I.e. there is a map_async call on a buffer, and once it resolves, you can synchronously get mapping ranges and work with them.
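
In later wgpu terms, that resolution looks roughly like this (a sketch; the exact polling and callback details vary by wgpu version):

```rust
// Request the mapping asynchronously...
let slice = buffer.slice(..);
slice.map_async(wgpu::MapMode::Write, |result| {
    result.expect("failed to map buffer");
});
// ...drive the device until the request resolves...
device.poll(wgpu::Maintain::Wait);
// ...then work with the mapped range synchronously.
slice.get_mapped_range_mut()[..data.len()].copy_from_slice(&data);
buffer.unmap();
```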

@kvark kvark closed this as completed Oct 4, 2020