
MapAsync with subranges. #605

Closed
Kangz opened this issue Mar 12, 2020 · 20 comments
@Kangz
Contributor

Kangz commented Mar 12, 2020

YADUP (Yet Another Data Upload Proposal)

Last meeting there seemed to be appetite for asynchronous mapping that would allow requesting subranges but the group wanted to see a more fleshed out proposal.

This version of data upload is very similar to what we have in the spec today with mapWriteAsync and mapReadAsync, but the resolution of the mapping promise doesn't give an ArrayBuffer. Instead the ArrayBuffer is stored in an internal slot of the GPUBuffer, and a GPUBuffer.getMappedRange method allows getting subranges of that internal ArrayBuffer.

This is close to @kainino0x's old GPUMappedMemory idea.

Proposal

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

Calling GPUBuffer.mapAsync is an error if the buffer is not valid or if it is not in the "unmapped" state (which also means it is not destroyed). Upon error, mapAsync returns a promise that will reject. Upon success, mapAsync puts the buffer in the "mapping" state and returns a promise that, when it resolves, will put the buffer in the "mapped" state.

Calling GPUBuffer.getMappedRange when the buffer is not in the "mapped" state returns null. When called in the "mapped" state, it returns a new ArrayBuffer that is a view into the content of the buffer over the half-open range [offset, offset + size) (obviously with a JS exception on a bad range). offset and size default to 0, and a size of 0 means the remaining size of the buffer after offset, so buffer.getMappedRange() returns the whole buffer.
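A minimal sketch of that defaulting and bounds-checking behavior, assuming a bufferSize known to the implementation (illustration logic, not proposed API surface):

```javascript
// Sketch of getMappedRange's offset/size defaulting and range check.
// `bufferSize` stands in for the mapped buffer's total size.
function resolveMappedRange(bufferSize, offset = 0, size = 0) {
  // size of 0 means "the remaining size of the buffer after offset"
  const actualSize = size === 0 ? bufferSize - offset : size;
  if (offset > bufferSize || actualSize < 0 || offset + actualSize > bufferSize) {
    throw new RangeError("getMappedRange out of bounds");
  }
  return { offset, size: actualSize }; // half-open range [offset, offset + size)
}
```

So `resolveMappedRange(256)` covers the whole 256-byte buffer, while `resolveMappedRange(256, 16, 32)` covers bytes [16, 48).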

Calling GPUBuffer.unmap is an error if the buffer is not valid or if it is in the "unmapped" state. On success:

  • if the buffer is in the "mapping" state, the promise is rejected and the buffer is put in the "unmapped" state
  • if the buffer is in the "mapped" state, all ArrayBuffers returned by GPUBuffer.getMappedRange() are detached and the buffer is put in the "unmapped" state

Note that modifications to the content of an ArrayBuffer returned by getMappedRange are semantically modifications of the content of the buffer itself.

Calling GPUDevice.createBuffer with descriptor.mappedAtCreation can be done even if descriptor.usage doesn't contain the MAP_READ or MAP_WRITE flags. If mappedAtCreation is true, the buffer is created in the "mapped" state and its content can be modified before unmap() and other uses such as queue.submit().
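The lifecycle described above can be modeled as a small plain-JavaScript state machine. This is a mock for explanation only, not the real API; the MockBuffer name and the zero-length ArrayBuffer stand in for the real objects:

```javascript
// Mock of the proposed GPUBuffer mapping states:
// "unmapped" -> "mapping" -> "mapped" -> (unmap) -> "unmapped"
class MockBuffer {
  constructor({ mappedAtCreation = false } = {}) {
    // mappedAtCreation skips straight to the "mapped" state
    this.state = mappedAtCreation ? "mapped" : "unmapped";
  }
  mapAsync() {
    if (this.state !== "unmapped") {
      return Promise.reject(new Error("validation error: buffer not unmapped"));
    }
    this.state = "mapping";
    return Promise.resolve().then(() => { this.state = "mapped"; });
  }
  getMappedRange() {
    // returns null outside the "mapped" state, per the proposal
    return this.state === "mapped" ? new ArrayBuffer(0) : null;
  }
  unmap() {
    if (this.state === "unmapped") throw new Error("validation error: already unmapped");
    // the real API would also detach all outstanding ArrayBuffers here
    this.state = "unmapped";
  }
}
```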

As usual, other uses of GPUBuffer, like in a GPUQueue.submit(), would validate that the buffer is in the "unmapped" state. And similar to other proposals, there would be restrictions on the usages that can be combined with MAP_READ and MAP_WRITE. Contrary to other proposals, MAP_READ and MAP_WRITE could be set at the same time, and I suggest the following rules:

  • If MAP_WRITE is present, COPY_SRC is allowed.
  • If MAP_READ is present, COPY_DST is allowed.
  • If MAP_READ and MAP_WRITE are present, then both COPY_SRC and COPY_DST are allowed.
  • (example for a UMA feature) if the adapter is UMA, then if MAP_WRITE is present, then VERTEX and UNIFORM are also allowed.
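The first three rules can be sketched as a bitflag check. MAP_READ and MAP_WRITE use the values from the IDL above; the COPY_SRC and COPY_DST values here are assumed for illustration:

```javascript
const MAP_READ  = 0x0001; // from the IDL above
const MAP_WRITE = 0x0002; // from the IDL above
const COPY_SRC  = 0x0004; // assumed value, for illustration
const COPY_DST  = 0x0008; // assumed value, for illustration

// Returns whether a usage combination satisfies the MAP_* rules above
// (ignoring the UMA extension rule).
function isUsageAllowed(usage) {
  // buffers without map flags are not constrained by these rules
  if (!(usage & (MAP_READ | MAP_WRITE))) return true;
  let allowed = 0;
  if (usage & MAP_WRITE) allowed |= MAP_WRITE | COPY_SRC;
  if (usage & MAP_READ)  allowed |= MAP_READ | COPY_DST;
  // every set bit must be in the allowed set
  return (usage & ~allowed) === 0;
}
```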

This mapping mechanism would live side-by-side with a writeToBuffer path.

There are also threading constraints: all calls to getMappedRange and unmap() must happen on the same worker so the ArrayBuffers can be detached.

Alternative choices

A single mapAsync is present instead of mapWriteAsync and mapReadAsync. The proposal talks about the ArrayBuffer being the content of the GPUBuffer directly, so it was a bit weird to have
two map functions. The downside is that if the implementation can't wrap shmem in a GPU resource:

  • either a copy will have to take place on unmap() even for MAP_READ buffers to update the content with writes the application did in the ArrayBuffer
  • or range-tracking needs to happen for MAP_READ buffers so the implementation knows what to overwrite

It could be possible to not return a promise from mapAsync and instead make the GPUBuffer itself act like a promise with a .then method and maybe a synchronous "state" member.

The assumption is that multi-process browsers will allocate one large shmem corresponding to the whole size of mapped buffers, so multiple ArrayBuffers could look at the same memory and overlap. If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.
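For reference, the disjointness check that getMappedRange would have to perform under that alternative could look like the following sketch (illustration only; the range-bookkeeping shape is assumed):

```javascript
// Under the "disjoint ranges" alternative, each getMappedRange call would be
// checked against the ranges already handed out since the last unmap().
// `existing` is an array of { offset, size } half-open intervals.
function isDisjoint(existing, offset, size) {
  // [offset, offset + size) must not intersect any existing interval
  return existing.every(r =>
    offset + size <= r.offset || r.offset + r.size <= offset);
}
```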

@kvark kvark added the proposal label Mar 12, 2020
@kvark
Contributor

kvark commented Mar 12, 2020

Thank you for YADUP!

I like how it both allows sub-ranges to be mapped and elegantly replaces createBufferMapped.

I don't think this replaces the writeTo* functions, however, just to be clear.

If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.

If we don't have a continuous persistent allocation, we'd have to either be copying data from the GPU buffer, or zeroing it, and that changes the performance characteristics of this API. Is there really a choice here? I thought that the proposal addresses the problem of mem-zeroing in #594 , but that "if we don't want" puts that fix in question.

Also, since we know about buffer state on the client side, is there a reason not to expose it to the user (e.g. as a read-only property of the GPUBuffer)?

@JusSn
Contributor

JusSn commented Mar 13, 2020

Upon success mapAsync puts the buffer in the "mapping" state and returns a promise that when it resolves, will put the buffer in the "mapped" state.

Should your sample IDL function return Promise<void> or Promise<GPUBuffer> instead of async?

@kvark:

This mapping mechanism would live side-by-side with a writeToBuffer path.

@Kangz
Contributor Author

Kangz commented Mar 13, 2020

@kvark

If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.

If we don't have a continuous persistent allocation, we'd have to either be copying data from the GPU buffer, or zeroing it, and that changes the performance characteristics of this API. Is there really a choice here? I thought that the proposal addresses the problem of mem-zeroing in #594 , but that "if we don't want" puts that fix in question.

That's a good point and a good reason to stick with allowing overlapping subranges.

Also, since we know about buffer state on the client side, is there a reason not to expose it to the user (e.g. as a read-only property of the GPUBuffer)

It's part of another of the choices that make the buffer itself the promise object, I wouldn't mind adding it so it's a bit easier for folks to debug and learn about buffer mapping:

It could be possible to not return a promise from mapAsync and instead make the GPUBuffer itself act like a promise with a .then method and maybe a synchronous "state" member.

@JusSn

Should your sample IDL function return Promise<void> or Promise<GPUBuffer> instead of async?

I thought the async keyword meant the result was a promise, but I changed it to match the spec better.

@kainino0x
Contributor

WebIDL doesn't have async method syntax, only async iterator.

@kvark
Contributor

kvark commented Mar 16, 2020

Echoing my feedback from the call: this change at first seems like an improvement over the current mapReadAsync and mapWriteAsync. However, it comes with caveats:

  • getMappedRange doesn't make sense for read-backs. The original map call needs a range in it, and that's the ArrayBuffer we can return in a promise (close to the current mapReadAsync)
  • it assumes there is a persistent staging area associated with a buffer, so that we can give the user the initial contents exactly in the way they left it last time the data was touched
  • overall, given the increased complexity of this API (compared to mapWriteAsync), I feel that the fine-grained ability to upload small chunks of data is not useful, given the presence of writeTo*.

Things to improve in this proposal:

  1. separate read call from write call, add ranges
  2. clarify the expectation of the implementation to keep the shmem around

@kdashg
Contributor

kdashg commented Mar 17, 2020

I would prefer keeping the read and write calls as one. I think that makes for a better API, and that we should instead work to satisfy the use case of a partial map.

@kvark
Contributor

kvark commented Mar 17, 2020

@jdashg but this proposal does not offer the use case of a partial map. I'm suggesting ways to improve this proposal for what it's trying to do. What you are talking about is a different proposal that has client-side buffer tracking, and it would need to be considered independently, even if built upon this one.

@kdashg
Contributor

kdashg commented Mar 18, 2020

We're both suggesting modifications to this proposal to make it better in our own opinions.

@Kangz Kangz closed this as completed Mar 19, 2020
@Kangz Kangz reopened this Mar 19, 2020
@Kangz
Contributor Author

Kangz commented Mar 19, 2020

Feedback I heard is that we should split the read and write calls so the read path can be simplified and take an additional range argument. After spending more time on it, and after offline discussions with @kvark (which considered alternatives like GPUQueue.readBufferBack), I think the original proposal shouldn't be modified, apart from maybe splitting the mapAsync call into mapReadAsync and mapWriteAsync and disallowing MAP_READ | MAP_WRITE.

The rationale for keeping a range-less mapReadAsync is that in general only a small amount of data is read back compared to uploads, so users can just create a buffer when they need to read back. The most user-friendly version would be a GPUQueue.readBufferBack(GPUBuffer, offset, size) -> Promise<ArrayBufferWithExplicitDetach>, and that can be polyfilled trivially the following way:

GPUQueue.prototype.readBufferBack = async function(buffer, offset, size) {
    // assumes `device`, the GPUDevice that owns this queue, is in scope
    const readbackBuffer = device.createBuffer({size, usage: MAP_READ | COPY_DST});
    const encoder = device.createCommandEncoder();
    encoder.copyBufferToBuffer(buffer, offset, readbackBuffer, 0, size);
    this.submit([encoder.finish()]);

    await readbackBuffer.mapReadAsync();
    const content = readbackBuffer.getMappedRange();
    content.detach = function() {
        readbackBuffer.destroy(); // has implicit unmap
    };
    return content;
};

So the IDL would be the following:

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapReadAsync();
  Promise<void> mapWriteAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

and the behavior would be the same as in the original proposal (adapting it to the split mapReadAsync and mapWriteAsync calls is left as an exercise to the reader), with the clarifications that it is ok to get overlapping ranges, that getMappedRange returns the content of the buffer (which requires a persistent shmem allocation), and that MAP_WRITE | MAP_READ is disallowed.

Like the original proposal, this one assumes writeToBuffer is in WebGPU.

@Kangz
Contributor Author

Kangz commented Mar 25, 2020

@litherum @kvark @jdashg I'd like to make progress on this issue outside of the meetings; it seems that the only discussions left are:

  • Whether to merge the two map calls in a single mapAsync call (I'm slightly in favor of not doing this because I think the intent is clearer with two separate methods).
  • Whether to add a range to mapReadAsync (I'm in favor of not doing it because I don't think it's useful, see the discussion in the previous comment). If you suggest doing this, please also describe what happens if multiple calls to mapReadAsync are made before unmap, what happens if the ranges overlap, and the constraints on getMappedRange.

@kvark
Contributor

kvark commented Mar 25, 2020

Question (2) needs to be answered first, because if we do specify the range, then the calls are clearly different and (1) is no longer a question.

I do think a range is useful for mapReadAsync. It's trivial from our (implementor) side and allows users to potentially avoid messing with temporary buffers and doing copies. This could be rare to see, but otherwise it would also be an arbitrary restriction we put in for no reason (i.e. the restriction of not having a range for data downloads).

We don't need to change any rules to support this range, i.e. calling mapReadAsync again is a validation error because the buffer is already in the mapped state. Any relaxation of this rule could come later after MVP if/when we consider client-side buffer range tracking, we don't need to block on that.

@Kangz
Contributor Author

Kangz commented Mar 27, 2020

That could work, and getMappedRange would require that the range passed is included in the range from mapReadAsync (instead of the mapped range being rebased to 0..size).

await buffer.mapReadAsync(8, 8) // Map [8, 16)
buffer.getMappedRange(8, 8) // valid
buffer.getMappedRange(0, 8) // invalid
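That containment rule is a simple half-open-interval check, sketched here for illustration (parameter names are assumed):

```javascript
// Whether a getMappedRange request [offset, offset + size) falls entirely
// inside the range [mapOffset, mapOffset + mapSize) passed to mapReadAsync.
function rangeIsContained(mapOffset, mapSize, offset, size) {
  return offset >= mapOffset && offset + size <= mapOffset + mapSize;
}
```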

I'm very slightly in favor of mapReadAsync without arguments still but don't want to block the proposal on that.

@Kangz
Contributor Author

Kangz commented Apr 3, 2020

In last week's meeting, @jdashg had a concern that this proposal could be difficult for native game engines to adopt if it differs too much from the way they use buffer mapping, in particular being able to map some ranges of a buffer while using the rest as staging.

I looked at the most advanced open-source engine I know of, Godot, and if you look at the description of their buffer management, it seems to be doing exactly what this proposal allows for: use a buffer per frame, and if it is not enough create an additional buffer for this frame.

I can't point at source code or describe it in detail, but this shouldn't be too concerning for Unreal Engine either. I couldn't check for Unity.

Hopefully this resolves the concern about native game engines being able to use the form of buffer mapping proposed in this issue.

@litherum
Contributor

litherum commented Apr 7, 2020

Whether to add a range to mapReadAsync

The range would be helpful because it provides a way for applications to not have to transfer the entire contents of the buffer from the GPU Process to the Web Process. If it's difficult for applications to use, they can specify the entirety of the buffer. But, without this, it's impossible for applications to specify a smaller range if they do happen to know which range they want.

@Kangz
Contributor Author

Kangz commented Apr 7, 2020

Ok, let's have a range argument with the behavior that @kvark described:

We don't need to change any rules to support this range, i.e. calling mapReadAsync again is a validation error because the buffer is already in the mapped state. Any relaxation of this rule could come later after MVP if/when we consider client-side buffer range tracking, we don't need to block on that.

So the IDL would be the following:

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapReadAsync(unsigned long offset = 0, unsigned long size = 0);
  Promise<void> mapWriteAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

Kangz added a commit to Kangz/gpuweb that referenced this issue Apr 15, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
Kangz added a commit to Kangz/gpuweb that referenced this issue May 5, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
kainino0x pushed a commit that referenced this issue May 7, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
@kainino0x kainino0x added this to Needs Investigation/Proposal or Revision in Main May 7, 2020
@kainino0x
Contributor

Merged but will need a small tweak (split read/write or add a flag for read/write). Once that PR is up, this goes back to Discussion, then to Testing.

JusSn pushed a commit to JusSn/gpuweb that referenced this issue Jun 8, 2020
…b#708)

This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
@Kangz
Contributor Author

Kangz commented Sep 2, 2021

Buffer mapping is in the spec now. Closing.

@Kangz Kangz closed this as completed Sep 2, 2021
@kainino0x kainino0x moved this from Needs Investigation/Proposal or Revision to Specification Done in Main Jan 19, 2022
@litherum litherum added this to the V1.0 milestone Mar 7, 2022
@litherum
Contributor

litherum commented Mar 7, 2022

Reopening because the spec still references this issue. Search the spec for "allowed buffer usages".

@litherum litherum reopened this Mar 7, 2022
@kainino0x
Contributor

These 4 inline issues are just editorial things for stuff not fully specified in the spec yet, that link back here as reference for the design. I don't think that means this issue has to be open? We have plenty of inline editorial issues in the spec that don't link to an open issue at all.

@Kangz
Contributor Author

Kangz commented Apr 26, 2022

I think we should close again.
