
MapAsync with subranges. #605

Closed
Kangz opened this issue Mar 12, 2020 · 20 comments
@Kangz
Contributor

Kangz commented Mar 12, 2020

YADUP (Yet Another Data Upload Proposal)

Last meeting there seemed to be appetite for asynchronous mapping that would allow requesting subranges but the group wanted to see a more fleshed out proposal.

This version of data upload is very similar to what we have in the spec today with mapWriteAsync and mapReadAsync, but the resolution of the mapping promise doesn't give an ArrayBuffer. Instead the ArrayBuffer is stored in an internal slot of the GPUBuffer, and a GPUBuffer.getMappedRange method allows getting subranges of that internal ArrayBuffer.

This is close to @kainino0x's old GPUMappedMemory idea.

Proposal

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

Calling GPUBuffer.mapAsync is an error if the buffer is not valid or if it is not in the "unmapped" state (which also means it is not destroyed). Upon error, mapAsync returns a promise that will reject. Upon success, mapAsync puts the buffer in the "mapping" state and returns a promise that, when it resolves, will put the buffer in the "mapped" state.

Calling GPUBuffer.getMappedRange when the buffer is not in the "mapped" state returns null. When called in the "mapped" state, it returns a new ArrayBuffer that is a view into the content of the buffer over the half-open range [offset, offset + size) (obviously with a JS exception on a bad range). offset and size default to 0, and a size of 0 means the remaining size of the buffer after offset, so buffer.getMappedRange() returns the whole buffer.
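A minimal sketch of that defaulting and bounds-checking behavior, assuming a bufferSize known to the implementation (illustration logic, not proposed API surface):

```javascript
// Sketch of getMappedRange's offset/size defaulting and range check.
// `bufferSize` stands in for the mapped buffer's total size.
function resolveMappedRange(bufferSize, offset = 0, size = 0) {
  // size of 0 means "the remaining size of the buffer after offset"
  const actualSize = size === 0 ? bufferSize - offset : size;
  if (offset > bufferSize || actualSize < 0 || offset + actualSize > bufferSize) {
    throw new RangeError("getMappedRange out of bounds");
  }
  return { offset, size: actualSize }; // half-open range [offset, offset + size)
}
```

So `resolveMappedRange(256)` covers the whole 256-byte buffer, while `resolveMappedRange(256, 16, 32)` covers bytes [16, 48).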

Calling GPUBuffer.unmap is an error if the buffer is not valid or if it is in the "unmapped" state. On success:

  • if the buffer is in the "mapping" state, the promise is rejected and the buffer is put in the "unmapped" state
  • if the buffer is in the "mapped" state, all ArrayBuffers returned by GPUBuffer.getMappedRange() are detached and the buffer is put in the "unmapped" state

Note that modifications to the content of an ArrayBuffer returned by getMappedRange are semantically modifications of the content of the buffer itself.

Calling GPUDevice.createBuffer with descriptor.mappedAtCreation can be done even if descriptor.usage doesn't contain the MAP_READ or MAP_WRITE flags. If mappedAtCreation is true, the buffer is created in the "mapped" state and its content can be modified before unmap() and other uses such as queue.submit().
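The lifecycle described above can be modeled as a small plain-JavaScript state machine. This is a mock for explanation only, not the real API; the MockBuffer name and the zero-length ArrayBuffer stand in for the real objects:

```javascript
// Mock of the proposed GPUBuffer mapping states:
// "unmapped" -> "mapping" -> "mapped" -> (unmap) -> "unmapped"
class MockBuffer {
  constructor({ mappedAtCreation = false } = {}) {
    // mappedAtCreation skips straight to the "mapped" state
    this.state = mappedAtCreation ? "mapped" : "unmapped";
  }
  mapAsync() {
    if (this.state !== "unmapped") {
      return Promise.reject(new Error("validation error: buffer not unmapped"));
    }
    this.state = "mapping";
    return Promise.resolve().then(() => { this.state = "mapped"; });
  }
  getMappedRange() {
    // returns null outside the "mapped" state, per the proposal
    return this.state === "mapped" ? new ArrayBuffer(0) : null;
  }
  unmap() {
    if (this.state === "unmapped") throw new Error("validation error: already unmapped");
    // the real API would also detach all outstanding ArrayBuffers here
    this.state = "unmapped";
  }
}
```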

As usual, other uses of GPUBuffer, like in a GPUQueue.submit(), would validate that the buffer is in the "unmapped" state. And similar to other proposals, there would be restrictions on the usages that can be combined with MAP_READ and MAP_WRITE. Contrary to other proposals, MAP_READ and MAP_WRITE could be set at the same time, and I suggest the following rules:

  • If MAP_WRITE is present, COPY_SRC is allowed.
  • If MAP_READ is present, COPY_DST is allowed.
  • If MAP_READ and MAP_WRITE are present, then both COPY_SRC and COPY_DST are allowed.
  • (example for a UMA feature) if the adapter is UMA, then if MAP_WRITE is present, then VERTEX and UNIFORM are also allowed.
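The first three rules can be sketched as a bitflag check. MAP_READ and MAP_WRITE use the values from the IDL above; the COPY_SRC and COPY_DST values here are assumed for illustration:

```javascript
const MAP_READ  = 0x0001; // from the IDL above
const MAP_WRITE = 0x0002; // from the IDL above
const COPY_SRC  = 0x0004; // assumed value, for illustration
const COPY_DST  = 0x0008; // assumed value, for illustration

// Returns whether a usage combination satisfies the MAP_* rules above
// (ignoring the UMA extension rule).
function isUsageAllowed(usage) {
  // buffers without map flags are not constrained by these rules
  if (!(usage & (MAP_READ | MAP_WRITE))) return true;
  let allowed = 0;
  if (usage & MAP_WRITE) allowed |= MAP_WRITE | COPY_SRC;
  if (usage & MAP_READ)  allowed |= MAP_READ | COPY_DST;
  // every set bit must be in the allowed set
  return (usage & ~allowed) === 0;
}
```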

This mapping mechanism would live side-by-side with a writeToBuffer path.

There are also threading constraints: all calls to getMappedRange and unmap() must happen on the same worker so the ArrayBuffers can be detached.

Alternative choices

A single mapAsync is present instead of mapWriteAsync and mapReadAsync. The proposal talks about the ArrayBuffer being the content of the GPUBuffer directly, so it was a bit weird to have
two map functions. The downside is that if the implementation can't wrap shmem in a GPU resource:

  • either a copy will have to take place on unmap() even for MAP_READ buffers to update the content with writes the application did in the ArrayBuffer
  • or range-tracking needs to happen for MAP_READ buffers so the implementation knows what to overwrite

It could be possible to not return a promise from mapAsync and instead make the GPUBuffer itself act like a promise with a .then method and maybe a synchronous "state" member.

The assumption is that multi-process browsers will allocate one large shmem corresponding to the whole size of mapped buffers, so multiple ArrayBuffers could look at the same memory and overlap. If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.
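For reference, the disjointness check that getMappedRange would have to perform under that alternative could look like the following sketch (illustration only; the range-bookkeeping shape is assumed):

```javascript
// Under the "disjoint ranges" alternative, each getMappedRange call would be
// checked against the ranges already handed out since the last unmap().
// `existing` is an array of { offset, size } half-open intervals.
function isDisjoint(existing, offset, size) {
  // [offset, offset + size) must not intersect any existing interval
  return existing.every(r =>
    offset + size <= r.offset || r.offset + r.size <= offset);
}
```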

@kvark kvark added the proposal label Mar 12, 2020
@kvark
Contributor

kvark commented Mar 12, 2020

Thank you for YADUP!

I like how it both allows sub-ranges to be mapped and elegantly replaces createBufferMapped.

I don't think this replaces the writeTo* functions, however, just to be clear.

If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.

If we don't have a continuous persistent allocation, we'd have to either be copying data from the GPU buffer, or zeroing it, and that changes the performance characteristics of this API. Is there really a choice here? I thought that the proposal addresses the problem of mem-zeroing in #594 , but that "if we don't want" puts that fix in question.

Also, since we know about buffer state on the client side, is there a reason not to expose it to the user (e.g. as a read-only property of the GPUBuffer)?

@JusSn
Contributor

JusSn commented Mar 13, 2020

Upon success mapAsync puts the buffer in the "mapping" state and returns a promise that when it resolves, will put the buffer in the "mapped" state.

Should your sample IDL function return Promise<void> or Promise<GPUBuffer> instead of async?

@kvark:

This mapping mechanism would live side-by-side with a writeToBuffer path.

@Kangz
Contributor Author

Kangz commented Mar 13, 2020

@kvark

If we don't want to force one large continuous allocation, getMappedRange could enforce that the ranges are all disjoint between calls to unmap.

If we don't have a continuous persistent allocation, we'd have to either be copying data from the GPU buffer, or zeroing it, and that changes the performance characteristics of this API. Is there really a choice here? I thought that the proposal addresses the problem of mem-zeroing in #594 , but that "if we don't want" puts that fix in question.

That's a good point and a good reason to stick with allowing overlapping subranges.

Also, since we know about buffer state on the client side, is there a reason not to expose it to the user (e.g. as a read-only property of the GPUBuffer)

It's part of another of the choices that make the buffer itself the promise object, I wouldn't mind adding it so it's a bit easier for folks to debug and learn about buffer mapping:

It could be possible to not return a promise from mapAsync and instead make the GPUBuffer itself act like a promise with a .then method and maybe a synchronous "state" member.

@JusSn

Should your sample IDL function return Promise<void> or Promise<GPUBuffer> instead of async?

I thought the async keyword meant the result was a promise, but I changed it to match the spec better.

@kainino0x
Contributor

WebIDL doesn't have async method syntax, only async iterator.

@kvark
Contributor

kvark commented Mar 16, 2020

Echoing my feedback from the call: this change at first seems like an improvement over the current mapReadAsync and mapWriteAsync. However, it comes with caveats:

  • getMappedRange doesn't make sense for read-backs. The original map call needs a range in it, and that's the ArrayBuffer we can return in a promise (close to the current mapReadAsync)
  • it assumes there is a persistent staging area associated with a buffer, so that we can give the user the initial contents exactly in the way they left it last time the data was touched
  • overall, given the increased complexity of this API (compared to mapWriteAsync), I feel that the fine-grained ability to upload small chunks of data is not useful, given the presence of writeTo*.

Things to improve in this proposal:

  1. separate read call from write call, add ranges
  2. clarify the expectation of the implementation to keep the shmem around

@kdashg
Contributor

kdashg commented Mar 17, 2020

I would prefer keeping the read and write calls as one. I think that makes for a better API, and that we should instead work to satisfy the use case of a partial map.

@kvark
Contributor

kvark commented Mar 17, 2020

@jdashg but this proposal does not offer the use case of a partial map. I'm suggesting ways to improve this proposal for what it's trying to do. What you are talking about is a different proposal that has client-side buffer tracking, and it would need to be considered independently, even if built upon this one.

@kdashg
Contributor

kdashg commented Mar 18, 2020

We're both suggesting modifications to this proposal to make it better in our own opinions.

@Kangz Kangz closed this as completed Mar 19, 2020
@Kangz Kangz reopened this Mar 19, 2020
@Kangz
Contributor Author

Kangz commented Mar 19, 2020

Feedback I heard is that we should split the read and write calls so the read path can be simplified and take an additional range argument. After spending more time on it, and after offline discussions with @kvark (which considered alternatives like GPUQueue.readBufferBack), I think the original proposal shouldn't be modified, apart from maybe splitting the mapAsync call into mapReadAsync and mapWriteAsync and disallowing MAP_READ | MAP_WRITE.

The rationale for keeping a range-less mapReadAsync is that in general only a small amount of data is read back compared to uploads, so users can just create a buffer when they need to read back. The most user-friendly version would be a GPUQueue.readBufferBack(GPUBuffer, offset, size) -> Promise<ArrayBufferWithExplicitDetach>, and that can be polyfilled trivially the following way:

GPUQueue.prototype.readBufferBack = async function(buffer, offset, size) {
    // assumes `device`, the GPUDevice that owns this queue, is in scope
    const readbackBuffer = device.createBuffer({size, usage: MAP_READ | COPY_DST});
    const encoder = device.createCommandEncoder();
    encoder.copyBufferToBuffer(buffer, offset, readbackBuffer, 0, size);
    this.submit([encoder.finish()]);

    await readbackBuffer.mapReadAsync();
    const content = readbackBuffer.getMappedRange();
    content.detach = function() {
        readbackBuffer.destroy(); // has implicit unmap
    };
    return content;
};

So the IDL would be the following:

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapReadAsync();
  Promise<void> mapWriteAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

and the behavior would be the same as in the original proposal (adapting it to the split mapReadAsync and mapWriteAsync calls is left as an exercise to the reader), with the clarifications that it is ok to get overlapping ranges, that getMappedRange returns the content of the buffer (which requires a persistent shmem allocation), and that MAP_WRITE | MAP_READ is disallowed.

Like the original proposal, this one assumes writeToBuffer is in WebGPU.

@Kangz
Contributor Author

Kangz commented Mar 25, 2020

@litherum @kvark @jdashg I'd like to make progress on this issue outside of the meetings; it seems that the only discussions left are:

  • Whether to merge the two map calls in a single mapAsync call (I'm slightly in favor of not doing this because I think the intent is clearer with two separate methods).
  • Whether to add a range to mapReadAsync (I'm in favor of not doing it because I don't think it's useful, see the discussion in the previous comment). If you suggest doing this, please also describe what happens if multiple calls to mapReadAsync are made before unmap, what happens if the ranges overlap, and the constraints on getMappedRange.

@kvark
Contributor

kvark commented Mar 25, 2020

Question (2) needs to be answered first, because if we do specify the range, then the calls are clearly different and (1) is no longer a question.

I do think a range is useful for mapReadAsync. It's trivial from our (implementor) side and allows users to potentially avoid messing with temporary buffers and doing copies. This could be rare to see, but otherwise it would also be an arbitrary restriction we put in for no reason (i.e. the restriction of not having a range for data downloads).

We don't need to change any rules to support this range, i.e. calling mapReadAsync again is a validation error because the buffer is already in the mapped state. Any relaxation of this rule could come later after MVP if/when we consider client-side buffer range tracking, we don't need to block on that.

@Kangz
Contributor Author

Kangz commented Mar 27, 2020

That could work, and getMappedRange would require that the range passed is included in the range from mapReadAsync (instead of the mapped range being rebased to 0..size).

await buffer.mapReadAsync(8, 8) // Map [8, 16)
buffer.getMappedRange(8, 8) // valid
buffer.getMappedRange(0, 8) // invalid
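That containment rule is a simple half-open-interval check, sketched here for illustration (parameter names are assumed):

```javascript
// Whether a getMappedRange request [offset, offset + size) falls entirely
// inside the range [mapOffset, mapOffset + mapSize) passed to mapReadAsync.
function rangeIsContained(mapOffset, mapSize, offset, size) {
  return offset >= mapOffset && offset + size <= mapOffset + mapSize;
}
```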

I'm very slightly in favor of mapReadAsync without arguments still but don't want to block the proposal on that.

@Kangz
Contributor Author

Kangz commented Apr 3, 2020

In last week's meeting, @jdashg had a concern that this proposal could be difficult for native game engines to adopt if it differs too much from the way they use buffer mapping, in particular being able to map some ranges of a buffer while using the rest as staging.

I looked at the most advanced open-source engine I know of, Godot, and if you look at the description of their buffer management, it seems to be doing exactly what this proposal allows for: use a buffer per frame, and if it is not enough create an additional buffer for this frame.

I can't point at source code or describe it in detail, but this shouldn't be too concerning for Unreal Engine either. I couldn't check for Unity.

Hopefully this resolves the concern about native game engines being able to use the form of buffer mapping proposed in this issue.

@litherum
Contributor

litherum commented Apr 7, 2020

Whether to add a range to mapReadAsync

The range would be helpful because it provides a way for applications to not have to transfer the entire contents of the buffer from the GPU Process to the Web Process. If it's difficult for applications to use, they can specify the entirety of the buffer. But, without this, it's impossible for applications to specify a smaller range if they do happen to know which range they want.

@Kangz
Contributor Author

Kangz commented Apr 7, 2020

Ok, let's have a range argument with the behavior that @kvark described:

We don't need to change any rules to support this range, i.e. calling mapReadAsync again is a validation error because the buffer is already in the mapped state. Any relaxation of this rule could come later after MVP if/when we consider client-side buffer range tracking, we don't need to block on that.

So the IDL would be the following:

partial interface GPUBufferUsage {
    const GPUBufferUsageFlags MAP_READ  = 0x0001;
    const GPUBufferUsageFlags MAP_WRITE = 0x0002;
};

partial interface GPUBuffer {
  Promise<void> mapReadAsync(unsigned long offset = 0, unsigned long size = 0);
  Promise<void> mapWriteAsync();
  ArrayBuffer getMappedRange(unsigned long offset = 0, unsigned long size = 0);
  void unmap();
};

partial dictionary GPUBufferDescriptor {
  boolean mappedAtCreation = false;
};

Kangz added a commit to Kangz/gpuweb that referenced this issue Apr 15, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
Kangz added a commit to Kangz/gpuweb that referenced this issue May 5, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
kainino0x pushed a commit that referenced this issue May 7, 2020
This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
@kainino0x kainino0x added this to Needs Investigation/Proposal or Revision in Main May 7, 2020
@kainino0x
Contributor

Merged but will need a small tweak (split read/write or add a flag for read/write). Once that PR is up, this goes back to Discussion, then to Testing.

JusSn pushed a commit to JusSn/gpuweb that referenced this issue Jun 8, 2020
…b#708)

This also does a number of cleanups to match the style of other WebGPU
functions with the valid usage section.
@Kangz
Contributor Author

Kangz commented Sep 2, 2021

Buffer mapping is in the spec now. Closing.

@Kangz Kangz closed this as completed Sep 2, 2021
@kainino0x kainino0x moved this from Needs Investigation/Proposal or Revision to Specification Done in Main Jan 19, 2022
@litherum litherum added this to the V1.0 milestone Mar 7, 2022
@litherum
Contributor

litherum commented Mar 7, 2022

Reopening because the spec still references this issue. Search the spec for "allowed buffer usages".

@litherum litherum reopened this Mar 7, 2022
@kainino0x
Contributor

These 4 inline issues are just editorial things for stuff not fully specified in the spec yet, that link back here as reference for the design. I don't think that means this issue has to be open? We have plenty of inline editorial issues in the spec that don't link to an open issue at all.

@Kangz
Contributor Author

Kangz commented Apr 26, 2022

I think we should close again.
