Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPUBuffer.mapAsync is overly pessimistic #2646

Closed
litherum opened this issue Mar 7, 2022 · 15 comments · Fixed by #3224 or #3338
Closed

GPUBuffer.mapAsync is overly pessimistic #2646

litherum opened this issue Mar 7, 2022 · 15 comments · Fixed by #3224 or #3338
Labels
copyediting Pure editorial stuff (copyediting, bikeshed, etc.)
Projects
Milestone

Comments

@litherum
Copy link
Contributor

litherum commented Mar 7, 2022

The spec says:

Let p be a new Promise.
...
Enqueue an operation on the default queue’s Queue timeline that will execute the following:
...
Resolve p.

This is a bit unfortunate; it means that mapping is blocked on queue commands which are entirely unrelated to the buffer being mapped. Consider a program like this:

let bufferA = device.createBuffer(...);
let bufferB = device.createBuffer(...);

let commandEncoder1 = device.createCommandEncoder();
commandEncoder1.copyBufferToBuffer(bufferA, 0, someOtherBuffer, 0, 4); // Super small operation. Doesn't touch bufferB
let commandBuffer1 = commandEncoder1.finish();

let commandEncoder2 = device.createCommandEncoder();
populateCommandEncoderWithSuperIntensiveOperationsWith(bufferB); // Doesn't touch bufferA
let commandBuffer2 = commandEncoder1.finish();

device.queue.submit([commandBuffer1, commandBuffer2]);

bufferA.mapAsync(...).then(...); // Requires waiting for the super intensive operation to finish, despite bufferA being idle much earlier than that

When you use a bind group in a render/compute pass, we already have to check the state of all the buffers in that bind group to make sure they're not mapped, so it would seem pretty cheap for each buffer to have a "in-flight count" counter inside it. The count would represent the number of command buffers which have been submitted and not yet finished that reference this buffer. Each command buffer would have to remember all the resources it references, so they can have their counts decremented when the command buffer finishes. Then, the map promise could resolve when the in-flight count for that buffer reaches 0, rather than having to wait for all work on the queue to complete like the spec says today.

@litherum
Copy link
Contributor Author

litherum commented Mar 7, 2022

Imagine the above program was slightly modified to replace:

bufferA.mapAsync(...).then(...);

with

bufferB.mapAsync(...).then(...);
bufferA.mapAsync(...).then(...);

This "in-flight count" proposal would make it possible for bufferA's promise to resolve before bufferB, despite mapAsync() being called on bufferB before bufferA. I think this is a feature, though, rather than being a bug. If the author actually wants to wait for all work on the queue to finish, they can GPUQueue.onSubmittedWorkDone().

@litherum
Copy link
Contributor Author

litherum commented Mar 7, 2022

This probably has to be discussed for V1 because changing the order of promises could break production code.

@litherum litherum added this to the V1.0 milestone Mar 7, 2022
@kainino0x
Copy link
Contributor

Thanks for filing this, it's something I've thought about from time to time but never dug to see what the spec actually says. I don't know whether we should allow mapping to complete early or not, though in principle it seems great since it would let mapAsync be used as a special kind of fence, different from onSubmittedWorkDone, a "map as soon as the last thing using this buffer is done, I don't care about stuff that got enqueued afterward".
Very related to #962, and #2119 (comment).

@kainino0x
Copy link
Contributor

Though in many cases, it should be possible for apps to issue mapAsync right after they're done using the buffer, so maybe the more straightforward scheduling model is worthwhile. Definitely a tradeoff.

@Kangz
Copy link
Contributor

Kangz commented Mar 16, 2022

Yeah the potential extra latency is unfortunate. Though with graphical applications using RAF there's already some latency due to frame pipelining so an extra bit of latency might not show. And for more expensive compute operations developers would split submissions right after the expensive operation (because all they are doing is the expensive operation). So I don't know how much of a problem it is in practice.

A premise of doing fine-grained mapping promise resolution like this is that it is cheap to track for a buffer whether it is in flight or not. Submission resource tracking is one of the most expensive things in the API, and iterating on all buffers more, or additional, or additional branches would show. Maybe there's a way to make it low overhead (but more complex for the implementation, I'm not sure).

But also making the spec allow fine grained resolution would mean that the order of promise resolution could depend on the browser, making the API less portable.

We can make the spec do things either way, or even start with what we have now and relax later as a non-breaking change (except for the ordering of promise resolution). But there's a clear tradeoff to be aware of.

@Kangz
Copy link
Contributor

Kangz commented Apr 14, 2022

As discussed in yesterday's meeting, I will detail my claim that we can't at the same time make the buffer mapping less pessimistic and the order of promises defined, in an architecture like WebKit's or Chromium's.

My understanding is that Chromium's architecture and WebKit's architecture for asynchronous calls is the same:

  • On the content side a call like mapAsync always sends a message to the GPU process "map async was requested"
  • The promise is pending until a message from the GPU process side is received with "here is the result of your map async".
  • Some special cases apply, like unmapping a buffer early, or the connection to the GPU process has been lost.

My understanding of what @litherum suggested for promise ordering in the meeting is that promises still resole in "queue order" with the point in the queue considered for buffer mapping being "the end of the last use of the buffer in commands submitted prior to the mapAsync".

It is not possible to have both at the same time (without extensive content-side state tracking of mappable buffers used commands / bindgroups (with UMA ext) / etc, which we very much want to not do in Chromium). The reason is that there is a data race between the content process and the GPU process. Consider the following code:

const mappableBuffer;
const commands = createCommandsThatUse(mappableBuffer);
queue.submit([commands]);

queue.onSubmittedWorkDone().then(() => console.log("work done"));

// Assuming no operations on the queue are done here.
await something();

mappedBuffer.mapAsync(...).then(() => console.log("mapping done"))

Depending on the order of execution, both ordering of console logs can happen. If by the time the GPU process receives the mapAsync call it hasn't sent the callback for onSubmittedWorkDone then the ordering is "work done" "mapping done". However if by the time it gets the mapAsync call it already set the onSubmittedWorkDone (and for example the renderer process already consumed it), then the ordering may be "mapping done" "work done".

Like you said, it's better to make the ordering of promises well-specified so that JS code is more portable. The improved version here would need to lose that property, and the gains we expect are things applications can already do themselves (by mapping right after submit). That's why I think the current spec is what we should keep. But if you're less concerned about promise ordering and really want to change this, we could lift that restriction, and it would be a non-breaking change to add strict ordering in the future. Either way seems ok. (as long as we don't need to do extensive content-side state tracking).

@Kangz

This comment was marked as resolved.

@Kangz Kangz modified the milestones: V1.0, Polish post-V1 Apr 25, 2022
@litherum

This comment was marked as resolved.

@Kangz Kangz modified the milestones: Polish post-V1, V1.0 Apr 25, 2022
@Kangz

This comment was marked as resolved.

@kainino0x
Copy link
Contributor

I read the meeting minutes for April 20th and I'm a bit confused on the state of the issue - sounds like Myles said WebKit wouldn't have the ordering problem Corentin described, but I don't understand why.

I'm pretty wary about introducing undefined promise ordering, and I am pretty sure I agree with Corentin's example that we can't fully de-pessimize without either undefined ordering or a lot of client-side tracking.

@litherum
Copy link
Contributor Author

litherum commented May 3, 2022

I believe the conclusion from the last meeting is that ordering between onSubmittedWorkDone() and mapAsync() is less important than mapAsync() completing as soon as it’s able to.

@kainino0x
Copy link
Contributor

OK, as long as we're clear that we'll have to have undefined ordering.

I think it's fine. I'd love to have full ordering but we can find a solution later to better-define the ordering if we start seeing compatibility issues between browsers in the real world.

@Kangz
Copy link
Contributor

Kangz commented May 17, 2022

Moving to editorial since we have consensus to relax the constraint on ordering for mapAsync.

@Kangz Kangz added the copyediting Pure editorial stuff (copyediting, bikeshed, etc.) label May 17, 2022
@kainino0x kainino0x moved this from Needs Discussion to Needs Specification in Main May 19, 2022
@kainino0x
Copy link
Contributor

kainino0x commented Jul 21, 2022

A note for when we're doing the editorial work: I'd like to have a non-normative note that says UAs should consider issuing a dev tools warning if you (successfully) try to getMappedRange() without having awaited (attached a .then to) the return value of mapAsync().

@kainino0x
Copy link
Contributor

On second thought I think this needs a prose description (not just the algorithm details), including explaining the ordering with onSubmittedWorkDone. (I believe we agreed that onSubmittedWorkDone resolution guarantees the completion of maps issued before it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
copyediting Pure editorial stuff (copyediting, bikeshed, etc.)
Projects
No open projects
Main
Needs Specification
3 participants