Proposals for buffer operations (immediate uploads, buffer mapping) #138
Proposal 1
This is bad: you've just destroyed the main advantage of Vulkan and of OpenGL with ARB_buffer_storage (core in 4.4+) over the older APIs. You need to be able to use persistently mapped buffers as a developer.
Why? The buffer is the app's own buffer, what's the security concern here?

Proposal 2
The problem with this one is that it's unimplementable on Vulkan. The reason is that it's not the buffers that get mapped, it's the bound memory. Also only one subrange in a memory object can be mapped; you cannot map two subranges (even non-intersecting ones) like you can in DirectX. So if you do not know ahead of time what ranges you are going to be mapping and reading+writing, then you'd need to unmap and map a bigger range every single time your read/write APIs request a range which is outside of the one mapped by Vulkan. The problem with this is that it would require an implicit synchronisation that would have to wait for all previous uses of mapped memory to finish, then unmap, map a bigger range and then continue with the reading/writing.
I kind of like this approach, but dislike the design which allows for simultaneous asynchronous mapping of the buffer.

Immediate Upload 1

Making a synchronous buffer write operation is silly: you will have to wait for one or more (usually 3) frames to complete before the buffer becomes available (not used by the GPU) for writing, and you will starve the GPU of work; it will idle while you're manipulating the buffer.

Immediate Upload 2

I'm not a fan of API-staged uploads in the style of glBufferSubData, simply because they are always an order of magnitude slower than persistent mapping, involve an extra copy (so you can modify the input memory argument after the call) and present an implementation challenge (you need to stage and buffer up the uploads), especially for large buffers. This is why Vulkan only allows direct updates of 64 KB or less (so the ring buffer used for upload doesn't grow in an unwieldy manner). Secondly, how are you going to do this outside of a command buffer?

General Problems
We already have an implementation of that in our engine: a fence is placed (multi-queue → many fences are placed) after the last use of the buffer, then this fence is paired with a functor that deletes the object. The key to this approach is to try to coalesce the events (use one fence for many objects) and defer the checking until you are sure you have the time to do it, i.e. check the fences just before new objects are being created, but don't check every frame on swap or other operations.
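To make that concrete, here is a minimal sketch of the coalesced, deferred fence-checking scheme described above (all names are hypothetical; this is not tied to any real engine or graphics API):

```ts
// All names here are hypothetical; this mirrors the scheme described
// above, not any real engine or graphics API.
declare class Fence { isSignaled(): boolean; }

interface DeadBatch {
  fence: Fence;                 // one fence guards many dead objects
  destructors: (() => void)[];
}

const pendingBatches: DeadBatch[] = [];

// Coalesce: place a single fence after the last use of a whole batch.
function scheduleDeletion(fence: Fence, destructors: (() => void)[]): void {
  pendingBatches.push({ fence, destructors });
}

// Deferred check: call this just before creating new objects,
// not on every frame/swap.
function collectDead(): void {
  while (pendingBatches.length > 0 && pendingBatches[0].fence.isSignaled()) {
    const batch = pendingBatches.shift()!;
    for (const destroy of batch.destructors) destroy();
  }
}
```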
Yes. You can only place around 200k fences per second in OpenGL (my own test, on a laptop Intel i7); most of your approaches would require one fence per buffer-update operation, so you'd quickly run out of CPU time just on the graphics-driver part. And this is another reason why you want to support large mapped ranges, persistently mapped buffers and possibly zero-copy approaches. Technically speaking, for any buffer that does not have the INDEX_BUFFER or INDIRECT_DRAW usage hint, you could allow the user to write directly to GPU memory with no intermediate copies.
This seems to be a weird exception to your "no data races" rule: somehow you allow CPU-to-CPU core races, but when it's GPU-to-GPU or CPU-to-GPU you say "noo, GPUs are scary". While, yes, mapped ranges can be represented by any JS object you like that makes the GC happy, you need to issue the native map() operation only once, from only one thread, on a non-overlapping range, in both Vulkan and DX12.
Thanks for your comments, before the detailed answer, here's some context:
Mapping proposal 1
Persistently mapped buffers are not implementable in the general case for multi-process implementations, so we cannot expose them at the API level. Single-process WebGPU implementations can (and are likely to) use persistently mapped buffers internally.
In multi-process implementations, mapWriteSync could just return a pointer to shared memory between the two processes. For security reasons this memory will be cleared to 0, and for consistency, the same should happen on single-process implementations.
PCIe bandwidth is huge (X GB/s) so I'm not too concerned. If this becomes a real issue, we could be clearing with compute shaders or find other solutions.

Mapping proposal 2
When a buffer is mapped on single-process implementations using Vulkan (which might actually be none of them), the whole memory can be mapped persistently but only a partial view of it given to Javascript. The browser will have the opportunity to do data races, but will provide Javascript only with views that are data-race free.
That's an interesting thought: we could have a fast path saying that if the buffer has never been used, the mapping is instantaneous.

Immediate upload proposal 1
The name is misleading: the operation would be instantaneous, it just doesn't give you a pointer to the real buffer but to staging memory instead. It's just like

Immediate upload proposal 2
Agreed, though I feel such an API is important for ease of use when starting a project (or for beginners) and when adding random debugging code in places. It's just "put this data on the GPU even if it is a bit slow".
It's also because NVIDIA hardware (and probably others) have a fast path to inline the content of a buffer update inside the command stream itself.
It is as if it were in a command buffer and submitted immediately (though in practice I expect implementations to record the commands and wait until the next application submit). There are no data races.

General Problems
This isn't what I meant, our implementation already has what you described including "fence coalescing". This is about Javascript GC and what happens on the application's side when the buffer is GCd. In general GC shouldn't be visible to Javascript and we need to make sure WebGPU doesn't provide "GC discoverability".
Likewise this is about Javascript GC, and implementations are expected to be optimized using "fence coalescing" or other mechanisms. Ours is already.
Even index and indirect buffers could skip validation with backend API support for robust buffer access. We know persistently mapped buffers are something native developers want, but our constraints mean we can't provide them directly. We are striving to provide the same usability and performance given our constraints, but we can't just hand out pointers to GPU-visible memory with no checks.
I don't understand what your point is.
I will address all your points soon. But can you fill me in on:
What exactly is the problem here?
Imagine the GPU driver is in process A and the application in process B. In all APIs, except maybe Vulkan with difficult-to-use extensions, mapping a buffer in A will give you a pointer that's only valid in A. It isn't possible to transfer the memory region to B. We could have a memory allocation in B that mirrors the mapped pointer. However, the point of persistently mapped buffers is that CPU and GPU accesses to the data happen concurrently. This is not possible because 1) we don't know when the GPU writes data that needs to be forwarded from A to B, 2) we don't know when the CPU writes data that needs to be forwarded from B to A. Also, races.
Why is this a function rather than:
?
This proposal was discussed at the 3 Dec 2018 Teleconference
I wouldn't call the memory import/export extensions difficult to use. Also, ekhm: So maybe Metal is lacking?
Update: it does indeed appear that Apple Metal is severely limited. However, macOS and iOS only make up <20% of the total browsing OS market share, while the other 70% is Windows, Android and Linux, all of which should support Vulkan (or D3D12) and hence sharing device/driver memory across process boundaries. So it would make sense for persistently mapped buffers to be a ubiquitous extension to webGPU until (if at all) Metal provides the same functionality or webGPU is a single-process implementation on macOS & iOS. You really don't want to drag the performance and usability down, without a way out for developers, just because of one ecosystem. Sure, the default can be "managed" buffers, but developers expect a fast path to be made available.
Well, in this case that settles it for WebGPU core. An extension could expose persistently mapped buffers, but for browsers to accept to implement it, I think it is safe to assume they'll require it to be deterministic. Note that in both buffer mapping proposals the buffers aren't "managed" in the Metal sense of having a staging copy and a GPU-local copy. Instead it is possible for browsers to implement them with persistently mapped buffers but give pointer access only during specific times.
I would like to note that RenderDoc provides just the functionality that you're claiming is impossible to provide with webGPU. RenderDoc is an intermediate library that intercepts all Vulkan/OpenGL/DirectX calls and hence hijacks any

There are still things I do not like about the proposals, such as:
In general, expecting a user to allocate 3x the memory they need and to fence their buffer subranges so as not to overstep their own in-flight data would alleviate most of your performance, usability and sanity issues.
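For reference, a rough sketch of that 3x-allocation scheme (hypothetical names; it assumes one persistently mapped allocation split into per-frame regions, with a fence placed after each region's last GPU use):

```ts
// Hypothetical sketch of the 3x scheme: one region per in-flight
// frame, each guarded by the fence placed after its last GPU use.
declare class Fence { isSignaled(): boolean; wait(): void; }
declare function placeFence(): Fence;

const FRAMES_IN_FLIGHT = 3;

class RingUploader {
  private cursor = 0;
  private fences = new Array<Fence | null>(FRAMES_IN_FLIGHT).fill(null);

  // `regions` would be 3 subranges of one persistently mapped buffer.
  constructor(private regions: Uint8Array[]) {}

  beginFrame(): Uint8Array {
    const f = this.fences[this.cursor];
    if (f && !f.isSignaled()) f.wait(); // only hit if we outrun the GPU
    return this.regions[this.cursor];   // safe to overwrite now
  }

  endFrame(): void {
    this.fences[this.cursor] = placeFence(); // after the last use
    this.cursor = (this.cursor + 1) % FRAMES_IN_FLIGHT;
  }
}
```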
I also read the minutes of the Dec 3 meeting and here are my 2 cents.
Both approaches are valid and complementary: we should be able to map and read/write (with appropriate invalidation/flush) with almost "immediate" results, as well as batch up buffer uploads in the command buffer, which are then queued.
These flags are completely not for that purpose. Basically the spec reserves the right for a driver implementation to hand you garbage (it's actually an error in Vulkan) if you read with the CPU from a buffer without MAP_READ, and to not actually make the CPU writes available and visible to the GPU if MAP_WRITE was not specified. I.e. if you create a non-coherent MAP_READ|MAP_WRITE buffer in Vulkan (there is actually no such implementation yet, everything has coherent heaps only) then you can map it with any of these flags, but to get valid data you must invalidate the mapped range before CPU reads and flush it after CPU writes.
There are so many other considerations that go into this; you should not assume that host-visible means host-local... there are plenty of GPU architectures (even discrete) and Vulkan drivers where host-visible mappable memory is actually device-local. WebGPU should adopt the Vulkan approach of exposing numbered driver memory heaps corresponding directly to memory-type flags (device-local, map for read, map for write).

This is something only a middleware engine such as Unity/Unreal should do; you should absolutely not move the buffers between heaps for the developer working directly with webGPU.

And now this completely destroys the point of command buffers and next-gen APIs' lower-CPU-usage promise.
I do not understand why you've neutered the webGPU queue system so far; even with a single queue in Vulkan, submits can execute out of order.
RenderDoc strongly suggests doing explicit flushes and invalidates because otherwise it has to essentially snapshot the state of mapped buffers on every queue submit, which is not acceptable for WebGPU.
Implementations are free to record a
Agreed that this is suboptimal if you only care about perf, but because we are making a Web API, we have to care about strong portability too. (Vulkan is not portable in the same sense as the Web platform is).
If the application wants that semantic they can implement it on top of what proposal 1 provides. To provide this at the platform level would require it to either be like Metal's "managed" buffers, or will have data races which are not acceptable.
It does because WebGPU has implicit synchronization.
You can implement that using proposal 1 except that instead of subranges of a single large buffer you have subranges of several buffers.
The two things you are describing are not the proposals. This is what the proposals are:
The part you quoted was talking about WebGPU's "MAP_READ" and "MAP_WRITE" flags.
I am aware, and that's why the quoted sentences say "device-local".
Exposing memory heaps leads to non-portability (and fingerprinting). Exposing heaps also means you need to expose
Think of it more like WDDM which migrates resources to CPU memory when there is memory pressure. We are not interested in optimizing behind the application's back like OpenGL and D3D11 drivers.
The good thing is that this is an implementation detail and not mandated by the spec.
Do you know of a single driver where that happens? I looked at all open-source drivers and they never do this. Driver engineers we talked to also confirmed that.
Metal uses this model and has good performance.
Yes, I know, so I was advocating providing persistently mapped buffers but without a COHERENT-like flag/behaviour.
It would require it on Vulkan as well if you placed no limits on the update size.
Please argue the benefit of this: why can't the app just see its own recycled, previously "mapped" staging arrays whenever possible?
The application developer wants that semantic for performance, not extra work.
Yes, I've collated the different pieces of data as well as the minutes; I also tackled that in my second/last post. The level of implicit synchronization is excessive. The difference between the different buffer update techniques can be as much as 4x!
Well, they're called the same in all APIs. I knew it meant webGPU flags, but the way they were described in the meeting gave them a different meaning from Vulkan's, OpenGL's and DirectX's mapping flags. You should also not make the flags mutually exclusive; it should be possible to create a read&write buffer and map it in that mode.
I can accept the fingerprinting and IPC synch argument, however...
The Vulkan spec explicitly leaves room for that; the fact that it's not taken advantage of yet is slightly orthogonal. Also, have the same engineers confirmed that render subpasses within a single command buffer in Vulkan cannot happen out of order if no dependencies are specified?
I think you're introducing even more barriers, implicit synch, etc. than even Metal has.
That would work if we force "MAP_WRITE" buffers to only have read-only usages. It would mean that multi-process browsers will have to keep the data in the tab's process, maybe in shared memory. This sounds pretty good.
Having managed buffers always requires at least one copy. The whole point of approach 1 is that it can give you zero-copy when in a single-process or with Vulkan and D3D12 cross-process mapping. Am I missing something?
GPU2GPU races in WebGPU are only through UAVs in a single dispatch or a single render-pass (and compute shader shared memory). That's only a small number of holes imho.
We can't assume that of developers currently using WebGL, who are an important target for WebGPU. Native developers get barriers and synchronization wrong too, and driver engineers told us they sometimes have to go to game companies and fix their code themselves.
I'm confused, the same would happen for Vulkan developers using
That's because of warp occupancy, not because of Vulkan allowing submits to overlap: a renderpass has to be fully contained in a command buffer (as in its beginning and end, but not necessarily its contents), so the parallelism we see in the image is not thanks to that part of the Vulkan spec.
WebGPU doesn't have multi-subpass renderpasses. I don't know if drivers take advantage of renderpasses being pre-compiled to optimize and allow parallelism.
This was discussed in the 10 Dec 2018 meeting
I will address all of your points soon @Kangz but for now let me deal with the most important ones.
Absolutely not,

EXTRA NOTE: Yes,

This is the complete opposite of what you are proposing, which would require a wait event in user space that is not deferrable.
These things are called "Render Graphs"; the folks at DICE did a lot of research into them and how to overlap the different parts of the rendering process, and ConfettiFX is following in their footsteps. This is the new hot topic in low-level graphics programming, and Vulkan with its subpasses and explicit resource dependencies actually already has all the necessary metadata to create such "Render Graphs", should driver implementations choose to go that extra mile in the near future.
Thank you for putting this together, @Kangz

Re: Promises and ".thenable": The problem with the proposed approach is that there are more methods on the promise object than just .then. We would need to implement all of them in order for the object to truly act like a promise to web developers. Instead, I suggest we have an attribute on

States: The document should clearly define the states of
Polling: The spec should be clear that results are delivered on Javascript task boundaries and do not change during a callback or Promise resolution. In other words, if you query

GC discoverability: As discussed during the call, this can be solved by having the
Questions I'd like the eventual PR to answer:
- If the web developer maps 5 bytes of data, does the array buffer returned by getPointer contain only 5 bytes, or the whole buffer?
- Are the calls to mapReadAsync and mapWriteAsync nestable? In other words, if I call mapReadAsync 5 times, do I need to call unmap 5 times in order for the buffer to transition from the mapped state to the pending state?
Thenable: FYI, the idea of making this Thenable was not to make it look exactly like a Promise; "thenable" is a concept from the spec (although it only barely mentions "thenable"; it was a more prominent concept before Promises were specced, afaik). That said, I don't think I have any issue with just returning a real Promise (although it is one more piece of garbage for the GC).
GC discoverability: (nit) All that really matters is that the ArrayBuffer not suddenly become detached (as if the buffer has been unmapped). So technically the ArrayBuffer only has to point at the WebGPUBuffer, not the WebGPUMappedMemory, I think.
I think it's going to be important to keep a GC-less path for most of the API currently marked with thenable, because this applies to some functions which will usually be called at high frequency. There was some mention of a spinloop use case for workers too. I'd still prefer to use callbacks in the IDL at this point unless we have a good idea whether a special thenable would be allowed.
I failed to mention this in the proposal but the
The ArrayBuffer will contain 5 bytes.
Calling
This follows discussions on gpuweb#138
I was just looking at this issue again and thought about this.
@kainino0x, what you suggest will also work.
I am confused by the behavior of mapWriteAsync. Suppose you create a buffer and use it to upload some data in one frame. 100 frames later you call mapWriteAsync on the buffer. Will the returned Should we change the API such that we do away with |
It will always be returned with
This is basically proposal 2 but then |
It's hard to keep up with this discussion... It just shows that we are reaching the limit of where the Github UI works, and we may need to consider different spaces for hot topics.

Zeroing out the data

It would be unfortunate to zero out the data (that is, for write-only mapping) on every map operation, especially since we don't have persistent mapping, and so we can expect mapping to be called more often. Can this be avoided by:
Persistent mapping

My understanding here is that a few popular use cases would be helpful for us to find the right solution. I've recently seen one such case: in Dota 2, there is a big persistently mapped UBO; the CPU writes down chunks of it and then binds it as a dynamic uniform buffer (providing the offsets) over and over to different draw calls (with advancing offsets). In this case, since the GPU doesn't mutate the data, the user can just map/fill/unmap the buffer a few times per frame, and we as a browser implementation can turn those map/unmap calls into simple flush/invalidate operations on a persistently mapped buffer. So it doesn't appear that exposing it in the API would be required to get that efficiency. @devshgraphicsprogramming I'm sure you have more cases under your belt. Let's talk about them and see if we are missing something important.

Overlapped submissions

I don't think we are preventing much of the parallelism within a single queue. The most controversial point is implicit synchronization between dispatches, but outside of that we aren't inserting any more barriers than the user would need for correctness anyway (or at least that's the idea). And for dispatches, it came down to providing strong test cases, so I guess the next step is to try to see how our API would match the cases provided by @devshgraphicsprogramming in #64 (comment) (thank you!)
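Going back to the Dota 2 pattern above, the per-chunk flow could look roughly like this under mapping proposal 1 (a hypothetical simplification in which mapWriteAsync is awaited as a promise of the mapped bytes):

```ts
// Sketch: per-frame UBO chunk upload. The browser could back `buffer`
// with one persistently mapped allocation and turn unmap into a flush.
declare class WebGPUBuffer {
  mapWriteAsync(offset: number, size: number): Promise<ArrayBuffer>;
  unmap(): void;
}

async function writeUniformChunk(
  buffer: WebGPUBuffer, offset: number, chunk: Float32Array,
): Promise<void> {
  const bytes = await buffer.mapWriteAsync(offset, chunk.byteLength);
  new Float32Array(bytes).set(chunk); // CPU writes the chunk
  buffer.unmap(); // flush/invalidate, not a copy, in this scheme
  // ...bind `buffer` at `offset` as a dynamic uniform buffer...
}
```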
@kvark it's not that it's impossible to implement most algorithms or engines with your buffer API, but it would be a royal pain in the ass to port existing D3D11/12, OpenGL or Vulkan engines and games to webGPU if they already use persistently mapped buffers exclusively and/or manage their own staging memory in Vulkan/D3D12. Lastly we have a big performance issue, because all of these "compromises" multiply like compound interest. I.e. as we notice in OpenGL Insights, a non-persistently mapped API-backed

Pseudo-mapping proposal:
Buffer-update proposal:
My main issue is that webGPU is taking on the task of what used to be the video driver's job back in the OpenGL and pre-D3D12 era, so the performance of my app's webGPU buffer updates (which are a cornerstone of all that a GPU app does) would largely hinge on the quality of the webGPU implementation. Past experience tells us that it was rare for every driver team to do the job well enough to satisfy all the use cases; most of the time a problem surfaced during the production of a new and popular game and got fixed after it reared its head in production. This would likely repeat some of my past grievances with Intel's OpenGL drivers in the form of grievances with browser XYZ's implementation of webGPU, because after all, can you guarantee that each major browser will throw as many people with enough expertise, and as much money, at the webGPU team as you would at an OpenGL driver implementation?
This would be extremely difficult to specify unless we say that

@devshgraphicsprogramming we agree that

Let's assume that, like you and @kvark suggested,

The biggest disadvantages are:
The advantages are:
That would be a serious understatement.
They did that to be much faster than the usual OpenGL (ES) and DirectX 11 methods; now they will have to be even slower than the old methods due to emulation.
You'd keep all of them, except the "guarantee of having no races", if you went for a non-cached approach with explicit flushes and invalidations (as well as beautiful RenderDoc integration).
We discussed this with @devshgraphicsprogramming a bit and came to the conclusion that just adding mapping ability to

On native, persistent mapping became the way of exchanging data because:
This doesn't match our current policy of having a resource consistently be in only a single mutable usage at a time (we can't map half of the buffer while the GPU works on the other half). Thus, providing a buffer API to the user and expecting them to map (portions of) it will never be efficient. Note that for textures it's still fine to have a single-mutable-usage policy, since it applies to subresources individually, textures don't need mapping, and it's fine to assume the user will only use each subresource as a whole. I think we should step back a bit and think of a solution in terms of what workflows need to be exposed: uploading and downloading chunks of data. Perhaps we can design an API in such a way that the implementation manages a persistently mapped buffer under the hood, but its subranges are exposed to the user as individual objects? Something like this:

```webidl
interface WebIDLUploadBuffer {
  attribute ArrayBufferView data;
};

partial interface WebIDLDevice {
  Promise<WebIDLUploadBuffer> createUploadBuffer(u32 size);
};
```
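For illustration, usage of this sketch might look like the following (hypothetical, matching only the WebIDL sketch above):

```ts
// Assumes the WebIDL sketch above: createUploadBuffer returns an object
// whose `data` view is backed by implementation-managed staging memory.
interface WebIDLUploadBuffer { data: ArrayBufferView; }
interface WebIDLDevice {
  createUploadBuffer(size: number): Promise<WebIDLUploadBuffer>;
}

async function stage(device: WebIDLDevice, payload: Uint8Array) {
  const upload = await device.createUploadBuffer(payload.byteLength);
  const view = new Uint8Array(
    upload.data.buffer, upload.data.byteOffset, upload.data.byteLength);
  view.set(payload); // write into the exposed subrange
  return upload;     // hand off to a copy/submit operation
}
```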
The above design is indeed better in my opinion.
I would still like to see the whole mapped range of the buffer as one object whose distinct ranges and subranges I can explicitly flush and invalidate. Think about it as a git push/pull (with some rather non-existent conflict resolution), except that local is the CPU and the remote is the GPU. That would fit nicely into existing engines that would like to port to webGPU, as many of these already have their own allocators for CPU and GPU memory.
I have some questions here as to whether this extra tracking does not destroy the whole point of native persistently mapped buffers. Also, what would
I thought about this briefly as well. It is probably already possible to some extent with the proposed API, but could be more powerful (e.g. with sub-range tracking, and a hint to use a persistently mapped buffer).
A non-coherent persistently mapped buffer is implementable and should be the default. Ever since the OpenGL AZDO days, persistently mapped buffers (coherent and non-coherent) have been advertised as best practice by all 3 major desktop GPU vendors. Modern engines are already tooled around persistently mapped buffers and have their own allocators, etc., which would have to be dumbed down and stripped of their performance for the currently proposed webGPU API.
Exposing large mapped buffers is problematic because it conflicts with the current synchronization policy of a resource having exclusive usage (i.e. the CPU can't write to one part while the GPU uses the other). Adding manual flush/invalidate on top of that would unfortunately break any portability guarantees, so we can't do this. As for the sketch I proposed, it would only work if we say that CPU-visible buffers can't be used for anything else. That would imply an extra copy (on GPU or DMA) into the actual private memory for anything that native engines currently try to use on the GPU while having it CPU-visible. As soon as we start thinking about parts of the native buffer being exposed as individual

So, for me, the discussion is blocked on the following questions:
For the latter, here is a rough description of what Valve's engine is doing:
That actually wouldn't be bad, it would be promoting good usage, since immutable, inaccessible buffers are the fastest in all benchmarks 😄
As long as relaxed synchronisation is used that actually allows DMA transfers to overlap with command-buffer graphics and compute executions, all is well.
There is a ton of manual synchronisation there in between, as these happen on 3 different timelines (the GPU queue to which you submitted the draw-call command buffer, some arbitrary timeline of device-scope flushes and invalidates, and the CPU timeline).
Note: that would only be the case if you use the same UBO data more than once or twice.
I did some more digging into

Which would make @kvark's idea a little bit limiting.
By consequence, if we are to believe @Kangz about current drivers and Vulkan implementations not yet taking advantage of "render graphs" or the possibility of out-of-order execution within a command buffer, persistently mapped buffers would be the only way to truly achieve asynchronous and fully overlapped data transfers (but the synchronisation must happen either before or after a renderpass instance). Also, all other buffer manipulation commands (inline update, fill, copy between buffers, etc.) can only take place outside of a renderpass.
This follows discussions on gpuweb#138
Closing. Buffer mapping was added a while ago and is fairly stable. Further discussion can go in new issues.
PTAL, this is basically #49 but as an investigation, and with additional alternatives.
Our thoughts on these proposals are the following:
Buffer operations

This describes `WebGPUBuffer` operations that are used by applications to interact directly with the content of the buffer's memory. The two primitives we need to support are the CPU writing data inside the buffer for use by the GPU (upload) and the CPU reading data produced by the GPU (readback).

Design constraints are:

Two alternative proposals are described for buffer mapping, `WebGPUMappedMemory` and whole-buffer mapping. Two other proposals are described for immediate data uploads that aren't mutually exclusive, one based on `mapWriteSync` of `WebGPUMappedMemory` and another using `setSubData`.

Buffer mapping proposal 1

`map[Write|Read]Async` and `unmap`
The way to have the minimal number of copies for upload and readback is to provide a buffer mapping mechanism. This mechanism has to be asynchronous to ensure the GPU is done using the buffer before the application can look into the ArrayBuffer. Otherwise, on implementations where the ArrayBuffer is directly a pointer to the buffer memory, data races between the CPU and the GPU could occur.

We want the status of a map operation to act as both a promise and something that's pollable, as there are advantages to both. `WebGPUMappedMemory` is an object that is `then`-able, meaning that it acts like a Javascript `Promise` but is pollable at the same time.

The mapping operations for `WebGPUBuffer` are:

These operations return new `WebGPUMappedMemory` objects representing the requested range of the buffer for writing or reading. The results are initialized in the "pending" state and transition at a Javascript task boundary to the "available" state when the implementation can determine the GPU is done using the buffer.

Calling `mapReadAsync` or `mapWriteAsync` puts the buffer in the mapped state. No operations are allowed on a buffer in that state except additional calls to `mapReadAsync` or `mapWriteAsync` and calls to `unmap`. In particular a mapped buffer cannot be used in a `WebGPUCommandBuffer` given to `WebGPUQueue.submit`.

The following must be true or a validation error occurs for `mapWriteAsync` (resp. `mapReadAsync`):

- The buffer must have the `WebGPUBufferUsage.MAP_WRITE` (resp. `WebGPUBufferUsage.MAP_READ`) usage.
- `offset + size` must not overflow and must be at most the size of the buffer.
- The `[offset, offset + size)` range must not intersect the range of another `WebGPUMappedMemory` on the same buffer which hasn't been previously invalidated.

Then a mapped buffer can be unmapped with:

This operation invalidates all the `WebGPUMappedMemory` created from the buffer and puts the buffer in the unmapped state. The buffer must be in the mapped state, otherwise a validation error occurs when `unmap` is called.
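The IDL snippets for these operations are not reproduced above, so as a non-authoritative sketch of how an application might drive them (signatures inferred from the prose; `mapWriteAsync(offset, size)` and the `WebGPUMappedMemory` shape described in the next section are assumptions):

```ts
// Hypothetical shapes inferred from the prose of proposal 1; not the
// actual IDL from the proposal.
declare class WebGPUMappedMemory {
  isPending(): boolean;
  getPointer(): ArrayBuffer | null;
  then(success: (data: ArrayBuffer) => void, error?: () => void): void;
}
declare class WebGPUBuffer {
  mapReadAsync(offset: number, size: number): WebGPUMappedMemory;
  mapWriteAsync(offset: number, size: number): WebGPUMappedMemory;
  unmap(): void; // invalidates every WebGPUMappedMemory of this buffer
}

// Write 256 bytes at offset 0, then unmap to hand the data to the GPU.
function uploadExample(buffer: WebGPUBuffer): void {
  buffer.mapWriteAsync(0, 256).then((data) => {
    new Uint8Array(data).fill(42); // content starts cleared to 0
    buffer.unmap();                // neuters `data`
  });
}
```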
WebGPUMappedMemory

`WebGPUMappedMemory` is an object representing a mapped region of a buffer that's both pollable and promise-like. It can be in one of three states: pending, available and invalidated.

The pollable interface is:

`isPending` returns true if the object is in the pending state, false otherwise. `getPointer` returns an ArrayBuffer representing the buffer data if the object is in the available state, null otherwise.

`WebGPUMappedMemory` is also `then`-able, meaning that it acts like a Javascript `Promise`:

This acts like a `Promise<ArrayBuffer>.then` that is resolved on the Javascript task boundary in which the implementation detects the GPU is done with the buffer. On that boundary:

- The `WebGPUMappedMemory` goes into the available state.
- If the `WebGPUMappedMemory` was created via `WebGPUBuffer.mapWriteAsync`, its content is cleared to 0.
- `success` is called with the content of the memory as an argument.

If `success` hasn't been called when the `WebGPUMappedMemory` gets invalidated (meaning the object is still in the pending state), `error` is called instead. When a `WebGPUMappedMemory` goes from the available state to the invalidated state, the `ArrayBuffer` for its content gets neutered. The return value of `then` acts like the return value of `Promise.then`.

The `ArrayBuffer` of a `WebGPUMappedMemory` created from a `mapWriteAsync` is where the application should write the data, and its content is made available to the buffer when the `WebGPUMappedMemory` is invalidated (i.e. `WebGPUBuffer.unmap` is called).
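For contrast with the `then` path shown earlier, here is a sketch of the polling path (the declarations are the same hypothetical ones, repeated so the example stands alone):

```ts
// Assumes the hypothetical declarations from the earlier sketch;
// repeated here so this example is self-contained.
declare class WebGPUMappedMemory {
  isPending(): boolean;
  getPointer(): ArrayBuffer | null;
}
declare class WebGPUBuffer {
  mapReadAsync(offset: number, size: number): WebGPUMappedMemory;
  unmap(): void;
}

function pollingExample(buffer: WebGPUBuffer): void {
  const mapped = buffer.mapReadAsync(0, 64);
  const tick = () => {
    if (mapped.isPending()) {
      requestAnimationFrame(tick); // still pending: check next frame
      return;
    }
    const data = mapped.getPointer(); // non-null once available
    if (data) console.log(new Uint32Array(data)[0]);
    buffer.unmap();
  };
  tick();
}
```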
Buffer mapping proposal 2

In this proposal a buffer is always mapped as a whole, as an asynchronous operation. Mapping for reading (resp. writing) is done using `WebGPUBuffer.mapRead` (resp. `WebGPUBuffer.mapWrite`). The mapping calls put the buffer in the "mapped" state.

A Javascript error is thrown under these conditions:

- The buffer doesn't have the `MAP_READ` (resp. `MAP_WRITE`) usage.

Mapping is an asynchronous operation, and after its resolution the buffer's `mapping` member will be updated to represent the content of the buffer (resp. filled with zero and ready to receive data from the application). Resolution can only happen at a Javascript task boundary, and after the implementation has determined it is safe to give access to the buffer to the CPU. Resolution is guaranteed to complete before (or at the same time as) all previously enqueued operations finish executing (as can be observed with `WebGPUFence`).

The buffer is unmapped with a call to `unmap`, which puts it in the unmapped state. It is an error to call `unmap` while in the unmapped state. In the mapped state it is an error to do operations on the buffer (such as `setSubData` or enqueuing commands using the buffer).
Immediate data upload proposal 1

When mapping for writing, the application doesn't see GPU state since the content is cleared to 0. This means WebGPU can expose a `mapWriteSync` primitive that behaves exactly like `mapWriteAsync` except that the returned `WebGPUMappedMemory` object starts in the available state.
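A sketch of what the synchronous path could look like under this proposal (hypothetical declarations, with `mapWriteSync` added per the prose):

```ts
// Hypothetical, per the prose: mapWriteSync behaves like mapWriteAsync
// but its result starts in the available state.
declare class WebGPUMappedMemory { getPointer(): ArrayBuffer | null; }
declare class WebGPUBuffer {
  mapWriteSync(offset: number, size: number): WebGPUMappedMemory;
  unmap(): void;
}

function writeNow(buffer: WebGPUBuffer, bytes: Uint8Array): void {
  const mapped = buffer.mapWriteSync(0, bytes.byteLength);
  new Uint8Array(mapped.getPointer()!).set(bytes); // staging memory, zeroed
  buffer.unmap(); // schedules the staged data for upload
}
```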
Immediate data upload proposal 2

Buffer mapping is the path with the least number of copies, but it is often useful to upload data to a buffer right now, if only for debugging. A `WebGPUBuffer` operation is provided that takes an ArrayBuffer and copies its content to an offset in the buffer. This operation acts as if it was done after all previous "device-level" commands and before all subsequent "device-level" commands. "Device-level" commands are all commands not buffered in a `WebGPUCommandBuffer`, and include `WebGPUQueue.submit`. The content of `data` is only read during the call and can be modified by the application afterwards.

The following must be true or a validation error occurs:

- The buffer must have the `WebGPUBufferUsage.TRANSFER_DST` usage flag.
- `offset + data.length` must not overflow and must be at most the size of the buffer.
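A sketch of using such an operation follows; the name `setSubData` comes from the proposal itself, while the exact signature shown is an assumption:

```ts
// The name setSubData comes from the proposal; the exact signature
// here is an assumption.
declare class WebGPUBuffer {
  setSubData(offset: number, data: ArrayBuffer): void;
}

function debugUpload(buffer: WebGPUBuffer): void {
  const scratch = new Float32Array([0, 1, 2, 3]);
  buffer.setSubData(0, scratch.buffer); // data is copied during the call
  scratch[0] = 9; // safe: does not affect what the buffer received
}
```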
Unused designs

Persistently mapped buffers

Persistently mapped buffers are when the result of mapping the buffer can be kept by the application while the buffer is in use by the GPU. We didn't find a way to have persistently mapped buffers and at the same time keep things data-race-free between the CPU and GPU. Being data-race-free would be possible if an ArrayBuffer could be un-neutered, but this is not the case.

Promise readback();

This didn't have a pollable interface and forced an extra buffer-to-buffer copy to occur even if the GPU execution could be resumed immediately.

Dawn's MapReadAsync(callback);

Not a pollable interface.
Issues

GC discoverability

It isn't clear yet what happens when a buffer gets garbage collected while it is mapped. The simple answer is that the `WebGPUMappedMemory` objects get invalidated, but that would allow the application to discover when the GC runs.

GC pressure

The `WebGPUMappedMemory` design makes each mapped region create two garbage-collected objects. This could lead to some GC pressure.

Side effects between mapped memory regions

What happens when `WebGPUMappedMemory` objects' regions in the buffer overlap? Are writes from one visible from the other? If they are, maybe `WebGPUMappedMemory.getPointer` should return an `ArrayBufferView` instead.

Interactions with workers

Can a buffer be mapped in multiple different workers? If that's the case, the pointer should be represented with a `SharedArrayBuffer`.