Buffer partitioning (real bad proposal but it was promised) #481
Thank you for writing down the proposal!
I don't understand why this needs to switch the state to
I also don't feel good about doing this at submit time... Perhaps we could require that a partition exists by the time
Why is this limited to UBO/Storage? What if I need to have the vertex data partitioned, for example?
You could just call setVertexBuffers with a new buffer, right? My understanding is that this is intended to cut down on the intermediate GPUBindings objects which Vulkan requires. Vulkan has implicit partitioning since you can barrier for different subregions of a buffer.
The original idea in #156 was that you'd be passing the pair
Generally speaking you'd never be using the parent for anything other than bind-group creation (and whatever else needs a permanent immutable native API buffer in its native API state). I actually wanted a separate API object for the partition, and to never use the buffer object for anything other than creating partitions, so it's clearer. Much like how in Vulkan you create a
For all I care you could make it a requirement
#156 intended it to be used for everything, @magcius.
The idea was also that you'd only be able to map the partition, not the parent.
I find this very elegant: having the duality between Texture/TextureView and Buffer/BufferView with all the strong typing. 👍
In the last meeting I mentioned that I'd like to un-assign myself from designing a buffer partitioning scheme because I don't believe in the approach anymore. Here's a dump of my current thoughts.

Problems buffer partitioning is trying to address

Buffer partitioning (or buffer views) came up when we were discussing proposals for buffer mapping. The problem is that the "state" of the buffer, mapped or unmapped, is global to the entire buffer, so you cannot put data in the buffer for follow-up frames if you submitted commands using the buffer this frame. For static data this doesn't matter, but applications often have dynamic data (uniforms, vertex, and index), and here are two use cases that buffer partitioning was meant to address:
Note that allowing a mix of read-only and read-write usages of the buffer in the same render pass was a non-goal. Also, both use cases above are solved if we're OK pipelining copyBufferToBuffer copies at the start of each frame.

Why I don't believe in this approach

State tracking complexity

Contrary to

Splitting buffers isn't a complete algebra of operations

If you are able to split ranges of buffers, you want to be able to merge ranges of the same buffer as well, and this adds even more complexity. The reason why you want to merge ranges is the following: imagine you have a ring buffer of uniforms, parts of which are mapped, being mapped, or in use by the GPU. Imagine you have two neighboring buffer views that are mapped, but you'd like to have staging space for data that doesn't fit in only one of the two views. Do you split the data? That would require a lot of complexity on the application side. Instead you want to be able to join both views and have a merged staging area. This adds a bunch more questions: is the merging operation synchronous (it would require the whole buffer to be allocated in shmem for remoting implementations)? Do you see data you previously wrote to the views? etc.

Garbage

One of the goals of buffer partitioning was to reduce garbage for the use case with a large uniform buffer and dynamic offsets into it. But with buffer splitting and merging we are introducing garbage in a different place, so it doesn't really help. The proper solution would be to pipeline a copyBufferToBuffer.

Bad interactions with other parts of the API

The point was to be able to use dynamic offsets efficiently, but see the discussion above for what the issue is. Passing the buffer view in setBindGroup adds another level of complexity to an already overloaded method imho.
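The merge scenario described above can be made concrete with a small sketch. Nothing here is WebGPU API: the view objects, their fields, and `mergeViews` are all invented for illustration, and the open questions from the comment (synchronicity, visibility of previous writes) are deliberately left as comments.

```javascript
// Hypothetical bookkeeping for merging two adjacent mapped views of the
// same buffer into one staging range. All names are invented; this is not
// part of any WebGPU proposal, just an illustration of the complexity.
function mergeViews(a, b) {
  // Views must belong to the same parent buffer.
  if (a.buffer !== b.buffer) throw new Error("views have different parents");
  // Both views must currently be mapped to serve as staging space.
  if (a.state !== "mapped" || b.state !== "mapped") {
    throw new Error("both views must be mapped");
  }
  // Ranges must be exactly adjacent, e.g. [0, 256) and [256, 512).
  const [lo, hi] = a.offset <= b.offset ? [a, b] : [b, a];
  if (lo.offset + lo.size !== hi.offset) {
    throw new Error("views are not adjacent");
  }
  // The open questions start here: is this synchronous? Does the merged
  // view see data previously written through a and b?
  return {
    buffer: lo.buffer,
    offset: lo.offset,
    size: lo.size + hi.size,
    state: "mapped",
  };
}
```

Even this toy version shows the API surface growing: every merge needs parent, state, and adjacency checks, and the semantics of the merged mapping still have to be specified.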
It's conceptually elegant but not convenient to use because in most cases you'll create a GPUBuffer just to call
First of all, sorry I can't attend the meetings, they're in a bad timezone for me.
The whole idea was that mapped/unmapped would be per-view not per-buffer, explicitly to ensure that only a view is being tracked, not the whole buffer. I have absolutely no idea why that was somehow lost in translation over the multiple instances when I've explained the idea.
This hurts multi-queue and multi-threading; most developers want to step outside of the traditional "one command stream linearly executed and synched to vblank".
Could be solved by:
My GPU buffer sub-allocator that has been running in production conforms to all the above. Unfortunately in Vulkan and OpenGL the whole buffer would need to be mapped behind the scenes (only D3D12 allows for multiple mappings). Remember that the proposal #156 required the user to KNOW EXPLICITLY WHERE THEY WANT TO CREATE THE VIEW, not for WebGPU to magically allocate them an offset given a size.
Recreating bind-groups in the number of
Think of the following simple scenario:
Now if I need to have (to avoid stalls between CPU, API and GPU):
Then this gives me 3 buffers for each of the above resources, so 9 in total. But because the framerate is variable, I need every single combination of 3 buffers as a bind group because the update frequencies are independent of each other. Assume 100 FPS (f=frame, p=physics, s=skinning)
So say hello to 3^3 = 27 bind groups! Now think about how much fun it's going to be for an application developer to go back to number theory and work out the common divisors and modular arithmetic to put a lower bound on the number of bind groups they'll need!
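The blow-up described above is just a Cartesian product of buffer indices, one per independently-updated resource. A quick sketch (purely illustrative; `bindGroupKeys` is an invented helper, not proposed API) makes the count explicit:

```javascript
// Enumerate one bind-group key per combination of buffer indices.
// With frame/physics/skinning uniforms each triple-buffered and updated
// at independent rates, every combination may be live at some point.
function bindGroupKeys(buffersPerResource) {
  let keys = [[]];
  for (const count of buffersPerResource) {
    const next = [];
    for (const key of keys) {
      for (let i = 0; i < count; i++) next.push([...key, i]);
    }
    keys = next;
  }
  return keys.map((k) => k.join("/"));
}

// frame, physics, skinning: 3 buffers each => 3^3 = 27 bind groups.
const keys = bindGroupKeys([3, 3, 3]);
console.log(keys.length); // 27
```

Dynamic offsets into one partitioned buffer would collapse those 27 objects into one bind group, which is the motivation being argued for here.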
When I've talked about garbage, I've meant required garbage -- that is, places where I cannot reuse an object, e.g. a Promise, which is create-and-forget and cannot be reset or reused. If I can save off a BufferView and continue to reuse it for multiple command encoders, that is not garbage in my eyes.
Closing since the second round of discussions on buffer mapping is finished and resulted in #708 (plus some follow-ups)
Edit: this proposal originates in #156 known as "buffer views"
I tried really hard to make some proposal for buffer partitioning that makes sense, but the best I could get is what's below. It has major flaws and I wouldn't want it as part of WebGPU, but I'm posting it because it was promised, and because maybe someone will find inspiration for a better idea.
Feel free to not read if you're not interested in my rambling.
Buffer partitioning
This is a proposal for how buffers could be split in such a way that one piece of the buffer is mapped while another is in use by the GPU.
The basic requirement was that UBOs could be ring-allocated in a mappable buffer, such that one part of the buffer is in use as a UBO by the GPU while another is being populated by the application. Doing this is important because a single bind group could then address the whole ringbuffer with a dynamic offset.
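The ring-allocation pattern described in that requirement can be sketched as follows. This is a hypothetical illustration: `UniformRing` and its fields are invented names, and the 256-byte alignment is the minimum dynamic-offset alignment this proposal itself assumes for uniform data. A real allocator would additionally have to wait on GPU completion before reusing wrapped space, which is exactly the mapped/in-use conflict this proposal tries to address.

```javascript
// Hypothetical sketch: one mappable buffer holds per-draw uniforms, a
// single bind group addresses it, and each draw picks a 256-aligned
// dynamic offset from a ring allocator.
const UBO_ALIGN = 256; // alignment assumed for dynamic uniform offsets

class UniformRing {
  constructor(size) {
    this.size = size; // total size of the backing GPUBuffer, in bytes
    this.head = 0;    // next free offset
  }
  // Returns an aligned offset for `bytes`, wrapping to 0 when the
  // allocation would run past the end of the buffer. The returned value
  // would be passed as the dynamic offset in setBindGroup.
  alloc(bytes) {
    const aligned = Math.ceil(bytes / UBO_ALIGN) * UBO_ALIGN;
    if (this.head + aligned > this.size) this.head = 0; // wrap around
    const offset = this.head;
    this.head += aligned;
    return offset;
  }
}
```

Note what is missing: nothing stops `alloc` from handing out an offset the GPU is still reading, and nothing lets the application map just the free region. Partitioning was meant to fill that gap.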
Basics
GPUBuffer gains three internal slots:
- [[is partition]]: a boolean that defaults to false.
- [[partitions]]: a list of partitions for this buffer (always empty if [[is partition]]).
- [[parent buffer]]: the parent of the buffer (always null if not [[is partition]]).

Buffer partitioning is done through a method of GPUBuffer: buffer.partition(offset, size) creates a new GPUBuffer representing the range [offset, offset + size) of the buffer. A validation error happens and an error GPUBuffer is returned if one of these conditions isn't respected:
- offset and size must be aligned to 256.
- buffer.[[is partition]] must be false.
- buffer must not be in the destroyed or mapped state. It can already be partitioned.
- ... buffer that's not in the destroyed state.

Upon success a new partition of the buffer, result, is returned. The steps are roughly:
- buffer is put in the partitioned state.
- A new GPUBuffer called result is created.
- result is inserted in buffer's [[partitions]].
- result's [[is partition]] is set to true.
- result's [[parent buffer]] is set to buffer.
- result is put in the unmapped state.
- result's data store represents the correct range of buffer's data store (hand-waving).

Interactions with other WebGPU features
Interaction with destroy()
... unmapped state.
Interaction with dynamic offset
The intent was that you would be able to make a dynamic UBO on the parent buffer in the bind group and somehow choosing an offset that is in a correct (unmapped) partition at submit time would be allowed.
But this requires checking, at submit time, that each dynamic offset lands in a correct partition whenever the dynamic uniform buffer is partitioned. It doesn't look very tractable.
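The submit-time check just described can be sketched to show its cost. The partition objects and `dynamicOffsetValid` below are invented shapes for illustration, not WebGPU API; the rule itself (the bound range must fall entirely inside one unmapped partition) is the one stated in this proposal.

```javascript
// Sketch of the per-offset validation the proposal would need at submit:
// a dynamic offset is valid only if [offset, offset + bindingSize) is
// entirely contained in a single partition in the "unmapped" state.
function dynamicOffsetValid(partitions, dynamicOffset, bindingSize) {
  const start = dynamicOffset;
  const end = dynamicOffset + bindingSize;
  return partitions.some(
    (p) =>
      p.state === "unmapped" &&
      start >= p.offset &&
      end <= p.offset + p.size
  );
}
```

This check would have to run for every dynamic offset of every bind group in every submit, against the current partition list of each buffer, which is part of why it "doesn't look very tractable".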
Interaction with GPUQueue.submit()
It is an error to have a partitioned buffer used in the command buffers, except as dynamic UBO / storage buffers. If that case happens, the dynamic offset must be such that [offset, offset + binding size) is entirely contained in a single unmapped partition.
Interaction with mapping
And this is where the proposal breaks down even more. The different partitions of a large ring buffer will be mapReadAsync'ed independently, but then we might want to stitch them together to make bigger partitions that are all in the mapped state. That, and splitting a large mapped region into two sub-regions: one that's going to be used as a UBO on the GPU and one that stays mapped for the next frame.