Resource copying/clearing/updating investigations #28

Open
kvark opened this issue Aug 9, 2017 · 9 comments
@kvark
Contributor

kvark commented Aug 9, 2017

Native APIs provide different constraints and features when it comes to resource copies and clears, where resources can be buffers or images. In this issue, we'll try to find a common ground (a least common denominator API) that is usable and efficient on all backends.

In Metal, all of the copy/clear operations are done via the MTLBlitCommandEncoder.
In Vulkan, these are transfer operations, supported on any queue type. They require the TRANSFER_SRC usage flag on the source and the TRANSFER_DST usage flag on the destination.

Operation table

| operation/backend | Vulkan | D3D12 | Metal |
|---|---|---|---|
| clear buffer | vkCmdFillBuffer | views only with ClearUnorderedAccessView* | nothing |
| clear image | vkCmdClearColorImage, vkCmdClearDepthStencilImage | views only with ClearRenderTargetView, ClearDepthStencilView | nothing |
| update buffer | vkCmdUpdateBuffer, limited to 64k | nothing | nothing |
| update image | nothing | nothing | nothing |
| buffer -> buffer | vkCmdCopyBuffer | CopyBufferRegion | copy |
| buffer -> image | vkCmdCopyBufferToImage | CopyTextureRegion | copy |
| image -> buffer | vkCmdCopyImageToBuffer | CopyTextureRegion | copy |
| image -> image | vkCmdCopyImage | CopyTextureRegion | copy |
| image blit | vkCmdBlitImage | nothing | generateMipmaps |

Buffer Updates

In D3D12, the only way to update a buffer with new data coming from CPU is to use a staging buffer (that is mapped, filled, then copied to the destination).

In Metal, a similar effect can be achieved by creating a buffer with makeBuffer that reuses the existing storage.

In Vulkan, the implementation may have a fast-path for small buffer updates by in-lining the data right into the command buffer space. The implementation can fall back to a staging-like scheme for larger updates.
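As a rough illustration of the inline-versus-staging decision described above, here is a minimal sketch. The names `UpdatePath` and `choose_update_path` are hypothetical and not part of any API; the 65536-byte cap and the 4-byte alignment of offset and size follow the vkCmdUpdateBuffer rules in the Vulkan specification.

```rust
/// Maximum payload of one vkCmdUpdateBuffer call, in bytes (the "64k" limit).
const MAX_INLINE_UPDATE_BYTES: u64 = 65_536;

#[derive(Debug, PartialEq, Eq)]
enum UpdatePath {
    /// Data is recorded directly into the command buffer (vkCmdUpdateBuffer style).
    Inline,
    /// Data is written into a mapped staging buffer and then copied on the GPU.
    Staging,
}

/// Hypothetical helper: decide how to upload `size` bytes at `offset`.
/// `backend_has_inline_update` would be true for Vulkan only, per the table above.
fn choose_update_path(offset: u64, size: u64, backend_has_inline_update: bool) -> UpdatePath {
    // vkCmdUpdateBuffer requires the offset and the size to be multiples of 4.
    let aligned = offset % 4 == 0 && size % 4 == 0;
    let fits = size > 0 && size <= MAX_INLINE_UPDATE_BYTES;
    if backend_has_inline_update && aligned && fits {
        UpdatePath::Inline
    } else {
        UpdatePath::Staging
    }
}
```

With this sketch, a 256-byte update at offset 0 would take the inline path on Vulkan, while anything larger than 64k, or any update on D3D12 or Metal, would fall back to staging.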

Image Blitting

Image blits differ from image copies in that they allow format conversion and arbitrary scaling with filtering. A typical use case for blitting is mipmap generation. It is not clear to me why/how Vulkan provides this on a transfer-only queue, but other APIs are far more (and reasonably) limited with regard to where and how they can blit surfaces.

Alignment rules

Vulkan

VkPhysicalDeviceLimits has optimal alignments for buffer data when transferring to/from image:

  • optimalBufferCopyOffsetAlignment is the optimal buffer offset alignment in bytes for vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer
  • optimalBufferCopyRowPitchAlignment is the optimal buffer row pitch alignment in bytes for vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer

These are not enforced by the validation layers but are recommended for optimal performance.

D3D12

The relevant MSDN section lists the following restrictions:

  • linear subresource copying must be aligned to D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT (512) bytes
  • row pitch aligned to D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256) bytes
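To make the two D3D12 restrictions concrete, here is a small sketch of the arithmetic needed to lay out texture data in a staging buffer. The helper names `align_up` and `staging_layout` are hypothetical; only the two constants and their values (512 and 256) come from the list above.

```rust
const D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT: u64 = 512;
const D3D12_TEXTURE_DATA_PITCH_ALIGNMENT: u64 = 256;

/// Round `value` up to the next multiple of `alignment` (a power of two).
fn align_up(value: u64, alignment: u64) -> u64 {
    assert!(alignment.is_power_of_two());
    (value + alignment - 1) & !(alignment - 1)
}

/// Hypothetical helper: given where free space starts in a staging buffer,
/// return (aligned offset, row pitch, bytes reserved) for a 2D region.
/// The reserved size pads every row, which is a simple upper bound.
fn staging_layout(free_start: u64, width: u64, height: u64, bytes_per_texel: u64) -> (u64, u64, u64) {
    let offset = align_up(free_start, D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT);
    let row_pitch = align_up(width * bytes_per_texel, D3D12_TEXTURE_DATA_PITCH_ALIGNMENT);
    (offset, row_pitch, row_pitch * height)
}
```

For example, a 100x4 region of 4-byte texels has 400 bytes of data per row, which is padded to a 512-byte row pitch, so the region reserves 2048 bytes.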

Proposed API

Clears

The D3D12 model appears to be the least common denominator. If we have the concept of views, we can have API calls to clear them. In Vulkan, these calls would trivially translate into direct clears. In Metal, we'd need to run a compute shader to clear the resources. Supporting multiple clear rectangles seems to complicate this scheme quite a bit, so I suggest only doing full-slice clears.
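For the compute-shader path on Metal, a full-slice clear mostly comes down to dispatching enough threadgroups to cover the slice. A minimal sketch of that calculation follows; the 8x8 threadgroup size and the name `clear_dispatch_size` are assumptions made for illustration, not something this proposal fixes.

```rust
/// Assumed threadgroup edge length for the hypothetical clear shader (8x8 threads).
const WORKGROUP_SIZE: u32 = 8;

/// Number of threadgroups needed to cover a full `width` x `height` slice.
/// The shader itself would have to skip threads that fall outside the slice.
fn clear_dispatch_size(width: u32, height: u32) -> (u32, u32) {
    // Ceiling division, so partially covered edges still get a threadgroup.
    let groups = |extent: u32| (extent + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;
    (groups(width), groups(height))
}
```

Restricting the API to full-slice clears keeps this to a single dispatch per slice; supporting multiple rectangles would need either one dispatch per rectangle or a rectangle list passed to the shader.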

Updates

Given the limited support of resource updates, I suggest not providing this API at all in favor of requiring the user to use staging resources manually.

Copies

All 3 APIs appear to provide the copy capability between buffers and textures. The difference is mostly in the alignment requirements. I suggest having device flags that report the minimum offset/pitch alignment required:

  • D3D12: equal to D3D12 constants
  • Vulkan: equal to optimal alignment features
  • Metal: some reasonable default selected by Apple
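A sketch of how such per-device values could be reported and checked is below. The types `CopyAlignment` and `CopyError` and the function `validate_buffer_image_copy` are hypothetical. Only the D3D12 numbers are taken from the text above; Vulkan values would be read from VkPhysicalDeviceLimits at runtime, and the Metal values are left open here because the source does not state them.

```rust
/// Hypothetical per-device limits for buffer <-> image copies, in bytes.
#[derive(Debug, Clone, Copy)]
struct CopyAlignment {
    offset: u64,
    row_pitch: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum CopyError {
    MisalignedOffset,
    MisalignedRowPitch,
}

/// D3D12 values: placement alignment (512) and pitch alignment (256).
const D3D12_COPY_ALIGNMENT: CopyAlignment = CopyAlignment { offset: 512, row_pitch: 256 };

/// Check the buffer side of a buffer <-> image copy against the device limits.
fn validate_buffer_image_copy(
    limits: CopyAlignment,
    buffer_offset: u64,
    row_pitch: u64,
) -> Result<(), CopyError> {
    if buffer_offset % limits.offset != 0 {
        return Err(CopyError::MisalignedOffset);
    }
    if row_pitch % limits.row_pitch != 0 {
        return Err(CopyError::MisalignedRowPitch);
    }
    Ok(())
}
```

An application would query the limits once at device creation and lay out its staging data accordingly, so the same copy code works on every backend.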

Blits

D3D12 doesn't support any sort of blitting, so I'm inclined to propose no workarounds here. Users doing simple render passes to blit textures shouldn't be any slower than an emulation inside the API, anyway.

Afterword

This analysis may be incomplete; corrections are welcome and can be made directly as edits to the issue.

@msiglreith

msiglreith commented Aug 9, 2017

> In Vulkan, these are transfer operations, supported on any queue type

To clarify, clear commands are not supported on transfer queues even though they count as transfer operations. vkCmdClearColorImage requires graphics or compute queues, vkCmdClearDepthStencilImage requires graphics support.
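The queue requirements stated here can be summarized in a small sketch. `QueueCaps`, `ClearCommand`, and `queue_supports_clear` are hypothetical names used only for illustration; the rules themselves are the ones given in the comment above.

```rust
/// Hypothetical summary of a Vulkan queue family's capabilities.
/// A transfer-only queue has both fields set to false.
#[derive(Debug, Clone, Copy)]
struct QueueCaps {
    graphics: bool,
    compute: bool,
}

#[derive(Debug, Clone, Copy)]
enum ClearCommand {
    /// vkCmdClearColorImage
    ColorImage,
    /// vkCmdClearDepthStencilImage
    DepthStencilImage,
}

/// Returns true if a queue with these capabilities may record the clear.
fn queue_supports_clear(caps: QueueCaps, command: ClearCommand) -> bool {
    match command {
        // Color clears need a graphics or a compute queue.
        ClearCommand::ColorImage => caps.graphics || caps.compute,
        // Depth/stencil clears need graphics support.
        ClearCommand::DepthStencilImage => caps.graphics,
    }
}
```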

@grorg
Contributor

grorg commented Aug 9, 2017

> Note: Metal's texture-to-texture copies don't specify the destination size, and nowhere it is said to be required to match the source size. I assume Metal scales the result to fit the whole destination slice, but it would be great to get clarification from Apple.

I don't believe there is any scaling. It's just a copy of that rectangle into the destination, at a specified origin.

@kvark
Contributor Author

kvark commented Aug 9, 2017

@grorg thanks for the clarification! I have removed the note from the body now.
It seems a little strange to me that Metal doesn't provide scaling for blits yet has a generateMipmaps routine.

@Kangz
Contributor

Kangz commented Aug 9, 2017

Thanks for the nice analysis! I'd like to point out that in NXT we have found a way to abstract the D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT requirement by splitting copies in two parts if needed. The code handling this can be found here. It is covered by extensive tests, so we are confident it works (though it is only implemented for 2D textures).

With respect to the proposed API:

  • Having no way to do "updates" would work, but we have found it extremely useful to have an immediate nxtBufferSetSubData for tests (not even a queued operation). If we go with no updates, someone will need to make a "blessed" helper library to do good enough buffer updates (outside of compute / render passes, of course).
  • No blits sounds ok.
  • Copies: this ties into another topic; ideally there would be default constraints that are validated and work on all platforms. Additionally we could provide a way for an application to discover smaller constraints and explicitly require it at device creation time. From our experiments we think only the rowPitch constraint will need to be present (and maybe image height for 2D arrays / 3D textures).
  • Clears: if the clears are done outside of compute / render passes, then they could be emulated with an empty MTLRenderCommandEncoder. I'd be interested in knowing why Metal didn't find it necessary to allow clears in MTLBlitCommandEncoder.

> It seems a little strange to me that Metal doesn't provide scaling for blits yet has a generateMipmaps routine.

It sounds like it could be a built-in compute shader.

@Wumpf

Wumpf commented Mar 17, 2021

For any kind of resource (texture or buffer), easy clearing would be highly desirable for compute shader use cases, which have no way of clearing via new render passes.
I hit this quite a bit in my wgpu(-rs) based fluid sim, where I have accumulating or temporary targets (either buffers or volume textures) that need periodic clearing.
Another, more common use case would be e.g. a histogram that is computed per frame using a compute shader: every frame, the buffer for the histogram needs clearing.

Today, all of these cases required specialized clear passes and in many cases (even more to the annoyance of any users) bindgroups, layouts etc.
I think ideally WebGPU would land on a clear function on the encoder for both textures and buffers, comparable to the copy texture/buffer methods it already provides.

@kvark kvark added this to Needs Discussion in Main Mar 17, 2021
@Kangz
Contributor

Kangz commented Mar 17, 2021

> Today, all of these cases required specialized clear passes and in many cases (even more to the annoyance of any users) bindgroups, layouts etc.

Can you explain how it requires bind groups, layouts, etc.? beginRenderPass for clearing doesn't need them. Or do you want to clear a texture inline in a compute pass? If that's the case, then you can do it with your own dispatches and encapsulate them in a function. I agree that the implementation could do it for you, but that seems like something we can add later (so we can ship the first version of WebGPU faster).

@Wumpf

Wumpf commented Mar 17, 2021

Yes, I was referring to the use case of having a dedicated compute pass. For everything where a beginRenderPass works, a separate clear function isn't truly needed anyway.

It's of course true that a user can encapsulate something like this in a function, but this way of clearing requires a lot of bookkeeping that the clear functions of DX12/Vulkan don't need: for every type (image format, image dimension, etc.) a special compute pipeline is needed. Every single resource that needs clearing also needs a bind group containing just this one resource.
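To give a sense of the bookkeeping involved, here is a minimal sketch of the state a user-side clear helper would have to carry. All names (`ClearCache`, `Format`, `Dimension`, and the placeholder handle types) are hypothetical; a real helper would store actual pipeline and bind group objects from the API in place of the empty structs.

```rust
use std::collections::HashMap;

/// A few illustrative formats and dimensions; a real helper needs many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Format {
    R32Float,
    Rgba8Unorm,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Dimension {
    D2,
    D3,
}

/// Placeholders standing in for a compute pipeline and a bind group handle.
struct ClearPipeline;
struct ClearBindGroup;

#[derive(Default)]
struct ClearCache {
    /// One compute pipeline per (format, dimension) combination.
    pipelines: HashMap<(Format, Dimension), ClearPipeline>,
    /// One bind group per resource, keyed by a user-chosen resource id.
    bind_groups: HashMap<u64, ClearBindGroup>,
}

impl ClearCache {
    /// Make sure everything needed to clear this resource exists.
    fn prepare(&mut self, resource_id: u64, format: Format, dimension: Dimension) {
        self.pipelines.entry((format, dimension)).or_insert(ClearPipeline);
        self.bind_groups.entry(resource_id).or_insert(ClearBindGroup);
    }
}
```

Clearing three resources of two different types already leaves two pipelines and three bind groups to create, track, and destroy, none of which a built-in clear call would expose to the user.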

@kvark
Contributor Author

kvark commented Sep 17, 2021

@Wumpf would you be willing to write down a more concrete proposal (in a separate issue), with the following:

  • description of the use cases. I.e. clearing buffer and texture data before doing compute passes with them, where RENDER_ATTACHMENT is not even needed otherwise.
  • the implementation paths and costs on each platform. Comparison to what the user could do on their own.
  • suggested API

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
@hamwj1991

Metal has a fillBuffer function on MTLBlitCommandEncoder.

@kainino0x kainino0x added the api WebGPU API label Apr 30, 2024