Restriction on where each command can be done (encode, in/out of renderpasses) #21
As Ben mentioned in the last meeting, for D3D12 there is a command list type enum (direct, bundle, compute, copy) specified in the creation of the command list, queue, and allocator. So it's not a bitfield like Vulkan's command pool. The following references provide a high-level overview but they may be helpful:
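To make the contrast concrete, here is a small TypeScript sketch (names modeled on the real APIs; the numeric values are illustrative, not the actual ones): D3D12 tags each command list, queue, and allocator with exactly one type, while a Vulkan queue family advertises a bitfield of capabilities.

```typescript
// Modeled on D3D12_COMMAND_LIST_TYPE: exactly one value per command list,
// queue, and allocator (enum values here are illustrative).
enum D3D12CommandListType { Direct, Bundle, Compute, Copy }

// Modeled on VkQueueFlagBits: a queue family advertises a combination of bits.
const VK_QUEUE_GRAPHICS_BIT = 0x1;
const VK_QUEUE_COMPUTE_BIT = 0x2;
const VK_QUEUE_TRANSFER_BIT = 0x4;

const d3d12QueueType: D3D12CommandListType = D3D12CommandListType.Direct; // one type, mutually exclusive
const vulkanFamilyFlags =
  VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT;   // any subset of bits

function supportsCompute(flags: number): boolean {
  return (flags & VK_QUEUE_COMPUTE_BIT) !== 0;
}
```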
Thanks @grovesNL for the pointers. That's what Ben mentioned in the meeting, but my understanding of what we wanted to figure out was narrower than multi-queue stuff. Assuming you are on a Vulkan queue with all bits set, you have the restrictions outlined above for commands: some of them have to be done outside of render passes and others inside.
This is the equivalent of Metal's only queue type, and of a Vulkan queue with all bits set. Are there additional restrictions on, for example, which operations can be done with a render target bound?
@Kangz Agreed, that documentation doesn't contain enough detail for this investigation.
@BenConMS and I asked around for clarification with regard to D3D12. Here's what we discovered: when you create a D3D12 command queue, you must pass a D3D12_COMMAND_LIST_TYPE enum value. Note that this is a mutually exclusive enum, not a bitfield; the possible values are DIRECT, BUNDLE, COMPUTE, and COPY. The command list types reflect the fact that specialized hardware exists to perform these operations in parallel on behalf of the developer.

Vulkan's restrictions on devices and queues are documented in the Devices and Queues section of the spec. In summary, you ask Vulkan for the properties of "physical devices". From the devices, you can query information about different "queue families". The (one or more) queue families have bits that tell you which operations you can perform on queues from those families. Later on, when you create a queue, you need to pass the queue family index into the creation function. There are some guarantees, outlined in the following section:
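The query-then-create flow described above can be sketched as a capability check. `pickQueueFamily` is a hypothetical helper, and the flag constants stand in for VkQueueFlagBits:

```typescript
const GRAPHICS = 0x1, COMPUTE = 0x2, TRANSFER = 0x4; // stand-ins for VkQueueFlagBits

// Given the capability bits of each queue family (conceptually what
// vkGetPhysicalDeviceQueueFamilyProperties returns), find the index of the
// first family covering every required bit, or -1 if none does. That family
// index is what you would later pass into queue creation.
function pickQueueFamily(familyFlags: number[], required: number): number {
  for (let i = 0; i < familyFlags.length; i++) {
    if ((familyFlags[i] & required) === required) return i;
  }
  return -1;
}
```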
For the MVP of gpuWeb, we can explore only exposing queues that can perform all operations and add additional queue types later on. Hopefully, we can emulate Vulkan's optional transfer functionality with draws.
I'm fairly sure this is true, but I'm checking.
A tentative agreement of the meeting is to allow "Setting [compute / graphics] [pipelines / resources / state], synchronization, queries" outside of the render passes. The Metal backend would then need to remember the state set outside of a render/compute command encoder and set it upon the first use during (or simply at the start of) encoding. |
This is an unfortunate result. Consider the following WebGPU pseudocode:
In Metal, this has a few implications:
On the other hand, if we went with the opposite model, the WebGPU code would now be:
This can be implemented naturally in D3D12 (and Vulkan) by either:
This has a few advantages:
We don't have to set all the state; we only need to set the intersection of "needed by the shader" and "not yet set since the start of the encoder".
We should have the shader reflection available to know what exactly is needed.
I'd vote against automatic usage of multiple queues. You only have a concrete set of them available by the
True, although we don't need to defer generic command lists. All it takes is a few resource tables that get updated and checked on draw calls, and they are local to the command buffer being recorded, so effectively multi-threaded. Also, this would only apply to Metal backend.
We bind only what's needed automatically, based on the shader inputs.
I think this very question was asked explicitly on the calls (a few times?) and the conclusion was that it's basically free. If it's not, we need to get back to the drawing board.
Unfortunately it's not free (and we didn't have this information at the time). Our recommendation is to do as much work as possible in a single encoder, because ending a pass causes a flush.
@litherum the example you showed switched between graphics and compute freely, which is indeed very expensive, in particular on mobile GPUs that would need to flush the tile caches. The conclusion so far in this issue is that graphics work would need to be explicitly started and ended, and that compute and blit operations cannot be done inside those bounds. I imagine setting inherited state at the beginning of an MTLRenderCommandEncoder should be cheap compared to the cost of setting up the "rendertarget"; is that correct? For MTLComputeCommandEncoder, a lot less state would need to be set, and only in the buffer / texture / sampler tables. I might be wrong, but in Metal it seems that the call to setBuffer should just be copying the relevant data and setting a dirty bit, with the work deferred to the dispatch commands. If that's the case, then inheriting state is cheap for compute commands too. What do you think?
@grorg it sounds like you are mentioning switching from / to MTLRenderCommandEncoder, but is it also the case for switching between MTLComputeCommandEncoder and MTLBlitCommandEncoder (or Blit -> Compute)? In this issue so far we agreed that graphics work should be separate from blit / compute, so the render pass flush would be explicitly controlled by the app.
Switching between encoders, even blit and compute, is expensive in Metal. Last meeting we agreed that compute should be explicitly delimited, like graphics work, for example with BeginCompute and EndCompute command buffer commands.
This has been resolved for a while.
Actually I'll leave this open since it's an investigation that we might want to refer back to.
Last meeting we were looking at different APIs' restrictions on where each type of command can be done. This isn't about multi-queue scenarios, and assumes we are on a DIRECT queue on D3D12 and, in Vulkan, on a queue with all bits set, which is guaranteed to exist.
Metal
In Metal, to put commands in a MTLCommandBuffer, the application has to use encoders. There are three types of encoders, supporting mostly disjoint operation subsets (all of them can do synchronization):
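The three-way split could be modeled as per-encoder operation sets. Operation names below are abbreviated stand-ins; the real sets are much larger:

```typescript
// Rough model of Metal's encoder split: each encoder type accepts a mostly
// disjoint set of operations, and all of them can record synchronization.
const encoderOps: { [encoder: string]: Set<string> | undefined } = {
  MTLRenderCommandEncoder:  new Set(["draw", "setRenderPipelineState", "sync"]),
  MTLComputeCommandEncoder: new Set(["dispatch", "setComputePipelineState", "sync"]),
  MTLBlitCommandEncoder:    new Set(["copy", "generateMipmaps", "sync"]),
};

function allowed(encoder: string, op: string): boolean {
  return encoderOps[encoder]?.has(op) ?? false;
}
```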
Vulkan
Operations in Vulkan can be done inside render passes, outside of them, or both, but all are encoded via the same object.
D3D12
@RafaelCintron I haven't been able to find documentation on these restrictions in the docs for ID3D12GraphicsCommandList. Is that because you are allowed to do any command anywhere, or because I didn't look hard enough?
Conclusion
Let's set aside Vulkan allowing graphics state to be set outside of render passes and compute state inside render passes. Let's also skip over API details we are not ready to look at (queries >_>).
Operations you can do inside Vulkan render passes are basically MTLRenderCommandEncoder operations, while operations you can do outside Vulkan render passes are both MTLComputeCommandEncoder and MTLBlitCommandEncoder operations. Which is great!
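That correspondence suggests a simple placement-validation rule a WebGPU implementation could apply when recording. The command names and the helper below are illustrative:

```typescript
// Each command is valid either inside a render pass (render-encoder work)
// or outside one (compute/blit work), mirroring the Vulkan<->Metal mapping.
type Placement = "insideRenderPass" | "outsideRenderPass";

const placementOf: { [op: string]: Placement | undefined } = {
  draw:                "insideRenderPass",
  setViewport:         "insideRenderPass",
  dispatch:            "outsideRenderPass",
  copyBufferToTexture: "outsideRenderPass",
};

// A command is legal only when its required placement matches where we are.
function validate(op: string, inRenderPass: boolean): boolean {
  const p = placementOf[op];
  return p !== undefined && (p === "insideRenderPass") === inRenderPass;
}
```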
In my opinion, this means that either:
@grorg do you think we could get data on this?
Raw notes for reference