Vulkan: Use timeline semaphores for synchronization if possible #17761
Looks good, great job! Just minor comments from my side, but I'd like to get this reviewed by @moudgils as well.
I recommend waiting for @akioCL's approval before merging. I will also try to find time to go through it this week.
I'm going through the review, but I feel like this change has increased the complexity of the semaphores a lot. This is starting to look like spaghetti code. Maybe this is a signal that a deeper refactor is needed to accommodate both binary and timeline semaphores. For example, almost every function of the semaphore class behaves differently for binary and timeline semaphores. Maybe we should separate them into two classes with a common interface? I'm having a hard time following the semaphore logic in the Vulkan RHI. What do you all think?
Gems/Atom/RHI/Vulkan/Code/Source/RHI/FrameGraphExecuteGroupHandler.h
I refactored the code a bit so that I could remove the FenceTracker class from the public RHI API. The … Do you mean the implementation in the …?
I'm getting some validation errors like:
These are likely false positives, as discussed here and here. After the refactoring, the PR no longer works for the MultiGPU sample in this branch. I will mark the PR as a draft until I have fixed that.
I do think we should split the fence and semaphore classes into a normal/timeline-semaphore fence class and a binary/timeline semaphore class. It would make the code easier to read and understand.

As for a solution, I'm not a fan of the SemaphoreTracker, SemaphoreTrackerCollection and SemaphoreTrackerHandle. I think we can reuse what we already have. Currently, the binary semaphore has a condition_variable that is signaled once and waited on once. We should refactor the Vulkan::SignalEvent class to accept multiple signal calls. We would share a SignalEvent between the dependent semaphores (similar to the shared SemaphoreTrackerHandle). All binary semaphores would then need to wait on the CPU until ALL previous dependent semaphores have signaled the SignalEvent. We cannot use the "count" approach, because the binary semaphore is no longer the last one, so we need to know who is signaling (maybe with a bitset in the SignalEvent class).

We can get the dependent semaphore information from the frame graph. You are currently doing it in the FrameExecuter, but I think it should go at the end of FrameGraphCompiler::CompileInternal, after all barriers and semaphores have been added to the scopes. Since timeline semaphores don't care about out-of-order signal/wait operations, they would only signal the SignalEvent and wouldn't waste time waiting on the CPU. Only binary semaphores would spend time waiting until all dependent semaphores have submitted their signal to a queue (this includes the binary semaphore itself). What do you think?
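The multi-signal `SignalEvent` proposed above could look roughly like the following minimal sketch. It is a hypothetical illustration, not the O3DE `Vulkan::SignalEvent` API: the class name `MultiSignalEventSketch`, the `MaxDependentSemaphores` bound, and the index-per-semaphore scheme are assumptions. A bitset records which dependent semaphores have signaled, so signals can arrive in any order; timeline semaphores would only call `Signal`, while binary semaphores would also call `Wait`.

```cpp
#include <bitset>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Assumed upper bound on dependent semaphores per event (illustrative only).
constexpr std::size_t MaxDependentSemaphores = 64;

// Hypothetical multi-signal event: records WHICH semaphores have signaled,
// not just how many, so signals may arrive in any order.
class MultiSignalEventSketch
{
public:
    // Declares the semaphore indices that must signal (e.g. computed at the
    // end of frame-graph compilation). Clears any previous signals.
    void SetDependencies(const std::bitset<MaxDependentSemaphores>& dependencies)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_dependencies = dependencies;
        m_signaled.reset();
    }

    // Called once a semaphore's signal operation has been submitted to a queue.
    // Timeline semaphores only call this; they never block in Wait().
    void Signal(std::size_t semaphoreIndex)
    {
        assert(semaphoreIndex < MaxDependentSemaphores);
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_signaled.set(semaphoreIndex);
        }
        m_condition.notify_all();
    }

    // Called by a binary semaphore before its wait is submitted: blocks the
    // CPU thread until every dependency has signaled.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return IsCompleteLocked(); });
    }

    bool IsComplete() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return IsCompleteLocked();
    }

private:
    // True when no required bit is missing from the signaled set.
    bool IsCompleteLocked() const { return (m_dependencies & ~m_signaled).none(); }

    mutable std::mutex m_mutex;
    std::condition_variable m_condition;
    std::bitset<MaxDependentSemaphores> m_dependencies;
    std::bitset<MaxDependentSemaphores> m_signaled;
};
```

Signals from semaphores that are not in the dependency set are recorded but do not affect completion, which is why a bitset is used instead of a simple counter.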
I don't think we have a problem when only binary semaphores are present. I will split the Semaphore/Fence classes into normal/timeline semaphore classes. Your approach also makes sense, and so does moving it to the FrameGraphCompiler. I will implement it.
Signed-off-by: Martin Sattlecker <martin.sattlecker@huawei.com>
Gems/Atom/RHI/Vulkan/Code/Source/RHI/FrameGraphExecuteGroupSecondary.h
Gems/Atom/RHI/Vulkan/Code/Source/RHI/FrameGraphExecuteGroupSecondary.cpp
Gems/Atom/RHI/Vulkan/Code/Source/RHI/FrameGraphExecuteGroupPrimary.cpp
Co-authored-by: Martin Winter <102576959+martinwinter-huawei@users.noreply.github.com> Signed-off-by: Martin Sattlecker <120572403+msat-huawei@users.noreply.github.com>
@akioCL I refactored the synchronization code according to your suggestion and split the Fence/Semaphore classes into multiple classes. Could you have a look at it again?
I think it looks much better. I'm still going through the review but I added a comment about the Factory::CreateFence function. Will publish the rest of my comments on Monday.
```diff
@@ -168,7 +168,7 @@ namespace AZ::RHI

         virtual Ptr<Device> CreateDevice() = 0;

-        virtual Ptr<Fence> CreateFence() = 0;
+        virtual Ptr<Fence> CreateFence(const RHI::Device& device) = 0;
```
We don't need to change the public interface of the Factory to support two different types of fence implementations. What I would do is have a Vulkan::Fence class with a member that holds the fence implementation (m_fenceImpl) through the FenceBase interface. Then, during initialization, we create the proper fence implementation depending on the device properties, and we forward the calls of Vulkan::Fence to the FenceBase implementation.
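The delegation pattern suggested in this comment can be sketched as follows. This is a hypothetical illustration with no Vulkan calls: `FenceBase` and `m_fenceImpl` are the names used in the comment, while `FenceSketch`, `BinaryFenceSketch`, `TimelineFenceSketch`, the `supportsTimelineSemaphores` flag, and the counter-based reset logic are assumptions. The point is that the public fence type stays the same, so `CreateFence()` needs no device parameter; the device capability is only consulted in `Init`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Common interface for the fence implementations (name taken from the review).
class FenceBase
{
public:
    virtual ~FenceBase() = default;
    virtual void SignalOnCpu() = 0;
    virtual bool IsSignaled() const = 0;
    virtual void Reset() = 0;
};

// Stand-in for a fence used when timeline semaphores are unavailable.
class BinaryFenceSketch final : public FenceBase
{
public:
    void SignalOnCpu() override { m_signaled = true; }
    bool IsSignaled() const override { return m_signaled; }
    void Reset() override { m_signaled = false; }

private:
    bool m_signaled = false;
};

// Stand-in for a timeline-semaphore-backed fence (assumed design): each use
// waits for a larger counter value, so Reset() only advances the target.
class TimelineFenceSketch final : public FenceBase
{
public:
    void SignalOnCpu() override { m_currentValue = m_targetValue; }
    bool IsSignaled() const override { return m_currentValue >= m_targetValue; }
    void Reset() override { ++m_targetValue; }

private:
    std::uint64_t m_currentValue = 0;
    std::uint64_t m_targetValue = 1;
};

// The single fence type the factory would return; the implementation is
// selected during Init() from a device capability.
class FenceSketch
{
public:
    void Init(bool supportsTimelineSemaphores)
    {
        if (supportsTimelineSemaphores)
        {
            m_fenceImpl = std::make_unique<TimelineFenceSketch>();
        }
        else
        {
            m_fenceImpl = std::make_unique<BinaryFenceSketch>();
        }
    }

    // All calls are forwarded to the selected implementation.
    void SignalOnCpu() { assert(m_fenceImpl != nullptr); m_fenceImpl->SignalOnCpu(); }
    bool IsSignaled() const { assert(m_fenceImpl != nullptr); return m_fenceImpl->IsSignaled(); }
    void Reset() { assert(m_fenceImpl != nullptr); m_fenceImpl->Reset(); }

private:
    std::unique_ptr<FenceBase> m_fenceImpl;
};
```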
Some minor comments, but I think after addressing them it should be good to go.
@akioCL we did another refactor; what do you think about it?
Looks good! Thanks for making the changes
Gems/Atom/RHI/Vulkan/Code/Source/RHI/TimelineSemaphoreFence.cpp
Signed-off-by: Joerg H. Mueller <joerg.mueller@huawei.com> Co-authored-by: Joerg H. Mueller <joerg.mueller@huawei.com> Co-authored-by: Martin Winter <martin.winter@huawei.com>
…#17761) Signed-off-by: Martin Sattlecker <martin.sattlecker@huawei.com> Co-authored-by: Martin Sattlecker <martin.sattlecker@huawei.com> Co-authored-by: Joerg H. Mueller <joerg.mueller@huawei.com> Co-authored-by: Martin Winter <martin.winter@huawei.com>
…#17761) Signed-off-by: Martin Winter <martin.winter@huawei.com> Co-authored-by: Martin Sattlecker <martin.sattlecker@huawei.com> Co-authored-by: Joerg H. Mueller <joerg.mueller@huawei.com> Co-authored-by: Martin Winter <martin.winter@huawei.com>
…le (o3de#17761)" This reverts commit d9b7dbe.
Signed-off-by: Martin Winter <martin.winter@huawei.com> Co-authored-by: Martin Sattlecker <martin.sattlecker@huawei.com> Co-authored-by: Joerg H. Mueller <joerg.mueller@huawei.com> Co-authored-by: Martin Winter <martin.winter@huawei.com>
What does this PR do?
With this PR, the Vulkan RHI backend uses timeline semaphores instead of binary semaphores to synchronize execution across queues. Binary semaphores are still used for swapchains.
If a device does not support timeline semaphores, synchronization falls back to binary semaphores.
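To illustrate the semantic difference that the PR relies on, here is a CPU-only model; it is a sketch, not Vulkan code, and the class names are hypothetical. A Vulkan timeline semaphore holds a monotonically increasing 64-bit value, and a wait for value N completes once the value is at least N, so a wait may be submitted before its matching signal. A binary semaphore must have its signal submitted before a wait, and the wait consumes the signal.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// CPU-only model of timeline semaphore semantics; no Vulkan calls.
class TimelineSemaphoreModel
{
public:
    // Signal values must strictly increase, as with vkSignalSemaphore.
    void Signal(std::uint64_t value)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            assert(value > m_value && "timeline values must increase");
            m_value = value;
        }
        m_condition.notify_all();
    }

    // A wait may be issued before the matching signal exists;
    // it completes once the counter reaches the requested value.
    void Wait(std::uint64_t value)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [&] { return m_value >= value; });
    }

    bool HasReached(std::uint64_t value) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_value >= value;
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_condition;
    std::uint64_t m_value = 0;
};

// CPU-only model of a binary semaphore: one pending signal is consumed by one
// wait. In Vulkan, submitting a wait without a previously submitted signal is
// invalid usage; this model reports that case by returning false.
class BinarySemaphoreModel
{
public:
    void Signal()
    {
        assert(!m_signaled && "binary semaphore already has a pending signal");
        m_signaled = true;
    }

    bool TryWait()
    {
        if (!m_signaled)
        {
            return false;
        }
        m_signaled = false; // the wait unsignals the semaphore
        return true;
    }

private:
    bool m_signaled = false;
};
```

Because timeline waits are value-based, the submission order between the signaling and the waiting queue no longer matters, which is what removes the need for queue submits to wait on each other.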
What does the PR actually do?
- `SemaphoreTracker` class that counts how many semaphores are on the frame graph before a swapchain, and how many of them have already been submitted to the queue. We don't track which specific semaphores have been signaled, only how many of the semaphores a swapchain depends on have been signaled.
- … `FenceTracker` class. The user of a `Fence` is responsible for calling `FenceTracker::SignalUserSemaphore` when calling the `Fence::SignalOnCpu` function.

Why do we need this?
In a previous PR (#17569) we introduced timeline semaphores for user-defined Fences. We found that signalling these from the CPU led to a deadlock somewhere in the graphics driver/kernel, which is explained by the Vulkan specification linked above.
Signalling Fences from the CPU will be used in MultiGPU contexts. See this RFC.
In addition to fixing the problem above, this may also lead to a slight performance gain when semaphores are used for synchronization between queues: queue submissions no longer need to wait for each other, except for those involving the swapchain.
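The counting idea behind the `SemaphoreTracker` described above can be sketched on the CPU as follows. This is a hypothetical illustration, not the PR's implementation: the class name `SemaphoreTrackerSketch`, its methods, and the use of a mutex and condition variable are assumptions. Only the number of submitted semaphores is counted, not which ones, and the swapchain-related submit blocks until the count reaches the number of semaphores ahead of it in the frame graph.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical counter shared by the queue submits that precede a swapchain.
class SemaphoreTrackerSketch
{
public:
    // semaphoreCount: number of semaphores the swapchain depends on,
    // as determined from the frame graph.
    explicit SemaphoreTrackerSketch(std::size_t semaphoreCount)
        : m_expected(semaphoreCount)
    {
    }

    // Called after a queue submit that signals tracked semaphores.
    void SignalSemaphoresSubmitted(std::size_t count = 1)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_submitted += count;
            assert(m_submitted <= m_expected && "more signals than expected");
        }
        m_condition.notify_all();
    }

    // Called before the swapchain-related submit, which still uses binary
    // semaphores: blocks until every tracked signal has been submitted.
    void WaitForAllSubmitted()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return m_submitted >= m_expected; });
    }

    bool AllSubmitted() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_submitted >= m_expected;
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_condition;
    const std::size_t m_expected;
    std::size_t m_submitted = 0;
};
```

A plain count is sufficient here only because the swapchain submit is the last consumer; once binary semaphores can appear before other submits, the identity of each signaler matters, which is the motivation for the bitset-based alternative raised in the review.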
How was this PR tested?
Tested with multiple levels and atom sample viewer samples (async compute, multi pipeline, multi scene, mesh) on Windows with Vulkan.
I cannot test this on a machine without timeline semaphore support because I don't have access to such hardware. However, I did manually deactivate `m_signalFenceFromCPU` to test this code path. This was also tested with the MultiGPU RPI sample in atom sample viewer in this branch: https://github.com/jhmueller-huawei/o3de/tree/multi-device-copy-pass-test