Fix HIP Global Launch with HSA_XNACK=1 #5755
It looks like there is a race condition in the use of the driver argument.
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
core/src/HIP/Kokkos_HIP_Instance.cpp
    const std::size_t size) const {
  if (verify_is_initialized("scratch_functor_host") &&
      m_scratchFunctorSize < size) {
    m_scratchFunctorSize = size;
Argh, this is actually wrong. If you first call scratch_functor() and then scratch_functor_host(), the first call will update m_scratchFunctorSize, so the second will not do anything...
Maybe we should just have a function that encapsulates the staging, i.e. you hand the driver to that member function of the instance, and it returns a device pointer.
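A minimal sketch of the ordering problem described above, in plain C++ so it runs without a GPU. The struct names, the `std::vector` buffers and the `stage` function are hypothetical stand-ins, not the Kokkos implementation; only the shared size check mirrors `m_scratchFunctorSize` from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the two accessors: each lazily grows its own buffer,
// but both test and update one shared size (like m_scratchFunctorSize).
struct BuggyScratch {
  std::size_t size = 0;
  std::vector<char> device_buf;  // stand-in for the device scratch buffer
  std::vector<char> host_buf;    // stand-in for the host pinned buffer

  char* scratch_functor(std::size_t n) {
    if (size < n) {
      size = n;              // records n for *both* buffers...
      device_buf.resize(n);  // ...but only grows this one
    }
    return device_buf.data();
  }

  char* scratch_functor_host(std::size_t n) {
    if (size < n) {  // false if scratch_functor(n) ran first
      size = n;
      host_buf.resize(n);
    }
    return host_buf.data();
  }
};

// The suggested alternative: a single member function owns the staging, so
// both buffers are always grown together and cannot get out of sync.
struct FixedScratch {
  std::size_t size = 0;
  std::vector<char> device_buf;
  std::vector<char> host_buf;

  char* stage(std::size_t n) {
    if (size < n) {
      size = n;
      device_buf.resize(n);
      host_buf.resize(n);
    }
    assert(host_buf.size() == device_buf.size());
    // The host-to-device copy of the driver would be issued here.
    return device_buf.data();
  }
};
```

Calling `scratch_functor(64)` and then `scratch_functor_host(64)` on a `BuggyScratch` leaves the host buffer empty, while `FixedScratch::stage(64)` grows both buffers to 64 bytes.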
I have updated the PR.
Looks good besides the name of the member function.
core/src/HIP/Kokkos_HIP_Instance.hpp
template <typename DriverType>
Kokkos::HIP::size_type *HIPInternal::scratch_functor(
    DriverType const &driver) const {
We should probably give it a better name now, as the current one does not really capture that this function copies the driver into a staging area in host pinned memory and returns a pointer to the device memory into which the driver is being copied asynchronously.
Maybe transfer_to_device_and_get_pointer? Unless someone has a better idea, because I am not super proud of that name...
I agree that the current name is not great, but I am not sure transfer_to_device_and_get_pointer is better. Unfortunately, I don't have a good name for this function :-/
stage_functor or stage_functor_for_execution?
The last thing to discuss is the function name.
Sorry, I just realized that making the function a template is not a good idea. Please take a void* and a size instead; it will reduce the number of instantiations.
(Sent by email in reply to Bruno Turcksin's comment of Thu, Jan 12, 2023 at 6:28 PM on core/src/HIP/Kokkos_HIP_Instance.hpp.)
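A sketch of the type-erased signature suggested here, using a toy instance. Assumptions: `ToyInstance` is hypothetical, `std::vector` stands in for host pinned and device memory, and a synchronous `std::memcpy` stands in for the asynchronous `hipMemcpyAsync`. The name `stage_functor_for_execution` is one of the candidates proposed in this thread, not necessarily the name that was merged. Because the function takes `void const*` and a size, it is compiled once rather than once per driver type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy instance (assumption): vectors stand in for host pinned memory and
// device memory; std::memcpy stands in for hipMemcpyAsync on the stream.
class ToyInstance {
 public:
  // Non-template: one definition serves every driver type.
  void* stage_functor_for_execution(void const* driver, std::size_t size) {
    assert(driver != nullptr);
    if (m_host_staging.size() < size) {
      m_host_staging.resize(size);
      m_device_mem.resize(size);
    }
    // 1. Copy the driver bytes into the host staging area.
    std::memcpy(m_host_staging.data(), driver, size);
    // 2. Copy staging -> "device" (asynchronous in the real code).
    std::memcpy(m_device_mem.data(), m_host_staging.data(), size);
    // 3. Hand back the device address the kernel reads the driver from.
    return m_device_mem.data();
  }

 private:
  std::vector<unsigned char> m_host_staging;
  std::vector<unsigned char> m_device_mem;
};
```

A call site passes the address and size of its driver, e.g. `instance.stage_functor_for_execution(&driver, sizeof(driver))`. Since the launch code is already templated on the driver type, this adds no template instantiations of its own.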
Please drop the stray header include
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
  // Without this fix, all the atomic tests fail. It is not obvious that this
  // problem is limited to HSA_XNACK=1 even if all the tests pass when
  // HSA_XNACK=0. That's why we always copy the driver.
  KOKKOS_IMPL_HIP_SAFE_CALL(hipStreamSynchronize(m_stream));
I might add a note in the comment that the hipMemcpyAsync below should guard against anyone overwriting the device scratchFunctor while the current kernel is executing. My read is that this synchronization actually works by preventing anyone from copying into scratchFunctorHost while it is still possibly needed by the hipMemcpyAsync.
I am not sure I see a better way to ensure this than the hipStreamSynchronize call here, but it might be useful context if we ever have to monkey with this again.
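A toy model of the hazard described in this review comment. Assumptions: `ToyStream` and `launch_twice` are hypothetical; the stream is an in-order queue that only runs work on `synchronize()`, which makes the worst-case timing of a real asynchronous copy deterministic. Reusing the single device slot is safe because the stream orders each copy after the previous kernel; reusing the single host staging buffer is not, unless the host waits first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <initializer_list>
#include <utility>
#include <vector>

// Toy in-order stream (assumption, not the HIP API): operations are queued
// and only executed, in order, when synchronize() is called. This mimics an
// asynchronous copy that reads its host source at some later time.
class ToyStream {
 public:
  void enqueue(std::function<void()> op) { m_pending.push_back(std::move(op)); }

  void memcpy_async(void* dst, void const* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    // The source is read when the copy runs, not when it is enqueued.
    enqueue([=] { std::memcpy(dst, src, n); });
  }

  void synchronize() {
    for (auto& op : m_pending) op();
    m_pending.clear();
  }

 private:
  std::vector<std::function<void()>> m_pending;
};

// Hypothetical launcher: two "kernels" each read their driver from the same
// device slot, staged through the same host buffer. Returns what each saw.
std::vector<int> launch_twice(bool sync_before_reuse) {
  ToyStream stream;
  int host_staging = 0;  // single reusable host (pinned) staging buffer
  int device_slot = 0;   // single reusable device scratch slot
  std::vector<int> seen;
  for (int driver : {1, 2}) {
    // Plays the role of hipStreamSynchronize: without it, the previously
    // queued copy may not have read host_staging yet.
    if (sync_before_reuse) stream.synchronize();
    host_staging = driver;
    stream.memcpy_async(&device_slot, &host_staging, sizeof(int));
    // The "kernel": stream order runs it after the copy above and before
    // the next copy into device_slot.
    stream.enqueue([&seen, &device_slot] { seen.push_back(device_slot); });
  }
  stream.synchronize();
  return seen;
}
```

`launch_twice(true)` returns `{1, 2}`, whereas `launch_twice(false)` returns `{2, 2}`: both kernels see the second driver because the first copy read the host buffer only after it had been overwritten.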
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
The failure was clearly unrelated (no space left on device), and both HIP builds passed.
This problem occurs with all the versions of ROCm we support.
cc: @arghdos