Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HIP Global Launch with HSA_XNACK=1 #5755

Merged
merged 9 commits into from
Jan 21, 2023

Conversation

Rombur
Copy link
Member

@Rombur Rombur commented Jan 11, 2023

This problem is found with all the versions of ROCm we support.

cc: @arghdos

crtrott and others added 3 commits January 10, 2023 15:34
Looks like there is a race condition on use of the driver argument.
dalg24
dalg24 previously requested changes Jan 11, 2023
core/src/HIP/Kokkos_HIP_Instance.cpp Show resolved Hide resolved
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
const std::size_t size) const {
if (verify_is_initialized("scratch_functor_host") &&
m_scratchFunctorSize < size) {
m_scratchFunctorSize = size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args this is actually wrong. If you first call scratch_functor() and then scratch_functor_host() the first call will reset m_scratchFunctorSize so the second will not do anything ...

@crtrott
Copy link
Member

crtrott commented Jan 11, 2023

Maybe we should just have a function which encapsulates the staging. I.e. you hand that member function of the instant the driver, and it returns device ptr.

@Rombur
Copy link
Member Author

Rombur commented Jan 12, 2023

I have updated the PR

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good besides the name of the member function.

Comment on lines 150 to 152
template <typename DriverType>
Kokkos::HIP::size_type *HIPInternal::scratch_functor(
DriverType const &driver) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably give it a better name now as it does not really capture that this function is copying the driver into a staging area in host pinned memory and returns a pointer to the device where the driver is being copied asynchronously.
Maybe a transfer_to_device_and_get_pointer? Unless someone has a better idea because I am not super proud about that name...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the current name is not great but I am not sure transfer_to_device_and_get_pointer is better. Unfortunately, I don't have a good name for this functions :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stage_functor or stage_functor_for_execution ?

@dalg24 dalg24 dismissed their stale review January 12, 2023 22:40

Last thing to discuss is the function name

@dalg24
Copy link
Member

dalg24 commented Jan 12, 2023 via email

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop the stray header include

core/src/HIP/Kokkos_HIP_Instance.hpp Outdated Show resolved Hide resolved
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
// Without this fix, all the atomic tests fail. It is not obvious that this
// problem is limited to HSA_XNACK=1 even if all the tests pass when
// HSA_XNACK=0. That's why we always copy the driver.
KOKKOS_IMPL_HIP_SAFE_CALL(hipStreamSynchronize(m_stream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might add a note in the comment that the hipMemcpyAsync below should guard from anyone overwriting the device scratchFunctor while the current kernel is executing. My read is that this synchronization actually functions by guarding anyone from copying to the scratchFunctorHost while is still possibly needed for the hipMemcpyAsync.

I am not sure I see a better way to ensure this than the streamSync call here, but it might be useful context if we ever have to monkey with this again.

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
@crtrott crtrott merged commit c304818 into kokkos:develop Jan 21, 2023
@dalg24
Copy link
Member

dalg24 commented Jan 21, 2023

Failure was clearly unrelated (no space left on device) and both HIP builds passed

@Rombur Rombur deleted the hip-fix-global-launch branch October 2, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants