
[HIP] Lock access to scratch memory when using Teams #3916

Merged 1 commit on Apr 7, 2021

Conversation

Rombur
Member

@Rombur Rombur commented Apr 1, 2021

There is potentially a problem if multiple threads launch a parallel_for or a parallel_reduce kernel on the same stream and use Teams. The parallel_for and the parallel_reduce may try to reallocate the scratch memory while it is being used somewhere else. The current PR uses a mutex to ensure that only one Team parallel_for or parallel_reduce is running for a given instance. I am open to suggestions if someone has a better solution.

Note that CUDA has the same problem.

@@ -433,6 +433,9 @@ class ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>,
int m_shmem_size;
void* m_scratch_ptr[2];
int m_scratch_size[2];
// Only let one ParallelFor/Reduce modify the team scratch memory. The
// constructor acquires the mutex which is released in the destructor.
std::unique_lock<std::mutex> m_scratch_lock;
Member


std::lock_guard is non-copyable. This means you have implicitly deleted the copy constructor and copy assignment. Was that intentional? Did you make sure it plays well with the kernel launching?

Member Author


This means you implicitly deleted the copy constructor and copy assignment.

True, but at the same time, if you use the copy constructor currently, you are copying pointers without copying their data... I also don't see what your use case for that would be. Note that I am using std::unique_lock instead of std::lock_guard; unlike std::lock_guard, std::unique_lock is movable.

Did you make sure it plays well with the kernel launching?

I am not sure what you mean by that, but all the tests pass on Tulip and on the CI.

Member


I missed that it was std::unique_lock and not std::lock_guard. I can't remember how we pass the driver around during kernel launching. This is not a trick question; I was just curious whether you had considered it.

@dalg24 dalg24 requested review from dhollman and crtrott April 1, 2021 19:09
Contributor

@masterleinad masterleinad left a comment


Looks OK to me! It would be good if we finally had some tests for the cases we are trying to cover here, though.

@dalg24 dalg24 merged commit 5bd55d2 into kokkos:develop Apr 7, 2021
@Rombur Rombur deleted the multithreading_2 branch June 8, 2021 12:37