
OpenMPTarget: scratch implementation for parallel_reduce #3776

Merged
7 commits merged into kokkos:develop on Feb 10, 2021

Conversation

rgayatri23 (Contributor):

Updates in this PR:

  1. Scratch implementation for parallel_reduce (a minimal usage sketch follows the list).
  2. Added scratch unit tests to the list of unit tests.
  3. Edited unit tests to work with the OpenMPTarget backend restrictions.
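
For context on item 1, below is a minimal sketch of how team-level scratch memory is typically requested and used with a Kokkos::parallel_reduce. The kernel, sizes, and label are illustrative only and are not taken from this PR; the pattern follows the documented TeamPolicy / set_scratch_size usage.

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using exec_space = Kokkos::DefaultExecutionSpace;
    using policy_t   = Kokkos::TeamPolicy<exec_space>;
    using member_t   = policy_t::member_type;
    // Unmanaged view placed in team scratch memory (level 0).
    using scratch_t  = Kokkos::View<double*, exec_space::scratch_memory_space,
                                    Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

    const int league_size = 64, team_size = 32, vec_len = 128;
    const size_t bytes    = scratch_t::shmem_size(vec_len);

    // Request per-team scratch when building the policy.
    policy_t policy =
        policy_t(league_size, team_size).set_scratch_size(0, Kokkos::PerTeam(bytes));

    double result = 0.0;
    Kokkos::parallel_reduce(
        "scratch_reduce_example", policy,
        KOKKOS_LAMBDA(const member_t& team, double& update) {
          scratch_t tmp(team.team_scratch(0), vec_len);  // per-team allocation
          Kokkos::parallel_for(Kokkos::TeamThreadRange(team, vec_len),
                               [&](const int i) { tmp(i) = 1.0; });
          team.team_barrier();
          double team_sum = 0.0;
          Kokkos::parallel_reduce(
              Kokkos::TeamThreadRange(team, vec_len),
              [&](const int i, double& ts) { ts += tmp(i); }, team_sum);
          // Only one thread per team contributes the team's partial sum.
          Kokkos::single(Kokkos::PerTeam(team), [&]() { update += team_sum; });
        },
        result);
    // result == league_size * vec_len
  }
  Kokkos::finalize();
  return 0;
}

As the review comments below indicate, in the OpenMPTarget backend this per-team scratch is backed by pre-allocated device memory blocks, and a lock_array is used to hand a free block to each active team.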

@dalg24 (Member) left a comment:

Essentially copy/paste from the code in the ParallelFor specialization. The code duplication is unfortunate, but I don't necessarily have a better idea.

I am unsure how the lock_array would play with streams, but I guess we have the same issue with parallel_for.

@dalg24 (Member) left a comment:

Add FIXMEs in the unit tests.
Make sure you add a comment when it is not obvious.

@rgayatri23 (Contributor, Author) replied:

> Essentially copy/paste from code in ParallelFor specialization. Code duplication is unfortunate but I don't necessarily have a better idea.

I don't have an idea to avoid the code duplication for this part, but I plan to reduce it in ParallelFor and ParallelReduce to avoid code duplication for TagType, similar to what we did in the MDRange case (a rough sketch of that idea follows below).

> I am unsure how the lock_array would play with streams but I guess we have the same issue with parallel_for

Yes, when streams are implemented by compilers, we will have to modify this implementation.
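
As an aside, the TagType deduplication mentioned above usually boils down to a single dispatch helper that invokes the functor with or without the work tag. A hypothetical sketch (not the actual Kokkos code; names are illustrative):

#include <Kokkos_Core.hpp>
#include <type_traits>

// Hypothetical helper: one call path for tagged and untagged functors, so the
// ParallelFor and ParallelReduce specializations do not each need two copies
// of the loop body.
template <class TagType, class FunctorType, class... Args>
KOKKOS_INLINE_FUNCTION void exec_functor(const FunctorType& f, Args&&... args) {
  if constexpr (std::is_void<TagType>::value)
    f(static_cast<Args&&>(args)...);
  else
    f(TagType(), static_cast<Args&&>(args)...);
}

The loop body would then call something like exec_functor<typename Policy::work_tag>(functor, i, update) regardless of whether a tag is present.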

while (lock_team != 1) {
  // Avoid tripping over other team's block index.
  // Find a free block index.
  iter = (omp_get_team_num() % max_active_teams);

Member:

Shouldn't this happen outside the while loop? Is that also wrong in the ParallelFor?

Member:

Also, why do we need two while loops? Isn't the only way to escape the inner while loop lock_team == 1, in which case shmem_block_index != -1?

rgayatri23 (Contributor, Author):

Yes, the additional loop is not needed. Deleted it.
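
For readers following the thread, here is a rough sketch of the single-loop acquisition that results once the redundant outer loop is dropped. It is not the exact merged code; the atomic used and the helper name are assumptions, while the variable names follow the snippet quoted above.

#include <Kokkos_Core.hpp>
#include <omp.h>

// Illustrative sketch only: each team starts probing lock_array at a slot
// derived from its team number and keeps walking forward until it atomically
// flips a slot from 0 to 1, thereby claiming that scratch block index.
inline int acquire_scratch_block(int* lock_array, int max_active_teams) {
  int shmem_block_index = -1;
  int iter = omp_get_team_num() % max_active_teams;
  while (shmem_block_index == -1) {
    // atomic_compare_exchange returns the previous value; 0 means we won the slot.
    if (Kokkos::atomic_compare_exchange(&lock_array[iter], 0, 1) == 0)
      shmem_block_index = iter;
    else
      iter = (iter + 1) % max_active_teams;  // slot busy, probe the next one
  }
  return shmem_block_index;
}

// When the team is done with its scratch block it releases the slot, e.g.:
//   Kokkos::atomic_exchange(&lock_array[shmem_block_index], 0);

Whether a losing team re-probes the same slot or steps to the next one is an implementation choice; the sketch steps forward so a team is not stuck behind another team's long-lived lock.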

#pragma omp target teams distribute parallel for is_device_ptr(lock_array)
for (int i = 0; i < max_active_teams; ++i) {
  lock_array[i] = 0;
}

Member:

Why do this again and again? (I have probably overlooked that in the ParallelFor too.) We only need to do it when reallocating (so get_lock_array could do that). Otherwise this should be back to zero anyway; a kernel will never leave it non-zero unless it crashes.

@rgayatri23 (Contributor, Author) on Feb 5, 2021:

Yes, I was initializing it in case a previous kernel crashed. But I guess we don't need that, since if a kernel crashes the application would quit anyway.
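
Along the lines of the suggestion above, a hypothetical sketch of initializing the lock array only when it is (re)allocated, inside a get_lock_array()-style helper (function and variable names here are illustrative, not the actual Kokkos internals):

#include <omp.h>

static int* g_lock_array      = nullptr;
static int  g_lock_array_size = 0;

// Hypothetical helper: (re)allocate and zero the device-resident lock array
// only when it does not exist yet or is too small. Kernels restore each slot
// to zero on release, so no per-launch re-initialization is needed.
int* get_lock_array(int max_active_teams) {
  if (g_lock_array == nullptr || g_lock_array_size < max_active_teams) {
    const int device = omp_get_default_device();
    if (g_lock_array != nullptr) omp_target_free(g_lock_array, device);
    g_lock_array = static_cast<int*>(
        omp_target_alloc(max_active_teams * sizeof(int), device));
    g_lock_array_size = max_active_teams;

    int* lock_array = g_lock_array;  // local copy for the target region
#pragma omp target teams distribute parallel for is_device_ptr(lock_array)
    for (int i = 0; i < max_active_teams; ++i) lock_array[i] = 0;
  }
  return g_lock_array;
}

The next quoted hunk returns to the lock-acquisition loops discussed earlier.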

// Loop as long as a shmem_block_index is not found.
while (shmem_block_index == -1) {
  // Loop until a lock can be acquired on a team.
  while (lock_team != 1) {

Member:

Same as above.

@crtrott merged commit 397dfb6 into kokkos:develop on Feb 10, 2021.