Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential data race in Cuda parallel reduce #6236

Merged

Conversation

tcclevenger
Copy link
Contributor

See comment #6217 (comment). Suggested in #4855 (comment) as being a potential issue, I think this is an actual issue. At the very least we avoid doing computation that is unnecessary, and it clears a warning when running compute-sanitizer --tool=racecheck (see #6217).

@PhilMiller PhilMiller requested a review from crtrott June 23, 2023 00:02
Copy link
Member

@Rombur Rombur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need the same fix for HIP

@PhilMiller
Copy link
Contributor

We probably need the same fix for HIP

@Rombur In which case, the fixes from #4855 would likely be applicable as well.

@masterleinad
Copy link
Contributor

Have you measured if you see any impact in performance for, say https://github.com/kokkos/kokkos-tutorials/tree/main/Exercises/04/Solution?

@tcclevenger
Copy link
Contributor Author

Have you measured if you see any impact in performance for, say https://github.com/kokkos/kokkos-tutorials/tree/main/Exercises/04/Solution?

Yes, the differences were negligible (<2% relative increase/decrease between this branch and develop commit it is based on, various problem sizes, launching p_for/reduce kernels).

Copy link
Contributor

@cz4rs cz4rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

We probably need the same fix for HIP

I propose we merge this as is and fix HIP with a separate issue / PR.

@PhilMiller PhilMiller changed the title Potential race condition in Cuda parallel reduce Potential data race in Cuda parallel reduce Aug 9, 2023
@PhilMiller
Copy link
Contributor

Retest this please

@PhilMiller
Copy link
Contributor

The previous Jenkins failure was long enough ago that the logs have been discarded

@PhilMiller
Copy link
Contributor

Looks good to me 👍

We probably need the same fix for HIP

I propose we merge this as is and fix HIP with a separate issue / PR.

#6348

@crtrott
Copy link
Member

crtrott commented Aug 10, 2023

Retest this please.

@PhilMiller
Copy link
Contributor

CUDA builds all passed. None of the pending Jenkins builds is a CUDA backend, so this should be safe to merge.

@crtrott crtrott merged commit 4222b84 into kokkos:develop Aug 23, 2023
27 of 28 checks passed
@dalg24 dalg24 mentioned this pull request Aug 23, 2023
@tcclevenger tcclevenger deleted the potential_racecondition_in_cuda_reduce branch August 30, 2023 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants