Drop volatile from reduce/scan join() routines #4931

PhilMiller · 2022-04-01T22:37:17Z

Following the fixes for data races described in #4855, it appears that the volatile qualifiers may no longer be necessary at all.

There may be other fixes for lurking data races in the CUDA reduction implementation that will still need to be applied:

21786c8: a missing synchronization that quiets the last compute-sanitizer report
Avoidance of creating a potential data race (see the comment on "CUDA Reductions: Fix data races reported by Nvidia compute-sanitizer" #4855)

This is a squash of everything in #4901, without the actual volatile_preload trick that I thought was necessary.

Other uses of volatile that may be cleanable following this change are listed here:
https://gist.github.com/PhilMiller/575baac87d1965a7bdcb98f812a23dd9

Fixes #4077, #1554

PhilMiller · 2022-04-02T07:00:49Z

@dalg24 Are you that confident in the tests, or do you really think the CUDA reduce/scan code is solid with the added synchronization?

dalg24 · 2022-04-02T12:54:11Z

This is the direction we want to go. I trust the tooling and the tests until proven wrong. I'd rather merge early in the release cycle than wait.

PhilMiller · 2022-04-03T20:21:42Z

> This is the direction we want to go. I trust the tooling and the tests until proven wrong. I'd rather merge early in the release cycle than wait.

Ok, with that in mind, I'll get this commit cleaned up, and we can see what the other senior members of the team think.

masterleinad

I'm on board with giving it a try (without extra volatile preloads).

…es, and many implementations

The need for `volatile` on join() operations of reducers and reduction subject types was an accommodation for a quirk of CUDA's non-standard memory model. After fixes for data races in our CUDA reductions reported by Nvidia's compute-sanitizer racecheck tool (#4855), tests seem to pass without maintaining the volatile qualifiers.
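To make the interface change concrete, here is a minimal standalone C++ sketch. `SumCount`, `SumCountJoiner` and `tree_combine` are hypothetical names invented for illustration, not Kokkos API, and the serial `tree_combine` only mimics the pairwise way a backend might fold partial results together. The point is that `join()` takes plain references, whereas the previously expected signature (shown in a comment) took `volatile` references, which pushed value types with user-defined operators to provide volatile overloads (the burden tracked in #4077 and #1554).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction value type (not part of Kokkos).
struct SumCount {
  double sum = 0.0;
  long count = 0;
};

// Hypothetical functor-style joiner written the way this PR allows:
// plain references, no volatile qualifiers.
struct SumCountJoiner {
  using value_type = SumCount;

  void join(value_type& dst, const value_type& src) const {
    dst.sum += src.sum;
    dst.count += src.count;
  }

  // Previously a signature like this was expected instead:
  //   void join(volatile value_type& dst,
  //             const volatile value_type& src) const;
  // so value types with user-defined operators needed volatile overloads.
};

// Serial stand-in for a backend's pairwise (tree) combination of partial
// results; it only ever calls the non-volatile join().
template <class Functor>
typename Functor::value_type tree_combine(
    const Functor& f, std::vector<typename Functor::value_type> partials) {
  using T = typename Functor::value_type;
  if (partials.empty()) return T{};
  for (std::size_t stride = 1; stride < partials.size(); stride *= 2) {
    for (std::size_t i = 0; i + stride < partials.size(); i += 2 * stride) {
      f.join(partials[i], partials[i + stride]);  // fold right partner into left
    }
  }
  return partials[0];
}
```

Calling `tree_combine(SumCountJoiner{}, partials)` on five partials holding sums 1 through 5, each with a count of 1, yields `sum == 15` and `count == 5`.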

PhilMiller · 2022-04-05T18:03:55Z

Nvidia only shipped compute-sanitizer on CUDA 11+. It's possible the compiler has been correspondingly tweaked to be stricter about micro-optimizations that may interact with the memory model. So, only embracing this if/when we require CUDA 11 may be a precaution worth considering.
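If the project chose to rely on non-volatile join() only once CUDA 11 is required, a compile-time guard is one way to enforce that. The sketch below is hypothetical and not taken from Kokkos (how Kokkos actually enforces minimum toolkit versions is not shown here); it relies only on nvcc's `__CUDACC_VER_MAJOR__` macro and is a no-op for non-CUDA compilers. `cuda_compiler_major` is an invented helper name.

```cpp
// Hypothetical guard: refuse to build with an nvcc older than CUDA 11.
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 11
#error "non-volatile join() is only relied upon with CUDA 11 or newer"
#endif
#endif

// Hypothetical helper: the nvcc major version, or 0 when not compiling
// with a compiler that defines __CUDACC_VER_MAJOR__.
constexpr int cuda_compiler_major() {
#if defined(__CUDACC_VER_MAJOR__)
  return __CUDACC_VER_MAJOR__;
#else
  return 0;
#endif
}
```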

crtrott

Let's just see what happens. @dalg24 if you are ok with it, merge it.

PhilMiller · 2022-04-06T05:02:09Z

Well, that seems like unanimous support. If you're all happy with it, I'm not going to stand in the way.

srajama1 · 2022-04-11T23:07:56Z

Folks, this has been breaking KK (Kokkos Kernels) for 5 days; can you fix or revert it, please? High-priority PRs are all blocked because testing is broken.

PhilMiller mentioned this pull request Apr 1, 2022

Obviate join(volatile) in reducers by preloading values #4901

Closed

PhilMiller requested review from crtrott, cz4rs, dalg24 and masterleinad April 2, 2022 00:05

dalg24 approved these changes Apr 2, 2022

This was linked to issues Apr 2, 2022

Remove need for volatile overloads for types in Kokkos reductions #4077

Closed

Remove volatile requirement for reduction value types #1554

Closed

masterleinad approved these changes Apr 4, 2022

crtrott marked this pull request as ready for review April 6, 2022 03:36

crtrott approved these changes Apr 6, 2022

dalg24 merged commit 335468d into kokkos:develop Apr 6, 2022

fnrizzi mentioned this pull request Apr 6, 2022

algorithms: remove volatile from unit tests #4939

Merged

ndellingwood mentioned this pull request Apr 6, 2022

Changes to drop volatile lead to kokkos-kernels nightly test failures with OpenMP, Cuda backends on Volta70 #4941

Closed

dalg24 mentioned this pull request Apr 8, 2022

Detect join member function with volatile-qualified arguments #4951

Merged
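#4951 concerns detecting user join() members that still take volatile-qualified arguments. One way such a check could be written is sketched below with the C++17 detection idiom (`std::void_t`); `has_volatile_join`, `OldStyle` and `NewStyle` are hypothetical names, and this is not the implementation that PR merged.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: true when Functor has a join() callable with
// volatile-qualified value_type arguments.
template <class Functor, class = void>
struct has_volatile_join : std::false_type {};

template <class Functor>
struct has_volatile_join<
    Functor,
    std::void_t<decltype(std::declval<const Functor&>().join(
        std::declval<volatile typename Functor::value_type&>(),
        std::declval<const volatile typename Functor::value_type&>()))>>
    : std::true_type {};

// Legacy style: volatile-qualified parameters.
struct OldStyle {
  using value_type = int;
  void join(volatile value_type& dst, const volatile value_type& src) const {
    dst = dst + src;  // plain assignment; compound assignment on volatile is deprecated in C++20
  }
};

// Style permitted after this PR: plain references, which cannot bind to
// volatile lvalues, so the trait reports false.
struct NewStyle {
  using value_type = int;
  void join(value_type& dst, const value_type& src) const { dst += src; }
};

static_assert(has_volatile_join<OldStyle>::value, "legacy join detected");
static_assert(!has_volatile_join<NewStyle>::value, "plain join not flagged");
```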

PhilMiller added this to Done, pending release in Developer: Phil Miller Apr 12, 2022

This was referenced Apr 12, 2022

Remove need for volatile overloads for types in Kokkos reductions #4077

Closed

Remove volatile requirement for reduction value types #1554

Closed

PhilMiller mentioned this pull request Apr 21, 2022

Investigate possible race condition in kokkos::parallel_reduce #4970

Closed

nmm0 mentioned this pull request Jun 13, 2022

Parallel reduce example uses volatile kokkos/kokkos-core-wiki#44

Closed

dalg24 mentioned this pull request Aug 3, 2022

Volatile and C++17/20/23 #5306

Closed

PhilMiller mentioned this pull request Aug 31, 2022

Reflect deprecation of need for volatile in reducer-interfacing bits kokkos/kokkos-core-wiki#151

Closed
