Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop volatile from reduce/scan join() routines #4931

Merged
merged 1 commit into from
Apr 6, 2022
Merged

Drop volatile from reduce/scan join() routines #4931

merged 1 commit into from
Apr 6, 2022

Conversation

PhilMiller
Copy link
Contributor

@PhilMiller PhilMiller commented Apr 1, 2022

Following the fixes for data races described in #4855, it appears that the volatile qualifiers may no longer be necessary at all.

There may be other fixes for lurking data races in the CUDA reduction implementation that will still need to be applied:

This is a squash of everything in #4901, without the actual volatile_preload trick that I thought was necessary.

Other uses of volatile that may be cleanable following this change are listed here:
https://gist.github.com/PhilMiller/575baac87d1965a7bdcb98f812a23dd9

Fixes #4077, #1554

@PhilMiller
Copy link
Contributor Author

@dalg24 Are you that confident in the tests, or do you really think the CUDA reduce/scan code is solid with the added synchronization?

@dalg24
Copy link
Member

dalg24 commented Apr 2, 2022

This is the direction we want to go. I trust the tooling and the tests until proven wrong. I'd rather merge early in the release cycle than wait.

@PhilMiller
Copy link
Contributor Author

PhilMiller commented Apr 3, 2022

This is the direction we want to go. I the trust the tooling and the tests until proven wrong. I'd rather merge early in the release cycle than wait.

  • Ok, with that in mind, I'll get this commit cleaned up, and we can see what the other senior members of the team think.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with giving it a try (without extra volatile preloads).

…es, and many implementations

The need for `volatile` on join() operations of reducers and reduction
subject types was an accomodation for a quirk of CUDA's non-standard
memory model. After fixes for data races in our CUDA reductions
reported by Nvidia's compute-sanitizer racecheck tool (#4855), tests
seem to pass without maintaining the volatile qualifiers.
@PhilMiller
Copy link
Contributor Author

Nvidia only shipped compute-sanitizer on CUDA 11+. It's possible the compiler has been correspondingly tweaked to be stricter about micro-optimizations that may interact with the memory model. So, only embracing this if/when we require CUDA 11 may be a precaution worth considering.

@crtrott crtrott marked this pull request as ready for review April 6, 2022 03:36
Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets just see what happens. @dalg24 if you are ok with it merge it.

@PhilMiller
Copy link
Contributor Author

Well, that seems like unanimous support. If you're all happy with it, I'm not going to stand in the way

@srajama1
Copy link

Folks this is breaking KK for 5 days, can you fix/revert please? High priority PRs are all blocked because testing is broken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Developer: Phil Miller
Done in Release 3.7
5 participants