Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#5641: Fix HIP & CUDA MDRange reduce for sizeof(value_type) < sizeof(int) #5745

Merged
merged 10 commits into from Mar 11, 2023

Conversation

PhilMiller
Copy link
Member

@PhilMiller PhilMiller commented Jan 11, 2023

Fixes #5641

@PhilMiller
Copy link
Member Author

I'm going to temporarily push a test change to confirm that it fails the HIP build, and then force-push it back.

@PhilMiller
Copy link
Member Author

HIP builds with just the assertion but no added test triggering it passed:
https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/11627/pipeline/50#step-231-log-2274

@PhilMiller
Copy link
Member Author

The added assertion does trigger with a test modified to call an offending parallel_reduce
https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/11630/pipeline#step-65-log-901

However, it also reminds me that the guardrail may be necessary for all possible memory space template arguments of LAnd

@PhilMiller PhilMiller force-pushed the 5641-hip-land-bool-guard branch 3 times, most recently from 9e40d39 to 45e33d9 Compare January 11, 2023 02:08
@PhilMiller PhilMiller marked this pull request as ready for review January 11, 2023 02:08
@PhilMiller
Copy link
Member Author

Since I was typo-fixing anyway, I ran ispell over all comments and strings in that file. Everything else is fine.

@crtrott
Copy link
Member

crtrott commented Jan 11, 2023

Why only for LAnd? I bet all other reductions would fail too for bool? (Sum, BAnd, LOr, etc. )

@PhilMiller
Copy link
Member Author

I've only guarded against what's been reported to fail.

If you can point me at a workable set of modules to test things on kokkos-dev-2 or wherever, I can try it out.

@PhilMiller PhilMiller added this to In Review in Developer: Phil Miller Jan 11, 2023
masterleinad
masterleinad previously approved these changes Jan 31, 2023
@PhilMiller
Copy link
Member Author

Why only for LAnd? I bet all other reductions would fail too for bool? (Sum, BAnd, LOr, etc. )

It looks like Kokkos::reduction_identity<bool> only defines lor() and land(), so none of the others are possible to meaningfully test as such.

That said, I've reproduced the (a?) failure for LOr as well, again with MDRange. I'm going to guard off any use value_type == bool for HIP MDRange.

@PhilMiller
Copy link
Member Author

I just tested with char, unsigned char, signed char, int8_t, uint8_t, int16_t, and uint16_t, and those all fail as well. Going up to 32 bits passes across a sweep of MDRange sizes.

That suggests the bug is in some sort of sizing or alignment.

@PhilMiller PhilMiller dismissed masterleinad’s stale review March 1, 2023 02:15

Scope has broadened substantially

@PhilMiller PhilMiller force-pushed the 5641-hip-land-bool-guard branch 2 times, most recently from 632ba08 to e6543da Compare March 1, 2023 06:26
@PhilMiller PhilMiller changed the title #5641: Add a guardrail against compiling the failing case #5641: Fix HIP MDRange reduce for sizeof(value_type) < sizeof(int) Mar 1, 2023
@PhilMiller PhilMiller marked this pull request as draft March 1, 2023 06:27
@PhilMiller
Copy link
Member Author

I adapted the changes in #5333 from TeamPolicy to MDRange and found that they fixed the failing tests across the range of types smaller than int.

I'm committing the sloppy mess now because it's late and I want to get it out. I'll make tests for Team reductions and move them to a sensible place and maybe fix them tomorrow.

@PhilMiller
Copy link
Member Author

Test passes on Serial but fails on HIP before the fix. Test passes with the fix.

Will clang-format, then this is ready.

@PhilMiller
Copy link
Member Author

LOL, OpenMPTarget fails this test too!

@PhilMiller
Copy link
Member Author

Per @rgayatri23 I'm going to skip the test of OpenMPTarget for now.

@PhilMiller
Copy link
Member Author

Rebased on develop with Github CI (hopefully) fixed

@dalg24 dalg24 merged commit ee75763 into kokkos:develop Mar 11, 2023
Developer: Phil Miller automation moved this from In Review to Pending Release 4.1 Mar 11, 2023
@PhilMiller PhilMiller deleted the 5641-hip-land-bool-guard branch March 11, 2023 17:45
@PhilMiller PhilMiller mentioned this pull request Mar 29, 2023
@PhilMiller PhilMiller added this to the Release 4.1 milestone May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Developer: Phil Miller
Pending Release 4.1
Development

Successfully merging this pull request may close these issues.

Parallel reducer, MDRangePolicy and HIP or CUDA: doesn't work with bool, char, int16_t
6 participants