Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bounds errors with Kokkos::sort #4980

Merged
merged 7 commits into from
May 4, 2022
Merged

Conversation

kmorel
Copy link
Contributor

@kmorel kmorel commented Apr 26, 2022

We have noticed that Kokkos::sort sometimes fails in HIP. This
adds a test that demonstrates this bug. When I compile Kokkos
on the crusher platform at OLCF using the ROCm 5.1.0 module, this
test fails. This can be used to determine what is going wrong
with sort and provide a regression test for it.

Note that this PR does not fix anything. Rather, this just
demonstrates the problem reported in #4978. I provide this to
the Kokkos developers so they have a simple way to replicate the
problem in #4978.

@dalg24-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

We have noticed that `Kokkos::sort` sometimes fails in HIP. This
adds a test that demonstrates this bug. When I compile Kokkos
on the crusher platform at OLCF using the ROCm 5.1.0 module, this
test fails. This can be used to determine what is going wrong
with sort and provide a regression test for it.
@dalg24
Copy link
Member

dalg24 commented Apr 28, 2022

OK to test

@dalg24 dalg24 marked this pull request as ready for review April 28, 2022 13:50
@dalg24 dalg24 requested review from Rombur and crtrott April 28, 2022 13:50
@dalg24
Copy link
Member

dalg24 commented Apr 28, 2022

I casted everything to double in the bin calculation in BinOp1D. I changed the type of the data members to reduce the number of conversions. I updated BinOp3D as well but didn't actually add a test for that guy.

@kmorel kmorel changed the title Demonstrate an error with Kokkos::sort Fix bounds errors with Kokkos::sort Apr 28, 2022
Copy link
Contributor Author

@kmorel kmorel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can verify that the changes fix the issue I was originally having when I submitted #4978.

That said, I noticed that some of the regression tests are failing in the PR checks.

@dalg24
Copy link
Member

dalg24 commented Apr 28, 2022

There is a NaN conversion to int in with double...

@dalg24
Copy link
Member

dalg24 commented Apr 28, 2022

Failure (issue with NVIDIA GPG repo key on the Clang+CUDA build) is unrelated.
https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/

This is ready.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

// For integral types the number of bins may be larger than the range
// in which case we can exactly have one unique value per bin
// and then don't need to sort bins.
if (std::is_integral<typename KeyViewType::const_value_type>::value &&
static_cast<uint64_t>(range_) <= static_cast<uint64_t>(max_bins__)) {
(static_cast<double>(max) - static_cast<double>(min)) <=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(static_cast<double>(max) - static_cast<double>(min)) <=
(static_cast<double>(max) - min_) <=

@crtrott crtrott merged commit 4a4fed2 into kokkos:develop May 4, 2022
@kmorel kmorel deleted the demo-sort-bug branch May 4, 2022 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants