Kokkos::sort defaults to std::sort on host #1208
Comments
Could use GNU parallel sort. It's very fast and hits their OpenMP runtime. Maybe specialise by compiler?
@nmhamster How new of a GCC does that require?
In other projects, I've had good success with the Intel PSS that Mark listed.
@mhoemmen - I think it's fairly old. Our experience was that this was pretty good for performance. I'm fairly sure it was at least in GCC 4.7, and we are now moving beyond that, right?
@etphipp wrote:
It has the usual modified BSD license, from what I can tell. Which version did you use? I'm looking at the OpenMP tasks version now. Would using raw OpenMP tasks break Kokkos? The code does not set the number of threads, etc.

@nmhamster wrote:
Minimum GCC version for us is 4.9.3. Thanks!
OK, let me at this. I'll first plug in GCC's parallel sort if available. It uses OpenMP, so this should be conditional on the OpenMP execution space.
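A minimal sketch of that conditional plug-in, assuming availability is detected via the standard `__GLIBCXX__` and `_OPENMP` macros (the helper name `host_sort` is hypothetical, not Kokkos API):

```cpp
#include <algorithm>
#include <vector>

#if defined(__GLIBCXX__) && defined(_OPENMP)
#include <parallel/algorithm>  // __gnu_parallel::sort lives here
#endif

// Hypothetical helper: GCC's parallel-mode sort when libstdc++ and
// OpenMP are both available, plain std::sort otherwise.
template <class Iterator>
void host_sort(Iterator first, Iterator last) {
#if defined(__GLIBCXX__) && defined(_OPENMP)
  __gnu_parallel::sort(first, last);  // parallelized via the OpenMP runtime
#else
  std::sort(first, last);
#endif
}
```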
I've used the PSS code as well, and have a cleaned-up version of it here: https://github.com/ibaned/omega_h/tree/master/src/intel_sort I use that for OpenMP.
If you could do a performance comparison of GCC's sort and the PSS code, I'd be really interested in the results.
@ibaned Thanks! :-D I plan to plug in the "Technical Specification for C++ Extensions for Parallelism" too, perhaps first. Do you have experience with those?
No... I assume that's closely related to the GCC parallel sort, but I've never tried calling those (I needed something I was sure was on every compiler).
The only issue with the C++ TS is that it does not promise whether it uses OpenMP or C++ threads, so perhaps I shouldn't plug that in until I learn more. GCC's …
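For reference, this is roughly what that interface looks like; the sketch below uses the C++17 form (`std::execution::par` in `<execution>`), which is where the TS eventually landed, rather than the TS's own `std::experimental::parallel` namespace:

```cpp
#include <algorithm>
#include <execution>  // C++17 home of the parallel execution policies
#include <vector>

void sort_parallel(std::vector<double>& v) {
  // The policy requests parallel execution, but the standard does not
  // say whether OpenMP, TBB, or raw C++ threads implement it.
  std::sort(std::execution::par, v.begin(), v.end());
}
```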
I'm using the same logic as Dan in my tensor code that needs sorting (Thrust for Cuda, PSS for OpenMP, ...): https://gitlab.com/tensors/genten/blob/master/src/Genten_Sptensor_perm.cpp

I've generally found Thrust to be faster than Kokkos' sort. I have not tried to use GNU. I have the Intel PSS included in that library (I believe I actually got the code from Dan's library): https://gitlab.com/tensors/genten/blob/master/src/parallel_stable_sort.hpp

And there was no issue including this in our library that was released under BSD.
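As a point of comparison, the Thrust path mentioned above is roughly this simple (a sketch; Genten's actual code wraps it differently):

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>

void sort_on_device(thrust::device_vector<float>& v) {
  // Sorts on the GPU; Thrust picks a parallel radix or merge sort
  // depending on the element type.
  thrust::sort(v.begin(), v.end());
}
```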
@etphipp thanks! I think what I would like to do is plug in all the sorts I can get and try them out. The challenge is figuring out good benchmarks. We'll also need a "sort array x and apply permutation to array y" function (what Tpetra calls "sort2") at least. Jon Clausen tried …
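A serial sketch of what such a "sort2" does, by sorting zipped (key, value) pairs (this is illustrative, not Tpetra's actual implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative sort2: sort keys x and reorder values y the same way.
template <class Key, class Value>
void sort2(std::vector<Key>& x, std::vector<Value>& y) {
  std::vector<std::pair<Key, Value>> zipped(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    zipped[i] = {x[i], y[i]};
  std::sort(zipped.begin(), zipped.end(),
            [](const auto& a, const auto& b) { return a.first < b.first; });
  for (std::size_t i = 0; i < zipped.size(); ++i) {
    x[i] = zipped[i].first;
    y[i] = zipped[i].second;
  }
}
```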
Two general comments:

In terms of benchmarks, one case that I think both my code and Tpetra care about is: … Good data set sizes for both would be …

Extracting the permutation is important to me too; I reuse this permutation on a dozen different arrays after the sort.
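A sketch of that permutation-reuse pattern: compute the sorting permutation once, then gather through it for each additional array (the function names here are made up for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Compute the permutation that sorts `keys` (an argsort).
inline std::vector<std::size_t> sort_permutation(const std::vector<int>& keys) {
  std::vector<std::size_t> perm(keys.size());
  std::iota(perm.begin(), perm.end(), 0);
  std::sort(perm.begin(), perm.end(),
            [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  return perm;
}

// Apply the stored permutation to any array of matching length (a gather).
template <class T>
std::vector<T> apply_permutation(const std::vector<T>& v,
                                 const std::vector<std::size_t>& perm) {
  std::vector<T> out(v.size());
  for (std::size_t i = 0; i < perm.size(); ++i) out[i] = v[perm[i]];
  return out;
}
```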
I'm currently blocked by this issue: #1212. I'll figure that one out :-)
In terms of Trilinos development, I can work around #1212. In fact, I already did: trilinos/Trilinos#1946 (now closed). However, it would be nice to have #1212 fixed (it requires someone to approve PR #1213).
PR #1213 got approved. Thus, this issue is no longer blocked. |
I have a patch ready for this that calls __gnu_parallel::sort if available.
GCC, Clang, and Intel all provide __gnu_parallel::sort.
This is represented by pull request #1226, which we'll look at during the next promotion cycle (February).
@ibaned I haven't had a chance to test on all supported platforms yet, so I'm glad y'all are waiting :-)
Fix #1208 by using __gnu_parallel::sort if available
Changing assignee to @crtrott since he ended up doing all the testing |
Thanks @ibaned!
Revert "Fix #1208 by using __gnu_parallel::sort if available"
Thus, there was no OpenMP parallelism. Options to fix: …
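One quick diagnostic for that failure mode, independent of whichever fix is chosen, is to check the OpenMP thread count at runtime (a sketch; if it prints 1 on a many-core node, the "parallel" sort is effectively serial):

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main() {
#ifdef _OPENMP
  // Reports how many threads OpenMP will hand out by default.
  std::printf("OpenMP max threads: %d\n", omp_get_max_threads());
#else
  std::printf("Compiled without OpenMP support.\n");
#endif
  return 0;
}
```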