Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reductions into device side view #1788

Closed
crtrott opened this issue Sep 7, 2018 · 5 comments
Closed

Reductions into device side view #1788

crtrott opened this issue Sep 7, 2018 · 5 comments
Assignees
Labels
Enhancement Improve existing capability; will potentially require voting

Comments

@crtrott
Copy link
Member

crtrott commented Sep 7, 2018

This doesn't work right now, but we should make this work. Would also allow asynchronous reductions.

Test case:

#include<Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv);
  {
     int N = (argc>1) ? atoi(argv[1]) : 10000;
     int M = (argc>2) ? atoi(argv[2]) : 10000;
     int R = (argc>3) ? atoi(argv[3]) : 10;

     Kokkos::View<int64_t> result("Result");
     int64_t reducer_result, view_result, scalar_result;

     Kokkos::Timer timer;

     // Test Reducer (this segfaults for CUDA)
     Kokkos::parallel_reduce(N, KOKKOS_LAMBDA (const int& i, int64_t& lsum) {
       lsum += 1;
     },Kokkos::Sum<int64_t,Kokkos::DefaultExecutionSpace>(result));
     double time = timer.seconds();
     // Check whether it was asyncronous
     timer.reset();
     Kokkos::fence();
     double time_fence = timer.seconds();
     Kokkos::deep_copy(reducer_result,result);
     timer.reset();

     // Test View (this segfaults for CUDA)
     Kokkos::parallel_reduce(N, KOKKOS_LAMBDA (const int& i, int64_t& lsum) {
       lsum += 1;
     },result);
     double time = timer.seconds();
     // Check whether it was asyncronous
     timer.reset();
     Kokkos::fence();
     double time_fence = timer.seconds();
     Kokkos::deep_copy(view_result,result);
     timer.reset();

     // Test Scalar
     Kokkos::parallel_reduce(N, KOKKOS_LAMBDA (const int& i, int64_t& lsum) {
       lsum += 1;
     },scalar_result);
     double time2 = timer.seconds();

     // Check whether it was asyncronous
     timer.reset();
     Kokkos::fence();
     double time_fence2 = timer.seconds();


     printf("%e %e %e %e || %li %li\n",time,time_fence,time2,time_fence2,d_result,s_result);
  }
  Kokkos::finalize();
}
@crtrott crtrott self-assigned this Sep 7, 2018
@crtrott crtrott added the Enhancement Improve existing capability; will potentially require voting label Sep 7, 2018
@crtrott
Copy link
Member Author

crtrott commented Sep 7, 2018

Both the View based and the Reducer based reductions segfault.

@dhollman
Copy link

This feels like it should be marked "defect" and not "enhancement"; I'm not sure I understand why this doesn't/isn't supposed to work now.

Also, the fence() feels wasteful here — we're fencing on all of memory to get a single scalar? Is this something we could/should do better with some sort of thin wrapper to streams?

For instance, something like

auto stream_reducer = Kokkos::Sum<int64_t,Kokkos::DefaultStreamExecutionSpace>();
Kokkos::parallel_reduce(N, KOKKOS_LAMBDA (const int& i, int64_t& lsum) {
  lsum += 1;
},stream_reducer);
// wraps cudaStreamAddCallback:
stream_reducer.when_ready([&](int64_t& result) { reducer_result = result; });
// wraps cudaStreamSynchronize:
Kokkos::fence(stream);

@crtrott
Copy link
Member Author

crtrott commented Sep 19, 2018

No the fence is not meant for memory. The fence is meant for timing purposes to figure out whether the call was asynchronous.

@crtrott
Copy link
Member Author

crtrott commented Sep 19, 2018

I marked it enhancement because we

  • never explicitly promised this works
  • it never worked
  • now we will make this work

crtrott added a commit that referenced this issue Sep 30, 2018
@crtrott
Copy link
Member Author

crtrott commented Sep 30, 2018

Pull request #1825

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

3 participants