Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent results from TeamThreadRange reduction #1905

Closed
Char-Aznable opened this issue Nov 15, 2018 · 10 comments
Closed

Inconsistent results from TeamThreadRange reduction #1905

Char-Aznable opened this issue Nov 15, 2018 · 10 comments
Assignees
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)

Comments

@Char-Aznable
Copy link
Contributor

Why do the following 2 slightly different usages of TeamThreadRange reduction give different results:

//some array to contribute to the sum
Kokkos::DualView<T[nTeams][teamSize]> v("v")
//some DualView to store the results
Kokkos::DualView<T[nTeams]> sum("sum")
...
parallel_for(policy, KOKKOS_LAMBDA(const auto& team) {
  int s = 0;
  parallel_reduce(TeamThreadRange(team,teamSize),[&](const auto& i, auto& update) {
     update += v.d_view(team.league_rank(),i);
  },s);
  team.team_barrier();
  Kokkos::single(Kokkos::PerTeam(team),[&](){
      sum.d_view(team.league_rank()) = s;
  });
});
...

vs.

//some array to contribute to the sum
Kokkos::DualView<T[nTeams][teamSize]> v("v")
//some DualView to store the results
Kokkos::DualView<T[nTeams]> sum("sum")
...
parallel_for(policy, KOKKOS_LAMBDA(const auto& team) {
   parallel_reduce(TeamThreadRange(team,teamSize),[&](const auto& i, auto& update) {
     update += v.d_view(team.league_rank(),i);
  },sum.d_view(team.league_rank()));
  team.team_barrier();
});
...

The first case gives the correct answer but the second case gives gibberish. What makes the difference here?

Here's a complete example to try it out:

#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>
#include <Kokkos_Random.hpp>

template < class DualView >
void syncToHost(DualView&& data) {
  using MemorySpace = typename std::decay_t<DualView>::memory_space;
  using HostMirrorSpace = typename std::decay_t<DualView>::host_mirror_space;
  std::forward<DualView>(data).template modify<MemorySpace>();
  std::forward<DualView>(data).template sync<HostMirrorSpace>();
}

template < class DualView >
void syncToDevice(DualView&& data) {
  using MemorySpace = typename std::decay_t<DualView>::memory_space;
  using HostMirrorSpace = typename std::decay_t<DualView>::host_mirror_space;
  std::forward<DualView>(data).template modify<HostMirrorSpace>();
  std::forward<DualView>(data).template sync<MemorySpace>();
}

using ExecutionSpace = Kokkos::DefaultExecutionSpace;
using namespace std;

static constexpr std::size_t nTeams = 8;
static constexpr std::size_t teamSize = 32;

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv); {
  Kokkos::DualView<std::size_t[nTeams][teamSize],ExecutionSpace> v("v");
  Kokkos::DualView<std::size_t[nTeams],ExecutionSpace> sum("sum");

  for(std::size_t i = 0; i < nTeams; ++i) {
    for(std::size_t j = 0; j < teamSize; ++j) {
      v.h_view(i,j) = 1;
    }
  }
  syncToDevice(v);

  auto policy = Kokkos::TeamPolicy<ExecutionSpace>(nTeams,teamSize);
  Kokkos::parallel_for(policy,KOKKOS_LAMBDA (const auto& team) {
    const auto nT = teamSize;
    const auto iTeam = team.league_rank();
    //std::size_t s = 0;
    Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,nT),
      [&](const auto& iT, auto& update) {
      update += v.d_view(iTeam,iT);
    },sum.d_view(iTeam));
    team.team_barrier();
    //sum.d_view(iTeam) = s;
  });
  Kokkos::fence();
  syncToHost(sum);

  for(std::size_t i = 0; i < nTeams; ++i) {
    printf("%10lu%10lu\n",i,sum.h_view(i));
  }

  } Kokkos::finalize();
}
@mhoemmen
Copy link
Contributor

This isn't relevant to your question, but just curious: What's with the rvalue references && on DualView? If you made it an lvalue reference &, you could skip the std::forward thing.

@Char-Aznable
Copy link
Contributor Author

Char-Aznable commented Nov 16, 2018

@mhoemmen It's a good point. I did it because I didn't know enough what the modify and sync function do (what if, e.g., they are ref qualified) and I was trying to keep it generic. Then I simply copy the snippet all over the place.

@mhoemmen
Copy link
Contributor

You definitely don't need the && there. Plain old & suffices.

@jeffmiles63
Copy link
Contributor

You cannot update sum.d_view(team.league_rank() directly within a team parallel_for because all of the threads in the team will be live and will access sum.d_view at some point. The Kokkos::Single is necessary to limit access to sum.d_view to only once per team. See pages 132- 136 in the KokkosTutorial_ORNL18.pdf for more info.

@Char-Aznable
Copy link
Contributor Author

@jeffmiles63 Thanks for the reply. I think Kokkos::Single isn't strictly necessary here because s is the same for all threads after reduction and I believe there's at least one thread will succeed in writing the correct answer to sum.d_view(team.league_rank()). I tried it myself that:

//some array to contribute to the sum
Kokkos::DualView<T[nTeams][teamSize]> v("v")
//some DualView to store the results
Kokkos::DualView<T[nTeams]> sum("sum")
...
parallel_for(policy, KOKKOS_LAMBDA(const auto& team) {
  int s = 0;
  parallel_reduce(TeamThreadRange(team,teamSize),[&](const auto& i, auto& update) {
     update += v.d_view(team.league_rank(),i);
  },s);
  team.team_barrier();
  sum.d_view(team.league_rank()) = s;
});
...

will give the same answer too.

I think something inside parallel_reduce expects the 3rd argument to be a thread-local variable but not something in global memory.

@crtrott
Copy link
Member

crtrott commented Nov 22, 2018

This is weird. I would have thought that this should all give the same answer as long as T is word size or smaller.

@crtrott
Copy link
Member

crtrott commented Nov 22, 2018

Got an even simpler reproducer .. . Only fails with OpenMP. Serial, Pthreads and CUDA are fine.

@crtrott crtrott added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Nov 22, 2018
@crtrott
Copy link
Member

crtrott commented Nov 22, 2018

#include<Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv);
  {
     int N = (argc>1) ? atoi(argv[1]) : 10;
     int M = (argc>2) ? atoi(argv[2]) : 64;
     int R = (argc>3) ? atoi(argv[3]) : 8;

     Kokkos::View<double*> results1("r1",N);
     Kokkos::View<double*> results2("r2",N);
     Kokkos::View<double**> data("d",N,M);
     Kokkos::deep_copy(data,1);
     Kokkos::parallel_for(Kokkos::TeamPolicy<>(N,R), KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team) {
       const int i = team.league_rank();
       Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,M), [&] (const int j, double& update) {
         update += data(i,j) + 1000*i + j;
       },results1(i));
       team.team_barrier();
       double s;
       Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,M), [&] (const int j, double& update) {
         update += data(i,j) + 1000*i + j;
       },s);
       results2(i) = s;
     });

     Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int i) {
       printf("%i %lf %lf %lf\n",i,results1(i),results2(i),1.0*(M*(1000*i+1)+M*(M-1)/2));
     });
  }
  Kokkos::finalize();
}

@ndellingwood ndellingwood added this to the 2019 April milestone Feb 7, 2019
@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Apr 7, 2019
@crtrott
Copy link
Member

crtrott commented Apr 8, 2019

Root cause is that the OpenMP backend didn't produce a local variable for intermediate results but simply used the argument the user provided. I.e. the different threads would stomp over each other.

@crtrott
Copy link
Member

crtrott commented Apr 8, 2019

Pull request issued #2079

crtrott added a commit that referenced this issue Apr 8, 2019
My fix for OpenMP broke it for Serial ...
@crtrott crtrott self-assigned this Apr 8, 2019
@ndellingwood ndellingwood added InDevelop and removed Blocks Promotion Overview issue for release-blocking bugs labels Apr 9, 2019
crtrott added a commit that referenced this issue Apr 9, 2019
crtrott added a commit that referenced this issue Apr 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Projects
None yet
Development

No branches or pull requests

5 participants