Shared Memory Allocation Error at parallel_reduce #311

Closed
mndevec opened this issue Jun 1, 2016 · 5 comments


mndevec commented Jun 1, 2016

Team scratch memory allocation fails inside parallel_reduce. In the simple example below, the allocation works for parallel_for but fails (get_shmem returns NULL) for parallel_reduce. Separately, I get a compile error if I uncomment this line in the code:
Kokkos::parallel_reduce( team_policy_t(1, 8, 32), myfunc1(), reduction);

#include <stdio.h>
#include <iostream>   // std::cout, used by print_configuration below
#include <Kokkos_Core.hpp>
typedef typename Kokkos::Cuda MyMemorySpace;
typedef typename Kokkos::Cuda MyExecSpace;
typedef Kokkos::TeamPolicy<MyExecSpace> team_policy_t ;
typedef typename team_policy_t::member_type team_member_t ;
#define VECTORSIZE 32
#define TEAMSIZE 8
struct myfunc1{
  myfunc1(){}
  KOKKOS_INLINE_FUNCTION
  void operator()(const team_member_t & teamMember) const {
    const int team_size = teamMember.team_size();
    const int team_rank = teamMember.team_rank();
    int thread_id = (teamMember.league_rank()  * team_size + team_rank);
    int *ptr = (int *) ( (teamMember.team_shmem().get_shmem(16384) )) ;
    if (ptr == NULL && thread_id == 0){
      Kokkos::single(Kokkos::PerThread(teamMember),[=] () {
        printf("ptr is Null in parallel for\n");
      });
    }
  }
  KOKKOS_INLINE_FUNCTION
  void operator()(const team_member_t & teamMember, int  &reduction) const {
    const int team_size = teamMember.team_size();
    const int team_rank = teamMember.team_rank();
    int thread_id = (teamMember.league_rank()  * team_size + team_rank);
    int *ptr = (int *) ( (teamMember.team_shmem().get_shmem(16384) )) ;
    reduction++;
    if (ptr == NULL && thread_id == 0){
      Kokkos::single(Kokkos::PerThread(teamMember),[=] () {
        printf("ptr is Null in parallel reduce\n");
      });
    }
  }
  size_t team_shmem_size (int team_size) const {
    return 16384;
  }
};

struct myfunc2{
  myfunc2(){}
  KOKKOS_INLINE_FUNCTION
  void operator()(const team_member_t & teamMember, int  &reduction) const {
    const int team_size = teamMember.team_size();
    const int team_rank = teamMember.team_rank();
    int thread_id = (teamMember.league_rank()  * team_size + team_rank);
    int *ptr = (int *) ( (teamMember.team_shmem().get_shmem(16384) )) ;
    reduction++;
    if (ptr == NULL && thread_id == 0){
      Kokkos::single(Kokkos::PerThread(teamMember),[=] () {
        printf("ptr is Null in parallel reduce\n");
      });
    }
  }
  size_t team_shmem_size (int team_size) const {
    return 16384;
  }
};
int  main (int  argc, char ** argv){
  Kokkos::initialize(argc, argv);
  MyExecSpace::print_configuration(std::cout);
  Kokkos::parallel_for( team_policy_t(1, TEAMSIZE, VECTORSIZE), myfunc1());
  MyExecSpace::fence();
  int reduction = 0;
  //Kokkos::parallel_reduce( team_policy_t(1, 8, 32), myfunc1(), reduction);
  Kokkos::parallel_reduce( team_policy_t(1, TEAMSIZE, VECTORSIZE), myfunc2(), reduction);
  MyExecSpace::fence();
  Kokkos::finalize();
  return 0;
}
@ndellingwood ndellingwood added this to the Backlog milestone Jun 8, 2016
@ndellingwood ndellingwood added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Jun 8, 2016

crtrott commented Jun 13, 2016

Both bugs are confirmed.

@crtrott crtrott modified the milestones: Summer 2016, Backlog Jun 13, 2016

crtrott commented Jun 13, 2016

OK, I know why the shared memory size is not being set, and I know roughly why the first case does not compile. Both will be addressed as part of my work on a better generic reduction interface and some internal code cleanup. For now, you can work around the shared memory issue by setting the scratch size via the policy instead of the functor:

Kokkos::parallel_reduce( team_policy_t(1, TEAMSIZE, VECTORSIZE).set_scratch_size(1,Kokkos::PerTeam(16384)), myfunc2(), reduction);
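For readers following along, the symptom (a NULL pointer from get_shmem when the requested per-team size never reaches the launch) can be pictured with a toy bump allocator. This is a standalone model written for illustration, not Kokkos's implementation: `ToyScratch` and its capacity argument are hypothetical names, and the real scratch space lives in GPU shared memory rather than a `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: a bump allocator over a fixed-size buffer.
// The capacity stands in for the per-team scratch size that the
// launch is supposed to reserve before the functor runs.
class ToyScratch {
  std::vector<unsigned char> buf_;
  std::size_t used_ = 0;

 public:
  explicit ToyScratch(std::size_t capacity) : buf_(capacity) {}

  // Hands out the next `bytes` of the buffer, or nullptr when the
  // request does not fit in what was reserved.
  void* get_shmem(std::size_t bytes) {
    if (bytes > buf_.size() - used_) return nullptr;
    void* p = buf_.data() + used_;
    used_ += bytes;
    return p;
  }
};
```

In this model, a scratch area constructed with capacity 0 (the "size not set" case) returns nullptr for a 16384-byte request, while one constructed with capacity 16384 satisfies it once and then refuses any further request.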

crtrott added a commit that referenced this issue Jul 1, 2016
This should fix part of issue #311. Not sure yet about the
parallel for and parallel reduce operator in the same functor.

crtrott commented Jul 1, 2016

The shared memory problem is fixed. The compile failure would be much harder to fix, so for now my position is: "If you want more than one operator in a class, you must use the Tag mechanism to distinguish them, even if they are technically distinguishable because one is a reduction operator and the other is not."

The reason is that the internal functor inspection would get more complicated, and it's already complicated enough.
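As a rough illustration of the Tag mechanism mentioned above, here is a standalone C++ sketch of tag dispatch: each operator takes an empty tag type as its first argument, so the caller names the overload explicitly instead of relying on inspection of the signatures. Everything here is hypothetical and serial (`ForTag`, `ReduceTag`, `Member`, `run_for`, and `run_reduce` are stand-ins, not Kokkos types or functions). In Kokkos itself the tag is, as far as I know, passed as a template argument of the policy, e.g. `Kokkos::TeamPolicy<MyExecSpace, ForTag>`.

```cpp
#include <vector>

// Empty types whose only job is to select an overload.
struct ForTag {};
struct ReduceTag {};

// Hypothetical stand-in for a team member handle (not a Kokkos type).
struct Member {
  int rank;
};

struct MyFunctor {
  int* visited;  // one counter per rank, written by the "for" body

  // Chosen when the dispatcher passes ForTag.
  void operator()(ForTag, const Member& m) const { visited[m.rank] += 1; }

  // Chosen when the dispatcher passes ReduceTag.
  void operator()(ReduceTag, const Member&, int& sum) const { sum += 1; }
};

// Hypothetical serial dispatchers: the Tag template argument picks
// which operator() of the functor gets called.
template <class Tag, class Functor>
void run_for(int n, const Functor& f) {
  for (int i = 0; i < n; ++i) f(Tag{}, Member{i});
}

template <class Tag, class Functor>
int run_reduce(int n, const Functor& f) {
  int sum = 0;
  for (int i = 0; i < n; ++i) f(Tag{}, Member{i}, sum);
  return sum;
}
```

With both operators in one class, `run_for<ForTag>(8, f)` and `run_reduce<ReduceTag>(8, f)` each resolve to exactly one overload, which is the property the quoted rule relies on.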


mndevec commented Jul 5, 2016

Thanks Christian!

@mndevec mndevec closed this as completed Jul 5, 2016

crtrott commented Jul 5, 2016

Reopening until pushed to master

@crtrott crtrott reopened this Jul 5, 2016
hcedwar pushed a commit to hcedwar/kokkos that referenced this issue Jul 11, 2016
This should fix part of issue kokkos#311. Not sure yet about the
parallel for and parallel reduce operator in the same functor.