Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel_reduce(0), parallel_scan(0) unit tests #436

Closed
ndellingwood opened this issue Sep 20, 2016 · 7 comments
Closed

parallel_reduce(0), parallel_scan(0) unit tests #436

ndellingwood opened this issue Sep 20, 2016 · 7 comments
Assignees
Labels
Enhancement Improve existing capability; will potentially require voting
Milestone

Comments

@ndellingwood
Copy link
Contributor

While adding unit tests for parallel_scan of size 0 (to go along with tests in #175 ) the serial.team_scan test failed with both gcc 4.9 and gcc 5.4.
Error:

[ RUN      ] serial.team_scan
Floating point exception (core dumped)
@ndellingwood
Copy link
Contributor Author

Working on fix for the unit test.

@ndellingwood ndellingwood changed the title serial.team_scan with worksize 0 throws floating point exception parallel_reduce(0), parallel_scan(0) unit tests Sep 21, 2016
@ndellingwood
Copy link
Contributor Author

Fix for above was easy, found a couple more issues with cuda after adding more tests for parallel_reduce(0) and parallel_scan(0). Will track with this issue and changed issue name to better reflect this.

@ndellingwood
Copy link
Contributor Author

Cuda test failures (runtime):
test_reduce(0) test failed with Cuda (called in TestCuda_b.hpp for example TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0); ) as well as TestReduceTeam

In test_reduce, passing in 0 results in a league_size=0, and a team_size of 256 is returned by team_size_max(functor). These values are then passed in the team policy to parallel_reduce.
During execution of the parallel_reduce the body of the functor operator is never executed, however parallel_reduce returns a result of 501501 ( = 1001*1002/2 = sum(1 to 1001) in case that is relevant).

Fix (so far):
The execute() method in the RangePolicy version of ParallelReduce is protected when the amount of work is 0 i.e.

    void execute() {
      const int nwork = m_policy.end() - m_policy.begin();
      if ( nwork ) {  ...
      //stuff...
      }
     else {
       if (m_result_ptr) {
         ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
       }

I added this to the TeamPolicy version of the ParallelReduce as well, that is, within the execute() method I added the if - else check as above but replaced nwork with const int nwork = m_league_size * m_team_size
This fixed the errors I was seeing with Cuda testing with a generated makefile and make build-test (will check with test_all_sandia).

@crtrott Is this sufficient? I'm afraid I may be overlooking something within the team policy that I am not familiar with.

@ndellingwood ndellingwood added this to the Fall 2016 milestone Sep 21, 2016
@ndellingwood ndellingwood self-assigned this Sep 21, 2016
@crtrott crtrott added the Enhancement Improve existing capability; will potentially require voting label Sep 21, 2016
@crtrott
Copy link
Member

crtrott commented Sep 21, 2016

I think this looks fine. If you put it in as a pull request instead of pushing directly I will check it before we merge it in.

@ndellingwood
Copy link
Contributor Author

The following test added to TestThreads.hpp (parallel_for(0) type test, gcc compiler) produces a segfault:

TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);

I have a fix, but not sure if it is the proper fix or may mask something else.

Issue:
Passing in the 0 to test_for results in a league_size of 0.
In the threads' ParallelFor specialized for Dynamic type schedule, a call to exec_team is made with a for loop that calls member.valid_dynamic() (defined in ThreadsExecTeamMember )
Within this call use of m_team_base[0] results in a seg fault,
e.g. m_team_base[0]->get_work_index(m_team_alloc);
None of the body of the ThreadsExecTeamMember ctor is executed so none of the values are 'properly' initialized (whatever that means for league_size of 0).

Working solution:
In the ThreadsExecTeamMember ctor body, added an else condition that sets m_invalid_thread = 0; when the condition if ( team.league_size() ) fails. This results in the unit test passing as valid_dynamic() returns false at the beginning of the code.

@ndellingwood
Copy link
Contributor Author

ndellingwood commented Sep 22, 2016

@crtrott
The following reducers test failed at runtime with Threads backend and gcc 5.4.

/Users/ndellin/Research/kokkos_items/kokkos/core/unit_test/TestReduce.hpp:1581: Failure
Value of: reference_minloc
  Actual: 3611
Expected: minmax_scalar.min_loc
Which is: 8639
[  FAILED  ] threads.reducers (4 ms)

@ndellingwood
Copy link
Contributor Author

Pull request issued #440

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

2 participants