Triple nested parallelism still fails on bowman #1093

Closed
ibaned opened this issue Sep 8, 2017 · 6 comments

@ibaned
Contributor

ibaned commented Sep 8, 2017

https://jenkins-son.sandia.gov/job/Kokkos_SLURM_inner_test/6885/console

@ibaned ibaned added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Sep 8, 2017
@ibaned ibaned added this to the 2017 September milestone Sep 8, 2017
@crtrott
Member

crtrott commented Sep 8, 2017

This is definitely our big white whale ...

@hcedwar hcedwar modified the milestones: 2017 September, 2017 December Sep 13, 2017
@hcedwar
Contributor

hcedwar commented Nov 29, 2017

Suspect fixed by #1240

@crtrott
Member

crtrott commented Aug 29, 2018

This failed again on KNL ....

@ndellingwood
Contributor

Our arch-enemy has returned...
This showed up again on Bowman today
Link

@crtrott
Member

crtrott commented Nov 26, 2019

Looking into this once more. The main issue is with the pthreads backend; I haven't been able to reproduce it with OpenMP yet.

This code also fails, though only very occasionally (roughly 1 run in 1000):

    Kokkos::parallel_reduce(
        team_policy(nrows / chunk_size, team_size, vector_length),
        KOKKOS_LAMBDA(const member_type &teamMember, double &update) {
          const int row_start = teamMember.league_rank() * chunk_size;
          const int row_end   = row_start + chunk_size;
          Kokkos::parallel_for(
              Kokkos::TeamThreadRange(teamMember, row_start, row_end),
              [&](const int i) {
                ScalarType sum_i = 0.0;
    #if 1
                for (unsigned int j = 0; j < ncols; j++) sum_i += A(i, j) * x(j);
    #else
                Kokkos::parallel_reduce(
                    Kokkos::ThreadVectorRange(teamMember, ncols),
                    [&](const int j, ScalarType &innerUpdate) {
                      innerUpdate += A(i, j) * x(j);
                    },
                    sum_i);
    #endif
                //Kokkos::atomic_add(&control(),sum_i);
                Kokkos::atomic_add(&count(),1l);
                Kokkos::single(Kokkos::PerThread(teamMember),
                               [&]() { update += y(i) * sum_i; });
              });
        },
        result);
    Kokkos::fence();

When this fails, count() is still correct, so it looks like every iteration is executed. Since the failure also occurs with the serial inner loop (the `#if 1` branch), the problem is evidently not in the innermost loop ...
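For reference, here is a minimal serial sketch in plain C++ of the value the kernel above is reducing to. It is not the Kokkos test itself. The names `reference_reduce` and `expected_for_ones` and the flat row-major layout are hypothetical, and filling all inputs with ones is an assumption (it is consistent with the test reporting solution = nrows * ncols, but the fill values are not shown in this thread).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference for the reduction in the snippet above:
//   result = sum_i y(i) * (sum_j A(i,j) * x(j))
// A is stored row-major in a flat vector (a layout chosen only for this sketch).
double reference_reduce(const std::vector<double>& A,
                        const std::vector<double>& x,
                        const std::vector<double>& y) {
  const std::size_t nrows = y.size();
  const std::size_t ncols = x.size();
  assert(A.size() == nrows * ncols);  // dimensions must agree
  double result = 0.0;
  for (std::size_t i = 0; i < nrows; ++i) {
    double sum_i = 0.0;
    for (std::size_t j = 0; j < ncols; ++j) sum_i += A[i * ncols + j] * x[j];
    result += y[i] * sum_i;
  }
  return result;
}

// Assumption: A, x and y are all filled with ones, so the
// expected value is simply nrows * ncols.
double expected_for_ones(std::size_t nrows, std::size_t ncols) {
  const std::vector<double> A(nrows * ncols, 1.0), x(ncols, 1.0), y(nrows, 1.0);
  return reference_reduce(A, x, y);
}
```

Under that assumption, a result below nrows * ncols while count() is correct would mean some per-row contributions ran but were not accumulated into the final value.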

@ndellingwood
Contributor

Recent failure with Pthreads on Bowman, intel/compilers/17.2.174

Failure output:

2: [ RUN      ] threads.triple_nested_parallelism
2:   TestTripleNestedReduce failed solution(16777216) != result(16318464), nrows(8192) ncols(2048) league_size(64) team_size(32)
2: /home/jenkins/bowman/workspace/Kokkos_SLURM_inner_test@4/core/unit_test/TestTeamVector.hpp:993: Failure
2: Value of: result
2:   Actual: 1.63185e+07
2: Expected: solution
2: Which is: 1.67772e+07

Link to failing nightly build
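The numbers in the failure output above can be turned into a count of lost rows. The following is only a back-of-the-envelope sketch: it assumes every row contributes exactly ncols to the sum (which matches solution == nrows * ncols) and that the failing test splits rows across the league the same way as the snippet earlier in this thread (chunk_size = nrows / league_size). The constant names are made up for illustration.

```cpp
#include <cstdint>

// Values copied from the failure output above.
constexpr std::int64_t solution    = 16777216;
constexpr std::int64_t result      = 16318464;
constexpr std::int64_t nrows       = 8192;
constexpr std::int64_t ncols       = 2048;
constexpr std::int64_t league_size = 64;

// Assumption: each row contributes exactly ncols to the total.
constexpr std::int64_t shortfall     = solution - result;    // 458752
constexpr std::int64_t missing_rows  = shortfall / ncols;    // 224
// Assumption: same chunking as the earlier snippet.
constexpr std::int64_t rows_per_team = nrows / league_size;  // 128

static_assert(solution == nrows * ncols, "expected value is nrows * ncols");
static_assert(shortfall % ncols == 0, "shortfall is a whole number of rows");
```

Under these assumptions the shortfall corresponds to exactly 224 rows, which is not a whole multiple of the 128 rows handled per league member. Whether that points at lost per-thread contributions inside a team rather than entire teams being dropped is not established here.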
