Apollo cuda.team_broadcast test fail with clang-6.0 #1762

Closed
ndellingwood opened this issue Aug 23, 2018 · 5 comments
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)

Comments

@ndellingwood
Contributor

The Jenkins build on Apollo reported a test failure in cuda.team_broadcast.

This was with the clang-6.0-Cuda_Pthread-release and clang-6.0-Cuda_Serial-release builds.

@vqd8a do you have time to look into this? The runtime error is posted below:

[ RUN      ] cuda.team_broadcast
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 960
Expected: size_t( expected_result )
Which is: 5120
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 960
Expected: size_t( expected_result )
Which is: 5120
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 72192
Expected: size_t( expected_result )
Which is: 385024
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 72192
Expected: size_t( expected_result )
Which is: 385024
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 287904000
Expected: size_t( expected_result )
Which is: 1535488000
<https://jenkins-son.sandia.gov/job/Kokkos_apollo/ws/kokkos/core/unit_test/TestTeam.hpp>:1067: Failure
Value of: size_t( total )
  Actual: 287904000
Expected: size_t( expected_result )
Which is: 1535488000
@ndellingwood ndellingwood added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) DevelopOnly labels Aug 23, 2018
@ndellingwood
Contributor Author

@vqd8a here are some instructions for building on Apollo for this case:

Modules:
module load clang/6.0 cuda/9.0.69

Makefile:
generate_makefile.bash --with-cuda --with-openmp --compiler=clang++ --arch="SNB,Volta70"

No need to salloc a node or anything like that.

@vqd8a
Contributor

vqd8a commented Aug 23, 2018

thanks, @ndellingwood

@crtrott
Member

crtrott commented Aug 28, 2018

It is actually the parallel_reduce(TeamThreadRange) that produces the wrong result in the test. The broadcast appears to be fine.
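The property that was violated here can be stated as an invariant: a team-level reduction over a TeamThreadRange must give the same result as a plain sequential sum of the same values, whatever the team size. The sketch below models that invariant serially in standard C++. It is not Kokkos code; the function names are hypothetical, and the cyclic split of indices across threads and the pairwise combine of partial sums are generic choices, not the actual Kokkos CUDA reduction implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical serial model of one team reducing over [0, values.size()).
// Each of team_size "threads" accumulates a private partial sum, then the
// partials are combined pairwise. Not the Kokkos CUDA implementation.
long long team_thread_range_sum(const std::vector<long long>& values,
                                std::size_t team_size) {
  assert(team_size >= 1 && "team_size must be at least 1");
  std::vector<long long> partial(team_size, 0);

  // Cyclic distribution: thread t handles indices t, t + team_size, ...
  for (std::size_t i = 0; i < values.size(); ++i) {
    partial[i % team_size] += values[i];
  }

  // Pairwise (tree-style) combine with a doubling stride; after the last
  // pass, partial[0] holds the sum of all per-thread partials. This also
  // works when team_size is not a power of two.
  for (std::size_t stride = 1; stride < team_size; stride *= 2) {
    for (std::size_t t = 0; t + stride < team_size; t += 2 * stride) {
      partial[t] += partial[t + stride];
    }
  }
  return partial[0];
}

// Returns true if the modeled team reduction matches std::accumulate for
// every team size from 1 to max_team_size: the invariant a correct
// parallel_reduce(TeamThreadRange) has to satisfy.
bool matches_sequential_sum(const std::vector<long long>& values,
                            std::size_t max_team_size) {
  const long long expected =
      std::accumulate(values.begin(), values.end(), 0LL);
  for (std::size_t ts = 1; ts <= max_team_size; ++ts) {
    if (team_thread_range_sum(values, ts) != expected) {
      return false;
    }
  }
  return true;
}
```

For example, summing 1..1000 with any team size from 1 to 64 should give 500500. A real reproducer would of course run the Kokkos kernel on the Cuda backend and compare its result against such a host-side reference, which is essentially what the failing check at TestTeam.hpp:1067 does.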

@crtrott
Member

crtrott commented Aug 28, 2018

I have a fix that actually removes one of the many reduction implementations for CUDA in Kokkos. Spot checks are running now; I will post a pull request afterwards.

crtrott added a commit that referenced this issue Aug 28, 2018
This addresses issue #1762
@vqd8a
Contributor

vqd8a commented Aug 29, 2018

@crtrott thanks for fixing this. I had just figured out yesterday afternoon that team_broadcast was working fine, and by then you had already fixed the team reduce.

@crtrott crtrott closed this as completed Nov 5, 2018