Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kokkos::deep_copy memory access failures #1583

Closed
mndevec opened this issue Apr 24, 2018 · 9 comments
Closed

Kokkos::deep_copy memory access failures #1583

mndevec opened this issue Apr 24, 2018 · 9 comments
Assignees
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Milestone

Comments

@mndevec
Copy link

mndevec commented Apr 24, 2018

This is caught for some of the SpGEMM tests described here: https://github.com/kokkos/kokkos-kernels/wiki/SpGEMM_Benchmarks. (For example coPaperDBLP).

Something for Kokkos::deep_copy between Kokkos versions 2.5 and 2.6 seems to introduce illegal memory accesses. This made some of the spgemm tests fail with some memory access issues.

I can replicate this issue in the below small test code on P100 with cuda 8.0 and gcc 5.3.

#include "Kokkos_Core.hpp"
#include <iostream>

int main() {

  Kokkos::initialize();
  std::cout << "Allocating view" << std::endl;
  Kokkos::View<int *, Kokkos::Cuda> very_big_view("test", 1960173568);
  Kokkos::fence();
  std::cout << "Deep Copy" << std::endl;

  Kokkos::deep_copy(very_big_view, 4);
  Kokkos::fence();
  std::cout << "Deep Copy Done" << std::endl;
  Kokkos::finalize();
}

The output is as below:

bash-4.2$ ./test.exe 
Allocating view
Deep Copy
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /ascldap/users/mndevec/shepard_work/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:119
Traceback functionality not available

Aborted (core dumped)

@srajama1 @crtrott @ibaned @dsunder @ndellingwood

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

Oh my ...
I am not sure how this slipped through our testing it really seems straight forward enough.

@crtrott crtrott added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Blocks Promotion Overview issue for release-blocking bugs labels Apr 24, 2018
@crtrott crtrott added this to the 2018 April milestone Apr 24, 2018
@srajama1
Copy link

I think we should add the performance testing framework developed by Michel so Kokkoskernels can stress test Kokkos develop once a week or so. This showed up because we were trying to reproduce an earlier run w/ large matrices. Such a test would have caught other performance problems too.

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

I am also at somewhat of a loss what is going wrong. I tracked this down to this addition

  typedef   ViewFillA<Kokkos::View<int*, Kokkos::LayoutRight,
                                      Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>,
                                      Kokkos::MemoryTraits<1u> >,
                         Kokkos::LayoutLeft, Kokkos::Cuda, 1, int> FillFunctorType;
  FillFunctorType f(very_big_view,4);
  typedef Kokkos::RangePolicy<Kokkos::Cuda,Kokkos::IndexType<int>> policy_type;
  Kokkos::parallel_for("Copy2",policy_type(0,very_big_view.extent(0)), f);

This fails. But if I use int64_t as IndexType than it works.

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

So this was somewhat bad luck. The code works if you use a length larger than INT_MAX and it works for everything smaller than INT_MAX/2 it fails for numbers in between. We have tests which exercise more than 2B but we don't have anything in the region 1B-2B which would have triggered this issue ...

@srajama1
Copy link

What is special about 1B-2B ?

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

Maybe there is somewhere a *2 in the code path which overflows? I really don't know. I just tested whats going on to track down where this is going wrong and why we didn't catch it.

@srajama1
Copy link

Got it ! I was thinking of a * 2 as well but cannot see why that is needed for deep copy. Something weird.

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

AH I think I know whats going on. Its based on how the internal loop structure in the CUDA impl works. Basically it does a loop with i + stride < end .
Stride in this case would actually be 1.xB and i could also be 1.xB so together they overflow.

@crtrott crtrott self-assigned this Apr 25, 2018
crtrott added a commit that referenced this issue Apr 25, 2018
This tests exercises loops of length 1B<N<2B to expose issue #1583
crtrott added a commit that referenced this issue Apr 25, 2018
ibaned added a commit that referenced this issue Apr 26, 2018
Fix some 32/64 bit index issues mainly for CUDA (issue #1583)
@ndellingwood
Copy link
Contributor

PR #1588 merged, marking this as InDevelop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Projects
None yet
Development

No branches or pull requests

4 participants