Nightly cuda/12.0, cuda/11.8 unit test failures #1663

ndellingwood · 2023-02-01T22:27:49Z

Sub-tests are failing in cuda/12.0 builds with the batched_dla_cuda and batched_gemm_cuda unit tests with error message cudaDeviceSynchronize() error( cudaErrorMisalignedAddress): misaligned address

batched_dla_cuda

00:37:42 3: [ RUN      ] Cuda.batched_scalar_serial_inverselu_dcomplex
00:37:42 3: cudaDeviceSynchronize() error( cudaErrorMisalignedAddress): misaligned address /home/jenkins/jenkins-new/workspace/KokkosKernels_KokkosDev2_CUDA120_GCC92_cpp17/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:132
00:37:42 3: Backtrace:
00:37:42 3:                         [0x675133]
00:37:42 3:                         [0x66d398]
00:37:42 3:                         [0x66d3cb]
00:37:42 3:                         [0x67b0fd]
00:37:42 3:                         [0x67bc91]
00:37:42 3:                         [0x651a65]
00:37:42 3:                         [0x5fd33a]
00:37:42 3:                         [0x43c787]
00:37:42 3:                         [0x64f34d]
00:37:42 3:                         [0x6432f3]
00:37:42 3:                         [0x6437a5]
00:37:42 3:                         [0x64398e]
00:37:42 3:                         [0x649a56]
00:37:42 3:                         [0x649cdb]
00:37:42 3:                         [0x413232]
00:37:42 3: __libc_start_main [0x7f7d99a18555]
00:37:42 3:                         [0x41acad]

batched_gemm_cuda

00:37:42 4: [ RUN      ] Cuda.batched_scalar_serial_gemm_nt_nt_dcomplex_dcomplex
00:37:42 4: cudaDeviceSynchronize() error( cudaErrorMisalignedAddress): misaligned address /home/jenkins/jenkins-new/workspace/KokkosKernels_KokkosDev2_CUDA120_GCC92_cpp17/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:132
00:37:42 4: Backtrace:
00:37:42 4:                         [0x7c22d3]
00:37:42 4:                         [0x7ba538]
00:37:42 4:                         [0x7ba56b]
00:37:42 4:                         [0x7c829d]
00:37:42 4:                         [0x7c8e31]
00:37:42 4:                         [0x79eb35]
00:37:42 4:                         [0x45e0b1]
00:37:42 4:                         [0x6fda73]
00:37:42 4:                         [0x6fe1d6]
00:37:42 4:                         [0x79c41d]
00:37:42 4:                         [0x7903c3]
00:37:42 4:                         [0x790875]
00:37:42 4:                         [0x790a5e]
00:37:42 4:                         [0x796b26]
00:37:42 4:                         [0x796dab]
00:37:42 4:                         [0x40e032]
00:37:42 4: __libc_start_main [0x7fb3be79b555]
00:37:42 4:                         [0x4170dd]

Reproducer (kokkos-dev-2):

source /projects/sems/modulefiles/utils/sems-archive-modules-init.sh ; module use /home/projects/x86-64/modulefiles/local
module purge
module load sems-archive-env sems-archive-cmake/3.17.1 sems-archive-gcc/9.2.0 cuda/12.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Cuda,Serial --arch=Volta70 --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="17" --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft --with-cuda-options=enable_lambda   --no-examples

The text was updated successfully, but these errors were encountered:

e10harvey · 2023-08-10T16:58:20Z

@lucbv: Do you have any notes on this so I can pickup from where you left off or do you want to pair up?

e10harvey · 2023-08-23T17:52:13Z

Notes:

Binaries in issue1663 build directory.
Use https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html#memcheck-tool
If misalignment originates from a dcomplex view, verify that KOKKOS_ENABLE_COMPLEX_ALIGN is defined in kokkos core.

e10harvey · 2023-08-23T18:07:59Z

Relevant snippet from memcheck:

========= Invalid __local__ read of size 16 bytes
=========     at 0xdeadbeef in void Kokkos::Impl::cuda_parallel_launch_local_memory<Kokkos::Impl::ParallelFor<Test::SerialInverseLU::Functor_BatchedSerialGemm<Kokkos::Cuda, Kokkos::View<Kokkos::complex<double> ***, Kokkos::LayoutLeft, Kokkos::Cuda>, Kokkos::complex<double>, Test::SerialInverseLU::ParamTag<KokkosBlas::Trans::NoTranspose, KokkosBlas::Trans::NoTranspose>, KokkosBlas::Algo::Level3::Blocked>, Kokkos::RangePolicy<Kokkos::Cuda, Test::SerialInverseLU::ParamTag<KokkosBlas::Trans::NoTranspose, KokkosBlas::Trans::NoTranspose>>, Kokkos::Cuda>>(T1

Note that all inverselu invalid reads come from the Blocked algo type.

e10harvey · 2023-08-23T18:24:21Z

Note: Cuda/12 wants all addresses 16-byte aligned but, in the BatchedSerialGemm Blocked implementation, we de-reference a address that is 8-byte aligned.

TODO: Print out pointer scalar types and their size as well as the starting addresses of views/subviews.

e10harvey · 2023-08-28T15:19:22Z

After more debugging I have determined that the misalignment is stemming from Functor_BatchedSerialGemm in Test_Batched_SerialInverseLU.hpp of an address outside the control of the parallel_for caller.

e10harvey · 2023-08-28T19:30:40Z

Given that the functor in question does not use any addresses that are violating 16-byte alignment nor do locals (&_alpha or &_beta) violate 16-byte alignment, I believe this is either a Kokkos Core or a compiler bug. Regardless of where the bug stems from, we should ask someone from Cuda or Kokkos Core to investigate further.

e10harvey · 2023-08-28T22:20:32Z

Here are more triaging results. Note that local memory can only be allocated by the compiler.

Christian and I tried moving Scalar _alpha, _beta above the declaration of the _a, _b, _c locals in the functor class definition:

template <typename DeviceType, typename ViewType, typename ScalarType,
          typename ParamTagType, typename AlgoTagType>
struct Functor_BatchedSerialGemm {
  ScalarType _alpha, _beta;
  ViewType _a, _b, _c;

This change resulted in passing tests in cuda/12.0.

e10harvey · 2023-08-28T22:27:04Z

The (register allocation bug?) still persists in cuda/12.2.

KokkosKernels HEAD SHA: 6c06bd0
Kokkos HEAD SHA: 7e299b4e25c42528e105379c3aa9a318056545ba

Local changes in KokkosKernels: kk_local_changes.txt

Local change in Kokkos: none.

module load sems-archive-env sems-archive-cmake/3.17.1 gcc/11 nvhpc/23.7
make -j16 KokkosKernels_batched_dla_cuda
$ ./batched/dense/unit_test/KokkosKernels_batched_dla_cuda --gtest_filter='*Cuda.batched_scalar_serial_inverselu_dcomplex*'
Note: Google Test filter = *Cuda.batched_scalar_serial_inverselu_dcomplex*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Cuda
[ RUN      ] Cuda.batched_scalar_serial_inverselu_dcomplex
mark0
mark1
a0:0x7fd615f32480
a1:0x7fd615f32680
c0:0x7fd615f32a80
w:0x7fd615f32880
sizeof(AViewType::value_type):16
&_alpha:0x7ffc56298260
&_beta:0x7ffc56298270
mark2
i:0
mark0
mark1
a0:0x7fd615f32480
a1:0x7fd615f32680
c0:0x7fd615f32a80
w:0x7fd615f32880
sizeof(AViewType::value_type):16
&_alpha:0x7ffc56298260
&_beta:0x7ffc56298270
(CudaInternal::singleton().cuda_device_synchronize_wrapper()) error( cudaErrorMisalignedAddress): misaligned address /ascldap/users/eharvey/KOKKOS.base/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:153
Backtrace:
[0x6a0e23] 
[0x69b148] 
[0x69b17b] 
[0x6a6ee7] 
[0x6a786d] 
[0x67f4a5] 
[0x65961a] 
[0x47829d] 
[0x67cf8d] 
[0x672b98] 
[0x67332d] 
[0x673544] 
[0x6773d4] 
[0x672316] 
[0x4193c2] 
[0x7fd63cc7d555] __libc_start_main
[0x420b6d] 
Aborted (core dumped)

Here is some additional sizeof and aligof information using cuda/12.2:

$ ./batched/dense/unit_test/KokkosKernels_batched_dla_cuda --gtest_filter='*Cuda.batched_scalar_serial_inverselu_dcomplex*'
Note: Google Test filter = *Cuda.batched_scalar_serial_inverselu_dcomplex*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Cuda
[ RUN      ] Cuda.batched_scalar_serial_inverselu_dcomplex
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df32680
c0:0x7fc68df32a80
w:0x7fc68df32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
mark2
i:0
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df32680
c0:0x7fc68df32a80
w:0x7fc68df32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:1
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df36680
c0:0x7fc68df3ea80
w:0x7fc68df3a880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:2
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df42680
c0:0x7fc68df62a80
w:0x7fc68df52880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:3
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df56680
c0:0x7fc68df9ea80
w:0x7fc68df7a880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:4
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df72680
c0:0x7fc67d400080
w:0x7fc68dfb2880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:5
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df96680
c0:0x7fc67d464280
w:0x7fc67d400080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:6
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc67d400080
c0:0x7fc67d520480
w:0x7fc67d490280
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:7
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc67d400080
c0:0x7fc67dc00080
w:0x7fc67d4c4280
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:8
mark0
mark1
a0:0x7fc67d400080
a1:0x7fc67dc00080
c0:0x7fc67e000080
w:0x7fc67de00080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:9
mark0
mark1
a0:0x7fc67d400080
a1:0x7fc67dc00080
c0:0x7fc67e000080
w:0x7fc67de00080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df32680
c0:0x7fc68df32a80
w:0x7fc68df32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
mark2
i:0
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df32680
c0:0x7fc68df32a80
w:0x7fc68df32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:1
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df36680
c0:0x7fc68df3ea80
w:0x7fc68df3a880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:2
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df42680
c0:0x7fc68df62a80
w:0x7fc68df52880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:3
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df56680
c0:0x7fc68df9ea80
w:0x7fc68df7a880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:4
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df72680
c0:0x7fc67d400080
w:0x7fc68dfb2880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:5
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc68df96680
c0:0x7fc67d464280
w:0x7fc67d400080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:6
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc67d400080
c0:0x7fc67d520480
w:0x7fc67d490280
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:7
mark0
mark1
a0:0x7fc68df32480
a1:0x7fc67d400080
c0:0x7fc67dc00080
w:0x7fc67d4c4280
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:8
mark0
mark1
a0:0x7fc67d400080
a1:0x7fc67dc00080
c0:0x7fc67e000080
w:0x7fc67de00080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
i:9
mark0
mark1
a0:0x7fc67d400080
a1:0x7fc67dc00080
c0:0x7fc67e000080
w:0x7fc67de00080
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffdfcd472f0
&_beta:0x7ffdfcd47300
In Operator: alignof(decltype(*this)):16
In Operator: alignof(decltype(ViewType)):8
In Operator: alignof(decltype(ScalarType)):16
mark2
[       OK ] Cuda.batched_scalar_serial_inverselu_dcomplex (116 ms)
[----------] 1 test from Cuda (116 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (116 ms total)
[  PASSED  ] 1 test.

NOTE: You have to comment out the following prints in the operator to trigger misalignment:

  KOKKOS_INLINE_FUNCTION
  void operator()(const ParamTagType &, const int k) const {
    auto aa = Kokkos::subview(_a, k, Kokkos::ALL(), Kokkos::ALL());
    auto bb = Kokkos::subview(_b, k, Kokkos::ALL(), Kokkos::ALL());
    auto cc = Kokkos::subview(_c, k, Kokkos::ALL(), Kokkos::ALL());

    /* if (k == 0) {
      printf("In Operator: alignof(decltype(*this)):%lu\n", alignof(decltype(*this)));
      printf("In Operator: alignof(decltype(ViewType)):%lu\n", alignof(ViewType));
      printf("In Operator: alignof(decltype(ScalarType)):%lu\n", alignof(ScalarType));
    } */

$ ./batched/dense/unit_test/KokkosKernels_batched_dla_cuda --gtest_filter='*Cuda.batched_scalar_serial_inverselu_dcomplex*'
Note: Google Test filter = *Cuda.batched_scalar_serial_inverselu_dcomplex*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Cuda
[ RUN      ] Cuda.batched_scalar_serial_inverselu_dcomplex
mark0
mark1
a0:0x7ff6a7f32480
a1:0x7ff6a7f32680
c0:0x7ff6a7f32a80
w:0x7ff6a7f32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffd8a1ea7b0
&_beta:0x7ffd8a1ea7c0
mark2
i:0
mark0
mark1
a0:0x7ff6a7f32480
a1:0x7ff6a7f32680
c0:0x7ff6a7f32a80
w:0x7ff6a7f32880
sizeof(AViewType::value_type):16
sizeof(ViewType):40
alignof(decltype(*this)):16
alignof(decltype(ViewType)):8
alignof(decltype(ScalarType)):16
&_alpha:0x7ffd8a1ea7b0
&_beta:0x7ffd8a1ea7c0
(CudaInternal::singleton().cuda_device_synchronize_wrapper()) error( cudaErrorMisalignedAddress): misaligned address /ascldap/users/eharvey/KOKKOS.base/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:153
Backtrace:
[0x6a11b3] 
[0x69b4d8] 
[0x69b50b] 
[0x6a7277] 
[0x6a7bfd] 
[0x67f835] 
[0x65996e] 
[0x47828d] 
[0x67d31d] 
[0x672f28] 
[0x6736bd] 
[0x6738d4] 
[0x677764] 
[0x6726a6] 
[0x419382] 
[0x7ff6d284e555] __libc_start_main
[0x420b2d] 
Aborted (core dumped)

matt-stack · 2023-09-27T20:38:51Z

Hello, I am looking into this bug, and came across something I found strange. If you keep all the source for the test the same, but take out one Kokkos::abort, then it seems to not hit this error message. Does anyone have an idea why that would be?

change the abort here to just return 0; or comment it out entirely.

  if (!(m <= 2 && n <= 2))
   Kokkos::abort(
        "InnerGemmFixC<2,2>::serial_invoke, assert failure (m<=2 && n<=2)");

to

  if (!(m <= 2 && n <= 2)) return 0;
//    Kokkos::abort(
//        "InnerGemmFixC<2,2>::serial_invoke, assert failure (m<=2 && n<=2)");

And on my machine I get no error. Because of the lack of abort, am I just missing a cudaCheckLastError call or something like that? I cant tell yet if the Kokkos::abort is an issue here, or its causing me to miss the trigger for the bug, or its not printing the Cuda error. Though when I searched through the src for cuda_abort, it looks like it just prints the message you give it. @crtrott for vis

ndellingwood · 2023-12-15T22:26:32Z

Just to update, these two tests fail with cd8f77c when enabling complex_double types in builds with c++20 enabled as well using for example cuda/12.0.0 + gcc/11.3.0

ndellingwood · 2023-12-15T23:08:34Z

If I configure with the option -DKokkos_ENABLE_COMPLEX_ALIGN=OFF then the tests posted above pass
Adding @crtrott @dalg24 @masterleinad to the loop

ndellingwood · 2024-04-13T00:49:49Z

The same tests fail with cuda/11.8.0 when testing with cusparse and magma tpls enabled

ndellingwood · 2024-06-04T16:28:07Z

Updating the issue to confirm the same tests still fail with cuda/11.8.0, cuda/12.0 +/- c++20 on Weaver (Volta70+Power9) with SHA 32aa75a

Configuration (Weaver, cuda/12.0 w/ c++20):

bsub -Is -n 1 -q rhel8 -gpu "num=1" bash

source /etc/profile.d/modules.sh
module load cmake git gcc/11.3.0 cuda/12.0.0

${KOKKOSKERNELS_PATH}/cm_generate_makefile.bash --with-cuda --with-serial --compiler=${KOKKOS_PATH}/bin/nvcc_wrapper --arch=Volta70,Power9 --with-cuda-options=enable_lambda --kokkos-path=${KOKKOS_PATH} --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --cxxstandard=20

Test failures:

16:17:09 The following tests FAILED:
16:17:09 	  3 - batched_dla_cuda (Subprocess aborted)
16:17:09 	  4 - batched_gemm_cuda (Subprocess aborted)

ndellingwood · 2024-06-04T18:17:59Z

The tests above passed on kokkos-dev-2 with sems-cuda/12.4 + sems-gcc/13.2.0

lucbv · 2024-06-04T18:44:26Z

@ndellingwood so with cuda 12.4 we have the batched_dla_cuda and batched_gemm_cuda working correctly? Anything else failing on that platform?

ndellingwood · 2024-06-04T19:34:50Z

@ndellingwood so with cuda 12.4 we have the batched_dla_cuda and batched_gemm_cuda working correctly? Anything else failing on that platform?

@lucbv on kokkos-dev-2 the configuration here (with Power9 dropped), using sems-cuda/12.4, the tests passed 100%

ndellingwood added the bug label Feb 1, 2023

e10harvey assigned lucbv and e10harvey May 23, 2023

e10harvey added this to To do in Developer: E10HARVEY via automation Aug 10, 2023

e10harvey moved this from To do to In progress in Developer: E10HARVEY Aug 10, 2023

e10harvey added the CompilerIssue label Sep 5, 2023

cwpearson assigned cwpearson and unassigned e10harvey Oct 18, 2023

ndellingwood changed the title ~~Nightly cuda/12.0 unit test failures~~ Nightly cuda/12.0, cuda/11.8 unit test failures Apr 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nightly cuda/12.0, cuda/11.8 unit test failures #1663

Nightly cuda/12.0, cuda/11.8 unit test failures #1663

ndellingwood commented Feb 1, 2023

e10harvey commented Aug 10, 2023

e10harvey commented Aug 23, 2023

e10harvey commented Aug 23, 2023

e10harvey commented Aug 23, 2023 •

edited

Loading

e10harvey commented Aug 28, 2023 •

edited

Loading

e10harvey commented Aug 28, 2023

e10harvey commented Aug 28, 2023

e10harvey commented Aug 28, 2023

matt-stack commented Sep 27, 2023

ndellingwood commented Dec 15, 2023

ndellingwood commented Dec 15, 2023

ndellingwood commented Apr 13, 2024

ndellingwood commented Jun 4, 2024

ndellingwood commented Jun 4, 2024

lucbv commented Jun 4, 2024

ndellingwood commented Jun 4, 2024

Nightly cuda/12.0, cuda/11.8 unit test failures #1663

Nightly cuda/12.0, cuda/11.8 unit test failures #1663

Comments

ndellingwood commented Feb 1, 2023

e10harvey commented Aug 10, 2023

e10harvey commented Aug 23, 2023

e10harvey commented Aug 23, 2023

e10harvey commented Aug 23, 2023 • edited Loading

e10harvey commented Aug 28, 2023 • edited Loading

e10harvey commented Aug 28, 2023

e10harvey commented Aug 28, 2023

e10harvey commented Aug 28, 2023

matt-stack commented Sep 27, 2023

ndellingwood commented Dec 15, 2023

ndellingwood commented Dec 15, 2023

ndellingwood commented Apr 13, 2024

ndellingwood commented Jun 4, 2024

ndellingwood commented Jun 4, 2024

lucbv commented Jun 4, 2024

ndellingwood commented Jun 4, 2024

e10harvey commented Aug 23, 2023 •

edited

Loading

e10harvey commented Aug 28, 2023 •

edited

Loading