[OpenMP] miscompilation offload to amdgpu #59759

@ye-luo

Observed on gfx90a with the ROCm 5.4.0 device libraries, using Clang at commit a63b724.
Reproduced with https://github.com/ye-luo/miniqmc at test commit 6f526b6062682ec892fb02d2919484c8b4db0875:

mkdir build_llvm_offload_cuda2hip_real; cd build_llvm_offload_cuda2hip_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx90a ..
make -j32 test_omptarget_blas
./src/Platforms/tests/OMPTarget/test_omptarget_blas

The failure is sporadic. The results should be integer values stored in floating-point variables, but some of them are not:

-------------------------------------------------------------------------------
OmpBLAS gemv
-------------------------------------------------------------------------------
/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:179
...............................................................................

/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
  CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
  586417.0317596535 == 586417.0

/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
  CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
  728143.0635398587 == 728143.0

===============================================================================
test cases:    1 |    0 passed | 1 failed
assertions: 6576 | 6574 passed | 2 failed
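
For context, here is a minimal host-side sketch of why a fractional part in these results points to wrong code rather than rounding. It assumes the test feeds integer-valued inputs to gemv, which the report implies but does not show. Under that assumption every product and partial sum is an integer, and integers well below 2^53 are exactly representable in double precision, so a correct result has no fractional part. The names `is_integer_valued` and `reference_gemv` are hypothetical and are not part of miniqmc.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from miniqmc): true when v is finite and has
// no fractional part.
inline bool is_integer_valued(double v)
{
  return std::isfinite(v) && std::trunc(v) == v;
}

// Hypothetical host-side reference: y = A * x for a row-major m x n matrix.
// With integer-valued A and x of modest magnitude, every partial sum is an
// exactly representable integer, so y is integer-valued as well.
inline std::vector<double> reference_gemv(const std::vector<double>& A,
                                          const std::vector<double>& x,
                                          std::size_t m,
                                          std::size_t n)
{
  std::vector<double> y(m, 0.0);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      y[i] += A[i * n + j] * x[j];
  return y;
}
```

Applied to the two failing values reported here, `is_integer_valued` returns false for both, while the reference values on the right-hand side of the failed checks are integer-valued.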

Interestingly, if I edit

diff --git a/src/Platforms/OMPTarget/ompBLAS.cpp b/src/Platforms/OMPTarget/ompBLAS.cpp
index ce895f0..ca9d395 100644
--- a/src/Platforms/OMPTarget/ompBLAS.cpp
+++ b/src/Platforms/OMPTarget/ompBLAS.cpp
@@ -93,7 +93,6 @@ ompBLAS_status gemv(ompBLAS_handle&     handle,
   return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
 }
 
-#if !defined(OPENMP_NO_COMPLEX)
 ompBLAS_status gemv(ompBLAS_handle&                  handle,
                     const char                       trans,
                     const int                        m,
@@ -125,7 +124,6 @@ ompBLAS_status gemv(ompBLAS_handle&                   handle,
 {
   return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
 }
-#endif

which basically just compiles a few more unused offload regions, then test_omptarget_blas passes reliably.

Even with the above workaround, adding -DCMAKE_CXX_FLAGS=-foffload-lto to the CMake command line makes the test fail again.
