Closed
Description
On gfx90a with the ROCm 5.4.0 device libs, using Clang at commit a63b724.
To reproduce, use https://github.com/ye-luo/miniqmc at test commit 6f526b6062682ec892fb02d2919484c8b4db0875:
mkdir build_llvm_offload_cuda2hip_real; cd build_llvm_offload_cuda2hip_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx90a ..
make -j32 test_omptarget_blas
./src/Platforms/tests/OMPTarget/test_omptarget_blas
The failure is sporadic. The results should be integer values stored in floating-point variables, but some of them are not.
-------------------------------------------------------------------------------
OmpBLAS gemv
-------------------------------------------------------------------------------
/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:179
...............................................................................
/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
586417.0317596535 == 586417.0
/ccs/home/yeluo/test/miniqmc/src/Platforms/tests/OMPTarget/test_omp_BLAS.cpp:175: FAILED:
CHECK( Cs[batch][index] == Ds[batch][index] )
with expansion:
728143.0635398587 == 728143.0
===============================================================================
test cases: 1 | 0 passed | 1 failed
assertions: 6576 | 6574 passed | 2 failed
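For context on why an exact `==` check is reasonable here: assuming the test inputs are integer-valued and small, every product and partial sum in a gemv is an integer far below 2^53, so double-precision arithmetic is exact regardless of summation order. A fractional part such as .0317596535 therefore cannot come from rounding. The sketch below is a host-side illustration only; `reference_gemv` and `is_integral` are hypothetical helpers, not part of miniqmc, and the row-major layout is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not the miniqmc implementation:
// computes y = alpha * A * x + beta * y for a row-major m x n matrix A.
std::vector<double> reference_gemv(std::size_t m, std::size_t n, double alpha,
                                   const std::vector<double>& A,
                                   const std::vector<double>& x, double beta,
                                   std::vector<double> y)
{
  assert(A.size() == m * n && x.size() == n && y.size() == m);
  for (std::size_t i = 0; i < m; ++i)
  {
    double dot = 0.0;
    for (std::size_t j = 0; j < n; ++j)
      dot += A[i * n + j] * x[j];
    y[i] = alpha * dot + beta * y[i];
  }
  return y;
}

// True when v is finite and has no fractional part.
bool is_integral(double v) { return std::isfinite(v) && v == std::floor(v); }
```

Applied to the values in the log, `is_integral(586417.0317596535)` is false while `is_integral(586417.0)` is true, which is the distinction the failing CHECK is reporting.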
Interestingly, if I apply this edit:
diff --git a/src/Platforms/OMPTarget/ompBLAS.cpp b/src/Platforms/OMPTarget/ompBLAS.cpp
index ce895f0..ca9d395 100644
--- a/src/Platforms/OMPTarget/ompBLAS.cpp
+++ b/src/Platforms/OMPTarget/ompBLAS.cpp
@@ -93,7 +93,6 @@ ompBLAS_status gemv(ompBLAS_handle& handle,
return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
}
-#if !defined(OPENMP_NO_COMPLEX)
ompBLAS_status gemv(ompBLAS_handle& handle,
const char trans,
const int m,
@@ -125,7 +124,6 @@ ompBLAS_status gemv(ompBLAS_handle& handle,
{
return gemv_impl(handle, trans, m, n, alpha, A, lda, x, incx, beta, y, incy);
}
-#endif
which basically compiles a few more unused offload regions, then test_omptarget_blas passes reliably.
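As a reading aid for the diff, here is a minimal sketch of the guard pattern it removes. The names `gemv_sketch` and `complex_overloads_enabled` are hypothetical stand-ins, and the assumption that the guarded overloads are the complex-valued ones is inferred from the macro name only. When `OPENMP_NO_COMPLEX` is defined, the guarded overload is not compiled at all; deleting the `#if`/`#endif` pair makes it compile unconditionally, even if nothing calls it.

```cpp
#include <complex>

// Hypothetical stand-in for an overload that is always compiled.
inline int gemv_sketch(const double*) { return 1; }

// Hypothetical stand-in for an overload behind the guard seen in the diff.
// It exists only when OPENMP_NO_COMPLEX is not defined.
#if !defined(OPENMP_NO_COMPLEX)
inline int gemv_sketch(const std::complex<double>*) { return 2; }
#endif

// Reports at compile time whether the guarded overload was built.
constexpr bool complex_overloads_enabled()
{
#if !defined(OPENMP_NO_COMPLEX)
  return true;
#else
  return false;
#endif
}
```

Building this once as-is and once with `-DOPENMP_NO_COMPLEX` shows the two configurations the diff switches between.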
Even with the above workaround, if I add -DCMAKE_CXX_FLAGS=-foffload-lto to the CMake command line, the test starts failing again.