Compilation error when building cuRAND or cuBLAS tests #91

Closed
sbalint98 opened this issue May 14, 2021 · 9 comments

@sbalint98
Contributor

Summary

Compiling oneMKL with tests enabled and with either the cuRAND or the cuBLAS backend enabled fails with a compilation error.

Version

The current head of the oneMKL develop branch is used, e.g. 1ed12c7.

Environment

  • HW you use
    Intel Gold 6130 CPU with NVIDIA GTX 1080 GPUs
  • Backend library version
    CUDA 10.0
    MKL and TBB obtained via the Intel installer, version 2021.1.1
  • OS name and version
    Ubuntu 20.04
  • Compiler version
    DPC++ compiler cloned from develop at hash 4e26734cb87c451e0562559d5d6f83b7eabcaea3,
    configured with buildbot/configure.py --cuda
    and built with buildbot/compile.py

Steps to reproduce

git clone https://github.com/oneapi-src/oneMKL.git
mkdir build && cd build

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH \
CXX=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ \
CC=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang \
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/root/hipSYCL-main/dpc++/tbb/latest \
-DMKL_ROOT=/root/hipSYCL-main/dpc++/mkl/latest \
-DREF_BLAS_ROOT=/root/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.1/openblas-0.3.14-npb5lv7dhfygc3lgh6zx3x6chlyt4kth/ \
-DENABLE_CUBLAS_BACKEND=OFF \
-DENABLE_CURAND_BACKEND=ON \
-DENABLE_MKLGPU_BACKEND=OFF ..

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH ninja

Observed behavior

Compilation fails when either ENABLE_CURAND_BACKEND or ENABLE_CUBLAS_BACKEND is set to ON. I believe this can be traced back to the following issues:

  • With ENABLE_CURAND_BACKEND=ON and ENABLE_CUBLAS_BACKEND=OFF
    The compilation terminates with an error. I suspect this is caused by lines 70-81 of test_helper.hpp: when ENABLE_CURAND_BACKEND is defined, TEST_RUN_NVIDIAGPU_CURAND_SELECT is defined with the backend selector oneapi::mkl::backend::curand, and blas_ct_backends.hpp defines no BLAS functions for that selector.
    compile_error_curand.log

  • With ENABLE_CURAND_BACKEND=OFF and ENABLE_CUBLAS_BACKEND=ON
    The compilation fails because the build attempts to compile the cuRAND tests with the cuBLAS backend selector.
    compile_error_cublas.log
    A possible workaround, if only cuBLAS is of interest, is to comment out the code that adds the rng domain in the root-level CMakeLists.txt. The compilation then succeeds, with only a few SYCL 2020 deprecation warnings.

Expected behavior

All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?

@vrpascuzzi
Contributor

All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?

This is what I had in mind as well, some time ago. I think the decision was made to keep the RNG and BLAS domains separate. Indeed, I found issues similar to the ones you've reported: when building cuRAND alone without BLAS, a BLAS header was included in an installed file, and I needed to tweak this by hand.

In short, this can be fixed up a bit.

@mmeterel mmeterel self-assigned this May 20, 2021
@mmeterel
Contributor

@sbalint98 Hmmm, I cannot reproduce what you are seeing. I tried several combinations locally and here is what I observe.

You are using the latest develop branch, right?

TARGET_DOMAIN  ENABLE_CUBLAS_BACKEND  ENABLE_CURAND_BACKEND  Build PASS/FAIL
blas           ON                     OFF                    PASS
rng            ON                     OFF                    PASS
blas, rng      ON                     OFF                    PASS

rng            OFF                    ON                     PASS
blas           OFF                    ON                     PASS
blas, rng      OFF                    ON                     PASS

blas           ON                     ON                     FAIL (BLAS test is being compiled for cuRAND backend)
rng            ON                     ON                     FAIL (RNG test is being compiled for cuBLAS backend)
blas, rng      ON                     ON                     FAIL (RNG test is being compiled for cuBLAS backend)
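
For anyone who wants to retry these rows, here is a minimal dry-run sketch that only prints one configure command per combination and builds nothing. The function name print_configure_matrix is made up for illustration, the remaining flags (MKL_ROOT, REF_BLAS_ROOT and so on) are left out, and passing two domains as a quoted, space-separated string is an assumption about how TARGET_DOMAINS is parsed.

```shell
#!/bin/sh
# Dry-run sketch: print one cmake configure command per combination of
# CUDA backend switches and target domains. Nothing is configured or built.
# Assumption: a multi-domain value is passed as a quoted, space-separated string.

print_configure_matrix() {
  for backends in "ON OFF" "OFF ON" "ON ON"; do
    cublas=${backends% *}   # text before the space: cuBLAS switch
    curand=${backends#* }   # text after the space: cuRAND switch
    for domains in "blas" "rng" "blas rng"; do
      printf 'cmake .. -DTARGET_DOMAINS="%s" -DENABLE_CUBLAS_BACKEND=%s -DENABLE_CURAND_BACKEND=%s\n' \
        "$domains" "$cublas" "$curand"
    done
  done
}

print_configure_matrix
```

Each printed line can be pasted into a fresh build directory, together with the site-specific paths, to check a single row in isolation.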

@sbalint98
Contributor Author

Thank you for your investigation, unfortunately, I still have the same issue as described above. I am using the current HEAD of develop.

I will try to do some more detailed exploration over the weekend. Could you provide the details of how you have executed cmake?

@mmeterel
Contributor

mmeterel commented May 27, 2021

Thank you for your investigation, unfortunately, I still have the same issue as described above. I am using the current HEAD of develop.

I will try to do some more detailed exploration over the weekend. Could you provide the details of how you have executed cmake?

Hello @sbalint98 , sorry for the delay.
Here are my steps to build. Please let me know if that helps.

llvm=llvm_version
lnx_cuda=/project/mmeterel/tools/dpc++/opensource/${llvm}/lnx_cuda
export CXX=${lnx_cuda}/compiler/bin/clang++
export LD_LIBRARY_PATH=${lnx_cuda}/compiler/lib:${LD_LIBRARY_PATH}

export REF_BLAS_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export NETLIB_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export BUILD_SHARED_LIB=ON
export BUILD_DOC=OFF
export ENABLE_CUBLAS_BACKEND=ON
export ENABLE_CURAND_BACKEND=OFF
export ENABLE_MKLGPU_BACKEND=OFF
export ENABLE_MKLCPU_BACKEND=OFF
export ENABLE_NETLIB_BACKEND=OFF
export TARGET_DOMAINS=blas

export CMAKE_ROOT=/project/mmeterel/tools/cmake-3.16.0-Linux-x86_64
export PATH="${CMAKE_ROOT}/bin:${PATH}"

mkdir build_nvidia
cd build_nvidia

cmake .. -DMKL_ROOT=${MKL_ROOT} \
  -DREF_BLAS_ROOT=${REF_BLAS_ROOT} \
  -DBUILD_SHARED_LIBS=${BUILD_SHARED_LIB} \
  -DENABLE_CUBLAS_BACKEND=${ENABLE_CUBLAS_BACKEND} \
  -DENABLE_MKLGPU_BACKEND=${ENABLE_MKLGPU_BACKEND} \
  -DENABLE_MKLCPU_BACKEND=${ENABLE_MKLCPU_BACKEND} \
  -DENABLE_NETLIB_BACKEND=${ENABLE_NETLIB_BACKEND} \
  -DTARGET_DOMAINS=${TARGET_DOMAINS} \
  -DENABLE_CURAND_BACKEND=${ENABLE_CURAND_BACKEND}
cmake --build . -j8

@mmeterel
Contributor

mmeterel commented Jun 7, 2021

@sbalint98 Did you get a chance to try the commands I sent? Do you still see the issue?

@sbalint98
Contributor Author

Sorry for the very long delay. I will try the commands very soon and get back to you.

@sbalint98
Contributor Author

My problem was that I didn't use the -DTARGET_DOMAINS flag, so all the domains were added. Using that flag has solved my problem.

The issue I see right now, which is probably the one I encountered previously and which you also noted in your table, arises when both rng and blas are targeted and either the cuRAND or the cuBLAS backend is enabled. In that case, the compilation failed for me with every combination of the cuRAND and cuBLAS backends. If this is not a legal configuration, would you agree that it would be nice to have the error shown during configuration?
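
Until such a check exists at configure time, something like the following wrapper could reject the failing combinations before cmake is invoked. This is only a sketch: check_cuda_test_config and has_domain are hypothetical names, the rules simply encode the failures reported in this thread rather than an official support matrix, and domains are assumed to be given as a space-separated string.

```shell
#!/bin/sh
# Hypothetical pre-configure guard; names and messages are illustrative only.
# Usage: check_cuda_test_config "<domains>" <cublas ON|OFF> <curand ON|OFF>

# Return 0 if the space-separated list in $1 contains the domain $2.
has_domain() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

check_cuda_test_config() {
  domains=$1
  cublas=$2
  curand=$3

  # An empty selection means every domain is added, which includes blas and rng.
  [ -n "$domains" ] || domains="blas rng"

  # Both CUDA backends enabled: reported as failing for every domain selection.
  if [ "$cublas" = "ON" ] && [ "$curand" = "ON" ]; then
    echo "unsupported: cuBLAS and cuRAND backends are both enabled" >&2
    return 1
  fi

  # blas and rng targeted together with a CUDA backend: reported as failing.
  if has_domain "$domains" blas && has_domain "$domains" rng; then
    if [ "$cublas" = "ON" ] || [ "$curand" = "ON" ]; then
      echo "unsupported: blas and rng targeted together with a CUDA backend" >&2
      return 1
    fi
  fi
  return 0
}

# Example: only proceed to cmake when the guard accepts the combination.
if check_cuda_test_config "blas" ON OFF; then
  echo "configuration accepted"
fi
```

The same conditions could equally be expressed in the project's CMake files so that the error appears during configuration; the shell form is used here only because it needs no changes to the repository.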

@Michoumichmich

Michoumichmich commented Jun 9, 2021

You can try the setup script I use there, but the combination is not supported yet for testing

@sbalint98
Copy link
Contributor Author

I understand, thanks for the link and your help :)
