
Installation with CUDA11 #20

Closed
constantinpape opened this issue Sep 21, 2021 · 7 comments

@constantinpape

Hi,

I tried to install RAMA with CUDA 11, using the pip installation instructions (python -m pip install git+https://github.com/pawelswoboda/RAMA.git), with:

CUDA version 11.1
GCC version 10.2
CMake version 3.20

This fails with the following error:

  /tmp/pip-req-build-d9hjyjt3/external/cudaMST/gpuMST/gpuMST.cu(138): warning: function "__any"                                         
  /g/easybuild/x86_64/CentOS/7/skylake/software/CUDAcore/11.1.1/bin/../targets/x86_64-linux/include/device_atomic_functions.h(178): here was declared deprecated ("__any() is deprecated in favor of __any_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
  
  /g/easybuild/x86_64/CentOS/7/skylake/software/GCCcore/10.2.0/include/c++/10.2.0/tuple(566): error: pack "_UElements" does not have the same number of elements as "_Elements"
            detected during instantiation of "__nv_bool std::tuple<_Elements...>::__nothrow_constructible<_UElements...>() [with _Elements=<const thrust::device_vector<int, thrust::device_allocator<int>> &, const thrust::device_vector<int, thrust::device_allocator<int>> &, const thrust::device_vector<float, thrust::device_allocator<float>> &>, _UElements=<>]"
  /tmp/pip-req-build-d9hjyjt3/src/multicut_message_passing.cu(350): here
  
  /g/easybuild/x86_64/CentOS/7/skylake/software/GCCcore/10.2.0/include/c++/10.2.0/tuple(566): error: pack "_UElements" does not have the same number of elements as "_Elements"
            detected during:
              instantiation of "__nv_bool std::tuple<_Elements...>::__nothrow_constructible<_UElements...>() [with _Elements=<thrust::device_vector<int, thrust::device_allocator<int>> &, thrust::device_vector<int, thrust::device_allocator<int>> &, thrust::device_vector<float, thrust::device_allocator<float>> &>, _UElements=<>]"
  (1616): here
              instantiation of "std::tuple<_Elements &...> std::tie(_Elements &...) noexcept [with _Elements=<thrust::device_vector<int, thrust::device_allocator<int>>, thrust::device_vector<int, thrust::device_allocator<int>>, thrust::device_vector<float, thrust::device_allocator<float>>>]"
  /tmp/pip-req-build-d9hjyjt3/src/dCOO.cu(166): here

I also tried with CUDA 10, but ran into other issues. CUDA 11 would be much better for me, to stay compatible with our stack.

@aabbas90
Collaborator

aabbas90 commented Sep 21, 2021

Hi,

Thanks for reporting. Would it be possible for you to try with CUDA 11.2? It seems to be a known issue with CUDA 11.1, as reported on the NVIDIA forums. Thanks!

@aabbas90 aabbas90 self-assigned this Sep 21, 2021
@constantinpape
Author

constantinpape commented Sep 21, 2021

I see. Unfortunately we don't have 11.2 available on our cluster yet. I will check with our cluster admins whether it's possible to add 11.2.

@aabbas90
Collaborator

By the way, you can also install the CUDA toolkit locally. The driver does not need to be updated. I did it from:

https://developer.nvidia.com/cuda-11.2.2-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Debian&target_version=10&target_type=runfilelocal

Then run:

bash <PATH_TO_ABOVE_FILE> --toolkit --installpath=<YOURINSTALLDIR>

and add the install path to the PATH and LD_LIBRARY_PATH environment variables.
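As a sketch of that last step (the install prefix below is hypothetical; substitute whatever you passed to --installpath):

```shell
# Hypothetical local install prefix; substitute your actual --installpath.
CUDA_HOME="$HOME/cuda-11.2"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
```

Putting these lines in your shell profile (or a module file) makes the local toolkit visible to pip's build step without touching the system CUDA.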

@constantinpape
Author

> By the way, you can also install the CUDA toolkit locally.

That's not really helpful with our cluster set-up: if I can't use one of our precompiled CUDA versions, I can't deploy this, and overall it's much more effort for me to set up.

We have a CUDA 11.4 setup now, and I can run the installation with it, but it fails at runtime:

Going to use GeForce RTX 2080 Ti 7.5, device number 0
Traceback (most recent call last):
  File "test_rama.py", line 10, in <module>
    node_labels = rama_py.rama_cuda(
MemoryError: std::bad_alloc: cudaErrorUnsupportedPtxVersion: the provided PTX was compiled with an unsupported toolchain.

@aabbas90
Collaborator

Sorry for the inconvenience.

The error with CUDA 11.4 suggests the CUDA driver is too old and does not support CUDA 11.4: for 11.4 you need a driver of version >= 470.42.01, as per the CUDA release notes. Could you please check with nvidia-smi whether the driver satisfies this? Thanks!
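A quick sketch of that comparison (on the target machine the version string would come from nvidia-smi, which isn't invoked here):

```shell
# Sketch: version-aware check against the CUDA 11.4 minimum driver (470.42.01).
# On the target machine the argument would come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver_ok() {
    required="470.42.01"
    # sort -V orders version strings numerically; if the required version sorts
    # first (or is equal), the given driver is new enough.
    [ "$(printf '%s\n%s\n' "$required" "$1" | sort -V | head -n1)" = "$required" ]
}

if driver_ok "460.32.03"; then
    echo "460.32.03: ok for CUDA 11.4"
else
    echo "460.32.03: too old for CUDA 11.4 (need >= 470.42.01)"
fi
```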

@constantinpape
Author

> Sorry for the inconvenience.

No worries, that's the downside of custom CUDA code; installation is always a pain.

> The error with CUDA 11.4 seems like the CUDA driver is old and does not support CUDA 11.4. For 11.4 you need a driver of version >=470.42.01 as per CUDA release notes. Could you please check if the driver version satisfies this version using nvidia-smi?

Ok, that makes sense. Indeed, the driver is older (Driver Version: 460.32.03). I will ask our cluster admins about their plans for updating.

@constantinpape
Author

Ok, I managed to install with CUDA 10.2 now. (This failed for me earlier because I didn't use the GCC version that was used to compile this CUDA version.)

Thanks for the help. I will look into running some of my own benchmarks for RAMA and will for sure follow up with some other questions ;).
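For reference, the GCC mismatch mentioned above can be avoided by pointing the build at a host compiler the toolkit supports (CUDA 10.2 officially supports GCC up to version 8; the compiler paths below are hypothetical, adjust them to your system):

```shell
# Hypothetical paths to a CUDA-10.2-compatible GCC; adjust to your system.
export CC=/usr/bin/gcc-8
export CXX=/usr/bin/g++-8
export CUDAHOSTCXX=/usr/bin/g++-8   # CMake passes this to nvcc via -ccbin
# then rebuild, e.g.:
#   python -m pip install git+https://github.com/pawelswoboda/RAMA.git
```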
