Set c++17 standard in CMake for recent torch/cuda versions #109

Open: wants to merge 25 commits into master


RaulPPelaez (Contributor)

Compiling with CUDA 12 and a very recent PyTorch version (such as v2.1.0 from the nightly channel) fails, because PyTorch now requires C++17:

(test7) $ Torch_DIR=$(python -c 'import torch;print(torch.utils.cmake_prefix_path)')  cmake -DCMAKE_BUILD_TYPE=Release ..                                 
make -j15                                                                                                                                                                    
-- The CXX compiler identification is GNU 12.3.0                                                                                                                             
-- Detecting CXX compiler ABI info                                                                                                                                           
-- Detecting CXX compiler ABI info - done                                                                                                                                    
-- Check for working CXX compiler: /shared/raul/mambaforge/envs/test7/bin/x86_64-conda-linux-gnu-c++ - skipped                                                               
-- Detecting CXX compile features                                                                                                                                            
-- Detecting CXX compile features - done                                                                                                                                     
-- The CUDA compiler identification is NVIDIA 12.1.105                                                                                                                       
-- Detecting CUDA compiler ABI info                                                                                                                                          
-- Detecting CUDA compiler ABI info - done                                                                                                                                   
-- Check for working CUDA compiler: /shared/raul/mambaforge/envs/test7/bin/nvcc - skipped                                                                                    
-- Detecting CUDA compile features                                                                                                                                           
-- Detecting CUDA compile features - done                                                                                                                                    
-- Found Python3: /shared/raul/mambaforge/envs/test7/bin/python3.11 (found version "3.11.0") found components: Interpreter Development Development.Module Development.Embed  
-- Found CUDA: /shared/raul/mambaforge/envs/test7 (found version "12.1")                                                                                                     
-- Found CUDAToolkit: /shared/raul/mambaforge/envs/test7/include (found version "12.1.105") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD                                            
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed                                   
-- Looking for pthread_create in pthreads                                             
-- Looking for pthread_create in pthreads - not found                                 
-- Looking for pthread_create in pthread                                              
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /shared/raul/mambaforge/envs/test7/bin/nvcc
-- Caffe2: CUDA toolkit directory: /shared/raul/mambaforge/envs/test7
-- Caffe2: Header version is: 12.1
-- /shared/raul/mambaforge/envs/test7/lib/libnvrtc.so shorthash is 8144a3bc      
-- USE_CUDNN is set to 0. Compiling without cuDNN support                          
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s):  8.9 8.9 8.9 8.9
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89                                                                                                            
-- MKL_ARCH: intel64                                                                                                                                                         
-- MKL_ROOT /shared/raul/mambaforge/envs/test7                                                                                                                               
-- MKL_LINK: dynamic                                                                                                                                                         
-- MKL_INTERFACE_FULL: intel_ilp64                                                                                                                                           
-- MKL_THREADING: intel_thread                                                                                                                                               
-- MKL_MPI: intelmpi                                                                                                                                                         
CMake Warning at /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):                 
  static library kineto_LIBRARY-NOTFOUND not found.                                                                                                                          
Call Stack (most recent call first):                                                                                                                                         
  /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:13 (find_package)
                                                                                                                                                                             
                                                                                                                                                                             
-- Configuring done (1.6s)                                                                                                                                                   
-- Generating done (0.1s)                                                                                                                                                    
-- Build files have been written to: /shared/raul/NNPOps/build                                                                                                               
(test7) $ make -j15
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/BatchedNN.cpp.o                                                                                          
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CpuANISymmetryFunctions.cpp.o                            
[ 26%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConv.cpp.o                                                                                             
[ 26%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CudaANISymmetryFunctions.cu.o                                            
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/SymmetryFunctions.cpp.o                                          
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConvNeighbors.cpp.o                                                       
[ 43%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCPU.cpp.o
[ 43%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCUDA.cu.o
[ 52%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/neighbors.cpp.o                                                                                
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCPU.cpp.o                                                                                         
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pme.cpp.o                                                                                            
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CudaCFConv.cu.o                                                                                          
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCUDA.cu.o                                                                                        
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CpuCFConv.cpp.o                                                                                           
In file included from /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/extension.h:4,                                                     
                 from /shared/raul/NNPOps/src/pytorch/pme/pmeCUDA.cu:1:                                                                                                      
/shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
    4 | #error C++17 or later compatible compiler is required to use PyTorch.
      |  ^~~~~
[ 60%] Built target copy_test   

Simply changing the standard from 14 to 17 in CMakeLists.txt fixes it.
CUDA 11 also supports C++17, but CUDA 10.2 does not, so I check for this and leave the standard at C++14 in that case.
GCC has supported C++17 since version 7, so otherwise I make C++17 the default.
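The error in the log comes from a preprocessor guard in PyTorch's headers, and the version rule described above fits in a few lines. The following is a minimal sketch, not the actual CMakeLists.txt change: `compiler_is_cxx17`, `pick_cxx_standard` and `std_flag` are hypothetical helpers that only mirror the rule "CUDA 11 or newer uses C++17, CUDA 10.2 stays on C++14".

```cpp
#include <string>

// A guard of the same kind as the one in torch/csrc/api/include/torch/all.h:
// a translation unit built with an older standard fails at preprocessing time.
// (This sketch therefore needs C++17 itself, e.g. g++ -std=c++17.)
#if __cplusplus < 201703L
#error C++17 or later compatible compiler is required.
#endif

// 201703L is the value the C++17 standard assigns to __cplusplus.
constexpr bool compiler_is_cxx17() { return __cplusplus >= 201703L; }

// Hypothetical helper mirroring the rule from this PR:
// nvcc from CUDA 11 onwards accepts C++17, CUDA 10.2 does not.
constexpr int pick_cxx_standard(int cuda_major) {
    return cuda_major >= 11 ? 17 : 14;
}

// Builds the matching compiler flag, e.g. "-std=c++17".
std::string std_flag(int cuda_major) {
    return "-std=c++" + std::to_string(pick_cxx_standard(cuda_major));
}

// The rule is checked at compile time as well.
static_assert(pick_cxx_standard(12) == 17, "CUDA 12 builds use C++17");
static_assert(pick_cxx_standard(10) == 14, "CUDA 10.2 stays on C++14");
```

Note that MSVC reports `__cplusplus` as 199711L unless `/Zc:__cplusplus` is passed, so a check like this is only reliable on GCC/Clang-style compilers as written.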

RaulPPelaez (Contributor, Author)

This is ready to merge.

raimis self-requested a review on August 17, 2023 14:02
RaulPPelaez (Contributor, Author)

The CUDA 11.8 build tends to fail with some form of disk-access error while installing CUDA. This must be a bug in the Jimver CUDA action. There is a new version, so let's try that.

raimis (Contributor) commented Aug 17, 2023

I have purged the GitHub Actions cache. If it fails, try rerunning it.

RaulPPelaez (Contributor, Author)

I am not sure whether I lack the rights or just do not know how, but I cannot rerun the CI, so I will push a spurious commit.

RaulPPelaez (Contributor, Author)

It seems 11.8 still refuses to download.

raimis (Contributor) commented Aug 17, 2023

[Linux (CUDA 11.8, Python 3.10, PyTorch 2.0)](https://github.com/openmm/NNPOps/actions/runs/5892449251/job/15981745203#step:1:39)
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 0 MB

RaulPPelaez (Contributor, Author)

Do you know whether this disk limit applies per workflow or per individual job?
If it is the former, maybe we can do something; if it is the latter, I do not really know why CUDA 11.8 would need so much more space than 11.2 that it goes over the threshold.

RaulPPelaez (Contributor, Author)

This is ready for review.
With the changes to how conda-forge packages CUDA, starting with CUDA 12 there is no need to install CUDA at the OS level in the CI (so no Jimver/cuda GitHub action). This is good news, because the current CI keeps running out of disk space.
However, the workflow is different enough that I decided to move it to a separate CI workflow. The idea is that the old one will eventually be dropped (presumably when CUDA 12 is the oldest supported version).

I had to deal with a couple of quirks in the compilation process for PyTorch 2.1 and CUDA 12. In particular:

  • torch wrongly autodetects the CUDA architectures and passes sm_35 to nvcc 12, which no longer supports it. To work around this I set TORCH_CUDA_ARCHS=8.9 so that it has an explicit architecture to use.
  • conda-forge installs the CUDA headers to a non-standard directory, $CONDA_PREFIX/$targetsDir/include, which for some reason prevents torch from finding them. I had to set CUDA_INC_PATH to that directory manually.
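The `-gencode;arch=compute_89,code=sm_89` line in the CMake log shows how an architecture value such as 8.9 ends up on the nvcc command line. Below is a minimal sketch of that mapping, plus a check that rejects architectures CUDA 12 no longer compiles for. `gencode_flag` and `supported_by_cuda12` are hypothetical helpers, not part of torch's CMake scripts, and the compute capability 5.0 lower bound is an assumption based on CUDA 12 having dropped Kepler (sm_35/sm_37).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: turn an architecture string such as "8.9" into the
// nvcc flag pair seen in the log ("-gencode;arch=compute_89,code=sm_89").
std::string gencode_flag(const std::string& arch) {
    const auto dot = arch.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 >= arch.size()) {
        throw std::invalid_argument("expected <major>.<minor>, got: " + arch);
    }
    // "8.9" -> "89": drop the dot, keep major and minor digits.
    const std::string digits = arch.substr(0, dot) + arch.substr(dot + 1);
    return "-gencode;arch=compute_" + digits + ",code=sm_" + digits;
}

// Assumption: CUDA 12 removed Kepler support, so anything below compute
// capability 5.0 (e.g. the wrongly autodetected sm_35) should be filtered
// out before it reaches nvcc.
bool supported_by_cuda12(const std::string& arch) {
    const int major = std::stoi(arch.substr(0, arch.find('.')));
    return major >= 5;
}
```

Comparing only the integer major version avoids floating-point parsing of values like "3.5".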

RaulPPelaez (Contributor, Author)

I am using the changes to CMakeLists.txt as a patch to build conda-forge/nnpops-feedstock#29.

RaulPPelaez (Contributor, Author)

@mikemhenry I would like to merge this, but I believe the self-hosted runner is not working.
