Enable ORT with CUDA 11 toolkit #4168
if (MPI_FOUND)
  set(MPI_HEADER_FILE "${MPI_INCLUDE_DIR}/mpi.h")
  message( STATUS "Determining MPI version from the header file: ${MPI_HEADER_FILE}" )
  file (STRINGS ${MPI_HEADER_FILE} MPI_MAJOR_VERSION_DEFINED
I think Horovod used to use this logic to get the version, which proved unreliable because there are multiple MPI implementations out there. We can simply use something like:
execute_process(COMMAND mpirun --version OUTPUT_VARIABLE mpirun_output)
message("mpi version='${mpirun_output}'")
ok, will update the PR.
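A minimal sketch of the `mpirun --version` approach suggested above, not the code that landed in the PR. In `execute_process`, `RESULT_VARIABLE` receives the exit code while `OUTPUT_VARIABLE` receives the printed text, so both are captured here. The variable names (`MPIRUN_EXECUTABLE`, `mpirun_status`, `mpi_version_number`, and so on) are assumptions made for illustration, and the regex simply takes the first dotted number it finds, which may not suit every MPI implementation's output format.

```cmake
# Sketch only: all variable names below are hypothetical.
find_program(MPIRUN_EXECUTABLE mpirun)

if (MPIRUN_EXECUTABLE)
  execute_process(
    COMMAND ${MPIRUN_EXECUTABLE} --version
    RESULT_VARIABLE mpirun_status   # exit code of the command
    OUTPUT_VARIABLE mpirun_stdout   # text printed to stdout
    ERROR_VARIABLE  mpirun_stderr   # kept in case a launcher prints to stderr
    OUTPUT_STRIP_TRAILING_WHITESPACE
    ERROR_STRIP_TRAILING_WHITESPACE)

  if (mpirun_status EQUAL 0)
    # Take the first "X.Y" or "X.Y.Z" token from whatever was printed.
    string(REGEX MATCH "[0-9]+\\.[0-9]+(\\.[0-9]+)?"
           mpi_version_number "${mpirun_stdout}${mpirun_stderr}")
    message(STATUS "mpi version='${mpi_version_number}'")
  else()
    message(WARNING "mpirun --version failed with status '${mpirun_status}'")
  endif()
else()
  message(WARNING "mpirun was not found on PATH")
endif()
```

Checking the exit status before parsing avoids reporting an empty or misleading version string when the launcher is missing or misconfigured.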
CUDNN_RETURN_IF_ERROR(cudnnCreateRNNDescriptor(&cudnn_rnn_desc_));
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor_v6(cudnnHandle,
cudnnSetRNNDescriptor_v6: should we still support the older cuDNN v7 here?
No. The deprecation policy changed in cuDNN 8. Here are the details: https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#backward-compatibility
1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0
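As a rough illustration of what the first two items (decoupling MPI, NCCL, and Horovod) could look like in a CMakeLists.txt, here is a sketch with one independent switch per dependency. The option names (`EXAMPLE_USE_MPI`, `EXAMPLE_USE_NCCL`, `EXAMPLE_USE_HOROVOD`) and the `example_*` variables are hypothetical and are not taken from this PR; the NCCL lookup uses plain `find_path`/`find_library` because CMake ships no NCCL find module.

```cmake
# Sketch only: option and variable names are hypothetical, not the PR's.
option(EXAMPLE_USE_MPI     "Build with MPI support"     OFF)
option(EXAMPLE_USE_NCCL    "Build with NCCL support"    OFF)
option(EXAMPLE_USE_HOROVOD "Build with Horovod support" OFF)

set(example_extra_includes "")
set(example_extra_libs "")

# Each dependency is located only when its own switch is on,
# so enabling one does not implicitly pull in the others.
if (EXAMPLE_USE_MPI)
  find_package(MPI REQUIRED)   # CMake's bundled FindMPI module
  list(APPEND example_extra_includes ${MPI_CXX_INCLUDE_DIRS})
  list(APPEND example_extra_libs ${MPI_CXX_LIBRARIES})
endif()

if (EXAMPLE_USE_NCCL)
  find_path(NCCL_INCLUDE_DIR nccl.h)
  find_library(NCCL_LIBRARY nccl)
  if (NOT NCCL_INCLUDE_DIR OR NOT NCCL_LIBRARY)
    message(FATAL_ERROR "NCCL requested but nccl.h or libnccl was not found")
  endif()
  list(APPEND example_extra_includes ${NCCL_INCLUDE_DIR})
  list(APPEND example_extra_libs ${NCCL_LIBRARY})
endif()

if (EXAMPLE_USE_HOROVOD)
  # Horovod-specific settings would go here, independent of the blocks above.
  message(STATUS "Horovod support enabled")
endif()
```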
Force-pushed from 5804660 to fe6b081.
The CUDA 11 toolkit was released on June 5.
Here are the details of this PR.
TODO:
Ampere (sm_80) support will be added later, because compiling the CUDA code currently fails due to an NVIDIA compiler issue.
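For reference, a hedged sketch of how an sm_80 target might eventually be gated on the toolkit version once the compiler issue is resolved. This is an assumption about a possible follow-up, not part of this PR; `example_cuda_flags` is a hypothetical variable, and `CMAKE_CUDA_COMPILER_VERSION` is only populated when the CUDA language is enabled in the project.

```cmake
# Sketch only: example_cuda_flags is a hypothetical variable name.
set(example_cuda_flags "")

# sm_80 (Ampere) code generation requires nvcc from CUDA 11 or newer.
if (CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 11.0)
  list(APPEND example_cuda_flags "-gencode=arch=compute_80,code=sm_80")
endif()

message(STATUS "Extra CUDA flags: ${example_cuda_flags}")
```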