3 changes: 3 additions & 0 deletions .gitmodules
@@ -11,3 +11,6 @@
[submodule "third_party/libnop"]
path = third_party/libnop
url = https://github.com/google/libnop.git
[submodule "third_party/hipify"]
Contributor:
OOC, this dependency does not seem to exist in PyTorch's .gitmodules. Any reason for this difference?


Yes, that's because PyTorch currently carries the hipify logic as part of its own source tree, at https://github.com/pytorch/pytorch/blob/master/torch/utils/hipify.
We can't use that for TensorPipe, as it would create a circular dependency.

Contributor Author:

Yes, in PyTorch hipify is not used as a git submodule. For new projects that need hipification we are using the hipify-torch repo as a git submodule, so that all the hipification code lives in a single place and exposes a common set of interfaces.
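As a sketch, wiring hipify-torch into a project this way would look like the following (the `third_party/hipify` path and clone URL are taken from the .gitmodules hunk in this PR; the exact commands are illustrative, not part of the change):

```shell
# Register hipify-torch as a submodule under third_party/ (path as in this PR).
git submodule add https://github.com/ROCmSoftwarePlatform/hipify-torch.git third_party/hipify

# Downstream consumers then fetch it along with the other submodules.
git submodule update --init --recursive
```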

Contributor:

Do we plan to change PyTorch's hipify to use the same strategy?


Yes, we would like to in the long run, but PyTorch, being a much bigger codebase with a more complex hipification strategy, will require much more coordination and effort.

path = third_party/hipify
url = https://github.com/ROCmSoftwarePlatform/hipify-torch.git
7 changes: 7 additions & 0 deletions CMakeLists.txt
@@ -21,6 +21,13 @@ include(Sanitize)
# Misc checks to cope with various compiler modes.
include(MiscCheck)

# ROCm related
if (TP_USE_ROCM)
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/third_party/hipify/cmake")
include(Hipify)
hipify(CUDA_SOURCE_DIR ${PROJECT_SOURCE_DIR})
endif()

add_subdirectory(tensorpipe)

install(EXPORT TensorpipeTargets
6 changes: 6 additions & 0 deletions cmake/Options.cmake
@@ -31,6 +31,12 @@ endmacro()

# TODO: Default to ON if CUDA available.
option(TP_USE_CUDA "Enable support for CUDA tensors" OFF)
option(TP_USE_ROCM "Enable support for ROCM tensors" OFF)

# TP_USE_CUDA and TP_USE_ROCM are mutually exclusive.
if(TP_USE_CUDA AND TP_USE_ROCM)
message(FATAL_ERROR "TensorPipe can be built for either CUDA or ROCm, but not both: TP_USE_CUDA and TP_USE_ROCM are both set.")
endif()
Comment on lines +37 to +39
Contributor:

I am not suggesting to do this now, but how difficult would that limitation be to lift?

Contributor:

For the first version of ROCm support we intend to still use the CudaBuffer name (and the cuda_ipc, ... ones) for the HIP version of TensorPipe as well, out of simplicity; hence using the CUDA and HIP versions together would cause name conflicts. Eventually I think we could/should fix this.

Contributor Author:

We are thinking the TensorPipe build for CUDA or ROCm will be mutually exclusive, since the PyTorch lib is also mutually exclusive.
Even for channels there will be only HIP-based channels: hip_basic, hip_gdr, hip_ipc, ... .
So the builds will have the corresponding flags turned ON. I didn't get what the concern is here; please let me know more details.

Contributor:

I think the question here was motivated by the fact that, in principle, there shouldn't be any hard blocker to having both CUDA and ROCm (once we fix the name conflicts), right? Hence this is mainly a comment about "code style". In practice, though, yes: when used from PyTorch they will be mutually exclusive, hence no worries about this.
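To make the mutual exclusion concrete: with the options added in this PR, a ROCm build would be configured with only one of the two flags set, along the lines of the sketch below (the build directory name and invocation style are illustrative; only the TP_USE_CUDA/TP_USE_ROCM flags come from cmake/Options.cmake in this PR):

```shell
# Configure a ROCm-only build of TensorPipe.
cmake -S . -B build -DTP_USE_ROCM=ON

# Passing both flags would trip the FATAL_ERROR guard and abort configuration:
#   cmake -S . -B build -DTP_USE_CUDA=ON -DTP_USE_ROCM=ON
```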


# Optional features
option(TP_BUILD_BENCHMARK "Build benchmarks" OFF)