-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify libtorch and libcaffe2 #17783
Unify libtorch and libcaffe2 #17783
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, CI passes! :)
One question is about libtorch.so - as far as I understand there's not such file any more. We probably should keep libtorch.so name because external things can depend on it. I'd imagine there are fewer dependencies on libcaffe2.so externally and we can make a symlink maybe.
I think it's getting to a point to get more people's input. @soumith @orionr @yf225 @bddppq - please take a look. Also, we have a release soon. Should we strive to land it before that? (pro: stress testing during binary builds, cons: potentially more churn)
# formerly-libtorch | ||
# ========================================================== | ||
|
||
set(TORCH_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../torch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably just lift this main folder up to avoid ..
but we can do it in a separate PR
set(TORCH_SRCS | ||
${GENERATED_CXX_TORCH} | ||
${GENERATED_H_TORCH} | ||
${TORCH_SRC_DIR}/csrc/autograd/anomaly_mode.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be nice to dedup this list with build_variables.py in the future (not this PR) - I wonder whether we can just read the same file in in both build systems
target_compile_definitions(caffe2 PUBLIC _THP_CORE) | ||
|
||
|
||
# until they can be unified, keep these lists synced with setup.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably it's worth to look into it in the future (not sure what exact history of this duplication is) - from setup.py it seems we're building only stub.cpp, so it might be outdated comment
if (NOT WIN32) | ||
target_compile_options(caffe2 PRIVATE "-fvisibility=hidden") | ||
# FIXME kostmo | ||
# target_compile_options(caffe2 PRIVATE "-fvisibility=hidden") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after we fixed THPPointer - can we bring it on or are there still a lot of failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still failures, e.g.:
ImportError: /scratch/kostmo/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZTIN5torch3jit4NodeE
if (NOT MSVC) | ||
target_compile_options(caffe2 INTERFACE "$<$<COMPILE_LANGUAGE:CXX>:-std=c++11>") | ||
endif() | ||
#if (NOT MSVC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems redundant with this line:
set_property(TARGET caffe2 PROPERTY CXX_STANDARD 11)
Ran into this Windows build failure after rebasing (to resolve a merge conflict):
|
It's getting in a good shape - all of the following with visibility=hidden enabled:
However, before we can land this we need to decide naming problem:
@soumith @gchanan - do you have insights on above? Also asking for review from more folks: @orionr @bddppq @yf225 @pjh5 |
torch/CMakeLists.txt
Outdated
list(APPEND TORCH_PYTHON_SRCS | ||
${TORCH_SRC_DIR}/csrc/cuda/Module.cpp | ||
${TORCH_SRC_DIR}/csrc/cuda/Storage.cpp | ||
${TORCH_SRC_DIR}/csrc/cuda/Stream.cpp | ||
${TORCH_SRC_DIR}/csrc/cuda/Event.cpp | ||
${TORCH_SRC_DIR}/csrc/cuda/utils.cpp | ||
${TORCH_SRC_DIR}/csrc/cuda/comm.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this is the reason causing the undefined symbol in rocm build? (although I don't understand why cuda build doesn't see the same issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I saw it has been moved to caffe2/CMakeLists.txt. But from the generated build.ninja file, comm.cpp is not included in the rocm build.
endif () | ||
|
||
if (USE_CUDA) | ||
list(APPEND Caffe2_GPU_SRCS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need to be added to Caffe2_HIP_SRCS if USE_ROCM
So apparently windows CI (on jenkins) doesn't start if the merge doesn't go smoothly. Brace for rebase I guess. |
caffe2/CMakeLists.txt
Outdated
# Only necessary on Windows? | ||
# Contains "caffe2" and "caffe2_gpu". | ||
file(WRITE ${CMAKE_BINARY_DIR}/empty.cpp "") | ||
add_library(torch STATIC ${CMAKE_BINARY_DIR}/empty.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this solve on windows? It looks like torch is just a wrapper around caffe2 and caffe2_gpu; what do we need this target for?
caffe2/CMakeLists.txt
Outdated
file(WRITE ${DUMMY_EMPTY_FILE} "") | ||
|
||
if (MSVC) | ||
set(DUMMY_FILE_CONTENT "__declspec(dllexport) int ignore_this_library_placeholder(){return 0;}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs escaping before ;
, that is
set(DUMMY_FILE_CONTENT "__declspec(dllexport) int ignore_this_library_placeholder(){return 0\\;}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By exporting this function, torch.lib
will be generated, but it will not reference any other libs because it is not calling any funcs here, so it will only contain the placeholder func in the symbol table, which is quite useless.
I don't know why the empty stub is needed here. Maybe it was used to keep the backward compatibility? But it was not true, we are already appending caffe2.lib
and c10.lib
when we do find_package(Torch REQUIRED)
here when BUILD_SHARED_LIB
is ON
. Or maybe we just to pretend torch.dll
still exists?
Anyway, if you still want to keep the dependency link between torch.lib
and caffe2.lib
/ caffe_gpu.lib
, then you have to call at least one func from these libs in empty.cpp
and switch off the optimizing flags. Here is my PR for your reference: https://github.com/pytorch/pytorch/pull/16071/files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why the empty stub is needed here. Maybe it was used to keep the backward compatibility?
Relaying an answer from Dima:
primarily to keep the shared library file name to be torch, not caffe2 to make sure that people builds and extensions don’t break
Alternatively we could just generate only libtorch instead of libcaffe2 but it’d make the PR bigger and also would still have the same issue for C2 users ( but the latter part might be fine)
As for the build timeout issue for caffe2 win builds, I think that you can try the following things:
|
@pytorchbot retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kostmo is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: #17783 has made ninja and makefile builds to print out build commands unconditionally, this has made the build log very verbose, e.g. ROCm CI build log becomes >13mb. Large build log make searching for real error hard. #20508 has reverted the ninja change, and this one reverts the makefile change. Pull Request resolved: #21053 Differential Revision: D15533412 Pulled By: bddppq fbshipit-source-id: ad89b617d06acc670d75d4cf25111a4081e9c95e
Summary: This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR. The shell `libtorch` library that had been introduced as a stopgap in #17783 is again flattened in this PR. Pull Request resolved: #20774 Differential Revision: D15769965 Pulled By: kostmo fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary: This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR. The shell `libtorch` library that had been introduced as a stopgap in pytorch/pytorch#17783 is again flattened in this PR. Pull Request resolved: pytorch/pytorch#20774 Differential Revision: D15769965 Pulled By: kostmo fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch". This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so". This means that, for now,
caffe2/CMakeLists.txt
becomes a lot bigger, andtorch/CMakeLists.txt
becomes smaller.The torch Python bindings (
torch_python.so
) still remain intorch/CMakeLists.txt
.The follow-up to this PR will rename references to
caffe2
totorch
, and flatten the shell into one library.