
Torch rename #20774

Closed
wants to merge 1 commit

Conversation

@kostmo
Member

commented May 21, 2019

This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other GPU target variants). Many intermediate variables that don't manifest as build artifacts retain the "caffe2" name for now; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that was introduced as a stopgap in #17783 is flattened again in this PR.
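A common mitigation for downstream projects that still link against the old target name is to keep a compatibility alias pointing at the renamed library. The snippet below is an illustrative sketch, not code from this PR, and the source-list variable is hypothetical:

```cmake
# Sketch: rename the real target, keep a compatibility alias.
# TORCH_SRCS is a placeholder for the actual source list.
add_library(torch ${TORCH_SRCS})

# Downstream CMake code that still does
#   target_link_libraries(foo caffe2)
# keeps working via the alias.
add_library(caffe2 ALIAS torch)
```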

@kostmo

Member Author

commented Jun 10, 2019

The current error in the Windows build is:

```
C:/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-build\c10/util/Optional.h(601): error: identifier "Typeinfo for  ::c10::bad_optional_access" is undefined in device code

Error: Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"

CMake Error at torch_generated_gelu_op.cu.obj.Release.cmake:279 (message):
  Error generating file
  C:/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-build/build/caffe2/CMakeFiles/torch.dir/operators/./torch_generated_gelu_op.cu.obj

[291/1733] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/operators/torch_generated_given_tensor_byte_string_to_uint8_fill_op.cu.obj
```
@dzhulgakov

Member

commented Jun 11, 2019

Hm, looks like we're triggering some bug in the MSVC/CUDA interaction:

```
C:/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-build\c10/util/Optional.h(601): error: identifier "Typeinfo for  ::c10::bad_optional_access" is undefined in device code

C:/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-build\c10/util/Optional.h(602): error: identifier "Typeinfo for  ::c10::bad_optional_access" is undefined in device code

(the Optional.h(601) error repeats for several more compilation units)

Error: Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"

CMake Error at torch_generated_layer_norm_op.cu.obj.Release.cmake:279 (message):
  Error generating file
  C:/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-build/build/caffe2/CMakeFiles/torch.dir/operators/./torch_generated_layer_norm_op.cu.obj
```

It seems similar to #11203 and 35c8f93. I'd guess `constexpr`s aren't handled correctly.

I have no clue, though, why the CMake changes triggered this. We could just ifdef out the constexpr version for Visual Studio, but I'd first like to hear insights from @peterjc123 @ezyang @smessmer.

@ezyang

Contributor

commented Jun 11, 2019

@kostmo kostmo force-pushed the kostmo:torch-rename branch from 2b98d7d to 90120e8 Jun 11, 2019

@kostmo kostmo changed the title from "[WIP] Torch rename" to "Torch rename" Jun 11, 2019

@dzhulgakov
Member

left a comment

It's happening!!!

@facebook-github-bot
Contributor

left a comment

@kostmo is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

Contributor

commented Jun 13, 2019

@kostmo merged this pull request in 49481d5.

@kostmo kostmo deleted the kostmo:torch-rename branch Jun 13, 2019

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 13, 2019

Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in pytorch/pytorch#17783 is again flattened in this PR.
Pull Request resolved: pytorch/pytorch#20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
@kostmo

Member Author

commented Jun 13, 2019

kostmo added a commit to kostmo/pytorch that referenced this pull request Jun 13, 2019

@soumith

Member

commented Jun 13, 2019

Don't revert it; we'll issue fixes in torch_xla for this.

@soumith

Member

commented Jun 13, 2019

cc @ailzhang; this seems like a very simple fix.

@soumith


@kostmo

Member Author

commented Jun 13, 2019

The pytorch_xla_linux_trusty_py3_6_gcc5_4_build build is somehow green again on master after #21718 merged.

facebook-github-bot added a commit that referenced this pull request Jun 17, 2019

Attempt to fix TRT build after library merge (#21775)
Summary:
After #20774 was merged, the TRT build was broken.

Because of missing annotations, pybind_state_gpu.so had undefined symbols, but pybind_state.so did not. This caused a weird combination: trying to import pybind_state_gpu first left the system in a semi-initialized state and led to a segfault.

Minimal repro:
```
>>> import ctypes

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so: undefined symbol: _ZN6caffe219TensorRTTransformer9TransformEPNS_9WorkspaceEPNS_6NetDefERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_11TensorShapeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SC_EEE

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so')
Segmentation fault (core dumped)
```
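The failure mode quoted above (a clean OSError on one library, then a segfault from a half-initialized process) can at least be surfaced early by probing the dlopen step explicitly. A minimal sketch using only the standard library; the helper name and the path are mine, not from the repo:

```python
import ctypes

def try_load(path):
    """Attempt to dlopen a shared library; report failure instead of
    leaving later imports to hit it in a worse state."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError as e:
        print(f"failed to load {path}: {e}")
        return False

# A missing or symbol-broken library fails cleanly with OSError here.
try_load("/nonexistent/caffe2_pybind11_state_gpu.so")
```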

Too lazy to repro locally; let's see if CI passes.
Pull Request resolved: #21775

Differential Revision: D15829605

Pulled By: dzhulgakov

fbshipit-source-id: 1adb2bde56b0cd68f84cfca67bc050adcf787cd9
@xkszltl

Contributor

commented Jun 21, 2019

@kostmo

  1. There are still several "caffe2_library" and "caffe2_gpu_library" mentions in the comments of caffe2_rocksdb's CMakeLists.txt.
  2. I saw you mention torch_gpu_library in the commit description, but I couldn't find it anywhere; could you point me to the right place?
@kostmo

Member Author

commented Jun 21, 2019

There're several "caffe2_library" and "caffe2_gpu_library" still in the comment of caffe2_rocksdb's CMakeLists.txt

Yes, cleaning up comments and other variable names will come in a follow-up commit. This PR was kept as small as possible to minimize merge conflicts.

torch_gpu_library

I guess the commit message did not reflect the eventual decision to combine the CPU and GPU code into a single library. There is no separate torch_gpu.so library now.

@xkszltl

Contributor

commented Jun 21, 2019

I see.
So will it still be possible to run a CUDA build on a non-CUDA machine in CPU-only mode? (It's probably OK to assume CUDA is installed if needed.)

@dzhulgakov

Member

commented Jun 24, 2019

@xkszltl - if CUDA is installed, then yes, the CUDA build is runnable (that has been the mode for PyTorch since the beginning).

The Caffe2 build used to maintain a stronger contract: even if built with USE_CUDA=ON, the build would produce separate libcaffe2 and libcaffe2_gpu libraries, and libcaffe2, when copied to a CPU-only machine without CUDA installed, would still be runnable. Maintaining this property is pretty tedious, and we decided to drop it (it was never true for the PyTorch build anyway).
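With a single merged library, whether the GPU paths are usable becomes a runtime question rather than a choice of which library to ship. A minimal sketch of such a runtime probe; the helper name is mine, and the presence of the CUDA runtime library is only a proxy for a working GPU:

```python
from ctypes.util import find_library

def cuda_runtime_present():
    # find_library returns None when no libcudart is discoverable
    # on this machine; a CPU-only host takes the fallback path.
    return find_library("cudart") is not None

mode = "gpu" if cuda_runtime_present() else "cpu-only"
print(f"running in {mode} mode")
```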
