CPU-only C++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489
Comments
For some more context: this is currently blocking the functorch release. We've brainstormed a couple of options for now:
+1 for Option 5 (from Ed). We plan on dropping cu102 for the next 1.13 release; here is the reference issue: 1026
Sure, but the problem is bigger than cu102: i.e., if we release PyTorch, do we force devs to use exactly the same version of compiler to build extensions, or do we allow some leeway here? If the latter, we need to figure out what is going on.
Yes, I agree we need to figure out what's going on anyway, just to understand all our possible options here.
Does the devtoolset change (gcc 9 vs 7) also apply to the conda binaries? (I'm trying to determine if we need to build conda binaries as well.) In the past functorch has not published conda binaries (instead, our pip wheels have worked with PyTorch pip wheels and conda binaries, but maybe this is not expected).
Yes, it's the same with conda; ref 1030
We already force people to run the same version of compiler.
In fact, I'm guessing upgrading the devtoolset fixes #51039
We can update the toolset as frequently as we want, but we can't get rid of
To reproduce the functorch failures easily:
To reproduce the torchtext problems:
We're unblocking the functorch release by going with Option 5 (drop support for CUDA 10.2), but we should still continue to root-cause this (because it may matter for the future, even if we drop CUDA 10.2 support from PyTorch).
We can probably add a check for this in our binary smoke test as well, to make sure we account for this.
The problem originates from the fact that the cu102 binaries are compiled with gcc-7 (as CUDA 10.2 is not compatible with gcc-9), while the rest of the wheels/conda packages are built with gcc-9. There are slight C++ ABI differences between the two compilers (see https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html), which are preserved in the pybind11 build ABI tag. We should add a check that all PyTorch Linux nightly binaries ship with the same ABI suffix.
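The ABI tags mentioned above can be compared programmatically. A minimal sketch (the helper `same_pybind_abi` and the second example tag value are illustrative; in a real environment the tag for torch itself would come from `torch._C._PYBIND11_BUILD_ABI`, which the fix referenced later in this thread exposes):

```python
def same_pybind_abi(torch_abi: str, ext_abi: str) -> bool:
    """Return True when two pybind11 ABI tags (e.g. '_cxxabi1011') match.

    Mismatched tags mean the extension was compiled against a different
    C++ ABI than the torch wheel it is running with, which can manifest
    as missing symbols or broken exception handling rather than a clean
    import-time error.
    """
    return torch_abi == ext_abi


# Matching tags: the extension and the wheel share a C++ ABI.
print(same_pybind_abi("_cxxabi1011", "_cxxabi1011"))  # True
# A hypothetical differing tag from another compiler: incompatible.
print(same_pybind_abi("_cxxabi1011", "_cxxabi1002"))  # False
```

A nightly smoke test could assert that every published Linux binary reports the same tag.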
FYI, the torchdata release seems to have the same issue, because it builds binaries against one of the PyTorch binaries (https://github.com/pytorch/data/blob/release/0.4.0/.github/workflows/_build_test_upload.yml#L57). So it will also need a dot release. cc @ejguan
@zou3519 Thanks for flagging this issue. I don't think this would affect torchdata, though, because we only provide CPU binaries and torchdata only depends on the PyTorch Python API rather than libtorch. Let me test. Edit: It works for torchdata (0.4.0) with torch-cu102 (1.12.0).
@ejguan and I discussed offline; torchdata isn't impacted because it doesn't depend on libtorch.
…aries (#81058) (#81058)

Summary: Fixes #80489.

Test using the CUDA 11.3 manywheel binary:

```python
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
```

Output:

```
1.13.0.dev20220707+cu113
_cxxabi1011
```

functorch test (torch 1.13.0.dev20220707+cu113, functorch built with cu102):

```python
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)
```

Output:

```
1.13.0.dev20220707+cu113
_cxxabi1011
/home/atalman/temp/testc1.py:5: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:73.)
  x = torch.randn(2, 3, 5)
Traceback (most recent call last):
  File "/home/atalman/temp/testc1.py", line 6, in <module>
    vmap(lambda x: x, out_dims=3)(x)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 361, in wrapped
    return _flat_vmap(
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 488, in _flat_vmap
    return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
    flat_outputs = [
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
    _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
```

Related Builder PR: pytorch/builder#1083
Test PR: #81232
Pull Request resolved: #81058
Approved by: https://github.com/zou3519, https://github.com/malfet
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d552ba3b4f53da9b6a5f6e0463111e43b367ef8a
Reviewed By: DanilBaibak
Differential Revision: D37813240
Pulled By: atalman
fbshipit-source-id: 94d94e777b0e9d5da106173c06117b3019ba71c4
🐛 Describe the bug
When installing functorch alongside a different PyTorch wheel (torch 1.12 {cpu, cu102, cu113, cu116}) than the one it was built against, we see one of several failures (detailed in the repros below) when running:

```python
import functorch
```
These seem to stem from different symbols existing in the torch (cpu, cu113, cu116) wheels vs. the torch (cu102) wheels. Possibly related: pytorch/builder#1028.
We (@malfet and I) are not sure if this is a problem with PyTorch or with the way we build extensions. FWIW, this did not happen during the last functorch releases (0.1.x).
functorch repro
See pytorch/functorch#916 for original issue.
Case 1: built functorch against the torch 1.12 (cpu) wheels. Running
```python
import torch; import functorch
```

errors with a missing symbol: `_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev` (the mangled constructor of `std::basic_ostringstream<char>`).
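One way to confirm whether a given shared library actually exports a symbol is to dlopen it and look the name up. A sketch using only the standard library (`exports_symbol` is an illustrative helper, demonstrated here against libc; for this issue you would point it at `libtorch_cpu.so` inside the installed torch wheel and pass the mangled name above):

```python
import ctypes
import ctypes.util


def exports_symbol(lib_path: str, symbol: str) -> bool:
    """Return True if the shared library at lib_path exports `symbol`.

    Works for mangled C++ names too, since dlsym matches the raw symbol
    string (e.g. '_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev').
    """
    lib = ctypes.CDLL(lib_path)
    try:
        getattr(lib, symbol)  # attribute access triggers a dlsym lookup
        return True
    except AttributeError:
        return False


libc = ctypes.util.find_library("c") or "libc.so.6"
print(exports_symbol(libc, "printf"))                # True
print(exports_symbol(libc, "no_such_symbol_xyz"))    # False
```

Comparing the symbol sets exported by the cpu/cu113/cu116 wheels against the cu102 wheel this way would make the ABI divergence visible directly.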
Case 2: built functorch against the torch 1.12 (cu102) wheels.
repro.py
gives the expected output:
Expected output
unexpected output
The exception handling appears to be incorrect.
torchtext repro
torchtext is built against torch (cu102).
When installing torchtext with torch (cpu) and running the above two lines, we get the following error message:
error message
This exhibits the same behavior as the functorch repro: the additional information about the C++ stack trace in the error message is not expected.
Versions
PyTorch 1.12 (latest release)
torchtext 0.13 (latest release)
functorch RC binaries
cc @ezyang @gchanan @zou3519 @malfet @seemethere