linked error of Pytorch 1.0 release #15291

Open
llv22 opened this issue Dec 17, 2018 · 16 comments
Labels
module: build Build system issues module: macos Mac OS related issues triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments


llv22 commented Dec 17, 2018

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Download the PyTorch source to a local directory
  2. Disable distributed support (USE_DISTRIBUTED=OFF)
  3. MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ python setup.py bdist_wheel
  4. Install the compiled wheel from the dist/ folder

Environment

  • PyTorch Version (e.g., 1.0): 1.0
  • OS (e.g., Linux): macOS
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source): MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ python setup.py bdist_wheel
  • Python version: 3.6.7
  • CUDA/cuDNN version: 9.2
  • GPU models and configuration: Nvidia 1080T
  • Xcode 8.3.2 with command line tools

The error is as follows:

In [1]: import torch                                                                                                                                                                                      
---------------------------------------------------------------------------
ImportError                              Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>
----> 1 import torch
~/miniconda3/lib/python3.6/site-packages/torch/__init__.py in <module>
     82    pass
     83 
---> 84 from torch._C import *
     85 
     86 __all__ += [name for name in dir(_C)
ImportError: dlopen(/Users/llv23/miniconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-darwin.so, 9): Symbol not found: _ompi_mpi_char
  Referenced from: /Users/llv23/miniconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib
  Expected in: flat namespace
 in /Users/llv23/miniconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib
zou3519 added the module: build label Dec 17, 2018

hoonkai commented Oct 7, 2019

I'm getting the same error, even when I set USE_MPI to OFF.

pietern added the module: macos label Oct 8, 2019
Contributor

pietern commented Oct 8, 2019

@llv22 Thanks for reporting the issue.

Can you post the output of conda list? I imagine MPI gets pulled in through Caffe2 ops somehow, causing the issue even if you build with USE_DISTRIBUTED=OFF. Also, can you please post the CMake configuration summary (it is printed after running CMake)?

@hoonkai Can you do the same? I'm curious to find out why this problem persists, even if you build with USE_MPI=OFF. It should disable MPI support for both torch.distributed and Caffe2.
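
A quick way to see whether an MPI package is present in the Conda environment is to filter the `conda list` output. The sketch below is only an illustration and not part of PyTorch: the helper names `find_mpi_packages` and `conda_list_output` are hypothetical, and matching on the substring "mpi" is an assumption about how such packages are named.

```python
import subprocess


def find_mpi_packages(conda_list_text):
    """Return package names from `conda list` output that look MPI-related.

    Assumption: MPI packages contain "mpi" in their name (e.g. openmpi, mpich).
    """
    hits = []
    for line in conda_list_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that `conda list` prints.
        if not line or line.startswith("#"):
            continue
        name = line.split()[0]  # first column is the package name
        if "mpi" in name.lower():
            hits.append(name)
    return hits


def conda_list_output():
    """Run `conda list` and return its stdout (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "list"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout

# Example use: find_mpi_packages(conda_list_output())
```

An empty result would suggest MPI is not coming from a Conda package, although it could still be picked up from elsewhere on the system (for example under /usr/local).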

pietern added the triaged label Oct 8, 2019
Author

llv22 commented Oct 24, 2019

@pietern Sorry for the late response; I only just saw the GitHub notification.

conda list > pytorch.txt

The output is attached below.

Thanks again for looking into this.
pytorch.txt

Contributor

pietern commented Oct 24, 2019

@llv22 Thank you. Can you share the CMake configuration summary (printed after running CMake)? That should tell us where MPI gets included in the build. I don't know where the runtime problem comes from, though, since it is installed as a Conda package, so its directory should be in the library path.

Author

llv22 commented Oct 25, 2019

@pietern Sure, but it may take a little while: the log from my last build has been lost, so I have to re-run the build and capture the log again. I will upload it and ping you once it is ready.

Author

llv22 commented Oct 25, 2019

@pietern Here are the build logs:
1. Build logs
building_install_details.log
building_install.log
2. Wheel build log
building_wheel.log
3. Error when importing torch from the built wheel

In [1]: import torch                                                            
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>
----> 1 import torch

~/miniconda3/lib/python3.6/site-packages/torch/__init__.py in <module>
     82     pass
     83 
---> 84 from torch._C import *
     85 
     86 __all__ += [name for name in dir(_C)

ImportError: dlopen(/Users/llv23/miniconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-darwin.so, 9): Symbol not found: _ompi_mpi_char
  Referenced from: /Users/llv23/miniconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib
  Expected in: flat namespace
 in /Users/llv23/miniconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib

I also noticed quite a lot of errors in the final packaging stage of the build. Thanks a lot for checking; I look forward to your reply.

Contributor

pietern commented Nov 20, 2019

@llv22 Thanks for providing the build logs.

It looks like MPI is pulled in through THD. This is torch.distributed from the pre-1.0 days. If you update to PyTorch 1.1 or later, THD is no longer included, and I suspect the issue will be gone as well.
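
To confirm which MPI symbols the built library still expects at load time, one option is to list its undefined symbols with `nm -u` and filter them. This is a minimal sketch, assuming Open MPI style names such as the `_ompi_mpi_char` symbol from the traceback; the helper names `undefined_mpi_symbols` and `nm_undefined` are hypothetical.

```python
import subprocess


def undefined_mpi_symbols(nm_output):
    """Return symbols from `nm -u` output that look like MPI symbols.

    Assumption: MPI symbols start with "ompi_" or "MPI_" once the
    leading underscores added by the platform are stripped.
    """
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        symbol = parts[-1]  # the symbol name is the last field on the line
        bare = symbol.lstrip("_").lower()
        if bare.startswith(("ompi_", "mpi_")):
            symbols.append(symbol)
    return symbols


def nm_undefined(path):
    """Run `nm -u` on a library and return its stdout (requires nm on PATH)."""
    result = subprocess.run(
        ["nm", "-u", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout

# Example use, with the library path taken from the traceback:
# undefined_mpi_symbols(nm_undefined(".../site-packages/torch/lib/libtorch_python.dylib"))
```

If the list is non-empty for a build configured without MPI, that would be consistent with some component still being compiled against MPI headers.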

Author

llv22 commented Nov 22, 2019

@pietern You mean I should update to PyTorch 1.1? OK. Please don't close the ticket; I will build it and paste the result here.

Contributor

pietern commented Nov 22, 2019

@llv22 Sounds good. Let us know what happens :-)

Author

llv22 commented Nov 22, 2019

FAILED: c10/test/CMakeFiles/c10_string_view_test.dir/util/string_view_test.cpp.o 
/Applications/Xcode-8.3.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++  -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -D_FILE_OFFSET_BITS=64 -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I../c10/.. -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /Users/llv23/miniconda3/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party -isystem ../cmake/../third_party/eigen -isystem /Users/llv23/miniconda3/include/python3.6m -isystem /Users/llv23/miniconda3/lib/python3.6/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /opt/rocm/hip/include -isystem /include -isystem ../cmake/../third_party/cub -isystem /usr/local/cuda/include -isystem ../third_party/ideep/mkl-dnn/include -isystem ../third_party/ideep/include -isystem ../third_party/googletest/googlemock/include -isystem ../third_party/googletest/googlemock -isystem ../third_party/googletest/googletest/include -isystem ../third_party/googletest/googletest -Wno-deprecated -fvisibility-inlines-hidden -Wno-deprecated-declarations -Xpreprocessor -fopenmp -I/usr/local/include -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option 
-Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -fno-math-errno -fno-trapping-math -Wno-unused-private-field -Wno-missing-braces -Wno-c++14-extensions -Wno-constexpr-not-const -O3  -isysroot /Applications/Xcode-8.3.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk -mmacosx-version-min=10.13 -fPIE   -DCUDA_HAS_FP16=1 -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -std=gnu++11 -MD -MT c10/test/CMakeFiles/c10_string_view_test.dir/util/string_view_test.cpp.o -MF c10/test/CMakeFiles/c10_string_view_test.dir/util/string_view_test.cpp.o.d -o c10/test/CMakeFiles/c10_string_view_test.dir/util/string_view_test.cpp.o -c ../c10/test/util/string_view_test.cpp
../c10/test/util/string_view_test.cpp:682:15: error: static_assert expression is not an integral constant expression
static_assert(1 == string_view("abc").find('b'), "");
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../c10/util/string_view.h:548:39: note: non-constexpr function 'forward<c10::basic_string_view<char>::charIsEqual_>' cannot be used in a constant expression
            : find_first_if_(pos + 1, std::forward<Condition>(condition));
                                      ^
../c10/util/string_view.h:359:12: note: in call to '&"abc"->find_first_if_(0, c10::basic_string_view<char>::charIsEqual_{ch})'
    return find_first_if_(pos, charIsEqual_{ch});
           ^
../c10/test/util/string_view_test.cpp:682:39: note: in call to '&"abc"->find(98, 0)'
static_assert(1 == string_view("abc").find('b'), "");
                                      ^
/Applications/Xcode-8.3.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/type_traits:1677:1: note: declared here
forward(typename std::remove_reference<_Tp>::type& __t) _NOEXCEPT

@pietern
Something seems wrong with C++11/14 support. Oddly, the static_assert is rejected.

My Mac setup is as follows:
Xcode 8.3.2
OS 10.13.6
CUDA toolkit 9.2, V9.2.148

Is there a way to work around this compilation error? Looking forward to your reply.

Contributor

pietern commented Nov 22, 2019

@llv22 I think Xcode 8.3 is too old to use C++14 by default. I don't know how to fix this.

Paging @ezyang and @smessmer...
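
The failing command in the log above passes `-std=gnu++11`, and in the C++ standard library `std::forward` is only `constexpr` from C++14 onward, which matches the "non-constexpr function 'forward'" note in the error. When comparing long compiler invocations like that one, it can help to pull out just the relevant flags. The sketch below is an illustration only; `compile_settings` is a hypothetical helper, not something provided by PyTorch or CMake.

```python
import shlex


def compile_settings(command):
    """Extract the C++ standard and macOS deployment target from a compiler command."""
    settings = {"std": None, "macosx_version_min": None}
    std_prefix = "-std="
    min_prefix = "-mmacosx-version-min="
    for arg in shlex.split(command):
        if arg.startswith(std_prefix):
            settings["std"] = arg[len(std_prefix):]
        elif arg.startswith(min_prefix):
            settings["macosx_version_min"] = arg[len(min_prefix):]
    return settings


# Shortened version of the command from the log, keeping only a few flags.
command = (
    "clang++ -O2 -fPIC -mmacosx-version-min=10.13 -DUSE_AVX "
    "-std=gnu++11 -c ../c10/test/util/string_view_test.cpp"
)
print(compile_settings(command))
```

If a later build reports `gnu++14` (or newer) here and the error persists, the compiler or its standard library version would be the more likely cause.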

Author

llv22 commented Nov 22, 2019

@pietern So, any hints on which Xcode version is recommended for C++14?
Keeping the last environment that compiled successfully is just a conservative strategy on my side, but I think I can adapt ^_^ so it should not be a blocker.
Let me know your advice.
PS: If I need to upgrade both CUDA and Xcode, I may need more time to report the result, because after such an upgrade I have to recompile PyTorch, TensorFlow and MXNet, which is taxing but achievable given a bit more time.

Contributor

pietern commented Nov 25, 2019

I don't use Xcode myself, so I don't know what the best version is. I do know that macOS 10.13 is a couple of years old, and I think that upgrading both macOS and Xcode should do the trick...

Author

llv22 commented Nov 25, 2019

@pietern OK, I will try to figure it out later and let you know the status. If I manage to fix it, I will close this ticket then. ^_^ Thanks

Contributor

smessmer commented Nov 25, 2019

You need at least XCode 9.0 to compile PyTorch. Also note that each CUDA version only supports exactly one XCode version. Possible combinations are

  • XCode 9.2 + CUDA 9.2
  • XCode 9.4 + CUDA 10.0
  • XCode 10.1 + CUDA 10.1

Other XCode versions might also work if they use the same clang compiler version as the XCode version officially supported by a CUDA version (I've for example seen XCode 9.0 + CUDA 9.2 working) but they're not officially supported by CUDA.

Note that we're about to remove support for XCode 9.2. Probably starting with PyTorch 1.5, you will need at least XCode 9.4.
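
The combinations listed above can be written down as a small lookup to sanity-check a setup before starting a long build. This is a rough sketch that encodes only the pairs and the 9.0 minimum quoted in this comment; it is not an official or exhaustive compatibility table, and `check_toolchain` is a hypothetical helper.

```python
# CUDA version -> Xcode version it is listed with in the comment above.
SUPPORTED_XCODE_FOR_CUDA = {"9.2": "9.2", "10.0": "9.4", "10.1": "10.1"}
MIN_XCODE = (9, 0)  # minimum Xcode version stated in the comment above


def parse_version(text):
    """Turn a dotted version string such as '8.3.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_toolchain(xcode, cuda):
    """Return a list of problems with an Xcode/CUDA pair (empty if none found)."""
    problems = []
    if parse_version(xcode) < MIN_XCODE:
        problems.append("Xcode %s is older than the 9.0 minimum" % xcode)
    expected = SUPPORTED_XCODE_FOR_CUDA.get(cuda)
    if expected is None:
        problems.append("CUDA %s is not in the list" % cuda)
    elif parse_version(xcode)[:2] != parse_version(expected):
        problems.append(
            "CUDA %s is listed with Xcode %s, not %s" % (cuda, expected, xcode)
        )
    return problems


# The setup reported earlier in this thread: Xcode 8.3.2 with CUDA 9.2.
for problem in check_toolchain("8.3.2", "9.2"):
    print(problem)
```

A pair flagged only by the second check (for example Xcode 9.0 with CUDA 9.2) may still work if the clang version matches, as noted above, but it would be outside what CUDA officially supports.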

Author

llv22 commented Nov 26, 2019

@smessmer Great, really helpful information. I will check and update the status here, though perhaps with some delay, as I have been very busy in recent months. ^_^
