Error when building PyTorch from source in CentOS Linux 7 #56777

Open
AntMorais opened this issue Apr 23, 2021 · 8 comments
Labels
module: build (Build system issues)
module: tensorpipe (Related to Tensorpipe RPC Agent)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

@AntMorais

AntMorais commented Apr 23, 2021

🐛 Bug

Error when building from source in CentOS 7.

To Reproduce

Steps to reproduce the behavior:

  1. Follow steps for Linux in https://github.com/pytorch/pytorch#from-source
  2. Run command python setup.py install

Build summary:

-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.18.2
--   CMake command         : /home/USER/anaconda3/bin/cmake
--   System                : Linux
--   C++ compiler          : /opt/rh/devtoolset-9/root/usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 9.3.1
--   Using ccache if found : ON
--   Found ccache          : CCACHE_PROGRAM-NOTFOUND
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /home/USER/anaconda3
--   CMAKE_INSTALL_PREFIX  : /home/USER/pytorch/torch
-- 
--   TORCH_VERSION         : 1.9.0
--   CAFFE2_VERSION        : 1.9.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.8.5
--     Python executable   : /home/USER/anaconda3/bin/python
--     Pythonlibs version  : 3.8.5
--     Python library      : /home/USER/anaconda3/lib/libpython3.8.so.1.0
--     Python includes     : /home/USER/anaconda3/include/python3.8
--     Python site-packages: lib/python3.8/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : mkl
--   USE_LAPACK            : 1
--     LAPACK              : mkl
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_MKLDNN_CBLAS      : OFF
--   USE_NCCL              : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : OFF
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   USE_DEPLOY           : OFF
--   Public Dependencies  : Threads::Threads;caffe2::mkl;caffe2::mkldnn
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;fp16;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl

Error:

-- Generating done
-- Build files have been written to: /home/USER/pytorch/build
cmake --build . --target install --config Release -- -j 8
[2248/5425] Building CXX object third_party/tensorpipe/tensorpipe/CMakeFiles/tensorpipe.dir/common/shm_segment.cc.o
FAILED: third_party/tensorpipe/tensorpipe/CMakeFiles/tensorpipe.dir/common/shm_segment.cc.o 
/opt/rh/devtoolset-9/root/usr/bin/c++ -DTH_BLAS_MKL -I../cmake/../third_party/benchmark/include -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/tensorpipe/third_party/libnop/include -I../third_party/tensorpipe/third_party/libuv/include -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/USER/anaconda3/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -isystem ../third_party -isystem ../cmake/../third_party/eigen -isystem /home/USER/anaconda3/include/python3.8 -isystem /home/USER/anaconda3/lib/python3.8/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -O3 -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -std=gnu++14 -MD -MT third_party/tensorpipe/tensorpipe/CMakeFiles/tensorpipe.dir/common/shm_segment.cc.o -MF third_party/tensorpipe/tensorpipe/CMakeFiles/tensorpipe.dir/common/shm_segment.cc.o.d -o third_party/tensorpipe/tensorpipe/CMakeFiles/tensorpipe.dir/common/shm_segment.cc.o -c ../third_party/tensorpipe/tensorpipe/common/shm_segment.cc
../third_party/tensorpipe/tensorpipe/common/shm_segment.cc: In function ‘std::tuple<tensorpipe::Error, tensorpipe::Fd> tensorpipe::{anonymous}::openTmpfileInDevShm()’:
../third_party/tensorpipe/tensorpipe/common/shm_segment.cc:79:15: error: ‘O_TMPFILE’ was not declared in this scope
   79 |   int flags = O_TMPFILE | O_EXCL | O_RDWR | O_CLOEXEC;
      |               ^~~~~~~~~
[2255/5425] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3DAvx2.cc.o
ninja: build stopped: subcommand failed.

Expected behavior

Expect PyTorch to be installed correctly.

Environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
Clang version: Could not collect
CMake version: version 3.20.1

Python version: 3.8 (64-bit runtime)
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip] Could not collect
[conda] blas                      1.0                         mkl  
[conda] mkl                       2020.2                      256  
[conda] mkl-include               2020.2                      256  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.2.0            py38h23d657b_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.19.2           py38h54aff64_0  
[conda] numpy-base                1.19.2           py38hfa32c7d_0  
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1

Additional context

Installing on a server over SSH.

cc @malfet @seemethere @walterddr @osalpekar @jiayisuse @lw @beauby @pritamdamania87 @mrshenli @jjlilley @gqchen @rohan-varma

ezyang added the module: build, module: tensorpipe, and triaged labels on Apr 23, 2021
@ezyang
Contributor

ezyang commented Apr 23, 2021

looks like a portability problem

@lw
Contributor

lw commented Apr 23, 2021

What kernel and glibc versions are you using? (Unfortunately these don't seem to get included in the env report). The flag that's not found was introduced in Linux 3.11 (Sep 2013), so it shouldn't be that recent...

To get unblocked, you could modify your local checkout and add a new line after this one, with something like this:

set(TP_ENABLE_SHM OFF CACHE BOOL "" FORCE)
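
For anyone else who needs to report these versions, here is a minimal Python sketch that prints the kernel release and libc version and compares the kernel against the 3.11 threshold mentioned above. The helper names `parse_kernel_release` and `kernel_has_o_tmpfile` are hypothetical and not part of PyTorch or TensorPipe. The comparison is only a hint: distribution kernels sometimes backport features, and the build error itself depends on what the installed headers declare.

```python
import platform
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string like '3.10.0-514.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_has_o_tmpfile(release):
    """True if the kernel version is at least 3.11, where O_TMPFILE was introduced."""
    return parse_kernel_release(release) >= (3, 11)


if __name__ == "__main__":
    release = platform.release()
    libc_name, libc_version = platform.libc_ver()
    print("kernel:", release)
    print("libc:", libc_name, libc_version)
    print("O_TMPFILE expected from kernel version:", kernel_has_o_tmpfile(release))
```

The same information is available from the shell with `uname -r` and `ldd --version`.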

@walterddr
Contributor

possibly related to pytorch/tensorpipe#305. I remember seeing this while creating that PR

@pritamdamania87
Contributor

> To get unblocked, you could modify your local checkout and add a new line after this one, with something like this:

@lw Should we expose this option as a build-time flag that users can specify instead of modifying the code?

@AntMorais
Author

Thanks @lw! Here's the info:
glibc version: ldd (GNU libc) 2.17
kernel version: 3.10.0-514.6.2.el7.x86_64
I added the line you mentioned and got another error:

[1722/3168] Building CXX object caffe2...u.dir/__/aten/src/ATen/Functions.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o 
/opt/rh/devtoolset-9/root/usr/bin/c++ -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I../torch/csrc/api -I../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../torch/csrc -I../third_party/miniz-2.0.8 -I../third_party/kineto/libkineto/include -I../third_party/kineto/libkineto/src -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../caffe2/core/nomnigraph/include -I../third_party/FXdiv/include -I../c10/.. 
-Ithird_party/ideep/mkl-dnn/include -I../third_party/ideep/mkl-dnn/src/../include -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/QNNPACK/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../third_party/cpuinfo/deps/clog/include -I../third_party/NNPACK/include -I../third_party/fbgemm/include -I../third_party/fbgemm -I../third_party/fbgemm/third_party/asmjit/src -I../third_party/FP16/include -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/tensorpipe/third_party/libnop/include -I../third_party/fmt/include -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/antonio/anaconda3/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -isystem ../third_party -isystem ../cmake/../third_party/eigen -isystem /home/antonio/anaconda3/include/python3.8 -isystem /home/antonio/anaconda3/lib/python3.8/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem ../third_party/ideep/mkl-dnn/include -isystem ../third_party/ideep/include -isystem include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always 
-faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -DASMJIT_STATIC -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o -c aten/src/ATen/Functions.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[1729/3168] Building CXX object caffe2...ten/src/ATen/RedispatchFunctions.cpp.o
ninja: build stopped: subcommand failed.

@lw
Contributor

lw commented Apr 26, 2021

@AntMorais Thanks for the info. Indeed, you have kernel 3.10, which doesn't have the feature we're using. I'm glad that the fix I suggested managed to unblock you. The new error you're getting seems unrelated, so I'd suggest opening a new issue. At first glance, it could be due to the compiler running out of memory and thus getting killed. You could consider lowering the build parallelism.

> @lw Should we expose this option as a build-time flag that users can specify instead of modifying the code?

If we decide to support such old kernels, we should make this check automatic within the build system. We haven't yet agreed on a strict "cutoff" line for kernel versions, but since in this case the fix is quite simple I went ahead and did it in pytorch/tensorpipe#372.
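
As a rough illustration of what an automatic check could look like, the Python sketch below probes at runtime whether opening an anonymous file with `O_TMPFILE` succeeds. It uses the same flag combination as the failing line in `shm_segment.cc`. This is an assumption-laden sketch: `o_tmpfile_works` is a hypothetical helper, it is a runtime probe rather than the compile-time check a build system would perform, and it is not a description of what pytorch/tensorpipe#372 does.

```python
import os


def o_tmpfile_works(directory="/dev/shm"):
    """Return True if an unnamed temporary file can be opened in `directory`."""
    flag = getattr(os, "O_TMPFILE", None)
    if flag is None:
        # This Python build does not expose the flag at all.
        return False
    try:
        # Same flags as the tensorpipe source: O_TMPFILE | O_EXCL | O_RDWR | O_CLOEXEC.
        fd = os.open(directory, flag | os.O_EXCL | os.O_RDWR | os.O_CLOEXEC, 0o600)
    except OSError:
        # Missing directory, unsupported kernel or filesystem, or no permission.
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    print("O_TMPFILE usable in /dev/shm:", o_tmpfile_works())
```

A result of `False` on the machine that will run the build would be a signal to try the `TP_ENABLE_SHM OFF` workaround quoted earlier in the thread.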

@AntMorais
Author

Solved with export MAX_JOBS=4, thanks @lw!
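
For others hitting the same "Killed signal terminated program cc1plus" error, a small sketch for picking a MAX_JOBS value from the machine's physical memory may help. The helper names are hypothetical, and the budget of 2 GiB per compiler process is purely an assumption for illustration, not a measured or documented figure; adjust it to what your build actually needs.

```python
import os


def suggest_max_jobs(total_mem_bytes, cpu_count, bytes_per_job=2 * 1024**3):
    """Cap parallel jobs by both CPU count and an assumed memory budget per job."""
    by_memory = total_mem_bytes // bytes_per_job
    return max(1, min(cpu_count, by_memory))


def physical_memory_bytes():
    """Total physical memory as reported by sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    jobs = suggest_max_jobs(physical_memory_bytes(), os.cpu_count() or 1)
    # Print a line that can be pasted into the shell before running setup.py.
    print(f"export MAX_JOBS={jobs}")
```

Under this assumed budget, a machine with 8 cores and 8 GiB of RAM would get 4 jobs, while a machine with plenty of memory would simply be capped at its core count.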

@ProgrammerPeter

> What kernel and glibc versions are you using? (Unfortunately these don't seem to get included in the env report). The flag that's not found was introduced in Linux 3.11 (Sep 2013), so it shouldn't be that recent...
>
> To get unblocked, you could modify your local checkout and add a new line after this one, with something like this:
>
> set(TP_ENABLE_SHM OFF CACHE BOOL "" FORCE)

This solves my problem!

6 participants