Description
Hello,
I am very new to building PyTorch. I managed to build it on CentOS 6.9 without MAGMA support, and I am now trying to build it with MAGMA support. The build fails partway through compiling the 4638 objects, and I have no idea why.
Below are my build script, a snippet of the build summary output, and the output for the object that fails to build.
build script:
module purge
module load python/3.8.2-2
#module load cuda/10.0
#module load cudnn/7.6.5
module load magma/2.5.3
module load gflags/2.2.2
module load glog/0.4.0
PKG_CONFIG_PATH="/data/apps/python/3.8.2-2/lib/pkgconfig:$PKG_CONFIG_PATH"
CUDA_HOME=/data/apps/cuda/10.0
CUDNN_LIB_DIR=/data/apps/cudnn/7.6.5/lib64
CUDNN_INCLUDE_DIR=/data/apps/cudnn/7.6.5/include
MAX_JOBS=48
#export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
#export BLAS=openblas
#export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
MAGMA_HOME=/data/apps/magma/2.5.3
#export MAGMA_INCLUDE_DIR=/data/apps/python/3.8.2-2/include
#export MAGMA_LIBRARIES=/data/apps/python/3.8.2-2/lib
#export USE_SYSTEM_LIBS=ON
USE_MKL=0
USE_MKLDNN=0
USE_MKLML=0
BUILD_TEST=0
CMAKE_BUILD_TYPE=Debug
USE_MKLDNN=0
USE_GFLAGS=1
USE_GLOG=1
export LD_LIBRARY_PATH="/data/apps/python/3.8.2-2/lib:$LD_LIBRARY_PATH"
export TORCH_CUDA_ARCH_LIST="5.0"
#export USE_GLOG=0
export CFLAFGS="-I/data/apps/glog/0.4.0/include $CFLAGS"
rm -rf /tmp/pytorch.log
make clean
python3 setup.py clean
git submodule update --init
python3 setup.py install | tee -a /tmp/pytorch.log
I installed the following packages in my python install:
$ pip3 install ninja pyyaml setuptools cmake cffi numpy
Attached is my build log file.
pytorch.log
The error appears to be mkl-dnn related, even though I turned mkl-dnn support off (see USE_MKLDNN=0 in the build script above). I am not sure why this happens. I am using PyTorch version 1.6.0.
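One thing that may be worth checking: most settings in the build script (for example USE_MKLDNN=0, BUILD_TEST=0, CMAKE_BUILD_TYPE=Debug, MAX_JOBS=48, MAGMA_HOME) are plain assignments without `export`. In a POSIX shell, a plain assignment is not passed to child processes, so `python3 setup.py install` may never see those values. That would be consistent with the cmake command in the log below, which shows `-DBUILD_TEST=True`, `-DCMAKE_BUILD_TYPE=Release`, and `-j 24`. The sketch below only demonstrates the shell behaviour; the `DEMO_*` variable names are hypothetical and a child `sh` stands in for the Python process.

```shell
# Hypothetical variable names, used only to illustrate export semantics.
unset DEMO_PLAIN DEMO_EXPORTED

DEMO_PLAIN=0            # plain assignment: visible only in this shell
export DEMO_EXPORTED=0  # exported: inherited by child processes

# A child shell stands in for `python3 setup.py install`.
plain_seen=$(sh -c 'echo "${DEMO_PLAIN:-unset}"')
exported_seen=$(sh -c 'echo "${DEMO_EXPORTED:-unset}"')

echo "plain:    $plain_seen"     # prints "plain:    unset"
echo "exported: $exported_seen"  # prints "exported: 0"
```

If this is the cause, prefixing the relevant lines of the build script with `export` (as is already done for TORCH_CUDA_ARCH_LIST and LD_LIBRARY_PATH) should make them reach the build. Separately, the exported name `CFLAFGS` looks like a typo for `CFLAGS`.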
Below is the snippet of the error from my terminal (also available in attached log file):
[2430/4638] Building CXX object third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/rnn/ref_rnn.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
-- Building version 1.6.0a0+cb27067
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/dfs2/app-sources/pytorch-apr24/pytorch/torch -DCMAKE_PREFIX_PATH=/data/apps/python/3.8.2-2/lib/python3.8/site-packages -DJAVA_HOME=/usr/java/latest -DNUMPY_INCLUDE_DIR=/data/apps/python/3.8.2-2/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/data/apps/python/3.8.2-2/bin/python3 -DPYTHON_INCLUDE_DIR=/data/apps/python/3.8.2-2/include/python3.8 -DPYTHON_LIBRARY=/data/apps/python/3.8.2-2/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.6.0a0+cb27067 -DUSE_NUMPY=True /dfs2/app-sources/pytorch-apr24/pytorch
cmake --build . --target install --config Release -- -j 24
File "setup.py", line 740, in <module>
build_deps()
File "setup.py", line 316, in build_deps
build_caffe2(version=version,
File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/setup_helpers/cmake.py", line 340, in build
self.run(build_args, my_env)
File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/data/apps/python/3.8.2-2/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '24']' returned non-zero exit status 1.
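Note that with a parallel build, the last "Building CXX object" line printed before "ninja: build stopped" is not necessarily the object that failed. ninja marks each failing step with a line starting with `FAILED:`, followed by the command and the compiler output, so searching the log for that marker is usually the quickest way to find the real error. Also, the `| tee` in the build script forwards only stdout; adding `2>&1` before the pipe captures stderr (such as the Python traceback) as well. The sketch below uses a hypothetical `fake_build` function in place of the real build command.

```shell
# Stand-in for the real build command (hypothetical): one line on stdout,
# one line on stderr.
fake_build() {
    echo "[1/2] Building CXX object example.o"
    echo "FAILED: example.o" >&2
}

log=$(mktemp)

# `2>&1` before the pipe routes stderr through tee too, so both streams
# end up in the log file.
fake_build 2>&1 | tee "$log" >/dev/null

# Count the steps that ninja reported as failed.
failed_lines=$(grep -c '^FAILED:' "$log")
echo "failed steps found: $failed_lines"

rm -f "$log"
```

Against the real log, something like `grep -n -A 20 '^FAILED:' /tmp/pytorch.log` would show each failing step along with the lines that follow it.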
If I don't include MAGMA support (I am using MAGMA 2.5.3), the build passes without issues. Could you please give me a hand with this?
Thanks a lot.
cc @malfet