Please help, building failing with MAGMA support #38212

@techie879

Hello,

I am very new to building PyTorch. I managed to build it on CentOS 6.9 without MAGMA support, and now I am trying to build it with MAGMA support. The build fails somewhere in the middle of compiling 4638 objects, and I have no idea why.

Below is my build script, build summary output snippet, and output of the object failing to build.

build script:

module purge

module load python/3.8.2-2
#module load cuda/10.0
#module load cudnn/7.6.5
module load magma/2.5.3
module load gflags/2.2.2
module load glog/0.4.0
PKG_CONFIG_PATH="/data/apps/python/3.8.2-2/lib/pkgconfig:$PKG_CONFIG_PATH"

CUDA_HOME=/data/apps/cuda/10.0
CUDNN_LIB_DIR=/data/apps/cudnn/7.6.5/lib64
CUDNN_INCLUDE_DIR=/data/apps/cudnn/7.6.5/include
MAX_JOBS=48
#export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
#export BLAS=openblas
#export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
MAGMA_HOME=/data/apps/magma/2.5.3
#export MAGMA_INCLUDE_DIR=/data/apps/python/3.8.2-2/include
#export MAGMA_LIBRARIES=/data/apps/python/3.8.2-2/lib
#export USE_SYSTEM_LIBS=ON
USE_MKL=0
USE_MKLDNN=0
USE_MKLML=0
BUILD_TEST=0
CMAKE_BUILD_TYPE=Debug
USE_MKLDNN=0
USE_GFLAGS=1
USE_GLOG=1
export LD_LIBRARY_PATH="/data/apps/python/3.8.2-2/lib:$LD_LIBRARY_PATH"
export TORCH_CUDA_ARCH_LIST="5.0"
#export USE_GLOG=0
export CFLAFGS="-I/data/apps/glog/0.4.0/include $CFLAGS"
rm -rf /tmp/pytorch.log
make clean
python3 setup.py clean
git submodule update --init
python3 setup.py install | tee -a /tmp/pytorch.log
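
A side note on the script above, offered as a possible explanation rather than a confirmed diagnosis: most of the settings (for example USE_MKLDNN=0, BUILD_TEST=0, CMAKE_BUILD_TYPE=Debug) are assigned without export, and a plain shell assignment is not passed on to child processes such as python3 setup.py. That would be consistent with the cmake line in the terminal output further down, which shows -DBUILD_TEST=True and -DCMAKE_BUILD_TYPE=Release. CFLAFGS also looks like a typo for CFLAGS. The sketch below only demonstrates the shell behavior; the names DEMO_PLAIN and DEMO_EXPORTED are made up for illustration and are not PyTorch options.

```shell
#!/bin/sh
# Plain assignment: the value stays local to this shell.
# A child process (here a nested sh) does not inherit it.
DEMO_PLAIN=0
plain_seen=$(sh -c 'echo "${DEMO_PLAIN:-unset}"')

# Exported assignment: the value is placed in the environment
# that child processes receive.
export DEMO_EXPORTED=0
exported_seen=$(sh -c 'echo "${DEMO_EXPORTED:-unset}"')

echo "plain assignment seen by child:    $plain_seen"
echo "exported assignment seen by child: $exported_seen"
```

If the same applies to the build script, prefixing each setting with export (as is already done for TORCH_CUDA_ARCH_LIST) would be the thing to try.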

I installed the following packages into my Python installation:

$ pip3 install ninja pyyaml setuptools cmake cffi numpy

Attached is my build log file.
pytorch.log

The error appears to be mkl-dnn related, even though I turned off mkl-dnn support (see USE_MKLDNN=0 in the build script above). I am not sure why this happens. I am using PyTorch version 1.6.0.

Below is the snippet of the error from my terminal (also available in attached log file):

[2430/4638] Building CXX object third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/rnn/ref_rnn.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
-- Building version 1.6.0a0+cb27067
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/dfs2/app-sources/pytorch-apr24/pytorch/torch -DCMAKE_PREFIX_PATH=/data/apps/python/3.8.2-2/lib/python3.8/site-packages -DJAVA_HOME=/usr/java/latest -DNUMPY_INCLUDE_DIR=/data/apps/python/3.8.2-2/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/data/apps/python/3.8.2-2/bin/python3 -DPYTHON_INCLUDE_DIR=/data/apps/python/3.8.2-2/include/python3.8 -DPYTHON_LIBRARY=/data/apps/python/3.8.2-2/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.6.0a0+cb27067 -DUSE_NUMPY=True /dfs2/app-sources/pytorch-apr24/pytorch
cmake --build . --target install --config Release -- -j 24
  File "setup.py", line 740, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    build_caffe2(version=version,
  File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/setup_helpers/cmake.py", line 340, in build
    self.run(build_args, my_env)
  File "/dfs2/app-sources/pytorch-apr24/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/data/apps/python/3.8.2-2/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '24']' returned non-zero exit status 1.
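
The snippet above shows only a progress line and ninja's final message, not the command that actually failed. With a parallel build, the last "Building CXX object" line printed is not necessarily the failing one; ninja marks a failed step with a line starting with "FAILED:". A minimal sketch for pulling that out of the log, assuming GNU grep (for -m and -A) and the /tmp/pytorch.log path from the build script; first_ninja_failure is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the first ninja "FAILED:" line in a build log,
# followed by a few lines of context (typically the failing command and
# the compiler's error message).
first_ninja_failure() {
    log_file=$1
    context_lines=${2:-20}
    # -m 1 stops after the first match; -A prints the lines that follow it.
    grep -m 1 -A "$context_lines" '^FAILED:' "$log_file"
}

# Run against the log written by the build script, if it exists.
if [ -f /tmp/pytorch.log ]; then
    first_ninja_failure /tmp/pytorch.log || echo "no FAILED: line found"
fi
```

Because the build script pipes only stdout through tee, anything written to stderr would not be in the log; appending 2>&1 before the pipe captures both streams.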

If I don't include MAGMA support (I am using 2.5.3), the build passes without issues. Can you please give me a hand with this?

Thanks a lot.

cc @malfet

Labels: module: build, triaged
