Docker build issue in NVIDIA DLRM DCNv2 #3

Open
rgandikota opened this issue Dec 1, 2023 · 2 comments · May be fixed by #4

Comments

rgandikota commented Dec 1, 2023

Benchmark

Docker build command

docker build -t mlperf-nvidia:recommendation_hugectr .

High-level error message

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mpi4py
Successfully built mlperf-logging mlperf-common
Failed to build mpi4py
ERROR: Could not build wheels for mpi4py, which is required to install pyproject.toml-based projects

Environment

  • Operating System: Ubuntu 22.04
  • CUDA Version: 12.2
  • GPU: A100
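The traceback at the end of the build log ends inside the system distutils (`/usr/lib/python3.10/_distutils_system_mod.py`) while setuptools runs from pip's isolated build environment, so the mix of Python, setuptools, and distutils in the base image may be relevant. As a minimal diagnostic sketch (not a fix, and not taken from the linked PR), the hypothetical helper `collect_build_env` reports those pieces, plus the `mpicc` found on `PATH`, when run inside the base image. Caveat: pip builds mpi4py in an isolated environment (the `/tmp/pip-build-env-...` paths in the log), so the setuptools version reported here is the image's, not necessarily the one used for the failing build.

```python
import os
import shutil
import sys
import warnings


def collect_build_env() -> dict:
    """Report toolchain pieces that appear in the failing mpi4py build.

    Hypothetical helper for illustration; it only reads local state.
    """
    info = {
        "python": sys.version.split()[0],
        # setuptools reads this variable to decide which distutils to expose.
        "SETUPTOOLS_USE_DISTUTILS": os.environ.get("SETUPTOOLS_USE_DISTUTILS"),
        # The build log shows mpi4py resolving /usr/local/mpi/bin/mpicc.
        "mpicc": shutil.which("mpicc"),
        "setuptools": None,
        "distutils": None,
    }
    try:
        import setuptools  # imported first, as a setuptools-based build would
        info["setuptools"] = setuptools.__version__
    except ImportError:
        pass
    try:
        with warnings.catch_warnings():
            # distutils is deprecated in 3.10/3.11 and removed from the stdlib in 3.12.
            warnings.simplefilter("ignore", DeprecationWarning)
            import distutils
        info["distutils"] = getattr(distutils, "__file__", None)
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_build_env().items():
        print(f"{key}: {value}")
```

A `distutils` path under `/usr/lib/python3.10/distutils/` means the Debian/Ubuntu system copy is being used rather than the one vendored by setuptools.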

Docker build logs:

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
Install the buildx component to build images with BuildKit:
https://docs.docker.com/go/buildx/

Sending build context to Docker daemon 89.6kB
Step 1/24 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:23.09-py3
Step 2/24 : FROM ${FROM_IMAGE_NAME}
---> c61ed1549935
Step 3/24 : ARG SM="80;90"
---> Using cache
---> a6af93610cc5
Step 4/24 : ARG ENABLE_MULTINODES=ON
---> Using cache
---> 9a4c20343726
Step 5/24 : ARG HWLOC_VERSION=2.4.1
---> Using cache
---> d22102c266d5
Step 6/24 : ARG RELEASE=true
---> Using cache
---> 9f3aa7cdc8a9
Step 7/24 : RUN apt-get update -y && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends clang-format libboost-serialization-dev libtbb-dev libaio-dev libgflags-dev zlib1g-dev libbz2-dev libsnappy-dev liblz4-dev libzstd-dev zlib1g-dev libzstd-dev libssl-dev libsasl2-dev && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 9938695d9093
Step 8/24 : ENV PATH=/usr/local/bin:$PATH
---> Using cache
---> 361beb4a8cd3
Step 9/24 : RUN cd /opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201 && rm -rfv hwloc201.h hwloc/include/hwloc.h
---> Using cache
---> ff9b56a37f54
Step 10/24 : RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://download.open-mpi.org/release/hwloc/v2.4/hwloc-${HWLOC_VERSION}.tar.gz && mkdir -p /var/tmp && tar -x -f /var/tmp/hwloc-${HWLOC_VERSION}.tar.gz -C /var/tmp && cd /var/tmp/hwloc-${HWLOC_VERSION} && ./configure CPPFLAGS="-I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/" LDFLAGS="-L/usr/local/cuda/lib64" --enable-cuda && make -j$(nproc) && make install && rm -rf /var/tmp/hwloc-${HWLOC_VERSION} /var/tmp/hwloc-${HWLOC_VERSION}.tar.gz
---> Using cache
---> 12f9044dd0fe
Step 11/24 : ENV CPATH=/usr/local/include:$CPATH
---> Using cache
---> f23ea8b4ae84
Step 12/24 : ENV NCCL_LAUNCH_MODE=PARALLEL
---> Using cache
---> 33ebca8d749d
Step 13/24 : ENV SHARP_COLL_NUM_COLL_GROUP_RESOURCE_ALLOC_THRESHOLD=0 SHARP_COLL_LOCK_ON_COMM_INIT=1 SHARP_COLL_LOG_LEVEL=3 HCOLL_ENABLE_MCAST=0
---> Using cache
---> 235a7d075f60
Step 14/24 : RUN ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1.14.39.0 /usr/lib/x86_64-linux-gnu/libibverbs.so
---> Using cache
---> 391154037cc3
Step 15/24 : WORKDIR /workspace/dlrm
---> Using cache
---> 0ec34207b58c
Step 16/24 : COPY . .
---> Using cache
---> 16d7e048986a
Step 17/24 : RUN pip3 install --no-cache-dir -r requirements.txt
---> Running in cac77ff3f392
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/mlcommons/logging.git@3.1.0-rc1 (from -r requirements.txt (line 1))
Cloning https://github.com/mlcommons/logging.git (to revision 3.1.0-rc1) to /tmp/pip-req-build-llz7jws6
Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/logging.git /tmp/pip-req-build-llz7jws6
Running command git checkout -q b32424904879020a47c8d9813b439e4e3017f8d5
Resolved https://github.com/mlcommons/logging.git to commit b32424904879020a47c8d9813b439e4e3017f8d5
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting git+https://github.com/NVIDIA/mlperf-common.git (from -r requirements.txt (line 2))
Cloning https://github.com/NVIDIA/mlperf-common.git to /tmp/pip-req-build-wn305jtb
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/mlperf-common.git /tmp/pip-req-build-wn305jtb
Resolved https://github.com/NVIDIA/mlperf-common.git to commit 779c29968d9dd08feaa099bf916439558a62a45c
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting mpi4py (from -r requirements.txt (line 3))
Downloading mpi4py-3.1.5.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 9.9 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pandas>=1.0 in /usr/local/lib/python3.10/dist-packages (from mlperf-logging==3.0.0->-r requirements.txt (line 1)) (1.5.3)
Requirement already satisfied: pyyaml>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from mlperf-logging==3.0.0->-r requirements.txt (line 1)) (6.0.1)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from mlperf-logging==3.0.0->-r requirements.txt (line 1)) (1.22.2)
Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from mlperf-logging==3.0.0->-r requirements.txt (line 1)) (1.11.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0->mlperf-logging==3.0.0->-r requirements.txt (line 1)) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0->mlperf-logging==3.0.0->-r requirements.txt (line 1)) (2023.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas>=1.0->mlperf-logging==3.0.0->-r requirements.txt (line 1)) (1.16.0)
Building wheels for collected packages: mlperf-logging, mlperf-common, mpi4py
Building wheel for mlperf-logging (setup.py): started
Building wheel for mlperf-logging (setup.py): finished with status 'done'
Created wheel for mlperf-logging: filename=mlperf_logging-3.0.0-py3-none-any.whl size=238649 sha256=501be8dee5fba47f9b21d6bdcebba7c219f6afa00b4df9a85ba6735cda9847e0
Stored in directory: /tmp/pip-ephem-wheel-cache-zeu1_sv9/wheels/28/99/ec/54d1122b8daf8ece8026fcc2d28ef65d12ca3cb461d325fd30
Building wheel for mlperf-common (setup.py): started
Building wheel for mlperf-common (setup.py): finished with status 'done'
Created wheel for mlperf-common: filename=mlperf_common-0.3-py3-none-any.whl size=23720 sha256=79a57fb9ab91667b17c96500ebba5771375e0e4ee43b0bf754e46cecf77de6b5
Stored in directory: /tmp/pip-ephem-wheel-cache-zeu1_sv9/wheels/9b/bb/32/dd53ce122fd18a798e25c0afba97467ffb555bde95bc40cad1
Building wheel for mpi4py (pyproject.toml): started
Building wheel for mpi4py (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

× Building wheel for mpi4py (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [263 lines of output]
running bdist_wheel
running build
running build_src
running build_py
creating build
creating build/lib.linux-x86_64-3.10
creating build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/run.py -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/__main__.py -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/__init__.py -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/bench.py -> build/lib.linux-x86_64-3.10/mpi4py
creating build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/pool.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/aplus.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/__main__.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/_base.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/__init__.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/_core.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/_lib.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/server.py -> build/lib.linux-x86_64-3.10/mpi4py/futures
creating build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/util/__init__.py -> build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/util/pkl5.py -> build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/util/dtlib.py -> build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/dl.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/run.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/__main__.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/__init__.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/MPI.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/bench.pyi -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/py.typed -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/libmpi.pxd -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/__init__.pxd -> build/lib.linux-x86_64-3.10/mpi4py
copying src/mpi4py/MPI.pxd -> build/lib.linux-x86_64-3.10/mpi4py
creating build/lib.linux-x86_64-3.10/mpi4py/include
creating build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/include/mpi4py/mpi4py.h -> build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/include/mpi4py/mpi4py.MPI.h -> build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/include/mpi4py/mpi4py.MPI_api.h -> build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/include/mpi4py/mpi4py.i -> build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/include/mpi4py/mpi.pxi -> build/lib.linux-x86_64-3.10/mpi4py/include/mpi4py
copying src/mpi4py/futures/server.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/pool.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/__main__.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/aplus.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/__init__.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/_lib.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/futures/_core.pyi -> build/lib.linux-x86_64-3.10/mpi4py/futures
copying src/mpi4py/util/pkl5.pyi -> build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/util/dtlib.pyi -> build/lib.linux-x86_64-3.10/mpi4py/util
copying src/mpi4py/util/__init__.pyi -> build/lib.linux-x86_64-3.10/mpi4py/util
running build_clib
MPI configuration: [mpi] from 'mpi.cfg'
MPI C compiler: /usr/local/mpi/bin/mpicc
MPI C++ compiler: /usr/local/mpi/bin/mpicxx
MPI F compiler: /usr/local/mpi/bin/mpifort
MPI F90 compiler: /usr/local/mpi/bin/mpif90
MPI F77 compiler: /usr/local/mpi/bin/mpif77
checking for library 'lmpe' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -llmpe -o _configtest
/usr/bin/ld: cannot find -llmpe: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
building 'mpe' dylib library
creating build/temp.linux-x86_64-3.10
creating build/temp.linux-x86_64-3.10/src
creating build/temp.linux-x86_64-3.10/src/lib-pmpi
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c src/lib-pmpi/mpe.c -o build/temp.linux-x86_64-3.10/src/lib-pmpi/mpe.o
creating build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi
/usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,--no-as-needed build/temp.linux-x86_64-3.10/src/lib-pmpi/mpe.o -o build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi/libmpe.so
checking for library 'vt-mpi' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt-mpi -o _configtest
/usr/bin/ld: cannot find -lvt-mpi: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
checking for library 'vt.mpi' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt.mpi -o _configtest
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
building 'vt' dylib library
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c src/lib-pmpi/vt.c -o build/temp.linux-x86_64-3.10/src/lib-pmpi/vt.o
/usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,--no-as-needed build/temp.linux-x86_64-3.10/src/lib-pmpi/vt.o -o build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi/libvt.so
checking for library 'vt-mpi' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt-mpi -o _configtest
/usr/bin/ld: cannot find -lvt-mpi: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
checking for library 'vt.mpi' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt.mpi -o _configtest
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
building 'vt-mpi' dylib library
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c src/lib-pmpi/vt-mpi.c -o build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-mpi.o
/usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,--no-as-needed build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-mpi.o -o build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi/libvt-mpi.so
checking for library 'vt-hyb' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt-hyb -o _configtest
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
checking for library 'vt.ompi' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -lvt.ompi -o _configtest
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
collect2: error: ld returned 1 exit status
failure.
removing: _configtest.c _configtest.o
building 'vt-hyb' dylib library
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c src/lib-pmpi/vt-hyb.c -o build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-hyb.o
/usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,--no-as-needed build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-hyb.o -o build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi/libvt-hyb.so
running build_ext
MPI configuration: [mpi] from 'mpi.cfg'
MPI C compiler: /usr/local/mpi/bin/mpicc
MPI C++ compiler: /usr/local/mpi/bin/mpicxx
MPI F compiler: /usr/local/mpi/bin/mpifort
MPI F90 compiler: /usr/local/mpi/bin/mpif90
MPI F77 compiler: /usr/local/mpi/bin/mpif77
checking for dlopen() availability ...
checking for header 'dlfcn.h' ...
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
success!
checking for library 'dl' ...
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
x86_64-linux-gnu-gcc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'dlopen' ...
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
x86_64-linux-gnu-gcc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
building 'mpi4py.dl' extension
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/usr/include/python3.10 -c src/dynload.c -o build/temp.linux-x86_64-3.10/src/dynload.o
x86_64-linux-gnu-gcc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.10/src/dynload.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o build/lib.linux-x86_64-3.10/mpi4py/dl.cpython-310-x86_64-linux-gnu.so
checking for MPI compile and link ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for missing MPI functions/symbols ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
checking for function 'MPI_Type_create_f90_integer' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'MPI_Type_create_f90_real' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'MPI_Type_create_f90_complex' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'MPI_Status_c2f' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'MPI_Status_f2c' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for symbol 'MPI_LB' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for symbol 'MPI_UB' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for dlopen() availability ...
checking for header 'dlfcn.h' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
success!
checking for library 'dl' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
checking for function 'dlopen' ...
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
/usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
building 'mpi4py.MPI' extension
/usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/usr/include/python3.10 -c src/MPI.c -o build/temp.linux-x86_64-3.10/src/MPI.o
/usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.10/src/MPI.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o build/lib.linux-x86_64-3.10/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so
writing build/lib.linux-x86_64-3.10/mpi4py/mpi.cfg
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 404, in build_wheel
return self._build_with_temp_dir(
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 644, in <module>
File "<string>", line 641, in main
File "<string>", line 492, in run_setup
File "/tmp/pip-install-v2r6okyl/mpi4py_c6cddf4bfc8c4ebd9b5de61b26253ebf/conf/mpidistutils.py", line 541, in setup
return fcn_setup(**attrs)
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 963, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 370, in run
install = self.reinitialize_command("install", reinit_subcommands=True)
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 216, in reinitialize_command
cmd = _Command.reinitialize_command(self, command, reinit_subcommands)
File "/usr/lib/python3.10/distutils/cmd.py", line 305, in reinitialize_command
return self.distribution.reinitialize_command(command,
File "/usr/lib/python3.10/distutils/dist.py", line 938, in reinitialize_command
command = self.get_command_obj(command_name)
File "/usr/lib/python3.10/distutils/dist.py", line 858, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 174, in __init__
super().__init__(dist)
File "/usr/lib/python3.10/distutils/cmd.py", line 62, in __init__
self.initialize_options()
File "/tmp/pip-build-env-_s4_k_me/overlay/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 50, in initialize_options
orig.install.initialize_options(self)
File "/usr/lib/python3.10/_distutils_system_mod.py", line 33, in initialize_options
super().initialize_options()
TypeError: super(type, obj): obj must be an instance or subtype of type
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mpi4py
Successfully built mlperf-logging mlperf-common
Failed to build mpi4py
ERROR: Could not build wheels for mpi4py, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python -m pip install --upgrade pip
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1
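Most of the 263 lines of mpi4py output above are configuration probes: each `checking for library ...` that ends in `cannot find -l...` and `failure.` is followed by the build carrying on (for example, `building 'mpe' dylib library`), and `mpi4py.MPI` itself links successfully. The fatal error is the Python `TypeError` at the bottom of the traceback. A minimal sketch, using only the standard library and a hypothetical helper name `find_root_exception`, for pulling that last exception and its innermost frame out of a saved log:

```python
import re

# The final "SomeError: message" line Python prints after a traceback.
EXC_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$")
# A traceback frame header, e.g.: File "/path/x.py", line 33, in f
FRAME_LINE = re.compile(r'^\s*File "([^"]*)", line (\d+)')


def find_root_exception(log_text: str):
    """Return (exception_type, message, innermost_frame_path) for the last
    traceback in log_text, or None if there is no traceback.

    Linker probe failures such as 'cannot find -llmpe' are ignored because
    they occur outside any Python traceback.
    """
    result = None
    last_frame = None
    in_traceback = False
    for line in log_text.splitlines():
        if line.strip().startswith("Traceback (most recent call last):"):
            in_traceback = True
            last_frame = None
            continue
        if not in_traceback:
            continue
        frame = FRAME_LINE.match(line)
        if frame:
            last_frame = frame.group(1)  # keep overwriting: the last one is innermost
            continue
        exc = EXC_LINE.match(line)
        if exc:
            result = (exc.group(1), exc.group(2), last_frame)
            in_traceback = False
    return result
```

On this log it should point at `_distutils_system_mod.py` rather than at `ld`, which suggests looking at the distutils/setuptools combination rather than at the MPI installation.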

rgandikota changed the title from "Docker build issue in DLRM DCNv2" to "Docker build issue in NVIDIA DLRM DCNv2" on Dec 4, 2023

jndinesh commented Dec 7, 2023

We also tried building DLRM with nvcr.io/nvidia/pytorch:23.10-py3 as the base image, without success.

After reviewing other submissions, we tried the nvcr.io/nvdlfwea/pytorch:23.09-py3 (http://nvcr.io/nvdlfwea/pytorch:23.09-py3) variant used in Supermicro's submission, but unfortunately we do not have access to it. Here is the URL.

Could someone help determine whether this issue is related to overriding the base image (the FROM_IMAGE_NAME build argument)? If so, could you point us to the correct image?

ShriyaPalsamudram linked a pull request on Dec 7, 2023 that will close this issue

jndinesh commented Dec 7, 2023

Thanks very much, Shriya. Verified locally; the changes look good.
