When I try to build the image using dockerfile/cuda11.1.1.dockerfile, I get the following error:
~/superbenchmark main !1 ?2 ❯ docker buildx build \
  --platform linux/amd64 --cache-to type=inline,mode=max \
  --tag superbench-dev --file dockerfile/cuda11.1.1.dockerfile .
[+] Building 665.5s (16/18)
 => [internal] load build definition from cuda11.1.1.dockerfile 0.0s
 => => transferring dockerfile: 4.00kB 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 35B 0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:20.12-py3 1.4s
 => [internal] load build context 0.6s
 => => transferring context: 788.47kB 0.5s
 => [ 1/14] FROM nvcr.io/nvidia/pytorch:20.12-py3@sha256:cc14c0cf580989bb1ff39fa78ca697b77a8860b17acead4a60b8 0.0s
 => CACHED [ 2/14] RUN apt-get update && apt-get install -y --no-install-recommends autoconf auto 0.0s
 => [ 3/14] RUN cd /tmp && wget https://download.docker.com/linux/static/stable/x86_64/docker-20.10.8.tgz 9.5s
 => [ 4/14] RUN mkdir -p /root/.ssh && touch /root/.ssh/authorized_keys && mkdir -p /var/run/sshd && 0.6s
 => [ 5/14] RUN cd /tmp && wget -q http://content.mellanox.com/ofed/MLNX_OFED-5.2-2.2.3.0/MLNX_OFED_LIN 277.4s
 => [ 6/14] RUN cd /opt && wget -q https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.8. 62.9s
 => [ 7/14] RUN cd /tmp && git clone https://github.com/Mellanox/nccl-rdma-sharp-plugins.git && cd n 22.1s
 => [ 8/14] RUN cd /tmp && git clone -b v2.10.3-1 https://github.com/NVIDIA/nccl.git && cd nccl && 264.6s
 => [ 9/14] RUN cd /tmp && mkdir -p mlc && cd mlc && wget --user-agent="Mozilla/5.0 (X11; Fedora; 0.8s
 => [10/14] WORKDIR /opt/superbench 0.1s
 => [11/14] ADD third_party third_party 0.1s
 => ERROR [12/14] RUN make -j 40 -C third_party cuda 25.8s
------
 > [12/14] RUN make -j 40 -C third_party cuda:
#0 0.415 make: Entering directory '/opt/superbench/third_party'
#0 0.415 mkdir -p /opt/superbench/bin
#0 0.418 mkdir -p /opt/superbench/lib
#0 0.445 if [ -d cuda-samples ]; then rm -rf cuda-samples; fi
#0 0.445 bash -c "source /opt/hpcx/hpcx-init.sh && hpcx_load && make CC=mpicc -C GPCNET all && hpcx_unload"
#0 0.465 git clone -b v11.1 https://github.com/NVIDIA/cuda-samples.git ./cuda-samples
#0 0.468 Cloning into './cuda-samples'...
#0 0.493 make[1]: Entering directory '/opt/superbench/third_party'
#0 0.493 make[1]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule.
#0 0.495 make[1]: Leaving directory '/opt/superbench/third_party/GPCNET'
#0 0.495 make[1]: *** No rule to make target 'all'. Stop.
#0 0.496 make: *** [Makefile:98: gpcnet] Error 2
#0 0.496 make: *** Waiting for unfinished jobs....
#0 20.08 Note: switching to 'c4e2869a2becb4b6d9ce5f64914406bf5e239662'.
#0 20.08
#0 20.08 You are in 'detached HEAD' state. You can look around, make experimental
#0 20.08 changes and commit them, and you can discard any commits you make in this
#0 20.08 state without impacting any branches by switching back to a branch.
#0 20.08
#0 20.08 If you want to create a new branch to retain commits you create, you may
#0 20.08 do so (now or later) by using -c with the switch command. Example:
#0 20.08
#0 20.08   git switch -c <new-branch-name>
#0 20.08
#0 20.08 Or undo this operation with:
#0 20.08
#0 20.08   git switch -
#0 20.08
#0 20.08 Turn off this advice by setting config variable advice.detachedHead to false
#0 20.08
#0 20.56 cd ./cuda-samples/Samples/bandwidthTest && make clean && make TARGET_ARCH=x86_64 SMS="70 75 80 86"
#0 20.59 make[1]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule.
#0 20.59 make[1]: Entering directory '/opt/superbench/third_party/cuda-samples/Samples/bandwidthTest'
#0 20.61 rm -f bandwidthTest bandwidthTest.o
#0 20.62 rm -rf ../../bin/x86_64/linux/release/bandwidthTest
#0 20.62 make[1]: Leaving directory '/opt/superbench/third_party/cuda-samples/Samples/bandwidthTest'
#0 20.62 make[1]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule.
#0 20.62 make[1]: Entering directory '/opt/superbench/third_party/cuda-samples/Samples/bandwidthTest'
#0 20.65 /usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common -m64 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o bandwidthTest.o -c bandwidthTest.cu
#0 25.31 /usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o bandwidthTest bandwidthTest.o
#0 25.62 mkdir -p ../../bin/x86_64/linux/release
#0 25.62 cp bandwidthTest ../../bin/x86_64/linux/release
#0 25.63 make[1]: Leaving directory '/opt/superbench/third_party/cuda-samples/Samples/bandwidthTest'
#0 25.63 cp -v ./cuda-samples/Samples/bandwidthTest/bandwidthTest /opt/superbench/bin/
#0 25.63 './cuda-samples/Samples/bandwidthTest/bandwidthTest' -> '/opt/superbench/bin/bandwidthTest'
#0 25.63 make: Leaving directory '/opt/superbench/third_party'
------
error: failed to solve: executor failed running [/bin/sh -c make -j ${NUM_MAKE_JOBS} -C third_party cuda]: exit code: 2
I cloned the latest main branch; the commit hash is a9634ef.
The problem is in step 12 of the docker build.
Please help. Thanks.
Hi @lifefeel,
It seems you didn't clone the submodules before building the image. Could you try
git submodule update --init --recursive -j 16
and check the output of ls third_party/*?
If you can see contents in each subdirectory, then retry docker buildx build .... Hope it works for you.
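For anyone hitting the same "No rule to make target 'all'" error, the manual ls check above can be sketched as a small script. This is only an illustration, not part of SuperBench: the function name list_empty_subdirs is hypothetical, and it assumes that an uninitialized submodule shows up as an empty directory (which is how git normally leaves it after a plain git clone).

```shell
#!/bin/sh
# Hypothetical helper (not part of SuperBench): print every immediate
# subdirectory of the given directory that contains no entries at all.
# Assumption: a submodule that was never initialized is an empty directory.
list_empty_subdirs() {
    parent="$1"
    for dir in "$parent"/*/; do
        # If there are no subdirectories, the glob stays literal; skip it.
        [ -d "$dir" ] || continue
        # `ls -A` prints nothing when the directory has no entries
        # (an initialized submodule contains at least a .git entry).
        if [ -z "$(ls -A "$dir")" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}

# Check third_party by default, or the directory passed as the first argument.
list_empty_subdirs "${1:-third_party}"
```

Run from the repository root, this should print nothing once git submodule update --init --recursive has completed. Any path it does print (for example the GPCNET directory from the log) is a likely candidate for a missing submodule. git submodule status gives similar information: a leading "-" marks a submodule that has not been initialized.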
Thank you for the quick reply. It works well!
Close the issue.