
Build the docker image from source, but torch.cuda.is_available()==false #12773

Open
linkerr opened this issue Oct 17, 2018 · 10 comments
Labels
module: docker triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@linkerr

linkerr commented Oct 17, 2018

❓ Questions and Help


System info: Tesla P100

[screenshot: system info]

The CUDA deviceQuery test passes:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 8
Result = PASS

but PyTorch can't find a CUDA device:

[screenshot: torch.cuda.is_available() returns False]

I have no idea what's going wrong; please help. Thank you!

@weiyangfb
Contributor

@linkerr could you also run the collect_env.py script and post results?

@linkerr
Author

linkerr commented Oct 18, 2018

@weiyangfb Hi, first of all, thank you. Here is the collect_env.py output:
[screenshot: collect_env.py output]

@zou3519
Contributor

zou3519 commented Oct 22, 2018

Please test compiling a CUDA sample and running it in the docker image.

@fepegar

fepegar commented Nov 19, 2019

I'm having the same problem.

collect_env on host:

Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: version 3.13.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1060
Nvidia driver version: 430.40
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.14.5
[pip3] torch==1.2.0
[pip3] torchvision==0.4.0
[conda] torch                     1.2.0                    pypi_0    pypi
[conda] torchvision               0.4.0                    pypi_0    pypi

collect_env on image:

Collecting environment information...
PyTorch version: 1.4.0a0+64cdc64
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3

Versions of relevant libraries:
[pip] numpy==1.17.3
[pip] torch==1.4.0a0+64cdc64
[conda] blas                      1.0                         mkl
[conda] magma-cuda100             2.5.1                         1    pytorch
[conda] mkl                       2019.4                      243
[conda] mkl-include               2019.4                      243
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] torch                     1.4.0a0+64cdc64          pypi_0    pypi

Running CUDA sample on image:

root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# l
Makefile           deviceQuery*     deviceQuery_vs2012.sln      deviceQuery_vs2013.vcxproj  deviceQuery_vs2017.sln      deviceQuery_vs2019.vcxproj
NsightEclipse.xml  deviceQuery.cpp  deviceQuery_vs2012.vcxproj  deviceQuery_vs2015.sln      deviceQuery_vs2017.vcxproj
README.md          deviceQuery.o    deviceQuery_vs2013.sln      deviceQuery_vs2015.vcxproj  deviceQuery_vs2019.sln
root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Related to #21259?
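For reference, the error code 35 printed above carries the message "CUDA driver version is insufficient for CUDA runtime version". Whether the host driver is actually too old can be checked against NVIDIA's minimum-driver table; below is a minimal stdlib-only sketch, where the table values are copied from NVIDIA's CUDA compatibility notes for a few releases only and should be treated as assumptions to verify against the official docs:

```python
# Minimal sketch: check whether a host NVIDIA driver is new enough for a
# given CUDA runtime. Table values are from NVIDIA's CUDA compatibility
# documentation (a few releases only; verify for your exact version).
MIN_DRIVER = {
    "9.0": (384, 81),
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def driver_sufficient(driver_version: str, cuda_runtime: str) -> bool:
    """Return True if driver_version (e.g. '430.40') meets the minimum
    required for cuda_runtime (e.g. '10.1')."""
    major, minor = (int(x) for x in driver_version.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda_runtime]

print(driver_sufficient("430.40", "10.1"))  # → True
print(driver_sufficient("384.81", "10.1"))  # → False
```

By this table the host driver 430.40 is sufficient for a 10.1 runtime, so the FAIL above may instead mean the container is not seeing the host driver at all (e.g. started without the NVIDIA runtime), rather than the driver genuinely being too old.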

@smessmer smessmer added module: docker triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 19, 2019
@Godricly

Are you building images with kaniko or docker-in-docker?

@Spenhouet

We face the same issue.

The PyTorch installation from source is done similarly to: https://github.com/pytorch/pytorch/blob/v1.5.0/docker/pytorch/Dockerfile

FROM nvcr.io/nvidia/cuda:10.2-cudnn8-runtime-ubuntu18.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    apt-utils \
    build-essential \
    cmake \
    git \
    curl \
    ca-certificates \
    libjpeg-dev \
    libpng-dev \
    wget \
    bsdtar && \
    rm -rf /var/lib/apt/lists/*

ARG PYTHON_VERSION=3.7
RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include ninja cython typing && \
    /opt/conda/bin/conda install -y -c pytorch magma-cuda100 && \
    /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/bin:$PATH

ARG PYTORCH_TEMP_PATH=/tmp/pytorch-install
WORKDIR ${PYTORCH_TEMP_PATH}
ARG PYTORCH_VERSION=1.5.1
RUN git clone --depth 1 --branch v${PYTORCH_VERSION} https://github.com/pytorch/pytorch.git .
RUN git submodule sync && git submodule update --init --recursive
RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    pip install -v .

The container was built on a system without CUDA; not sure if this is an issue. There were no special flags set for the build process; not sure if that is necessary.

From within the container nvidia-smi does show the available GPUs but torch.cuda.is_available() returns False.
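When nvidia-smi works inside the container but torch.cuda.is_available() is False, it helps to separate "torch was built without CUDA" from "the runtime can't see the GPU". The sketch below is a hypothetical stdlib-based diagnostic; the file paths and the NVIDIA_VISIBLE_DEVICES variable are assumptions based on typical nvidia-docker setups on x86_64 Ubuntu and may differ on other systems:

```python
import glob
import os

def gpu_runtime_findings(root: str = "/") -> list:
    """Collect likely reasons torch.cuda.is_available() is False.

    The paths checked below are assumptions for a typical nvidia-docker
    setup on x86_64 Ubuntu; adjust for your host.
    """
    findings = []
    try:
        import torch
        # torch.version.cuda is None on CPU-only builds.
        if torch.version.cuda is None:
            findings.append("torch was built without CUDA support "
                            "(e.g. nvcc was missing at build time)")
    except ImportError:
        findings.append("torch is not importable in this interpreter")
    if not glob.glob(os.path.join(root, "dev/nvidia*")):
        findings.append("no /dev/nvidia* device nodes "
                        "(container started without --gpus/--runtime=nvidia?)")
    if not glob.glob(os.path.join(root, "usr/lib/x86_64-linux-gnu/libcuda.so*")):
        findings.append("libcuda.so not found "
                        "(host driver libraries not mounted into the container)")
    if "NVIDIA_VISIBLE_DEVICES" not in os.environ:
        findings.append("NVIDIA_VISIBLE_DEVICES is unset")
    return findings

for finding in gpu_runtime_findings():
    print(finding)
```

One thing worth noting for the Dockerfile above: the *-runtime CUDA base images do not ship nvcc, so a from-source PyTorch build inside them typically comes out CPU-only; the *-devel variants include the full toolkit.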

@Godricly

Godricly commented Jul 27, 2020

@Spenhouet Have you specified your compile options (sm_50 and the like)? My CUDA extension compiles after setting those up.

@Spenhouet

@Godricly There was no configuration beyond the Dockerfile above. Where would I set these compiler options?

@Godricly

Godricly commented Jul 28, 2020

@Spenhouet If you are compiling a C++ extension, you can set them via CUDA compiler flags, as in a setup.py file.
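For PyTorch itself (and its C++ extensions), the target GPU architectures are usually given via TORCH_CUDA_ARCH_LIST, as in the Dockerfile above. As a purely illustrative sketch (not PyTorch's actual implementation), here is how such an arch string expands into the nvcc -gencode flags visible in the deviceQuery build log earlier; "+PTX" additionally embeds PTX so newer GPUs can JIT-compile the kernels:

```python
# Illustrative sketch: expand a TORCH_CUDA_ARCH_LIST-style string
# (e.g. "6.1 7.0+PTX") into nvcc -gencode flags. A "+PTX" suffix also
# embeds PTX (code=compute_XX) for forward compatibility via JIT.
def arch_list_to_gencode(arch_list: str) -> list:
    flags = []
    for entry in arch_list.split():
        ptx = entry.endswith("+PTX")
        arch = entry.replace("+PTX", "").replace(".", "")
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
        if ptx:
            flags.append(f"-gencode arch=compute_{arch},code=compute_{arch}")
    return flags

print(arch_list_to_gencode("6.1 7.0+PTX"))
```

If the arch list used at build time does not cover the GPU in the machine (and no PTX was embedded), the binary loads fine but finds no usable device, which is another way to end up with torch.cuda.is_available() == False.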

@Spenhouet

@Godricly I provided the Dockerfile above. Could you please provide a Dockerfile including the compiler flags you are mentioning?


7 participants