
Build the docker image from source, but torch.cuda.is_available()==false #12773

Open
linkerr opened this issue Oct 17, 2018 · 10 comments
Labels
module: docker triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@linkerr

linkerr commented Oct 17, 2018

❓ Questions and Help


System info: Tesla P100

[screenshot: system info]

The CUDA deviceQuery test passes:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 8
Result = PASS

but PyTorch can't find a CUDA device:

[screenshot: torch.cuda.is_available() returns False]

I have no idea what's going wrong; please help. Thank you!

@weiyangfb
Contributor

@linkerr could you also run the collect_env.py script and post results?

@linkerr
Author

linkerr commented Oct 18, 2018

@weiyangfb Hi, first of all, thank you. Here is the collect_env.py output:
[screenshot: collect_env.py output]

@zou3519
Contributor

zou3519 commented Oct 22, 2018

Please test compiling a CUDA sample and running it in the docker image.

@fepegar

fepegar commented Nov 19, 2019

I'm having the same problem.

collect_env on host:

Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: version 3.13.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1060
Nvidia driver version: 430.40
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.14.5
[pip3] torch==1.2.0
[pip3] torchvision==0.4.0
[conda] torch                     1.2.0                    pypi_0    pypi
[conda] torchvision               0.4.0                    pypi_0    pypi

collect_env on image:

Collecting environment information...
PyTorch version: 1.4.0a0+64cdc64
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3

Versions of relevant libraries:
[pip] numpy==1.17.3
[pip] torch==1.4.0a0+64cdc64
[conda] blas                      1.0                         mkl
[conda] magma-cuda100             2.5.1                         1    pytorch
[conda] mkl                       2019.4                      243
[conda] mkl-include               2019.4                      243
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] torch                     1.4.0a0+64cdc64          pypi_0    pypi

Running CUDA sample on image:

root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# l
Makefile           deviceQuery*     deviceQuery_vs2012.sln      deviceQuery_vs2013.vcxproj  deviceQuery_vs2017.sln      deviceQuery_vs2019.vcxproj
NsightEclipse.xml  deviceQuery.cpp  deviceQuery_vs2012.vcxproj  deviceQuery_vs2015.sln      deviceQuery_vs2017.vcxproj
README.md          deviceQuery.o    deviceQuery_vs2013.sln      deviceQuery_vs2015.vcxproj  deviceQuery_vs2019.sln
root@c853a04e3284:/workspace/cuda-samples/Samples/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Related to #21259?
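For reference, the error code 35 printed above carries the message "CUDA driver version is insufficient for CUDA runtime version". Whether the host driver is actually too old can be checked against NVIDIA's minimum-driver table; below is a minimal stdlib-only sketch, where the table values are copied from NVIDIA's CUDA compatibility notes for a few releases only and should be treated as assumptions to verify against the official docs:

```python
# Minimal sketch: check whether a host NVIDIA driver is new enough for a
# given CUDA runtime. Table values are from NVIDIA's CUDA compatibility
# documentation (a few releases only; verify for your exact version).
MIN_DRIVER = {
    "9.0": (384, 81),
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def driver_sufficient(driver_version: str, cuda_runtime: str) -> bool:
    """Return True if driver_version (e.g. '430.40') meets the minimum
    required for cuda_runtime (e.g. '10.1')."""
    major, minor = (int(x) for x in driver_version.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda_runtime]

print(driver_sufficient("430.40", "10.1"))  # → True
print(driver_sufficient("384.81", "10.1"))  # → False
```

By this table the host driver 430.40 is sufficient for a 10.1 runtime, so the FAIL above may instead mean the container is not seeing the host driver at all (e.g. started without the NVIDIA runtime), rather than the driver genuinely being too old.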

@smessmer smessmer added module: docker triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 19, 2019
@Godricly

Are you building images with kaniko or docker-in-docker?

@Spenhouet

We face the same issue.

The PyTorch installation from source is done similarly to: https://github.com/pytorch/pytorch/blob/v1.5.0/docker/pytorch/Dockerfile

FROM nvcr.io/nvidia/cuda:10.2-cudnn8-runtime-ubuntu18.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    apt-utils \
    build-essential \
    cmake \
    git \
    curl \
    ca-certificates \
    libjpeg-dev \
    libpng-dev \
    wget \
    bsdtar && \
    rm -rf /var/lib/apt/lists/*

ARG PYTHON_VERSION=3.7
RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include ninja cython typing && \
    /opt/conda/bin/conda install -y -c pytorch magma-cuda100 && \
    /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/bin:$PATH

ARG PYTORCH_TEMP_PATH=/tmp/pytorch-install
WORKDIR ${PYTORCH_TEMP_PATH}
ARG PYTORCH_VERSION=1.5.1
RUN git clone --depth 1 --branch v${PYTORCH_VERSION} https://github.com/pytorch/pytorch.git .
RUN git submodule sync && git submodule update --init --recursive
RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    pip install -v .

The container was built on a system without CUDA; not sure if this is an issue. There were no special flags set for the build process; not sure if that is necessary.

From within the container nvidia-smi does show the available GPUs but torch.cuda.is_available() returns False.
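When nvidia-smi works inside the container but torch.cuda.is_available() is False, it helps to separate "torch was built without CUDA" from "the runtime can't see the GPU". The sketch below is a hypothetical stdlib-based diagnostic; the file paths and the NVIDIA_VISIBLE_DEVICES variable are assumptions based on typical nvidia-docker setups on x86_64 Ubuntu and may differ on other systems:

```python
import glob
import os

def gpu_runtime_findings(root: str = "/") -> list:
    """Collect likely reasons torch.cuda.is_available() is False.

    The paths checked below are assumptions for a typical nvidia-docker
    setup on x86_64 Ubuntu; adjust for your host.
    """
    findings = []
    try:
        import torch
        # torch.version.cuda is None on CPU-only builds.
        if torch.version.cuda is None:
            findings.append("torch was built without CUDA support "
                            "(e.g. nvcc was missing at build time)")
    except ImportError:
        findings.append("torch is not importable in this interpreter")
    if not glob.glob(os.path.join(root, "dev/nvidia*")):
        findings.append("no /dev/nvidia* device nodes "
                        "(container started without --gpus/--runtime=nvidia?)")
    if not glob.glob(os.path.join(root, "usr/lib/x86_64-linux-gnu/libcuda.so*")):
        findings.append("libcuda.so not found "
                        "(host driver libraries not mounted into the container)")
    if "NVIDIA_VISIBLE_DEVICES" not in os.environ:
        findings.append("NVIDIA_VISIBLE_DEVICES is unset")
    return findings

for finding in gpu_runtime_findings():
    print(finding)
```

One thing worth noting for the Dockerfile above: the *-runtime CUDA base images do not ship nvcc, so a from-source PyTorch build inside them typically comes out CPU-only; the *-devel variants include the full toolkit.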

@Godricly

Godricly commented Jul 27, 2020

@Spenhouet Have you specified your compile options (sm_50 and the like)? My CUDA extension compiles after setting those up.

@Spenhouet

@Godricly There was no configuration beyond the Dockerfile above. Where would I set these compiler options?

@Godricly

Godricly commented Jul 28, 2020

@Spenhouet If you are compiling a C++ extension, you can set them via CUDA compiler flags, as in a setup.py file.
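For PyTorch itself (and its C++ extensions), the target GPU architectures are usually given via TORCH_CUDA_ARCH_LIST, as in the Dockerfile above. As a purely illustrative sketch (not PyTorch's actual implementation), here is how such an arch string expands into the nvcc -gencode flags visible in the deviceQuery build log earlier; "+PTX" additionally embeds PTX so newer GPUs can JIT-compile the kernels:

```python
# Illustrative sketch: expand a TORCH_CUDA_ARCH_LIST-style string
# (e.g. "6.1 7.0+PTX") into nvcc -gencode flags. A "+PTX" suffix also
# embeds PTX (code=compute_XX) for forward compatibility via JIT.
def arch_list_to_gencode(arch_list: str) -> list:
    flags = []
    for entry in arch_list.split():
        ptx = entry.endswith("+PTX")
        arch = entry.replace("+PTX", "").replace(".", "")
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
        if ptx:
            flags.append(f"-gencode arch=compute_{arch},code=compute_{arch}")
    return flags

print(arch_list_to_gencode("6.1 7.0+PTX"))
```

If the arch list used at build time does not cover the GPU in the machine (and no PTX was embedded), the binary loads fine but finds no usable device, which is another way to end up with torch.cuda.is_available() == False.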

@Spenhouet

@Godricly I provided the Dockerfile above. Could you please provide a Dockerfile including the compiler flags you are mentioning?


7 participants