look into why the NVIDIA CUDA Dockerfile doesn't build pytorch #513

Closed
soumith opened this issue Jan 19, 2017 · 7 comments

@soumith
Member

soumith commented Jan 19, 2017

@devendrachaplot reports that, starting from the compute.nvidia.com/nvidia/cuda image, PyTorch fails with compile errors. Look into it.

@ngimel
Collaborator

ngimel commented Jan 19, 2017

FWIW, with the following Dockerfile

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04 

RUN apt-get update && apt-get install -y --no-install-recommends \
         build-essential \
         cmake \
         git \
         curl \
         ca-certificates \
         libjpeg-dev \
         libpng-dev &&\
     rm -rf /var/lib/apt/lists/*

RUN curl -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
     chmod +x ~/miniconda.sh && \
     ~/miniconda.sh -b -p /opt/conda && \
     rm ~/miniconda.sh && \
     /opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy scipy ipython mkl && \
     /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80
# This must be done before pip so that requirements.txt is available
WORKDIR /opt/pytorch
COPY . .

RUN cat requirements.txt | xargs -n1 pip install --no-cache-dir && \
    TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" \
    CMAKE_LIBRARY_PATH=/opt/conda/envs/pytorch-py35/lib \
    CMAKE_INCLUDE_PATH=/opt/conda/envs/pytorch-py35/include \
    pip install -v .

WORKDIR /workspace
RUN chmod -R a+w /workspace

I can build PyTorch and run the tests without problems.
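
For a quick check inside the container that is separate from the test suite, something like the following sketch can confirm that the freshly built package imports and sees a GPU. The helper name `cuda_build_status` is hypothetical and not part of PyTorch; it relies only on `torch.__version__`, `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def cuda_build_status():
    """Summarize whether torch imports and can see a CUDA device.

    Hypothetical helper for illustration; returns a human-readable string.
    """
    try:
        import torch
    except ImportError:
        return "torch is not importable"
    if not torch.cuda.is_available():
        return "torch {} imported, but CUDA is not available".format(torch.__version__)
    return "torch {} sees {} CUDA device(s)".format(
        torch.__version__, torch.cuda.device_count()
    )


print(cuda_build_status())
```

A successful import does not replace running the tests; it only rules out the most basic installation failures.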

@devendrachaplot

devendrachaplot commented Jan 20, 2017

This isn't urgent for me, but if you want to reproduce the error, these are the steps I followed.
I used the Docker image compute.nvidia.com/nvidia/cuda.

Inside the container, I ran:
wget https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh
bash Anaconda2-4.2.0-Linux-x86_64.sh
export CMAKE_PREFIX_PATH=/root/anaconda2/
conda install numpy mkl setuptools cmake gcc cffi
conda install -c soumith magma-cuda80
git clone https://github.com/pytorch/pytorch.git
cd pytorch
pip install -r requirements.txt
python setup.py install

The error message:
6 errors detected in the compilation of "/tmp/tmpxft_00000e36_00000000-7_THCTensorSortShort.cpp1.ii".
CMake Error at THC_generated_THCTensorSortShort.cu.o.cmake:267 (message):
Error generating file
/pytorch/torch/lib/build/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortShort.cu.o

make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortShort.cu.o] Error 1
6 errors detected in the compilation of "/tmp/tmpxft_00000e5d_00000000-7_THCTensorSortDouble.cpp1.ii".
CMake Error at THC_generated_THCTensorSortDouble.cu.o.cmake:267 (message):
Error generating file
/pytorch/torch/lib/build/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortDouble.cu.o

make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortDouble.cu.o] Error 1
6 errors detected in the compilation of "/tmp/tmpxft_00000e4c_00000000-7_THCTensorSortChar.cpp1.ii".
CMake Error at THC_generated_THCTensorSortChar.cu.o.cmake:267 (message):
Error generating file
/pytorch/torch/lib/build/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortChar.cu.o

make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortChar.cu.o] Error 1
6 errors detected in the compilation of "/tmp/tmpxft_00000e78_00000000-7_THCTensorSortInt.cpp1.ii".
CMake Error at THC_generated_THCTensorSortInt.cu.o.cmake:267 (message):
Error generating file
/pytorch/torch/lib/build/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortInt.cu.o

make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortInt.cu.o] Error 1
make[1]: *** [CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
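
A log like this is easier to read once the failing units are listed on their own. The sketch below pulls the names of the failed CUDA object files out of make's `*** [...] Error` lines; the function name `failed_cuda_units` and the regular expression are assumptions written against the log format shown above, not part of any PyTorch tooling.

```python
import re

# Matches make's failure lines for generated THC objects, for example:
# make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortInt.cu.o] Error 1
FAILED_OBJECT = re.compile(
    r"^make\[\d+\]: \*\*\* \[.*?THC_generated_(\w+)\.cu\.o\] Error \d+"
)


def failed_cuda_units(log_text):
    """Return the names of CUDA units whose objects failed to build, in log order."""
    names = []
    for line in log_text.splitlines():
        match = FAILED_OBJECT.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Excerpt of the make output quoted above.
sample_log = """\
make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortShort.cu.o] Error 1
make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortDouble.cu.o] Error 1
make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortChar.cu.o] Error 1
make[2]: *** [CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortInt.cu.o] Error 1
make[1]: *** [CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
"""

print(failed_cuda_units(sample_log))
```

On this excerpt every failing unit is a THCTensorSort instantiation, which narrows the problem to one family of files rather than the whole THC build.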

@ngimel
Collaborator

ngimel commented Jan 20, 2017

It's better to specify a particular tag for the image rather than rely on :latest being what you want. Can you try the Dockerfile above? If you want to build PyTorch interactively within a container, you can just run the commands from the RUN instructions inside the container.
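
The advice about tags can be checked mechanically. The sketch below treats an image reference as pinned when it carries a digest or a tag other than `latest`; the helper name `is_pinned` is hypothetical, and the parsing is a simplification of Docker's reference grammar, not a full implementation.

```python
def is_pinned(image_ref):
    """Return True if the reference names a digest or a tag other than 'latest'."""
    if "@" in image_ref:
        # Digest form, e.g. name@sha256:<hex>, always identifies one image.
        return True
    # Only the last path segment can hold a tag; an earlier colon
    # belongs to a registry host:port such as localhost:5000.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag given, so Docker falls back to :latest
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")


print(is_pinned("nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04"))  # True
print(is_pinned("compute.nvidia.com/nvidia/cuda"))            # False
```

The second reference is the untagged image from the original report, which resolves to whatever :latest points at when it is pulled.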

@devendrachaplot

It works with the Dockerfile mentioned by @ngimel. Thanks a lot!

@ngimel
Collaborator

ngimel commented Jan 23, 2017

Just a heads-up: this Dockerfile is now broken because of #556.

@soumith
Member Author

soumith commented Jan 23, 2017

@ngimel fixed now

@apaszke
Contributor

apaszke commented Jan 25, 2017

Official Dockerfiles are now merged into master (thanks to @ngimel).

apaszke closed this as completed Jan 25, 2017