digits-gan docker image #2

Closed
ljstrnadiii opened this issue May 17, 2017 · 12 comments

ljstrnadiii commented May 17, 2017

Hi @gheinrich ,

I am attempting to write a Dockerfile to build an image for digits-gan. I think this would be a lot easier than installing everything from scratch. I have been trying to familiarize myself with the Dockerfile for DIGITS5. TensorFlow requires cuDNN and, from what I understand, DIGITS5 also requires cuDNN. Since cuDNN is not available through apt, I am not sure how to properly include it in the Dockerfile.

This is what I have added to the DIGITS5 dockerfile so far:

python-h5py \
python-numpy \
python-protobuf \
python-scipy \

and

# DIGITS install
ENV DIGITS_ROOT=~/digits
RUN git clone https://github.com/gheinrich/DIGITS-GAN.git $DIGITS_ROOT
RUN pip install -r $DIGITS_ROOT/requirements.txt

# Tensorflow install
ENV TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
RUN pip install --upgrade $TF_BINARY_URL


# Allow plugin and GAN implementation
RUN pip install -e $DIGITS_ROOT
RUN pip install -e $DIGITS_ROOT/plugins/data/gan/
RUN pip install -e $DIGITS_ROOT/plugins/view/gan/
  1. When I run the image with /bin/bash and simply test TensorFlow in the Python interpreter, it says it can't find libcudnn and cudnn.
  2. When I open the DIGITS web app and select the GAN model, there is no TensorFlow custom model option.
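
For problem 1, a quick way to see what the dynamic loader can find from inside the container is sketched below. This is only a diagnostic sketch: the function names are made up for illustration, and the assumption that TensorFlow opens the unversioned name `libcudnn.so` is not something this check can confirm on its own.

```python
import ctypes
import ctypes.util


def find_shared_library(name):
    """Return the loader-visible name for lib<name>, or None if not found."""
    return ctypes.util.find_library(name)


def can_load(soname):
    """Return True if the dynamic loader can open the given shared object."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Resolves to whichever libcudnn the loader knows about, if any.
    print("find_library('cudnn') ->", find_shared_library("cudnn"))
    # Assumption: the unversioned file name is the one being looked up.
    print("can load libcudnn.so  ->", can_load("libcudnn.so"))
```

If the first line prints a versioned name but the second prints False, the library is installed and only the unversioned file name is missing.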

Do you know what it would take to complete the Dockerfile? Also, do you plan to push this "experimental feature" to DIGITS5 soon?

cheers

@gheinrich
Owner

Hello, I have also been using Docker to containerize this. My Dockerfile looks like:

FROM nvidia/caffe:0.15
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

### installation ###

RUN apt-get update && apt-get install -y --no-install-recommends \
            torch7-nv=0.9.99-1+cuda8.0 \
            graphviz \
            g++ \
            git \
            python-dev \
            python-pip \
            python-six \
            python-requests \
            python-flask \
            python-gevent \
            python-flaskext.socketio \
            python-flaskext.wtf \
            python-wtforms \
            python-wxgtk2.8 \
            python-pydot \
            python-lmdb \
            python-pil \
            python-skimage \
            python-matplotlib \
            python-caffe-nv \
            caffe-nv \
            gunicorn \
            nginx \
            xterm \
            nano \
            supervisor \
            curl \
            libprotobuf-dev && \
    rm -rf /var/lib/apt/lists/*

# Custom version of DIGITS
ENV DIGITS_VERSION=b645cd1
ENV DIGITS_HOME=/source/digits
RUN mkdir -p $DIGITS_HOME
RUN GIT_SSL_NO_VERIFY=true git clone https://github.com/gheinrich/DIGITS-GAN.git $DIGITS_HOME && cd $DIGITS_HOME && git checkout $DIGITS_VERSION
RUN pip install -e $DIGITS_HOME

# Install Tensorflow wheel
RUN pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl

### configuration ###

VOLUME /data
VOLUME /jobs

# Fix issue in non-interactive mode
RUN mkdir -p ~/.config/matplotlib && echo "backend:agg" > ~/.config/matplotlib/matplotlibrc

# DIGITS config
ENV DIGITS_JOBS_DIR=/jobs
ENV DIGITS_LOGFILE_FILENAME=/jobs/digits.log

# GAN example does not support multiple GPUs
ENV CUDA_VISIBLE_DEVICES=0

# These are the ports DIGITS and Tensorboard listen to
EXPOSE 5000

# Tensorflow references unversioned CuDNN lib
RUN ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.5 /usr/lib/x86_64-linux-gnu/libcudnn.so

# configuration file for commands to run on start-up
COPY supervisord.app.conf /etc/supervisor/conf.d/supervisord.conf

# supervisord will run all the commands we specify in supervisord.conf
ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
# ENTRYPOINT ["/bin/bash"]
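
The `ln -s` step near the end of the Dockerfile above gives the versioned cuDNN library an unversioned name. The same idea is sketched in Python below, run against a temporary directory so it does not touch the system. The function name `link_unversioned` is hypothetical, and unlike the Dockerfile line it creates a relative link and picks the highest version it finds rather than hard-coding `.5`.

```python
import os
import re
import tempfile


def link_unversioned(lib_dir, base="libcudnn.so"):
    """Create lib_dir/<base> as a symlink to lib_dir/<base>.N if it is missing.

    Returns the versioned file name that was linked, or None if nothing was done.
    """
    target = os.path.join(lib_dir, base)
    if os.path.lexists(target):
        return None  # the unversioned name already exists
    pattern = re.compile(r"^" + re.escape(base) + r"\.(\d+)$")
    versioned = [n for n in os.listdir(lib_dir) if pattern.match(n)]
    if not versioned:
        return None  # no versioned library to point at
    # Pick the highest major version present in the directory.
    newest = max(versioned, key=lambda n: int(pattern.match(n).group(1)))
    os.symlink(newest, target)  # relative link inside lib_dir
    return newest


if __name__ == "__main__":
    # Demonstration on a throwaway directory with a fake library file.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "libcudnn.so.5"), "w").close()
        print("linked to:", link_unversioned(demo_dir))
        print("contents :", sorted(os.listdir(demo_dir)))
```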

@ljstrnadiii
Author

Great! I'll give it a try. Thanks for all the great work, @gheinrich.

cheers

@gheinrich
Owner

Sample supervisord.conf for reference:

[supervisord]
nodaemon=true

[program:digits_srv]
command=python2 -m digits

[program:tensorboard]
command=tensorboard --logdir=/jobs --reload_interval=20

@ljstrnadiii
Author

The Dockerfile works. I am able to train the tensorflow model and the encoder and everything is converging. I have run into two problems:

  1. There is no GAN option for "Select Visualization Method"
  2. Viewing TensorBoard remotely through a container running on GCP is a bit difficult to figure out.

My thoughts:

  1. The GAN visualization method was introduced in a commit later than
    ENV DIGITS_VERSION=b645cd1
    So I'll go grab your most recent commit SHA and give that a shot.

  2. When I run the image, I think I should call:
    nvidia-docker run -d -p 80:5000 -p 90:6006 digitsgan
    This way I can go to server_IP:90 in my browser to view TensorBoard (TensorBoard runs on port 6006).
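
To confirm that the two mapped ports are actually reachable from outside, a small TCP check like the one below can be run from the client machine. It is a sketch only: `port_is_open` is a name chosen for illustration, the host is a placeholder to be replaced with the real server address, and a successful connection only shows that something is listening, not that it is DIGITS or TensorBoard.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # placeholder: replace with server_IP
    # Host ports taken from the -p 80:5000 -p 90:6006 mapping above.
    for name, port in (("DIGITS", 80), ("TensorBoard", 90)):
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(name, port, state)
```

On a cloud host, a port that is open inside the machine but "not reachable" from outside usually points at a firewall rule rather than at Docker.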

@gheinrich
Owner

You can add this to your Dockerfile after DIGITS install to install GAN plug-ins:

RUN pip install -e $DIGITS_HOME/plugins/data/gan/
RUN pip install -e $DIGITS_HOME/plugins/view/gan/

@gheinrich
Owner

I used --net=host on the docker run command line to expose all ports in the container to the host. Maybe not the cleanest way to do it, but it's simple.

@ljstrnadiii
Author

I attempted that, but it failed (using the original Dockerfile above). I get:

...
ImportError: No module named _tkinter, please install the python-tk package

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /source/digits/plugins/data/gan
Storing debug log for failure in /root/.pip/pip.log
The command '/bin/sh -c pip install -e $DIGITS_HOME/plugins/data/gan' returned a non-zero code: 1
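
The error names the `_tkinter` module, so one way to narrow this down is to check from inside the image whether that module can be imported at all. The helper below is a generic sketch (`module_available` is an illustrative name, not part of DIGITS). It does not explain which package pulls Tk in, only whether the interpreter has it.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported in this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # _tkinter is the C extension named in the ImportError above.
    print("_tkinter importable:", module_available("_tkinter"))
```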

@gheinrich
Owner

Did you add the two lines just before # Install Tensorflow wheel in your Dockerfile?

@gheinrich
Owner

Or maybe use this Dockerfile (which installs python-tk):

FROM nvidia/caffe:0.15
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

### installation ###

RUN apt-get update && apt-get install -y --no-install-recommends \
            torch7-nv=0.9.99-1+cuda8.0 \
            graphviz \
            g++ \
            git \
            python-dev \
            python-pip \
            python-six \
            python-requests \
            python-flask \
            python-gevent \
            python-flaskext.socketio \
            python-flaskext.wtf \
            python-wtforms \
            python-wxgtk2.8 \
            python-pydot \
            python-lmdb \
            python-pil \
            python-skimage \
            python-matplotlib \
            python-caffe-nv \
            python-tk \
            caffe-nv \
            gunicorn \
            nginx \
            xterm \
            nano \
            supervisor \
            curl \
            libprotobuf-dev && \
    rm -rf /var/lib/apt/lists/*

# Custom version of DIGITS
ENV DIGITS_VERSION=b645cd1
ENV DIGITS_HOME=/source/digits
RUN mkdir -p $DIGITS_HOME
RUN GIT_SSL_NO_VERIFY=true git clone https://github.com/gheinrich/DIGITS-GAN.git $DIGITS_HOME && cd $DIGITS_HOME && git checkout $DIGITS_VERSION
RUN pip install -e $DIGITS_HOME

# GAN plug-ins
RUN pip install -e $DIGITS_HOME/plugins/data/gan/
RUN pip install -e $DIGITS_HOME/plugins/view/gan/

# Install Tensorflow wheel
RUN pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl

### configuration ###

VOLUME /data
VOLUME /jobs

# Fix issue in non-interactive mode
RUN mkdir -p ~/.config/matplotlib && echo "backend:agg" > ~/.config/matplotlib/matplotlibrc

# DIGITS config
ENV DIGITS_JOBS_DIR=/jobs
ENV DIGITS_LOGFILE_FILENAME=/jobs/digits.log

# GAN example does not support multiple GPUs
ENV CUDA_VISIBLE_DEVICES=0

# These are the ports DIGITS and Tensorboard listen to
EXPOSE 5000

# Tensorflow references unversioned CuDNN lib
RUN ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.5 /usr/lib/x86_64-linux-gnu/libcudnn.so

# configuration file for commands to run on start-up
COPY supervisord.lab.conf /etc/supervisor/conf.d/supervisord.conf

# supervisord will run all the commands we specify in supervisord.conf
ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
# ENTRYPOINT ["/bin/bash"]

@ljstrnadiii
Author

ljstrnadiii commented May 17, 2017

Yes, exactly:

# Custom version of DIGITS
ENV DIGITS_VERSION=b645cd1
ENV DIGITS_HOME=/source/digits
RUN mkdir -p $DIGITS_HOME
RUN GIT_SSL_NO_VERIFY=true git clone https://github.com/gheinrich/DIGITS-GAN.git $DIGITS_HOME && cd $DIGITS_HOME && git checkout $DIGITS_VERSION
RUN pip install -e $DIGITS_HOME
RUN pip install -e $DIGITS_HOME/plugins/data/gan
RUN pip install -e $DIGITS_HOME/plugins/view/gan


# Install Tensorflow wheel
RUN pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl

### configuration ###

VOLUME /data
VOLUME /jobs

Not sure where tk is necessary! It doesn't show up when searching your repo anyway.

@ljstrnadiii
Author

Running!

@Arsey

Arsey commented Mar 16, 2018

Be careful with the latest tornado (version 5 and later), which is a dependency of matplotlib. It has SSL-related issues that make it impossible to build the Docker image. Instead, just install
RUN pip install https://github.com/tornadoweb/tornado/archive/branch4.5.zip
before
RUN pip install -e $DIGITS_HOME
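
To check which tornado ended up in a built image, a version test along these lines may help. It is a minimal sketch: the helper names are illustrative, and it only compares the leading major number, which is all the "version 5 and later" condition above needs.

```python
def major_version(version):
    """Return the leading integer of a dotted version string such as '4.5.3'."""
    return int(version.split(".")[0])


def tornado_is_pre_5(version):
    """Return True if the given tornado version string is older than 5."""
    return major_version(version) < 5


if __name__ == "__main__":
    try:
        import tornado
    except ImportError:
        print("tornado is not installed")
    else:
        # tornado.version is the package's version string.
        print(tornado.version, "pre-5:", tornado_is_pre_5(tornado.version))
```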
