Docker: add mxnet to build tag, switch to horovod/horovod (#845)

* Docker: add mxnet to build tag, switch to horovod/horovod

Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>

* Update docs as well

Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>

* Add HOROVOD_WITH_MXNET=1

Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>
alsrgv committed Feb 21, 2019
1 parent 34529a8 commit d95a9fd226ff73ad9cf7af20a1330b15a654dbb4
Showing with 9 additions and 8 deletions.
  1. +3 −3 Dockerfile
  2. +3 −2 build-docker-images.sh
  3. +3 −3 docs/docker.md
@@ -34,7 +34,7 @@ RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
rm get-pip.py

# Install TensorFlow, Keras, PyTorch and MXNet
-RUN pip install tensorflow-gpu==${TENSORFLOW_VERSION} keras h5py torch==${PYTORCH_VERSION} torchvision ${MXNET_URL}
+RUN pip install 'numpy<1.15.0' tensorflow-gpu==${TENSORFLOW_VERSION} keras h5py torch==${PYTORCH_VERSION} torchvision ${MXNET_URL}

# Install Open MPI
RUN mkdir /tmp/openmpi && \
@@ -50,7 +50,7 @@ RUN mkdir /tmp/openmpi && \

# Install Horovod, temporarily using CUDA stubs
RUN ldconfig /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs && \
-HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod && \
+HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir horovod && \
ldconfig

# Create a wrapper for OpenMPI to allow running as root by default
@@ -79,7 +79,7 @@ RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_confi

# Download examples
RUN apt-get install -y --no-install-recommends subversion && \
-svn checkout https://github.com/uber/horovod/trunk/examples && \
+svn checkout https://github.com/horovod/horovod/trunk/examples && \
rm -rf /examples/.svn

WORKDIR "/examples"
@@ -11,7 +11,8 @@ function build_one()
horovod_version=$(docker run ${tag} pip freeze | grep ^horovod= | awk -F== '{print $2}')
tensorflow_version=$(docker run ${tag} pip freeze | grep ^tensorflow-gpu= | awk -F== '{print $2}')
pytorch_version=$(docker run ${tag} pip freeze | grep ^torch= | awk -F== '{print $2}')
-final_tag=uber/horovod:${horovod_version}-tf${tensorflow_version}-torch${pytorch_version}-py${py}
+mxnet_version=$(docker run ${tag} pip freeze | grep ^mxnet | awk -F== '{print $2}')
+final_tag=horovod/horovod:${horovod_version}-tf${tensorflow_version}-torch${pytorch_version}-mxnet${mxnet_version}-py${py}
docker tag ${tag} ${final_tag}
docker rmi ${tag}
}
@@ -24,4 +25,4 @@ build_one 2.7
build_one 3.5

# print recent images
-docker images uber/horovod
+docker images horovod/horovod
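
The tag construction in `build_one` can be tried without building an image. The following is a minimal sketch, not part of the commit: it feeds a hypothetical `pip freeze` listing (the package versions, the `mxnet-cu90` package name, the `sample_freeze` variable, and the `extract_version` helper are all illustrative assumptions) through the same `grep`/`awk` pipeline and assembles the new `horovod/horovod` tag.

```shell
#!/bin/sh
# Hypothetical `pip freeze` output; names and versions are illustrative only.
sample_freeze='horovod==0.16.0
mxnet-cu90==1.4.0
tensorflow-gpu==1.12.0
torch==1.0.0'

# Same grep/awk pipeline as build_one, but reading from the sample string
# instead of `docker run ${tag} pip freeze`.
extract_version() {
    printf '%s\n' "$sample_freeze" | grep "$1" | awk -F== '{print $2}'
}

horovod_version=$(extract_version '^horovod=')
tensorflow_version=$(extract_version '^tensorflow-gpu=')
pytorch_version=$(extract_version '^torch=')
# No trailing "=" here, so a suffixed package name such as mxnet-cu90 still matches.
mxnet_version=$(extract_version '^mxnet')

py=3.5
final_tag="horovod/horovod:${horovod_version}-tf${tensorflow_version}-torch${pytorch_version}-mxnet${mxnet_version}-py${py}"
echo "$final_tag"
```

With this sample input the script prints `horovod/horovod:0.16.0-tf1.12.0-torch1.0.0-mxnet1.4.0-py3.5`. The looser `^mxnet` pattern is presumably there because MXNet is installed from `${MXNET_URL}` and its package name may carry a CUDA suffix; that reading is an assumption, not something the commit states.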
@@ -4,23 +4,23 @@ To streamline the installation process on GPU machines, we have published the re
you can get started with Horovod in minutes. The container includes [Examples](../examples) in the `/examples`
directory.

-Pre-built docker containers with Horovod are available on [DockerHub](https://hub.docker.com/r/uber/horovod).
+Pre-built docker containers with Horovod are available on [DockerHub](https://hub.docker.com/r/horovod/horovod).

### Building

Before building, you can modify `Dockerfile` to your liking, e.g. select a different CUDA, TensorFlow or Python version.

```bash
$ mkdir horovod-docker
-$ wget -O horovod-docker/Dockerfile https://raw.githubusercontent.com/uber/horovod/master/Dockerfile
+$ wget -O horovod-docker/Dockerfile https://raw.githubusercontent.com/horovod/horovod/master/Dockerfile
$ docker build -t horovod:latest horovod-docker
```

### Running on a single machine

After the container is built, run it using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker).

-**Note**: you can replace `horovod:latest` with the [specific](https://hub.docker.com/r/uber/horovod/tags) pre-build
+**Note**: you can replace `horovod:latest` with the [specific](https://hub.docker.com/r/horovod/horovod/tags) pre-build
Docker container with Horovod instead of building it by yourself

```bash
