This workflow shows how to train neural networks on EC2 instances with GPU support. The goal is a simple and stable setup for training on GPU instances using Docker and the NVIDIA Container Runtime nvidia-docker. A minimal example trains a small CNN built in Keras on MNIST. Compared with training on a laptop CPU, training was about 15 times faster on a p2.xlarge (Tesla K80) and about 30 times faster on a p3.2xlarge (Tesla V100).
- Install Docker
- Install Docker Machine
- Install AWS Command Line Interface
- Build Docker image for CPU
docker build -t docker-keras . -f Dockerfile.cpu
- Run training container (NB: you might have to increase the container resources [link])
docker run docker-keras
- Configure your AWS CLI. Ensure that your account's service limits allow GPU instances [link]
aws configure
- Launch an EC2 instance with Docker Machine. Choose an Ubuntu AMI based on your region (https://cloud-images.ubuntu.com/locator/ec2/). For example, to launch a p2.xlarge EC2 instance (Tesla K80 GPU) named ec2-p2, run the following (NB: change the region, VPC ID and AMI ID to match your setup):
docker-machine create --driver amazonec2 \
--amazonec2-region eu-west-1 \
--amazonec2-ami ami-58d7e821 \
--amazonec2-instance-type p2.xlarge \
--amazonec2-vpc-id vpc-abc \
ec2-p2
- SSH into the instance
docker-machine ssh ec2-p2
- Update NVIDIA drivers and install nvidia-docker (see this blog post for more details)
# update NVIDIA drivers
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-settings nvidia-modprobe
# install nvidia-docker
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb && rm /tmp/nvidia-docker_1.0.1-1_amd64.deb
- Run training container on GPU instance
sudo nvidia-docker run idealo/nvidia-docker-keras
This will pull the Docker image idealo/nvidia-docker-keras from Docker Hub and start the training. The corresponding Dockerfile can be found under Dockerfile.gpu for reference.
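The launch command in the steps above hard-codes the region, AMI ID and VPC ID. As a minimal sketch, the same command could be assembled from environment variables and printed for review before it is run. The function name `build_create_cmd` and the variable names (`REGION`, `AMI_ID`, `VPC_ID`, `INSTANCE_TYPE`, `MACHINE_NAME`) are hypothetical and not part of this repository; the defaults are the placeholder values from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): builds the
# docker-machine launch command from environment variables.
# Defaults are the placeholder values used in the steps above.
build_create_cmd() {
  printf 'docker-machine create --driver amazonec2 --amazonec2-region %s --amazonec2-ami %s --amazonec2-instance-type %s --amazonec2-vpc-id %s %s\n' \
    "${REGION:-eu-west-1}" \
    "${AMI_ID:-ami-58d7e821}" \
    "${INSTANCE_TYPE:-p2.xlarge}" \
    "${VPC_ID:-vpc-abc}" \
    "${MACHINE_NAME:-ec2-p2}"
}

# Dry run: print the command for review. Nothing is launched here.
build_create_cmd
```

Setting a variable before the call, for example `REGION=us-east-1 build_create_cmd`, changes only that value in the printed command. Once the output looks right for your account, it can be copied into the terminal.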
We trained the CNN on MNIST for 3 epochs (~98% accuracy on the validation set). Training times:
• MacBook Pro (2.8 GHz Intel Core i7, 16GB RAM): 620 seconds
• p2.xlarge (Tesla K80): 41 seconds
• p3.2xlarge (Tesla V100): 20 seconds
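The speedup relative to the CPU baseline is the CPU training time divided by the GPU training time. A small sketch of that arithmetic, using only the timings listed above (the function name `speedup` is hypothetical):

```shell
#!/usr/bin/env bash
# Speedup relative to the CPU baseline: cpu_seconds / gpu_seconds,
# rounded to one decimal place.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.1f\n", cpu / gpu }'
}

CPU_SECONDS=620   # MacBook Pro timing from the list above

echo "p2.xlarge:  $(speedup "$CPU_SECONDS" 41)x"   # prints 15.1x
echo "p3.2xlarge: $(speedup "$CPU_SECONDS" 20)x"   # prints 31.0x
```

These figures compare a laptop CPU with cloud GPUs over a single 3-epoch run, so they are best read as rough indications rather than benchmarks.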
See LICENSE for details.