# Docker Image
We create a Docker image that will run `TrainTestClassifier.py` on the Batch AI cluster.

In [None]:
import os
from os import path
import json
import shutil
import dotenv
%load_ext dotenv

Create a new `.env` file to contain names used in multiple notebooks.

In [None]:
dotenv_path = os.path.join('.', '.env')  # The location of the dotenv file
if os.path.isfile(dotenv_path):          # Remove any pre-existing dotenv file to ensure a blank slate
    os.remove(dotenv_path)
with open(dotenv_path, 'w'):             # Create an empty dotenv file
    pass

Write your Docker login and image repository name to the `.env` file.

In [None]:
dotenv.set_key(dotenv_path, 'docker_login', 'YOUR_DOCKER_LOGIN')
dotenv.set_key(dotenv_path, 'image_repo', '/mlbaiht')

In [None]:
dotenv.set_key(dotenv_path, 'docker_login', 'mabouatmicrosoft')
dotenv.set_key(dotenv_path, 'image_repo', '/mlbaiht')

Import the contents of the `.env` file into the environment, overriding any existing values.

In [None]:
%dotenv -o

Build the name of the Docker image that we are creating from the Docker login and the repository name.

In [None]:
image_name = os.getenv('docker_login') + os.getenv('image_repo')
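
If either variable is missing from the environment, `os.getenv` returns `None` and the concatenation above fails with a `TypeError` that does not say which name is unset. A minimal sketch of a more explicit check follows; `build_image_name` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
import os


def build_image_name(login_var='docker_login', repo_var='image_repo'):
    """Join the Docker login and repository name, failing early if either is unset.

    Hypothetical helper: the variable names default to the keys written
    to the .env file earlier in this notebook.
    """
    login = os.getenv(login_var)
    repo = os.getenv(repo_var)
    # Collect the names of any variables that are unset or empty.
    missing = [name for name, value in ((login_var, login), (repo_var, repo))
               if not value]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return login + repo
```

With the `.env` file loaded, `image_name = build_image_name()` should produce the same string as the cell above.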

Create the Docker directory.

In [None]:
!mkdir -p Docker

Add to the directory the requirements file that specifies the Python modules needed to run the training script.

In [None]:
%%writefile Docker/requirements.txt

lightgbm==2.1.2
pandas==0.23.4
scikit-learn==0.19.1


Add to the directory the dockerfile specifying the build.

In [None]:
%%writefile Docker/Dockerfile

# Start from a Python image
# FROM python:3.5-stretch
FROM microsoft/cntk:2.1-gpu-python3.5-cuda8.0-cudnn6.0

# Copy into the image the definition of the requirements
COPY requirements.txt .

# Install the requirements
RUN python -m pip install -r requirements.txt
    
# Define the default command: call Python and list the installed packages
CMD ["python", "-m", "pip", "freeze"]
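
Because the default command prints `pip freeze`, running the image without arguments gives a listing that can be compared against `requirements.txt`. The following is a rough sketch of such a comparison, not something the original notebook does; `parse_pins` and `unmet_pins` are hypothetical helpers, and the freeze text in the example is made up for illustration rather than captured from the real image.

```python
def parse_pins(text):
    """Return {package: version} for every `name==version` line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()   # drop comments and whitespace
        if '==' in line:
            name, version = line.split('==', 1)
            # Compare names case-insensitively, treating '_' and '-' alike.
            pins[name.strip().lower().replace('_', '-')] = version.strip()
    return pins


def unmet_pins(requirements_text, freeze_text):
    """Map each unmet requirement to (wanted version, installed version or None)."""
    installed = parse_pins(freeze_text)
    return {name: (wanted, installed.get(name))
            for name, wanted in parse_pins(requirements_text).items()
            if installed.get(name) != wanted}


requirements = 'lightgbm==2.1.2\npandas==0.23.4\nscikit-learn==0.19.1\n'
# Made-up freeze output in which pandas has a different version.
example_freeze = 'lightgbm==2.1.2\npandas==0.22.0\nscikit-learn==0.19.1\n'
print(unmet_pins(requirements, example_freeze))
```

An empty dictionary would mean every pinned package appears in the listing at the requested version.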


Build the Docker image. The first run could take almost a minute.

In [None]:
%%time
print('Creating Docker image {}'.format(image_name))
!docker build -t $image_name Docker --no-cache

Push the image to the Docker repository.

In [None]:
%%time
!docker push $image_name

We can now test the image locally with our script. The `--volume` argument maps the current local directory, which contains our data and script, to `/data` in the container. We then call `bash` with a command string that runs Python on the `TrainTestClassifier.py` script, passing script arguments that include the path to the directory containing the input files. The remaining script arguments are the same as those in the last cell of the [training script creation notebook](01_Training_Script.ipynb), and the results should be similar.

This should take around five minutes.

In [None]:
%%time
!docker run --volume $(pwd):/data $image_name bash -c ' python /data/TrainTestClassifier.py --inputs /data --match 5 --estimators 1000 --ngrams 2 --min_child_samples 10 '
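
Outside a notebook, the same command could be assembled as an argument list for `subprocess.run`. The sketch below only builds and prints the list, so it does not need Docker to be installed; `docker_run_args` is a hypothetical helper, and `YOUR_DOCKER_LOGIN/mlbaiht` stands in for the value of `image_name`.

```python
import os
import shlex


def docker_run_args(image_name, host_dir, script_args):
    """Build a `docker run` argument list mirroring the cell above (hypothetical helper)."""
    # The script and its arguments are passed to bash as one command string.
    script_cmd = 'python /data/TrainTestClassifier.py ' + ' '.join(
        shlex.quote(arg) for arg in script_args)
    return ['docker', 'run',
            '--volume', '{}:/data'.format(host_dir),  # map host_dir to /data
            image_name,
            'bash', '-c', script_cmd]


args = docker_run_args(
    'YOUR_DOCKER_LOGIN/mlbaiht',   # placeholder image name
    os.getcwd(),                   # same directory that $(pwd) expands to
    ['--inputs', '/data', '--match', '5', '--estimators', '1000',
     '--ngrams', '2', '--min_child_samples', '10'])

# Print a shell-quoted form of the command for inspection.
print(' '.join(shlex.quote(arg) for arg in args))
# To run it for real: import subprocess; subprocess.run(args, check=True)
```

Passing a list rather than a single shell string avoids a second layer of quoting around the `bash -c` command.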

In [the next notebook](02_Configure_Batch_AI.ipynb), we create a file to contain the Batch AI configuration and some Azure resources we will use to create the Batch AI cluster.