# Docker Image
We create a Docker image that will be used to run `TrainTestClassifier.py` on the Batch AI cluster.

In [None]:
import os
from os import path
import json
import shutil
import dotenv
%load_ext dotenv

Create a `.env` file to hold the Docker login and the image repository name to be used.

In [None]:
dotenv_path = os.path.join('.', '.env')  # The location of the dotenv file
if os.path.isfile(dotenv_path):          # Remove any pre-existing dotenv file to ensure a blank slate
    os.remove(dotenv_path)
with open(dotenv_path, 'w'):             # Create an empty dotenv file
    pass
# Your docker login and image repository name
dotenv.set_key(dotenv_path, 'docker_login', 'YOUR_DOCKER_LOGIN')
dotenv.set_key(dotenv_path, 'image_repo', '/mlbaiht')

Load the contents of the file into the environment. The `-o` flag overrides any variables that are already set.

In [None]:
%dotenv -o
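
To confirm that both settings reached the environment after the `%dotenv -o` step, a small check along these lines could be run. This is a minimal sketch using only the standard library; the helper name `require_env` is hypothetical and is not part of the notebook or of python-dotenv.

```python
import os


def require_env(names, environ=None):
    """Return the requested variables as a dict, raising if any is unset or empty.

    `require_env` is a hypothetical helper, shown for illustration only.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in names}


# An explicit mapping stands in for os.environ so the example runs anywhere.
settings = require_env(
    ['docker_login', 'image_repo'],
    environ={'docker_login': 'YOUR_DOCKER_LOGIN', 'image_repo': '/mlbaiht'},
)
print(settings['docker_login'] + settings['image_repo'])
```

In the notebook itself, calling `require_env(['docker_login', 'image_repo'])` with no second argument would check the real environment and fail early, rather than at the later `os.getenv(...) + os.getenv(...)` concatenation.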

Create the Docker directory.

In [None]:
!mkdir -p Docker

Add to the directory the pip requirements file, which lists the Python modules, with pinned versions, needed to run the training script.

In [None]:
%%writefile Docker/requirements.txt

pandas==0.23.4
scikit-learn==0.19.1
lightgbm==2.1.2
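
Each module in the requirements file is pinned to an exact version with `==`, which helps keep image builds repeatable. As a hedged illustration, the sketch below flags any requirement line that lacks such a pin. The function name `unpinned_requirements` is hypothetical, and the check is deliberately simple: it does not understand the full pip requirements syntax.

```python
def unpinned_requirements(text):
    """Return the requirement lines that lack an exact `==` version pin."""
    unpinned = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if line and '==' not in line:
            unpinned.append(line)
    return unpinned


# The same contents that the cell above writes to Docker/requirements.txt.
requirements = """
pandas==0.23.4
scikit-learn==0.19.1
lightgbm==2.1.2
"""
print(unpinned_requirements(requirements))  # prints []
```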


Add to the directory the Dockerfile that specifies the build.

In [None]:
%%writefile Docker/Dockerfile

# Start from the official Python 3.5.6 image, which is based on Debian Stretch.
FROM python:3.5.6-stretch

# Copy into the image the definition of the requirements.
COPY requirements.txt .

# Install the requirements.
RUN python -m pip install -r requirements.txt

# Define the default command, which lists the installed Python packages.
CMD ["python", "-m", "pip", "freeze"]


Create the Docker image. The first time this is run, it could take under a minute.

In [None]:
%%time
image_name = os.getenv('docker_login') + os.getenv('image_repo')
print('Creating Docker image {}'.format(image_name))
!docker build -t $image_name Docker --no-cache
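
Outside a notebook, where the `!` shell escape is unavailable, the same build could be driven from plain Python. The sketch below only assembles the argument list; the function name `docker_build_command` and its defaults are assumptions for illustration, and the call that would actually start the build is left commented out because it requires Docker to be installed.

```python
import subprocess


def docker_build_command(image_name, context_dir='Docker', no_cache=True):
    """Assemble a `docker build` argument list equivalent to the cell above."""
    command = ['docker', 'build', '-t', image_name]
    if no_cache:
        command.append('--no-cache')  # rebuild every layer from scratch
    command.append(context_dir)       # directory holding the Dockerfile
    return command


command = docker_build_command('YOUR_DOCKER_LOGIN/mlbaiht')
print(' '.join(command))

# To run the build for real (requires a working Docker installation):
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than as a single string avoids shell quoting problems if the image name or directory ever contains unusual characters.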

Push the image to the Docker repository.

In [None]:
%%time
!docker push $image_name

We can now test our image with our script locally. The `--volume` argument maps the current local directory, which contains our data and script, to `/data` in the container. We then call Python with the path to the `TrainTestClassifier.py` script and pass it, via `--inputs`, the path to the directory that contains the input files.

This should take less than two minutes.

In [None]:
%%time
!docker run --volume $(pwd):/data $image_name python /data/TrainTestClassifier.py --inputs /data
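
To make the mapping between host and container paths explicit, the `docker run` invocation can be assembled piece by piece. This is a sketch under stated assumptions: the function name `docker_run_command` and its parameters are hypothetical, and only the command construction is shown, not the run itself.

```python
import os
import posixpath


def docker_run_command(image_name, host_dir, script='TrainTestClassifier.py',
                       container_dir='/data'):
    """Assemble a `docker run` argument list equivalent to the cell above."""
    # HOST:CONTAINER pair, so files in host_dir appear under container_dir.
    volume = '{}:{}'.format(os.path.abspath(host_dir), container_dir)
    # Paths inside the container are POSIX paths regardless of the host OS.
    script_path = posixpath.join(container_dir, script)
    return ['docker', 'run', '--volume', volume, image_name,
            'python', script_path, '--inputs', container_dir]


command = docker_run_command('YOUR_DOCKER_LOGIN/mlbaiht', os.getcwd())
print(' '.join(command))
```

Because the script and the input files share the mounted directory, the same container path (`/data`) serves both as the script's location and as the value of `--inputs`.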

In [the next notebook](02_Configure_Batch_AI.ipynb), we create a file to contain the Batch AI configuration we will use.