Benchmarking for data science algorithms workflow: Example #1
====

**Algorithm:** Support Vector Classification of digit images (with scikit-learn). Python code [here](https://github.com/edwardchalstrey1/benchmarking_test/blob/master/classifier/classifier.py).

**Task:** Compare the speed and performance of the algorithm in different environments and across different versions.

### Before getting started

1) [Install Docker](https://docs.docker.com/v17.09/engine/installation/) on all machines required for benchmarking and set up your account on [Docker Hub](https://hub.docker.com). In this example, we are just using the local machine. 

2) Make sure the software (data science algorithm) you are developing is version controlled so you can push newer versions to GitHub

3) Your algorithm/code should output benchmark data that can be collected when it is run
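
As a rough illustration of step 3, the sketch below times a fit step and a predict step and returns the timings in a dict. The names `benchmark`, `toy_fit` and `toy_predict` are hypothetical and are not taken from `classifier.py`; only the idea of returning timings keyed by readable labels mirrors the results shown later in this notebook.

```python
import time


def benchmark(fit, predict, train_data, test_data):
    """Time a fit step and a predict step and return predictions plus timings.

    `fit` and `predict` are hypothetical callables standing in for a real
    model; they are not the API of classifier.py.
    """
    start = time.perf_counter()
    model = fit(train_data)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = predict(model, test_data)
    prediction_time = time.perf_counter() - start

    results = {
        "Training time (s)": training_time,
        "Prediction time (s)": prediction_time,
    }
    return predictions, results


def toy_fit(data):
    # "Training" here is just computing the mean of the training values.
    return sum(data) / len(data)


def toy_predict(model, data):
    # Label each value by whether it is above the learned mean.
    return [x > model for x in data]


predictions, results = benchmark(toy_fit, toy_predict, [1, 2, 3, 4], [0, 5])
print(predictions)
print(results)
```

Returning the timings as a plain dict keeps them easy to print from a container and to tabulate afterwards.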

In [1]:
from IPython.display import HTML, display
import tabulate
import ast

Run our algorithm locally and collect benchmark results
---

In [2]:
import classifier

In [3]:
local_report, local_results = classifier.results()

In [23]:
print(local_report)  # scikit-learn's performance stats

              precision    recall  f1-score   support

           0       1.00      0.65      0.79        88
           1       1.00      0.74      0.85        91
           2       1.00      0.64      0.78        86
           3       1.00      0.64      0.78        91
           4       1.00      0.55      0.71        92
           5       0.93      0.98      0.95        91
           6       1.00      0.68      0.81        91
           7       1.00      0.49      0.66        89
           8       0.25      1.00      0.40        88
           9       1.00      0.61      0.76        92

   micro avg       0.70      0.70      0.70       899
   macro avg       0.92      0.70      0.75       899
weighted avg       0.92      0.70      0.75       899



Build a Docker image for our algorithm and push to Docker Hub
---

1) Create a Docker image that installs your software on a Linux distribution with only the essential dependencies and outputs the benchmark stats. Example Dockerfile [here](https://github.com/edwardchalstrey1/benchmarking_test/blob/master/classifier/Dockerfile).

2) Build the Docker image and tag it as latest. Optionally also tag a version number: ```docker build -t edwardchalstrey/classifier:latest -t edwardchalstrey/classifier:1.0 .```

3) Push to Docker Hub. This allows you to then pull the image onto any machine you wish to benchmark on
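
If you want to script steps 2 and 3 rather than type them, one possible approach is to generate the commands from a list of tags. The sketch below only builds and prints the command lines and does not run Docker; the helper name `docker_release_commands` is hypothetical.

```python
import shlex


def docker_release_commands(repository, tags, context="."):
    """Return docker CLI commands that build one image under several tags
    and then push each tag. Nothing is executed here."""
    build = ["docker", "build"]
    for tag in tags:
        build += ["-t", f"{repository}:{tag}"]
    build.append(context)  # build context directory, "." in this notebook
    pushes = [["docker", "push", f"{repository}:{tag}"] for tag in tags]
    return [build] + pushes


commands = docker_release_commands("edwardchalstrey/classifier", ["latest", "1.0"])
for command in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

To actually run them, each list could be passed to `subprocess.run(command, check=True)` on a machine where Docker is installed and you are logged in to Docker Hub.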

In [4]:
%%bash
docker build -t edwardchalstrey/classifier:latest .

Sending build context to Docker daemon  56.32kB
Step 1/8 : FROM python:3
 ---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
 ---> Using cache
 ---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
 ---> Using cache
 ---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
 ---> Using cache
 ---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
 ---> Using cache
 ---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
 ---> Using cache
 ---> 584d3a9d4955
Step 7/8 : COPY display_classifier_results.py /display_classifier_results.py
 ---> Using cache
 ---> aaa2e8a68fa3
Step 8/8 : CMD python3 display_classifier_results.py
 ---> Using cache
 ---> 9cfc28a647ce
Successfully built 9cfc28a647ce
Successfully tagged edwardchalstrey/classifier:latest


In [5]:
%%bash
docker push edwardchalstrey/classifier

The push refers to repository [docker.io/edwardchalstrey/classifier]
c9a5858f6be4: Preparing
da35d2adcbff: Preparing
f12d5b3c5c82: Preparing
337d3babfd9c: Preparing
493622b04a5f: Preparing
65ef2276d16f: Preparing
4b381ae03f9a: Preparing
08a5b66845ac: Preparing
65ef2276d16f: Waiting
4b381ae03f9a: Waiting
08a5b66845ac: Waiting
88a85bcf8170: Preparing
65860ac81ef4: Preparing
a22a5ac18042: Preparing
65860ac81ef4: Waiting
88a85bcf8170: Waiting
6257fa9f9597: Preparing
578414b395b9: Preparing
abc3250a6c7f: Preparing
13d5529fd232: Preparing
578414b395b9: Waiting
abc3250a6c7f: Waiting
13d5529fd232: Waiting
6257fa9f9597: Waiting
a22a5ac18042: Waiting
da35d2adcbff: Layer already exists
c9a5858f6be4: Layer already exists
493622b04a5f: Layer already exists
f12d5b3c5c82: Layer already exists
337d3babfd9c: Layer already exists
4b381ae03f9a: Layer already exists
65860ac81ef4: Layer already exists
65ef2276d16f: Layer already exists
08a5b66845ac: Layer already exists
88a85bcf8170: Layer already exists
6

In [6]:
%%bash
docker pull edwardchalstrey/classifier

Using default tag: latest
latest: Pulling from edwardchalstrey/classifier
Digest: sha256:e8cdb53e7a420b9b408a65ad06939932d6429e118fa4de3f549cac1f069be33c
Status: Image is up to date for edwardchalstrey/classifier:latest


Run the docker container and collect benchmark stats
-----

Here the container prints the results to stdout, which the notebook captures, but you could instead save the benchmark stats to a file within the container and then use ```docker cp``` to copy it out of the container.

In [7]:
%%bash --out docker_results
docker run edwardchalstrey/classifier:latest

In [8]:
docker_results = ast.literal_eval(docker_results)
docker_report, docker_results = docker_results
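
The parsing step above works because `ast.literal_eval` safely evaluates a string containing a Python literal. The sketch below shows the round trip, assuming the container prints the `repr` of a `(report, results)` tuple, which is what the cell above implies. The helper names and the example values are illustrative only and are not real benchmark output.

```python
import ast


def serialise(report, results):
    # What a script inside the container could print to stdout (assumption).
    return repr((report, results))


def parse(stdout_text):
    # literal_eval only accepts Python literals, so unlike eval it will not
    # execute arbitrary code found in the captured output.
    report, results = ast.literal_eval(stdout_text.strip())
    return report, results


# Placeholder values, not real benchmark output.
example_report = "precision    recall\n0       1.00      0.50"
example_results = {"Training time (s)": 0.1, "Prediction time (s)": 0.05}

stdout_text = serialise(example_report, example_results) + "\n"
report, results = parse(stdout_text)
print(results["Training time (s)"])
```

This only works if nothing else is written to stdout by the container, so any logging would need to go to stderr or a file.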

How do they compare?
---

In this example, the benchmark stats I have collected are the performance stats measured by scikit-learn, as well as the time taken to fit the classification model and the time it takes to predict the categories of the test data.

Here I have labelled the results from running the algorithm code on my machine directly as "Basic" and the Docker version as "Container".

In [9]:
headers = ["Version"]
c_results = ["Basic 1.0"]
d_results = ["Container 1.0"]
for k, v in local_results.items():
    headers.append(k)
    c_results.append(v)
for k, v in docker_results.items():
    d_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results], tablefmt='html')))

| Version | Training time (s) | Prediction time (s) | Performance (micro avg f1 score) |
| --- | --- | --- | --- |
| Basic 1.0 | 0.10861325263977051 | 0.052420854568481445 | 0.6974416017797553 |
| Container 1.0 | 0.1337735652923584 | 0.07343864440917969 | 0.6974416017797553 |
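
To put a number on the difference between the two rows above, the relative overhead of the container can be computed from the timings. This is a minimal sketch using the values from the table; the helper `percent_overhead` is hypothetical, and because each timing comes from a single run the percentages should be read as indicative only.

```python
def percent_overhead(baseline, candidate):
    """Return how much slower (positive) or faster (negative) the candidate
    is than the baseline, as a percentage per metric."""
    return {
        key: 100.0 * (candidate[key] - baseline[key]) / baseline[key]
        for key in baseline
    }


# Timings copied from the comparison table above.
basic = {
    "Training time (s)": 0.10861325263977051,
    "Prediction time (s)": 0.052420854568481445,
}
container = {
    "Training time (s)": 0.1337735652923584,
    "Prediction time (s)": 0.07343864440917969,
}

overhead = percent_overhead(basic, container)
for name, value in overhead.items():
    print(f"{name}: {value:+.1f}%")
```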


In [10]:
print(local_report) # Performance stats are identical
print(docker_report)

              precision    recall  f1-score   support

           0       1.00      0.65      0.79        88
           1       1.00      0.74      0.85        91
           2       1.00      0.64      0.78        86
           3       1.00      0.64      0.78        91
           4       1.00      0.55      0.71        92
           5       0.93      0.98      0.95        91
           6       1.00      0.68      0.81        91
           7       1.00      0.49      0.66        89
           8       0.25      1.00      0.40        88
           9       1.00      0.61      0.76        92

   micro avg       0.70      0.70      0.70       899
   macro avg       0.92      0.70      0.75       899
weighted avg       0.92      0.70      0.75       899

              precision    recall  f1-score   support

           0       1.00      0.65      0.79        88
           1       1.00      0.74      0.85        91
           2       1.00      0.64      0.78        86
           3       1.00 

Now let's release the next version of our algorithm
----
1) Navigate to Docker Hub, click the [builds tab](https://cloud.docker.com/repository/docker/edwardchalstrey/classifier/builds) for your algorithm's repository and set up [build rules](https://docs.docker.com/docker-hub/builds/) to your liking. Configure it to use the GitHub repo where your algorithm code is maintained. In this case I have set it to simply build a new version of the image and tag it as ```edwardchalstrey/classifier:latest``` whenever there is a push to the master branch of [my GitHub repo](https://github.com/edwardchalstrey1/benchmarking_test).

2) Edit the classifier algorithm to create a new version (e.g. version 1.1)

3) Commit and push changes to GitHub

4) The latest version can then be pulled: ```docker pull edwardchalstrey/classifier:latest```

**Instead of using the automated build, here I do a regular local build:**

In [11]:
%%bash
docker build -t edwardchalstrey/classifier:latest .

Sending build context to Docker daemon  56.32kB
Step 1/8 : FROM python:3
 ---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
 ---> Using cache
 ---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
 ---> Using cache
 ---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
 ---> Using cache
 ---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
 ---> Using cache
 ---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
 ---> 21a5d2032eb4
Step 7/8 : COPY display_classifier_results.py /display_classifier_results.py
 ---> f8cbb7851539
Step 8/8 : CMD python3 display_classifier_results.py
 ---> Running in 4f3510693745
Removing intermediate container 4f3510693745
 ---> cbf3fe391e8e
Successfully built cbf3fe391e8e
Successfully tagged edwardchalstrey/classifier:latest


You can then collect benchmark stats again for the new version of your algorithm and compare to previous versions and other machines/environments
---

In [12]:
%%bash --out docker_results_1
docker run edwardchalstrey/classifier:latest

In [21]:
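# The captured stdout is a string, so parse it first (as was done for version 1.0)
docker_report_1, docker_results_1 = ast.literal_eval(docker_results_1)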

In [22]:
d1_results = ["Container 1.1"]
for k, v in docker_results_1.items():
    d1_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results, d1_results], tablefmt='html')))

| Version | Training time (s) | Prediction time (s) | Performance (micro avg f1 score) |
| --- | --- | --- | --- |
| Basic 1.0 | 0.10861325263977051 | 0.052420854568481445 | 0.6974416017797553 |
| Container 1.0 | 0.1337735652923584 | 0.07343864440917969 | 0.6974416017797553 |
| Container 1.1 | 0.044446706771850586 | 0.04564070701599121 | 0.9688542825361512 |


In [25]:
print(docker_report_1)

              precision    recall  f1-score   support

           0       1.00      0.99      0.99        88
           1       0.99      0.97      0.98        91
           2       0.99      0.99      0.99        86
           3       0.98      0.87      0.92        91
           4       0.99      0.96      0.97        92
           5       0.95      0.97      0.96        91
           6       0.99      0.99      0.99        91
           7       0.96      0.99      0.97        89
           8       0.94      1.00      0.97        88
           9       0.93      0.98      0.95        92

   micro avg       0.97      0.97      0.97       899
   macro avg       0.97      0.97      0.97       899
weighted avg       0.97      0.97      0.97       899

