Benchmarking for data science algorithms workflow: Example #1
====

**Algorithm:** Support Vector Classification of handwritten digit images (scikit-learn's `load_digits` dataset of 8x8-pixel digits, not the full MNIST data).

**Benchmarks:** Compare the classifier's training time, prediction time and performance when run directly on the local machine and in a Docker container on the same machine, then compare different versions of the algorithm.

### Before getting started

1) [Install Docker](https://docs.docker.com/v17.09/engine/installation/) on all machines required for benchmarking and set up your account on [Docker Hub](https://hub.docker.com). In this example, we are just using the local machine. 

2) Make sure the software (data science algorithm) you are developing is version controlled, so you can push newer versions to GitHub.

3) Your algorithm/code should output benchmark data that can be collected when it is run.
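
One way to satisfy point 3 is to have the program print its benchmark results as a single machine-readable line. The sketch below uses JSON for this. The function name `emit_benchmarks` and the sample values are illustrative assumptions; the notebook itself prints a plain Python dict instead.

```python
import io
import json
import sys


def emit_benchmarks(results, stream=sys.stdout):
    """Write the benchmark dict as one JSON line so a caller can parse it."""
    stream.write(json.dumps(results, sort_keys=True) + "\n")


# Round trip with made-up numbers (not measurements from this notebook)
buffer = io.StringIO()
emit_benchmarks({"Training time (s)": 0.12, "Prediction time (s)": 0.05}, buffer)
parsed = json.loads(buffer.getvalue())
print(parsed["Training time (s)"])
```

Keeping the output to one line per run makes it easy to collect from a container's stdout or to append to a log file.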

In [1]:
from IPython.display import HTML, display
import tabulate
import ast

Run our algorithm locally and collect benchmark results
---

Version 1.0 and version 1.1 of the code use different classifier models:

In [2]:
%%writefile classifier.py
from sklearn import datasets, svm, metrics
import time

def results():

    digits = datasets.load_digits()

    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))

    expected = digits.target[n_samples // 2:]

    models = [svm.SVC(gamma=0.01),
          svm.SVC(gamma=0.001)]

    ### Version 1.0 ###

    classifier = models[0]

    ### Version 1.1 ###

#     classifier = models[1]

    start = time.time()
    classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
    end = time.time()
    training_time = end - start

    start = time.time()
    predicted = classifier.predict(data[n_samples // 2:])
    end = time.time()
    classifier_time = end - start

    report = metrics.classification_report(expected, predicted, output_dict=True)

    # Micro-averaged F1 score, used as the single performance figure.
    # The 'micro avg' row may be absent in other scikit-learn versions.
    performance = report['micro avg']['f1-score']

    return [metrics.classification_report(expected, predicted),
            {"Training time (s)": training_time,
             "Prediction time (s)": classifier_time,
             "Performance (micro avg f1 score)": performance}]


Writing classifier.py
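
The timing pattern in `results()` can be factored into a small helper. This is a sketch, not part of the original code: `timed` is a hypothetical name, and it uses `time.perf_counter()`, a monotonic high-resolution clock intended for measuring short durations, in place of `time.time()`.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in classifier.py this would be classifier.fit or classifier.predict
total, seconds = timed(sum, range(100_000))
print(f"sum={total}, took {seconds:.6f} s")
```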


In [3]:
import classifier
local_report, local_results = classifier.results()
print(local_report)

              precision    recall  f1-score   support

           0       1.00      0.65      0.79        88
           1       1.00      0.74      0.85        91
           2       1.00      0.64      0.78        86
           3       1.00      0.64      0.78        91
           4       1.00      0.55      0.71        92
           5       0.93      0.98      0.95        91
           6       1.00      0.68      0.81        91
           7       1.00      0.49      0.66        89
           8       0.25      1.00      0.40        88
           9       1.00      0.61      0.76        92

   micro avg       0.70      0.70      0.70       899
   macro avg       0.92      0.70      0.75       899
weighted avg       0.92      0.70      0.75       899



### Get median benchmarks

In [4]:
import classifier
import statistics as st

repeats = 10
tt = []
pt = []
t = []

for i in range(0,repeats):
    
    report, results = classifier.results()
    tt.append(results["Training time (s)"])
    pt.append(results["Prediction time (s)"])
    t.append(results["Performance (micro avg f1 score)"])
    
local_results = {"Training time (s)": st.median(tt), "Prediction time (s)": st.median(pt),
    "Performance (micro avg f1 score)": st.median(t)}
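
A median is less sensitive than a mean to an occasional slow run, which is presumably why it is used here. A minimal sketch of a reusable summary follows, assuming you also want to see the spread across repeats. The helper name `summarise` and the sample timings are illustrative, not values from this notebook.

```python
import statistics as st


def summarise(samples):
    """Return the median together with simple measures of spread."""
    return {
        "median": st.median(samples),
        "min": min(samples),
        "max": max(samples),
        "stdev": st.stdev(samples) if len(samples) > 1 else 0.0,
    }


# Made-up timings in seconds, with one slow outlier
timings = [0.10, 0.11, 0.10, 0.12, 0.50]
summary = summarise(timings)
print(summary["median"], summary["max"])
```

Reporting the minimum and maximum alongside the median shows whether a difference between two environments is larger than the run-to-run variation.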

Build a Docker image for our algorithm and push to Docker Hub
---

1) Write a Dockerfile that installs your software on a Linux distribution with only the essential dependencies and outputs the benchmark stats. Example Dockerfile [here](https://github.com/edwardchalstrey1/benchmarking_test/blob/master/classifier/Dockerfile).

2) Build a Docker image and tag it as latest. Optionally also tag a version: ```docker build -t edwardchalstrey/classifier:latest -t edwardchalstrey/classifier:1.0 .```

3) Push the image to Docker Hub. This allows you to pull it onto any machine you wish to benchmark on.
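
If you prefer to drive these steps from Python rather than from `%%bash` cells, the commands can be assembled as argument lists and passed to `subprocess.run`. The sketch below is an assumption about how one might script it: `docker_commands` and `run_all` are hypothetical helpers, and by default it only prints the commands instead of calling Docker.

```python
import subprocess


def docker_commands(repository, version, context="."):
    """Return the build and push commands for one release as argv lists."""
    latest = f"{repository}:latest"
    tagged = f"{repository}:{version}"
    return [
        ["docker", "build", "-t", latest, "-t", tagged, context],
        ["docker", "push", tagged],
        ["docker", "push", latest],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises if Docker reports an error


commands = docker_commands("edwardchalstrey/classifier", "1.0")
run_all(commands)  # dry run: prints the commands without calling Docker
```

Calling `run_all(commands, dry_run=False)` would execute the build and both pushes in order, stopping at the first failure.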

### The Dockerfile

In [5]:
%%writefile Dockerfile
FROM python:3

RUN apt-get update
RUN pip3 install numpy
RUN pip3 install scipy
RUN pip3 install scikit-learn

COPY classifier.py /classifier.py
COPY iterate_benchmarks.py /iterate_benchmarks.py

CMD python3 iterate_benchmarks.py

Writing Dockerfile


### Script to run benchmarks within Docker container

In [6]:
%%writefile iterate_benchmarks.py 
import classifier
import statistics as st

repeats = 10
tt = []
pt = []
t = []

for i in range(0,repeats):
    
    report, results = classifier.results()
    tt.append(results["Training time (s)"])
    pt.append(results["Prediction time (s)"])
    t.append(results["Performance (micro avg f1 score)"])
    
results = {"Training time (s)": st.median(tt), "Prediction time (s)": st.median(pt),
    "Performance (micro avg f1 score)": st.median(t)}
print(results)

Writing iterate_benchmarks.py


In [7]:
%%bash
docker build -t edwardchalstrey/classifier:1.0 .

Sending build context to Docker daemon  105.5kB
Step 1/8 : FROM python:3
 ---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
 ---> Using cache
 ---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
 ---> Using cache
 ---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
 ---> Using cache
 ---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
 ---> Using cache
 ---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
 ---> Using cache
 ---> 332880ae4763
Step 7/8 : COPY iterate_benchmarks.py /iterate_benchmarks.py
 ---> Using cache
 ---> cdbcdd0db4ae
Step 8/8 : CMD python3 iterate_benchmarks.py
 ---> Using cache
 ---> 7175da4a27c8
Successfully built 7175da4a27c8
Successfully tagged edwardchalstrey/classifier:1.0


In [8]:
%%bash
docker push edwardchalstrey/classifier:1.0

The push refers to repository [docker.io/edwardchalstrey/classifier]
a4b781a8bd5f: Preparing
9f74b5180e0a: Preparing
f12d5b3c5c82: Preparing
337d3babfd9c: Preparing
493622b04a5f: Preparing
65ef2276d16f: Preparing
4b381ae03f9a: Preparing
08a5b66845ac: Preparing
88a85bcf8170: Preparing
65860ac81ef4: Preparing
a22a5ac18042: Preparing
6257fa9f9597: Preparing
578414b395b9: Preparing
abc3250a6c7f: Preparing
13d5529fd232: Preparing
65ef2276d16f: Waiting
4b381ae03f9a: Waiting
6257fa9f9597: Waiting
578414b395b9: Waiting
08a5b66845ac: Waiting
abc3250a6c7f: Waiting
13d5529fd232: Waiting
88a85bcf8170: Waiting
65860ac81ef4: Waiting
a22a5ac18042: Waiting
a4b781a8bd5f: Layer already exists
9f74b5180e0a: Layer already exists
337d3babfd9c: Layer already exists
f12d5b3c5c82: Layer already exists
493622b04a5f: Layer already exists
4b381ae03f9a: Layer already exists
08a5b66845ac: Layer already exists
88a85bcf8170: Layer already exists
65860ac81ef4: Layer already exists
65ef2276d16f: Layer already exists

Run the docker container and collect benchmark stats
-----

Here I print the results to stdout and capture them in the notebook, but you could instead save the benchmark stats to a file within the container and then use ```docker cp``` to copy it out of the container.

In [9]:
%%bash --out docker_results
docker run edwardchalstrey/classifier:1.0

In [10]:
docker_results = ast.literal_eval(docker_results)
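
The cell above assumes the container prints nothing but the results dict. A slightly more defensive sketch follows, assuming the dict is the last non-empty line of the output. The name `parse_benchmark_output` and the sample text are hypothetical.

```python
import ast


def parse_benchmark_output(stdout_text):
    """Parse the last non-empty line of captured stdout as a Python literal."""
    lines = [line for line in stdout_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output captured from the container")
    return ast.literal_eval(lines[-1])


# Illustrative captured output: a warning followed by the results dict
sample = "some warning text\n{'Training time (s)': 0.15, 'Prediction time (s)': 0.07}\n"
parsed_results = parse_benchmark_output(sample)
print(parsed_results["Training time (s)"])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text that comes from another process.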

How do they compare?
---

In this example, the benchmark stats I have collected are the performance stats measured by scikit-learn, the time taken to fit the classification model, and the time taken to predict the categories of the test data.

Here I have labelled the results from running the algorithm code on my machine directly as "Basic" and the Docker version as "Container".

In [11]:
headers = ["Version"]
c_results = ["Basic 1.0"]
d_results = ["Container 1.0"]
for k, v in local_results.items():
    headers.append(k)
    c_results.append(v)
for k, v in docker_results.items():
    d_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results], tablefmt='html')))

| Version | Training time (s) | Prediction time (s) | Performance (micro avg f1 score) |
|---|---|---|---|
| Basic 1.0 | 0.10748064517974854 | 0.05083489418029785 | 0.6974416017797553 |
| Container 1.0 | 0.14680159091949463 | 0.07211601734161377 | 0.6974416017797553 |
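
To put a number on the difference, each container timing can be divided by the corresponding local timing. The sketch below uses the figures from the table above. The function name `relative_overhead` is an assumption, and a single set of medians from one machine should be read as indicative only.

```python
def relative_overhead(baseline, candidate):
    """Return candidate/baseline for every timing key present in both dicts."""
    return {
        key: candidate[key] / baseline[key]
        for key in baseline
        if key in candidate and key.endswith("time (s)")
    }


# Values copied from the table above
basic = {"Training time (s)": 0.10748064517974854,
         "Prediction time (s)": 0.05083489418029785}
container = {"Training time (s)": 0.14680159091949463,
             "Prediction time (s)": 0.07211601734161377}

ratios = relative_overhead(basic, container)
for name, ratio in ratios.items():
    print(f"{name}: container took {ratio:.2f}x the local time")
```

With these numbers, the container run took roughly 1.4 times as long as the local run for both training and prediction, while the performance score was identical.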


Now let's release the next version of our algorithm
----
1) Navigate to Docker Hub, click the [builds tab](https://cloud.docker.com/repository/docker/edwardchalstrey/classifier/builds) for your algorithm's repository and set up [build rules](https://docs.docker.com/docker-hub/builds/) to your liking. Configure it to use the GitHub repo where your algorithm code is maintained. In this case I have set it to simply build a new version of the image and tag it as ```edwardchalstrey/classifier:latest``` whenever there is a push to the master branch of [my GitHub repo](https://github.com/edwardchalstrey1/benchmarking_test). Here I will also tag separate versions.

2) Edit the classifier algorithm to create a new version (e.g. version 1.1)

3) Commit and push changes to GitHub

4) The latest version can then be pulled: ```docker pull edwardchalstrey/classifier:latest```

**Instead of using the automated build here I do a regular build:**

In [12]:
%%writefile classifier.py
from sklearn import datasets, svm, metrics
import time

def results():

    digits = datasets.load_digits()

    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))

    expected = digits.target[n_samples // 2:]

    models = [svm.SVC(gamma=0.01),
          svm.SVC(gamma=0.001)]

    ### Version 1.0 ###

#     classifier = models[0]

    ### Version 1.1 ###

    classifier = models[1]

    start = time.time()
    classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
    end = time.time()
    training_time = end - start

    start = time.time()
    predicted = classifier.predict(data[n_samples // 2:])
    end = time.time()
    classifier_time = end - start

    report = metrics.classification_report(expected, predicted, output_dict=True)

    # Micro-averaged F1 score, used as the single performance figure.
    # The 'micro avg' row may be absent in other scikit-learn versions.
    performance = report['micro avg']['f1-score']

    return [metrics.classification_report(expected, predicted),
            {"Training time (s)": training_time,
             "Prediction time (s)": classifier_time,
             "Performance (micro avg f1 score)": performance}]


Overwriting classifier.py


In [13]:
%%bash
docker build -t edwardchalstrey/classifier:1.1 .

Sending build context to Docker daemon  105.5kB
Step 1/8 : FROM python:3
 ---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
 ---> Using cache
 ---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
 ---> Using cache
 ---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
 ---> Using cache
 ---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
 ---> Using cache
 ---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
 ---> Using cache
 ---> 8ac8658c2951
Step 7/8 : COPY iterate_benchmarks.py /iterate_benchmarks.py
 ---> Using cache
 ---> d74a3fe73239
Step 8/8 : CMD python3 iterate_benchmarks.py
 ---> Using cache
 ---> 1b3f600ea676
Successfully built 1b3f600ea676
Successfully tagged edwardchalstrey/classifier:1.1


In [14]:
%%bash
docker push edwardchalstrey/classifier:1.1

The push refers to repository [docker.io/edwardchalstrey/classifier]
18661e28fe84: Preparing
17e84f583592: Preparing
f12d5b3c5c82: Preparing
337d3babfd9c: Preparing
493622b04a5f: Preparing
65ef2276d16f: Preparing
4b381ae03f9a: Preparing
08a5b66845ac: Preparing
88a85bcf8170: Preparing
4b381ae03f9a: Waiting
65860ac81ef4: Preparing
a22a5ac18042: Preparing
6257fa9f9597: Preparing
578414b395b9: Preparing
abc3250a6c7f: Preparing
13d5529fd232: Preparing
08a5b66845ac: Waiting
88a85bcf8170: Waiting
65860ac81ef4: Waiting
a22a5ac18042: Waiting
6257fa9f9597: Waiting
578414b395b9: Waiting
abc3250a6c7f: Waiting
13d5529fd232: Waiting
65ef2276d16f: Waiting
493622b04a5f: Layer already exists
f12d5b3c5c82: Layer already exists
337d3babfd9c: Layer already exists
18661e28fe84: Layer already exists
17e84f583592: Layer already exists
65ef2276d16f: Layer already exists
08a5b66845ac: Layer already exists
4b381ae03f9a: Layer already exists
65860ac81ef4: Layer already exists
88a85bcf8170: Layer already exists

You can then collect benchmark stats again for the new version of your algorithm and compare to previous versions and other machines/environments
---

In [15]:
%%bash --out docker_results_1
docker run edwardchalstrey/classifier:1.1

In [16]:
docker_results_1 = ast.literal_eval(docker_results_1)

In [17]:
d1_results = ["Container 1.1"]
for k, v in docker_results_1.items():
    d1_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results, d1_results], tablefmt='html')))

| Version | Training time (s) | Prediction time (s) | Performance (micro avg f1 score) |
|---|---|---|---|
| Basic 1.0 | 0.10748064517974854 | 0.05083489418029785 | 0.6974416017797553 |
| Container 1.0 | 0.14680159091949463 | 0.07211601734161377 | 0.6974416017797553 |
| Container 1.1 | 0.04823911190032959 | 0.04134511947631836 | 0.9688542825361512 |
