<img src="./documentation/images/ibm-logo.png" alt="ibm-logo" align="center" style="width: 200px;"/>

**AI ENTERPRISE WORKFLOW CERTIFICATION**

<hr />

### Capstone Project - Model Production

# Outline

1. Build a draft version of an API with train, predict, and logfile endpoints.
2. Using Docker, bundle your API, model, and unit tests.
3. Using test-driven development, iterate on your API in a way that anticipates scale, load, and drift.
4. Create a post-production analysis script that investigates the relationship between model performance and the business metric.
5. Articulate your summarized findings in a final report.

At a higher level you are being asked to:

1. Ready your model for deployment
2. Query your API with new data and test your monitoring tools
3. Compare your results to the gold standard

# Deliverables

### Create a Flask API

In [1]:
%%writefile app.py

import argparse
from flask import Flask, jsonify, request
from flask import render_template
import joblib
import socket
import json
import numpy as np
import pandas as pd
import os

## import model specific functions and variables
from modelling import *
from logger import *

app = Flask(__name__)

@app.route("/")
def hello():
    html = "<h3>Hello {name}!</h3>" \
           "<b>Hostname:</b> {hostname}<br/>"
    return html.format(name=os.getenv("NAME", "world"), hostname=socket.gethostname())

@app.route('/predict', methods=['GET','POST'])
def predict():
    """
    basic predict function for the API
    """
    
    ## input checking
    if not request.json:
        print("ERROR: API (predict): did not receive request data")
        return jsonify([])
    
    if 'country' not in request.json:
        print("ERROR API (predict): received request, but no 'country' found within")
        return jsonify(False)
        
    if 'year' not in request.json:
        print("ERROR API (predict): received request, but no 'year' found within")
        return jsonify(False)
        
    if 'month' not in request.json:
        print("ERROR API (predict): received request, but no 'month' found within")
        return jsonify(False)
        
    if 'day' not in request.json:
        print("ERROR API (predict): received request, but no 'day' found within")
        return jsonify(False)
    
    if 'dev' not in request.json:
        print("ERROR API (predict): received request, but no 'dev' found within")
        return jsonify([])
    
    if 'verbose' not in request.json:
        print("WARNING API (predict): received request, but no 'verbose' found within")
        verbose = 'True'
    else:
        verbose = request.json['verbose']
        
    ## predict
    _result = model_predict(year=request.json['year'],
                            month=request.json['month'],
                            day=request.json['day'],
                            country=request.json['country'],
                            dev=request.json['dev']=="True",
                            verbose=verbose=="True")
    
    result = {}
    ## convert numpy objects to ensure they are serializable
    for key,item in _result.items():
        if isinstance(item,np.ndarray):
            result[key] = item.tolist()
        else:
            result[key] = item

    return(jsonify(result))

@app.route('/train', methods=['GET','POST'])
def train():
    """
    basic train function for the API

    the 'dev' flag lets you toggle between a DEV version and a PROD version of training
    """

    if not request.json:
        print("ERROR: API (train): did not receive request data")
        return jsonify(False)

    if 'dev' not in request.json:
        print("ERROR API (train): received request, but no 'dev' found within")
        return jsonify(False)
    
    if 'verbose' not in request.json:
        print("WARNING API (train): received request, but no 'verbose' found within")
        verbose = 'True'
    else:
        verbose = request.json['verbose']

    print("... training model")
    model = model_train(dev=request.json['dev']=="True", verbose=verbose=="True")
    print("... training complete")

    return(jsonify(True))

@app.route('/logging', methods=['GET','POST'])
def load_logs():
    """
    basic logging function for the API
    """

    if not request.json:
        print("ERROR: API (log): did not receive request data")
        return jsonify(False)

    if 'env' not in request.json:
        print("ERROR API (log): received request, but no 'env' found within")
        return jsonify(False)
        
    if 'type' not in request.json:
        print("ERROR API (log): received request, but no 'type' found within")
        return jsonify(False)
        
    if 'month' not in request.json:
        print("ERROR API (log): received request, but no 'month' found within")
        return jsonify(False)
        
    if 'year' not in request.json:
        print("ERROR API (log): received request, but no 'year' found within")
        return jsonify(False)
    
    print("... fetching logfile")
    logfile = log_load(env=request.json['env'],
                       tag=request.json['type'],
                       year=request.json['year'],
                       month=request.json['month'])
    
    result = {}
    result["logfile"]=logfile
    return(jsonify(result))

if __name__ == '__main__':

    ## parse arguments for debug mode
    ap = argparse.ArgumentParser()
    ap.add_argument("-d", "--debug", action="store_true", help="debug flask")
    args = vars(ap.parse_args())

    if args["debug"]:
        app.run(debug=True, port=8080)
    else:
        app.run(host='0.0.0.0', threaded=True ,port=8080)

Overwriting app.py
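
The repeated `if ... not in request.json` checks in each endpoint follow one pattern: compare the request body against a list of required keys. A minimal sketch of that pattern as a standalone helper follows. `missing_keys` and `PREDICT_REQUIRED` are hypothetical names, not part of `app.py`, and the key list is taken from the checks in `predict()` above.

```python
# Hypothetical helper, not part of app.py: report which required keys
# are absent from a JSON request body.

# Keys that predict() checks for above ('verbose' is optional there).
PREDICT_REQUIRED = ("country", "year", "month", "day", "dev")


def missing_keys(payload, required):
    """Return the required keys that are absent from payload, in order."""
    if not payload:
        # No body at all: every required key is missing.
        return list(required)
    return [key for key in required if key not in payload]


# Example request body that lacks the 'dev' flag.
query = {"year": "2018", "month": "1", "day": "5", "country": "total"}
print(missing_keys(query, PREDICT_REQUIRED))  # prints ['dev']
```

An endpoint could call such a helper once and report every missing key in a single message, instead of returning on the first one.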


**Test the Flask API**

From the project directory I started the app:

```bash
$ python app.py
```

Then went to [http://localhost:8080/](http://localhost:8080/).

I then ran the cells below.

In [2]:
## API predict
import requests
from ast import literal_eval

query = {"year":"2018","month":"1","day":"5","country":"total","dev":"True","verbose":"True"}
port = 8080
r = requests.post('http://localhost:{}/predict'.format(port),json=query)
response = literal_eval(r.text)
print(response)

{'y_pred': [183770.3050000001]}


In [3]:
## API train
query = {"dev":"True","verbose":"True"}
port = 8080
r = requests.post('http://localhost:{}/train'.format(port),json=query)

In [4]:
## API logging
query = {"env":"test","type":"train","year":"2020","month":"5"}
port = 8080
r = requests.post('http://localhost:{}/logging'.format(port),json=query)
response = literal_eval(r.text)
print(response)

{'logfile': 'test-train-2020-5.log'}
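
The cells above parse `r.text` with `literal_eval`, which works for these two responses because their JSON also happens to be valid Python literal syntax. The `/train` endpoint returns `jsonify(True)`, which serializes as the JSON value `true`, and that is not a Python literal. The sketch below shows the difference using hard-coded strings in place of live responses; the strings are assumptions that mirror the outputs shown above.

```python
import json
from ast import literal_eval

# Assumed response bodies, standing in for r.text from a live server.
predict_body = '{"y_pred": [183770.3050000001]}'
train_body = "true\n"  # assumed body of jsonify(True)

# Both parsers agree when the body is also a valid Python literal.
same_result = json.loads(predict_body) == literal_eval(predict_body)

# JSON booleans are lowercase, so only the JSON parser accepts them.
train_ok = json.loads(train_body)
try:
    literal_eval(train_body.strip())
    literal_eval_failed = False
except ValueError:
    literal_eval_failed = True

print(same_result, train_ok, literal_eval_failed)
```

With a live response object, `r.json()` from `requests` performs the same JSON parsing.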


I stopped the server.  We will relaunch it in a few moments from within Docker.

### Create Unit Tests

In [5]:
%%writefile ./unittests/__init__.py

import unittest
import getopt
import sys
import os

## parse inputs
try:
    optlist, args = getopt.getopt(sys.argv[1:],'v')
except getopt.GetoptError as err:
    ## print the actual error message rather than the exception class
    print(err)
    print(sys.argv[0] + " -v")
    print("... the verbose flag (-v) may be used")
    sys.exit()

VERBOSE = False
RUNALL = False

sys.path.append(os.path.realpath(os.path.dirname(__file__)))

for o, a in optlist:
    if o == '-v':
        VERBOSE = True

## api tests
from ApiTests import *
ApiTestSuite = unittest.TestLoader().loadTestsFromTestCase(ApiTest)

## model tests
from ModelTests import *
ModelTestSuite = unittest.TestLoader().loadTestsFromTestCase(ModelTest)

## logger tests
from LoggerTests import *
LoggerTestSuite = unittest.TestLoader().loadTestsFromTestCase(LoggerTest)

MainSuite = unittest.TestSuite([ApiTestSuite,ModelTestSuite,LoggerTestSuite])


Overwriting ./unittests/__init__.py


In [6]:
%%writefile ./unittests/ModelTests.py
#!/usr/bin/env python

"""
model tests
"""

import os
import re
import unittest
import numpy as np
from modelling import *

class ModelTest(unittest.TestCase):
    """
    test the essential functionality
    """
    
    def test_01_train(self):
        """
        test the train functionality
        """
    
        ## train the model
        model_train(verbose=False)
        
        prefix = "test" if DEV else "prod"
        models = [f for f in os.listdir(MODEL_DIR) if re.search(prefix,f)]
        self.assertEqual(len(models),11)
        
    def test_02_load(self):
        """
        test the load functionality
        """
        
        ## load the model
        models = model_load(verbose=False)
        
        for tag, model in models.items():
            self.assertTrue("predict" in dir(model))
            self.assertTrue("fit" in dir(model))
        
    def test_03_predict(self):
        """
        test the predict function input
        """
    
        ## query inputs
        query = "2018", "1", "5", "total"
        
        ## load model first
        result = model_predict(year=query[0], month=query[1], day=query[2], country=query[3], verbose=False)
        y_pred = result["y_pred"]
        self.assertTrue(y_pred.dtype==np.float64)
            
    def test_04_predict(self):
        """
        test the predict function output type on several example queries
        """
    
        ## example predict
        example_queries = [("2018", "1", "5", "total"),
                           ("2019", "2", "5", "eire"),
                           ("2018", "12", "5", "france")]
        
        for query in example_queries:
            result = model_predict(year=query[0], month=query[1], day=query[2], country=query[3], verbose=False)
            y_pred = result["y_pred"]
            self.assertTrue(y_pred.dtype==np.float64)
            
## run the tests
if __name__ == "__main__":
    unittest.main()

Overwriting ./unittests/ModelTests.py


In [7]:
%run ./unittests/ModelTests.py

....
----------------------------------------------------------------------
Ran 4 tests in 112.186s

OK


In [8]:
%%writefile ./unittests/LoggerTests.py
#!/usr/bin/env python
"""
logger tests
"""

import os
import unittest
from datetime import date
## import model specific functions and variables
from logger import *

class LoggerTest(unittest.TestCase):
    """
    test the essential log functionality
    """
        
    def test_01_train(self):
        """
        test the train functionality
        """

        ## train logfile
        today = date.today()
        logfile = "{}-train-{}-{}.log".format("test",today.year,today.month)
        log_path = os.path.join(LOG_DIR, logfile)
        
        self.assertTrue(os.path.exists(log_path))

    def test_02_predict(self):
        """
        test the predict functionality
        """
        
        ## predict logfile
        today = date.today()
        logfile = "{}-predict-{}-{}.log".format("test",today.year,today.month)
        log_path = os.path.join(LOG_DIR, logfile)
        
        self.assertTrue(os.path.exists(log_path))

    def test_03_load(self):
        """
        test the load functionality
        """

        ## load the train logfile
        logfile = log_load(env="test",tag="train",year=2020,month=5, verbose=False)
        logpath = os.path.join(LOG_DIR, logfile)
        with open(logpath, "r") as log:
            text = log.read()
        self.assertTrue(len(text.split("\n"))>2)

        
### Run the tests
if __name__ == '__main__':
    unittest.main()


Overwriting ./unittests/LoggerTests.py


In [9]:
%run ./unittests/LoggerTests.py

...
----------------------------------------------------------------------
Ran 3 tests in 0.035s

OK


In [10]:
%%writefile ./unittests/ApiTests.py
#!/usr/bin/env python
"""
api tests

these tests use the requests package however similar requests can be made with curl

e.g.
curl -X POST -H "Content-Type: application/json" -d '{"key":"value"}' http://localhost:8080/predict
"""

import sys
import os
import unittest
import requests
import re
from ast import literal_eval
import numpy as np
import pandas as pd

port = 8080

try:
    requests.post('http://localhost:{}/predict'.format(port))
    server_available = True
except requests.exceptions.RequestException:
    server_available = False
    
## test class for the main window function
class ApiTest(unittest.TestCase):
    """
    test the essential functionality
    """
    
    @unittest.skipUnless(server_available,"local server is not running")
    def test_predict(self):
        """
        test the predict functionality
        """
        
        query = {"year":"2018","month":"1","day":"5","country":"total","dev":"True","verbose":"True"}
        r = requests.post('http://localhost:{}/predict'.format(port),json=query)
        response = literal_eval(r.text)
        self.assertTrue(isinstance(response["y_pred"][0], float))

    @unittest.skipUnless(server_available,"local server is not running")
    def test_train(self):
        """
        test the train functionality
        """
      
        query = {"dev":"True","verbose":"False"}
        r = requests.post('http://localhost:{}/train'.format(port),json=query)
        train_complete = re.sub(r"\W+","",r.text)
        self.assertEqual(train_complete,'true')
        
    @unittest.skipUnless(server_available,"local server is not running")
    def test_logging(self):
        """
        test the logging functionality
        """
        
        query = {"env":"test","type":"train","year":"2020","month":"5"}
        r = requests.post('http://localhost:{}/logging'.format(port),json=query)
        response = literal_eval(r.text)
        self.assertEqual(response.get("logfile"),'test-train-2020-5.log')

### Run the tests
if __name__ == '__main__':
    unittest.main()

Overwriting ./unittests/ApiTests.py


In [11]:
%run ./unittests/ApiTests.py

...
----------------------------------------------------------------------
Ran 3 tests in 100.579s

OK


In [12]:
%%writefile run-tests.py
#!/usr/bin/env python

import sys
import unittest

from unittests import *
unittest.main()

Overwriting run-tests.py


**Run Unit Tests with a single script**

```bash
    ~$ python run-tests.py
```
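
`unittests/__init__.py` assembles a `MainSuite`, while `run-tests.py` relies on `unittest.main()` to pick up the imported test cases. As an illustration of how a combined suite can also be run explicitly, here is a self-contained sketch. `DummyTest` is a stand-in for the real `ApiTest`, `ModelTest`, and `LoggerTest` classes, and `run_suite` is a hypothetical helper.

```python
import io
import unittest


class DummyTest(unittest.TestCase):
    """Stand-in for ApiTest / ModelTest / LoggerTest."""

    def test_truth(self):
        self.assertTrue(True)


def run_suite(test_cases, verbose=False, stream=None):
    """Load every TestCase class into one suite and run it."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        [loader.loadTestsFromTestCase(case) for case in test_cases]
    )
    runner = unittest.TextTestRunner(stream=stream,
                                     verbosity=2 if verbose else 1)
    return runner.run(suite)


# Send the runner's report to an in-memory stream and inspect the result.
result = run_suite([DummyTest], verbose=True, stream=io.StringIO())
print(result.testsRun, result.wasSuccessful())
```

The returned `TestResult` makes it possible to exit with a non-zero status when any test fails, which is convenient inside a container build.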

### Docker Container

**Create Docker File**

Before building the Docker image, we first need to create a `requirements.txt` file.

In [13]:
%%writefile requirements.txt

cython
numpy
flask
pandas
scikit-learn
matplotlib
IPython
seaborn
requests

Overwriting requirements.txt


In [14]:
%%writefile Dockerfile

# Use an official Python runtime as a parent image
FROM python:3.7.5-stretch

RUN apt-get update && apt-get install -y \
python3-dev \
build-essential    
        
# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8080 (the port app.py listens on) available outside this container
EXPOSE 8080

# Define environment variable
ENV NAME=World

# Run app.py when the container launches
CMD ["python", "app.py"]

Overwriting Dockerfile


**Build the Docker image and run it**

Step one: build the image (from the directory that was created with this notebook)
 
```bash
    ~$ docker build -t capstone-ml-app .
```

Check that the image is there.

```bash
    ~$ docker image ls
```

You may notice images that you no longer use. You can delete them with:

```bash
    ~$ docker image rm IMAGE_ID_OR_NAME
```

Run the container, mapping port 4000 on the host to port 8080 inside the container:

```bash
docker run -p 4000:8080 capstone-ml-app
```

**Test the running app**

First go to [http://localhost:4000/](http://localhost:4000/) to ensure the app is running and accessible.
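
`docker run -p 4000:8080` publishes the container's port 8080 on host port 4000, and the app may take a moment before it accepts connections. A sketch of a small polling helper for that wait follows. `port_is_open` and `wait_until_ready` are hypothetical names, and the default probe simply attempts a TCP connection to the given host and port.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(host="localhost", port=4000, attempts=10, delay=0.5,
                     probe=port_is_open):
    """Poll probe(host, port) until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if probe(host, port):
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready(port=4000)` before sending the queries below avoids a connection error when the container is still starting up. The `probe` parameter exists so the polling logic can be exercised without a running server.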

In [15]:
## API predict

query = {"year":"2018","month":"1","day":"5","country":"total","dev":"True","verbose":"True"}
port = 4000
r = requests.post('http://localhost:{}/predict'.format(port),json=query)
response = literal_eval(r.text)
print(response)

{'y_pred': [183770.3050000001]}


In [16]:
## API logging

query = {"env":"test","type":"train","year":"2020","month":"5"}
port = 4000
r = requests.post('http://localhost:{}/logging'.format(port),json=query)
response = literal_eval(r.text)
print(response)

{'logfile': 'test-train-2020-5.log'}


### Post Production Analysis