# Capstone Deliverables

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import Image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

## Testing

Are there unit tests for the API?

    Yes - unittests/ApiTests.py

Are there unit tests for the model?

    Yes - unittests/ModelTests.py

Are there unit tests for the logging?

    Yes - unittests/LoggerTests.py

Can all of the unit tests be run with a single script and do all of the unit tests pass?

    Yes - run-tests.py

In [2]:
!python run-tests.py -v


test_01_train (ApiTests.ApiTest)
test the train functionality ... ok
test_02_predict_empty (ApiTests.ApiTest)
ensure appropriate failure types ... ok
test_03_predict (ApiTests.ApiTest)
test the predict functionality ... Result from test_03_predict: {"country":"all","target_date":"2018-01-05","y_known":[183479.624],"y_pred":[180107.04546666666],"y_proba":null}

ok
test_04_logs (ApiTests.ApiTest)
test the log functionality ... ok
test_01_train (LoggerTests.LoggerTest)
ensure log file is created ... ok
test_02_train (LoggerTests.LoggerTest)
ensure that content can be retrieved from log file ... ok
test_03_predict (LoggerTests.LoggerTest)
ensure log file is created ... ok
test_04_predict (LoggerTests.LoggerTest)
ensure that content can be retrieved from log file ... ok
test_01_train (ModelTests.ModelTest)
test the train functionality ... ... test flag on
...... sub-setting data
...... sub-setting countries
... loading ts data from files
... saving test version of mod

Is there a mechanism to monitor performance?

    WIP - ran out of time

Was there an attempt to isolate the read/write unit tests from production models and logs?

    Yes
    (1) Test runs save "test versions" of models, distinguished by the prefix "test-" on the model job name.
    (2) Test runs write to separate test logs for training and prediction.
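
One way to picture this isolation is a pair of small path helpers that add the `test-` prefix whenever a test flag is set. This is only a sketch: the helper names, the `.joblib` and `.log` extensions, and the directory arguments are hypothetical and not taken from the project.

```python
import os


def model_path(model_dir, job_name, test=False):
    """Return the file path for a saved model (hypothetical naming scheme).

    With test=True the file name gains a "test-" prefix, so unit tests
    never overwrite a production model.
    """
    prefix = "test-" if test else ""
    return os.path.join(model_dir, f"{prefix}{job_name}.joblib")


def log_path(log_dir, kind, test=False):
    """Return the log file path for kind "train" or "predict" (hypothetical)."""
    if kind not in ("train", "predict"):
        raise ValueError(f"unknown log kind: {kind!r}")
    prefix = "test-" if test else ""
    return os.path.join(log_dir, f"{prefix}{kind}.log")
```

Because the flag only changes the file name, the same training and prediction code paths are exercised in tests and in production.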

## API, ingestion and model

Does the API work as expected? For example, can you get predictions for a specific country as well as for all countries combined?

    Yes

In [4]:
!docker run -p 4000:8080 capstone

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
^C


In [5]:
!python query-api.py

    Results:
              country target_date     y_known        y_pred y_proba
    0             all  2017-11-30  170445.260  170418.35720    None
    0             all  2017-12-31  173092.473  170589.10270    None
    0             all  2018-01-31  129925.585  133476.87405    None
    0             all  2018-02-28  263632.401  269517.11650    None
    0             all  2018-03-31  116642.342  117181.30100    None
    0             all  2018-04-30  144481.060  145168.12250    None
    0             all  2018-05-31  236716.830  235655.49755    None
    0             all  2018-06-30   93654.340   96385.45850    None
    0             all  2018-07-31  152791.380  148481.70770    None
    0             all  2018-08-31  219264.551  229374.80445    None
    0             all  2018-09-30  270842.590  244044.53360    None
    0             all  2018-10-31  320371.492  318685.87700    None
    0             all  2018-11-30  387279.740  385130.53500    None
    0             all  2018-12-31  178314.260  175340.66410    None
    0             all  2019-01-31  135549.990  130746.47355    None
    0             all  2019-02-28  165443.920  164121.92850    None
    0             all  2019-03-31  125779.021  134689.01955    None
    0             all  2019-04-30  195218.250  188127.34520    None
    0             all  2019-05-31  191391.680  194427.89650    None
    0             all  2019-06-30    1793.980  138915.06960    None
    0  united_kingdom  2017-11-30  160736.980  159026.26332    None
    0  united_kingdom  2017-12-31  148110.823  148124.26808    None
    0  united_kingdom  2018-01-31  116077.065  117399.65768    None
    0  united_kingdom  2018-02-28  217019.201  214622.90820    None
    0  united_kingdom  2018-03-31  103206.131  107707.73292    None
    0  united_kingdom  2018-04-30  125986.030  126940.41360    None
    0  united_kingdom  2018-05-31  224732.640  221815.13448    None
    0  united_kingdom  2018-06-30   77115.230   84733.77644    None
    0  united_kingdom  2018-07-31  135398.810  129720.75160    None
    0  united_kingdom  2018-08-31  201918.791  205726.52100    None
    0  united_kingdom  2018-09-30  235499.470  235164.62640    None
    0  united_kingdom  2018-10-31  299139.572  295841.59880    None
    0  united_kingdom  2018-11-30  375412.060  364785.72128    None
    0  united_kingdom  2018-12-31  163467.850  166148.73200    None
    0  united_kingdom  2019-01-31  122629.750  120224.73296    None
    0  united_kingdom  2019-02-28  142085.490  141947.36444    None
    0  united_kingdom  2019-03-31  104058.691  109359.64300    None
    0  united_kingdom  2019-04-30  174900.620  184747.48936    None
    0  united_kingdom  2019-05-31  170593.560  166606.77960    None
    0  united_kingdom  2019-06-30    1793.980  121572.19012    None
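
The columns in this table match the keys of the JSON response printed by `test_03_predict` earlier. As a hedged illustration of how one such response can be turned into a row with an error figure, the sketch below parses that sample response using only the standard library. The function name `summarize_prediction` and the `abs_pct_error` field are assumptions added for illustration; `query-api.py` itself is not shown here and may work differently.

```python
import json


def summarize_prediction(response_text):
    """Parse one predict response and add the absolute percentage error.

    Assumes y_known and y_pred are single-element lists, as in the
    sample response printed by test_03_predict.
    """
    payload = json.loads(response_text)
    y_known = payload["y_known"][0]
    y_pred = payload["y_pred"][0]
    return {
        "country": payload["country"],
        "target_date": payload["target_date"],
        "y_known": y_known,
        "y_pred": y_pred,
        # Relative gap between prediction and known value.
        "abs_pct_error": abs(y_pred - y_known) / abs(y_known),
    }


# Sample response copied from the test output earlier in this notebook.
sample = (
    '{"country":"all","target_date":"2018-01-05",'
    '"y_known":[183479.624],"y_pred":[180107.04546666666],"y_proba":null}'
)
row = summarize_prediction(sample)
print(row)
```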


Does the data ingestion exist as a function or script to facilitate automation?

    Yes - see ingestion.py

Were multiple models compared?

    Several hyperparameter combinations for a RandomForestRegressor were compared using GridSearchCV.
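
The grid actually searched is not listed in this notebook. The sketch below shows the general shape of such a search on synthetic data; the parameter grid, the scoring metric, the fold count, and the helper name `tune_random_forest` are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, param_grid, cv=3, seed=42):
    """Fit a RandomForestRegressor for every combination in param_grid.

    Returns the fitted GridSearchCV object; best_params_ holds the
    combination with the lowest cross-validated mean squared error.
    """
    search = GridSearchCV(
        estimator=RandomForestRegressor(random_state=seed),
        param_grid=param_grid,
        cv=cv,
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y)
    return search


# Synthetic stand-in data; the real features come from the ingested time series.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=60)

# Illustrative grid only, not the grid used in the project.
grid_demo = {"n_estimators": [10, 25], "max_depth": [3, None]}
search_demo = tune_random_forest(X_demo, y_demo, grid_demo)
print(search_demo.best_params_)
```

Note that this compares settings of a single model family. Comparing different model types would mean repeating the search with other estimators and comparing their cross-validated scores.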

Did the EDA investigation use visualizations?

    See the [EDA notebook](./EDA.ipynb)

Is everything containerized within a working Docker image?

    Yes - the included Dockerfile builds the image with:

    docker build -t capstone .

    The container is then started with "docker run -p 4000:8080 capstone". See above for the API running in a container and a successful query against that API.



Did they use a visualization to compare their model to the baseline model?

    WIP - ran out of time