__Chapter 9 - Embedding a Machine Learning Model into a Web Application__

1. [Import](#Import)
1. [Serializing fitted scikit-learn estimators](#Serializingfiited-scikit-learn-estimators)
1. [Setting up an SQLite database for data storage](#Setting-up-an-SQLite-database-for-data-storage)
1. [Developing a web application with Flask](#Developing-a-web-application-with-Flask)
    1. [Extremely basic Flask app](#Extremely-basic-Flask-app)
    1. [Very basic Flask app](#Very-basic-Flask-app)
    1. [Movie classifier web app](#Movie-classifier-web-app)

# Import

<a id = 'Import'></a>

In [2]:
# standard library and settings
import os
import sys
import importlib
import itertools
from io import StringIO
import warnings

warnings.simplefilter("ignore")
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.options.display.float_format = "{:,.6f}".format

# modeling extensions
import sklearn.base as base
import sklearn.cluster as cluster
import sklearn.datasets as datasets
import sklearn.decomposition as decomposition
import sklearn.ensemble as ensemble
import sklearn.feature_extraction as feature_extraction
import sklearn.feature_selection as feature_selection
import sklearn.linear_model as linear_model
import sklearn.metrics as metrics
import sklearn.model_selection as model_selection
import sklearn.neighbors as neighbors
import sklearn.pipeline as pipeline
import sklearn.preprocessing as preprocessing
import sklearn.svm as svm
import sklearn.tree as tree
import sklearn.discriminant_analysis as discriminant_analysis
import sklearn.utils as utils

# visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

# custom extensions and settings
for path in ("/home/mlmachine", "/home/prettierplot"):
    if path not in sys.path:
        sys.path.append(path)

import mlmachine as mlm
from prettierplot.plotter import PrettierPlot
import prettierplot.style as style

# magic functions
%matplotlib inline

# Serializing fitted scikit-learn estimators

Training a model can take a while, and the fitted model is lost when the Python interpreter closes. To avoid retraining every time we want to use it, we can save the learned model with the pickle module. Pickle serializes Python objects to a compact byte stream and deserializes them again, so we can save our classifier in its current state and reload it later, even after the interpreter has been closed. With this pickle file in hand, we can classify new samples without retraining the model from scratch.

The pickled classifier and stop-word list loaded below were created in the Chapter 8 notebook.



<a id = 'Serializingfiited-scikit-learn-estimators'></a>
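
As a minimal sketch of the save-and-reload round trip described above, the snippet below pickles an object to disk and restores it. The `model_state` dictionary is a hypothetical stand-in for a fitted estimator, and the file name `model.pkl` is an assumption made for illustration; a fitted scikit-learn classifier would be passed to `pickle.dump` the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted estimator: any picklable Python object works
model_state = {"weights": [0.5, -1.2, 0.3], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialize the object to a byte stream on disk
    with open(path, "wb") as f:
        pickle.dump(model_state, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Deserialize it again, as a later interpreter session would
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal but separate copy of the original
print(restored == model_state)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources. A pickled estimator is also best reloaded with the same scikit-learn version that created it.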

In [3]:
# load stop words
import re
import pickle

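# dirname of the literal string "__file__" is "", so paths resolve
# relative to the notebook's working directory
cur_dir = os.path.dirname("__file__")
with open(
    os.path.join(
        cur_dir, "ch09_Flask_Apps", "movieClassifier", "pkl_objects", "stopwords.pkl"
    ),
    "rb",
) as f:
    stop = pickle.load(f)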

# regex-based tokenizer: strip HTML tags, move emoticons to the end of the
# text, lowercase, and drop stop words
def text_processor(text):
    text = re.sub(r"<[^>]*>", "", text)
    emoticons = re.findall(r"(?::|;|=)(?:-)?(?:\)|\(|D|P)", text.lower())
    text = re.sub(r"[\W]+", " ", text.lower()) + " ".join(emoticons).replace("-", "")
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized


# create HashingVectorizer
vect = feature_extraction.text.HashingVectorizer(
    decode_error="ignore",
    n_features=2 ** 21,
    preprocessor=None,
    tokenizer=text_processor,
)

# load classifier
with open(
    os.path.join(
        cur_dir,
        "ch09_Flask_Apps",
        "movieClassifier",
        "pkl_objects",
        "classifier.pkl",
    ),
    "rb",
) as f:
    clf = pickle.load(f)
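
To illustrate what the tokenizer handed to the vectorizer does, here is a self-contained sketch that runs the same regular expressions on one sample string. The small `stop` set is a hypothetical stand-in for the pickled stop-word list, which is much longer.

```python
import re

# Hypothetical stand-in for the stop words loaded from stopwords.pkl
stop = {"a", "is", "this", "the"}


def text_processor(text):
    # Remove HTML tags
    text = re.sub(r"<[^>]*>", "", text)
    # Collect emoticons such as :) ;-( =) before punctuation is stripped
    emoticons = re.findall(r"(?::|;|=)(?:-)?(?:\)|\(|D|P)", text.lower())
    # Replace runs of non-word characters with a space, then append the
    # emoticons with their "nose" character removed
    text = re.sub(r"[\W]+", " ", text.lower()) + " ".join(emoticons).replace("-", "")
    # Split on whitespace and drop stop words
    return [w for w in text.split() if w not in stop]


tokens = text_processor("<b>This</b> is a great movie :-)")
print(tokens)  # ['great', 'movie', ':)']
```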



In [4]:
# review model
clf

SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='log', max_iter=None,
              n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5,
              random_state=1, shuffle=True, tol=None, validation_fraction=0.1,
              verbose=0, warm_start=False)

In [5]:
# test that model was loaded correctly using obviously positive sample review
label = {0: "negative", 1: "positive"}
example = ["I love this movie. Best movie ever"]
X = vect.transform(example)
print(
    "Prediction: {:s} \nProbability: {:.1f}% \n".format(
        label[clf.predict(X)[0]], np.max(clf.predict_proba(X) * 100)
    )
)

Prediction: positive 
Probability: 91.3% 



In [6]:
# test that model was loaded correctly using obviously negative sample review
label = {0: "negative", 1: "positive"}
example = ["I hate this movie. It really sucks. The worst ever."]
X = vect.transform(example)
print(
    "Prediction: {:s} \nProbability: {:.1f}% \n".format(
        label[clf.predict(X)[0]], np.max(clf.predict_proba(X) * 100)
    )
)

Prediction: negative 
Probability: 96.0% 



# Setting up an SQLite database for data storage

SQLite is an open source SQL database engine that doesn't require a server to operate. The entire database lives in a single self-contained file. We can use it to store information about our model and how users interact with it.

<a id = 'Setting-up-an-SQLite-database-for-data-storage'></a>
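
As a rough sketch of how user interactions could be logged, the helper below writes one review and its label into a table with the same columns used in this section. The function name `log_review` is hypothetical, and an in-memory database (`":memory:"`) is assumed here so the example leaves no file behind.

```python
import sqlite3


def log_review(conn, review, sentiment):
    """Store one review and its 0/1 sentiment label with the current UTC timestamp."""
    conn.execute(
        "INSERT INTO review_db (review, sentiment, date) "
        "VALUES (?, ?, DATETIME('now'))",
        (review, sentiment),
    )
    conn.commit()


# In-memory database for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_db (review TEXT, sentiment INTEGER, date TEXT)")

log_review(conn, "I love this movie", 1)
log_review(conn, "I disliked this movie", 0)

rows = conn.execute(
    "SELECT review, sentiment FROM review_db ORDER BY rowid"
).fetchall()
conn.close()
print(rows)
```

Passing the values as `?` parameters rather than formatting them into the SQL string lets SQLite handle quoting, which matters once the text comes from a web form.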

In [7]:
# load data into sqlite database
import sqlite3

if os.path.exists("ch09_Flask_Apps/movieClassifier/reviews.sqlite"):
    os.remove("ch09_Flask_Apps/movieClassifier/reviews.sqlite")
conn = sqlite3.connect("ch09_Flask_Apps/movieClassifier/reviews.sqlite")
c = conn.cursor()
c.execute("CREATE TABLE review_db(review TEXT, sentiment INTEGER, date TEXT)")

ex1 = "I love this movie"
c.execute(
    "INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))",
    (ex1, 1),
)
ex2 = "I disliked this movie"
c.execute(
    "INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))",
    (ex2, 0),
)
conn.commit()
conn.close()

In [8]:
# retrieve data from sqlite database
conn = sqlite3.connect("ch09_Flask_Apps/movieClassifier/reviews.sqlite")
c = conn.cursor()
c.execute("SELECT * FROM review_db")
results = c.fetchall()
conn.close()
print(results)

[('I love this movie', 1, '2019-06-30 00:10:58'), ('I disliked this movie', 0, '2019-06-30 00:10:58')]


# Developing a web application with Flask

The Flask application files are in the folder ch09_Flask_Apps.

<a id = 'Developing-a-web-application-with-Flask'></a>

## Extremely basic Flask app

<a id = 'Extremely-basic-Flask-app'></a>
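
The contents of `app.py` are not shown in this notebook, so the following is only a generic sketch of what a minimal Flask application can look like, not the file run below. The route, the function name `index`, and the response text are all assumptions made for illustration.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Hypothetical placeholder response
    return "Hello from Flask!"


# Exercise the route with Flask's test client instead of starting a server
with app.test_client() as client:
    response = client.get("/")
    status = response.status_code
    body = response.get_data(as_text=True)

print(status, body)
```

To serve the app in a browser instead, a script would typically call `app.run()`, which starts Flask's development server.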

In [8]:
# uncomment the line below to run the extremely basic Flask app
#%run ch09_Flask_Apps/flaskApp/app.py

## Very basic Flask app

<a id = 'Very-basic-Flask-app'></a>

In [9]:
# uncomment the line below to run the very basic Flask app
#%run ch09_Flask_Apps/flaskApp2/app.py

## Movie classifier web app

<a id = 'Movie-classifier-web-app'></a>

In [10]:
# launch the movie classifier web app
%run ch09_Flask_Apps/movieClassifier/app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [17/Nov/2018 13:22:43] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [17/Nov/2018 13:22:43] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [17/Nov/2018 13:22:53] "POST /results HTTP/1.1" 200 -
127.0.0.1 - - [17/Nov/2018 13:22:53] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [17/Nov/2018 13:22:55] "POST /thanks HTTP/1.1" 200 -
127.0.0.1 - - [17/Nov/2018 13:22:55] "GET /favicon.ico HTTP/1.1" 404 -


In [9]:
# check whether new review(s) were added to the table
conn = sqlite3.connect("ch09_Flask_Apps/movieClassifier/reviews.sqlite")
c = conn.cursor()
c.execute("SELECT * FROM review_db")
results = c.fetchall()
conn.close()
print(results)

[('I love this movie', 1, '2019-06-30 00:10:58'), ('I disliked this movie', 0, '2019-06-30 00:10:58')]
