# Lab 5: Model Deployment

By the end of this lab, you will know how to export a trained machine learning model and reuse it for future classification tasks, both from Python and from interactive web applications.

This lab consists of the following tasks:

1. Load and preprocess an example dataset
1. Create a simple machine learning classifier
1. Export a trained ML model to a file
1. Use the saved model to make predictions on unseen inputs
1. Use the model directly from a web application

In [None]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

## Loading the data

The dataset is located at a path relative to the current notebook file.

As an example, we will classify three varieties of wheat seeds (Kama, Rosa, Canadian) using seven real-valued attributes extracted from soft X-ray images:

1. area A,
1. perimeter P,
1. compactness $C = \frac{4 \pi A}{P^2}$,
1. length of kernel LK,
1. width of kernel WK,
1. asymmetry coefficient A_Coef,
1. length of kernel groove LKG.
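
As a quick check of the compactness formula, the sketch below plugs in the area and perimeter of the sample that is classified at the end of this lab (A = 12.3, P = 13.34). The result is close to, but not exactly, the listed compactness of 0.8684; the small gap is presumably due to rounding of the measured values. Note that a perfect circle has compactness 1.

```python
import math

# Area and perimeter of the single sample classified later in this lab
area = 12.3
perimeter = 13.34

# Compactness C = 4*pi*A / P^2 (equals 1 for a perfect circle)
compactness = 4 * math.pi * area / perimeter ** 2
print(round(compactness, 4))  # 0.8686
```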

More information about the dataset is available at https://archive.ics.uci.edu/ml/datasets/seeds.

In [None]:
# Path to the dataset, relative to this notebook
dataset_filename = "./datasets/seed_data.csv"
print(dataset_filename)

In [None]:
data = pd.read_csv(dataset_filename)

## Preprocessing

Since our objective is to show how to use models in production, we will not apply any special preprocessing to this dataset. In your own homework, however, it is important to apply a proper preprocessing pipeline before exporting the model.
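
One way to make sure that the exported file carries its preprocessing with it is to bundle the preprocessing steps and the classifier into a single scikit-learn `Pipeline`, so that pickling the pipeline stores both. The sketch below is an illustration under assumptions: `StandardScaler` is just an example preprocessing step (the lab itself applies none), and synthetic data from `make_classification` stands in for the seeds dataset.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the seeds data: 7 numeric features, 3 classes (assumption)
X_demo, y_demo = make_classification(
    n_samples=210, n_features=7, n_informative=5, n_classes=3, random_state=0
)

# Scaling and classification are fitted together and kept in one object
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipeline.fit(X_demo, y_demo)

# Serialize the whole pipeline (scaler statistics + fitted SVC) and restore it
restored = pickle.loads(pickle.dumps(pipeline))

# The restored pipeline applies the same scaling before predicting
same_predictions = (restored.predict(X_demo) == pipeline.predict(X_demo)).all()
print(same_predictions)
```

With this design, code that loads the model never needs to repeat the preprocessing by hand; it simply calls `predict` on raw feature values.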

In [None]:
data

In [None]:
data.describe()

In [None]:
data.hist(figsize=(10, 8))

In [None]:
# Separate the features and the class label into different variables
X = data.drop('target', axis=1)
Y = data['target']

## Train a classifier using scikit-learn

In this case we are using a support vector classifier with a linear kernel.

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=123, stratify=Y)
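
The `stratify=Y` argument keeps the class proportions of `Y` identical in the training and test sets. A small sketch with hypothetical labels (three balanced classes of 70 samples each, an assumption chosen for illustration) shows the effect with the same `test_size` and `random_state`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels: 3 balanced classes of 70 samples each
labels = pd.Series([1] * 70 + [2] * 70 + [3] * 70)
features = pd.DataFrame({"x": range(len(labels))})

_, _, y_tr, y_te = train_test_split(
    features, labels, test_size=0.20, random_state=123, stratify=labels
)

# Each class keeps one third of the samples in both splits
print(y_tr.value_counts().sort_index().tolist())  # [56, 56, 56]
print(y_te.value_counts().sort_index().tolist())  # [14, 14, 14]
```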

In [None]:
support_vector_classifier = SVC(kernel='linear')
support_vector_classifier.fit(X_train, Y_train)

In [None]:
Y_predicted = support_vector_classifier.predict(X_test)

In [None]:
confusion_matrix(Y_test, Y_predicted)

In [None]:
print(classification_report(Y_test, Y_predicted))

## Exporting the model

So far, we have created a classifier stored in the variable `support_vector_classifier`, which achieves an accuracy of $90\%$ on the test set.
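
Accuracy is the fraction of correctly classified samples, which is the sum of the diagonal of the confusion matrix divided by the sum of all its entries. The sketch below uses hypothetical true and predicted labels (not the lab's actual results) to show that both ways of computing it agree:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for 10 test samples: one sample of class 1 is misclassified
y_true = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

cm = confusion_matrix(y_true, y_pred)
# Correct predictions lie on the diagonal of the confusion matrix
acc_from_cm = np.trace(cm) / cm.sum()
acc_direct = accuracy_score(y_true, y_pred)
print(acc_from_cm, acc_direct)
```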

To export the trained model, we save it to a file on disk that can then be used in production.

For this purpose, we typically use the `pickle` module, which is part of the Python standard library.
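
Writing the file fails if the destination folder does not exist yet. A minimal sketch of a saving helper that creates missing parent folders before pickling is shown below; the `save_model` name is hypothetical, and the Iris dataset bundled with scikit-learn stands in for the seeds data so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.svm import SVC


def save_model(model, file_path):
    """Hypothetical helper: pickle `model` to `file_path`, creating parent folders."""
    file_path = Path(file_path)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with open(file_path, "wb") as write_file:
        pickle.dump(model, write_file)
    return file_path


# Iris stands in for the seeds data so the sketch is self-contained
X_iris, y_iris = load_iris(return_X_y=True)
model = SVC(kernel="linear").fit(X_iris, y_iris)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Nested folder that does not exist yet, similar to FOLDER_PATH
    target = Path(tmp_dir) / "model_deployment" / "trained_model.pickle"
    saved_path = save_model(model, target)
    saved_size = saved_path.stat().st_size
    print(saved_path.name, saved_size > 0)
```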

In [None]:
import pickle

In [None]:
# DO NOT MODIFY: relative path of the folder where the files are stored
FOLDER_PATH = "dami_dsv/model_deployment/"

In [None]:
trained_model_filename = FOLDER_PATH + "trained_model_seeds_dataset.pickle"

In [None]:
# Select the variable to save and the file to write it to
data_to_save = support_vector_classifier
file_path = trained_model_filename

In [None]:
# Open the file in binary write mode ("wb") and serialize the model into it
with open(file_path, "wb") as writeFile:
    pickle.dump(data_to_save, writeFile)

## Load the model

After the model has been stored, we can load the pickle file and assign it to a new variable.
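
Because `pickle.load` returns whatever object was stored, it can be useful to check that the loaded object is really the expected classifier. Keep in mind that the Python documentation warns that unpickling can execute arbitrary code, so only load pickle files you trust. In the sketch below, `load_model` is a hypothetical helper, an in-memory buffer stands in for the `.pickle` file on disk, and Iris stands in for the seeds data.

```python
import io
import pickle

from sklearn.datasets import load_iris
from sklearn.svm import SVC


def load_model(file_obj, expected_type=SVC):
    """Hypothetical helper: unpickle a model from a binary file and check its type."""
    # pickle can execute arbitrary code while loading: only use trusted files
    model = pickle.load(file_obj)
    if not isinstance(model, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(model).__name__}"
        )
    return model


X_iris, y_iris = load_iris(return_X_y=True)
original = SVC(kernel="linear").fit(X_iris, y_iris)

# An in-memory buffer stands in for the .pickle file on disk
buffer = io.BytesIO()
pickle.dump(original, buffer)
buffer.seek(0)  # rewind so that loading starts at the beginning

loaded = load_model(buffer)
print(type(loaded).__name__)  # SVC
```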

In [None]:
# Initially trained model
support_vector_classifier

In [None]:
# Here we load the same model into a new, initially empty variable
loaded_model = None

In [None]:
# The path is the same one used when saving the file; it is shown here only for clarity.
trained_model_filename

In [None]:
# Load model
with open(trained_model_filename, "rb") as readFile:
    loaded_model = pickle.load(readFile)

In [None]:
# Verify that the loaded object is the SVC classifier
loaded_model

`THE MODEL WAS LOADED CORRECTLY FROM THE PICKLE FILE!`

## Make predictions with the loaded trained model

We can evaluate the loaded model to confirm that its performance metrics are the same as those of the original model.
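
Comparing metrics is a useful sanity check, but a stricter one is to compare the predictions themselves: if every prediction matches, every metric derived from them matches too. A sketch of that check, with Iris standing in for the seeds data and `pickle.dumps`/`pickle.loads` standing in for writing and reading the file:

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.20, random_state=123, stratify=y_iris
)

original = SVC(kernel="linear").fit(X_tr, y_tr)
# Round trip through pickle, equivalent to saving and reloading the file
restored = pickle.loads(pickle.dumps(original))

# Identical predictions imply identical confusion matrix and report
identical = np.array_equal(original.predict(X_te), restored.predict(X_te))
# The underlying decision scores should also agree
same_scores = np.allclose(
    original.decision_function(X_te), restored.decision_function(X_te)
)
print(identical, same_scores)
```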

In [None]:
Y_predicted_loaded_model = loaded_model.predict(X_test)

In [None]:
confusion_matrix(Y_test, Y_predicted_loaded_model)

In [None]:
print(classification_report(Y_test, Y_predicted_loaded_model))

In [None]:
# Predict a single sample. Feature order:
# ["Area", "Perimeter", "Compactness", "Length of Kernel",
#  "Width of Kernel", "Asymmetry Coeff.", "Length Kernel Groove"]
data_to_classify = [12.3, 13.34, 0.8684, 5.243, 2.974, 5.637, 5.063]

In [None]:
colnames = data.columns
colnames = colnames.drop('target').values
print(colnames)

In [None]:
sample = pd.DataFrame(data=[data_to_classify], columns=colnames)
sample

In [None]:
prediction = loaded_model.predict(sample)
print("The predicted class for one sample is:", prediction[0])
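
Building the sample from a plain list relies on the values being in exactly the same order as the training columns. A slightly more robust sketch accepts the values as a dictionary keyed by column name (as a web form might deliver them) and reorders them before predicting; recent scikit-learn versions compare DataFrame column names against those seen during `fit`. The column names, the `predict_one` helper, and the synthetic training data below are all hypothetical placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical column names standing in for the seeds CSV header
COLUMNS = ["area", "perimeter", "compactness", "length_kernel",
           "width_kernel", "asymmetry_coeff", "length_groove"]

# Synthetic training data (assumption) so that the sketch runs on its own
X_demo, y_demo = make_classification(
    n_samples=100, n_features=7, n_informative=5, n_classes=3, random_state=1
)
model = SVC(kernel="linear").fit(pd.DataFrame(X_demo, columns=COLUMNS), y_demo)


def predict_one(model, values, columns):
    """Hypothetical helper: predict the class of one {column: value} sample."""
    missing = set(columns) - set(values)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    # Reorder the values to match the column order used during training
    sample = pd.DataFrame([values])[columns]
    return model.predict(sample)[0]


# Keys may arrive in any order, e.g. from a web form
values = dict(zip(reversed(COLUMNS), reversed(X_demo[0].tolist())))
print(predict_one(model, values, COLUMNS))
```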

## Deploy on a web application

To follow the rest of the lab, you need to install the `Dash` package (documentation is available at https://dash.plotly.com/): `pip install dash==1.13.3`. This specific version (1.13.3) is required to guarantee compatibility with the provided code.

Next, run the following script from the terminal/console: `python dami_dsv/model_deployment/dash_example_web.py`.

After you execute the file, Dash launches a web application on your computer and prints a message similar to:

``` console
Dash is running on http://127.0.0.1:8050/

 Warning: This is a development server. Do not use app.run_server
 in production, use a production WSGI server like gunicorn instead.

 * Serving Flask app "dash_example_web" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
```

Leave the script running and open a new tab in your web browser. Go to http://127.0.0.1:8050/ (or the address shown in the console output) to see an interactive web application that uses the classifier model. To stop the application, click on the console/terminal and press `Ctrl+C`; you may need to press it several times.