### An End-to-End Deployment of a Machine Learning Algorithm into a Live Production Environment

In this tutorial, we explore how to deploy a machine learning algorithm into a live production environment so that it can be “consumed” in a platform-agnostic way. We'll start by using Jupyter Notebooks to develop a machine learning algorithm and then make it publicly available as a web application using Voila, GitHub and mybinder.

#### 1. Develop a Machine Learning Algorithm

Our first step is to develop the machine learning algorithm that we want to deploy. In the real world, this step involves many weeks or months of development time and lots of iteration across the stages of the data science pipeline. For this tutorial, we will develop a basic ML algorithm, as the main purpose is to illustrate how to deploy an algorithm for use by “consumers”.

Create a new directory and name it `drug_classification`.  This will be our project home directory.

We will use a drug classification [dataset](https://www.kaggle.com/prathamtripathi/drug-classification) from Kaggle. It is released under a [CC0: Public Domain](https://creativecommons.org/publicdomain/zero/1.0/) dedication, which means that it has no copyright and may be used in our tutorial with no restrictions.

Click the `Download` button and save `archive.zip` in the project home directory created above. Extract the dataset `drug200.csv` from the zip file into the home directory.

![Download Drug Classification Dataset](./images/download.jpg)


The Python code to develop a predictive machine learning algorithm to classify drug prescriptions given a range of patient parameters is as follows: 

In [6]:
import pandas as pd

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold, cross_val_score

df_drug = pd.read_csv("./drug200.csv")

# Label encode categorical features
label_encoder = LabelEncoder()

# Columns with dtype 'O' (object) hold the text categories, including the "Drug" target
categorical_features = [feature for feature in df_drug.columns if df_drug[feature].dtypes == 'O']
for feature in categorical_features:
    df_drug[feature] = label_encoder.fit_transform(df_drug[feature])

# Drop target feature
X = df_drug.drop("Drug", axis=1)
y = df_drug["Drug"]

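model = DecisionTreeClassifier(criterion="entropy")
model.fit(X, y)  # final model, trained on all rows; this is the one we deploy

# 5-fold cross-validation (the KFold default) on shuffled data.
# cross_val_score fits a fresh clone of the model on each training split.
kfold = KFold(random_state=42, shuffle=True)
cv_results = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")

print(cv_results.mean(), cv_results.std())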

0.99 0.012247448713915901


Here, we have trained a machine learning algorithm to predict drug prescriptions and used k-fold cross-validation (i.e. repeatedly holding out one fold of the data for evaluation while training on the rest) to estimate the model accuracy at 99%.
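
To make the evaluation step more concrete, here is a minimal pure-Python sketch of how a shuffled k-fold split partitions row indices. The helper `kfold_indices` is hypothetical and written only for illustration. It is not scikit-learn's implementation, and its shuffle order will differ from `KFold(random_state=42, shuffle=True)`. It assumes 5 folds (the `KFold` default) and 200 rows, as the file name `drug200.csv` suggests.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Return a list of (train_indices, validation_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    fold_size, remainder = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        # The first `remainder` folds get one extra sample
        size = fold_size + (1 if i < remainder else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


folds = kfold_indices(200)
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Every row lands in exactly one validation fold, so each accuracy score is computed on data the model did not see while training for that fold.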

In a production environment, we do not want to retrain our model every time a user wants to predict a drug prescription. Hence, our next step is to preserve the state of our trained model using [`pickle`](https://docs.python.org/3/library/pickle.html#:~:text=%E2%80%9CPickling%E2%80%9D%20is%20the%20process%20whereby,back%20into%20an%20object%20hierarchy.)...

In [7]:
import pickle

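# Open in 'wb' (write) mode so that re-running the cell overwrites the file.
# Append mode ('ab') would add a new pickle after the old one, and
# pickle.load would keep returning the first, stale model.
with open('model.pkl', 'wb') as pickle_file:
    pickle.dump(model, pickle_file)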

Now whenever we need to use the trained model, we simply need to reload its state from the `model.pkl` file rather than re-executing the training step.
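
As a quick illustration of the save-and-reload round trip, the sketch below pickles an object to a file and reads it back. It uses a plain dictionary as a stand-in for the trained model and a temporary directory instead of the project folder; both are assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (hypothetical; any picklable object works)
stand_in_model = {"criterion": "entropy", "classes": ["DrugY", "drugA"]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save: 'wb' overwrites any previous file at this path
with open(path, "wb") as pickle_file:
    pickle.dump(stand_in_model, pickle_file)

# Load: the restored object is an equal copy of what was saved
with open(path, "rb") as pickle_file:
    restored = pickle.load(pickle_file)

print(restored == stand_in_model)
```

Keep in mind that unpickling can execute arbitrary code, so only load `.pkl` files that you created or otherwise trust.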

#### 2. Make an Individual Prediction from the Trained Model

We will assume that consumers of the machine learning algorithm want to make predictions for individual patients rather than a batch of patients.

Those consumers wish to communicate with the algorithm using text-like values for the parameters (i.e. blood pressure is “NORMAL” or “HIGH” rather than their label encoded equivalents of 0 and 1).

Therefore, we will start by reviewing the values for all of the label-encoded categorical features used as input to the algorithm, as well as the target variable itself.

In [10]:
df_drug = pd.read_csv("drug200.csv")

label_encoder = LabelEncoder()

categorical_features = [feature for feature in df_drug.columns if df_drug[feature].dtypes == 'O']
for feature in categorical_features:
    print(feature, list(df_drug[feature].unique()), list(label_encoder.fit_transform(df_drug[feature].unique())), "\n")

Sex ['F', 'M'] [0, 1] 

BP ['HIGH', 'LOW', 'NORMAL'] [0, 1, 2] 

Cholesterol ['HIGH', 'NORMAL'] [0, 1] 

Drug ['DrugY', 'drugC', 'drugX', 'drugA', 'drugB'] [0, 3, 4, 1, 2] 



Above is a list of each categorical feature with the unique values that appear in the data and the corresponding numerical values as transformed by the `LabelEncoder()`.
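
`LabelEncoder` assigns integer codes according to the sorted order of the labels, which explains why `DrugY` receives code 0: an uppercase `D` sorts before a lowercase `d`. The sketch below reproduces that ordering in plain Python. The helper `build_label_map` is hypothetical and only mimics the sorted-order behaviour for string labels.

```python
def build_label_map(values):
    """Map each distinct label to its position in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


drug_codes = build_label_map(["DrugY", "drugC", "drugX", "drugA", "drugB"])
bp_codes = build_label_map(["HIGH", "LOW", "NORMAL"])

print(drug_codes)
print(bp_codes)
```

The resulting codes match the numbers printed above, which is a useful sanity check before hard-coding any mapping.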

With this, we can provide a set of dictionaries that map the text-like values (e.g. “HIGH”, “LOW” etc.) into their encoded equivalents and then develop a simple function to make an individual prediction.

In [12]:
gender_map = {"F": 0, "M": 1}
bp_map = {"HIGH": 0, "LOW": 1, "NORMAL": 2}
cholesterol_map = {"HIGH": 0, "NORMAL": 1}
drug_map = {0: "DrugY", 3: "drugC", 4: "drugX", 1: "drugA", 2: "drugB"}

def predict_drug(Age, 
                 Sex, 
                 BP, 
                 Cholesterol, 
                 Na_to_K):

    # 1. Read the machine learning model from its pickled state ...
    #    (the "with" block closes the file once the model is loaded)
    with open('model.pkl', 'rb') as pickle_file:
        model = pickle.load(pickle_file)
    
    # 2. Transform the "raw" parameters passed into the function to the encoded numerical values using the maps dictionaries
    Sex = gender_map[Sex]
    BP = bp_map[BP]
    Cholesterol = cholesterol_map[Cholesterol]

    # 3. Make an individual prediction for this set of data
    y_predict = model.predict([[Age, Sex, BP, Cholesterol, Na_to_K]])[0]

    # 4. Return the "raw" version of the prediction i.e. the actual name of the drug rather than the numerical encoded version
    return drug_map[y_predict] 

We can invoke the function to make some predictions based on values from the original dataset. By comparing the returned values against the original dataset, we can verify that our implementation is correct.

In [16]:
predict_drug(47, "F", "LOW",  "HIGH", 14)



'drugC'

In [17]:
predict_drug(60, "F", "LOW",  "HIGH", 20)



'DrugY'

> Note that our `predict_drug` function does not re-train the model. Instead, it 'revives' the saved state from the `model.pkl` file using `pickle`, and we can verify that the drug recommendations are correct.
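
Because the maps are plain dictionaries, an unexpected value such as `"low"` would raise a bare `KeyError`. One possible way to give callers a clearer message is sketched below. The helper `encode_category` is hypothetical and not part of the original function.

```python
gender_map = {"F": 0, "M": 1}
bp_map = {"HIGH": 0, "LOW": 1, "NORMAL": 2}
cholesterol_map = {"HIGH": 0, "NORMAL": 1}


def encode_category(name, value, mapping):
    """Return the encoded value, or raise a descriptive ValueError."""
    try:
        return mapping[value]
    except KeyError:
        allowed = ", ".join(sorted(mapping))
        raise ValueError(
            f"{name} must be one of: {allowed} (got {value!r})"
        ) from None


print(encode_category("BP", "LOW", bp_map))

try:
    encode_category("BP", "low", bp_map)
except ValueError as err:
    print(err)
```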

#### 3. Develop a Web Service Wrapper

A web service is a “wrapper” that accepts requests from clients/consumers using HTTP methods such as GET and POST, invokes the Python code and returns the result as an HTTP response.

Clients and consumers only need to be able to formulate HTTP requests to utilize the web service (in our case, the drug classification prediction service). Exposing the service this way makes it platform-agnostic, as nearly all programming languages and environments have a way of handling an HTTP request and response.

In Python, there are several different approaches available. In this tutorial, we will use [`flask`](https://flask.palletsprojects.com/en/2.0.x/) to construct our web service wrapper.

Create a new file `service.py` in the project directory.  

Enter the following code for the web service wrapper in `service.py`:

In [None]:
from flask import Flask, request, jsonify

import pickle

app = Flask(__name__)

gender_map = {"F": 0, "M": 1}
bp_map = {"HIGH": 0, "LOW": 1, "NORMAL": 2}
cholesterol_map = {"HIGH": 0, "NORMAL": 1}
drug_map = {0: "DrugY", 3: "drugC", 4: "drugX", 1: "drugA", 2: "drugB"}

def predict_drug(Age, 
                 Sex, 
                 BP, 
                 Cholesterol, 
                 Na_to_K):

    # 1. Read the machine learning model from its pickled state ...
    #    (the "with" block closes the file once the model is loaded)
    with open('model.pkl', 'rb') as pickle_file:
        model = pickle.load(pickle_file)
    
    # 2. Transform the "raw" parameters passed into the function to the encoded numerical values using the maps dictionaries
    Sex = gender_map[Sex]
    BP = bp_map[BP]
    Cholesterol = cholesterol_map[Cholesterol]

    # 3. Make an individual prediction for this set of data
    y_predict = model.predict([[Age, Sex, BP, Cholesterol, Na_to_K]])[0]

    # 4. Return the "raw" version of the prediction i.e. the actual name of the drug rather than the numerical encoded version
    return drug_map[y_predict] 

@app.route("/")
def hello():
    return "A web service for accessing a machine learning model to make drug recommendations."

@app.route('/drug', methods=['GET'])
def api_all():

    Age = int(request.args['Age'])
    Sex = request.args['Sex']
    BP = request.args['BP']
    Cholesterol = request.args['Cholesterol']
    Na_to_K = float(request.args['Na_to_K'])

    drug = predict_drug(Age, Sex, BP, Cholesterol, Na_to_K)

    return jsonify(recommended_drug=drug)

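# Start the development server only when the script is run directly
# (e.g. "python service.py"), not when the module is imported
if __name__ == "__main__":
    app.run()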

The first part of our service wrapper consists of the label mappings and the `predict_drug` function we have previously defined.

The next part consists of app routes, each of which maps a specific URL to its associated function.  In our wrapper, we define 2 app routes: `/` and `/drug`.  To learn more about app routing and how it works in `Flask`, click [here](https://dev.to/emma_donery/python-flask-app-routing-3l57).
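
The idea behind routing can be illustrated with a toy registry in plain Python. This is only a conceptual sketch and not how Flask is implemented internally. The names `routes`, `route` and `dispatch` are hypothetical.

```python
routes = {}  # maps a URL path to the function that handles it


def route(path):
    """Decorator that registers a handler function for a URL path."""
    def decorator(func):
        routes[path] = func
        return func
    return decorator


@route("/")
def hello():
    return "A web service for accessing a machine learning model."


@route("/drug")
def drug():
    return "drug endpoint"


def dispatch(path):
    """Look up the handler for a path and call it."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()


print(dispatch("/drug"))
print(dispatch("/unknown"))
```

A decorator such as `@app.route('/drug')` plays the same role as `@route("/drug")` here: it associates the URL with the function defined directly beneath it.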

In your terminal (from the project directory), run the service wrapper script using the command: `python service.py`.

![Flask running in terminal](./images/service.jpg)

This invokes the `Flask` application and we can proceed to test our web service using one of the following methods:

- Open a web browser and enter: http://127.0.0.1:5000/drug?Age=60&Sex=F&BP=LOW&Cholesterol=HIGH&Na_to_K=20
- Open a terminal and enter: `curl -X GET "http://127.0.0.1:5000/drug?Age=60&Sex=F&BP=LOW&Cholesterol=HIGH&Na_to_K=20"`

Both are ways of making an HTTP GET request to our drug recommendation service wrapper.  Using the `/drug` app route invokes the `api_all` function in our wrapper.  We pass the values of the parameters via a query string, i.e. `Age=60&Sex=F&BP=LOW&Cholesterol=HIGH&Na_to_K=20`.
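
Rather than typing the query string by hand, it can be assembled from a dictionary with the standard library, which also takes care of escaping special characters. The sketch below rebuilds the URL used above and then parses it back. The variable names are chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "http://127.0.0.1:5000/drug"
params = {"Age": 60, "Sex": "F", "BP": "LOW", "Cholesterol": "HIGH", "Na_to_K": 20}

# urlencode joins the key=value pairs with "&" and escapes them if needed
url = f"{base_url}?{urlencode(params)}"
print(url)

# Parsing the query string recovers the parameters (values arrive as strings)
parsed = parse_qs(urlsplit(url).query)
print(parsed["BP"])
```

Note that every value arrives at the server as text, which is why the wrapper converts `Age` with `int(...)` and `Na_to_K` with `float(...)`.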

Using the browser method, you will find that our wrapper returns the drug recommendation in a JSON object.

![Drug recommendation returned as JSON object](./images/browser.jpg)

#### 4. Deploy the Web Service to Cloud Platform

Now we have a predictive machine learning model that can predict drug prescriptions with 99% accuracy, a helper function to make individual predictions and a web service wrapper that enables these components to be called from a browser or command line.

However, all of these are being executed in the development environment. The next stage is to deploy everything into the cloud so that clients can “consume” the web service over the public Internet.

There are many different public services available for web app deployment including [Google Cloud Platform](https://cloud.google.com/gcp/), [Amazon Web Services](https://aws.amazon.com/), [Microsoft Azure](https://azure.microsoft.com/en-us/).  For this tutorial, I've chosen to deploy on [Heroku](https://www.heroku.com/), as it didn't require my credit card details.  For an illustration of how to deploy a Flask app on Heroku, please click [here](https://github.com/hakngrow/heroku-boilerplate).

After deploying the Flask app, you'll be able to access the drug prediction web service via a public Heroku subdomain like https://drug-classification.herokuapp.com/drug?Age=60&Sex=F&BP=LOW&Cholesterol=HIGH&Na_to_K=20 
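
One detail that often matters when moving from a laptop to a hosting platform is the port. Platforms such as Heroku tell the app which port to listen on through a `PORT` environment variable, whereas Flask's development server defaults to 5000. A minimal sketch of reading that setting is shown below. The helper `resolve_port` is hypothetical, and how the app is actually started in production depends on your deployment setup.

```python
import os


def resolve_port(environ=None, default=5000):
    """Return the port from the PORT environment variable, or a default."""
    env = os.environ if environ is None else environ
    return int(env.get("PORT", default))


# Dictionaries stand in for the real environment in this demonstration
print(resolve_port({"PORT": "8080"}))
print(resolve_port({}))
```

If you start the server from the script itself, the result could be passed along as `app.run(host="0.0.0.0", port=resolve_port())`.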

#### 5. Build a Client Application to Consume the Deployed Web Service

Any programming language or environment that can invoke HTTP requests can call the deployed web service with just a few lines of code. Non-Python environments like C#, JavaScript etc. can all be used but I will finish off this project by writing a Python client using `ipywidgets`.

`ipywidgets` is a module that lets us create interactive widgets in Jupyter notebooks. For example, buttons, text boxes, sliders, progress bars, and more. To learn more about it, click [here](https://ipywidgets.readthedocs.io/en/latest/#).

Before we can use widgets in notebooks, we need to install the following 3 modules:
```
pip install widgetsnbextension 
pip install ipywidgets 
pip install voila
```

In the project directory, create a new file named `client.ipynb`. 

Next, we need to enable the widgets and `voila` extensions so that they are displayed properly in the notebook. Insert a cell in the notebook and enter the following:
```
!jupyter nbextension enable --py widgetsnbextension --sys-prefix
!jupyter serverextension enable voila --sys-prefix
```

![Enable extensions in notebook](./images/extensions.jpg)

After the extensions have been enabled, comment out the commands so that they will not be executed by our web app.

![Commands to enable extensions commented out](./images/extensions-commented.jpg)

Next, insert another cell below and enter the following code:

In [4]:
import requests

from ipywidgets import Label, BoundedFloatText, BoundedIntText, Dropdown, Button, Output, VBox

from IPython.display import display, clear_output

prescribe_label = Label('Drug prescription prediction for age, gender, bp, cholesterol and "Na to K"')
age_text = BoundedIntText(min=16, max=100, value=47, description="Age:", disabled=False)
gender_dropdown = Dropdown(options=['F', 'M'], description='Gender:', disabled=False)
bp_dropdown = Dropdown(options=['HIGH', 'LOW', 'NORMAL'], value="LOW", description='BP:', disabled=False)
cholesterol_dropdown = Dropdown(options=['HIGH', 'NORMAL'], description='Cholesterol:', disabled=False)
na_to_k_text = BoundedFloatText(min=0.0, max=50.0, value=14, description="Na to K", disabled=False)
prescribe_button = Button(description="Prescribe")
prescribe_output = Output()

# Button click event handlers ...
def prescribe_button_on_click(b):

    clear_output()
    
    # Remember to change the base URL to that of your platform provider
    # My web service was deployed on Heroku, hence the subdomain drug-classification.herokuapp.com
    request_url = f"https://drug-classification.herokuapp.com/drug?Age={age_text.value}&Sex={gender_dropdown.value}&BP={bp_dropdown.value}&Cholesterol={cholesterol_dropdown.value}&Na_to_K={na_to_k_text.value}"
    
    response = requests.get(request_url)
    recommended_drug = response.json()["recommended_drug"]

    prescribe_output.clear_output()
    with prescribe_output:
        print(f"The recommended drug is {recommended_drug}")
        
prescribe_button.on_click(prescribe_button_on_click)

vbox_prescribe = VBox([prescribe_label, age_text, gender_dropdown, bp_dropdown, cholesterol_dropdown, na_to_k_text, prescribe_button, prescribe_output])

display(vbox_prescribe)


The above code creates all the UI widgets to collect the parameters required to execute our drug recommendation i.e. text box for age, dropdown for gender, etc.

It also defines the event handler of the `Prescribe` button. When the button is clicked, the handler:
- Builds an HTTP request URL that consists of the address of our web service and a query string for the parameters
- Sends an HTTP GET request to our web service and waits for a response
- Receives the HTTP response and displays a formatted message (i.e. the recommendation)
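
The last step assumes that the request succeeds and that the body contains a `recommended_drug` key. A more defensive version of the message formatting might look like the sketch below. The function `format_recommendation` is hypothetical, and the error wording is only a suggestion.

```python
import json


def format_recommendation(status_code, body_text):
    """Turn an HTTP status code and response body into a user-facing message."""
    if status_code != 200:
        return f"The web service returned HTTP status {status_code}"
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return "The web service returned a response that is not valid JSON"
    if not isinstance(payload, dict) or "recommended_drug" not in payload:
        return "The response did not contain a drug recommendation"
    return f"The recommended drug is {payload['recommended_drug']}"


print(format_recommendation(200, '{"recommended_drug": "DrugY"}'))
print(format_recommendation(500, "Internal Server Error"))
```

Inside the click handler, it could be called with `response.status_code` and `response.text` from the `requests` response object.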

Having installed the `voila` package earlier, you should see the `voila` button on the notebook interface.

![voila Button](./images/voila.jpg)

There are two ways to invoke the notebook as a web application: either press the `voila` button or execute the notebook from the terminal using the command `voila client.ipynb`.  In either case, a new tab with our web app will open in your default web browser.

![The Web Application](./images/webapp.jpg)

If you click the “Prescribe” button with the default values, the recommendation should be `drugC`.

Change the `Age` to 60 and `Na to K` to 20 and `DrugY` should be prescribed. Set the `Age` back to 47, `Na to K` back to 14 and change `BP` to “HIGH” and `drugA` should be prescribed.

These simple tests confirm that the web service, which uses a decision-tree-based predictive machine learning algorithm, is fully deployed to the cloud, can be called from any development environment capable of making an HTTP GET request, and works end-to-end.

#### 6. Host the Web App on Binder

We want the notebook to be displayed as a web app that is hosted on [Binder](http://mybinder.org/), which anyone can access with a URL.  Go to [Binder](http://mybinder.org) and choose the following configurations for `client.ipynb`.

![My Binder Setup](./images/binder.jpg)

- 5.1 Select the `GitHub` platform
- 5.2 Enter the URL of your GitHub repo
- 5.3 Enter the name of the notebook after `/voila/render/`, for our case `/voila/render/client.ipynb`
- 5.4 Select the `URL` option
- 5.5 Click the `Launch` button.


It will take a while to build and launch the web app.  Once it has completed, a new tab in your default browser will launch with the web app.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hakngrow/drug_classification/HEAD?urlpath=%2Fvoila%2Frender%2Fclient.ipynb)

Copy the text for displaying the Binder badge and paste it into the README file. Your web app will launch when the badge is clicked in the GitHub repo.
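
The launch URL behind the badge follows a regular pattern, so it can also be assembled programmatically. The sketch below rebuilds the URL and badge markdown shown above from their parts. The helper `binder_voila_url` is hypothetical, and the pattern is inferred from the badge link in this tutorial.

```python
from urllib.parse import quote


def binder_voila_url(user, repo, notebook, ref="HEAD"):
    """Build a mybinder.org launch URL that renders a notebook with voila."""
    # safe="" makes quote() escape "/" as %2F, as in the badge link
    urlpath = quote(f"/voila/render/{notebook}", safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}?urlpath={urlpath}"


launch_url = binder_voila_url("hakngrow", "drug_classification", "client.ipynb")
badge = f"[![Binder](https://mybinder.org/badge_logo.svg)]({launch_url})"

print(launch_url)
print(badge)
```

Replace the user and repository names with your own to generate the badge for your project.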



