
# Deploying a Machine Learning Model
(building on the Flask API)

Now that we have seen how to build a web API with Flask, we can expose our machine learning model via an endpoint. First, though, we need a trained model. Let's build a classifier with the RandomForest algorithm on the Bank Marketing dataset:

In [0]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

Then we will load the dataset into a DataFrame:

In [0]:
file_url = 'https://raw.githubusercontent.com/PacktWorkshops/The-Data-Science-Workshop/master/Chapter03/bank-full.csv'
df = pd.read_csv(file_url, sep=';')

Then we will extract the response variable, which is the y column in this dataset, using the .pop() method from pandas:

In [0]:
y = df.pop('y')

After this, we need to one-hot encode the categorical variables using the get_dummies() function from pandas:

In [0]:
df_dummies = pd.get_dummies(df)
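To make the effect of one-hot encoding concrete, here is a minimal sketch on a hypothetical toy DataFrame (the `toy` frame and its values are invented for illustration and are not part of the Bank Marketing dataset). Numeric columns pass through unchanged, while each category value gets its own indicator column:

```python
import pandas as pd

# Hypothetical toy data: one numeric column, one categorical column
toy = pd.DataFrame({
    "age": [30, 45, 52],
    "marital": ["single", "married", "single"],
})

# Numeric columns are kept as-is; each category becomes an indicator column
toy_dummies = pd.get_dummies(toy)
print(list(toy_dummies.columns))  # ['age', 'marital_married', 'marital_single']
```

The resulting column order matters later on: a model trained on `df_dummies` expects every record it receives to list its values in this same column order.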

The final step before modeling is to split the data into training and testing sets. To do so, we will use the train_test_split() function from sklearn:

In [0]:
X_train, X_test, y_train, y_test = train_test_split(df_dummies, y, test_size=0.33, random_state=42)
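As a rough illustration of what these arguments do, the sketch below applies the same test_size and random_state to hypothetical toy arrays (`X_toy` and `y_toy` are made up for this example). It shows the resulting split sizes and that a fixed random_state makes the split reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 2 features, alternating labels
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array(["no", "yes"] * 50)

# Same arguments as the cell above: 33% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.33, random_state=42)
print(X_tr.shape, X_te.shape)  # (67, 2) (33, 2)

# Repeating the call with the same random_state reproduces the split
X_tr2, X_te2, _, _ = train_test_split(X_toy, y_toy, test_size=0.33, random_state=42)
print(np.array_equal(X_te, X_te2))  # True
```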

Now we can train our RandomForest classifier:

In [6]:
rf_model = RandomForestClassifier(random_state=8)
rf_model.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=8, verbose=0,
                       warm_start=False)

We can make predictions on the test set using the .predict() method:

In [7]:
rf_model.predict(X_test)

array(['no', 'no', 'no', ..., 'no', 'no', 'yes'], dtype=object)

We can also predict the outcome on a single record from the test set. sklearn models expect a 2-dimensional array as input, so we need to wrap our record into another list:

In [8]:
rf_model.predict([X_test.iloc[3776,]])

array(['no'], dtype=object)
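The difference between a 1-dimensional record and a 2-dimensional input can be checked directly. The following sketch uses a hypothetical three-row DataFrame (`X_small` is invented for illustration) and compares a single row selected as a Series with the same row selected as a one-row DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for X_test
X_small = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

row_series = X_small.iloc[1]    # 1-dimensional Series
row_wrapped = [X_small.iloc[1]] # wrapped in a list, as in the cell above
row_frame = X_small.iloc[[1]]   # one-row DataFrame, already 2-dimensional

print(row_series.shape, row_frame.shape)  # (2,) (1, 2)
print(len(row_wrapped), len(row_wrapped[0]))  # 1 2
```

Selecting with a list of positions, as in `X_small.iloc[[1]]`, is an alternative to wrapping the row in a list. It also keeps the column names, which recent versions of scikit-learn may otherwise warn about when the model was fitted on a DataFrame.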

Before adding our model to the Flask app, we need to save it as a file. We will use the dump() function from the joblib package:

In [9]:
import joblib
joblib.dump(rf_model, "model.pkl")

['model.pkl']

The model is now saved on the filesystem under the filename model.pkl. To load it back, we can use the load() function from joblib:

In [0]:
saved_model = joblib.load("model.pkl")

We can now use it to make predictions:

In [11]:
saved_model.predict([X_test.iloc[3776,]])

array(['no'], dtype=object)
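The save-and-load round trip can be verified in isolation. The sketch below is a self-contained example under assumed conditions: it trains a small classifier on hypothetical toy data, writes it to a temporary directory instead of the working directory, and checks that the restored model gives the same predictions as the original:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: the label is 'yes' only when both features are 1
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 5)
y_toy = np.array(["no", "no", "no", "yes"] * 5)
toy_model = RandomForestClassifier(n_estimators=10, random_state=8).fit(X_toy, y_toy)

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(toy_model, path)
    restored_model = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(toy_model.predict(X_toy), restored_model.predict(X_toy))
print(same_predictions)  # True
```

Two caveats apply to files saved this way: they are pickle-based, so they should only be loaded from trusted sources, and they are generally expected to be loaded with the same scikit-learn version that produced them.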

Now we can create a new API endpoint called /predict that uses this model to predict the outcome for the data it receives. Within the API function, we need to read the input data, perform the prediction with our pre-loaded model, convert the prediction into a string using the array2string() function from numpy, and finally return it as JSON using jsonify(). We start with the imports, look up the IP address of the machine, and create the Flask app:

In [0]:
import socket
import threading
import requests
import json
from flask import Flask, jsonify, request

In [13]:
ip_address = socket.gethostbyname(socket.gethostname())
ip_address

'172.28.0.2'

In [0]:
app = Flask(__name__)

In [0]:
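import numpy as np

@app.route('/predict', methods=['POST'])
def rf_predict():
    # Parse the JSON body: a list of records, i.e. a 2-dimensional list
    data = request.get_json()
    # Run the pre-loaded model on the received records
    prediction = saved_model.predict(data)
    # Convert the numpy array of labels into a string so it can be serialized
    str_pred = np.array2string(prediction)
    return jsonify(str_pred)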
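An endpoint like this can be exercised without starting a server by using Flask's built-in test client. The sketch below is a self-contained approximation, not the notebook's own code: `demo_app`, `demo_predict`, and the toy model are hypothetical names and data introduced only for this example.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy model standing in for saved_model
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 5)
y_toy = np.array(["no", "no", "no", "yes"] * 5)
toy_model = RandomForestClassifier(n_estimators=10, random_state=8).fit(X_toy, y_toy)

demo_app = Flask(__name__)

@demo_app.route("/predict", methods=["POST"])
def demo_predict():
    data = request.get_json()            # 2-dimensional list of records
    prediction = toy_model.predict(data) # numpy array of labels
    return jsonify(np.array2string(prediction))

# The test client sends the request in-process; no network or port is involved
with demo_app.test_client() as client:
    response = client.post("/predict", json=[[1, 1]])
    status = response.status_code
    body = response.get_json()

print(status, body)
```

Because the prediction is passed through array2string(), the JSON body is a single string such as "['yes']" rather than a JSON list of labels.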

Now we need to send a POST request with the record we want a prediction for. We will use the same example as before: the record at position 3776 of the test set. First, we need to convert it into a list by using the .to_list() method from pandas:

In [16]:
record = X_test.iloc[3776,].to_list()
record

[36,
 229,
 28,
 258,
 2,
 -1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1]

Then we will transform it into JSON. sklearn models expect a 2-dimensional array as input, so we need to wrap our list in another list before calling json.dumps():

In [0]:
j_data = json.dumps([record])
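The shape of the payload is easy to inspect with the standard library alone. This small sketch uses a shortened, hypothetical record (`short_record`) to show what the wrapped list looks like as a JSON string and what the server gets back after decoding it:

```python
import json

# Hypothetical shortened record; the real one has one value per model feature
short_record = [36, 229, 28]

# Wrapping in a list produces a 2-dimensional structure with a single row
payload = json.dumps([short_record])
print(payload)  # [[36, 229, 28]]

# Decoding on the receiving side restores the nested list
decoded = json.loads(payload)
print(len(decoded), len(decoded[0]))  # 1 3
```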

In [0]:
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}


Finally, we can send a POST request with this converted record:

In [19]:
r = requests.post("http://172.28.0.2/predict", data=j_data, headers=headers)
r.text

ConnectionError: ignored

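This request fails with a ConnectionError because the Flask app was never started: no cell calls app.run(), so nothing is listening at that address, and the threading import above goes unused. Once the app is running and the request targets the host and port it listens on, the endpoint should return the same prediction as before, 'no', this time served by the model deployed as a Flask app. Apart from starting the server, exposing a machine learning model as a web API takes little more than the short endpoint function shown above.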
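The ConnectionError in the output above suggests that nothing was listening when the request was sent. One possible way to serve a Flask app from inside a notebook is to run it in a background thread. The sketch below is an assumption-laden illustration rather than the notebook's own setup: it uses `make_server` from werkzeug (the WSGI library Flask is built on), binds to 127.0.0.1 on an OS-assigned port, replaces the trained model with a hypothetical `StubModel`, and sends the request with the standard library instead of requests.

```python
import json
import threading
import urllib.request

from flask import Flask, jsonify, request
from werkzeug.serving import make_server


class StubModel:
    """Hypothetical stand-in for saved_model; always predicts 'no'."""

    def predict(self, rows):
        return ["no" for _ in rows]


stub_model = StubModel()
live_app = Flask(__name__)


@live_app.route("/predict", methods=["POST"])
def live_predict():
    rows = request.get_json()
    return jsonify(stub_model.predict(rows))


# Port 0 asks the OS for any free port; read back the one that was assigned
server = make_server("127.0.0.1", 0, live_app)
port = server.server_address[1]

# Serve in a background thread so the notebook cell does not block
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

req = urllib.request.Request(
    f"http://127.0.0.1:{port}/predict",
    data=json.dumps([[36, 229, 28]]).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# An empty ProxyHandler keeps local requests from being routed through a proxy
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(req, timeout=5) as resp:
    status = resp.status
    result = json.loads(resp.read().decode("utf-8"))

# Stop the server and wait for the background thread to finish
server.shutdown()
thread.join(timeout=5)
print(status, result)  # 200 ['no']
```

In the notebook itself, the equivalent would be to start `app` on `ip_address` with an explicit port and to include that same port in the URL passed to requests.post().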