# Deployment of Machine Learning Models

We are going to use the churn prediction model from chapters 4 and 5. To keep things simple, we reuse the notebook from chapter 4 and add a section on deploying the model as a web service.


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [56]:
df = pd.read_csv('data/WA_Fn-UseC_-Telco-Customer-Churn.csv')

In [57]:
df.TotalCharges = pd.to_numeric(df.TotalCharges, errors='coerce')
df.TotalCharges = df.TotalCharges.fillna(0)

df.columns = df.columns.str.lower().str.replace(' ', '_')
string_columns = list(df.dtypes[df.dtypes == 'object'].index)

for col in string_columns:
    df[col] = df[col].str.lower().str.replace(' ', '_')

df.churn = (df.churn == 'yes').astype(int)

### Creating train and test sets

In [8]:
from sklearn.model_selection import train_test_split

# Create a full training set and a test set
df_train_full, df_test = train_test_split(df, test_size=0.2, random_state=1)

# Divide the full training set into train and validation sets
df_train, df_val = train_test_split(df_train_full, test_size=0.33, random_state=11)

# Create the target arrays and remove the target columns from the train and validation datasets
y_train = df_train.churn.values
y_val = df_val.churn.values
del df_train['churn']
del df_val['churn']
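
The two-step split above leaves roughly 54% of the rows for training, 26% for validation, and 20% for testing. A quick check of that arithmetic, using illustrative variable names that are not part of the notebook:

```python
# Fractions passed to train_test_split in the cell above
test_size = 0.2    # share of all rows held out for testing
val_size = 0.33    # share of the remaining rows held out for validation

full_train_share = 1 - test_size            # 0.8 of all rows
val_share = full_train_share * val_size     # 0.8 * 0.33 = 0.264
train_share = full_train_share - val_share  # 0.8 - 0.264 = 0.536

# prints: train=0.536 val=0.264 test=0.200
print('train=%.3f val=%.3f test=%.3f' % (train_share, val_share, test_size))
```

The actual row counts can differ slightly from these shares because the split sizes are rounded to whole rows.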

In [11]:
categorical = ['gender', 'seniorcitizen', 'partner', 'dependents',
               'phoneservice', 'multiplelines', 'internetservice',
               'onlinesecurity', 'onlinebackup', 'deviceprotection',
               'techsupport', 'streamingtv', 'streamingmovies',
               'contract', 'paperlessbilling', 'paymentmethod']

numerical = ['tenure', 'monthlycharges', 'totalcharges']

In [60]:
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

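def train(df, y, C=1.0):
    # Convert the selected feature columns to a list of per-row dictionaries
    records = df[categorical + numerical].to_dict(orient='records')

    # One-hot encode string values; numeric values pass through unchanged
    dv = DictVectorizer(sparse=False)
    dv.fit(records)

    X = dv.transform(records)

    model = LogisticRegression(solver='liblinear', C=C)
    model.fit(X, y)

    return dv, model


def predict(df, dv, model):
    records = df[categorical + numerical].to_dict(orient='records')

    # Reuse the vectorizer fitted in train() so the columns match
    X = dv.transform(records)

    # Keep only the probability of the positive class (churn)
    y_pred = model.predict_proba(X)[:, 1]

    return y_pred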

In [62]:
from sklearn.metrics import roc_auc_score

y_train = df_train_full.churn.values
y_test = df_test.churn.values

dv, model = train(df_train_full, y_train, C=0.5)
y_pred = predict(df_test, dv, model)

auc = roc_auc_score(y_test, y_pred)
print('AUC = %.3f' % auc)

AUC = 0.858


## Model Deployment

Before dealing with model deployment, we are going to create a function that uses our model to predict the probability of churning for a single customer.

In [32]:
def predict_single(customer, dv, model):
    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[:, 1]
    return y_pred[0]

Let's use the following customer as an example:

In [33]:
customer = {
    'customerid': '8879-zkjof',
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 41,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'dsl',
    'onlinesecurity': 'yes',
    'onlinebackup': 'no',
    'deviceprotection': 'yes',
    'techsupport': 'yes',
    'streamingtv': 'yes',
    'streamingmovies': 'yes',
    'contract': 'one_year',
    'paperlessbilling': 'yes',
    'paymentmethod': 'bank_transfer_(automatic)',
    'monthlycharges': 79.85,
    'totalcharges': 3320.75,
}

predict_single(customer, dv, model)

0.06349852904587676
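
Note that the customer record includes a `customerid` field that the model was not trained on. `DictVectorizer` ignores feature names it did not see during `fit`, so the extra key does not affect the prediction. A small self-contained sketch of that behavior, using made-up toy records rather than the churn dataset (the names `train_records` and `dv_demo` are illustrative):

```python
from sklearn.feature_extraction import DictVectorizer

# Toy training records: only 'contract' and 'tenure' are seen during fit
train_records = [
    {'contract': 'one_year', 'tenure': 12},
    {'contract': 'two_year', 'tenure': 40},
]

dv_demo = DictVectorizer(sparse=False)
dv_demo.fit(train_records)

# 'customerid' was not seen during fit, so transform silently ignores it
with_id = dv_demo.transform([{'customerid': 'abc', 'contract': 'one_year', 'tenure': 41}])
without_id = dv_demo.transform([{'contract': 'one_year', 'tenure': 41}])

print(dv_demo.feature_names_)          # learned feature names, no 'customerid'
print((with_id == without_id).all())   # both records map to the same vector
```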

### Saving the model

To use this model from other scripts, we need to save it to disk. We will do that with ```pickle```, storing the vectorizer and the model together.

In [34]:
import pickle

with open('churn-model.bin', 'wb') as f_out:
    pickle.dump((dv, model), f_out)
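
To use the saved model from another script or process, the file can be read back with pickle. A minimal sketch; the helper name `load_model` is hypothetical and not part of the original notebook:

```python
import pickle


def load_model(path):
    """Load the (dv, model) tuple that was saved with pickle.dump."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load
    with open(path, 'rb') as f_in:
        dv, model = pickle.load(f_in)
    return dv, model
```

For example, `dv, model = load_model('churn-model.bin')` restores both objects, provided scikit-learn is installed in the loading environment, ideally in the same version that was used for training.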

Next, we need a Flask app that serves the model. It is implemented in the ```chur_serving.py``` script. With the app running locally, let's send it a request with our customer.

In [46]:
import requests

url = 'http://localhost:9696/predict'
response = requests.post(url, json=customer)
result = response.json()
print(result)

{'churn': False, 'churn_probability': 0.06349852904587676}


The ```result``` variable contains the response from the churn service.
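
The serving script itself is not shown in this notebook. As a rough sketch of what a Flask app producing this response could look like: the `/predict` route, port 9696, and the two response fields come from the cells above, while the `create_app` factory, the `main` function, the 0.5 decision threshold, and the overall structure are assumptions, not the contents of the actual script.

```python
import pickle

from flask import Flask, jsonify, request


def create_app(dv, model, threshold=0.5):
    """Build a Flask app that serves churn predictions (hypothetical helper)."""
    app = Flask('churn')

    @app.route('/predict', methods=['POST'])
    def predict():
        customer = request.get_json()  # customer features sent as a JSON object
        X = dv.transform([customer])
        # Probability of the positive class (churn) for this single customer
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({
            'churn': bool(probability >= threshold),  # threshold is an assumption
            'churn_probability': probability,
        })

    return app


def main():
    # Load the objects saved earlier with pickle.dump((dv, model), f_out)
    with open('churn-model.bin', 'rb') as f_in:
        dv, model = pickle.load(f_in)
    create_app(dv, model).run(host='0.0.0.0', port=9696)
```

Calling `main()` at the end of the script would start Flask's built-in development server; a production deployment would normally sit behind a WSGI server instead.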

Now, let's test the service that was deployed to AWS:

In [53]:
host = ''  # Insert the host name of the AWS service here (without 'http://')
url = 'http://%s/predict' % host
response = requests.post(url, json=customer)
result = response.json()
result

{'churn': False, 'churn_probability': 0.06349852904587676}

Again, we have the same result, but this time the service was deployed inside a container on AWS Elastic Beanstalk. To reach it, we used its public URL.
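
When calling a remote service like this one, it can help to set a timeout and check the HTTP status before parsing the body. A small sketch; the helper name `request_churn` and the 5-second default timeout are illustrative choices, not part of the original notebook:

```python
import requests


def request_churn(url, customer, timeout=5):
    """POST a customer record to the churn service and return the parsed JSON."""
    response = requests.post(url, json=customer, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.json()
```

With this helper, `request_churn(url, customer)` returns the same dictionary as the cells above, but fails with an exception instead of hanging or returning an error page if the service is unreachable or misbehaving.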