# Churn Prediction

## Problem Statement

Consider a telecom company that offers phone and internet services. Some of its customers are churning, and the goal is to build a model that identifies the customers who are likely to churn. The dataset describes each customer: which services they use, how much they paid, and how long they stayed with the company. It also records who canceled their contract and stopped using the services (churned).

## What's in this section

In this notebook, we select our final model, make predictions on the test data, and report evaluation metrics on those predictions.

We then save the model with `pickle`, create a web service that uses the model to make predictions, and package that service with `Docker`. The packaged service can run on any host machine, such as a laptop (regardless of the OS) or a public cloud provider.

## Imports

In [1]:
# usual imports 
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
# plt.style.use('seaborn')

from sklearn.metrics import confusion_matrix

from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import precision_score, recall_score
from collections import defaultdict
from sklearn.metrics import auc
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report
# from sklearn.metrics import plot_confusion_matrix

from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn import tree
from graphviz import Source
from IPython.display import SVG,display

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import KFold

from sklearn.metrics import fbeta_score, make_scorer
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

from sklearn.dummy import DummyClassifier

from sklearn.feature_selection import RFE

# helper functions
from churn_prediction_utils import *

In [2]:
%store -r df_train_full_explore
%store -r df_train_full
%store -r df_train
%store -r df_val
%store -r df_test

%store -r y_train_full
%store -r y_train
%store -r y_val
%store -r y_test

%store -r categorical_features
%store -r numerical_features

%store -r X_train_full_scaled
%store -r dv_full_scaled
%store -r standard_scalar_full_data
%store -r feature_names

%store -r X_train_full_not_scaled
%store -r X_train_scaled
%store -r X_train_not_scaled
%store -r X_val_scaled
%store -r X_val_not_scaled
%store -r X_test_scaled
%store -r X_test_not_scaled

%store -r evaluation_metrics
%store -r f_scorer

%store -r phase_one_model_to_evaluation_metrics_df
%store -r model_to_mean_evaluation_metrics_with_smote_df
%store -r baseline_performance_metrics_df
%store -r model_to_evaluation_metrics_with_feature_selection_df
%store -r phase_one_metrics_collector_map
%store -r model_to_evaluation_metrics_with_smote_map

## Final model selection
We go with a logistic regression model with L1 regularization and `C = 0.1`, because it has the highest f1.5 value (0.688061) on the training data.

## Prediction on test set

In [4]:
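from imblearn.over_sampling import SMOTE

# Oversample the minority (churn) class in the full training set.
# Recent imbalanced-learn releases use fit_resample; fit_sample was removed.
sm = SMOTE(random_state=7)
X_train_full_scaled_oversampled, y_train_full_oversampled = sm.fit_resample(X_train_full_scaled, y_train_full)

# Fit the selected model: L1-regularized logistic regression with C=0.1
final_lr_model = LogisticRegression(solver='liblinear', random_state=42, C=0.1, penalty='l1')
final_lr_model.fit(X_train_full_scaled_oversampled, y_train_full_oversampled)

# Probability of the positive (churn) class, thresholded at 0.5
y_test_proba = final_lr_model.predict_proba(X_test_scaled)
y_test_scores = y_test_proba[:, 1]
y_test_pred = (y_test_scores > 0.5).astype(int)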


In [5]:
performance_metrics_on_test_set_map = defaultdict(list)
performance_metrics_on_test_set_map['model_name'] = ['Final Logistic Regression Model']
for metric in evaluation_metrics:
    # roc_auc uses the predicted probabilities; every other metric uses the hard 0/1 predictions
    if metric == 'f1.5':
        metric_value = fbeta_score(y_test, y_test_pred, beta=1.5)
    elif metric == 'f1':
        metric_value = f1_score(y_test, y_test_pred)
    elif metric == 'roc_auc':
        metric_value = roc_auc_score(y_test, y_test_scores)
    elif metric == 'recall':
        metric_value = recall_score(y_test, y_test_pred)
    elif metric == 'precision':
        metric_value = precision_score(y_test, y_test_pred)
    elif metric == 'accuracy':
        metric_value = accuracy_score(y_test, y_test_pred)
    else:
        # Fail loudly instead of silently reusing the previous metric's value
        raise ValueError(f'Unknown metric: {metric}')
    performance_metrics_on_test_set_map[metric].append(metric_value)

pd.DataFrame(performance_metrics_on_test_set_map)

|   | model_name | f1.5 | f1 | roc_auc | recall | precision | accuracy |
|---|---|---|---|---|---|---|---|
| 0 | Final Logistic Regression Model | 0.691285 | 0.626768 | 0.858719 | 0.827586 | 0.504378 | 0.756565 |


On the test set, the final model has an `F-beta` score (beta = 1.5) of 0.691285.
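As a sanity check, the F-beta values can be recomputed by hand from the precision and recall reported in the test-set table. The sketch below is only an illustration: the helper `f_beta` is a hypothetical name, not part of this notebook or of scikit-learn, and it applies the standard F-beta formula, the weighted harmonic mean of precision and recall.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as reported in the test-set table
precision, recall = 0.504378, 0.827586

f1 = f_beta(precision, recall, beta=1.0)
f1_5 = f_beta(precision, recall, beta=1.5)

print(round(f1, 4), round(f1_5, 4))
```

Both values agree with the reported `f1` and `f1.5` columns up to rounding. With beta = 1.5 the score sits closer to recall than `f1` does, which suits a setting where missing a churning customer costs more than a false alarm.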

## Using Pickle to save the model

In [6]:
import pickle
 
with open('churn-model-development.bin', 'wb') as f_out:
    pickle.dump((dv_full_scaled, final_lr_model), f_out)
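The saved file holds a `(vectorizer, model)` tuple, so whatever loads it has to unpack the two objects in the same order. The following is a minimal round-trip sketch, not the notebook's own code: `save_artifacts` and `load_artifacts` are hypothetical helper names, and plain dictionaries stand in for `dv_full_scaled` and `final_lr_model` so the example runs without the trained objects.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(path, vectorizer, model):
    """Write the (vectorizer, model) pair to a single pickle file."""
    with open(path, 'wb') as f_out:
        pickle.dump((vectorizer, model), f_out)


def load_artifacts(path):
    """Read the pair back, unpacking in the order it was saved."""
    with open(path, 'rb') as f_in:
        vectorizer, model = pickle.load(f_in)
    return vectorizer, model


# Stand-in objects (assumption): the real pair would be dv_full_scaled and final_lr_model
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'demo-model.bin'
    save_artifacts(demo_path, {'kind': 'vectorizer'}, {'kind': 'model'})
    restored_vectorizer, restored_model = load_artifacts(demo_path)

print(restored_vectorizer, restored_model)
```

When the real objects are loaded, scikit-learn must be importable in the loading process, and pickle files should only be loaded from trusted sources because unpickling can execute arbitrary code.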

## Using Flask and Docker 

First, let's run the Docker container:

`docker run -it -p 9696:9696 churn-prediction:latest`

In [21]:
customer = {
    'customerid': '879-zkjof',
    'gender': 'male',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 41,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'dsl',
    'onlinesecurity': 'yes',
    'onlinebackup': 'no',
    'deviceprotection': 'no',
    'techsupport': 'yes',
    'streamingtv': 'yes',
    'streamingmovies': 'no',
    'contract': 'one_year',
    'paperlessbilling': 'no',
    'paymentmethod': 'bank_transfer_(automatic)',
    'monthlycharges': 79.85,
    'totalcharges': 320.75,
}


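import requests

# The container started above must be running and listening on port 9696
url = 'http://localhost:9696/predict'
response = requests.post(url, json=customer)
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()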
In [22]:
result

{'churn': False, 'churn_probability': 0.11721333734732876}
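The service code itself is not shown in this notebook. As a rough sketch of what the prediction step behind `/predict` might do, the function below turns one customer record into a response with the same two keys as the result above. Everything here is an assumption for illustration: `predict_churn`, `StubVectorizer`, and `StubModel` are hypothetical names, the stubs replace the pickled vectorizer and model, and the 0.5 threshold mirrors the one used for the test-set predictions.

```python
def predict_churn(customer, vectorizer, model, threshold=0.5):
    """Score one customer record and build a JSON-serializable response.

    Assumes `vectorizer.transform` accepts a list of dicts and
    `model.predict_proba` returns one [p_no_churn, p_churn] row per record.
    """
    X = vectorizer.transform([customer])
    churn_probability = float(model.predict_proba(X)[0][1])
    return {
        'churn': bool(churn_probability > threshold),
        'churn_probability': churn_probability,
    }


class StubVectorizer:
    """Stand-in for the pickled vectorizer (hypothetical)."""

    def transform(self, records):
        return [[r['tenure'], r['monthlycharges']] for r in records]


class StubModel:
    """Stand-in for the pickled model; always returns a fixed probability."""

    def predict_proba(self, X):
        return [[0.9, 0.1] for _ in X]


demo_customer = {'tenure': 41, 'monthlycharges': 79.85}
demo_result = predict_churn(demo_customer, StubVectorizer(), StubModel())
print(demo_result)
```

In a Flask application, a function like this would typically be called from the route handler with the request's JSON body, and its return value passed to `jsonify`. Converting to built-in `bool` and `float` matters because NumPy scalar types are not JSON-serializable by default.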