## 1. Introduction

In this notebook we examine the Telco Customer Churn dataset and build a model that predicts whether a customer left the company within the last month. We start by identifying feature types and missing values, then continue with feature analysis and visualization of the data. Feature engineering covers creating new attributes, encoding and feature selection. Finally, we test several classifiers and evaluate them with ROC and CAP curves.

#### Data Dictionary
- Customers who left within the last month: Churn
- Services that each customer has signed up for: PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies
- Customer account information: Tenure, Contract, PaymentMethod, PaperlessBilling, MonthlyCharges, TotalCharges
- Demographic info about customers: Gender, SeniorCitizen, Partner, Dependents   

#### Structure
1. Introduction
2. Data Profiling
3. Feature Analysis (Visualization)
4. Feature Engineering (Visualization)
5. Feature Engineering (Encoding)
6. Evaluation - Selection
7. Final thoughts

#### Goal
The goal is to predict which customers are likely to churn, so that the company can act to retain them.

#### P.S. 
Feel free to comment if you have any questions, notes or suggestions about this notebook. It will only make us better!

## 2. Data Profiling

In [None]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [None]:
# Importing the dataset
df_Train = pd.read_csv('TelcoCustomerChurnDataset.csv')

# Dataset Information
df_Train.info()

Apart from customerID, which is only an identifier and which we do not use, the dataset has 20 columns with the following dtypes:
- float64 (1): MonthlyCharges
- int64 (2): SeniorCitizen, tenure
- object (17): gender, Partner, Dependents, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, TotalCharges, Churn

TotalCharges should be float64, like MonthlyCharges, but instead its dtype is object.

In [None]:
df_Train['TotalCharges']

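We convert TotalCharges from object to float. Entries that cannot be parsed as numbers become NaN (`errors='coerce'`), since a plain `pd.to_numeric` call raises an error on them.

In [None]:
# Convert column TotalCharges from object to float; unparseable entries become NaN
df_Train['TotalCharges'] = pd.to_numeric(df_Train['TotalCharges'], errors='coerce')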

In [None]:
# Missing Values
df_Train.isnull().sum()

There are 11 missing values in the TotalCharges column, which we will handle later.
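Element-wise `pd.to_numeric` raises a `ValueError` on text that is not a number, whereas `errors='coerce'` turns such entries into NaN, which is how they then show up in `isnull()`. A small sketch on a made-up Series; that the unparseable entries in the real column are blank strings is an assumption:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the TotalCharges column: numbers stored as strings,
# plus one blank entry (assumed to be what makes the real column dtype object)
raw = pd.Series(['29.85', '1889.5', ' ', '108.15'])

# Element-wise conversion fails on the blank string...
try:
    raw.apply(pd.to_numeric)
    failed = False
except ValueError:
    failed = True

# ...while errors='coerce' converts it to NaN, which isnull() then counts
converted = pd.to_numeric(raw, errors='coerce')
print(failed, converted.dtype, converted.isnull().sum())
```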

In [None]:
# First DataFrame rows
df_Train.head(10)

- Features
    - Categorical
        - Binary: 'gender', 'Partner', 'Dependents', 'PhoneService', 'PaperlessBilling', 'Churn'
        - Nominal: 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingMovies', 'StreamingTV', 'Contract', 'PaymentMethod'
    - Numerical
        - Discrete: 'SeniorCitizen' (a 0/1 flag stored as an integer)
        - Continuous: 'tenure', 'MonthlyCharges', 'TotalCharges'

In [None]:
# Describing The Data
df_Train.describe(include = 'all')

- tenure: Min = 0, Max = 72, Avg = 32.4
- MonthlyCharges: Min = 18.25, Max = 118.75, Avg = 64.76
- TotalCharges: Min = 18.8, Max = 8684.8, Avg = 2283.3

We notice that some customers have tenure = 0. This probably means that they signed a contract with the company during the last month, so their tenure is less than 1 month.

## 3. Feature Analysis (Visualization)

In [None]:
def autolabel(patches, ax, mode):
    """Annotate each bar with its height, as a percentage or as a count."""
    for rect in patches:
        height = rect.get_height()
        if np.isnan(height):  # bars for empty category/hue combinations
            height = 0
        if mode == 'percentage':
            label = '{:.1f}%'.format(height)
        elif mode == 'count':
            label = str(int(height))
        else:
            raise ValueError("mode must be 'percentage' or 'count'")
        ax.annotate(label,
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 0.5),
                    textcoords="offset points",
                    ha='center', va='bottom')

def autoplot(X, hue, data, colors):
    """Left: share of each category of X (%). Right: counts of each category split by hue."""
    fig, ax = plt.subplots(1, 2, figsize=(15, 10))

    counts = X.value_counts()
    sns.barplot(x=counts.index, y=counts / len(X) * 100, palette='Blues_d', ax=ax[0])
    ax[0].set_xlabel(X.name, fontsize=13)
    ax[0].set_ylabel("Percentage", fontsize=13)
    autolabel(ax[0].patches, ax[0], 'percentage')

    sns.countplot(x=X, hue=hue, data=data, palette=colors, order=counts.index, ax=ax[1])
    ax[1].set_ylabel("Number of Occurrences", fontsize=13)
    ax[1].set_xlabel(X.name, fontsize=13)
    autolabel(ax[1].patches, ax[1], 'count')

# Constants that we will use later
colors1 = ['#C03028', '#78C850']  # Churn: No/Yes

The functions above plot each feature with annotations: the left panel shows the percentage of customers in each category and the right panel shows the counts in each category split by Churn.

### Categorical Features

In [None]:
# Churn
Churn = df_Train['Churn'].value_counts().sort_index()  # Number of customers per class (No, Yes)
fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(Churn, labels=Churn.index, autopct='%1.1f%%',colors=colors1)
plt.legend(title='Churn',fontsize=10,title_fontsize=10)

26.5% of the customers left the company within the last month and 73.5% stayed, so the classes are imbalanced.
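Because the classes are imbalanced, accuracy alone can be misleading. A sketch with synthetic labels that only reproduce the 26.5% / 73.5% split: a classifier that always predicts "No churn" already reaches 73.5% accuracy while finding no churners at all.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with the dataset's churn share (1 = "Yes", 26.5%)
y = np.array([1] * 265 + [0] * 735)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

# Baseline that always predicts the majority class ("No churn")
baseline = DummyClassifier(strategy='most_frequent').fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
rec = recall_score(y, y_pred)
print(f'accuracy = {acc:.3f}, recall = {rec:.3f}')
```

This is one reason why the evaluation below looks at recall and AUC as well as accuracy.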

In [None]:
# Gender
autoplot(df_Train['gender'],df_Train['Churn'],df_Train,colors1)
pd.crosstab(df_Train['gender'], df_Train['Churn']).apply(lambda r: r/r.sum(),axis=1)

50.5% of the customers are men and 49.5% are women. The churn rates for men and women are similar, 26.1% and 26.9% respectively, so gender is probably not an important feature for our model.
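The row-normalised crosstabs used in this section return proportions, not percentages: a value of 0.269 means 26.9%. A sketch on made-up toy data showing that pandas' `normalize='index'` option gives the same table as the `apply(lambda r: r/r.sum(), axis=1)` pattern used here:

```python
import pandas as pd

# Made-up toy data standing in for df_Train
df = pd.DataFrame({
    'gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Male'],
    'Churn':  ['No',   'Yes',  'No',   'No',     'Yes',    'Yes',    'No',     'No'],
})

# Row-normalised crosstab, written the way the notebook does it...
via_apply = pd.crosstab(df['gender'], df['Churn']).apply(lambda r: r / r.sum(), axis=1)

# ...and with the built-in normalize='index' option
via_normalize = pd.crosstab(df['gender'], df['Churn'], normalize='index')

# Each row sums to 1, so the 'Yes' column is the churn rate of each category
print((via_normalize * 100).round(1))
```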

In [None]:
# PhoneService-MultipleLines-InternetService
IVs = ['PhoneService','MultipleLines','InternetService']
for i in range(len(IVs)):    
    autoplot(df_Train[IVs[i]],df_Train['Churn'],df_Train,colors1)

In [None]:
for i in range(len(IVs)):    
    print(pd.crosstab(df_Train[IVs[i]], df_Train['Churn']).apply(lambda r: r/r.sum(), axis=1))
    print('\n')

- The churn rates for customers with a PhoneService (90.3%) and without one (9.7%) are similar, 24.9% and 26.7%. PhoneService does not seem to be an important feature for our model.

- 42.2% and 48.1% of the customers have or do not have MultipleLines (the rest have no phone service), and their churn rates are similar, 25% and 28.6%. The information on whether a customer has a PhoneService is also included in the MultipleLines feature, through its 'No phone service' category.

- 21.7% of the customers have no Internet Service, and most of them stayed in the company: only about 7% of them left. Of those with Fiber optic Internet Service, 41.8% churned.

In [None]:
# OnlineSecurity-OnlineBackup-DeviceProtection-TechSupport-StreamingTV-StreamingMovies
IVs = ['OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']
for i in range(len(IVs)):
    autoplot(df_Train[IVs[i]],df_Train['Churn'],df_Train,colors1)

In [None]:
for i in range(len(IVs)):
    print(pd.crosstab(df_Train[IVs[i]], df_Train['Churn']).apply(lambda r: r/r.sum(), axis=1))
    print('\n')

- As mentioned before, 21.7% of the customers have no Internet Service, and therefore no online services either; most of them stayed in the company and only about 7% of them left.

- Features 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport':
    - 49.7% do not have 'OnlineSecurity', and 41.7% of them left the company.
    - 43.8% do not have 'OnlineBackup', and 39.9% of them left the company.
    - 43.9% do not have 'DeviceProtection', and 39.1% of them left the company.
    - 49.3% do not have 'TechSupport', and 41.6% of them left the company.

- Features 'StreamingTV', 'StreamingMovies':
    - The shares of customers who have and who do not have these services are similar.
    - About 1/3 of the customers with Internet Service left the company, whether or not they have 'StreamingTV' and 'StreamingMovies'.

- In general, customers who have Internet Service but no online services tend to leave.

In [None]:
#SeniorCitizen-Partner-Dependents
IVs = ['SeniorCitizen','Partner','Dependents']
for i in range(len(IVs)):
    autoplot(df_Train[IVs[i]],df_Train['Churn'],df_Train,colors1)

In [None]:
for i in range(len(IVs)):
    print(pd.crosstab(df_Train[IVs[i]], df_Train['Churn']).apply(lambda r: r/r.sum(), axis=1))
    print('\n')

- Only 16.2% of the customers are Senior Citizens, but 41.6% of them left the company, compared with 23.6% of the non Senior Citizens.

- Customers with and without a Partner are split almost evenly (48.3% and 51.7% respectively), but 32.9% of those without a Partner churned, compared with 19.6% of those with a Partner.

- 70% of the customers have no Dependents, and 31% of them churned, while 15.4% of the customers with Dependents churned.

- Being a Senior Citizen, or not having a Partner or Dependents, is associated with a higher chance of leaving the company.

In [None]:
# Contract-PaperlessBilling-PaymentMethod
IVs = ['Contract','PaperlessBilling','PaymentMethod']
df_Train['PaymentMethod'] = df_Train['PaymentMethod'].replace({'Bank transfer (automatic)':'Bank transfer Auto',
                                                               'Credit card (automatic)':'Credit card Auto'})
for i in range(len(IVs)):    
    autoplot(df_Train[IVs[i]],df_Train['Churn'],df_Train,colors1)

In [None]:
for i in range(len(IVs)):    
    print(pd.crosstab(df_Train[IVs[i]], df_Train['Churn']).apply(lambda r: r/r.sum(), axis=1))
    print('\n')

- 55% of the customers have a Month-to-month contract, and this group also has the highest churn rate, 42.7%.

- 59.2% of the customers use PaperlessBilling, and 33.5% of them churned, compared with 16.3% of those who do not.

- 45.2% of the customers who use Electronic check as their Payment Method churned.

- Having a Month-to-month contract, using PaperlessBilling and paying by Electronic check are associated with a higher chance of leaving the company.

### Numerical Features

In [None]:
# Tenure
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 10),sharex=True)
sns.histplot(df_Train['tenure'], kde=True, stat='density', ax=ax1)  # density histogram with KDE
sns.boxplot(x=df_Train['tenure'], ax=ax2)  # horizontal box, sharing the x axis
print('Mean Tenure = %0.2f\nMedian Tenure = %0.2f' % (df_Train['tenure'].mean(),df_Train['tenure'].median()))

There are two peaks in the Tenure feature, one for customers with low tenure and one for customers with high tenure.

In [None]:
# Tenure - Churn
g = sns.FacetGrid(df_Train, hue='Churn', palette=colors1, aspect=2, height=5)
g.map(sns.kdeplot, "tenure", fill=True)
g.figure.legend(title='Churn', fontsize=12, title_fontsize=12)
    
fig, ax = plt.subplots()
ax = sns.boxplot(x='Churn', y='tenure', data=df_Train)

T_0 = df_Train['tenure'][df_Train['Churn'] == 'No'].mean()
T_1 = df_Train['tenure'][df_Train['Churn'] == 'Yes'].mean()
print('Mean Tenure No Churn: %0.1f \nMean Tenure Churn: %0.1f' % (T_0,T_1))

The graphs show that churning customers tend to have low tenure, with a mean tenure of about 18 months. As tenure increases, customers tend to stay with the company.

In [None]:
# MonthlyCharges
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 10),sharex=True)
sns.histplot(df_Train['MonthlyCharges'], kde=True, stat='density', ax=ax1)  # density histogram with KDE
sns.boxplot(x=df_Train['MonthlyCharges'], ax=ax2)  # horizontal box, sharing the x axis
print('Mean MonthlyCharges = %0.2f\nMedian MonthlyCharges = %0.2f' % (df_Train['MonthlyCharges'].mean(),df_Train['MonthlyCharges'].median()))

In [None]:
# MonthlyCharges - Churn
g = sns.FacetGrid(df_Train, hue='Churn', palette=colors1, aspect=2, height=5)
g.map(sns.kdeplot, "MonthlyCharges", fill=True)
g.figure.legend(title='Churn', fontsize=12, title_fontsize=12)
    
fig, ax = plt.subplots()
ax = sns.boxplot(x='Churn', y='MonthlyCharges', data=df_Train)

M_0 = df_Train['MonthlyCharges'][df_Train['Churn'] == 'No'].mean()
M_1 = df_Train['MonthlyCharges'][df_Train['Churn'] == 'Yes'].mean()
print('Mean MonthlyCharges No Churn: %0.1f \nMean MonthlyCharges Churn: %0.1f' % (M_0,M_1))

The highest peak of the churned customers' distribution lies at higher MonthlyCharges than the highest peak for the customers who stayed, so customers with higher MonthlyCharges tend to leave the company.

In [None]:
# TotalCharges
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 10),sharex=True)
sns.histplot(df_Train['TotalCharges'], kde=True, stat='density', ax=ax1)  # density histogram with KDE
sns.boxplot(x=df_Train['TotalCharges'], ax=ax2)  # horizontal box, sharing the x axis
print('Mean TotalCharges = %0.2f\nMedian TotalCharges = %0.2f' % (df_Train['TotalCharges'].mean(),df_Train['TotalCharges'].median()))

In [None]:
# TotalCharges - Churn
g = sns.FacetGrid(df_Train, hue='Churn', palette=colors1, aspect=2, height=5)
g.map(sns.kdeplot, "TotalCharges", fill=True)
g.figure.legend(title='Churn', fontsize=12, title_fontsize=12)
    
fig, ax = plt.subplots()
ax = sns.boxplot(x='Churn', y='TotalCharges', data=df_Train)

TC_0 = df_Train['TotalCharges'][df_Train['Churn'] == 'No'].mean()
TC_1 = df_Train['TotalCharges'][df_Train['Churn'] == 'Yes'].mean()
print('Mean TotalCharges No Churn: %0.1f \nMean TotalCharges Churn: %0.1f' % (TC_0,TC_1))

Many customers with low TotalCharges left the company. This may seem odd, but it is consistent with the tenure analysis: TotalCharges accumulates over a customer's tenure, and churned customers tend to have low tenure, so their TotalCharges are low even when their MonthlyCharges are high.

## 4. Feature Engineering (Visualization)

In [None]:
# Online Services 
IVs = ['OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']

OnlineServices = df_Train[IVs].replace({'No internet service':2,'No': 0, 'Yes': 1})
df_Train['OnlineServices'] = OnlineServices.sum(axis=1)
df_Train['OnlineServices'] = df_Train['OnlineServices'].replace({12:'No Int. Service'})

autoplot(df_Train['OnlineServices'],df_Train['Churn'],df_Train,colors1)
pd.crosstab(df_Train['OnlineServices'], df_Train['Churn']).apply(lambda r: r/r.sum(), axis=1)

OnlineServices counts how many of the 6 online services a customer with Internet Service has (0 to 6); customers with no Internet Service form a separate category. The churn rate decreases as the number of online services increases, from 52.2% for customers with Internet Service and 0 online services to about 5% for those with all 6.

In [None]:
# MonthChTenure = MonthlyCharges * Tenure 
# MonthChTenure - TotalCharges
df_Train['MonthChTenure'] = df_Train['MonthlyCharges']*df_Train['tenure']

fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 10),sharex=True)
sns.histplot(df_Train['TotalCharges'], kde=True, stat='density', ax=ax1)
sns.histplot(df_Train['MonthChTenure'], kde=True, stat='density', ax=ax2)

In [None]:
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 10),sharex=True)
sns.boxplot(x=df_Train['TotalCharges'], ax=ax1)
sns.boxplot(x=df_Train['MonthChTenure'], ax=ax2)

We created the feature MonthChTenure by multiplying tenure by MonthlyCharges and compared it with TotalCharges: their distributions are almost identical. So we conclude that TotalCharges carries approximately the same information as the tenure and MonthlyCharges features combined.
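The comparison above is visual. A sketch of how it could be quantified, using a hypothetical helper `compare_to_product` and made-up toy rows: a correlation close to 1 and a small median relative difference would support the claim that TotalCharges is approximately MonthlyCharges × tenure. Small deviations would be expected if monthly charges changed during a customer's tenure (an assumption about the data, not something the notebook checks).

```python
import pandas as pd

def compare_to_product(df, total='TotalCharges', monthly='MonthlyCharges', tenure='tenure'):
    """Hypothetical helper: compare a total column with monthly x tenure.

    Returns the Pearson correlation and the median relative difference,
    ignoring rows where the total is missing or zero.
    """
    product = df[monthly] * df[tenure]
    mask = df[total].notna() & (df[total] != 0)
    corr = df.loc[mask, total].corr(product[mask])
    rel_diff = ((df.loc[mask, total] - product[mask]).abs() / df.loc[mask, total]).median()
    return corr, rel_diff

# Made-up rows: totals close to, but not exactly, monthly x tenure
toy = pd.DataFrame({
    'MonthlyCharges': [20.0, 50.0, 80.0, 100.0, 60.0],
    'tenure':         [1, 10, 24, 60, 5],
    'TotalCharges':   [20.0, 510.0, 1900.0, 6100.0, 290.0],
})
corr, rel_diff = compare_to_product(toy)
print(f'correlation = {corr:.4f}, median relative difference = {rel_diff:.2%}')
```

On the notebook's data this would be called as `compare_to_product(df_Train)`.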

In [None]:
# TotalCharges: show the rows with missing values
df_Train[df_Train['TotalCharges'].isnull()].loc[:,('MonthlyCharges','tenure')]

The TotalCharges feature has 11 missing values, as we found earlier in this notebook. These missing values correspond to the customers with a tenure of 0 months. This probably means that these customers signed up with the company during the last month, so there is no TotalCharges value for them yet. We believe that these customers do not provide reliable information for our model, so we delete these rows from our dataset.

In [None]:
df_Train = df_Train.dropna(subset=['TotalCharges'])  # drop the 11 rows with missing TotalCharges

## 5. Feature Engineering (Encoding)

In [None]:
# Dataset split to Categorical (Nominal,Binary) and Numeric Vars
df_Cat_Bin = df_Train[['gender','Partner','SeniorCitizen','Dependents','PhoneService','PaperlessBilling']].iloc[:]
df_Cat_Nom = df_Train[['MultipleLines','InternetService','OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingMovies','StreamingTV','Contract','PaymentMethod']].iloc[:]
df_Num = df_Train[['tenure','MonthlyCharges','TotalCharges']].iloc[:]

# Categorical Output
y = df_Train['Churn'].iloc[:]

In [None]:
# LABEL ENCODING - ONE HOT ENCODING

# Categorical Binary Features Encoding
from sklearn.preprocessing import LabelEncoder
df_Cat_Bin_Ld = df_Cat_Bin.apply(LabelEncoder().fit_transform)

# Categorical Nominal Features Encoding
df_Cat_Nom_OHEd = pd.get_dummies(df_Cat_Nom)

# All Categorical Features
df_Cat = pd.concat([df_Cat_Bin_Ld,df_Cat_Nom_OHEd],axis=1)

# Categorical Output Encoding
y_Ld = y.map({'No': 0, 'Yes': 1})

# ALL the Selected IVs
X = pd.concat([df_Num,df_Cat],axis=1)
columns=X.columns
X.head()

In [None]:
# Correlation Matrix
plt.figure(figsize=(20, 20))
corr = X.corr()
sns.heatmap(corr, xticklabels=corr.columns,yticklabels=corr.columns,cmap = "coolwarm",annot=True,annot_kws = {'size': 6})
plt.title("Correlation")
plt.show()

In [None]:
X = X.drop(columns=['OnlineSecurity_No internet service','OnlineBackup_No internet service',
                    'DeviceProtection_No internet service','TechSupport_No internet service',
                    'StreamingMovies_No internet service','StreamingTV_No internet service'])

X = X.drop(columns=['PhoneService'])

X = X.drop(columns=['TotalCharges'])

- The six '_No internet service' columns (e.g. 'OnlineSecurity_No internet service') carry the same information: each is 1 exactly when InternetService_No is 1. So we keep only InternetService_No and drop the others.

- We also drop the PhoneService feature, since its information is included in the MultipleLines feature (through its 'No phone service' category).

- We drop the TotalCharges feature, since its information is approximately captured by the MonthlyCharges and tenure features. We could drop those two instead, but dropping TotalCharges gave us better results.
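A small sketch, on made-up toy rows that use the dataset's category labels, of why the '_No internet service' columns are redundant after one-hot encoding, and of a generic way to find duplicated columns. `dtype=int` is passed so the dummies are 0/1 integers; the notebook's own `get_dummies` call does not pass it.

```python
import pandas as pd

# Made-up toy rows using the dataset's category labels
toy = pd.DataFrame({
    'InternetService': ['DSL', 'DSL', 'Fiber optic', 'Fiber optic', 'No', 'No'],
    'OnlineSecurity':  ['Yes', 'No', 'Yes', 'No', 'No internet service', 'No internet service'],
    'TechSupport':     ['No', 'Yes', 'Yes', 'No', 'No internet service', 'No internet service'],
})
dummies = pd.get_dummies(toy, dtype=int)

# Every "_No internet service" column equals InternetService_No
no_int_cols = [c for c in dummies.columns if c.endswith('_No internet service')]
identical = all(dummies[c].equals(dummies['InternetService_No']) for c in no_int_cols)

# Generic check: transposing turns columns into rows, so duplicated() flags
# every column that exactly repeats an earlier one
duplicate_cols = dummies.columns[dummies.T.duplicated().to_numpy()].tolist()
print(identical, duplicate_cols)
```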

In [None]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y_Ld, test_size = 0.2, random_state = 0)
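
This split is not stratified, so the churn share in the test set can differ slightly from the overall 26.5%. An alternative, which this notebook does not use, is to pass `stratify=y_Ld` to `train_test_split`. A sketch on synthetic labels with the same churn share:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the dataset's approximate churn share (26.5% positives)
y = np.array([1] * 265 + [0] * 735)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature

# stratify=y keeps the class proportions (almost) equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
print(y_tr.mean(), y_te.mean())
```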

In [None]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [None]:
# Choosing Classifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier,AdaBoostClassifier

In [None]:
# Classifiers, with some hyperparameters tuned beforehand using GridSearchCV
CF = [None]*9
Names = ['Logistic Regression','SVM linear','SVM rbf','Naive Bayes','kNN','Decision Tree','Random Forest','Gradient Boosting','Ada Boost']
CF[0] = LogisticRegression(solver='newton-cg')
CF[1] = SVC(kernel = 'linear', random_state = 0,probability=True)
CF[2] = SVC(kernel = 'rbf', random_state = 0,probability=True)
CF[3] = GaussianNB()
CF[4] = KNeighborsClassifier(n_neighbors=20,metric='minkowski')
CF[5] = DecisionTreeClassifier(max_depth=5,min_samples_leaf=2,random_state = 0)
CF[6] = RandomForestClassifier(n_estimators=150,min_samples_split=4,max_depth=9,min_samples_leaf=2,random_state = 0)
CF[7] = GradientBoostingClassifier(loss='exponential',min_samples_leaf=2,learning_rate=0.05,random_state = 0)
CF[8] = AdaBoostClassifier(random_state = 0)

We first used GridSearchCV to tune some of the hyperparameters set above.
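The search itself is not shown in this notebook. A minimal sketch of how such a search can be set up, on synthetic data from `make_classification` and with a hypothetical grid for k-NN only (the grids actually used are not given). Placing the scaler in a `Pipeline` means that, during cross-validation, it is fit only on each training fold. Scoring with `'roc_auc'` is one choice; `'recall'` would match the goal stated in the final thoughts.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the training data (about 27% positives)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.73], random_state=0)

# Scaling inside the pipeline: each CV fold is scaled using its own training part
pipe = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier())])

# Hypothetical grid, for illustration only
param_grid = {'knn__n_neighbors': [5, 10, 20, 30]}

search = GridSearchCV(pipe, param_grid, scoring='roc_auc', cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```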

In [None]:
# Classification Metrics
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, roc_auc_score

Classifiers = ['Logistic Regression','SVM linear','SVM rbf','Naive Bayes','k-NN','Decision Tree','Random Forest','Gradient Boosting','Ada Boost']
Cols = ['Accuracy','Recall','Precision','f1 score','AUC ROC score']
Scores = pd.DataFrame(index=Classifiers, columns=Cols, dtype='float')
for name, classifier in zip(Classifiers, CF):
    classifier.fit(X_train, y_train)
    # Predicted probability of the positive class (Churn = 1), used for the ROC AUC
    c_probs = classifier.predict_proba(X_test)[:, 1]
    y_pred = classifier.predict(X_test)

    # Write one full row with .loc (avoids chained assignment such as Scores.Accuracy[i] = ...)
    Scores.loc[name] = [accuracy_score(y_test, y_pred),
                        recall_score(y_test, y_pred),
                        precision_score(y_test, y_pred),
                        f1_score(y_test, y_pred),
                        roc_auc_score(y_test, c_probs)]

print(Scores)

In [None]:
# Feature Importance plots
columns=X.columns
fig = plt.figure(figsize=(15,15))
fig.subplots_adjust(hspace=0.3, wspace=0.3)
for i in range(4):
    plt.subplot(2, 2, i+1)
    classifier = CF[i+5]
    classifier.fit(X_train, y_train)     

    FImportances = pd.DataFrame(data=classifier.feature_importances_,index=columns,columns=['Importance']).sort_values(by=['Importance'])
    plt.barh(range(FImportances.shape[0]),FImportances['Importance'],color = '#78C850')
    plt.yticks(range(FImportances.shape[0]), FImportances.index)
    plt.title('Feature Importances: %s' % (Names[i+5]))

From the Feature Importances plots, the type of Contract, MonthlyCharges, tenure and InternetService are the most important features for predicting whether a customer leaves the company.

## 6. Evaluation

#### Receiver Operating Characteristic (ROC) Curve
The Receiver Operating Characteristic curve, better known as the ROC curve, is a common way to measure the performance of a classification model: it shows how well the model distinguishes between the classes. The True Positive Rate (TPR) is plotted against the False Positive Rate (FPR) as the decision threshold on the classifier's predicted probabilities varies. The area under the curve (AUC) summarizes this in a single number: 0.5 corresponds to random guessing and 1 to perfect separation.
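A worked sketch on made-up labels and probabilities: each threshold gives one (FPR, TPR) point, and the AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (with no tied scores, as here).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up true labels (1 = churn) and predicted churn probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# One ROC point per threshold: predict "churn" when score >= threshold
P, N = y_true.sum(), (1 - y_true).sum()
for t in [0.9, 0.7, 0.35, 0.1]:
    y_pred = scores >= t
    tpr = (y_pred & (y_true == 1)).sum() / P
    fpr = (y_pred & (y_true == 0)).sum() / N
    print(f'threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}')

# AUC as the share of (churner, non-churner) pairs ranked correctly
pos, neg = scores[y_true == 1], scores[y_true == 0]
pairwise_auc = np.mean([p > n for p in pos for n in neg])
print(pairwise_auc, roc_auc_score(y_true, scores))  # both 0.875
```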

In [None]:
# ROC - Curves for models
fig = plt.figure(figsize=(15,15))
fig.subplots_adjust(hspace=0.3, wspace=0.3)    
for i in range(len(CF)):
    plt.subplot(3, 3, i+1)

    classifier = CF[i]
    classifier.fit(X_train, y_train)  
     
    # Predict probabilities
    r_probs = [0 for _ in range(len(y_test))]
    c_probs = classifier.predict_proba(X_test)

    # Keep probabilities for the positive outcome only
    c_probs = c_probs[:, 1]

    # Calculate AUROC
    from sklearn.metrics import roc_curve, roc_auc_score, auc
    r_auc = roc_auc_score(y_test, r_probs)
    c_auc = roc_auc_score(y_test, c_probs)

    # Calculate ROC curve
    r_fpr, r_tpr, _ = roc_curve(y_test, r_probs)
    c_fpr, c_tpr, _ = roc_curve(y_test, c_probs)
    plt.plot(r_fpr, r_tpr, linestyle='--',c='r', label='Random Prediction (AUROC = %0.3f)' % r_auc)
    plt.plot(c_fpr, c_tpr, marker='.',c='b', label='%s (AUROC = %0.3f)' % (Names[i],c_auc))

    plt.title('ROC Plot')
    plt.xlabel('False Positive Rate (1 - Specificity)')
    plt.ylabel('True Positive Rate (Sensitivity)')
    plt.legend(fontsize='small')

The higher the AUC, the better the model distinguishes between customers who churn and customers who do not. Random Forest and Gradient Boosting have the highest AUC, both 0.843.

#### Cumulative Accuracy Profile (CAP) Curve
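The CAP curve sorts the test customers by the model's predicted probability of churning (highest first) and plots the cumulative number of churners found against the number of customers examined. It shows how effectively a model identifies all members of a class with as few tries as possible: a random model follows the diagonal, while a perfect model finds every churner first. The accuracy ratio (AR) divides the area between the model's curve and the random line by the area between the perfect and random lines, so AR = 1 for a perfect model and 0 for a random one.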
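A sketch of the AR computation as a hypothetical helper `accuracy_ratio`, on the same kind of made-up data. It sorts by score only, using a stable sort so that tied scores keep their input order. When there are no tied scores, the accuracy ratio equals 2·AUC − 1, which links the CAP results to the ROC results.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score

def accuracy_ratio(y_true, scores):
    """Hypothetical helper: AR of the CAP curve, i.e. the area between model
    and random curves divided by the area between perfect and random curves."""
    y_true = np.asarray(y_true)
    total, n_pos = len(y_true), y_true.sum()
    # Sort by score only (highest first); stable sort keeps ties in input order
    order = np.argsort(-np.asarray(scores), kind='stable')
    y_values = np.append(0, np.cumsum(y_true[order]))
    x_values = np.arange(total + 1)

    a_random = auc([0, total], [0, n_pos])
    a_perfect = auc([0, n_pos, total], [0, n_pos, n_pos]) - a_random
    a_model = auc(x_values, y_values) - a_random
    return a_model / a_perfect

# Made-up labels (1 = churn) and predicted probabilities, no tied scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])
ar = accuracy_ratio(y_true, scores)
print(ar, 2 * roc_auc_score(y_true, scores) - 1)  # both 0.75
```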

In [None]:
# Cap Curve
fig = plt.figure(figsize=(15,15))
fig.subplots_adjust(hspace=0.3, wspace=0.3)    
for i in range(len(CF)):
    plt.subplot(3, 3, i+1)
    
    total = len(y_test)
    class_1_count = np.sum(y_test)
    class_0_count = total - class_1_count

    plt.plot([0, total], [0, class_1_count], c = 'r', linestyle = '--', label = 'Random Model')

    plt.plot([0, class_1_count, total], 
             [0, class_1_count, class_1_count], 
             c = 'grey', linewidth = 2, label = 'Perfect Model')

    classifier = CF[i]
    classifier.fit(X_train, y_train)  
    c_probs = classifier.predict_proba(X_test)

    # Keep probabilities for the positive outcome only
    c_probs = c_probs[:, 1]

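    # Labels ordered by predicted probability, highest first. Sorting (probability, label)
    # tuples breaks ties by label, so churners come first among tied probabilities.
    model_y = [y for _, y in sorted(zip(c_probs, y_test), reverse = True)]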
    y_values = np.append([0], np.cumsum(model_y))
    x_values = np.arange(0, total + 1)

    from sklearn.metrics import auc
    # Area under Random Model
    a = auc([0, total], [0, class_1_count])

    # Area between Perfect and Random Model
    aP = auc([0, class_1_count, total], [0, class_1_count, class_1_count]) - a

    # Area between Trained and Random Model
    aR = auc(x_values, y_values) - a

    AR = aR / aP

    plt.plot(x_values, y_values, c = 'g', label = '%s (AR = %0.3f)' % (Names[i],AR), linewidth = 4)

    # Plot information
    plt.xlabel('Total observations')
    plt.ylabel('Class 1 observations')
    plt.title('Cumulative Accuracy Profile')
    plt.legend(fontsize='small')

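Decision Tree has the highest accuracy ratio (AR), 0.688, followed by Gradient Boosting (0.687) and Random Forest (0.686). Because the code above breaks ties in predicted probability in favor of churners, models that output only a few distinct probabilities, such as the Decision Tree, may be slightly favored by this comparison.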

In [None]:
# Average values
Scores_avg = np.average(Scores,axis=0)
print('The avg accuracy is = %.2f' % Scores_avg[0])
print('The avg recall is = %.2f' % Scores_avg[1])
print('The avg precision is = %.2f' % Scores_avg[2])
print('The avg f1-score is = %.2f' % Scores_avg[3])
print('The avg AUC ROC score is = %.2f' % Scores_avg[4]) 

## 7. Final thoughts

In this case we want a model with high recall: recall is the fraction of the customers who actually churned that the model correctly identifies, TP / (TP + FN), so a high recall means few False Negatives (churners the model misses).
The model with clearly higher recall than the average is Naive Bayes, with recall = 0.79, f1-score = 0.60, accuracy = 0.73 and the lowest precision, 0.49. We choose this model even though its accuracy is below the average, because identifying the customers who churn is the goal of this project.
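
Recall can also be traded against precision without changing the model, by lowering the probability threshold at which a customer is flagged as a churner; this notebook does not explore that option. A sketch on made-up probabilities: lowering the threshold from 0.5 to 0.3 raises recall at the cost of precision. With the notebook's models, the same idea would apply to `predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels (1 = churn) and predicted churn probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
probs  = np.array([0.1, 0.45, 0.35, 0.8, 0.2, 0.55, 0.6, 0.9, 0.3, 0.05])

# Recall = TP / (TP + FN); precision = TP / (TP + FP)
results = {}
for threshold in [0.5, 0.3]:
    y_pred = (probs >= threshold).astype(int)
    results[threshold] = (recall_score(y_true, y_pred), precision_score(y_true, y_pred))
    print(f'threshold={threshold}: recall={results[threshold][0]:.2f}, '
          f'precision={results[threshold][1]:.2f}')
```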