# Background

Customer churn is the number of customers who stop doing business with a company during a given time period. Churn is a problem for a business because it lowers revenues and profits. Moreover, attracting new customers is [5 to 25 times more expensive](https://hbr.org/2014/10/the-value-of-keeping-the-right-customers) than retaining existing ones. [According to Bain & Co.](https://media.bain.com/Images/BB_Prescription_cutting_costs.pdf), increasing customer retention by 5% increases profits by more than 25%.

Accurately predicting churn and identifying the relevant factors can help a company develop effective customer retention strategies which, in turn, reduce churn.

# Data

A newer version of telco customer churn data is used in this notebook, obtained from a data module in [IBM Accelerator Catalog](https://community.ibm.com/accelerators/?context=analytics&type=Data&industry=Telecommunications). The original module contains five data tables, but only three will be considered for analysis: `Demographics`, `Services`, and `Status`. A detailed description of columns in each table is available on [IBM Business Analytics Community](https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113).

In [None]:
# Data manipulation, visualization
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
!pip install openpyxl

# Preprocessing, modeling, and evaluation
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
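from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
# plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
# alias their replacements so the plotting helpers below keep working
plot_confusion_matrix = ConfusionMatrixDisplay.from_estimator
plot_roc_curve = RocCurveDisplay.from_estimator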
from imblearn.over_sampling import RandomOverSampler
from collections import Counter

In [None]:
path = '/kaggle/input/telco-customer-churn-1113/'  # Read in datasets
stat = pd.read_excel(path+'Telco_customer_churn_status.xlsx')
demo = pd.read_excel(path+'Telco_customer_churn_demographics.xlsx')
serv = pd.read_excel(path+'Telco_customer_churn_services.xlsx')

key = ['Customer ID']
df = stat.merge(  # Merge into a single dataframe
    demo, on=key).merge(
    serv, on=key)
df.head()

After the tables are loaded, they are merged into a single data frame on `Customer ID`, the column that identifies the same customer in every table. The merged frame has a total of 48 columns, but some are dropped because they contain redundant information, directly encode the churn outcome, or are highly correlated with other columns (which can result in multicollinearity).
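
The `_x` and `_y` suffixes in dropped names such as `Count_x` and `Count_y` come from pandas, which renames non-key columns that appear in both tables being merged. A minimal sketch with made-up toy tables (`status_toy`, `demo_toy`, and all their values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Toy stand-ins for two of the tables; all values are invented for illustration.
status_toy = pd.DataFrame({
    'Customer ID': ['A1', 'B2', 'C3'],
    'Count': [1, 1, 1],
    'Churn Label': ['Yes', 'No', 'No']})
demo_toy = pd.DataFrame({
    'Customer ID': ['A1', 'B2', 'C3'],
    'Count': [1, 1, 1],
    'Gender': ['Female', 'Male', 'Male']})

# Merge on the shared key; the key column appears once in the result.
merged_toy = status_toy.merge(demo_toy, on='Customer ID')

# The shared non-key column `Count` is kept twice, with the default suffixes.
print(merged_toy.columns.tolist())
```

Both copies of `Count` survive the merge, which is why the suffixed duplicates have to be dropped explicitly afterwards.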

In [None]:
to_drop = [  # Drop columns not used in the analyses
    'Customer ID',
    'Count_x',
    'Quarter_x',
    'Customer Status',
    'Churn Value',
    'Churn Score',
    'Churn Category',
    'Churn Reason',
    'Count_y',
    'Age',
    'Number of Dependents',
    'Quarter_y',
    'Referred a Friend',
    'Number of Referrals',
    'Phone Service',
    'Internet Service',
    'Streaming Music',
    'Total Charges',
    'Total Refunds',
    'Total Extra Data Charges',
    'Total Long Distance Charges',
    'Total Revenue']
df.drop(to_drop, axis=1, inplace=True)

# Data exploration

There are 7043 observations in the data frame, each representing a unique customer. A majority of the columns contain categorical data of the nominal type, e.g., `Yes` or `No`. Six columns are of the numerical type, namely: `Satisfaction Score`, `CLTV`, `Tenure in Months`, `Avg Monthly Long Distance Charges`, `Avg Monthly GB Download`, and `Monthly Charge`. The column `Count` is only kept for the purpose of aggregating data and creating visualizations; it will be removed later. Upon checking, the data frame contains neither missing values nor duplicated rows.

In [None]:
df.info()

In [None]:
df.isna().sum()  # Check for missing values

In [None]:
df.duplicated().sum()  # Check for duplicated data

In [None]:
df.describe()  # Obtain statistical summary: numeric data

In [None]:
df.describe(include='O')  # Obtain statistical summary: categorical data

In [None]:
cpal = {'No':'#76528BFF', 'Yes':'#DF6589FF'}

def visualize_labels(img_title, df, col):
    fig, ax = plt.subplots(figsize=(6,5))
    sns.countplot(data=df, x=col, palette=cpal)
    for bar in ax.patches:
        height = bar.get_height()
        width = bar.get_width()
        col_total = df[col].count()
        pct = 100*height/(col_total)
        ax.text(bar.get_x() + width/2,
                bar.get_y() + height/2,
                f'{int(height)}',
                ha='center', va='center',
                color='white', weight='bold')
        ax.text(bar.get_x() + width/2,
                bar.get_y() + height/2 - 300,
                f'({pct:.1f}%)',
                ha='center', va='center',
                color='white', weight='bold')
    for pos in ['right', 'top', 'left']:
        ax.spines[pos].set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.set_xlabel(None)
    ax.set_title(f'{col}', loc='left', weight='bold')
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')
    
def stack_bar(img_title, df, cols):
    fig, axes = plt.subplots(1, 3, figsize=(16,5), sharey=True)
    for i, col in enumerate(cols):
        # Calculate totals and percentages
        total = df.groupby(col)['Count'].sum().reset_index()
        churn = df[df['Churn Label']=='Yes'].groupby(col)['Count'].sum().reset_index()
        churn['Count'] = [100*i/j for i,j in zip(churn['Count'], total['Count'])]
        total['Count'] = [100*i/j for i,j in zip(total['Count'], total['Count'])]
        bar1 = sns.barplot(  # top bars (group of 'Churn = No')
            x=col, y='Count', data=total, color=cpal['No'], ax=axes[i])
        bar2 = sns.barplot(  # bottom bars (group of 'Churn = Yes')
            x=col, y='Count', data=churn, color=cpal['Yes'], ax=axes[i])
        top_bar = mpatches.Patch(color=cpal['No'], label='Not Churn')
        bot_bar = mpatches.Patch(color=cpal['Yes'], label='Churn')  
        axes[i].legend(loc='lower center', ncol=2,
                       bbox_to_anchor=(0.5, -0.21),
                       handles=[top_bar, bot_bar])
        axes[i].spines['top'].set_visible(False)
        axes[i].spines['right'].set_visible(False)
        axes[i].set_xlabel(None)
        axes[i].set_ylabel('Percentage')
        axes[i].set_title(f'{col}', loc='left', weight='bold')  
        if i != 0:
            axes[i].legend().set_visible(False)
            axes[i].set_ylabel(None)
        for bar in axes[i].patches:
            if bar.get_height() != 100:
                axes[i].text(bar.get_x() + bar.get_width()/2,
                             bar.get_y() + bar.get_height()/2,
                             f'{int(bar.get_height())}%',
                             ha='center', va='center',
                             color='white', weight='bold')
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

def unstack_bar(img_title, df, col):
    fig, ax = plt.subplots(figsize=(6,5))
    ax = sns.countplot(data=df, x=col, hue='Churn Label', palette=cpal)
    ax.legend(loc='lower center', ncol=2,
              bbox_to_anchor=(0.5, -0.21),
              labels=['Churn = Yes', 'Churn = No'])
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.set_xlabel(None)
    ax.set_ylabel('Count')
    ax.set_title(f'{col}', loc='left', weight='bold')
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

def make_boxplot(img_title, df, cols):
    fig, axes = plt.subplots(1, 4, figsize=(18,5))
    for i, col in enumerate(cols):
        ax = sns.boxplot(data=df, x='Churn Label', y=col, palette=cpal, ax=axes[i])
        ax.set_xlabel('Churn')
        ax.set_ylabel(None)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.set_title(f'{col}', loc='left', weight='bold')
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

def make_histogram(img_title, df, col):
    fig, axes = plt.subplots(figsize=(6,5))
    ax = sns.histplot(x=col, data=df, hue='Churn Label', palette=cpal,
                      bins=6, multiple='stack')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.set_xlabel(None)
    ax.set_ylabel('Count')
    ax.set_title(f'{col}', loc='left', weight='bold')
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

In [None]:
visualize_labels('img-01', df, 'Churn Label')

The target of this prediction is whether or not a customer will churn, represented by the column `Churn Label`. The class labels in this column are imbalanced: most customers are labeled `No`, while the class this analysis cares most about predicting is `Yes`.

According to Jason Brownlee on [Machine Learning Mastery](https://machinelearningmastery.com/what-is-imbalanced-classification/):
> Imbalanced classifications pose a challenge for predictive modeling as most of the machine learning algorithms used for classification were designed around the assumption of an equal number of examples for each class. This results in models that have poor predictive performance, specifically for the minority class. This is a problem because typically, the minority class is more important and therefore the problem is more sensitive to classification errors for the minority class than the majority class.

In this notebook, class imbalance is addressed by resampling the training dataset. The [random oversampling](https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/) method is chosen, in which examples from the minority class are randomly duplicated.
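
To make "randomly duplicated" concrete, here is a hand-rolled sketch of the idea using only NumPy. It is an illustration on hypothetical toy labels (`X_toy`, `y_toy`), not a description of how `imblearn` implements `RandomOverSampler` internally:

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical imbalanced labels: 8 majority (0) and 2 minority (1) examples.
y_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # one dummy feature per row

minority_idx = np.flatnonzero(y_toy == 1)
majority_idx = np.flatnonzero(y_toy == 0)

# Draw minority rows with replacement until both classes are the same size.
n_extra = len(majority_idx) - len(minority_idx)
extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)

keep_idx = np.concatenate([majority_idx, minority_idx, extra_idx])
X_bal, y_bal = X_toy[keep_idx], y_toy[keep_idx]

print(np.bincount(y_toy))  # class counts before resampling
print(np.bincount(y_bal))  # class counts after resampling
```

No new information is created: every added row is an exact copy of an existing minority example.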

In [None]:
categorical_cols = [  # List columns of categorical data type
    'Gender',
    'Under 30',
    'Senior Citizen',
    'Married',
    'Dependents',
    'Offer',
    'Multiple Lines',
    'Internet Type',
    'Online Security',
    'Online Backup',
    'Device Protection Plan',
    'Premium Tech Support',
    'Streaming TV',
    'Streaming Movies',
    'Unlimited Data',
    'Contract',
    'Paperless Billing',
    'Payment Method']
print('PROPORTION OF CATEGORIES ACROSS VARIABLES')
for col in categorical_cols:
    freq = df[col].value_counts(normalize=True).reset_index()
    freq.columns = [f'{col}', 'Proportion']
    print('-'*40+'\n', freq)

In [None]:
stack_bar('img-02', df, ['Gender', 'Under 30', 'Senior Citizen'])

- There is an almost equal proportion of female (49.5%) and male (50.5%) customers. Both genders have a similar churn rate of 26%, suggesting that gender has little to no effect on leaving the telco service.
- A large portion of customers (80%) are under the age of 30. These younger customers have a churn rate 6 percentage points lower than customers aged 30 and over.
- Senior citizens represent only 16% of total customers but have a relatively high churn rate of 41%, almost twice that of non-senior customers.
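
Per-category churn rates like the ones quoted above can also be read from a normalized cross-tabulation. A minimal sketch on hypothetical toy data (the `toy` frame and its values are invented; the notebook itself works on the merged `df`):

```python
import pandas as pd

# Hypothetical toy data standing in for two columns of the merged frame.
toy = pd.DataFrame({
    'Senior Citizen': ['Yes', 'Yes', 'No', 'No', 'No', 'No'],
    'Churn Label':    ['Yes', 'No',  'No', 'No', 'Yes', 'No']})

# normalize='index' makes each row sum to 1, giving a per-category churn share.
rates = pd.crosstab(toy['Senior Citizen'], toy['Churn Label'],
                    normalize='index') * 100
print(rates.round(1))
```

The `Yes` column of `rates` is the churn rate in percent for each category of the row variable.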

In [None]:
stack_bar('img-03', df, ['Married', 'Dependents', 'Offer'])

- About 52% of customers are not married. Their churn rate is 13 percentage points higher than that of married customers.
- Having dependents (e.g., children) reduces the likelihood of churn. Only 6% of customers with dependents have churned, a rate about 5 times lower than among customers without.
- More than 50% of customers who subscribed to *Offer E* have churned in the past quarter, a rate almost twice as high as among customers who did not subscribe to any offer.

In [None]:
stack_bar('img-04', df, ['Multiple Lines', 'Internet Type', 'Online Security'])

- Customers with multiple phone lines have a slightly higher churn rate than customers without them, a difference of about 3 percentage points.
- Fiber optic is the most popular of the three internet service types available; 43% of customers have it installed. However, fiber optic customers also have the highest churn rate (40%).
- Of customers who subscribe to the online security service (29% of all customers), 14% have churned. Their churn rate is half that of customers who do not subscribe.

In [None]:
stack_bar('img-05', df, ['Online Backup', 'Device Protection Plan', 'Premium Tech Support'])

- Customers with an online backup or device protection subscription have a churn rate 6-7 percentage points lower than customers without these services.
- About 31% of customers who do not subscribe to premium tech support have churned, a rate twice as high as among customers with support.

In [None]:
stack_bar('img-06', df, ['Streaming TV', 'Streaming Movies', 'Unlimited Data'])

- Thirty-nine percent of customers have a streaming service, either TV or movies. Their churn rate is 5-6 percentage points higher than that of customers who do not subscribe to any streaming service.
- Thirty-one percent of customers who have unlimited data included in their subscription plan have churned, twice the churn rate of customers on a limited data plan.

In [None]:
stack_bar('img-07', df, ['Contract', 'Paperless Billing', 'Payment Method'])

- Half of the telco customers (51%) opt for the Month-to-Month subscription plan, and they are more likely to churn than customers on a One-Year or Two-Year plan. Customers with longer subscription plans have a churn rate more than 30 percentage points lower.
- Paperless billing, preferred by 60% of customers, is associated with a churn rate twice as high as the other billing option.
- Customers who pay by mailed check, the least popular option (only 5% of all customers), have a higher churn rate than customers who chose other methods. The most popular payment method is bank withdrawal (55% of customers).

In [None]:
unstack_bar('img-08', df, 'Satisfaction Score')

The chart above shows the churn distribution by customer satisfaction score, from 1 (Very Unsatisfied) to 5 (Very Satisfied). All customers who gave a score of 1 or 2 have churned. *Unhappy customers stop doing business with you*.

In [None]:
numerical_cols = [  # List columns of numerical data type
    'CLTV',
    'Tenure in Months',
    'Avg Monthly Long Distance Charges',
    'Avg Monthly GB Download',
    'Monthly Charge']

med_churn_y = []
med_churn_n = []
for col in numerical_cols:
    med_churn_y.append(df[df['Churn Label']=='Yes'][col].median())
    med_churn_n.append(df[df['Churn Label']=='No'][col].median())

medians = pd.DataFrame(
    index=numerical_cols,
    data={
        'Median_Churn_Yes': med_churn_y,
        'Median_Churn_No': med_churn_n
    })
medians

In [None]:
make_boxplot('img-09', df, ['CLTV', 'Avg Monthly Long Distance Charges', 'Avg Monthly GB Download', 'Monthly Charge'])

- Customers with low CLTV (customer lifetime value) are more likely to churn than ones with high CLTV.
- Looking at the distribution of data, long distance charges have negligible effect on churn.
- Customers who have churned have slightly higher monthly download volumes and pay higher monthly charges.

In [None]:
make_histogram('img-10', df, 'Tenure in Months')

New customers (i.e., tenure under 12 months) account for the most churn cases. Conversely, longer tenures correspond to lower proportions of churn, especially among very loyal customers (i.e., tenure over 60 months).
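
Because a stacked histogram shows counts, the proportion of churn per tenure range is easier to read from binned data. The sketch below uses hypothetical toy values, and the bin edges (12, 60, 72 months) are an assumption chosen to match the ranges mentioned above:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame({
    'Tenure in Months': [2, 5, 11, 20, 30, 45, 61, 70],
    'Churn Label': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No', 'No']})

# Assumed bin edges; pd.cut uses right-closed intervals by default.
bins = [0, 12, 60, 72]
labels = ['0-12', '13-60', '61-72']
toy['Tenure bin'] = pd.cut(toy['Tenure in Months'], bins=bins, labels=labels)

# The mean of a boolean Series is the share of True values, here the churn share.
churn_share = (toy['Churn Label'].eq('Yes')
               .groupby(toy['Tenure bin'], observed=True)
               .mean() * 100)
print(churn_share.round(1))
```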

# Preprocessing

In [None]:
df.head()

Most columns in the data frame contain categorical data, but many machine learning models require numeric inputs. [Label mapping and one-hot encoding](https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/) are used to convert these categories into numerical form (0 or 1): two-level columns are mapped directly to 0/1, and multi-level columns are expanded into one indicator column per category.
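
As a small illustration of the two encodings, the sketch below applies them to a hypothetical three-row frame (`toy` and its values are invented). `dtype=int` is passed only so the indicators print as 0/1 regardless of the pandas version:

```python
import pandas as pd

# Hypothetical toy frame with one two-level and one multi-level column.
toy = pd.DataFrame({
    'Married': ['Yes', 'No', 'Yes'],
    'Contract': ['Month-to-Month', 'One Year', 'Two Year']})

# Two-level column: map each label to 0 or 1.
toy['Married'] = toy['Married'].map({'No': 0, 'Yes': 1})

# Multi-level column: one indicator column per category.
toy = pd.get_dummies(toy, columns=['Contract'], dtype=int)
print(toy)
```

One-hot encoding avoids implying an order between categories, at the cost of adding one column per category.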

The churn data contains numeric features on a mixture of scales (e.g., dollars for `Monthly Charge`, gigabytes for `Avg Monthly GB Download`), so another important preprocessing step is rescaling. Min-max normalization is applied to rescale numeric features into the range 0 to 1.
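
Min-max normalization maps each value to `(x - min) / (max - min)`, where the minimum and maximum are taken from the training data only. A minimal NumPy sketch with made-up numbers (`train` and `test` are hypothetical):

```python
import numpy as np

# Hypothetical monthly charges; values are invented for illustration.
train = np.array([20.0, 50.0, 80.0, 110.0])
test = np.array([35.0, 125.0])

# Learn the range from the training data only.
lo, hi = train.min(), train.max()

train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)  # reuse the training range

print(train_scaled)
print(test_scaled)
```

The second test value lies above the training maximum, so it scales to more than 1. This is expected when the range is learned on the training set and then applied to unseen data.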

A train/test split is defined where 20% of the data will be reserved for testing. Lastly, oversampling is performed on the training data to address class imbalance.

In [None]:
binary_cols = [col for col in df.columns if df[col].nunique() == 2]
encode_dict = {  # Encoding dictionary
    'Female':0, 'Male':1, 'No':0, 'Yes':1}
for col in binary_cols:
    df[col] = df[col].map(encode_dict)

dummy_cols = [  # Columns to one-hot encode
    'Satisfaction Score',
    'Offer',
    'Internet Type',
    'Contract',
    'Payment Method']
df = pd.get_dummies(df, columns=dummy_cols)

# Remove unnecessary column
df.drop('Count', axis=1, inplace=True)

X = df.drop('Churn Label', axis=1)  # Select features
y = df['Churn Label']  # Target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)  # Split data 80/20

scaler = MinMaxScaler()  # Normalize train & test features
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

over = RandomOverSampler(random_state=1)  # Oversample training data
X_train_res, y_train_res = over.fit_resample(X_train, y_train.to_numpy())

In [None]:
print('-'*45+'\nCLASS PROPORTION'+'\n'+'-'*45,
      f'\nBefore resampling: {Counter(y_train)}',
      f'\nAfter resampling : {Counter(y_train_res)}')

In [None]:
print(f'SAMPLE FEATURES:\n{X_train_res[0]}',
      f'\n\nSAMPLE TARGETS:\n{y_train_res[:30]}')

# Modeling

Predicting whether or not a customer will churn is a binary classification problem, since there are only two possible outcomes: `Yes (1)` or `No (0)`. Three classification algorithms are used for the prediction, namely [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html), [Decision Tree](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html), and [K-Nearest Neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html). The models are fit on the training set and subsequently used to make predictions on unseen data. Model performance is evaluated with [metrics](https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9) such as accuracy, precision, recall, and F1-score, as well as with a confusion matrix and the [ROC curve](https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5).
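
For reference, the four metrics can be computed by hand from the confusion-matrix counts. The sketch below uses hypothetical labels (`y_true`, `y_pred` are invented) and plain Python:

```python
# Hypothetical labels: 1 = churn, 0 = stay.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # churners correctly flagged
fp = sum(t == 0 and p == 1 for t, p in pairs)  # stayers wrongly flagged
fn = sum(t == 1 and p == 0 for t, p in pairs)  # churners missed
tn = sum(t == 0 and p == 0 for t, p in pairs)  # stayers correctly passed

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of those flagged as churn, how many churned
recall = tp / (tp + fn)      # of those who churned, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(accuracy, precision, recall, f1)
```

Recall is the metric most sensitive to missed churners, while precision penalizes false alarms.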

In [None]:
# Define instances of classifier
logit = LogisticRegression(random_state=1)
dtree = DecisionTreeClassifier(random_state=1)
neigh = KNeighborsClassifier()

for clf in [logit, dtree, neigh]:  # Display current parameters
    print('-'*40+f'\n{clf.__class__.__name__} parameters\n'+'-'*40)
    display(clf.get_params())

In [None]:
# Learning: Fit models on training data
logit.fit(X_train_res, y_train_res)
dtree.fit(X_train_res, y_train_res)
neigh.fit(X_train_res, y_train_res)

# Make predictions on testing data
logit_pred = logit.predict(X_test)
dtree_pred = dtree.predict(X_test)
neigh_pred = neigh.predict(X_test)

# Model evaluation

In [None]:
def print_reports(classifiers, predictions, y_test):
    reports = []
    for clf, pred in zip(classifiers, predictions):
        print('-'*55, f'\n{clf.__class__.__name__}', '\n'+'-'*55)
        print(classification_report(y_test, pred, digits=4))
        reports.append(
            classification_report(
                y_test, pred, output_dict=True))
    return reports

In [None]:
reports = print_reports(  # Display classification metrics
    [logit, dtree, neigh],
    [logit_pred, dtree_pred, neigh_pred],
    y_test)

# Hyperparameter tuning

Earlier, the models were trained with default hyperparameters. Better model performance may be achieved by [optimizing](https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/) these hyperparameters. One approach is [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html). Given a set of candidate values for each hyperparameter, GridSearchCV fits a model for every combination of these values and evaluates it using cross-validation (hence the 'CV'). The combination that produced the best score can be accessed from the search result.
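
To get a feel for the cost of an exhaustive search, the number of candidate combinations can be counted with `ParameterGrid`. The sketch below mirrors the grids used for the three models in this notebook; the `grids` and `fit_counts` names are illustrative only:

```python
from sklearn.model_selection import ParameterGrid

# Grids mirroring the ones used for this notebook's three models.
grids = {
    'LogisticRegression': {'C': [.1, 1, 10],
                           'penalty': ['l1', 'l2'],
                           'solver': ['liblinear']},
    'DecisionTreeClassifier': {'criterion': ['gini', 'entropy'],
                               'max_depth': [5, 7, 8, 9, 10]},
    'KNeighborsClassifier': {'n_neighbors': [3, 4, 5, 6, 7],
                             'p': [1, 2]},
}
cv = 5  # number of cross-validation folds

fit_counts = {}
for name, grid in grids.items():
    n_candidates = len(ParameterGrid(grid))  # product of the list lengths
    fit_counts[name] = n_candidates * cv     # one fit per candidate per fold
    print(f'{name}: {n_candidates} candidates, {fit_counts[name]} fits')
```

With the default `refit=True`, GridSearchCV performs one additional fit of the best candidate on the full training data.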

In [None]:
# Specify grids of hyperparameters to try
logit_grid = {'C': [.1, 1, 10],
              'penalty': ['l1', 'l2'],
              'solver': ['liblinear']}
dtree_grid = {'criterion': ['gini', 'entropy'],
              'max_depth': [5, 7, 8, 9, 10]}
neigh_grid = {'n_neighbors': [3, 4, 5, 6, 7],
              'p': [1, 2]}

# Create instances of GridSearchCV object
gsc_logit = GridSearchCV(logit, logit_grid, scoring='recall', cv=5)
gsc_dtree = GridSearchCV(dtree, dtree_grid, scoring='recall', cv=5)
gsc_neigh = GridSearchCV(neigh, neigh_grid, scoring='recall', cv=5)

In [None]:
gsc_logit.fit(X_train_res, y_train_res)

In [None]:
gsc_dtree.fit(X_train_res, y_train_res)

In [None]:
gsc_neigh.fit(X_train_res, y_train_res)

In [None]:
for clf, gsc in zip([logit, dtree, neigh], [gsc_logit, gsc_dtree, gsc_neigh]):
    print('-'*70, f'\n{clf.__class__.__name__}', '\n'+'-'*70)
    print(f'Best parameters: {gsc.best_params_}')
    print(f'Best recall    : {gsc.best_score_*100:.2f}%\n')

In [None]:
# Define instances of classifier with tuned parameters
logit2 = LogisticRegression(random_state=1, C=10, penalty='l1', solver='liblinear')
dtree2 = DecisionTreeClassifier(random_state=1, criterion='entropy', max_depth=10)
neigh2 = KNeighborsClassifier(n_neighbors=4, p=1)

# Learning: Fit models on training data
logit2.fit(X_train_res, y_train_res)
dtree2.fit(X_train_res, y_train_res)
neigh2.fit(X_train_res, y_train_res)

# Make predictions on testing data
logit2_pred = logit2.predict(X_test)
dtree2_pred = dtree2.predict(X_test)
neigh2_pred = neigh2.predict(X_test)

# Evaluating tuned models

In [None]:
reports2 = print_reports(  # Classification report
    [logit2, dtree2, neigh2],  # After tuning
    [logit2_pred, dtree2_pred, neigh2_pred],
    y_test)

In [None]:
def compare_metrics(reports):
    metrics_logit = []
    metrics_dtree = []
    metrics_neigh = []
    metrics_data = [metrics_logit, metrics_dtree, metrics_neigh]
    for i, metric in enumerate(metrics_data):
        metric.append(round(reports[i]['accuracy']*100, 2))
        metric.append(round(reports[i]['1']['precision']*100, 2))
        metric.append(round(reports[i]['1']['recall']*100, 2))
        metric.append(round(reports[i]['1']['f1-score']*100, 2))
    metrics_cols = ['%Accuracy', '%Precision', '%Recall', '%F1-Score']
    metrics_idx = ['Logistic Regression', 'Decision Tree', 'KNN']
    metrics_df = pd.DataFrame(metrics_data, index=metrics_idx, columns=metrics_cols)
    return metrics_df

In [None]:
# Show metrics of Class 1 (Churn = Yes) prediction
metrics_df = pd.concat([compare_metrics(reports),
    compare_metrics(reports2).rename(index= lambda s: s+' Tuned')])
metrics_df.sort_values(by=['%Accuracy'], ascending=False, inplace=True)
metrics_df.style.background_gradient(cmap='Blues').format("{:.1f}")

- Although it shows no improvement after tuning, the logistic regression model still performs best in terms of overall accuracy and recall for predicting class 1 (Churn = Yes).
- While tuning improves the recall score of the decision tree, it results in poorer precision. This is reflected in the confusion matrices below: false negative (FN) cases decrease, but false positives (FP) increase. The area under the ROC curve is also affected: the tuned model yields a larger AUC score.
- For the K-nearest neighbors model, the reverse is true: better precision (fewer FP), lower recall (more FN), and a smaller AUC.

In [None]:
def plot_confusion(img_title, classifiers, X_test, y_test):
    fig, axes = plt.subplots(2, 3, figsize=(16,12))
    for i, (ax, clf) in enumerate(zip(axes.flatten(), classifiers)):
        plot_confusion_matrix(
            clf, X_test, y_test, ax=ax,
            display_labels=['Stay', 'Churn'],
            cmap='Purples', colorbar=False)
        if i in [0, 1, 2]:
            ax.set_title(clf.__class__.__name__)
        else:
            ax.set_title(f'{clf.__class__.__name__}_Tuned')
        if i not in [0, 3]:
            ax.set_ylabel(None)
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

def plot_rocs(img_title, classifiers, classifiers2, X_test, y_test):
    fig, axes = plt.subplots(1, 2, figsize=(16,5))
    for i, ax in enumerate(axes.flatten()):
        if i == 0:
            for clf in classifiers:
                plot_roc_curve(clf, X_test, y_test, ax=ax, lw=2)
            ax.set_title('ROC curve', loc='left', weight='bold')
        else:
            for clf in classifiers2:
                plot_roc_curve(clf, X_test, y_test, ax=ax, lw=2)
            ax.set_title('ROC curve - Tuned hyperparameters',
                         loc='left', weight='bold')
        ax.plot([0, 1], [0, 1], linestyle='--', lw=2)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
    fig.savefig(f'{img_title}.png', dpi=300, bbox_inches='tight')

In [None]:
classifiers = [logit, dtree, neigh]  # Before tuning
classifiers2 = [logit2, dtree2, neigh2]  # After tuning

plot_confusion(  # Show confusion matrices
    'img-11', classifiers+classifiers2, X_test, y_test)

In [None]:
plot_rocs(  # Show ROC curves and AUC scores
    'img-12', classifiers, classifiers2, X_test, y_test)

# Feature importance

The models have performed fairly well in accurately predicting customer churn. However, it is equally important that one also understands which features are most relevant in that prediction. [Feature importance](https://machinelearningmastery.com/calculate-feature-importance-with-python/) is a technique in which input features are assigned a score based on how useful they are at predicting a target variable. It helps us better understand the dataset and can be used to improve the predictive model.

For the logistic regression model, feature importance scores can be retrieved from the `coef_` attribute. Positive scores indicate a feature that predicts class 1 (Churn = Yes), whereas negative scores indicate a feature that predicts class 0 (Churn = No). Because all features were rescaled to the 0-1 range, the magnitudes of the coefficients can be compared with one another.
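
The sign convention can be checked on synthetic data where the true relationship is known. In this sketch the data, the labeling rule, and the feature names `raises_churn` and `lowers_churn` are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Two synthetic features already on a 0-1 scale.
X_toy = rng.uniform(0, 1, size=(400, 2))

# Hypothetical rule: class 1 whenever feature 0 exceeds feature 1, so
# feature 0 pushes towards class 1 and feature 1 pushes towards class 0.
y_toy = (X_toy[:, 0] > X_toy[:, 1]).astype(int)

clf = LogisticRegression().fit(X_toy, y_toy)

# coef_ has shape (1, n_features) for a binary problem.
coef = dict(zip(['raises_churn', 'lowers_churn'], clf.coef_[0]))
print(coef)
```

The fitted coefficient is positive for the feature that pushes towards class 1 and negative for the one that pushes towards class 0.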

In [None]:
logit2_coef = pd.DataFrame({
    'feature': list(X.columns),
    'coefficient': logit2.coef_[0]
}).sort_values('coefficient', ascending=False)
logit2_coef

According to the coefficients, the following factors are associated with a higher likelihood of churn in the past quarter:
- expressing low satisfaction (esp. score 1 and 2),
- opting for monthly contract,
- paying higher monthly charge,
- purchasing offer A and E,
- being a senior citizen,
- not subscribing to online security service,
- not having dependents, and
- having recently joined (low tenure length).