<h3 style='color:blue' align='center'>Customer Churn Prediction Using Artificial Neural Network (ANN)</h3>

Customer churn prediction aims to identify which customers are likely to leave a business and why. In this tutorial we look at customer churn in a telecom business. We will build a deep learning model to predict churn and use precision, recall and f1-score to measure the performance of our model.

In [3]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
%matplotlib inline

**Load the data**

In [4]:
df = pd.read_csv("customer_churn.csv")
df.sample(5)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
5361,2495-TTHBQ,Female,0,No,Yes,4,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,20.4,84.75,No
3483,4465-VDKIQ,Female,0,No,No,18,Yes,No,Fiber optic,No,...,Yes,No,No,No,Month-to-month,No,Electronic check,77.8,1358.6,No
4376,0853-NWIFK,Female,0,No,No,45,Yes,No,Fiber optic,Yes,...,No,No,Yes,Yes,One year,Yes,Electronic check,100.3,4483.95,No
5876,4844-JJWUY,Female,1,No,No,1,Yes,No,Fiber optic,No,...,No,No,No,Yes,Month-to-month,Yes,Electronic check,86.0,86.0,Yes
67,3410-YOQBQ,Female,0,No,No,31,Yes,No,DSL,No,...,Yes,Yes,Yes,Yes,Two year,No,Mailed check,79.2,2497.2,No


**First of all, drop the customerID column, as an identifier carries no predictive information**

In [5]:
df.drop('customerID',axis='columns',inplace=True)

In [6]:
df.dtypes

gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

**A quick glance at the output above shows that TotalCharges should be float but is an object. Let's check what's going on with this column**

In [7]:
df.TotalCharges.values

array(['29.85', '1889.5', '108.15', ..., '346.45', '306.6', '6844.5'],
      dtype=object)

**Ahh... the values are strings. Let's convert them to numbers**

In [8]:
pd.to_numeric(df.TotalCharges)

ValueError: Unable to parse string " " at position 488

**Hmmm... some values are not numbers but blank strings. Let's find those rows**

In [None]:
pd.to_numeric(df.TotalCharges,errors='coerce').isnull()

In [None]:
df[pd.to_numeric(df.TotalCharges,errors='coerce').isnull()]

In [None]:
df.shape

In [None]:
df.iloc[488].TotalCharges

In [None]:
df[df.TotalCharges!=' '].shape

**Remove rows with space in TotalCharges**

In [None]:
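# .copy() gives an independent DataFrame, so later column assignments do not trigger SettingWithCopyWarning
df1 = df[df.TotalCharges!=' '].copy()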
df1.shape

In [None]:
df1.dtypes

In [None]:
df1.TotalCharges = pd.to_numeric(df1.TotalCharges)

In [None]:
df1.TotalCharges.values
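
The filtering and conversion steps above can also be done in one pass. The following is a minimal sketch on a made-up toy frame (the `toy` and `clean` names and the sample values are assumptions for illustration, not part of the dataset): `errors='coerce'` turns unparseable strings into NaN, and `dropna` then removes those rows.

```python
import pandas as pd

# Toy frame standing in for the churn data; the values are made up for illustration
toy = pd.DataFrame({
    "tenure": [1, 0, 34],
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# errors='coerce' converts strings that cannot be parsed (such as " ") to NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Drop the rows where the conversion produced NaN
clean = toy.dropna(subset=["TotalCharges"])

print(clean.dtypes)
print(clean.shape)
```

Whether to drop or impute such rows is a judgment call; this tutorial drops them.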

In [None]:
df1[df1.Churn=='No']

**Data Visualization**

In [None]:
tenure_churn_no = df1[df1.Churn=='No'].tenure
tenure_churn_yes = df1[df1.Churn=='Yes'].tenure

plt.xlabel("tenure")
plt.ylabel("Number Of Customers")
plt.title("Customer Churn Prediction Visualization")

plt.hist([tenure_churn_yes, tenure_churn_no], rwidth=0.95, color=['green','red'],label=['Churn=Yes','Churn=No'])
plt.legend()

In [None]:
mc_churn_no = df1[df1.Churn=='No'].MonthlyCharges      
mc_churn_yes = df1[df1.Churn=='Yes'].MonthlyCharges      

plt.xlabel("Monthly Charges")
plt.ylabel("Number Of Customers")
plt.title("Customer Churn Prediction Visualization")

plt.hist([mc_churn_yes, mc_churn_no], rwidth=0.95, color=['green','red'],label=['Churn=Yes','Churn=No'])
plt.legend()

**Many of the columns hold values such as Yes and No. Let's print the unique values in the object columns to see what they contain**

In [None]:
def print_unique_col_values(df):
    for column in df:
        if df[column].dtypes=='object':
            print(f'{column}: {df[column].unique()}')

In [None]:
print_unique_col_values(df1)

**Some of the columns contain 'No internet service' or 'No phone service', which can be replaced with a simple No**

In [None]:
df1.replace('No internet service','No',inplace=True)
df1.replace('No phone service','No',inplace=True)

In [None]:
print_unique_col_values(df1)

**Convert Yes and No to 1 and 0**

In [9]:
yes_no_columns = ['Partner','Dependents','PhoneService','MultipleLines','OnlineSecurity','OnlineBackup',
                  'DeviceProtection','TechSupport','StreamingTV','StreamingMovies','PaperlessBilling','Churn']
for col in yes_no_columns:
    # Assign the result back; replace(inplace=True) on a selected column may not update df1
    df1[col] = df1[col].replace({'Yes': 1,'No': 0})

In [None]:
for col in df1:
    print(f'{col}: {df1[col].unique()}') 

In [None]:
df1['gender'] = df1['gender'].replace({'Female':1,'Male':0})

In [None]:
df1.gender.unique()

**One hot encoding for categorical columns**

In [10]:
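# dtype=int keeps the dummy columns numeric (0/1); newer pandas versions default to boolean
df2 = pd.get_dummies(data=df1, columns=['InternetService','Contract','PaymentMethod'], dtype=int)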
df2.columns

In [None]:
df2.sample(5)

In [None]:
df2.dtypes
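
To see what one hot encoding does to a single column, here is a small sketch on made-up rows (the `toy` and `encoded` names are assumptions for illustration). Each category becomes its own 0/1 column named `<column>_<category>`, and every row has exactly one of them set.

```python
import pandas as pd

# Toy data; the column mirrors 'Contract' from the tutorial, the rows are made up
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "One year"]})

# Replace 'Contract' with one indicator column per category
encoded = pd.get_dummies(data=toy, columns=["Contract"], dtype=int)

print(encoded.columns.tolist())
# Each row sums to 1 because a customer has exactly one contract type
print(encoded.sum(axis="columns").tolist())
```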

In [None]:
cols_to_scale = ['tenure','MonthlyCharges','TotalCharges']

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df2[cols_to_scale] = scaler.fit_transform(df2[cols_to_scale])
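
With its default feature range of (0, 1), `MinMaxScaler` rescales each column independently using (x - min) / (max - min). The sketch below reproduces that formula by hand with NumPy on made-up tenure values (the `tenure` and `scaled` names are assumptions for illustration), so the effect on a single column is easy to inspect.

```python
import numpy as np

# Made-up tenure values in months, for illustration only
tenure = np.array([1.0, 18.0, 45.0, 72.0])

# Min-max scaling: the smallest value maps to 0, the largest to 1
scaled = (tenure - tenure.min()) / (tenure.max() - tenure.min())

print(scaled)
```

Bringing tenure, MonthlyCharges and TotalCharges onto the same range as the 0/1 columns keeps any one feature from dominating the others purely because of its units.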

In [11]:
for col in df2:
    print(f'{col}: {df2[col].unique()}')

**Train test split**

In [None]:
X = df2.drop('Churn',axis='columns')
y = df2['Churn']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=5)

In [None]:
X_train.shape

In [None]:
X_test.shape

In [12]:
X_train[:10]

In [None]:
len(X_train.columns)

**Build a model (ANN) in tensorflow/keras**

In [None]:
import tensorflow as tf
from tensorflow import keras


model = keras.Sequential([
    keras.layers.Dense(26, input_shape=(26,), activation='relu'),
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

# opt = keras.optimizers.Adam(learning_rate=0.01)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)

In [None]:
model.evaluate(X_test, y_test)

In [13]:
yp = model.predict(X_test)
yp[:5]

In [None]:
y_pred = []
for element in yp:
    if element > 0.5:
        y_pred.append(1)
    else:
        y_pred.append(0)
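
The loop above can be written as a single vectorized expression. This is a minimal sketch using made-up sigmoid outputs (`yp_demo` and `y_pred_demo` are hypothetical names, and the array only imitates the `(n_samples, 1)` shape that `model.predict` returns for this model).

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1)
yp_demo = np.array([[0.12], [0.73], [0.50], [0.91]])

# Compare against the 0.5 threshold, convert booleans to 0/1, flatten to 1-D
y_pred_demo = (yp_demo > 0.5).astype(int).ravel()

print(y_pred_demo)
```

As in the loop, a probability of exactly 0.5 is assigned to class 0 because the comparison is strict.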

In [None]:
y_pred[:10]

In [None]:
y_test[:10]

In [14]:
from sklearn.metrics import confusion_matrix, classification_report

print(classification_report(y_test,y_pred))

In [None]:
import seaborn as sn
cm = tf.math.confusion_matrix(labels=y_test,predictions=y_pred)

plt.figure(figsize = (10,7))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')

In [None]:
y_test.shape

**Accuracy**

In [None]:
round((862+229)/(862+229+137+179),2)

**Precision for 0 class, i.e. precision for customers who did not churn**

In [None]:
round(862/(862+179),2)

**Precision for 1 class, i.e. precision for customers who actually churned**

In [None]:
round(229/(229+137),2)

**Recall for 0 class**

In [None]:
round(862/(862+137),2)

**Recall for 1 class**

In [None]:
round(229/(229+179),2)
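
The hand calculations above can be gathered into one helper that also gives the f1-score, which is the harmonic mean of precision and recall. This is a sketch that reuses the four counts from the cells above; the `prf` helper and the variable names are assumptions for illustration.

```python
# Counts taken from the calculations above (rows = truth, columns = predicted)
tn, fp, fn, tp = 862, 137, 179, 229

def prf(true_pos, false_pos, false_neg):
    """Return precision, recall and f1-score for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Class 1 (churned): the usual definitions
p1, r1, f1_1 = prf(tp, fp, fn)

# Class 0 (did not churn): the roles swap, so tn plays the part of the
# true positives, fn of the false positives and fp of the false negatives
p0, r0, f1_0 = prf(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

These values should agree with the corresponding rows of the classification report printed earlier.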

**Exercise**

Take this dataset for bank customer churn prediction: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

1) Build a deep learning model to predict the churn rate at the bank.

2) Once the model is built, print the classification report and analyze precision, recall and f1-score.