# Telecom Churn Case Study

In this notebook, we work on a telecom company case study: using past customer information, we need to build a model that predicts whether a particular customer will switch to a different service provider (churn) or not. The variable of interest is `Churn`, which tells us whether a particular customer has churned. It is a binary variable: 1 means the customer has churned and 0 means the customer has not.

The company also needs to know which factors (variables) influence `Churn` and how much each one contributes individually. This will help the company improve those areas and retain its customers. The company needs a decent model that correctly predicts a good percentage of both churn and non-churn customers.

We will build a Logistic Regression model for this problem because it is easy to interpret, which helps the company make better decisions.

We will perform the following steps in this notebook:

- **Step 1**: Importing Libraries
- **Step 2**: Exploring Data Frame
- **Step 3**: Data Preparation
- **Step 4**: Splitting the Dataset
- **Step 5**: Feature Scaling
- **Step 6**: Model Building
- **Step 7**: Model Evaluation
- **Step 8**: Model Validation
- **Step 9**: Model Interpretation
- **Step 10**: Conclusion

# Step 1: Importing Libraries

In [None]:
#Importing all libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

#building model
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

#model evaluation
from sklearn.feature_selection import RFE
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn import metrics
from sklearn.metrics import precision_recall_curve

#model validation
from sklearn.model_selection import train_test_split

#visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Suppressing Warnings
import warnings
warnings.filterwarnings('ignore')

# Data display customization
pd.set_option('display.max_columns', 100)

# Step 2: Exploring Data Frame

In [None]:
telecom = pd.read_csv("/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")

In [None]:
# All the data is in one data frame; let's look at its first five rows
telecom.head()

In [None]:
#printing shape of the dataset
r,c = telecom.shape
print(f"Shape of telecom dataset: {telecom.shape}")
print(f"Number of rows: {r}")
print(f"Number of columns: {c}")

This means we have **21 features** about a customer including target variable `Churn` and we have details for **7043 customers**.

A brief description of each feature (column) is given below:
1. `customerID`: The unique ID of each customer
2. `tenure`: Number of months the customer has been using the service
3. `PhoneService`: Whether a customer has phone service or not (Yes, No)
4. `Contract`: The contract term of the customer (Month-to-month, One year, Two year)
5. `PaperlessBilling`: Whether a customer has opted for paperless billing (Yes, No)
6. `PaymentMethod`: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
7. `MonthlyCharges`: Specifies the money paid by a customer each month
8. `TotalCharges`: The total money paid by the customer to the company
9. `Churn`: This is the target variable which specifies if a customer has churned or not (Yes, No)
10. `gender`: The gender of a person (Male, Female)
11. `SeniorCitizen`: Whether a customer can be classified as a senior citizen (1=Yes, 0=No)
12. `Partner`: Whether the customer has a partner or not (Yes, No)
13. `Dependents`: Whether the customer has dependents (children/retired parents) or not (Yes, No)
14. `MultipleLines`: Whether the customer has multiple lines or not (Yes, No, No phone service)
15. `InternetService`: Customer’s internet service provider (DSL, Fiber optic, No)
16. `OnlineSecurity`: Whether the customer has online security or not (Yes, No, No internet service)
17. `OnlineBackup`: Whether the customer has online backup or not (Yes, No, No internet service)
18. `DeviceProtection`: Whether the customer has device protection or not (Yes, No, No internet service)
19. `TechSupport`: Whether the customer has tech support or not (Yes, No, No internet service)
20. `StreamingTV`: Whether the customer has streaming TV or not (Yes, No, No internet service)
21. `StreamingMovies`: Whether the customer has streaming movies or not (Yes, No, No internet service)

In [None]:
# Let's look at some summary statistics of the dataframe
telecom.describe()

In [None]:
# Let's look at the data type of each feature
telecom.info()

#### Variables are of different types, which are categorized below

- **Categorical**:
    - **Binary (7)**: `SeniorCitizen`, `gender`, `Partner`, `Dependents`, `PhoneService`, `PaperlessBilling`, and `Churn`
    
    - **Multinomial (11)**: `customerID` (a unique identifier, dropped before modelling), `MultipleLines`, `InternetService`, `OnlineSecurity`, `OnlineBackup`, `DeviceProtection`, `TechSupport`, `StreamingTV`, `StreamingMovies`, `Contract`, `PaymentMethod`
    
    
- **Continuous (3)**: `TotalCharges`, `MonthlyCharges` and `tenure`

# Step 3: Data Preparation

#### Converting Binary Variables (Yes/ No) into (1/ 0)

In [None]:
# Defining method to convert them
def binary_map(x):
    return x.map({'Yes': 1, "No": 0})

bin_var =  ['PhoneService', 'PaperlessBilling', 'Churn', 'Partner', 'Dependents']

# Applying the method to the data frame
telecom[bin_var] = telecom[bin_var].apply(binary_map)

#### Converting Binary Variable gender (Male/ Female) into (1/ 0)

In [None]:
# Creating a dummy for gender and dropping the first level, because a single column captures the whole variable
# dtype=int keeps the dummy as 0/1 integers instead of booleans
gender = pd.get_dummies(telecom['gender'], drop_first=True, dtype=int)

# Merging the above results with telecom data frame 
telecom = pd.concat([telecom, gender], axis=1)

In [None]:
#printing first 5 rows of the data frame after converting Binary variables
telecom.head()

`Male` column represents the gender column now, 1 = Male and 0 = Female

### Converting Multinomial Variables by creating dummy variables

A dummy variable is a numeric variable that represents categorical data.

Technically, dummy variables are dichotomous, quantitative variables. Their range of values is small; they can take on only two quantitative values. As a practical matter, regression results are easiest to interpret when dummy variables are limited to two specific values, 1 or 0. Typically, 1 represents the presence of a qualitative attribute, and 0 represents the absence.


##### Avoid the Dummy Variable Trap
When defining dummy variables, a common mistake is to define too many variables. If a categorical variable can take on k values, it is tempting to define k dummy variables. Resist this urge. Remember, you only need k - 1 dummy variables.

A kth dummy variable is redundant; it carries no new information. And it creates a severe multicollinearity problem for the analysis.
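The redundancy of the kth dummy can be sketched in plain Python. The sample values below are made up to mirror the `InternetService` levels, and the helper `one_hot` is a hypothetical name used only for this illustration. With all k dummies, every row sums to 1, which duplicates the intercept column; with k - 1 dummies the dropped level can still be recovered.

```python
# Illustrative sample; the values mirror the InternetService levels
values = ["DSL", "Fiber optic", "No", "DSL", "Fiber optic"]
levels = sorted(set(values))  # k = 3 levels


def one_hot(value, kept_levels):
    """Return a list of 0/1 indicators, one per kept level."""
    return [1 if value == level else 0 for level in kept_levels]


# k dummies: each row sums to exactly 1, duplicating the intercept column
full = [one_hot(v, levels) for v in values]
row_sums = [sum(row) for row in full]

# k - 1 dummies: drop the first level ("DSL"), which becomes the baseline
kept = levels[1:]
reduced = [one_hot(v, kept) for v in values]

# The dropped level is still recoverable: it is the row where all kept dummies are 0
recovered = [levels[0] if sum(row) == 0 else kept[row.index(1)] for row in reduced]

print(row_sums)             # every entry is 1
print(recovered == values)  # True
```

This is why `drop_first=True` loses no information: the baseline level is implied whenever all remaining dummies are 0.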

In [None]:
# Creating dummy variables for some of the categorical variables and dropping the first level of each.
# dtype=int keeps the dummies as 0/1 integers instead of booleans
dummy1 = pd.get_dummies(telecom[['Contract', 'PaymentMethod', 'InternetService']], drop_first=True, dtype=int)

# Adding the results to the telecom dataframe
telecom = pd.concat([telecom, dummy1], axis=1)

In [None]:
telecom.head()

In [None]:
# Creating dummy variables for the remaining categorical variables and dropping
# the redundant 'No phone service' / 'No internet service' level of each one.
service_cols = ['MultipleLines', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies']

for var in service_cols:
    # dtype=int keeps the dummies as 0/1 integers instead of booleans
    dummies = pd.get_dummies(telecom[var], prefix=var, dtype=int)
    # Dropping the '<var>_No phone service' or '<var>_No internet service' column
    redundant = [c for c in dummies.columns if c.endswith('service')]
    dummies = dummies.drop(columns=redundant)
    # Adding the results to the telecom dataframe
    telecom = pd.concat([telecom, dummies], axis=1)

We dropped the `No phone service` and `No internet service` levels because that information is already captured by the `PhoneService` and `InternetService` columns, as we can see below:

In [None]:
telecom.InternetService.value_counts()

In [None]:
telecom.PhoneService.value_counts()

In [None]:
telecom.head()

#### Dropping the repeated Variables

In [None]:
# We have created dummies for the below variables, so we can drop them
telecom = telecom.drop(['Contract','PaymentMethod','gender','MultipleLines','InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
       'TechSupport', 'StreamingTV', 'StreamingMovies'], axis=1)

In [None]:
# The variable TotalCharges is of string data type, so we convert it to float type
telecom['TotalCharges'] = pd.to_numeric(telecom["TotalCharges"].replace(" ",""),downcast="float")

In [None]:
#checking data types of variables 
telecom.info()

We have transformed all the variables; the next step is to check for outliers in the dataset.

### Checking for Outliers

In [None]:
# Box plots for all three continuous variables

plt.figure(figsize=(15,3))
plt.subplot(1,3,1)
sns.boxplot(telecom[["tenure"]])
plt.title("Tenure",size=15)

plt.subplot(1,3,2)
sns.boxplot(telecom[["MonthlyCharges"]])
plt.title("MonthlyCharges",size=15)

plt.subplot(1,3,3)
sns.boxplot(telecom[["TotalCharges"]])
plt.title("TotalCharges",size=15)

From the above Box Plots we can see that there are no outliers.

### Checking for Missing Values

In [None]:
# Adding up the missing values (column-wise)
telecom.isnull().sum()

The `TotalCharges` column has 11 missing values out of 7043 rows, i.e. 11/7043 ≈ 0.0016, or about 0.16% of the data. It is best to remove these observations from the analysis.

In [None]:
# Removing NaN TotalCharges rows
telecom = telecom[~np.isnan(telecom['TotalCharges'])]

In [None]:
# Checking again for missing values (column-wise)
telecom.isnull().sum()

Now we don't have any missing values

### Checking the Correlation among Variables.
We are using Pearson's correlation to compute correlation matrix

In [None]:
# Correlation matrix (customerID is a string column, so it is left out)
plt.figure(figsize = (20,10))
sns.heatmap(round(telecom.drop(columns='customerID').corr(),1),annot = True)
plt.show()

In [None]:
#dropping the highly correlated variables

telecom.drop(['MultipleLines_No','OnlineSecurity_No','OnlineBackup_No','DeviceProtection_No','TechSupport_No',
                       'StreamingTV_No','StreamingMovies_No'],axis=1,inplace=True)

# Step 4: Splitting the Dataset

We are done with the data processing steps; the data is now ready to be fed into the model.

We will first split the dataset into two parts:
- X = All independent variables
- y = Dependent variable `Churn`

Then we will split the dataset into a Training set and a Testing set:
- Training Set: The model is built using this dataset
- Testing Set: Model validation is done using this set

We will be doing In-sample validation for this problem, i.e. the Testing set is held out from the same dataset. The Training set and Testing set will be in the ratio 7:3.

In [None]:
#customerID column is of no use for the model so we drop that column also
X = telecom.drop(['Churn','customerID'], axis=1)
X.head()

In [None]:
y = telecom['Churn']
y.head()

Now we split the dataset into Training set and Testing set with ratio 7:3

In [None]:
# Splitting the dataset into training set and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)

# Step 5: Feature Scaling

Most of the time, a dataset contains features that vary widely in magnitude, unit and range. Many machine learning algorithms are sensitive to these differences, for example those that use the Euclidean distance between two data points in their computations, so features with large magnitudes can dominate the result. To suppress this effect, we need to bring all features to the same level of magnitude.

There are several ways to perform Feature Scaling; here we will use standardization via `StandardScaler`:
\begin{equation*}
X_{*} = \frac{X - \text{Mean}}{\text{Standard Deviation}}
\end{equation*}
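As a small worked illustration of the formula, the sketch below standardizes a handful of made-up tenure values using only the standard library. The numbers and the helper `standardize` are assumptions for illustration; the notebook itself uses `StandardScaler`, which also divides by the population standard deviation.

```python
from statistics import fmean, pstdev

# Hypothetical tenure values (months); not taken from the dataset
train = [1.0, 12.0, 24.0, 48.0, 72.0]
test = [6.0, 60.0]

# "Fit": learn the mean and standard deviation from the training values only.
# pstdev is the population standard deviation.
mean = fmean(train)
std = pstdev(train)


def standardize(values, mean, std):
    """Apply X* = (X - mean) / std to every value."""
    return [(v - mean) / std for v in values]


train_scaled = standardize(train, mean, std)
# "Transform": reuse the training mean and std on the test values
test_scaled = standardize(test, mean, std)

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in test_scaled])
```

After scaling, the training values have mean 0 and standard deviation 1. The test values are shifted and scaled by the training statistics, so they generally will not have exactly mean 0 and standard deviation 1.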

In [None]:
#we will standardize continuous variables only and not categorical variables
sc = StandardScaler()
sc.fit(X_train[['tenure','MonthlyCharges','TotalCharges']])
X_train[['tenure','MonthlyCharges','TotalCharges']] = sc.transform(X_train[['tenure','MonthlyCharges','TotalCharges']])
X_train.head()

### Minimum samples required for building a Logistic Regression Model

For multivariable logistic regression, Peduzzi, Concato, Kemper, Holford, & Feinstein (1996) suggested a very simple guideline for a minimum number of cases for logistic regression study.

**p:** Smallest of the proportions of negative or positive cases in the population

**k:** Number of Independent variables
\begin{equation*}
\mathbf{N}   = \frac{10*k}{p}
\end{equation*}

In [None]:
# Checking the minimum number of samples required
p = (sum(y_train)/len(y_train))
print(f"p: {p}")

k = X_train.shape[1]
print(f"k: {k}")

N = 10 * k / p
print(f"N: {int(N)}")

This means we need a minimum of **879** samples to build a Logistic Regression model, and `X_train` has **4922** samples, so we can build a Logistic Regression model.

# Step 6: Model Building

Now we will build a Logistic Regression model using the `statsmodels` and `sklearn` libraries.

We will use coarse tuning followed by fine tuning for `Feature Selection`, to select the best features for the model:

- **Coarse Tuning**: Recursive Feature Elimination (RFE) 
- **Fine Tuning**: Variance Inflation Factor (VIF) and p-value

## Building First Model

Building the first model with all selected variables

In [None]:
# Logistic regression model
logm1 = sm.GLM(y_train,(sm.add_constant(X_train)), family = sm.families.Binomial())
logm1.fit().summary()

### Feature Selection Using RFE

Selecting top 15 features for the model out of 23 features using RFE

In [None]:
logreg = LogisticRegression()
rfe = RFE(logreg, n_features_to_select=15)
rfe = rfe.fit(X_train, y_train)

In [None]:
#top 15 columns returned by RFE
col = X_train.columns[rfe.support_]
col

## Building Second Model

Building second model after selecting top 15 features from RFE

In [None]:
#building the model with top 15 features which we got from RFE
X_train_sm = sm.add_constant(X_train[col])
logm2 = sm.GLM(y_train,X_train_sm, family = sm.families.Binomial())
res = logm2.fit()
res.summary()

We can notice that p-value of all features is < 0.05
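To make the coefficients in the summary easier to read, here is a minimal sketch of how a logistic model turns coefficients into a churn probability and how a coefficient translates into an odds ratio. The intercept and coefficient values are hypothetical, chosen only for illustration; they are not taken from the fitted model.

```python
import math

# Hypothetical fitted values, chosen only for illustration
intercept = -1.5
coef_tenure = -0.8      # coefficient on standardized tenure
coef_paperless = 0.4    # coefficient on PaperlessBilling (0/1)


def churn_probability(tenure_scaled, paperless):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2)))."""
    log_odds = intercept + coef_tenure * tenure_scaled + coef_paperless * paperless
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# exp(coefficient) is the multiplicative change in the odds of churn
# for a one-unit increase in that variable, holding the others fixed
odds_ratio_paperless = math.exp(coef_paperless)

p_without = churn_probability(0.0, 0)
p_with = churn_probability(0.0, 1)

print(round(odds_ratio_paperless, 3))
print(round(odds(p_with) / odds(p_without), 3))  # same value as the odds ratio
```

A positive coefficient therefore raises the odds of churn and a negative one lowers them, which is what makes the model easy to interpret.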

In [None]:
# Getting the predicted values on the train set and showing first 10 predictions in terms of probabilities
y_train_pred = res.predict(X_train_sm)
y_train_pred[:10]

In [None]:
#reshaping the predicted array
y_train_pred = y_train_pred.values.reshape(-1)

##### Comparing Actual Churn and Predicted Churn on Training set 

In [None]:
#creating data frame with actual churn and predicted probabilities
y_train_pred_final = pd.DataFrame({'Churn':y_train.values, 'Churn_Prob':y_train_pred})
y_train_pred_final['CustID'] = y_train.index
y_train_pred_final.head()

##### Creating new column 'Predicted' with 1 if Churn_Prob > 0.5 else 0

In [None]:
y_train_pred_final['predicted'] = y_train_pred_final.Churn_Prob.map(lambda x: 1 if x > 0.5 else 0)

# Let's see the head
y_train_pred_final.head()

### Checking Accuracy of the Model

In [None]:
#Checking Accuracy of the model
print("Accuracy (Training Set): ",round(metrics.accuracy_score(y_train_pred_final.Churn, y_train_pred_final.predicted),4))

The accuracy of the model looks good at **81%**. Let's check the Variance Inflation Factor (VIF) to detect multicollinearity, because Pearson's correlation only measures pairwise (one-to-one) correlation. VIF measures how strongly each independent variable is explained by all the other independent variables together: it is computed by regressing that variable against every other variable, giving $VIF_i = \frac{1}{1 - R_i^2}$, where $R_i^2$ is the R-squared of that regression.
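The idea behind VIF can be sketched for the simplest case of two predictors, where the R-squared of regressing one on the other (with an intercept) equals the squared Pearson correlation. The data and helper names below are hypothetical, and this is the textbook definition rather than a reproduction of the `statsmodels` call used in this notebook.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); with one other predictor, R^2 = r^2."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values: total charges grow roughly with tenure
tenure = [1, 5, 10, 20, 40, 60]
total_charges = [70, 340, 720, 1350, 2900, 4100]
# An unrelated 0/1 flag
paperless = [1, 0, 1, 0, 0, 1]

print(round(vif_two_predictors(tenure, total_charges), 1))  # large: strongly collinear
print(round(vif_two_predictors(tenure, paperless), 2))      # close to 1
```

A VIF near 1 means the variable is barely explained by the others, while a large VIF signals that it is nearly redundant.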

### Checking VIFs

In [None]:
# Create a dataframe that will contain the names of all the feature variables and their respective VIFs
vif = pd.DataFrame()
vif['Features'] = X_train[col].columns
vif['VIF'] = [variance_inflation_factor(X_train[col].values, i) for i in range(X_train[col].shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

There are a few variables with high VIF. It's best to drop these variables, as they aren't helping much with prediction and make the model unnecessarily complex. We will drop them one at a time, starting with `MonthlyCharges`.

In [None]:
# We will drop variables one by one, starting with the MonthlyCharges column
col = col.drop('MonthlyCharges')
col

## Building Third Model

Building third model after dropping the `MonthlyCharges` variable.

In [None]:
# Let's re-run the model using the selected variables
X_train_sm = sm.add_constant(X_train[col])
logm3 = sm.GLM(y_train,X_train_sm, family = sm.families.Binomial())
res = logm3.fit()
res.summary()

We can notice that p-value of `MultipleLines_Yes` is 0.07 (> 0.05).

This means that variable is insignificant for us and hence we can drop it.

##### Making predictions on the Training Set

In [None]:
y_train_pred = res.predict(X_train_sm).values.reshape(-1)
y_train_pred_final['Churn_Prob'] = y_train_pred
y_train_pred[:10]

##### Creating new column 'Predicted' with 1 if Churn_Prob > 0.5 else 0

In [None]:
# Creating new column 'predicted' with 1 if Churn_Prob > 0.5 else 0
y_train_pred_final['predicted'] = y_train_pred_final.Churn_Prob.map(lambda x: 1 if x > 0.5 else 0)
y_train_pred_final.head()

### Checking Accuracy of the Model

In [None]:
# Let's check the overall accuracy.
print("Accuracy (Training Set): ",round(metrics.accuracy_score(y_train_pred_final.Churn, y_train_pred_final.predicted),4))

So overall the accuracy hasn't dropped much.

### Checking VIFs

In [None]:
vif = pd.DataFrame()
vif['Features'] = X_train[col].columns
vif['VIF'] = [variance_inflation_factor(X_train[col].values, i) for i in range(X_train[col].shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

We have seen that the `MultipleLines_Yes` variable is insignificant according to its p-value and that `TotalCharges` has a high VIF value. However, we drop only one variable at a time, and we give the p-value priority over the VIF.

Therefore, we will drop `MultipleLines_Yes`
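The drop-one-at-a-time rule can be written down as a small helper. This is only a sketch: the function name `next_feature_to_drop`, the VIF cutoff of 5 and the VIF values in the example are assumptions for illustration, while the p-value of 0.07 for `MultipleLines_Yes` comes from the model summary above.

```python
def next_feature_to_drop(p_values, vifs, alpha=0.05, vif_limit=5.0):
    """Pick one feature to drop, or None if the model is acceptable.

    p-values take priority: drop the least significant feature first.
    Only when every p-value is at or below alpha is the highest VIF considered.
    vif_limit=5.0 is an assumed cutoff, not a value stated in the notebook.
    """
    insignificant = {f: p for f, p in p_values.items() if p > alpha}
    if insignificant:
        return max(insignificant, key=insignificant.get)
    inflated = {f: v for f, v in vifs.items() if v > vif_limit}
    if inflated:
        return max(inflated, key=inflated.get)
    return None


# Hypothetical summary values for three features
p_values = {"MultipleLines_Yes": 0.07, "TotalCharges": 0.001, "tenure": 0.000}
vifs = {"MultipleLines_Yes": 2.1, "TotalCharges": 7.5, "tenure": 6.9}

first = next_feature_to_drop(p_values, vifs)
print(first)  # MultipleLines_Yes

# After dropping it, the rule falls back to the highest VIF
p_values.pop(first)
vifs.pop(first)
second = next_feature_to_drop(p_values, vifs)
print(second)  # TotalCharges
```

In practice the model should be refitted after each drop, since both the p-values and the VIFs change once a variable is removed.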

In [None]:
# We are dropping the MultipleLines_Yes variable as it is insignificant
# We give priority to p-value over VIF
col = col.drop('MultipleLines_Yes')
col

## Building Fourth Model

Building Fourth model after dropping `MultipleLines_Yes` variable

In [None]:
# Let's re-run the model using the selected variables
X_train_sm = sm.add_constant(X_train[col])
logm4 = sm.GLM(y_train,X_train_sm, family = sm.families.Binomial())
res = logm4.fit()
res.summary()

We can notice that the p-values of all variables are < 0.05

##### Making predictions on the Training Set

In [None]:
y_train_pred = res.predict(X_train_sm).values.reshape(-1)
y_train_pred_final['Churn_Prob'] = y_train_pred
y_train_pred[:10]

##### Creating new column 'Predicted' with 1 if Churn_Prob > 0.5 else 0

In [None]:
# Creating new column 'predicted' with 1 if Churn_Prob > 0.5 else 0
y_train_pred_final['predicted'] = y_train_pred_final.Churn_Prob.map(lambda x: 1 if x > 0.5 else 0)
y_train_pred_final.head()

### Checking Accuracy of the Model

In [None]:
# Let's check the overall accuracy.
print("Accuracy (Training Set): ",round(metrics.accuracy_score(y_train_pred_final.Churn, y_train_pred_final.predicted),4))

The accuracy is still practically the same.

### Checking VIFs

In [None]:
vif = pd.DataFrame()
vif['Features'] = X_train[col].columns
vif['VIF'] = [variance_inflation_factor(X_train[col].values, i) for i in range(X_train[col].shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
# Let's drop TotalCharges since it has a high VIF
col = col.drop('TotalCharges')
col

## Building Fifth Model

Building fifth model after dropping `TotalCharges` variable

In [None]:
# Let's re-run the model using the selected variables
X_train_sm = sm.add_constant(X_train[col])
logm5 = sm.GLM(y_train,X_train_sm, family = sm.families.Binomial())
res = logm5.fit()
res.summary()

We can notice that the p-values of all variables are < 0.05

##### Making predictions on the Training Set

In [None]:
y_train_pred = res.predict(X_train_sm).values.reshape(-1)
y_train_pred_final['Churn_Prob'] = y_train_pred
y_train_pred[:10]

##### Creating new column 'Predicted' with 1 if Churn_Prob > 0.5 else 0

In [None]:
# Creating new column 'predicted' with 1 if Churn_Prob > 0.5 else 0
y_train_pred_final['predicted'] = y_train_pred_final.Churn_Prob.map(lambda x: 1 if x > 0.5 else 0)
y_train_pred_final.head()

### Checking Accuracy of the Model

In [None]:
# Let's check the overall accuracy.
print("Accuracy (Training Set): ",round(metrics.accuracy_score(y_train_pred_final.Churn, y_train_pred_final.predicted),4))

The accuracy is still practically the same.

### Checking VIFs

In [None]:
vif = pd.DataFrame()
vif['Features'] = X_train[col].columns
vif['VIF'] = [variance_inflation_factor(X_train[col].values, i) for i in range(X_train[col].shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

#### All variables have a good VIF value and none of the variables is insignificant. Therefore we don't need to drop any more variables and we can proceed with this model.

# Step 7: Model Evaluation

## Confusion Matrix

In [None]:
# Let's take a look at the confusion matrix 
confusion = metrics.confusion_matrix(y_train_pred_final.Churn, y_train_pred_final.predicted )
confusion

In [None]:
print("Predicted     |  Not Churn (0)  |  Churn (1)")
print("Actual        |                 | ")
print("--------------|-----------------|----------------")
print("Not Churn (0) |     3270        |     365")
print("--------------|-----------------|----------------")
print("Churn     (1) |      604        |     683")

In [None]:
TP = confusion[1,1] # true positive 
TN = confusion[0,0] # true negatives
FP = confusion[0,1] # false positives
FN = confusion[1,0] # false negatives
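The indexing above relies on the layout of the confusion matrix: rows are the actual classes and columns are the predicted classes. A minimal plain-Python sketch with made-up labels shows the same layout; `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(actual, predicted):
    """Return a 2x2 table with rows = actual class and columns = predicted class."""
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


# Hypothetical labels for eight customers (1 = churn, 0 = not churn)
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 1, 0, 0, 1, 0, 1, 0]

table = confusion_counts(actual, predicted)
tn, fp = table[0]  # actual 0: predicted 0, predicted 1
fn, tp = table[1]  # actual 1: predicted 0, predicted 1

print(table)           # [[4, 1], [1, 2]]
print(tn, fp, fn, tp)  # 4 1 1 2
```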

## Metrics for Evaluation

In [None]:
#Accuracy of the final model
accuracy = (TN + TP)/float(TN+FN+TP+FP)
print("Accuracy of the model: ",round(accuracy,3))

# Sensitivity of the final model
sensitivity = TP / float(TP+FN)
print("Sensitivity of the model: ",round(sensitivity,3))

# Specificity of the final model
specificity = TN / float(TN+FP)
print("Specificity of the model: ",round(specificity,3))

We can notice that we have got very good Accuracy and Specificity scores; however, the Sensitivity score is not that good.

It means that our model is not able to capture `churned` customers very well, and this can be problematic for the business. We wish to capture them properly, but how?

**Remember**: We declared the customer as churn (1) or not churn (0) from probabilities based on an arbitrary threshold. We chose that threshold to be 0.5, i.e. any customer with prob > 0.5 is marked as churn (1), else not churn (0).

Therefore, we now need to find the optimal value of the threshold so that our model can capture churned customers well.

## Finding the Optimal Threshold value

The optimal threshold probability is the probability at which sensitivity and specificity are balanced.
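The search for a balanced cutoff can be sketched on a toy example. The labels, probabilities and helper names below are hypothetical; the sketch simply picks the candidate cutoff where the gap between sensitivity and specificity is smallest.

```python
def sensitivity_specificity(actual, probs, threshold):
    """Classify as churn when prob > threshold, then compute both rates."""
    tp = sum(1 for a, p in zip(actual, probs) if a == 1 and p > threshold)
    fn = sum(1 for a, p in zip(actual, probs) if a == 1 and p <= threshold)
    tn = sum(1 for a, p in zip(actual, probs) if a == 0 and p <= threshold)
    fp = sum(1 for a, p in zip(actual, probs) if a == 0 and p > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(actual, probs, candidates):
    """Return the candidate where |sensitivity - specificity| is smallest."""
    def gap(t):
        sens, spec = sensitivity_specificity(actual, probs, t)
        return abs(sens - spec)
    return min(candidates, key=gap)


# Hypothetical churn labels and predicted probabilities
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.45, 0.25, 0.32, 0.42, 0.70, 0.90]
candidates = [x / 10 for x in range(1, 10)]

best = balanced_threshold(actual, probs, candidates)
sens, spec = sensitivity_specificity(actual, probs, best)
print(best, round(sens, 3), round(spec, 3))
```

On this toy data the balanced cutoff is 0.4; on a real dataset the cutoff depends on the predicted probabilities and has to be read from the data in the same way.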

In [None]:
# Let's create columns with different probability cutoffs 
numbers = [float(x)/10 for x in range(10)]
for i in numbers:
    y_train_pred_final[i]= y_train_pred_final.Churn_Prob.map(lambda x: 1 if x > i else 0)
y_train_pred_final.head()

In [None]:
# Now let's calculate accuracy sensitivity and specificity for various probability cutoffs.
cutoff_df = pd.DataFrame( columns = ['Probability','Accuracy','Sensitivity','Specificity'])

num = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
for i in num:
    cm1 = metrics.confusion_matrix(y_train_pred_final.Churn, y_train_pred_final[i] )
    total1=sum(sum(cm1))
    accuracy = (cm1[0,0]+cm1[1,1])/total1
    
    speci = cm1[0,0]/(cm1[0,0]+cm1[0,1])
    sensi = cm1[1,1]/(cm1[1,0]+cm1[1,1])
    cutoff_df.loc[i] =[ i ,accuracy,sensi,speci]
print(cutoff_df)

In [None]:
# Let's plot accuracy sensitivity and specificity for various probabilities.
cutoff_df.plot.line(x='Probability', y=['Accuracy','Sensitivity','Specificity'])
plt.xlabel("Threshold")
plt.ylabel("Scores")
plt.title("Sensitivity and Specificity Trade-off",size=15)
plt.show()

#### From the above curve, we can notice that 0.3 is the optimum threshold value

In [None]:
y_train_pred_final['final_predicted'] = y_train_pred_final.Churn_Prob.map( lambda x: 1 if x > 0.3 else 0)

y_train_pred_final.head()

### Let's look at the Confusion Matrix, Accuracy, Sensitivity and Specificity for the final optimal threshold value

In [None]:
confusion2 = metrics.confusion_matrix(y_train_pred_final.Churn, y_train_pred_final.final_predicted )
confusion2

In [None]:
print("Predicted     |  Not Churn (0)  |  Churn (1)")
print("Actual        |                 | ")
print("--------------|-----------------|----------------")
print("Not Churn (0) |     2787        |     848")
print("--------------|-----------------|----------------")
print("Churn     (1) |      288        |     999")

In [None]:
TP = confusion2[1,1] # true positive 
TN = confusion2[0,0] # true negatives
FP = confusion2[0,1] # false positives
FN = confusion2[1,0] # false negatives

In [None]:
#Accuracy of the final model
accuracy = (TN + TP)/float(TN+FN+TP+FP)
print("Accuracy of the model: ",round(accuracy,3))

# Sensitivity of the final model
sensitivity = TP / float(TP+FN)
print("Sensitivity of the model: ",round(sensitivity,3))

# Specificity of the final model
specificity = TN / float(TN+FP)
print("Specificity of the model: ",round(specificity,3))

Now we can notice that we have got a good `Sensitivity` score along with good `Accuracy` and `Specificity`.

Our model performance is similar on all three metrics; now let's look at the **Receiver Operating Characteristic (ROC) curve** of the model.

An ROC curve demonstrates several things:

- It shows the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
- The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
- The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
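The area under the ROC curve has a useful interpretation: it equals the share of (churner, non-churner) pairs in which the churner receives the higher predicted probability. The sketch below computes it that way on made-up data; `auc_by_pairs` is a hypothetical helper written for illustration, not the `sklearn` routine.

```python
def auc_by_pairs(actual, probs):
    """AUC as the share of (churner, non-churner) pairs ranked correctly.

    A tie in predicted probability counts as half a correct pair.
    """
    pos = [p for a, p in zip(actual, probs) if a == 1]
    neg = [p for a, p in zip(actual, probs) if a == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities
actual = [0, 0, 0, 0, 1, 1, 1]
probs = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.90]

auc = auc_by_pairs(actual, probs)
auc_reversed = auc_by_pairs(actual, [1 - p for p in probs])

print(round(auc, 3))           # 11 of the 12 pairs are ordered correctly
print(round(auc_reversed, 3))  # reversed scores order only 1 of the 12 pairs correctly
```

A model that ranks customers at random scores about 0.5, which corresponds to the 45-degree diagonal.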

## ROC Curve

In [None]:
def draw_roc( actual, probs ):
    fpr, tpr, thresholds = metrics.roc_curve( actual, probs,
                                              drop_intermediate = False )
    auc_score = metrics.roc_auc_score( actual, probs )
    plt.figure(figsize=(5, 5))
    plt.plot( fpr, tpr, label='ROC curve (area = %0.2f)' % auc_score )
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate or [1 - True Negative Rate]')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    #plt.savefig("E:/1. NITW/Project 4th Sem/ROC Curve.jpg")
    plt.show()
    
    return None

In [None]:
fpr, tpr, thresholds = metrics.roc_curve( y_train_pred_final.Churn, y_train_pred_final.Churn_Prob, drop_intermediate = False )

draw_roc(y_train_pred_final.Churn, y_train_pred_final.Churn_Prob)

We can notice that the ROC curve looks very good and the **Area under the curve (AUC)** is `0.85`, which is a very good score and indicates the goodness of the model.

Therefore, all this shows that the model we built using the Training dataset fits it well, and the optimal value of the threshold gives us good scores.

Now let's look at some other evaluation metrics which we come across in the theoretical part. However, for this dataset we will rely on `Sensitivity` and `Specificity` only.

## Precision and Recall

##### Precision 

It tells us the percentage of 1’s predicted correctly out of the total 1’s predicted.

Precision = TP / (TP + FP)

##### Recall

It tells us the percentage of 1’s predicted correctly out of the total actual 1’s. It is the same as sensitivity.

Recall = TP / (TP + FN)

In [None]:
# Precision of the final model
precision = TP / float(TP+FP)
print("Precision of the model: ",round(precision,3))

# Recall of the final model
recall = TP / float(TP+FN)
print("Recall of the model: ",round(recall,3))

### Precision and Recall Tradeoff

Let's look at the Precision and Recall trade-off now

In [None]:
p, r, thresholds = precision_recall_curve(y_train_pred_final.Churn, y_train_pred_final.Churn_Prob)

In [None]:
plt.plot(thresholds, p[:-1], "g-",label="Precision")
plt.plot(thresholds, r[:-1], "r-",label="Recall")
plt.xlabel("Threshold")
plt.ylabel("Scores")
plt.title("Precision and Recall Trade-off",size=15)
plt.legend()
plt.show()

#### Observations

We can see that a threshold value of `0.5` would be preferred if we used `Precision` and `Recall` for model evaluation.

We can see in the graph that the curve for precision is quite jumpy towards the end. This is because the denominator of precision, i.e. (TP+FP), is not constant, as it is the number of predicted 1s. Because the number of predicted 1s changes with the threshold and becomes small at high thresholds, we get a very jumpy curve.
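The jumpiness of precision can be reproduced on a small made-up example. The labels, probabilities and the helper `precision_recall_at` are assumptions for illustration. Recall has the fixed denominator TP+FN (the number of actual churners), whereas the denominator of precision shrinks as the threshold rises.

```python
def precision_recall_at(actual, probs, threshold):
    """Return (precision, recall, predicted_positives) for prob > threshold."""
    tp = sum(1 for a, p in zip(actual, probs) if a == 1 and p > threshold)
    fp = sum(1 for a, p in zip(actual, probs) if a == 0 and p > threshold)
    fn = sum(1 for a, p in zip(actual, probs) if a == 1 and p <= threshold)
    predicted_pos = tp + fp
    precision = tp / predicted_pos if predicted_pos else float("nan")
    recall = tp / (tp + fn)
    return precision, recall, predicted_pos


# Hypothetical labels and predicted probabilities
actual = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.10, 0.30, 0.45, 0.55, 0.65, 0.80, 0.88, 0.95]

rows = []
for t in [0.5, 0.7, 0.85, 0.9]:
    precision, recall, n_pred = precision_recall_at(actual, probs, t)
    rows.append((t, n_pred, precision, recall))
    print(t, n_pred, round(precision, 3), round(recall, 3))
```

In this toy data recall only falls as the threshold rises, while precision moves up and down because a single customer entering or leaving a small set of predicted 1s changes the ratio a lot.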

# Step 8: Model Validation

We are done with the model building and evaluation steps; now let's check the stability of the model, i.e. whether the model gives similar scores on the `Testing Set` as well. We are using `In-sample` validation for this problem. 

##### Let's first scale the continuous variables using Standardization.

We have already fitted the scaler on the `Training Set`; now we just need to transform the `Testing Set` using the same scaler.

In [None]:
X_test[['tenure','MonthlyCharges','TotalCharges']] = sc.transform(X_test[['tenure','MonthlyCharges','TotalCharges']])

##### Now selecting only those columns which we used to build the final model

In [None]:
X_test = X_test[col]
X_test.head()

### Making Predictions using the trained model

In [None]:
X_test_sm = sm.add_constant(X_test)
y_test_pred = res.predict(X_test_sm)

In [None]:
y_test_pred[:10]

In [None]:
# Converting y_test_pred (a pandas Series) to a dataframe
y_pred_1 = pd.DataFrame(y_test_pred)

In [None]:
# Let's see the head
y_pred_1.head()

In [None]:
# Converting y_test to dataframe
y_test_df = pd.DataFrame(y_test)

In [None]:
# Storing the index as the CustID column
y_test_df['CustID'] = y_test_df.index

In [None]:
# Removing index for both dataframes to append them side by side 
y_pred_1.reset_index(drop=True, inplace=True)
y_test_df.reset_index(drop=True, inplace=True)

In [None]:
# Appending y_test_df and y_pred_1
y_pred_final = pd.concat([y_test_df, y_pred_1],axis=1)

#### Final Probabilities corresponding to the customerID

In [None]:
y_pred_final.head()

In [None]:
# Renaming the column 
y_pred_final= y_pred_final.rename(columns={ 0 : 'Churn_Prob'})

In [None]:
# Let's see the head of y_pred_final
y_pred_final.head()

##### Creating new column 'final_predicted' with 1 if Churn_Prob > 0.3 else 0

Here we apply the optimal threshold value of `0.3`.
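
As a rough illustration of what moving the cutoff does, the sketch below compares cutoffs of `0.5` and `0.3` on a handful of made-up probabilities (hypothetical values, not the model's output). The function name `sensitivity_specificity` is an assumption for this example. It uses the same `x > cutoff` rule as the notebook.

```python
# Hypothetical predicted probabilities and actual labels (illustrative only).
probs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.85]
actual = [0, 0, 0, 0, 1, 0, 1, 1]


def sensitivity_specificity(probs, actual, cutoff):
    """Return (sensitivity, specificity) for predictions made with x > cutoff."""
    predicted = [1 if p > cutoff else 0 for p in probs]
    pairs = list(zip(predicted, actual))
    tp = sum(1 for yhat, y in pairs if yhat == 1 and y == 1)
    tn = sum(1 for yhat, y in pairs if yhat == 0 and y == 0)
    fp = sum(1 for yhat, y in pairs if yhat == 1 and y == 0)
    fn = sum(1 for yhat, y in pairs if yhat == 0 and y == 1)
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (0.5, 0.3):
    sens, spec = sensitivity_specificity(probs, actual, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

On this toy data, lowering the cutoff from `0.5` to `0.3` catches more of the actual churners (higher sensitivity) at the cost of more false alarms (lower specificity).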

In [None]:
y_pred_final['final_predicted'] = y_pred_final.Churn_Prob.map(lambda x: 1 if x > 0.3 else 0)

In [None]:
y_pred_final.head()

### Let's look at the Confusion Matrix, Accuracy, Sensitivity and Specificity for the final optimal threshold value

In [None]:
confusion2 = metrics.confusion_matrix(y_pred_final.Churn, y_pred_final.final_predicted )
confusion2

In [None]:
print("Predicted     |  Not Churn (0)  |  Churn (1)")
print("Actual        |                 | ")
print("--------------|-----------------|----------------")
print("Not Churn (0) |     1144        |     384")
print("--------------|-----------------|----------------")
print("Churn     (1) |      163        |     419")

In [None]:
TP = confusion2[1,1] # true positive 
TN = confusion2[0,0] # true negatives
FP = confusion2[0,1] # false positives
FN = confusion2[1,0] # false negatives

### Model Evaluation (Testing Set)

In [None]:
#Accuracy of the final model
accuracy = (TN + TP)/float(TN+FN+TP+FP)
print("Accuracy of the model: ",round(accuracy,3))

# Sensitivity of the final model
sensitivity = TP / float(TP+FN)
print("Sensitivity of the model: ",round(sensitivity,3))

# Specificity of the final model
specificity = TN / float(TN+FP)
print("Specificity of the model: ",round(specificity,3))

### Observations

We can notice that, for the Testing Set, all three metrics show scores similar to those observed on the Training Set.

**Training Set**
- Accuracy of the model:  0.769
- Sensitivity of the model:  0.776
- Specificity of the model:  0.767

**Testing Set**
- Accuracy of the model:  0.741
- Sensitivity of the model:  0.72
- Specificity of the model:  0.749

We can see that the scores are similar for both the Training and Testing Sets, which shows that the model we built using the Training Set also fits well and generalizes to the Testing Set.
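
As a quick arithmetic check, the Testing Set scores can be re-derived from the confusion matrix counts printed earlier and compared with the Training Set scores listed above. The dictionary names `test` and `train` are just illustrative.

```python
# Counts from the Testing Set confusion matrix printed earlier.
TN, FP = 1144, 384
FN, TP = 163, 419

# Testing Set metrics recomputed from the counts.
test = {
    "accuracy": (TP + TN) / (TP + TN + FP + FN),
    "sensitivity": TP / (TP + FN),
    "specificity": TN / (TN + FP),
}

# Training Set metrics as reported in the observations above.
train = {"accuracy": 0.769, "sensitivity": 0.776, "specificity": 0.767}

for name in test:
    gap = train[name] - test[name]
    print(f"{name:<12} train={train[name]:.3f}  test={test[name]:.3f}  gap={gap:+.3f}")
```

The largest gap is in sensitivity (about 0.056), while accuracy and specificity differ by less than 0.03.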

Hence, we are ready to deploy the model and make predictions and decisions using it.

However, we need to monitor its performance from time to time, and if the accuracy drops on new data, we need to rebuild the model using the new data.

# Step 9: Model Interpretation

We have now reached the final step in this notebook: interpreting the model coefficients and drawing the final conclusion.

In [None]:
model  = pd.DataFrame({"Features": X_train_sm.columns,"Coefficient":res.params.values})
model["Odds_Ratio"] = model["Coefficient"].apply(lambda x: np.exp(x))
model[["Coefficient","Odds_Ratio"]] = model[["Coefficient","Odds_Ratio"]].apply(lambda x: round(x,2))
model["Perc_Impact"] = model["Odds_Ratio"].apply(lambda x: (x-1)*100)
model

### Observations

**Tenure:**

- Coefficient: -0.90
- Odds Ratio: 0.41

*Tenure is a continuous variable which was standardized using the Standard Scaler. Therefore, for a 1 standardized unit increase in tenure, the odds of churning reduce by 59%. We know that 1 standardized unit of tenure is equal to 24.5 months; therefore, an increase in tenure of 24.5 months leads to a 59% decrease in the odds of the customer churning, considering every other variable the same.*
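
The tenure numbers above can be reproduced with a few lines of arithmetic. The sketch below also converts the effect to a per-month figure, which is an extra step not in the original analysis. It relies on the model being linear in the log-odds, so the coefficient can be divided by the 24.5 months that make up one standardized unit.

```python
import math

coef_tenure = -0.90     # coefficient per standardized unit (from the list above)
months_per_unit = 24.5  # one standardized unit of tenure, as stated above

# Odds ratio for one standardized unit (24.5 months) of extra tenure.
odds_ratio_per_unit = math.exp(coef_tenure)
pct_change_per_unit = (odds_ratio_per_unit - 1) * 100

# Odds ratio for one extra month, obtained by rescaling the coefficient.
odds_ratio_per_month = math.exp(coef_tenure / months_per_unit)
pct_change_per_month = (odds_ratio_per_month - 1) * 100

print(f"per 24.5 months: odds ratio {odds_ratio_per_unit:.2f} ({pct_change_per_unit:.0f}%)")
print(f"per month:       odds ratio {odds_ratio_per_month:.3f} ({pct_change_per_month:.1f}%)")
```

Under these assumptions, each additional month of tenure lowers the odds of churn by roughly 3.6%, and compounding that over 24.5 months gives back the 59% reduction.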

**PaperlessBilling**

- Coefficient: 0.35
- Odds Ratio: 1.42

*The odds of churn for a customer who has opted for Paperless Billing are 1.42 times the odds for a customer who has not opted for Paperless Billing, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Paperless Billing are 42% higher than the odds of churn for a customer without Paperless Billing.*
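
A 42% increase in odds is not the same as a 42% increase in probability. The sketch below shows how an odds ratio of 1.42 translates into probabilities for a few baseline churn probabilities. The baselines are hypothetical values chosen for illustration, and `apply_odds_ratio` is an illustrative helper name.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the new probability after multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline churn probabilities (not estimated from the data).
for p0 in (0.05, 0.20, 0.50):
    p1 = apply_odds_ratio(p0, 1.42)
    print(f"baseline={p0:.2f} -> with Paperless Billing={p1:.3f}")
```

For a baseline of 0.20 the probability rises to about 0.26, and for a baseline of 0.50 it rises to about 0.59, so the change in probability depends on where the customer starts.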

**SeniorCitizen**

- Coefficient: 0.47
- Odds Ratio: 1.60

*The odds of churn for a Senior Citizen customer are 1.60 times the odds for a non-Senior Citizen customer, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a Senior Citizen customer are 60% higher than the odds of churn for a non-Senior Citizen customer.*

**Contract_One year**

- Coefficient: -0.74
- Odds Ratio: 0.48

*The odds of churn for a customer with a One Year contract are 0.48 times the odds for a customer without a One Year contract, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with a One Year contract are 52% lower than the odds of churn for a customer without a One Year contract.*

**Contract_Two year**

- Coefficient: -1.31
- Odds Ratio: 0.27

*The odds of churn for a customer with a Two Year contract are 0.27 times the odds for a customer without a Two Year contract, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with a Two Year contract are 73% lower than the odds of churn for a customer without a Two Year contract.*

**PaymentMethod_Credit card (automatic)**

- Coefficient: -0.39
- Odds Ratio: 0.68

*The odds of churn for a customer with Automatic Payment via Credit Card are 0.68 times the odds for a customer without Automatic Payment via Credit Card, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Automatic Payment via Credit Card are 32% lower than the odds of churn for a customer without Automatic Payment via Credit Card.*

**PaymentMethod_Mailed check**


- Coefficient: -0.34
- Odds Ratio: 0.71

*The odds of churn for a customer who pays by Mailed Check are 0.71 times the odds for a customer who does not pay by Mailed Check, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer who pays by Mailed Check are 29% lower than the odds of churn for a customer who does not pay by Mailed Check.*

**InternetService_Fiber Optic**

- Coefficient: 0.86
- Odds Ratio: 2.37

*The odds of churn for a customer with Fiber Optic service are 2.37 times the odds for a customer without Fiber Optic service, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Fiber Optic service are 137% higher than the odds of churn for a customer without Fiber Optic service.*

**InternetService_No**

- Coefficient: -0.97
- Odds Ratio: 0.38

*The odds of churn for a customer without Internet Services are 0.38 times the odds for a customer with Internet Services, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer without Internet Services are 62% lower than the odds of churn for a customer with Internet Services.*

**TechSupport_Yes**

- Coefficient: -0.41
- Odds Ratio: 0.67

*The odds of churn for a customer with Tech Support are 0.67 times the odds for a customer without Tech Support, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Tech Support are 33% lower than the odds of churn for a customer without Tech Support.*
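
Because logistic regression is additive on the log-odds scale, the effects of several features combine by adding coefficients, which is the same as multiplying odds ratios. The sketch below combines two of the coefficients listed above. It assumes the model has no interaction terms between these features, which is not stated explicitly in this notebook.

```python
import math

# Coefficients taken from the observations above.
coefs = {"Contract_Two year": -1.31, "TechSupport_Yes": -0.41}

# Adding the coefficients on the log-odds scale ...
combined_log_odds = sum(coefs.values())
combined_odds_ratio = math.exp(combined_log_odds)

# ... is equivalent to multiplying the individual odds ratios.
product_of_ratios = math.prod(math.exp(c) for c in coefs.values())

print(f"combined odds ratio: {combined_odds_ratio:.3f}")
print(f"product of ratios:   {product_of_ratios:.3f}")
```

Under that assumption, a customer with both a Two Year contract and Tech Support has about 0.18 times the odds of churn of an otherwise identical customer with neither.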

**StreamingTV_Yes**

- Coefficient: 0.35
- Odds Ratio: 1.41

*The odds of churn for a customer with Streaming TV services are 1.41 times the odds for a customer without Streaming TV services, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Streaming TV services are 41% higher than the odds of churn for a customer without Streaming TV services.*

**StreamingMovies_Yes**

- Coefficient: 0.25
- Odds Ratio: 1.28

*The odds of churn for a customer with Streaming Movies services are 1.28 times the odds for a customer without Streaming Movies services, considering every other variable the same.*
*In terms of percentage change, the odds of churn for a customer with Streaming Movies services are 28% higher than the odds of churn for a customer without Streaming Movies services.*
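
To compare the features at a glance, the coefficients listed above can be ranked by absolute size. This is only a rough sketch: the tenure coefficient is per standardized unit while the others are for 0/1 indicator variables, so the magnitudes are not strictly comparable.

```python
import math

# Coefficients copied from the observations above.
coefficients = {
    "tenure": -0.90,
    "PaperlessBilling": 0.35,
    "SeniorCitizen": 0.47,
    "Contract_One year": -0.74,
    "Contract_Two year": -1.31,
    "PaymentMethod_Credit card (automatic)": -0.39,
    "PaymentMethod_Mailed check": -0.34,
    "InternetService_Fiber Optic": 0.86,
    "InternetService_No": -0.97,
    "TechSupport_Yes": -0.41,
    "StreamingTV_Yes": 0.35,
    "StreamingMovies_Yes": 0.25,
}

# Rank the features by the absolute size of their coefficient.
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Features whose coefficient is positive increase the odds of churn.
increasing = [name for name, coef in coefficients.items() if coef > 0]

for name, coef in ranked:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name:<40} coef={coef:+.2f}  odds ratio={math.exp(coef):.2f}  ({direction} odds)")
```

The ranking puts `Contract_Two year` first and `StreamingMovies_Yes` last, with five of the twelve features raising the odds of churn.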

# Step 10: Conclusion

We have completed all the steps for solving a classification problem. We have seen that the model we built gives a good accuracy score of 77% on the Training dataset and 74% on the Testing dataset, along with other metrics. For this problem we preferred to use the Sensitivity and Specificity metrics for evaluation. We have also seen the impact of each variable on the odds of churn. Below are a few observations about the model:

- Customers with long-term contracts (One Year or Two Year) are less likely to churn than customers on a Monthly contract.

- A customer who has been with the company for a longer time is less likely to churn than a customer who has been with it for only a few months. A possible reason is that the customer is happy with the services and wishes to continue with them.

- Customers using Internet Services, Fiber Optic, Streaming TV and Streaming Movies services are more likely to churn than customers who are not using these services. A possible reason is that the company is not providing good Internet services and needs to work on that.

- Customers who have opted for payment through Credit Card (automatic) or Mailed Check are less likely to churn than other customers.

Overall, the company needs to provide better internet services and other services associated with the internet to retain its customers.