# Data Description 


Identifying likely churners before they abandon a product or service is crucial. In this machine learning assignment, I design a churn prediction model for the telecom industry that forecasts which customers are most likely to churn.

The dataset's column names are built from the following 38 acronyms:

1. MOBILE_NUMBER: Customer phone number
2. CIRCLE_ID: Telecom circle area to which the customer belongs
3. LOC: Local calls, within the same telecom circle
4. STD: STD calls, outside the calling circle
5. IC: Incoming calls
6. OG: Outgoing calls
7. T2T: Operator T to T, i.e. within the same operator (mobile to mobile)
8. T2M: Operator T to other operator mobile
9. T2O: Operator T to other operator fixed line
10. T2F: Operator T to fixed lines of T
11. T2C: Operator T to its own call center
12. ARPU: Average revenue per user
13. MOU: Minutes of usage (voice calls)
14. AON: Age on network, i.e. the number of days the customer has been using the operator T network
15. ONNET: All kinds of calls within the same operator network
16. OFFNET: All kinds of calls outside the operator T network
17. ROAM: Indicates that the customer is in a roaming zone during the call
18. SPL: Special calls
19. ISD: ISD calls
20. RECH: Recharge
21. NUM: Number
22. AMT: Amount in local currency
23. MAX: Maximum
24. DATA: Mobile internet
25. 3G: 3G network
26. AV: Average
27. VOL: Mobile internet usage volume (in MB)
28. 2G: 2G network
29. PCK: Prepaid service schemes called PACKS
30. NIGHT: Scheme to use during specific night hours only
31. MONTHLY: Service schemes with validity equivalent to a month
32. SACHET: Service schemes with validity shorter than a month
33. `*.6`: KPI for the month of June
34. `*.7`: KPI for the month of July
35. `*.8`: KPI for the month of August
36. `*.9`: KPI for the month of September
37. FB_USER: Service scheme to avail services of Facebook and similar social networking sites
38. VBC: Volume-based cost, charged when no specific scheme is purchased and the customer pays per usage
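
Most column names in the dataset chain several of these acronyms together and end with a month number, for example `loc_og_t2t_mou_6`. As a small illustration of how such names can be read, the sketch below expands a name into plain words. The helper `describe_column` and the `ACRONYMS` dictionary are hypothetical names introduced here, and only a subset of the acronyms is included.

```python
# Meanings copied from the acronym list above; only a subset is included here.
ACRONYMS = {
    "loc": "local calls",
    "std": "STD calls",
    "isd": "ISD calls",
    "roam": "roaming",
    "ic": "incoming",
    "og": "outgoing",
    "t2t": "within operator T (mobile to mobile)",
    "t2m": "operator T to other operator mobile",
    "t2f": "operator T to fixed lines of T",
    "mou": "minutes of usage",
    "arpu": "average revenue per user",
    "rech": "recharge",
    "amt": "amount",
}

# Month suffixes used in the column names (6 = June ... 9 = September).
MONTHS = {"6": "June", "7": "July", "8": "August", "9": "September"}


def describe_column(name):
    """Expand an underscore-separated column name into readable words."""
    parts = name.lower().split("_")
    # A trailing 6/7/8/9 is treated as the month of the KPI.
    month = MONTHS.get(parts[-1])
    if month is not None:
        parts = parts[:-1]
    # Unknown parts are kept as they are.
    words = [ACRONYMS.get(part, part) for part in parts]
    description = " / ".join(words)
    return f"{description} ({month})" if month else description


print(describe_column("loc_og_t2t_mou_6"))
print(describe_column("total_rech_amt_9"))
```

For `loc_og_t2t_mou_6` this prints `local calls / outgoing / within operator T (mobile to mobile) / minutes of usage (June)`. Parts that are not in the dictionary, such as `total`, are passed through unchanged.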

# Problem Statement

In the telecommunications business, customers can choose from a variety of service providers and actively switch from one to another. In this extremely competitive sector, the annual churn rate averages between 15 and 25 percent. Given that it costs five to ten times as much to acquire a new customer as it does to retain an existing one, customer retention is now more important than customer acquisition.

Retaining highly profitable customers is the primary business objective for many incumbent operators. To reduce churn, telecom companies must anticipate which customers are at high risk of churning.

# Aim

In this project, I analyze customer-level data from a large telecommunications company, build predictive models to identify customers at high risk of churn, and identify the main indicators of churn.


# Tech stack

* Language – Python
* Libraries – NumPy, pandas, Matplotlib, Seaborn, scikit-learn, SciPy, XGBoost, Graphviz

# Approach

1. Importing the required libraries and reading the dataset
    * Understanding the dataset
2. Exploratory Data Analysis (EDA)
3. Filtering High-Value Customers
4. Creating the Target Variable
5. Deriving New Features
6. Handling Missing Values
7. Data Visualization: Univariate Analysis
8. Data Visualization: Bivariate Analysis
9. Outlier Detection
10. Data Preparation
11. Data Modeling and Evaluation
12. Non-Interpretable Models
13. Interpretable Models
14. Conclusion
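
Steps 3 and 4 are only named in the list above, not defined. Purely as an illustration, the sketch below assumes that a high-value customer is one whose average recharge amount over June and July is at or above the 70th percentile, and that a churner is a customer with no calls and no mobile data usage in September. Both rules are assumptions made for this example, not definitions taken from this notebook. The column names are real dataset columns and the values are copied from five rows of the dataset.

```python
import pandas as pd

# Toy frame with values copied from five rows of the dataset.
toy = pd.DataFrame({
    "total_rech_amt_6": [124, 750, 0, 360, 0],
    "total_rech_amt_7": [106, 483, 500, 56, 352],
    "total_ic_mou_9": [29.98, 219.88, 347.19, 0.0, 120.99],
    "total_og_mou_9": [21.34, 827.98, 239.84, 0.0, 79.73],
    "vol_2g_mb_9": [188.15, 0.0, 0.0, 0.0, 0.0],
    "vol_3g_mb_9": [0.0, 0.0, 0.0, 0.0, 0.0],
})

# ASSUMPTION: churn = no incoming/outgoing minutes and no 2G/3G data in September.
usage_cols_9 = ["total_ic_mou_9", "total_og_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]
toy["churn"] = toy[usage_cols_9].eq(0).all(axis=1).astype(int)

# ASSUMPTION: high-value = average June/July recharge amount at or above
# the 70th percentile of that average.
toy["avg_rech_amt_6_7"] = toy[["total_rech_amt_6", "total_rech_amt_7"]].mean(axis=1)
threshold = toy["avg_rech_amt_6_7"].quantile(0.70)
high_value = toy[toy["avg_rech_amt_6_7"] >= threshold].copy()

print(toy[["avg_rech_amt_6_7", "churn"]])
print("Rows kept as high-value:", high_value.index.tolist())
```

With only five rows the percentile is not meaningful; the point is the shape of the two steps, which would be applied to the full DataFrame in the same way.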

# 1. Understanding the dataset

In [2]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 1px  black solid !important;
  color: black !important;
}
</style>

## 1.1 Importing Libraries

In [7]:
!pip install graphviz

Collecting graphviz
  Downloading graphviz-0.20.1-py3-none-any.whl (47 kB)
Installing collected packages: graphviz
Successfully installed graphviz-0.20.1


In [9]:
#Importing Data Reading and Processing Libraries
import pandas as pd
import numpy as np

#Importing Data Visualization Libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

#Importing Data Preparation and Modeling Libraries
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold, GridSearchCV,StratifiedKFold

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier


#Suppressing warnings to keep the notebook output readable
import warnings
warnings.filterwarnings("ignore")

#Setting pandas display options to show all columns and rows
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)
pd.set_option('display.width', None)

# Import the StandardScaler()
from sklearn.preprocessing import StandardScaler

#Importing the PCA module
from sklearn.decomposition import PCA

# For Hopkins test
from sklearn.neighbors import NearestNeighbors
from random import sample
from numpy.random import uniform
from math import isnan

# For clustering 
## using KMeans ##
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Importing classification report and confusion matrix from sklearn metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.metrics import recall_score,roc_auc_score,roc_curve

## using Hierarchical ##
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import dendrogram
from scipy.cluster.hierarchy import cut_tree

# Importing required packages for visualization
from IPython.display import Image  
#from sklearn.externals.six import StringIO  
from sklearn.tree import export_graphviz
import pydot, graphviz


# Other sklearn packages
import sklearn.metrics as metrics
from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier

from datetime import date,datetime
import math
import multiprocessing

print("Successfully imported libraries.")

Successfully imported libraries.


## 1.2 Dataset Loading

In [10]:
df = pd.read_csv('telecom_churn_data.csv')
print("Successfully loaded dataset.")
df.sample(5)

Successfully loaded dataset.


Unnamed: 0,mobile_number,circle_id,loc_og_t2o_mou,std_og_t2o_mou,loc_ic_t2o_mou,last_date_of_month_6,last_date_of_month_7,last_date_of_month_8,last_date_of_month_9,arpu_6,arpu_7,arpu_8,arpu_9,onnet_mou_6,onnet_mou_7,onnet_mou_8,onnet_mou_9,offnet_mou_6,offnet_mou_7,offnet_mou_8,offnet_mou_9,roam_ic_mou_6,roam_ic_mou_7,roam_ic_mou_8,roam_ic_mou_9,roam_og_mou_6,roam_og_mou_7,roam_og_mou_8,roam_og_mou_9,loc_og_t2t_mou_6,loc_og_t2t_mou_7,loc_og_t2t_mou_8,loc_og_t2t_mou_9,loc_og_t2m_mou_6,loc_og_t2m_mou_7,loc_og_t2m_mou_8,loc_og_t2m_mou_9,loc_og_t2f_mou_6,loc_og_t2f_mou_7,loc_og_t2f_mou_8,loc_og_t2f_mou_9,loc_og_t2c_mou_6,loc_og_t2c_mou_7,loc_og_t2c_mou_8,loc_og_t2c_mou_9,loc_og_mou_6,loc_og_mou_7,loc_og_mou_8,loc_og_mou_9,std_og_t2t_mou_6,std_og_t2t_mou_7,std_og_t2t_mou_8,std_og_t2t_mou_9,std_og_t2m_mou_6,std_og_t2m_mou_7,std_og_t2m_mou_8,std_og_t2m_mou_9,std_og_t2f_mou_6,std_og_t2f_mou_7,std_og_t2f_mou_8,std_og_t2f_mou_9,std_og_t2c_mou_6,std_og_t2c_mou_7,std_og_t2c_mou_8,std_og_t2c_mou_9,std_og_mou_6,std_og_mou_7,std_og_mou_8,std_og_mou_9,isd_og_mou_6,isd_og_mou_7,isd_og_mou_8,isd_og_mou_9,spl_og_mou_6,spl_og_mou_7,spl_og_mou_8,spl_og_mou_9,og_others_6,og_others_7,og_others_8,og_others_9,total_og_mou_6,total_og_mou_7,total_og_mou_8,total_og_mou_9,loc_ic_t2t_mou_6,loc_ic_t2t_mou_7,loc_ic_t2t_mou_8,loc_ic_t2t_mou_9,loc_ic_t2m_mou_6,loc_ic_t2m_mou_7,loc_ic_t2m_mou_8,loc_ic_t2m_mou_9,loc_ic_t2f_mou_6,loc_ic_t2f_mou_7,loc_ic_t2f_mou_8,loc_ic_t2f_mou_9,loc_ic_mou_6,loc_ic_mou_7,loc_ic_mou_8,loc_ic_mou_9,std_ic_t2t_mou_6,std_ic_t2t_mou_7,std_ic_t2t_mou_8,std_ic_t2t_mou_9,std_ic_t2m_mou_6,std_ic_t2m_mou_7,std_ic_t2m_mou_8,std_ic_t2m_mou_9,std_ic_t2f_mou_6,std_ic_t2f_mou_7,std_ic_t2f_mou_8,std_ic_t2f_mou_9,std_ic_t2o_mou_6,std_ic_t2o_mou_7,std_ic_t2o_mou_8,std_ic_t2o_mou_9,std_ic_mou_6,std_ic_mou_7,std_ic_mou_8,std_ic_mou_9,total_ic_mou_6,total_ic_mou_7,total_ic_mou_8,total_ic_mou_9,spl_ic_mou_6,spl_ic_mou_7,spl_ic_mou_8,spl_ic_mou_9,isd_ic_mou_6,isd_ic_mou_7,isd_ic_mou_8,isd_ic_mou_9,ic_others_6,ic_others_7,ic_others_8,ic_others_9,total_rech_num_6,total_rech_num_7,total_rech_num_8,total_rech_num_9,total_rech_amt_6,total_rech_amt_7,total_rech_amt_8,total_rech_amt_9,max_rech_amt_6,max_rech_amt_7,max_rech_amt_8,max_rech_amt_9,date_of_last_rech_6,date_of_last_rech_7,date_of_last_rech_8,date_of_last_rech_9,last_day_rch_amt_6,last_day_rch_amt_7,last_day_rch_amt_8,last_day_rch_amt_9,date_of_last_rech_data_6,date_of_last_rech_data_7,date_of_last_rech_data_8,date_of_last_rech_data_9,total_rech_data_6,total_rech_data_7,total_rech_data_8,total_rech_data_9,max_rech_data_6,max_rech_data_7,max_rech_data_8,max_rech_data_9,count_rech_2g_6,count_rech_2g_7,count_rech_2g_8,count_rech_2g_9,count_rech_3g_6,count_rech_3g_7,count_rech_3g_8,count_rech_3g_9,av_rech_amt_data_6,av_rech_amt_data_7,av_rech_amt_data_8,av_rech_amt_data_9,vol_2g_mb_6,vol_2g_mb_7,vol_2g_mb_8,vol_2g_mb_9,vol_3g_mb_6,vol_3g_mb_7,vol_3g_mb_8,vol_3g_mb_9,arpu_3g_6,arpu_3g_7,arpu_3g_8,arpu_3g_9,arpu_2g_6,arpu_2g_7,arpu_2g_8,arpu_2g_9,night_pck_user_6,night_pck_user_7,night_pck_user_8,night_pck_user_9,monthly_2g_6,monthly_2g_7,monthly_2g_8,monthly_2g_9,sachet_2g_6,sachet_2g_7,sachet_2g_8,sachet_2g_9,monthly_3g_6,monthly_3g_7,monthly_3g_8,monthly_3g_9,sachet_3g_6,sachet_3g_7,sachet_3g_8,sachet_3g_9,fb_user_6,fb_user_7,fb_user_8,fb_user_9,aon,aug_vbc_3g,jul_vbc_3g,jun_vbc_3g,sep_vbc_3g
21693,7000529804,109,0.0,0.0,0.0,6/30/2014,7/31/2014,8/31/2014,9/30/2014,120.372,90.387,125.122,99.314,2.58,11.06,14.66,4.01,29.56,16.04,40.63,16.63,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.58,11.06,14.66,3.75,24.64,16.04,40.63,16.48,0.0,0.0,0.0,0.0,4.91,0.0,0.0,0.0,27.23,27.11,55.29,20.23,0.0,0.0,0.0,0.26,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.41,0.0,0.0,0.0,0.0,5.73,0.88,1.29,0.7,1.3,0.0,0.0,0.0,34.26,27.99,56.59,21.34,3.73,3.36,0.66,1.78,9.16,13.69,15.89,26.49,0.0,0.0,1.41,0.0,12.89,17.06,17.98,28.28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.51,0.0,0.0,1.65,0.0,0.0,0.0,0.0,0.51,0.0,0.0,1.65,13.41,17.18,17.98,29.98,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.11,0.0,0.0,11,8,13,8,124,106,148,111,25,30,30,20,6/27/2014,7/15/2014,8/25/2014,9/28/2014,25,0,20,0,6/27/2014,7/9/2014,8/25/2014,9/27/2014,4.0,3.0,4.0,3.0,25.0,25.0,17.0,17.0,3.0,3.0,4.0,3.0,1.0,0.0,0.0,0.0,100.0,67.0,68.0,51.0,270.52,159.39,364.24,188.15,0.0,0.0,0.0,0.0,21.91,0.0,0.0,0.0,44.53,2.5,2.6,0.02,0.0,0.0,0.0,0.0,0,0,0,0,3,3,4,3,0,0,0,0,1,0,0,0,1.0,1.0,1.0,1.0,195,0.0,0.0,0.0,0.0
40972,7001943514,109,0.0,0.0,0.0,6/30/2014,7/31/2014,8/31/2014,9/30/2014,571.601,354.175,332.762,357.574,375.09,324.39,304.06,511.39,452.08,293.93,377.28,316.58,17.59,0.0,0.0,0.0,259.68,0.0,0.0,0.0,19.71,36.31,17.43,31.43,101.54,94.98,127.93,91.56,0.0,0.96,0.0,0.0,0.0,0.0,0.0,0.0,121.26,132.26,145.36,122.99,279.81,288.08,286.63,479.96,166.41,197.98,247.44,225.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,446.23,486.06,534.08,704.98,0.0,0.0,0.0,0.0,0.0,0.0,1.9,0.0,0.18,0.0,0.0,0.0,567.68,618.33,681.34,827.98,11.26,31.41,24.93,29.81,72.58,87.96,116.59,85.38,1.79,0.35,0.0,6.46,85.64,119.73,141.53,121.66,6.19,0.0,9.44,15.69,21.61,10.08,17.13,65.88,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27.81,10.08,26.58,81.58,120.06,137.21,168.11,219.88,0.0,0.0,0.0,0.0,6.6,7.4,0.0,16.63,0.0,0.0,0.0,0.0,6,9,3,8,750,483,260,520,128,128,130,130,6/27/2014,7/31/2014,8/28/2014,9/29/2014,128,128,0,0,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,2434,0.0,0.0,0.0,0.0
91450,7001224459,109,0.0,0.0,0.0,6/30/2014,7/31/2014,8/31/2014,9/30/2014,148.51,399.3,273.225,231.588,3.99,2.73,29.39,4.73,239.96,563.44,421.53,230.48,5.18,0.0,0.0,0.0,7.68,0.0,0.0,0.0,3.99,2.73,29.39,4.73,230.84,560.03,417.91,230.33,0.78,3.41,3.61,0.15,0.0,0.0,0.0,0.0,235.63,566.18,450.93,235.21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.65,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.65,0.0,0.0,0.0,0.0,0.0,0.0,4.63,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,236.28,566.18,450.93,239.84,2.55,4.21,7.63,6.64,190.48,347.56,227.24,264.44,56.24,100.81,64.56,43.38,249.28,452.59,299.44,314.48,0.0,0.0,2.7,0.0,0.0,0.0,0.0,0.0,102.13,0.81,0.89,0.0,0.0,0.0,0.0,0.0,102.13,0.81,3.59,0.0,432.09,550.54,410.04,347.19,0.21,0.0,0.0,0.0,80.46,97.13,106.99,32.41,0.0,0.0,0.0,0.3,1,3,2,3,0,500,283,283,0,250,250,250,6/23/2014,7/29/2014,8/28/2014,9/26/2014,0,0,33,33,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,847,0.0,0.0,0.0,0.0
81856,7002305071,109,0.0,0.0,0.0,6/30/2014,7/31/2014,8/31/2014,9/30/2014,275.805,84.021,12.68,16.895,1.61,0.31,0.0,0.0,199.59,39.19,17.36,0.23,0.0,27.18,169.01,43.04,0.0,39.51,17.36,0.23,0.05,0.0,0.0,0.0,23.58,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.63,0.0,0.0,0.0,1.56,0.0,0.0,0.0,176.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,177.58,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,201.21,0.0,0.0,0.0,1.11,0.0,0.0,0.0,82.93,0.0,0.0,0.0,0.0,0.0,0.0,0.0,84.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,84.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20,4,0,3,360,56,0,20,50,30,0,20,6/29/2014,7/30/2014,,9/17/2014,50,0,0,20,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,888,0.0,0.0,0.0,0.0
78136,7000879810,109,0.0,0.0,0.0,6/30/2014,7/31/2014,8/31/2014,9/30/2014,17.8,298.597,-30.349,119.5,3.06,0.0,0.0,2.98,16.74,39.66,7.94,76.74,0.0,9.31,0.0,0.0,0.0,6.68,0.0,0.0,3.06,0.0,0.0,2.98,15.66,31.06,4.23,74.76,0.0,0.0,0.0,0.0,0.0,1.91,3.71,0.0,18.73,31.06,4.23,77.74,0.0,0.0,0.0,0.0,1.08,0.0,0.0,0.16,0.0,0.0,0.0,1.81,0.0,0.0,0.0,0.0,1.08,0.0,0.0,1.98,0.0,0.0,0.0,0.0,0.0,1.91,4.63,0.0,0.0,0.0,0.0,0.0,19.81,32.98,8.86,79.73,4.04,8.54,9.63,16.53,28.53,76.79,42.49,94.01,1.86,11.21,3.74,7.89,34.44,96.56,55.88,118.44,0.0,0.0,0.0,0.0,1.56,1.45,0.41,0.0,1.56,0.0,3.71,2.55,0.0,0.0,0.0,0.0,3.13,1.45,4.13,2.55,37.58,98.01,60.01,120.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,4,2,0,0,352,130,0,0,252,130,0,,7/23/2014,8/22/2014,,0,50,130,0,,7/12/2014,,,,1.0,,,,252.0,,,,0.0,,,,1.0,,,,252.0,,,0.0,11.74,0.0,0.0,0.0,140.89,0.0,0.0,,212.17,,,,212.17,,,,0.0,,,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,,1.0,,,482,319.97,60.04,0.0,0.0
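
The sampled rows suggest two things worth checking early: the date columns are read as plain strings such as `6/27/2014`, and the data-recharge columns are empty for customers who did not recharge data in a given month. The following is a minimal sketch of both checks, using a handful of columns and values copied from the five rows above. The names `csv_text`, `sample_rows`, and `missing_pct` are introduced here for illustration; on the real data the same calls would be applied to `df`.

```python
import io

import pandas as pd

# A few columns and values copied from the five sampled rows above.
csv_text = """mobile_number,date_of_last_rech_6,total_rech_data_6,av_rech_amt_data_6,fb_user_6,aon
7000529804,6/27/2014,4.0,100.0,1.0,195
7001943514,6/27/2014,,,,2434
7001224459,6/23/2014,,,,847
7002305071,6/29/2014,,,,888
7000879810,,,,,482
"""
sample_rows = pd.read_csv(io.StringIO(csv_text))

# Percentage of missing values per column.
missing_pct = sample_rows.isnull().mean().mul(100).round(1)
print(missing_pct)

# Convert the month/day/year strings to datetimes; empty cells become NaT.
sample_rows["date_of_last_rech_6"] = pd.to_datetime(
    sample_rows["date_of_last_rech_6"], format="%m/%d/%Y"
)
print(sample_rows.dtypes)
```

In this small extract the three data-recharge columns are missing in four of five rows, which hints that their missing values may simply mean "no data recharge" rather than unknown values.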
