# Telecom Customer Churn Prediction Model

__What is Churn?__

_Customer churn occurs when a customer stops using a business's services or stops doing business with it altogether.
This model helps you predict customer churn from a set of variables and focus on the factors that can reduce it._

# Importing required Packages

In [1]:
import pandas as pd
import numpy as np

In [2]:
data=pd.read_csv('Telco-Customer-Churn.csv')

In [3]:
data

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0.0,Yes,No,1.0,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0.0,No,No,34.0,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0.0,No,No,2.0,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0.0,No,No,45.0,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0.0,No,No,2.0,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0.0,Yes,Yes,24.0,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,,84.8,,No
7039,2234-XADUH,Female,0.0,Yes,Yes,72.0,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,,103.2,7362.9,No
7040,4801-JZAZL,Female,0.0,Yes,Yes,11.0,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,,29.6,346.45,No
7041,8361-LTMKD,Male,1.0,Yes,No,4.0,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.4,306.6,Yes


# Knowing dimensions of your data

In [4]:
print('Number of Records',data.shape[0])
print('Number of Features',data.shape[1])


Number of Records 7043
Number of Features 21


In [5]:
print('what Features we have in the data \n',data.columns)

what Features we have in the data 
 Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')


In [6]:
#checking data type of columns
data.dtypes

customerID           object
gender               object
SeniorCitizen       float64
Partner              object
Dependents           object
tenure              float64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges       object
TotalCharges         object
Churn                object
dtype: object

# Working with missing values

In [7]:
# check whether the data contains missing values

data.isnull().sum()

customerID           0
gender               0
SeniorCitizen       11
Partner              0
Dependents           0
tenure              91
PhoneService         0
MultipleLines       35
InternetService     28
OnlineSecurity      16
OnlineBackup         7
DeviceProtection    37
TechSupport         33
StreamingTV         10
StreamingMovies     13
Contract            28
PaperlessBilling     0
PaymentMethod        8
MonthlyCharges      76
TotalCharges        71
Churn                0
dtype: int64

The output above shows that several columns contain missing values.

In [8]:
data.isnull().values.sum()

464

__There are 464 null values__ in the data. Even if each one fell in a different row, that would be __about 6.6% of the 7,043 records__ (464 / 7043). So we can't just remove every affected row; we need to impute these missing values.
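
The count from `data.isnull().values.sum()` is a number of cells, not rows, and one row can hold several nulls. The sketch below shows how the two figures could be compared. The `toy` frame and its values are made up for illustration and are not taken from the churn data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the churn data (hypothetical values).
toy = pd.DataFrame({
    "tenure": [1.0, np.nan, 34.0, np.nan],
    "Contract": ["One year", None, "Two year", "One year"],
})

# Total number of missing cells across the whole frame.
null_cells = int(toy.isnull().values.sum())

# Number of rows that contain at least one missing cell.
rows_with_null = int(toy.isnull().any(axis=1).sum())

# Share of rows that would be lost if every affected row were dropped.
share_of_rows = rows_with_null / len(toy)

print(null_cells, rows_with_null, share_of_rows)
```

On the real data, the same `any(axis=1)` check would give the exact share of records that dropping rows would remove.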

___Imputing 'SeniorCitizen' Column___

In [9]:
data['SeniorCitizen']

0       0.0
1       0.0
2       0.0
3       0.0
4       0.0
       ... 
7038    0.0
7039    0.0
7040    0.0
7041    1.0
7042    0.0
Name: SeniorCitizen, Length: 7043, dtype: float64

__Column 'SeniorCitizen' has two values:__
1 -> SeniorCitizen
0 -> Not a SeniorCitizen

__Let's Map 1 to 'Yes' and 0 to 'No'__

In [10]:
data['SeniorCitizen']=data['SeniorCitizen'].replace(1.0,'Yes')

In [11]:
data['SeniorCitizen']=data['SeniorCitizen'].replace(0.0,'No')

In [12]:
data['SeniorCitizen']

0        No
1        No
2        No
3        No
4        No
       ... 
7038     No
7039     No
7040     No
7041    Yes
7042     No
Name: SeniorCitizen, Length: 7043, dtype: object
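
The two `replace` calls can also be written as a single dictionary mapping. This is a minimal sketch on a made-up series named `flags`, not the notebook's own cell. Missing entries stay missing after the mapping, so they can still be imputed afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'SeniorCitizen' column.
flags = pd.Series([0.0, 1.0, np.nan, 0.0], name="SeniorCitizen")

# Map both codes in one step; values not in the dictionary (here NaN) stay NaN.
labels = flags.map({1.0: "Yes", 0.0: "No"})

print(labels.tolist())
```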

In [13]:
data['SeniorCitizen'].value_counts()

No     5891
Yes    1141
Name: SeniorCitizen, dtype: int64

__Since it is a categorical variable, we will replace null values with the most frequent entry (the mode), which is 'No'.__

In [14]:
data['SeniorCitizen']=data['SeniorCitizen'].fillna('No')

In [15]:
data['SeniorCitizen'].value_counts()

No     5902
Yes    1141
Name: SeniorCitizen, dtype: int64
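
Instead of reading the most frequent value off `value_counts()` and typing it in by hand, it can be looked up with `Series.mode()`. The sketch below uses a small made-up series `col`, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
col = pd.Series(["No", "Yes", None, "No", None], name="SeniorCitizen")

# mode() skips missing values by default and returns a Series;
# element 0 is the most frequent value.
most_frequent = col.mode()[0]

# Fill the gaps with that value and assign the result back.
filled = col.fillna(most_frequent)

print(most_frequent, filled.tolist())
```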

___Imputing 'tenure' Column___

In [16]:
data['tenure'].isnull().sum()

91

Since it is a numeric column, we can replace missing values with the mean or the median. Let's calculate both for 'tenure'.

In [17]:
mean_tenure=data['tenure'].mean()
print('mean of tenure',mean_tenure)
median_tenure=data['tenure'].median()
print('median of tenure',median_tenure)
print('min tenure',data['tenure'].min())
print('Max Tenure',data['tenure'].max())

mean of tenure 32.32853855005754
median of tenure 29.0
min tenure 0.0
Max Tenure 72.0


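The mean (32.3) is noticeably higher than the median (29.0), which suggests the distribution is skewed toward long tenures. The median is less sensitive to such extreme values, so it's better to replace missing values with the median.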

In [18]:
data['tenure']=data['tenure'].fillna(median_tenure)

In [19]:
data['tenure'].isnull().sum()

0
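
To see why the median is the safer fill value when a few large values pull the mean upward, consider this small sketch. The numbers are made up and are not from the churn data.

```python
import numpy as np
import pandas as pd

# Hypothetical tenures: four short ones, one very long one, one missing.
tenure = pd.Series([1.0, 2.0, 3.0, 4.0, 70.0, np.nan])

# mean() and median() both skip NaN by default.
mean_value = tenure.mean()      # pulled up by the single large value
median_value = tenure.median()  # stays with the bulk of the data

# Impute with the median and assign the result to a new series.
filled = tenure.fillna(median_value)

print(mean_value, median_value, filled.tolist())
```

Here the mean is 16.0 while the median is 3.0, so a median fill keeps the imputed value close to what most customers in the sample look like.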

___Imputing "MultipleLines"___

In [20]:
data['MultipleLines'].unique()

array(['No phone service', 'No', 'Yes', nan], dtype=object)

In [21]:
data['MultipleLines'].value_counts()

No                  3374
Yes                 2953
No phone service     681
Name: MultipleLines, dtype: int64

If a customer doesn't have phone service, they can't have multiple lines. So first we replace 'No phone service' with 'No'.

In [22]:
data['MultipleLines']=data['MultipleLines'].replace('No phone service','No')

In [23]:
data['MultipleLines'].value_counts()

No     4055
Yes    2953
Name: MultipleLines, dtype: int64

In [24]:
data['MultipleLines'].isnull().sum()

35

Replacing null values with 'No', the most frequent value

In [25]:
data['MultipleLines']=data['MultipleLines'].fillna('No')

In [26]:
data['MultipleLines'].isnull().sum()

0

___We need to simplify the data. If a customer doesn't have an internet connection, the features that depend on it are unavailable. So we convert the entry 'No internet service' to 'No' in all those variables.___

In [27]:
Test=['OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']
for i in Test:
    data[i]=data[i].replace('No internet service','No')

In [28]:
data['OnlineBackup'].value_counts()

No     4612
Yes    2424
Name: OnlineBackup, dtype: int64
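
The same clean-up can be done without a loop by selecting the columns together and calling `DataFrame.replace` once. This is a minimal sketch on a made-up frame. `toy` and `internet_cols` are illustrative names, and only two of the six columns are shown.

```python
import pandas as pd

# Hypothetical slice of the data with two internet-dependent columns.
toy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "No internet service", "No"],
    "StreamingTV": ["No internet service", "Yes", "No"],
    "InternetService": ["DSL", "No", "Fiber optic"],
})

internet_cols = ["OnlineSecurity", "StreamingTV"]

# Replace the label in the selected columns only and assign the result back.
toy[internet_cols] = toy[internet_cols].replace("No internet service", "No")

print(toy)
```

Columns outside `internet_cols`, such as 'InternetService', are left untouched.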

___Imputing 'InternetService' Feature___

In [29]:
data['InternetService'].value_counts()

Fiber optic    3082
DSL            2411
No             1522
Name: InternetService, dtype: int64

In [30]:
data['InternetService'].isnull().sum()

28

In [31]:
data['InternetService']=data['InternetService'].fillna('Fiber optic')

In [32]:
data['InternetService'].isnull().sum()

0

__"OnlineSecurity"__

In [33]:
data["OnlineSecurity"].isnull().sum()

16

In [34]:
data['OnlineSecurity'].value_counts()

No     5012
Yes    2015
Name: OnlineSecurity, dtype: int64

In [35]:
data['OnlineSecurity']=data['OnlineSecurity'].fillna('No')

In [36]:
data["OnlineSecurity"].isnull().sum()

0

__"OnlineBackup"__

In [37]:
data["OnlineBackup"].isnull().sum()

7

In [38]:
data['OnlineBackup'].value_counts()

No     4612
Yes    2424
Name: OnlineBackup, dtype: int64

In [39]:
data['OnlineBackup']=data['OnlineBackup'].fillna('No')

In [40]:
data['OnlineBackup'].isnull().sum()

0

__"DeviceProtection"__

In [41]:
data['DeviceProtection'].isnull().sum()

37

In [42]:
data['DeviceProtection'].value_counts()

No     4596
Yes    2410
Name: DeviceProtection, dtype: int64

In [43]:
data['DeviceProtection']=data['DeviceProtection'].fillna('No')

In [44]:
data['DeviceProtection'].isnull().sum()

0

__"TechSupport"__

In [45]:
data["TechSupport"].isnull().sum()

33

In [46]:
data['TechSupport'].value_counts()

No     4975
Yes    2035
Name: TechSupport, dtype: int64

In [47]:
data['TechSupport']=data['TechSupport'].fillna('No')

__"StreamingTV"__

In [48]:
data['StreamingTV'].isnull().sum()

10

In [49]:
data['StreamingTV'].value_counts()

No     4329
Yes    2704
Name: StreamingTV, dtype: int64

In [50]:
data['StreamingTV']=data['StreamingTV'].fillna('No')

__"Contract"__

In [51]:
data['Contract'].isnull().sum()

28

In [52]:
data['Contract'].value_counts()

Month-to-month    3865
Two year          1688
One year          1462
Name: Contract, dtype: int64

In [53]:
data['Contract']=data['Contract'].fillna('Month-to-month')

__"PaymentMethod"__

In [54]:
data['PaymentMethod'].isnull().sum()

8

In [55]:
data['PaymentMethod'].value_counts()

Electronic check             2363
Mailed check                 1611
Bank transfer (automatic)    1542
Credit card (automatic)      1519
Name: PaymentMethod, dtype: int64

In [56]:
data['PaymentMethod']=data['PaymentMethod'].fillna('Electronic check')

In [57]:
data['PaymentMethod'].isnull().sum()

0

__"MonthlyCharges"__

In [58]:
data['MonthlyCharges'].isnull().sum()

76

In [59]:
#converting MonthlyCharges to float

data.MonthlyCharges=pd.to_numeric(data.MonthlyCharges,errors="coerce")
data.MonthlyCharges.astype(float)

0        29.85
1        56.95
2        53.85
3        42.30
4        70.70
         ...  
7038     84.80
7039    103.20
7040     29.60
7041     74.40
7042    105.65
Name: MonthlyCharges, Length: 7043, dtype: float64

In [60]:
MonthlyCharges_mean=data['MonthlyCharges'].mean()
print("Mean of Monthly Charges",MonthlyCharges_mean)
MonthlyCharges_median=data['MonthlyCharges'].median()
print("Median of Monthly Charges",MonthlyCharges_median)
print('Max of Monthly charges',data['MonthlyCharges'].max())
print('Min of Monthly charges',data['MonthlyCharges'].min())

Mean of Monthly Charges 64.75242606948036
Median of Monthly Charges 70.35
Max of Monthly charges 118.75
Min of Monthly charges 18.25


In [61]:
#replacing null values with median
data['MonthlyCharges']=data['MonthlyCharges'].fillna(MonthlyCharges_median)

In [62]:
data['MonthlyCharges'].isnull().sum()

0

__"TotalCharges"__

In [63]:

data.TotalCharges=pd.to_numeric(data.TotalCharges,errors="coerce")
data.TotalCharges.astype(float)

0         29.85
1       1889.50
2        108.15
3       1840.75
4        151.65
         ...   
7038        NaN
7039    7362.90
7040     346.45
7041     306.60
7042    6844.50
Name: TotalCharges, Length: 7043, dtype: float64

In [64]:
data['TotalCharges'].isnull().sum()

82
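
The null count for 'TotalCharges' rises from 71 in the first check to 82 after the conversion, which suggests that 11 entries could not be parsed as numbers and were turned into NaN by `errors="coerce"`. The sketch below shows that behaviour on a made-up series. The blank-string entry is an assumption about what such unparseable values might look like, not something confirmed by the output above.

```python
import pandas as pd

# Hypothetical raw column: numeric strings, one blank string, one real null.
raw = pd.Series(["29.85", " ", "1889.5", None], dtype=object)

# Only the real null is counted before conversion.
before = int(raw.isnull().sum())

# errors="coerce" turns anything that cannot be parsed into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# The blank string is now missing as well.
after = int(numeric.isnull().sum())

print(before, after, numeric.dtype)
```

Counting nulls both before and after the conversion is a quick way to see how many values were silently coerced.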

In [65]:
TotalCharges_mean=data.TotalCharges.mean()
print('Mean of Total charges',TotalCharges_mean)
TotalCharges_median=data.TotalCharges.median()
print("median of Total charges",TotalCharges_median)
print('max of Total charges',data.TotalCharges.max())
print('Min of Total charges',data.TotalCharges.min())

Mean of Total charges 2282.29807498923
median of Total charges 1396.0
max of Total charges 8684.8
Min of Total charges 18.8


In [66]:
#replacing null values with median
data['TotalCharges']=data['TotalCharges'].fillna(TotalCharges_median)

In [67]:
data['TotalCharges'].isnull().sum()

0

In [69]:
data['TotalCharges'].median()

1396.0

In [72]:
data.isnull().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies     13
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges         0
Churn                0
dtype: int64
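
The final check still reports 13 nulls in 'StreamingMovies', which none of the cells above impute. One way to avoid leaving a column behind is to loop over every column and apply the same rules used throughout this section: the median for numeric columns and the most frequent value for categorical ones. This is a minimal sketch on a made-up frame `toy`, not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one categorical and one numeric column.
toy = pd.DataFrame({
    "StreamingMovies": ["No", "Yes", None, "No"],
    "tenure": [1.0, np.nan, 3.0, 5.0],
})

for name in toy.columns:
    # Skip columns that have nothing to impute.
    if not toy[name].isnull().any():
        continue
    if pd.api.types.is_numeric_dtype(toy[name]):
        fill_value = toy[name].median()   # numeric: median
    else:
        fill_value = toy[name].mode()[0]  # categorical: most frequent value
    toy[name] = toy[name].fillna(fill_value)

print(toy.isnull().sum())
```

For the notebook itself, a single mode-based `fillna` on 'StreamingMovies', in the same style as the other service columns, would close the remaining gap.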