__Customer Churn using Telco Dataset__

### **Importing the packages**

In [None]:
##Importing the packages
#Data processing packages
import numpy as np 
import pandas as pd 

#Visualization packages
import matplotlib.pyplot as plt 
import seaborn as sns 

import warnings
warnings.filterwarnings('ignore')

### **Importing the data**

In [None]:
# The dataset contains information on 7043 customers and their churn value.
data = pd.read_csv('../input/telcom-customer-churn/customer_churn_data.csv')

### **Basic Analysis**

In [None]:
#Find the size of the data Rows x Columns
data.shape

**COMMENTS:** The data consists of 7043 rows and 21 columns

In [None]:
#Display first 5 rows of Customer Data
data.head()

In [None]:
#Find Basic Statistics like count, mean, standard deviation, min, max etc.
data.describe(include='all')

**COMMENTS:** 
1. A count of 7043 for every field indicates that no field contains NaN values. Placeholder values such as blank strings are still counted, so text fields like TotalCharges may need a separate check.
2. Minimum (min) and maximum (max) define the range of values for each numeric field.
3. Mean (mean) is the average of the values in a numeric field. The numeric fields are on very different scales (for example, tenure in months versus MonthlyCharges), so the data may need scaling before modelling.
4. The 25%, 50% and 75% percentiles describe the distribution of the data.
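
Comment 3 can be made concrete with z-score standardization, one common way to bring numeric fields onto a comparable scale. The sketch below uses pandas only; the toy frame and its values are made up for illustration and are not taken from the Telco data (scikit-learn's `StandardScaler` is a common alternative).

```python
import pandas as pd

# Hypothetical toy frame standing in for two numeric Telco columns
toy = pd.DataFrame({
    "tenure": [1, 12, 24, 72],
    "MonthlyCharges": [29.85, 56.95, 53.85, 105.65],
})

# z-score standardization: subtract each column's mean and divide by its
# standard deviation, so every column ends up with mean 0 and std 1
scaled = (toy - toy.mean()) / toy.std()

print(scaled.round(2))
```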

In [None]:
#Find the information about the fields, field datatypes and Null values
data.info()

**COMMENTS:**  The info function lists every field name, its datatype and its count of non-null values, which shows whether a field contains Null values.
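
A non-null count only rules out NaN values: a placeholder such as a blank string is still counted as present by `count()`, `describe()` and `info()`. The sketch below shows this on a made-up column; whether the Telco file contains such placeholders has to be checked on the real data.

```python
import pandas as pd

# Hypothetical toy column: one value is a single space rather than a number
toy = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

# count() / info() treat the blank string as a valid (non-null) value...
print(toy["TotalCharges"].count())          # 4
print(toy["TotalCharges"].isnull().sum())   # 0

# ...so blank strings have to be searched for explicitly
blank_mask = toy["TotalCharges"].str.strip() == ""
print(blank_mask.sum())                     # 1
```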

### **Visualizing the impact of Categorical Features on the Target**

In [None]:
#customerID is a unique identifier that adds no predictive value, hence it is removed
data = data.drop(['customerID'], axis = 1)

In [None]:
#Confirm that customerID column is dropped
data.head()

In [None]:
#Find attrition size (Values)
data['Churn'].value_counts()

**COMMENTS:**  1869 of the 7043 customers left the Telco

### **Convert Categorical values to Numeric Values**

In [None]:
#A lambda function is a small anonymous function.
#A lambda function can take any number of arguments, but can only have one expression.
data['Churn']=data['Churn'].apply(lambda x : 1 if x=='Yes' else 0)
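
The lambda above maps `'Yes'` to 1 and every other value to 0. A dictionary passed to `Series.map` is an equivalent, slightly stricter alternative; the sketch below compares both on a made-up Series.

```python
import pandas as pd

# Hypothetical raw labels, as they appear before conversion
churn = pd.Series(["Yes", "No", "No", "Yes"])

# The notebook's approach: a lambda applied element-wise
via_apply = churn.apply(lambda x: 1 if x == "Yes" else 0)

# An equivalent dictionary lookup; unlike the lambda, any label other than
# "Yes"/"No" becomes NaN instead of silently turning into 0
via_map = churn.map({"Yes": 1, "No": 0})

print(via_apply.tolist())  # [1, 0, 0, 1]
print(via_map.tolist())    # [1, 0, 0, 1]
```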

In [None]:
#Count the churn values after conversion. The output shows that 1869 customers churned (left) last month
data.Churn.value_counts()

In [None]:
#Compare gender with Churn using crosstab.
#pd.crosstab(data._______, data._______)
pd.crosstab(data.gender, data.Churn)

### **Compare the fields**

In [None]:
#Compare gender with Churn using crosstab. Add Total(margins)
#pd.crosstab(data._______, data._______, margins=True)
pd.crosstab(data.gender, data.Churn, margins=True)

In [None]:
#Compare gender with Churn using crosstab. Add Total(margins). Make it colorful.
#pd.crosstab(data._______, data._______, margins=True)
pd.crosstab(data.gender, data.Churn, margins=True).style.background_gradient(cmap='autumn_r')

In [None]:
#Compare gender with Churn using crosstab. Add Total(margins). Make it colorful. Normalize the data
#pd.crosstab(data._______, data._______, margins=True, normalize='index').style.background_gradient(cmap='autumn_r')
pd.crosstab(data.gender, data.Churn, margins=True, normalize='index').style.background_gradient(cmap='autumn_r')

In [None]:
#Compare gender with Churn using crosstab. Add Total(margins). Make it colorful. Normalize the data. Round it to two digits after decimal
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(__).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.gender, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  About 27% of the customers left the Telco, and the churn rate is almost the same for males and females
**RECOMMENDED ACTION:**  No gender-specific action is required to reduce the churn rate
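
The normalized crosstabs in this section rely on two arguments: `normalize='index'` divides each row by its row total, and `margins=True` adds an `All` row holding the overall proportions. A minimal sketch on made-up data (not the Telco values) shows what the resulting numbers mean.

```python
import pandas as pd

# Hypothetical toy data: 4 females (1 churned) and 4 males (2 churned)
toy = pd.DataFrame({
    "gender": ["Female"] * 4 + ["Male"] * 4,
    "Churn":  [1, 0, 0, 0, 1, 1, 0, 0],
})

# normalize='index' divides each row by its row total, so every row sums to 1
# and column 1 is the churn rate of that group; margins=True adds an 'All' row
rates = pd.crosstab(toy.gender, toy.Churn, margins=True, normalize='index')
print(rates)
```

On the full dataset, the churn column of the `All` row corresponds to 1869 / 7043 ≈ 0.265, which rounds to the 27% quoted above.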

In [None]:
#Compare Partner with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.Partner, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who have partners are less likely to leave
**RECOMMENDED ACTION:**  Target customers who have partners, as they have a higher retention rate

In [None]:
#Compare Dependents with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.Dependents, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who have dependents are less likely to leave
**RECOMMENDED ACTION:**  Target customers who have dependents. Family plans may help

In [None]:
#Compare PhoneService with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.PhoneService, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  PhoneService has little impact on customers leaving
**RECOMMENDED ACTION:**  No action is required

In [None]:
#Compare MultipleLines with Churn using crosstab
pd.crosstab(data.MultipleLines, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  The number of lines (MultipleLines) has little impact on customers leaving
**RECOMMENDED ACTION:**  No action is required

In [None]:
#Compare InternetService with Churn using crosstab
pd.crosstab(data.InternetService, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers with a Fiber Optic internet connection have a higher probability of leaving
**RECOMMENDED ACTION:**  Recommend DSL connections to customers and investigate whether there are problems with the Fiber Optic service
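
Because `Churn` was converted to 0/1 earlier, the churn rate per category can also be read off a `groupby` mean, which makes it easy to sort categories from highest to lowest churn. The sketch below uses made-up rows and checks that both approaches agree.

```python
import pandas as pd

# Hypothetical toy data; Churn is already encoded as 0/1
toy = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber optic", "Fiber optic", "No", "No"],
    "Churn":           [0, 1, 1, 1, 0, 0],
})

# Because Churn is 0/1, the mean per group is the fraction of churners,
# i.e. the same number as column 1 of crosstab(..., normalize='index')
churn_rate = (
    toy.groupby("InternetService")["Churn"].mean().sort_values(ascending=False)
)
print(churn_rate)

# The crosstab equivalent, for comparison
ct = pd.crosstab(toy.InternetService, toy.Churn, normalize="index")[1]
```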

In [None]:
#Compare OnlineSecurity with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.OnlineSecurity, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are not provided Online Security have a higher chance of leaving
**RECOMMENDED ACTION:**  Provide OnlineSecurity to customers as a default offering or as a value-added service with a minimal fee

In [None]:
#Compare OnlineBackup with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.OnlineBackup, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are NOT provided Online Backup have a higher chance of leaving
**RECOMMENDED ACTION:**  Offer OnlineBackup as a default offering or as a value-added service with a minimal fee, and investigate why customers without this service are leaving

In [None]:
#Compare DeviceProtection with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.DeviceProtection, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are NOT provided Device Protection have a higher chance of leaving
**RECOMMENDED ACTION:**  Provide DeviceProtection to customers as a default offering or as a value-added service with a minimal fee

In [None]:
#Compare TechSupport with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.TechSupport, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are not provided Tech Support have a higher chance of leaving
**RECOMMENDED ACTION:**  Provide Tech Support to customers

In [None]:
#Compare StreamingTV with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.StreamingTV, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are NOT provided StreamingTV have a higher chance of leaving
**RECOMMENDED ACTION:**  Encourage customers to use the StreamingTV service by reducing its charges or offering promotions

In [None]:
#Compare StreamingMovies with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.StreamingMovies, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who are NOT provided StreamingMovies have a higher chance of leaving
**RECOMMENDED ACTION:**  Encourage customers to use the StreamingMovies service by reducing its charges or offering promotions

In [None]:
#Compare Contract with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.Contract, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers on shorter contracts have a higher chance of leaving
**RECOMMENDED ACTION:**  Encourage customers to sign longer contracts by reducing the charges or offering attractive incentives

In [None]:
#Compare PaperlessBilling with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.PaperlessBilling, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who use PaperlessBilling have a higher chance of leaving
**RECOMMENDED ACTION:**  Customers may still prefer paper bills, so continue offering paper billing as an option in addition to paperless billing

In [None]:
#Compare PaymentMethod with Churn using crosstab
#pd.crosstab(data._______, data._______, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')
pd.crosstab(data.PaymentMethod, data.Churn, margins=True, normalize='index').round(2).style.background_gradient(cmap='autumn_r')

**OBSERVATION:**  Customers who pay by Electronic Check have a higher chance of leaving
**RECOMMENDED ACTION:**  Look for issues with the Electronic Check facility that may be causing churn, and either fix them or discontinue the service
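
The crosstabs above repeat the same call for one column at a time. The sketch below assumes a hypothetical helper, `churn_rate_spread`, that loops over the non-numeric columns and ranks them by the gap between their highest and lowest churn rate. This is one simple way to summarize the comparison, not a statistical test, and the toy data is made up.

```python
import pandas as pd

def churn_rate_spread(df, target="Churn"):
    """Hypothetical helper: for every non-numeric column, compute the churn
    rate per category and return the gap between the highest and lowest
    rate, largest gap first."""
    spreads = {}
    for col in df.columns:
        # Skip the target itself and numeric columns such as tenure
        if col == target or pd.api.types.is_numeric_dtype(df[col]):
            continue
        rates = df.groupby(col)[target].mean()
        spreads[col] = rates.max() - rates.min()
    return pd.Series(spreads).sort_values(ascending=False)

# Hypothetical toy data with two categorical features and one numeric feature
toy = pd.DataFrame({
    "gender":   ["Female", "Male", "Female", "Male"],
    "Contract": ["Month-to-month", "Month-to-month", "Two year", "Two year"],
    "tenure":   [1, 3, 60, 70],
    "Churn":    [1, 1, 0, 0],
})
print(churn_rate_spread(toy))
```

If `TotalCharges` is still stored as text, the helper would treat it as a categorical column with many levels, so convert or exclude it before running the helper on `data`.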

In [None]:
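#If TotalCharges is read as text (object dtype in data.info()), astype('int64') fails on its values;
#pd.to_numeric with errors='coerce' converts it and turns unparseable entries into NaN instead
#data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')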
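
If `TotalCharges` is read as text, `astype('int64')` fails on decimal strings and on blank entries. A minimal sketch on a made-up Series shows `pd.to_numeric` with `errors='coerce'` as one way around this; rows that become NaN would still need to be dropped or filled before analysis.

```python
import pandas as pd

# Hypothetical toy column read as text, including a blank entry
total = pd.Series(["29.85", "1889.5", " ", "108.15"])

# astype('int64') would raise an error on these strings; to_numeric with
# errors='coerce' parses what it can and turns the blank into NaN
converted = pd.to_numeric(total, errors="coerce")
print(converted.dtype)          # float64
print(converted.isna().sum())   # 1
```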

### **Visualizing the impact of Numerical Features on the Target**

In [None]:
#Plot the pairplot of all the Numerical parameters(data) against Churn
#sns.pairplot(data=_______,hue='______')
sns.pairplot(data=data,hue='Churn')

In [None]:
g = sns.pairplot(data=data[['MonthlyCharges','Churn']],hue='Churn')
g.fig.set_size_inches(10,10)

In [None]:
data.hist(layout = (9, 3), figsize=(24, 48), color='blue', grid=False, bins=50)

**OBSERVATIONS:**  
1. Senior Citizens are a small share of the customer base and churn at a higher rate than non-senior customers
2. The longer the tenure, the better the retention
3. The lower the monthly charges, the better the retention
**RECOMMENDED ACTIONS:**  
1. Look for opportunities to improve retention among Senior Citizen customers
2. Put more focus on retaining existing customers, since retention improves the longer they stay
3. Look for possibilities to reduce the monthly charges
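
Observation 2 can be checked numerically by bucketing `tenure` with `pd.cut` and computing the churn rate per bucket. The bucket edges and the toy rows below are assumptions chosen for illustration; `include_lowest=True` keeps a tenure of exactly 0 in the first bucket.

```python
import pandas as pd

# Hypothetical toy data: tenure in months and a 0/1 churn flag
toy = pd.DataFrame({
    "tenure": [1, 2, 5, 14, 20, 30, 50, 65, 70, 72],
    "Churn":  [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
})

# Group tenure into buckets (assumed edges), then take the churn rate per bucket;
# a falling rate across buckets is what "longer tenure, better retention" means
bins = [0, 12, 24, 48, 72]
labels = ["0-12", "13-24", "25-48", "49-72"]
toy["tenure_group"] = pd.cut(toy["tenure"], bins=bins, labels=labels, include_lowest=True)
rate_by_group = toy.groupby("tenure_group", observed=False)["Churn"].mean()
print(rate_by_group)
```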

In [None]:
#Comparing the numeric fields SeniorCitizen, tenure and MonthlyCharges against Customer Churn using boxplots
plt.figure(figsize=(24,12))
#plt.subplot(131)  ; sns.boxplot(x='______',y='______',data=data)
#plt.subplot(132)  ; sns.boxplot(x='______',y='______',data=data)
#plt.subplot(133)  ; sns.boxplot(x='______',y='______',data=data)
plt.subplot(131)  ; sns.boxplot(x='Churn',y='SeniorCitizen',data=data)
plt.subplot(132)  ; sns.boxplot(x='Churn',y='tenure',data=data)
plt.subplot(133)  ; sns.boxplot(x='Churn',y='MonthlyCharges',data=data)
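
The line inside each box of a boxplot marks the group median, so the same comparison can be printed as numbers. The sketch below uses made-up rows; on the real data the same `groupby` call would be applied to `data`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "tenure":         [2, 5, 8, 40, 60, 70],
    "MonthlyCharges": [90.0, 85.0, 70.0, 20.0, 55.0, 60.0],
    "Churn":          [1, 1, 1, 0, 0, 0],
})

# Median of each numeric field per churn group: the numbers behind the boxes
medians = toy.groupby("Churn")[["tenure", "MonthlyCharges"]].median()
print(medians)
```

Since `SeniorCitizen` is a 0/1 flag, its boxplot carries little information; a churn rate per group, as in the crosstabs above, is usually easier to read.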
