# Telecom Churn Analysis

### Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription.
### Explore and analyze the data to discover the key factors responsible for customer churn, and recommend ways to improve customer retention.

## This Notebook is prepared by 
### ***santosh shinde***

# **Business Understanding Of A Telecom Industry Customer Churn:**

In recent years, customer churn prediction has become a hot research topic. Churners are people who leave a company's service for various reasons. 

Companies must be able to accurately predict customer behaviour in order to reduce customer churn. Customer churn has emerged as a major issue in all industries. According to studies, acquiring a new customer is more expensive than retaining an existing one.

To retain existing customers, service providers need to know the reasons for churn, which can be uncovered from the data.

We will perform EDA (Exploratory Data Analysis) to extract actionable insights and present them so that the company can take the necessary actions to prevent further churn.


# **GitHub Link -**

GitHub Link - https://github.com/santy1586/Data_Analysis

# **Problem Statement**


Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation. The Orange Telecom Churn Dataset consists of cleaned customer data, along with a churn label specifying whether a customer cancelled the subscription.




#### **Business Objective**

As per the organization's requirements, we need to explore and analyze the data through Exploratory Data Analysis, find the key factors responsible for customer churn, and work out how further churn can be prevented.
* Maximize the company's profit by retaining customers
* Minimize customer churn by identifying its key causes

## ***Now we will import libraries as well as the data which was provided by the Organization.***

### Import Modules and Loading Data

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import missingno as msno

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#insert the data file 
path='/content/drive/MyDrive/'
df= pd.read_csv(path + 'Telecom Churn.csv')

### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.drop('Area code',axis = 1, inplace = True)

Since we already have the State variable, we don't need the area code.

In [None]:
df

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()  # number of fully duplicated rows

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
missing = (df.isnull().sum() * 100 / df.shape[0]).reset_index()
missing.columns = ['column', 'percent_missing']
plt.figure(figsize=(16,5))
# Recent seaborn releases require x and y as keyword arguments
ax = sns.pointplot(x='column', y='percent_missing', data=missing)
plt.xticks(rotation =90,fontsize =7)
plt.title("Percentage of Missing values")
plt.ylabel("PERCENTAGE")
plt.show()

### What did you know about your dataset?

***The original dataset has 3333 rows and 20 columns (19 after dropping `Area code`). Only one column is boolean.***

***8 variables are of float data type.***

***8 variables are of integer data type (counting `Area code`).***

***3 variables are of object type, i.e. they hold categorical values.***

***So far, the dataset provided by the organization has clean features with no missing or null values.***
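The counts above can be computed instead of read off `df.info()`. The following is a minimal sketch, assuming a tiny made-up frame named `toy` (hypothetical, not part of the notebook) stands in for `df`; the same three calls apply unchanged to the real data.

```python
import pandas as pd

# Hypothetical miniature stand-in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "State": ["NJ", "CA", "TX"],
    "Account length": [128, 107, 137],
    "Total day minutes": [265.1, 161.6, 243.4],
    "Churn": [False, False, True],
})

# Number of columns per dtype
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)

# Total number of missing cells and of fully duplicated rows
n_missing = int(toy.isnull().sum().sum())
n_duplicates = int(toy.duplicated().sum())
print("missing cells:", n_missing, "| duplicated rows:", n_duplicates)
```

Replacing `toy` with `df` gives the dtype breakdown and the missing/duplicate totals for the full dataset in one place.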

## ***2. Understanding Your Variables***

In [None]:
df.columns



**State:**
51 unique state names

**Account length:**
Length of the account

**International plan:**
Yes indicates the customer has an International plan; No indicates no subscription to an International plan

**Voice Mail Plan:**
Yes Indicates Voice Mail Plan is Present and No Indicates no subscription for Voice Mail Plan

**Number vmail messages:**
Number of Voice Mail Messages ranging from 0 to 50

**Total day minutes:**
Total number of minutes used during the day

**Total day calls:**
Total number of calls made during the day

**Total day charge:**
Total charge to the customer for daytime usage

**Total eve minutes:**
Total number of minutes used in the evening

**Total eve calls:**
Total number of calls made in the evening

**Total eve charge:**
Total charge to the customer for evening usage

**Total night minutes:**
 Total Number of Minutes Spent  in the Night.

**Total night calls:**
 Total Number of Calls made  in Night.

**Total night charge:**
 Total Charge to the Customers in Night.

**Customer service calls:**
Number of customer service calls made by the customer

**Churn:**
Customer churn: True means a churned customer, False means a retained customer
 

In [None]:
# Dataset Describe
df.describe().T.astype(int)

### Variables Description 

None of the variables in the dataset has missing or null values, so the count is the same for every variable.

There are more numerical variables than categorical ones, and the two plan variables are binary (Yes/No), which makes numerical analysis easier.








### Check Unique Values for each variable.

In [None]:
def unique(df):
  return df.unique()

In [None]:
df.apply(unique)

In [None]:
df['Churn'].value_counts()

## 3. ***Data Wrangling***

The dataset provided by the organization includes the `Churn` label for every customer.
This feature will be our target variable.

### Univariate Analysis

In [None]:
# Share of retained vs. churned customers, in percent (selected by label, not position)
churn_pct = df['Churn'].value_counts(normalize=True) * 100
print('Customers who did not leave the service and are still using it: {} % of total'.format(round(churn_pct.loc[False])))
print('Customers who left the service: {} % of total'.format(round(churn_pct.loc[True])))
x=df.Churn.value_counts()
sns.barplot(x=x.index.astype(str), y=x.values)
plt.gca().set_xlabel('Churn')
plt.gca().set_ylabel('Number of customers')

> First, I got to know the dataset by collecting basic information about it.

> One variable, 'Area code', was not required because the dataset already has State names, so I dropped it.

> Next, I checked thoroughly for missing or NaN values and found none. The overall data is clean.

> Finally, with the help of a bar plot, I found what percentage of customers churned. 'Churn' is our target variable.

### Analyzing State Column

In [None]:
df['State'].value_counts()

In [None]:
sns.set(style='darkgrid')
plt.figure(figsize=(14,8))
state= sns.countplot(x='State',hue='Churn',data=df)
plt.show()

Observations:
West Virginia has the highest number of non-churning customers of all the states.

California has the lowest number of non-churning customers of all the states.

Similarly,
New Jersey has the highest number of churning customers of all the states.

Iowa, Hawaii and Alaska have the lowest numbers of churning customers.

In [None]:
plt.rcParams['figure.figsize'] = (12, 7)
color = plt.cm.copper(np.linspace(0, 0.5, 20))
((df.groupby(['State'])['Churn'].mean())*100).sort_values(ascending = False).head(5).plot.bar(color = ['violet','indigo','blue','grey','yellow','orange','red'])
plt.title(" State with Top 5 churn percentage", fontsize = 20)
plt.xlabel('state', fontsize = 15)
plt.ylabel('percentage', fontsize = 15)
plt.show()

In [None]:
df.groupby(['State'])['Churn'].mean().sort_values(ascending=False).head(5)

From the above analysis, New Jersey, California, Texas, Maryland and South Carolina are the top 5 states by churn rate.

A probable reason is poor range or low coverage of the cellular network in these states, although the dataset contains no coverage information to confirm this.

The organization may want to look into it in order to prevent further churn.
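A churn rate per state is easier to judge next to the number of customers behind it, because a state with few customers can top the ranking on a handful of churners. The sketch below assumes a small made-up frame `toy` (hypothetical data, with only the `State` and `Churn` columns) and uses named aggregation to report both figures together.

```python
import pandas as pd

# Hypothetical data: CA has only two customers, NJ and HI have four each.
toy = pd.DataFrame({
    "State": ["NJ", "NJ", "NJ", "NJ", "CA", "CA", "HI", "HI", "HI", "HI"],
    "Churn": [True, True, False, False, True, True, False, False, False, True],
})

state_summary = (
    toy.groupby("State")["Churn"]
    .agg(customers="count", churned="sum", churn_rate="mean")
    .sort_values("churn_rate", ascending=False)
)
# Express the rate as a percentage
state_summary["churn_rate"] = (state_summary["churn_rate"] * 100).round(1)
print(state_summary)
```

In this toy example CA ranks first with a 100% churn rate, but on only two customers, which is the kind of case the `customers` column makes visible.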

### Analyzing  Account Length column

In [None]:
df['Account length'].value_counts()

In [None]:
sns.histplot(df['Account length'], kde=True)  # distplot is deprecated in recent seaborn releases

The figure alone shows no clear pattern. Let's compare it against the churn data.

In [None]:
churners = df[df['Churn']]
non_churners = df[~df['Churn']]

In [None]:
# histplot/kdeplot replace the deprecated distplot
sns.histplot(df['Account length'],kde=True,stat='density',label='ALL')
sns.kdeplot(churners['Account length'],color = 'red',label = 'Churners')
sns.histplot(non_churners['Account length'],kde=True,stat='density',color = 'yellow',label = 'non-Churners')
plt.legend()

Account length shows no clear pattern and no visible relation to churn. Moving on to the next variable.

### Analyzing International plan column

In [None]:
df['International plan'].value_counts()

In [None]:
#Calculating churn percentage using international plan.
data = pd.crosstab(df["International plan"],df["Churn"])
# Columns are the Churn labels False/True; select them by label, not by position
data['Percentage Churn'] = data[True]*100/(data[False]+data[True])
print(data)

In [None]:
sns.countplot(x='International plan',hue='Churn',data=df)

There are 3010 customers who don't have an International plan.

There are 323 customers who have an International plan.

About 42.4% of the customers who have an International plan churned.

About 11.4% of the customers who do not have an International plan churned.

So customers who bought an International plan churn at a much higher rate.

The plan may carry high call charges, or these customers may be facing connectivity issues.
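The per-group churn percentages can also be obtained directly from `pd.crosstab` with `normalize="index"`, which divides each row by its own total. This is a minimal sketch on a made-up frame `toy` (hypothetical values, not the real counts quoted above).

```python
import pandas as pd

# Hypothetical data: 6 customers without the plan, 4 with it.
toy = pd.DataFrame({
    "International plan": ["No", "No", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Churn": [False, False, False, True, True, False, False, True, False, True],
})

# Each row sums to 100: share of retained (False) and churned (True) customers per plan group
rates = pd.crosstab(toy["International plan"], toy["Churn"], normalize="index") * 100
print(rates.round(1))
```

Because the two plan groups differ greatly in size, these row-wise percentages are a fairer comparison than the raw counts in a countplot. The same call works for `Voice mail plan`.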

### Analyzing Voice mail plan column

In [None]:
#Calculating churn percentage using voice mail plan.
data = pd.crosstab(df["Voice mail plan"],df["Churn"])
# Columns are the Churn labels False/True; select them by label, not by position
data['Percentage Churn'] = data[True]*100/(data[False]+data[True])
data

In [None]:
sns.countplot(x='Voice mail plan',hue ='Churn',data=df)

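As with the International plan variable, most customers who don't have a voice mail plan did not churn. Because that group is much larger than the group with a plan, the Percentage Churn column above is the fair comparison between the two.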

### Analyzing Number vmail messages columns

In [None]:
df['Number vmail messages'].value_counts()

In [None]:
df['Number vmail messages'].unique()

In [None]:
df.boxplot(column='Number vmail messages',by='Churn')


From the above figure, churners tend to leave when they have more than 15 voice mail messages.

The company may have to improve its voice mail quality.

### Analyzing Customer service calls column

In [None]:
df['Customer service calls'].value_counts()

In [None]:
data = pd.crosstab(df['Customer service calls'],df['Churn'])
# Columns are the Churn labels False/True; select them by label, not by position
data['Percentage_Churn'] = data[True]*100/(data[False]+data[True])
data

In [None]:
sns.countplot(x='Customer service calls',hue= 'Churn',data=df)

Customer service is also an important part of the organization. From the above data, customers who called the service center more than 4 times are the most likely to have churned.

More than 60% of these customers are churning, so the company should work to improve the efficiency of its service calls.
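The threshold effect described above can be checked by splitting customers into two groups and comparing their churn rates. The sketch below assumes a made-up frame `toy` and a hypothetical helper column `many_calls`; the cut-off of more than 4 calls follows the text.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 3, 4, 5, 5, 6, 7],
    "Churn": [False, False, False, True, False, True, True, False, True, True],
})

# Flag customers who called the service center more than `threshold` times
threshold = 4
toy["many_calls"] = toy["Customer service calls"] > threshold

# Churn rate (in percent) for each group
rate_by_group = toy.groupby("many_calls")["Churn"].mean() * 100
print(rate_by_group.round(1))
```

Changing `threshold` shows how sensitive the conclusion is to the chosen cut-off.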

### Analyzing all the calls columns 

All of the call-related columns are numerical, so we will use numerical summaries.

In [None]:
# Evaluating mean
df.groupby(['Churn'])['Total day calls'].mean()


In [None]:
df.groupby(['Churn'])['Total day minutes'].mean()

In [None]:
df.groupby(['Churn'])['Total day charge'].mean()

In [None]:
sns.scatterplot(x='Total day minutes',y='Total day charge',hue='Churn', data=df,palette='hls')

In [None]:
#Mean total evening calls by churn status
df.groupby(['Churn'])['Total eve calls'].mean()

In [None]:
#Mean total evening minutes by churn status
df.groupby(['Churn'])['Total eve minutes'].mean()

In [None]:
#Mean total evening charge by churn status
df.groupby(['Churn'])['Total eve charge'].mean()

In [None]:
sns.scatterplot(x='Total eve minutes',y='Total eve charge',hue='Churn', data=df,palette='hls')

In [None]:
#Mean total night calls by churn status
df.groupby(['Churn'])['Total night calls'].mean()

In [None]:
#Mean total night minutes by churn status
df.groupby(['Churn'])['Total night minutes'].mean()

In [None]:
#Mean total night charge by churn status
df.groupby(['Churn'])['Total night charge'].mean()

In [None]:
sns.scatterplot(x='Total night minutes',y = 'Total night charge',hue  = 'Churn',data=df)

In [None]:
#Mean total international minutes by churn status
df.groupby(['Churn'])['Total intl minutes'].mean()

In [None]:
#Mean total international calls by churn status
df.groupby(['Churn'])['Total intl calls'].mean()

In [None]:
#Mean total international charge by churn status
df.groupby(['Churn'])['Total intl charge'].mean()

In [None]:
sns.scatterplot(x ='Total intl minutes',y='Total intl charge',hue='Churn',data=df)

In [None]:
day =df['Total day charge'].mean()/df['Total day minutes'].mean()
eve =df['Total eve charge'].mean()/df['Total eve minutes'].mean()
night =df['Total night charge'].mean()/df['Total night minutes'].mean()
intl =df['Total intl charge'].mean()/df['Total intl minutes'].mean()

In [None]:
day,eve,night,intl

In [None]:
sns.barplot(x=['Day','evening','night','international'],y=[day,eve,night,intl])

After observing the above Figure, we found that total day/night/evening minutes/calls/charges are not a cause of churn rate. However, international call charges are expensive when compared to others, which might be a reason for international plan customers to end up leaving.
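The per-minute rates compared above are ratios of mean charge to mean minutes, which is the same as total charge divided by total minutes. The sketch below wraps that in a hypothetical helper `per_minute_rate` and runs it on a made-up frame `toy`. It also checks how strongly charge tracks minutes: a correlation of 1 would mean the charge column is just the minutes column rescaled.

```python
import pandas as pd

# Hypothetical data for illustration; only two of the four periods are included.
toy = pd.DataFrame({
    "Total day minutes": [100.0, 200.0, 300.0],
    "Total day charge": [17.0, 34.0, 51.0],
    "Total intl minutes": [10.0, 12.0, 8.0],
    "Total intl charge": [2.7, 3.24, 2.16],
})

def per_minute_rate(frame, period):
    """Average charge per minute for one period ('day', 'eve', 'night' or 'intl')."""
    minutes = frame[f"Total {period} minutes"]
    charge = frame[f"Total {period} charge"]
    return charge.sum() / minutes.sum()

rates = {period: per_minute_rate(toy, period) for period in ["day", "intl"]}
print(rates)

# Correlation between minutes and charge for the daytime period
corr_day = toy["Total day minutes"].corr(toy["Total day charge"])
print(round(corr_day, 3))
```

If the correlation is close to 1 on the real data as well, the minutes and charge columns of a period carry the same information, and one of each pair is enough for later modelling.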

### Bivariate Analysis

In Bivariate Analysis we analyze the data by taking two columns from the dataset at a time.

In [None]:

categorical_columns = ['International plan','Voice mail plan']
Numerical_columns = ['Account length','Number vmail messages','Total day minutes','Total day calls','Total day charge','Total eve minutes','Total eve calls','Total eve charge','Total night minutes','Total night calls','Total night charge','Total intl minutes','Total intl calls','Total intl charge','Customer service calls']

In [None]:
fig,axes = plt.subplots(2,1,figsize=(10,12))
# One countplot per categorical column, split by Churn
for ax,cat_col in zip(axes,categorical_columns):
    sns.countplot(x=cat_col,data=df,hue='Churn',ax=ax)

> Here I have used subplots, which let us show more than one visual in a single figure.

> I have also used countplot, which gives a clear view of each column against the target variable.


> Most customers who do not have an International plan did not churn, whereas customers who have or had an International plan churned far more often.

> Most customers who do not have a Voice mail plan did not churn. How customers who have or had a Voice mail plan compare is best judged from the Percentage Churn table above, since the two groups differ greatly in size.

> As per the above findings, the International plan (and possibly the Voice mail plan) may cost more than comparable plans from other telecom providers, which could be the reason customers are churning.

> Based on these findings, these plans appear to be losing customers. The organization might have to revise the pricing of these plans in order to prevent customer churn.

In [None]:
df

In [None]:
sns.FacetGrid(df,hue ='Churn',height=10).map(plt.scatter,'Total day charge','Total intl charge').add_legend();  # `size` was renamed to `height`
plt.show()

From the above figure, we can see that customers mostly churn when their day charges are above $30.

In [None]:
sns.FacetGrid(df,hue ='Churn',height=10).map(plt.scatter,'Total eve charge','Total intl charge').add_legend();  # `size` was renamed to `height`
plt.show()

Most churning customers have evening charges in the range of $10-30.

In [None]:
sns.FacetGrid(df,hue ='Churn',height=10).map(plt.scatter,'Total night charge','Total intl charge').add_legend();  # `size` was renamed to `height`
plt.show()

### Multivariate Analysis

In Multivariate Analysis we analyze the data by taking more than two columns from the dataset at once.

In [None]:
plt.figure(figsize=(19,8))
df.select_dtypes(include=['number','bool']).corr()['Churn'].sort_values(ascending = False).plot(kind='bar',color=['red','yellow','blue','green','brown','indigo','orange'])  # object columns such as State are excluded

In [None]:
plt.figure(figsize=(17,8))
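# Restrict to numeric/boolean columns; object columns such as State cannot be correlated
correlation=df.select_dtypes(include=['number','bool']).corr()
sns.heatmap(abs(correlation), annot=True, cmap='coolwarm')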
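A large annotated heatmap is hard to scan, so it can help to list only the strongly correlated column pairs. The following is a minimal sketch on a made-up frame `toy`; the 0.95 cut-off and the names `upper`, `pairs` and `high_pairs` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "Total day minutes": [100.0, 200.0, 300.0, 150.0],
    "Total day charge": [17.0, 34.0, 51.0, 25.5],
    "Customer service calls": [1, 0, 4, 2],
    "Churn": [False, False, True, False],
})

# Absolute correlations between numeric/boolean columns
corr = toy.select_dtypes(include=["number", "bool"]).corr().abs()

# Keep the upper triangle only, so each pair appears once and the diagonal is dropped
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Long format: one row per column pair, filtered by the cut-off
pairs = upper.stack()
high_pairs = pairs[pairs > 0.95]
print(high_pairs)
```

In this toy example the only pair above the cut-off is minutes and charge for the same period, which matches the near-perfect minutes/charge relation visible in the scatter plots earlier.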

***After performing exploratory data analysis on the data set, this is what we have inferred from the data:***

* ***Some states have a higher churn rate than others, which might be due to poor network coverage.***
* ***The area code and account length showed no influence on the churn rate, so they are redundant data columns.***
* ***Customers who have the International plan churn more frequently, and the international calling charges are also high, which may leave customers dissatisfied with network issues and high call charges.***
* ***When there are more than 20 voice-mail messages, there is churn, which may mean that the voice mail quality is poor.***
* ***Total day minutes, Total day calls, Total day charge, Total eve minutes, Total eve calls, Total eve charge, Total night minutes, Total night calls, Total night charge: none of these columns appeared to have any bearing on the churn rate.***
* ***Data on international calls shows that the churn rate of customers who take the international plan is high, suggesting that international call charges are high or that these customers face call drops or network issues.***
* ***Data from customer service calls shows that customers who call the service centre many times have a high churn rate, suggesting that the service centre did not resolve their issues.***

# **Conclusion**

* ***Increase network coverage in the states with high churn rates.***
* ***Offer customers a discount on international plans.***
* ***Improve the voice mail quality or solicit customer feedback.***
* ***Improve call centre service by soliciting frequent feedback from customers about their problems and attempting to resolve them as soon as possible.***