
# **1. INTRODUCTION**
Churn rate is the percentage of customers who stop using a service during a given time frame. It is one of the most important metrics for a company with recurring-payment customers, and is most often expressed as the percentage of subscribers who cancel their subscription, close their account, or switch to another service provider.
Churn can occur for many reasons. Customer churn analysis helps identify the causes and timing of churn, which makes it possible to implement effective retention strategies.

Here we have the Orange Telecom churn dataset, which consists of cleaned customer activity data (features) along with a churn label specifying whether a customer canceled their subscription. We will explore and analyze the data to discover the key factors responsible for customer churn and recommend ways to improve customer retention.
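
To make the definition concrete, here is a minimal sketch of the churn-rate calculation on a boolean label column. The five-row frame is made-up illustration data, not taken from the Orange dataset; only the column name `Churn` mirrors it.

```python
import pandas as pd

# Hypothetical toy data: True means the customer left during the period.
toy_df = pd.DataFrame({"Churn": [True, False, False, True, False]})

# The mean of a boolean column is the fraction of True values,
# so multiplying by 100 gives the churn rate as a percentage.
churn_rate_pct = toy_df["Churn"].mean() * 100

print(f"Churn rate: {churn_rate_pct:.1f}%")
```

The same expression should work on `churn_df["Churn"]` once the real data is loaded, assuming the label is stored as True/False.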


In [15]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [16]:
# Importing the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Loading the data file from the mounted Drive

churn_df = pd.read_csv('/content/drive/MyDrive/EDA Data Analysis/Telecom Churn.csv')

# **2. DATA INFORMATION**

In [17]:
# Finding shape of the data

churn_df.shape

(3333, 20)

In [18]:
# Column names present in the data
churn_df.columns

Index(['State', 'Account length', 'Area code', 'International plan',
       'Voice mail plan', 'Number vmail messages', 'Total day minutes',
       'Total day calls', 'Total day charge', 'Total eve minutes',
       'Total eve calls', 'Total eve charge', 'Total night minutes',
       'Total night calls', 'Total night charge', 'Total intl minutes',
       'Total intl calls', 'Total intl charge', 'Customer service calls',
       'Churn'],
      dtype='object')

**DATA DICTIONARY** 

* **State:** the state in which the customer resides, indicated by a two-letter abbreviation
* **Account Length:** the number of days that this account has been active
* **Area Code:** the three-digit area code of the corresponding customer
* **International Plan:** whether the customer has an international calling plan: (yes/no)
* **Voice Mail Plan:** whether the customer has a voice mail feature: (yes/no)
* **Number VMail Messages:** the average number of voice mail messages
* **Total Day Mins:** the total number of calling minutes used during the day
* **Total Day Calls:** the total number of calls placed during the day
* **Total Day Charge:** the billed cost of daytime calls
* **Total Eve Mins:** the total number of calling minutes used during the evening
* **Total Eve Calls:** the total number of calls placed during the evening
* **Total Eve Charge:** the billed cost of evening time calls
* **Total Night Mins:** the total number of calling minutes used during the night
* **Total Night Calls:** the total number of calls placed during the night
* **Total Night Charge:** the billed cost of nighttime calls
* **Total Intl Mins:** the total number of international minutes
* **Total Intl Calls:** the total number of international calls
* **Total Intl Charge:** the billed cost for international calls
* **Customer Service Calls:** the number of calls placed to Customer Service
* **Churn:** whether the customer left the service: true/false
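
One way to guard against a mismatch between this dictionary and a loaded file is to compare column names programmatically. The sketch below is an illustration only: the helper `check_columns` and the demo frame are hypothetical, while the expected names are copied from the `churn_df.columns` output above.

```python
import pandas as pd

# Column names copied from the `churn_df.columns` output shown earlier.
EXPECTED_COLUMNS = [
    "State", "Account length", "Area code", "International plan",
    "Voice mail plan", "Number vmail messages", "Total day minutes",
    "Total day calls", "Total day charge", "Total eve minutes",
    "Total eve calls", "Total eve charge", "Total night minutes",
    "Total night calls", "Total night charge", "Total intl minutes",
    "Total intl calls", "Total intl charge", "Customer service calls",
    "Churn",
]


def check_columns(df, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to `expected`."""
    missing = [c for c in expected if c not in df.columns]
    unexpected = [c for c in df.columns if c not in expected]
    return missing, unexpected


# Hypothetical frame with one column missing and one extra, for demonstration.
demo_df = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["Customer id"])
missing, unexpected = check_columns(demo_df)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Calling `check_columns(churn_df)` on the real data should return two empty lists if the file matches the column listing above.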

In [22]:
# Data type and non-null count of each attribute

churn_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3333 non-null   float64
 7   Total day calls         3333 non-null   int64  
 8   Total day charge        3333 non-null   float64
 9   Total eve minutes       3333 non-null   float64
 10  Total eve calls         3333 non-null   int64  
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3333 non-null   int64  
 14  Total night charge      3333 non-null   

In [23]:
# Viewing the first 5 rows

churn_df.head()

| | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


In [24]:
# Summary statistics for each numeric column

churn_df.describe()

| | Account length | Area code | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 |
| mean | 101.064806 | 437.182418 | 8.09901 | 179.775098 | 100.435644 | 30.562307 | 200.980348 | 100.114311 | 17.08354 | 200.872037 | 100.107711 | 9.039325 | 10.237294 | 4.479448 | 2.764581 | 1.562856 |
| std | 39.822106 | 42.37129 | 13.688365 | 54.467389 | 20.069084 | 9.259435 | 50.713844 | 19.922625 | 4.310668 | 50.573847 | 19.568609 | 2.275873 | 2.79184 | 2.461214 | 0.753773 | 1.315491 |
| min | 1.0 | 408.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 23.2 | 33.0 | 1.04 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 74.0 | 408.0 | 0.0 | 143.7 | 87.0 | 24.43 | 166.6 | 87.0 | 14.16 | 167.0 | 87.0 | 7.52 | 8.5 | 3.0 | 2.3 | 1.0 |
| 50% | 101.0 | 415.0 | 0.0 | 179.4 | 101.0 | 30.5 | 201.4 | 100.0 | 17.12 | 201.2 | 100.0 | 9.05 | 10.3 | 4.0 | 2.78 | 1.0 |
| 75% | 127.0 | 510.0 | 20.0 | 216.4 | 114.0 | 36.79 | 235.3 | 114.0 | 20.0 | 235.3 | 113.0 | 10.59 | 12.1 | 6.0 | 3.27 | 2.0 |
| max | 243.0 | 510.0 | 51.0 | 350.8 | 165.0 | 59.64 | 363.7 | 170.0 | 30.91 | 395.0 | 175.0 | 17.77 | 20.0 | 20.0 | 5.4 | 9.0 |


In [12]:
# Number of unique values in each attribute
churn_df.nunique()

State                       51
Account length             212
Area code                    3
International plan           2
Voice mail plan              2
Number vmail messages       46
Total day minutes         1667
Total day calls            119
Total day charge          1667
Total eve minutes         1611
Total eve calls            123
Total eve charge          1440
Total night minutes       1591
Total night calls          120
Total night charge         933
Total intl minutes         162
Total intl calls            21
Total intl charge          162
Customer service calls      10
Churn                        2
dtype: int64

In [25]:
# Counting missing values in each column

churn_df.isnull().sum()

State                     0
Account length            0
Area code                 0
International plan        0
Voice mail plan           0
Number vmail messages     0
Total day minutes         0
Total day calls           0
Total day charge          0
Total eve minutes         0
Total eve calls           0
Total eve charge          0
Total night minutes       0
Total night calls         0
Total night charge        0
Total intl minutes        0
Total intl calls          0
Total intl charge         0
Customer service calls    0
Churn                     0
dtype: int64

In [32]:
# Counting fully duplicated rows

churn_df.duplicated().sum()

0

**SUMMARY:**

The dataset consists of 3,333 records. Each record uses the first 19 attributes to describe a customer's profile and the last attribute, Churn, to label that customer. The customers come from 51 different states and 3 different area codes.
There are no duplicate rows and no missing values such as 'null' or 'NaN', so the data is already in a clean form.
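
The individual checks in this section could also be gathered into one reusable helper. The following is a minimal sketch, not part of the original notebook: the function name `quality_summary` and the toy frame are hypothetical, and the frame deliberately contains one missing value and one duplicated row so the output is non-trivial.

```python
import pandas as pd


def quality_summary(df):
    """Collect the basic checks from this section into one dictionary."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical toy frame: one missing value and one duplicated row.
toy_df = pd.DataFrame({
    "State": ["KS", "OH", "OH", None],
    "Area code": [415, 408, 408, 510],
})
summary = quality_summary(toy_df)
print(summary)
```

Applied to the real data, `quality_summary(churn_df)` would be expected to report 3333 rows, 20 columns, and zero missing values or duplicate rows, in line with the outputs above.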