# SCENARIO



In the telecommunications sector, the company Zela faces the challenge
of maintaining its customer base in a highly competitive market. The
company has noticed an unsettling increase in customer churn, which is
the rate at which customers discontinue their service. Even a one-point
increase in churn hurts the business directly, and high churn rates
compound quickly, which can lead to massive losses for the company.
This trend threatens the company's market share, revenue, and
long-term sustainability.

# PROBLEM STATEMENT


To address the issue of high churn rates, the company has embarked
on a data-driven approach to understand the underlying factors
contributing to customer churn. By analyzing customer data, the
company aims to identify patterns and predictors of churn, which
could include usage patterns, service charges, and customer service
interactions. The goal is to leverage these insights to develop targeted
strategies that improve customer retention. This could involve
adjusting pricing models, enhancing service features, or improving
customer service. The success of this initiative is critical for the
company to stabilize its customer base, optimize its service offerings,
and secure a competitive edge in the market.

# THE DATASET

The training dataset contains 3467 samples. Each sample contains 19
features and 1 binary target variable, `Churn` (stored as 0 or 1), which
indicates the class of the sample.
Dataset Link: https://drive.google.com/drive/u/1/folders/1x4--wO9Tw87c2-R28I2v1NWNiUZbwhE1

# IMPORT THE REQUIRED LIBRARIES

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# READ THE GIVEN CSV FILE (DATASET)

In [7]:
train=pd.read_csv('train.csv')
train

Unnamed: 0,State_Code,Account_Length_Months,Area_Code,Intl_Plan,VM_Plan,VM_Messages,Avg_Day_Minutes,Avg_Day_Calls,Avg_Day_Charge,Avg_Eve_Minutes,Avg_Eve_Calls,Avg_Eve_Charge,Avg_Night_Minutes,Avg_Night_Calls,Avg_Night_Charge,Avg_Intl_Minutes,Avg_Intl_Calls,Avg_Intl_Charge,Customer_Service_Calls,Churn
0,MI,36,510,0,0,0,193.08,88,35.56,228.34,109,26.68,200.51,126,10.03,12.54,6,3.27,1,0
1,TN,16,510,0,0,0,165.62,69,30.50,246.13,95,28.76,150.08,99,7.51,12.30,9,3.21,1,0
2,DC,99,415,0,0,0,216.22,70,39.83,115.62,110,13.51,236.66,87,11.84,13.11,2,3.42,3,0
3,WY,159,510,0,0,0,182.16,85,33.56,218.48,126,25.54,201.62,133,10.08,9.32,2,2.43,1,0
4,NJ,77,510,0,0,0,134.16,98,24.72,230.27,139,26.92,244.35,140,12.22,15.76,3,4.11,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3462,VA,117,415,0,1,38,280.32,100,51.63,263.17,75,30.77,272.71,132,13.64,10.58,1,2.76,3,0
3463,MI,110,415,1,0,0,317.08,84,58.40,201.98,95,23.61,270.28,96,13.52,16.68,4,4.36,0,1
3464,IN,136,415,0,1,21,193.95,93,35.73,194.05,103,22.68,324.76,127,16.24,10.92,4,2.86,2,0
3465,KY,93,415,0,1,37,249.08,101,45.88,166.62,94,19.48,181.27,95,9.07,11.96,10,3.12,3,0


# STEPS FOLLOWED TO APPROACH THE SOLUTION

• **Analysis of the distribution of customer demographics** to see if there are any trends in churn rates.

• **Correlation Analysis** to identify any strong predictors of churn among the numerical features.

• **Exploratory Data Analysis (EDA)** to identify patterns and factors that are indicative of customer churn.

• **Machine Learning Model** to predict churn based on the features and identify the most important features that contribute to churn.

# CORRELATION ANALYSIS

Correlation analysis is a statistical technique for measuring the strength and direction of the linear relationship between two variables. It is used to detect patterns and trends in data and to identify variables that may help forecast an outcome. The correlation coefficient (Pearson's, which is what pandas' `corr()` computes by default) ranges from -1 to 1.
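To make the definition concrete, here is a minimal sketch that computes the Pearson coefficient from its formula and checks it against `np.corrcoef`. The helper name `pearson_r` and the toy numbers are assumptions made for illustration; they are not taken from the Zela dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Covariance term divided by the product of the two spreads.
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical toy data: minutes used and the charge billed for them.
minutes = [100.0, 150.0, 200.0, 250.0, 300.0]
charge = [18.0, 27.5, 36.0, 46.0, 55.0]

r_manual = pearson_r(minutes, charge)
r_numpy = float(np.corrcoef(minutes, charge)[0, 1])
print(round(r_manual, 4), round(r_numpy, 4))
```

Both values agree and are close to 1, because the toy charge grows almost exactly in proportion to the minutes.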

### TYPES OF CORRELATIONS

The sign of the correlation coefficient indicates the direction of the relationship between variables. <br>
• Positive Correlation: Positive correlation indicates that two variables have a direct relationship. As one variable increases, the other variable tends to increase as well. For example, there is a positive correlation between height and weight. As people get taller, they also tend to weigh more. <br>
• Negative Correlation: Negative correlation indicates that two variables have an inverse relationship. As one variable increases, the other variable tends to decrease. For example, there is a negative correlation between price and demand. As the price of a product increases, the demand for that product typically decreases. <br>
• Zero Correlation: Zero correlation indicates that there is no linear relationship between two variables. Knowing how one variable changes tells us nothing about how the other changes. For example, there is zero correlation between shoe size and intelligence. <br>

### CORRELATION COEFFICIENTS

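The strength of a correlation is judged by the absolute value of the coefficient: <br>
Very strong: 0.80 to 1.00 (exactly 1.00 is a perfect correlation) <br>
Strong: 0.50 to 0.79 <br>
Moderate: 0.30 to 0.49 <br>
Weak: 0.00 to 0.29 <br>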
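The bands above can be sketched as a small helper that labels a coefficient by its absolute value. The function name `correlation_strength`, the label "very strong" for the top band, and the sample values are assumptions for illustration only.

```python
def correlation_strength(r):
    """Map a correlation coefficient to a strength label using |r|."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Lower bounds are used as thresholds so that values falling between
    # the listed bands (e.g. 0.295) still receive a label.
    if magnitude >= 0.80:
        return "very strong"
    if magnitude >= 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    return "weak"

# Illustrative values only; the sign does not affect the strength label.
for r in (0.26, -0.11, 0.65, -0.85):
    print(r, correlation_strength(r))
```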

### CORRELATION ANALYSIS OF THE CHURN

In [13]:
numeric_columns = train.select_dtypes(include=['float64', 'int64']).columns
train[numeric_columns].corr()['Churn']

Account_Length_Months     0.016765
Area_Code                 0.017986
Intl_Plan                 0.262824
VM_Plan                  -0.109743
VM_Messages              -0.094647
Avg_Day_Minutes           0.229461
Avg_Day_Calls             0.005167
Avg_Day_Charge            0.229450
Avg_Eve_Minutes           0.072349
Avg_Eve_Calls             0.001935
Avg_Eve_Charge            0.072346
Avg_Night_Minutes         0.045635
Avg_Night_Calls          -0.029180
Avg_Night_Charge          0.045620
Avg_Intl_Minutes          0.070925
Avg_Intl_Calls           -0.028726
Avg_Intl_Charge           0.070988
Customer_Service_Calls    0.214136
Churn                     1.000000
Name: Churn, dtype: float64
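The output above is easier to read when ranked by magnitude. The sketch below copies the coefficients from that output into a Series so that it runs on its own; on the real data the same ranking could be applied to `train[numeric_columns].corr()['Churn'].drop('Churn')`. The variable names are assumptions.

```python
import pandas as pd

# Coefficients copied from the output above (Churn's self-correlation omitted).
corr_with_churn = pd.Series({
    "Account_Length_Months": 0.016765,
    "Area_Code": 0.017986,
    "Intl_Plan": 0.262824,
    "VM_Plan": -0.109743,
    "VM_Messages": -0.094647,
    "Avg_Day_Minutes": 0.229461,
    "Avg_Day_Calls": 0.005167,
    "Avg_Day_Charge": 0.229450,
    "Avg_Eve_Minutes": 0.072349,
    "Avg_Eve_Calls": 0.001935,
    "Avg_Eve_Charge": 0.072346,
    "Avg_Night_Minutes": 0.045635,
    "Avg_Night_Calls": -0.029180,
    "Avg_Night_Charge": 0.045620,
    "Avg_Intl_Minutes": 0.070925,
    "Avg_Intl_Calls": -0.028726,
    "Avg_Intl_Charge": 0.070988,
    "Customer_Service_Calls": 0.214136,
})

# Order by absolute value so strong negative correlations are not buried,
# while keeping the original signs in the result.
order = corr_with_churn.abs().sort_values(ascending=False).index
ranked = corr_with_churn.reindex(order)
print(ranked.head(5))
```

Ranking by absolute value keeps `VM_Plan` visible near the top even though its coefficient is negative.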

### OBSERVATIONS

From the above Correlation Analysis, we observe that: <br>
- **THE FOLLOWING FEATURES HAVE THE STRONGEST POSITIVE CORRELATION WITH CHURN (ALL BELOW 0.30, SO STILL WEAK IN ABSOLUTE TERMS):**
    - Whether the customer has an International Plan (Intl_Plan, 0.26)
    - Average Minutes of Day Calls (Avg_Day_Minutes, 0.23)
    - Average Charge of Day Calls (Avg_Day_Charge, 0.23)
    - Number of Calls to Customer Service (Customer_Service_Calls, 0.21)
- **THE FOLLOWING FEATURE HAS THE STRONGEST NEGATIVE CORRELATION WITH CHURN:**
    - Whether the customer has a Voice Mail Plan (VM_Plan, -0.11)
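For binary features such as `Intl_Plan`, a group-wise churn rate is often easier to interpret than a correlation coefficient. The sketch below uses a small hypothetical frame named `toy` so that it runs on its own; on the real data, `train` would take its place.

```python
import pandas as pd

# Hypothetical toy sample, not taken from the Zela dataset.
toy = pd.DataFrame({
    "Intl_Plan": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Churn":     [0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
})

# Churn is coded 0/1, so its mean within a group is that group's churn rate.
churn_by_plan = toy.groupby("Intl_Plan")["Churn"].agg(["mean", "count"])
churn_by_plan = churn_by_plan.rename(
    columns={"mean": "churn_rate", "count": "customers"}
)
print(churn_by_plan)
```

Reporting the group size next to the rate helps avoid over-reading a rate that rests on only a few customers.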

### INFERENCES

- International Calling Services at Zela may not be up to the mark.
- Calling Services during the daytime may not be satisfactory enough for the customers.
- The Customer Service Department may not be efficient enough to cater to the needs of the customers.
- The Voice Mail Service appears to be performing well.

### SOLUTIONS

- **International Calling Services**
    - Improve international calling network infrastructure.
    - Lower the prices of International Packs to retain existing customers.
    - Discuss with international partners  
- **Daytime Calling Services**
    - Daytime Calling Services must be improved.
    - Increase network capacity during peak daytime hours.
    - Introduce dynamic pricing such that the calling rate increases after a certain point of time.
    - Introduce dynamic pricing such that calls during non-peak hours are cheaper to reduce network congestion.
- **Customer Service**  
    - Organise training sessions for customer service employees.
    - Focus on customer service feedback.
    - Increase customer service force.
- **Voice-Mail**
    - Maintain the current operational Voice-Mail system.
    - Promote the features of the Voice Mail Service to customers, highlighting its benefits.
    - Integrate Voice Mail Service with other communication services 

# ANALYSIS OF THE DISTRIBUTION OF CUSTOMER DEMOGRAPHICS

In [15]:
train.shape

(3467, 20)

In [16]:
train.dtypes

State_Code                 object
Account_Length_Months       int64
Area_Code                   int64
Intl_Plan                   int64
VM_Plan                     int64
VM_Messages                 int64
Avg_Day_Minutes           float64
Avg_Day_Calls               int64
Avg_Day_Charge            float64
Avg_Eve_Minutes           float64
Avg_Eve_Calls               int64
Avg_Eve_Charge            float64
Avg_Night_Minutes         float64
Avg_Night_Calls             int64
Avg_Night_Charge          float64
Avg_Intl_Minutes          float64
Avg_Intl_Calls              int64
Avg_Intl_Charge           float64
Customer_Service_Calls      int64
Churn                       int64
dtype: object

In [17]:
train.isnull()

Unnamed: 0,State_Code,Account_Length_Months,Area_Code,Intl_Plan,VM_Plan,VM_Messages,Avg_Day_Minutes,Avg_Day_Calls,Avg_Day_Charge,Avg_Eve_Minutes,Avg_Eve_Calls,Avg_Eve_Charge,Avg_Night_Minutes,Avg_Night_Calls,Avg_Night_Charge,Avg_Intl_Minutes,Avg_Intl_Calls,Avg_Intl_Charge,Customer_Service_Calls,Churn
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3462,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3463,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3464,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3465,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
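The element-wise table above is truncated in the display, so it cannot confirm that every cell is `False`. A more compact check, sketched here, sums the missing values per column. The frame `toy` and its deliberately injected `NaN` are hypothetical; on the real data the call would be `train.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value injected on purpose.
toy = pd.DataFrame({
    "Avg_Day_Minutes": [193.08, np.nan, 216.22, 182.16],
    "Customer_Service_Calls": [1, 1, 3, 1],
    "Churn": [0, 0, 0, 0],
})

# True counts as 1 when summed, giving the number of missing cells per column.
missing_per_column = toy.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Any missing values?", total_missing > 0)
```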


In [20]:
train.columns

Index(['State_Code', 'Account_Length_Months', 'Area_Code', 'Intl_Plan',
       'VM_Plan', 'VM_Messages', 'Avg_Day_Minutes', 'Avg_Day_Calls',
       'Avg_Day_Charge', 'Avg_Eve_Minutes', 'Avg_Eve_Calls', 'Avg_Eve_Charge',
       'Avg_Night_Minutes', 'Avg_Night_Calls', 'Avg_Night_Charge',
       'Avg_Intl_Minutes', 'Avg_Intl_Calls', 'Avg_Intl_Charge',
       'Customer_Service_Calls', 'Churn'],
      dtype='object')

In [21]:
train.describe()

Unnamed: 0,Account_Length_Months,Area_Code,Intl_Plan,VM_Plan,VM_Messages,Avg_Day_Minutes,Avg_Day_Calls,Avg_Day_Charge,Avg_Eve_Minutes,Avg_Eve_Calls,Avg_Eve_Charge,Avg_Night_Minutes,Avg_Night_Calls,Avg_Night_Charge,Avg_Intl_Minutes,Avg_Intl_Calls,Avg_Intl_Charge,Customer_Service_Calls,Churn
count,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0,3467.0
mean,100.606288,436.62071,0.096048,0.262475,7.695991,195.64244,105.970003,36.037067,214.303825,105.993078,25.049262,203.18103,105.335449,10.162132,11.79073,4.827517,3.076752,1.571676,0.14364
std,39.841338,42.021174,0.2947,0.440043,13.509946,58.679758,21.163917,10.808548,53.649726,21.014681,6.270818,50.799019,21.255113,2.540708,3.186702,2.894911,0.831318,1.318664,0.350775
min,1.0,408.0,0.0,0.0,0.0,0.0,0.0,0.0,44.68,13.0,5.22,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,73.0,408.0,0.0,0.0,0.0,154.81,92.0,28.51,178.62,92.0,20.88,169.57,91.0,8.48,9.78,3.0,2.56,1.0,0.0
50%,100.0,415.0,0.0,0.0,0.0,195.78,106.0,36.07,214.31,106.0,25.05,203.75,106.0,10.19,11.96,4.0,3.12,1.0,0.0
75%,127.0,415.0,0.0,1.0,17.0,234.54,120.0,43.2,250.31,121.0,29.26,237.57,119.0,11.88,13.8,7.0,3.6,2.0,0.0
max,243.0,510.0,1.0,1.0,52.0,380.0,175.0,70.0,376.75,180.0,44.04,400.0,185.0,20.0,23.0,22.0,6.0,9.0,1.0
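Because `Churn` is coded 0/1, its mean in the summary above is the overall churn rate. The short calculation below, using only the count and mean reported there, shows how imbalanced the two classes are; the variable names are assumptions.

```python
# Figures taken from the describe() output above.
n_samples = 3467
churn_rate = 0.14364  # mean of the 0/1 Churn column

# Number of churned customers implied by the mean.
n_churned = round(n_samples * churn_rate)
n_retained = n_samples - n_churned

# Accuracy of a trivial model that always predicts "no churn".
baseline_accuracy = 1 - churn_rate

print(n_churned, n_retained)
print(f"Always-predict-no-churn accuracy: {baseline_accuracy:.1%}")
```

Any churn model built on this data would need to beat that trivial baseline, so accuracy alone is likely a weak yardstick here.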
