<a href="https://colab.research.google.com/github/rohandhunde/Telicom_Customer_Churn_Analysis/blob/main/Customer_Churn_Telocom_EDA_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Telecom Churn Analysis



##### **Project Type**    - EDA
##### - Individual

# **Project Summary -**

The Churn Data Analysis project aimed to analyze customer churn data for a telecommunications company using Python. The project involved importing the data, cleaning and preprocessing it, conducting exploratory data analysis, performing statistical analysis, and developing predictive models using machine learning algorithms.

Through the analysis, several important findings were discovered. It was found that customers with the International Plan were more likely to churn than those without it, and that customers with four or more customer service calls were more likely to churn than those with fewer calls. Additionally, high day and evening minutes were associated with higher churn rates.

Based on these findings, several recommendations were made to help reduce customer churn, such as modifying the International Plan to be more appealing, being proactive with communication, periodically offering promotions to retain customers, and addressing poor network connectivity issues.

Overall, the Churn Data Analysis project provided valuable insights into customer behavior and suggested potential strategies to help the telecommunications company reduce customer churn and improve business performance.

**GitHub Link -** https://github.com/rohandhunde/Telicom_Customer_Churn_Analysis

https://github.com/rohandhunde/EDA-telocom-churn-analysis

# **Problem Statement**



Customer churn prediction is extremely important for any business as it recognizes the clients who are likely to stop using their services.

In the telecom industry, customers are able to choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average of 15-25% annual churn rate. Given the fact that it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has now become even more important than customer acquisition.

For many incumbent operators, retaining highly profitable customers is the number one business goal. To reduce customer churn, telecom companies need to predict which customers are at high risk of churn. In this project, you will analyse customer-level data of a leading telecom firm and do exploratory data analysis to identify the main indicators of why customers are leaving the company.

#### **Define Your Business Objective?**

***Reducing Customer Churn Rate***

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import math  # standard-library math; the numpy.math alias was removed in NumPy 2.0
from numpy import loadtxt
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Replace the file link with the link to your own file
url = 'https://drive.google.com/file/d/1F1liwDyQVlimL2rEo1ndmuX4LzNNeBW8/view?usp=sharing'

# Extract the file ID from the link
file_id = url.split('/')[-2]

# Generate a download link for the file
download_link = 'https://drive.google.com/uc?id=' + file_id

# Read the CSV file into a Pandas DataFrame
df = pd.read_csv(download_link)
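
The link conversion above can also be wrapped in a small helper that fails loudly when the link has an unexpected shape. This is a minimal sketch, not the notebook's own code: the function name `drive_download_link` is hypothetical, and it assumes the share link follows the `file/d/<id>/view` pattern used above.

```python
from urllib.parse import urlparse

def drive_download_link(share_url: str) -> str:
    """Turn a Google Drive 'file/d/<id>/view' share link into a direct-download link."""
    parts = urlparse(share_url).path.strip('/').split('/')
    # Expected path shape: file/d/<file_id>/view
    if len(parts) < 3 or parts[0] != 'file' or parts[1] != 'd':
        raise ValueError(f"Not a Drive file share link: {share_url}")
    return 'https://drive.google.com/uc?id=' + parts[2]

link = drive_download_link(
    'https://drive.google.com/file/d/1F1liwDyQVlimL2rEo1ndmuX4LzNNeBW8/view?usp=sharing'
)
print(link)
```

Parsing the URL path instead of counting slashes keeps the helper working when the link carries extra query parameters.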


### Dataset First View

In [None]:
# Dataset First Look
df.head()

In [None]:
# last 5 rows look
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isna())
df.isna().sum()

### What did you know about your dataset?

The dataset comes from the telecom industry and contains features that are important for churn analysis.
It has 3333 rows and 20 columns, with no missing values. The goal of the analysis is to study customer churn and gain insights into why customers may abandon a product or service. Churn prediction involves estimating the likelihood of a customer leaving and taking steps to prevent it.
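
The shape, duplicate, and missing-value checks above can be gathered into one report. The sketch below is an illustration on a made-up toy frame, not the real dataset; `quality_report` is a hypothetical helper name.

```python
import pandas as pd

def quality_report(frame: pd.DataFrame) -> dict:
    """Return row/column counts, duplicate rows and total missing cells."""
    return {
        'rows': frame.shape[0],
        'columns': frame.shape[1],
        'duplicate_rows': int(frame.duplicated().sum()),
        'missing_cells': int(frame.isna().sum().sum()),
    }

# Toy frame (made-up values) with one duplicate row and one missing value
toy = pd.DataFrame({
    'State': ['KS', 'OH', 'OH', 'NJ'],
    'Total day minutes': [265.1, 161.6, 161.6, None],
    'Churn': [False, False, False, True],
})
report = quality_report(toy)
print(report)
```

Calling `quality_report(df)` on the loaded data would give the same four numbers in a single cell.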

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Data set Describe
df.describe(include="all")

### Variables Description

* **State:** Categorical, the customer's US state

* **Account Length:** How long the account has been active

* **Area Code:** Area code number; each area code covers several states

* **Intl Plan:** International plan activated (yes, no)

* **VMail Plan:** Voice mail plan activated (yes, no)

* **VMail Message:** Number of voice mail messages

* **Day Mins:** Total day minutes used

* **Day Calls:** Total day calls made

* **Day Charge:** Total day charge

* **Eve Mins:** Total evening minutes

* **Eve Calls:** Total evening calls

* **Eve Charge:** Total evening charge

* **Night Mins:** Total night minutes

* **Night Calls:** Total night calls

* **Night Charge:** Total night charge

* **Intl Mins:** Total international minutes used

* **Intl Calls:** Total international calls made

* **Intl Charge:** Total international charge

* **CustServ Calls:** Number of customer service calls made

* **Churn:** Customer churn (target variable; True = 1, False = 0)

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:
    print(f"{column}: {df[column].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# extracting the churn and non_churn data from whole dataset
not_churn_data = df[df['Churn'] == False]
not_churn_data.shape
churn_data = df[df['Churn'] == True]
churn_data.shape

not_churn_data.shape,churn_data.shape



In [None]:
df=df.copy()

In [None]:
# Write your code to make your dataset analysis ready.
# Create a copy of the current dataset and assigning to df

# Counting the number of churned customers
num_churned_customers = df['Churn'].sum()
print("No. of customers Churning : -", num_churned_customers)

# Assigning churn customers data to variable df_churn
df_churn = df.loc[df['Churn'] == True]

In [None]:
# Churn data groupby Area Code Wise
pd.DataFrame(df.groupby('Area code')['Churn'].value_counts().reset_index(name="Count"))

In [None]:
# Number of unique states in the dataset
df["State"].value_counts().shape

In [None]:
# Unique area codes
print(df["Area code"].unique())

In [None]:
def get_mean_median(df, area_code):
    '''
    Return the mean and median of 'Total day charge' for churned customers
    in the given area code, or None if no churned customers are found.
    '''
    area_code_df = df[df['Area code'] == area_code]
    churned_customers_df = area_code_df[area_code_df['Churn'] == True]

    if churned_customers_df.empty:
        # Either the area code is not in the data or it has no churned customers
        print("No churned customers found for this area code")
        return None

    mean = churned_customers_df['Total day charge'].mean()
    median = churned_customers_df['Total day charge'].median()

    return pd.DataFrame({'mean': [mean], 'median': [median]})

In [None]:
# Getting Mean Median for area code 408
area_code = 408
get_mean_median(df, area_code)
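
The same statistics can be computed for every area code at once with a `groupby`. The sketch below uses made-up toy values and the notebook's column names; it is one possible alternative, not a replacement for the function above.

```python
import pandas as pd

# Toy data (made-up values); column names follow the notebook
toy = pd.DataFrame({
    'Area code': [408, 408, 415, 415, 510],
    'Churn': [True, True, True, False, False],
    'Total day charge': [30.0, 50.0, 40.0, 20.0, 25.0],
})

# Keep churned customers, then aggregate per area code
summary = (
    toy[toy['Churn']]
    .groupby('Area code')['Total day charge']
    .agg(['mean', 'median'])
)
print(summary)
```

Area codes with no churned customers (510 in the toy data) simply do not appear in the result.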

### What all manipulations have you done and insights you found?

Based on your approach, it seems that you have taken a data-driven approach to identify the reasons behind customer churn. By analyzing the churned customer data, you have attempted to identify patterns and behaviors that may have led to customers leaving your service.

One of the key insights you have gained from your analysis is that customers who have taken the voice mail plan but are not using it and talking for longer durations may be facing network issues. This suggests that network quality and stability may be a significant factor in customer churn. It is worth exploring ways to improve the quality of your network to address this issue and reduce customer churn.

In addition to network issues, you may have also identified other reasons for customer churn. For example, customers who are making a high number of international calls may be experiencing high costs, which could prompt them to switch to a different provider. Similarly, customers who are making a high number of calls during the day or night may be experiencing a lack of flexibility in their plan, which could also prompt them to switch providers.

Another possible reason for customer churn is dissatisfaction with customer service. Customers who are unhappy with the level of support they receive may be more likely to switch providers. It may be worth exploring ways to improve customer service, such as providing faster response times, more personalized support, or additional resources to help customers troubleshoot issues.

Overall, your approach to identifying the reasons behind customer churn is a data-driven one that focuses on analyzing customer behavior to identify patterns and insights. By creating new columns and experimenting with different logics, you have been able to gain valuable insights into the factors that may be driving customer churn. By addressing these issues, you can work to reduce churn and retain more customers over the long term.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
print(df.Churn.value_counts())
df['Churn'].value_counts().plot(kind='pie',autopct="%1.1f%%",colors=["salmon","skyblue"])

In [None]:
print(df.Churn.value_counts())
df['Churn'].value_counts().plot(kind='bar', color=["salmon","skyblue"])
plt.title('Churn Data')
plt.xlabel('Churn')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows the split between the False and True values of `Churn` as parts of one circle, which gives a quick picture of the class balance across the whole dataset.

##### 2. What is/are the insight(s) found from the chart?

The dataset has 3333 rows in total. Of these, 2850 have Churn = False (about 85.5%) and 483 have Churn = True (about 14.5%), as the visualization shows.
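
The class balance can be checked with quick arithmetic. The sketch below uses only the two counts reported for this dataset (3333 rows, 2850 of them with Churn = False) and derives the rest.

```python
total = 3333
not_churned = 2850
churned = total - not_churned  # customers with Churn == True

churn_pct = round(100 * churned / total, 1)
retained_pct = round(100 * not_churned / total, 1)
print(churned, churn_pct, retained_pct)
```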


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Customer churn is a significant business metric in the telecom industry, and it directly affects the competitiveness of service providers. Customer churn can be caused by a variety of factors, including poor network quality, dissatisfaction with customer service, and the introduction of new competitors. It is essential for service providers to focus on customer retention as well as customer acquisition, as retaining existing customers is typically less costly than acquiring new ones. By investing in network infrastructure, offering flexible plans, and improving customer service, telecom service providers can reduce customer churn and improve their competitiveness in the market. Additionally, positive word of mouth can also be a powerful tool in acquiring new customers and reducing customer churn rates.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Calculate the churn rate for each state and sort in descending order
churn_rate_by_state = df.groupby('State')['Churn'].mean().sort_values(ascending=False)

# Show the top 10 churned states
top_10_churned_states = churn_rate_by_state.head(10)
print(top_10_churned_states)

# Visualize the top 10 churned states
colors = ['#E71D36', '#FF9F1C', '#FECB52', '#2EC4B6', '#48BB78', '#FFDAB9', '#FED8B1', '#D6A2E8', '#9B5DE5', '#1D3557']
plt.bar(top_10_churned_states.index, top_10_churned_states.values, color=colors)
plt.title("States with the highest churn rate", fontsize=20)
plt.xlabel('State', fontsize=15)
plt.ylabel('Churn Rate (fraction of customers)', fontsize=15)  # .mean() of a boolean is a fraction, not a percentage
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart compares the churn rate of each state side by side, so the states with the highest rates stand out.

##### 2. What is/are the insight(s) found from the chart?

Out of the 51 states included in the dataset, 10 states have churn rates above 21.74%, which is more than 1.5 times the average churn rate (about 14.5%). These states are CA, NJ, TX, MD, SC, MI, MS, NV, WA, and ME.

Based on data wrangling, it has been observed that some states have poor network regions, while others require better maintenance and new installations. Interestingly, the states of NV and NJ are both in the top 10 churned states, indicating that these issues may be contributing to the high churn rates in these states.
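
The 21.74% cut-off corresponds to 1.5 times the overall churn rate (1.5 × 14.49% ≈ 21.74%). The sketch below shows how such a threshold can be applied per state; the toy data is made up and only illustrates the mechanics.

```python
import pandas as pd

# Toy data (made-up values): 10 customers across three states
toy = pd.DataFrame({
    'State': ['CA'] * 2 + ['NJ'] * 4 + ['OH'] * 4,
    'Churn': [True, True, True, False, False, False, False, False, False, False],
})

overall_rate = toy['Churn'].mean()                 # 3 of 10 customers churned
state_rate = toy.groupby('State')['Churn'].mean()  # churn rate per state
high_churn = state_rate[state_rate > 1.5 * overall_rate]
print(high_churn)
```

States with only a handful of customers can cross the threshold by chance, so the group sizes are worth checking alongside the rates.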

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the churn rates can certainly have a positive impact on businesses. By identifying the top churned states, businesses can focus their efforts on improving their services and retaining more customers, which can lead to better growth prospects. For instance, if poor network coverage is causing high churn rates in a state, businesses can work on improving connectivity and network coverage to retain more customers.

However, if businesses fail to take appropriate action based on the insights gained, it may lead to negative growth. Continuously high churn rates can result in a loss of customers and revenue, which can have a detrimental effect on business growth.

Therefore, it is crucial for businesses to consider these insights seriously and implement measures to improve their service quality and customer retention in the identified states with high churn rates, to avoid negative impact on growth.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Box Plot for Account Length attribute
sns.set(style="whitegrid")
plt.figure(figsize=(8,6))
sns.boxplot(x='Churn', y='Account length', data=df, palette='coolwarm')
plt.xlabel('Churn', fontsize=12)
plt.ylabel('Account Length', fontsize=12)
plt.title('Boxplot of Account Length Grouped by Churn', fontsize=14)
plt.show()

##### 1. Why did you pick the specific chart?

A box plot shows the distribution of the "Account length" variable for each value of the "Churn" variable. It provides information about the symmetry, skew, variance, and outliers of the data. The box marks the median and interquartile range, the whiskers mark the range of the non-outlier values, and the outliers are drawn as separate points.

##### 2. What is/are the insight(s) found from the chart?

From the boxplot, we can see the distribution of the "Account length" feature for both the "Churn" and "Non-Churn" groups. The boxplot helps us to identify the median, interquartile range (IQR), outliers, and the overall shape of the distribution.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the box plot can help create a positive business impact by providing information about the distribution of a particular variable, in this case, the account length. By comparing the account length distribution between the churned and non-churned customers, businesses can identify patterns and make informed decisions about customer retention strategies.

#### Chart - 4

In [None]:
df["Area code"].unique()

In [None]:
# Chart - 4 visualization code
# Group the data by 'Area code' and calculate the mean churn rate
area_churn = df.groupby('Area code')['Churn'].mean().reset_index()

# Create a bar plot using seaborn
sns.barplot(x='Area code', y='Churn', data=area_churn, palette=['skyblue', 'lightgreen', 'pink'])
plt.xlabel('Area code', fontsize = 15)
plt.ylabel('Churn rate (fraction of customers)', fontsize = 15)  # .mean() of a boolean is a fraction, not a percentage
plt.title('Average Churn Rate by Area Code', fontsize=20)
plt.show()

##### 1. Why did you pick the specific chart?

This plot compares the groups side by side, which gives an easy overview of the data.

##### 2. What is/are the insight(s) found from the chart?

The average churn rate varies across the three area codes (408, 415, and 510), with customers in the 415 area code having the highest churn rate, followed by customers in the 408 area code, and customers in the 510 area code having the lowest churn rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the churn rate data can have a positive impact on business outcomes by identifying areas for improvement in customer satisfaction and retention. This can lead to a decrease in churn rates and an increase in customer loyalty, which in turn can result in increased revenue, customer lifetime value, and market share. However, ignoring the insights gained or focusing solely on reducing churn rates without addressing the underlying reasons for high churn rates can lead to negative growth in the long run, such as loss of revenue, market share, and reputation. Therefore, a careful analysis of the insights gained from the data and a holistic approach to improving customer satisfaction and retention is essential for positive business impact.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Draw both panels on one figure so they appear side by side
plt.figure(figsize=(12, 5))

# Left panel: average day minutes by international plan
plt.subplot(1, 2, 1)
sns.barplot(x='International plan', y='Total day minutes', data=df, ci=None)
plt.title('Average Minutes Talked')
plt.xlabel('International plan')
plt.ylabel('Minutes')

# Right panel: average day charge by international plan
plt.subplot(1, 2, 2)
sns.barplot(x='International plan', y='Total day charge', data=df, ci=None)
plt.title('Average Calling Charge')
plt.xlabel('International plan')
plt.ylabel('Charge')

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The code draws two bar charts as subplots: one shows the average minutes talked and the other shows the average calling charge, each for customers with and without an international plan. This is a good choice for comparing the two groups, because the difference between the bars is easy to see. The use of seaborn's barplot also allows for easy customization of the visualization.

##### 2. What is/are the insight(s) found from the chart?

The first plot shows that the average talk time for customers without an international plan is around 175+ minutes, while for those with an international plan it is around 185+ minutes. This suggests that customers with an international plan tend to talk for slightly longer periods of time.

The second plot indicates that the average calling charges for customers without an international plan are around $30+, while for those with an international plan, it is around $35+. This suggests that customers with an international plan tend to have higher calling charges compared to those without an international plan.

Overall, these two plots suggest that there may be a correlation between having an international plan and higher talk time and calling charges. However, more detailed analysis and data would be required to confirm this hypothesis.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

3010 customers do not have an international plan.

323 customers have an international plan.

Among those who have an international plan, 42.4% churn. If the company focuses solely on offering international plans without considering the underlying reasons for differences in average minutes talked and calling charges, such as differences in customer needs or preferences, it may lead to negative growth.
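
Group sizes and churn rates per plan can be produced in one step with a named aggregation. This is a sketch on made-up toy values; it assumes `International plan` holds Yes/No text and `Churn` holds booleans, as in the notebook.

```python
import pandas as pd

# Toy data (made-up values); 'International plan' holds Yes/No strings here
toy = pd.DataFrame({
    'International plan': ['No', 'No', 'No', 'No', 'Yes', 'Yes'],
    'Churn': [False, False, False, True, True, False],
})

plan_summary = toy.groupby('International plan')['Churn'].agg(
    customers='count',   # group size
    churned='sum',       # True counts as 1
    churn_rate='mean',   # share of the group that churned
)
print(plan_summary)
```

Running the same aggregation on `df` would give the customer counts and churn share for each plan group directly.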

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Visualizing the number of customers with and without a voice mail plan
plt.figure(figsize=(8,6))
sns.countplot(x='Voice mail plan', data=df, palette=['skyblue', 'red'])
plt.title('Customers with Voice Mail Plan')
plt.xlabel('Voice mail plan')
plt.ylabel('Count')

##### 1. Why did you pick the specific chart?

The count plot gives a side-by-side visual comparison of the two groups.

##### 2. What is/are the insight(s) found from the chart?

The count plot compares the number of customers with and without a voice mail plan. The bar heights show how the customer base splits between the two groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Providing a voice mail plan to customers can reduce the churn rate as customers with the voice mail plan tend to churn less frequently. This could be due to the fact that customers with a voice mail plan feel more engaged and connected with the service, as they have a feature that allows them to easily leave and receive voice messages. Additionally, customers may perceive the voice mail plan as an added value and are more likely to continue using the service to make use of this feature. Therefore, offering a voice mail plan to customers can be a potential strategy to reduce churn and improve customer retention.

#### Chart - 7

In [None]:
# Chart - 7 visualization code


sns.barplot(x="Churn", y="Total day calls", data=df, ci=None)
plt.title("Mean Total Day Calls")
plt.xlabel("Churn")
plt.ylabel("Total Day Calls")

In [None]:
sns.barplot(x="Churn", y="Total day minutes", data=df, ci=None)
plt.title("Mean Total Day Minutes")
plt.xlabel("Churn")
plt.ylabel("Total Day Minutes")

In [None]:
sns.barplot(x="Churn", y="Total day charge", data=df, ci=None)
plt.title("Mean Total Day Charge")
plt.xlabel("Churn")
plt.ylabel("Total Day Charge")

##### 1. Why did you pick the specific chart?

The bar plots above give a clear visual of the data and allow the churned and non-churned groups to be compared side by side.


##### 2. What is/are the insight(s) found from the chart?

In the mean total day calls plot, the churned (True) and non-churned (False) groups are almost the same.

In the mean total day minutes plot, churned customers average around 200+ minutes, while non-churned customers average about 175.

In the mean total day charge plot, churned customers average around 35, while non-churned customers average around 30.
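
The three bar plots correspond to one grouped mean per column, which can be tabulated directly. The sketch below uses made-up toy values and the notebook's column names.

```python
import pandas as pd

# Toy data (made-up values); column names follow the notebook
toy = pd.DataFrame({
    'Churn': [False, False, True, True],
    'Total day calls': [100, 102, 101, 101],
    'Total day minutes': [170.0, 180.0, 200.0, 210.0],
    'Total day charge': [28.9, 30.6, 34.0, 35.7],
})

# One row per churn group, one column per day-usage metric
day_means = toy.groupby('Churn')[
    ['Total day calls', 'Total day minutes', 'Total day charge']
].mean()
print(day_means)
```

A table like this makes the exact group means available, rather than reading approximate values off the bars.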

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis can help create a positive business impact.

For instance, customers who have an international plan tend to have longer call durations and generate higher charges. Hence, the company can introduce more attractive international plans to retain customers and increase revenue.

Furthermore, customers with a voicemail plan tend to churn less frequently. Thus, the company can focus on marketing and promoting voicemail plans to retain customers and increase loyalty.

However, there is one insight that may lead to negative growth. The charts above show that churned customers have higher average day minutes and day charges than retained customers, so the customers who leave tend to be the ones who generate more revenue. If the company does not address why heavy daytime users leave, it risks losing a disproportionate share of its revenue. At the same time, the company needs to balance its retention strategies for both high-value and low-value customers to ensure sustained growth.



#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Visualizing churn rate per customer service calls
plt.rcParams['figure.figsize'] = (12, 8)


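# Take the call counts from the groupby index so each count lines up with its own rate
# (unique() returns values in order of appearance, while groupby sorts them)
s2=df.groupby(['Customer service calls'])['Churn'].mean()*100
s1=s2.index
plt.bar(s1,s2, color = ['violet','indigo','b','g','y','orange','r'])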


plt.title(" Churn rate per service call", fontsize = 20)
plt.xlabel('No of cust service call', fontsize = 15)
plt.ylabel(' percentage', fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

A bar graph compares a value across different groups. Here it compares the churn rate for each number of customer service calls.

##### 2. What is/are the insight(s) found from the chart?

The data shows that customers make varying numbers of service calls, with a range of 0 to 9. Customers who make more service calls are more likely to leave. Specifically, customers who make more than 5 service calls have a probability of leaving that exceeds 50%. Therefore, it is important to prioritize solving the issues of customers who make more than 5 service calls and providing them with better service to prevent them from leaving. Additionally, customers who make 4 or more service calls churn more than four times as often as those who make fewer service calls.
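
The "four times as often" comparison is a ratio of two churn rates, split at the 4-call threshold. The sketch below shows the calculation on made-up toy values, so its numbers are illustrative only.

```python
import pandas as pd

# Toy data (made-up values)
toy = pd.DataFrame({
    'Customer service calls': [0, 1, 1, 2, 3, 3, 4, 4, 5, 6],
    'Churn': [False, False, False, False, True, False, True, False, True, True],
})

# Split customers at the 4-call threshold
many_calls = toy['Customer service calls'] >= 4
rate_many = toy.loc[many_calls, 'Churn'].mean()
rate_few = toy.loc[~many_calls, 'Churn'].mean()
ratio = rate_many / rate_few
print(rate_many, rate_few, ratio)
```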

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help in creating a positive business impact. By identifying that customers with more than 5 service calls are likely to leave, the company can take measures to address their concerns and provide better service to retain them. This can help in reducing customer churn and increasing customer loyalty, which can have a positive impact on the business.

However, the insight that customers with four or more customer service calls churn more than four times as often as other customers can lead to negative growth if the company fails to address their concerns and improve their experience. If these customers continue to face issues and do not receive satisfactory resolution, they may leave the company, leading to increased customer churn and negative impact on the business. Therefore, it is important for the company to focus on resolving the issues of such customers and improve their overall experience to prevent negative growth.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
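# Take the state labels from the groupby index so each state lines up with its own rate
# (unique() returns values in order of appearance, while groupby sorts them)
s2=df.groupby(['State'])['Churn'].mean()
s1=s2.index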

plt.rcParams['figure.figsize'] = (18, 7)

plt.plot(s1,s2,color='r', marker='o', linewidth=2, markersize=12)

plt.title(" States churn rate", fontsize = 20)
plt.xlabel('state', fontsize = 15)
plt.ylabel('churn rate', fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

A line plot with markers gives a point-by-point overview of the data.

##### 2. What is/are the insight(s) found from the chart?

The visualization above shows the churn rate for each state, that is, the relationship between state and churn rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Positive impact:

* If the analysis reveals that customers from certain states have a lower churn rate, the company can focus their marketing efforts in those states to attract new customers.
* If the analysis shows that customers from certain states are more likely to stay with the company for a long time, the company can use this information to create targeted retention strategies for those customers.
* If the analysis uncovers specific reasons why customers from certain states are more likely to churn, the company can take steps to address those reasons and reduce overall churn rate.

Negative impact:

* If the analysis reveals that customers from certain states have a significantly higher churn rate, the company may need to invest more resources into those states to reduce churn and retain customers, which could lead to increased costs.
* If the analysis shows that there are no discernible patterns or insights based on state, the company may need to invest more resources into data analysis to identify other factors that are contributing to churn, which could also lead to increased costs.

Ultimately, the specific insights gained from the state churn rate data analysis will determine whether the impact is positive or negative for the business.

#### Chart - 13

In [None]:
df.head()

In [None]:
# Chart - 13 visualization code
# Calculate the churn rates by plan type
plan_churn_rates = df.pivot_table(index='International plan', values='Churn', aggfunc='mean')

# Set the figure size
fig, ax = plt.subplots(figsize=(8, 8))

# Set the colors for the chart
colors = ['#4e79a7', '#f28e2b']

# Create the donut chart
wedges, texts, autotexts = ax.pie(plan_churn_rates['Churn'], labels=plan_churn_rates.index, colors=colors, autopct='%1.1f%%', startangle=90, pctdistance=0.75, wedgeprops={'width': 0.4, 'edgecolor': 'w'})

# Add a circle at the center to create the donut shape
center_circle = plt.Circle((0, 0), 0.5, facecolor='white', edgecolor='white', linewidth=0.1)
fig.gca().add_artist(center_circle)

# Set the title and font size
ax.set_title('Churn Rates by Plan Type', fontsize=14)

# Set the font size for the labels
plt.setp(texts, fontsize=14)
plt.setp(autotexts, fontsize=12, color='white')

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

A donut chart can be useful in highlighting the proportion of each category within the whole dataset. It can help to identify which categories are the largest or smallest, and can make it easier to compare the relative sizes of different categories.

##### 2. What is/are the insight(s) found from the chart?

A donut chart is a type of chart that displays data in a circular shape with a hole in the center. It is used to show the proportion of each category in a dataset. The chart is divided into segments, with each segment representing a category. The size of each segment is proportional to the value of the category it represents. The chart is useful for showing how much of the total is made up by each category.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from a donut chart can also potentially help create a positive business impact if the data reveals important patterns or correlations that can inform decision-making. For example, if the chart shows that a large proportion of customers are in a particular category, it could suggest that the business should focus on improving its offerings in that category to retain customers.

However, there could also be insights that lead to negative growth. For instance, if the chart shows that a large proportion of customers are in a category that is not profitable for the business, it could suggest that the business needs to reevaluate its product offerings and focus on more profitable categories.

Overall, the insights gained from a donut chart will depend on the specific patterns and correlations present in the data. It’s important to analyze the data thoroughly and interpret the results carefully to make informed business decisions that can help drive growth and profitability.





#### Chart - 14 - Correlation Heatmap

In [None]:
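# Correlation Heatmap visualization code
# calculate the correlation matrix on numeric columns only
# (numeric_only is available from pandas 1.5; pandas 2.x raises an error
# if df still contains string columns and it is not set)
corr = df.corr(numeric_only=True)

# plot the correlation heatmap on a figure large enough for the annotations
plt.figure(figsize=(14, 10))
sns.heatmap(corr, cmap='coolwarm', annot=True, fmt='.2f')

# set the title and display the plot
plt.title('Correlation Heatmap')
plt.show()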

##### 1. Why did you pick the specific chart?

A correlation heatmap shows the pairwise linear correlation between all numeric variables in a single view, which makes it easy to spot strongly related features and the variables most associated with churn. From the heatmap, we can see that some variables are highly correlated with each other, such as:

*   Total day minutes and total day charge
*   Total eve minutes and total eve charge
*   Total night minutes and total night charge

We can also observe how churn relates to other variables:

*   There is a positive correlation between churn and customer service calls, which indicates that customers who make more customer service calls are more likely to churn.
*   There is a positive correlation between the international plan and churn, indicating that customers who have the international plan are more likely to churn.

Overall, this information can help the business identify the key drivers of customer churn and take appropriate measures to retain customers, such as improving customer service, revising the international plan, and optimizing pricing strategies for different plans.

##### 2. What is/are the insight(s) found from the chart?

From the correlation heatmap, we can observe that three pairs of variables (total day charge and total day minutes, total evening charge and total evening minutes, and total night charge and total night minutes) have a perfect positive correlation, with a correlation coefficient of 1.
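
A coefficient of exactly 1 is what appears when one column is a fixed multiple of the other. The sketch below illustrates this under the assumption that each charge is simply the minutes multiplied by a flat per-minute rate; the rate and the usage values are hypothetical, since the notebook does not state the tariff.

```python
import numpy as np

# Hypothetical per-minute rate and usage values, chosen only for illustration.
rate_per_minute = 0.17
minutes = np.array([120.5, 265.1, 161.6, 243.4, 299.4, 166.7])
charge = rate_per_minute * minutes

# Pearson correlation between the two columns.
r = np.corrcoef(minutes, charge)[0, 1]
print(round(r, 6))
```

When two columns are related this way they carry the same information, so one column from each pair can usually be dropped before modelling.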

The number of customer service calls made by the customers is positively correlated with churn, in line with the finding that customers who contact customer service often are more likely to leave.

We can also see that the total day minutes, total evening minutes, and total night minutes are highly correlated with each other, indicating that customers who spend more time on calls during the day also spend more time on calls during the evening and night.

Furthermore, there is a strong positive correlation between the account length and the number of customer service calls made by the customers, indicating that customers who have been with the company for a longer duration are more likely to make service calls.

Overall, the correlation heatmap provides us with valuable insights into the relationships between different variables, which can be used to gain a better understanding of the customer behavior and improve the company's performance.
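
Reading individual cells off a large annotated heatmap is error-prone, so it can help to rank the correlations numerically. The following is a minimal sketch on a made-up toy frame; the column names and values are assumptions, and `numeric_only=True` requires pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical toy frame; column names and values are made up for illustration.
toy = pd.DataFrame({
    "Total day minutes": [120.0, 265.0, 161.0, 243.0, 299.0, 166.0],
    "Customer service calls": [1, 1, 0, 4, 5, 2],
    "Account length": [128, 107, 137, 84, 75, 118],
    "Churn": [False, False, False, True, True, False],
})

corr = toy.corr(numeric_only=True)

# Correlation of every other numeric column with Churn, strongest first by absolute value.
churn_corr = corr["Churn"].drop("Churn")
ranked = churn_corr.reindex(churn_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Applying the same steps to the real `df` gives a sorted list that can be checked against each statement made about the heatmap.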

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.set(style="ticks", font_scale=1.2, rc={"figure.figsize":(10,8)})
sns.set_palette("husl")

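g = sns.pairplot(df, hue="Churn")
# pairplot returns a grid of subplots, so the title is set on the figure
# rather than on the last subplot (which is what plt.title would do)
g.figure.suptitle("Pairplot of Churn Data", y=1.02)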
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot shows the pairwise relationships between the features in a dataset, with each feature's own distribution on the diagonal. It helps identify which features, or pairs of features, best separate the classes, which is useful when choosing inputs for a simple classification model. This makes it a useful tool for exploratory data analysis and for visualizing data distributions.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows little linear relationship between most pairs of variables, and the churned and non-churned points are not linearly separable. The churned customers form overlapping clusters, while the non-churned customers are distributed more symmetrically. The area code appears to be an important feature, and the distribution of churn across the different features provides useful information. Overall, the pair plot provides a visual summary of the relationships and patterns in the data.





## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.


1.   Modify International Plan: Since the charge for the International Plan is the same as the normal plan, it might be a good idea to modify the plan to make it more attractive to customers who frequently make international calls.

2.   Be proactive with communication: Communication is key in retaining customers. The company should be proactive in communicating with customers and addressing any concerns they may have.


3.   Ask for feedback often: Regularly seeking feedback from customers can help the company identify areas for improvement and address any issues before they become major problems.

4.  Periodically make offers to retain customers: Offering discounts or promotions periodically can incentivize customers to stay with the company and reduce churn rates.

5.  Look at customers facing problems in the states with the highest churn: Identifying the customers who face issues in the states with the highest churn can help the company focus its efforts and resources on retaining them.

6.   Lean into best customers: Focusing on retaining the best customers, who generate the most revenue, can help the company maximize profits.

7.   Regular server maintenance: Ensuring the servers are running smoothly can help prevent issues such as poor network connectivity and improve the customer experience.

8.  Solving poor network connectivity issue: Addressing poor network connectivity can improve the customer experience and reduce the likelihood of churn.

9.   Define a roadmap for new customers: Providing new customers with a clear roadmap of what to expect can help them feel more comfortable and committed to staying with the company.

10.  Analyze churn when it happens: Analyzing churn as it happens can help the company identify patterns and factors that contribute to churn and make necessary changes to prevent it in the future.

11. Stay competitive: The company should continue to monitor the market and stay competitive by offering competitive pricing and features to attract and retain customers.
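
To act on the recommendation about the states with the highest churn, the states first have to be ranked. The sketch below shows one possible way to do that; the toy data, the column names `State` and `Churn`, and the minimum customer count are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; "State" and "Churn" are assumed column names.
toy = pd.DataFrame({
    "State": ["NJ", "NJ", "TX", "TX", "TX", "CA", "CA", "OH"],
    "Churn": [True, True, True, False, False, False, False, False],
})

# Number of customers and churn rate per state, highest churn rate first.
by_state = (
    toy.groupby("State")["Churn"]
    .agg(customers="size", churn_rate="mean")
    .sort_values("churn_rate", ascending=False)
)

# Keep only states with enough customers for the rate to be meaningful.
# The threshold is arbitrary and would need tuning on the real data.
min_customers = 2
top_states = by_state[by_state["customers"] >= min_customers].head(3)
print(top_states)
```

Filtering on the customer count avoids ranking a state highly just because it has very few customers.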


# **Conclusion**



*   The minute fields are strongly associated with the charge fields, so each minute and charge pair carries largely the same information. The area code and state fields show much weaker associations, which suggests that they may not be as important for predicting churn.

*   The International Plan field is a strong predictor of churn, suggesting that customers with this plan may have unique needs or concerns that the company should address.

*   Customers with high numbers of customer service calls are at a greater risk of churning, indicating that the company should prioritize addressing their concerns and providing quality customer service.

*  Customers with high day and evening minutes are also at a higher risk of churning, which suggests that the company should pay special attention to these customers and address any issues they may be experiencing with their service or plan.

*   Finally, there is no obvious association between churn and several other variables, including day calls, evening calls, night calls, international calls, night minutes, international minutes, account length, and voicemail messages. However, this does not necessarily mean that these variables are not important for predicting churn, and further analysis may be needed to fully understand their relationship to customer churn.
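
The customer service call finding can be checked by comparing churn rates on either side of a call threshold. The sketch below uses the threshold of four calls mentioned in the project summary; the toy data and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are made up.
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 2, 3, 4, 5, 6, 1],
    "Churn": [False, False, False, True, True, True, False, False],
})

# Label each customer by whether they reached the four-call threshold.
reached_threshold = toy["Customer service calls"] >= 4
toy["call_group"] = reached_threshold.map({True: "4+ calls", False: "under 4 calls"})

# Churn rate within each group.
rate_by_group = toy.groupby("call_group")["Churn"].mean()
print(rate_by_group)
```

The same comparison can be repeated for other thresholds, or for day and evening minutes, to see where the churn rate starts to rise.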


