# Churn Analysis of a Telecom Company using Pandas

# About Dataset

**Content:**<br>
The **Telco Customer Churn** dataset contains features that describe customers and their interactions with a telecommunications company. Each row represents a customer, and each column contains one of the customer's attributes.
1. **Customer ID**: A unique identifier for each customer.
2. **Gender**: The gender of the customer (e.g., Male, Female).
3. **Senior Citizen**: Whether the customer is a senior citizen (1 = yes, 0 = no).
4. **Partner**: Whether the customer has a partner (Yes/No).
5. **Dependents**: Whether the customer has dependents such as children or other relatives (Yes/No).
6. **Tenure**: The number of months the customer has stayed with the company.
7. **Phone Service**: Whether the customer has phone service provided by the company (Yes/No).
8. **Multiple Lines**: Whether the customer has multiple lines (e.g., Yes, No, No phone service).
9. **Internet Service**: Type of internet service subscribed (e.g., DSL, Fiber optic, No).
10. **Online Security**: Whether the customer has online security service (e.g., Yes, No, No internet service).
11. **Online Backup**: Whether the customer has online backup service (e.g., Yes, No, No internet service).
12. **Device Protection**: Whether the customer has device protection service (e.g., Yes, No, No internet service).
13. **Tech Support**: Whether the customer has tech support service (e.g., Yes, No, No internet service).
14. **Streaming TV**: Whether the customer has streaming TV service (e.g., Yes, No, No internet service).
15. **Streaming Movies**: Whether the customer has streaming movie service (e.g., Yes, No, No internet service).
16. **Contract**: The type of contract the customer has (e.g., Month-to-month, One year, Two year).
17. **Paperless Billing**: Whether the customer has opted for paperless billing (Yes/No).
18. **Payment Method**: The method of payment used by the customer (e.g., Electronic check, Credit card, Bank transfer, Mailed check).
19. **Monthly Charges**: The amount charged to the customer on a monthly basis.
20. **Total Charges**: The total amount charged to the customer over the entire tenure.
21. **Churn**: Whether the customer churned (cancelled the service) or not (Yes/No).

**Objective**
- Our target variable is **Churn**. The objective is to analyze which factors affect the company's churn rate.

**Questions Answered**
1. Is customer churn gender neutral?
2. Do senior citizens churn more often than others?
3. Does having a partner affect a customer's behavior? What about dependents?
4. How does customer tenure affect churn?
5. Do 'PhoneService' and 'MultipleLines' play a significant role in customer churn?
6. Does subscribing to an Internet service affect churn? What about other related services?
7. Are contract types related to customer churn?
8. Do customers with paperless billing churn more often?
9. Do certain payment methods increase churn? Which payment method makes customers churn more often?
10. How does the amount of monthly charges affect customer churn?

# 1. Importing pandas and Reading the Dataset

In [557]:
import pandas as pd # for data processing
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

# 2. Inspecting the data

In [558]:
# Inspecting the first few rows 
df.head(2)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No


In [559]:
# Inspecting the last few rows
df.tail(2)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.4,306.6,Yes
7042,3186-AJIEK,Male,0,No,No,66,Yes,No,Fiber optic,Yes,...,Yes,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),105.65,6844.5,No


In [560]:
df.shape
#(rows,columns)

(7043, 21)

In [561]:
# Checking that customer IDs are unique, i.e. that no customer appears twice
df['customerID'].nunique()

7043
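
If the number of unique IDs had come out lower than the number of rows, the next step would be to look at the offending rows. The sketch below shows one way to do that on a toy frame; the IDs in it are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real data; these IDs are invented.
toy = pd.DataFrame({"customerID": ["A-1", "B-2", "B-2", "C-3"]})

# True only when every ID occurs exactly once
ids_unique = toy["customerID"].is_unique

# All rows whose ID occurs more than once (keep=False marks every copy)
dupes = toy[toy["customerID"].duplicated(keep=False)]

print(ids_unique)
print(dupes)
```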

In [562]:
# Looking at info of each column such as column name, number of rows, number of non_null values and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


- No column contains null values according to info(); note that this check does not flag blank strings.
- TotalCharges is a numeric variable, but it is stored as type 'object' in our dataset, so we need to convert it to float.
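
Because a non-null count says nothing about blank strings, it can help to count whitespace-only cells in the text columns explicitly. This is a minimal sketch on a toy frame (the values are invented), not a step from the original notebook.

```python
import pandas as pd

# Toy frame: one TotalCharges cell holds a single space instead of a number
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# For each text column, count cells that are empty after stripping whitespace
blank_counts = toy[["gender", "TotalCharges"]].apply(
    lambda col: col.str.strip().eq("").sum()
)

print(blank_counts)
```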

In [563]:
# Description of some numerical variables
df.describe().round(3)

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges
count,7043.0,7043.0,7043.0
mean,0.162,32.371,64.762
std,0.369,24.559,30.09
min,0.0,0.0,18.25
25%,0.0,9.0,35.5
50%,0.0,29.0,70.35
75%,0.0,55.0,89.85
max,1.0,72.0,118.75


- 16% of customers are senior citizens.
- Customers have stayed 32 months on average, which is more than 2.5 years.
- Nothing fishy about our data...

In [564]:
# Description of categorical variables
df.describe(include = 'object').T

Unnamed: 0,count,unique,top,freq
customerID,7043,7043,7590-VHVEG,1
gender,7043,2,Male,3555
Partner,7043,2,No,3641
Dependents,7043,2,No,4933
PhoneService,7043,2,Yes,6361
MultipleLines,7043,3,No,3390
InternetService,7043,3,Fiber optic,3096
OnlineSecurity,7043,3,No,3498
OnlineBackup,7043,3,No,3088
DeviceProtection,7043,3,No,3095


top is the mode (the most frequent value) and freq is the number of times the mode occurs.

# 3. Data Wrangling

### Changing data type

In [565]:
# Converting TotalCharges variable to numeric
# df['TotalCharges'] = pd.to_numeric(df['TotalCharges'])
# We get an error: we have empty strings in this column.

In [566]:
# We have 11 rows where TotalCharges is a blank string (a single space, not an empty string)
df.loc[df['TotalCharges'] == ' ', :]

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
488,4472-LVYGI,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,,No
753,3115-CZMZD,Male,0,No,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,,No
936,5709-LVOEQ,Female,0,Yes,Yes,0,Yes,No,DSL,Yes,...,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,,No
1082,4367-NUYAO,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,,No
1340,1371-DWPAZ,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,,No
3331,7644-OMVMY,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,,No
3826,3213-VVOLG,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.35,,No
4380,2520-SGTTA,Female,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,,No
5218,2923-ARZLG,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,,No
6670,4075-WKNIU,Female,0,Yes,Yes,0,Yes,Yes,DSL,No,...,Yes,Yes,Yes,No,Two year,No,Mailed check,73.35,,No


In [567]:
# Finding a relation between monthly and total charges with tenure
df[['tenure', 'MonthlyCharges', 'TotalCharges']].head()

Unnamed: 0,tenure,MonthlyCharges,TotalCharges
0,1,29.85,29.85
1,34,56.95,1889.5
2,2,53.85,108.15
3,45,42.3,1840.75
4,2,70.7,151.65


It seems that ***TotalCharges ≈ tenure * MonthlyCharges***, but the relation is not exact: there could be discounts or extra costs for late payments.

All of the customers with a blank **TotalCharges** value have a **tenure** of 0, meaning they have not yet stayed with the company for a full month. Based on the relation above, we can replace their **TotalCharges** values with 0. These two variables are positively correlated.
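
To see how rough the approximation is, one can compare the billed total with the naive estimate on the five rows printed by the head() call above. The column names `estimate` and `gap_pct` are introduced here for illustration only.

```python
import pandas as pd

# The five rows shown in the head() output above
sample = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "TotalCharges": [29.85, 1889.5, 108.15, 1840.75, 151.65],
})

# Naive estimate: months stayed times the current monthly charge
sample["estimate"] = sample["tenure"] * sample["MonthlyCharges"]

# Relative gap between the billed total and the estimate, in percent
sample["gap_pct"] = (
    (sample["TotalCharges"] - sample["estimate"]) / sample["TotalCharges"] * 100
).round(2)

print(sample)
```

On these rows the gap stays within a few percent in either direction, which is consistent with the estimate being close but not exact.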

In [568]:
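# Replacing the blank strings with 0. The result is assigned back to the column,
# because calling replace(..., inplace=True) on a column selection raises a warning in recent pandas versions
df['TotalCharges'] = df['TotalCharges'].replace(' ', 0)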

In [569]:
# Now we can convert the column to float
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'])
df['TotalCharges'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 7043 entries, 0 to 7042
Series name: TotalCharges
Non-Null Count  Dtype  
--------------  -----  
7043 non-null   float64
dtypes: float64(1)
memory usage: 55.1 KB
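
An alternative to replacing the blank strings by hand is to let `pd.to_numeric` turn every unparseable value into NaN and fill the gaps afterwards. The sketch below uses a toy Series rather than the real column; filling with 0 is the same assumption made above for customers with a tenure of 0.

```python
import pandas as pd

# Toy version of the column: the middle value is a single space
raw = pd.Series(["29.85", " ", "1889.5"], name="TotalCharges")

# errors="coerce" converts values that cannot be parsed into NaN
as_float = pd.to_numeric(raw, errors="coerce")

# How many values could not be parsed
n_missing = int(as_float.isna().sum())

# Fill the gaps with 0, as done for the zero-tenure customers
filled = as_float.fillna(0.0)

print(n_missing)
print(filled)
```

This route also makes it easy to inspect how many values were affected before deciding how to fill them.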


In [570]:
# Confirming the correlation
df[['tenure', 'MonthlyCharges','TotalCharges']].corr()

Unnamed: 0,tenure,MonthlyCharges,TotalCharges
tenure,1.0,0.2479,0.826178
MonthlyCharges,0.2479,1.0,0.651174
TotalCharges,0.826178,0.651174,1.0


### Binning

We can bin 3 numerical variables in our data:
1. tenure
2. MonthlyCharges
3. TotalCharges

In [571]:
# Summary of our variables:
df[['tenure','MonthlyCharges','TotalCharges']].agg(['mean','min','max','count']).round(2)


Unnamed: 0,tenure,MonthlyCharges,TotalCharges
mean,32.37,64.76,2279.73
min,0.0,18.25,0.0
max,72.0,118.75,8684.8
count,7043.0,7043.0,7043.0


In [572]:
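# Binning will make our analysis easier
# Note: pd.cut intervals are right-closed and exclude the lowest edge by default,
# so rows with tenure == 0 or TotalCharges == 0 fall outside the first bin (NaN).
# Passing include_lowest=True would keep them in the first bin.
bins_tenure = [0,32,64,96]
bins_monthly = [0,20,64,128]
bins_total = [0,2000,4000,9000]
# The label '64-119' actually covers the interval (64, 96]; the maximum tenure is 72
df['tenure_bin'] = pd.cut(df['tenure'], bins=bins_tenure, labels=['00-32', '32-64', '64-119'])
df['MonthlyCharges_bin'] = pd.cut(df['MonthlyCharges'], bins=bins_monthly, labels=['Low', 'Medium', 'High'])
df['TotalCharges_bin'] = pd.cut(df['TotalCharges'], bins=bins_total, labels=['Low', 'Medium', 'High'])
df[['tenure_bin','MonthlyCharges_bin', 'TotalCharges_bin']].head()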

Unnamed: 0,tenure_bin,MonthlyCharges_bin,TotalCharges_bin
0,00-32,Medium,Low
1,32-64,Medium,Low
2,00-32,Medium,Low
3,32-64,Medium,Low
4,00-32,High,Low
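
One detail of `pd.cut` is worth keeping in mind here: by default the bins are right-closed and the lowest edge is excluded, so a value equal to the first bin edge is not assigned to any bin. The sketch below illustrates this on a few toy tenure values; the label '64-96' is used here only to match the bin edges.

```python
import pandas as pd

tenure = pd.Series([0, 1, 32, 33, 72])
bins = [0, 32, 64, 96]
labels = ["00-32", "32-64", "64-96"]

# Default behaviour: intervals are (0, 32], (32, 64], (64, 96], so 0 becomes NaN
default_cut = pd.cut(tenure, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is kept
inclusive_cut = pd.cut(tenure, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"tenure": tenure, "default": default_cut, "inclusive": inclusive_cut}))
```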


# 4. Data Analysis

## What are the factors (variables) affecting customer churn?

In [573]:
# Checking all variable names
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn',
       'tenure_bin', 'MonthlyCharges_bin', 'TotalCharges_bin'],
      dtype='object')

### 1 Is customer churn gender neutral?

In [574]:
df[['gender', 'Churn']].groupby('gender').value_counts().reset_index()

Unnamed: 0,gender,Churn,count
0,Female,No,2549
1,Female,Yes,939
2,Male,No,2625
3,Male,Yes,930


Churn looks gender neutral: about 27% of female customers (939 of 3,488) and about 26% of male customers (930 of 3,555) left the company.
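
The raw counts are easier to compare once they are turned into a churn rate per group. A small sketch, using only the four counts from the table above:

```python
import pandas as pd

# Counts copied from the gender/Churn table above
counts = pd.DataFrame(
    {"No": [2549, 2625], "Yes": [939, 930]},
    index=pd.Index(["Female", "Male"], name="gender"),
)

# Share of churned customers within each gender, in percent
churn_rate = counts["Yes"].div(counts.sum(axis=1)).mul(100).round(2)

print(churn_rate)
```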

### 2 Do senior citizens churn more often than others?

In [599]:
# Percentage of senior customers in this dataset
df['SeniorCitizen'].mean().round(2)*100

16.0

In [575]:
pd.DataFrame(df[['Churn','SeniorCitizen']].groupby('SeniorCitizen').value_counts(normalize = True).mul(100)).rename(columns = {'proportion':'percentage'}).round(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,percentage
SeniorCitizen,Churn,Unnamed: 2_level_1
0,No,76.39
0,Yes,23.61
1,No,58.32
1,Yes,41.68


A higher percentage of senior citizens churned (41.7%) compared to non-senior citizens (23.6%). This suggests that being a senior citizen is associated with churning, but further analysis is needed to understand the reasons behind this pattern.

### 3 Does having a partner affect a customer's behavior? What about dependents?

In [576]:
# Partner
pd.DataFrame(df[['Churn','Partner']].groupby('Partner').value_counts(normalize = True).mul(100).round(2)).rename(columns = {'proportion':'percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,percentage
Partner,Churn,Unnamed: 2_level_1
No,No,67.04
No,Yes,32.96
Yes,No,80.34
Yes,Yes,19.66


- 32.96% of customers without a partner left the company, compared with 19.66% of customers with a partner.
- This suggests that customers with partners are less likely to churn than those without.

In [577]:
# Dependents
pd.DataFrame(df[['Churn','Dependents']].groupby('Dependents').value_counts(normalize = True).mul(100).round(2)).rename(columns = {'proportion':'percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,percentage
Dependents,Churn,Unnamed: 2_level_1
No,No,68.72
No,Yes,31.28
Yes,No,84.55
Yes,Yes,15.45


- Customers without dependents (children, elderly relatives...) are about twice as likely to churn (31.28%) as customers with dependents (15.45%).

### 4 How does customer tenure affect churn?

In [578]:
pd.DataFrame(df[['tenure_bin','Churn']].groupby('tenure_bin').value_counts(normalize = True)).mul(100).round(2).rename(columns = {'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
tenure_bin,Churn,Unnamed: 2_level_1
00-32,No,61.21
00-32,Yes,38.79
32-64,No,84.39
32-64,Yes,15.61
64-119,No,93.51
64-119,Yes,6.49


- Customers who have stayed with the company for up to 32 months are more than twice as likely to churn (38.79%) as those who stayed between 32 and 64 months (15.61%), and about six times as likely as those who stayed for more than 64 months (6.49%).
- The longer a customer stays with the company, the less likely they are to leave; 'tenure' and 'Churn' are negatively correlated.
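
The direction of this relationship can also be checked numerically by encoding Churn as 0/1 and computing its Pearson correlation with tenure. The sketch below runs on an invented toy frame, so the resulting value says nothing about the real dataset; only the technique carries over.

```python
import pandas as pd

# Invented toy data: short-tenure customers churn more often
toy = pd.DataFrame({
    "tenure": [1, 2, 5, 10, 30, 45, 60, 72],
    "Churn": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
})

# Encode the target as 1 (churned) / 0 (stayed)
churn_flag = toy["Churn"].map({"Yes": 1, "No": 0})

# Pearson correlation between tenure and the 0/1 churn flag
r = toy["tenure"].corr(churn_flag)

print(round(r, 3))
```

A negative value of `r` means that longer tenure goes together with less churn.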


#### Total charges are strongly correlated with tenure, so it makes sense that lower total charges are associated with a higher churn rate

In [579]:
pd.DataFrame(df[['TotalCharges_bin','Churn']].groupby('TotalCharges_bin').value_counts(normalize = True)).mul(100).round(2).rename(columns = {'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
TotalCharges_bin,Churn,Unnamed: 2_level_1
Low,No,67.94
Low,Yes,32.06
Medium,No,76.24
Medium,Yes,23.76
High,No,85.25
High,Yes,14.75


### 5 Do 'PhoneService' and 'MultipleLines' play a significant role in customer churn?

In [580]:
pd.DataFrame(df[['PhoneService', 'MultipleLines', 'Churn']].groupby('PhoneService').value_counts(normalize=True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Percentage
PhoneService,MultipleLines,Churn,Unnamed: 3_level_1
No,No phone service,No,75.073314
No,No phone service,Yes,24.926686
Yes,No,No,39.946549
Yes,Yes,No,33.343814
Yes,Yes,Yes,13.362679
Yes,No,Yes,13.346958


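Having phone service or multiple lines does not seem to significantly affect the churn rate: about 25% of customers without phone service churned, versus about 27% (13.36% + 13.35%) of customers with it.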
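
The table above mixes MultipleLines and Churn inside each PhoneService group, which makes the churn rate per column hard to read off. One option is a small helper that returns the churn percentage for each level of a single column. The helper name `churn_rate_by` and the toy data are assumptions made for this sketch.

```python
import pandas as pd

def churn_rate_by(frame, column):
    """Percentage of churned customers within each level of `column`."""
    churned = frame["Churn"].eq("Yes")
    return churned.groupby(frame[column]).mean().mul(100).round(2)

# Invented toy data with the same column names as the dataset
toy = pd.DataFrame({
    "PhoneService": ["Yes", "Yes", "Yes", "Yes", "No", "No"],
    "MultipleLines": ["Yes", "Yes", "No", "No", "No phone service", "No phone service"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

by_phone = churn_rate_by(toy, "PhoneService")
by_lines = churn_rate_by(toy, "MultipleLines")

print(by_phone)
print(by_lines)
```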

### 6 Does subscribing to an Internet service affect churn? What about other related services?

In [581]:
pd.DataFrame(df[['InternetService', 'Churn']].groupby('InternetService').value_counts(normalize=True)).mul(100).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
InternetService,Churn,Unnamed: 2_level_1
DSL,No,81.040892
DSL,Yes,18.959108
Fiber optic,No,58.107235
Fiber optic,Yes,41.892765
No,No,92.59502
No,Yes,7.40498


- Most customers who have not subscribed to Internet service didn't churn (only 7.4% left).
- Of those who do have Internet service, almost 42% of Fiber optic customers churned. That's approximately twice the percentage of DSL customers who churned (19%).


In [582]:
# Why would customers with a Fiber optic provider churn?
df[['InternetService' ,'MonthlyCharges']].groupby('InternetService').mean().round(2).reset_index()

Unnamed: 0,InternetService,MonthlyCharges
0,DSL,58.1
1,Fiber optic,91.5
2,No,21.08


It appears that the high monthly cost of Fiber optic Internet service could be why some customers who subscribed are churning.

#### Online Security

In [583]:
pd.DataFrame(df[['OnlineSecurity','Churn']].groupby('OnlineSecurity').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
OnlineSecurity,Churn,Unnamed: 2_level_1
No,No,58.233276
No,Yes,41.766724
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,85.388806
Yes,Yes,14.611194


- 41.77% of customers with no online security left the company while only 14.61% left among customers who subscribed to the online security service.


#### Online Backup

In [584]:
pd.DataFrame(df[['OnlineBackup','Churn']].groupby('OnlineBackup').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
OnlineBackup,Churn,Unnamed: 2_level_1
No,No,60.071244
No,Yes,39.928756
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,78.468506
Yes,Yes,21.531494


#### Device Protection

In [585]:
pd.DataFrame(df[['DeviceProtection','Churn']].groupby('DeviceProtection').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
DeviceProtection,Churn,Unnamed: 2_level_1
No,No,60.872375
No,Yes,39.127625
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,77.497936
Yes,Yes,22.502064


#### Tech Support

In [586]:
pd.DataFrame(df[['TechSupport','Churn']].groupby('TechSupport').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
TechSupport,Churn,Unnamed: 2_level_1
No,No,58.364526
No,Yes,41.635474
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,84.833659
Yes,Yes,15.166341


#### Streaming TV

In [587]:
pd.DataFrame(df[['StreamingTV','Churn']].groupby('StreamingTV').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
StreamingTV,Churn,Unnamed: 2_level_1
No,No,66.476868
No,Yes,33.523132
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,69.929812
Yes,Yes,30.070188


#### Streaming Movies

In [588]:
pd.DataFrame(df[['StreamingMovies','Churn']].groupby('StreamingMovies').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
StreamingMovies,Churn,Unnamed: 2_level_1
No,No,66.319569
No,Yes,33.680431
No internet service,No,92.59502
No internet service,Yes,7.40498
Yes,No,70.058565
Yes,Yes,29.941435


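- Customers who use Online Security, Online Backup, Device Protection, and Tech Support services are less likely to churn (roughly 15% to 23% of subscribers churned, versus 39% to 42% of internet customers without these services); we treat this as a moderate influence on churn rate.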
- Streaming TV and Streaming Movies services do not have much of an influence on customer churn.
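
Since the same breakdown is repeated for every service column, the churn rates can also be collected into a single table with a loop. This is a sketch on invented toy data with only two of the service columns, to keep it short.

```python
import pandas as pd

# Invented toy data with two of the service columns
toy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "No", "No", "No internet service", "Yes", "No"],
    "TechSupport": ["No", "No", "Yes", "No internet service", "Yes", "No"],
    "Churn": ["No", "Yes", "Yes", "No", "No", "No"],
})

service_cols = ["OnlineSecurity", "TechSupport"]
churned = toy["Churn"].eq("Yes")

# One column per service, one row per answer; each cell is the churn rate in percent
summary = pd.DataFrame(
    {col: churned.groupby(toy[col]).mean().mul(100) for col in service_cols}
).round(2)

print(summary)
```

Placing the services side by side makes it easier to see which ones separate churners from non-churners the most.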

### 7 Are contract types related to customer churn? Which type influences churn rate the most?

In [590]:
pd.DataFrame(df[['Contract','Churn']].groupby('Contract').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
Contract,Churn,Unnamed: 2_level_1
Month-to-month,No,57.290323
Month-to-month,Yes,42.709677
One year,No,88.730482
One year,Yes,11.269518
Two year,No,97.168142
Two year,Yes,2.831858


- Most customers who signed one-year and two-year contracts didn't leave the company.
- 42.7% of customers on a month-to-month contract churned, while only 11.3% of one-year and 2.8% of two-year customers left.
- Let's look closer at the month-to-month contracts.

In [591]:
pd.DataFrame(df[['Contract','Churn']].groupby('Churn').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
Churn,Contract,Unnamed: 2_level_1
No,Month-to-month,42.906842
No,Two year,31.832238
No,One year,25.26092
Yes,Month-to-month,88.550027
Yes,One year,8.881755
Yes,Two year,2.568218


- Notably, 88.55% of the customers who churned were on month-to-month contracts.
- The contract type does indeed play a significant role in customer churn.
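
The two contract tables answer different questions: grouping by Contract gives the churn rate within each contract type, while grouping by Churn gives the contract mix among churners and non-churners. With `pd.crosstab` the same distinction is made through the `normalize` argument. The toy data below is invented for illustration.

```python
import pandas as pd

# Invented toy data: four customers per contract type
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Each row sums to 100: churn rate within each contract type
within_contract = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index").mul(100)

# Each column sums to 100: contract mix within churners / non-churners
within_churn = pd.crosstab(toy["Contract"], toy["Churn"], normalize="columns").mul(100)

print(within_contract.round(2))
print(within_churn.round(2))
```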

### 8 Do customers with paperless billing churn more often?

In [592]:
pd.DataFrame(df[['PaperlessBilling', 'Churn']].groupby('PaperlessBilling').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
PaperlessBilling,Churn,Unnamed: 2_level_1
No,No,83.669916
No,Yes,16.330084
Yes,No,66.434908
Yes,Yes,33.565092


- The majority of customers did not churn whether or not they used paperless billing, but customers with paperless billing were about twice as likely to leave (33.57%) as those without it (16.33%).

### 9 Do certain payment methods increase churn? Which payment method makes customers churn more often?

In [593]:
pd.DataFrame(df[['PaymentMethod', 'Churn']].groupby('PaymentMethod').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
PaymentMethod,Churn,Unnamed: 2_level_1
Bank transfer (automatic),No,83.290155
Bank transfer (automatic),Yes,16.709845
Credit card (automatic),No,84.756899
Credit card (automatic),Yes,15.243101
Electronic check,No,54.714588
Electronic check,Yes,45.285412
Mailed check,No,80.8933
Mailed check,Yes,19.1067


Customers who pay by electronic check are far more likely to leave the company (45.3%) than those who use the other payment methods (15% to 19%).

### 10 How does the amount of monthly charges affect customer churn?

In [594]:
pd.DataFrame(df[['MonthlyCharges_bin', 'Churn']].groupby('MonthlyCharges_bin').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
MonthlyCharges_bin,Churn,Unnamed: 2_level_1
Low,No,91.158537
Low,Yes,8.841463
Medium,No,81.294069
Medium,Yes,18.705931
High,No,65.794769
High,Yes,34.205231


The higher the monthly charges were, the greater the percentage of customers who left the company.

In [595]:
pd.DataFrame(df[['MonthlyCharges_bin', 'Churn']].groupby('Churn').value_counts(normalize = True).mul(100)).rename(columns={'proportion':'Percentage'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
Churn,MonthlyCharges_bin,Unnamed: 2_level_1
No,High,50.560495
No,Medium,37.881716
No,Low,11.557789
Yes,High,72.766185
Yes,Medium,24.130551
Yes,Low,3.103264


72.77% of the customers who churned were in the high monthly charges bin. Monthly charges significantly influence the customer churn rate.

# 5. Conclusion

- Customer tenure, contract type, and monthly charges are the most influential variables for customer churn rate.
- Having phone service and multiple lines, as well as subscribing to streaming TV or movie services, does not affect the churn rate much.
- Having internet service and other related services like online security, online backup, device protection and tech support may influence churn rate, but not greatly; they are moderately correlated with customer churn. The same holds for paperless billing and payment method.