# Telecom Customer Churn Case Study

You have been provided with a dataset related to telecom customer churn. Each row in the dataset represents a unique customer, and the columns contain various attributes and information about these customers.

The dataset includes information about:
- Churn Column: Indicates customer churn within the last month.
- Services Info: Subscribed services like phone, internet, etc.
- Account Details: Tenure, contract, billing, charges.
- Demographics: Gender, age, and family status.


## Load the dataset in a dataframe

In [1]:
#import necessary libraries
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd


In [4]:
#1. import the provided dataset to dataframe (telecom_customer_churn.csv)
df = pd.read_csv('D:\\CodeBasics\\Maths__Stats_Exercises\\MathsnStats_Exercise1\\MathsnStats_Exercise1\\telecom_customer_churn.csv')

#2. change the settings to display all the columns
pd.set_option('display.max_columns', None)
#3. check the top 5 rows (shown only when this is the last expression in a cell)
df.head(5)
#4. check the number of rows and columns
print('this is the shape: ', df.shape)




this is the shape:  (7043, 21)


In [11]:
#display all the column names
df.columns

Index(['customer_id', 'gender', 'senior_citizen', 'partner', 'dependents',
       'tenure', 'phone_service', 'multiple_lines', 'internet_service',
       'online_security', 'online_backup', 'device_protection', 'tech_support',
       'streaming_tv', 'streaming_movies', 'contract', 'paperless_billing',
       'payment_method', 'monthly_charges', 'total_charges', 'churn'],
      dtype='object')

In [16]:
# Check if the dataset contains nulls
df.isnull().sum()

customer_id          0
gender               0
senior_citizen       0
partner              0
dependents           0
tenure               0
phone_service        0
multiple_lines       0
internet_service     0
online_security      0
online_backup        0
device_protection    0
tech_support         0
streaming_tv         0
streaming_movies     0
contract             0
paperless_billing    0
payment_method       0
monthly_charges      0
total_charges        0
churn                0
dtype: int64

In [19]:
#check the datatype of all columns
df.dtypes

customer_id           object
gender                object
senior_citizen         int64
partner               object
dependents            object
tenure                 int64
phone_service         object
multiple_lines        object
internet_service      object
online_security       object
online_backup         object
device_protection     object
tech_support          object
streaming_tv          object
streaming_movies      object
contract              object
paperless_billing     object
payment_method        object
monthly_charges      float64
total_charges         object
churn                 object
dtype: object

In [64]:
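# Fix the datatype
#convert the datatype of 'monthly_charges', 'total_charges', 'tenure' to numeric datatype (pd.to_numeric)
#pd.to_numeric returns a new Series, so assign the result back to the column
df['monthly_charges'] = pd.to_numeric(df['monthly_charges'], errors='coerce')
df['total_charges'] = pd.to_numeric(df['total_charges'], errors='coerce')
df['tenure'] = pd.to_numeric(df['tenure'], errors='coerce')
df['tenure']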

0        1
1       34
2        2
3       45
4        2
        ..
7038    24
7039    72
7040    11
7041     4
7042    66
Name: tenure, Length: 7043, dtype: int64
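
`pd.to_numeric` returns a new Series instead of changing the column in place, and with `errors='coerce'` any value it cannot parse becomes NaN. The sketch below uses a small made-up frame (the `toy` name and its values are assumptions, not rows from the dataset) to show how a text column containing a blank string can pass a null check and still fail to be numeric. This may be why `total_charges` is reported as `object` with zero nulls, although the dataset itself was not inspected here.

```python
import pandas as pd

# Illustrative values only, not rows from telecom_customer_churn.csv
toy = pd.DataFrame({"total_charges": ["29.85", "1889.5", " ", "108.15"]})

# A blank string is not NaN, so the null check reports nothing missing
nulls_before = int(toy["total_charges"].isnull().sum())

# to_numeric returns a new Series; assign it back to keep the conversion
toy["total_charges"] = pd.to_numeric(toy["total_charges"], errors="coerce")
nulls_after = int(toy["total_charges"].isnull().sum())

print(toy.dtypes)
print(f"nulls before: {nulls_before}, nulls after: {nulls_after}")
```

Re-running the null check after the conversion is a cheap way to see how many values were coerced.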

Q1 - Calculate the mean, median, and mode of the monthly_charges column

In [8]:
print('mean: ',df['monthly_charges'].mean())
print('median: ',df['monthly_charges'].median())
print('mode: ',df['monthly_charges'].mode())
print('std dev: ', df['monthly_charges'].std())

mean:  64.76169246059918
median:  70.35
mode:  0    20.05
Name: monthly_charges, dtype: float64
std dev:  30.090047097678493


Q2 - Calculate the 25th, 50th, and 75th percentiles of the total_charges column
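
One possible way to approach Q2 is to pass a list of quantile levels to `Series.quantile`, which returns all three percentiles in one call. The sketch below is a minimal example on made-up values (the `toy_total_charges` name and its contents are assumptions); on the real data the same two calls would be applied to `df['total_charges']`.

```python
import pandas as pd

# Illustrative values only, not rows from telecom_customer_churn.csv
toy_total_charges = pd.Series(["10", "20", "30", "40", "50"])

# Convert first, in case the column is stored as text
numeric_charges = pd.to_numeric(toy_total_charges, errors="coerce")

# One call returns all three percentiles, indexed by the quantile level
percentiles = numeric_charges.quantile([0.25, 0.5, 0.75])
print(percentiles)
```

`quantile` skips NaN values, so any entries coerced during the conversion do not affect the result.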

Q3 - Calculate the range of the monthly_charges column.

Hint - Range is the difference between max and min of monthly_charges.

In [74]:
monthly_charges_min = df['monthly_charges'].min()
monthly_charges_max = df['monthly_charges'].max()
monthly_charges_range = monthly_charges_max - monthly_charges_min
print(f'this is the range: {monthly_charges_range}')

this is the range: 100.5


Q4 - What is the first quartile of the monthly_charges column for customers who have not churned?

In [86]:
non_churn_df = df[df['churn'] == 'No']
#select the column first so the quantile is computed for monthly_charges only
print(non_churn_df['monthly_charges'].quantile(0.25))

25.1

Q5 - What is the third quartile of the total_charges column for customers who have churned?

In [87]:
churn_df = df[df['churn'] == 'Yes']
#third quartile = 0.75 quantile; total_charges is converted here in case it is still stored as text
print(pd.to_numeric(churn_df['total_charges'], errors='coerce').quantile(0.75))

Q6 - What is the mode of the payment_method column for customers who have churned?

In [91]:
method_of_churned = churn_df[['payment_method']]
method_of_churned

| | payment_method |
|---|---|
| 2 | Mailed check |
| 4 | Electronic check |
| 5 | Electronic check |
| 8 | Electronic check |
| 13 | Bank transfer (automatic) |
| ... | ... |
| 7021 | Electronic check |
| 7026 | Bank transfer (automatic) |
| 7032 | Electronic check |
| 7034 | Credit card (automatic) |
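
Listing the column shows the values but not which one occurs most often. A minimal sketch of the remaining step, on a made-up frame (`toy_churned` and its rows are assumptions, not the churned customers from the dataset), could use `Series.mode` together with `value_counts`:

```python
import pandas as pd

# Illustrative values only, not rows from telecom_customer_churn.csv
toy_churned = pd.DataFrame({
    "payment_method": [
        "Mailed check", "Electronic check", "Electronic check",
        "Bank transfer (automatic)", "Electronic check",
    ]
})

# mode() returns a Series because several values can tie for most frequent
modes = toy_churned["payment_method"].mode()
most_common = modes.iloc[0]

# value_counts() shows how far ahead the most frequent value is
counts = toy_churned["payment_method"].value_counts()
print(most_common)
print(counts)
```

On the real data the same calls would be applied to `churn_df['payment_method']`.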


Q7 - What is the mean of the total charges column for customers who have churned and have a month-to-month contract?

In [129]:
# Filter the rows based on the churn status and contract type
#build the mask from churn_df itself so its index matches the frame being filtered
mon_to_mon_cont = churn_df[churn_df['contract']=='Month-to-month']
# Calculate the mean of the total charges column
sum_of_mon_to_mon_cont = mon_to_mon_cont['total_charges'].astype(float).sum()
rows = mon_to_mon_cont.shape[0]
mean = sum_of_mon_to_mon_cont/rows
# Print the result
print(sum_of_mon_to_mon_cont)
print(f'this is mean: {mean}')

1927182.25
this is mean: 1164.4605740181269


Q8 - What is the median of the tenure column for customers who have not churned and have a two-year contract?

In [136]:
# Filter the rows based on the churn status and contract type
two_year_cont = non_churn_df[non_churn_df['contract']=='Two year']
# Calculate the median of the tenure column
median = two_year_cont['tenure'].median()
# Print the result
print(f'this is median {median}')

this is median 64.0
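
Filters on churn status and contract type, as in Q7 and Q8, can also be written as a single combined mask on one frame, which keeps the mask and the frame on the same index. The sketch below is one way to do that on made-up rows (`toy` and its values are assumptions, not data from the dataset):

```python
import pandas as pd

# Illustrative values only, not rows from telecom_customer_churn.csv
toy = pd.DataFrame({
    "churn": ["No", "No", "Yes", "No", "Yes"],
    "contract": ["Two year", "Two year", "Month-to-month", "One year", "Two year"],
    "tenure": [60, 70, 2, 12, 48],
})

# Both conditions come from the same frame, so the mask aligns with its index
mask = (toy["churn"] == "No") & (toy["contract"] == "Two year")
median_tenure = toy.loc[mask, "tenure"].median()
print(f"this is median {median_tenure}")
```

Each condition needs its own parentheses because `&` binds more tightly than `==` in Python.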
