# Telco Customer Churn

## Content
Each row represents a customer; each column holds one of that customer’s attributes, as described in the column metadata.

## The data set includes information about:

- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers – gender, age range, and whether they have partners and dependents

Reference: https://www.kaggle.com/datasets/blastchar/telco-customer-churn

In [1]:
# Dependencies
import pandas as pd
from pathlib import Path


In [2]:
# Save file path to variable
file = Path("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# Read with Pandas
Churn_df = pd.read_csv(file)
Churn_df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [3]:
# Get the column names. 
Churn_df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [4]:
# Drop the SeniorCitizen, MultipleLines, OnlineSecurity, OnlineBackup,
# DeviceProtection, TechSupport, StreamingTV and StreamingMovies columns
churn_df_new = Churn_df.drop(columns = ['SeniorCitizen','MultipleLines',
                   'OnlineSecurity','OnlineBackup',
                   'DeviceProtection','TechSupport',
                   'StreamingTV','StreamingMovies'])

churn_df_new.head()

Unnamed: 0,customerID,gender,Partner,Dependents,tenure,PhoneService,InternetService,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,Yes,No,1,No,DSL,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,No,No,34,Yes,DSL,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,No,No,2,Yes,DSL,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,No,No,45,No,DSL,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,No,No,2,Yes,Fiber optic,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [5]:
# Get the column names after dropping the columns.
# drop() returned a new DataFrame, so inspect churn_df_new rather than Churn_df.
churn_df_new.columns

Index(['customerID', 'gender', 'Partner', 'Dependents', 'tenure',
       'PhoneService', 'InternetService', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
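
`DataFrame.drop` returns a new DataFrame and leaves the one it was called on unchanged, so the trimmed column set lives on `churn_df_new` while `Churn_df` keeps all 21 columns. A minimal sketch of that behaviour, using a made-up `toy_df` in place of the real data:

```python
import pandas as pd

# Toy frame standing in for Churn_df (values are invented for illustration)
toy_df = pd.DataFrame({
    "customerID": ["A-1", "B-2"],
    "SeniorCitizen": [0, 1],
    "tenure": [1, 34],
})

# drop() returns a new DataFrame; toy_df itself is left untouched
toy_trimmed = toy_df.drop(columns=["SeniorCitizen"])

print(list(toy_df.columns))       # original still includes SeniorCitizen
print(list(toy_trimmed.columns))  # trimmed result does not
```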

In [6]:
# Copy the trimmed DataFrame so later changes do not affect churn_df_new
customer_churn_df = churn_df_new.copy()
customer_churn_df.head()

Unnamed: 0,customerID,gender,Partner,Dependents,tenure,PhoneService,InternetService,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,Yes,No,1,No,DSL,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,No,No,34,Yes,DSL,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,No,No,2,Yes,DSL,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,No,No,45,No,DSL,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,No,No,2,Yes,Fiber optic,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [7]:
# Rename columns: customerID -> CustomerID, Partner -> LifePartner,
# gender -> Gender, tenure -> Tenure
customer_churn_df = customer_churn_df.rename(
    columns={"customerID": "CustomerID",
             "Partner": "LifePartner",
             "gender": "Gender",
             "tenure": "Tenure"})

# Show the first 10 rows of the DataFrame
customer_churn_df.head(10)

Unnamed: 0,CustomerID,Gender,LifePartner,Dependents,Tenure,PhoneService,InternetService,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,Yes,No,1,No,DSL,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,No,No,34,Yes,DSL,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,No,No,2,Yes,DSL,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,No,No,45,No,DSL,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,No,No,2,Yes,Fiber optic,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
5,9305-CDSKC,Female,No,No,8,Yes,Fiber optic,Month-to-month,Yes,Electronic check,99.65,820.5,Yes
6,1452-KIOVK,Male,No,Yes,22,Yes,Fiber optic,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No
7,6713-OKOMC,Female,No,No,10,No,DSL,Month-to-month,No,Mailed check,29.75,301.9,No
8,7892-POOKP,Female,Yes,No,28,Yes,Fiber optic,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes
9,6388-TABGU,Male,No,Yes,62,Yes,DSL,One year,No,Bank transfer (automatic),56.15,3487.95,No
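
By default `rename` silently skips any key that does not match an existing column, so a typo in the mapping goes unnoticed. The sketch below, on a made-up `toy_df`, shows the silent case and the stricter `errors="raise"` option; the misspelled key is deliberate.

```python
import pandas as pd

# Toy frame with two of the original column names (values are invented)
toy_df = pd.DataFrame({"customerID": ["A-1"], "gender": ["Female"]})

# Default behaviour: the misspelled key "Gender " is ignored without warning
silent = toy_df.rename(columns={"customerID": "CustomerID", "Gender ": "Gender"})
print(list(silent.columns))

# errors="raise" turns a missing source column into a KeyError
try:
    toy_df.rename(columns={"tenure": "Tenure"}, errors="raise")
    rename_failed = False
except KeyError as exc:
    rename_failed = True
    print("rename failed:", exc)
```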


In [8]:
# Show the number of rows and columns in the DataFrame
customer_churn_df.shape

(7043, 13)

In [9]:
# Check the data types.
customer_churn_df.dtypes

CustomerID           object
Gender               object
LifePartner          object
Dependents           object
Tenure                int64
PhoneService         object
InternetService      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
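
`TotalCharges` is read as `object` even though it holds amounts, which usually means at least one entry could not be parsed as a number (a blank string, for example). One way to handle this, shown here as a sketch on a made-up `toy_df` rather than as a step this notebook performs, is `pd.to_numeric` with `errors="coerce"`:

```python
import pandas as pd

# Toy column mimicking TotalCharges; the blank entry is an assumed example
toy_df = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " "]})

# errors="coerce" turns unparseable entries into NaN instead of raising
toy_df["TotalCharges"] = pd.to_numeric(toy_df["TotalCharges"], errors="coerce")

print(toy_df["TotalCharges"].dtype)
print(toy_df["TotalCharges"].isna().sum())
```

After the conversion, any coerced entries show up in `isnull()` counts and can be dropped or filled deliberately.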

In [10]:
# Print a summary of every column: non-null count and dtype
customer_churn_df.info(verbose=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   CustomerID        7043 non-null   object 
 1   Gender            7043 non-null   object 
 2   LifePartner       7043 non-null   object 
 3   Dependents        7043 non-null   object 
 4   Tenure            7043 non-null   int64  
 5   PhoneService      7043 non-null   object 
 6   InternetService   7043 non-null   object 
 7   Contract          7043 non-null   object 
 8   PaperlessBilling  7043 non-null   object 
 9   PaymentMethod     7043 non-null   object 
 10  MonthlyCharges    7043 non-null   float64
 11  TotalCharges      7043 non-null   object 
 12  Churn             7043 non-null   object 
dtypes: float64(1), int64(1), object(11)
memory usage: 715.4+ KB


In [11]:
# Count NaN/null values in each column of the DataFrame
null_counts = customer_churn_df.isnull().sum()
print(null_counts)

CustomerID          0
Gender              0
LifePartner         0
Dependents          0
Tenure              0
PhoneService        0
InternetService     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
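
A zero from `isnull()` only rules out true missing values (`NaN`/`None`); empty or whitespace-only strings are ordinary strings and are not counted. A hedged sketch of an extra check, on a made-up `toy_df` with one assumed blank entry:

```python
import pandas as pd

# Toy frame; the whitespace-only TotalCharges value is invented for illustration
toy_df = pd.DataFrame({
    "CustomerID": ["A-1", "B-2", "C-3"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# isnull() does not flag the blank string
null_total = int(toy_df.isnull().sum().sum())
print(null_total)

# Count entries per column that are empty once whitespace is stripped
blank_counts = toy_df.apply(lambda col: col.astype(str).str.strip().eq("").sum())
print(blank_counts)
```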


In [12]:
# Drop rows with null values.
# The previous cell already reported zero nulls, so this is a confirmation step:
# after dropna(), every column should still report the same row count (7043).
customer_churn_df = customer_churn_df.dropna()
customer_churn_df.count()

CustomerID          7043
Gender              7043
LifePartner         7043
Dependents          7043
Tenure              7043
PhoneService        7043
InternetService     7043
Contract            7043
PaperlessBilling    7043
PaymentMethod       7043
MonthlyCharges      7043
TotalCharges        7043
Churn               7043
dtype: int64

In [13]:
# Check whether the DataFrame has any fully duplicated rows.
# duplicated() flags each repeat of an earlier row; summing the flags counts them.
dup_rows = customer_churn_df.duplicated().sum()
print('Duplicated rows: ', dup_rows)

Duplicated rows:  0


In [14]:
# Check for duplicate CustomerID values; each customer should have a unique ID
dups_cust_id = customer_churn_df['CustomerID'].duplicated().sum()
print('Duplicated customer ids: ', dups_cust_id)

Duplicated customer ids:  0
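
The two duplicate checks answer different questions: one looks at whole rows, the other at the ID column only. The sketch below uses a made-up `toy_df` in which an ID repeats while the rows differ, and uses `keep=False` to list every row that shares an ID.

```python
import pandas as pd

# Toy frame: "B-2" appears twice, but the two rows are not identical
toy_df = pd.DataFrame({
    "CustomerID": ["A-1", "B-2", "B-2"],
    "Tenure": [1, 34, 35],
})

# Whole-row duplicates: none, because Tenure differs
full_row_dups = int(toy_df.duplicated().sum())

# ID-only duplicates: the second "B-2" is flagged
id_dups = int(toy_df.duplicated(subset=["CustomerID"]).sum())

# keep=False marks every member of a duplicated group, useful for inspection
repeated = toy_df[toy_df.duplicated(subset=["CustomerID"], keep=False)]

print(full_row_dups, id_dups)
print(repeated)
```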


In [15]:
# Export file as a CSV, without the Pandas index, but with the header
customer_churn_df.to_csv("Clean_Telco-Customer-Churn.csv", index=False, header=True)
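
A quick round-trip check can confirm that the export carries the header and no extra index column. This sketch writes a made-up `toy_df` to an in-memory buffer instead of a file, so it does not touch `Clean_Telco-Customer-Churn.csv`:

```python
import io

import pandas as pd

# Toy frame standing in for customer_churn_df (values are invented)
toy_df = pd.DataFrame({
    "CustomerID": ["A-1", "B-2"],
    "MonthlyCharges": [29.85, 56.95],
})

# Write with the same options as the export cell, then read it back
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False, header=True)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))
print(reloaded.shape == toy_df.shape)
```

With `index=False`, the reloaded frame has the same columns as the original and no `Unnamed: 0` column.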