<a href="https://colab.research.google.com/github/ss22aba/Customer-Churn-Prediction/blob/main/Customer_Churn_Prediction_and_Analysis_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **INTRODUCTION**

**Customer Churn**

Businesses invest heavily in attracting customers away from their competitors, so keeping the customers they already have is crucial: every customer who leaves directly reduces the company's revenue. Identifying customers who are likely to leave early gives a business time to act to retain them. This study aims to find the machine learning methods that best predict customer churn early. The data used in this study includes information about customers from about nine months before they might leave.

**Dataset:** https://www.kaggle.com/datasets/blastchar/telco-customer-churn


In [1]:
# Upgrading the version of scipy
!pip install --upgrade scipy

Collecting scipy
  Downloading scipy-1.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m38.6/38.6 MB[0m [31m9.2 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.11.4
    Uninstalling scipy-1.11.4:
      Successfully uninstalled scipy-1.11.4
Successfully installed scipy-1.13.0
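Upgrading a package with `pip` inside a running Colab kernel does not replace a module that has already been imported, so the runtime may need a restart before the new SciPy is actually used. A minimal sketch for checking which version is installed on disk, using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` recorded in its installed metadata, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Version that pip wrote to disk (not necessarily the one already imported)
print("scipy installed:", installed_version("scipy"))
```

If `scipy.__version__` in the running session still reports the old release after the upgrade, restart the runtime and re-run the notebook from the top.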


In [2]:
#-------- Importing Basic Modules --------#

# For mathematical computation
import math
# For interacting with the file system, environment variables, and the operating system
import os
# For managing memory (garbage collection) and optimizing performance
import gc
# For generating random numbers, sequences, and performing random selections
import random
# For pretty-printing data structures
import pprint
# For linear algebra and basic numerical and statistical operations
import numpy as np
# For data analysis and manipulation
import pandas as pd
# For creating visualizations in Python
import matplotlib.pyplot as plt
# For creating informative and visually appealing statistical graphics
import seaborn as sns



In [3]:
# Upload the dataset file into the Colab session
from google.colab import files
uploaded = files.upload()

Saving Telco-Customer-Churn.csv to Telco-Customer-Churn.csv
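`files.upload()` returns a dictionary mapping each uploaded filename to its contents as bytes, so the DataFrame can also be built directly from that dictionary. This helps if Colab saves a repeated upload under a modified name. A minimal sketch; the helper `frame_from_upload` and the one-row sample are hypothetical stand-ins so the code runs outside Colab:

```python
import io

import pandas as pd


def frame_from_upload(uploaded: dict, name: str) -> pd.DataFrame:
    """Parse one uploaded CSV file from a dict of {filename: bytes}."""
    # Wrap the raw bytes in a file-like object so read_csv can consume them
    return pd.read_csv(io.BytesIO(uploaded[name]))


# Hypothetical stand-in for the dict returned by google.colab.files.upload()
fake_upload = {"Telco-Customer-Churn.csv": b"customerID,tenure,Churn\n7590-VHVEG,1,No\n"}
df_sample = frame_from_upload(fake_upload, "Telco-Customer-Churn.csv")
print(df_sample.shape)  # (1, 3)
```

In the notebook itself, `next(iter(uploaded))` gives the name of the first uploaded file without having to guess it.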


In [4]:
# For reading the CSV file "Telco-Customer-Churn"
df_churn = pd.read_csv("Telco-Customer-Churn.csv")
# For displaying first few rows of dataframe
df_churn.head()

,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
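`head()` hides the middle columns behind `...` because pandas caps how many columns it prints. A minimal sketch that lifts that cap temporarily with `pd.option_context`, so the global display setting is restored afterwards; the 25-column toy frame is hypothetical:

```python
import pandas as pd

# Hypothetical wide frame with more columns than pandas prints by default
wide = pd.DataFrame([list(range(25))], columns=[f"col{i}" for i in range(25)])

# Show every column only inside this block; the previous setting returns on exit
with pd.option_context("display.max_columns", None):
    print(wide.head())
    inside = pd.get_option("display.max_columns")

print(inside)  # None
```

In the notebook, `print(wide.head())` would become `display(df_churn.head())`.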


In [5]:
# Display the dimensions of the dataset
print(f"Dataset Dimension: {df_churn.shape[0]} rows, {df_churn.shape[1]} columns")

Dataset Dimension: 7043 rows, 21 columns


In [6]:
# To get the summary of the dataframe
df_churn.info()

print("\nSeniorCitizen is already stored as an integer (int64)")
print("\nTotalCharges is stored as object and should be converted to float")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
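`TotalCharges` holds numbers but is read as `object`, which usually means at least one entry cannot be parsed as a number (for example a blank string). A minimal sketch of the conversion with `pd.to_numeric(errors="coerce")`, which turns unparseable entries into `NaN` so they can be counted and handled explicitly; the four-row sample is hypothetical:

```python
import pandas as pd

# Hypothetical sample: numeric text plus one blank entry
sample = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

# Anything that cannot be parsed as a number becomes NaN instead of raising
sample["TotalCharges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")

print(sample["TotalCharges"].dtype)         # float64
print(sample["TotalCharges"].isna().sum())  # 1
```

On the real data the same call would be `df_churn["TotalCharges"] = pd.to_numeric(df_churn["TotalCharges"], errors="coerce")`; whether the resulting `NaN` rows are dropped or imputed is a separate modelling decision.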


In [7]:
# Drop the customerID column: a unique identifier carries no predictive information
del df_churn["customerID"]
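Dropping `customerID` is reasonable because an identifier that is unique per row carries no pattern a model can generalise from. A minimal sketch that checks that assumption before dropping, using the non-mutating `DataFrame.drop(columns=...)` instead of `del`; the three-row frame is hypothetical:

```python
import pandas as pd

# Hypothetical three-customer sample with an identifier column
customers = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "tenure": [1, 34, 2],
})

# An ID worth dropping should identify each row uniquely
assert customers["customerID"].is_unique

# drop() returns a new frame and leaves `customers` untouched,
# so re-running this cell does not raise a KeyError the way a second `del` would
features = customers.drop(columns=["customerID"])
print(list(features.columns))  # ['tenure']
```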

In [8]:
# Transposed summary of the categorical (object) columns in df_churn
df_churn.describe(include=['object']).T

,count,unique,top,freq
gender,7043,2,Male,3555
Partner,7043,2,No,3641
Dependents,7043,2,No,4933
PhoneService,7043,2,Yes,6361
MultipleLines,7043,3,No,3390
InternetService,7043,3,Fiber optic,3096
OnlineSecurity,7043,3,No,3498
OnlineBackup,7043,3,No,3088
DeviceProtection,7043,3,No,3095
TechSupport,7043,3,No,3473
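`describe(include=['object'])` reports only the most frequent level of each categorical column. A minimal sketch that lists the full distribution of every non-numeric column with `value_counts(normalize=True)`; the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample with two categorical columns
sample = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "Churn": ["No", "No", "Yes", "No"],
})

# Share of each level, for every non-numeric column
shares = {
    col: sample[col].value_counts(normalize=True)
    for col in sample.select_dtypes(exclude="number").columns
}
for col, dist in shares.items():
    print(col, dist.round(2).to_dict(), sep=": ")
```

Applied to `df_churn`, the `Churn` entry gives the class balance of the target, which matters when choosing evaluation metrics later.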


In [9]:
# Compare the total number of rows with the number of unique rows
n_total = len(df_churn.index)
n_unique = len(df_churn.drop_duplicates().index)
print(f"Known observations: {n_total}\nUnique observations: {n_unique}")
# Report the measured result instead of a hard-coded message
n_dup = n_total - n_unique
print(f"Duplicate rows found: {n_dup}" if n_dup else "No duplicates found!")

Known observations: 7043
Unique observations: 7021
Duplicate rows found: 22
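The gap between the two counts above (7043 rows, 7021 unique) means 22 rows repeat another row. Once `customerID` is gone, two different customers with identical attributes also become identical rows, so these repeats are not necessarily data-entry errors. A minimal sketch for inspecting every copy with `duplicated(keep=False)` before deciding whether to drop them; the sample is hypothetical:

```python
import pandas as pd

# Hypothetical sample in which rows 0 and 2 share every attribute
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "tenure": [1, 34, 1],
    "Churn": ["Yes", "No", "Yes"],
})

# keep=False flags every member of a duplicate group, not just the later copies
dup_mask = sample.duplicated(keep=False)
print(sample[dup_mask])

# The default keep="first" counts only the extra copies (total rows minus unique rows)
print(sample.duplicated().sum())  # 1
```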


In [16]:
# Column width used to left-align the feature names in the printout
left_padding = 21
# Calculates and displays the number of unique values for each feature in the DataFrame
print("Unique Values By Features")
for feature in df_churn.columns:
    uniq = np.unique(df_churn[feature])
    print(feature.ljust(left_padding),len(uniq))

Unique Values By Features
gender                2
SeniorCitizen         2
Partner               2
Dependents            2
tenure                73
PhoneService          2
MultipleLines         3
InternetService       3
OnlineSecurity        3
OnlineBackup          3
DeviceProtection      3
TechSupport           3
StreamingTV           3
StreamingMovies       3
Contract              3
PaperlessBilling      2
PaymentMethod         4
MonthlyCharges        1585
TotalCharges          6531
Churn                 2
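The same per-column counts are available from `DataFrame.nunique()`, which avoids the explicit loop and, unlike `np.unique`, counts by hashing instead of sorting, so mixed-type object columns do not raise. Note that `nunique()` skips `NaN` by default (`dropna=True`). A minimal sketch on a hypothetical sample:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in tenure
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "tenure": [1, 34, 2, np.nan],
})

# Number of distinct values per column; NaN is skipped by default
counts = sample.nunique()
print(counts.to_string())

# Count NaN as a distinct value as well
print(sample.nunique(dropna=False)["tenure"])  # 4
```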


In [18]:
# Check for missing values
df_churn.isna().sum()

gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
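`isna()` only counts real missing values (`NaN`/`None`); a blank string such as `" "` is a valid string and is reported as present. Because `TotalCharges` is still stored as text at this point, a zero count here does not rule out empty entries. A minimal sketch that also counts empty or whitespace-only strings in each non-numeric column; the helper name `count_blank_strings` and the sample are hypothetical:

```python
import pandas as pd


def count_blank_strings(frame: pd.DataFrame) -> pd.Series:
    """Count entries that are empty or whitespace-only strings, per non-numeric column."""
    text_cols = frame.select_dtypes(exclude="number").columns
    return frame[text_cols].apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "").sum()
    )


# Hypothetical sample: no NaN at all, but one blank TotalCharges entry
sample = pd.DataFrame({
    "TotalCharges": ["29.85", " ", "108.15"],
    "Churn": ["No", "Yes", "No"],
    "tenure": [1, 0, 2],
})

print(sample.isna().sum().sum())  # 0
print(count_blank_strings(sample))
```

Running `count_blank_strings(df_churn)` would show whether any column contains such disguised missing values.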