
# Customer Churn Prediction Project

## Introduction

In this notebook, we build a predictive model that forecasts whether Telco customers will churn.

Churn, in this context, refers to customers terminating their association with the telecommunications service provider.

The primary objective of this project is to develop a model that anticipates customer churn, so that the company can take proactive retention measures and reduce its overall churn rate.

## Data Description

### Telco Customer Churn DataFrame

The data describes Telco customers and contains 7043 entries across 21 columns. The DataFrame has a `RangeIndex` running from 0 to 7042.

### Data Columns

| Column            | Description                                                              |
|-------------------|--------------------------------------------------------------------------|
| customerID        | Unique identifier for each customer.                                     |
| gender            | Gender of the customer (Male or Female).                                 |
| SeniorCitizen     | Binary indicator for senior citizen status (1 or 0).                     |
| Partner           | Whether the customer has a partner (Yes or No).                          |
| Dependents        | Whether the customer has dependents (Yes or No).                         |
| tenure            | Number of months the customer has been with the company.                 |
| PhoneService      | Whether the customer has phone service (Yes or No).                      |
| MultipleLines     | Whether the customer has multiple lines (Yes, No, or No phone service).  |
| InternetService   | Type of internet service (DSL, Fiber optic, or No).                      |
| OnlineSecurity    | Availability of online security (Yes, No, or No internet service).       |
| OnlineBackup      | Availability of online backup (Yes, No, or No internet service).         |
| DeviceProtection  | Availability of device protection (Yes, No, or No internet service).     |
| TechSupport       | Availability of tech support (Yes, No, or No internet service).          |
| StreamingTV       | Availability of streaming TV (Yes, No, or No internet service).          |
| StreamingMovies   | Availability of streaming movies (Yes, No, or No internet service).      |
| Contract          | Type of customer contract (Month-to-month, One year, or Two year).       |
| PaperlessBilling  | Whether the customer uses paperless billing (Yes or No).                 |
| PaymentMethod     | The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), or Credit card (automatic)). |
| MonthlyCharges    | Monthly amount charged to the customer (in dollars).                     |
| TotalCharges      | Total amount charged to the customer (in dollars).                       |
| Churn             | Customer churn status (Yes or No).                                       |
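The `Churn` target is stored as Yes/No text, whereas `SeniorCitizen` already uses 1/0. A numeric 0/1 target makes it easy to compute the churn rate with `.mean()`. The sketch below maps Yes/No to 1/0 on a small hypothetical frame (the values are made up, not taken from the dataset); on the real data the same `map` call would apply to `df["Churn"]`.

```python
import pandas as pd

# Toy rows that mimic a few Telco columns (hypothetical values, not the real dataset)
df = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0],
    "Partner": ["Yes", "No", "No"],
    "Churn": ["No", "Yes", "No"],
})

# Map the Yes/No target to 1/0 so it follows the same convention as SeniorCitizen
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})

print(df["Churn"].tolist())  # [0, 1, 0]
print(df["Churn"].mean())    # churn rate of the toy rows (one churner out of three)
```

`Series.map` returns `NaN` for any value missing from the dictionary, so an unexpected label (for example one with a stray space) surfaces as a missing value instead of being silently misclassified.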


## Notebook Structure

1. **Dataset Description:** Formal documentation describing the structure and content of the Telco Customer Churn dataset.

2. **Import Libraries:** Importing necessary libraries, such as Pandas for data manipulation and analysis.

3. **Load Dataset:** Reading the dataset into a Pandas DataFrame for further analysis.

4. **Data Preparation:** Cleaning and preprocessing the data for model input.

5. **Exploratory Data Analysis (EDA):** Exploring the cleaned dataset to gain insights into its characteristics.

6. **Modeling:** Developing and training the predictive model.

7. **Model Evaluation:** Assessing the performance of the trained model.

8. **Conclusion:** Summarizing the findings and outlining potential next steps.

```python
# Import necessary libraries
import pandas as pd
import numpy as np
```

```python
# Read the CSV file 'Telco-Customer-Churn.csv' into a DataFrame named df
df = pd.read_csv('Telco-Customer-Churn.csv')

# Display the first few rows of the DataFrame to get an overview
display(df.head())
print("\n Info")
df.info()
```

|   | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |



```
 Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   o
```

```python
# Count text (object) columns and numeric columns
print("Number of categorical columns:", len(df.select_dtypes(include="object").columns))
print("Number of numeric columns:", len(df.select_dtypes(include="number").columns))
```

```
Number of categorical columns: 18
Number of numeric columns: 3
```
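Only three columns are numeric, even though the `df.head()` preview shows dollar amounts in both `MonthlyCharges` and `TotalCharges`. In the widely distributed version of this dataset, `TotalCharges` is the column read as `object`, because a few rows hold a blank string instead of a number; the truncated `df.info()` output above does not show this directly, so treat it as an assumption to verify. A minimal sketch of the usual fix, `pd.to_numeric` with `errors="coerce"`, on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical sample: TotalCharges stored as strings, including a blank entry
df = pd.DataFrame({
    "tenure": [1, 34, 0],
    "TotalCharges": ["29.85", "1889.5", " "],
})

# errors="coerce" turns anything that cannot be parsed (e.g. " ") into NaN
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

print(df["TotalCharges"].dtype)         # float64
print(df["TotalCharges"].isna().sum())  # 1
```

The rows coerced to `NaN` can then be inspected (for example alongside `tenure`) before deciding whether to drop or fill them.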


```python
df.isna().sum()
```

```
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
```

No null values are present.
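`isna()` only counts genuine missing values (`NaN`/`None`), so a placeholder such as an empty or whitespace-only string passes as present. A quick way to look for such placeholders is to count blank strings per column; the sketch below does this on a small hypothetical frame rather than the real `df`.

```python
import pandas as pd

# Hypothetical frame: no real NaNs, but one TotalCharges value is a blank placeholder
df = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "tenure": [1, 0, 2],
    "TotalCharges": ["29.85", " ", "108.15"],
})

print(df.isna().sum().sum())  # 0 -> isna() does not treat " " as missing

# Cast every column to str so numeric columns can be checked too (they never
# become blank), then count empty or whitespace-only cells per column
blank_counts = df.astype(str).apply(lambda s: s.str.strip().eq("").sum())
print(blank_counts)
```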

Next, we check for duplicate records.

```python
df.duplicated().sum()
```

```
0
```

The dataset contains no duplicate rows.
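Because `customerID` is a unique identifier, every row differs at least in that column, so a full-row `duplicated()` check returns 0 by construction. Excluding the ID gives a more informative check for customers with identical attributes. A minimal sketch on hypothetical rows (the IDs and values are invented):

```python
import pandas as pd

# Hypothetical rows: two customers with identical attributes but different IDs
df = pd.DataFrame({
    "customerID": ["0001-AAAAA", "0002-BBBBB", "0003-CCCCC"],
    "tenure": [2, 2, 45],
    "Contract": ["Month-to-month", "Month-to-month", "One year"],
})

print(df.duplicated().sum())                             # 0: IDs make every row unique
print(df.drop(columns="customerID").duplicated().sum())  # 1: attributes repeat
```

Rows that match on every attribute may still be genuinely different customers, so this is a diagnostic rather than a reason to drop them automatically.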

```python
df.nunique()
```

```
customerID          7043
gender                 2
SeniorCitizen          2
Partner                2
Dependents             2
tenure                73
PhoneService           2
MultipleLines          3
InternetService        3
OnlineSecurity         3
OnlineBackup           3
DeviceProtection       3
TechSupport            3
StreamingTV            3
StreamingMovies        3
Contract               3
PaperlessBilling       2
PaymentMethod          4
MonthlyCharges      1585
TotalCharges        6531
Churn                  2
dtype: int64
```
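Apart from `customerID`, `tenure`, `MonthlyCharges` and `TotalCharges`, every column has at most four distinct values, including `SeniorCitizen`, which is stored as an integer but behaves like a yes/no flag. One way to separate candidate categorical features from continuous ones is a cardinality threshold. The sketch below applies an assumed threshold of 4 (`MAX_LEVELS` is a hypothetical name, chosen from the counts above) to a small made-up frame standing in for `df`.

```python
import pandas as pd

# Hypothetical mini-frame standing in for df
df = pd.DataFrame({
    "customerID": ["a", "b", "c", "d", "e"],
    "Contract": ["Month-to-month", "One year", "Two year", "One year", "Month-to-month"],
    "tenure": [1, 34, 2, 45, 8],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

MAX_LEVELS = 4  # assumption: threshold picked from the nunique() output

# Columns with few distinct values are treated as categorical candidates
n_unique = df.nunique()
low_card = n_unique[n_unique <= MAX_LEVELS].index.tolist()
print(low_card)  # ['Contract', 'Churn']
```

A threshold like this is a heuristic: on the full dataset it correctly leaves out `customerID`, which should be dropped before modeling anyway, and it picks up `SeniorCitizen` despite its integer dtype.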