# **VODAFONE CORPORATION CUSTOMER CHURN PREDICTION**

# **Business Understanding**

## Problem Statement

Telecommunication companies, such as Vodafone Corporation, face a significant challenge with customer churn, where customers cease using their services. To address this issue effectively, it's crucial to anticipate which customers are at risk of churning and implement proactive retention strategies. Leveraging machine learning models can provide a solution by predicting potential churners based on various factors, including usage patterns, payment history, and demographic data.

## Understanding Customer Attrition

Customer attrition, also known as churn or turnover, refers to the percentage of customers who stop using a company's product or service within a certain period.

For instance, if a company begins the year with 500 customers but ends with only 480, it has lost 20 of its 500 customers, so the churn rate is 20 / 500 = 4%. Predicting why and when customers leave can significantly help organizations strategize retention efforts.

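As a quick check of the arithmetic above, the churn rate can be computed from the starting and ending customer counts. This is a minimal sketch: the function name `churn_rate` is illustrative and not part of the project code, and it assumes no new customers joined during the period.

```python
def churn_rate(customers_at_start: int, customers_at_end: int) -> float:
    """Return the percentage of starting customers lost over the period.

    Assumption: no new customers were acquired during the period.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    customers_lost = customers_at_start - customers_at_end
    return 100 * customers_lost / customers_at_start


print(churn_rate(500, 480))  # 4.0
```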
## Project Goal

This project aims to:
- Determine the likelihood of customer churn based on customer characteristics such as gender.
- Identify key indicators of churn.
- Propose effective retention strategies to mitigate customer attrition.
- Train a machine learning model to predict the likelihood that a customer will churn.

## Hypothesis

| Hypothesis                  | Statement                                                                                                                |
|-----------------------------|--------------------------------------------------------------------------------------------------------------------------|
| Null Hypothesis (H0)        | Customers with Month-to-Month contracts are equally likely to churn as those with one-year and two-year contracts.       |
| Alternative Hypothesis (H1) | Customers with Month-to-Month contracts are more likely to churn compared to those with one-year and two-year contracts. |

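One common way to examine a hypothesis like this is a chi-square test of independence between contract type and churn. The sketch below is illustrative only: the counts are made up, the helper `chi_square_3x2` is hypothetical, and the test detects an association rather than its direction, so the churn rates per contract still need to be compared to support H1. In practice `scipy.stats.chi2_contingency` would typically be used instead of the hand-rolled version.

```python
import math

# Made-up counts for illustration only: {contract: (churned, stayed)}
observed = {
    "Month-to-month": (40, 60),
    "One year": (10, 90),
    "Two year": (5, 95),
}


def chi_square_3x2(table):
    """Return (statistic, p_value) for a 3x2 contingency table.

    A 3x2 table has (3 - 1) * (2 - 1) = 2 degrees of freedom, for which
    the chi-square survival function is exactly exp(-x / 2).
    """
    rows = list(table.values())
    if len(rows) != 3 or any(len(row) != 2 for row in rows):
        raise ValueError("expected exactly 3 rows of 2 counts each")
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2)


statistic, p_value = chi_square_3x2(observed)
print(f"chi2 = {statistic:.2f}, p = {p_value:.2g}")
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting H0 for these illustrative counts.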

## Analytical Questions

1. **From which contract do most churners originate?**
2. **Which gender exhibits the highest churn rate?**
3. **How does the number of lines a customer has influence churn?**
4. **Which internet service experiences the highest churn rate?**
5. **How does churn compare between customers with and without tech support?**
6. **Who churns more: customers with phone service or those without?**
7. **Who churns more: customers with paperless billing or those without?**
8. **During which tenures does churn occur most frequently?**

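Most of the questions above reduce to comparing churn rates across the categories of one column. A minimal sketch of that pattern follows, assuming a `Churn` column coded as `Yes`/`No`; the sample frame is made up and the helper `churn_rate_by` is a hypothetical name, not part of the notebook.

```python
import pandas as pd

# Tiny made-up sample; the real analysis would use the project datasets
sample = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of churned customers (0 to 1) per category of `column`."""
    churned = df["Churn"].eq("Yes")
    return churned.groupby(df[column]).mean().sort_values(ascending=False)


rates = churn_rate_by(sample, "Contract")
print(rates)
```

The same helper could be called with `gender`, `InternetService`, `TechSupport`, and so on to address the other questions.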

# **Data Understanding**
Columns and their Descriptions


| Column            | Description                                                                                                              |
|-------------------|--------------------------------------------------------------------------------------------------------------------------|
| gender            | Whether the customer is a male or a female                                                                               |
| SeniorCitizen     | Whether a customer is a senior citizen or not                                                                            |
| Partner           | Whether the customer has a partner or not (Yes, No)                                                                      |
| Dependents        | Whether the customer has dependents or not (Yes, No)                                                                     |
| tenure            | Number of months the customer has stayed with the company                                                                |
| PhoneService      | Whether the customer has a phone service or not (Yes, No)                                                                |
| MultipleLines     | Whether the customer has multiple lines or not (Yes, No, No phone service)                                               |
| InternetService   | Customer's internet service provider (DSL, Fiber optic, No)                                                              |
| OnlineSecurity    | Whether the customer has online security or not (Yes, No, No internet service)                                           |
| OnlineBackup      | Whether the customer has online backup or not (Yes, No, No internet service)                                             |
| DeviceProtection  | Whether the customer has device protection or not (Yes, No, No internet service)                                         |
| TechSupport       | Whether the customer has tech support or not (Yes, No, No internet service)                                              |
| StreamingTV       | Whether the customer has streaming TV or not (Yes, No, No internet service)                                              |
| StreamingMovies   | Whether the customer has streaming movies or not (Yes, No, No internet service)                                          |
| Contract          | The contract term of the customer (Month-to-month, One year, Two year)                                                   |
| PaperlessBilling  | Whether the customer has paperless billing or not (Yes, No)                                                              |
| PaymentMethod     | The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))       |
| MonthlyCharges    | The amount charged to the customer monthly                                                                               |
| TotalCharges      | The total amount charged to the customer                                                                                 |
| Churn             | Whether the customer churned or not (Yes or No)                                                                          |



In [41]:
# Import libraries
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import pyodbc
import seaborn as sns
from dotenv import dotenv_values

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

In [42]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

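Because `.get()` returns `None` for a missing key, a typo in the `.env` file only surfaces later as a confusing connection error. A small check like the following can catch it earlier. This is a sketch, not part of the original notebook: `missing_settings` is a hypothetical helper and the example dictionary stands in for the output of `dotenv_values('.env')`.

```python
def missing_settings(settings, required=("SERVER", "DATABASE", "USERNAME", "PASSWORD")):
    """Return the required keys that are absent or empty in `settings`."""
    return [key for key in required if not settings.get(key)]


# Made-up values standing in for dotenv_values('.env')
example = {"SERVER": "example-server", "DATABASE": "example-db", "USERNAME": "user"}
print(missing_settings(example))  # ['PASSWORD']
```

In the notebook this could be called as `missing_settings(environment_variables)` before building the connection string.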
In [43]:
# Create a connection with the remote database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
connection = pyodbc.connect(connection_string)

In [48]:
# Load the first dataset (churn data) from the database
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"
churn_data1 = pd.read_sql(query, connection)

# Optionally save the database churn data to a CSV file
# file_name = 'churn_data1.csv'
# churn_data1.to_csv(file_name, index=False)

In [45]:
# Load the second dataset, obtained from a GitHub repository
# The data is saved in a CSV file
churn_data2 = pd.read_csv('Vodafone_Churn_data/LP2_Telco-churn-second-2000.csv')

In [46]:
# Load the third dataset, saved in OneDrive
# The data is in Excel format
# This dataset will be used as the testing dataset
churn_data3 = pd.read_excel('Vodafone_Churn_data/Telco-churn-last-2000.xlsx')

In [67]:
pd.set_option('display.max_columns', 21)  # Show up to 21 columns without truncation
churn_data1.head()


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,False,True,False,1,False,,DSL,False,True,False,False,False,False,Month-to-month,True,Electronic check,29.85,29.85,False
1,5575-GNVDE,Male,False,False,False,34,True,False,DSL,True,False,True,False,False,False,One year,False,Mailed check,56.950001,1889.5,False
2,3668-QPYBK,Male,False,False,False,2,True,False,DSL,True,True,False,False,False,False,Month-to-month,True,Mailed check,53.849998,108.150002,True
3,7795-CFOCW,Male,False,False,False,45,False,,DSL,True,False,True,True,False,False,One year,False,Bank transfer (automatic),42.299999,1840.75,False
4,9237-HQITU,Female,False,False,False,2,True,False,Fiber optic,False,False,False,False,False,False,Month-to-month,True,Electronic check,70.699997,151.649994,True


In [68]:
churn_data2.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,5600-PDUJF,Male,0,No,No,6,Yes,No,DSL,No,No,No,Yes,No,No,Month-to-month,Yes,Credit card (automatic),49.5,312.7,No
1,8292-TYSPY,Male,0,No,No,19,Yes,No,DSL,No,No,Yes,Yes,No,No,Month-to-month,Yes,Credit card (automatic),55.0,1046.5,Yes
2,0567-XRHCU,Female,0,Yes,Yes,69,No,No phone service,DSL,Yes,No,Yes,No,No,Yes,Two year,Yes,Credit card (automatic),43.95,2960.1,No
3,1867-BDVFH,Male,0,Yes,Yes,11,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,74.35,834.2,Yes
4,2067-QYTCF,Female,0,Yes,No,64,Yes,Yes,Fiber optic,No,Yes,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,111.15,6953.4,No


In [69]:
churn_data3.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges
0,7613-LLQFO,Male,0,No,No,12,Yes,Yes,Fiber optic,No,No,No,No,Yes,No,Month-to-month,Yes,Electronic check,84.45,1059.55
1,4568-TTZRT,Male,0,No,No,9,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,No,Mailed check,20.4,181.8
2,9513-DXHDA,Male,0,No,No,27,Yes,No,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,No,Electronic check,81.7,2212.55
3,2640-PMGFL,Male,0,No,Yes,27,Yes,Yes,Fiber optic,No,No,No,Yes,No,No,Month-to-month,Yes,Electronic check,79.5,2180.55
4,3801-HMYNL,Male,0,Yes,Yes,1,Yes,No,Fiber optic,No,No,No,No,Yes,Yes,Month-to-month,No,Mailed check,89.15,89.15

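The previews show that the first dataset encodes flags as `True`/`False` (with missing values), while the second and third use `Yes`/`No`. Before the datasets can be combined, the encodings need to agree. The sketch below shows one possible approach on a made-up frame; the column list and the choice of `Yes`/`No` as the target encoding are assumptions, not the notebook's actual cleaning step.

```python
import pandas as pd

# Made-up rows mimicking the boolean encoding seen in the first dataset
sample = pd.DataFrame({
    "Partner": [True, False, None],
    "Churn": [False, True, False],
    "InternetService": ["DSL", "Fiber optic", "DSL"],
})

# Only map columns known to hold boolean flags; other columns stay untouched
bool_columns = ["Partner", "Churn"]
yes_no = {True: "Yes", False: "No"}
for col in bool_columns:
    sample[col] = sample[col].map(yes_no)  # values not in the mapping become NaN

print(sample)
```

Mapping column by column avoids accidentally rewriting numeric columns such as `tenure`, where the value 1 compares equal to `True`.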

In [59]:
# Check for duplicated rows
print(f"dataset 1 duplicates: {churn_data1.duplicated().sum()}, "
      f"dataset 2 duplicates: {churn_data2.duplicated().sum()}, "
      f"dataset 3 duplicates: {churn_data3.duplicated().sum()}")


dataset 1 duplicates: 0, dataset 2 duplicates: 0, dataset 3 duplicates: 0

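`duplicated()` with no arguments only flags rows that match on every column, so a customer who appears twice with different values would not be counted. A complementary check on `customerID` alone is sketched below with made-up rows; whether repeated IDs actually occur in the project data is not shown in the source.

```python
import pandas as pd

# Made-up rows: the same customerID appears twice with different tenure
sample = pd.DataFrame({
    "customerID": ["0001-AAAAA", "0002-BBBBB", "0001-AAAAA"],
    "tenure": [1, 34, 2],
})

full_row_duplicates = sample.duplicated().sum()
id_duplicates = sample.duplicated(subset="customerID").sum()
print(full_row_duplicates, id_duplicates)  # 0 1
```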

In [51]:
# Get shape of the datasets
churn_data1.shape,churn_data2.shape,churn_data3.shape

((3000, 21), (2043, 21), (2000, 20))
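
The third dataset has 20 columns rather than 21, and its preview above has no `Churn` column. A small comparison like the one below makes such differences explicit. It is a sketch on made-up empty frames, and `missing_columns` is a hypothetical helper name.

```python
import pandas as pd


def missing_columns(reference: pd.DataFrame, other: pd.DataFrame) -> list:
    """Return the columns of `reference` that are absent from `other`."""
    return [col for col in reference.columns if col not in other.columns]


# Made-up frames standing in for a training set and a testing set
train_like = pd.DataFrame(columns=["customerID", "tenure", "Churn"])
test_like = pd.DataFrame(columns=["customerID", "tenure"])
print(missing_columns(train_like, test_like))  # ['Churn']
```

In the notebook, `missing_columns(churn_data1, churn_data3)` would list the columns the testing dataset lacks.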