# Business Understanding

## Introduction

**Understanding Customer Attrition**

Customer attrition, also known as churn or turnover, is the loss of customers who stop using a company's product or service. It is usually measured as a churn rate: the percentage of customers lost within a given period.

For instance, if a company begins the year with 500 customers and only 480 of them remain at the end, it has lost 20 customers, a churn rate of 4% (20 / 500). Predicting why and when customers leave helps organizations target their retention efforts.
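
The arithmetic behind that example can be sketched as a small helper. The function name `churn_rate` is hypothetical, and the sketch assumes no new customers are acquired during the period, so the difference between the two counts equals the number of customers lost.

```python
def churn_rate(customers_at_start: int, customers_at_end: int) -> float:
    """Return the percentage of starting customers lost over the period.

    Assumes no customers were acquired during the period.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    customers_lost = customers_at_start - customers_at_end
    return 100 * customers_lost / customers_at_start


# The example from the text: 20 customers lost out of 500
print(churn_rate(500, 480))  # 4.0
```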

**Project Goal**

This project aims to:
- Determine the likelihood of customer churn
- Identify key indicators of churn
- Propose effective retention strategies to mitigate customer attrition


## Problem Statement

Telecommunication companies (telcos) like Vodafone Corporation encounter a common issue known as customer churn, where customers discontinue their services. Addressing this challenge requires telcos to anticipate which customers are likely to churn and implement proactive strategies to retain them. Machine learning models offer a solution by predicting potential churners based on factors such as usage patterns, payment history, and demographic data.


## Columns and their Description

| Column            | Description                                                                      |
|-------------------|----------------------------------------------------------------------------------|
| Gender            | Whether the customer is male or female                                           |
| SeniorCitizen     | Whether the customer is a senior citizen or not                                  |
| Partner           | Whether the customer has a partner or not (Yes, No)                              |
| Dependents        | Whether the customer has dependents or not (Yes, No)                             |
| Tenure            | Number of months the customer has stayed with the company                        |
| PhoneService      | Whether the customer has a phone service or not (Yes, No)                        |
| MultipleLines     | Whether the customer has multiple lines or not                                   |
| InternetService   | Customer's internet service provider (DSL, Fiber optic, No)                      |
| OnlineSecurity    | Whether the customer has online security or not (Yes, No, No internet service)   |
| OnlineBackup      | Whether the customer has online backup or not (Yes, No, No internet service)     |
| DeviceProtection  | Whether the customer has device protection or not (Yes, No, No internet service) |
| TechSupport       | Whether the customer has tech support or not (Yes, No, No internet service)      |
| StreamingTV       | Whether the customer has streaming TV or not (Yes, No, No internet service)      |
| StreamingMovies   | Whether the customer has streaming movies or not (Yes, No, No internet service)  |
| Contract          | The contract term of the customer (Month-to-Month, One year, Two year)           |
| PaperlessBilling  | Whether the customer has paperless billing or not (Yes, No)                      |
| PaymentMethod     | The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)) |
| MonthlyCharges    | The amount charged to the customer monthly                                       |
| TotalCharges      | The total amount charged to the customer                                         |
| Churn             | Whether the customer churned or not (Yes, No)                                    |
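
Because the three datasets come from different sources, it can help to confirm that each one contains the columns described above. The following is a minimal sketch: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and names are compared ignoring case and spaces on the assumption that the sources may spell them slightly differently.

```python
EXPECTED_COLUMNS = [
    "Gender", "SeniorCitizen", "Partner", "Dependents", "Tenure",
    "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
    "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV",
    "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod",
    "MonthlyCharges", "TotalCharges", "Churn",
]


def _normalize(name):
    """Lower-case a column name and remove all whitespace."""
    return "".join(str(name).split()).lower()


def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from actual_columns."""
    present = {_normalize(column) for column in actual_columns}
    return [column for column in expected if _normalize(column) not in present]


# Stand-in column list; with a DataFrame, pass df.columns instead
print(missing_columns(["gender", "Phone Service", "Churn"])[:3])
```

An empty list means every column in the table was found.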


In [25]:
# Import libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import pyodbc
import seaborn as sns
from dotenv import dotenv_values

# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

In [26]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")
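
Since `.get` returns `None` for any key missing from the `.env` file, a quick check before connecting can catch an incomplete configuration early. This is a sketch only: `REQUIRED_KEYS` and `missing_credentials` are hypothetical names, and the dictionary below is a stand-in for `environment_variables`.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")


def missing_credentials(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]


# Stand-in values; in the notebook, pass environment_variables instead
example = {"SERVER": "example-server", "DATABASE": "example-db", "USERNAME": ""}
print(missing_credentials(example))  # ['USERNAME', 'PASSWORD']
```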

In [27]:
# Create a connection to the remote database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
connection = pyodbc.connect(connection_string)

In [30]:
# Load the first dataset from the database
query = "SELECT * FROM dbo.LP2_Telco_churn_first_3000"
churn_data1 = pd.read_sql(query, connection)

# Save the database extract to a CSV file
file_name = 'churn_data1.csv'
churn_data1.to_csv(file_name, index=False)

In [31]:
# Load the second dataset, a CSV file obtained from a GitHub repository
# Forward slashes avoid the invalid "\L" escape sequence and work on any OS
churn_data2 = pd.read_csv('Vodafone_Churn_data/LP2_Telco-churn-second-2000.csv')

In [35]:
# Load the third dataset, an Excel file saved from OneDrive
# This dataset will be used as the testing dataset
churn_data3 = pd.read_excel('Vodafone_Churn_data/Telco-churn-last-2000.xlsx')
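
The two file-based loads above depend on the data files being present in a local `Vodafone_Churn_data` folder. As a hedged sketch, the check below uses `pathlib` to report any file that is missing before pandas tries to read it; `DATA_DIR`, `DATA_FILES`, and `missing_files` are hypothetical names, and the folder is assumed to sit next to the notebook.

```python
from pathlib import Path

# Assumed location of the downloaded files, relative to the notebook
DATA_DIR = Path("Vodafone_Churn_data")
DATA_FILES = [
    "LP2_Telco-churn-second-2000.csv",
    "Telco-churn-last-2000.xlsx",
]


def missing_files(data_dir, file_names):
    """Return the names in file_names that do not exist inside data_dir."""
    data_dir = Path(data_dir)
    return [name for name in file_names if not (data_dir / name).is_file()]


# An empty list means both files are in place
print(missing_files(DATA_DIR, DATA_FILES))
```

Building paths with `Path` and the `/` operator also sidesteps the backslash-escape problem on every operating system.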