# Predicting Customer Churn for SyriaTel

# Business Understanding

## Overview
SyriaTel is a telecommunications company providing mobile services to customers across Syria. In the highly competitive telecom industry, customer retention is crucial for maintaining revenue and growth. Churn, which refers to customers leaving the service, poses a significant threat to the company's profitability and market share. Predicting customer churn allows SyriaTel to proactively address the factors that lead to customer dissatisfaction and to implement strategies for retaining valuable customers.

## Challenges
1. High Competition: The telecom industry is saturated with service providers offering similar services at competitive prices, making it easy for customers to switch.
2. Customer Behavior Analysis: Understanding the diverse needs and behaviors of customers in order to identify those at risk of churning.
3. Data Quality: Ensuring the data used for analysis is accurate, complete, and up-to-date so that the predictive models built on it are reliable.
4. Resource Allocation: Allocating resources effectively to retain customers at risk of churning while minimizing costs.

## Problem Statement
SyriaTel wants to predict customer churn based on historical data to identify customers at risk of leaving the service. By accurately predicting churn, SyriaTel can implement targeted retention strategies to reduce customer attrition and improve overall customer satisfaction.

## Objectives
1. To gather and pre-process customer data to ensure quality and completeness.
2. To identify and create relevant features that help predict customer churn.
3. To build and train several machine learning models to predict churn.
4. To assess the performance of the models using appropriate metrics and select the best-performing model.

## Proposed Solution
The proposed solution involves building a predictive model to identify customers at risk of churning based on historical data. By accurately predicting churn, SyriaTel can:
 - Target at-risk customers with personalized retention strategies.
 - Optimize marketing and promotional efforts.
 - Improve overall customer satisfaction and loyalty.

## Metrics of the Model
To evaluate the performance of the churn prediction model, the following metrics will be used:
1. **Accuracy**: The proportion of correctly predicted instances (both churn and non-churn) out of the total instances.
2. **Precision**: The proportion of correctly predicted churn instances out of all instances predicted as churn.
3. **Recall**: The proportion of correctly predicted churn instances out of all actual churn instances.
4. **F1 Score**: The harmonic mean of precision and recall, providing a balance between the two metrics.
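
The four definitions above can be written directly in terms of confusion-matrix counts. The following is a minimal sketch in plain Python, assuming churn (`True`) is the positive class; the function name `churn_metrics` and the toy labels are illustrative and are not part of the project code.

```python
def churn_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, treating churn (True) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # churners correctly flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # non-churners correctly left alone
    fp = sum(1 for t, p in pairs if not t and p)      # non-churners wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # churners that were missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, for illustration only): 3 churners, 5 non-churners
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]
metrics = churn_metrics(y_true, y_pred)
print(metrics)
```

In practice, the equivalent functions in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`) would normally be used. Reporting precision and recall alongside accuracy matters because, when churners are a minority, a model that never predicts churn can still score a high accuracy while having zero recall.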

## Conclusion
By leveraging machine learning to predict customer churn, SyriaTel can gain valuable insights into customer behavior and proactively address churn risks. This approach not only helps in retaining customers but also enhances the overall customer experience and strengthens the company's competitive position in the telecom industry.

## Source of Data
The dataset used for this analysis was obtained from the [Kaggle website](https://www.kaggle.com/becksddf/churn-in-telecoms-dataset). It contains information about customer demographics, usage patterns, and churn status for a telecom company. The dataset consists of 3333 observations and 21 columns: 20 customer attributes, such as account length, international plan, voicemail plan, total day minutes, total day calls, and total day charge, plus the churn label.

# Data Understanding

In [1]:
# Import the necessary classes from data_processing.py
from data_processing import DataProcessor, DataAnalysis

# Load the data
processor = DataProcessor('data/telecom_dataset.xls')
data = processor.load_data()


In [3]:
# Initial exploration of the data
print("First 5 rows of the data:")
# Display the first 5 rows of the DataFrame
data.head()


First 5 rows of the data:


| | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


The dataset contains the following columns:
- State: The state in which the customer resides.
- Account Length: The number of days the customer has been with the company.
- Area Code: The area code of the customer's phone number.
- Phone Number: The customer's phone number.
- International Plan: Whether the customer has an international plan (yes/no).
- Voice Mail Plan: Whether the customer has a voicemail plan (yes/no).
- Number Vmail Messages: The number of voicemail messages.
- Total Day Minutes: Total number of minutes the customer used during the day.
- Total Day Calls: Total number of calls the customer made during the day.
- Total Day Charge: Total charges for calls made during the day.
- Total Eve Minutes: Total number of minutes the customer used during the evening.
- Total Eve Calls: Total number of calls the customer made during the evening.
- Total Eve Charge: Total charges for calls made during the evening.
- Total Night Minutes: Total number of minutes the customer used during the night.
- Total Night Calls: Total number of calls the customer made during the night.
- Total Night Charge: Total charges for calls made during the night.
- Total Intl Minutes: Total number of international minutes used.
- Total Intl Calls: Total number of international calls made.
- Total Intl Charge: Total charges for international calls.
- Customer Service Calls: Number of customer service calls made.
- Churn: Whether the customer churned or not (True/False).



In [4]:
# Summary of the data
print("Summary of the data:")
data.info()

Summary of the data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   int64  
 ...

The dataset contains 3333 observations and 21 columns, which fall into the following groups:
- Customer Demographics (State)
- Account Details (Account Length, Area Code, Phone Number)
- Service Plans (International Plan, Voice Mail Plan)
- Voice Mail Usage (Number Vmail Messages)
- Call Minutes, Calls, and Charges (Total Day Minutes, Total Day Calls, Total Day Charge, Total Eve Minutes, Total Eve Calls, Total Eve Charge, Total Night Minutes, Total Night Calls, Total Night Charge, Total Intl Minutes, Total Intl Calls, Total Intl Charge)
- Customer Service Interactions (Customer Service Calls)
- Customer Churn (Churn, a boolean value indicating whether the customer churned)
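
The grouping above can also be kept in code so that later steps can select related columns by name. This is a sketch only: the dictionary name `COLUMN_GROUPS` and its keys are assumptions introduced here, while the column names are those shown in the `data.info()` and missing-value outputs.

```python
# Hypothetical mapping from a group label to the dataset's column names
COLUMN_GROUPS = {
    "demographics": ["state"],
    "account": ["account length", "area code", "phone number"],
    "plans": ["international plan", "voice mail plan"],
    "voicemail_usage": ["number vmail messages"],
    # 4 periods x 3 measures = 12 usage columns, e.g. "total day minutes"
    "call_usage": [
        f"total {period} {measure}"
        for period in ("day", "eve", "night", "intl")
        for measure in ("minutes", "calls", "charge")
    ],
    "customer_service": ["customer service calls"],
    "target": ["churn"],
}

# Flatten the groups to confirm that every one of the 21 columns is covered once
all_columns = [col for cols in COLUMN_GROUPS.values() for col in cols]
print(len(all_columns), "columns in", len(COLUMN_GROUPS), "groups")
```

With the loaded DataFrame, an expression such as `data[COLUMN_GROUPS["call_usage"]]` would then select the twelve usage columns in one step.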

In [11]:
# Check for missing values
print("Missing values:")
data.isnull().sum()

Missing values:


state                     0
account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64

There are no missing values in the dataset. 

In [12]:
# Check for duplicates
print("Duplicates:")
data.duplicated().sum()

Duplicates:


0

There are no duplicate rows in the dataset.
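
The shape, missing-value, and duplicate checks above can be bundled into one reusable helper. The sketch below assumes pandas is installed; the function name `quality_report` and the toy DataFrame are hypothetical and only illustrate what the checks return when problems are present.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality checks for a DataFrame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Total number of missing cells across all columns
        "missing_cells": int(df.isnull().sum().sum()),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data (not from the real dataset): one missing state, one repeated row
toy = pd.DataFrame({
    "state": ["KS", "OH", "OH", None],
    "account length": [128, 107, 107, 84],
    "churn": [False, False, False, True],
})
report = quality_report(toy)
print(report)
```

Calling `quality_report(data)` on the loaded dataset should, given the outputs above, report 3333 rows, 21 columns, no missing cells, and no duplicate rows.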