## 1. Business Understanding

### Business Problem
SyriaTel, a leading telecommunications provider, is facing significant revenue loss due to customer churn — the phenomenon where customers discontinue their services. In the highly competitive telecom sector, acquiring new customers is costly, making customer retention a top strategic priority.  

Reducing churn requires a clear understanding of the factors that influence customer behavior and the ability to anticipate churn before it happens.

### Project Objective
The goal of this project is to build a predictive machine learning model that classifies whether a customer is likely to churn. By leveraging customer demographic and usage data, the model will empower SyriaTel to:
- Identify at-risk customers
- Implement targeted retention strategies
- Optimize marketing efforts
- Reduce overall churn rates

### Stakeholders
- **Customer Retention & Loyalty Team** — Responsible for designing and executing churn reduction programs.
- **Marketing Strategy Team** — Focused on customer engagement and acquisition strategies.
- **Senior Management** — Interested in high-level insights for strategic planning and revenue assurance.

### Key Business Questions
- What customer attributes and behaviors are strong indicators of churn?
- How accurately can churn be predicted before it occurs?
- What actionable insights can SyriaTel apply to minimize customer attrition?

## 2. Data Understanding

### Dataset Overview
The dataset contains historical customer information from SyriaTel, including demographics, account information, usage patterns, and churn status.  
Our goal in this phase is to understand the structure of the data, inspect column details, assess data quality, and identify potential issues for cleaning and preparation.
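
The checks described above (structure, column details, data quality) can be sketched roughly as follows. This is a minimal illustration, not the project's actual inspection code: the `sample` frame, its column names other than `Churn`, and the `summarize_quality` helper are all hypothetical stand-ins for the real SyriaTel data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the SyriaTel CSV; column names are illustrative only.
sample = pd.DataFrame({
    "account_length": [128, 107, 137, 107, np.nan],
    "total_day_minutes": [265.1, 161.6, 243.4, 161.6, 299.4],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})


def summarize_quality(frame: pd.DataFrame) -> dict:
    """Return basic structure and data-quality counts for a DataFrame."""
    return {
        "n_rows": len(frame),
        "n_columns": frame.shape[1],
        # Number of missing cells in each column
        "missing_per_column": {col: int(n) for col, n in frame.isna().sum().items()},
        # Rows that exactly repeat an earlier row
        "n_duplicate_rows": int(frame.duplicated().sum()),
    }


report = summarize_quality(sample)
print(sample.dtypes)  # column types, to spot numeric columns stored as text
print(report)
```

The same helper could be pointed at the loaded dataset once it is in memory; collecting the counts in one dictionary makes it easy to compare the data before and after cleaning.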

### Data Source  
- **Dataset Name:** SyriaTel Customer Churn  
- **Target Variable:** `Churn` (Binary — Yes/No)  
- **Data Type:** Tabular (CSV)  
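
Because the target is binary, it is usually worth checking how the two classes are distributed before modeling. The sketch below assumes the labels are the literal strings `Yes` and `No`, as listed above; the actual file may encode them differently, and the `churn` series here is made-up example data rather than SyriaTel records.

```python
import pandas as pd

# Hypothetical labels standing in for the `Churn` column.
churn = pd.Series(["No", "No", "Yes", "No", "Yes", "No", "No", "No"], name="Churn")

# Map Yes/No to 1/0; any unexpected label becomes NaN so it can be caught.
churn_flag = churn.map({"Yes": 1, "No": 0})
if churn_flag.isna().any():
    raise ValueError("Unexpected label found in the Churn column")

# Share of churners and raw counts per class
churn_rate = churn_flag.mean()
class_counts = churn.value_counts()

print(class_counts)
print(f"Churn rate: {churn_rate:.1%}")
```

A strongly skewed split would suggest that plain accuracy is a poor way to answer the question of how well churn can be predicted.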

---

### Importing Libraries and Loading Data
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Settings for better display
pd.set_option('display.max_columns', None)
sns.set(style='whitegrid')

# Load dataset
data = pd.read_csv('data/SyriaTel_Customer_Churn.csv')

# Display first few records
data.head()
```

## 3. Data Preparation

### Actions Taken
- Dropped rows with missing values
- Checked for duplicate rows and dropped any found
- Reset the index after cleaning

```python
# Drop rows with missing values (`data` is the DataFrame loaded above)
df_clean = data.dropna()

# Count duplicate rows before removing them
duplicates = df_clean.duplicated().sum()

# Drop duplicates, if any, and reset the index
df_clean = df_clean.drop_duplicates().reset_index(drop=True)

print(f"Rows with missing values dropped. Duplicate rows found and dropped: {duplicates}")
```


