# 📊 Customer Churn Prediction
### Technologies: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
---
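This notebook builds and compares two classifiers, Logistic Regression and Random Forest, to predict customer churn.
The goal is to identify which customers are likely to stop using a service based on their behavior and demographics. For demonstration purposes, it uses a synthetic dataset rather than real customer records.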

In [None]:
# Importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

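# For this demo, we create a synthetic dataset.
# Note: every column, including the target, is sampled independently,
# so the features carry no real information about churn.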
np.random.seed(42)
data = pd.DataFrame({
    'CustomerID': range(1, 501),
    'Gender': np.random.choice(['Male', 'Female'], 500),
    'Age': np.random.randint(18, 70, 500),
    'Tenure': np.random.randint(1, 72, 500),
    'Balance': np.random.randint(0, 200000, 500),
    'NumOfProducts': np.random.randint(1, 4, 500),
    'HasCrCard': np.random.choice([0, 1], 500),
    'IsActiveMember': np.random.choice([0, 1], 500),
    'EstimatedSalary': np.random.randint(10000, 150000, 500),
    'Exited': np.random.choice([0, 1], 500)  # Target variable: 1 = churned, 0 = stayed
})

data.head()
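Because `Exited` is drawn independently of every feature, the churn rate should look about the same in every customer group. A quick sanity check, sketched below, looks at the overall class balance and the churn rate per `IsActiveMember` value. The generation recipe is repeated so the snippet runs on its own; the grouping column is an arbitrary choice for illustration, and the exact rates depend on the seed.

```python
import numpy as np
import pandas as pd

# Rebuild the notebook's synthetic dataset with the same seed and call order
np.random.seed(42)
data = pd.DataFrame({
    'CustomerID': range(1, 501),
    'Gender': np.random.choice(['Male', 'Female'], 500),
    'Age': np.random.randint(18, 70, 500),
    'Tenure': np.random.randint(1, 72, 500),
    'Balance': np.random.randint(0, 200000, 500),
    'NumOfProducts': np.random.randint(1, 4, 500),
    'HasCrCard': np.random.choice([0, 1], 500),
    'IsActiveMember': np.random.choice([0, 1], 500),
    'EstimatedSalary': np.random.randint(10000, 150000, 500),
    'Exited': np.random.choice([0, 1], 500),
})

# Share of churned (1) vs. retained (0) customers
class_balance = data['Exited'].value_counts(normalize=True)
print(class_balance)

# Churn rate within each activity group; similar values suggest no signal
churn_by_activity = data.groupby('IsActiveMember')['Exited'].mean()
print(churn_by_activity)
```

With a real dataset, large differences between groups would hint at predictive features; here, rates close to the overall churn rate are the expected outcome.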

## 🔍 Data Preprocessing

In [None]:
# Encode the binary Gender column as 0/1
# (LabelEncoder sorts classes alphabetically: Female -> 0, Male -> 1)
le = LabelEncoder()
data['Gender'] = le.fit_transform(data['Gender'])

# Define features and target
X = data.drop(['CustomerID', 'Exited'], axis=1)
y = data['Exited']

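# Split the dataset; stratify=y keeps the churn ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)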
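The preprocessing steps above (encode, split, scale on the training set only) can also be bundled into a single scikit-learn `Pipeline`, so that `fit` learns the encoder and scaler from the training split and `predict` reuses them automatically. The sketch below is an alternative, not the notebook's method: it swaps `LabelEncoder` (which scikit-learn documents for target labels) for `OneHotEncoder(drop='if_binary')` inside a `ColumnTransformer`, and uses a small random frame with a subset of the notebook's column names as stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in data: random values with some of the notebook's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'Gender': rng.choice(['Male', 'Female'], n),
    'Age': rng.integers(18, 70, n),
    'Tenure': rng.integers(1, 72, n),
    'Balance': rng.integers(0, 200000, n),
    'EstimatedSalary': rng.integers(10000, 150000, n),
    'Exited': rng.choice([0, 1], n),
})
X = df.drop(columns='Exited')
y = df['Exited']

numeric = ['Age', 'Tenure', 'Balance', 'EstimatedSalary']
categorical = ['Gender']

# Scale numeric columns; turn the binary Gender column into a single 0/1 column
preprocess = ColumnTransformer([
    ('num', StandardScaler(), numeric),
    ('cat', OneHotEncoder(drop='if_binary'), categorical),
])
model = Pipeline([
    ('preprocess', preprocess),
    ('clf', LogisticRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit() learns scaler statistics and encoder categories from X_train only
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

A pipeline like this can be passed directly to `cross_val_score`, which refits the preprocessing inside every fold and so avoids leaking test-set statistics into training.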

## 🤖 Model Training and Evaluation

In [None]:
# Logistic Regression
log_model = LogisticRegression()
log_model.fit(X_train, y_train)
y_pred_log = log_model.predict(X_test)

# Random Forest
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)

# Evaluation
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_log))
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))

print("\nClassification Report (Random Forest):")
print(classification_report(y_test, y_pred_rf))
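A single 100-row test split is a noisy yardstick: each misclassified customer moves accuracy by 0.01, so small gaps between models can be chance. One way to check, sketched below, is to compare both models against a majority-class `DummyClassifier` using stratified 5-fold cross-validation. The features and labels here are random stand-ins (8 features, matching the notebook's count after dropping `CustomerID` and `Exited`), and the fold count is an assumption.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data: features and labels drawn independently, as in the notebook
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)

models = {
    'Majority-class baseline': DummyClassifier(strategy='most_frequent'),
    'Logistic Regression': LogisticRegression(),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If a model's mean accuracy is within about one standard deviation of the baseline, there is little evidence it has learned anything, which is what random labels should produce.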

## 📈 Confusion Matrix Visualization

In [None]:
cm = confusion_matrix(y_test, y_pred_rf)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Random Forest Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
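The heatmap follows scikit-learn's `confusion_matrix` layout: rows are actual classes and columns are predicted classes, so for labels 0/1 it reads `[[TN, FP], [FN, TP]]`. For churn, recall on class 1 (the share of actual churners the model catches) is often more useful than overall accuracy. The worked example below uses ten hypothetical labels and predictions, not the notebook's output, to show how precision and recall fall out of the four cells.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions, for illustration only (1 = churned)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 0])

# ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of customers flagged as churners, how many churned
recall = tp / (tp + fn)     # of customers who churned, how many were flagged

print(tn, fp, fn, tp)       # 4 1 2 3
print(precision, recall)    # 0.75 0.6
```

Here precision is 3 / (3 + 1) = 0.75 and recall is 3 / (3 + 2) = 0.6; these match the class-1 rows that `classification_report` prints.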

## 🧩 Conclusion
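- Built and compared two churn classifiers (Logistic Regression and Random Forest) on synthetic data.
- Because the `Exited` label is sampled independently of the features, neither model should be expected to beat chance (roughly 50% accuracy); any accuracy gap between them on the 100-row test set is noise, not evidence that one model is better.
- On real customer data, the same workflow can flag customers likely to churn, enabling proactive retention strategies.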
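To act on predictions, a trained model's churn probabilities can be used to rank customers by risk. The sketch below assumes a hypothetical split of random stand-in data into "historical" customers (used for training) and "current" customers (to be scored); the feature subset and the top-10 cutoff are illustrative choices, and with random labels the ranking itself is meaningless, so only the mechanics carry over.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Stand-in data with some of the notebook's columns; labels are random
rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    'CustomerID': np.arange(1, n + 1),
    'Tenure': rng.integers(1, 72, n),
    'NumOfProducts': rng.integers(1, 4, n),
    'IsActiveMember': rng.integers(0, 2, n),
    'Exited': rng.integers(0, 2, n),
})
features = ['Tenure', 'NumOfProducts', 'IsActiveMember']

# Hypothetical split: train on historical customers, score the current ones
history, current = df.iloc[:250], df.iloc[250:].copy()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(history[features], history['Exited'])

# predict_proba columns follow model.classes_; select the column for class 1 (churn)
churn_col = list(model.classes_).index(1)
current['churn_probability'] = model.predict_proba(current[features])[:, churn_col]

# Highest-risk customers first: candidates for retention outreach
at_risk = current.sort_values('churn_probability', ascending=False).head(10)
print(at_risk[['CustomerID', 'churn_probability']])
```

Ranking by probability, rather than using the default 0.5 cutoff from `predict`, lets a retention team size the outreach list to its budget.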