# 📉 Customer Churn Prediction & Analysis

This notebook predicts customer churn with a random forest classifier, supported by exploratory data analysis. The goal is to identify customers who are likely to leave and to understand the key factors behind that decision.

*Note: This notebook uses open-source data and mirrors typical analytical workflows used in a business context.*

## 📦 1. Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

## 📥 2. Loading the Dataset

In [None]:
file_path = r'C:/Users/Lenovo/OneDrive/Desktop/Customer churn/Data_file.xlsx'
data = pd.read_excel(file_path)
data.head()

## 🧾 3. Initial Data Overview

In [None]:
data.info()
data.describe()

## 🧹 4. Data Cleaning

This step counts missing values per column. The commented-out lines show where to drop unnecessary columns and convert data types if the dataset needs it.

In [None]:
# Drop ID or redundant columns if any
# data.drop(columns=['ID'], inplace=True)

# Example: Convert TotalCharges to numeric if needed
# data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')

# Count missing values per column (this only reports them; it does not fill or drop them)
data.isnull().sum()
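
The cell above reports missing values but leaves them in place. As a minimal sketch of one common way to fill them, assuming a toy frame whose column names (`Monthly_Charge`, `Contract`) are hypothetical and may not match the real dataset, numeric columns can take the median and text columns the most frequent value:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset; column names are hypothetical.
toy = pd.DataFrame({
    "Monthly_Charge": [20.0, np.nan, 80.0, 50.0],
    "Contract": ["Month-to-Month", None, "Two Year", "Month-to-Month"],
})

cleaned = toy.copy()

# Numeric columns: fill gaps with the column median.
num_cols = cleaned.select_dtypes(include="number").columns
cleaned[num_cols] = cleaned[num_cols].fillna(cleaned[num_cols].median())

# Non-numeric columns: fill gaps with the most frequent value (mode).
cat_cols = cleaned.select_dtypes(exclude="number").columns
for col in cat_cols:
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

print(cleaned.isnull().sum())
```

Whether median and mode imputation suit the real data depends on why the values are missing, so treat this as a starting point rather than a recommendation.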

## 📊 5. Exploratory Data Analysis (EDA)

In [None]:
sns.countplot(x='Customer_Status', data=data)
plt.title('Churn Distribution')
plt.show()
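
The count plot shows class balance visually; the same information as numbers can be easier to compare. A small sketch, using toy labels in place of `data['Customer_Status']` (the values `Stayed` and `Churned` are assumptions about the dataset):

```python
import pandas as pd

# Toy labels standing in for data['Customer_Status']; values are hypothetical.
status = pd.Series(
    ["Stayed", "Stayed", "Churned", "Stayed", "Churned"],
    name="Customer_Status",
)

# Absolute counts and relative shares per class.
counts = status.value_counts()
shares = status.value_counts(normalize=True).round(2)

print(pd.DataFrame({"count": counts, "share": shares}))
```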

## 🔄 6. Encoding Categorical Variables

In [None]:
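# Label-encode every text feature; the target column is left untouched.
label_encoders = {}
for column in data.select_dtypes(include='object').columns:
    if column != 'Customer_Status':
        le = LabelEncoder()
        # astype(str) lets columns that mix text and NaN be encoded without a TypeError
        data[column] = le.fit_transform(data[column].astype(str))
        label_encoders[column] = le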
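
`LabelEncoder` is designed for target labels, and it raises an error on categories it did not see during fitting. As a hedged alternative sketch, scikit-learn's `OrdinalEncoder` encodes several feature columns at once and can map unseen categories to a sentinel value. The frames and column names below are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy training and test features; column names and values are hypothetical.
train = pd.DataFrame({"Contract": ["Monthly", "Yearly", "Monthly"],
                      "Gender": ["F", "M", "F"]})
test = pd.DataFrame({"Contract": ["Yearly", "Two Year"],  # "Two Year" is unseen
                     "Gender": ["M", "F"]})

# Map each category to an integer; unseen categories become -1 instead of raising.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_enc = encoder.fit_transform(train)
test_enc = encoder.transform(test)

print(encoder.categories_)
print(test_enc)
```

Fitting the encoder on the training portion only, then applying it to the test portion, keeps information about the test set out of preprocessing.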

## 🎯 7. Feature and Target Variables

In [None]:
X = data.drop(columns=['Customer_Status'])
y = data['Customer_Status']

## ✂️ 8. Train-Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
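
Churn labels are often imbalanced, and a purely random split can shift the class ratio between the training and test sets. The sketch below uses synthetic labels (not the real dataset) to show how the `stratify` argument of `train_test_split` keeps the ratio the same in both partitions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 20% positive class (values are illustrative only).
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([1] * 20 + [0] * 80)

# stratify=y_demo preserves the 20/80 class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train positive share:", y_tr.mean())
print("test positive share:", y_te.mean())
```

In this notebook the equivalent change would be passing `stratify=y` to the existing call, assuming every class in `Customer_Status` has at least two rows.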

## 🤖 9. Model Building

In [None]:
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

## 📈 10. Model Evaluation

In [None]:
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
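# Column 1 of predict_proba is the probability of model.classes_[1]; this call assumes a binary target
print('ROC-AUC Score:', roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))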
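
Taking a single probability column works only when the target has exactly two classes. If `Customer_Status` turns out to have more (an assumption worth checking with `model.classes_`), `roc_auc_score` needs the full probability matrix and a `multi_class` setting. A sketch on synthetic three-class data, with all names below being illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for a target with more than two statuses.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # one column per entry in clf.classes_

if len(clf.classes_) == 2:
    # Binary case: score with the probability of clf.classes_[1].
    auc = roc_auc_score(y_te, proba[:, 1])
else:
    # Multiclass case: pass the full matrix and use one-vs-rest averaging.
    auc = roc_auc_score(y_te, proba, multi_class="ovr")

print("classes:", clf.classes_)
print("ROC-AUC:", round(auc, 3))
```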

## 💡 11. Key Insights

- Customers with short tenure and high charges are more likely to churn.
- Long-term contracts and support services reduce churn likelihood.
- The classification report, confusion matrix, and ROC-AUC score from section 10 indicate how well the model identifies high-risk customers.

Use these insights to support marketing, retention, or account management strategies.
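
Statements about which factors drive churn can be checked against the fitted model. As a minimal sketch, assuming synthetic data in which the label depends only on a hypothetical `tenure` column, a random forest's `feature_importances_` attribute ranks the inputs:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on "tenure"; "noise" is irrelevant.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=500),
    "noise": rng.normal(size=500),
})
y_demo = (X_demo["tenure"] < 12).astype(int)  # short tenure is labelled as churn

clf = RandomForestClassifier(random_state=42).fit(X_demo, y_demo)

# Impurity-based importances, one per input column, summing to 1.
importances = pd.Series(clf.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

In this notebook the same pattern would be `pd.Series(model.feature_importances_, index=X.columns)`. Importances rank features but do not show the direction of an effect, so a claim such as "short tenure increases churn" still needs a comparison of churn rates across tenure groups.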

## ✅ 12. Conclusion

This analysis shows how churn prediction models can help businesses retain valuable customers. The workflow used here can be adapted for other domains such as finance, telecom, or SaaS.