# Customer Churn Prediction Dashboard & Presentation

This notebook provides an interactive dashboard and visual presentation for the Telco Customer Churn project. It is designed to help recruiters and stakeholders quickly understand the business problem, data insights, model results, and actionable recommendations.

**Outline:**
1. Import Required Libraries
2. Load and Prepare Data
3. Visualize Churn Rate and Feature Distributions
4. Train and Evaluate Models
5. Business Recommendations & Conclusion
6. Streamlit Dashboard and Installation

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from imblearn.over_sampling import SMOTE
# Dash is optional and is not used in the cells below; skip it if it is not installed
try:
    from dash import Dash, dcc, html, Input, Output
    import dash_bootstrap_components as dbc
except ImportError:
    Dash = None

In [None]:
# Load and preprocess the Telco Customer Churn dataset
df = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/TelecomCustomerChurn.csv')
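# customerID is a unique identifier with no predictive value
df = df.drop('customerID', axis=1)
# Coerce TotalCharges to numeric; unparseable entries become NaN and are filled with the median
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())
# Encode the target as 0 (no churn) / 1 (churn)
df['Churn'] = df['Churn'].map({'No': 0, 'Yes': 1})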
display(df.head())

In [None]:
# Visualize churn rate and class imbalance
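# value_counts() sorts by frequency, so fix the order (0 = no churn, 1 = churn) to match the pie labels
churn_counts = df['Churn'].value_counts().reindex([0, 1], fill_value=0)
churn_rate = churn_counts[1] / churn_counts.sum()

fig = px.pie(names=['No Churn', 'Churn'], values=churn_counts.values, title='Churn Rate')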
fig.show()

print(f'Churn Rate: {churn_rate:.2%}')

sns.countplot(x='Churn', data=df)
plt.title('Churn Distribution')
plt.show()

In [None]:
# Visualize feature correlations and distributions
numeric_features = df.select_dtypes(include=['float64', 'int64'])
correlation = numeric_features.corr()
plt.figure(figsize=(10,8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap (Numeric Features)')
plt.show()

# Distribution of tenure and monthly charges by churn
fig1 = px.histogram(df, x='tenure', color='Churn', barmode='overlay', title='Tenure Distribution by Churn')
fig1.show()
fig2 = px.histogram(df, x='MonthlyCharges', color='Churn', barmode='overlay', title='Monthly Charges Distribution by Churn')
fig2.show()

In [None]:
# Encode categorical features, split data, balance with SMOTE, and train models
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

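X = df.drop('Churn', axis=1)
y = df['Churn']
categorical_cols = X.select_dtypes(include=['object']).columns.tolist()
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns.tolist()

# One-hot encode categoricals (dropping the first level) and median-impute numerics
preprocessor = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), categorical_cols),
    ('num', SimpleImputer(strategy='median'), numeric_cols)
])

# Stratified split keeps the churn ratio the same in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the preprocessor on the training set only, then apply it to the test set
X_train_enc = preprocessor.fit_transform(X_train)
X_test_enc = preprocessor.transform(X_test)

# Oversample the minority class in the training set only; the test set stays untouched
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train_enc, y_train)

# Train both models on the balanced training data
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_bal, y_train_bal)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train_bal, y_train_bal)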

In [None]:
# Evaluate models: confusion matrix, classification report, and feature importances
from sklearn.metrics import ConfusionMatrixDisplay

y_pred_lr = lr.predict(X_test_enc)
y_pred_rf = rf.predict(X_test_enc)

print('Logistic Regression Results:')
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred_lr)).plot()
plt.title('Confusion Matrix - Logistic Regression')
plt.show()
print(classification_report(y_test, y_pred_lr))

print('Random Forest Results:')
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred_rf)).plot()
plt.title('Confusion Matrix - Random Forest')
plt.show()
print(classification_report(y_test, y_pred_rf))

importances = rf.feature_importances_
feature_names = preprocessor.get_feature_names_out()
feat_imp = pd.Series(importances, index=feature_names).sort_values(ascending=False)
fig = px.bar(feat_imp.head(10), title='Top 10 Feature Importances (Random Forest)')
fig.show()

## Business Recommendations & Conclusion

**Key Insights:**
- Customers with month-to-month contracts and manual/electronic check payments are at highest risk of churn.
- Tenure and monthly charges are strong predictors of churn.
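
The first insight can be checked by grouping on a categorical column and averaging the 0/1 churn flag. The sketch below runs on a small made-up frame, not the real dataset; the `Contract` column name and its values are assumptions based on the usual Telco schema, and `churn_rate_by` is a hypothetical helper.

```python
import pandas as pd

def churn_rate_by(frame: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Return the share of churned customers per category, highest first."""
    # The mean of a 0/1 flag is the churn rate for that group.
    return frame.groupby(column)[target].mean().sort_values(ascending=False)

# Toy data for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": [1, 1, 0, 1, 0, 1, 0, 0],
})
rates = churn_rate_by(toy, "Contract")
print(rates)
```

On the real data, the same call with `df` and a column such as the contract or payment-method field would show whether the segments named above actually have the highest churn rates.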

**Recommendations:**
1. Target month-to-month contract customers with loyalty programs or incentives for longer-term contracts.
2. Encourage customers to switch from manual/electronic check payments to automatic payments or credit cards.
3. Engage new customers early with onboarding and retention campaigns.
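
One way to judge whether early engagement is worth prioritizing is to compare churn rates across tenure bands. This is a minimal sketch on made-up data; the band edges are an arbitrary choice for illustration, and `tenure` is assumed to be measured in months.

```python
import pandas as pd

# Toy data for illustration only; tenure is assumed to be in months.
toy = pd.DataFrame({
    "tenure": [1, 3, 5, 10, 14, 20, 30, 45, 60, 70],
    "Churn":  [1, 1, 0, 1, 0, 1, 0, 0, 0, 0],
})

# Band edges are an arbitrary choice; include_lowest keeps tenure == 0 in the first band.
bands = pd.cut(
    toy["tenure"],
    bins=[0, 12, 24, 72],
    labels=["0-12", "13-24", "25-72"],
    include_lowest=True,
)

# Mean of the 0/1 churn flag per tenure band.
tenure_rates = toy.groupby(bands, observed=True)["Churn"].mean()
print(tenure_rates)
```

If the first band shows a clearly higher rate on the real data, that supports focusing onboarding campaigns on customers in their first year.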

**Conclusion:**
The logistic regression and random forest models trained here help identify high-risk customers, so the company can target retention efforts and save on the cost of acquiring replacements. This notebook gives recruiters and stakeholders a clear, interactive view of the data and model results.
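
In practice, identifying high-risk customers usually means ranking them by predicted churn probability and not just reading the 0/1 prediction. The sketch below illustrates this with scikit-learn's `predict_proba` on synthetic data; `X_demo`, `y_demo`, and the cutoff `top_k` are placeholders, not values from this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the encoded customer features (not the Telco data).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Column 1 of predict_proba is the probability of the positive class (churn = 1).
churn_prob = model.predict_proba(X_demo)[:, 1]

# Indices of the top_k rows with the highest predicted churn probability.
top_k = 10  # hypothetical campaign size
high_risk_idx = np.argsort(churn_prob)[::-1][:top_k]
print(high_risk_idx)
print(churn_prob[high_risk_idx].round(3))
```

With the models trained in this notebook, the equivalent call would be `rf.predict_proba(X_test_enc)[:, 1]` or `lr.predict_proba(X_test_enc)[:, 1]`, and the cutoff would depend on the retention budget.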

## Streamlit Dashboard

A Streamlit dashboard is also available for this project. It lets recruiters and stakeholders explore the data, model results, and business recommendations interactively.

**To run the dashboard:**
```powershell
streamlit run churn_streamlit_dashboard.py
```
The dashboard opens in your browser at http://localhost:8501 by default.

## Installation

Before running the dashboard or notebook, install all required dependencies:

```powershell
pip install -r requirements.txt
```
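
After installing, a quick check can confirm that the packages are importable. This is a sketch only: the list below is inferred from the notebook's import cell plus `streamlit` for the dashboard, and may not match the actual contents of `requirements.txt`; `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names inferred from this notebook's imports; adjust to match requirements.txt.
REQUIRED = [
    "pandas", "numpy", "plotly", "seaborn", "matplotlib",
    "sklearn", "imblearn", "dash", "dash_bootstrap_components", "streamlit",
]

def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

Note that import names can differ from the names used with `pip`: `sklearn` is installed as `scikit-learn` and `imblearn` as `imbalanced-learn`.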