# Telco Customer Churn Prediction

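This notebook builds an end-to-end machine learning pipeline that predicts customer churn from the Telco Customer Churn dataset. It cleans and encodes the data, trains a Random Forest classifier, evaluates it on a held-out test set, and saves the model for deployment.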

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import joblib

## 2. Load and Explore Dataset

In [None]:
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head()
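`df.head()` shows only the first rows. Two further checks are worth running before preprocessing: which columns pandas failed to parse as numbers (in the raw CSV this includes `TotalCharges`, which is why section 3 converts it), and the class balance of `Churn`. The sketch below wraps these checks in a hypothetical helper, `quick_profile`, and runs it on a small invented frame so it works without the CSV. On the real data you would call `quick_profile(df)`.

```python
import pandas as pd


def quick_profile(frame: pd.DataFrame, target: str = "Churn") -> dict:
    """Hypothetical helper: non-numeric columns, NaN counts and class balance."""
    return {
        # Columns pandas did not parse as numbers (e.g. text with blank entries)
        "non_numeric_columns": [
            col for col in frame.columns
            if not pd.api.types.is_numeric_dtype(frame[col])
        ],
        # NaN count per column; blank strings such as " " are NOT counted as missing
        "missing": frame.isna().sum().to_dict(),
        # Share of each target class
        "class_balance": frame[target].value_counts(normalize=True).to_dict(),
    }


# Tiny illustrative frame (invented values; column names mirror the Telco CSV)
toy = pd.DataFrame({
    "tenure": [1, 34, 0, 45],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],
    "Churn": ["No", "No", "Yes", "No"],
})

profile = quick_profile(toy)
print(profile["non_numeric_columns"])  # ['TotalCharges', 'Churn']
print(profile["class_balance"])
```

The `missing` counts show why `isna()` alone is not enough here: a blank string is still a string, so the problem only surfaces once the column is converted to numbers.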

## 3. Data Preprocessing

In [None]:
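# TotalCharges is read as text because a few rows hold blank strings;
# errors='coerce' converts anything unparseable to NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Drop the rows whose TotalCharges could not be parsed
df.dropna(subset=['TotalCharges'], inplace=True)

# customerID is a unique identifier, not a predictive feature
df.drop(['customerID'], axis=1, inplace=True)

# Encode the target; LabelEncoder sorts the classes, so 'No' -> 0 and 'Yes' -> 1
le = LabelEncoder()
df['Churn'] = le.fit_transform(df['Churn'])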
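The cleaning steps can be checked on a toy frame. The sketch below applies the same operations to four invented rows that reuse the Telco column names: the blank `TotalCharges` entry becomes `NaN` and its row is dropped, and `LabelEncoder` assigns codes in sorted class order, so `'Yes'` (churned) is encoded as 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented rows using the Telco column names; the third has a blank TotalCharges
toy = pd.DataFrame({
    "customerID": ["A", "B", "C", "D"],
    "tenure": [1, 34, 0, 45],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],
    "Churn": ["No", "No", "Yes", "Yes"],
})

# errors='coerce' turns strings that cannot be parsed (here " ") into NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
print(toy["TotalCharges"].isna().sum())  # 1

# Drop the unparseable row and the identifier column
toy = toy.dropna(subset=["TotalCharges"]).drop(columns=["customerID"])

# LabelEncoder sorts the labels, so classes_ is ['No', 'Yes'] and 'Yes' -> 1
le = LabelEncoder()
toy["Churn"] = le.fit_transform(toy["Churn"])
print(toy["Churn"].tolist())  # [0, 0, 1]
```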

## 4. Feature Engineering

In [None]:
# One-hot encode categorical features
X = pd.get_dummies(df.drop('Churn', axis=1))
y = df['Churn']
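`pd.get_dummies` expands only the non-numeric columns into one 0/1 column per category and passes numeric columns (such as `tenure` or the already-binary `SeniorCitizen`) through unchanged. The toy example below, with invented values, makes the resulting column names visible. Note that the set of dummy columns depends on which categories occur in the data, so a new record encoded on its own will not produce the same columns as the training set.

```python
import pandas as pd

# Invented rows: a numeric column, a 0/1 column and a categorical column
toy = pd.DataFrame({
    "tenure": [1, 34, 2],
    "SeniorCitizen": [0, 1, 0],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
})

encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
# ['tenure', 'SeniorCitizen', 'Contract_Month-to-month', 'Contract_One year']
```

In recent pandas versions the dummy columns have `bool` dtype; scikit-learn estimators accept them as-is.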

## 5. Split Dataset

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
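`stratify=y` keeps the churn rate the same in the training and test sets, which matters because churners are the minority class in this dataset. The sketch below checks this on a synthetic label vector with an assumed 25% positive rate (not the real data) and a dummy single feature.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 200 samples, 25% positive (1 = churn)
y = np.array([1] * 50 + [0] * 150)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Both splits keep the 25% positive rate
print(y_train.mean(), y_test.mean())  # 0.25 0.25
```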

## 6. Train Model

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
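A fitted `RandomForestClassifier` exposes `feature_importances_`, impurity-based scores that sum to 1 and give a first look at which columns drive the predictions. In this notebook that would be `pd.Series(model.feature_importances_, index=X_train.columns)`. The self-contained sketch below substitutes synthetic data from `make_classification` with placeholder column names so it runs without the CSV. Impurity-based importances can favour continuous or high-cardinality features, so treat them as a rough guide.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for X_train / y_train: 2 informative + 2 noise features
X_arr, y_arr = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_train = pd.DataFrame(X_arr, columns=["f0", "f1", "f2", "f3"])

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_arr)

# Pair each importance with its column name and rank them
importances = pd.Series(model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))
```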

## 7. Evaluate Model

In [None]:
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("ROC-AUC Score:", roc_auc_score(y_test, y_prob))
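For binary labels, `confusion_matrix` returns `[[TN, FP], [FN, TP]]`, with rows as true classes and columns as predicted classes in the order 0, 1. The churn-class precision and recall in the classification report follow directly from these four counts. The worked example below uses small invented label vectors so the arithmetic is easy to verify by hand.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented true labels and predictions (1 = churn)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# ravel() flattens [[TN, FP], [FN, TP]] into four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 5 1 2 2

precision = tp / (tp + fp)  # 2 / 3: share of predicted churners who churned
recall = tp / (tp + fn)     # 2 / 4: share of actual churners that were caught

# Same values as scikit-learn's metrics for the positive class
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```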

## 8. Save Model for Deployment

In [None]:
# Persist the fitted model and the training column order (needed to align app inputs)
joblib.dump(model, "rf_churn_model.pkl")
joblib.dump(X.columns.tolist(), "model_features.pkl")
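On the deployment side (for example inside the Streamlit app), the two files can be loaded back with `joblib.load`. A new customer record must go through the same `pd.get_dummies` encoding and then be aligned to the saved column list with `DataFrame.reindex(columns=..., fill_value=0)`, because a single record only produces dummy columns for its own categories. The sketch below is self-contained: it trains a tiny stand-in model on invented rows, saves it to a temporary directory under the notebook's file names, and scores one new record. The helper `prepare_input` is hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def prepare_input(raw: pd.DataFrame, features: list) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode raw rows and align to training columns."""
    encoded = pd.get_dummies(raw)
    # Add missing dummy columns as 0, drop unseen ones, keep training order
    return encoded.reindex(columns=features, fill_value=0)


# Invented training rows standing in for the real X / y
train = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
})
y = [1, 0, 1, 0]
X = pd.get_dummies(train)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "rf_churn_model.pkl")
    features_path = os.path.join(tmp, "model_features.pkl")
    joblib.dump(model, model_path)
    joblib.dump(X.columns.tolist(), features_path)

    # What the app would do at start-up
    loaded_model = joblib.load(model_path)
    features = joblib.load(features_path)

# A single new customer only yields the 'Contract_One year' dummy on its own
new_customer = pd.DataFrame({"tenure": [5], "Contract": ["One year"]})
X_new = prepare_input(new_customer, features)
churn_prob = loaded_model.predict_proba(X_new)[:, 1]
print(X_new.columns.tolist())
print(churn_prob)
```

Without the `reindex` step, `X_new` would have two columns instead of four and `predict_proba` would reject it.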

## 9. Summary

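- A Random Forest classifier (100 trees) was trained to predict churn from the one-hot encoded customer features.
- Its performance on the stratified 20% test set is reported in section 7 as a confusion matrix, a per-class classification report, and the ROC-AUC score.
- The fitted model (`rf_churn_model.pkl`) and its feature list (`model_features.pkl`) are saved with joblib, ready to be loaded by a Streamlit app.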