# Customer Churn Modeling

## Objective
Build and evaluate classification models to predict customer churn using the cleaned dataset.

This notebook covers all modeling tasks:

- ✅ Train/test split and feature scaling (as needed)
- ✅ Model training: Logistic Regression, Random Forest, Gradient Boosting
- ✅ Cross-validation and hyperparameter tuning
- ✅ Compare models using AUC-ROC and accuracy
- ✅ Interpret results and select the best model for deployment

**Goal**: Build a predictive model that accurately forecasts customer churn and provides actionable insights for retention.

# Setup

## Imports

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Data Preparation

## Data Loading

In [3]:
df = pd.read_csv('datasets/churn_cleaned.csv')
df.head()

| Unnamed: 0 | monthlycharges | totalcharges | seniorcitizen | churn | tenure_months | type_One year | type_Two year | paperlessbilling_Yes | paymentmethod_Credit card (automatic) | paymentmethod_Electronic check | ... | streamingtv_No internet | streamingtv_Yes | streamingmovies_No internet | streamingmovies_Yes | multiplelines_No phone service | multiplelines_Yes | contract_type | payment_method | paperless | internet_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.85 | 29.85 | 0 | 1 | 0.0 | False | False | True | False | True | ... | False | False | False | False | True | False | Month-to-month | Electronic Check | Yes | DSL |
| 1 | 56.95 | 1889.5 | 0 | 1 | 34.0 | True | False | False | False | False | ... | False | False | False | False | False | False | One year | Mailed Check | No | DSL |
| 2 | 53.85 | 108.15 | 0 | 0 | 2.0 | False | False | True | False | False | ... | False | False | False | False | False | False | Month-to-month | Mailed Check | Yes | DSL |
| 3 | 42.3 | 1840.75 | 0 | 1 | 45.0 | True | False | False | False | False | ... | False | False | False | False | True | False | One year | Other | No | DSL |
| 4 | 70.7 | 151.65 | 0 | 0 | 2.0 | False | False | True | False | True | ... | False | False | False | False | False | False | Month-to-month | Electronic Check | Yes | Fiber Optic |


## Feature and Target Separation

In [4]:
# Features: every column except the target
X = df.drop(columns='churn')
# Target: the churn label
y = df['churn']
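
The data preview lists string-valued columns (`contract_type`, `payment_method`, `paperless`, `internet_type`) next to their one-hot encoded counterparts. scikit-learn estimators such as `LogisticRegression` expect numeric input, so it may be worth checking `X` for leftover non-numeric columns before fitting. The sketch below runs on a small toy frame with illustrative values. The helper name `non_numeric_columns` is hypothetical, and whether such columns should be dropped or encoded depends on how the cleaned dataset was built.

```python
import pandas as pd


def non_numeric_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns that are neither numeric nor boolean."""
    return frame.select_dtypes(exclude=["number", "bool"]).columns.tolist()


# Toy stand-in for the churn data (illustrative values, not the real dataset).
toy = pd.DataFrame({
    "monthlycharges": [29.85, 56.95, 53.85],
    "paperlessbilling_Yes": [True, False, True],
    "contract_type": ["Month-to-month", "One year", "Month-to-month"],
    "churn": [1, 1, 0],
})

X_toy = toy.drop(columns=["churn"])

# Columns that a scikit-learn estimator could not consume directly.
leftover = non_numeric_columns(X_toy)
print(leftover)

# One option (an assumption, not necessarily the notebook's choice):
# drop them when an encoded version already exists.
X_toy_numeric = X_toy.drop(columns=leftover)
```

Applied to the real data, the same check would be `non_numeric_columns(X)`.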

## Train/Test Split

- Hold out 25% of the rows for testing. Stratifying on `churn` (`stratify=y`) keeps the churn rate approximately equal in the training and test sets, and `random_state=42` makes the split reproducible.

In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
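
One way to confirm that stratification behaved as expected is to compare the churn rate across the full data, the training set, and the test set. The sketch below does this on synthetic labels with an assumed 25% churn rate; the variable names ending in `_toy`, `_tr`, and `_te` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 customers, 25% of whom churn (assumed rate for illustration).
rng = np.random.default_rng(42)
n = 1000
X_toy = pd.DataFrame({"tenure_months": rng.integers(0, 72, size=n)})
y_toy = pd.Series(
    np.r_[np.ones(250, dtype=int), np.zeros(750, dtype=int)], name="churn"
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

# With stratify=y_toy, the three rates should be nearly identical.
rates = pd.Series(
    {"full": y_toy.mean(), "train": y_tr.mean(), "test": y_te.mean()}
)
print(rates.round(3))
```

On the real split, the equivalent check is to compare `y.mean()`, `y_train.mean()`, and `y_test.mean()`.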

## Scale Numeric Features

In [7]:
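numeric_features = ['monthlycharges', 'totalcharges', 'tenure_months']

# Work on copies so the assignments below do not trigger SettingWithCopyWarning.
X_train, X_test = X_train.copy(), X_test.copy()

# Fit the scaler on the training set only, then reuse its statistics on the test set.
scaler = StandardScaler()
X_train[numeric_features] = scaler.fit_transform(X_train[numeric_features])
X_test[numeric_features] = scaler.transform(X_test[numeric_features])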
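
Fitting the scaler on the training set alone keeps test-set statistics out of preprocessing. The same concern applies inside cross-validation, where the scaler should ideally be refit on each training fold. One possible way to arrange that is to wrap the scaling step and the classifier in a single `Pipeline`. The sketch below is an alternative under stated assumptions, not the notebook's method: the toy data, the random labels, and the names `preprocess` and `model` are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_features = ["monthlycharges", "totalcharges", "tenure_months"]

# Toy data with the same numeric column names (values are synthetic).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "monthlycharges": rng.uniform(20, 120, n),
    "tenure_months": rng.integers(0, 72, n).astype(float),
    "paperlessbilling_Yes": rng.integers(0, 2, n).astype(bool),
})
toy["totalcharges"] = toy["monthlycharges"] * toy["tenure_months"]
y_toy = pd.Series(rng.integers(0, 2, n), name="churn")  # random labels

# Scale only the numeric columns; pass the one-hot columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[("scale", StandardScaler(), numeric_features)],
    remainder="passthrough",
)

# The pipeline refits the scaler on each training fold during cross-validation.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(model, toy, y_toy, cv=5, scoring="roc_auc")
print(scores.mean())
```

Because the labels here are random, the printed AUC carries no meaning; the point is only that scaling happens inside each fold rather than once up front.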