# Bank Customer Churn Prediction - Clean Model Development

## 🎯 Project Overview

This is a **clean version** of the model development notebook. It makes three changes:
1. **Data leakage fixed** - Uses only features available BEFORE churn
2. **Simplified metrics** - Removes the problematic precision@10% calculation
3. **Realistic performance** - Expects AUC scores in the 0.70-0.85 range
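
To make the leakage point concrete, here is a minimal sketch of dropping columns that are only known after a customer has churned. The column names (`account_closure_date`, `exit_survey_score`, and so on) and the helper `select_pre_churn_features` are hypothetical and used only for illustration; the actual dataset schema may differ.

```python
import pandas as pd

# Hypothetical toy data: column names are illustrative, not the project's schema.
df = pd.DataFrame({
    "tenure_months": [12, 48, 3, 30],
    "balance": [1500.0, 230.5, 0.0, 9800.0],
    "account_closure_date": [None, "2024-03-01", "2024-01-15", None],  # known only after churn
    "exit_survey_score": [None, 2.0, 1.0, None],                       # known only after churn
    "churned": [0, 1, 1, 0],
})

# Assumed list of post-churn columns; in practice this comes from domain review.
LEAKY_COLUMNS = ["account_closure_date", "exit_survey_score"]
TARGET = "churned"


def select_pre_churn_features(frame, leaky, target):
    """Return (X, y) with the target and all post-churn columns removed."""
    X = frame.drop(columns=[*leaky, target])
    y = frame[target]
    return X, y


X, y = select_pre_churn_features(df, LEAKY_COLUMNS, TARGET)
print(list(X.columns))
```

Keeping the leaky columns in one explicit list makes the exclusion easy to audit: an implausibly high AUC is often the first sign that such a column slipped through.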

### 📊 Business Context
- **Goal**: Predict customer churn with realistic performance (AUC > 0.75)
- **Focus**: Clean, interpretable models without data leakage
- **Metrics**: AUC-ROC, Precision, Recall, F1-Score
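
The four metrics listed above could be computed along these lines. This is a sketch on made-up labels and scores, not results from this project; the helper name `evaluate_binary` and the 0.5 decision threshold are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score


def evaluate_binary(y_true, y_score, threshold=0.5):
    """Compute AUC-ROC from scores, and precision/recall/F1 from thresholded labels."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "auc_roc": roc_auc_score(y_true, y_score),  # threshold-independent
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# Synthetic example: 1 = churned, scores are predicted churn probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

metrics = evaluate_binary(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC is computed from the raw scores, while precision, recall, and F1 depend on the chosen threshold, so the two groups can move independently when the threshold changes.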

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from datetime import datetime

# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (
    classification_report, confusion_matrix, roc_auc_score, 
    roc_curve, precision_score, recall_score, f1_score, accuracy_score
)

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# MLflow for experiment tracking
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8')
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

print("📚 Libraries imported successfully!")
print(f"🎲 Random state: {RANDOM_STATE}")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

