In [24]:
import pandas as pd
import joblib
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve
)

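**Explanation:**  
Imports the libraries needed for model evaluation:
- `pandas` to load the processed dataset  
- `joblib` to load the saved model  
- `train_test_split` to recreate the held-out test set  
- `sklearn.metrics` for classification metrics  
- `matplotlib` and `seaborn` for visualizations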


In [25]:
df = pd.read_csv("../data/processed_data.csv")

# Drop the identifier column if present; it carries no predictive signal
if "customerID" in df.columns:
    df = df.drop(columns="customerID")

X = df.drop("Churn", axis=1)
y = df["Churn"]

**Explanation:**  
Loaded the processed dataset and separated:
- **X** → feature matrix  
- **y** → target variable (Churn)


In [26]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

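**Explanation:**  
Recreated the train-test split used during model training (80/20, `random_state=42`, stratified on `Churn`). The test rows match those held out during training only when the same settings are applied to the same processed data, which keeps the evaluation consistent.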
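
A minimal sketch of why the fixed `random_state` matters, using hypothetical toy arrays (`X_demo`, `y_demo`) rather than the churn data: two calls with identical arguments return identical partitions, and `stratify` keeps the positive rate of the test set close to the overall rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not the churn dataset): 100 rows, 27% positives
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 27 + [0] * 73)

split_args = dict(test_size=0.2, random_state=42, stratify=y_demo)
split_a = train_test_split(X_demo, y_demo, **split_args)
split_b = train_test_split(X_demo, y_demo, **split_args)

# Identical arguments on identical data give identical partitions
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

# Stratification keeps the test-set class balance near the overall 27%
test_pos_rate = split_a[3].mean()
print(same_split, round(float(test_pos_rate), 2))
```

If the row order of the input changes, or a different `random_state` is used, the test set no longer matches the one held out during training.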


In [30]:
# Load trained model
model = joblib.load("../data/customer_churn_model_tuned.pkl")
print("Model loaded successfully")


Model loaded successfully


**Explanation:**  
Loaded the previously saved **Random Forest model** for evaluation on unseen test data.
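
For reference, a small sketch of the save and load round trip that `joblib.load` relies on. The toy classifier, the synthetic data, and the file name `demo_model.pkl` are assumptions for illustration; they are not the tuned churn model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the tuned model: a small forest on synthetic data
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(clf, model_path)       # serialize the fitted estimator
    restored = joblib.load(model_path) # deserialize it back into memory

# The restored estimator should reproduce the original predictions
round_trip_ok = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(type(restored).__name__, round_trip_ok)
```

A loaded model expects the same feature columns, in the same order, that it saw when it was fitted.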


In [31]:
# Make predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]


**Explanation:**  
- `y_pred` → predicted class labels (0 = no churn, 1 = churn)  
- `y_prob` → predicted probability of the positive class (column 1 of `predict_proba`), used for the ROC-AUC calculation
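
A hedged sketch of how positive-class probabilities feed `roc_auc_score` and `roc_curve`, and how the decision threshold trades off against recall. It runs on synthetic data with a hypothetical stand-in classifier, so the printed values say nothing about the churn model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: roughly 27% positives)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.73, 0.27], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Column 1 of predict_proba is the probability of the positive class
prob_pos = clf.predict_proba(X_te)[:, 1]

# ROC-AUC is computed from scores, not from hard 0/1 labels
auc = roc_auc_score(y_te, prob_pos)
fpr, tpr, thresholds = roc_curve(y_te, prob_pos)

# Lowering the threshold can only keep or increase recall
recall_at_05 = recall_score(y_te, (prob_pos >= 0.5).astype(int))
recall_at_03 = recall_score(y_te, (prob_pos >= 0.3).astype(int))

print(f"ROC-AUC: {auc:.3f}")
print(f"Recall at 0.5: {recall_at_05:.3f}, at 0.3: {recall_at_03:.3f}")
```

In the notebook itself, the equivalent call would be `roc_auc_score(y_test, y_prob)`.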


In [32]:
# Evaluation metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))


Accuracy: 0.7789623312011372
Precision: 0.5683297180043384
Recall: 0.7005347593582888
F1 Score: 0.6275449101796408

Classification Report:

              precision    recall  f1-score   support

           0       0.88      0.81      0.84      1033
           1       0.57      0.70      0.63       374

    accuracy                           0.78      1407
   macro avg       0.72      0.75      0.74      1407
weighted avg       0.80      0.78      0.79      1407



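**Explanation:**  
Evaluated model performance on the 1,407 test rows using:
- **Accuracy** → share of all predictions that are correct (0.78)  
- **Precision** → share of predicted churners who actually churned (0.57)  
- **Recall** → share of actual churners the model detected (0.70); very important here, because a missed churner cannot be targeted for retention  
- **F1-score** → harmonic mean of precision and recall (0.63)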
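
The arithmetic behind these four numbers can be sketched from confusion-matrix counts. The counts below are inferred from the printed metrics and class supports (1033 non-churners, 374 churners); they are an assumption, not values read from the notebook's own confusion matrix.

```python
# Counts inferred from the reported metrics and supports (assumption)
tp = 262  # churners correctly flagged
fp = 199  # non-churners wrongly flagged
fn = 112  # churners missed
tn = 834  # non-churners correctly passed

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # of those flagged, how many churned
recall = tp / (tp + fn)     # of those who churned, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
```

Under these counts, the model misses 112 of 374 churners and raises 199 false alarms among 1033 non-churners.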
