# Model Comparison and Feature Importance

In this notebook, I will:
1. Compare multiple machine learning algorithms beyond Logistic Regression, such as Random Forest and XGBoost, to see if I can improve predictive performance.
2. Implement basic hyperparameter tuning using GridSearchCV or RandomizedSearchCV to optimize model settings.
3. Evaluate models using metrics that are crucial for credit risk analysis, such as AUC-ROC and Precision-Recall curves, since identifying "bad" credit risks is more important than overall accuracy.
4. Explore feature importance and model interpretability. Understanding which features drive the model’s decisions is vital in a credit risk context, where explainability is often a regulatory and business requirement.

In [2]:
import joblib
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, RocCurveDisplay, PrecisionRecallDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Load the data
X_train = joblib.load("../data/X_train.pkl")
X_test = joblib.load("../data/X_test.pkl")
y_train = joblib.load("../data/y_train.pkl")
y_test = joblib.load("../data/y_test.pkl")

### Random Forest Baseline

I'll train a basic Random Forest with default parameters and compare its performance to the Logistic Regression baseline. Random Forests are often good at handling complex interactions between features and may provide better recall for the "bad" class.

In [3]:
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)

acc_rf = accuracy_score(y_test, y_pred_rf)
auc_rf = roc_auc_score(y_test, rf.predict_proba(X_test)[:,1]) # Calculate AUC
print("Random Forest Accuracy:", acc_rf)
print("Random Forest AUC:", auc_rf)

print("\nClassification Report:\n", classification_report(y_test, y_pred_rf))

Random Forest Accuracy: 0.745
Random Forest AUC: 0.7909604519774011

Classification Report:
               precision    recall  f1-score   support

           0       0.77      0.91      0.83       141
           1       0.62      0.34      0.44        59

    accuracy                           0.74       200
   macro avg       0.70      0.63      0.64       200
weighted avg       0.73      0.74      0.72       200



Accuracy and AUC give a quick snapshot of the Random Forest. An AUC higher than Logistic Regression's would suggest that the Random Forest is better at ranking bad credit risks above good ones. I will pay particular attention to recall for the bad class and to AUC-ROC, since these metrics align more closely with credit risk priorities than overall accuracy does.
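
To make the "ranking" reading of AUC concrete, here is a minimal sketch on made-up labels and scores (not the notebook's data). It checks that `roc_auc_score` matches the fraction of (bad, good) customer pairs in which the bad customer receives the higher predicted probability, with ties counted as half.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels (1 = bad credit) and predicted probabilities of the bad class.
# These values are invented for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0])
scores = np.array([0.10, 0.35, 0.40, 0.45, 0.80, 0.30, 0.20, 0.60])

bad = scores[y_true == 1]
good = scores[y_true == 0]

# Compare every (bad, good) pair; a tie counts as half a correct ranking.
diff = bad[:, None] - good[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

auc = roc_auc_score(y_true, scores)
print(f"pairwise estimate: {pairwise_auc:.4f}, roc_auc_score: {auc:.4f}")
```

Because AUC depends only on the ordering of the scores, it says nothing about how many bad credits are flagged at a particular decision threshold, which is why recall is reported alongside it.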

#### Model Interpretation

**Random Forest (Baseline):**  
- **Accuracy:** 0.745  
- **AUC:** 0.79  

**Interpretation:**

The baseline Random Forest model gives the following results:

- **Accuracy (74.5%)**: Slightly lower than the Logistic Regression models, suggesting it is less effective at overall classification.
- **Class 0 (Good Credit)**:
  - **Recall (0.91)**: Very high, indicating the model identifies most good credit customers correctly.
  - **Precision (0.77)**: Slightly lower than Logistic Regression.
- **Class 1 (Bad Credit)**:
  - **Recall (0.34)**: Much lower than Logistic Regression; the model misses roughly two out of three bad credit customers.
  - **Precision (0.62)**: Lower than Logistic Regression, indicating it is less reliable when predicting bad credit.

**Key Takeaways:**
While the Random Forest model excels at identifying good credit customers, its recall for the bad credit class is significantly worse than Logistic Regression, which makes it less suitable for a credit risk context without further tuning.
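
The per-class recall and precision discussed above come straight from the confusion matrix. The sketch below uses a small hypothetical set of labels and predictions (not the notebook's test set) to show how the bad-class figures are derived.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions (1 = bad credit), for illustration only.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

recall_bad = tp / (tp + fn)     # share of actual bad credits that were flagged
precision_bad = tp / (tp + fp)  # share of flagged customers that were actually bad

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Recall (bad):", recall_bad)
print("Precision (bad):", precision_bad)
```

In the notebook itself, the same call would be `confusion_matrix(y_test, y_pred_rf)`; the false negatives (bad credits predicted as good) are the count that drives the low recall.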

### Hyperparameter Tuning for Random Forest

I will use GridSearchCV to search for better hyperparameters for the Random Forest, tuning `n_estimators`, `max_depth`, and `min_samples_leaf`. Each of the 18 parameter combinations is scored by its mean AUC over 3-fold cross-validation on the training set.

In [4]:
param_grid = {
    'n_estimators': [100,300],
    'max_depth': [None, 10, 20],
    'min_samples_leaf': [1, 2, 5]
}

grid_search = GridSearchCV(
    estimator = RandomForestClassifier(random_state=42),
    param_grid = param_grid,
    scoring = 'roc_auc', # Using AUC as the scoring metric
    cv = 3,
    n_jobs = -1
)

grid_search.fit(X_train, y_train)

print("Best Params:", grid_search.best_params_)
print("Best Score (AUC):", grid_search.best_score_)

Best Params: {'max_depth': 10, 'min_samples_leaf': 5, 'n_estimators': 100}
Best Score (AUC): 0.7623129915833197


**Interpreting the Grid Search Results:**

- `n_estimators`: Number of trees in the forest. More trees can improve performance but increase training time.
- `max_depth`: Maximum depth of the trees. Deeper trees can model complex relationships but may overfit.
- `min_samples_leaf`: Minimum samples per leaf. Increasing this can reduce overfitting.

The best parameters are those with the highest mean AUC across the validation folds. Because `GridSearchCV` refits the best configuration on the full training set by default (`refit=True`), `best_estimator_` can be evaluated on the test set directly.
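
Looking only at `best_params_` hides how close the other candidates were. As a rough sketch, the full table of cross-validation results can be inspected through `cv_results_`. The example below runs on a synthetic, imbalanced dataset from `make_classification` with a deliberately small grid; the `X_demo`, `y_demo`, and `demo_search` names are placeholders, not objects from this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the credit data (an assumption, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=42
)

demo_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [20, 40], "min_samples_leaf": [1, 5]},
    scoring="roc_auc",
    cv=3,
)
demo_search.fit(X_demo, y_demo)

# One row per parameter combination, sorted from best to worst mean fold AUC.
cv_table = (
    pd.DataFrame(demo_search.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
)
print(cv_table.to_string(index=False))
```

Applied to the notebook's `grid_search` object, the `std_test_score` column helps judge whether the gap between the top few configurations is larger than the fold-to-fold noise.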

In [8]:
best_rf = grid_search.best_estimator_ 
y_pred_best_rf = best_rf.predict(X_test)

acc_best_rf = accuracy_score(y_test, y_pred_best_rf)
auc_best_rf = roc_auc_score(y_test, best_rf.predict_proba(X_test)[:,1])

print("Optimized Random Forest Accuracy:", acc_best_rf)
print("Optimized Random Forest AUC:", auc_best_rf)
print("\nClassification Report:\n", classification_report(y_test, y_pred_best_rf))

Optimized Random Forest Accuracy: 0.75
Optimized Random Forest AUC: 0.8015386464719317

Classification Report:
               precision    recall  f1-score   support

           0       0.75      0.96      0.84       141
           1       0.71      0.25      0.38        59

    accuracy                           0.75       200
   macro avg       0.73      0.61      0.61       200
weighted avg       0.74      0.75      0.71       200



#### Model Interpretation

**Random Forest (Optimized):**  
- **Accuracy:** 0.75  
- **AUC:** 0.80154  

**Interpretation:**

After tuning the Random Forest with GridSearchCV, the model's performance is as follows:

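- **Accuracy (75%)**: Marginal improvement over the baseline Random Forest (74.5%), equivalent to one additional correct prediction out of 200.
- **Class 0 (Good Credit)**:
  - **Recall (0.96)**: Up from 0.91, making the model very effective at identifying good credit customers.
  - **Precision (0.75)**: Slightly lower than the baseline model (0.77).
- **Class 1 (Bad Credit)**:
  - **Recall (0.25)**: Down from 0.34 in the baseline Random Forest, which is concerning: only one in four bad credit customers is caught.
  - **Precision (0.71)**: Up from 0.62, so the model flags fewer customers as bad credit but is right more often when it does.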

**Key Takeaways:**
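While the optimized Random Forest slightly improves accuracy (0.745 to 0.75) and AUC (0.79 to 0.80), its recall for the bad credit class drops from 0.34 to 0.25. The higher AUC means the model ranks customers somewhat better, but at the default 0.5 probability threshold used by `predict` it flags fewer bad credits, making it less effective for practical use in credit risk management.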
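
One possible way to act on a model that ranks well but has low recall is to choose the decision threshold from the precision-recall curve instead of relying on 0.5. The sketch below illustrates the idea on synthetic data; the 0.80 recall target is a hypothetical figure, and in practice the threshold should be chosen on a validation split rather than on the final test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the credit data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
proba_bad = clf.predict_proba(X_val)[:, 1]

# precision[i] and recall[i] describe predictions made with proba >= thresholds[i];
# the final precision/recall entries have no matching threshold, so drop them.
precision, recall, thresholds = precision_recall_curve(y_val, proba_bad)

target_recall = 0.80  # hypothetical business target
meets_target = recall[:-1] >= target_recall
chosen_threshold = thresholds[meets_target].max()  # strictest threshold that still meets it

y_pred_default = (proba_bad >= 0.5).astype(int)
y_pred_tuned = (proba_bad >= chosen_threshold).astype(int)

for label, pred in [("default 0.5", y_pred_default), ("tuned", y_pred_tuned)]:
    print(
        f"{label}: recall={recall_score(y_val, pred):.2f}, "
        f"precision={precision_score(y_val, pred, zero_division=0):.2f}"
    )
print("Chosen threshold:", round(float(chosen_threshold), 3))
```

The trade-off is visible in the printed precision: meeting a recall target usually means accepting more good customers flagged as bad.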

### Trying XGBoost

XGBoost often performs well on tabular data. I'll train a basic XGBoost classifier and see whether it outperforms Random Forest. If it does, tuning its parameters would be a natural next step.

In [10]:
# use_label_encoder is omitted: the installed XGBoost version ignores it and emits a warning
xgb = XGBClassifier(random_state=42, eval_metric='logloss')
xgb.fit(X_train, y_train)
y_pred_xgb = xgb.predict(X_test)

acc_xgb = accuracy_score(y_test, y_pred_xgb)
auc_xgb = roc_auc_score(y_test, xgb.predict_proba(X_test)[:,1])

print("XGBoost Accuracy:", acc_xgb)
print("XGBoost AUC:", auc_xgb)
print("\nClassification Report:\n", classification_report(y_test, y_pred_xgb))

XGBoost Accuracy: 0.785
XGBoost AUC: 0.8139199423007573

Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.89      0.85       141
           1       0.67      0.53      0.59        59

    accuracy                           0.79       200
   macro avg       0.75      0.71      0.72       200
weighted avg       0.78      0.79      0.78       200

#### Model Interpretation

**XGBoost:**  
- **Accuracy:** 0.785  
- **AUC:** 0.81392  

**Interpretation:**

XGBoost delivers the following results:

- **Accuracy (78.5%)**: Higher than both Logistic Regression and Random Forest models.
- **Class 0 (Good Credit)**:
  - **Recall (0.89)**: Slightly lower than the optimized Random Forest but still strong.
  - **Precision (0.82)**: Better than Random Forest.
- **Class 1 (Bad Credit)**:
  - **Recall (0.53)**: A significant improvement over both Random Forest models, on par with the improved Logistic Regression.
  - **Precision (0.67)**: Matches the improved Logistic Regression.

**Key Takeaways:**
XGBoost offers a good balance between overall performance and recall for the bad credit class. Its ability to identify more bad credit customers compared to Random Forest makes it a strong candidate for credit risk modeling.
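
Since the comparison across models is spread over several cells, a small helper that collects the headline metrics into one table can make it easier to read. This is only a sketch: `summarize` is a hypothetical helper, the data is synthetic, and `GradientBoostingClassifier` is used as a stand-in so the example runs without XGBoost installed.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def summarize(name, model, X_eval, y_eval):
    """Return one row of headline metrics; class 1 is treated as bad credit."""
    y_pred = model.predict(X_eval)
    proba_bad = model.predict_proba(X_eval)[:, 1]
    return {
        "model": name,
        "accuracy": accuracy_score(y_eval, y_pred),
        "auc": roc_auc_score(y_eval, proba_bad),
        "recall_bad": recall_score(y_eval, y_pred, pos_label=1),
        "precision_bad": precision_score(y_eval, y_pred, pos_label=1, zero_division=0),
    }


# Synthetic, imbalanced stand-in for the credit data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.7, 0.3], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting (stand-in for XGBoost)": GradientBoostingClassifier(random_state=42),
}

rows = [summarize(name, model.fit(X_tr, y_tr), X_te, y_te) for name, model in models.items()]
comparison = pd.DataFrame(rows).set_index("model").round(3)
print(comparison)
```

In this notebook the already-fitted models could be passed in directly, for example `summarize("XGBoost", xgb, X_test, y_test)`, to put the Random Forest and XGBoost results side by side.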