# Table of Model Evaluation Metric Comparisons

In [28]:
import pandas as pd
from IPython.display import display

data = {
    "Model": [
        "ANN (Thresh 0.40)",
        "XGB (Thresh 0.30)",
        "XGB (Thresh 0.50)",
        "XGB (Thresh 0.70)",
        "RF (Baseline)",
        "RF (Class Weighted)",
        "RF (Thresh 0.2)",
        "RF Weighted + Thresh 0.2"
    ],
    "Accuracy": [0.637, 0.6607, 0.6607, 0.6607, 0.803, 0.802, 0.802, 0.802],
    "Precision (Class 1)": [0.348, 0.2626, 0.3463, 0.4886, 0.58, 0.59, 0.34, 0.35],
    "Recall (Class 1)": [0.816, 0.9256, 0.6815, 0.2862, 0.17, 0.15, 0.81, 0.79],
    "F1 Score (Class 1)": [0.488, 0.4091, 0.4592, 0.3610, 0.26, 0.24, 0.48, 0.49],
    "ROC AUC": [0.775, 0.7317, 0.7317, 0.7317, 0.776, 0.779, 0.779, 0.779]
}

model_comparison_df = pd.DataFrame(data)
model_comparison_df = model_comparison_df.round(4)
display(model_comparison_df)

| | Model | Accuracy | Precision (Class 1) | Recall (Class 1) | F1 Score (Class 1) | ROC AUC |
|---|---|---|---|---|---|---|
| 0 | ANN (Thresh 0.40) | 0.637 | 0.348 | 0.816 | 0.488 | 0.775 |
| 1 | XGB (Thresh 0.30) | 0.6607 | 0.2626 | 0.9256 | 0.4091 | 0.7317 |
| 2 | XGB (Thresh 0.50) | 0.6607 | 0.3463 | 0.6815 | 0.4592 | 0.7317 |
| 3 | XGB (Thresh 0.70) | 0.6607 | 0.4886 | 0.2862 | 0.361 | 0.7317 |
| 4 | RF (Baseline) | 0.803 | 0.58 | 0.17 | 0.26 | 0.776 |
| 5 | RF (Class Weighted) | 0.802 | 0.59 | 0.15 | 0.24 | 0.779 |
| 6 | RF (Thresh 0.2) | 0.802 | 0.34 | 0.81 | 0.48 | 0.779 |
| 7 | RF Weighted + Thresh 0.2 | 0.802 | 0.35 | 0.79 | 0.49 | 0.779 |
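
Because F1 is the harmonic mean of precision and recall, the F1 column can be cross-checked against the other two columns. The following is a minimal sketch using the precision, recall, and F1 values copied from the table; the helper name `f1_from` is introduced here for illustration, and a tolerance of 0.01 is an assumption to allow for rounding in the reported figures.

```python
# Cross-check F1 = 2PR / (P + R) against the reported F1 values.
# Each entry: model -> (precision, recall, reported F1), copied from the table.
reported = {
    "ANN (Thresh 0.40)": (0.348, 0.816, 0.488),
    "XGB (Thresh 0.30)": (0.2626, 0.9256, 0.4091),
    "XGB (Thresh 0.50)": (0.3463, 0.6815, 0.4592),
    "XGB (Thresh 0.70)": (0.4886, 0.2862, 0.3610),
    "RF (Baseline)": (0.58, 0.17, 0.26),
    "RF (Class Weighted)": (0.59, 0.15, 0.24),
    "RF (Thresh 0.2)": (0.34, 0.81, 0.48),
    "RF Weighted + Thresh 0.2": (0.35, 0.79, 0.49),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Absolute gap between the recomputed and the reported F1 for each model.
f1_gaps = {
    name: abs(f1_from(p, r) - f1) for name, (p, r, f1) in reported.items()
}

for name, gap in f1_gaps.items():
    print(f"{name}: |recomputed - reported| = {gap:.4f}")
```

A gap larger than the rounding tolerance would suggest that a row mixes metrics computed at different thresholds or on different data splits.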


# Detailed Comparative Analysis of ANN, XGBoost, and Random Forest

### ANN

**Strengths:**
- Strong recall (0.816), which is vital in high-risk settings (e.g., catching defaulters).
- Balanced F1 score (0.488), meaning a good trade-off between precision and recall.
- High AUC (0.775), indicating good overall discriminatory power.

**Limitations:**
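- Low precision (0.348): well below XGBoost at threshold 0.70 (0.49) and the untuned Random Forest variants (0.58–0.59), and only on par with threshold-tuned Random Forest (0.34–0.35), so most flagged cases are false positives.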
- Lowest accuracy among tested models.


### XGBoost

**Insights:**
- At threshold 0.30, XGBoost delivers the highest recall (92.56%) but sacrifices precision heavily (26.26%); this is useful when missing a defaulter is more costly than a false alarm.
- At threshold 0.50, it achieves the best balance between recall and precision of the three XGBoost thresholds, yielding its highest F1 score (0.4592).
- At threshold 0.70, XGBoost becomes precision-focused, flagging fewer positives but with higher precision (0.4886) and low recall (0.2862); this suits false-positive-sensitive applications.

**Overall:**
- Flexible model when paired with threshold tuning.
- Lower AUC (0.7317) than ANN (0.775) and Random Forest (0.776–0.779), suggesting weaker ranking performance.
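
The threshold trade-off described above can be sketched as a small sweep over cut-offs. This is an illustrative example only: the labels and probabilities below are hypothetical (not the coursework data), and `metrics_at_threshold` is a helper name introduced here. Note that ROC AUC is computed from the ranking of scores and does not change with the cut-off, whereas precision, recall, F1, and usually accuracy do.

```python
def metrics_at_threshold(y_true, y_prob, threshold):
    """Class-1 precision, recall, F1 and overall accuracy at a given cut-off."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.20, 0.35, 0.45, 0.55, 0.25, 0.32, 0.60, 0.75, 0.90]

# Same model scores, three different decision thresholds.
sweep = {t: metrics_at_threshold(y_true, y_prob, t) for t in (0.30, 0.50, 0.70)}

for t, m in sweep.items():
    print(f"threshold={t:.2f}  " +
          "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Lowering the threshold moves predictions from class 0 to class 1, so recall can only rise or stay flat while precision typically falls.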


### Random Forest (RF)

**Key Takeaways:**
- Baseline RF is biased toward the majority class (class 0), with poor recall (0.17).
- Class weighting alone does not improve recall (0.15 versus 0.17 for the baseline), indicating that rebalancing class importance isn't sufficient on its own.
- Threshold tuning (0.2) drastically improves recall to 0.81 and F1 to 0.48, at the cost of precision (0.34).
- The best RF configuration is class weighting + threshold tuning, achieving the highest F1 score (0.49) among all models, marginally ahead of ANN (0.488), with competitive recall (0.79) and precision (0.35).

**AUC**: Highest among all models (0.776–0.779), marginally ahead of ANN (0.775), confirming strong overall discrimination ability.
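
For reference, a common "balanced" class-weighting heuristic gives each class the weight `n_samples / (n_classes * class_count)`, which is the formula scikit-learn documents for `class_weight="balanced"`. The exact weighting used for the Random Forest here is not stated, so the following is only an illustrative sketch with hypothetical labels; `balanced_class_weights` is a helper name introduced for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical imbalanced target: 80 non-defaulters (0), 20 defaulters (1).
labels = [0] * 80 + [1] * 20
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind change how errors are penalised during training, while the decision threshold is applied afterwards to the predicted probabilities, which may be why the two adjustments behave so differently in the results above.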

### Three-Model Analysis

- **ANN** is a strong contender for balanced, recall-oriented classification, offering consistent performance and a high AUC.
- **XGBoost** is highly flexible, allowing precise control over the trade-off between recall and precision depending on threshold selection.
- **Random Forest**, when combined with class weighting and threshold tuning, emerges as the most balanced and reliable model, particularly when the F1 score is the primary concern.

# Conclusion

In the context of this coursework (developing a model to minimise default risk in a loan dataset), the evaluation reveals that no single model is universally optimal. However, the priority in credit risk modelling is to catch as many defaulters as possible, even at the cost of increased false positives, because the financial cost of undetected defaults far outweighs that of overly cautious lending.


- **XGBoost** with a lowered decision threshold (0.30) achieves the highest recall (0.9256), making it the most effective model for flagging high-risk borrowers and minimising missed defaults. This makes it highly suitable in settings where risk aversion is paramount, such as subprime or unsecured lending.

- **Random Forest**, with class weighting and threshold tuning (0.2), offers the best overall F1 score (0.49), combining strong recall (0.79) with better precision (0.35) than XGBoost at threshold 0.30 (0.26). This makes it an excellent choice for balanced decision-making, where lenders aim to reduce default while maintaining acceptable approval rates.

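- **ANN** (threshold 0.40) performs competitively, with recall of 0.816, an F1 score of 0.488, and an AUC of 0.775 that is marginally below Random Forest. Its precision (0.348) is on par with threshold-tuned Random Forest, but it has the lowest accuracy (0.637), and neural networks are generally regarded as harder to interpret than tree-based models.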
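
One way to make the recall priority stated in this conclusion explicit is the F-beta score, which treats recall as beta times as important as precision. The sketch below applies it to precision and recall pairs copied from the comparison table. The choice of beta = 2 is an assumption for illustration and is not specified in the coursework; `f_beta` is a helper name introduced here.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# model -> (precision, recall), copied from the comparison table.
candidates = {
    "ANN (Thresh 0.40)": (0.348, 0.816),
    "XGB (Thresh 0.30)": (0.2626, 0.9256),
    "XGB (Thresh 0.50)": (0.3463, 0.6815),
    "XGB (Thresh 0.70)": (0.4886, 0.2862),
    "RF (Thresh 0.2)": (0.34, 0.81),
    "RF Weighted + Thresh 0.2": (0.35, 0.79),
}

# beta = 2 is an illustrative assumption (recall weighted above precision).
f2_scores = {name: f_beta(p, r, beta=2.0) for name, (p, r) in candidates.items()}

for name, score in sorted(f2_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F2 = {score:.3f}")
```

The resulting ranking depends on beta, so in practice beta would be set from the relative cost of a missed default versus a wrongly rejected applicant.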

# Recommendation

To meet the coursework objective of **minimising default**, XGBoost (threshold 0.30) is the most suitable primary model. However, to ensure operational balance and reduce unnecessary loan rejections, it may be advisable to pair it with a Random Forest model in a dual-stage system, or use business-defined thresholds to modulate risk appetite over time.

This approach not only optimises for technical accuracy, but also aligns with the practical and regulatory demands of credit risk management.
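
The dual-stage system is not specified further in this recommendation. One possible reading is that XGBoost at threshold 0.30 acts as a high-recall screen, and Random Forest at threshold 0.2 then decides whether a flagged application is rejected or sent for manual review. The sketch below encodes that reading; the function name, the three outcome labels, and the example probabilities are all hypothetical.

```python
def two_stage_decision(p_xgb, p_rf, screen_threshold=0.30, confirm_threshold=0.20):
    """Hypothetical two-stage rule on predicted default probabilities.

    Stage 1: XGBoost screens with a low threshold (high recall).
    Stage 2: Random Forest confirms or defers the flagged cases.
    """
    if p_xgb < screen_threshold:
        return "approve"          # not flagged by the screening model
    if p_rf >= confirm_threshold:
        return "reject"           # both models flag the applicant
    return "manual_review"        # models disagree: defer to a human


# Example applicants as (XGBoost probability, Random Forest probability).
applicants = [(0.10, 0.05), (0.45, 0.35), (0.40, 0.10)]
decisions = [two_stage_decision(p_xgb, p_rf) for p_xgb, p_rf in applicants]
print(decisions)
```

Both thresholds are keyword arguments, so a business-defined risk appetite could be expressed by adjusting them without retraining either model.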

# Model Recommendation Summary

In [57]:
import pandas as pd

model_recommendation_data = {
    "Goal": [
        "Maximizing Recall",
        "Maximizing Precision",
        "Balanced F1 (Fair Trade-off)",
        "Overall AUC Performance"
    ],
    "Best Model": [
        "XGBoost @ Threshold 0.30",
        "XGBoost @ Threshold 0.70",
        "RF Weighted + Threshold",
        "ANN or RF (any tuned)"
    ],
    "Reasoning": [
        "Catching nearly all positives (recall = 0.93)",
        "Precision 0.49 at recall 0.29; untuned RF is more precise (0.58–0.59) but recall is only 0.15–0.17",
        "Best F1 (0.49), strong recall and decent precision",
        "AUC (0.775–0.779), best class discrimination"
    ]
}

# Build the summary table and wrap long cells so the Reasoning column stays readable.
model_recommendation_df = pd.DataFrame(model_recommendation_data)
model_recommendation_df.style.set_properties(**{
    'white-space': 'pre-wrap',
    'word-wrap': 'break-word',
    'max-width': '300px'
})

| | Goal | Best Model | Reasoning |
|---|---|---|---|
| 0 | Maximizing Recall | XGBoost @ Threshold 0.30 | Catching nearly all positives (recall = 0.93) |
| 1 | Maximizing Precision | XGBoost @ Threshold 0.70 | Precision 0.49 at recall 0.29; untuned RF is more precise (0.58–0.59) but recall is only 0.15–0.17 |
| 2 | Balanced F1 (Fair Trade-off) | RF Weighted + Threshold | Best F1 (0.49), strong recall and decent precision |
| 3 | Overall AUC Performance | ANN or RF (any tuned) | AUC (0.775–0.779), best class discrimination |
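
A summary like this can also be derived directly from the metric values rather than typed by hand. The following is a minimal sketch using the figures from the comparison table; plain Python is used so that it runs without the notebook's state, and the variable names are introduced here for illustration.

```python
# (model, precision, recall, f1, roc_auc), copied from the comparison table.
rows = [
    ("ANN (Thresh 0.40)", 0.348, 0.816, 0.488, 0.775),
    ("XGB (Thresh 0.30)", 0.2626, 0.9256, 0.4091, 0.7317),
    ("XGB (Thresh 0.50)", 0.3463, 0.6815, 0.4592, 0.7317),
    ("XGB (Thresh 0.70)", 0.4886, 0.2862, 0.3610, 0.7317),
    ("RF (Baseline)", 0.58, 0.17, 0.26, 0.776),
    ("RF (Class Weighted)", 0.59, 0.15, 0.24, 0.779),
    ("RF (Thresh 0.2)", 0.34, 0.81, 0.48, 0.779),
    ("RF Weighted + Thresh 0.2", 0.35, 0.79, 0.49, 0.779),
]

# Column position of each metric within a row tuple.
metric_index = {"Precision": 1, "Recall": 2, "F1": 3, "ROC AUC": 4}

# For each metric, take the row with the largest value.
# max() returns the first such row when several rows tie.
best_by_metric = {
    name: max(rows, key=lambda row: row[i])[0]
    for name, i in metric_index.items()
}

for name, model in best_by_metric.items():
    print(f"{name}: {model}")
```

A raw maximum on one metric ignores the others, so a practical selection would usually add a constraint such as a minimum acceptable recall before ranking by precision.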
