# Final Model Comparison — Credit Scoring

## Objective
Compare several machine learning models for credit default prediction
and identify the most suitable approach in terms of both
predictive performance and business impact.

## Models Evaluated
- Logistic Regression (Baseline & Explainable)
- Random Forest (Non-linear, Balanced)
- LightGBM (High-performance Risk Ranking)

## Key Evaluation Focus
- ROC AUC
- KS Statistic
- Recall for Default (Class 1)
- Approval Rate
- Business Trade-off
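
The threshold-dependent metrics above can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code. It assumes each model outputs a default probability per applicant and that an applicant is rejected when the score is at or above the threshold; the function names and the toy data are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(y_true, scores):
    """Largest gap between the score CDFs of defaulters (1) and non-defaulters (0)."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    gaps = (
        abs(bisect_right(pos, t) / len(pos) - bisect_right(neg, t) / len(neg))
        for t in set(scores)
    )
    return max(gaps)

def recall_default(y_true, scores, threshold):
    """Share of actual defaulters that are flagged (score >= threshold)."""
    flagged = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    return flagged / sum(y_true)

def approval_rate(scores, threshold):
    """Share of applicants approved (score < threshold)."""
    return sum(1 for s in scores if s < threshold) / len(scores)

# Toy data, illustrative only (not taken from the Home Credit dataset).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

print(round(ks_statistic(y_true, scores), 3))         # 0.857
print(round(recall_default(y_true, scores, 0.5), 3))  # 0.667
print(approval_rate(scores, 0.5))                     # 0.7
```

KS depends only on how the scores rank applicants, whereas recall and approval rate change with the chosen threshold.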


## Experimental Setup

- Dataset: Home Credit Application Data
- Target Variable: Default (1) vs Non-default (0)
- Train-test split: 80:20 (stratified)
- Feature set: Engineered numerical and encoded categorical features
- Evaluation performed on hold-out test set

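Accuracy is not treated as a primary metric: because the classes are
strongly imbalanced, a model that approves every applicant would still
score a high accuracy while detecting no defaults.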
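
A stratified 80:20 split and the accuracy problem can be illustrated with a short standard-library sketch. The helper `stratified_split` and the 8% default rate are assumptions made for the example; the report does not state the actual default rate or the tooling used. In practice, scikit-learn's `train_test_split(..., stratify=y)` is a common way to get the same effect.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each class's share equal in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with an assumed 8% default rate (illustrative only).
labels = [1] * 80 + [0] * 920
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]

print(len(train_idx), len(test_idx))        # 800 200
print(sum(test_labels) / len(test_labels))  # 0.08

# A model that predicts "non-default" for everyone is right on every class-0 row.
accuracy_approve_all = sum(1 for y in test_labels if y == 0) / len(test_labels)
print(accuracy_approve_all)                 # 0.92
```

In this toy setting the approve-everyone model reaches 92% accuracy with zero recall for defaults, which is why recall and ranking metrics carry the evaluation.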

## Model Performance Summary

| Model | ROC AUC | KS Statistic | Recall (Default) | Approval Rate | Key Characteristics |
|------|--------|--------------|------------------|---------------|--------------------|
| Logistic Regression | ~0.69 | ~0.36 | ~0.97 | ~30–40% | Conservative, highly interpretable |
| Random Forest | 0.731 | 0.344 | 0.64 | 67.1% | Balanced risk and growth |
| LightGBM (Default Threshold) | 0.759 | 0.386 | 0.02 | 99.7% | Ranking-focused, overly permissive |
| **LightGBM (Tuned Threshold)** | **0.759** | **0.386** | **0.14** | **~91%** | Growth-oriented strategy |
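
The two LightGBM rows differ only in the decision threshold, not in the model. The sketch below shows one way such a threshold could be chosen: sweep candidate cutoffs, record recall and approval rate for each, and keep the cutoff with the highest recall that still meets a minimum approval rate. It is not the tuning procedure used in this study; the function names, the candidate thresholds, and the toy data are hypothetical, and scores are assumed to be default probabilities with rejection at or above the cutoff.

```python
def threshold_table(y_true, scores, thresholds):
    """For each cutoff, return (threshold, recall_default, approval_rate)."""
    n_default = sum(y_true)
    rows = []
    for t in thresholds:
        rejected = [y for y, s in zip(y_true, scores) if s >= t]
        recall = sum(rejected) / n_default
        approval = 1 - len(rejected) / len(scores)
        rows.append((t, recall, approval))
    return rows

def pick_threshold(rows, min_approval):
    """Highest-recall cutoff whose approval rate is at least min_approval."""
    eligible = [r for r in rows if r[2] >= min_approval]
    return max(eligible, key=lambda r: r[1]) if eligible else None

# Toy data, illustrative only.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

rows = threshold_table(y_true, scores, [0.3, 0.5, 0.7])
for t, recall, approval in rows:
    print(t, round(recall, 3), round(approval, 3))

best = pick_threshold(rows, min_approval=0.6)
print(best[0])  # 0.5
```

Lowering the cutoff raises recall and lowers the approval rate, so the minimum approval rate acts as the business constraint.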

## Visual Model Comparison

This section presents visual comparisons to support
the quantitative evaluation metrics.

### ROC Curve Comparison

The ROC curves below compare the discriminative ability
of each model in separating default and non-default customers.


![ROC Curve Comparison](../assets/roc_comparison.png)
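
For reference, a ROC curve and its area can be computed from scores and labels as in the following sketch. It is a plain-Python illustration on hypothetical toy data, not the code that produced the figure; a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used instead.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the cutoff from the highest score down."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        current = pairs[i][0]
        # Handle tied scores together so they contribute a single segment.
        while i < len(pairs) and pairs[i][0] == current:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Toy data, illustrative only.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

print(round(auc(roc_points(y_true, scores)), 3))  # 0.952
```

The value equals the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score: 20 of 21 pairs in this toy example.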

## Feature Importance Analysis

Across all models, the top contributing features are consistent:
- EXT_SOURCE_1, EXT_SOURCE_2, EXT_SOURCE_3
- Employment duration (EMPLOYED_YEARS)
- Age (AGE_YEARS)
- Credit- and annuity-related ratios

This consistency supports, though it does not by itself prove:
- Data quality
- Model robustness
- Alignment with credit risk domain knowledge
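
One simple way to quantify how consistent the top features are across models is the overlap of their top-k sets. The sketch below is an assumption-laden illustration: the importance values are made-up placeholders, `FEATURE_X` is a hypothetical feature name, and ranking logistic regression features by absolute coefficient presumes the inputs were standardized.

```python
def top_k(importances, k):
    """Names of the k features with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {name for name, _ in ranked[:k]}

def jaccard(a, b):
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)

# Placeholder values for illustration only; not results from this study.
logit_coefficients = {
    "EXT_SOURCE_1": -0.9, "EXT_SOURCE_2": -1.2, "EXT_SOURCE_3": -1.1,
    "EMPLOYED_YEARS": -0.4, "AGE_YEARS": -0.3, "FEATURE_X": 0.05,
}
tree_importances = {
    "EXT_SOURCE_1": 310, "EXT_SOURCE_2": 420, "EXT_SOURCE_3": 400,
    "EMPLOYED_YEARS": 150, "AGE_YEARS": 220, "FEATURE_X": 260,
}

overlap = jaccard(top_k(logit_coefficients, 4), top_k(tree_importances, 4))
print(overlap)  # 0.6
```

A high overlap across model families suggests the signal is in the data rather than in one model's quirks, but it is not a substitute for checks such as leakage review or out-of-time validation.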


## Business Interpretation

1. Logistic Regression:
   - Highest recall for default (~0.97)
   - Lowest approval rate (~30–40%)
   - Suitable for conservative lending strategy and regulatory explainability

2. Random Forest:
   - Balanced trade-off between risk and growth
   - Approves 67.1% of applicants while detecting 64% of defaults
   - Suitable as challenger model

3. LightGBM:
   - Best ranking performance (AUC 0.759, KS 0.386)
   - Requires threshold tuning for decision making: the default threshold approves 99.7% of applicants and detects only 2% of defaults, while the tuned threshold approves ~91% and detects 14%
   - Suitable as primary risk scoring engine


## Final Recommendation

Based on model performance and business trade-offs:

- Use **LightGBM** as the primary **risk scoring model**
  to rank applicants by default risk.
- Use **threshold tuning** to align approval decisions
  with business risk appetite.
- Maintain **Logistic Regression** as:
  - Benchmark model
  - Explainability layer for regulatory and stakeholder communication
- Consider **Random Forest** as a challenger model
  for periodic validation and performance monitoring.


## Future Improvements
- Cost-sensitive optimization using expected loss
- Segment-based thresholding (e.g. income, product type)
- Model monitoring for data drift & performance decay
- Integration with rule-based credit policy
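
As an illustration of the first item, a cost-sensitive cutoff can be chosen by minimizing the total cost of wrong decisions on a labeled validation set. This is a minimal sketch under stated assumptions: the two unit costs are hypothetical constants (in practice they would come from loss given default, exposure, and lost margin), the function names are illustrative, and scores are assumed to be default probabilities with rejection at or above the cutoff.

```python
def total_cost(y_true, scores, threshold, cost_missed_default, cost_rejected_good):
    """Cost of approved defaulters plus cost of rejected non-defaulters."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        if s < threshold and y == 1:
            cost += cost_missed_default   # approved, then defaulted
        elif s >= threshold and y == 0:
            cost += cost_rejected_good    # rejected a good customer
    return cost

def best_threshold(y_true, scores, thresholds, cost_missed_default, cost_rejected_good):
    """Candidate cutoff with the lowest total cost."""
    return min(
        thresholds,
        key=lambda t: total_cost(y_true, scores, t, cost_missed_default, cost_rejected_good),
    )

# Toy data and costs, illustrative only.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]
candidates = [0.3, 0.5, 0.7]

print(best_threshold(y_true, scores, candidates, 5, 1))  # 0.3
print(best_threshold(y_true, scores, candidates, 1, 5))  # 0.7
```

When a missed default costs more than a rejected good customer, the selected cutoff moves down (more conservative); when the costs are reversed, it moves up (more permissive). If the scores are well calibrated probabilities, the same logic gives the closed-form cutoff `cost_rejected_good / (cost_rejected_good + cost_missed_default)`.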


## Conclusion

This study demonstrates an end-to-end credit scoring workflow,
from EDA and feature engineering to multi-model evaluation
and business-aligned decision making.

The results show that ranking performance alone is insufficient
without proper threshold selection and business context: LightGBM has
the best AUC and KS, yet at the default threshold it detects only 2% of defaults.
