# Credit Risk Modelling – Business Validation & Risk Segmentation

## Objective
This notebook translates the Probability of Default (PD) model into actionable
business insights by:
- Defining PD cut-offs
- Creating risk bands
- Estimating portfolio risk distribution
- Providing credit policy recommendations

This step aligns the model with real-world lending decision frameworks.

In [1]:
import pandas as pd
import numpy as np

In [10]:
# Load the processed data and refit the PD model to obtain test-set PD scores
df = pd.read_csv("../data/processed/credit_model_ready.csv")

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = df.drop(columns=["Risk"])
y = df["Risk"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# PD scores for test set
y_test_pd = model.predict_proba(X_test)[:, 1]

## PD Cut-off Definition

In practice, banks rarely rely on a single fixed 0.5 threshold.
Instead, borrowers are segmented into **risk bands** based on PD ranges,
aligned with the institution’s risk appetite. The cut-offs used below are
0.20, 0.40 and 0.60.

In [5]:
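def assign_risk_band(pd_score):
    """Map a PD score to a risk band using cut-offs at 0.20, 0.40 and 0.60."""
    # The parameter is named pd_score so it does not shadow the pandas alias `pd`.
    if pd_score < 0.20:
        return "Low Risk"
    elif pd_score < 0.40:
        return "Medium Risk"
    elif pd_score < 0.60:
        return "High Risk"
    else:
        return "Very High Risk"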

risk_df = pd.DataFrame({
    "PD": y_test_pd,
    "Actual_Default": y_test.values
})

risk_df["Risk_Band"] = risk_df["PD"].apply(assign_risk_band)
risk_df.head()

|   | PD       | Actual_Default | Risk_Band   |
|---|----------|----------------|-------------|
| 0 | 0.103494 | 0              | Low Risk    |
| 1 | 0.2712   | 0              | Medium Risk |
| 2 | 0.376441 | 0              | Medium Risk |
| 3 | 0.195655 | 0              | Low Risk    |
| 4 | 0.363581 | 0              | Medium Risk |
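The same banding can be done in one vectorized step. The following is a minimal sketch using `pd.cut`, assuming the same cut-offs as `assign_risk_band`; the names `BAND_EDGES`, `BAND_LABELS`, `assign_risk_bands` and the example scores are hypothetical and not part of the notebook. With `right=False` each interval is closed on the left, which matches the `<` comparisons above, and the result is an ordered categorical, so later group-bys list the bands in risk order rather than alphabetically.

```python
import numpy as np
import pandas as pd

# Edges mirror the thresholds in assign_risk_band.
# np.inf as the last edge keeps a PD of exactly 1.0 in the top band.
BAND_EDGES = [0.0, 0.20, 0.40, 0.60, np.inf]
BAND_LABELS = ["Low Risk", "Medium Risk", "High Risk", "Very High Risk"]


def assign_risk_bands(pd_scores: pd.Series) -> pd.Series:
    """Vectorized banding; right=False makes each interval [lower, upper)."""
    return pd.cut(pd_scores, bins=BAND_EDGES, labels=BAND_LABELS, right=False)


# Illustrative scores only (hypothetical values, not model output).
example_scores = pd.Series([0.05, 0.20, 0.39, 0.40, 0.60, 1.00])
example_bands = assign_risk_bands(example_scores)
print(example_bands.tolist())
```

In the notebook this would be applied as `risk_df["Risk_Band"] = assign_risk_bands(risk_df["PD"])`.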


In [6]:
risk_df.groupby("Risk_Band")["Actual_Default"].mean()

Risk_Band
High Risk         0.410959
Low Risk          0.107692
Medium Risk       0.338235
Very High Risk    0.793103
Name: Actual_Default, dtype: float64

In [7]:
risk_df.groupby("Risk_Band")["Actual_Default"].agg(["count", "mean"])

| Risk_Band      | count | mean     |
|----------------|-------|----------|
| High Risk      | 73    | 0.410959 |
| Low Risk       | 130   | 0.107692 |
| Medium Risk    | 68    | 0.338235 |
| Very High Risk | 29    | 0.793103 |


## Portfolio-Level Validation of Risk Bands

The default rates increase monotonically across PD-based risk bands:

- **Low Risk:** ~11% default rate
- **Medium Risk:** ~34% default rate
- **High Risk:** ~41% default rate
- **Very High Risk:** ~79% default rate

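This ordering indicates that the PD model ranks risk meaningfully, although the
gap between the Medium and High bands (~34% vs ~41%) is narrow given that each
holds only about 70 borrowers. The High and Very High bands together hold about
34% of test-set borrowers but about 59% of the defaults (53 of 90), a
concentration that is consistent with real-world retail credit portfolios.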
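The rank-ordering check can also be made explicit in code. The sketch below rebuilds the per-band summary from the counts and default rates reported in the count/mean table (defaults are recovered as count × mean) and tests for strictly increasing default rates; the names `band_summary` and `is_monotonic` are illustrative, not from the notebook.

```python
import pandas as pd

# Counts per band and number of defaults (count * mean from the table above),
# listed in risk order rather than alphabetical order.
band_summary = pd.DataFrame(
    {"count": [130, 68, 73, 29], "defaults": [14, 23, 30, 23]},
    index=["Low Risk", "Medium Risk", "High Risk", "Very High Risk"],
)
band_summary["default_rate"] = band_summary["defaults"] / band_summary["count"]
band_summary["share_of_borrowers"] = (
    band_summary["count"] / band_summary["count"].sum()
)
band_summary["share_of_defaults"] = (
    band_summary["defaults"] / band_summary["defaults"].sum()
)

# Strict monotonicity: rates never decrease and no two bands tie.
rates = band_summary["default_rate"]
is_monotonic = bool(rates.is_monotonic_increasing and rates.is_unique)

print(band_summary.round(3))
print("Default rates strictly increasing:", is_monotonic)
```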

Such monotonic behavior is a key requirement for using PD models in:
- Credit approval strategies
- Risk-based pricing
- Expected Loss estimation
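To illustrate the last use case, Expected Loss is commonly computed as EL = PD × LGD × EAD. The sketch below is only an illustration of that formula: this notebook's data contains neither Loss Given Default (LGD) nor Exposure at Default (EAD), so `ASSUMED_LGD`, the `EAD` column and the PD values are all hypothetical.

```python
import pandas as pd

# Hypothetical inputs: LGD and EAD are NOT part of this notebook's data.
ASSUMED_LGD = 0.45  # assumed flat loss given default

loans = pd.DataFrame(
    {
        "PD": [0.10, 0.30, 0.70],             # illustrative PD scores
        "EAD": [10_000.0, 5_000.0, 2_000.0],  # illustrative exposures at default
    }
)

# Expected Loss per loan: EL = PD * LGD * EAD
loans["Expected_Loss"] = loans["PD"] * ASSUMED_LGD * loans["EAD"]
portfolio_el = loans["Expected_Loss"].sum()

print(loans)
print("Portfolio expected loss:", round(portfolio_el, 2))
```

Because EL scales linearly with PD, a model that misorders borrowers across bands would misallocate expected loss in the same way, which is why the monotonicity check matters here.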

## Portfolio Risk Distribution

This section shows how borrowers are distributed across risk bands.

In [8]:
risk_df["Risk_Band"].value_counts(normalize=True).round(3)

Risk_Band
Low Risk          0.433
High Risk         0.243
Medium Risk       0.227
Very High Risk    0.097
Name: proportion, dtype: float64

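- About 66% of test-set borrowers fall into the **Low and Medium Risk** bands.
- The High and Very High Risk bands hold the remaining 34%, a smaller but critical segment.
- Having the largest share in the lowest-risk band is what one would expect of a retail credit portfolio, although the overall default rate in this test set (30%) is high.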

## Default Rate by Risk Band

A well-performing PD model should show **monotonically increasing default rates**
across higher risk bands.

In [9]:
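band_order = ["Low Risk", "Medium Risk", "High Risk", "Very High Risk"]

default_rate_by_band = (
    risk_df
    .groupby("Risk_Band")["Actual_Default"]
    .mean()
    .reindex(band_order)  # sort_index() would order the bands alphabetically, not by risk
)

default_rate_by_band

Risk_Band
Low Risk          0.107692
Medium Risk       0.338235
High Risk         0.410959
Very High Risk    0.793103
Name: Actual_Default, dtype: float64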

Observed default rates increase consistently from Low Risk to Very High Risk bands.
This confirms that the PD model provides meaningful risk ranking and is suitable
for credit decision-making.

## Suggested Credit Policy Actions by Risk Band

| Risk Band        | Suggested Action |
|------------------|------------------|
| Low Risk         | Approve loan at standard interest rate |
| Medium Risk      | Approve with moderate risk premium |
| High Risk        | Approve selectively with higher pricing or collateral |
| Very High Risk   | Reject or require strong guarantees |

These actions align with common retail banking credit policies.
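The table above can be encoded as a simple lookup so that each scored applicant receives a suggested action. This is a minimal sketch: the `POLICY_ACTIONS` dictionary copies the table, `assign_risk_band` repeats the cut-offs defined earlier, and the `applicants` frame holds hypothetical PD values for illustration only.

```python
import pandas as pd


def assign_risk_band(pd_score: float) -> str:
    """Same cut-offs as the banding function defined earlier in the notebook."""
    if pd_score < 0.20:
        return "Low Risk"
    elif pd_score < 0.40:
        return "Medium Risk"
    elif pd_score < 0.60:
        return "High Risk"
    return "Very High Risk"


# Suggested actions, copied from the policy table.
POLICY_ACTIONS = {
    "Low Risk": "Approve loan at standard interest rate",
    "Medium Risk": "Approve with moderate risk premium",
    "High Risk": "Approve selectively with higher pricing or collateral",
    "Very High Risk": "Reject or require strong guarantees",
}

# Hypothetical applicants, one per band.
applicants = pd.DataFrame({"PD": [0.08, 0.27, 0.55, 0.81]})
applicants["Risk_Band"] = applicants["PD"].apply(assign_risk_band)
applicants["Suggested_Action"] = applicants["Risk_Band"].map(POLICY_ACTIONS)

print(applicants)
```

Keeping the actions in a dictionary separates policy from scoring, so the actions can be revised without touching the banding logic.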

## Business Recommendations

- The PD model demonstrates acceptable discriminatory power (AUC ≈ 0.76, equivalent to a Gini of about 0.52) and stable performance.
- Risk segmentation enables differentiated pricing and approval strategies.
- Behavioral variables such as checking and savings account balances are key risk drivers.
- The model can support:
  - Loan approval decisions
  - Risk-based pricing
  - Portfolio monitoring and early warning systems
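For portfolio monitoring, the discrimination metric quoted above can be recomputed whenever new outcomes are observed. The sketch below uses scikit-learn's `roc_auc_score` on a tiny hypothetical sample (the labels and scores are made up for illustration) and derives the Gini coefficient as 2 × AUC − 1.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical outcomes (1 = default) and PD scores, for illustration only.
y_true = [0, 0, 1, 1]
pd_scores = [0.10, 0.40, 0.35, 0.80]

# AUC: probability that a randomly chosen defaulter scores higher
# than a randomly chosen non-defaulter.
auc = roc_auc_score(y_true, pd_scores)
gini = 2 * auc - 1

print(f"AUC = {auc:.2f}, Gini = {gini:.2f}")
```

In this notebook the equivalent call would be `roc_auc_score(y_test, y_test_pd)`, using the test labels and PD scores computed earlier.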

## Final Project Conclusion

An end-to-end Credit Risk PD model was successfully developed using Logistic Regression.
The project covered:
- Data understanding and EDA
- Feature engineering aligned with credit risk principles
- PD model development and evaluation
- Business-oriented risk segmentation and recommendations

The resulting framework is interpretable, realistic, and aligned with industry
best practices, making it suitable for real-world retail credit risk assessment.