# Credit Risk Modelling – Expected Loss (EL)

## Objective
This notebook extends the Probability of Default (PD) model by estimating
**Expected Loss (EL)** for each borrower.

Expected Loss is a core credit risk metric used by banks for:
- Loan pricing
- Capital allocation
- Portfolio risk monitoring
- Regulatory reporting (Basel framework)

This notebook integrates PD with Loss Given Default (LGD) and
Exposure at Default (EAD) to build a complete credit risk framework.

## Expected Loss Framework

Expected Loss is defined as:

$$
\text{Expected Loss (EL)} = \text{PD} \times \text{LGD} \times \text{EAD}
$$

Where:
- **PD (Probability of Default):** Likelihood that a borrower will default
- **LGD (Loss Given Default):** Percentage of exposure lost if default occurs
- **EAD (Exposure at Default):** Outstanding exposure at the time of default
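
As a quick check of the formula, the sketch below computes EL for a single hypothetical borrower. The function name `expected_loss` and the input values are illustrative assumptions and do not come from the notebook's data.

```python
def expected_loss(pd_value: float, lgd: float, ead: float) -> float:
    """Return PD x LGD x EAD for a single exposure."""
    if not 0.0 <= pd_value <= 1.0:
        raise ValueError("PD must lie in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_value * lgd * ead

# Hypothetical borrower: 10% PD, 45% LGD, exposure of 1,000 currency units
example_el = expected_loss(0.10, 0.45, 1000.0)
```

For these inputs the product works out to 45 currency units: the borrower is expected to default one time in ten, and each default loses 45% of the 1,000 exposure.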

In [1]:
import pandas as pd
import numpy as np

## Probability of Default (PD)

PD values come from the Logistic Regression model developed in the PD notebook.
The model is refit below on the processed dataset, and PD scores are produced for
the 30% held-out test set. These borrower-level probabilities are the primary
input for the Expected Loss calculation.

In [2]:
# Loading processed dataset
df = pd.read_csv("../data/processed/credit_model_ready.csv")

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = df.drop(columns=["Risk"])
y = df["Risk"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

pd_scores = model.predict_proba(X_test)[:, 1]
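
Selecting column `1` of `predict_proba` assumes that the second entry of `model.classes_` is the default class. A minimal sketch of a more explicit selection follows, using synthetic data and a hypothetical labelling (`1` = default); the encoding of `Risk` in the processed dataset is not shown here, so it may be worth checking before relying on `[:, 1]`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic features and labels, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
# Hypothetical labelling: 1 = default, 0 = non-default
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0.5).astype(int)

demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Look up the probability column that belongs to the default label
default_label = 1
default_col = list(demo_model.classes_).index(default_label)
pd_demo = demo_model.predict_proba(X_demo)[:, default_col]
```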

## Exposure at Default (EAD)

For retail credit products, Exposure at Default is approximated here by the
original **loan (credit) amount**. This treats the full amount as outstanding at
the time of default and ignores any repayments made beforehand, so it likely
overstates exposure for amortizing loans.

Raw (non-standardized) credit amount values are used to preserve the
monetary interpretation of exposure.

In [3]:
# Loading raw data for EAD
raw_df = pd.read_csv("../data/raw/german_credit_data.csv")
raw_df.drop(columns=["Unnamed: 0"], inplace=True)

# Align raw credit amount with test set
raw_test = raw_df.loc[X_test.index].copy()

ead = raw_test["Credit amount"]
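
The alignment step only works if the processed file kept the same row order and index as the raw file, which is an assumption about the earlier preprocessing. The sketch below shows, on small hypothetical tables, how that assumption could be checked before the values are combined positionally.

```python
import pandas as pd

# Hypothetical stand-ins for the raw and processed tables (same rows, same index)
raw_demo = pd.DataFrame({"Credit amount": [1000, 2500, 400, 7000, 1200]})
processed_demo = pd.DataFrame({"feature": [0.1, -0.3, 1.2, 0.8, -1.1]})

# Pretend rows 3, 0 and 4 ended up in the test split
test_demo = processed_demo.loc[[3, 0, 4]]

# Every test label must exist in the raw table, otherwise .loc raises a KeyError
missing = test_demo.index.difference(raw_demo.index)
assert missing.empty, f"labels missing from raw data: {list(missing)}"

# Label-based selection returns rows in the order of the test index
ead_demo = raw_demo.loc[test_demo.index, "Credit amount"]

# Matching indexes mean positional use via .values lines up with the PD scores
assert ead_demo.index.equals(test_demo.index)
```

If rows were dropped or reordered during preprocessing, the labels would no longer identify the same borrowers, and a join on an explicit borrower key would be safer.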

## Loss Given Default (LGD)

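In the absence of borrower-level recovery data, LGD is assigned as a single
portfolio-wide assumption.

- Typical unsecured retail loan LGD ranges between **45% and 60%**
- For this project, a **fixed LGD of 45%** is assumed, the lower bound of that range

Because 45% sits at the optimistic end of the range, the resulting Expected Loss
figures are a lower-end estimate rather than a conservative one. Fixed-LGD
assumptions of this kind are widely used in academic studies and
simulation-based credit risk modeling.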

In [4]:
lgd = 0.45
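
Because LGD enters the formula as a constant multiplier, Expected Loss scales in direct proportion to it. The sketch below illustrates this on hypothetical PD and EAD values; `portfolio_el` is an illustrative helper, not part of the notebook.

```python
import numpy as np

# Hypothetical PD and EAD values for three borrowers
pd_demo = np.array([0.10, 0.25, 0.60])
ead_demo = np.array([3000.0, 1500.0, 5000.0])

def portfolio_el(lgd: float) -> float:
    """Total Expected Loss for the demo borrowers at a given fixed LGD."""
    return float(np.sum(pd_demo * lgd * ead_demo))

# Evaluate the assumed 45% alongside the middle and top of the quoted range
el_by_lgd = {lgd: portfolio_el(lgd) for lgd in (0.45, 0.50, 0.60)}

# Ratio of total EL at 60% LGD to total EL at 45% LGD
ratio = el_by_lgd[0.60] / el_by_lgd[0.45]
```

The ratio equals 0.60 / 0.45, so moving from the bottom to the top of the quoted LGD range raises every Expected Loss figure by one third.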

## Expected Loss Calculation

Expected Loss is calculated at the borrower level by combining PD, LGD,
and EAD values.

In [5]:
el_df = pd.DataFrame({
    "PD": pd_scores,
    "LGD": lgd,
    "EAD": ead.values
})

el_df["Expected_Loss"] = el_df["PD"] * el_df["LGD"] * el_df["EAD"]

el_df.head()

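|   | PD | LGD | EAD | Expected_Loss |
|---|----------|------|------|------------|
| 0 | 0.103494 | 0.45 | 3578 | 166.635799 |
| 1 | 0.2712 | 0.45 | 882 | 107.639368 |
| 2 | 0.376441 | 0.45 | 4473 | 757.71847 |
| 3 | 0.195655 | 0.45 | 2831 | 249.254814 |
| 4 | 0.363581 | 0.45 | 1289 | 210.89539 |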


## Portfolio-Level Expected Loss

Portfolio-level Expected Loss provides an estimate of total and average
credit loss exposure across the loan portfolio.

In [6]:
total_el = el_df["Expected_Loss"].sum()
avg_el = el_df["Expected_Loss"].mean()

total_el, avg_el

(161591.59352285252, 538.6386450761751)

- **Total Expected Loss** (about 161,592, in the units of `Credit amount`) is the
  estimated credit loss across the 300 test-set borrowers, not the full dataset.
- **Average Expected Loss per borrower** (about 539) supports pricing and risk segmentation.

Borrowers with higher Expected Loss values contribute disproportionately
to overall portfolio risk.
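
One way to quantify that concentration is the share of total Expected Loss held by the top 10% of borrowers. The sketch below uses hypothetical loss values; the 10% cut-off and the variable names are illustrative choices.

```python
import pandas as pd

# Hypothetical borrower-level Expected Loss values
el_demo = pd.Series([20.0, 35.0, 50.0, 60.0, 80.0, 120.0, 150.0, 300.0, 700.0, 1500.0])

# Number of borrowers in the top decile (at least one)
top_n = max(1, int(len(el_demo) * 0.10))

# Share of total Expected Loss contributed by those borrowers
top_share = el_demo.nlargest(top_n).sum() / el_demo.sum()
```

The same two lines could be applied to `el_df["Expected_Loss"]` to measure concentration in the actual test set.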

## Expected Loss by Risk Band

Borrowers are segmented into PD-based risk bands to analyze how Expected Loss
is distributed across different risk levels.

In [7]:
def assign_risk_band(pd_value):
    # The parameter is named pd_value so it does not shadow the pandas alias `pd`
    if pd_value < 0.20:
        return "Low Risk"
    elif pd_value < 0.40:
        return "Medium Risk"
    elif pd_value < 0.60:
        return "High Risk"
    else:
        return "Very High Risk"

el_df["Risk_Band"] = el_df["PD"].apply(assign_risk_band)
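
As an alternative sketch, the same cut-offs can be applied in one vectorized call with `pandas.cut`. This is a swapped-in technique, not the notebook's method. With `right=False` each bin includes its lower edge, which matches the `<` comparisons above, and the resulting ordered categorical makes a later `groupby` list the bands from lowest to highest risk instead of alphabetically.

```python
import numpy as np
import pandas as pd

band_labels = ["Low Risk", "Medium Risk", "High Risk", "Very High Risk"]
# Open-ended top bin so that a PD of exactly 1.0 is still assigned a band
band_edges = [0.0, 0.20, 0.40, 0.60, np.inf]

# Hypothetical PD values, including the boundary cases 0.20, 0.40 and 0.60
pd_demo = pd.Series([0.05, 0.20, 0.39, 0.40, 0.60, 0.95])

# right=False gives the intervals [0, 0.2), [0.2, 0.4), [0.4, 0.6), [0.6, inf)
bands_demo = pd.cut(pd_demo, bins=band_edges, labels=band_labels, right=False)
```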

In [8]:
el_by_band = el_df.groupby("Risk_Band")["Expected_Loss"].agg(
    ["count", "mean", "sum"]
)

el_by_band

| Risk_Band | count | mean | sum |
|----------------|-----|-------------|--------------|
| High Risk | 73 | 830.745449 | 60644.417765 |
| Low Risk | 130 | 151.420468 | 19684.660896 |
| Medium Risk | 68 | 437.876231 | 29775.583726 |
| Very High Risk | 29 | 1775.411418 | 51486.931136 |
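
To compare each band's share of borrowers with its share of total Expected Loss, the counts and sums reported above can be converted to proportions. The sketch below copies those figures into a small frame; the column names `borrower_share` and `el_share` are illustrative.

```python
import pandas as pd

# Counts and sums copied from the risk-band output above
band_summary = pd.DataFrame(
    {
        "count": [130, 68, 73, 29],
        "sum": [19684.660896, 29775.583726, 60644.417765, 51486.931136],
    },
    index=["Low Risk", "Medium Risk", "High Risk", "Very High Risk"],
)

# Proportion of borrowers and of total Expected Loss in each band
band_summary["borrower_share"] = band_summary["count"] / band_summary["count"].sum()
band_summary["el_share"] = band_summary["sum"] / band_summary["sum"].sum()
```

A band whose `el_share` exceeds its `borrower_share` carries more than its proportional share of loss.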


## Interpretation of Expected Loss Results

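- Mean Expected Loss rises from about 151 in the Low Risk band to about 1,775
  in the Very High Risk band.
- The Very High Risk band holds 29 of 300 borrowers (about 10%) but accounts
  for about 32% of total Expected Loss.
- This pattern is consistent with PD-based segmentation separating low-loss from
  high-loss borrowers, although part of the increase is mechanical because PD is
  a direct input to EL.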

Expected Loss outputs can directly support:
- Risk-based pricing
- Credit approval thresholds
- Portfolio monitoring and risk mitigation

## Business Applications of Expected Loss

- **Loan Pricing:** Interest rates can be adjusted to cover expected credit losses.
- **Capital Allocation:** Portfolios with higher EL require more economic capital.
- **Early Warning Systems:** Rising EL can flag borrowers for closer monitoring.
- **Portfolio Optimization:** Risk mitigation efforts can focus on high-EL segments.
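
For the loan pricing point, a first-order view is that the expected loss per unit of exposure, EL / EAD = PD × LGD, is the minimum margin needed to cover expected credit losses over the PD horizon. The sketch below is a simplified illustration with hypothetical PD values; it ignores funding costs, operating costs, capital charges, and the timing of cash flows.

```python
def expected_loss_rate(pd_value: float, lgd: float) -> float:
    """Expected loss per unit of exposure, i.e. EL / EAD = PD x LGD."""
    return pd_value * lgd

# Hypothetical borrowers at two PD levels, both at the notebook's 45% LGD
low_risk_rate = expected_loss_rate(0.10, 0.45)
high_risk_rate = expected_loss_rate(0.60, 0.45)

# Margin gap between the two borrowers, as a fraction of exposure
rate_gap = high_risk_rate - low_risk_rate
```

Under these assumptions the higher-PD borrower would need a margin several times larger than the lower-PD borrower just to break even on expected losses.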

## Final Credit Risk Framework Summary

This project successfully implemented a complete credit risk framework, including:
- Probability of Default (PD) modeling using Logistic Regression
- Risk segmentation using PD-based cut-offs
- Expected Loss (EL) estimation using PD × LGD × EAD

The resulting framework is interpretable, scalable, and aligned with
real-world retail banking credit risk practices.