
# PD, LGD, EAD - CECL models

This notebook illustrates three key concepts: Probability of Default (PD), Loss Given Default (LGD), and Exposure at Default (EAD). These metrics are essential for calculating Expected Credit Loss (ECL) and are integral to the Current Expected Credit Loss (CECL) framework.

**Probability of Default (PD)** quantifies the likelihood that a borrower will default on a loan within a specified time frame. It is typically estimated using historical data and predictive modeling techniques, such as logistic regression. In this analysis, we generate synthetic loan data and apply logistic regression to predict PD based on borrower characteristics.

**Loss Given Default (LGD)** measures the percentage of the total exposure that is lost when a borrower defaults. It is calculated as one minus the recovery rate, which represents the proportion of the loan amount that can be recovered after default. Understanding LGD is vital for assessing potential losses and managing risk effectively.

**Exposure at Default (EAD)** represents the total value that a lender is exposed to at the time of default, including the outstanding loan amount and any additional amounts that may be drawn before default. EAD is crucial for calculating expected losses and determining capital requirements.

This notebook explains the calculation of these metrics using synthetic data, demonstrating the methodology and providing insights into their significance in credit risk assessment.



## PD, LGD, and EAD Analysis Using Python

The code below illustrates Probability of Default (PD), Loss Given Default (LGD), and Exposure at Default (EAD) within the Current Expected Credit Loss (CECL) framework, using a logistic regression model to estimate PD. I also provide a simple expected-loss formula and the corresponding code to show how the three metrics combine in practice.

- **Probability of Default (PD)**:
   - PD represents the likelihood that a loan will default. In the context of a logistic regression model, PD is the predicted probability that the dependent variable (default) equals 1 (i.e., the loan defaults).
   - In this analysis, we use logistic regression as the PD component of a CECL model. The target is binary ($ y = 1 $ for default, $ y = 0 $ otherwise), and the model's predicted probability $ y_{\text{proba}} = P(y = 1 \mid x) $ is taken as the PD.

- **Loss Given Default (LGD)**:
   - LGD measures the proportion of the exposure that is lost when a default occurs. It is typically calculated as $ 1 - \text{Recovery Rate} $, where the recovery rate is the percentage of the loan amount that can be recovered after default.

- **Exposure at Default (EAD)**:
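   - EAD represents the total amount at risk if the loan defaults: the outstanding balance at the time of default plus, for facilities with undrawn limits, any additional amount expected to be drawn before default. In the code below, EAD is simply set to the synthetic exposure amount.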
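
To make the LGD and EAD definitions concrete, here is a minimal sketch. The helper names and the credit conversion factor `ccf` (the assumed share of the undrawn commitment that gets drawn before default) are illustrative assumptions and do not appear elsewhere in this notebook.

```python
def lgd_from_recovery(recovery_rate: float) -> float:
    """LGD as one minus the recovery rate (both expressed as fractions)."""
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must be between 0 and 1")
    return 1.0 - recovery_rate


def ead_with_undrawn(drawn_balance: float, undrawn_commitment: float, ccf: float) -> float:
    """EAD as the drawn balance plus the assumed share (ccf) of the undrawn
    commitment that is drawn before default. ccf is a hypothetical input."""
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be between 0 and 1")
    return drawn_balance + ccf * undrawn_commitment


# A loan with a 60% recovery rate loses 40% of its exposure on default.
print(f"LGD: {lgd_from_recovery(0.60):.2f}")

# A credit line with 80,000 drawn, 20,000 undrawn, and an assumed ccf of 0.5.
print(f"EAD: {ead_with_undrawn(80_000, 20_000, 0.5):,.0f}")
```

For a fully drawn term loan the undrawn commitment is zero, so EAD reduces to the outstanding balance.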

### Expected Loss (EL) Formula

The expected loss (EL) for a loan can be calculated using the following formula:

$$\text{EL} = \text{PD} \times \text{LGD} \times \text{EAD}$$
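
As a quick worked example of the formula, the sketch below computes EL for two loans and sums them into a portfolio figure. The loan values are made up for illustration and are not taken from the synthetic dataset used later.

```python
def expected_loss(pd_: float, lgd: float, ead: float) -> float:
    """EL = PD x LGD x EAD. The argument is named pd_ to avoid clashing
    with the usual pandas alias `pd`."""
    return pd_ * lgd * ead


# Hypothetical loans as (PD, LGD, EAD) tuples.
loans = [
    (0.05, 0.40, 100_000),
    (0.02, 0.60, 250_000),
]

per_loan_el = [expected_loss(*loan) for loan in loans]
portfolio_el = sum(per_loan_el)

for (pd_, lgd, ead), el in zip(loans, per_loan_el):
    print(f"PD={pd_:.2f}  LGD={lgd:.2f}  EAD={ead:,.0f}  EL={el:,.2f}")
print(f"Portfolio EL: {portfolio_el:,.2f}")
```

The first loan contributes 0.05 × 0.40 × 100,000 = 2,000 and the second 0.02 × 0.60 × 250,000 = 3,000, for a portfolio expected loss of 5,000.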

### Implementation

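In this analysis, I create a synthetic dataset with two placeholder features and a default flag, then fit a logistic regression model to estimate the Probability of Default (PD) from those features. Because the default flag is drawn independently of the features, the fitted PDs stay close to the 5% base rate; the example shows the mechanics of the calculation rather than a genuinely predictive model.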
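
For reference, the PD that logistic regression produces is the logistic (sigmoid) function applied to a linear score of the features. The sketch below shows that mapping with hypothetical coefficients; they are assumptions chosen for illustration, not values fitted in this notebook.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def pd_from_features(features, coefficients, intercept):
    """PD = sigmoid(intercept + sum of coefficient * feature)."""
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(score)


# Hypothetical parameters (assumptions, not fitted values).
intercept = math.log(0.05 / 0.95)  # log-odds of a 5% base default rate
coefficients = [0.8, -0.3]

# With both features at zero, the PD equals the base rate implied by the intercept.
print(f"{pd_from_features([0.0, 0.0], coefficients, intercept):.4f}")  # 0.0500

# A positive coefficient raises the PD as its feature increases.
print(f"{pd_from_features([1.0, 0.0], coefficients, intercept):.4f}")
```

In scikit-learn, `predict_proba(X)[:, 1]` returns this probability for the positive class using the fitted intercept and coefficients.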


```python
# PD/LGD/EAD analysis on synthetic loan data
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate synthetic data
num_loans = 1000
np.random.seed(42)

# Create a DataFrame with synthetic loan data.
# Note: 'Default' is drawn independently of Feature1 and Feature2, so the
# model below has no real signal to learn and its PDs stay near the base rate.
data = {
    'Loan_ID': range(1, num_loans + 1),
    'Default': np.random.binomial(1, 0.05, num_loans),  # 5% default rate
    'Recovery_Rate': np.random.uniform(0.2, 0.8, num_loans),  # Recovery rate between 20% and 80%
    'Exposure_Amount': np.random.uniform(50000, 500000, num_loans),  # Exposure amount between $50,000 and $500,000
    'Feature1': np.random.normal(0, 1, num_loans),  # Example feature 1
    'Feature2': np.random.normal(0, 1, num_loans)   # Example feature 2
}

df = pd.DataFrame(data)

# Features and target variable
X = df[['Feature1', 'Feature2']]
y = df['Default']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict the probability of the positive class (default = 1)
y_prob = model.predict_proba(X_test)[:, 1]

# Calculate PD, LGD, and EAD.
# Select test rows by index label with .loc; .iloc is positional and only
# happens to agree here because df has a default RangeIndex.
df_test = df.loc[X_test.index].copy()  # Use .copy() to avoid SettingWithCopyWarning
df_test['PD'] = y_prob  # Probability of Default
df_test['LGD'] = 1 - df_test['Recovery_Rate']  # Loss Given Default
df_test['EAD'] = df_test['Exposure_Amount']  # Exposure at Default

# Calculate Expected Loss
df_test['Expected_Loss'] = df_test['PD'] * df_test['LGD'] * df_test['EAD']

# Display the results
print("\nExpected Loss Results:")
print(df_test[['Loan_ID', 'PD', 'LGD', 'EAD', 'Expected_Loss']].head())

# Intuition Behind PD, LGD, and EAD Metrics
print("\nIntuition Behind PD, LGD, and EAD Metrics:")
print("1. Probability of Default (PD):")
print("   - PD represents the likelihood that a borrower will default on a loan within a specified time frame.")
print("   - It is calculated using historical data and predictive modeling techniques, such as logistic regression.")
print("   - A higher PD indicates a greater risk of default, which can influence lending decisions and pricing.")

print("\n2. Loss Given Default (LGD):")
print("   - LGD measures the percentage of the total exposure that is lost when a borrower defaults.")
print("   - It is calculated as 1 minus the recovery rate, which is the proportion of the loan amount that can be recovered after default.")
print("   - Understanding LGD helps lenders assess potential losses and manage risk more effectively.")

print("\n3. Exposure at Default (EAD):")
print("   - EAD represents the total value that a lender is exposed to at the time of default.")
print("   - It includes the outstanding loan amount and any additional amounts that may be drawn before default.")
print("   - EAD is crucial for calculating expected losses and capital requirements.")

# Optional: Display summary statistics for better insights
print("\nSummary Statistics of the Test Set:")
print(df_test[['PD', 'LGD', 'EAD', 'Expected_Loss']].describe())
```



```text
Expected Loss Results:
     Loan_ID        PD       LGD            EAD  Expected_Loss
521      522  0.043567  0.239338  110495.808234    1152.161012
737      738  0.040899  0.683226  295017.049760    8243.694003
740      741  0.058106  0.303477  214556.336718    3783.473318
660      661  0.046203  0.439643  262526.811873    5332.642898
411      412  0.049351  0.732848  113248.835410    4095.836856

Intuition Behind PD, LGD, and EAD Metrics:
1. Probability of Default (PD):
   - PD represents the likelihood that a borrower will default on a loan within a specified time frame.
   - It is calculated using historical data and predictive modeling techniques, such as logistic regression.
   - A higher PD indicates a greater risk of default, which can influence lending decisions and pricing.

2. Loss Given Default (LGD):
   - LGD measures the percentage of the total exposure that is lost when a borrower defaults.
   - It is calculated as 1 minus the recovery rate, which is the proportion of t
```
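
The PDs in the results above sit close to the 5% base rate, which raises the question of how well they separate defaulters from non-defaulters. One common check is the area under the ROC curve (AUC): the probability that a randomly chosen defaulted loan gets a higher PD than a randomly chosen non-defaulted one. The sketch below is a plain-Python version written for illustration, with made-up labels and scores; scikit-learn's `roc_auc_score` serves the same purpose on real model output.

```python
def auc(labels, scores):
    """Rank-based AUC: the share of (default, non-default) pairs in which the
    defaulted loan has the higher score, with ties counted as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one default and one non-default")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = default) and predicted PDs.
labels = [0, 0, 1, 0, 1]
scores = [0.02, 0.04, 0.05, 0.06, 0.30]
print(f"AUC: {auc(labels, scores):.3f}")  # 5 of 6 pairs ranked correctly -> 0.833

# Scores that carry no information rank every pair as a tie.
print(f"AUC: {auc(labels, [0.05] * 5):.3f}")  # 0.500
```

Since the synthetic defaults in this notebook are generated independently of the features, an AUC of roughly 0.5 on the test set would be the expected outcome, not a sign of a coding error.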

### Conclusion

The calculation of Probability of Default (PD), Loss Given Default (LGD), and Exposure at Default (EAD) is fundamental to effective credit risk management. Using synthetic data and logistic regression, we have demonstrated how these metrics can be calculated and combined into an expected loss for each loan.

In practice, financial institutions employ sophisticated models and extensive historical data to estimate PD, LGD, and EAD. These calculations are not only essential for regulatory compliance but also for strategic decision-making regarding lending practices, risk pricing, and capital allocation. By accurately assessing these metrics, lenders can better understand their risk exposure, make informed lending decisions, and ultimately enhance their financial stability.

The insights gained from this analysis underscore the importance of robust credit risk modeling and the need for continuous monitoring and adjustment of these metrics in response to changing economic conditions and borrower behavior. As the financial landscape evolves, the ability to accurately predict and manage credit risk will remain a critical component of successful lending operations.