# Stress Testing and Backtesting: CECL Model Validation

In this notebook, I'll walk through two core steps of model validation: backtesting, which evaluates the accuracy of the model's predictions against observed outcomes, and stress testing, which assesses the model's robustness under adverse conditions. Both are illustrated for Current Expected Credit Loss (CECL) models.

### CECL Model Overview:

The model is designed to estimate the expected credit losses over the life of a financial asset. The key components for this estimation include:

- **Probability of Default (PD)**: The likelihood that a loan will default.
- **Loss Given Default (LGD)**: The proportion of the loan amount that is lost if a default occurs.
- **Exposure at Default (EAD)**: The total amount of the loan that is at risk in the event of a default.
- **Expected Loss (EL)**: The expected loss for a loan, calculated as $\text{EL} = \text{PD} \times \text{LGD} \times \text{EAD}$.
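
As a quick worked example of the formula above, the sketch below computes the expected loss for a single hypothetical loan. The function name `expected_loss` and the input values are illustrative assumptions, not part of the notebook's model.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss for one loan: EL = PD * LGD * EAD."""
    if not (0.0 <= pd_ <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_ * lgd * ead


# Hypothetical loan: 5% PD, 40% LGD (i.e. 60% recovery), 200,000 exposure
el = expected_loss(pd_=0.05, lgd=0.40, ead=200_000)
print(f"Expected loss: {el:,.2f}")  # prints "Expected loss: 4,000.00"
```

The name `pd_` carries a trailing underscore only to avoid clashing with the conventional `pd` alias for pandas.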

In the context of a CECL model:

- **Output ($y$)**:
  - $y = 1$: Indicates that the loan has defaulted.
  - $y = 0$: Indicates that the loan has not defaulted.

- **Predicted probability ($y_{\text{proba}}$)**:
  - $y_{\text{proba}}$ is the classifier's predicted probability that a loan will default. It serves as the Probability of Default (PD) estimate and is stored as `y_prob` in the code.

### Python implementation:

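The example below builds a simplified CECL-style model on synthetic data: a logistic regression supplies PD, LGD is derived from a recovery rate, and EAD is the exposure amount: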

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

import warnings
warnings.filterwarnings('ignore')


# Generate synthetic data
num_loans = 1000
np.random.seed(42)

data = {
    'Loan_ID': range(1, num_loans + 1),
    'Default': np.random.binomial(1, 0.05, num_loans),  # 5% default rate
    'Recovery_Rate': np.random.uniform(0.2, 0.8, num_loans),
    'Exposure_Amount': np.random.uniform(50000, 500000, num_loans),
    'Feature1': np.random.normal(0, 1, num_loans),  # Example feature (drawn independently of Default)
    'Feature2': np.random.normal(0, 1, num_loans)   # Example feature (drawn independently of Default)
}

df = pd.DataFrame(data)

# Features and target
X = df[['Feature1', 'Feature2']]
y = df['Default']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict probabilities
y_prob = model.predict_proba(X_test)[:, 1]

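# Assemble PD, LGD, and EAD for the test loans.
# .loc selects by index label, and .copy() avoids pandas' SettingWithCopyWarning.
df_test = df.loc[X_test.index].copy()
df_test['PD'] = y_prob
df_test['LGD'] = 1 - df_test['Recovery_Rate']  # LGD is the share not recovered
df_test['EAD'] = df_test['Exposure_Amount']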

# Calculate Expected Loss
df_test['Expected_Loss'] = df_test['PD'] * df_test['LGD'] * df_test['EAD']



## Validating a CECL Model through Backtesting and Stress Testing

Backtesting checks whether the model's predictions match observed outcomes, and stress testing checks how its estimates behave under adverse conditions. One approach to each is outlined below:

### Python Code for Backtesting and Stress Testing of the CECL Model

**Backtesting**: This technique involves applying the model to historical data to evaluate its performance. It helps in understanding the model's predictive accuracy and stability over time.

- Apply the model to the test data (a random hold-out set that stands in for historical data here) to predict probabilities of default.
- Compare the predicted probabilities with the actual default status to assess the model's performance.
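
The comparison in the second step can go beyond classification metrics. Because expected loss depends on the level of PD and not only on how loans are ranked, it is also worth checking calibration. The sketch below computes the Brier score and compares the mean predicted PD with the observed default rate. The helper names and the small input lists are hypothetical, and the code uses only the standard library so that it runs independently of the notebook's variables.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted PD and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_gap(y_true, y_prob):
    """Mean predicted PD minus observed default rate (positive = over-prediction)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical backtest sample: actual default flags and predicted PDs
y_true = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.05, 0.10, 0.60, 0.05, 0.20, 0.05, 0.10, 0.40, 0.05, 0.10]

print(f"Brier score:     {brier_score(y_true, y_prob):.4f}")
print(f"Calibration gap: {calibration_gap(y_true, y_prob):+.4f}")
```

On the notebook's own arrays, `sklearn.metrics.brier_score_loss(y_test, y_prob)` should give the same score as the hand-written helper.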

**Stress Testing**: This technique involves subjecting the model to extreme but plausible scenarios to evaluate its performance under adverse conditions. It helps in understanding the model's resilience and the potential impact of severe economic downturns or other stress events.

- Simulate extreme scenarios by adjusting the input features to reflect adverse conditions.
- Re-evaluate the model's predictions and performance under these scenarios.
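
One point is worth illustrating before running the stress test. When every loan's features are shifted by the same amount, a logistic model's scores all move by the same constant, so the ranking of loans (and therefore ROC AUC) is unchanged, while the PD levels and expected losses do move. The sketch below shows this with hypothetical coefficients and loans; none of the values come from the fitted model in this notebook.

```python
import math


def predict_pd(features, coefs, intercept):
    """Logistic PD: sigmoid of the linear score."""
    score = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical fitted parameters (assumed; negative coefficients mean
# that lower feature values imply higher default risk)
coefs, intercept = [-0.8, -0.5], -3.0

# Hypothetical loans: (features, LGD, EAD)
loans = [
    ([0.5, 1.0], 0.40, 100_000),
    ([-0.2, 0.3], 0.55, 250_000),
    ([1.2, -0.4], 0.30, 80_000),
]

shift = -2.0  # same adverse shift applied to every feature

base_pd = [predict_pd(f, coefs, intercept) for f, _, _ in loans]
stress_pd = [predict_pd([x + shift for x in f], coefs, intercept) for f, _, _ in loans]

base_el = sum(p * lgd * ead for p, (_, lgd, ead) in zip(base_pd, loans))
stress_el = sum(p * lgd * ead for p, (_, lgd, ead) in zip(stress_pd, loans))


def ranking(values):
    """Indices of loans ordered from lowest to highest PD."""
    return sorted(range(len(values)), key=values.__getitem__)


print("Same ranking under stress:", ranking(base_pd) == ranking(stress_pd))
print(f"Portfolio EL: base {base_el:,.0f} -> stressed {stress_el:,.0f}")
```

For this kind of uniform shift, the informative stress-test outputs are the change in PD and in portfolio expected loss, not a rank-based metric such as ROC AUC.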


In [9]:
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')


# Predict probabilities and classes
y_prob = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

# Backtesting
print("Backtesting Results:")
print("ROC AUC Score:", roc_auc_score(y_test, y_prob))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

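# Assemble PD, LGD, and EAD for the test loans.
# .loc selects by index label, and .copy() avoids pandas' SettingWithCopyWarning.
df_test = df.loc[X_test.index].copy()
df_test['PD'] = y_prob
df_test['LGD'] = 1 - df_test['Recovery_Rate']  # LGD is the share not recovered
df_test['EAD'] = df_test['Exposure_Amount']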

# Calculate Expected Loss
df_test['Expected_Loss'] = df_test['PD'] * df_test['LGD'] * df_test['EAD']

# Display the results
print(df_test[['Loan_ID', 'PD', 'LGD', 'EAD', 'Expected_Loss']].head())

# Stress Testing
# Simulate a stress scenario by shifting both features by -2 for every loan.
# Whether this raises PD depends on the signs of the fitted coefficients.
stress_features = X_test.copy()
stress_features['Feature1'] = stress_features['Feature1'] - 2  # Example stress adjustment
stress_features['Feature2'] = stress_features['Feature2'] - 2  # Example stress adjustment

# Predict probabilities and classes under stress scenario
stress_y_prob = model.predict_proba(stress_features)[:, 1]
stress_y_pred = model.predict(stress_features)

# Stress Testing Results
# Note: a uniform shift moves every loan's score by the same constant, so the
# ranking of loans, and hence ROC AUC, is identical to the backtest. Compare
# stress_y_prob with y_prob to see the change in PD levels.
print("\nStress Testing Results:")
print("ROC AUC Score under Stress:", roc_auc_score(y_test, stress_y_prob))
print("Confusion Matrix under Stress:\n", confusion_matrix(y_test, stress_y_pred))
print("Classification Report under Stress:\n", classification_report(y_test, stress_y_pred))

Backtesting Results:
ROC AUC Score: 0.6759744037230948
Confusion Matrix:
 [[191   0]
 [  9   0]]
Classification Report:
               precision    recall  f1-score   support

           0       0.95      1.00      0.98       191
           1       0.00      0.00      0.00         9

    accuracy                           0.95       200
   macro avg       0.48      0.50      0.49       200
weighted avg       0.91      0.95      0.93       200

     Loan_ID        PD       LGD            EAD  Expected_Loss
521      522  0.043567  0.239338  110495.808234    1152.161012
737      738  0.040899  0.683226  295017.049760    8243.694003
740      741  0.058106  0.303477  214556.336718    3783.473318
660      661  0.046203  0.439643  262526.811873    5332.642898
411      412  0.049351  0.732848  113248.835410    4095.836856

Stress Testing Results:
ROC AUC Score under Stress: 0.6759744037230948
Confusion Matrix under Stress:
 [[191   0]
 [  9   0]]
Classification Report under Stress:
           


## Regulatory Compliance

The CECL standard, issued by the Financial Accounting Standards Board (FASB), marks a significant shift in how financial institutions account for credit losses. It adopts a forward-looking approach: institutions estimate expected credit losses over the life of an asset rather than waiting for losses to be incurred.

While the transition to CECL presents challenges, this approach is intended to improve credit risk assessment and the transparency of financial statements.




## Next Steps: Further Model Enhancement & Monitoring

**Model Sponsors:**

- **Model Refinement & Enhancement**:
  - **Data Enrichment**: Explore additional data sources, such as macroeconomic indicators or sector-specific trends, to strengthen the model's assumptions and estimates.
  - **Expanded Scenario Analysis**: Conduct a broader range of scenario analyses to evaluate the model's performance under various economic conditions, including extreme stress scenarios.

- **Model Performance Feedback**: Review and adjust model parameters based on insights gained from backtesting and stress testing to enhance predictive accuracy.

- **Monitoring Performance**: To keep the CECL model robust and reliable, implement ongoing monitoring proportionate to the model's risk. Establish a schedule for routine model updates and validations to reflect changes in market conditions and regulatory requirements.
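
One way to make ongoing monitoring concrete is to track whether the distribution of predicted PDs drifts away from the distribution seen at validation time. The sketch below computes a Population Stability Index (PSI) over fixed PD buckets. PSI is brought in here purely for illustration; the bucket edges, sample values, and function name are all hypothetical choices, not something this notebook prescribes.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared buckets.

    edges are the interior bucket boundaries; eps avoids log(0) for empty buckets.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bucket index = number of edges the value is at or above
            counts[sum(v >= e for e in edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    exp_s, act_s = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_s, act_s))


# Hypothetical predicted PDs at validation time and in a later monitoring window
pd_validation = [0.02, 0.03, 0.04, 0.05, 0.05, 0.06, 0.08, 0.10, 0.12, 0.20]
pd_current = [0.03, 0.05, 0.06, 0.08, 0.09, 0.11, 0.12, 0.15, 0.22, 0.30]
edges = [0.05, 0.10]  # assumed bucket boundaries

print(f"PSI vs. itself:         {psi(pd_validation, pd_validation, edges):.4f}")
print(f"PSI vs. current window: {psi(pd_validation, pd_current, edges):.4f}")
```

A PSI of zero means the two bucketed distributions are identical, and larger values indicate more drift. The level that should trigger a model review is a policy decision for the institution.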

**Model Validators:**

- **Collaborative Review**: Engage in peer reviews of the validation findings to gain diverse perspectives and identify potential areas for improvement.

**Model Sponsors and Validators:**

- **Enhanced Documentation**: Improve documentation of the validation process, methodologies, and results to ensure transparency and facilitate compliance with regulatory standards.
