In [0]:
This code stress-tests a mortgage portfolio to estimate how it would fare under adverse economic conditions. Specifically:

Modeling Default Risk: It simulates how an external shock, here a rise in unemployment, raises the probability of default (PD) of each loan in the portfolio.

Assessing Losses: It calculates the Loss Given Default (LGD) from the stressed property values (through the loan-to-value ratio). It then combines LGD with the stressed PD and the exposure to estimate the Expected Loss (EL) for each loan.

Portfolio Risk Evaluation: Finally, it aggregates the individual loan losses to determine the total portfolio loss under the simulated stress scenarios, helping to assess the overall risk exposure of the portfolio under extreme but plausible conditions.
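The default-risk item can be illustrated in plain Python, without Spark. This is a minimal sketch rather than the notebook's own code. It assumes the usual convention that PD falls as credit score rises, and it reuses the 650 midpoint and 0.02 slope that appear in the notebook's logistic function. It also treats the unemployment shock as a proportional PD uplift capped at 1. The helper names `base_pd` and `stressed_pd` are hypothetical.

```python
import math

# Hypothetical helper: logistic PD that decreases as credit score increases.
# The midpoint (650) and slope (0.02) mirror the constants used in the notebook.
def base_pd(credit_score: float, midpoint: float = 650.0, slope: float = 0.02) -> float:
    return 1.0 / (1.0 + math.exp(slope * (credit_score - midpoint)))

# Hypothetical helper: scale PD by (1 + shock), then cap it so it stays a valid probability.
def stressed_pd(pd: float, unemployment_shock: float) -> float:
    return min(pd * (1.0 + unemployment_shock), 1.0)

for score in (680, 700, 720):
    pd0 = base_pd(score)
    print(score, round(pd0, 4), round(stressed_pd(pd0, 0.10), 4))
```

For scores of 680, 700 and 720, this gives base PDs of about 0.354, 0.269 and 0.198, which the 10% shock raises to about 0.390, 0.296 and 0.218.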

'''
Step 1: Load Mortgage Portfolio Data
A Spark session is created to process the data. Sample mortgage data is defined and converted into a PySpark DataFrame for analysis.

Step 2: Define Stress Scenarios
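Stress parameters are defined to simulate three shocks:
- a 10% increase in unemployment, applied as a 10% uplift to PD;
- a 2-percentage-point rise in interest rates, which is defined but not used in the calculations that follow;
- a 20% drop in property values.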

Step 3: Model Default Probability (PD)
A base PD is computed from each borrower's credit score with a logistic function. It is then scaled up for the stress scenario, which here is the unemployment increase.

Step 4: Calculate LGD and EAD
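The stressed property value is computed by applying the 20% decline. The loan-to-value (LTV) ratio is then recalculated as loan balance divided by stressed property value. The Loss Given Default (LGD) is set from the new LTV: 40% if it exceeds 80%, otherwise 25%. Exposure at Default (EAD) is taken as the loan balance.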

Step 5: Compute Expected Loss (EL)
Expected Loss (EL) is calculated for each loan by multiplying the stress-adjusted PD, LGD, and loan balance (Exposure at Default or EAD).

Step 6: Aggregate Portfolio Loss
The total portfolio loss is computed by summing the expected losses across all loans, providing a measure of portfolio risk under the stressed scenario.'''
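Steps 2 and 4 can be checked by hand without Spark. The sketch below is plain Python and uses the same rules as the notebook:
- a 20% property-value decline;
- LGD of 40% when the stressed LTV exceeds 0.8, and 25% otherwise;
- EAD equal to the loan balance.

The `Loan` dataclass and the `stressed_ltv` and `lgd` helpers are hypothetical names used for illustration.

```python
from dataclasses import dataclass

@dataclass
class Loan:
    loan_id: int
    loan_balance: float
    property_value: float

# Sample loans from Step 1 (only the fields needed for Steps 2 and 4).
LOANS = [Loan(1, 500_000, 300_000), Loan(2, 750_000, 600_000), Loan(3, 300_000, 200_000)]

def stressed_ltv(loan: Loan, property_value_decline: float) -> float:
    # LTV after applying the property-value shock.
    return loan.loan_balance / (loan.property_value * (1.0 - property_value_decline))

def lgd(ltv: float, threshold: float = 0.8) -> float:
    # Step-function LGD: 40% above the threshold, 25% at or below it.
    return 0.40 if ltv > threshold else 0.25

for loan in LOANS:
    ltv = stressed_ltv(loan, 0.20)
    # Columns: loan_id, stressed LTV, LGD, EAD (taken as the loan balance)
    print(loan.loan_id, round(ltv, 4), lgd(ltv), loan.loan_balance)
```

Each sample loan's balance already exceeds its unstressed property value. As a result, every stressed LTV (about 2.08, 1.56 and 1.88) is well above 0.8, and all three loans fall into the 40% LGD bucket.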

In [0]:
import math

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, udf, when, round as spark_round
from pyspark.sql.types import DoubleType

# Step 1: Load Mortgage Portfolio Data
spark = SparkSession.builder.appName("StressTesting").getOrCreate()

# Sample mortgage data.
# Note: in this sample each loan_balance exceeds its property_value, so the stored
# ltv column does not equal loan_balance / property_value. The stress test below
# recomputes LTV from the balance and the stressed property value instead.
data = [
    (1, 500000, 0.04, 0.75, 700, 300000),
    (2, 750000, 0.035, 0.85, 680, 600000),
    (3, 300000, 0.05, 0.65, 720, 200000)
]
columns = ["loan_id", "loan_balance", "interest_rate", "ltv", "credit_score", "property_value"]

# Create DataFrame
mortgage_df = spark.createDataFrame(data, columns)
mortgage_df.show()

# Step 2: Define Stress Scenarios
stress_params = {
    "unemployment_increase": 0.10,  # +10% uplift applied to PD
    "interest_rate_shock": 0.02,    # +2 percentage points (not used by this simplified model)
    "property_value_decline": 0.20  # -20% property values
}

# Step 3: Model Default Probability (PD)

# Base PD from credit score via a logistic function centred at 650.
# PD decreases as the credit score rises (higher score = lower default risk).
def calculate_base_pd(credit_score):
    return 1.0 / (1.0 + math.exp(0.02 * (credit_score - 650)))

# Stress-adjusted PD: proportional uplift for the unemployment shock, capped at 1
def stress_pd(base_pd, unemployment_shock):
    return min(base_pd * (1.0 + unemployment_shock), 1.0)

# Register UDFs
calculate_base_pd_udf = udf(calculate_base_pd, DoubleType())
stress_pd_udf = udf(stress_pd, DoubleType())

# Apply to DataFrame
mortgage_df = mortgage_df.withColumn("base_pd", calculate_base_pd_udf(col("credit_score")))
mortgage_df = mortgage_df.withColumn(
    "stress_pd",
    stress_pd_udf(col("base_pd"), lit(stress_params["unemployment_increase"]))
)

# Step 4: Calculate LGD and EAD
# Stressed property values and recalculated LTV ratio
mortgage_df = mortgage_df.withColumn(
    "stressed_property_value",
    col("property_value") * (1 - stress_params["property_value_decline"])
)
mortgage_df = mortgage_df.withColumn("stressed_ltv", col("loan_balance") / col("stressed_property_value"))

# LGD model: 40% if stressed LTV > 80%, else 25%
mortgage_df = mortgage_df.withColumn("lgd", when(col("stressed_ltv") > 0.8, 0.4).otherwise(0.25))

# EAD = current loan balance
mortgage_df = mortgage_df.withColumn("ead", col("loan_balance"))

# Step 5: Compute Expected Loss (EL) = stressed PD * LGD * EAD
mortgage_df = mortgage_df.withColumn("expected_loss", col("stress_pd") * col("lgd") * col("ead"))
mortgage_df.select("loan_id", spark_round("expected_loss", 2).alias("expected_loss")).show()

# Step 6: Aggregate Portfolio Loss
total_loss = mortgage_df.agg({"expected_loss": "sum"}).collect()[0][0]
print(f"Total portfolio loss under stress scenario: ${total_loss:,.2f}")


+-------+------------+-------------+----+------------+--------------+
|loan_id|loan_balance|interest_rate| ltv|credit_score|property_value|
+-------+------------+-------------+----+------------+--------------+
|      1|      500000|         0.04|0.75|         700|        300000|
|      2|      750000|        0.035|0.85|         680|        600000|
|      3|      300000|         0.05|0.65|         720|        200000|
+-------+------------+-------------+----+------------+--------------+

+-------+-------------+
|loan_id|expected_loss|
+-------+-------------+
|      1|     59167.11|
|      2|    116933.42|
|      3|     26111.73|
+-------+-------------+

Total portfolio loss under stress scenario: $202,212.26
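A small pure-Python reference can be useful for unit-testing the Spark job, since it recomputes Steps 3 to 6 on the same three loans without a cluster. This is a sketch under stated assumptions:
- a logistic PD that decreases with credit score (midpoint 650, slope 0.02);
- a 10% PD uplift capped at 1;
- the 20% property decline with the 40%/25% LGD rule at an LTV of 0.8;
- EAD equal to the loan balance.

`expected_loss`, `portfolio_loss` and `SCENARIO` are hypothetical names.

```python
import math

# (loan_id, loan_balance, credit_score, property_value) from the Step 1 sample
LOANS = [
    (1, 500_000, 700, 300_000),
    (2, 750_000, 680, 600_000),
    (3, 300_000, 720, 200_000),
]

SCENARIO = {"unemployment_increase": 0.10, "property_value_decline": 0.20}

def expected_loss(balance, credit_score, property_value, scenario):
    # Assumed PD curve: decreasing logistic in credit score, centred at 650.
    pd = 1.0 / (1.0 + math.exp(0.02 * (credit_score - 650)))
    pd = min(pd * (1.0 + scenario["unemployment_increase"]), 1.0)
    # Stressed LTV drives the step-function LGD.
    ltv = balance / (property_value * (1.0 - scenario["property_value_decline"]))
    lgd = 0.40 if ltv > 0.8 else 0.25
    ead = balance  # EAD taken as the outstanding balance
    return pd * lgd * ead

def portfolio_loss(loans, scenario):
    # Step 6: sum the per-loan expected losses.
    return sum(expected_loss(b, cs, pv, scenario) for _, b, cs, pv in loans)

print(f"${portfolio_loss(LOANS, SCENARIO):,.2f}")
```

Under these assumptions it prints `$202,212.26`. Because the scenario is a plain dictionary, milder or harsher shocks can be compared by passing a different `scenario`.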
