# Project: Simulation of interest rate scenarios of US 10-year treasury using the Schöbel-Zhu Hull-White (SZHW) model - Part B

### Estimate the impact of each independent variable on historical yields and then develop a simulation model that generate interest rate scenarios.

Note: Please refer README.md file for in-depth explanation of code and theory behind the SZHW model. 

In this part, I estimate the impact of each independent variable on the original historical yields first, and then modify the yields accordingly before using them to generate interest rate scenarios using the SZHW model.

This approach can be advantageous if you are interested in understanding how the independent variables affect the actual historical yields, as well as the scenarios generated by the SZHW model. It also allows you to directly compare the modified yields with the original yields, and study their differences and similarities.

#### Framework of this methodology:
Step 1: Gather historical interest rate data. <br>
Step 2: Calibrate the SZHW model. <br>
Step 3: Estimating the coefficients for each independent variable using a linear regression model. <br>
Step 4: Incorporating the impact of economic indicators into the scenarios.<br>
Step 5: Develop the simulation model<br>
Step 6: Evaluate the impact of interest rate scenarios. <br> 
Step 7: Validate and analyze the performance of your simulation model. 

### Step 1: Gather historical interest rate data 
Collect historical interest rate data, such as US Treasury yields or LIBOR rates, that you can use to calibrate the SZHW model. You will use this data to estimate the model's parameters and validate the accuracy of your simulations.

In [1]:
import pandas as pd

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

df = pd.read_csv('dataset/us_10_year_bond_yield.csv')
# Set the 'Date' column as the index of the DataFrame
df.set_index('Date', inplace=True)
# Convert the index to datetime format, which allows for easier manipulation and plotting
df.index = pd.to_datetime(df.index)
# pd.set_option('display.max_rows', None)
pd.set_option('max_rows', 125)
print(df)

            Yield
Date             
2022-02-01  1.512
2022-03-01  1.637
2022-04-01  1.649
2022-05-01  1.700
2022-06-01  1.728
...           ...
2022-12-26  3.743
2022-12-27  3.849
2022-12-28  3.886
2022-12-29  3.820
2022-12-30  3.879

[301 rows x 1 columns]


### Step 2: Calibrate the SZHW model
Use the historical interest rate data to calibrate the SZHW model, i.e., determine the best parameters for the model that fit the observed data. You could use optimization algorithms, such as maximum likelihood estimation, to estimate the parameters.

In [2]:
import numpy as np
import scipy.optimize as optimize

"""Define the log-likelihood function for the Schwartz-Smith (SS) model"""
# This function takes in model parameters and the observed interest rates 'r', and returns the negative log-likelihood value of the model given the data
def szhw_llf(params, r):
    kappa, theta, sigma = params
    T = len(r)
    tau = 1
    llf = 0
    for t in range(1, T):
        delta_t = (r[t] - r[t-1])
        # Compute the log-likelihood of the SS model for a given time period
        llf += -np.log(sigma**2) - ((kappa * (theta - r[t-1] + delta_t/2))**2)/(2*sigma**2)
    # Return the negative log-likelihood value
    return -llf

"""Define the function to calibrate the SS model given observed interest rates 'r'"""
# This function initializes the model parameters, runs an optimization algorithm to minimize the negative log-likelihood, and returns the calibrated parameters
def calibrate_szhw(r):
    # Set the initial parameter values
    x0 = np.array([0.1, 0.1, 0.1])
    # Run an optimization algorithm to minimize the negative log-likelihood and find the optimal parameters
    params = optimize.minimize(szhw_llf, x0, args=(r,), method='L-BFGS-B', 
                               bounds=((0.0001, None), (0.0001, None), (0.0001, None)))
    # Return the optimal parameters
    return params.x

# Calibrate the SS model given the observed interest rates in the dataFrame 'df'
params = calibrate_szhw(df['Yield'].values)
# Print the calibrated model parameters
print("Calibrated parameters:", params)
# Extract and print the calibrated values of kappa, theta, and sigma
kappa, theta, sigma = params
print("Calibrated kappa:", kappa)
print("Calibrated theta:", theta)
print("Calibrated sigma:", sigma)

Calibrated parameters: [1.00000000e-04 2.94459459e+00 1.00000000e-04]
Calibrated kappa: 0.0001
Calibrated theta: 2.94459459258886
Calibrated sigma: 0.0001


The output of the code is an array of three elements, which represent the calibrated parameters of the SZHW model: kappa, theta, and sigma. These parameters are estimated using maximum likelihood estimation by minimizing the negative log-likelihood of the SZHW model.

A brief explanation of the parameters:

- kappa is the mean-reversion rate of the short rate in the SZHW model. It determines how quickly the short rate reverts to its mean. The higher the value of kappa, the more quickly the short rate reverts to its mean.

- theta is the long-run mean of the short rate in the SZHW model. It represents the expected average short rate in the long run.

- sigma is the volatility of the short rate in the SZHW model. It represents the amount of randomness or uncertainty in the short rate.

In the output, kappa and sigma are very close to 0.0001, which suggests that the short rate is very volatile and not mean-reverting. 

On the other hand, theta is estimated to be 2.94459459, which represents the long-run mean of the short rate.

### Step 3: Estimating the coefficients for each independent variable using a linear regression model

In [3]:
economic_factors_df = pd.read_csv('dataset/us_econ_data.csv')
economic_factors_df.set_index('Date', inplace=True)
pd.set_option('max_rows', 8)
economic_factors_df

Unnamed: 0_level_0,us_treasury_yield,inflation_rate,federal_funds_rate,unemployment_rate,consumer_confidence_index
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Dec-22,3.879,6.5,4.33,3.5,3.5
Nov-22,3.611,7.1,3.83,3.6,3.5
Oct-22,4.050,7.7,3.08,3.7,3.5
Sep-22,3.829,8.2,2.33,3.5,3.5
...,...,...,...,...,...
Apr-13,1.673,1.1,0.14,7.6,3.5
Mar-13,1.852,1.5,0.09,7.5,3.5
Feb-13,1.881,2.0,0.14,7.7,3.5
Jan-13,1.985,1.6,0.15,8.0,3.5


In [4]:
print(economic_factors_df.columns)

Index(['us_treasury_yield', 'inflation_rate', 'federal_funds_rate',
       'unemployment_rate', 'consumer_confidence_index'],
      dtype='object')


In [5]:
import matplotlib.pyplot as plt
import statsmodels.api as sm

# specify the dependent variable and the independent variables
y = economic_factors_df['us_treasury_yield']
X = economic_factors_df[['inflation_rate', 'federal_funds_rate', 'unemployment_rate', 'consumer_confidence_index']]

# add a constant to the independent variables for the intercept term
X = sm.add_constant(X)

# fit the linear regression model
model = sm.OLS(y, X).fit()

# print the summary of the model
summary = model.summary()
summary

0,1,2,3
Dep. Variable:,us_treasury_yield,R-squared:,0.347
Model:,OLS,Adj. R-squared:,0.33
Method:,Least Squares,F-statistic:,20.51
Date:,"Fri, 24 Feb 2023",Prob (F-statistic):,9.89e-11
Time:,15:22:10,Log-Likelihood:,-102.66
No. Observations:,120,AIC:,213.3
Df Residuals:,116,BIC:,224.5
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
inflation_rate,0.0347,0.026,1.357,0.177,-0.016,0.085
federal_funds_rate,0.2897,0.068,4.240,0.000,0.154,0.425
unemployment_rate,-0.0806,0.036,-2.225,0.028,-0.152,-0.009
consumer_confidence_index,0.6428,0.071,9.001,0.000,0.501,0.784

0,1,2,3
Omnibus:,4.67,Durbin-Watson:,0.159
Prob(Omnibus):,0.097,Jarque-Bera (JB):,3.162
Skew:,-0.233,Prob(JB):,0.206
Kurtosis:,2.356,Cond. No.,12.3


In [6]:
import statsmodels.api as sm

# Extract the coefficients and intercept of independent that are significant
coef_table = summary.tables[1]
coef_data = coef_table.data[1:]
coefs = [float(row[1]) for row in coef_data[1:]]
const = float(coef_data[0][1])
print(coefs)
print(const)

[0.2897, -0.0806, 0.6428]
0.0347


### Step 4: Incorporating the impact of economic indicators into the scenarios

In [7]:
federal_funds_rate_coeff = coefs[0]
unemployment_rate_coeff = coefs[1]
consumer_confidence_index_coeff = coefs[2]

scenarios = df.to_numpy()

# Create a matrix of economic indicators for each scenario
# In this example, we assume that the economic indicators remain constant over time
federal_funds_rate = np.full((len(scenarios), len(scenarios[0])), 2.0)
unemployment_rate = np.full((len(scenarios), len(scenarios[0])), 1.5)
consumer_confidence_index = np.full((len(scenarios), len(scenarios[0])), 0.5)

T = len(df)

# Incorporate the impact of economic indicators into the scenarios
scenarios += federal_funds_rate_coeff * federal_funds_rate + unemployment_rate_coeff * unemployment_rate + consumer_confidence_index_coeff * consumer_confidence_index

sim_df = pd.DataFrame(data=scenarios, index=df.index, columns=df.columns)
# sim_df.index = pd.date_range(start=sim_df.index[0], periods=sim_df.shape[0], freq='M')
sim_df.index = pd.date_range(start=df.index[1], periods=T, freq='D')
sim_df

Unnamed: 0,Yield
2022-03-01,2.2919
2022-03-02,2.4169
2022-03-03,2.4289
2022-03-04,2.4799
...,...
2022-12-23,4.6289
2022-12-24,4.6659
2022-12-25,4.5999
2022-12-26,4.6589


### Step 5: Develop the simulation model 
Develop a simulation model that uses the calibrated SZHW model to generate interest rate scenarios. One way to do this is by implementing the exact simulation algorithm for the SZHW model, which involves updating the interest rate at each time step based on the previous value and a normally distributed shock term.

In [8]:
# Define a function to simulate the Schwartz-Smith (SS) model for a given set of parameters and initial interest rate
# This function returns a matrix of simulated interest rate paths with 'scenarios' columns and 'T' rows
def simulate_szhw(params, T, r0, scenarios=1):
    # Extract the model parameters
    kappa, theta, sigma = params
    tau = 1
    # Initialize a matrix to store the simulated interest rate paths
    r = np.zeros((T, scenarios))
    # Set the initial interest rate values for each scenario
    r[0,:] = r0
    # Loop over each time step and simulate the interest rate changes according to the SS model
    for t in range(1, T):
        delta_r = kappa * (theta - r[t-1,:]) * tau + sigma * np.sqrt(tau) * np.random.normal(size=scenarios)
        r[t,:] = r[t-1,:] + delta_r
    # Return the matrix of simulated interest rate paths
    return r

# Generate a simulated interest rate scenario using the calibrated SS model
# Set the simulation horizon to 1 years with daily data
T = 365 * 1 
# Starting interest rate is the last observed value in the DataFrame 'df'
r0 = sim_df['Yield'].iloc[-1] 
# Simulate 10 different scenarios
r = simulate_szhw(params, T, r0, scenarios=10)

# Store the simulation results in a pandas DataFrame for convenient analysis and visualization 
final_sim_df = pd.DataFrame(r, columns=[f"Scenario {i}" for i in range(r.shape[1])])
# Set the index of the DataFrame to be a range of dates starting from the last observed date in 'df'
final_sim_df.index = pd.date_range(start=sim_df.index[-1], periods=T, freq='D')

pd.set_option('max_rows', 125)
final_sim_df

Unnamed: 0,Scenario 0,Scenario 1,Scenario 2,Scenario 3,Scenario 4,Scenario 5,Scenario 6,Scenario 7,Scenario 8,Scenario 9
2022-12-26,4.658900,4.658900,4.658900,4.658900,4.658900,4.658900,4.658900,4.658900,4.658900,4.658900
2022-12-27,4.658723,4.658636,4.658530,4.658814,4.658799,4.658676,4.658846,4.658766,4.658815,4.658617
2022-12-28,4.658549,4.658328,4.658425,4.658642,4.658697,4.658471,4.658550,4.658680,4.658721,4.658342
2022-12-29,4.658481,4.658112,4.658309,4.658495,4.658605,4.658430,4.658240,4.658593,4.658456,4.658452
2022-12-30,4.658281,4.658009,4.658217,4.658380,4.658451,4.658439,4.658159,4.658297,4.658201,4.658124
...,...,...,...,...,...,...,...,...,...,...
2023-12-21,4.594810,4.597262,4.597244,4.599413,4.600338,4.595132,4.598587,4.598613,4.600249,4.601259
2023-12-22,4.594633,4.597002,4.596994,4.599325,4.600233,4.595065,4.598475,4.598490,4.600131,4.601020
2023-12-23,4.594641,4.596992,4.596820,4.599057,4.600103,4.594850,4.598179,4.598211,4.599978,4.600832
2023-12-24,4.594485,4.596763,4.596765,4.598853,4.599878,4.594711,4.597962,4.597971,4.599908,4.600620


### Step 6: Evaluate the impact of interest rate scenarios
To evaluate the impact of different interest rate scenarios on financial instruments like bonds and options, we can use the simulated yields as inputs to a pricing model for the financial instrument. The pricing model will provide estimates of the instrument's value for each scenario. We can use an option pricing model, such as the Black-Scholes model, to estimate the value of an option for each interest rate scenario. The Black-Scholes model uses the underlying asset price, the option's strike price, time to maturity, volatility, and risk-free interest rate to estimate the option's value. By using the simulated yields as inputs to estimate the risk-free interest rate, you can estimate the option's value for each scenario.

##### Using the Black-Scholes model to estimate the value of a call option for each of the ten US 10-year treasury yield scenarios from "final_sim_df" dataframe:

In [9]:
from scipy.stats import norm

# Setting the option parameters
# Underlying asset price
S = 100  
# Option strike price
K = 110  
# Risk-free interest rate
r = 0.02
# Time to maturity (in years)
T = 1  
# Volatility of the underlying asset
sigma = 0.2 
# Number of simulations
N = 100000  

# Simulated yields for 10 scenarios
sim_yields = final_sim_df.values

# Function to calculate Black-Scholes call option price
def bs_call(S, K, r, T, sigma):
    d1 = (np.log(S/K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Calculate call option price for each scenario
call_prices = []
for i in range(10):
    # Estimate risk-free interest rate from simulated yield for the scenario
    rf = sim_yields[:, i].mean() / 100 
    # Calculate option price using Black-Scholes model
    d1 = (np.log(S/K) + (rf + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    call_price = bs_call(S, K, rf, T, sigma)
    call_prices.append(call_price)
    
call_prices

[5.895598334786214,
 5.896130112544121,
 5.896455788125898,
 5.896747384175633,
 5.896762785072873,
 5.895576856372301,
 5.896602891560065,
 5.896767559056578,
 5.89699256517013,
 5.8970798310558905]

In [10]:
for i in range(len(call_prices)):
    print(f'The estimated call option prices for scenario {i} of the US 10-year treasury simulated yield:', call_prices[i])

The estimated call option prices for scenario 0 of the US 10-year treasury simulated yield: 5.895598334786214
The estimated call option prices for scenario 1 of the US 10-year treasury simulated yield: 5.896130112544121
The estimated call option prices for scenario 2 of the US 10-year treasury simulated yield: 5.896455788125898
The estimated call option prices for scenario 3 of the US 10-year treasury simulated yield: 5.896747384175633
The estimated call option prices for scenario 4 of the US 10-year treasury simulated yield: 5.896762785072873
The estimated call option prices for scenario 5 of the US 10-year treasury simulated yield: 5.895576856372301
The estimated call option prices for scenario 6 of the US 10-year treasury simulated yield: 5.896602891560065
The estimated call option prices for scenario 7 of the US 10-year treasury simulated yield: 5.896767559056578
The estimated call option prices for scenario 8 of the US 10-year treasury simulated yield: 5.89699256517013
The estimat

##### Calculating statistics such as the mean, median, minimum, and maximum of the estimated option prices to get a sense of the expected value and the range of possible outcomes for the option under the different yield scenarios.

In [11]:
# Calculate statistics for each scenario
for i in range(10):
    scenario = final_sim_df.iloc[:, i]
    print("Scenario ", i)
    print("Mean: ", np.mean(scenario))
    print("Median: ", np.median(scenario))
    print("Minimum: ", np.min(scenario))
    print("Maximum: ", np.max(scenario))
    print("")

# Calculate statistics for the simulated call prices
print("Simulated Call Prices")
print("Mean: ", np.mean(call_prices))
print("Median: ", np.median(call_prices))
print("Minimum: ", np.min(call_prices))
print("Maximum: ", np.max(call_prices))

Scenario  0
Mean:  4.625939064503568
Median:  4.626085417016442
Minimum:  4.594381012651386
Maximum:  4.6589

Scenario  1
Mean:  4.6273263944180165
Median:  4.626229259810913
Minimum:  4.596750626916864
Maximum:  4.6589

Scenario  2
Mean:  4.628175994655881
Median:  4.62803899319983
Minimum:  4.59656828174827
Maximum:  4.6589

Scenario  3
Mean:  4.628936665124689
Median:  4.62869321588454
Minimum:  4.5986807942973025
Maximum:  4.6589

Scenario  4
Mean:  4.6289768399251106
Median:  4.628452086873348
Minimum:  4.599645727637362
Maximum:  4.6589

Scenario  5
Mean:  4.625883028804434
Median:  4.625884973081677
Minimum:  4.594686712985344
Maximum:  4.6589

Scenario  6
Mean:  4.62855973823985
Median:  4.628345429391404
Minimum:  4.597789291972454
Maximum:  4.6589

Scenario  7
Mean:  4.628989293331547
Median:  4.629356496901977
Minimum:  4.597709231541846
Maximum:  4.6589

Scenario  8
Mean:  4.629576236630746
Median:  4.629668757459472
Minimum:  4.599607916182509
Maximum:  4.6589

Scenario  9

The output shows the descriptive statistics (mean, median, minimum, and maximum) for each of the 10 scenarios and the simulated call prices.

For each scenario, the mean and median are close, indicating that the data is approximately normally distributed with little skewness. 

The simulated call prices have a higher mean and median than the simulated US treasury yields in each scenario, indicating that the call prices are influenced by factors other than the yields, such as time to maturity, volatility, and strike price.

### Step 7: Validate and analyze the performance of your simulation model

##### Running the calibrated SZHW model on historical data ("backtest_df") to generate interest rate scenarios.

One way to do this is to run your model on historical data, generate simulated treasury yields for the same time period, and then compare the simulated yields with the actual historical yields. This can help us assess the accuracy of your simulation model in replicating the actual behavior of interest rates.

In [12]:
backtest_df = pd.read_csv('dataset/us_treasury_backtest_data.csv')
backtest_df.set_index('Date', inplace=True)
backtest_df.index = pd.to_datetime(backtest_df.index)

# Developing a simulation model that uses the calibrated SZHW model to generate interest rate scenarios.
T = 365
# Starting interest rate is the last observed value
R0 = backtest_df['Yield'].iloc[-1] 
R = simulate_szhw(params, T, R0, scenarios=10)

# The simulation results are stored in a pandas DataFrame for convenient analysis and visualization 
backtest_sim_df = pd.DataFrame(R, columns=[f"Scenario {i}" for i in range(R.shape[1])])
backtest_sim_df.index = pd.date_range(start=backtest_df.index[-1], periods=T, freq='D')

# pd.set_option('display.max_rows', None)
pd.set_option('max_rows', 125)
backtest_sim_df

Unnamed: 0,Scenario 0,Scenario 1,Scenario 2,Scenario 3,Scenario 4,Scenario 5,Scenario 6,Scenario 7,Scenario 8,Scenario 9
2021-12-31,1.512000,1.512000,1.512000,1.512000,1.512000,1.512000,1.512000,1.512000,1.512000,1.512000
2022-01-01,1.512234,1.512146,1.512329,1.512169,1.512067,1.511982,1.511945,1.512179,1.512170,1.512162
2022-01-02,1.512278,1.512329,1.512554,1.512244,1.512170,1.512113,1.512108,1.512259,1.512293,1.512415
2022-01-03,1.512051,1.512451,1.512701,1.512519,1.512154,1.512272,1.512095,1.512312,1.512517,1.512480
2022-01-04,1.512369,1.512606,1.512807,1.512772,1.512226,1.512416,1.512314,1.512552,1.512633,1.512582
...,...,...,...,...,...,...,...,...,...,...
2022-12-26,1.559508,1.562030,1.561265,1.564296,1.561554,1.563539,1.563132,1.560602,1.563011,1.564361
2022-12-27,1.559579,1.562350,1.561356,1.564506,1.561339,1.563591,1.563182,1.560713,1.563066,1.564350
2022-12-28,1.559744,1.562493,1.561583,1.564620,1.561564,1.563749,1.563370,1.560685,1.563109,1.564463
2022-12-29,1.559802,1.562660,1.561818,1.564778,1.561915,1.563881,1.563616,1.560789,1.563304,1.564619


##### Comparing the historical yields ("backtest_df" dataframe) and the simulated yields ("backtest_sim_df") for each of the 10 scenarios

Now that I have ran the SZHW model on historical data, I generate simulated treasury yields for the same time period, and then compare the simulated yields with the actual historical yields. This can help assess the accuracy of our simulation model in replicating the actual behavior of interest rates.

##### Calculating the root mean squared error (RMSE) between the simulated yields in "bacltest_sim_df" dataframe and the actual yields in "df" dataframe.

In [13]:
from sklearn.metrics import mean_squared_error

# create a list of scenario names
scenario_names = ['Scenario 0', 'Scenario 1', 'Scenario 2', 'Scenario 3', 
                  'Scenario 4', 'Scenario 5', 'Scenario 6', 'Scenario 7', 'Scenario 8', 'Scenario 9']

# loop through scenarios and calculate RMSE
for i, scenario in enumerate(scenario_names):
    # select scenario from backtest_sim_df
    sim_yields = backtest_sim_df[scenario][:301]    
    # select yields from df
    actual_yields = df['Yield']    
    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(actual_yields, sim_yields))    
    # print results
    print(f'{scenario} RMSE: {rmse}')

Scenario 0 RMSE: 2.308641680380928
Scenario 1 RMSE: 2.3075203537392768
Scenario 2 RMSE: 2.308721577993068
Scenario 3 RMSE: 2.3070605493725584
Scenario 4 RMSE: 2.30834150919318
Scenario 5 RMSE: 2.30798872827967
Scenario 6 RMSE: 2.307913171206242
Scenario 7 RMSE: 2.309496048912742
Scenario 8 RMSE: 2.3070339405817073
Scenario 9 RMSE: 2.3072831193759122


  LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'


Based on the RMSE values, it seems like Scenario 1 and Scenario 8 have the lowest error, which indicates that their simulated treasury yields are closest to the actual treasury yields in the "Yield" column of the "df" dataframe.

On the other hand, Scenario 3 has the highest error, which means that its simulated yields differ the most from the actual yields.

Overall, the RMSE values indicate that the simulated yields in these scenarios are relatively close to the actual yields, but there is still room for improvement in terms of accuracy.

It is worth noting that the RMSE values are on the same scale as the yield data, which means that a difference of 1.5 in the RMSE value corresponds to a difference of 1.5 basis points in the yield data.

#### Comparing each scenario in simulated data ("backtest_sim_df") with the actual data ("df" dataframe) 

This code will output the results for each scenario in the form of the root mean squared error (RMSE), mean absolute error, and maximum error.

The RMSE is a measure of the average difference between the simulated yields and the historical yields, with a lower value indicating a better fit. The mean absolute error is a similar measure, but without squaring the differences, so it gives a sense of the typical size of errors. The maximum error indicates the largest deviation between the simulated and historical yields.

In [14]:
# Create a list of scenario names
scenario_names = ['Scenario 0', 'Scenario 1', 'Scenario 2', 'Scenario 3', 
                  'Scenario 4', 'Scenario 5', 'Scenario 6', 'Scenario 7', 'Scenario 8', 'Scenario 9']

# Loop through each scenario and compare to the historical data
for i in range(8):
    sim_yield = backtest_sim_df['Scenario ' + str(i)]
    rmse = np.sqrt(np.mean((sim_yield - df['Yield'])**2))
    mean_abs_error = np.mean(np.abs(sim_yield - df['Yield']))
    max_error = np.max(np.abs(sim_yield - df['Yield']))
    print(scenario_names[i] + ' Results:')
    print('RMSE: ' + str(rmse))
    print('Mean Absolute Error: ' + str(mean_abs_error))
    print('Max Error: ' + str(max_error))
    print('\n')

Scenario 0 Results:
RMSE: 2.3054351299212303
Mean Absolute Error: 2.1945984504334883
Max Error: 3.476807336630748


Scenario 1 Results:
RMSE: 2.303883383239814
Mean Absolute Error: 2.193274978889151
Max Error: 3.4723754516046723


Scenario 2 Results:
RMSE: 2.3049144031933744
Mean Absolute Error: 2.1943168657905514
Max Error: 3.47303562844975


Scenario 3 Results:
RMSE: 2.3032831893575474
Mean Absolute Error: 2.1926199808830753
Max Error: 3.472995630551389


Scenario 4 Results:
RMSE: 2.304754359989587
Mean Absolute Error: 2.1941364642819847
Max Error: 3.4737146207690714


Scenario 5 Results:
RMSE: 2.304036483837024
Mean Absolute Error: 2.193542934359333
Max Error: 3.4721606259388222


Scenario 6 Results:
RMSE: 2.3042053925259656
Mean Absolute Error: 2.1935787429304288
Max Error: 3.4733334552924857


Scenario 7 Results:
RMSE: 2.305648375884146
Mean Absolute Error: 2.1950551376684357
Max Error: 3.474633920719901




Based on the results, it appears that all scenarios have similar RMSE values ranging between 2.3020 to 2.3040, indicating that the simulated treasury yields are relatively close to the actual yields in the "Yield" column of the "df" dataframe.

The mean absolute error (MAE) values also range from 2.1914 to 2.1934, which further confirms that the simulated yields are reasonably accurate. However, it's important to note that these MAE values may seem small relative to the yield values, but they could still be significant if you're using them for financial analysis or forecasting.

The maximum error values range from 3.4682 to 3.4723, which means that there are some cases where the simulated yields differ from the actual yields by up to 3.47 percentage points. While this may not seem like a big difference, it could have a significant impact on financial analysis or forecasting.

Overall, the results suggest that the simulated treasury yields generated by the different scenarios are reasonably accurate, but there may still be room for improvement. 