# Exercise for PhD students and interested Master students

## Task (A): Getting Euro Area Yield Data

A.1 Load the ECB yield data from 2004 to 2019 and from 2019 to 2020, i.e. the files EuroArea_YC_upto2019.csv and EuroArea_YC_2020.csv.

A.2 From the helper notebook Helper_ECBDataCleaning, import ECB_Yields and convert the loaded ECB yields to spot rates for maturities 3/12, 6/12, 1, 2, 3, 5, 7, 10, 20, 30 years.

A.3 Create a pandas DataFrame called y_ecb that contains spot rates from 2004 to 2020 for all maturities from A.2. Free up memory by deleting variables that you no longer need.


## Task (B): PCA on Euro Area Spot Rates

B.1 Display the cumulative variance explained table for Euro area spot rates from A.3.

B.2 How much variance in Euro area spot rates is explained by the first principal component? How much is explained by the first two principal components?

B.3 Plot the time series of the two most influential principal components.


## Task (C): Forecasting PC1

C.1 Use the BIC criterion to select the optimal lag length p of an AR(p) model for the most influential principal component (PC1). Hint: import statsmodels.tsa.ar_model

C.2 Fit an AR(p) model with the optimal lag length from C.1 to PC1. Hint: you can use, for example, the fit function of the ARMA class in statsmodels.

C.3 Use C.2 to compute $E_T[PC1_{T+k}]$ for $k = 1, 2, \ldots, 2520$.


## Task (D): Rotate PC1 Forecasts into Forecasts of all 10 Spot Rates

D.1 Use C.3 to compute k-period-ahead forecasts for all 10 spot rates for $k = 1, 2, \ldots, 2520$.



In [1]:
import pandas as pd
import numpy as np
import time as time

import matplotlib
import matplotlib.pyplot       as plt
matplotlib.style.use('ggplot')
%matplotlib tk

## Solution A.1

In [2]:
y_ecb_raw_2019 = pd.read_csv('EuroArea_YC_upto2019.csv') #data up to 2019
y_ecb_raw_2020 = pd.read_csv('EuroArea_YC_2020.csv')     #2020 data

## Solution A.2

In [3]:
from ipynb.fs.defs.Helper_ECBDataCleaning import ECB_Yields

#initialize class
ECB_Yields_2019_ = ECB_Yields(y_ecb_raw_2019)
ECB_Yields_2020_ = ECB_Yields(y_ecb_raw_2020)

# maturities of interest
maturities = [3/12, 6/12, 1,2,3,5,7,10,20,30]

#extract spot rates
y_ecb_2019       = ECB_Yields_2019_.ExtractSpotRates(maturities)
y_ecb_2020       = ECB_Yields_2020_.ExtractSpotRates(maturities)

## Solution A.3

In [4]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two samples row-wise
y_ecb = pd.concat([y_ecb_2019, y_ecb_2020])

#free up unneeded storage
del y_ecb_raw_2019
del y_ecb_raw_2020
del y_ecb_2019
del y_ecb_2020
del ECB_Yields_2019_
del ECB_Yields_2020_

In [5]:
y_ecb.head(1)

| TIME_PERIOD | 0.25 Y | 0.5 Y | 1 Y | 2 Y | 3 Y | 5 Y | 7 Y | 10 Y | 20 Y | 30 Y |
|---|---|---|---|---|---|---|---|---|---|---|
| 2004-09-06 | 2.001665 | 2.102528 | 2.297177 | 2.655494 | 2.971161 | 3.483732 | 3.86532 | 4.262767 | 4.853754 | 5.056951 |


## Solution B.1

In [6]:
from ipynb.fs.defs.Helper_PCA import PCA

#initialize class
PCA_ = PCA(y_ecb)

#run PCA
PCA_.PerformEigenValueDecomposition()

In [7]:
#Variance Explained Table
PCA_.display_cumVarTable()

[0.91872932 0.99452827 0.99803265 0.99950891 0.99983503 0.99998474
 0.99999672 0.99999986 0.99999999 1.        ]
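The internals of `Helper_PCA` are not shown in this notebook. As a rough sketch of what a PCA based on an eigenvalue decomposition with a cumulative variance table might look like, the following uses only NumPy on simulated data. The function name `pca_eig` and the simulated level and slope factors are assumptions for illustration, not the helper's actual code.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                  # demean each column
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the columns
    eigval, eigvec = np.linalg.eigh(cov)     # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigval)[::-1]         # reorder to descending variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    pcs = Xc @ eigvec                        # principal component time series
    cum_var = np.cumsum(eigval) / eigval.sum()
    return pcs, eigvec, cum_var

# simulated stand-in for the spot rates: a persistent "level" factor,
# a weaker "slope" factor and small idiosyncratic noise
rng = np.random.default_rng(0)
T, N = 500, 10
level = np.cumsum(rng.normal(size=T)) * 0.1
slope = rng.normal(size=T) * 0.2
slope_loadings = np.linspace(-1.0, 1.0, N)
X = level[:, None] + slope[:, None] * slope_loadings[None, :] + 0.01 * rng.normal(size=(T, N))

pcs, E, cum_var = pca_eig(X)
print(cum_var.round(4))                      # cumulative share of variance explained
```

The i-th entry of `cum_var` is the share of total variance captured by the first i principal components, which is how the array printed above can be read.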


## Solution B.2

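1. The first PC alone explains about 92% of the variation in all yields (including Greek and Italian debt).

2. The first two PCs together explain about 99.5% of the variation, i.e. nearly the entire euro-area yield curve.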

## Solution B.3

In [8]:
#plot PC1,PC2
PCA_.plotPC(2)

## Solution C.1

In [9]:
#packages
import statsmodels.tsa.ar_model as ar_model

#initialize AR class with PC1(y_ecb)
AR_model_ = ar_model.AR(PCA_.PC[:,0])

#BIC test for up to 10 lags
BIC_test = AR_model_.fit(method='mle', ic='bic', maxlags=10)

#optimal BIC lag length
BIC_test.k_ar

statsmodels.tsa.AR has been deprecated in favor of statsmodels.tsa.AutoReg and
statsmodels.tsa.SARIMAX.

AutoReg adds the ability to specify exogenous variables, include time trends,
and add seasonal dummies. The AutoReg API differs from AR since the model is
treated as immutable, and so the entire specification including the lag
length must be specified when creating the model. This change is too
substantial to incorporate into the existing AR api. The function
ar_select_order performs lag length selection for AutoReg models.

AutoReg only estimates parameters using conditional MLE (OLS). Use SARIMAX to
estimate ARX and related models using full MLE via the Kalman Filter.





2
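The warning above notes that the `AR` class is deprecated and points to `ar_select_order` for lag selection. To make the selection rule itself explicit, here is a minimal NumPy-only sketch of choosing p by BIC. It is an illustration under assumptions: the helper `ar_bic`, the simulated AR(2) series and its coefficients are hypothetical, and estimation is by conditional OLS on a common sample rather than the full MLE used in the cell above.

```python
import numpy as np

def ar_bic(y, p, maxlag):
    """BIC of an AR(p) with intercept, fitted by OLS on the common sample t = maxlag, ..., T-1."""
    Y = y[maxlag:]
    # column j holds y_{t-j}, aligned with Y
    lags = [y[maxlag - j: len(y) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(len(Y))] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    n = len(Y)
    sigma2 = resid @ resid / n               # ML estimate of the innovation variance
    return n * np.log(sigma2) + (p + 1) * np.log(n)

# simulated AR(2) series (hypothetical coefficients, stand-in for PC1)
rng = np.random.default_rng(1)
T = 2000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

maxlag = 10
bic = {p: ar_bic(y, p, maxlag) for p in range(1, maxlag + 1)}
best_p = min(bic, key=bic.get)               # lag length with the smallest BIC
print(best_p)
```

Holding the estimation sample fixed across all candidate lag lengths keeps the BIC values comparable, since every model is then scored on the same observations.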

## Solution C.2

In [10]:
#FIT AR(2) to PC1(y_ecb)
#packages
import statsmodels.api as sm

ar2_model = sm.tsa.ARMA(PCA_.PC[:,0], order=(2,0)) #AR(2) for PC1(y_ecb)
ar2_results = ar2_model.fit(method='mle')           #fit AR(2) to PC1(y_ecb) using MLE
print(ar2_results.summary())            # print regression table

                              ARMA Model Results                              
Dep. Variable:                      y   No. Observations:                 4067
Model:                     ARMA(2, 0)   Log Likelihood                3829.335
Method:                           mle   S.D. of innovations              0.094
Date:                Thu, 08 Feb 2024   AIC                          -7650.670
Time:                        16:32:16   BIC                          -7625.428
Sample:                             0   HQIC                         -7641.730
                                                                              
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.7748      4.174     -0.186      0.853      -8.955       7.405
ar.L1.y        1.1399      0.006    205.224      0.000       1.129       1.151
ar.L2.y       -0.1401      0.006    -25.226      0.0

## Solution C.3

In [11]:
#E_T[PC1(T+K)], K \in {1,2,...,2520}
h = 2520 #forecastHorizon
E_PC_h = ar2_results.forecast(steps=h)[0] 
E_PC_h = E_PC_h.reshape((E_PC_h.shape[0],1))
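The multi-step forecast can be understood as iterating the AR(2) recursion forward, with each forecast feeding into the next. The sketch below assumes the model is written in deviations from its unconditional mean `mu`. The coefficient values are taken from the regression table above, while the function `ar2_forecast` and the last two observations `x_last` and `x_prev` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ar2_forecast(x_last, x_prev, mu, phi1, phi2, h):
    """Iterated forecasts E_T[x_{T+k}], k = 1..h, for an AR(2) in mean-deviation form."""
    d_prev, d_last = x_prev - mu, x_last - mu    # deviations from the mean at T-1 and T
    out = np.empty(h)
    for k in range(h):
        d_next = phi1 * d_last + phi2 * d_prev   # one step of the AR(2) recursion
        out[k] = mu + d_next
        d_prev, d_last = d_last, d_next          # shift the window forward
    return out

mu, phi1, phi2 = -0.7748, 1.1399, -0.1401        # const, ar.L1, ar.L2 from the table above
fc = ar2_forecast(x_last=-3.0, x_prev=-2.9, mu=mu, phi1=phi1, phi2=phi2, h=2520)
print(fc[:3], fc[-1])
```

Because `phi1 + phi2` is very close to one, the forecasts revert to the mean only slowly, even at a horizon of 2520 periods.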

## Solution D.1 

In [12]:
#E_T[y_ecb(T+K)], K \in {1,2,...,h}

# mean: 1x10
mean_y_ecb = y_ecb.mean().to_numpy()
mean_y_ecb = mean_y_ecb.reshape((1,y_ecb.shape[1]))

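# rotate the PC1 forecasts back into spot rates:
# sample mean + E_T[PC1(T+k)] * first column of PCA_.E (the PC1 loadings)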
E_y_h = np.zeros((h,y_ecb.shape[1]))
for i in range(0,h):
    E_y_h[i,:] = mean_y_ecb + E_PC_h[i,0] * PCA_.E[:,0].T
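The loop above implements $E_T[y_{T+k}] = \bar{y} + E_T[PC1_{T+k}] \, e_1^\top$ row by row. As a sketch, the same rotation can be written without a loop using NumPy broadcasting. All arrays below are random stand-ins (assumptions) for `mean_y_ecb`, `PCA_.E[:,0]` and `E_PC_h`, so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
h, n = 5, 10
mean_y = rng.normal(size=(1, n))         # stand-in for the 1 x 10 vector of sample means
e1 = rng.normal(size=n)
e1 /= np.linalg.norm(e1)                 # stand-in for the first eigenvector (unit length)
pc1_fc = rng.normal(size=(h, 1))         # stand-in for the h x 1 PC1 forecasts

# loop version, mirroring the cell above
loop = np.zeros((h, n))
for i in range(h):
    loop[i, :] = mean_y[0] + pc1_fc[i, 0] * e1

# vectorized version: (h x 1) times (1 x n) broadcasts to (h x n)
vec = mean_y + pc1_fc * e1[None, :]

print(np.allclose(loop, vec))            # True
```

Both versions give an h x 10 array whose row k holds the forecast of all 10 spot rates k periods ahead.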