<div style="font-family: Arial; text-align: center;">

## Yield Curve PCA Decomposition

#### Created by Kannan Singaravelu, CQF
#### Edited and PCA Projection amended by Dr. Richard Diamond

# Dimensionality Reduction

One of the main difficulties in today’s environment is being able to visualize data easily. There is too much information, too much news, and too much data. Dimensionality is the number of dimensions, features or input variables in a dataset, and dimensionality reduction means reducing the number of features in a dataset.

**Dimensionality reduction algorithms** project high-dimensional data to a low-dimensional space while retaining as much of the variation as possible. There are two main approaches to dimensionality reduction.

- The first one is known as linear projection which involves linearly projecting data from a high-dimensional space to a low-dimensional space. This includes techniques such as principal component analysis (PCA).
- The second approach is known as manifold learning which is also referred to as nonlinear dimensionality reduction. This includes techniques such as Uniform manifold approximation and projection (UMAP).

Dimensionality reduction techniques help to address the curse of dimensionality.

# Principal Component

PCA is a linear dimensionality reduction technique in which the algorithm finds a low-dimensional representation of the data while retaining as much of the variation as possible, which helps reduce complexity.

The main concept behind the PCA is to consider the correlation among features. If the correlation is very high among a subset of the features, **PCA will attempt to combine the highly correlated features and represent this data with a smaller number of linearly uncorrelated features**. The algorithm keeps performing this correlation reduction, finding the directions of maximum variance in the original high-dimensional data and projecting them onto a smaller dimensional space. These newly derived components are known as principal components.

Investors often refer to movements in the yield curve in terms of three driving factors:

- Level
- Slope
- Curvature

PCA formalizes this viewpoint and allows us to evaluate when a segment of the yield curve has cheapened or richened beyond that prescribed by recent yield movements. The essence of PCA in the context of rates market is that most yield curve movements can be represented as a set of two to three independent driving factors – the principal components (PCs) – along with their relative weightings. And, with these components, it is possible to reconstruct the original features.

We'll apply PCA to the set of yield curves fitted using the HJM model as discussed during the lecture. The PCs are ordered so that the first PC is the most important in capturing variability in the yield curves, the second PC is next most important, and so on.

The most intuitive way of obtaining PCs is via eigenvalue decomposition of a covariance matrix. Covariance measures how two variables deviate from their respective means together; it is a measure of joint variability, not of central tendency. Intuitively, PCs represent ways in which the forward rates making up a yield curve can deviate from their mean levels.
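
As a minimal sketch of this idea on synthetic data (the array `X` and its factor loadings are made up for illustration, and `np.linalg.eigh` is used here because a covariance matrix is symmetric), the PCs and their share of variance could be obtained as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 observations of 3 features driven by one common factor
common = rng.normal(size=(500, 1))
X = common @ np.array([[1.0, 0.8, 0.6]]) + 0.1 * rng.normal(size=(500, 3))

# Covariance matrix of the features (columns are variables)
cov = np.cov(X, rowvar=False)

# Eigen decomposition; eigh assumes a symmetric matrix and returns ascending eigenvalues
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so that the first PC carries the largest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance carried by each PC
explained = eigvals / eigvals.sum()
print(explained.round(4))
```

Because the three toy features share one driver, almost all of the variance should land on the first component, which is the same pattern the yield curve data shows later on.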



In [3]:
# Import libraries
import numpy as np
import pandas as pd

# Cufflinks library allows direct plotting of Plotly interactive charts from Dataframes
# Plot settings
import cufflinks as cf
cf.set_config_file(offline=True)

# Heatmap of covariance matrix
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# scikit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

pd.set_option('display.max_rows', 5000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)

## Datasets

In [4]:
data = pd.read_csv('./20240624_Py_hjm_pca_2002-07.csv', index_col=0, sep ='\t')

In [5]:
data.head()

Unnamed: 0,0.08,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5,11.0,11.5,12.0,12.5,13.0,13.5,14.0,14.5,15.0,15.5,16.0,16.5,17.0,17.5,18.0,18.5,19.0,19.5,20.0,20.5,21.0,21.5,22.0,22.5,23.0,23.5,24.0,24.5,25.0
1,5.77,6.44,6.71,6.65,6.5,6.33,6.15,5.99,5.84,5.71,5.57,5.44,5.3,5.16,5.01,4.86,4.71,4.55,4.39,4.24,4.09,3.94,3.81,3.68,3.57,3.46,3.37,3.29,3.23,3.18,3.15,3.13,3.12,3.12,3.13,3.16,3.19,3.22,3.27,3.31,3.36,3.42,3.48,3.54,3.6,3.66,3.73,3.79,3.86,3.92,3.99
2,5.77,6.45,6.75,6.68,6.54,6.39,6.23,6.08,5.95,5.82,5.69,5.56,5.43,5.28,5.13,4.97,4.8,4.63,4.46,4.29,4.13,3.97,3.82,3.68,3.55,3.44,3.33,3.25,3.18,3.12,3.08,3.06,3.05,3.05,3.06,3.09,3.12,3.16,3.21,3.26,3.32,3.38,3.44,3.51,3.58,3.65,3.72,3.8,3.87,3.95,4.02
3,5.78,6.44,6.74,6.68,6.56,6.41,6.26,6.12,5.98,5.84,5.71,5.57,5.43,5.28,5.12,4.96,4.79,4.62,4.45,4.28,4.11,3.95,3.8,3.66,3.53,3.41,3.31,3.22,3.14,3.08,3.04,3.01,2.99,2.99,3.0,3.02,3.04,3.08,3.12,3.16,3.22,3.27,3.33,3.39,3.45,3.52,3.59,3.65,3.72,3.79,3.86
4,5.74,6.41,6.69,6.62,6.49,6.35,6.2,6.06,5.93,5.79,5.66,5.52,5.38,5.23,5.07,4.91,4.74,4.57,4.4,4.23,4.06,3.91,3.75,3.61,3.48,3.36,3.25,3.15,3.07,3.01,2.96,2.92,2.9,2.89,2.89,2.9,2.92,2.95,2.99,3.03,3.08,3.13,3.19,3.25,3.31,3.37,3.44,3.51,3.58,3.65,3.72
5,5.74,6.4,6.64,6.55,6.42,6.27,6.13,5.98,5.85,5.72,5.58,5.44,5.3,5.15,5.0,4.83,4.67,4.5,4.33,4.17,4.0,3.85,3.7,3.56,3.42,3.3,3.19,3.09,3.01,2.94,2.89,2.85,2.82,2.8,2.8,2.8,2.82,2.84,2.88,2.92,2.96,3.01,3.07,3.12,3.19,3.25,3.32,3.38,3.45,3.53,3.6


In [6]:
data.shape

(1264, 51)

Each yield curve is represented by 51 forward rates, one per tenor from 0.08 to 25 years. As the yield curve evolves over time, each forward rate can change. It is understood that adjacent points on the yield curve do not move independently. PCA is a method for identifying the dominant ways in which various points on the yield curve move together.

PCA allows us to take a set of yield curves, process them using standard mathematical methods, and then define a reduced form model for the yield curve. This reduced form model retains only a small number of principal components (PCs) but can reproduce the vast majority of yield curves that the full structural model could. This reduced model has fewer sources of uncertainty (i.e. dimensions) than if the 51 points of the yield curve were modelled independently.

## Plot curves

In [7]:
# Plot curve
data.iloc[0].iplot(title = 'Representation of a Yield Curve')

In [8]:
# Plot all curves
data.iloc[:,].T.iplot(title='Daily Yield Curves')

We'll now produce the volatility chart by taking the first difference (scaling) and calculating the historical volatility for each individual maturity.

In [9]:
diff_ = data.diff(-1)
diff_.dropna(inplace=True)
diff_.tail()

Unnamed: 0,0.08,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5,11.0,11.5,12.0,12.5,13.0,13.5,14.0,14.5,15.0,15.5,16.0,16.5,17.0,17.5,18.0,18.5,19.0,19.5,20.0,20.5,21.0,21.5,22.0,22.5,23.0,23.5,24.0,24.5,25.0
1259,0.0,0.03,0.04,0.03,0.02,0.02,0.01,0.01,0.0,0.0,0.0,0.0,-0.01,0.0,-0.01,0.0,0.0,0.0,0.0,0.0,0.0,-0.01,0.0,0.0,-0.01,0.0,0.0,0.0,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.02,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,0.0,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01
1260,0.02,0.01,0.0,0.0,0.0,-0.01,-0.01,-0.01,0.0,-0.01,-0.01,-0.01,0.0,0.0,0.0,-0.01,-0.01,-0.01,-0.01,0.0,-0.01,0.0,0.0,0.0,0.0,-0.01,-0.01,-0.01,0.0,0.0,-0.01,0.0,-0.01,0.0,-0.01,0.0,0.0,-0.01,-0.01,-0.01,-0.01,0.0,0.0,-0.01,-0.01,0.0,0.0,0.0,0.0,0.0,0.0
1261,-0.01,-0.03,-0.08,-0.12,-0.13,-0.13,-0.13,-0.13,-0.14,-0.13,-0.14,-0.14,-0.14,-0.14,-0.14,-0.14,-0.13,-0.14,-0.13,-0.14,-0.13,-0.13,-0.13,-0.13,-0.12,-0.11,-0.11,-0.11,-0.11,-0.11,-0.1,-0.1,-0.1,-0.1,-0.1,-0.1,-0.1,-0.1,-0.09,-0.09,-0.09,-0.1,-0.1,-0.1,-0.09,-0.1,-0.1,-0.1,-0.1,-0.1,-0.1
1262,0.0,0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02,0.01,0.02,0.02,0.01,0.01,0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.0
1263,0.02,0.0,0.03,0.03,0.04,0.04,0.05,0.06,0.06,0.06,0.07,0.07,0.06,0.05,0.05,0.05,0.04,0.04,0.03,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.03,0.02,0.02,0.03,0.03,0.03,0.04,0.04,0.04,0.04,0.04,0.04,0.05,0.05,0.04,0.05,0.04,0.05,0.05,0.05,0.05,0.05,0.05,0.06


In [10]:
diff_.shape

(1263, 51)

## Derive Volatility
In the HJM framework, the drift of the forward rate is fully determined by the volatility of the forward-rate dynamics.
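
A minimal numerical sketch of this statement, assuming a single volatility factor under the risk-neutral measure and ignoring the slope term that appears when rates are indexed by time to maturity: the no-arbitrage drift at tenor $\tau$ is $\sigma(\tau)\int_0^\tau \sigma(s)\,ds$. The function name `hjm_drift`, the tenor grid and the flat 1% volatility are illustrative assumptions, not the lecture's calibration.

```python
import numpy as np

def hjm_drift(tau, sigma):
    """Drift m(tau) = sigma(tau) * integral_0^tau sigma(s) ds for one volatility factor."""
    # Cumulative trapezoidal integral of sigma from the first grid point up to each tau
    increments = 0.5 * (sigma[1:] + sigma[:-1]) * np.diff(tau)
    integral = np.concatenate(([0.0], np.cumsum(increments)))
    return sigma * integral

# Hypothetical inputs: tenor grid starting at 0 and a flat 1% volatility
tau = np.linspace(0.0, 25.0, 51)
sigma = np.full_like(tau, 0.01)

drift = hjm_drift(tau, sigma)
print(drift[-1])  # for flat volatility the drift equals sigma**2 * tau
```

With several factors, the same calculation would be repeated per factor and the results summed, so once the volatility functions are known no separate drift estimate is needed.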

In [11]:
vol = np.std(diff_, axis=0) * 10000

In [12]:
vol[:].iplot(title='Volatility of daily UK government yields', xTitle='Tenor', yTitle='Volatility (bps)',
         color='cornflowerblue')

The volatility plot above shows averaged values, but we can see that different parts of the yield curve move differently. Volatility is highest at the shorter end of the curve: the 1-year and 2-year rates seem to move up and down a lot compared with other tenors.

It is never all up or all down, and PCA helps us figure out exactly what is going on. The covariance of daily changes shows the dependency between different rates. Principal components can be calculated by finding the eigenvalues and eigenvectors of the covariance matrix below.

PCA decomposes the volatility.
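
One way to see this decomposition, sketched here on hypothetical random data rather than the forward curves: the eigenvalues of the covariance matrix add up to its trace, i.e. to the sum of the per-tenor variances. The array `changes` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily changes: 250 days x 5 tenors, with correlated columns
changes = rng.normal(size=(250, 5)) @ rng.normal(size=(5, 5))

cov = np.cov(changes, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)      # eigenvalues of a symmetric matrix

total_variance = np.trace(cov)         # sum of the per-tenor variances
print(np.isclose(eigvals.sum(), total_variance))
```

This identity is what justifies reading each eigenvalue divided by the sum of all eigenvalues as a share of total variance.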

## Calculate Covariance

In [13]:
cov_= pd.DataFrame(np.cov(diff_, rowvar=False)*252/10000, columns=diff_.columns, index=diff_.columns)
cov_.style.format("{:.4%}")

Unnamed: 0,0.08,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5,11.0,11.5,12.0,12.5,13.0,13.5,14.0,14.5,15.0,15.5,16.0,16.5,17.0,17.5,18.0,18.5,19.0,19.5,20.0,20.5,21.0,21.5,22.0,22.5,23.0,23.5,24.0,24.5,25.0
0.08,0.0040%,0.0009%,0.0002%,-0.0001%,-0.0001%,-0.0000%,0.0001%,0.0001%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0002%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0000%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%,0.0001%
0.5,0.0009%,0.0063%,0.0055%,0.0041%,0.0035%,0.0033%,0.0031%,0.0029%,0.0028%,0.0027%,0.0026%,0.0025%,0.0024%,0.0022%,0.0021%,0.0020%,0.0019%,0.0018%,0.0017%,0.0016%,0.0015%,0.0014%,0.0013%,0.0012%,0.0011%,0.0011%,0.0010%,0.0009%,0.0009%,0.0008%,0.0008%,0.0008%,0.0007%,0.0008%,0.0008%,0.0008%,0.0008%,0.0008%,0.0008%,0.0009%,0.0009%,0.0009%,0.0010%,0.0010%,0.0010%,0.0011%,0.0012%,0.0012%,0.0012%,0.0013%,0.0013%
1.0,0.0002%,0.0055%,0.0082%,0.0077%,0.0068%,0.0061%,0.0056%,0.0052%,0.0048%,0.0045%,0.0042%,0.0040%,0.0038%,0.0036%,0.0035%,0.0033%,0.0032%,0.0031%,0.0029%,0.0028%,0.0027%,0.0026%,0.0025%,0.0023%,0.0022%,0.0021%,0.0020%,0.0020%,0.0019%,0.0018%,0.0018%,0.0017%,0.0017%,0.0017%,0.0017%,0.0017%,0.0017%,0.0017%,0.0017%,0.0018%,0.0018%,0.0018%,0.0019%,0.0019%,0.0020%,0.0021%,0.0021%,0.0022%,0.0022%,0.0023%,0.0024%
1.5,-0.0001%,0.0041%,0.0077%,0.0082%,0.0075%,0.0069%,0.0063%,0.0058%,0.0055%,0.0051%,0.0049%,0.0046%,0.0044%,0.0042%,0.0041%,0.0039%,0.0038%,0.0036%,0.0035%,0.0034%,0.0032%,0.0031%,0.0029%,0.0028%,0.0027%,0.0026%,0.0025%,0.0024%,0.0023%,0.0022%,0.0022%,0.0021%,0.0021%,0.0021%,0.0021%,0.0021%,0.0021%,0.0021%,0.0021%,0.0022%,0.0022%,0.0022%,0.0023%,0.0023%,0.0025%,0.0025%,0.0026%,0.0026%,0.0027%,0.0027%,0.0028%
2.0,-0.0001%,0.0035%,0.0068%,0.0075%,0.0072%,0.0067%,0.0063%,0.0059%,0.0056%,0.0054%,0.0051%,0.0049%,0.0047%,0.0046%,0.0044%,0.0043%,0.0041%,0.0039%,0.0038%,0.0036%,0.0035%,0.0033%,0.0031%,0.0030%,0.0029%,0.0027%,0.0026%,0.0025%,0.0024%,0.0023%,0.0023%,0.0022%,0.0022%,0.0022%,0.0022%,0.0022%,0.0022%,0.0022%,0.0022%,0.0023%,0.0023%,0.0024%,0.0025%,0.0025%,0.0026%,0.0026%,0.0027%,0.0028%,0.0028%,0.0029%,0.0030%
2.5,-0.0000%,0.0033%,0.0061%,0.0069%,0.0067%,0.0065%,0.0062%,0.0060%,0.0058%,0.0055%,0.0054%,0.0052%,0.0051%,0.0049%,0.0048%,0.0046%,0.0044%,0.0042%,0.0041%,0.0039%,0.0037%,0.0035%,0.0034%,0.0032%,0.0030%,0.0029%,0.0028%,0.0027%,0.0026%,0.0025%,0.0024%,0.0023%,0.0023%,0.0023%,0.0023%,0.0023%,0.0023%,0.0023%,0.0024%,0.0024%,0.0024%,0.0025%,0.0026%,0.0026%,0.0027%,0.0028%,0.0028%,0.0029%,0.0030%,0.0031%,0.0032%
3.0,0.0001%,0.0031%,0.0056%,0.0063%,0.0063%,0.0062%,0.0061%,0.0060%,0.0058%,0.0057%,0.0056%,0.0054%,0.0053%,0.0052%,0.0051%,0.0049%,0.0047%,0.0045%,0.0043%,0.0042%,0.0040%,0.0037%,0.0036%,0.0033%,0.0032%,0.0030%,0.0029%,0.0028%,0.0026%,0.0025%,0.0025%,0.0024%,0.0024%,0.0024%,0.0023%,0.0024%,0.0024%,0.0024%,0.0024%,0.0025%,0.0025%,0.0026%,0.0027%,0.0027%,0.0028%,0.0029%,0.0029%,0.0030%,0.0031%,0.0032%,0.0033%
3.5,0.0001%,0.0029%,0.0052%,0.0058%,0.0059%,0.0060%,0.0060%,0.0060%,0.0059%,0.0058%,0.0058%,0.0057%,0.0056%,0.0055%,0.0054%,0.0052%,0.0050%,0.0048%,0.0046%,0.0044%,0.0042%,0.0040%,0.0038%,0.0035%,0.0033%,0.0032%,0.0030%,0.0029%,0.0028%,0.0026%,0.0026%,0.0025%,0.0024%,0.0025%,0.0024%,0.0025%,0.0024%,0.0025%,0.0025%,0.0026%,0.0026%,0.0027%,0.0028%,0.0028%,0.0029%,0.0030%,0.0031%,0.0032%,0.0032%,0.0033%,0.0034%
4.0,0.0002%,0.0028%,0.0048%,0.0055%,0.0056%,0.0058%,0.0058%,0.0059%,0.0059%,0.0059%,0.0059%,0.0058%,0.0058%,0.0057%,0.0056%,0.0055%,0.0053%,0.0051%,0.0049%,0.0047%,0.0044%,0.0042%,0.0040%,0.0037%,0.0035%,0.0033%,0.0031%,0.0030%,0.0029%,0.0027%,0.0027%,0.0026%,0.0025%,0.0025%,0.0025%,0.0025%,0.0025%,0.0026%,0.0026%,0.0026%,0.0027%,0.0028%,0.0029%,0.0029%,0.0030%,0.0031%,0.0032%,0.0033%,0.0033%,0.0034%,0.0035%
4.5,0.0002%,0.0027%,0.0045%,0.0051%,0.0054%,0.0055%,0.0057%,0.0058%,0.0059%,0.0060%,0.0060%,0.0060%,0.0060%,0.0059%,0.0058%,0.0057%,0.0055%,0.0053%,0.0051%,0.0049%,0.0046%,0.0044%,0.0042%,0.0039%,0.0037%,0.0035%,0.0033%,0.0031%,0.0030%,0.0029%,0.0028%,0.0027%,0.0026%,0.0026%,0.0026%,0.0026%,0.0026%,0.0027%,0.0027%,0.0027%,0.0028%,0.0028%,0.0030%,0.0030%,0.0031%,0.0032%,0.0033%,0.0034%,0.0035%,0.0036%,0.0037%


In [21]:
# Heatmap appropriate for Covariance Matrix
fig_matrix = go.Figure(data=go.Heatmap(z=cov_, colorscale='Viridis'))
fig_matrix.update_layout(title='Covariance Matrix Heatmap')
fig_matrix.show()

In [23]:
# 3D Surface Plot with larger dimensions
x, y = np.meshgrid(cov_.columns, cov_.index)
fig_surface = make_subplots(rows=1, cols=1, specs=[[{'type': 'surface'}]])
fig_surface.add_trace(go.Surface(z=cov_.values, x=x, y=y, colorscale='Viridis'))

# Update layout for larger dimensions
fig_surface.update_layout(title='Covariance 3D Surface Plot (rotate)',
                          scene=dict(
                              xaxis=dict(title='X Axis'),
                              yaxis=dict(title='Y Axis'),
                              zaxis=dict(title='Z Axis'),
                          ),
                          width=800,  # Adjust width as needed
                          height=600  # Adjust height as needed
                          )

# Show the plot
fig_surface.show()

# Observation: if we remove the 0.08 tenor (where covariance peaks),
# we are likely to have better behaviour of the Covariance Matrix

## Eigen Decomposition for Singular Value Decomposition

In [24]:
# Perform eigen decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_)

# Sort values (good practice)
idx = eigenvalues.argsort()[::-1]   
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]

# Format into a DataFrame 
df_eigval = pd.DataFrame({"Eigenvalues": eigenvalues})

df_eigval.head()

Unnamed: 0,Eigenvalues
0,0.002029
1,0.000463
2,0.000163
3,8.5e-05
4,5.1e-05


In [26]:
eigenvalues

array([2.02898049e-03, 4.63398406e-04, 1.63446845e-04, 8.51547101e-05,
       5.10538526e-05, 3.32765289e-05, 1.58231855e-05, 4.49832087e-06,
       1.94407432e-06, 8.99455051e-07, 6.04790270e-07, 5.90792253e-07,
       5.89198637e-07, 5.57023543e-07, 5.55577838e-07, 5.37017622e-07,
       5.25225242e-07, 5.09484922e-07, 5.02130032e-07, 4.95037888e-07,
       4.85536393e-07, 4.74757652e-07, 4.66830631e-07, 4.56358980e-07,
       4.53910470e-07, 4.45678829e-07, 4.35704316e-07, 4.34084479e-07,
       4.26484963e-07, 4.13347804e-07, 4.01916308e-07, 3.97702101e-07,
       3.90292851e-07, 3.86498129e-07, 3.76760528e-07, 3.73179456e-07,
       3.63351112e-07, 3.57997757e-07, 3.48773694e-07, 3.42142905e-07,
       3.35540502e-07, 3.27434287e-07, 3.20549997e-07, 3.13802097e-07,
       3.06870950e-07, 3.04664148e-07, 2.99586146e-07, 2.88553566e-07,
       2.83944056e-07, 2.67537628e-07, 2.48780504e-07])

## Explained Variance $R^2$ as Sum of Eigenvalues

In [27]:
# Work out explained proportion 
df_eigval["Explained proportion of Var"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])
df_eigval = df_eigval[:10]

#Format as percentage
df_eigval.style.format({"Explained proportion of Var": "{:.2%}"})

Unnamed: 0,Eigenvalues,Explained proportion of Var
0,0.002029,70.81%
1,0.000463,16.17%
2,0.000163,5.70%
3,8.5e-05,2.97%
4,5.1e-05,1.78%
5,3.3e-05,1.16%
6,1.6e-05,0.55%
7,4e-06,0.16%
8,2e-06,0.07%
9,1e-06,0.03%
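
A short sketch of how these proportions could be used to choose the number of components to retain. The figures are copied from the table above; the 90% threshold is an arbitrary assumption for illustration.

```python
import numpy as np

# Explained proportions (in %) of the first five PCs, copied from the table above
explained_pct = np.array([70.81, 16.17, 5.70, 2.97, 1.78])

# Running total: how much variance the first k components capture together
cumulative_pct = np.cumsum(explained_pct)

# Smallest k whose cumulative share reaches the chosen threshold
threshold = 90.0
k = int(np.argmax(cumulative_pct >= threshold)) + 1
print(cumulative_pct.round(2), k)
```

On these figures the first three components together capture about 92.68% of the variance, which is consistent with the usual three-factor view of the curve.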


In [28]:
(df_eigval["Explained proportion of Var"][:10]*100).iplot(kind='bar',
                                                          title='Percentage of overall variance explained',
                                                          color='cornflowerblue')

## Visualize PCs

In [30]:
# Subsume first 3 components into a dataframe
pcadf = pd.DataFrame(eigenvectors[:,0:3], columns=['PC1','PC2','PC3'], index=data.columns)
pcadf[:10]

Unnamed: 0,PC1,PC2,PC3
0.08,0.004091,-0.008275,0.000235
0.5,0.056204,-0.161934,-0.271539
1.0,0.101034,-0.239236,-0.401805
1.5,0.116817,-0.243675,-0.357226
2.0,0.121388,-0.235475,-0.275176
2.5,0.12589,-0.226757,-0.195816
3.0,0.129107,-0.219537,-0.123907
3.5,0.133088,-0.211509,-0.062428
4.0,0.136317,-0.204675,-0.007698
4.5,0.139725,-0.197136,0.041132


In [35]:
pcadf.iplot(title='First 3 Principal Components for Forward Curve (HJM Lecture) UNSCALED',
            secondary_y='PC1',
            secondary_y_title='PC1', 
            yTitle='change in yield (bps)')

One of the key interpretations of PCA as applied to interest rates concerns the components of the yield curve. We can attribute the first three principal components to

- PC1: Parallel shifts in yield curve (shifts across the entire yield curve)
- PC2: Changes in short/long rates (steepening/flattening of the curve)
- PC3: Changes in curvature of the model (twists)

The **first PC** represents the situation that all forward rates in the yield curve move in the same direction but points around the 15 year term move more than points at the shorter or longer parts of the yield curve. This corresponds to a general rise (or fall) of all of the forward rates in the yield curve, but cannot be called a uniform or parallel shift. The impact of the first PC can be easily observed amongst the yield curves as it contributes about 71% (70.81%) of the variability.

The **second PC** represents situations in which the short end of the yield curve moves up at the same time as the long end moves down, or vice versa. This is often described as a tilt in the yield curve, although in practice there is more subtle definition to the shape. This reflects the particular yield curves that were used for the analysis, as well as the structural model and calibration that were used to create them. In this example, the influence of the second PC accounts for about 16.17% of the variability in the yield curves.

The **third PC** is further interpreted as a higher order buckling in which the short end and long end move up at the same time as a region of medium term rates moves down, or vice versa. In this particular example, this type of movement is only responsible for about 5.70% of the variability.

Having identified the most important factors, we can use their functional form to predict the most likely evolution of the yield curve. Thus, a simple linear regression is fitted for the shift factor, as it simply moves the curve up and down. A second-degree polynomial is fitted for the tilt factor, and a higher degree can approximate flexing.
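
A minimal sketch of such a fit, using `np.polyfit` on the PC1 loadings printed earlier for tenors 0.5Y to 4.5Y only. Restricting the fit to this front-end segment and the helper name `fit_loading` are assumptions made for illustration; a full fit would use all tenors.

```python
import numpy as np

# PC1 loadings for tenors 0.5Y to 4.5Y, copied from the table of the first three PCs
tenors = np.arange(0.5, 5.0, 0.5)
pc1 = np.array([0.056204, 0.101034, 0.116817, 0.121388, 0.12589,
                0.129107, 0.133088, 0.136317, 0.139725])

def fit_loading(x, y, degree):
    """Least-squares polynomial fit; returns fitted values and sum of squared residuals."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    return fitted, float(np.sum((y - fitted) ** 2))

_, sse_linear = fit_loading(tenors, pc1, 1)
_, sse_quadratic = fit_loading(tenors, pc1, 2)
print(sse_linear, sse_quadratic)
```

Comparing the residuals across degrees gives a rough indication of how much curvature a given loading needs.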

Thus, the yield curve can be approximated by a linear combination of the first three loadings.
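
The approximation can be sketched on synthetic data as follows. The three factor shapes, the noise level and the variable names are hypothetical; the point is only that projecting onto three loadings and mapping back recovers the curves up to the noise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical curve changes: 3 smooth factors across 20 tenors plus small noise
tenors = np.linspace(0.5, 10.0, 20)
centred = tenors - tenors.mean()
shapes = np.vstack([np.ones_like(tenors), centred, centred ** 2])
X = rng.normal(size=(300, 3)) @ shapes + 0.01 * rng.normal(size=(300, 20))

# Centre the data and take the eigenvectors of its covariance matrix
mean = X.mean(axis=0)
Xc = X - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
V3 = eigvecs[:, ::-1][:, :3]           # loadings of the three largest PCs

# Scores on the three PCs, then the rank-3 reconstruction of every curve
scores = Xc @ V3
X_hat = mean + scores @ V3.T

rmse = np.sqrt(np.mean((X - X_hat) ** 2))
print(rmse)
```

The reconstruction error should be of the same order as the noise that was added, since everything else lives in the three-factor subspace.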

---

# UK Government Bond Rates

The purpose of applying PCA to financial markets is to explain the price changes of different assets through a smaller set of factors. This is achieved via dimensionality reduction of the observations, where we pick the meaningful factors (among many) that explain most of the price changes. We'll now apply principal component analysis to UK government bond spot rates from 0.5 years up to 10 years to maturity.

We'll adopt two methods to decompose the yield curve (SVD of covariance matrix using two Python functionalities):
- `numpy.linalg`
- `sklearn.decomposition.PCA`

We will remember to scale the data in both implementations.

Gilts Curve MONTHLY 1970 to 2015

- 20240624_Py_gilts_spot_1970-2015.xlsx has MONTHLY spot curves for the Government Liability Curve (GLC), stripped from UK Treasury Gilts (Excel sheet "4. spot curve"). The period is much longer, from January 1970 (the oldest curves are now in the top lines) to December 2015;

- we limit our analysis to the $[1Y, 10.5Y]$ chunk of the spot curve. The likely implication is that we end up with limited usefulness of PCA and a very strong PC1;

- looking into the data, the first column at tenor 0.5 has a lot of missing values. With MONTHLY frequency, that would mean a lot of monthly curves thrown out of the analysis, particularly as we cut the curve to the front end.

## Method 1: Eigen Decomposition

In [39]:
# Import Bank of England spot curve data from excel
df = pd.read_excel("./20240624_Py_gilts_spot_1970-2015.xlsx", 
                   index_col=0, header=3, sheet_name="4. spot curve", skiprows=[4])

# Select the 20 tenors from 1.0 to 10.5 years
df = df.iloc[:, 1:21]  # skip first column at 0.5 because values are all NaN

df.head()

years:,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5
1970-01-31,8.635354,8.70743,8.700727,8.664049,8.618702,8.572477,8.528372,8.487617,8.450611,8.417442,8.388098,8.362503,8.340549,8.322116,8.307105,8.295429,8.287013,8.281788,8.279691,8.280665
1970-02-28,8.413131,8.397269,8.370748,8.337633,8.30159,8.265403,8.230804,8.198713,8.169617,8.143742,8.121153,8.10181,8.085616,8.072457,8.062236,8.054864,8.050261,8.048354,8.049074,8.052353
1970-03-31,7.744187,7.782761,7.795017,7.793104,7.784963,7.775288,7.766459,7.759564,7.755068,7.753158,7.753877,7.757181,7.762973,7.771153,7.781635,7.794347,7.809221,7.826197,7.84522,7.866238
1970-04-30,7.606512,7.864352,7.973522,8.002442,7.992813,7.967524,7.938335,7.911422,7.890054,7.875751,7.868985,7.869583,7.877024,7.890789,7.910452,7.935656,7.966093,8.00149,8.041602,8.086202
1970-05-31,7.391107,7.735838,7.862182,7.87751,7.840673,7.782249,7.718053,7.656856,7.603548,7.560502,7.528577,7.507706,7.497355,7.496983,7.506125,7.524371,7.551351,7.586723,7.630168,7.681382


In [40]:
# Drop nan values
df = df.dropna(how="any")
df.shape

(550, 20)

In [41]:
# Standardized data
scaler = StandardScaler()
scaler.fit(df)

df_std = pd.DataFrame(scaler.transform(df), columns=df.columns)
df_std.head()

Unnamed: 0,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5
0,0.438865,0.440632,0.418813,0.390185,0.360774,0.33232,0.305297,0.279832,0.255942,0.233643,0.212977,0.193991,0.176715,0.161164,0.147336,0.135218,0.124781,0.115988,0.108793,0.103146
1,0.381957,0.360305,0.332602,0.304217,0.276632,0.250291,0.225334,0.201798,0.179716,0.159129,0.140089,0.12264,0.106806,0.092595,0.079998,0.068996,0.059556,0.051636,0.045186,0.040154
2,0.210651,0.201157,0.182185,0.160805,0.139552,0.119366,0.100553,0.083182,0.06726,0.052794,0.039806,0.028317,0.01833,0.009841,0.002834,-0.002718,-0.006851,-0.009608,-0.011039,-0.011196
3,0.175394,0.222287,0.228822,0.215938,0.194702,0.170718,0.14674,0.1242,0.103878,0.08617,0.071236,0.059081,0.049606,0.042699,0.038258,0.036181,0.036367,0.038716,0.043125,0.049493
4,0.120232,0.189004,0.199733,0.183035,0.154334,0.121225,0.087545,0.05544,0.026157,0.000344,-0.021711,-0.039964,-0.054509,-0.06546,-0.072931,-0.077037,-0.077895,-0.075626,-0.070353,-0.062199


## Covariance Matrix (Scaled Data)

In [43]:
# Create a covariance matrix 
cov_matrix_array = np.cov(df_std, rowvar=False)
cov_df1 = pd.DataFrame(cov_matrix_array, columns=df.columns , index =df.columns ) #, index=range(1,21), columns=range(1,21))
cov_df1.style.format("{:.4}")

Unnamed: 0,1.000000,1.500000,2.000000,2.500000,3.000000,3.500000,4.000000,4.500000,5.000000,5.500000,6.000000,6.500000,7.000000,7.500000,8.000000,8.500000,9.000000,9.500000,10.000000,10.500000
1.0,1.002,0.9998,0.9958,0.9912,0.9866,0.9822,0.9779,0.9739,0.9699,0.9659,0.9619,0.9578,0.9537,0.9494,0.9451,0.9407,0.9362,0.9315,0.9267,0.9218
1.5,0.9998,1.002,1.001,0.9982,0.9951,0.9919,0.9886,0.9853,0.982,0.9786,0.9751,0.9716,0.9679,0.9641,0.9602,0.9562,0.952,0.9478,0.9434,0.9389
2.0,0.9958,1.001,1.002,1.001,0.9994,0.9973,0.995,0.9924,0.9897,0.9869,0.9839,0.9808,0.9775,0.9742,0.9706,0.967,0.9632,0.9593,0.9553,0.9511
2.5,0.9912,0.9982,1.001,1.002,1.001,1.0,0.9986,0.9967,0.9946,0.9923,0.9898,0.9871,0.9842,0.9812,0.978,0.9747,0.9712,0.9677,0.9639,0.9601
3.0,0.9866,0.9951,0.9994,1.001,1.002,1.001,1.001,0.9993,0.9977,0.9959,0.9938,0.9915,0.989,0.9863,0.9834,0.9804,0.9773,0.974,0.9705,0.967
3.5,0.9822,0.9919,0.9973,1.0,1.001,1.002,1.002,1.001,0.9997,0.9983,0.9966,0.9947,0.9925,0.9902,0.9876,0.9849,0.982,0.979,0.9759,0.9725
4.0,0.9779,0.9886,0.995,0.9986,1.001,1.002,1.002,1.002,1.001,1.0,0.9987,0.9971,0.9953,0.9932,0.991,0.9886,0.986,0.9832,0.9803,0.9772
4.5,0.9739,0.9853,0.9924,0.9967,0.9993,1.001,1.002,1.002,1.002,1.001,1.0,0.9989,0.9974,0.9957,0.9938,0.9917,0.9893,0.9868,0.9842,0.9813
5.0,0.9699,0.982,0.9897,0.9946,0.9977,0.9997,1.001,1.002,1.002,1.002,1.001,1.0,0.9991,0.9977,0.9961,0.9942,0.9922,0.9899,0.9875,0.9849
5.5,0.9659,0.9786,0.9869,0.9923,0.9959,0.9983,1.0,1.001,1.002,1.002,1.002,1.001,1.0,0.9993,0.9979,0.9964,0.9946,0.9926,0.9904,0.9881
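
The diagonal of the covariance matrix above reads 1.002 rather than 1. A likely explanation, sketched below on hypothetical data with the same number of rows (550), is that `StandardScaler` divides by the population standard deviation (`ddof=0`) while `np.cov` divides by `n - 1`, so the result is the correlation matrix scaled by `n / (n - 1)`. The standardisation is reproduced with plain NumPy here to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data with the same number of rows as the gilts sample (550)
n = 550
X = rng.normal(size=(n, 4)) @ rng.normal(size=(4, 4))

# Standardise as StandardScaler does: subtract the mean, divide by the population std (ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

cov_std = np.cov(Z, rowvar=False)      # np.cov divides by n - 1 by default
corr = np.corrcoef(X, rowvar=False)

print(cov_std[0, 0], n / (n - 1))
```

In other words, PCA on standardised data is PCA on the correlation matrix, up to a constant factor that does not change the eigenvectors or the explained proportions.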


In [44]:
# 3D Surface Plot with larger dimensions
x, y = np.meshgrid(cov_df1.columns, cov_df1.index)
fig_surface = make_subplots(rows=1, cols=1, specs=[[{'type': 'surface'}]])
fig_surface.add_trace(go.Surface(z=cov_df1.values, x=x, y=y, colorscale='Viridis'))

# Update layout for larger dimensions
fig_surface.update_layout(title='Covariance 3D Surface Plot (rotate)',
                          scene=dict(
                              xaxis=dict(title='X Axis'),
                              yaxis=dict(title='Y Axis'),
                              zaxis=dict(title='Z Axis'),
                          ),
                          width=800,  # Adjust width as needed
                          height=600  # Adjust height as needed
                          )

# Show the plot
fig_surface.show()

# Observation: we have ended up with very robust covariance matrix, devoid of noise. High correlations.

## Eigen Decomposition for Singular Value Decomposition

In [45]:
# Perform eigen decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix_array)

# Sort values (good practice)
idx = eigenvalues.argsort()[::-1]   
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]

# Format into a DataFrame 
df_eigval = pd.DataFrame({"Eigenvalues": eigenvalues}) #, index=range(1,21))

pd.DataFrame(eigenvalues, columns=['Eigenvalue'])

Unnamed: 0,Eigenvalue
0,19.75368
1,0.2636185
2,0.01664724
3,0.0020971
4,0.0003610489
5,1.986134e-05
6,2.050859e-06
7,1.577753e-07
8,2.18653e-08
9,5.242522e-09


In [47]:
# Format into a DataFrame 
df_eigvec = pd.DataFrame(eigenvectors) #, index=range(1,21))

pd.DataFrame(eigenvectors[:,0], columns=['The Vector of the 1st Principal Component']) # Only PC1 is of relevance

Unnamed: 0,The Vector of the 1st Principal Component
0,0.218112
1,0.220712
2,0.222348
3,0.223377
4,0.224043
5,0.224486
6,0.224782
7,0.224972
8,0.225074
9,0.225099


In [48]:
# Work out explained proportion 
df_eigval["Explained proportion of Var"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])

#Format as percentage
df_eigval.style.format({"Explained proportion of Var": "{:.2%}"})

Unnamed: 0,Eigenvalues,Explained proportion of Var
0,19.753684,98.59%
1,0.263619,1.32%
2,0.016647,0.08%
3,0.002097,0.01%
4,0.000361,0.00%
5,2e-05,0.00%
6,2e-06,0.00%
7,0.0,0.00%
8,0.0,0.00%
9,0.0,0.00%


In [49]:
(df_eigval['Explained proportion of Var'][:10]*100).iplot(kind='bar',
                                                          title='Percentage of overall variance explained',
                                                          color='cornflowerblue')

## Visualize PCs

In [51]:
# Subsume first 3 components into a dataframe
pcdf = pd.DataFrame(eigenvectors[:,0:3], columns=['PC1','PC2','PC3'])
pcdf

Unnamed: 0,PC1,PC2,PC3
0,0.218112,0.463225,-0.551906
1,0.220712,0.380888,-0.272863
2,0.222348,0.307708,-0.05987
3,0.223377,0.24491,0.088958
4,0.224043,0.190317,0.185746
5,0.224486,0.141776,0.241599
6,0.224782,0.097657,0.265486
7,0.224972,0.056822,0.264741
8,0.225074,0.018513,0.245486
9,0.225099,-0.01777,0.212737


In [54]:
pcdf.iplot(title='Principal Components for Gilts Curve (unscaled) with First 3 Principal Components',
           secondary_y='PC1', secondary_y_title='PC1')

## Singular Value Decomposition using Sklearn PCA

In [55]:
# Scale and fit the model in a pipeline
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA())]) 
pipe.fit(df)

In [56]:
# eigenvectors
pipe['pca'].components_[0]

array([0.218112  , 0.22071219, 0.22234786, 0.22337726, 0.22404304,
       0.22448583, 0.22478226, 0.2249718 , 0.225074  , 0.22509903,
       0.22505332, 0.22494209, 0.22477012, 0.22454186, 0.22426118,
       0.2239312 , 0.22355415, 0.22313142, 0.22266358, 0.22215056])

In [57]:
# eigen values
pipe['pca'].explained_variance_

array([1.97536839e+01, 2.63618514e-01, 1.66472447e-02, 2.09709989e-03,
       3.61048910e-04, 1.98613387e-05, 2.05085861e-06, 1.57775293e-07,
       2.18653003e-08, 5.24252218e-09, 1.03925786e-09, 2.41743126e-10,
       6.16971437e-11, 1.56640092e-11, 6.87519938e-12, 2.16117196e-12,
       7.60306410e-13, 2.01339145e-13, 3.27554585e-14, 7.35673247e-15])

In [58]:
# eigen values proportion
pipe['pca'].explained_variance_ratio_

array([9.85888404e-01, 1.31569604e-02, 8.30848850e-04, 1.04664349e-04,
       1.80196228e-05, 9.91261358e-07, 1.02356489e-07, 7.87442145e-09,
       1.09127726e-09, 2.61649516e-10, 5.18684150e-11, 1.20651796e-11,
       3.07924835e-12, 7.81776461e-13, 3.43134951e-13, 1.07862128e-13,
       3.79462017e-14, 1.00486537e-14, 1.63479515e-15, 3.67167830e-16])

In [59]:
df2 = pd.DataFrame({'Eigenvalues': pipe['pca'].explained_variance_,
                    'Explained proportion in Var': pipe['pca'].explained_variance_ratio_})
#Format as percentage
df2.style.format({"Explained proportion in Var": "{:.2%}"})

Unnamed: 0,Eigenvalues,Explained proportion in Var
0,19.753684,98.59%
1,0.263619,1.32%
2,0.016647,0.08%
3,0.002097,0.01%
4,0.000361,0.00%
5,2e-05,0.00%
6,2e-06,0.00%
7,0.0,0.00%
8,0.0,0.00%
9,0.0,0.00%
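
The agreement between the two methods can be checked with a small sketch. scikit-learn's `PCA` is documented as being based on a singular value decomposition of the centred data; the code below reproduces that route with `np.linalg.svd` on hypothetical data instead of calling scikit-learn, and compares it with the eigen decomposition route.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical standardised data: 200 observations x 5 features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Route 1: eigen decomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # descending order

# Route 2: SVD of the centred data matrix
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
svd_eigvals = s ** 2 / (Z.shape[0] - 1)

# Eigenvectors are defined up to sign, so compare absolute values
same_values = np.allclose(eigvals, svd_eigvals)
same_vectors = np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)
print(same_values, same_vectors)
```

The sign ambiguity matters in practice: a component returned by one routine may be the mirror image of the one returned by another, without any change in the variance it explains.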


# PCA Projections

The dot product effectively applies the linear transformation represented by the eigenvectors **to each row** of our original data, providing a new representation of the data in the space defined by the principal components.

Take a single row of the curves dataset (forward or spot) as a vector **f** with dimensions $(1,20)$, which is 1 row x 20 columns.

The projection of **f** onto the principal components is computed with the eigenvector matrix **V** with dimensions $(20,3)$, with the eigenvectors in columns. Matrix **V** will be in transposed position with regard to the data row **f**.

$$\mathbf{f}_{projected} = \mathbf{f} \cdot \mathbf{V}$$

Each element of the dot product is calculated as follows:

$$(\mathbf{f}_{projected})_i = \sum_{j=1}^{20} f_j \, \mathbf{V}_{ji},$$
where $f_j$ is the $j$-th element (tenor) of curve row **f**, and $\mathbf{V}_{ji}$ is the $j$-th component of the $i$-th eigenvector.
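
The formula can be sketched on a small hypothetical dataset (the array `F` is random and stands in for the scaled curves): the explicit sum over tenors and the dot product give the same projected values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical scaled dataset: 6 curves (rows) x 20 tenors (columns)
F = rng.normal(size=(6, 20))

# Eigenvectors of the covariance matrix, in columns, largest eigenvalue first
_, eigvecs = np.linalg.eigh(np.cov(F, rowvar=False))
V = eigvecs[:, ::-1][:, :3]            # keep the first 3 PCs, shape (20, 3)

# Project one curve: each entry is sum_j f_j * V[j, i]
f = F[0]                               # shape (20,)
f_projected = np.array([np.sum(f * V[:, i]) for i in range(3)])

# The same numbers come out of a single dot product, for one row or for all rows
all_projected = F @ V                  # shape (6, 3)
print(np.allclose(f_projected, f @ V), all_projected.shape)
```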

### Resulting table is not the dataset of alternative curves! Its columns are projections, not evolution of rates at specific tenors.

In [75]:
# The dot product projects each row of the scaled dataframe df_std (curves x tenors) onto the eigenvectors

df_std_projections = df_std.dot(eigenvectors)
# all 20 eigenvectors are kept here; the result has the same number of rows as the data (one per curve)

df_std_projections.index = df.index
df_std_projections.head(10)

years:,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
1970-01-31,1.102175,0.509879,0.018864,0.055784,0.005776,-0.005033,-0.000456,6.2e-05,-5.1e-05,-4.695128e-06,-4.300898e-05,8.731762e-06,7.927898e-07,-1.504709e-06,7.339142e-07,-5.933358e-07,-6.095368e-07,2.991323e-08,9.971778e-08,7.275981e-09
1970-02-28,0.776345,0.488407,-0.003268,0.03949,-0.00419,-0.001963,-0.000825,0.000148,1.2e-05,2.925997e-05,-5.984062e-06,-7.25569e-07,1.335223e-06,1.340096e-07,1.602598e-06,-1.964802e-07,-3.993696e-07,-9.940089e-09,8.593053e-08,2.670464e-09
1970-03-31,0.306236,0.330603,-0.015749,0.042838,-0.001795,-0.002541,-0.000854,4.1e-05,1.8e-05,3.099402e-05,-3.765539e-06,-2.981874e-06,-4.520448e-07,-1.079784e-06,9.735573e-07,-4.049635e-07,-3.589958e-07,-3.645443e-08,7.534482e-08,-2.276431e-09
1970-04-30,0.476186,0.291168,0.021121,0.109716,0.010959,-0.002292,-0.001483,7.1e-05,-0.00012,5.079726e-05,2.584038e-05,-8.546609e-06,-9.417729e-06,-5.971019e-06,4.465502e-06,-1.296952e-06,3.684175e-07,-3.243913e-07,1.024827e-07,-1.412993e-08
1970-05-31,0.114939,0.418865,0.048168,0.153189,0.015166,-0.003729,-0.001805,0.000355,-0.000189,3.200941e-05,8.779717e-06,-1.141692e-05,-8.88365e-06,-9.71184e-06,5.406608e-06,-1.271584e-06,4.2079e-07,-7.03967e-07,3.11952e-08,-3.345208e-08
1970-06-30,-0.342257,0.429821,-0.096485,0.140175,0.007289,-0.006368,-0.001054,0.00073,-4.3e-05,3.804683e-05,1.588625e-06,-1.984473e-06,8.850244e-07,-6.547972e-06,5.080016e-06,-6.658575e-07,4.953718e-07,-7.388902e-07,-5.345122e-08,-3.804282e-08
1970-07-31,-0.39599,0.205768,-0.161776,0.065392,-0.002375,-0.007756,0.000217,0.000674,0.00017,7.833282e-05,6.884555e-06,-5.203842e-06,4.901055e-06,1.536115e-06,3.29788e-06,3.226162e-07,-1.511889e-07,-2.5918e-07,2.923418e-09,-2.593851e-08
1970-08-31,-0.325605,0.121112,-0.122751,0.074666,-0.005811,-0.004775,-0.001201,0.000192,-4.2e-05,9.405012e-07,-5.131778e-06,-9.337801e-06,4.166484e-06,-1.056237e-06,2.92584e-06,4.897739e-08,-1.832041e-07,-2.907577e-07,2.097201e-08,-1.831401e-08
1970-09-30,-0.374422,0.134884,-0.125019,0.059176,-0.008538,-0.00405,-0.00118,0.000161,7e-06,1.226147e-05,-9.065348e-07,-1.272053e-05,1.943035e-06,-1.505322e-06,1.384893e-06,-9.478688e-08,-3.883932e-07,-1.731043e-07,4.500825e-08,-1.556512e-08
1970-10-31,-0.14649,0.211438,-0.140262,0.100464,0.001645,-0.005772,-0.001345,0.000178,-9.6e-05,-4.329942e-05,-1.088196e-06,-1.806119e-05,6.913641e-06,-8.625355e-07,3.368303e-06,7.540331e-07,-5.633542e-07,-2.731326e-07,4.87339e-08,-1.787028e-08


In [76]:
#Check dimensions
df_std_projections.shape

(550, 20)

In [77]:
# Plot all 20 projections
df_std_projections.iplot(title='Projections')

# data.T.iplot(title='Quasi curves')  # not very useful: beyond the 2nd-3rd column the projections carry almost no curve information


In [78]:
df_std_projections_3 = df_std.dot(eigenvectors[:, 0:3])  # only 3 eigenvectors are preserved
df_std_projections_3.index = df.index

df_std_projections_3.shape

(550, 3)
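
Keeping only three eigenvectors does not change the first three projections; it simply drops the remaining columns. A small sketch on synthetic data (an assumption, with the same 550 x 20 shape as the notebook's dataset) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 550 curves x 20 tenors: one common factor plus noise.
X = rng.normal(size=(550, 1)) + 0.2 * rng.normal(size=(550, 20))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

projections_all = X_std @ eigenvectors        # all 20 eigenvectors
projections_3 = X_std @ eigenvectors[:, :3]   # first 3 eigenvectors only

# The truncated projection matches the first 3 columns of the full one.
same = np.allclose(projections_all[:, :3], projections_3)
print(projections_all.shape, projections_3.shape, same)
```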

In [79]:
df_std_projections_3.head(10)

| years: | 0 | 1 | 2 |
|---|---|---|---|
| 1970-01-31 | 1.102175 | 0.509879 | 0.018864 |
| 1970-02-28 | 0.776345 | 0.488407 | -0.003268 |
| 1970-03-31 | 0.306236 | 0.330603 | -0.015749 |
| 1970-04-30 | 0.476186 | 0.291168 | 0.021121 |
| 1970-05-31 | 0.114939 | 0.418865 | 0.048168 |
| 1970-06-30 | -0.342257 | 0.429821 | -0.096485 |
| 1970-07-31 | -0.39599 | 0.205768 | -0.161776 |
| 1970-08-31 | -0.325605 | 0.121112 | -0.122751 |
| 1970-09-30 | -0.374422 | 0.134884 | -0.125019 |
| 1970-10-31 | -0.14649 | 0.211438 | -0.140262 |


In [80]:
# Standardized data
scaler = StandardScaler()
scaler.fit(df)

df_std = pd.DataFrame(scaler.transform(df), columns=df.columns)
df_std.head()

Unnamed: 0,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0,10.5
0,0.438865,0.440632,0.418813,0.390185,0.360774,0.33232,0.305297,0.279832,0.255942,0.233643,0.212977,0.193991,0.176715,0.161164,0.147336,0.135218,0.124781,0.115988,0.108793,0.103146
1,0.381957,0.360305,0.332602,0.304217,0.276632,0.250291,0.225334,0.201798,0.179716,0.159129,0.140089,0.12264,0.106806,0.092595,0.079998,0.068996,0.059556,0.051636,0.045186,0.040154
2,0.210651,0.201157,0.182185,0.160805,0.139552,0.119366,0.100553,0.083182,0.06726,0.052794,0.039806,0.028317,0.01833,0.009841,0.002834,-0.002718,-0.006851,-0.009608,-0.011039,-0.011196
3,0.175394,0.222287,0.228822,0.215938,0.194702,0.170718,0.14674,0.1242,0.103878,0.08617,0.071236,0.059081,0.049606,0.042699,0.038258,0.036181,0.036367,0.038716,0.043125,0.049493
4,0.120232,0.189004,0.199733,0.183035,0.154334,0.121225,0.087545,0.05544,0.026157,0.000344,-0.021711,-0.039964,-0.054509,-0.06546,-0.072931,-0.077037,-0.077895,-0.075626,-0.070353,-0.062199


In [81]:
# Calculate principal components
principal_components = df_std.dot(eigenvectors)
principal_components.index = df.index
principal_components.head()

years:,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
1970-01-31,1.102175,0.509879,0.018864,0.055784,0.005776,-0.005033,-0.000456,6.2e-05,-5.1e-05,-5e-06,-4.3e-05,8.731762e-06,7.927898e-07,-1.504709e-06,7.339142e-07,-5.933358e-07,-6.095368e-07,2.991323e-08,9.971778e-08,7.275981e-09
1970-02-28,0.776345,0.488407,-0.003268,0.03949,-0.00419,-0.001963,-0.000825,0.000148,1.2e-05,2.9e-05,-6e-06,-7.25569e-07,1.335223e-06,1.340096e-07,1.602598e-06,-1.964802e-07,-3.993696e-07,-9.940089e-09,8.593053e-08,2.670464e-09
1970-03-31,0.306236,0.330603,-0.015749,0.042838,-0.001795,-0.002541,-0.000854,4.1e-05,1.8e-05,3.1e-05,-4e-06,-2.981874e-06,-4.520448e-07,-1.079784e-06,9.735573e-07,-4.049635e-07,-3.589958e-07,-3.645443e-08,7.534482e-08,-2.276431e-09
1970-04-30,0.476186,0.291168,0.021121,0.109716,0.010959,-0.002292,-0.001483,7.1e-05,-0.00012,5.1e-05,2.6e-05,-8.546609e-06,-9.417729e-06,-5.971019e-06,4.465502e-06,-1.296952e-06,3.684175e-07,-3.243913e-07,1.024827e-07,-1.412993e-08
1970-05-31,0.114939,0.418865,0.048168,0.153189,0.015166,-0.003729,-0.001805,0.000355,-0.000189,3.2e-05,9e-06,-1.141692e-05,-8.88365e-06,-9.71184e-06,5.406608e-06,-1.271584e-06,4.2079e-07,-7.03967e-07,3.11952e-08,-3.345208e-08


In [82]:
principal_components.shape

(550, 20)
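
A useful sanity check on projections computed this way: their columns are mutually uncorrelated, and the variance of each column equals the corresponding eigenvalue. The sketch below verifies this on synthetic data (an assumption), using NumPy's `eigh` in place of the notebook's own `eigenvectors`.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: 500 observations of 4 correlated series.
factor = rng.normal(size=(500, 1))
X = factor + 0.3 * rng.normal(size=(500, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the sample covariance matrix, largest eigenvalue first.
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Projections of every row onto every eigenvector.
principal_components = X_std @ eigenvectors

# Covariance of the projections: diagonal, with the eigenvalues on the diagonal.
cov_scores = np.cov(principal_components, rowvar=False)
off_diagonal = cov_scores - np.diag(np.diag(cov_scores))

print(np.round(np.diag(cov_scores), 6))
print(np.round(eigenvalues, 6))
```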

### PC1: Curve Level via 10Y Yield

In [83]:
level = pd.DataFrame({'10Y': df[10.0],
                      'PC1': principal_components[0]})
level.head()

| years: | 10Y | PC1 |
|---|---|---|
| 1970-01-31 | 8.279691 | 1.102175 |
| 1970-02-28 | 8.049074 | 0.776345 |
| 1970-03-31 | 7.84522 | 0.306236 |
| 1970-04-30 | 8.041602 | 0.476186 |
| 1970-05-31 | 7.630168 | 0.114939 |


In [84]:
level.iplot(title='PC1 Projection vs 10Y Yield', secondary_y='PC1')

### PC2: Slope

In [85]:
# Calculate the 10Y-2Y slope
slope = df[[2, 10]].copy()  # 2-year & 10-year; copy so new columns are not written to a view of df
# slope is defined here as the long-term rate minus the short-term rate
slope['slope'] = slope[10] - slope[2]
slope['PC2'] = principal_components[1]
slope.head()

| years: | 2.0 | 10.0 | slope | PC2 |
|---|---|---|---|---|
| 1970-01-31 | 8.700727 | 8.279691 | -0.421035 | 0.509879 |
| 1970-02-28 | 8.370748 | 8.049074 | -0.321674 | 0.488407 |
| 1970-03-31 | 7.795017 | 7.84522 | 0.050204 | 0.330603 |
| 1970-04-30 | 7.973522 | 8.041602 | 0.068079 | 0.291168 |
| 1970-05-31 | 7.862182 | 7.630168 | -0.232015 | 0.418865 |


In [86]:
slope[['slope', 'PC2']].iplot(title='PC2 Projection vs 10Y-2Y Slope', secondary_y='PC2')

In [87]:
# Verify the correlation
np.corrcoef(principal_components[1], slope['slope'])

array([[ 1.        , -0.98227356],
       [-0.98227356,  1.        ]])

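The correlation between the PC2 projection and the slope of the yield curve (10Y - 2Y) is -0.98, close to 1 in absolute value. The negative sign carries no meaning of its own: an eigenvector is defined only up to sign, so the projection may come out mirrored.

This confirms that the second principal component represents a slope-type movement.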
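
An eigenvector multiplied by -1 is still an eigenvector, so the sign of a projection, and of its correlation with the slope, is arbitrary. The sketch below shows this on a synthetic four-tenor curve built from a level factor and a steepness factor; all names and data in it are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 400

# Synthetic curves: a parallel level factor plus a steepness factor.
level = rng.normal(size=(n_obs, 1))
steepness = rng.normal(size=(n_obs, 1))
loadings = np.linspace(-0.5, 0.5, 4)   # short end moves down, long end moves up
X = 5.0 + level + steepness * loadings + 0.01 * rng.normal(size=(n_obs, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvectors of the covariance matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

slope = X[:, -1] - X[:, 0]                      # long rate minus short rate
pc2 = X_std @ eigenvectors[:, 1]                # second projection
pc2_flipped = X_std @ (-eigenvectors[:, 1])     # same eigenvector, opposite sign

corr = np.corrcoef(pc2, slope)[0, 1]
corr_flipped = np.corrcoef(pc2_flipped, slope)[0, 1]
print(round(corr, 3), round(corr_flipped, 3))
```

The two correlations have the same magnitude and opposite signs, so only the absolute value is informative when matching a component to level or slope.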

---

In [88]:
# Data columns are ordered to match the labels: 10Y rate first, component(s) second
pd.DataFrame(np.c_[df[10], principal_components[[0]]], columns=['10Yr-Rate','Component']).iplot(title='One Component', secondary_y='Component')

In [89]:
pd.DataFrame(np.c_[df[10], principal_components[[0,1]].sum(axis=1)], columns=['10Yr-Rate','Components']).iplot(title='Two Components', secondary_y='Components')

In [70]:
pd.DataFrame(np.c_[df[10], principal_components[[0,1,2]].sum(axis=1)], columns=['10Yr-Rate','Components']).iplot(title='Three Components', secondary_y='Components')

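Including more components makes the fit more accurate, though the first three already explain about 99.99% of the variance (98.59% + 1.32% + 0.08%), so further components add very little.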
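
The plots above add the raw projections together. A related way to quantify the improvement is to rebuild one standardised tenor from its first k projections, weighted by that tenor's eigenvector loadings, and track the error as k grows. This reconstruction is a different technique from the plain sum plotted above, sketched here on synthetic data (an assumption) with the longest tenor standing in for the 10Y rate.

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, n_tenors = 300, 6

# Synthetic curves: level and steepness factors plus noise.
level = rng.normal(size=(n_obs, 1))
steepness = rng.normal(size=(n_obs, 1))
loadings = np.linspace(-0.5, 0.5, n_tenors)
X = level + steepness * loadings + 0.05 * rng.normal(size=(n_obs, n_tenors))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvectors sorted by decreasing eigenvalue, then the projections.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]
scores = X_std @ eigenvectors

target = X_std[:, -1]   # longest tenor (stand-in for the 10Y rate)
rmse = []
for k in range(1, n_tenors + 1):
    # Rebuild the target from the first k projections and their loadings on that tenor.
    approx = scores[:, :k] @ eigenvectors[-1, :k]
    rmse.append(float(np.sqrt(np.mean((target - approx) ** 2))))

print([round(e, 4) for e in rmse])
```

The error never increases as components are added and is zero, up to floating-point precision, once all of them are used.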