*All content and data presented in the articles on this platform are sourced from the comprehensive boxset titled “Market Risk Analysis” by Carol Alexander. The author of the articles acknowledges and respects the intellectual property rights and copyrights held by the original author of the boxset. The purpose of sharing this information is solely for educational and informational purposes, and no infringement of intellectual property rights is intended.*

# Review of PCA

In this article, we explore statistical factor models that utilize Principal Component Analysis (PCA) to model portfolio returns and profit and loss (P&L) of cash flows. These models are commonly employed for assessing portfolio risks and determining risk-adjusted performance measures for investment ranking.

Statistical factor models for portfolios are built upon factors that lack economic or financial interpretation. To obtain a principal component representation of the percentage return for each asset in the investor's universe, an eigenvector analysis is conducted on a large covariance matrix, incorporating the returns of all assets in the portfolio. Each principal component represents the percentage return associated with a statistical risk factor. By selecting the appropriate number of principal components for each asset's representation, the investor can adjust the asset's specific risk. Optimal portfolios are then constructed by adjusting the weights to align with the desired systematic risk, systematic return, and specific risk characteristics specified by the investor.

For portfolios consisting of interest rate-sensitive instruments like bonds and swaps, factor models assume the portfolio has already been mapped to a fixed set of risk factors that align with standard vertices along one or more yield curves. In such cases, PCA can be based on a covariance or correlation matrix of changes in these risk factors at a specific frequency, such as daily, weekly, or monthly interest rate changes. It is worth noting that yield curve factor models differ from regression-based factor models introduced in the previous articles in two ways: they capture the portfolio's P&L rather than its percentage return, and the P&L is represented as a linear function of changes in risk factors rather than risk factor percentage returns.

The primary aims of PCA curve factor models are as follows:
* To reduce the number of risk factors to a manageable dimension. For example, instead of sixty yields of different maturities as risk factors we might use just three principal components.
* To identify the key sources of risk. Typically the most important risk factors are parallel shifts, changes in slope and changes in convexity of the curves.
* To facilitate the measurement of portfolio risk, for instance by introducing scenarios on the movements in the major risk factors.
* To help investors form optimal portfolios which are hedged against the most common types of movements in the curve. For example, using PCA it is easy to derive allocations to bonds so that the portfolio's value is unchanged for 95% (or more) of the yield curve variations observed in a historical sample.

If $V$ is the covariance matrix of a set of $n$ returns, summarized in the $T \times n$ matrix $X$, the principal components of $V$ are obtained as

\begin{equation}
P = XW
\end{equation}

where $W$ is the orthogonal matrix of eigenvectors of $V$. Since $W$ is orthogonal, $W^{-1} = W'$ and so

\begin{equation}
X = PW^{-1} = PW'
\end{equation}

The *m*th principal component is the *m*th column of $P$. If we order the columns of $W$ so that the first column is the eigenvector corresponding to the largest eigenvalue of $V$, the second column corresponds to the second largest, and so on, then the sum of the squares of the elements in the *m*th principal component is proportional to the *m*th eigenvalue, $\lambda_{m}$ (and exactly equal to it when $V$ is computed as $X'X$ from de-meaned data). The proportion of the total variation explained by the *m*th component is therefore

\begin{equation}
\frac{\lambda_{m}}{\lambda_{1} + \cdots + \lambda_{n}}
\end{equation}
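
The relations above can be checked numerically. The following is a minimal sketch on synthetic data (the data, the seed and all variable names are illustrative assumptions, not part of the case study); it uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic T x n data matrix (illustrative only): three correlated series.
T, n = 500, 3
common = rng.normal(size=(T, 1))
X = common + 0.3 * rng.normal(size=(T, n))
X = X - X.mean(axis=0)            # de-mean each column

V = X.T @ X / T                   # covariance matrix of the columns of X

# eigh returns eigenvalues in ascending order, so reorder to put the
# eigenvector of the largest eigenvalue in the first column of W.
eigenvalues, W = np.linalg.eigh(V)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, W = eigenvalues[order], W[:, order]

P = X @ W                         # principal components, one per column
X_back = P @ W.T                  # X = P W' because W is orthogonal

# Share of total variation explained by each component.
explained = eigenvalues / eigenvalues.sum()
print(explained.round(4))
```

With $V = X'X/T$ as in this sketch, the sum of squares of each column of `P` equals $T$ times the corresponding eigenvalue, which is why the ratio of eigenvalues measures the share of variation explained.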

# Case Study - PCA

*In this case study we consolidate the concepts reviewed in the previous section by analysing a system of 50 key interest rates. We consider daily data on UK government yield curves for maturities between 6 months and 25 years. We perform a PCA on daily changes in each rate and show that, out of all 50 principal components, only the first three are needed for any subsequent analysis: these three components together explain more than 99% of the total variation in the system of 50 interest rates.*

In [6]:
import pandas as pd
from scipy import stats
import numpy as np
import plotly.express as px
from scipy.optimize import minimize
import plotly.graph_objects as go
import plotly.io as pio
import pathlib
import sys
utils_path = pathlib.Path().absolute().parent.parent
sys.path.append(utils_path.__str__())
import utils.layout  as lay
from utils.functions import PCA

In [7]:
pio.templates.default = 'simple_white+blog_mra'

In [8]:
df = pd.read_excel(r"data\Case Study II.2_PCA UK Yield Curves\PCA_Spot_Curve.xls", sheet_name="Spot").set_index("Date")
df = df[df.index >"2005-01-01"]
df.index = df.index.date
diff_bps = df.diff()[1:] * 100  # daily changes, converted from percentage points to bps
vols = diff_bps.std()*np.sqrt(250)

#selected maturities
cols = ["1 yr", "2 yr",  "3 yr",  "4 yr",  "5 yr",  "7 yr",  "10 yr",  "15 yr",  "20 yr", ]
fig = px.line(df[cols])
fig.update_layout(legend_title="", yaxis_title="Yields" , xaxis_title="", 
                  title_text="UK zero coupon spot yield curve, 2005-2007")
fig.show()

$$
\begin{array}{lrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr}
\hline
 & 0.5 yr & 1 yr & 1.5 yr & 2 yr & 2.5 yr & 3 yr & 3.5 yr & 4 yr & 4.5 yr & 5 yr & 5.5 yr & 6 yr & 6.5 yr & 7 yr & 7.5 yr & 8 yr & 8.5 yr & 9 yr & 9.5 yr & 10 yr & 10.5 yr & 11 yr & 11.5 yr & 12 yr & 12.5 yr & 13 yr & 13.5 yr & 14 yr & 14.5 yr & 15 yr & 15.5 yr & 16 yr & 16.5 yr & 17 yr & 17.5 yr & 18 yr & 18.5 yr & 19 yr & 19.5 yr & 20 yr & 20.5 yr & 21 yr & 21.5 yr & 22 yr & 22.5 yr & 23 yr & 23.5 yr & 24 yr & 24.5 yr & 25 yr \\
\hline
2005-01-04 & 4.55 & 4.46 & 4.39 & 4.37 & 4.37 & 4.38 & 4.39 & 4.40 & 4.41 & 4.42 & 4.43 & 4.43 & 4.44 & 4.45 & 4.46 & 4.46 & 4.47 & 4.47 & 4.48 & 4.48 & 4.48 & 4.48 & 4.49 & 4.49 & 4.49 & 4.48 & 4.48 & 4.48 & 4.48 & 4.47 & 4.47 & 4.46 & 4.45 & 4.45 & 4.44 & 4.43 & 4.43 & 4.42 & 4.41 & 4.40 & 4.39 & 4.39 & 4.38 & 4.37 & 4.36 & 4.35 & 4.35 & 4.34 & 4.33 & 4.32 \\
2005-01-05 & 4.57 & 4.47 & 4.40 & 4.38 & 4.38 & 4.38 & 4.39 & 4.40 & 4.41 & 4.43 & 4.44 & 4.45 & 4.45 & 4.46 & 4.47 & 4.48 & 4.48 & 4.49 & 4.49 & 4.50 & 4.50 & 4.50 & 4.50 & 4.50 & 4.50 & 4.50 & 4.50 & 4.50 & 4.49 & 4.49 & 4.48 & 4.48 & 4.47 & 4.47 & 4.46 & 4.45 & 4.44 & 4.44 & 4.43 & 4.42 & 4.41 & 4.40 & 4.39 & 4.39 & 4.38 & 4.37 & 4.36 & 4.35 & 4.35 & 4.34 \\
2005-01-06 & 4.57 & 4.46 & 4.38 & 4.34 & 4.34 & 4.35 & 4.36 & 4.37 & 4.38 & 4.39 & 4.41 & 4.42 & 4.43 & 4.43 & 4.44 & 4.45 & 4.45 & 4.46 & 4.46 & 4.47 & 4.47 & 4.47 & 4.47 & 4.47 & 4.47 & 4.47 & 4.47 & 4.46 & 4.46 & 4.45 & 4.45 & 4.44 & 4.43 & 4.43 & 4.42 & 4.41 & 4.40 & 4.40 & 4.39 & 4.38 & 4.37 & 4.36 & 4.36 & 4.35 & 4.34 & 4.33 & 4.32 & 4.32 & 4.31 & 4.30 \\
2005-01-07 & 4.58 & 4.46 & 4.38 & 4.35 & 4.34 & 4.35 & 4.36 & 4.37 & 4.38 & 4.39 & 4.40 & 4.41 & 4.42 & 4.43 & 4.44 & 4.44 & 4.45 & 4.45 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.45 & 4.45 & 4.44 & 4.44 & 4.43 & 4.42 & 4.42 & 4.41 & 4.40 & 4.39 & 4.39 & 4.38 & 4.37 & 4.36 & 4.35 & 4.34 & 4.34 & 4.33 & 4.32 & 4.31 & 4.31 & 4.30 & 4.29 \\
2005-01-10 & 4.57 & 4.45 & 4.37 & 4.34 & 4.33 & 4.34 & 4.35 & 4.36 & 4.37 & 4.38 & 4.39 & 4.40 & 4.41 & 4.42 & 4.43 & 4.44 & 4.44 & 4.45 & 4.45 & 4.45 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.46 & 4.45 & 4.45 & 4.45 & 4.44 & 4.44 & 4.43 & 4.42 & 4.42 & 4.41 & 4.40 & 4.39 & 4.39 & 4.38 & 4.37 & 4.36 & 4.36 & 4.35 & 4.34 & 4.33 & 4.33 & 4.32 & 4.31 & 4.30 \\
\hline
\end{array}$$

The profit and loss (P&L) generated by fixed income portfolios is linked to variations in interest rate risk factors, expressed in basis points. The volatilities and correlations of interest rates therefore refer to absolute changes in interest rates, measured in basis points. The following figure shows the volatility of spot rates in basis points per annum, plotted against the maturity of the spot rate. Volatility is lowest at the short end of the curve and peaks for rates with a maturity between 5 and 10 years. Rates exceeding 5 years exhibit a volatility of approximately 50 basis points per annum. Because short-term rates are considerably less volatile, PCA on the covariance matrix, which retains the rate volatilities, may give different results from PCA on the correlation matrix.
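
As a minimal sketch of the quantities just described, the snippet below builds two hypothetical yield series (random-walk toy data, not the UK curve), converts daily changes to basis points, annualizes the volatility assuming 250 trading days per year, and forms both the covariance and the correlation matrix of the changes. The column names and scales are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical yields in percent for two maturities (toy random walks).
dates = pd.bdate_range("2005-01-04", periods=250)
steps = rng.normal(scale=[0.02, 0.04], size=(250, 2))  # daily moves in percentage points
yields = pd.DataFrame(4.5 + steps.cumsum(axis=0), index=dates,
                      columns=["1 yr", "10 yr"])

# Absolute daily changes in basis points (1 percentage point = 100 bps).
changes_bps = yields.diff().dropna() * 100

# Annualized volatility in bps, assuming 250 trading days per year.
annual_vol_bps = changes_bps.std() * np.sqrt(250)

cov_mx = changes_bps.cov()    # retains the volatilities (units: bps squared)
corr_mx = changes_bps.corr()  # unit diagonal, volatilities scaled out

print(annual_vol_bps.round(1))
```

The diagonal of `cov_mx` carries the squared daily volatilities, whereas `corr_mx` has ones on the diagonal, which is the source of the difference between the two PCA variants.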

In [9]:
corr_mx = diff_bps[cols].corr()
fig= px.line(x=vols.index, y=vols.values)
fig.update_layout(legend_title="", yaxis_title="Volatility (Bps)" , xaxis_title="Maturity",
                  title_text="Volatility of UK spot rates")
fig.show()

$$\text{Correlation Matrix}$$
$$
\begin{array}{lrrrrrrrrr}
\hline
 & 1 yr & 2 yr & 3 yr & 4 yr & 5 yr & 7 yr & 10 yr & 15 yr & 20 yr \\
\hline
1 yr & 1.00 & 0.93 & 0.88 & 0.84 & 0.81 & 0.74 & 0.67 & 0.61 & 0.56 \\
2 yr & 0.93 & 1.00 & 0.99 & 0.97 & 0.95 & 0.89 & 0.83 & 0.77 & 0.72 \\
3 yr & 0.88 & 0.99 & 1.00 & 0.99 & 0.98 & 0.94 & 0.88 & 0.83 & 0.78 \\
4 yr & 0.84 & 0.97 & 0.99 & 1.00 & 1.00 & 0.97 & 0.92 & 0.88 & 0.83 \\
5 yr & 0.81 & 0.95 & 0.98 & 1.00 & 1.00 & 0.99 & 0.96 & 0.92 & 0.87 \\
7 yr & 0.74 & 0.89 & 0.94 & 0.97 & 0.99 & 1.00 & 0.99 & 0.96 & 0.92 \\
10 yr & 0.67 & 0.83 & 0.88 & 0.92 & 0.96 & 0.99 & 1.00 & 0.99 & 0.96 \\
15 yr & 0.61 & 0.77 & 0.83 & 0.88 & 0.92 & 0.96 & 0.99 & 1.00 & 0.99 \\
20 yr & 0.56 & 0.72 & 0.78 & 0.83 & 0.87 & 0.92 & 0.96 & 0.99 & 1.00 \\
\hline
\end{array}$$

## PCA on UK Short Spot Rates Correlation Matrix

The tables below give the first six eigenvalues, ordered from largest to smallest, and the first rows of their corresponding eigenvectors; the figure plots the first three eigenvectors as a function of the maturity of the rate.

In [10]:
corr_mx = diff_bps.corr()
corrmx_eigenvalues, corrmx_eigenvectors = PCA(corr_mx)

fig = px.line(corrmx_eigenvectors.loc[:, :"λ3"])
fig.update_layout(legend_title="", yaxis_title="Eigenvector" , xaxis_title="Maturity", height=600, width=900,
                  title_text="Eigenvectors of the UK daily spot rate correlation matrix")
fig.show()

$$\text{Eigenvalues}$$
$$\begin{array}{lrrrrrrr}
\hline
 & λ1 & λ2 & λ3 & λ4 & λ5 & λ6 & ... \\
\hline
Eigenvalue & 45.52 & 3.42 & 0.66 & 0.30 & 0.06 & 0.02 & ...\\
\% Variation & 0.91 & 0.07 & 0.01 & 0.01 & 0.00 & 0.00 & ...\\
Cumulative \% & 0.91 & 0.98 & 0.99 & 1.00 & 1.00 & 1.00 & ...\\
\hline
\end{array}$$

$$\text{Eigenvectors}$$
$$\begin{array}{lrrrrrr}
\hline
 & λ1 & λ2 & λ3 & λ4 & λ5 & λ6 \\
\hline
0.5 yr & 0.07 & 0.35 & 0.69 & 0.44 & -0.36 & 0.25 \\
1 yr & 0.10 & 0.35 & 0.33 & 0.00 & 0.46 & -0.49 \\
1.5 yr & 0.12 & 0.31 & 0.09 & -0.20 & 0.37 & -0.08 \\
2 yr & 0.12 & 0.28 & -0.01 & -0.25 & 0.19 & 0.16 \\
2.5 yr & 0.13 & 0.25 & -0.06 & -0.25 & 0.04 & 0.23 \\
\hline
\end{array}$$
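
The `PCA` function imported from `utils.functions` is not reproduced in this article. A minimal sketch of a helper with the same call pattern might look as follows. The name `pca_sketch`, the layout of the returned tables and the `λ1, λ2, ...` column labels are assumptions inferred from how the results are used above, not the author's actual code, and the 3 × 3 matrix is made up for illustration.

```python
import numpy as np
import pandas as pd


def pca_sketch(matrix: pd.DataFrame):
    """Eigen-decompose a symmetric covariance or correlation matrix.

    Hypothetical stand-in for the article's `PCA` helper.
    Returns (summary, eigenvectors), both with columns λ1, λ2, ...
    """
    values, vectors = np.linalg.eigh(matrix.to_numpy())
    order = np.argsort(values)[::-1]          # largest eigenvalue first
    values, vectors = values[order], vectors[:, order]

    labels = [f"λ{i + 1}" for i in range(len(values))]
    share = values / values.sum()
    summary = pd.DataFrame(
        [values, share, share.cumsum()],
        index=["Eigenvalue", "% Variation", "Cumulative %"],
        columns=labels,
    )
    eigenvectors = pd.DataFrame(vectors, index=matrix.index, columns=labels)
    return summary, eigenvectors


# Toy 3 x 3 correlation matrix (made-up numbers, not the UK data).
names = ["1 yr", "2 yr", "3 yr"]
toy = pd.DataFrame(
    [[1.0, 0.9, 0.7], [0.9, 1.0, 0.9], [0.7, 0.9, 1.0]],
    index=names, columns=names,
)
summary, eigenvectors = pca_sketch(toy)
first_three = eigenvectors.loc[:, :"λ3"]   # same slicing as in the plots
print(summary.round(2))
```

Note that the sign of each eigenvector is arbitrary: multiplying a column by -1 gives an equally valid eigenvector, so two implementations can return mirrored curves for the same component.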

Considerations:
* The first eigenvalue explains 45.52/50 ≈ 91.0% of the total variation in the spot rates (the eigenvalues of a 50 × 50 correlation matrix sum to 50), while the second and third explain 3.42/50 ≈ 6.8% and 0.66/50 ≈ 1.3% respectively.
* The first component therefore accounts for the large majority of the variation, while the second and third make minor contributions. The remaining eigenvalues explain an insignificant part of the variation and may simply be attributed to statistical noise.
* The first three eigenvalues together explain more than 99% of the total variation.
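
The arithmetic behind these percentages can be sketched as follows. The eigenvalues are the rounded figures from the table above, so the results may differ in the second decimal place from figures computed with unrounded eigenvalues; the divisor 50 is the trace of a 50 × 50 correlation matrix.

```python
import numpy as np

n_rates = 50                                     # trace of a 50 x 50 correlation matrix
top_eigenvalues = np.array([45.52, 3.42, 0.66])  # rounded values from the table

share = top_eigenvalues / n_rates                # share of total variation per component
cumulative = share.cumsum()                      # running total over the first components

for k, (s, c) in enumerate(zip(share, cumulative), start=1):
    print(f"λ{k}: {s:.2%} of total variation, {c:.2%} cumulative")
```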

## Principal Component Representation

If we denote by $\Delta R_{m}$ the standardized vector of daily changes in the spot interest rate of maturity $m$, and by $p_{1}, \, p_{2}, \, p_{3}$ the time series of the first three principal components, we can write the principal component representation of the standardized rates as:

\begin{equation}
\begin{aligned}
\Delta R_{6mth} &\approx 0.0675 \, p_{1} + 0.3464 \, p_{2} + 0.6878 \, p_{3} \\
&\;\; \vdots \\
\Delta R_{25yr} &\approx 0.0140 \, p_{1} - 0.1541 \, p_{2} + 0.1535 \, p_{3}
\end{aligned}
\end{equation}

It is common to call the first principal component the *trend component*, or *shift component*, of the term structure. The second principal component is commonly referred to as the *tilt component* and the third is called the *convexity* or *curvature component*. It is also important to note that if we shuffle the ordering of the system, the second and third eigenvectors will no longer look like a decreasing line or a quadratic function of maturity, as shown in the figure above. Hence, the interpretation of these components depends on having a natural ordering in the system.
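
A minimal sketch of a three-component representation, and of the remark on ordering, is given below. It uses a toy curve of 10 "maturities" driven by hypothetical level, slope and curvature factors plus noise; the factor scales, the seed and all names are assumptions made for illustration and do not reproduce the UK data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy curve: 10 "maturities" driven by level, slope and curvature plus noise.
T, n = 1000, 10
m = np.linspace(-1.0, 1.0, n)
loadings = np.vstack([np.ones(n), m, m ** 2])
factors = rng.normal(size=(T, 3)) * np.array([3.0, 1.0, 0.4])
changes = factors @ loadings + 0.05 * rng.normal(size=(T, n))

# Standardize each column, then decompose the correlation matrix.
Z = (changes - changes.mean(axis=0)) / changes.std(axis=0)
C = Z.T @ Z / T
values, W = np.linalg.eigh(C)
order = np.argsort(values)[::-1]
values, W = values[order], W[:, order]

# Three-component representation: each standardized rate is approximated
# by its loadings on p1, p2 and p3.
W3 = W[:, :3]
P3 = Z @ W3
Z_hat = P3 @ W3.T

explained = values[:3].sum() / n
fit = 1.0 - ((Z - Z_hat) ** 2).sum() / (Z ** 2).sum()

# Reordering the maturities leaves the eigenvalues unchanged but permutes
# the entries of each eigenvector, so the plotted shapes are scrambled.
perm = rng.permutation(n)
values_perm = np.sort(np.linalg.eigvalsh(C[np.ix_(perm, perm)]))[::-1]
```

Here `fit`, the share of the standardized data reproduced by the three-component representation, coincides with `explained`, the share of the eigenvalue sum taken by the first three eigenvalues.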

## PCA on UK Short Spot Rates Covariance Matrix

In [11]:
cov_mx = diff_bps.cov()
covmx_eigenvalues, covmx_eigenvectors = PCA(cov_mx)

fig = px.line(covmx_eigenvectors.loc[:, :"λ3"])
fig.update_layout(legend_title="", yaxis_title="Eigenvector" , xaxis_title="Maturity", height=600, width=900,
                  title_text="Eigenvectors of the UK daily spot rate covariance matrix")
fig.show()

$$\text{Eigenvalues}$$
$$\begin{array}{lrrrrrr}
\hline
 & λ1 & λ2 & λ3 & λ4 & λ5 & λ6 \\
\hline
Eigenvalue & 509.35 & 33.95 & 5.02 & 2.53 & 0.56 & 0.17 \\
Variation Explained & 0.92 & 0.06 & 0.01 & 0.00 & 0.00 & 0.00 \\
Cumulative Variation & 0.92 & 0.98 & 0.99 & 1.00 & 1.00 & 1.00 \\
\hline
\end{array}
$$

$$\text{Eigenvectors}$$
$$
\begin{array}{lrrrrrr}
\hline
 & λ1 & λ2 & λ3 & λ4 & λ5 & λ6 \\
\hline
0.5 yr & 0.04 & -0.20 & -0.42 & 0.65 & -0.43 & 0.37 \\
1 yr & 0.08 & -0.28 & -0.35 & 0.31 & 0.26 & -0.47 \\
1.5 yr & 0.11 & -0.30 & -0.24 & -0.00 & 0.36 & -0.22 \\
2 yr & 0.13 & -0.30 & -0.15 & -0.16 & 0.27 & 0.05 \\
2.5 yr & 0.14 & -0.28 & -0.09 & -0.22 & 0.14 & 0.18 \\
\hline
\end{array}$$

**Note**: Applying PCA to the covariance matrix or to the correlation matrix can yield different results because of the effect of volatility. In particular:

1. Covariance Matrix:
The covariance matrix measures the linear relationship between variables while taking into account their respective scales. Each element in the covariance matrix represents the covariance between two variables. PCA applied to the covariance matrix aims to find the directions (principal components) along which the data exhibits the highest variance.

2. Correlation Matrix:
The correlation matrix, on the other hand, measures the standardized linear relationship between variables, eliminating the scale differences. It contains the correlation coefficients between variables, with values ranging from -1 to 1. PCA applied to the correlation matrix is equivalent to PCA on the covariance matrix of the standardized variables, so it finds the directions of maximum variance after each variable has been rescaled to unit variance.

The main difference between PCA on the covariance matrix and the correlation matrix lies in the scaling effect. When PCA is applied to the covariance matrix, variables with larger variances dominate the analysis, potentially overshadowing variables with smaller variances. In contrast, PCA on the correlation matrix gives equal weight to all variables, as they are standardized to have unit variances.
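
The scaling effect can be illustrated with a minimal sketch on two hypothetical series that have correlation 0.5 but very different volatilities (toy data; the seed, sample size and the factor of ten are assumptions). The helper `first_eigenvector` is defined here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical series with correlation 0.5 and very different volatilities.
T = 5000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
x = z * np.array([1.0, 10.0])   # second series is ten times more volatile


def first_eigenvector(matrix):
    """Eigenvector of the largest eigenvalue, with the sign fixed to be positive."""
    values, vectors = np.linalg.eigh(matrix)
    v = vectors[:, np.argmax(values)]
    return v * np.sign(v.sum())


w_cov = first_eigenvector(np.cov(x, rowvar=False))
w_corr = first_eigenvector(np.corrcoef(x, rowvar=False))

print("covariance PCA, first eigenvector: ", w_cov.round(3))
print("correlation PCA, first eigenvector:", w_corr.round(3))
```

In this toy setting the first eigenvector of the covariance matrix loads almost entirely on the more volatile series, while the first eigenvector of the correlation matrix weights both series equally.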

Clearly, using PCA considerably simplifies factor model analysis for interest rates: when
we need to model 60 × 61/2 = 1830 variances and covariances of 60 different interest rates
we reduce the problem to finding only three variances! The factor weights (i.e. the first
three eigenvectors of the interest rate covariance matrix) can be used to retrieve the 1830
covariances of the interest rates.
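
As a minimal sketch of this last point, the covariance matrix can be approximated from the first three eigenvectors and eigenvalues as $V \approx W^{*} \Lambda^{*} W^{*\prime}$, where $W^{*}$ holds the first three eigenvectors and $\Lambda^{*}$ is the diagonal matrix of the first three eigenvalues (this notation is introduced here for illustration). The example uses 60 synthetic rates driven by three hypothetical factors, not the UK data.

```python
import numpy as np

rng = np.random.default_rng(4)

# 60 synthetic "rates" driven by three hypothetical factors plus small noise.
n, k, T = 60, 3, 2000
m = np.linspace(-1.0, 1.0, n)
loadings = np.vstack([np.ones(n), m, m ** 2])
factors = rng.normal(size=(T, k)) * np.array([5.0, 2.0, 0.8])
changes = factors @ loadings + 0.1 * rng.normal(size=(T, n))

V = np.cov(changes, rowvar=False)
n_params_full = n * (n + 1) // 2     # distinct variances and covariances: 1830

values, W = np.linalg.eigh(V)
order = np.argsort(values)[::-1]
values, W = values[order], W[:, order]

# Rebuild the full covariance matrix from three eigenvalues and eigenvectors.
W_k = W[:, :k]
V_approx = W_k @ np.diag(values[:k]) @ W_k.T

rel_error = np.linalg.norm(V - V_approx) / np.linalg.norm(V)
print(n_params_full, f"{rel_error:.2e}")
```

The three retained eigenvalues are the variances of the three principal components, which are uncorrelated with each other, so no covariances between components need to be modelled.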