<a href="https://colab.research.google.com/github/hiswaps/eurswaps_pca/blob/main/swap_pca_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PCA Analysis of Swap Rates (EUR ESTR)**

This notebook documents the PCA of **ESTR** swap rates. The only input to the PCA is the time series of swap rates across tenors (1Y to 50Y).

The application was developed by applying a similar procedure to two datasets: EUR ESTR swap rates and EUR 6M EURIBOR swap rates.



# **Table of Contents**

1.   Importing External Libraries
2.   Fetching Swap Rates From the CSV files
3.   Calculating the Daily Swap Rate Change (in bps)
4.   Computing the Covariance Matrix + PCA
5.   Plotting the PCs across Tenors
6.   Calculating the actual values of the Principal Components
7.   Calculating the Expected Daily Change in Swap Rates (change implied by our PCA)
8.   Calculating the PCA residuals and identifying Relative Value opportunities



# **1. Importing external libraries**

In [1]:
# Importing external libraries

import plotly.express as px
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn import preprocessing

# **2. Fetching Swap Rates From the CSV files**



*   Storing the data for swap rates (CSV file --> Pandas DataFrame)
*   Plotting the time series of ESTR swap rates


In [2]:
# Timeseries Data Collection - reading from the CSV

## ESTR SWAP RATES - df_estr

estr_tenors = [1,2,3,4,5,6,7,8,9,10,11,12,15,20,30,40,50]

data = pd.read_csv('pcadata.csv')
data.columns = ['date','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','11Y','12Y','15Y','20Y','30Y','40Y','50Y']

data = data.set_index('date')

df_estr = data.copy(deep=True)
df_estr = df_estr[:-1]

combined_data = df_estr.copy(deep=True)

#ESTR swap rates plot

fig = px.line(df_estr,title = 'ESTR Swap Rates - BBG',
              width=1000,
              height=800,
              template="plotly_dark",
              labels={'value':'Swap Rates (in %)','date':"Date"})
fig.show()

**Quick analysis of the swap rates using the describe function**

In [3]:
df_estr.describe()

statistic,1Y,2Y,3Y,4Y,5Y,6Y,7Y,8Y,9Y,10Y,11Y,12Y,15Y,20Y,30Y,40Y,50Y
count,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0,644.0
mean,-0.49669,-0.408536,-0.351397,-0.306167,-0.262699,-0.216726,-0.16739,-0.115187,-0.061775,-0.00887,0.041465,0.088329,0.200417,0.280594,0.249949,0.192864,0.13733
std,0.237433,0.419338,0.487554,0.517532,0.531051,0.536561,0.539809,0.543333,0.546542,0.548848,0.549843,0.549337,0.540862,0.512466,0.477103,0.465898,0.457365
min,-0.757,-0.798,-0.789,-0.777,-0.758,-0.733,-0.7074,-0.6783,-0.6473,-0.6144,-0.5795,-0.5474,-0.4603,-0.3873,-0.471,-0.5486,-0.627
25%,-0.585425,-0.605,-0.601975,-0.587625,-0.562375,-0.532775,-0.4977,-0.45585,-0.412,-0.36625,-0.323,-0.281125,-0.178,-0.09585,-0.115775,-0.1665,-0.211625
50%,-0.57245,-0.57525,-0.55175,-0.516,-0.467,-0.404,-0.34415,-0.2785,-0.2104,-0.152,-0.0934,-0.0385,0.0955,0.1867,0.17825,0.1225,0.074
75%,-0.54495,-0.46875,-0.3905,-0.33375,-0.2905,-0.241325,-0.186225,-0.12375,-0.05495,-0.00175,0.0537,0.10785,0.23875,0.35355,0.372175,0.33625,0.29525
max,0.9683,1.733,1.968,2.048,2.097,2.142,2.184,2.233,2.2828,2.328,2.342,2.388,2.4335,2.3628,2.1923,2.07,2.002


# **3. Calculating the Daily Swap Rate Change (in bps)**

*  Create a copy of the original data
*  Calculate the daily change in swap rates
*  Drop any NaN/infinite values (as this might affect the analysis)
*  Describe the new dataset of returns
*  Plot the daily change in swap rates as a line graph

In [4]:
df = df_estr.copy(deep=True) #creating a dataframe copy of the swap rates
returns = (df - df.shift(1))*100 # calculating the daily change (in bps)

# Removing and replacing erratic values (if any)
returns.replace([np.inf, -np.inf], np.nan, inplace=True)
returns = returns.dropna(axis=0)

# Quick look at the new dataframe
returns

date,1Y,2Y,3Y,4Y,5Y,6Y,7Y,8Y,9Y,10Y,11Y,12Y,15Y,20Y,30Y,40Y,50Y
1/2/2020,-0.40,-0.85,-1.60,-2.70,-3.16,-3.40,-3.60,-3.69,-3.70,-4.04,-4.20,-4.10,-4.40,-4.49,-4.88,-4.94,-3.4
1/3/2020,-0.75,-1.90,-2.25,-2.90,-3.54,-4.15,-4.49,-4.81,-4.94,-5.15,-5.19,-5.40,-5.50,-5.57,-5.53,-5.44,-5.2
1/6/2020,-0.25,0.10,-0.50,-0.70,-1.02,-0.85,-0.79,-0.75,-0.77,-0.75,-0.71,-0.65,-0.70,-0.53,-0.48,-0.40,-0.5
1/7/2020,-0.15,-0.80,-0.49,-0.70,-0.72,-0.90,-0.84,-0.75,-0.69,-0.50,-0.50,-0.50,-0.32,-0.23,0.10,0.14,0.5
1/8/2020,0.20,0.80,1.54,2.40,3.14,3.59,3.62,3.60,3.85,3.80,4.00,4.10,4.32,4.62,4.95,5.16,5.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6/14/2022,16.73,16.20,18.65,18.12,17.60,18.97,20.22,21.20,22.10,22.80,22.00,23.40,23.80,22.20,20.70,19.04,21.6
6/15/2022,-12.43,-22.00,-25.23,-20.92,-17.92,-17.57,-17.01,-16.87,-16.98,-16.37,-13.31,-14.92,-14.05,-12.98,-11.63,-10.20,-8.0
6/16/2022,5.30,6.80,7.02,7.62,8.72,8.77,7.91,7.67,7.30,6.47,6.01,6.42,5.50,4.70,3.00,2.05,-2.9
6/17/2022,-6.70,-9.18,-8.15,-9.75,-11.45,-11.00,-10.30,-10.55,-10.40,-9.80,-9.50,-9.10,-7.45,-5.20,-3.20,-2.70,-0.4
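
The differencing step can be checked on a toy example. The sketch below uses made-up rates (the frame `toy_rates` is hypothetical, not the notebook's data) and shows that `DataFrame.diff()` gives the same result as `df - df.shift(1)`, with the first row dropped because it has no previous day.

```python
import pandas as pd

# Hypothetical swap rates in percent (illustrative values only)
toy_rates = pd.DataFrame(
    {"1Y": [-0.50, -0.52, -0.49], "10Y": [0.10, 0.15, 0.12]},
    index=["d1", "d2", "d3"],
)

# Daily change in basis points: 1 percentage point = 100 bps
toy_changes_shift = (toy_rates - toy_rates.shift(1)) * 100
toy_changes_diff = toy_rates.diff() * 100

# The first row has no previous day, so it is NaN and gets dropped
toy_changes = toy_changes_diff.dropna(axis=0)
print(toy_changes.round(2))
```

Either form works; `diff()` is simply the more compact way to express the same operation.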


**Quick look at the basic stats of the 'returns' dataframe**

In [5]:
returns.describe()

statistic,1Y,2Y,3Y,4Y,5Y,6Y,7Y,8Y,9Y,10Y,11Y,12Y,15Y,20Y,30Y,40Y,50Y
count,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0,643.0
mean,0.217496,0.322628,0.351322,0.361586,0.366376,0.363764,0.360809,0.357854,0.355365,0.35224,0.350078,0.346314,0.333281,0.303437,0.271026,0.25888,0.258165
std,1.790666,2.945722,3.284638,3.297219,3.387304,3.414127,3.449729,3.478567,3.518691,3.568888,3.540743,3.551725,3.569969,3.626822,3.795261,3.86898,4.021793
min,-12.43,-22.0,-25.23,-20.92,-17.92,-17.57,-17.01,-16.87,-16.98,-16.37,-13.9,-14.92,-14.05,-14.98,-18.45,-19.48,-19.8
25%,-0.28,-0.5,-0.66,-0.85,-1.1,-1.2,-1.35,-1.45,-1.515,-1.5,-1.6,-1.6,-1.7,-1.785,-1.83,-1.96,-2.0
50%,0.0,0.1,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.03,0.2
75%,0.35,0.8,1.01,1.25,1.575,1.7,1.875,1.91,2.0,2.1,2.125,2.135,2.16,2.21,2.335,2.395,2.4
max,16.73,19.71,22.15,20.78,19.5,18.97,20.22,21.2,22.1,22.8,22.0,23.4,23.8,22.2,21.53,21.44,21.6


**Plotting a line graph for the daily change in swap rates**

In [6]:
fig = px.line(returns,
              title = 'Daily ESTR Swap Rate Change (in bps)',
              width=1000,
              height=800,
              template="plotly_dark")
fig.show()

# **4. Computing the Covariance Matrix + PCA**
_
***
**Note:** "It is important to use the covariance rather than correlation matrix since the difference in volatility (sensitivity) is a key element of the analysis and must not be netted out by using correlations." - Fixed Income Relative Value Analysis by Doug Huggins
***
_

*  Compute the covariance matrix
*  Apply the PCA function to fit and transform the data
*  Fetch the PCA data for explained variance of the Principal Components 
*  Plotting the 'Scree Plot' to identify the 'contribution' of each principal component to the variance

In [7]:
# COMPUTING THE COVARIANCE MATRIX AND PERFORMING PCA

cov_matrix = returns.cov()

# PCA (fit & transform)

pca = PCA()
pca.fit_transform(cov_matrix)

# Explained variance

per_var = np.round(pca.explained_variance_ratio_*100,decimals=2)
labels = ['PC'+str(x) for x in range(1,len(per_var)+1)]
raw_bars = pd.DataFrame(per_var,index=labels) # quick dataframe to enable easy plotting of % variance explained by the principal components

# Plotting the graph

fig = px.bar(raw_bars[:8],
             title = '% of Explained Variance by PCs',
             width=600,
             height=500,
             labels={
                     "index": "Principal Component",
                     "value": "Percentage of Explained Variance"},
             template="plotly_dark"
             )
fig.update_layout(showlegend=False)
fig.show()
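
A covariance-based PCA can also be written as an eigen-decomposition of the covariance matrix of the daily changes. The sketch below, on synthetic data (the frame `demo_returns` and its three tenors are assumptions for illustration), compares that route with fitting scikit-learn's `PCA` directly on the changes rather than on the covariance matrix. It is an illustrative cross-check, not necessarily the procedure behind the figures shown in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic daily changes (bps) for three hypothetical tenors, driven by one
# common "level" factor plus small idiosyncratic noise
level = rng.normal(0.0, 3.0, size=500)
demo_returns = pd.DataFrame({
    "2Y": 0.6 * level + rng.normal(0.0, 0.5, size=500),
    "5Y": 0.9 * level + rng.normal(0.0, 0.5, size=500),
    "10Y": 1.0 * level + rng.normal(0.0, 0.5, size=500),
})

# Route 1: eigen-decomposition of the covariance matrix
demo_cov = demo_returns.cov().to_numpy()
eigvals, eigvecs = np.linalg.eigh(demo_cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]             # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained_eig = eigvals / eigvals.sum()

# Route 2: scikit-learn PCA fitted on the changes themselves
demo_pca = PCA().fit(demo_returns)

print(np.round(explained_eig * 100, 2))
print(np.round(demo_pca.explained_variance_ratio_ * 100, 2))
```

Both routes should report the same percentages of explained variance, and the eigenvectors should match the rows of `components_` up to sign.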

# **Notes on PCA**

* Principal Component Analysis (PCA) quantifies movements in a specific market and represents them as a combination of two to three driving factors, called principal components (PCs).

* When analyzing yield curves, the movements in the yield curve can be expressed in terms of three driving factors: *level, slope, and curvature* (first, second, and third principal component).

* PCA formalizes this viewpoint and allows us to evaluate when a sector of the yield curve has cheapened or richened beyond that prescribed by recent yield movements.

_

**IMPORTANT:** *The first 3 principal components explain more than 90% of the variance in the daily changes, so we use only these 3 principal components in the rest of the analysis.*

* We do this because PCA decomposes the market into uncorrelated factors that are easy to interpret.

* In essence, we have reduced the dimensionality of the original data and can now express a view on the market simply by taking a view on any given factor.

* This can allow us to construct trades/portfolios that are exposed to or hedged against any factor.

_

**How is it done?**

- An entire sample’s information can be represented in terms of structural changes – captured by the PCs – and noise.

- If this noise is significantly different from zero, it highlights a possible dislocation within the dataset.

- This can be interpreted in conjunction with market views to see if there is an actionable trade opportunity. 


_
****
****
_

The values obtained so far (`pca.components_`) are not the principal components themselves; they describe how much each tenor moves per unit change in a principal component. Mathematically, these values are the **eigenvectors**, commonly referred to as 'factor loadings'.

*  We will now store these factor loadings (for each tenor) in a new dataframe 

In [8]:
rands = pd.DataFrame({'PC1':pca.components_[0],'PC2':pca.components_[1],'PC3':pca.components_[2]}, index=cov_matrix.index)
rands

tenor,PC1,PC2,PC3
1Y,0.019748,-0.160429,0.130569
2Y,0.025191,-0.353682,0.369785
3Y,-0.007867,-0.392218,0.369532
4Y,-0.06146,-0.362651,0.214571
5Y,-0.109638,-0.329415,0.084078
6Y,-0.145103,-0.289412,-0.012319
7Y,-0.177895,-0.247613,-0.089858
8Y,-0.20175,-0.212105,-0.139955
9Y,-0.224275,-0.178245,-0.186849
10Y,-0.246721,-0.144192,-0.223637


# **5. Plotting the PCs across Tenors**



*   Plotting the principal components (1-3) across the tenors
*   Interpreting the curve


In [9]:
# Plotting the PCs across tenors

fig_pca = px.line(rands,
                  title = 'PCs across Tenors',
                  width=800,
                  height=700,
                  labels={"value":"Change in Yield","index":"Tenor"},
                  template="plotly_dark",
                  markers=True)
fig_pca.show()

# Interpreting the Curve:

We know that the first three components can be referred to as: 

* PC1: Level
* PC2: Slope
* PC3: Curvature

From the graph above, we can interpret these terms in a fairly intuitive fashion by looking at changes in sign of the principal component loadings.

_ 

* **PC1** has the same sign for every maturity from the 3yr point onward, so these rates move up or down together under the first principal component (level). In the latest fit, the 1yr and 2yr loadings have the opposite sign, so those rates move in the opposite direction.

* **PC2** has one change in sign, so the shorter maturity rates move in the opposite direction to the longer rates under the second principal component (slope).

* **PC3** has two changes in sign, so the shortest and longest maturities move in the same direction, whilst the middle maturities move in the opposite direction (curvature).
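
The sign-change reading described above can be made mechanical. The sketch below defines a hypothetical helper, `count_sign_changes`, and applies it to made-up loading shapes (not the fitted loadings) to show the 0 / 1 / 2 pattern associated with level, slope, and curvature.

```python
import numpy as np

def count_sign_changes(loadings):
    """Count how often consecutive non-zero loadings switch sign."""
    signs = np.sign(np.asarray(loadings, dtype=float))
    signs = signs[signs != 0]            # ignore exact zeros
    return int(np.sum(signs[1:] != signs[:-1]))

# Hypothetical loading shapes across five tenors (illustrative values only)
level_like = [0.40, 0.45, 0.45, 0.46, 0.47]
slope_like = [-0.60, -0.30, 0.05, 0.40, 0.62]
curvature_like = [0.50, -0.10, -0.65, -0.15, 0.55]

for name, vec in [("level", level_like),
                  ("slope", slope_like),
                  ("curvature", curvature_like)]:
    print(name, count_sign_changes(vec))
```

The same helper could be applied to each column of the loadings dataframe, for example `count_sign_changes(rands['PC2'])`.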

_
******
******



# **6. Calculating the actual values of the Principal Components**



*   So far we have looked at the loadings, i.e. how each tenor responds to a change in a principal component.
*   We would also like to know the value each principal component takes on each day.
*   Each principal component value is a linear combination of that day's swap rate changes, weighted by the loadings.
*   We can calculate this across the entire time series by computing the dot product of the changes and the loadings.


In [10]:
# actual values of Principal Components

tas = returns.copy(deep=True)
pcas = np.dot(tas,rands)

# Storing the values in a dataframe

pca_df = pd.DataFrame(pcas,columns=['PC1','PC2','PC3'], index=tas.index)

# Combining the dataframes -- change in swap rates + PCA (will allow us to plot the data easily)
tas = tas.join(pca_df)
pca_df

date,PC1,PC2,PC3
1/2/2020,14.246990,4.241694,0.399260
1/3/2020,17.751976,5.543449,0.011504
1/6/2020,2.202568,1.396701,0.162580
1/7/2020,0.963819,2.098169,0.367624
1/8/2020,-14.959936,-3.482979,0.590579
...,...,...,...
6/14/2022,-73.648739,-38.130250,7.445778
6/15/2022,46.855295,46.141013,-11.525573
6/16/2022,-15.944026,-20.666772,-1.888978
6/17/2022,23.135690,26.157118,1.460594
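
One detail worth knowing when comparing a plain dot product with scikit-learn's own projection: `PCA.transform` subtracts the fitted column means before projecting, whereas a dot product of the raw changes does not. The sketch below, on synthetic data (`demo_changes` is a made-up frame), shows that the two differ only by a constant offset per component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic daily changes (bps) for four hypothetical tenors
demo_changes = pd.DataFrame(rng.normal(0.0, 2.0, size=(200, 4)),
                            columns=["2Y", "5Y", "10Y", "30Y"])

demo_pca = PCA(n_components=3).fit(demo_changes)
demo_loadings = demo_pca.components_.T          # shape (4 tenors, 3 PCs)

# Scores as a plain dot product of the (uncentred) changes and the loadings
scores_dot = demo_changes.to_numpy() @ demo_loadings

# scikit-learn subtracts the fitted column means before projecting
scores_sklearn = demo_pca.transform(demo_changes)
mean_shift = demo_pca.mean_ @ demo_loadings     # constant offset per PC

print(np.allclose(scores_dot - mean_shift, scores_sklearn))
```

When the mean daily change is small relative to its volatility, the offset is small, but it is not exactly zero.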


**Plotting the actual values of the Principal Components**

In [11]:
fig_pca1 = px.line(tas[tas.columns[-3:]],
                   title = 'Principal Components (Actual Values)',
                   width=800,
                   height=700,
                   labels={"value":"Values","date":"Time"},
                   template='plotly_dark')
fig_pca1.show()

# **7. Calculating the Expected Daily Change in Swap Rates (change implied by our PCA)**

*   PCA is used to represent the original data as a function of a reduced number of factors.
*   In our case that means each change in yield for a chosen swap tenor is a function of three factors.
*   So, for example, on any given day the change in the 10yr swap rate is given by its loadings (the dataframe 'rands') times the principal components (the dataframe 'pca_df').
*   We can calculate these implied daily changes across the entire time series for all tenors by simply computing the dot product of 'pca_df' and 'rands.T' (transpose matrix of the eigenvectors)
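
In matrix form, and using notation introduced here only for illustration (the symbols are not taken from the notebook): let $\Delta y_t$ be the vector of daily changes across the $n$ tenors on day $t$, let $E = [e_1, e_2, e_3]$ be the $n \times 3$ matrix of loadings ('rands'), and let $p_t$ be the vector of the three principal component values on day $t$ (a row of 'pca_df'). Then

$$
p_t = E^{\top} \Delta y_t, \qquad \widehat{\Delta y}_t = E\,p_t = E E^{\top} \Delta y_t ,
$$

and for a single tenor $i$ (for example 10Y):

$$
\widehat{\Delta y}_{t,i} = e_{1,i}\,p_{t,1} + e_{2,i}\,p_{t,2} + e_{3,i}\,p_{t,3}.
$$

Stacking all days as rows gives the dot product of 'pca_df' and 'rands.T'.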





In [12]:
# Calculating the expected changes

expected_change = np.dot(pca_df,rands.T) # we use the transpose matrix rands.T to enable matrix multiplication
expected_changes = pd.DataFrame(expected_change,index = pca_df.index, columns=returns.columns)
expected_changes

date,1Y,2Y,3Y,4Y,5Y,6Y,7Y,8Y,9Y,10Y,11Y,12Y,15Y,20Y,30Y,40Y,50Y
1/2/2020,-0.347009,-0.993669,-1.628208,-2.328198,-2.925721,-3.299798,-3.620642,-3.829897,-4.025910,-4.215945,-4.243689,-4.325180,-4.410999,-4.432014,-4.420991,-4.329067,-4.079119
1/3/2020,-0.537260,-1.509167,-2.309642,-3.098897,-3.771420,-4.180349,-4.531650,-4.758870,-4.971571,-5.181686,-5.210249,-5.295804,-5.400207,-5.458738,-5.501119,-5.433823,-5.288641
1/6/2020,-0.159347,-0.378382,-0.505060,-0.606998,-0.687910,-0.725824,-0.752276,-0.763370,-0.773315,-0.781173,-0.760509,-0.755326,-0.712410,-0.634407,-0.527787,-0.465843,-0.353567
1/7/2020,-0.269573,-0.581863,-0.694674,-0.741257,-0.765931,-0.751617,-0.724026,-0.690935,-0.658840,-0.622549,-0.562943,-0.526757,-0.392145,-0.183996,0.089947,0.224378,0.455931
1/8/2020,0.340452,1.073393,1.702012,2.309257,2.837178,3.171474,3.470659,3.674274,3.865618,4.061079,4.127442,4.219829,4.412287,4.642998,4.925000,4.998606,5.155865
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6/14/2022,5.634963,14.384011,18.286209,19.952029,21.261411,21.630291,21.874216,21.904202,21.922888,22.003655,21.661865,21.474253,21.072610,20.768143,20.500342,20.417298,21.829426
6/15/2022,-7.981936,-19.400888,-22.725012,-22.085828,-21.305721,-20.010635,-18.724766,-17.626758,-16.579359,-15.635842,-14.731858,-13.903917,-12.308716,-10.852327,-9.320474,-9.055128,-11.375884
6/16/2022,2.754042,6.209298,7.533276,8.069414,8.397202,8.318010,8.123462,7.864618,7.612560,7.336167,6.834816,6.530352,5.442961,3.805070,1.667412,0.627798,-0.967098
6/17/2022,-3.548765,-8.128377,-9.901567,-10.594408,-11.030303,-10.945238,-10.723809,-10.420115,-10.124058,-9.806368,-9.211833,-8.847736,-7.585237,-5.717074,-3.292717,-2.124032,-0.470004


# **8. Calculating the PCA residuals and identifying Relative Value opportunities**

* We calculate the PCA residuals as the difference between the actual daily change in each swap rate and the change implied by the PCA model

* Relative value opportunities can be spotted by plotting the residuals

In [13]:
# Calculating the residuals

df_residuals = returns - expected_changes
df_residuals

date,1Y,2Y,3Y,4Y,5Y,6Y,7Y,8Y,9Y,10Y,11Y,12Y,15Y,20Y,30Y,40Y,50Y
1/2/2020,-0.052991,0.143669,0.028208,-0.371802,-0.234279,-0.100202,0.020642,0.139897,0.325910,0.175945,0.043689,0.225180,0.010999,-0.057986,-0.459009,-0.610933,0.679119
1/3/2020,-0.212740,-0.390833,0.059642,0.198897,0.231420,0.030349,0.041650,-0.051130,0.031571,0.031686,0.020249,-0.104196,-0.099793,-0.111262,-0.028881,-0.006177,0.088641
1/6/2020,-0.090653,0.478382,0.005060,-0.093002,-0.332090,-0.124176,-0.037724,0.013370,0.003315,0.031173,0.050509,0.105326,0.012410,0.104407,0.047787,0.065843,-0.146433
1/7/2020,0.119573,-0.218137,0.204674,0.041257,0.045931,-0.148383,-0.115974,-0.059065,-0.031160,0.122549,0.062943,0.026757,0.072145,-0.046004,0.010053,-0.084378,0.044069
1/8/2020,-0.140452,-0.273393,-0.162012,0.090743,0.302822,0.418526,0.149341,-0.074274,-0.015618,-0.261079,-0.127442,-0.119829,-0.092287,-0.022998,0.025000,0.161394,-0.055865
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6/14/2022,11.095037,1.815989,0.363791,-1.832029,-3.661411,-2.660291,-1.654216,-0.704202,0.177112,0.796345,0.338135,1.925747,2.727390,1.431857,0.199658,-1.377298,-0.229426
6/15/2022,-4.448064,-2.599112,-2.504988,1.165828,3.385721,2.440635,1.714766,0.756758,-0.400641,-0.734158,1.421858,-1.016083,-1.741284,-2.127673,-2.309526,-1.144872,3.375884
6/16/2022,2.545958,0.590702,-0.513276,-0.449414,0.322798,0.451990,-0.213462,-0.194618,-0.312560,-0.866167,-0.824816,-0.110352,0.057039,0.894930,1.332588,1.422202,-1.932902
6/17/2022,-3.151235,-1.051623,1.751567,0.844408,-0.419697,-0.054762,0.423809,-0.129885,-0.275942,0.006368,-0.288167,-0.252264,0.135237,0.517074,0.092717,-0.575968,0.070004
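
One possible way to screen residuals numerically, alongside the plots, is to scale the latest residual by each tenor's historical residual volatility. The sketch below is an assumption-laden illustration: the data are synthetic, and the `threshold` of 2 standard deviations is a hypothetical cut-off rather than a value used in this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Synthetic residuals (bps) for three hypothetical tenors over 250 days
demo_residuals = pd.DataFrame(rng.normal(0.0, 0.5, size=(250, 3)),
                              columns=["2Y", "10Y", "30Y"])
demo_residuals.iloc[-1, 1] = 3.0   # plant an outsized 10Y residual on the last day

# Z-score of the latest residual against each tenor's own history
history = demo_residuals.iloc[:-1]
z_latest = (demo_residuals.iloc[-1] - history.mean()) / history.std()

threshold = 2.0                    # assumed cut-off, not taken from the notebook
flagged = z_latest[z_latest.abs() > threshold]
print(flagged.round(2))
```

A flagged tenor is only a starting point; it still needs to be read together with market views before it becomes a trade idea.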


**Plotting the PCA residuals (for the last data point)**



In [14]:
last_index = df_residuals.index[-1]
fig_bar = px.bar(df_residuals.T[last_index],
                 width=900,
                 height=700,
                 title="PCA Residuals on "+str(last_index),
                 labels={"value":"Residual (in bps)","index":"Tenors"},
                 template="plotly_dark")
fig_bar.show()

**Plotting a series to observe the mean reverting behaviour of the PCA residuals**

In [15]:
fig_residual_series = px.line(df_residuals,
                              width=1000,
                              height=700,
                              title="PCA Residuals Over Time",
                              labels={"value":"Residual (in bps)","date":"Date"},
                              template="plotly_dark")

fig_residual_series.add_hline(y=0, line_width=2, line_dash="dash", line_color="red")