# PCA Decomposition

Principal Component Analysis (PCA) aims to reduce the dimensionality of a dataset by finding the smallest number of variables that explain the largest proportion of the variation in the data. It does this by projecting the data, via the eigenvectors of its correlation matrix (more commonly used on financial data than a covariance matrix), onto a subspace with fewer dimensions, where all explanatory variables are orthogonal (perpendicular) to each other, i.e. there is no multicollinearity. These are <b>statistical properties</b> and do not necessarily have an economic interpretation.
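
As a minimal sketch of the orthogonality property described above, the snippet below uses synthetic data (the array `X` and its factor structure are assumptions made purely for illustration, not the yield data) and checks that the projected variables are uncorrelated with each other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations of 3 correlated variables
# driven by one common factor plus noise.
common = rng.normal(size=(500, 1))
X = common @ np.array([[1.0, 0.8, 0.6]]) + 0.3 * rng.normal(size=(500, 3))

# Standardise into z-scores (sample standard deviation, ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Project the standardised data onto the eigenvectors.
scores = Z @ eigvecs

# The covariance matrix of the scores is diagonal (no multicollinearity),
# with the eigenvalues on the diagonal.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The off-diagonal entries are zero up to floating-point error, which is what "orthogonal explanatory variables" means in practice.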

For this analysis I will use the 10-year yield for the US and UK to see if we can find an estimate for the long-term interest rate/term premium. It is commonly known that:

<b>PC1:</b> constant ~ long term interest rate ~ R*

<b>PC2:</b> slope ~ term premia

<b>PC3:</b> curvature
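
To illustrate where the constant and slope interpretations come from, here is a small sketch on a synthetic yield curve. The factor volatilities, the noise level and the variable names are all assumptions chosen for the example; only the maturity grid (0.5 to 10 years in half-year steps) mirrors the data used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(5)
maturities = np.arange(0.5, 10.5, 0.5)  # 20 maturities
# Linear tilt running from -1 (shortest maturity) to +1 (longest maturity).
tilt = (maturities - maturities.mean()) / (maturities.max() - maturities.mean())

n = 1000
level = rng.normal(scale=1.0, size=(n, 1))   # assumed parallel shifts
slope = rng.normal(scale=0.3, size=(n, 1))   # assumed steepening / flattening
yields = level + slope * tilt + 0.05 * rng.normal(size=(n, 20))

corr = np.corrcoef(yields, rowvar=False)
vals, vecs = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order, so the largest are last.
pc1, pc2 = vecs[:, -1], vecs[:, -2]

# Eigenvector signs are arbitrary; fix them for display.
pc1 = pc1 * np.sign(pc1.sum())
pc2 = pc2 * np.sign(pc2[-1])

print(np.round(pc1, 3))  # loadings on PC1 across maturities
print(np.round(pc2, 3))  # loadings on PC2 across maturities
```

In this toy setup the PC1 loadings all share the same sign (a roughly constant shift across maturities), while the PC2 loadings have opposite signs at the short and long end (a slope). Real curves need not separate this cleanly.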

For more detail, see the PCA section of <i>"Market Risk Analysis II, Practical Financial Econometrics"</i> by Carol Alexander.

There is also a good paper here which uses PCA and macroeconomic variables: https://pdfs.semanticscholar.org/8736/5855217edbc53e5e29c5c5872db7efb907cc.pdf

In this code I perform PCA manually, but scikit-learn provides a `PCA` class that automates the process:

https://github.com/scikit-learn/scikit-learn/blob/b194674c4/sklearn/decomposition/_pca.py#L104
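
As a rough sketch of how the scikit-learn route relates to the manual one, the snippet below runs both on synthetic data (the array `X` is a hypothetical stand-in for the yields) and compares the results. It assumes the data are standardised beforehand, since `PCA` centres the data but does not scale it.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Hypothetical stand-in for the yields: 300 observations of 4 correlated series.
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Manual route: eigen-decomposition of the correlation matrix,
# sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# scikit-learn route.
pca = PCA(n_components=4)
scores = pca.fit_transform(Z)

# Eigenvalues correspond to explained_variance_; eigenvectors correspond
# to the rows of components_, up to an arbitrary sign per component.
same_values = np.allclose(pca.explained_variance_, eigvals)
same_vectors = np.allclose(np.abs(pca.components_), np.abs(eigvecs.T), atol=1e-6)
print(same_values, same_vectors)
```

Note that `components_` stores one component per row, whereas `np.linalg.eig` and `np.linalg.eigh` return one eigenvector per column.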

A bit more detail on how to do PCA in Python: http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html

## 1. Import and clean data

In [9]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

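# Load the spot-curve sheet from the workbook
df = pd.read_excel("GLC Nominal month end data_1970 to 2015.xlsx",
                   index_col=0, header=3, sheet_name="4. spot curve", skiprows=[4])

# Keep the first 20 columns (maturities 0.5 to 10 years) as floats
df = df.iloc[:, 0:20].astype("float64")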

In [11]:
df = df.dropna(how="any")

In [30]:
# Standardise the data in the df into z scores
df_std = ((df-df.mean()) / df.std())

# Correlation matrix: the covariance of the standardised data
# equals the correlation of the original data

corr_matrix_array = np.cov(df_std, rowvar=False)
corr_matrix_array

array([[1.        , 0.99635313, 0.99034992, 0.98390992, 0.97768224,
        0.97186217, 0.96646558, 0.96142656, 0.95665024, 0.95204292,
        0.94752707, 0.94304904, 0.93857538, 0.93408433, 0.92956163,
        0.92499726, 0.92038309, 0.91571145, 0.91097466, 0.90616492],
       [0.99635313, 1.        , 0.99819358, 0.99437946, 0.98991187,
        0.98535628, 0.98092243, 0.97665237, 0.97251405, 0.96845206,
        0.96441336, 0.96035959, 0.95626709, 0.95212146, 0.94791438,
        0.94364076, 0.93929657, 0.93487753, 0.93037868, 0.9257943 ],
       [0.99034992, 0.99819358, 1.        , 0.99888678, 0.99642004,
        0.99338327, 0.99013468, 0.986815  , 0.98345916, 0.98005839,
        0.97659259, 0.97304627, 0.96941179, 0.96568676, 0.96187199,
        0.95796929, 0.95397987, 0.94990335, 0.9457376 , 0.94147881],
       [0.98390992, 0.99437946, 0.99888678, 1.        , 0.99928341,
        0.99762753, 0.99548459, 0.99306626, 0.99046001, 0.98769611,
        0.98478344, 0.98172757, 0.9785362 , 0

In [31]:
df_std

years:,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0,9.5,10.0
1970-07-31,0.046342,0.137478,0.114726,0.075453,0.040053,0.011046,-0.011945,-0.029681,-0.042951,-0.052381,-0.058423,-0.061422,-0.061661,-0.059374,-0.054744,-0.047923,-0.039040,-0.028211,-0.015542,-0.001131
1970-08-31,0.052648,0.089836,0.084056,0.064191,0.042049,0.021638,0.004160,-0.010129,-0.021270,-0.029342,-0.034449,-0.036721,-0.036306,-0.033345,-0.027955,-0.020241,-0.010302,0.001768,0.015877,0.031936
1970-09-30,0.070268,0.094800,0.076648,0.052209,0.028983,0.008886,-0.007734,-0.021033,-0.031266,-0.038647,-0.043350,-0.045530,-0.045331,-0.042874,-0.038262,-0.031586,-0.022931,-0.012381,-0.000020,0.014073
1970-10-31,0.107300,0.167124,0.168778,0.143970,0.112772,0.082847,0.056698,0.034964,0.017657,0.004651,-0.004281,-0.009432,-0.011122,-0.009623,-0.005151,0.002111,0.012003,0.024380,0.039108,0.056058
1971-01-31,0.066607,0.068937,0.061040,0.056791,0.057549,0.062095,0.069180,0.077821,0.087419,0.097681,0.108503,0.119895,0.131914,0.144612,0.158027,0.172179,0.187077,0.202720,0.219093,0.236177
1971-02-28,0.114194,0.117087,0.098703,0.077304,0.061957,0.053468,0.050838,0.052643,0.057596,0.064861,0.073969,0.084675,0.096848,0.110406,0.125291,0.141455,0.158860,0.177466,0.197238,0.218137
1971-03-31,0.023021,0.011294,-0.017238,-0.039340,-0.052139,-0.057141,-0.056268,-0.051307,-0.043585,-0.033918,-0.022783,-0.010452,0.002913,0.017198,0.032323,0.048218,0.064827,0.082095,0.099972,0.118407
1971-04-30,-0.185425,-0.096257,-0.094306,-0.106638,-0.116254,-0.120085,-0.118476,-0.112605,-0.103444,-0.091676,-0.077798,-0.062170,-0.045040,-0.026587,-0.006947,0.013767,0.035461,0.058050,0.081455,0.105603
1971-05-31,-0.245589,-0.276457,-0.280707,-0.253627,-0.215711,-0.175549,-0.136240,-0.098710,-0.063094,-0.029288,0.002948,0.033910,0.063847,0.092946,0.121348,0.149161,0.176462,0.203306,0.229730,0.255758
1971-08-31,-0.342786,-0.350320,-0.317584,-0.277175,-0.238317,-0.202887,-0.170712,-0.141339,-0.114345,-0.089342,-0.065867,-0.043461,-0.021779,-0.000568,0.020364,0.041158,0.061918,0.082718,0.103611,0.124632
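
The standardisation step above relies on the covariance matrix of z-scores being the correlation matrix of the raw data. A quick sketch on a hypothetical data frame (`df_demo` is made up for illustration) checks this, and also shows why the degrees-of-freedom convention matters:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical data frame standing in for the yields (3 maturities, 200 months).
df_demo = pd.DataFrame(rng.normal(size=(200, 3)).cumsum(axis=0),
                       columns=[0.5, 1.0, 1.5])

# pandas .std() defaults to ddof=1, the same convention np.cov uses.
z = (df_demo - df_demo.mean()) / df_demo.std()
cov_of_z = np.cov(z, rowvar=False)
corr_of_raw = df_demo.corr().to_numpy()
print(np.allclose(cov_of_z, corr_of_raw))

# With population z-scores (ddof=0) the diagonal of np.cov
# becomes n / (n - 1) instead of exactly 1.
z0 = (df_demo - df_demo.mean()) / df_demo.std(ddof=0)
diag0 = np.cov(z0, rowvar=False)[0, 0]
print(diag0, 200 / 199)
```

This is relevant if `StandardScaler` is used instead of the manual z-score, because `StandardScaler` divides by the population standard deviation (ddof=0).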


## 2. Compute the eigenvalues & eigenvectors of the correlation matrix

In [33]:
eigenvalues, eigenvectors = np.linalg.eig(corr_matrix_array)

df_eigval = pd.DataFrame(eigenvalues, index=range(1,21))
#df_eigval.to_excel("df_eigval_qe.xlsx")
eigenvalues

array([1.96608424e+01, 3.09852494e-01, 2.48898530e-02, 3.49701416e-03,
       8.05608882e-04, 1.02750935e-04, 8.59903764e-06, 1.20913460e-06,
       9.34863268e-08, 1.48779978e-08, 2.83478469e-09, 5.90106953e-10,
       1.35020416e-10, 3.38837276e-11, 7.75822684e-12, 2.55909286e-12,
       1.11736391e-12, 2.91278373e-13, 6.58629317e-14, 1.03113920e-14])
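
The eigenvalues of a correlation matrix sum to the number of variables (20 here), so dividing each one by that total gives the share of variance explained by each component. A short sketch using the values printed above:

```python
import numpy as np

# Eigenvalues copied from the output above.
eigenvalues = np.array([
    1.96608424e+01, 3.09852494e-01, 2.48898530e-02, 3.49701416e-03,
    8.05608882e-04, 1.02750935e-04, 8.59903764e-06, 1.20913460e-06,
    9.34863268e-08, 1.48779978e-08, 2.83478469e-09, 5.90106953e-10,
    1.35020416e-10, 3.38837276e-11, 7.75822684e-12, 2.55909286e-12,
    1.11736391e-12, 2.91278373e-13, 6.58629317e-14, 1.03113920e-14,
])

# Share of total variance explained by each component, and the running total.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

for k in range(3):
    print(f"PC{k + 1}: {explained[k]:.4%} (cumulative {cumulative[k]:.4%})")
```

On these numbers PC1 alone accounts for roughly 98% of the total variance, and the first three components together for more than 99.9%.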

In [18]:
eigenvectors

array([[ 2.16371103e-01, -4.76741954e-01,  5.68236802e-01,
         5.42866299e-01, -2.57846450e-01,  1.96657936e-01,
         5.29604754e-02, -1.76042848e-02,  5.40657796e-03,
         7.83010289e-04,  2.63640772e-04, -4.66589413e-05,
        -1.32071116e-05,  6.50374022e-06,  7.09774237e-07,
         2.46174479e-06,  3.84045065e-06, -1.44255157e-06,
         3.37470564e-07,  1.56078705e-07],
       [ 2.19679678e-01, -3.98920873e-01,  2.44808004e-01,
        -2.45550665e-01,  4.10235171e-01, -5.90760591e-01,
        -3.37693739e-01,  1.81819533e-01, -8.47814480e-02,
        -2.66989155e-02, -8.73960910e-03,  3.24289107e-03,
        -1.54749449e-03, -6.98409763e-04,  8.56113634e-05,
        -5.83807368e-06, -2.17344804e-05,  9.46260213e-06,
        -1.03367824e-05, -3.63023650e-06],
       [ 2.21913346e-01, -3.17187638e-01,  1.93177626e-02,
        -4.02909380e-01,  2.12709649e-01,  1.41278410e-01,
         4.73975974e-01, -4.74420782e-01,  3.63060417e-01,
         1.78736019e-01,  7.9

In [34]:
# Save output to Excel
df_eigvec = pd.DataFrame(eigenvectors, index=range(1,21))
#df_eigvec.to_excel("df_eigvec_qe.xlsx")
df_eigvec

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
1,0.216371,-0.476742,0.568237,0.542866,-0.257846,0.196658,0.05296,-0.017604,0.005407,0.000783,0.000264,-4.7e-05,-1.3e-05,7e-06,7.097742e-07,2e-06,4e-06,-1e-06,3.374706e-07,1.560787e-07
2,0.21968,-0.398921,0.244808,-0.245551,0.410235,-0.590761,-0.337694,0.18182,-0.084781,-0.026699,-0.00874,0.003243,-0.001547,-0.000698,8.561136e-05,-6e-06,-2.2e-05,9e-06,-1.033678e-05,-3.630236e-06
3,0.221913,-0.317188,0.019318,-0.402909,0.21271,0.141278,0.473976,-0.474421,0.36306,0.178736,0.079584,-0.043179,0.020995,0.009823,-0.003015089,-0.000483,8.4e-05,5.6e-05,6.697822e-05,1.888435e-05
4,0.223323,-0.245476,-0.124801,-0.329674,-0.040842,0.362993,0.173454,0.201121,-0.480982,-0.437404,-0.284676,0.201797,-0.113999,-0.059444,0.02538566,0.005959,-0.001269,-0.000851,-0.0002358033,-3.642105e-05
5,0.22419,-0.184527,-0.209822,-0.195068,-0.205588,0.262744,-0.184536,0.347153,-0.091775,0.284907,0.397047,-0.414292,0.317205,0.19018,-0.104842,-0.031374,0.009078,0.004065,0.000603157,1.114684e-06
6,0.224729,-0.131987,-0.253173,-0.061607,-0.270417,0.064608,-0.314403,0.106892,0.302599,0.29979,-0.009224,0.285547,-0.458454,-0.352053,0.2618037,0.097482,-0.035504,-0.010938,-0.001272284,0.0001458347
7,0.225064,-0.085706,-0.266431,0.048888,-0.25371,-0.111366,-0.236609,-0.169951,0.283891,-0.13641,-0.358252,0.248389,0.21452,0.36319,-0.4311498,-0.209797,0.098003,0.017406,0.002384972,-0.0002454151
8,0.225263,-0.044103,-0.257638,0.129874,-0.180892,-0.214763,-0.058736,-0.286935,0.011502,-0.345308,-0.085333,-0.337123,0.290482,-0.093107,0.4534175,0.33868,-0.213779,-0.0078,-0.005784809,-0.0007264844
9,0.22536,-0.006087,-0.232802,0.181239,-0.078139,-0.237196,0.115292,-0.219623,-0.223785,-0.152735,0.290895,-0.251711,-0.315887,-0.280061,-0.18386,-0.408907,0.368351,-0.04026,0.01720511,0.00497824
10,0.225372,0.029085,-0.196639,0.205777,0.031511,-0.194174,0.224316,-0.048902,-0.274033,0.150652,0.273517,0.248301,-0.22242,0.277351,-0.2572505,0.303448,-0.484698,0.140742,-0.04540556,-0.01534737
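
In the table above each column is one eigenvector: NumPy pairs `eigenvectors[:, i]` with `eigenvalues[i]`. `np.linalg.eig` does not guarantee any particular ordering of the eigenvalues, so a defensive variant is to use `np.linalg.eigh` (intended for symmetric matrices) and sort explicitly. The sketch below does this on a hypothetical 5x5 correlation matrix built from random data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 5x5 correlation matrix built from random correlated data.
data = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
corr = np.corrcoef(data, rowvar=False)

# eigh returns real eigenvalues in ascending order for symmetric input.
vals, vecs = np.linalg.eigh(corr)

# Reorder so that column k holds the eigenvector of the (k+1)-th largest eigenvalue.
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Sanity checks: the columns are orthonormal, and corr @ v = lambda * v.
orthonormal = np.allclose(vecs.T @ vecs, np.eye(5))
satisfies_eig = np.allclose(corr @ vecs, vecs * vals)
print(orthonormal, satisfies_eig)
```

Swapping `eig` for `eigh` in the notebook would change the order (and possibly the signs) of the columns, so any downstream labelling of PC1, PC2 and PC3 should be done after sorting.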


## 3. PCA projections

In [37]:
principal_components = df_std.dot(eigenvectors)
principal_components

years:,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
1970-07-31,-0.032838,-0.204918,0.067482,-0.146593,-0.004550,-0.021740,0.001970,0.000828,0.000339,0.000108,4.999538e-05,-1.068932e-06,-2.262346e-07,-5.346786e-06,1.984039e-06,1.980522e-06,7.075951e-08,-2.237006e-07,2.345608e-07,1.063328e-08
1970-08-31,0.031877,-0.126833,0.062923,-0.105320,-0.031663,-0.008182,0.003184,0.001163,0.000124,-0.000057,1.648789e-05,-1.300173e-06,3.757927e-06,-3.429919e-06,-2.017032e-07,2.316713e-06,-5.062569e-08,-1.516163e-07,2.468590e-07,-1.057459e-08
1970-09-30,-0.009330,-0.150376,0.078643,-0.089315,-0.024882,-0.010703,0.002030,0.001370,0.000030,-0.000024,1.691636e-05,-3.531173e-06,7.747969e-06,-1.953565e-06,-7.394436e-07,1.256405e-06,-1.558537e-07,1.029709e-07,1.467056e-07,-3.054371e-08
1970-10-31,0.219963,-0.213589,0.048486,-0.144802,-0.037515,-0.004840,0.004415,0.001125,0.000176,-0.000110,-7.678133e-06,-1.353022e-05,8.052352e-06,-6.096845e-06,-4.117846e-07,2.860233e-06,-7.672365e-07,5.703405e-08,1.908246e-07,-3.782961e-08
1971-01-31,0.533525,0.219927,0.126723,-0.031135,-0.011024,-0.010590,0.002907,0.000231,-0.000051,-0.000002,1.162118e-05,9.638335e-06,8.792853e-06,4.268889e-06,-3.541674e-07,3.571511e-07,-1.368736e-07,6.247513e-08,5.889270e-08,-4.796363e-09
1971-02-28,0.476124,0.112777,0.175435,-0.068082,-0.014956,-0.009897,0.005793,0.000288,-0.000117,0.000013,6.861846e-05,-1.184830e-05,9.168277e-06,-4.851632e-07,2.405449e-06,1.351500e-06,-5.129641e-08,5.728110e-08,4.922128e-08,-2.509528e-08
1971-03-31,0.025140,0.135552,0.188131,-0.044933,-0.006552,-0.011007,0.003295,0.000197,-0.000210,-0.000028,1.977240e-05,-1.681622e-05,1.223629e-05,4.605001e-06,-4.226188e-07,2.092099e-07,-3.058932e-07,5.767237e-08,-1.480317e-08,-4.241470e-09
1971-04-30,-0.238664,0.289913,0.130225,-0.121154,0.001858,-0.017735,0.003236,0.000236,-0.000062,-0.000020,-3.284754e-05,1.996295e-08,2.560159e-05,8.524029e-06,-2.810794e-06,-1.175072e-06,3.788875e-07,1.431159e-07,8.745018e-08,6.570625e-09
1971-05-31,-0.097244,0.760261,0.205825,0.041957,-0.005567,-0.015866,-0.002540,0.000802,-0.000146,0.000052,1.493069e-05,8.739347e-06,-1.334174e-05,1.966798e-06,-3.858164e-06,-3.192635e-07,-9.848951e-07,3.200821e-07,-9.966749e-08,-1.428339e-08
1971-08-31,-0.431452,0.661414,0.085004,0.024345,-0.018765,-0.002748,0.001563,0.000376,-0.000208,0.000034,8.775150e-05,-1.117268e-05,-1.237506e-05,3.185215e-06,-6.575147e-06,-7.889269e-07,-4.476061e-07,7.742039e-07,-1.256623e-07,-7.759294e-08
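
The projection above can be inverted: because the eigenvector matrix is orthogonal, multiplying the scores by its transpose recovers the standardised data, and keeping only the first few columns gives a low-dimensional approximation. A minimal sketch on synthetic data (the array `X` and the choice `k = 1` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: 250 observations of 6 series with a strong common factor.
factor = rng.normal(size=(250, 1))
X = factor @ np.ones((1, 6)) + 0.2 * rng.normal(size=(250, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Same operation as df_std.dot(eigenvectors).
scores = Z @ vecs

# Using all components recovers the standardised data.
Z_full = scores @ vecs.T

# Keeping only the first k components gives a rank-k approximation.
k = 1
Z_k = scores[:, :k] @ vecs[:, :k].T

# The share of variation lost equals the share of the discarded eigenvalues.
lost = np.sum((Z - Z_k) ** 2) / np.sum(Z ** 2)
unexplained = 1 - vals[:k].sum() / vals.sum()
print(lost, unexplained)
```

This is the sense in which a handful of components can stand in for all 20 maturities when the leading eigenvalues dominate.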
