# Statistical Factor Modeling

### Orthogonal Factor Model with $m$ Common Factors
$$X = \mu + L F + \epsilon$$

* $\mu_i$ = mean of variable i
* $\epsilon_i$ = ith specific factor
* $F_j$ = jth common factor
* $l_{ij}$ = loading of the ith variable on the jth factor

The unobservable random vectors $F$ and $\epsilon$ satisfy the following conditions:

* $F$ and $\epsilon$ are independent
* $E(F)=0$, $Cov(F)=I$
* $E(\epsilon)=0$, $Cov(\epsilon)=\Psi$, where $\Psi$ is a diagonal matrix
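To make these assumptions concrete, here is a minimal simulation sketch. The values of `mu`, `L_true`, and `psi_true` are made up for illustration, and drawing $F$ and $\epsilon$ from normal distributions is an extra assumption (the model itself only constrains means and covariances). Each row of `X_sim` is one draw of $X = \mu + L F + \epsilon$, and its sample covariance should land close to $L L' + \Psi$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 200_000, 4, 2  # observations, variables, common factors

# Hypothetical parameters, chosen only for illustration
mu = np.array([1.0, -2.0, 0.5, 3.0])
L_true = np.array([[0.9, 0.1],
                   [0.8, -0.3],
                   [0.2, 0.7],
                   [0.1, 0.6]])
psi_true = np.diag([0.2, 0.3, 0.1, 0.4])

# E(F) = 0, Cov(F) = I; E(eps) = 0, Cov(eps) = Psi (diagonal); F and eps independent
F = rng.standard_normal((n, m))
eps = rng.standard_normal((n, p)) * np.sqrt(np.diag(psi_true))

# Row-wise version of X = mu + L F + eps
X_sim = mu + F @ L_true.T + eps

implied_cov = L_true @ L_true.T + psi_true
sample_cov = np.cov(X_sim, rowvar=False)
max_gap = np.abs(sample_cov - implied_cov).max()
print(max_gap)  # small, and shrinks as n grows
```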

### Covariance Structure for the Orthogonal Factor Model

$$Cov(X) = L L' + \Psi$$

$$Cov(X,F)=L$$

* $L$ is the $p \times m$ (variables by factors) matrix of factor loadings (`model.components_.T`)
* $\Psi$ is the $p \times p$ diagonal matrix of specific variances; `model.noise_variance_` holds its diagonal as a length-$p$ vector
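
Both identities follow from the model assumptions. A short derivation, using only $E(F)=0$, $Cov(F)=I$, $Cov(\epsilon)=\Psi$, and the independence of $F$ and $\epsilon$ (which gives $E(\epsilon F')=0$):

$$Cov(X) = E[(LF+\epsilon)(LF+\epsilon)'] = L\,E(FF')\,L' + E(\epsilon F')\,L' + L\,E(F\epsilon') + E(\epsilon\epsilon') = L L' + \Psi$$

$$Cov(X,F) = E[(LF+\epsilon)F'] = L\,E(FF') + E(\epsilon F') = L$$

Reading off the diagonal, with $\psi_i$ denoting the $i$th diagonal entry of $\Psi$, the variance of each variable splits into a part explained by the common factors and a specific part:

$$Var(X_i) = l_{i1}^2 + l_{i2}^2 + \cdots + l_{im}^2 + \psi_i$$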

Reference: pp. 516-517 of [Applied Multivariate Statistical Analysis](https://www.amazon.com/Applied-Multivariate-Statistical-Analysis-6th/dp/0131877151)

### 1) Toy Model

In [88]:
from sklearn.decomposition import FactorAnalysis # Primary package
from sklearn.datasets import load_iris
from sklearn import preprocessing
import numpy as np
import pandas as pd

In [89]:
X = load_iris().data # n observations by p variables matrix

In [95]:
# Run the factor analysis with m = 3 factors
model = FactorAnalysis(n_components=3, random_state=101).fit(X)
psi = np.diag(model.noise_variance_)
L = model.components_.T

In [96]:
# Factor loadings (p variables by m factors)
L

array([[ 0.7290119 ,  0.31206317, -0.12619719],
       [-0.1685682 ,  0.32342127,  0.13617037],
       [ 1.74845079, -0.0530877 , -0.05532421],
       [ 0.73954639, -0.030647  ,  0.12700972]])

In [97]:
# Specific variance (p variables by p variables)
psi

array([[ 0.03649593,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.03532662,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.02959403,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.01458716]])

In [98]:
L @ L.T + psi

array([[ 0.68126343, -0.03914468,  1.26505648,  0.51354605],
       [-0.03914468,  0.18688555, -0.31943642, -0.11728094],
       [ 1.26505648, -0.31943642,  3.09255326,  1.28766073],
       [ 0.51354605, -0.11728094,  1.28766073,  0.57858673]])

In [99]:
np.cov(X, rowvar = False) # Sample covariance: not exactly the same as L @ L.T + psi but very close

array([[ 0.68569351, -0.03926846,  1.27368233,  0.5169038 ],
       [-0.03926846,  0.18800403, -0.32171275, -0.11798121],
       [ 1.27368233, -0.32171275,  3.11317942,  1.29638747],
       [ 0.5169038 , -0.11798121,  1.29638747,  0.58241432]])
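
Part of the remaining gap appears to come from the divisor rather than from the factor model itself: `np.cov` divides by $n-1$ by default, while `FactorAnalysis` is a maximum likelihood fit, which corresponds to dividing by $n$. The sketch below re-fits the same toy model, checks that `L @ L.T + psi` matches scikit-learn's own `model.get_covariance()`, and compares the largest absolute gap against both versions of the sample covariance. The variable names `gap_unbiased` and `gap_biased` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = load_iris().data
model = FactorAnalysis(n_components=3, random_state=101).fit(X)

L = model.components_.T
psi = np.diag(model.noise_variance_)
implied_cov = L @ L.T + psi

# The manual reconstruction should agree with the estimator's built-in method
same_as_builtin = np.allclose(implied_cov, model.get_covariance())

# Largest absolute entry-wise gap against each sample covariance
gap_unbiased = np.abs(implied_cov - np.cov(X, rowvar=False)).max()           # divisor n - 1
gap_biased = np.abs(implied_cov - np.cov(X, rowvar=False, bias=True)).max()  # divisor n
print(same_as_builtin, gap_unbiased, gap_biased)
```

If the divisor explanation holds, `gap_biased` should come out noticeably smaller than `gap_unbiased`; whatever is left reflects the fit itself.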

### 2) Example with the stock data

In [112]:
# Data from Yahoo! Finance
monthly = pd.read_csv('../../../data/monthly.csv', parse_dates = True, index_col = 0)
monthly.head()

| Date | AAN | ABM | ABT | ABX | ADM | ADX | AEG | AEM | AEP | AET | ... | WRB | WSM | WSO | WTR | WWW | WY | XEL | XOM | XRX | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1986-02-28 | 0.059698 | -0.011299 | 0.027273 | -0.086212 | 0.013575 | 0.042254 | -0.032053 | -0.081633 | 0.039216 | 0.100671 | ... | 0.192309 | 0.289474 | 0.078435 | 0.035294 | -0.100006 | 0.122606 | 0.097222 | 0.009662 | 0.099222 | 0.087267 |
| 1986-03-31 | -0.014078 | 0.12 | 0.221239 | 0.075472 | 0.0625 | 0.047298 | 0.013246 | 0.022222 | 0.04717 | 0.054878 | ... | 0.168008 | 0.071429 | -0.036365 | -0.034091 | 0.099995 | 0.051195 | 0.021097 | 0.066986 | -0.049557 | 0.053947 |
| 1986-04-30 | 0.071426 | 0.096939 | 0.0 | 0.385978 | -0.109244 | 0.012903 | 0.081699 | -0.065217 | -0.067568 | -0.065511 | ... | -0.128883 | -0.12381 | 0.320756 | -0.035294 | -0.060603 | -0.045455 | 0.024793 | 0.015695 | -0.106145 | -0.03995 |
| 1986-05-30 | -0.079999 | -0.046512 | 0.072464 | 0.0 | 0.070755 | 0.031847 | 0.018128 | -0.085271 | 0.0 | 0.049485 | ... | 0.211362 | 0.043478 | 0.057141 | 0.018293 | -0.021504 | 0.010204 | 0.020161 | 0.057395 | 0.020833 | -0.009103 |
| 1986-06-30 | 0.072462 | 0.0 | -0.418919 | 0.03797 | -0.356828 | 0.0 | 0.005935 | 0.059322 | 0.028986 | -0.047151 | ... | -0.302073 | -0.260417 | -0.081081 | 0.005988 | -0.054942 | -0.060606 | -0.44664 | 0.016701 | -0.083673 | 0.032808 |


In [113]:
# Run the factor analysis with m = 10 factors
model = FactorAnalysis(n_components=10, random_state=101).fit(monthly)
psi = np.diag(model.noise_variance_)
L = model.components_.T

In [114]:
# Factor loadings (p variables by m factors)
print(L.shape)
L

(313, 10)


array([[-0.03354241,  0.01027346, -0.01223865, ..., -0.00844789,
         0.00137836, -0.00456638],
       [-0.03615607,  0.0054172 , -0.00892085, ..., -0.01207086,
        -0.00336242, -0.00692151],
       [-0.02423157, -0.01149042, -0.00140436, ...,  0.00111157,
         0.01369112, -0.00063638],
       ..., 
       [-0.02602295, -0.00726305,  0.02266885, ..., -0.00418812,
        -0.00128919, -0.00667049],
       [-0.06101191,  0.01631924, -0.01677813, ...,  0.0117468 ,
         0.0009385 , -0.00959025],
       [-0.02339781, -0.00208691, -0.00288141, ..., -0.00824166,
         0.00407986,  0.00187918]])

In [115]:
# Specific variance (p variables by p variables diagonal)
print(psi.shape)
psi

(313, 313)


array([[ 0.00971772,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.00636437,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.00496317, ...,  0.        ,
         0.        ,  0.        ],
       ..., 
       [ 0.        ,  0.        ,  0.        , ...,  0.00307304,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.00888547,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.00268977]])

In [116]:
L @ L.T + psi

array([[ 0.01130071,  0.00165033,  0.00057467, ...,  0.00054447,
         0.00262934,  0.00085635],
       [ 0.00165033,  0.00822287,  0.00047285, ...,  0.00074002,
         0.00276773,  0.00094543],
       [ 0.00057467,  0.00047285,  0.00651474, ...,  0.00073578,
         0.00114211,  0.00072767],
       ..., 
       [ 0.00054447,  0.00074002,  0.00073578, ...,  0.00442456,
         0.0010497 ,  0.00058156],
       [ 0.00262934,  0.00276773,  0.00114211, ...,  0.0010497 ,
         0.01451769,  0.00150219],
       [ 0.00085635,  0.00094543,  0.00072767, ...,  0.00058156,
         0.00150219,  0.00342499]])

In [117]:
np.cov(monthly, rowvar = False) # Not exactly the same as L @ L.T + psi but similar

array([[ 0.01138694,  0.00120613,  0.00020764, ...,  0.00045391,
         0.00301268,  0.00099745],
       [ 0.00120613,  0.0081774 ,  0.00063494, ...,  0.00067727,
         0.00215948,  0.00110317],
       [ 0.00020764,  0.00063494,  0.00655432, ...,  0.00042434,
         0.00117249,  0.00112455],
       ..., 
       [ 0.00045391,  0.00067727,  0.00042434, ...,  0.0043913 ,
         0.00141397,  0.00057171],
       [ 0.00301268,  0.00215948,  0.00117249, ...,  0.00141397,
         0.01452947,  0.00107497],
       [ 0.00099745,  0.00110317,  0.00112455, ...,  0.00057171,
         0.00107497,  0.00340748]])
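
With 313 variables the two matrices are too large to compare by eye, so a single summary number can help. One possible choice, sketched below, is the relative Frobenius norm of the difference between the model-implied and sample covariances. The helper `relative_cov_gap` is a hypothetical name introduced here, and because `monthly.csv` is not bundled with this text, the demo runs on synthetic return-like data (`fake_returns`) with a made-up three-factor structure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def relative_cov_gap(fitted_model, data):
    """Relative Frobenius distance between model-implied and sample covariance."""
    sample_cov = np.cov(data, rowvar=False, bias=True)  # divisor n, matching the ML fit
    implied_cov = fitted_model.get_covariance()         # L L' + Psi
    return np.linalg.norm(implied_cov - sample_cov) / np.linalg.norm(sample_cov)

# Synthetic stand-in for the monthly returns (illustration only)
rng = np.random.default_rng(0)
n_obs, n_assets, n_factors = 400, 30, 3
true_L = rng.normal(scale=0.05, size=(n_assets, n_factors))
fake_returns = (rng.standard_normal((n_obs, n_factors)) @ true_L.T
                + 0.05 * rng.standard_normal((n_obs, n_assets)))

fa = FactorAnalysis(n_components=n_factors, random_state=101).fit(fake_returns)
gap = relative_cov_gap(fa, fake_returns)
print(gap)  # 0 would mean a perfect reconstruction
```

In the notebook, the same helper could be called as `relative_cov_gap(model, monthly)` after fitting the 10-factor model.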

In [118]:
# Portfolio factor exposures for an equal-weight portfolio (m factors by 1)
def get_exposures(factor_loading_matrix):
    n = factor_loading_matrix.shape[0]  # number of assets (p)
    w_equal = np.ones(shape = (n, 1)) / n  # equal weights (p by 1)
    return factor_loading_matrix.T @ w_equal  # L' w (m by 1)

get_exposures(L)

array([[-0.04440807],
       [ 0.0005459 ],
       [-0.00120203],
       [ 0.00166326],
       [-0.00043369],
       [ 0.00028459],
       [-0.00208653],
       [-0.00044961],
       [ 0.00066152],
       [-0.00024888]])
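
Under the covariance structure $Cov(X) = L L' + \Psi$, the exposures also give a decomposition of portfolio variance: for weights $w$, $w'(L L' + \Psi)w = (L'w)'(L'w) + w'\Psi w$, that is, factor variance plus specific variance. The sketch below checks this identity numerically. The loadings `L_demo` and specific variances `psi_demo` are random placeholders, not estimates from the stock data.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 6, 2  # assets, factors

# Hypothetical loadings and specific variances (placeholders for L and psi)
L_demo = rng.normal(scale=0.05, size=(p, m))
psi_demo = np.diag(rng.uniform(0.001, 0.01, size=p))

w = np.full(p, 1.0 / p)  # equal-weight portfolio

exposures = L_demo.T @ w                 # L' w, one exposure per factor
factor_var = exposures @ exposures       # (L'w)'(L'w), since Cov(F) = I
specific_var = w @ psi_demo @ w          # w' Psi w
total_var = w @ (L_demo @ L_demo.T + psi_demo) @ w

print(factor_var, specific_var, total_var)
```

With the fitted objects from the notebook, `L` and `psi` could be substituted for `L_demo` and `psi_demo`.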