# Factor Covariance Modeling in Python

$$\Sigma = F \tilde{\Sigma} F^T + D$$

* $\Sigma = E\left[(r-\hat{r})(r-\hat{r})^{T}\right] \in \mathbb{R}^{n \times n}$ is the covariance matrix of the asset returns $r$ about their mean $\hat{r}$

* $F \in \mathbb{R}^{n \times k}$ is the factor loading matrix with $n$ assets and $k$ factors (i.e., $F_{ij}$ is the loading of asset $i$ onto factor $j$)

* $\tilde{\Sigma} \in \mathbb{R}^{k \times k}$ is the factor covariance matrix, where $\tilde{\Sigma} \succ 0$ (positive definite). **I'm having trouble with this term**

* $D \in \mathbb{R}^{n \times n}$ is a diagonal matrix containing the idiosyncratic risk (i.e., $D_{ii}$ is the variance in asset $i$ not captured by the factors)

This notation is from Stephen Boyd's [short course](http://web.stanford.edu/~boyd/papers/cvx_short_course.html), specifically slides 13-14 of [this presentation](http://web.stanford.edu/~boyd/papers/pdf/cvx_applications.pdf).
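To make the shapes in the decomposition concrete, here is a minimal NumPy sketch that builds $\Sigma$ from made-up components. The sizes and random values are purely illustrative assumptions, not data from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2                                  # hypothetical: 5 assets, 2 factors

F = rng.normal(size=(n, k))                  # factor loadings (n x k)
A = rng.normal(size=(k, k))
Sigma_tilde = A @ A.T + np.eye(k)            # factor covariance (k x k), positive definite by construction
D = np.diag(rng.uniform(0.1, 0.5, size=n))   # idiosyncratic variances on the diagonal (n x n)

Sigma = F @ Sigma_tilde @ F.T + D            # asset covariance (n x n)

print(Sigma.shape)
print(np.allclose(Sigma, Sigma.T))           # symmetric
print(np.linalg.eigvalsh(Sigma).min() > 0)   # positive definite
```

Because $F \tilde{\Sigma} F^T$ has rank at most $k$, it is the diagonal term $D$ that keeps $\Sigma$ full rank when $k < n$.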

In [14]:
# This is the main method used
from sklearn.decomposition import FactorAnalysis

* [Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html)
* [Tutorial](http://scikit-learn.org/stable/modules/decomposition.html#fa)

In [17]:
# Load other methods/packages
from sklearn.datasets import load_iris
from sklearn import preprocessing
import numpy as np
import pandas as pd

In [27]:
iris = load_iris()

# Covariance matrix (n x n) of the standardized features
# preprocessing.scale gives each column zero mean and unit variance
Sigma = np.cov(preprocessing.scale(iris.data), rowvar=False)
Sigma

array([[ 1.00671141, -0.11010327,  0.87760486,  0.82344326],
       [-0.11010327,  1.00671141, -0.42333835, -0.358937  ],
       [ 0.87760486, -0.42333835,  1.00671141,  0.96921855],
       [ 0.82344326, -0.358937  ,  0.96921855,  1.00671141]])

In [31]:
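# Note: FactorAnalysis.fit expects an (n_samples, n_features) data matrix;
# here the 4 x 4 covariance matrix is passed, so its rows are treated as samples.
factor = FactorAnalysis(n_components=3, random_state=101).fit(Sigma)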

# Factor loading matrix (n x k)
F = factor.components_.T
F

array([[-0.43161053,  0.10199265,  0.00558075],
       [ 0.57286184,  0.06457749, -0.01589004],
       [-0.59699114,  0.00681253,  0.00194196],
       [-0.56285738, -0.0197104 , -0.02251163]])
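`FactorAnalysis.fit` is documented as taking an `(n_samples, n_features)` data matrix, so fitting on the 4 x 4 covariance matrix treats its four rows as four observations. A hedged variant, assuming the intent is to model the standardized iris measurements themselves, fits on the data directly. The names `X`, `fa`, `F_data` and `D_data` are introduced here for illustration only.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

# Standardized observations: 150 samples x 4 features
X = preprocessing.scale(load_iris().data)

# Fit on the data matrix rather than on its covariance
fa = FactorAnalysis(n_components=3, random_state=101).fit(X)

F_data = fa.components_.T            # factor loadings (n x k) = (4 x 3)
D_data = np.diag(fa.noise_variance_) # idiosyncratic variances (n x n)

print(F_data.shape, D_data.shape)
```

Fitted this way, the loadings and noise variances describe the 150 observations, and the resulting numbers will differ from the arrays shown in this notebook.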

In [41]:
# Factor covariance matrix?
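One way to think about the $\tilde{\Sigma}$ term: any positive definite factor covariance can be absorbed into the loadings. Writing $\tilde{\Sigma} = LL^T$ (Cholesky) gives $F \tilde{\Sigma} F^T = (FL)(FL)^T$, so a model can fix the factor covariance to the identity without loss of generality. The sketch below checks this identity on made-up values; all names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3                              # hypothetical sizes

F = rng.normal(size=(n, k))              # loadings under a general factor covariance
A = rng.normal(size=(k, k))
Sigma_tilde = A @ A.T + np.eye(k)        # some positive definite factor covariance

L = np.linalg.cholesky(Sigma_tilde)      # lower triangular, Sigma_tilde = L @ L.T
F_absorbed = F @ L                       # loadings with the factor covariance folded in

lhs = F @ Sigma_tilde @ F.T              # F * Sigma_tilde * F^T
rhs = F_absorbed @ F_absorbed.T          # (F L)(F L)^T with identity factor covariance

print(np.allclose(lhs, rhs))
```

The scikit-learn documentation describes the latent factors in `FactorAnalysis` as Gaussian with zero mean and unit covariance, which corresponds to taking $\tilde{\Sigma} = I_k$ in the notation above.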

In [38]:
# Idiosyncratic risk (n x n), one noise variance per feature on the diagonal
D = np.diag(factor.noise_variance_)
D

array([[  1.00000000e-12,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00],
       [  0.00000000e+00,   1.00000000e-12,   0.00000000e+00,
          0.00000000e+00],
       [  0.00000000e+00,   0.00000000e+00,   1.00000000e-12,
          0.00000000e+00],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          1.00000000e-12]])

In [42]:
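# Model-implied covariance of the features (n x n), not the k x k factor covariance:
# get_covariance() returns components_.T @ components_ + diag(noise_variance_)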
factor.get_covariance()

array([[ 0.19672129, -0.24075545,  0.25837333,  0.24079922],
       [-0.24075545,  0.33259343, -0.34158436, -0.32335465],
       [ 0.25837333, -0.34158436,  0.3564486 ,  0.33584287],
       [ 0.24079922, -0.32335465,  0.33584287,  0.3177037 ]])

In [43]:
# Equivalently, F F^T + D computed by hand (the factor covariance is implicitly the identity)
np.matmul(factor.components_.T, factor.components_) + D

array([[ 0.19672129, -0.24075545,  0.25837333,  0.24079922],
       [-0.24075545,  0.33259343, -0.34158436, -0.32335465],
       [ 0.25837333, -0.34158436,  0.3564486 ,  0.33584287],
       [ 0.24079922, -0.32335465,  0.33584287,  0.3177037 ]])
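The match between the two arrays suggests how the pieces map onto $\Sigma = F \tilde{\Sigma} F^T + D$: with scikit-learn's `FactorAnalysis`, $\tilde{\Sigma}$ can be taken as the $k \times k$ identity. The sketch below spells that out in one self-contained cell. It assumes the model is fitted on the standardized data rather than on the covariance matrix, and `Sigma_tilde` and `Sigma_model` are names introduced here for illustration.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = preprocessing.scale(load_iris().data)  # standardized observations (150 x 4)
k = 3
fa = FactorAnalysis(n_components=k, random_state=101).fit(X)

F = fa.components_.T                 # factor loadings (n x k)
Sigma_tilde = np.eye(k)              # factor covariance (k x k), identity under this model
D = np.diag(fa.noise_variance_)      # idiosyncratic variances (n x n)

# Assemble Sigma = F * Sigma_tilde * F^T + D and compare with scikit-learn's own result
Sigma_model = F @ Sigma_tilde @ F.T + D
print(np.allclose(Sigma_model, fa.get_covariance()))
```

If a non-identity $\tilde{\Sigma}$ is needed, for example with named economic factors, it would have to come from a different estimation procedure; this estimator does not expose one.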