# Factor Covariance Modeling in Python

$$\Sigma = F \tilde{\Sigma} F^T + D$$

1. $\Sigma = \mathbf{E}\,[(r-\hat{r})(r-\hat{r})^{T}]$ is the $n \times n$ covariance matrix of asset returns, as before

2. $F \in \mathbf{R}^{n \times k}$ is the factor loading matrix for $n$ assets and $k$ factors (i.e., $F_{ij}$ is the loading of asset $i$ on factor $j$).

3. $\tilde{\Sigma} \in \mathbf{R}^{k \times k}$ is the factor covariance matrix, which must be positive definite ($\tilde{\Sigma} \succ 0$). **I'm having trouble with this term.**

4. $D \in \mathbf{R}^{n \times n}$ is a diagonal matrix of idiosyncratic risk (i.e., $D_{ii}$ is the variance of asset $i$ not explained by the factors).
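To make the shapes concrete, here is a minimal NumPy sketch that assembles $\Sigma$ from its three pieces. The values of `F`, `Sigma_tilde`, and `D` are random placeholders, not taken from any data set. They are chosen only so that $\tilde{\Sigma} \succ 0$ and $D$ has a positive diagonal, which together make $\Sigma$ positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3  # assets, factors (same sizes as the iris example below)

F = rng.normal(size=(n, k))              # factor loadings (n x k), placeholder values
A = rng.normal(size=(k, k))
Sigma_tilde = A @ A.T + k * np.eye(k)    # positive definite factor covariance (k x k)
D = np.diag(rng.uniform(0.1, 0.5, n))    # idiosyncratic variances on the diagonal (n x n)

Sigma = F @ Sigma_tilde @ F.T + D        # implied asset covariance (n x n)

print(Sigma.shape)                            # (4, 4)
print(np.allclose(Sigma, Sigma.T))            # True
print(np.all(np.linalg.eigvalsh(Sigma) > 0))  # True
```

The factor model needs only $nk + k(k+1)/2 + n$ parameters instead of the $n(n+1)/2$ of a full covariance matrix. That saving is why it is useful when $n \gg k$.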

This notation is from Stephen Boyd's [short course](http://web.stanford.edu/~boyd/papers/cvx_short_course.html), specifically slides 13-14 of [this presentation](http://web.stanford.edu/~boyd/papers/pdf/cvx_applications.pdf).

```python
# This is the main method used
from sklearn.decomposition import FactorAnalysis
```

* [Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html)
* [Tutorial](http://scikit-learn.org/stable/modules/decomposition.html#fa)
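The scikit-learn user guide linked above describes the generative model that `FactorAnalysis` assumes. In this notebook's notation it is $x = F h + \mu + \epsilon$, with latent factors $h \sim \mathcal{N}(0, I)$ and diagonal noise $\epsilon \sim \mathcal{N}(0, D)$.

The sketch below simulates that model and checks that the sample covariance approaches $F F^T + D$, which is the decomposition above with $\tilde{\Sigma} = I$. The loadings and noise variances are made-up, hypothetical values, not fitted to iris.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, m = 4, 3, 500_000          # features, factors, simulated samples

# Hypothetical loadings (n x k) and idiosyncratic variances (diagonal of D)
F = np.array([[ 0.8,  0.1,  0.0],
              [-0.3,  0.6,  0.2],
              [ 0.7, -0.2,  0.4],
              [ 0.5,  0.3, -0.6]])
psi = np.array([0.2, 0.3, 0.1, 0.25])

# x = F h + eps with h ~ N(0, I_k) and eps ~ N(0, diag(psi)); mean mu taken as 0
h = rng.standard_normal((m, k))
eps = rng.standard_normal((m, n)) * np.sqrt(psi)
X = h @ F.T + eps                # (m x n) simulated observations

implied = F @ F.T + np.diag(psi) # F Sigma_tilde F^T + D with Sigma_tilde = I
sample = np.cov(X, rowvar=False)
print(np.abs(sample - implied).max())  # small, and shrinks as m grows
```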

```python
# Load other methods/packages
from sklearn.datasets import load_iris
from sklearn import preprocessing
import numpy as np
import pandas as pd

iris = load_iris()
```

## 1. Covariance Matrix $\Sigma \in \mathbf{R}^{n \times n}$

In this example, $n = 4$ (the four iris features play the role of assets) and $k = 3$.

```python
# Standardize each column (zero mean, unit variance), then take the sample covariance
Sigma = np.cov(preprocessing.scale(iris.data), rowvar=False)
Sigma
```

```
array([[ 1.00671141, -0.11010327,  0.87760486,  0.82344326],
       [-0.11010327,  1.00671141, -0.42333835, -0.358937  ],
       [ 0.87760486, -0.42333835,  1.00671141,  0.96921855],
       [ 0.82344326, -0.358937  ,  0.96921855,  1.00671141]])
```
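The diagonal entries are about 1.0067 rather than 1 because two conventions are mixed. `preprocessing.scale` standardizes with the population standard deviation (`ddof=0`), while `np.cov` divides by $m - 1$. With $m = 150$ observations, that puts $150/149 \approx 1.00671141$ on the diagonal. A quick check:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris

X = preprocessing.scale(load_iris().data)  # zero mean, unit population (ddof=0) variance
m = X.shape[0]                              # 150 observations

Sigma = np.cov(X, rowvar=False)             # sample covariance, divides by m - 1
R = np.corrcoef(X, rowvar=False)            # correlation matrix, unit diagonal

print(m / (m - 1))                                      # ratio of the ddof=1 and ddof=0 estimates
print(np.allclose(Sigma, R * m / (m - 1)))              # True
print(np.allclose(np.cov(X, rowvar=False, ddof=0), R))  # True
```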

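```python
# Run the factor analysis with k = 3.
# Note: fit() treats each row of its input as one observation, so passing the
# 4 x 4 Sigma fits 4 "samples" rather than the 150 iris observations.
factor = FactorAnalysis(n_components=3, random_state=101).fit(Sigma)
```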
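`FactorAnalysis.fit` expects a data matrix of shape `(n_samples, n_features)`, so the cell above treats the 4 rows of `Sigma` as four observations. The sketch below shows the more conventional call, assuming the goal is to model the 150 standardized iris observations. It fits on the data directly, so its fitted values will differ from the outputs shown in this notebook.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = preprocessing.scale(load_iris().data)  # (150, 4): one row per observation
k = 3

# Fit on the observations, not on a covariance matrix
fa = FactorAnalysis(n_components=k, random_state=101).fit(X)

F = fa.components_.T             # (n x k) loading matrix; components_ is (k x n)
D = np.diag(fa.noise_variance_)  # (n x n) diagonal idiosyncratic variances

print(F.shape, D.shape)          # (4, 3) (4, 4)
```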

## 2. Factor Loading Matrix $F \in \mathbf{R}^{n \times k}$

```python
# Factor loading matrix (n x k); components_ is stored as (k x n), hence the transpose
F = np.asmatrix(factor.components_.T)
F
```

```
matrix([[-0.43161053,  0.10199265,  0.00558075],
        [ 0.57286184,  0.06457749, -0.01589004],
        [-0.59699114,  0.00681253,  0.00194196],
        [-0.56285738, -0.0197104 , -0.02251163]])
```

## 3. Factor Covariance Matrix $\tilde{\Sigma} \in \mathbf{R}^{k \times k}$?

```python
# How to get the factor covariance matrix?
# Sigma_tilde = ?
```
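One possible answer, assuming you use scikit-learn's model as is. `FactorAnalysis` assumes the latent factors have identity covariance, so its implied covariance is `components_.T @ components_ + diag(noise_variance_)`. That is $F \tilde{\Sigma} F^T + D$ with $\tilde{\Sigma} = I_k$.

The sketch checks this against `get_covariance()`. It also shows why $\tilde{\Sigma}$ cannot be pinned down separately from $F$: any invertible $A$ (the one below is arbitrary) gives an equivalent pair $(FA,\ A^{-1}\tilde{\Sigma}A^{-T})$.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = preprocessing.scale(load_iris().data)
fa = FactorAnalysis(n_components=3, random_state=101).fit(X)

F = fa.components_.T             # (n x k) loadings
D = np.diag(fa.noise_variance_)  # (n x n) idiosyncratic variances
k = F.shape[1]

# FactorAnalysis models the latent factors as N(0, I), so the factor covariance is I_k
Sigma_tilde = np.eye(k)

model_cov = F @ Sigma_tilde @ F.T + D
print(np.allclose(model_cov, fa.get_covariance()))  # True

# Non-uniqueness: for an arbitrary invertible A (hypothetical choice),
# (F A)(A^-1 Sigma_tilde A^-T)(F A)^T equals F Sigma_tilde F^T
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.2],
              [0.0, 0.0, 2.0]])
A_inv = np.linalg.inv(A)
F_alt = F @ A
Sigma_tilde_alt = A_inv @ Sigma_tilde @ A_inv.T
print(np.allclose(F_alt @ Sigma_tilde_alt @ F_alt.T + D, model_cov))  # True
```

Under this reading, a non-identity $\tilde{\Sigma}$ like the one in Boyd's slides is the same fitted covariance under a different normalization of the factors, for example factors with non-unit variances or correlated factors.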

## 4. Idiosyncratic Risk $D \in \mathbf{R}^{n \times n}$

```python
# Idiosyncratic variances (noise_variance_) placed on the diagonal of D
D = np.asmatrix(np.diag(factor.noise_variance_))
D
```

```
matrix([[  1.00000000e-12,   0.00000000e+00,   0.00000000e+00,
           0.00000000e+00],
        [  0.00000000e+00,   1.00000000e-12,   0.00000000e+00,
           0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   1.00000000e-12,
           0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
           1.00000000e-12]])
```
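The $10^{-12}$ entries in $D$ look like a warning sign rather than a meaningful estimate. A likely explanation: once `fit` centers the 4 rows of `Sigma`, they span at most 3 dimensions, so 3 factors can reproduce them exactly. The noise variances then collapse to what appears to be a small positive floor applied by scikit-learn.

The sketch below checks the rank argument. It then fits on the observations instead and confirms that each variable's modeled variance splits into a communality plus its idiosyncratic variance $D_{ii}$. With $\tilde{\Sigma} = I$, the communality is $\sum_j F_{ij}^2$.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = preprocessing.scale(load_iris().data)
Sigma = np.cov(X, rowvar=False)

# Rows of Sigma treated as 4 observations: after centering they have rank <= 3
print(np.linalg.matrix_rank(Sigma - Sigma.mean(axis=0)))  # at most 3

# Fit on the observations instead
fa = FactorAnalysis(n_components=3, random_state=101).fit(X)
F = fa.components_.T
communality = (F ** 2).sum(axis=1)  # variance explained by the factors (Sigma_tilde = I)
psi = fa.noise_variance_            # idiosyncratic variances, the diagonal of D

# Modeled variance of each variable = communality + idiosyncratic variance
print(np.allclose(np.diag(fa.get_covariance()), communality + psi))  # True
print(psi)                          # compare with the near-zero D shown above
```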