# Factor Analysis

## Model Specification

$$X=AS+\epsilon,$$ 

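where $X\in\mathbb{R}^p$ is the observed vector, $A\in\mathbb{R}^{p\times q}$ is the factor loading matrix, $S\in\mathbb{R}^q$ contains the common sources of variation (the factors), and $\epsilon\in\mathbb{R}^p$ contains the idiosyncratic variables. Here $S\sim N(0, I_q)$ and $\epsilon\sim N(0, D_{\epsilon})$ are modeled as independent Gaussian random vectors: the factors are standard Gaussian, while the components of $\epsilon$ are uncorrelated but may have different variances. The model is fit by maximum likelihood. It implies the covariance structure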

$$\Sigma=AA^{\top}+D_{\epsilon},$$

where $D_{\epsilon}=\mathrm{diag}[\mathrm{Var}(\epsilon_1),\dots,\mathrm{Var}(\epsilon_p)]$.

Note that factor analysis has an identifiability issue: the loadings are determined only up to a rotation. For any orthogonal matrix $R\in\mathbb{R}^{q\times q}$, the loadings $A$ and $AR$ give the same covariance, since $(AR)(AR)^{\top}=AA^{\top}$.
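As a quick numerical check of the covariance structure and the rotation invariance, here is a minimal sketch in NumPy. The dimensions, the loading matrix `A`, and the variances `d` are made-up illustrative values, not taken from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 5, 2, 200_000            # illustrative sizes (assumptions)

A = rng.normal(size=(p, q))        # hypothetical loading matrix
d = rng.uniform(0.5, 1.5, size=p)  # hypothetical idiosyncratic variances

# Draw samples from X = A S + eps with S ~ N(0, I_q), eps ~ N(0, diag(d))
S = rng.normal(size=(n, q))
eps = rng.normal(size=(n, p)) * np.sqrt(d)
X = S @ A.T + eps

# Model-implied covariance versus the sample covariance
sigma_model = A @ A.T + np.diag(d)
sigma_sample = np.cov(X, rowvar=False)
max_abs_err = np.abs(sigma_sample - sigma_model).max()

# A random orthogonal matrix R (from a QR decomposition) leaves Sigma unchanged
R, _ = np.linalg.qr(rng.normal(size=(q, q)))
sigma_rotated = (A @ R) @ (A @ R).T + np.diag(d)

print("max |sample - model|:", max_abs_err)
print("rotation invariant:", np.allclose(sigma_model, sigma_rotated))
```

The sample covariance should approach $AA^{\top}+D_{\epsilon}$ as `n` grows, while the rotated loadings reproduce it exactly (up to floating-point error).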

### Variants and Generalizations

## Theoretical Properties

### Advantages

- Factor analysis provides a low-rank approximation to the covariance matrix.

### Disadvantages

### Relation to Other Models

Factor analysis is closely related to PCA.
- If the $\mathrm{Var}(\epsilon_i)$ are all assumed to be equal, the leading $q$ components of the SVD identify the subspace determined by $A$.
- Because the disturbances $\epsilon$ are uncorrelated, factor analysis can be seen as modeling the correlation structure of $X$ rather than the covariance structure: the form $\Sigma=AA^{\top}+D_{\epsilon}$ still holds for the correlation matrix, with a different $A$ and $D_{\epsilon}$.
- When $D_{\epsilon}$ has all equal diagonal values, the model is called 'probabilistic PCA' (PPCA). In `sklearn`, the `score` and `score_samples` methods of `PCA` are based on this model.
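The correlation-matrix claim above can be verified in one line. As a sketch, write $W=\mathrm{diag}(\Sigma_{11},\dots,\Sigma_{pp})^{-1/2}$ for the diagonal rescaling matrix (the symbols $W$, $\tilde{A}$ and $\tilde{D}$ are introduced here for illustration). The correlation matrix of $X$ is then

$$W\Sigma W=(WA)(WA)^{\top}+WD_{\epsilon}W=\tilde{A}\tilde{A}^{\top}+\tilde{D},$$

where $\tilde{A}=WA$ is again a $p\times q$ loading matrix and $\tilde{D}=WD_{\epsilon}W$ is again diagonal, because a product of diagonal matrices is diagonal. Rescaling the variables therefore only rescales the rows of $A$ and the entries of $D_{\epsilon}$. PCA has no such property: its components generally change when the variables are rescaled.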

[Independent Component Analysis](ICA.ipynb) or ICA also uses the latent variable approach. But rather than assuming uncorrelated, Gaussian $S$ as in factor analysis, ICA assumes non-Gaussian, independent variables $S$.

## Empirical Performance

### Advantages 

### Disadvantages

There is one [example](http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_fa_model_selection.html#sphx-glr-auto-examples-decomposition-plot-pca-vs-fa-model-selection-py) in the sklearn documentation showing that if we fit a homogeneous-variance factor-analysis model (that is, probabilistic PCA) to data whose noise variances are actually heterogeneous, it tends to overestimate the number of components.

## Implementation Details and Practical Tricks

**Factor Analysis in `sklearn`**

FactorAnalysis performs a maximum likelihood estimate of the so-called loading matrix, the transformation of the latent variables to the observed ones, using expectation-maximization (EM).

The input and method interface of `sklearn.decomposition.FactorAnalysis` is quite similar to that of `sklearn.decomposition.PCA`. Two of its methods are particularly useful for model selection:

- `score(X[, y])`: Compute the average log-likelihood of the samples
- `score_samples(X)`: Compute the log-likelihood of each sample

This score can be used to choose `n_components`: compute it for each candidate value, preferably on held-out data via cross-validation (CV), and take the value with the highest score.
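The selection procedure above can be sketched as follows. The data are simulated with made-up sizes and heterogeneous noise levels, and the candidate range `1..6` is an arbitrary choice. When no `scoring` argument is given, `cross_val_score` falls back on the estimator's own `score` method, which here is the average log-likelihood.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

# Simulated data: 3 latent factors, heterogeneous noise (illustrative values)
rng = np.random.default_rng(0)
n, p, true_q = 500, 10, 3
A = rng.normal(size=(p, true_q))
noise_sd = rng.uniform(0.5, 2.0, size=p)
X = rng.normal(size=(n, true_q)) @ A.T + rng.normal(size=(n, p)) * noise_sd

# Cross-validated average log-likelihood for each candidate n_components
candidates = list(range(1, 7))
cv_scores = []
for k in candidates:
    fa = FactorAnalysis(n_components=k, random_state=0)
    cv_scores.append(cross_val_score(fa, X, cv=5).mean())

best_k = candidates[int(np.argmax(cv_scores))]
print("CV log-likelihood per candidate:", np.round(cv_scores, 3))
print("selected n_components:", best_k)

# score is the mean of the per-sample log-likelihoods from score_samples
fa = FactorAnalysis(n_components=best_k, random_state=0).fit(X)
per_sample = fa.score_samples(X)   # shape (n,)
print(np.isclose(fa.score(X), per_sample.mean()))
```

Scoring on held-out folds matters here because the in-sample log-likelihood tends to keep increasing as components are added.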



## Use Cases

## Results Interpretation, Metrics and Visualization

## References 
### Further Reading

## Misc.