https://scikit-learn.org/stable/modules/decomposition.html#factor-analysis

https://github.com/scikit-learn/scikit-learn/blob/main/doc/modules/decomposition.rst

2.5.5. Factor Analysis
===============

In unsupervised learning we only have a dataset $X = \{x_1, x_2, \dots, x_n\}$. How can this dataset be described mathematically? A very simple
*continuous latent variable* model for $X$ is


$$x_i = W h_i + \mu + \epsilon$$

The vector $h_i$ is called "latent" because it is unobserved. $\epsilon$ is a noise term drawn from a Gaussian with mean 0 and covariance $\Psi$ (i.e. $\epsilon \sim \mathcal{N}(0, \Psi)$), and $\mu$ is an arbitrary offset vector. Such a model is called "generative" because it describes how $x_i$ is generated from $h_i$. If we use all the $x_i$ as the columns of a matrix $\mathbf{X}$ and all the $h_i$ as the columns of a matrix $\mathbf{H}$, then we can write (with every column of $\mathbf{M}$ equal to $\mu$ and the columns of $\mathbf{E}$ holding the per-sample noise vectors):


$$\mathbf{X} = W \mathbf{H} + \mathbf{M} + \mathbf{E}$$

In other words, we *decomposed* matrix $\mathbf{X}$.
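To make the generative story concrete, the following NumPy sketch draws samples from the model and assembles $\mathbf{X} = W\mathbf{H} + \mathbf{M} + \mathbf{E}$ with one observation per column. The sizes and parameter values are arbitrary illustrative choices, and $\Psi$ is taken to be diagonal here only for simplicity; nothing in this snippet is scikit-learn API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 500, 5, 2  # n, dimension of x_i, dimension of h_i

# Illustrative model parameters (arbitrary values)
W = rng.normal(size=(n_features, n_components))   # loading matrix
mu = rng.normal(size=(n_features, 1))             # offset vector, as a column
psi = rng.uniform(0.1, 1.0, size=n_features)      # diagonal of the noise covariance Psi

# Latent variables h_i ~ N(0, I), stacked as the columns of H
H = rng.normal(size=(n_components, n_samples))
# Noise vectors eps_i ~ N(0, diag(psi)), stacked as the columns of E
E = rng.normal(size=(n_features, n_samples)) * np.sqrt(psi)[:, None]
# M repeats the offset mu in every column
M = np.repeat(mu, n_samples, axis=1)

X = W @ H + M + E   # each column is one generated observation x_i
print(X.shape)      # (5, 500)
```

scikit-learn estimators expect one sample per row, so this $\mathbf{X}$ would be passed to them as `X.T`.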

If $h_i$ is given, the model equation $x_i = W h_i + \mu + \epsilon$ automatically implies the following
probabilistic interpretation:


$$p(x_i|h_i) = \mathcal{N}(Wh_i + \mu, \Psi)$$

For a complete probabilistic model we also need a prior distribution for the latent variable $h$. The most straightforward assumption (based on the nice properties of the Gaussian distribution) is $h \sim \mathcal{N}(0,\mathbf{I})$.  This yields a Gaussian as the marginal distribution of $x$:


$$p(x) = \mathcal{N}(\mu, WW^T + \Psi)$$
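The marginal follows because $x$ is a linear function of the independent Gaussians $h$ and $\epsilon$: $\mathbb{E}[x] = \mu$ and $\mathrm{Cov}(x) = W\,\mathrm{Cov}(h)\,W^T + \Psi = WW^T + \Psi$. A quick Monte Carlo check of this covariance, as a sketch with arbitrary illustrative parameters and a diagonal $\Psi$ (samples stored one per row):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, d, k = 200_000, 4, 2   # many samples so the estimate is tight

# Illustrative parameters (arbitrary values)
W = rng.normal(size=(d, k))          # loading matrix
mu = rng.normal(size=d)              # offset vector
psi = rng.uniform(0.2, 0.8, size=d)  # diagonal of Psi

# One sample per row: x = W h + mu + eps
h = rng.normal(size=(n_samples, k))                   # h ~ N(0, I)
eps = rng.normal(size=(n_samples, d)) * np.sqrt(psi)  # eps ~ N(0, diag(psi))
X = h @ W.T + mu + eps

model_cov = W @ W.T + np.diag(psi)       # W W^T + Psi
empirical_cov = np.cov(X, rowvar=False)  # sample covariance of the draws
# The gap should be small and shrink as n_samples grows
print(np.abs(empirical_cov - model_cov).max())
```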

Now, without any further assumptions the idea of having a latent variable $h$ would be superfluous, because $x$ can be completely modelled with a mean
and a covariance. We need to impose some more specific structure on one of these two parameters. A simple additional assumption regards the
structure of the error covariance $\Psi$:


* $\Psi = \sigma^2 \mathbf{I}$: This assumption leads to
  the probabilistic model of [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA).

* $\Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_d)$: This model, with a separate noise variance $\psi_j$ for each of the $d$ features, is called
  [FactorAnalysis](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html#sklearn.decomposition.FactorAnalysis), a classical statistical model. The matrix $W$ is
  sometimes called the "factor loading matrix".
  

Both models essentially estimate a Gaussian whose covariance is a low-rank matrix $WW^T$ plus a simple noise term ($\sigma^2 \mathbf{I}$ or a diagonal $\Psi$). Because both models are probabilistic, they can be integrated into more complex models, e.g. a Mixture of Factor Analysers. One gets very different models (e.g.
[FastICA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html#sklearn.decomposition.FastICA)) if non-Gaussian priors on the latent variables are assumed.
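In scikit-learn both fitted estimators expose this covariance through `get_covariance()`. The sketch below, using the iris data purely as a convenient example, rebuilds it by hand as "low rank plus noise". For `FactorAnalysis`, $W$ is `components_.T` and $\Psi$ is `np.diag(noise_variance_)`. For `PCA`, the probabilistic-PCA loadings are taken to be the unit-norm principal axes scaled by $\sqrt{\lambda_j - \sigma^2}$; that scaling is our reading of how the library computes the PCA covariance, not a documented attribute.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis

X = load_iris().data  # 150 samples x 4 features

# Factor analysis: covariance = W W^T + diag(psi_1, ..., psi_d)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
W_fa = fa.components_.T                     # (4, 2) loading matrix
cov_fa = W_fa @ W_fa.T + np.diag(fa.noise_variance_)

# Probabilistic PCA: covariance = W W^T + sigma^2 I, with W built from the
# principal axes scaled by sqrt(explained variance - sigma^2)
pca = PCA(n_components=2).fit(X)
scale = np.sqrt(pca.explained_variance_ - pca.noise_variance_)
W_pca = pca.components_.T * scale           # (4, 2)
cov_pca = W_pca @ W_pca.T + pca.noise_variance_ * np.eye(X.shape[1])

print(np.allclose(cov_fa, fa.get_covariance()))    # True
print(np.allclose(cov_pca, pca.get_covariance()))  # True
```

The only structural difference is the noise term: `fa.noise_variance_` holds one value per feature, while `pca.noise_variance_` is a single scalar.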

Factor analysis *can* produce components (the columns of its loading matrix) similar to those of [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA). However, one cannot make any general statements
about these components (e.g. whether they are orthogonal):


|||
|-|-|
|[![](./pic/sphx_glr_plot_faces_decomposition_002.png)](https://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html)|[![](./pic/sphx_glr_plot_faces_decomposition_008.png)](https://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html)|



https://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html
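One way to look at this numerically is through the inner products between component vectors (the rows of `components_`, i.e. the columns of the loading matrix). PCA's components are orthonormal by construction; for factor analysis nothing forces the off-diagonal entries to vanish. A sketch on the iris data, chosen only for convenience; `gram` is a hypothetical helper defined here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis

X = load_iris().data


def gram(components):
    """Hypothetical helper: inner products between component vectors (rows)."""
    return components @ components.T


pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)

print(np.allclose(gram(pca.components_), np.eye(2)))  # True: orthonormal axes
print(gram(fa.components_).round(3))                  # no orthogonality guarantee
```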

The main advantage of Factor Analysis over [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA) is that
it can model the variance in every direction of the input space independently
(heteroscedastic noise):


[<img src="./pic/sphx_glr_plot_faces_decomposition_009.png" title="" alt="" width="300" style="display: block; margin: auto;">](https://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html)
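A small synthetic sketch of this effect: the data have a rank-2 signal plus noise whose variance differs strongly between features (the values in `true_psi` are arbitrary). `FactorAnalysis` returns one noise variance per feature in `noise_variance_`, while probabilistic PCA can only report a single shared $\sigma^2$.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n_samples, k = 5000, 2
true_psi = np.array([0.05, 0.05, 0.5, 0.5, 2.0, 2.0])  # noise variance per feature
d = true_psi.size

# Rank-k signal plus heteroscedastic Gaussian noise, one sample per row
W = rng.normal(size=(d, k))
X = rng.normal(size=(n_samples, k)) @ W.T
X += rng.normal(size=(n_samples, d)) * np.sqrt(true_psi)

fa = FactorAnalysis(n_components=k, random_state=0).fit(X)
pca = PCA(n_components=k).fit(X)

print("true psi:   ", true_psi)
print("FA psi_j:   ", fa.noise_variance_.round(2))            # one estimate per feature
print("PCA sigma^2:", round(float(pca.noise_variance_), 2))   # single shared value
```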



Modelling the noise variance of each feature separately allows better model selection than probabilistic PCA in the presence
of heteroscedastic noise:


[<img src="./pic/sphx_glr_plot_pca_vs_fa_model_selection_002.png" title="" alt="" width="600" style="display: block; margin: auto;">](https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_fa_model_selection.html)
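Both estimators implement `score`, which returns the average log-likelihood of the samples, so `cross_val_score` can compare candidate dimensionalities without labels, in the spirit of the model-selection example listed below. The data generator (true rank 3, random per-feature noise levels) and the candidate range for `n_components` are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features, true_rank = 500, 10, 3

# Low-rank signal plus heteroscedastic noise (one noise standard deviation per feature)
W = rng.normal(size=(n_features, true_rank))
X = rng.normal(size=(n_samples, true_rank)) @ W.T
X += rng.normal(size=(n_samples, n_features)) * rng.uniform(0.1, 2.0, size=n_features)

candidates = range(1, 7)
pca_scores, fa_scores = [], []
for n in candidates:
    # With no y, cross_val_score uses estimator.score: held-out average log-likelihood
    pca_scores.append(np.mean(cross_val_score(PCA(n_components=n), X)))
    fa_scores.append(np.mean(cross_val_score(FactorAnalysis(n_components=n, random_state=0), X)))

best_pca = candidates[int(np.argmax(pca_scores))]
best_fa = candidates[int(np.argmax(fa_scores))]
print("PCA picks", best_pca, "components; FA picks", best_fa)
```

Which dimensionality each model selects depends on the random data; the linked example studies this more systematically.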



Factor Analysis is often followed by a rotation of the factors (set with the `rotation` parameter), usually to improve interpretability. For example,
Varimax rotation maximizes the sum of the variances of the squared loadings, i.e., it tends to produce sparser factors, each influenced by only a few features (the "simple structure"). See, e.g., the first example below.
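A sketch of fitting the same model with and without `rotation="varimax"` on standardized iris data, loosely following the first example below. Because varimax is an orthogonal rotation ($W \mapsto WR$ with $RR^T = \mathbf{I}$), it changes the individual loadings but leaves $WW^T$, and therefore the fitted covariance, unchanged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)  # put features on a common scale

fits = {}
for rotation in (None, "varimax"):
    fits[rotation] = FactorAnalysis(n_components=2, rotation=rotation, random_state=0).fit(X)
    print(f"rotation={rotation}")
    # Each row of components_.T holds one feature's loadings on the two factors
    for name, loadings in zip(data.feature_names, fits[rotation].components_.T):
        print(f"  {name:20s} {loadings.round(2)}")
```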


**Examples:**


- [Factor Analysis (with rotation) to visualize patterns](https://scikit-learn.org/stable/auto_examples/decomposition/plot_varimax_fa.html#sphx-glr-auto-examples-decomposition-plot-varimax-fa-py)
- [Model selection with Probabilistic PCA and Factor Analysis (FA)](https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_fa_model_selection.html#sphx-glr-auto-examples-decomposition-plot-pca-vs-fa-model-selection-py)