# 3. PCR Methodology
--- 

Let $I = \{1, \dots, p\}$ be an index set and, for all $i \in I$, let $x_i \in \mathbb{M}_{n \times 1}$ be a random vector whose entries are independent and follow the same distribution as a random variable $X_i$ with finite first and second moments. Note that it would be sufficient to assume that the expected values and variances are equal across the entries of $x_i$. However, for ease of notation I stick to the case of equal distributions. The variance-covariance matrix (VCV) of the random variables $X_1, \dots, X_p$ is denoted by $\pmb C_X$. The vectors $x_i$ are collected in the matrix

\begin{align}
\pmb X = \begin{pmatrix} x_1 & x_2 & \dots & x_p \end{pmatrix} \in \mathbb{M}_{n \times p}.
\end{align}

It is assumed that each random vector $x_i$ is already demeaned by the mean of the random variable $X_i$. Although it might seem so at first glance, this assumption is not very restrictive. Appendix *(A.1)* shows that every matrix whose columns each have identically distributed entries can be transformed into a matrix whose column vectors have expected value zero. For ease of notation it is therefore assumed that $\text{E}(X_i) = 0$ $\forall i \in I$.
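
The demeaning step can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population means `mu`, the normal distribution, and the sizes `n` and `p` are hypothetical choices made for the example, and subtracting the column-wise sample mean is an empirical stand-in, not necessarily the transformation used in the appendix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for illustration only
n, p = 500, 3                   # hypothetical sample size and number of variables

# Hypothetical population means E(X_i); in practice these are unknown.
mu = np.array([1.0, -2.0, 0.5])

# Column i holds n independent draws of X_i (normal draws are an assumption).
X_raw = rng.normal(loc=mu, scale=1.0, size=(n, p))

# Demeaning by the mean of the random variable X_i, as assumed in the text.
X = X_raw - mu

# Empirical stand-in: subtract the column-wise sample mean instead.
X_sample_demeaned = X_raw - X_raw.mean(axis=0)

# The sample-demeaned columns average to zero up to floating point error.
print(X_sample_demeaned.mean(axis=0))
```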

## 3.1 Principal Component Analysis (PCA)
---

### 3.1.1 Aim of PCA
---

The aim of PCA is to build $M$ new variables $z_1, z_2, \dots, z_M$ as orthogonal linear combinations of $x_1, x_2, \dots, x_p$. Note that $M \leq p$, since otherwise the $z_m$'s cannot be orthogonal to each other. Denoting the vector of weights used to build $z_m$ by $\phi_m \in \mathbb{M}_{p \times 1}$, one can express $z_m$ as

\begin{align}
z_m = \pmb X \cdot \phi_m,
\tag{3.1}
\end{align}

where $z_m$ is the $m$-th principal component. Defining $\pmb \phi = \begin{pmatrix} \phi_1 & \phi_2 & \dots & \phi_M \end{pmatrix}$ shortens the above notation. This will later be useful to compute the values of all $z_m$ in one equation:

\begin{align}
\pmb Z = \begin{pmatrix} z_1 & z_2 & \dots & z_M \end{pmatrix} = \begin{pmatrix} \pmb X \cdot \phi_1 & \pmb X \cdot \phi_2 & \dots & \pmb X \cdot \phi_M \end{pmatrix} = \pmb X \pmb \phi.
\tag{3.2}
\end{align}
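
The equivalence of the column-wise form *(3.1)* and the matrix form *(3.2)* can be checked numerically. The sketch below uses placeholder values: `X` and `phi` are random matrices chosen only to illustrate the notation, so `phi` is not yet the PCA solution.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for illustration only
n, p, M = 6, 4, 2               # hypothetical dimensions

X = rng.normal(size=(n, p))     # placeholder data matrix
phi = rng.normal(size=(p, M))   # placeholder weights, NOT the PCA solution

# Equation (3.1): build one component at a time, z_m = X phi_m.
Z_columns = np.column_stack([X @ phi[:, m] for m in range(M)])

# Equation (3.2): build all components with a single matrix product.
Z = X @ phi

# Both constructions give the same n x M matrix.
print(np.allclose(Z, Z_columns))
```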

Furthermore, it will turn out that $\pmb \phi$ is the matrix whose columns are eigenvectors of $\pmb C_X$ with length one. Each column represents an independent eigenvector from an eigenspace. However, since a unit-length eigenvector is only determined up to its sign (for distinct eigenvalues), this definition implies that there are $2^M$ different $\pmb \phi$, yielding $2^M$ different $\pmb Z$ matrices.
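
The count of $2^M$ candidates can be made concrete by enumerating all sign patterns. This is a minimal sketch: the matrix `C_X` below is a hypothetical positive definite covariance matrix with distinct eigenvalues, and `M = 2` is an arbitrary choice.

```python
import itertools
import numpy as np

# Hypothetical positive definite covariance matrix with distinct eigenvalues.
C_X = np.array([[4.0, 2.0, 0.6],
                [2.0, 2.0, 0.5],
                [0.6, 0.5, 1.0]])
M = 2  # number of principal components kept (arbitrary for this example)

# eigh returns eigenvalues in ascending order; reverse to get descending order.
eigenvalues, eigenvectors = np.linalg.eigh(C_X)
lam = eigenvalues[::-1][:M]
phi = eigenvectors[:, ::-1][:, :M]

# Flip the sign of every subset of columns: 2**M candidates in total.
variants = [phi * np.array(signs)
            for signs in itertools.product([1.0, -1.0], repeat=M)]

for candidate in variants:
    # Each candidate still consists of unit-length eigenvectors of C_X.
    assert np.allclose(C_X @ candidate, candidate * lam)
    assert np.allclose(np.linalg.norm(candidate, axis=0), 1.0)

print(len(variants))  # number of candidate matrices, i.e. 2**M
```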

First, I follow many textbooks and take $\pmb \phi$ as deterministic. However, in practice it is actually a matrix to estimate and therefore adds additional randomness, turning the $\pmb Z$ matrix stochastic. The empirical case with unknown true distributions and correlations is discussed in 3.1.6.

### 3.1.2 Derive the PC theoretically
---

This section discusses how the $\phi_m$ are chosen such that the $Z_m$'s are uncorrelated and $\text{Var}(Z_m) \geq \text{Var}(Z_{m+1})$ $\forall m \in I_{-p}$. The first weight vector $\phi_1$ must be chosen so as to maximize the variance of the random variable $Z_1$ subject to having length one. From the first line of equation *(3.9)* it follows that $\text{Var}(Z_1) = \phi'_1 \pmb C_X \phi_1$ and hence

\begin{align}
    \phi_1 = \arg \max_{\|w\| = 1} w' \pmb C_X w
    \tag{3.3}
\end{align}

This problem can be solved using the Lagrangian with the constraint $w' w = 1$:
\begin{align}
    \mathscr{L}(w,\lambda) =& w' \pmb C_X w - \lambda \left( w'w-1 \right) \notag \\
    \frac{\partial \mathscr{L}}{\partial \lambda} =& -\left( w'w - 1 \right) = 0 \notag \\
    \frac{\partial \mathscr{L}}{\partial w} =& 2 \pmb C_X w - 2\lambda w = 0
    \tag{3.4}
\end{align}

From equation *(3.4)* it follows that $\pmb C_X w = \lambda w$. Hence, the solution vector $w$ is an eigenvector of $\pmb C_X$ with eigenvalue $\lambda$. Since $\pmb C_X$ is a symmetric, positive definite $p \times p$ matrix, it has $p$ real-valued eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p > 0$ with corresponding eigenvectors of length one $v_1, v_2, \dots, v_p$. Note that the eigenvectors are not unique. I denote the corresponding eigenspaces of the eigenvalues by $S_1, \dots, S_p$. Note that if $\lambda_i = \lambda_j$ it follows that $S_i = S_j$. From linear algebra it is known that for each eigenspace $S_i$ there exists a basis $B_i$ consisting of eigenvectors of $\lambda_i$ with length one. In particular, if the eigenvalues are all distinct, which is the case in most applications and is therefore the case I focus on, there are two possible choices for $B_i$. Since the eigenvector multiplied by minus one has the same length and spans the same eigenspace, the basis is either $B_i = \left\{v_i \right\}$ or $B_i = \left\{ - v_i \right\}$. There is no clear theory as to which basis, and therefore which eigenvector, should be chosen. However, the next step shows that the result is qualitatively invariant to the choice of the basis, in the sense that the order of the eigenvalues stays the same.
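
The first-order conditions in *(3.4)* can be verified numerically for every eigenpair. The following is a sketch under assumptions: `C_X` is a hypothetical positive definite matrix, and `lagrangian_gradient` is a helper introduced here only for illustration.

```python
import numpy as np

# Hypothetical positive definite covariance matrix with distinct eigenvalues.
C_X = np.array([[4.0, 2.0, 0.6],
                [2.0, 2.0, 0.5],
                [0.6, 0.5, 1.0]])

# eigh handles symmetric matrices and returns ascending eigenvalues with
# unit-length eigenvectors as columns; reverse both to get descending order.
lam, V = np.linalg.eigh(C_X)
lam, V = lam[::-1], V[:, ::-1]


def lagrangian_gradient(w, lam_k):
    """Derivative of the Lagrangian with respect to w, as in equation (3.4)."""
    return 2.0 * C_X @ w - 2.0 * lam_k * w


for k in range(len(lam)):
    v = V[:, k]
    # Both v and -v satisfy the stationarity condition and the constraint w'w = 1.
    for w in (v, -v):
        grad_norm = np.linalg.norm(lagrangian_gradient(w, lam[k]))
        print(k + 1, lam[k], grad_norm, w @ w)
```

Every unit eigenvector is therefore a stationary point of the Lagrangian, which is why a further step is needed to pick the one that maximizes the variance.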

Armed with the eigenvectors and eigenvalues, the question arises which eigenvalue maximizes the variance. Since the last step only involves the inner product $w'w$ of the eigenvector with itself, which equals one for both possible bases of an eigenvalue, the result stays, as indicated above, qualitatively the same.

\begin{align}
    \arg \max_{\lambda \in \{\lambda_1, \dots, \lambda_p \}} w' \pmb C_X w \stackrel{(3.4)}{=}  \arg \max_{\lambda \in \{\lambda_1, \dots, \lambda_p \}} w' \lambda w = \arg \max_{\lambda \in \{\lambda_1, \dots, \lambda_p \}} \underbrace{w' w}_{1} \lambda = \arg \max_{\lambda \in \{\lambda_1, \dots, \lambda_p \}} \lambda
    \tag{3.5}
\end{align}
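
As a numerical sanity check of *(3.5)*, one can compare $w' \pmb C_X w$ over many random unit vectors with the largest eigenvalue. This is a sketch under assumptions: `C_X` is a hypothetical covariance matrix, and the random search is purely illustrative, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, for illustration only

# Hypothetical positive definite covariance matrix with distinct eigenvalues.
C_X = np.array([[4.0, 2.0, 0.6],
                [2.0, 2.0, 0.5],
                [0.6, 0.5, 1.0]])

# Largest eigenvalue and a corresponding unit eigenvector (eigh sorts ascending).
lam, V = np.linalg.eigh(C_X)
lam_1, v_1 = lam[-1], V[:, -1]

# Draw random directions and rescale each row to length one.
W = rng.normal(size=(10_000, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Quadratic form w' C_X w for every row w of W.
variances = np.einsum("ij,jk,ik->i", W, C_X, W)

# No unit vector exceeds lambda_1, and both v_1 and -v_1 attain it.
print(variances.max() <= lam_1 + 1e-10)
print(np.isclose(v_1 @ C_X @ v_1, lam_1), np.isclose((-v_1) @ C_X @ (-v_1), lam_1))
```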

Hence, the maximum variance is attained at the largest eigenvalue, which is, by definition, $\lambda_1$. One of the corresponding eigenvectors, $v_1$ or $- v_1$, is then chosen to be $\phi_1$. For notational purposes I will stick to $v_1$. Thus, it is enough to compute a unit-length eigenvector for the largest eigenvalue of the VCV matrix of all the $X_i$ to obtain $\phi_1$ and with that $z_1$. Armed with $\phi_1$, it is possible to compute $\phi_2, \phi_3, \dots, \phi_M$ iteratively by defining new random variables

\begin{align}
\begin{pmatrix} X_1^{(m)} \\ X_2^{(m)} \\ \vdots \\ X_p^{(m)} \end{pmatrix}= \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} - \sum_{j = 1}^{m-1} Z_j \phi_j
\tag{3.6}
\end{align}

and solving the former problem for $\text{Var}(Z_m)$ using the random variables $\begin{pmatrix} X_1^{(m)} & X_2^{(m)} & \dots & X_p^{(m)} \end{pmatrix}'$ to get $\phi_m$. It turns out that $\pmb \phi$ equals the matrix $\pmb V = \begin{pmatrix} v_1 & v_2 & \dots & v_M \end{pmatrix}$, where $v_i$ is the eigenvector with length one of $\pmb C_X$ corresponding to the $i$-th highest eigenvalue $\lambda_i$. Hence, we can compute $\pmb \phi$ by only computing the eigenvectors of $\pmb C_X$ and sorting them in descending order by the corresponding eigenvalues. Note that $\pmb \phi$ therefore has orthonormal columns. Subsequently, $\pmb Z$ is obtained by matrix multiplication of $\pmb X$ and $\pmb \phi$ as in formula *(3.2)*. In theory, there is thus a non-stochastic true $\pmb Z$ matrix of principal components, which depends on the choice of $\pmb \phi$. Unfortunately, there are two choices for each eigenvector and therefore $2^M$ choices for $\pmb \phi$. For ease of notation I will denote the whole class of solutions to the specific PCA problem as $\left[ \phi \right]$ and any representative of this class by $\pmb \phi$. Note that it is trivial to show that this class is in fact an equivalence class. I will also examine at each step whether the choice of the matrix affects certain properties.
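
The iterative construction in *(3.6)* can be compared with the direct eigendecomposition at the level of covariance matrices. The sketch below rests on assumptions: `C_X` is a hypothetical covariance matrix with distinct eigenvalues, and $Z_j = \phi_j' X$ is used so that the deflated variables are $R X$ with $R = I - \sum_j \phi_j \phi_j'$, whose covariance is $R \, \pmb C_X R$.

```python
import numpy as np

# Hypothetical positive definite covariance matrix with distinct eigenvalues.
C_X = np.array([[4.0, 2.0, 0.6],
                [2.0, 2.0, 0.5],
                [0.6, 0.5, 1.0]])
p = C_X.shape[0]
M = p  # extract all components in this sketch

# Reference solution: unit eigenvectors sorted by descending eigenvalue.
lam, V = np.linalg.eigh(C_X)
lam, V = lam[::-1], V[:, ::-1]

phi = np.zeros((p, M))
C_m = C_X.copy()  # covariance of X^(1) = X
for m in range(M):
    # Solve the variance maximization for the current deflated variables.
    _, vecs = np.linalg.eigh(C_m)
    phi[:, m] = vecs[:, -1]  # unit eigenvector of the largest eigenvalue
    # Equation (3.6) in matrix form: the next deflated vector is R X.
    R = np.eye(p) - phi[:, :m + 1] @ phi[:, :m + 1].T
    C_m = R @ C_X @ R  # covariance of the deflated variables

# Column-wise agreement with V up to the sign of each eigenvector.
alignment = np.abs(np.sum(phi * V[:, :M], axis=0))
print(alignment)

# The variances of the components reproduce the sorted eigenvalues.
print(np.diag(phi.T @ C_X @ phi), lam[:M])
```

The absolute value in `alignment` reflects the sign ambiguity discussed above: the iterative procedure may return $v_m$ or $-v_m$, and both belong to the class $\left[ \phi \right]$.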