--- 
Project for the course in Computational Statistics | Summer 2020, M.Sc. Economics, Bonn University | [Manuel Huth](https://github.com/manuhuth)

# Variance Increase in PCR <a class="tocSkip">   
---

The following notebook contains ... (one sentence)

#### Downloading and viewing this notebook

* To ensure that every image and format is displayed properly, I recommend downloading this notebook from its repository on [GitHub](https://github.com/manuhuth/PCR-Parameter-Variance-Analysis). Other viewing options like _MyBinder_ or _NBViewer_ might have issues displaying formulas and formatting.


#### Information about the Setup
* 2-3 bullet points


---
<h1>Table of Contents<span class="tocSkip"></span></h1>

---

In [None]:
#read R-Files

---
# 1. Introduction
---

# 2. PCR Methodology
--- 

## 2.1 General Definitions
Let $I = \{1, \dots, p\}$ be an index set and, for every $i \in I$, let $x_i \in \mathbb{M}_{n \times 1}$ be a random vector whose entries are independent and follow the same distribution $X_i$. The collection of these vectors is defined by 

\begin{align}
\pmb X = \begin{pmatrix} x_1 & x_2 & \dots & x_p \end{pmatrix} \in \mathbb{M}_{n \times p}.
\end{align}

It is assumed that the random vector $x_i$ is already demeaned by the mean of the random variable $X_i$. Even though it might seem so at first glance, this assumption is not very restrictive.

**Every such matrix $\pmb X$ can be transformed to a matrix with random variables of mean zero**

Since the entries $x_{ji}$, $j = 1, \dots, n$, of a random vector $x_i$ are i.i.d. with $\text{E}\left(x_{ji} \right) = \mu_i$, the sample mean $\overline{x}_i = \frac{1}{n} \sum_{j = 1}^n x_{ji}$ satisfies

\begin{align}
    \text{E}\left(\overline{x}_i\right) = \text{E}\left(\frac{1}{n} \sum_{j = 1}^n x_{ji} \right) = \frac{1}{n} \sum_{j = 1}^n \text{E}\left(x_{ji}\right) = \frac{1}{n} \sum_{j = 1}^n \mu_i = \mu_i,
\end{align}

i.e. it is an unbiased estimator of $\mu_i$, such that
\begin{align}
\text{E}\left(x_{ji} - \overline{x}_i \right) = \text{E}\left(x_{ji} \right) - \text{E}\left(\overline{x}_i \right) = \mu_i -\mu_i = 0 \text{ for all } j = 1, \dots, n \text{ and } i \in I. 
\tag{1.1}
\end{align} 

Therefore, every matrix $\pmb X$ can be transformed into the desired form with zero means, independent of the magnitude of the $\mu_i$'s. For ease of notation it is therefore assumed that $\text{E}(X_i) = 0$ $\forall i \in I$. 
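
The demeaning step can be illustrated with a small numerical sketch. It is written in Python with NumPy, which is an assumption about tooling rather than the code used in this project; the seed, the sample size and the column means in `mu` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, illustrative only
n, p = 500, 3
mu = np.array([5.0, -2.0, 100.0])        # hypothetical column means mu_i

# Each column i holds n i.i.d. draws with mean mu[i]
X_raw = rng.normal(loc=mu, scale=1.0, size=(n, p))

# Subtract the sample mean of every column from its entries
X = X_raw - X_raw.mean(axis=0)

# The column means of the demeaned matrix vanish up to rounding error,
# regardless of how large the original means were
col_means = X.mean(axis=0)
print(col_means)
```

Subtracting the sample mean makes the column means zero by construction, whatever the size of the $\mu_i$'s.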


**The matrix product $\frac{1}{n-1}\pmb X' \pmb X$ is an unbiased estimator of the variance-covariance (VCV) matrix of the $X_i$'s**

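Since the mean of the random variables $X_i$ is zero, a single matrix multiplication suffices to estimate the VCV matrix of the $X_i$'s. The factor $\frac{1}{n-1}$ is the appropriate one when the columns of $\pmb X$ have been centered by their sample means $\overline{x}_i$, as in *(1.1)*; if the true means were known to be zero and no centering were applied, $\frac{1}{n}$ would be the exactly unbiased scaling.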

\begin{align}
   \text{E}\left(\frac{1}{n-1}\pmb X' \pmb X \right) = \frac{1}{n-1} \text{E}\left(\begin{pmatrix} x'_1 \\ \vdots \\ x'_p \end{pmatrix} \begin{pmatrix} x_1 & \dots & x_p \end{pmatrix}\right) = 
   \frac{1}{n-1}\text{E}\begin{pmatrix} 
   x'_1 x_1 & \dots & x'_1 x_p \\
   x'_2 x_1 & \dots & x'_2 x_p \\
   \vdots & \vdots & \vdots \\
   x'_p x_1 & \dots & x'_p x_p \\  
   \end{pmatrix} =
   \begin{pmatrix} \text{var}(X_1)  & \dots & \text{cov}(X_1, X_p) \\
        \text{cov}(X_2, X_1)  & \dots & \text{cov}(X_2, X_p) \\
        \vdots &  \vdots & \vdots \\
        \text{cov}(X_p, X_1) & \dots & \text{var}(X_p) 
    \end{pmatrix}
    \tag{1.2}
\end{align}
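
As a quick numerical check of *(1.2)*, the following sketch compares $\frac{1}{n-1}\pmb X' \pmb X$, computed on columns centered by their sample means, with NumPy's built-in sample covariance. Python/NumPy and all variable names are assumptions made for illustration; the mixing matrix only serves to create correlated columns.

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed, illustrative only
n, p = 200, 4

# Correlated columns: independent normals times a random mixing matrix
X_raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Center every column by its sample mean
X = X_raw - X_raw.mean(axis=0)

# VCV estimate through a single matrix product
S = X.T @ X / (n - 1)

# Reference: np.cov normalises by n - 1 by default
S_ref = np.cov(X_raw, rowvar=False)

print(np.max(np.abs(S - S_ref)))
```

The two matrices agree up to floating-point error, so the matrix product is exactly the usual sample covariance once the columns are centered.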

**Multiplying a square matrix by a positive scalar changes neither its eigenvectors nor the order of its eigenvalues**

Let $\pmb A \in \mathbb{M}_{p \times p}$ with ordered eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p > 0$ and let $v_1, v_2, \dots, v_p$ be the eigenvectors corresponding to the respective $\lambda_i$. Furthermore, let $a \in \mathbb{R}_{++}$, $\pmb A' = a \pmb A$ and $\lambda'_i = a \lambda_i$, where in this paragraph the prime marks the scaled quantities and not a transpose. From the definition of eigenvectors and eigenvalues it follows for all $i \in I$ that
\begin{align}
    \pmb A v_i = \lambda_i v_i \Longleftrightarrow a \pmb A v_i = a \lambda_i v_i \Longleftrightarrow \pmb A' v_i = \lambda'_i v_i.
    \tag{1.3}
\end{align}
 
Since $a > 0$, the order of the $\lambda'_i$ is still $\lambda'_1 \geq \lambda'_2 \geq \dots \geq \lambda'_p > 0$. A side note for later purposes is that the ratio of an eigenvalue to the sum of all eigenvalues is also invariant to multiplication by a positive scalar:

\begin{align}
    \frac{\lambda'_i}{\sum_{j = 1}^p \lambda'_j} = \frac{a \lambda_i}{\sum_{j = 1}^p a \lambda_j} = \frac{a \lambda_i}{a \sum_{j = 1}^p \lambda_j} = \frac{\lambda_i}{\sum_{j = 1}^p \lambda_j}
    \tag{1.4}
\end{align}
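
Both invariance results can be checked numerically. The sketch below (Python/NumPy, an assumed tool choice; the matrix `A` and the scalar `a` are hypothetical) verifies that the eigenvectors of $\pmb A$ are also eigenvectors of $a \pmb A$ with eigenvalues $a \lambda_i$, and that the eigenvalue shares are unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)           # fixed seed, illustrative only
B = rng.normal(size=(50, 4))
A = B.T @ B                              # symmetric, positive definite here
a = 1.0 / (50 - 1)                       # positive scalar, e.g. 1 / (n - 1)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
lam, V = np.linalg.eigh(A)
lam_scaled = np.linalg.eigvalsh(a * A)

# (1.3): every eigenvector of A solves the eigenproblem of a * A
lhs = (a * A) @ V                        # columns are (aA) v_i
rhs = V * (a * lam)                      # columns are (a lambda_i) v_i

# (1.4): the share of each eigenvalue in the total is unchanged
share = lam / lam.sum()
share_scaled = lam_scaled / lam_scaled.sum()

print(np.allclose(lhs, rhs), np.allclose(share, share_scaled))
```

Because `eigh` sorts ascending in both calls, the equality of `lam_scaled` and `a * lam` also confirms that the ordering is preserved.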



## 2.2 Principal Component Analysis (PCA)

**Aim of PCA**

The aim of PCA is to build $M$ new variables $z_1, z_2, \dots, z_M$ as orthogonal linear combinations of $x_1, x_2, \dots, x_p$. Note that $M \leq p$, since otherwise the $z_m$'s cannot be orthogonal to each other. Denoting the vector of weights used to build $z_m$ by $\phi_m \in \mathbb{M}_{p \times 1}$, one can express $z_m$ as 

\begin{align}
z_m = \pmb X \cdot \phi_m,
\end{align}

where $z_m$ is the $m$-th principal component. Defining $\pmb \phi = \begin{pmatrix} \phi_1 & \phi_2 & \dots & \phi_M \end{pmatrix}$ shortens the above notation, which will later be useful to compute all $z_m$ in one equation:

\begin{align}
\pmb Z = \begin{pmatrix} z_1 & z_2 & \dots & z_M \end{pmatrix} = \begin{pmatrix} \pmb X \cdot \phi_1 & \pmb X \cdot \phi_2 & \dots & \pmb X \cdot \phi_M \end{pmatrix} = \pmb X \pmb \phi.
\tag{1.5}
\end{align}

We follow many textbooks and take $\pmb \phi$ as deterministic. In practice, however, $\pmb \phi$ has to be estimated, which adds randomness and increases the variance of the estimated $\pmb Z$ matrix. 

**Distribution of the $Z_m$ variables**

Note that all $n$ random variables in $z_{m}$ come from the same distribution $Z_m$, since they are all the same linear combination of the random vectors $x_i$ with random variables $X_i$. The expected value of the vector $z_m$ can be computed as

$$\text{E}(z_m) = \text{E}(\pmb X \cdot \phi_m) = \text{E} \left(\sum_{j=1}^p x_j \phi_{jm} \right) = \sum_{j=1}^p \underbrace{\text{E}(x_j)}_{= \pmb 0} \phi_{jm} = \pmb 0.$$

Making use of the zero mean and applying formula *(1.2)*, the estimated VCV matrix of the $Z_m$'s can be expressed as $\frac{1}{n-1} \pmb Z' \pmb Z$.

**Derive the PC**

In the next step the values in $\pmb \phi$ are computed recursively. First, the vector $\phi_1$ is computed. Since the goal is to build uncorrelated variables that carry the maximum variance of $\pmb X$, $\phi_1$ is constructed such that $Z_1$ has the maximum possible variance. Since the variance could otherwise be made arbitrarily large by scaling $\phi_1$, one only considers $\phi_1$ with $||\phi_1|| = 1$ (this normalization can be motivated by the properties of the *Rayleigh Quotient*). Hence, the maximization problem at hand is

\begin{align}
\phi_1 = \arg \max_{||w|| = 1} \left( \frac{1}{n-1} (\pmb X w)'(\pmb X w) \right)= \arg \max_{||w|| = 1} \left( (\pmb X w)'(\pmb X w) \right)= \arg \max_{||w|| = 1} \left( w' \pmb X' \pmb X w \right) = \arg \max_{||w|| = 1} \underbrace{\left( \frac{w' \pmb X' \pmb X w}{w'w} \right)}_{\text{Rayleigh Quotient}}
\tag{1.6}
\end{align}

It can be shown that the solution to the Rayleigh Quotient problem is the eigenvector corresponding to the largest eigenvalue of $\pmb X' \pmb X$, which is equivalently the eigenvector corresponding to the largest eigenvalue of $\frac{1}{n-1} \pmb X' \pmb X$. As shown in the previous subsection, the latter equals the estimated covariance matrix of the $X_i$ variables. From equation *(1.3)* it follows that $\frac{1}{n-1} \pmb X' \pmb X$ and $\pmb X' \pmb X$ have the same eigenvectors and eigenvalues that differ only by the factor $\frac{1}{n-1}$. Hence, dropping this factor to obtain the Rayleigh Quotient changes neither the eigenvectors nor the order of the eigenvalues, and therefore not the order of the $z_m$'s. Armed with $\phi_1$, we can now compute $\phi_2, \phi_3, \dots, \phi_M$ iteratively by setting $\pmb X_m = \pmb X - \sum_{j = 1}^{m-1} \pmb X \phi_j \phi'_j$ and solving the former problem for $\pmb X_m$ to get $\phi_m$. 
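
The iterative step can be sketched numerically for $m = 2$: the part of $\pmb X$ that lies along $\phi_1$, namely $\pmb X \phi_1 \phi'_1$, is removed and the maximization is solved again on the remainder. The code is Python/NumPy, an assumed tool choice; the helper `top_eigenvector` and the simulated data are hypothetical.

```python
import numpy as np

def top_eigenvector(S):
    """Unit-length eigenvector of the symmetric matrix S with the largest eigenvalue."""
    lam, V = np.linalg.eigh(S)           # eigenvalues in ascending order
    return V[:, -1]

rng = np.random.default_rng(4)           # fixed seed, illustrative only
n, p = 300, 5
X_raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X_raw - X_raw.mean(axis=0)           # centered data matrix

# First weight vector: solution of the Rayleigh Quotient problem
phi_1 = top_eigenvector(X.T @ X)

# Deflation: remove the component of X along phi_1, then solve again
X_2 = X - X @ np.outer(phi_1, phi_1)
phi_2 = top_eigenvector(X_2.T @ X_2)

# Direct route: eigenvector of X'X with the second-largest eigenvalue
lam, V = np.linalg.eigh(X.T @ X)
v_2 = V[:, -2]

# Eigenvectors are only defined up to sign, so compare |phi_2 . v_2| with 1
alignment = abs(phi_2 @ v_2)
print(alignment, abs(phi_1 @ phi_2))
```

In this sketch the deflated problem returns the second eigenvector of $\pmb X' \pmb X$ up to sign, and it is orthogonal to $\phi_1$.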

It turns out that $\pmb \phi$ equals the matrix $\pmb V = \begin{pmatrix} v_1 & v_2 & \dots & v_M \end{pmatrix}$, where $v_i$ is the unit-length eigenvector of $\pmb X' \pmb X$ corresponding to its $i$-th largest eigenvalue $\lambda_i$. Hence, we can compute $\pmb \phi$ by computing the eigenvectors of $\pmb X' \pmb X$ and sorting them in descending order of the corresponding eigenvalues. Subsequently, $\pmb Z$ is obtained by matrix multiplication of $\pmb X$ and $\pmb \phi$ as in formula *(1.5)*.
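
Put together, the procedure amounts to one eigendecomposition and one matrix product. The following is a minimal sketch in Python/NumPy (an assumed tool choice); the simulated data, the number of components `M` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)           # fixed seed, illustrative only
n, p, M = 300, 5, 2
X_raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X_raw - X_raw.mean(axis=0)           # centered data matrix

# Eigendecomposition of X'X (eigh returns ascending eigenvalues)
lam, V = np.linalg.eigh(X.T @ X)

# Sort eigenvalues and eigenvectors in descending order
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# phi holds the first M unit-length eigenvectors, Z = X phi as in (1.5)
phi = V[:, :M]
Z = X @ phi

# Estimated VCV of the components: diagonal with entries lambda_m / (n - 1)
vcv_Z = Z.T @ Z / (n - 1)
print(np.round(vcv_Z, 6))
```

The off-diagonal entries of `vcv_Z` vanish up to rounding, so the components are uncorrelated in the sample, and the diagonal entries are the leading eigenvalues of $\pmb X' \pmb X$ divided by $n - 1$.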

**Portion of variance captured by the $\lambda_i$'s**




## 2.3 Principal Component Regression (PCR)
Let $\pmb X \in \mathbb{M}_{n \times p}$ be the matrix of independent variables and $\varepsilon \in \mathbb{M}_{n \times 1}$ a random vector.
The data generating process is $Y = \pmb X \beta + \varepsilon$. For ease of notation, we assume the expected value of every covariate and of the dependent variable to be $0$. Hence, $\text{E} ( \pmb X ) = \pmb 0$, i.e. $\text{E}(X_i) = 0$ for all $i \in I$. Since any linear model can be demeaned, this is not restrictive.


# 3. Structural Model

# 4. Simulation

# 5. Conclusion

In [None]:
derivation of pca
https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch18.pdf

get number of PCA components
 Kaiser criterion (Guttman, 1954; Kaiser, 1960)
 acceleration factor (Cattell, 1966; Raiche, Roipel, and Blais, 2006) and parallel analysis (Horn, 1965)

comparison of pca and ridge
https://www.researchgate.net/publication/259265422_A_Monte_Carlo_Comparison_between_Ridge_and_Principal_Components_Regression_Methods

Introduction
- Black (1997) shows that parents are willing to pay a premium to buy a house in a neighborhood with a school that scores well. (Do Better Schools Matter? Parental Valuation of Elementary Education) - Harvard paper
- https://www.researchgate.net/profile/Duncan_Thomas/publication/5194918_Early_Test_Scores_Socioeconomic_Status_and_Future_Outcomes/links/575812c508ae5c6549074510/Early-Test-Scores-Socioeconomic-Status-and-Future-Outcomes.pdf nice to find literature on test scores

1. structural model with equation
2. relationship $y = X \beta + \epsilon$
    Xs are correlated -> pcr
3. simulate structural model using parameter
4. see how large the variance is of the pcr beta and compare to the ones estimated. (can be computed)