In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("Principal Component Analysis.ipynb")

# Matrix Analysis 2024 - EE312
## Week 10 - Principal Component Analysis (PCA)
[LTS2](https://lts2.epfl.ch)

PCA is a classic technique for dimensionality reduction. As you will see in this notebook, this method relies on projections and eigenvalues.

## 1. Eigenvalues and PCA

Let us consider $N$ data points $\{x_k\}$, $k=1, \dots, N$, in $\mathbb{R}^d$. Throughout the exercise we assume that these data points have zero mean, i.e. $\frac{1}{N}\sum_{k=1}^N x_k=0$. We denote by $X$ the $N\times d$ matrix whose rows are the data points:

$
X = \begin{pmatrix}
x_1^T\\
x_2^T\\ \vdots \\ x_N^T \end{pmatrix}$
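As a small illustration of this setup, the sketch below builds such a matrix from synthetic points and enforces the zero-mean assumption. The data, the sizes ($N=5$, $d=3$) and the names `points` and `X_toy` are assumptions made for this example only.

```python
import numpy as np

# Toy data: N = 5 points in R^3 (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
points = rng.normal(loc=2.0, scale=1.0, size=(5, 3))

# Subtract the mean of each coordinate so that the zero-mean assumption holds
X_toy = points - points.mean(axis=0)

print(X_toy.shape)                          # one row per data point
print(np.allclose(X_toy.mean(axis=0), 0))   # column means vanish up to rounding
```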

<!-- BEGIN QUESTION -->

**1.1**
Write the projection of the data points $x_k$ on a unit-norm vector $u\in\mathbb{R}^d$ using a matrix operation.

_Type your answer here, replacing this text._

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**1.3** Let us define the matrix $C = \frac{1}{N}X^TX$ (referred to as the **sample covariance matrix** in the literature). What are the properties of this matrix? What do they imply for its eigenvalues?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**1.4** PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. The following picture illustrates the principle for (blue) points in $\mathbb{R}^2$. Intuitively, the variance of the projected points (in red) is maximized when the direction of projection matches the main direction of the point cloud in the picture.

![PCA](images/pca.gif)
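The intuition conveyed by the picture can be checked numerically. The sketch below is only an illustration on a synthetic 2-D cloud: the data, the 30-degree rotation and all variable names are assumptions, not part of the exercise. It scans unit vectors $u(\theta)$ and records the variance of the scalar projections of the points on each of them.

```python
import numpy as np

rng = np.random.default_rng(1)
# Elongated 2-D cloud: large spread along the first axis, small along the second
cloud = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
# Rotate by 30 degrees so that the main direction is not aligned with an axis
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
cloud = cloud @ R.T
cloud = cloud - cloud.mean(axis=0)

# Scan unit vectors u(theta) and measure the variance of the scalar projections
thetas = np.deg2rad(np.arange(0, 180))
directions = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # shape (180, 2)
variances = np.mean((cloud @ directions.T) ** 2, axis=0)

best_deg = np.rad2deg(thetas[np.argmax(variances)])
print(f"direction of maximal variance: about {best_deg:.0f} degrees")
```

The maximizing angle should land close to the rotation applied to the cloud, which matches the geometric picture. A brute-force scan like this only works in two dimensions; the question that follows asks for the general argument.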

We will now try to find a vector $u$, $||u||=1$, such that the variance of the projection of the data on this vector is maximal. Let us order the eigenvalues of $C$ in decreasing order, i.e. $\lambda_1\geq \lambda_2\geq \dots \geq \lambda_d$. Show that the eigenvector associated with the largest eigenvalue maximizes the variance of the projection of the $x_k$. This eigenvector defines the first principal component. (Hint: consider the orthonormal basis formed by the eigenvectors of $C$.)

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**1.5** What is the second vector, i.e. the one that maximizes the variance of the residuals obtained by subtracting from the $x_k$ their projection on the first principal component vector (the eigenvector associated with $\lambda_1$)?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## 2. Applying PCA to data

After completing the previous part, you have probably figured out that PCA is achieved by iteratively projecting residuals on eigenvectors of the covariance matrix. In this exercise, you will apply PCA to a specific dataset.

**Warning** Computing the eigenvectors and eigenvalues can take time (> 1 minute) and is too slow on noto. Use either a local installation or [Google Colab](https://colab.research.google.com/).

### 2.1 Loading the dataset
The ["Olivetti faces dataset"](https://scikit-learn.org/stable/datasets/real_world.html?highlight=olivetti) contains 400 images of size 64x64 (represented as vectors of 4096 elements). The pictures show 40 people under varying lighting conditions, facial expressions, etc.

In [None]:
from sklearn.datasets import fetch_olivetti_faces
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Load the dataset
faces, labels = fetch_olivetti_faces(return_X_y=True, shuffle=True)

In [None]:
plt.imshow(np.reshape(faces[2,:], (64,64)))

The first step to perform the PCA is to remove the mean value from the input:

In [None]:
def remove_mean(input_data):
    """
    Compute the mean vector of the dataset and subtract it

    Parameters
    ----------
    input_data : input data matrix
   
    Returns
    -------
    0-centered input data
    """
    ...

In [None]:
faces_zero_centered = ...

In [None]:
grader.check("q2p1")

### 2.2 Covariance matrix
As seen in the theoretical part, you need to compute the covariance matrix $C$, then its eigenvalues and eigenvectors (use the [numpy.linalg.eigh](https://numpy.org/doc/stable/reference/generated/numpy.linalg.eigh.html) function). Using the [argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) function, sort the eigenvalues and eigenvectors in decreasing order of eigenvalue (sort explicitly rather than relying on the order returned by `eigh`). Be careful: `argsort` sorts in ascending order only, so do not forget to reverse the array!
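The sorting step can be tried on a tiny matrix before running it on the faces. The following is a minimal sketch on a hand-written symmetric matrix; `C_toy` and the other names are assumptions for this example and are not used elsewhere in the notebook.

```python
import numpy as np

# Small symmetric matrix standing in for a covariance matrix (toy values)
C_toy = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 3.0]])

# eigh is meant for symmetric matrices; eigenvectors are the columns of eig_vecs
eig_vals, eig_vecs = np.linalg.eigh(C_toy)

# argsort returns indices for ascending order; reversing them gives decreasing order
order = np.argsort(eig_vals)[::-1]
vals_desc = eig_vals[order]
vecs_desc = eig_vecs[:, order]   # reorder the columns, not the rows

print(vals_desc)
```

Note that the same index array reorders both outputs, so each eigenvalue stays paired with its eigenvector.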



In [None]:
def covariance_sorted_eigv(X):
    """
    Computes the covariance matrix and returns its eigenvalues/eigenvectors sorted by decreasing eigenvalue

    Parameters
    ----------
    X : zero-centered input data
   
    Returns
    -------
    The sorted eigenvalues and eigenvectors of the covariance matrix of X
    """
    ...
    return sorted_eigen_vals, sorted_eigen_vecs

In [None]:
sorted_eigen_vals, sorted_eigen_vecs = covariance_sorted_eigv(faces_zero_centered)

In [None]:
# Display the eigenvectors associated with the largest eigenvalues
...

In [None]:
grader.check("q2p2")

<!-- BEGIN QUESTION -->

### 2.3 PCA 
We are finally ready to write a function that performs a PCA, i.e. given an input data vector, returns its projection on the eigenvectors associated with the $n$ largest eigenvalues (implement the `pca` function). For a given input image, compute an approximation using $n$ principal components (in the `pca_approx` function). Approximately how many components do you need to obtain a result that is close to the original image?
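To see how the approximation error behaves as $n$ grows, here is a hedged sketch on synthetic data in $\mathbb{R}^6$. The helpers `project` and `reconstruct` are hypothetical names chosen for this illustration; they show one possible way to organize the computation and are not the reference implementation of `pca` and `pca_approx`.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy zero-centered data: 200 points in R^6 with decreasing spread per axis
data = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
data = data - data.mean(axis=0)

# Eigenvectors of the sample covariance matrix, sorted by decreasing eigenvalue
cov = data.T @ data / data.shape[0]
vals, vecs = np.linalg.eigh(cov)
vecs = vecs[:, np.argsort(vals)[::-1]]

def project(vec, basis, n):
    """Coordinates of vec on the first n columns of basis (hypothetical helper)."""
    return basis[:, :n].T @ vec

def reconstruct(coords, basis, n):
    """Linear combination of the first n columns of basis (hypothetical helper)."""
    return basis[:, :n] @ coords

x = data[0]
errors = [np.linalg.norm(x - reconstruct(project(x, vecs, n), vecs, n))
          for n in range(1, 7)]
# The error never increases with n and vanishes (up to rounding) for n = 6
print(np.round(errors, 4))
```

With all six eigenvectors the basis is complete, so the point is recovered exactly; with fewer, the error is the norm of the discarded coordinates.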

In [None]:
def pca(input_vec, sorted_eig_vectors, n):
    """
    Compute the projection on the eigenvectors associated with the n largest eigenvalues

    Parameters
    ----------
    input_vec : input data vector
    sorted_eig_vectors : sorted eigenvectors of the covariance matrix
    n : number of principal components to keep

    Returns
    -------
    The projection of input_vec on each of the n selected eigenvectors
    """
    ...

In [None]:
def pca_approx(pca_projection, sorted_eig_vectors, n):
    """
    Compute the PCA approximation

    Parameters
    ----------
    pca_projection : projection of the input data vector on the covariance matrix eigenvectors
    sorted_eig_vectors : sorted eigenvectors of the covariance matrix
    n : number of principal components to use

    Returns
    -------
    The approximation of the input vector rebuilt from its first n principal components
    """
    ...

In [None]:
n = 10 # update this number
img = faces_zero_centered[40, :] # try other images too
approx = ...
ax = plt.subplot(121)
plt.imshow(np.reshape(img, (64, 64)))
ax.set_title('Original')
ax = plt.subplot(122)
plt.imshow(np.reshape(approx, (64, 64)))
ax.set_title('PCA')
plt.show()

Depending on the image, 100 to 200 components yield a good approximation of the initial image.

<!-- END QUESTION -->

## 3. PCA and SVD

Let us now study the relationship between PCA and SVD.

<!-- BEGIN QUESTION -->

### 3.1 
Using the SVD of $X$, find a relationship between the eigenvalues/eigenvectors of $C$ and the singular values/singular vectors of $X$.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### 3.2
Check the [svd](https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html) function in NumPy and use it to compute the eigenvalues and eigenvectors of $C$ (use `full_matrices=False` to speed up computations). What is the advantage of using the SVD over explicitly computing $C$ and its eigenvalues/eigenvectors?
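Whatever relationship you derive can be sanity-checked numerically on a small matrix before moving to the faces. The sketch below compares the two routes on synthetic data; the data and the variable names are assumptions for this example, and the check relies on the definition $C = \frac{1}{N}X^TX$ used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(3)
X_toy = rng.normal(size=(50, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
X_toy = X_toy - X_toy.mean(axis=0)
N = X_toy.shape[0]

# Route 1: explicit covariance matrix followed by eigh, sorted in decreasing order
vals, vecs = np.linalg.eigh(X_toy.T @ X_toy / N)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Route 2: thin SVD of the data matrix (singular values are returned in descending order)
u, s, vh = np.linalg.svd(X_toy, full_matrices=False)

print(np.allclose(vals, s**2 / N))
# Eigenvectors are only defined up to sign, so compare absolute inner products
print(np.allclose(np.abs(np.sum(vecs * vh.T, axis=0)), 1.0))
```

Comparing absolute inner products avoids false mismatches caused by sign flips between the two routines.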

_Type your answer here, replacing this text._

In [None]:
def covariance_sorted_eigv_svd(X):
    """
    Returns the eigenvalues/eigenvectors of the covariance matrix, sorted by decreasing eigenvalue, using the SVD

    Parameters
    ----------
    X : zero-centered input data
   
    Returns
    -------
    The sorted eigenvalues and eigenvectors of the covariance matrix of X
    """
    ...
    return sorted_eigen_vals, sorted_eigen_vecs

In [None]:
svd_eigen_vals, svd_eigen_vec = covariance_sorted_eigv_svd(faces_zero_centered)

In [None]:
...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### 3.3
Compute a rank-$k$ approximation $X_k$ of $X$, with $k\ll d$. You can plot the singular values to choose $k$. What is the relationship between $X_k$ and the PCA?
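As a reminder of how a truncated SVD behaves, the following sketch applies it to a synthetic matrix that is exactly rank 2 up to a small amount of noise. The matrix, the noise level and the names `A`, `A_noisy`, `k_toy` and `A_k` are assumptions for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy matrix: exactly rank 2, plus a small amount of noise (hypothetical data)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
A_noisy = A + 0.01 * rng.normal(size=A.shape)

u, s, vh = np.linalg.svd(A_noisy, full_matrices=False)
k_toy = 2
# Keep the k_toy largest singular values and the matching singular vectors
A_k = (u[:, :k_toy] * s[:k_toy]) @ vh[:k_toy, :]

print(np.linalg.matrix_rank(A_k))
# Relative Frobenius error of the truncated approximation
print(np.linalg.norm(A_noisy - A_k) / np.linalg.norm(A_noisy))
```

The Frobenius error of the truncation equals the norm of the discarded singular values, which is why a plot of the singular values helps when choosing the rank.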

_Type your answer here, replacing this text._

In [None]:
u,s,vh = np.linalg.svd(faces_zero_centered, full_matrices=False)
plt.plot(s)

In [None]:
def approx_low_rank(X, k):
    ...

In [None]:
k = ...
X_k = approx_low_rank(faces_zero_centered, k)

<!-- END QUESTION -->



In [None]:
# Plot low-rank approximation
nrows = 4
ncols = 4
for i in range(nrows*ncols):  # use i so that the chosen rank k is not overwritten
    plt.subplot(nrows, ncols, i+1)
    plt.imshow(np.reshape(X_k[i, :], (64, 64)))

In [None]:
# Compare with the originals
nrows = 4
ncols = 4
for i in range(nrows*ncols):  # use i so that the chosen rank k is not overwritten
    plt.subplot(nrows, ncols, i+1)
    plt.imshow(np.reshape(faces_zero_centered[i, :], (64, 64)))