In [None]:
%matplotlib inline

# load some utilities (for loading MNIST and plotting)
# also imports most Python modules
%run utils.py

# load all MNIST images and labels
train_images, train_labels = MNIST()
test_images, test_labels = MNIST(test=True)

# PCA

A common method for dimensionality reduction and feature extraction is principal component analysis (PCA). Roughly speaking, PCA keeps the $k$ orthogonal directions along which the data vary the most and discards all other directions. In this notebook we perform and analyse PCA of the MNIST image data set with $k = 2$ principal components.

Let $\mathbf{X} \in [0,1]^{N \times 784}$ be a matrix of the MNIST training data set with $N = 60000$, where the $i$th row $\mathbf{x}_i^\intercal$ represents a training image. Moreover, let $\mathbf{X} - \mathbf{1}_N \overline{\mathbf{x}}^\intercal = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\intercal$ be the singular value decomposition of the centered MNIST training data, where $\mathbf{1}_N \in \mathbb{R}^N$ is the vector of ones and $\overline{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^N \mathbf{x}_i$ is the mean training image.

We compute the singular value decomposition $\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\intercal$ with the function [`torch.svd`](https://pytorch.org/docs/stable/torch.html#torch.svd).

In [None]:
# center training images
train_mean = train_images.mean(dim=0, keepdim=True)
train_images_centered = train_images - train_mean

U, S, V = torch.svd(train_images_centered)
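The properties of the decomposition that the rest of the notebook relies on can be checked on a small random matrix. The following is a minimal sketch, not part of the lab code: the toy matrix `X` is made up, and it uses `torch.linalg.svd` (assuming a PyTorch version that provides it), which returns $\mathbf{V}^\intercal$ rather than $\mathbf{V}$, so the result is transposed to match the convention of `torch.svd` used above.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the image matrix: 6 "images" with 4 "pixels" each
X = torch.rand(6, 4)

# Center the rows by subtracting the mean row (broadcast over all rows)
x_mean = X.mean(dim=0, keepdim=True)
X_centered = X - x_mean

# Reduced SVD; torch.linalg.svd returns V transposed (Vh), unlike torch.svd
U, S, Vh = torch.linalg.svd(X_centered, full_matrices=False)
V = Vh.t()

# The singular values are sorted in descending order
print(S)

# The columns of V are orthonormal directions in "pixel" space
print(torch.allclose(V.t().mm(V), torch.eye(4), atol=1e-5))

# The three factors reproduce the centered data
print(torch.allclose(U.mm(torch.diag(S)).mm(V.t()), X_centered, atol=1e-5))
```

Because the singular values are sorted, the first columns of `V` are the directions of largest variation, which is why the notebook later selects them with `V[:, :2]`.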

In the preparation exercises, we have discussed how this singular value decomposition can be used to compute the encodings in the principal subspace in a general setting. We apply these results now to the MNIST image data set.

## Task 1

Compute the encoding `train_encoding` $\in \mathbb{R}^{60000 \times 2}$ and `test_encoding` $\in \mathbb{R}^{10000 \times 2}$ of the images in the MNIST training and the test data set in the two-dimensional principal subspace of the MNIST training images with the help of $\mathbf{U}$, $\boldsymbol{\Sigma}$, and $\mathbf{V}$.

*Hint*: As shown in the introductory notebook, you can compute the product $\mathbf{A} \mathbf{B}$ of two PyTorch matrices $\mathbf{A}$ and $\mathbf{B}$ with [`A.mm(B)`](https://pytorch.org/docs/stable/torch.html#torch.mm).
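As a reminder of how the shapes behave, here is a small sketch with made-up matrices `A` and `B` (they are not part of the lab data). It also shows the column slicing and transposition pattern that appears later in the notebook.

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])          # shape (3, 2)
B = torch.tensor([[1., 0., 2., 0.],
                  [0., 1., 0., 2.]])  # shape (2, 4)

# Matrix product: (3, 2) times (2, 4) gives (3, 4)
C = A.mm(B)
print(C.shape)

# The @ operator computes the same product
print(torch.equal(C, A @ B))

# Select the first two columns of B and transpose them: shape (2, 2)
B2t = B[:, :2].t()
print(B2t.shape)
```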

In [None]:
train_encoding = # WRITE YOUR CODE HERE
test_encoding = # WRITE YOUR CODE HERE

We can inspect the distribution of the encodings `train_encoding` and `test_encoding` in the latent space visually with `plot_encoding`.

In [None]:
plot_encoding((train_encoding, train_labels), (test_encoding, test_labels))

We can investigate the encodings of the images in the MNIST training data for each of the digits $0, 1, \ldots, 9$ separately. For instance, we can compute the mean encoding for each digit.

In [None]:
# compute mean encoding
train_mean_encodings = mean_encodings(train_encoding, train_labels)

These mean encodings are simply the mean vectors of the clusters that the encodings of the training images form, one per digit, in the principal subspace.
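The helper `mean_encodings` comes from `utils.py`, which is not shown here. A per-class mean could be computed along the following lines; the function name `mean_encodings_sketch` and the toy data are hypothetical, and the actual helper may be implemented differently.

```python
import torch

def mean_encodings_sketch(encodings, labels, num_classes=10):
    """Hypothetical stand-in: row c holds the mean encoding of class c."""
    return torch.stack(
        [encodings[labels == c].mean(dim=0) for c in range(num_classes)]
    )

# Toy data: four 2-D encodings belonging to two classes
encodings = torch.tensor([[0., 0.],
                          [2., 2.],
                          [1., 3.],
                          [3., 5.]])
labels = torch.tensor([0, 0, 1, 1])

means = mean_encodings_sketch(encodings, labels, num_classes=2)
print(means)  # rows: [1., 1.] for class 0 and [2., 4.] for class 1
```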

In [None]:
plot_encoding((train_encoding, train_labels), (test_encoding, test_labels),
              train_mean_encodings, annotate=True)

From these encodings we can reconstruct images, using the formula provided in the preparation exercises.

In [None]:
# compute mean images
train_mean_images = train_mean + train_mean_encodings.mm(V[:, :2].t())

plot_images(train_mean_images, torch.arange(10))

We can investigate the latent space a bit more. Let us build a grid of points that is spanned by the mean encodings for digit "0" and digit "9" in the training data set.

In [None]:
# compute grid of latent vectors
zgrid = create_grid(train_mean_encodings[0], train_mean_encodings[9])

# visualize it
plot_encoding((train_encoding, train_labels), (test_encoding, test_labels), zgrid)
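The helper `create_grid` is also defined in `utils.py` and not shown here. One possible reading of "a grid spanned by two points" is an axis-aligned rectangle with the two points as opposite corners. The sketch below follows that assumption; the name `create_grid_sketch`, the `steps` parameter, and the toy points are hypothetical.

```python
import torch

def create_grid_sketch(z_a, z_b, steps=5):
    """Hypothetical stand-in: regular grid on the rectangle with corners z_a, z_b."""
    xs = torch.linspace(float(z_a[0]), float(z_b[0]), steps)
    ys = torch.linspace(float(z_a[1]), float(z_b[1]), steps)
    # All combinations (x, y): one latent vector per row, shape (steps * steps, 2)
    return torch.cartesian_prod(xs, ys)

z_a = torch.tensor([-1.0, 0.5])
z_b = torch.tensor([1.0, -0.5])

zgrid_toy = create_grid_sketch(z_a, z_b, steps=3)
print(zgrid_toy.shape)  # 9 latent vectors with 2 coordinates each
```

Each row of such a grid is a latent vector, so it can be decoded in the same way as the mean encodings.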

Again we reconstruct and plot the images corresponding to these encodings.

In [None]:
# reconstruct the images that correspond to the grid of latent vectors
xgrid = train_mean + zgrid.mm(V[:, :2].t())

plot_images(xgrid)

We turn back to the MNIST data set.

## Task 2

Compute the reconstruction `test_reconstruction` $\in \mathbb{R}^{10000 \times 784}$ of the images in the MNIST test data set.

In [None]:
test_reconstruction = # WRITE YOUR CODE HERE

We plot some test images and their reconstructed counterparts.

In [None]:
plot_reconstruction(test_images, test_reconstruction, test_labels)

Comparing the original images with their reconstructions gives us some intuition for how much information is lost when the images are compressed to the two-dimensional latent space. As a less subjective measure we calculate the average squared reconstruction error
\begin{equation*}
\mathrm{sqerr} := \frac{1}{10000} \sum_{i=1}^{10000} \|\mathbf{x}_i - \tilde{\mathbf{x}}_i\|^2_2
\end{equation*}
of the images $\mathbf{x}_i \in {[0,1]}^{784}$ and their reconstructions $\tilde{\mathbf{x}}_i \in \mathbb{R}^{784}$ ($i = 1,\ldots, 10000$) in the MNIST test data set.

In [None]:
sqerr = (test_images - test_reconstruction).pow(2).sum(dim=1).mean()
print(f"Average squared reconstruction error: {sqerr}")
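To see what this expression computes, here is a worked example on two made-up "images" with two pixels each (the tensors `x` and `x_rec` are purely illustrative). The squared errors per image are $0.25$ and $1$, so their average is $0.625$.

```python
import torch

x = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0]])      # two toy "images"
x_rec = torch.tensor([[0.5, 0.0],
                      [0.0, 0.0]])  # their toy "reconstructions"

# Squared Euclidean distance per image: sum over the pixel dimension
per_image = (x - x_rec).pow(2).sum(dim=1)
print(per_image)  # tensor([0.2500, 1.0000])

# Average over all images
toy_sqerr = per_image.mean()
print(toy_sqerr.item())  # 0.625
```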

An advantage of an objective measure such as the average squared reconstruction error is that it enables us to compare PCA with other models for dimensionality reduction.

## Task 3

Answer Questions 4.1 and 4.2 in the lab instructions.

## Summary

In this first part of the lab session, we have seen how to extract the two principal components of the MNIST images of hand-written digits, that is, the two directions along which the images in the training data set vary the most. We can encode any such image (even one from the test data set) by mapping it to the principal subspace, and we can produce a lossy reconstruction of the image from its encoding. We even generated some new images, but only on a grid of points that we derived from the encodings of the training data. In the next part of the lab we will work with a probabilistic model that allows us to sample new images in a more general way.