### CS4102 - Geometric Foundations of Data Analysis I
Prof. Götz Pfeiffer<br />
School of Mathematical and Statistical Sciences<br />
University of Galway

# Week 7

##  Face Recognition

* Start by importing `numpy` and `matplotlib` colormaps.

In [None]:
import numpy as np
import matplotlib.cm as cm

* Assume that the face database has been downloaded and unpacked in the folder `orl_faces`.

In [None]:
root = "orl_faces"

In [None]:
!tree {root}

* In order to read these data into the Python session, we need two things:
    * access to the filesystem hierarchy
    * image processing

* The `os` library provides tools for navigating the filesystem. 
* The `os.walk` function traverses the directory structure recursively, and distinguishes between files and subfolders.

In [None]:
import os
folders = next(os.walk(root))[1]
folders[:10]
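
* As a minimal sketch of what `next(os.walk(root))` returns, the following builds a small throwaway directory tree (the names `s1`, `s2` and `README` are made up for illustration) and unpacks the first triple that `os.walk` yields: the root path, its subfolders, and its files.

```python
import os
import tempfile

# Build a throwaway directory tree (hypothetical names) that mimics orl_faces.
with tempfile.TemporaryDirectory() as demo_root:
    for sub in ("s1", "s2"):
        os.makedirs(os.path.join(demo_root, sub))
        open(os.path.join(demo_root, sub, "1.pgm"), "w").close()
    open(os.path.join(demo_root, "README"), "w").close()

    # The first triple yielded by os.walk describes the root folder itself.
    dirpath, dirnames, filenames = next(os.walk(demo_root))
    demo_folders = sorted(dirnames)   # subfolders of the root
    demo_files = sorted(filenames)    # plain files in the root

print(demo_folders)  # ['s1', 's2']
print(demo_files)    # ['README']
```

* Index `[1]` of that triple is the list of subfolders, which is why `next(os.walk(root))[1]` is used above. The order of the names is not guaranteed, hence the `sorted` calls in this sketch.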

* `os.path.join` constructs pathnames
* `os.listdir` lists a directory's content

In [None]:
names = os.listdir(os.path.join(root, folders[0]))
names[:7]

In [None]:
path = os.path.join(root, folders[8], names[6])
path

* The Python Imaging Library (`PIL`) adds image processing capabilities to your Python interpreter.

In [None]:
from PIL import Image
im = Image.open(path)
im

* `numpy` knows how to convert an image into an array.

In [None]:
ar = np.array(im)
ar

* `matplotlib.pyplot` can display an array as an image.

In [None]:
import matplotlib.pyplot as plt
plt.imshow(ar, cmap=cm.gray)

* And `Image` can convert the array back into an image.

In [None]:
im = Image.fromarray(ar)
im

* How to read the images: this function uses the above tools to read all images into an array `X`.  The list `y` keeps track of the people whose faces are on the images.

In [None]:
def read_images(root):
    c = 0
    X, y = [], []
    for folder in next(os.walk(root))[1]:
        for name in os.listdir(os.path.join(root, folder)):
            path = os.path.join(root, folder, name)
            im = Image.open(path)
            X.append(np.array(im))
            y.append(c)
        c += 1
    return np.array(X), y

* So we read the images and look into `X`

In [None]:
X, y = read_images(root)

In [None]:
X.shape

In [None]:
X[1]

* `X` is a list of $400$ images of $112 \times 92$ pixels.   For PCA, we prefer this to be a list of vectors,
  i.e. a $400 \times (112 \cdot 92)$ matrix.

In [None]:
X.reshape((X.shape[0], -1))
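
* To see what this reshaping does, here is a small sketch on a toy stack of $3$ made-up "images" of $2 \times 4$ pixels (the array `stack` is an assumption for illustration, not part of the face data). It also shows that reshaping a row with the original image shape recovers the image.

```python
import numpy as np

# A toy stack of 3 "images" of 2 x 4 pixels (made-up data).
stack = np.arange(3 * 2 * 4).reshape((3, 2, 4))

# Flatten every image into one row; -1 lets numpy infer the row length.
flat = stack.reshape((stack.shape[0], -1))
print(flat.shape)  # (3, 8)

# Reshaping a row with the original image shape undoes the flattening.
restored = flat[1].reshape(stack[0].shape)
print(np.array_equal(restored, stack[1]))  # True
```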

* How to do PCA: This function performs PCA on a given $n \times d$-matrix `X`.
* Numpy's `linalg.eigh` function computes the eigenvalues and eigenvectors of a real symmetric matrix.
* Note that the eigenvectors will form the **columns** of the resulting matrix `evectors`.
* Also note how the calculation distinguishes the case $n > d$ from the case $n \leq d$. (This trick is explained in the original notes.)

In [None]:
def pca(X):
    n, d = X.shape
    mu = X.mean(axis=0)
    X = X - mu
    if n > d:
        C = X.T @ X
        evalues, evectors = np.linalg.eigh(C)
    else:
        C = X @ X.T
        evalues, evectors = np.linalg.eigh(C)
        evectors = X.T @ evectors
        for i in range(n):
            evectors[:,i] = evectors[:,i]/np.linalg.norm(evectors[:,i])

    # sort evectors descending by their evalue
    idx = np.argsort(-evalues)
    evalues = evalues[idx]
    evectors = evectors[:,idx]

    return evalues, evectors, mu
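
* The $n \leq d$ branch of `pca` relies on the following fact: if $v$ is an eigenvector of $X X^T$ with eigenvalue $\lambda$, then $X^T v$ is an eigenvector of $X^T X$ with the same eigenvalue, since $X^T X (X^T v) = X^T (X X^T v) = \lambda X^T v$. A minimal numerical check on random data (the matrix `A` and its size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 5 samples in d = 40 dimensions, centered column-wise.
A = rng.normal(size=(5, 40))
A = A - A.mean(axis=0)

# Eigen-decomposition of the small 5 x 5 matrix A A^T (eigenvalues ascending).
small_vals, small_vecs = np.linalg.eigh(A @ A.T)
lam = small_vals[-1]      # largest eigenvalue
v = small_vecs[:, -1]     # corresponding eigenvector of A A^T

# Map v to a unit eigenvector of the large 40 x 40 matrix A^T A.
w = A.T @ v
w = w / np.linalg.norm(w)

# Residual of the eigenvector equation (A^T A) w = lam * w.
residual = np.linalg.norm((A.T @ A) @ w - lam * w)
print(residual < 1e-8)

# Centering makes the smallest eigenvalue of A A^T (numerically) zero.
print(abs(small_vals[0]) < 1e-8)
```

* Because the data are centered, one eigenvalue of the small matrix is zero up to rounding, so the corresponding column of `evectors` carries no information about the data.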

* We can now perform a full PCA and obtain a (sorted) list of eigenvalues `D`, a matrix `W` of eigenvectors (as its columns), and the average `mu` of the rows of `X` (which is needed later on).

In [None]:
D, W, mu = pca(X.reshape((X.shape[0], -1)))

In [None]:
D[:16]

In [None]:
W.shape

In [None]:
mu

* Up to some reshaping and normalization, the eigenvectors can now be regarded as images: the eigenfaces.

In [None]:
W[:,5].reshape(X[0].shape)

* The float entries of this $112 \times 92$ matrix version of the eigenvector in column $5$ need to be
  rescaled to become integers in `range(256)`, suitable for images.
* This is done by the following function.

In [None]:
def normalize(X, low, high, dtype=None):
    minX, maxX = np.min(X), np.max(X)

    # normalize to [0...1].
    X = X - float(minX)
    X = X / float(maxX - minX)

    # scale to [low...high].
    X = X * (high-low)
    X = X + low

    if dtype is None:
        return X
    return np.array(X, dtype=dtype)
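
* The arithmetic inside `normalize` can be traced step by step on a tiny vector. This sketch uses a made-up input `v` and `astype` for the final conversion (an assumption; like `np.array(..., dtype=...)`, it truncates the fractional part rather than rounding).

```python
import numpy as np

v = np.array([-1.0, 0.0, 1.0])   # made-up input values

# Step 1: shift and scale to the interval [0, 1].
unit = (v - v.min()) / (v.max() - v.min())   # [0.0, 0.5, 1.0]

# Step 2: stretch to [low, high] = [0, 255].
low, high = 0, 255
scaled = unit * (high - low) + low           # [0.0, 127.5, 255.0]

# Step 3: convert to 8-bit integers; the fractional part is dropped.
pixels = scaled.astype(np.uint8)
print(pixels)  # [  0 127 255]
```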

* Let's see the effect on eigenvector number $5$.

In [None]:
normalize(W[:,5], 0, 255, np.uint8).reshape(X[0].shape)

In [None]:
plt.imshow(normalize(W[:,5], 0, 255, np.uint8).reshape(X[0].shape))

* Let's turn the first (at most) 16 eigenvectors into grayscale images (note: eigenvectors are stored by column!) and plot them in a single picture.

In [None]:
E = []
for i in range(min(W.shape[1], 16)):
    e = normalize(W[:,i], 0, 255, np.uint8)
    E.append(e.reshape(X[0].shape))

* How to plot (details omitted)

In [None]:
def subplot(title, images, rows, cols, sptitle="subplot", sptitles=(), colormap=cm.gray, filename=None):
    fig = plt.figure()

    # main title
    fig.text(.5, .95, title, horizontalalignment='center')

    for i in range(len(images)):
        fig.add_subplot(rows, cols, i+1)
        if len(sptitles) == len(images):
            plt.title("%s #%s" % (sptitle, str(sptitles[i])), { 'fontsize': 8 })
        else:
            plt.title("%s #%d" % (sptitle, i+1), { 'fontsize': 8 })
        plt.imshow(np.asarray(images[i]), cmap=colormap)
        plt.axis('off')

    if filename is None:
        plt.show()
    else:
        fig.savefig(filename)


* Let's plot the $16$ eigenfaces using the `jet` colormap (and store the plot to "python_pca_eigenfaces.pdf").

In [None]:
subplot(
    title="Eigenfaces AT&T Facedatabase", 
    images=E, 
    rows=4, cols=4, 
    sptitle="Eigenface", 
    colormap=cm.jet, 
    filename="python_pca_eigenfaces.pdf"
)

* Finally, we project a face into the small space generated by only a few eigenvectors, and then try and reconstruct the face from that information.
* The formulas for projection and reconstruction  are straightforward. 

* How to project

In [None]:
def project(W, X, mu):
    return (X - mu) @ W

* How to reconstruct

In [None]:
def reconstruct(W, Y, mu):
    return Y @ W.T + mu
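
* With row vectors, projection is $y = (x - \mu) W$ and reconstruction is $\hat{x} = y W^T + \mu$. As a small sanity check, the sketch below applies both to random data with an orthonormal basis `Q` obtained from a QR decomposition (all names and sizes here are assumptions for illustration, not the face data). The reconstruction error should not grow as more basis vectors are used, and it should vanish once the full basis is used.

```python
import numpy as np

def project(W, X, mu):
    return (X - mu) @ W

def reconstruct(W, Y, mu):
    return Y @ W.T + mu

rng = np.random.default_rng(1)
d = 6

# Orthonormal basis of R^d: the columns of Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
mean = rng.normal(size=d)   # stands in for mu
x = rng.normal(size=d)      # stands in for one flattened face

# Reconstruction error when only the first k basis vectors are kept.
errors = []
for k in range(1, d + 1):
    y = project(Q[:, :k], x, mean)
    x_hat = reconstruct(Q[:, :k], y, mean)
    errors.append(np.linalg.norm(x - x_hat))

print(np.round(errors, 3))
```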

* Let's apply this to face number $21$, projected onto the first $50$ eigenvectors.

In [None]:
P = project(W[:,:50], X[21].reshape(-1), mu)
P

In [None]:
R = reconstruct(W[:,:50], P, mu)
plt.imshow(R.reshape(X[0].shape), cmap=cm.gray)

* How does the quality of the reconstruction improve as the number of eigenvectors increases?

In [None]:
steps = range(10, min(len(X), 320), 20)
E = []
for step in steps:
    P = project(W[:,:step], X[21].reshape(-1), mu)
    R = reconstruct(W[:,:step], P, mu).reshape(X[0].shape)
    E.append(normalize(R, 0, 255, dtype=np.uint8))

* Plot them and store the plot to "python_pca_reconstruction.pdf".

In [None]:
subplot(
    title="Reconstruction AT&T Facedatabase", 
    images=E, 
    rows=4, cols=4, 
    sptitle="Eigenvectors", 
    sptitles=steps, 
    colormap=cm.gray, 
    filename="python_pca_reconstruction.pdf"
)