### CS4102 - Geometric Foundations of Data Analysis I
Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

# Week 10

##  Face Recognition Reloaded

* Start by importing `numpy` and `matplotlib` colormaps.

In [1]:
import numpy as np
import matplotlib.cm as cm

* Assume that the face database has been downloaded and unpacked in the folder `orl_faces`.

In [2]:
root = "orl_faces"

* In order to read these data into the Python session, we need two things:
    * access to the filesystem hierarchy
    * image processing

* The `os` library provides tools for navigating the filesystem. 
* The `os.walk` function traverses the directory structure recursively, and distinguishes between files and subfolders.

In [3]:
import os
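
* As a small illustration of what `os.walk` returns, the following sketch builds a throwaway directory tree and lists its immediate subfolders and files. The folder and file names (`s1`, `s2`, `1.pgm`, `README`) are made up for this example.

```python
import os
import tempfile

# Build a tiny, hypothetical directory tree to walk through.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ["s1", "s2"]:
        os.makedirs(os.path.join(tmp, folder))
        open(os.path.join(tmp, folder, "1.pgm"), "w").close()
    open(os.path.join(tmp, "README"), "w").close()

    # os.walk yields triples (dirpath, dirnames, filenames), top-down by default;
    # the first triple describes the top folder itself.
    dirpath, dirnames, filenames = next(os.walk(tmp))
    subfolders = sorted(dirnames)   # the order returned by os.walk is not guaranteed
    files = sorted(filenames)

print(subfolders)
print(files)
```

* So `next(os.walk(root))[1]` is the list of subfolders directly below `root`, and `next(os.walk(root))[2]` is the list of files there.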

* The Python Imaging Library (`PIL`) adds image processing capabilities to your Python interpreter.

In [4]:
from PIL import Image
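
* A minimal sketch of how a `PIL` image turns into a `numpy` array, using a blank image created in memory rather than a file from the database. Note that `PIL` reports the size as (width, height), whereas the array shape is (rows, columns).

```python
import numpy as np
from PIL import Image

# A blank 8-bit greyscale image ("L" mode); the size is given as (width, height).
im = Image.new("L", (92, 112))
a = np.array(im)

print(im.size)   # (width, height)
print(a.shape)   # (rows, columns) = (height, width)
print(a.dtype)
```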

* `matplotlib.pyplot` can display an array as an image.

In [5]:
import matplotlib.pyplot as plt

* The following function uses the above tools to read all images into an array `X`.  The list `y` records, for each image, the index of the person whose face it shows.

In [6]:
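def read_images(root):
    """Read all images below `root`; return the image array and the labels."""
    c = 0  # label of the current person (one subfolder per person)
    X, y = [], []
    # next(os.walk(root))[1] is the list of immediate subfolders of root
    for folder in next(os.walk(root))[1]:
        for name in os.listdir(os.path.join(root, folder)):
            path = os.path.join(root, folder, name)
            im = Image.open(path)
            X.append(np.array(im))  # the image as a 2-dimensional array
            y.append(c)
        c += 1
    return np.array(X), y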

* Now we read the images and inspect `X`.

In [7]:
X, y = read_images(root)

In [8]:
X.shape

(400, 112, 92)

In [9]:
X[1]

array([[ 50,  49,  50, ...,  41,  42,  38],
       [ 55,  46,  50, ...,  39,  39,  36],
       [ 50,  49,  49, ...,  37,  42,  38],
       ...,
       [183, 165, 145, ..., 118, 114, 105],
       [133, 148, 155, ..., 120, 118, 116],
       [144, 150, 169, ..., 159, 111, 117]], dtype=uint8)

* `X` is an array of $400$ images of $112 \times 92$ pixels.   For PCA, we prefer this to be a list of vectors,
  i.e. a $400 \times (112 \cdot 92)$ matrix.

In [10]:
X.reshape((X.shape[0], -1))

array([[ 44,  48,  51, ..., 129, 131, 125],
       [ 50,  49,  50, ..., 159, 111, 117],
       [ 51,  46,  48, ..., 196, 173, 110],
       ...,
       [ 86,  89,  90, ..., 124, 116,  83],
       [ 88,  97,  95, ...,  67,  68,  69],
       [ 95,  89,  89, ...,  43,  55,  55]], dtype=uint8)
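
* To see what `reshape` with `-1` does, here is a sketch on a small stand-in array (4 made-up "images" of $3 \times 2$ pixels) instead of the face data.

```python
import numpy as np

# A stand-in for the image array: 4 "images" of 3 x 2 pixels each.
A = np.arange(24, dtype=np.uint8).reshape((4, 3, 2))

# -1 asks numpy to infer the remaining dimension: 3 * 2 = 6.
V = A.reshape((A.shape[0], -1))
print(V.shape)

# Reshaping back recovers the original images.
B = V.reshape(A.shape)
print(np.array_equal(A, B))
```

* Note that `reshape` returns a new array object and leaves the original unchanged, so the result needs to be assigned to a name if it is to be used later.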

##  Special methods

* Special methods are part of the class definition.
* They can be used to provide additional behavior for instances of the class.

### List-like behavior

* For example, here is a class `MyList` of objects that behave like lists.
* Each such object stores a position `pos` (in `range(10)`).
* It should then behave like the standard basis vector that has all its entries $0$, except for an entry $1$ in position `pos`.
* So the special constructor method `__init__` is defined with `pos` as argument, and it stores whatever is passed in the `pos` component of the new object.

In [11]:
class MyList:
    def __init__(self, pos):
        self.pos = pos

* We can now construct such an object and print it.

In [12]:
l = MyList(5)
l.pos

5

In [13]:
l

<__main__.MyList at 0x7f4175c4fcd0>

* Wouldn't it be nice if that string representation of a `MyList` object was a bit more informative, like the expression that was used to create it in the first place?
* We can arrange that in the special method `__repr__`.

In [14]:
class MyList:
    def __init__(self, pos):
        self.pos = pos
        
    def __repr__(self):
        return f"MyList({self.pos})"

In [15]:
l = MyList(4)
l

MyList(4)

In [16]:
len(l)

TypeError: object of type 'MyList' has no len()

* When the special method `__len__` is implemented, a `MyList` object can respond to a `len` call.

In [17]:
class MyList:
    def __init__(self, pos):
        self.pos = pos
        
    def __repr__(self):
        return f"MyList({self.pos})"
    
    def __len__(self):
        return 10

In [18]:
l = MyList(6)
len(l)

10

In [19]:
l.pos

6

In [20]:
l[6]

TypeError: 'MyList' object is not subscriptable

* An implementation of the special method `__getitem__` makes `MyList` objects subscriptable.

In [21]:
class MyList:
    def __init__(self, pos):
        self.pos = pos
        
    def __repr__(self):
        return f"MyList({self.pos})"
    
    def __len__(self):
        return 10
    
    def __getitem__(self, i):
        if i == self.pos:
            return 1
        else:
            return 0

In [22]:
l = MyList(8)
l[8]

1

In [23]:
l[1]

0

In [24]:
[l[i] for i in range(len(l))]

[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
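
* One caveat: the `__getitem__` method above returns `0` for every index other than `pos`, so it never signals the end of the list.  Python's fallback iteration protocol calls `__getitem__` with $0, 1, 2, \dots$ until an `IndexError` is raised, so `list(l)` or a `for` loop over `l` would not terminate.
* A minimal sketch of a bounds-checked variant (the class name `BoundedList` is made up here):

```python
class BoundedList:
    def __init__(self, pos):
        self.pos = pos

    def __len__(self):
        return 10

    def __getitem__(self, i):
        # Raising IndexError outside range(len(self)) lets iteration stop.
        if not 0 <= i < len(self):
            raise IndexError("BoundedList index out of range")
        return 1 if i == self.pos else 0

b = BoundedList(8)
print(list(b))
```

* For simplicity, this sketch also rejects negative indices, which ordinary Python lists would accept.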

* But assignment to positions is not possible.  What should it mean, after all?

In [25]:
l[4] = 1

TypeError: 'MyList' object does not support item assignment

### Function-like behavior

* In a similar way, objects can be taught to act like functions, i.e., be callable.
* For this, we implement the special method `__call__`.

* For example, suppose we want a `MyList` object to act like the function that takes a numerical argument $c$ and returns the $c$-multiple of the standard basis vector it represents:

In [26]:
class MyList:
    def __init__(self, pos):
        self.pos = pos
        
    def __repr__(self):
        return f"MyList({self.pos})"
    
    def __len__(self):
        return 10
    
    def __getitem__(self, i):
        if i == self.pos:
            return 1
        else:
            return 0

    def __call__(self, c):
        return [c*self[i] for i in range(len(self))]

In [27]:
l = MyList(5)
l(3)

[0, 0, 0, 0, 0, 3, 0, 0, 0, 0]
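
* A callable object can be used wherever Python expects a function.  The following sketch uses a made-up class `Scale` to show this with the built-in functions `callable` and `map`.

```python
class Scale:
    """Hypothetical example class: an object that multiplies by a fixed factor."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return self.factor * x

double = Scale(2)
print(callable(double))
print(list(map(double, [1, 2, 3])))
```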

## Inheritance

* A class can inherit (methods) from another class.
* Instances of the following class `BetterList` will have all the functionality of `MyList` objects ...
* ... and additionally they can assign to positions.

In [28]:
class BetterList(MyList):
    def __setitem__(self, i, val):
        self.pos = i

In [29]:
m = BetterList(3)
m

MyList(3)

In [30]:
m[0] = 1

In [31]:
[m[i] for i in range(len(m))]

[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [32]:
m

MyList(0)
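
* Note that the `__setitem__` method above ignores `val`, so `m[0] = 0` would also move the $1$ to position $0$.  One possible way to handle this, sketched here with a made-up class `StrictList` and a stripped-down `MyList`, is to accept only the value $1$:

```python
class MyList:
    def __init__(self, pos):
        self.pos = pos

    def __getitem__(self, i):
        return 1 if i == self.pos else 0

class StrictList(MyList):
    def __setitem__(self, i, val):
        # Only the assignment of 1 has an obvious meaning: move the 1 to position i.
        if val != 1:
            raise ValueError("only the value 1 can be assigned")
        self.pos = i

m = StrictList(3)
m[0] = 1
print(m.pos)
```

* Whether rejecting other values is the right choice depends on what the class is meant to model; this is only one option.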

* To fix the string representation, we can override the `__repr__` method in `BetterList`.
* This change will not affect `MyList` objects.

In [33]:
class BetterList(MyList):
    def __setitem__(self, i, val):
        self.pos = i

    def __repr__(self):
        return f"BetterList({self.pos})"

In [34]:
m = BetterList(7)
m

BetterList(7)

In [35]:
l = MyList(7)
l

MyList(7)
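
* As an alternative sketch, the base class can look up the name of the actual class of the object, so that subclasses print correctly without overriding `__repr__`:

```python
class MyList:
    def __init__(self, pos):
        self.pos = pos

    def __repr__(self):
        # type(self) is the class of the actual object, also for subclasses.
        return f"{type(self).__name__}({self.pos})"

class BetterList(MyList):
    def __setitem__(self, i, val):
        self.pos = i

print(repr(MyList(7)))
print(repr(BetterList(7)))
```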

## Abstract Classes

* Sometimes it can be useful to define an abstract class, i.e., a class that is not meant to be instantiated.
* No objects of this type would ever be created.
* The class rather serves as a prototype for a collection of (concrete) classes that share some common behavior.

* Here is an example of an abstract class for a yet to be defined collection of different notions of distance.
* It consists of a constructor, a string representation, and an implementation of `__call__` that just raises an error.

In [36]:
class Distance:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __call__(self, p, q):
        raise NotImplementedError(f"don't know yet how to {self}(p, q).")

In [37]:
d = Distance("d")
d

d

In [38]:
d(1,2)

NotImplementedError: don't know yet how to d(p, q).
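
* The class above can still be instantiated; the error only appears when the object is called.  Python's standard `abc` module offers a stricter variant, sketched below, where instantiating the abstract class itself already fails.  The subclass `ZeroDist` is made up purely for illustration.

```python
from abc import ABC, abstractmethod

class Distance(ABC):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    @abstractmethod
    def __call__(self, p, q):
        """Return the distance between p and q."""

class ZeroDist(Distance):
    # Hypothetical concrete subclass, only here to show that subclasses work.
    def __call__(self, p, q):
        return 0

try:
    Distance("d")
except TypeError as err:
    print("TypeError:", err)

z = ZeroDist("z")
print(z, z(1, 2))
```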

### Euclidean Distance

* The Euclidean distance between $x$ and $y$ is
\\[
 e(x, y) = (\sum_i (x_i - y_i)^2)^{1/2}
\\]

* To implement this, we could literally take the above `Distance` class as a blueprint, copy and paste most of it and provide a working implementation of `__call__`:

In [39]:
class EuclideanDist:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __call__(self,p,q):
        p = np.array(p).flatten()
        q = np.array(q).flatten()
        return np.sqrt(np.sum((p - q)**2))

* Then we make an object `e` of this type, name it `"e"` so that it prints itself as `e`, and then call it with two vectors as arguments.

In [40]:
e = EuclideanDist("e")
e

e

In [41]:
e([1,2,3], [4,5,6])

5.196152422706632
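
* As a cross-check, the same value can be obtained from `np.linalg.norm`, which computes the Euclidean norm of a vector by default.  A short sketch comparing the two:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

by_formula = np.sqrt(np.sum((p - q)**2))
by_norm = np.linalg.norm(p - q)   # the 2-norm is the default for vectors

print(by_formula, by_norm)
print(np.isclose(by_formula, by_norm))
```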

* Through the clever use of numpy's `flatten` method, we can also measure distance between images from the database imported above.
* Images are $2$-dimensional arrays, which can be regarded as $1$-dimensional vectors.

In [42]:
e(X[1], X[2])

956.625318502495

In [43]:
e(X[1], X[21])

1043.0560866990807

* But inheritance is more efficient than copy-and-paste.
* If we define `EuclideanDist` as a subclass of `Distance`, we only need to define the `__call__` method.
* `__init__` and `__repr__` are inherited.

In [44]:
class EuclideanDist(Distance):
    def __call__(self, p, q):
        p = np.array(p).flatten()
        q = np.array(q).flatten()
        return np.sqrt(np.sum((p - q)**2))

In [45]:
e = EuclideanDist("e")
e

e

In [46]:
e([1,2,3], [4,5,6])

5.196152422706632

* The Taxicab distance between vectors $x$ and $y$ is:
\\[
   t(x, y) = \sum_i |x_i - y_i |
\\]
* Again, this notion of distance can be defined as a subclass of the abstract `Distance` class by simply implementing the formula as part of the `__call__` method.

In [47]:
class TaxicabDist(Distance):
    def __call__(self, p, q):
        p = np.array(p).flatten()
        q = np.array(q).flatten()
        return np.sum(np.abs(p - q))   

* Now we can make and use an object of this type.

In [48]:
t = TaxicabDist("t")
t

t

In [49]:
t([1,2,3],[4,5,6])

9

In [50]:
t(X[1], X[2])

1194156

In [51]:
t(X[1], X[21])

1520020

* The Infinity Distance between vectors $x$ and $y$ is:
\\[
  i(x, y) = \max_i |x_i - y_i|
\\]
* As a subclass of `Distance`:

In [52]:
class InfinityDist(Distance):
    def __call__(self, p, q):
        p = np.array(p).flatten()
        q = np.array(q).flatten()
        return np.max(np.abs(p - q))   

In [53]:
i = InfinityDist("i")
i

i

In [54]:
i([1,2,3],[4,5,6])

3

In [55]:
i(X[1], X[2])

255

In [56]:
i(X[1], X[21])

255
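
* A word of caution about the image results above, assuming `X` still has dtype `uint8` as shown in its printout: numpy performs arithmetic on `uint8` arrays modulo $256$, so `p - q` wraps around wherever an entry of `q` exceeds the corresponding entry of `p`, and the distances between images may be affected by this.
* The sketch below uses two made-up pixel rows and a hypothetical helper `difference` that converts to floats before subtracting.

```python
import numpy as np

p = np.array([10, 200, 50], dtype=np.uint8)
q = np.array([20, 100, 50], dtype=np.uint8)

# uint8 arithmetic is modulo 256: 10 - 20 wraps around to 246.
print(p - q)

def difference(p, q):
    """Hypothetical helper: flatten and convert to float before subtracting."""
    p = np.asarray(p, dtype=float).flatten()
    q = np.asarray(q, dtype=float).flatten()
    return p - q

print(difference(p, q))
print(np.max(np.abs(difference(p, q))))
```

* The examples with plain Python lists such as `[1,2,3]` are not affected, since numpy converts those to a signed integer type.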

* Did it say above that an abstract class is the place to put common code?
* We could define this collection of classes even more succinctly by moving the operation of flattening the arguments and taking their difference, which is common to all three concrete distance classes, into the abstract class:

In [57]:
class Distance:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __call__(self, p, q):
        raise NotImplementedError(f"don't know yet how to {self}(p, q).")
        
    def flatten(self, p, q):
        p = np.array(p).flatten()
        q = np.array(q).flatten()
        return p - q

* Then all the subclasses essentially are one-liners ...

In [58]:
class EuclideanDist(Distance):
    def __call__(self, p, q):
        return np.sqrt(np.sum(self.flatten(p, q)**2))

In [59]:
e = EuclideanDist("e")
e([1,2,3],[4,5,6])

5.196152422706632

In [60]:
class TaxicabDist(Distance):
    def __call__(self, p, q):
        return np.sum(np.abs(self.flatten(p, q)))   

In [61]:
t = TaxicabDist("t")
t([1,2,3],[4,5,6])

9

In [62]:
class InfinityDist(Distance):
    def __call__(self, p, q):
        return np.max(np.abs(self.flatten(p, q)))   

In [63]:
i = InfinityDist("i")
i([1,2,3],[4,5,6])

3
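
* Since all three classes share the same interface, code that uses a distance does not need to know which one it has been given.  The following self-contained sketch repeats the definitions from above in compact form and applies each distance to the same pair of vectors.

```python
import numpy as np

class Distance:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __call__(self, p, q):
        raise NotImplementedError(f"don't know yet how to {self}(p, q).")

    def flatten(self, p, q):
        return np.array(p).flatten() - np.array(q).flatten()

class EuclideanDist(Distance):
    def __call__(self, p, q):
        return np.sqrt(np.sum(self.flatten(p, q)**2))

class TaxicabDist(Distance):
    def __call__(self, p, q):
        return np.sum(np.abs(self.flatten(p, q)))

class InfinityDist(Distance):
    def __call__(self, p, q):
        return np.max(np.abs(self.flatten(p, q)))

# Each object is called in exactly the same way, whatever its class.
distances = [EuclideanDist("e"), TaxicabDist("t"), InfinityDist("i")]
p, q = [1, 2, 3], [4, 5, 6]
results = {repr(d): d(p, q) for d in distances}
for name, value in results.items():
    print(f"{name}(p, q) = {value}")
```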