# Image Recognition with Support Vector Machines

_In this notebook, we show how to perform face recognition using Support Vector Machines. We will use the Olivetti faces dataset, included in scikit-learn._ More info at: http://scikit-learn.org/stable/datasets/olivetti_faces.html

Start by importing numpy, scikit-learn, and matplotlib, the Python libraries we will be using in this chapter. Show the versions we will be using (in case you have problems running the notebooks) and use the inline plotting mode.

In [1]:
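%matplotlib inline
import numpy as np
import pandas as pd
import sklearn as sk
import matplotlib
import matplotlib.pyplot as plt

print('numpy version:', np.__version__)
print('scikit-learn version:', sk.__version__)
print('matplotlib version:', matplotlib.__version__)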

## 1 - Load Olivetti Face dataset

Import and load the Olivetti faces dataset.

In [2]:
from sklearn.datasets import fetch_olivetti_faces

# Fetch the faces data
faces = fetch_olivetti_faces()

print(faces.DESCR)

Modified Olivetti faces dataset.

The original database was available from (now defunct)

    http://www.uk.research.att.com/facedatabase.html

The version retrieved here comes in MATLAB format from the personal
web page of Sam Roweis:

    http://www.cs.nyu.edu/~roweis/

There are ten different images of each of 40 distinct subjects. For some
subjects, the images were taken at different times, varying the lighting,
facial expressions (open / closed eyes, smiling / not smiling) and facial
details (glasses / no glasses). All the images were taken against a dark
homogeneous background with the subjects in an upright, frontal position (with
tolerance for some side movement).

The original dataset consisted of 92 x 112, while the Roweis version
consists of 64x64 images.



## 2 - Investigate the Olivetti Face Dataset

Let's look at the data. faces.images holds 400 images of faces, each one a matrix of 64x64 pixels.
faces.data holds the same data, but with each image flattened into a row of 4096 attributes (4096 = 64x64).

In [3]:
print(faces.keys())
print(faces.images.shape)
print(faces.data.shape)
print(faces.target.shape)

dict_keys(['DESCR', 'data', 'images', 'target'])
(400, 64, 64)
(400, 4096)
(400,)
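To make the relationship between the two layouts concrete, here is a minimal sketch using a small random array as a stand-in for faces.images (the synthetic data and the names `images` and `data` are assumptions for illustration, not part of the dataset). It assumes the flattening is done row by row, which is NumPy's default ordering.

```python
import numpy as np

# Synthetic stand-in for faces.images: 5 random "images" of 64x64 pixels.
rng = np.random.default_rng(0)
images = rng.random((5, 64, 64))

# Flatten each 64x64 image into one row of 4096 values (row-major order).
data = images.reshape(len(images), -1)
print(data.shape)

# Pixel (r, c) of image i lands in column r * 64 + c of row i.
r, c = 3, 10
same_pixel = data[0, r * 64 + c] == images[0, r, c]
print(same_pixel)

# Reshaping a row back to 64x64 recovers the original image.
round_trip_ok = np.array_equal(data[0].reshape(64, 64), images[0])
print(round_trip_ok)
```

This is why an image can be plotted from either attribute: a row of faces.data only needs a reshape to 64x64 first.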


We don't have to scale the attributes, because the data is already normalized. Verify this.

In [4]:
from sklearn.preprocessing import StandardScaler
print(faces.data)
print(faces.images)

# Printing data and images shows that all the data has already been normalized:
# all values are between 0 and 1

[[ 0.30991736  0.36776859  0.41735536 ...,  0.15289256  0.16115703
   0.1570248 ]
 [ 0.45454547  0.47107437  0.51239669 ...,  0.15289256  0.15289256
   0.15289256]
 [ 0.31818181  0.40082645  0.49173555 ...,  0.14049587  0.14876033
   0.15289256]
 ..., 
 [ 0.5         0.53305787  0.60743803 ...,  0.17768595  0.14876033
   0.19008264]
 [ 0.21487603  0.21900827  0.21900827 ...,  0.57438016  0.59090906
   0.60330576]
 [ 0.5165289   0.46280992  0.28099173 ...,  0.35950413  0.35537189
   0.38429752]]
[[[ 0.30991736  0.36776859  0.41735536 ...,  0.37190083  0.33057851
    0.30578512]
  [ 0.3429752   0.40495867  0.43801653 ...,  0.37190083  0.33884299
    0.3140496 ]
  [ 0.3429752   0.41735536  0.45041323 ...,  0.38016528  0.33884299
    0.29752067]
  ..., 
  [ 0.21487603  0.20661157  0.22314049 ...,  0.15289256  0.16528925
    0.17355372]
  [ 0.20247933  0.2107438   0.2107438  ...,  0.14876033  0.16115703
    0.16528925]
  [ 0.20247933  0.20661157  0.20247933 ...,  0.15289256  0.16115703
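Printed arrays are truncated, so a stricter check is to compute the minimum and maximum over all values. The sketch below does this on a synthetic stand-in (random 8-bit grey levels divided by 255); the helper name `summarize_range` and the demo array are assumptions for illustration. On the real dataset the same helper would be called with faces.data.

```python
import numpy as np

def summarize_range(X):
    """Return (min, max, mean, std) computed over all entries of X."""
    X = np.asarray(X, dtype=float)
    return X.min(), X.max(), X.mean(), X.std()

# Synthetic stand-in for faces.data: grey levels 0..255 scaled into [0, 1].
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(10, 4096)) / 255.0

lo, hi, mean, std = summarize_range(X_demo)
in_unit_range = bool(lo >= 0.0 and hi <= 1.0)
print("min:", lo, "max:", hi)
print("all values in [0, 1]:", in_unit_range)
```

Note that values in [0, 1] correspond to min-max scaling, which is not the same as the zero-mean, unit-variance standardization that StandardScaler performs. Whether the former is sufficient depends on the model; here all attributes are pixels on the same scale.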
    

Plot all the faces in a 20x20 grid. For each face, we'll put the target value in the top left corner and its index in the bottom left corner.

In [5]:
plt.figure(figsize=[64, 64])
for i in range(20 * 20):
    ax = plt.subplot(20, 20, i + 1)

    # Hide the ticks and the frame so that only the image is visible
    ax.set_xticks([])
    ax.set_yticks([])
    for side in ('top', 'right', 'bottom', 'left'):
        ax.spines[side].set_visible(False)

    # Target value (person id) in the top left corner
    ax.set_ylabel(faces.target[i], fontsize=9)
    ax.yaxis.set_label_coords(-0.025, 1.05)
    # Image index in the bottom left corner
    ax.set_xlabel(i, fontsize=9)
    ax.xaxis.set_label_coords(-0.025, -0.025)

    ax.imshow(faces.images[i], cmap=plt.cm.gray, interpolation='nearest')
plt.show()

## 3 - Analysis with SVM

We will try to build a classifier whose model is a hyperplane that separates instances (points) of one class from the rest. Support Vector Machines (SVM) are supervised learning methods that try to obtain these hyperplanes in an optimal way, by selecting the ones that pass through the widest possible gaps between instances of different classes. New instances are then classified according to which side of the hyperplane they fall on. Let's import the SVC class from the sklearn.svm module. SVC stands for Support Vector Classifier: we will use SVM for classification.

In [6]:
from sklearn.svm import SVC
svc_1 = SVC(kernel='linear')
print(svc_1)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
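To illustrate the "which side of the hyperplane" idea, here is a small sketch on toy 2D points rather than the faces (the data and the names `X_toy`, `y_toy`, `clf_toy` are made up for illustration). For a linear kernel in a two-class problem, the fitted model exposes the hyperplane as `coef_` and `intercept_`, and the sign of w . x + b decides the predicted class.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2D (toy data, not the faces).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf_toy = SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

# For a linear kernel the separating hyperplane is w . x + b = 0.
w, b = clf_toy.coef_[0], clf_toy.intercept_[0]

# Evaluate two new points by hand and compare with the library.
new_points = np.array([[0.5, 0.5], [3.5, 3.5]])
manual_decision = new_points @ w + b
library_decision = clf_toy.decision_function(new_points)
predictions = clf_toy.predict(new_points)

print(np.allclose(manual_decision, library_decision))
print(predictions)  # negative side -> class 0, positive side -> class 1
```

With 40 subjects, the face problem is multi-class; scikit-learn's SVC handles that internally by combining several two-class classifiers, so the single-hyperplane picture applies to each pair of classes.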


Build training and testing sets and perform 5-fold cross-validation (use the ``sklearn.model_selection`` package for this; it replaces the older ``sklearn.cross_validation`` module, which has been removed from recent scikit-learn releases). Show all the accuracy scores and compute their average. Consult the sklearn documentation and, when needed, ask your teacher for help.

In [7]:
from sklearn.model_selection import cross_val_score, train_test_split
X, y = faces.data, faces.target

# Hold out 25% of the images as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 5-fold cross-validation on the full dataset
scores = cross_val_score(svc_1, X, y, cv=5)

print("cross_val_score:")
print(scores)
print("\navg. score:")
print(scores.mean())

cross_val_score:
[ 1.      0.9625  0.975   0.95    0.95  ]

avg. score:
0.9675
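The five scores above each come from training on four folds and testing on the remaining one. The sketch below spells that loop out by hand and compares it with `cross_val_score`. To stay self-contained it uses a 500-sample subset of the digits dataset bundled with scikit-learn as a stand-in for the faces (an assumption made only to avoid a download), and it assumes a recent scikit-learn where these functions live in `sklearn.model_selection`.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data: a subset of the bundled digits dataset (not the faces).
X_d, y_d = load_digits(return_X_y=True)
X_d, y_d = X_d[:500], y_d[:500]

# Manual 5-fold loop: train on four folds, score on the held-out fold.
cv = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, test_idx in cv.split(X_d, y_d):
    clf = SVC(kernel='linear')
    clf.fit(X_d[train_idx], y_d[train_idx])
    manual_scores.append(clf.score(X_d[test_idx], y_d[test_idx]))
manual_scores = np.array(manual_scores)

# For a classifier, cv=5 uses stratified folds, so the scores should match.
auto_scores = cross_val_score(SVC(kernel='linear'), X_d, y_d, cv=5)

print(manual_scores)
print(auto_scores)
print("mean accuracy:", manual_scores.mean())
```

Stratification keeps the class proportions equal across folds. On the Olivetti data, with ten images per subject and five folds, each test fold would then hold two images of every subject.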


In [8]:
clf = svc_1.fit(X_train, y_train)
clf.score(X_test, y_test)

0.98999999999999999

## 4 - Other Metrics

Import the sklearn ``metrics`` package and also determine precision and recall on the test set, for _each class_.

In [9]:
from sklearn import metrics

def train_and_evaluate(clf, X_train, X_test, y_train, y_test):
    
    clf.fit(X_train, y_train)
    
    print("Accuracy on training set:")
    print(clf.score(X_train, y_train))
    print("Accuracy on testing set:")
    print(clf.score(X_test, y_test))
    
    y_pred = clf.predict(X_test)
    
    print("Classification Report:")
    print(metrics.classification_report(y_test, y_pred))
    print("Confusion Matrix:")
    print(metrics.confusion_matrix(y_test, y_pred))

Let's measure precision and recall on the evaluation set, for _each class_. 

In [10]:
train_and_evaluate(svc_1, X_train, X_test, y_train, y_test)

Accuracy on training set:
1.0
Accuracy on testing set:
0.99
Classification Report:
             precision    recall  f1-score   support

          0       0.86      1.00      0.92         6
          1       1.00      1.00      1.00         4
          2       1.00      1.00      1.00         2
          3       1.00      1.00      1.00         1
          4       1.00      1.00      1.00         1
          5       1.00      1.00      1.00         5
          6       1.00      1.00      1.00         4
          7       1.00      0.67      0.80         3
          9       1.00      1.00      1.00         1
         10       1.00      1.00      1.00         4
         11       1.00      1.00      1.00         1
         12       1.00      1.00      1.00         2
         13       1.00      1.00      1.00         3
         14       1.00      1.00      1.00         5
         15       1.00      1.00      1.00         3
         17       1.00      1.00      1.00         6
         19    

Conclusion: the linear SVM performs very well on this face recognition task, reaching 99% accuracy on the test set.

## 5 - Discriminate People with or without Glasses

Now, another problem: Let's try to classify images of people with and without glasses. Mark people with glasses as 1 and people without glasses as 0. 

In [11]:
def create_target(segments):
    # create a new y array of target size initialized with zeros
    y = np.zeros(faces.target.shape[0])
    # put 1 in the specified segments
    for (start, end) in segments:
        y[start:end + 1] = 1
    return y

In [12]:
glasses = [
    (10, 19), (30, 32), (37, 38), (50, 59), (63, 64),
    (69, 69), (120, 121), (124, 129), (130, 139), (160, 161),
    (164, 169), (180, 182), (185, 185), (189, 189), (190, 192),
    (194, 194), (196, 199), (260, 269), (270, 279), (300, 309),
    (330, 339), (358, 359), (360, 369)]
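Each pair in the list is an inclusive range of image indices. As a quick sanity check of the labels, the sketch below rebuilds the target vector and counts the two classes. It uses a variant of `create_target` that takes the number of samples as a parameter (`n_samples` is an addition for self-containment; the original reads it from faces.target), with 400 taken from the dataset shape shown earlier.

```python
import numpy as np

def create_target(segments, n_samples):
    """Return a 0/1 vector with 1 inside each inclusive (start, end) range."""
    y = np.zeros(n_samples, dtype=int)
    for start, end in segments:
        y[start:end + 1] = 1
    return y

glasses = [
    (10, 19), (30, 32), (37, 38), (50, 59), (63, 64),
    (69, 69), (120, 121), (124, 129), (130, 139), (160, 161),
    (164, 169), (180, 182), (185, 185), (189, 189), (190, 192),
    (194, 194), (196, 199), (260, 269), (270, 279), (300, 309),
    (330, 339), (358, 359), (360, 369)]

y_glasses = create_target(glasses, n_samples=400)
n_with = int(y_glasses.sum())
n_without = len(y_glasses) - n_with
print(n_with, n_without)  # 119 281
```

The classes are imbalanced (roughly 30% with glasses), which is one reason to look at precision and recall per class rather than accuracy alone.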


Create training and test sets for the new problem.

In [13]:
from sklearn.model_selection import train_test_split
target_glasses = create_target(glasses)
X_train, X_test, y_train, y_test = train_test_split(faces.data, target_glasses, test_size=0.25, random_state=0)

Try again with an SVM with a linear kernel and show a classification report as above.

In [14]:
svc_2 = SVC(kernel='linear')
clf2 = svc_2.fit(X_train, y_train)
clf2.score(X_test, y_test)

0.98999999999999999

In [15]:
train_and_evaluate(svc_2, X_train, X_test, y_train, y_test)

Accuracy on training set:
1.0
Accuracy on testing set:
0.99
Classification Report:
             precision    recall  f1-score   support

        0.0       1.00      0.99      0.99        67
        1.0       0.97      1.00      0.99        33

avg / total       0.99      0.99      0.99       100

Confusion Matrix:
[[66  1]
 [ 0 33]]
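The numbers in the classification report can be recomputed directly from the confusion matrix, which makes the definitions of precision and recall explicit. The sketch below uses the matrix printed above and assumes scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix from the glasses classifier: rows = true, columns = predicted.
cm = np.array([[66, 1],
               [0, 33]])

correct = np.diag(cm)                   # correctly classified, per class
precision = correct / cm.sum(axis=0)    # correct / everything predicted as the class
recall = correct / cm.sum(axis=1)       # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for label, p, r, f in zip([0, 1], precision, recall, f1):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The single misclassified test image is a person without glasses predicted as wearing them: it lowers recall for class 0 (66 of 67) and precision for class 1 (33 of 34).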
