# Support Vector Machines (SVM)

Support Vector Machines, or SVMs, can often fit data better than other models, especially data that is highly non-linear. Let's demonstrate by using the "Labeled Faces in the Wild" dataset, which scikit-learn can download for you, to build a facial-recognition model.
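Here is a minimal sketch on synthetic data that makes the "highly non-linear" point concrete before we turn to faces. It uses scikit-learn's `make_moons` generator as a stand-in for non-linear data, which is an assumption and not part of the original walkthrough. It compares a linear-kernel SVM with the RBF kernel used later in this notebook. The `xm_`/`ym_` names are illustrative only.

```python
# Sketch on synthetic data (assumption: make_moons stands in for
# "highly non-linear" data; it is not the faces dataset used below)
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates them well
Xm, ym = make_moons(n_samples=500, noise=0.2, random_state=0)
xm_train, xm_test, ym_train, ym_test = train_test_split(Xm, ym, test_size=0.25, random_state=0)

# Fit one SVM per kernel and record its test accuracy
kernel_scores = {}
for kernel in ['linear', 'rbf']:
    clf = SVC(kernel=kernel).fit(xm_train, ym_train)
    kernel_scores[kernel] = clf.score(xm_test, ym_test)

print(kernel_scores)
```

On data shaped like this, the RBF kernel can bend its decision boundary around the curves, while the linear kernel cannot.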

## Load the dataset

The first step is to import facial images from the dataset. We'll set the minimum number of faces per person to 100. That means only the images of five famous people will be imported.

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

faces = fetch_lfw_people(min_faces_per_person=100)
print(faces.target_names)
print(faces.images.shape)
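Conceptually, `min_faces_per_person` keeps only the people who have at least that many images. The sketch below shows the idea in plain Python on hypothetical labels. `filter_people` is a made-up helper for illustration, not scikit-learn's implementation.

```python
from collections import Counter

def filter_people(labels, min_faces):
    """Return indices of images whose person has at least min_faces images.

    Hypothetical helper illustrating the idea behind min_faces_per_person.
    """
    counts = Counter(labels)  # images per person
    return [i for i, name in enumerate(labels) if counts[name] >= min_faces]

# Hypothetical labels: one name per image
labels = ['Ann', 'Bob', 'Ann', 'Cy', 'Ann', 'Bob']
kept = filter_people(labels, min_faces=2)
print(kept)  # [0, 1, 2, 4, 5]  ('Cy' has only one image and is dropped)
```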

In total, 1,140 facial images were loaded. Each image measures 62 by 47 pixels, for a total of 2,914 pixels per image. That means we're working with a dataset that has 2,914 feature columns, one per pixel. That's a lot of columns! Let's check the balance in our dataset by generating a bar chart showing how many facial images were imported for each of the five people.

In [None]:
from collections import Counter

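counts = Counter(faces.target)  # number of images per numeric label
names = {}

# Map each person's name to the number of images of that person
for key in counts.keys():
    names[faces.target_names[key]] = counts[key]

df = pd.DataFrame.from_dict(names, orient='index')
df.plot(kind='bar')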

The dataset is not well balanced. We're not too concerned about that, because the likely effect is that our model will be better at recognizing some people than others. Let's plot some of the facial images so we can see what they look like.

In [None]:
fig, ax = plt.subplots(3, 5, figsize=(12, 10))
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='gist_gray')
    axi.set(xticks=[], yticks=[], xlabel=faces.target_names[faces.target[i]])
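The images plotted above are 2-D arrays, but the model is trained on `faces.data`, where each 62-by-47 image is flattened into a single row of 2,914 pixel values. The following sketch shows that flattening. It assumes a zero-filled array with the same shape as `faces.images`, so no download is needed.

```python
import numpy as np

# Stand-in for faces.images (assumption: zeros instead of real pixels)
images_demo = np.zeros((1140, 62, 47))

# Flatten each 62x47 image into one row of 62 * 47 = 2,914 features
data_demo = images_demo.reshape(len(images_demo), -1)
print(data_demo.shape)  # (1140, 2914)
```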

## Train an SVM model

The next task is to train an SVM model to do image classification using the faces in our dataset. Let's start by splitting the dataset so 80% can be used for training and 20% for testing.

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(faces.data, faces.target, train_size=0.8, random_state=42)
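With 1,140 images, an 80/20 split leaves 912 images for training and 228 for testing. The sketch below checks that arithmetic on stand-in arrays of the same length. The `xd_`/`yd_` names and toy values are assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins with 1,140 rows, like faces.data and faces.target (toy values)
X_demo = np.arange(1140).reshape(-1, 1)
y_demo = np.arange(1140) % 5

xd_train, xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=42)
print(len(xd_train), len(xd_test))  # 912 228
```

Passing `stratify=faces.target` to `train_test_split` would keep each person's share of images roughly equal in both sets. Given the imbalance noted earlier, that option may be worth considering.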

Now let's create an SVM classifier and train it using the 80% of the dataset reserved for training.

In [None]:
from sklearn.svm import SVC

model = SVC(kernel='rbf', class_weight='balanced')
model.fit(x_train, y_train)
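`class_weight='balanced'` tells `SVC` to scale the penalty `C` for each class inversely to how often that class appears, so that under-represented people are not simply outvoted. The scikit-learn documentation gives these weights as `n_samples / (n_classes * np.bincount(y))`. The sketch below computes them by hand for hypothetical toy labels and checks the result against `compute_class_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels (not the real faces.target counts)
y_toy = np.array([0] * 6 + [1] * 3 + [2] * 1)

# 'balanced' weights: n_samples / (n_classes * count of each class)
manual = len(y_toy) / (len(np.unique(y_toy)) * np.bincount(y_toy))
library = compute_class_weight(class_weight='balanced', classes=np.unique(y_toy), y=y_toy)

print(manual)  # the rarest class gets the largest weight
print(np.allclose(manual, library))  # True
```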

Next, let's use the 20% of the dataset split off for testing to assess the accuracy of the model.

In [None]:
model.score(x_test, y_test)

That's not very encouraging. But we're far from done.
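For classifiers, the number returned by `score` is plain accuracy: the fraction of test images whose predicted label matches the actual one. The following sketch confirms this on synthetic five-class data. The `make_classification` data and the `xs_`/`ys_` names are assumptions standing in for the faces.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic five-class data (assumption: stands in for the faces data)
Xs, ys = make_classification(n_samples=300, n_features=20, n_informative=10,
                             n_classes=5, random_state=0)
xs_train, xs_test, ys_train, ys_test = train_test_split(Xs, ys, train_size=0.8, random_state=42)

clf = SVC(kernel='rbf', class_weight='balanced').fit(xs_train, ys_train)

# score() and accuracy_score() on the same predictions give the same number
score = clf.score(xs_test, ys_test)
acc = accuracy_score(ys_test, clf.predict(xs_test))
print(score, acc)
```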

## Apply Principal Component Analysis (PCA)

Using PCA to reduce the number of columns ("features") may increase the accuracy of our model by filtering out the "noise" introduced by less-significant facial features. A pleasant side effect is that the model should train faster, too. Let's build a pipeline that performs a PCA transform on the input data, reducing 2,914 columns to 150, and then uses an SVM classifier to fit a model to the training data.

> **Pipelines** are a handy mechanism in scikit-learn for building complex models that transform input data before using it to train or predict.

In [None]:
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=150, whiten=True, svd_solver='randomized', random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
model.fit(x_train, y_train)
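A pipeline does not change what is computed. It only chains the steps together. The sketch below uses synthetic data with assumed sizes (100 columns reduced to 20). It fits PCA and SVC by hand and checks that the predictions match those of the equivalent pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic data with many columns (assumption: stands in for 2,914 pixels)
Xp, yp = make_classification(n_samples=200, n_features=100, n_informative=10,
                             n_classes=3, random_state=0)

# Pipeline version: PCA then SVC, fitted with one call
pipe = make_pipeline(PCA(n_components=20, whiten=True, random_state=0), SVC(kernel='rbf'))
pipe.fit(Xp, yp)

# Manual version: fit and apply PCA, then fit SVC on the reduced data
pca_manual = PCA(n_components=20, whiten=True, random_state=0)
svc_manual = SVC(kernel='rbf').fit(pca_manual.fit_transform(Xp), yp)

# At prediction time, the data must pass through the same fitted PCA first
same = np.array_equal(pipe.predict(Xp), svc_manual.predict(pca_manual.transform(Xp)))
print(same)
```

Keeping both steps in one object means that `score`, `predict`, and `GridSearchCV` apply the PCA transform automatically. Test data is therefore always projected with the components learned from the training data.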

Now let's score the model again.

In [None]:
model.score(x_test, y_test)

That's *much* better, but we might be able to improve the accuracy even more by tuning the model's hyperparameters.
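Before tuning, it can be informative to check how much of the original pixel variance the 150 components retain. In the notebook, `model.named_steps['pca'].explained_variance_ratio_.sum()` gives that fraction for the fitted pipeline. The sketch below demonstrates the same attribute on synthetic data. That data is an assumption, so the number it prints says nothing about the faces.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic data (assumption): 200 columns reduced to 50
Xv, yv = make_classification(n_samples=300, n_features=200, n_informative=15, random_state=0)
pipe_v = make_pipeline(PCA(n_components=50, whiten=True, random_state=0), SVC(kernel='rbf'))
pipe_v.fit(Xv, yv)

# make_pipeline names each step after its class in lowercase: 'pca', 'svc'
ratios = pipe_v.named_steps['pca'].explained_variance_ratio_
retained = ratios.sum()
print(f'{retained:.1%} of the variance kept by 50 of 200 columns')
```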

## Tune the hyperparameters

One way to find the optimum combination of hyperparameters for a learning algorithm in scikit-learn is to use *GridSearchCV*, which trains and cross-validates the model with every combination of parameter values that you specify. Let's use *GridSearchCV* to find the optimum values for the SVM's *C* and *gamma* parameters, which tend to have a significant effect on SVM models. Note that training will take longer now. The grid contains 16 combinations, and each one is trained once per cross-validation fold (five folds by default in scikit-learn 0.22 and later). The best combination is then refit on the full training set. (Good thing we reduced the number of dimensions by almost 95% with PCA!)

In [None]:
from sklearn.model_selection import GridSearchCV

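# make_pipeline names each step after its class in lowercase ('pca', 'svc'),
# so 'svc__C' targets the C parameter of the SVC step
params = {'svc__C': [1, 5, 10, 50],
          'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}

# Try every combination, scoring each with cross-validation on the training set
grid = GridSearchCV(model, params)
grid.fit(x_train, y_train)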
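To see where the extra training time goes, here is the arithmetic for the grid above. It assumes the five-fold default of scikit-learn 0.22 and later (older versions defaulted to three folds) and the default `refit=True`.

```python
from itertools import product

# Same grid as above
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}

combinations = list(product(*param_grid.values()))
n_folds = 5  # assumption: GridSearchCV's default cv in scikit-learn 0.22+

# One fit per combination per fold, plus one refit of the best combination
n_fits = len(combinations) * n_folds + 1
print(len(combinations), n_fits)  # 16 81
```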

Now let's find out what the optimum values for *C* and *gamma* are, and replace *model* with the optimized model.

In [None]:
print(grid.best_params_)
model = grid.best_estimator_
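`best_params_` reports only the winner. To see how every combination fared, you can load `grid.cv_results_` into a DataFrame. For the pipeline above, the parameter columns are named `param_svc__C` and `param_svc__gamma`. Here is a small self-contained sketch on synthetic data with an assumed 2-by-2 grid:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data and a small grid (assumptions, for speed)
Xg, yg = make_classification(n_samples=200, n_features=10, random_state=0)
search = GridSearchCV(SVC(kernel='rbf'), {'C': [1, 10], 'gamma': [0.01, 0.1]}, cv=3)
search.fit(Xg, yg)

# cv_results_ holds one row per parameter combination
results = pd.DataFrame(search.cv_results_)
table = results[['param_C', 'param_gamma', 'mean_test_score', 'rank_test_score']]
table = table.sort_values('rank_test_score')
print(table)
print(search.best_params_)
```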

Finally, let's see whether the tuned model recognizes faces more accurately than the untuned pipeline.

In [None]:
model.score(x_test, y_test)

It appears that we improved the model's accuracy by about 2.5%. Let's print a classification report, which lists the precision, recall, and F1 score for each person, to get a more detailed assessment of the model's performance.

In [None]:
from sklearn.metrics import classification_report

y_predicted = model.predict(x_test)
print(classification_report(y_test, y_predicted, target_names=faces.target_names))
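Each row of the classification report can be derived from a confusion matrix, whose rows are actual labels and whose columns are predicted labels. Recall divides the correct predictions for a class by the number of samples that actually belong to it (the row sum). Precision divides them by the number of samples predicted as that class (the column sum). Here is a sketch with hand-picked toy labels (an assumption), checked against scikit-learn's own metrics:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-picked toy labels for three classes
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
correct = np.diag(cm)                  # correctly classified samples per class

recall = correct / cm.sum(axis=1)     # divide by actual count (row sums)
precision = correct / cm.sum(axis=0)  # divide by predicted count (column sums)

print(recall, precision)
```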

To see exactly which people the model confuses, let's generate a confusion matrix showing how many test images of each person were assigned to each predicted label.

In [None]:
from sklearn.metrics import confusion_matrix

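mat = confusion_matrix(y_test, y_predicted)

# Rows of mat are actual labels and columns are predicted labels; transposing
# puts predicted labels on the y axis and actual labels on the x axis
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False, cmap='Blues',
            xticklabels=faces.target_names,
            yticklabels=faces.target_names)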
plt.xlabel('Actual label')
plt.ylabel('Predicted label')

The model correctly identified Colin Powell 49 times out of 50, Donald Rumsfeld 23 times out of 25, and so on. That's not bad, and it's a great example of Support Vector Machines at work. It would be challenging to do this well using more conventional learning algorithms such as logistic regression.
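One way to test the comparison with logistic regression is to swap `LogisticRegression` into the same PCA pipeline and compare test scores. The helper below is a hypothetical sketch. Its name and its logistic-regression settings (`max_iter=1000`, `class_weight='balanced'`) are assumptions, not tuned values, and its demo call uses synthetic data. In the notebook you would call `compare_with_logistic(x_train, x_test, y_train, y_test)` instead.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def compare_with_logistic(x_train, x_test, y_train, y_test, n_components=150):
    """Score PCA+SVC and PCA+LogisticRegression pipelines on one split.

    Hypothetical helper; the classifier settings are assumptions, not tuned.
    """
    candidates = [
        ('svc', SVC(kernel='rbf', class_weight='balanced')),
        ('logistic', LogisticRegression(max_iter=1000, class_weight='balanced')),
    ]
    scores = {}
    for name, clf in candidates:
        # Same PCA settings as the notebook's pipeline, different classifier
        pipe = make_pipeline(PCA(n_components=n_components, whiten=True, random_state=42), clf)
        pipe.fit(x_train, y_train)
        scores[name] = pipe.score(x_test, y_test)
    return scores

# Demo on synthetic data (assumption); the numbers say nothing about faces
Xc, yc = make_classification(n_samples=300, n_features=50, n_informative=10,
                             n_classes=3, random_state=0)
demo_scores = compare_with_logistic(*train_test_split(Xc, yc, train_size=0.8, random_state=42),
                                    n_components=10)
print(demo_scores)
```

A fair comparison would also tune the logistic regression's `C`, for example with the same *GridSearchCV* approach used above.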