<div style="width: 100%; clear: both;">
<div style="float: left; width: 50%;">
<img src="http://www.uoc.edu/portal/_resources/common/imatges/marca_UOC/UOC_Masterbrand.jpg" align="left">
</div>
<div style="float: right; width: 50%;">
<p style="margin: 0; padding-top: 22px; text-align:right;">M0.532 · Pattern Recognition</p>
<p style="margin: 0; text-align:right;">Computational Engineering and Mathematics Master</p>
<p style="margin: 0; text-align:right; padding-bottom: 100px;">Computers, Multimedia and Telecommunications Department</p>
</div>
</div>
<div style="width:100%;">&nbsp;</div>

## Image Classification with Bag of Features (BoF) and Support Vector Machines (SVM)

In this notebook, we train an image classifier based on Bag of Features (BoF) and Support Vector Machines (SVMs). BoF is a feature representation based on quantization. Local features are extracted from a training dataset and clustered, and the center of each cluster is kept as a representative (reference) feature. The set of reference features is called the codebook. Once the codebook has been obtained, any image can be represented in terms of it: we first extract the local features of the image and then, for each local feature, find the nearest reference feature. As a result, if the codebook contains $M$ reference features, the image is represented as a vector $f$ of dimension $M$, where each component $f_i$ counts how many local features of the image have been assigned to reference feature $i$. These vector representations are then used to train an image classification system with Support Vector Machines.
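
The counting step described above can be illustrated with a minimal NumPy sketch. The function name `bof_histogram` and the toy 2-D data are hypothetical and only serve to show the idea; real SIFT descriptors have 128 components.

```python
import numpy as np

def bof_histogram(local_features, codebook):
    """Count how many local features fall nearest to each codebook entry."""
    # Pairwise squared Euclidean distances, shape (n_features, M)
    diffs = local_features[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Index of the nearest reference feature for every local feature
    nearest = dists.argmin(axis=1)
    # f_i = number of local features assigned to reference feature i
    return np.bincount(nearest, minlength=len(codebook))

# Toy data (hypothetical): a codebook with M = 3 reference features in 2-D
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
local_features = np.array([[0.5, 0.2], [9.0, 1.0], [0.1, 0.3], [1.0, 9.5]])

f = bof_histogram(local_features, codebook)
print(f)  # prints [2 1 1]
```

Whatever the number of local features, the resulting vector always has length $M$, which is what makes it usable as input to a classifier.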

Let's start by importing some data from our Google Drive account.

In [None]:
from google.colab import drive
 
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


We will be using the Caltech101 image dataset, which includes 101 categories. In the following code, we can see the names of the different categories included in Caltech101.

In [None]:
from glob import glob
from os.path import exists, isdir, basename, join, splitext

datasetpath = '/content/drive/My Drive/Docència/Reconeixement de Patrons/Notebooks/Image Classification/101_ObjectCategories/'

cat_paths = [files
              for files in glob(datasetpath + "/*")
              if isdir(files)]
cat_paths.sort()
cats = [basename(cat_path) for cat_path in cat_paths]

print(cats)

['BACKGROUND_Google', 'Faces', 'Faces_easy', 'Leopards', 'Motorbikes', 'accordion', 'airplanes', 'anchor', 'ant', 'barrel', 'bass', 'beaver', 'binocular', 'bonsai', 'brain', 'brontosaurus', 'buddha', 'butterfly', 'camera', 'cannon', 'car_side', 'ceiling_fan', 'cellphone', 'chair', 'chandelier', 'cougar_body', 'cougar_face', 'crab', 'crayfish', 'crocodile', 'crocodile_head', 'cup', 'dalmatian', 'dollar_bill', 'dolphin', 'dragonfly', 'electric_guitar', 'elephant', 'emu', 'euphonium', 'ewer', 'ferry', 'flamingo', 'flamingo_head', 'garfield', 'gerenuk', 'gramophone', 'grand_piano', 'hawksbill', 'headphone', 'hedgehog', 'helicopter', 'ibis', 'inline_skate', 'joshua_tree', 'kangaroo', 'ketch', 'lamp', 'laptop', 'llama', 'lobster', 'lotus', 'mandolin', 'mayfly', 'menorah', 'metronome', 'minaret', 'nautilus', 'octopus', 'okapi', 'pagoda', 'panda', 'pigeon', 'pizza', 'platypus', 'pyramid', 'revolver', 'rhino', 'rooster', 'saxophone', 'schooner', 'scissors', 'scorpion', 'sea_horse', 'snoopy', 's

Let's focus on only two categories, e.g. airplanes and motorbikes, to make the problem simpler and faster to train. This gives us a binary classification problem: each image must be classified as containing either an airplane or a motorbike.

In [None]:
cats_used = ['airplanes', 'Motorbikes']
ncats = len(cats_used)
print(ncats)

2


We install the OpenCV library in order to extract the local features from the images.

In [None]:
!pip install opencv-contrib-python==4.4.0.44

Collecting opencv-contrib-python==4.4.0.44
  Downloading opencv_contrib_python-4.4.0.44-cp37-cp37m-manylinux2014_x86_64.whl (55.7 MB)
     |████████████████████████████████| 55.7 MB 1.2 MB/s
Installing collected packages: opencv-contrib-python
  Attempting uninstall: opencv-contrib-python
    Found existing installation: opencv-contrib-python 4.1.2.30
    Uninstalling opencv-contrib-python-4.1.2.30:
      Successfully uninstalled opencv-contrib-python-4.1.2.30
Successfully installed opencv-contrib-python-4.4.0.44


We import the OpenCV cv2 module, which allows us to apply image transformations and to extract local features. Let's define a function extractSIFT, which detects keypoints and computes the SIFT descriptors for each file in the given input_files.

In [None]:
import cv2 as cv

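def extractSIFT(input_files):
    """Return a dict mapping each filename to its array of SIFT descriptors."""
    all_features_dict = {}
    # SIFT_create() is the factory call exposed by the OpenCV 4.4 Python bindings
    feature_extractor = cv.SIFT_create()
    for fname in input_files:
        # cv.imread returns a BGR image; SIFT operates on a grayscale version
        gray = cv.cvtColor(cv.imread(fname), cv.COLOR_BGR2GRAY)
        # desc holds one 128-dimensional row per detected keypoint
        kp, desc = feature_extractor.detectAndCompute(gray, None)
        all_features_dict[fname] = desc
    return all_features_dict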

Let's also define another function that returns all the image filenames from a given path.

In [None]:
EXTENSIONS = [".jpg", ".bmp", ".png", ".pgm", ".tif", ".tiff"]

def get_imgfiles(path):
    all_files = []
    all_files.extend([join(path, basename(fname))
                    for fname in glob(path + "/*")
                    if splitext(fname)[-1].lower() in EXTENSIONS])
    return all_files

Then, we use both previous functions to compute the SIFT feature vectors from all the images from the two categories considered above.

In [None]:
all_files = []
all_files_labels = {}
all_features = {}
cat_label = {}

for cat, label in zip(cats_used, range(ncats)):
    cat_path = join(datasetpath, cat)
    cat_files = get_imgfiles(cat_path)
    cat_features = extractSIFT(cat_files)
    all_files = all_files + cat_files
    all_features.update(cat_features)
    cat_label[cat] = label
    for i in cat_files:
        all_files_labels[i] = label

As explained at the beginning of the notebook, we need to cluster the SIFT local features to generate a codebook, which will later be used to represent any image as a fixed-size feature vector. To generate this codebook, we can use the BOWKMeansTrainer class from OpenCV. We need to specify the size of our codebook or dictionary, e.g. 100. We add all the SIFT features computed previously and call the method cluster to generate the codewords, i.e. the representative features of our dictionary.

In [None]:
dictionarySize = 100
BOW = cv.BOWKMeansTrainer(dictionarySize)

for feat in all_features:
    BOW.add(all_features[feat])
dictionary = BOW.cluster()
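
To give an intuition of what the clustering step does, here is a minimal NumPy sketch of plain k-means (Lloyd's algorithm). It is not the OpenCV implementation: the function `kmeans`, its parameters, and the toy 2-D data are assumptions made for illustration only.

```python
import numpy as np

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct samples drawn at random
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its samples
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Toy data (hypothetical): two groups of 2-D "descriptors"
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.5, size=(50, 2))
blob_b = rng.normal(10.0, 0.5, size=(50, 2))
features = np.vstack([blob_a, blob_b])

codebook, labels = kmeans(features, k=2)
print(codebook.shape)  # one row per codeword, one column per feature component
```

The rows of `codebook` play the same role as the rows of `dictionary` in the cell above: each one is a representative feature.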

We can check the shape of our dictionary. The first dimension (100) corresponds to the size of the dictionary. The second dimension (128) corresponds to the number of components of a SIFT feature vector.

In [None]:
print(dictionary.shape)

(100, 128)


Let's also check the shape of the features extracted from a given image. The first dimension (203) corresponds to the number of keypoints, or interest points, detected in the image. The second dimension (128) corresponds to the number of components of a SIFT feature vector. The number of keypoints varies from one image to another, so we will use the dictionary created before to obtain a fixed-size representation of any image.

In [None]:
print(all_features['/content/drive/My Drive/Docència/Reconeixement de Patrons/Notebooks/Image Classification/101_ObjectCategories/Motorbikes/image_0734.jpg'].shape)

(203, 128)


To generate a fixed-size representation for each image, we assign each local feature to the nearest codeword in our dictionary. To find the nearest codeword we will use the BFMatcher, which computes the distance from a given feature to every codeword in the dictionary by Brute Force (BF). If we want a faster system, other approaches such as indexed trees can be used to find the nearest codeword more efficiently, e.g. FlannBasedMatcher.

In the following code, we use the BFMatcher to find the nearest codeword in our dictionary for each local feature (desc_query) of each image. Once we have found the nearest codeword for all features of a given image, we compute a normalized histogram of the codeword assignments. The normalization is needed because the number of features extracted varies from one image to another.

We store the new feature representation in a variable named X and the corresponding label (referring to the image category) in a variable named y, which will be used later to train the image classification model.

In [None]:
from numpy import histogram
import numpy as np

matcher = cv.BFMatcher(normType=cv.NORM_L2)
all_features_BOW = {}

X = np.empty((len(all_files),dictionarySize))
y = np.empty((len(all_files),))

count = 0
for filename in all_files:
    desc_query = all_features[filename]
    matches = matcher.match(desc_query,dictionary)
    train_idxs = []
    for j in range(len(matches)):
      train_idxs.append(matches[j].trainIdx)
    # density=True replaces the 'normed' argument, which was removed from NumPy
    hist, bin_edges = histogram(train_idxs, bins=range(dictionarySize+1), density=True)
    all_features_BOW[filename] = hist
    X[count,:] = hist
    y[count] = all_files_labels[filename]
    count = count + 1
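
As a small check of what the normalization does, the following sketch (with a hypothetical codebook of size 5 and made-up codeword indices) compares a density-normalized histogram with unit-width bins against plain relative frequencies. Under these assumptions the two should coincide, and the components sum to 1 regardless of how many local features the image has.

```python
import numpy as np

dictionary_size = 5                        # hypothetical small codebook
train_idxs = np.array([0, 2, 2, 4, 2, 0])  # hypothetical nearest-codeword indices

# Histogram with one unit-width bin per codeword, normalized as a density
hist, _ = np.histogram(train_idxs, bins=range(dictionary_size + 1), density=True)

# Relative frequency of each codeword, computed directly from the counts
freq = np.bincount(train_idxs, minlength=dictionary_size) / len(train_idxs)

print(np.allclose(hist, freq))  # prints True
print(hist.sum())
```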




We now have a fixed-size feature representation based on SIFT descriptors for each image in our dataset, so we can train a classifier on it. For that, we will use Support Vector Machines (SVM) from the Scikit-learn library. We will also use the function train_test_split from this library to split the data into two subsets, one for training and one for testing. Then, we train the model by calling the method fit of the class SVC. Once the model has been trained, we obtain the classification accuracy on the test subset by calling the method score.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn import svm

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
clf.score(X_test, y_test)

0.9047619047619048

Instead of doing a single split into training and test subsets, we can evaluate the model on multiple splits by using the class StratifiedKFold, which keeps the class proportions similar in every fold.

In [None]:
from sklearn.model_selection import StratifiedKFold

scores = []
skf = StratifiedKFold(n_splits=5)
for train, test in skf.split(X, y):
  clf = svm.SVC(kernel='linear', C=1).fit(X[train], y[train])
  score = clf.score(X[test], y[test])
  scores.append(score)

print(scores)

[0.8385093167701864, 0.9503105590062112, 0.9285714285714286, 0.9003115264797508, 0.9345794392523364]
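
A common way to summarize per-fold results is to report their mean and standard deviation. The sketch below does this for the five scores printed above; the variable names `mean_acc` and `std_acc` are only illustrative.

```python
import numpy as np

# Per-fold accuracies copied from the output above
scores = [0.8385093167701864, 0.9503105590062112, 0.9285714285714286,
          0.9003115264797508, 0.9345794392523364]

mean_acc = np.mean(scores)  # average accuracy over the 5 folds
std_acc = np.std(scores)    # spread of the accuracy across folds

print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

The standard deviation gives a rough idea of how much the accuracy depends on the particular split.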
