# Bag of Visual Words Image Classification

## Implementation

### Pipeline

The process is divided into three independent steps chained in a scikit-learn `Pipeline`. Its input is a list of pictures with their keypoints and descriptors. The steps are:

- SpatialPyramid: performs the clustering of the descriptors and calculates the histograms.
- StandardScaler: standardizes each column of the feature matrix.
- SVC: performs the classification.

Internally, the pipeline uses a cache in order to speed up repeated operations.
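
The three steps and the cache can be sketched roughly as follows. This is a minimal illustration, not the project's code: `BagOfWords` is a hypothetical single-level stand-in for `SpatialPyramid`, the step name `scaler` is an assumption (only `transformer` and `classifier` appear in the parameter grids), and the toy descriptors are random numbers.

```python
import tempfile

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class BagOfWords(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for SpatialPyramid: one level, no spatial bins."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters

    def fit(self, X, y=None):
        # X is a list with one (n_descriptors, n_dims) array per image.
        self.codebook_ = KMeans(n_clusters=self.n_clusters, n_init=10, random_state=0)
        self.codebook_.fit(np.vstack(X))
        return self

    def transform(self, X):
        # One normalized histogram of visual words per image.
        hists = np.asarray(
            [np.bincount(self.codebook_.predict(d), minlength=self.n_clusters) for d in X],
            dtype=np.float64,
        )
        return hists / hists.sum(axis=1, keepdims=True)


def build_pipeline(cache_dir):
    # `memory` makes the pipeline cache fitted transformers on disk.
    return Pipeline(
        [
            ('transformer', BagOfWords()),
            ('scaler', StandardScaler()),  # step name is an assumption
            ('classifier', SVC()),
        ],
        memory=cache_dir,
    )


# Toy data: 20 "images" with 50 four-dimensional descriptors each, two classes.
rng = np.random.default_rng(0)
images = [rng.normal(loc=3.0 * (i % 2), size=(50, 4)) for i in range(20)]
labels = np.array([i % 2 for i in range(20)])

pipeline = build_pipeline(cache_dir=tempfile.mkdtemp())
pipeline.fit(images, labels)
predictions = pipeline.predict(images)
```

Naming the steps is what allows grid keys such as `transformer__n_clusters` to reach the right component.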

### GridSearchCV

We use scikit-learn's `GridSearchCV` class to evaluate the parameter combinations. Internally, it cross-validates every combination and returns the best result or, in our case, a table with all the results. This table is then wrapped in a pandas `DataFrame` and returned.

We use 5 folds for the cross-validation and create as many jobs as there are cores in the computer.
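
A minimal sketch of this setup is shown below. The helper name `grid_search_table` and the toy pipeline and data are assumptions made for illustration; the real `main` function may be organized differently. `cv=5` gives the 5 folds and `n_jobs=-1` asks for one job per core.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def grid_search_table(estimator, param_grid, X, y, n_jobs=-1):
    """Cross-validate every parameter combination and return all results as a table."""
    search = GridSearchCV(estimator, param_grid, cv=5, n_jobs=n_jobs)
    search.fit(X, y)
    # cv_results_ is a dict of arrays with one entry per parameter combination.
    return pd.DataFrame(search.cv_results_)


# Toy stand-ins for the real pipeline and data.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)
toy_pipeline = Pipeline([('scaler', StandardScaler()), ('classifier', SVC())])
toy_grid = {'classifier__C': [0.1, 1.0, 10.0], 'classifier__kernel': ['linear', 'rbf']}
```

Calling `grid_search_table(toy_pipeline, toy_grid, X, y)` yields one row per combination, with columns such as `param_classifier__C` and `mean_test_score`, which are the column names the plots in the experiments rely on.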

### Histogram intersection kernel

We use the following implementation for the histogram intersection kernel:

```python
def histogram_intersection_kernel(x, u):
    # K[i, j] = sum_k min(x[i, k], u[j, k])
    n_samples = x.shape[0]
    K = np.zeros((n_samples, u.shape[0]), dtype=np.float32)
    for d in range(n_samples):
        K[d, :] = np.sum(np.minimum(x[d], u), axis=1)
    return K
```
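
For comparison, the same kernel can be written without the Python loop by using NumPy broadcasting. This is only a sketch: the function name `histogram_intersection_vectorized` is made up here, and the approach trades memory for speed because it materializes an intermediate array of shape `(n_x, n_u, n_features)`.

```python
import numpy as np
from sklearn.svm import SVC


def histogram_intersection_vectorized(x, u):
    """K[i, j] = sum_k min(x[i, k], u[j, k]), computed with broadcasting."""
    # The intermediate array has shape (n_x, n_u, n_features), so memory grows quickly.
    return np.minimum(x[:, None, :], u[None, :, :]).sum(axis=2)


x = np.array([[1.0, 2.0], [3.0, 0.0]])
u = np.array([[2.0, 1.0], [0.0, 5.0]])
K = histogram_intersection_vectorized(x, u)  # [[2., 2.], [2., 0.]]

# SVC calls a callable kernel as kernel(X, Y) and expects an (n_X, n_Y) Gram matrix.
classifier = SVC(kernel=histogram_intersection_vectorized)
```

For large feature matrices the row-by-row loop is the safer choice, since it never holds more than one `(n_u, n_features)` slice at a time.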

In [None]:
from argparse import Namespace

import numpy as np
from matplotlib import pyplot as plt

from descriptors.histogram_intersection_kernel import histogram_intersection_kernel
from main import main


def run_experiment(param_grid: dict):
    args = Namespace(train_path='../data/MIT_split/train',
                     test_path='../data/MIT_split/test',
                     cache_path='../.cache')
    return main(args, param_grid)


### Experiments
#### Codebook sampling
Since we run Dense SIFT with a small `step_size` and for several `scales` at each location, we get a large number of descriptor vectors. We do not need all of them to build the codebook; a random subset is enough. In this experiment, we test different numbers of samples used to create the codebook.
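
The subsampling idea can be sketched as below. The helper `sample_descriptors` and its `seed` argument are hypothetical; how the project actually draws the subset behind `n_samples` is not shown in this notebook.

```python
import numpy as np


def sample_descriptors(descriptors, n_samples, seed=0):
    """Return at most n_samples rows of a (n_descriptors, n_dims) array, without replacement."""
    rng = np.random.default_rng(seed)
    # Never ask for more rows than are available.
    n_samples = min(n_samples, len(descriptors))
    indices = rng.choice(len(descriptors), size=n_samples, replace=False)
    return descriptors[indices]
```

The sampled rows would then be passed to the clustering step in place of the full descriptor set.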

In [None]:
param_grid = {
    'transformer__n_samples': np.linspace(10000, 100000, 5, dtype=int),
}
results = run_experiment(param_grid)

results.plot.line(x='param_transformer__n_samples', y='mean_test_score')
plt.xlabel('samples')
plt.ylabel('accuracy')
plt.legend(loc='best')
plt.show()

Comment results above

#### Codebook size
In this experiment, we test different codebook sizes or, in other words, different numbers of clusters for the K-Means algorithm used to generate the codebook. We try eight sizes spaced logarithmically between $2^7 = 128$ and $2^{10} = 1024$.

In [None]:
param_grid = {
    'transformer__n_clusters': np.logspace(7, 10, 8, base=2, dtype=int),
    'transformer__n_levels': [1]
}
results = run_experiment(param_grid)

results.plot.line(x='param_transformer__n_clusters', y='mean_test_score')
plt.xlabel('n_clusters')
plt.ylabel('accuracy')
plt.legend(loc='best')
plt.show()

Comment results above

#### Normalization of descriptors
In this experiment, we test different types of normalization of the descriptor vectors. We try the L1 norm, the L2 norm and power normalization, which applies the following function to each dimension:

$f(z) = \operatorname{sign}(z)|z|^{\alpha}$

where $0 \leq \alpha \leq 1$ is a parameter of the normalization (https://www.robots.ox.ac.uk/~vgg/rg/papers/peronnin_etal_ECCV10.pdf).
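
The three options can be sketched as a single helper. The function name `normalize`, the default `alpha=0.5` and the `eps` guard against division by zero are assumptions for illustration; the project's `norm` parameter may be implemented differently.

```python
import numpy as np


def normalize(descriptors, norm='l2', alpha=0.5, eps=1e-12):
    """Normalize each row of a (n_descriptors, n_dims) array."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    if norm == 'l1':
        # Divide each row by the sum of its absolute values.
        scale = np.abs(descriptors).sum(axis=1, keepdims=True)
        return descriptors / np.maximum(scale, eps)
    if norm == 'l2':
        # Divide each row by its Euclidean length.
        scale = np.sqrt((descriptors ** 2).sum(axis=1, keepdims=True))
        return descriptors / np.maximum(scale, eps)
    if norm == 'power':
        # f(z) = sign(z) * |z|^alpha, applied element-wise.
        return np.sign(descriptors) * np.abs(descriptors) ** alpha
    raise ValueError(f"unknown norm: {norm!r}")
```

For example, with `alpha=0.5` the power normalization maps `[-4, 9]` to `[-2, 3]`, compressing large values while keeping their sign.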

In [None]:
param_grid = {
    'transformer__norm': ['l1', 'l2', 'power'],
}
results = run_experiment(param_grid)

# Colormap needed until a bug is fixed in next version of pandas.
results.plot.bar(x='param_transformer__norm', y='mean_test_score', colormap='jet')
plt.xlabel('norm')
plt.ylabel('accuracy')
plt.legend(loc='best')
plt.show()

Comment results above

#### Spatial Pyramid levels
In this experiment, we test different numbers of levels for the spatial pyramid, which takes the location of the descriptors into account to generate a global image descriptor.
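
A spatial pyramid descriptor can be sketched as follows. This assumes the common layout in which level `l` (counting from 0) splits the image into a `2^l x 2^l` grid and the per-cell histograms of all levels are concatenated; the function name, its arguments and the absence of any per-level weighting are assumptions, not the project's `SpatialPyramid` implementation.

```python
import numpy as np


def spatial_pyramid_histogram(points, words, width, height, n_words, n_levels):
    """Concatenate per-cell visual word histograms over all pyramid levels.

    points: (n, 2) array of keypoint (x, y) positions.
    words:  (n,) array with the visual word index of each keypoint.
    """
    points = np.asarray(points, dtype=np.float64)
    words = np.asarray(words, dtype=np.int64)
    blocks = []
    for level in range(n_levels):
        cells = 2 ** level
        # Grid cell of each keypoint, clipped so points on the border stay inside.
        cx = np.minimum((points[:, 0] / width * cells).astype(np.int64), cells - 1)
        cy = np.minimum((points[:, 1] / height * cells).astype(np.int64), cells - 1)
        cell_id = cy * cells + cx
        # Joint (cell, word) counts, stored cell after cell.
        blocks.append(np.bincount(cell_id * n_words + words,
                                  minlength=cells * cells * n_words))
    return np.concatenate(blocks)
```

Under this layout the descriptor length is `n_words * (1 + 4 + ... + 4^(n_levels - 1))`, so each extra level roughly quadruples the feature dimension.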

In [None]:
param_grid = {
    'transformer__n_levels': np.linspace(1, 3, 3, dtype=int),
}
results = run_experiment(param_grid)

# Colormap needed until a bug is fixed in next version of pandas.
results.plot.bar(x='param_transformer__n_levels', y='mean_test_score', colormap='jet')
plt.xlabel('n_levels')
plt.ylabel('accuracy')
plt.legend(loc='best')
plt.show()

Comment results above

#### Kernel type and penalty parameter
In this experiment, we test different kernels and values for the penalty parameter `C` of the classifier's error term. We also investigate how the number of spatial pyramid levels affects the results.

In [None]:
param_grid = {
    'classifier__kernel': ['linear', 'rbf', 'sigmoid', histogram_intersection_kernel],
    'classifier__C': np.logspace(-3, 15, 5, base=2),
    'transformer__n_levels': [1, 2]
}

results = run_experiment(param_grid)
results.loc[results.param_classifier__kernel == histogram_intersection_kernel, 'param_classifier__kernel'] = \
    "histogram_intersection"

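# Each (C, kernel) pair appears once per `n_levels` value, so `n_levels` must be part of
# the columns; a plain `pivot` on (C, kernel) alone fails on the duplicate entries.
results.pivot_table(index='param_classifier__C',
                    columns=['param_classifier__kernel', 'param_transformer__n_levels'],
                    values='mean_test_score') \
    .plot.line(logx=True)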

plt.xlabel('C')
plt.ylabel('accuracy')
plt.legend(loc='best')

plt.show()

Comment results above. With `n_levels=1`, the RBF kernel should perform much better.

#### Kernel type and kernel coefficient
In this experiment, we test different kernels and values for the kernel coefficient `gamma` of the classifier.
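
For reference, the built-in kernels tested here are defined in scikit-learn as follows, where $r$ is the `coef0` parameter of `SVC`:

$K_{\text{linear}}(x, u) = x^\top u$

$K_{\text{rbf}}(x, u) = \exp(-\gamma \lVert x - u \rVert^2)$

$K_{\text{sigmoid}}(x, u) = \tanh(\gamma \, x^\top u + r)$

Only the RBF and sigmoid kernels depend on $\gamma$. The linear kernel has no such coefficient, and a callable kernel such as the histogram intersection receives only the two sample matrices, so `gamma` does not enter either of them.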

In [None]:
param_grid = {
    'classifier__kernel': ['linear', 'rbf', 'sigmoid', histogram_intersection_kernel],
    'classifier__gamma': np.logspace(-15, 3, 5, base=2)
}

results = run_experiment(param_grid)
results.loc[results.param_classifier__kernel == histogram_intersection_kernel, 'param_classifier__kernel'] = \
    "histogram_intersection"

results.pivot(index='param_classifier__gamma', columns='param_classifier__kernel', values='mean_test_score') \
    .plot.line(logx=True)

plt.xlabel('gamma')
plt.ylabel('accuracy')
plt.legend(loc='best')

plt.show()

Comment results above