# Laboratory #4_2 : Image Classification using Bag of Visual Words

At the end of this laboratory, you should be familiar with

*   Creating Bag of Visual Words
    *   Feature Extraction
    *   Codebook construction
    *   Classification
*   Using pre-trained deep networks for feature extraction

**Remember this is a graded exercise.**

*   For every plot, make sure you provide appropriate titles, axis labels, legends, wherever applicable.
*   Create reusable functions wherever possible, so that the code can be reused in different places.
*   Mount your drive to access the images.
*   Add sufficient comments and explanations wherever necessary.

---

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
path = '/content/drive/MyDrive/p4_2_image_classification_using_BoVW/'

In [None]:
# Loading necessary libraries (Feel free to add new libraries if you need for any computation)

import os
import numpy as np

from skimage.feature import ORB
from skimage.color import rgb2gray
from skimage.io import imread
from scipy.cluster.vq import vq

from matplotlib import pyplot as plt

import pickle
from sklearn.cluster import MiniBatchKMeans
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Model
from tensorflow.keras.backend import clear_session
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.preprocessing import image as tfimage
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

## Loading dataset

We will use 3 categories from the Caltech 101 objects dataset for this experiment. Upload the dataset to the drive and mount it.

In [None]:
# modify the dataset variable with the path from your drive
dataset_path = path + r'datasets/'

In [None]:
categories = ['butterfly', 'kangaroo', 'dalmatian']
ncl = len(categories) * 10

*   Create a list of files and the corresponding labels

In [None]:
# solution
data = []
labels = []

for category in categories:
  category_path = os.path.join(dataset_path, category)
  for index, image_name in enumerate(os.listdir(category_path)):
    img_path = os.path.join(category_path, image_name)
    image = rgb2gray(imread(img_path))
    data.append(image)
    labels.append(category)

# This will be later used with the ResNet50 model
rgb_data_tf = np.empty((len(data), 224, 224, 3))

# Use one running index across all categories; restarting the index for every
# category would overwrite the images stored for the previous category
rgb_index = 0
for category in categories:
  category_path = os.path.join(dataset_path, category)
  for image_name in os.listdir(category_path):
    img_path = os.path.join(category_path, image_name)
    rgb_data_tf[rgb_index] = tfimage.img_to_array(tfimage.load_img(img_path, target_size=(224, 224)))
    rgb_index += 1


In [None]:
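# The images differ in size, so store them in a 1-D object array (one entry per image)
images, data = data, np.empty(len(data), dtype=object)
for i, img in enumerate(images):
  data[i] = img
labels = np.asarray(labels)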

print('Total number of images:', len(data))

Total number of images: 244



*   Create a train / test split where the test is 10% of the total data

In [None]:
# solution
samples = data.shape[0]
train_perc = 0.9

indices = np.arange(0,samples)
np.random.shuffle(indices)
n_tr = round(samples*train_perc)
train_ind = indices[0:n_tr]
test_ind = indices[n_tr:samples]


x_train = data[train_ind]
y_train = labels[train_ind]
x_rgb_train = rgb_data_tf[train_ind]
x_test =  data[test_ind]
y_test = labels[test_ind]
x_rgb_test = rgb_data_tf[test_ind]

In [None]:
print('Train set:', len(x_train))
print('Test set:', len(x_test))

Train set: 220
Test set: 24


*   How do you select the train/test split?

**Solution**

First we create an array of indices and shuffle it, so that samples are selected independently of the order in which they were stored. We then compute the number of training instances (n_tr) and take the first n_tr indices of the shuffled array as the training indices. The remaining indices are the test indices.

Finally we select those indices from the dataset and the labels array to create the x_train, y_train, x_test and y_test.
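
As an alternative sketch (not the method used above), scikit-learn's `train_test_split` performs the same shuffle-and-slice in one call and can also stratify by label, so that each category keeps its proportion in both splits. The toy arrays and the `random_state` value are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the image indices and labels (hypothetical data)
toy_labels = np.array(['butterfly'] * 40 + ['kangaroo'] * 30 + ['dalmatian'] * 30)
toy_indices = np.arange(len(toy_labels))

# 10% test split, shuffled, with class proportions preserved in both parts
train_idx, test_idx = train_test_split(
    toy_indices, test_size=0.1, stratify=toy_labels, random_state=0)

print('Train set:', len(train_idx))
print('Test set:', len(test_idx))
print(np.unique(toy_labels[test_idx], return_counts=True))
```

Stratification matters most for small test sets such as this one, where a purely random split can leave a category almost unrepresented.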

## Feature Extraction using ORB

The first step is to extract descriptors for each image in our dataset. We will use ORB to extract descriptors.

*   Create ORB detector with 256 keypoints.


In [None]:
# solution
orb_detector = ORB(n_keypoints=256)

In [None]:
def obtain_descriptors(data):
  descriptors = []
  for image in data:
    orb_detector.detect_and_extract(image)
    descriptors.append(orb_detector.descriptors)
  return descriptors



*   Extract ORB descriptors from all the images in the train set.


In [None]:
# solution
orb_train_descriptors = obtain_descriptors(x_train)

*   What is the size of the feature descriptors? What does each dimension represent in the feature descriptors?

**Solution**

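Each image yields a descriptor array of shape (Q, descriptor_size), where Q is the number of detected keypoints (at most 256 here, because of `n_keypoints=256`) and descriptor_size is the length of the binary (rotated BRIEF) descriptor of each keypoint, which is 256 in scikit-image (https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.ORB.detect_and_extract). Each dimension is a boolean that stores the outcome of one intensity comparison between a pair of pixels in the patch around the keypoint.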
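
To make the meaning of a single dimension concrete, here is a simplified BRIEF-style sketch in plain NumPy. It is not scikit-image's implementation: the patch, the patch size and the random sampling pattern are assumptions, and real ORB uses a fixed pattern rotated by the keypoint orientation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 31x31 grayscale patch centred on a keypoint
patch = rng.random((31, 31))

# 256 random pixel pairs inside the patch (assumed sampling pattern)
n_tests = 256
first = rng.integers(0, 31, size=(n_tests, 2))
second = rng.integers(0, 31, size=(n_tests, 2))

# Each descriptor dimension is the outcome of one intensity comparison
descriptor = patch[first[:, 0], first[:, 1]] < patch[second[:, 0], second[:, 1]]

print(descriptor.shape, descriptor.dtype)
```

Because the descriptor is binary, two descriptors are usually compared with the Hamming distance, that is, the number of comparisons on which they disagree.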



## Codebook Construction (TO DO CORRECTLY)

A codeword is a vector that represents a group of similar patches. Together, the codewords form a codebook, which plays the same role as a word dictionary. We will create the codebook using the K-Means algorithm.

*   Create a codebook using K-Means with k=number_of_classes*10
*   Hint: Use sklearn.cluster.MiniBatchKMeans for K-Means
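
A minimal sketch of the codebook step on synthetic data, assuming binary 256-dimensional descriptors like those produced by ORB. The toy descriptor counts, the number of visual words and the `random_state`/`n_init` values are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Hypothetical descriptors: 5 images with a varying number of 256-bit descriptors
toy_descriptors = [rng.integers(0, 2, size=(n, 256)).astype(np.float32)
                   for n in (40, 55, 30, 60, 45)]

# Stack every descriptor from every image into one (n_total, 256) matrix
stacked = np.vstack(toy_descriptors)

# Fit the codebook: each cluster centre is one visual word
n_words = 8  # assumed toy value; the lab uses number_of_classes * 10
toy_kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0, n_init=3)
toy_kmeans.fit(stacked)

print(stacked.shape, toy_kmeans.cluster_centers_.shape)
```

The key point is that k-means is fitted on the pooled descriptors of all training images, not on one image at a time.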

In [None]:
all_orb_descriptors = []

for descriptors in orb_train_descriptors:
  for descriptor in descriptors:
    all_orb_descriptors.append(descriptor)

In [None]:
# solution
kmeans = MiniBatchKMeans(n_clusters=ncl)
kmeans = kmeans.fit(all_orb_descriptors)
predictions = kmeans.predict(all_orb_descriptors)

*   Create a histogram using the cluster centers for each image descriptor.
    *   Remember the histogram would be of size *n_images x n_clusters*.
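
One way to build such a histogram, sketched with `np.bincount` on hypothetical cluster assignments. The helper name `bovw_histogram` and the toy data are assumptions, not part of the lab template.

```python
import numpy as np

def bovw_histogram(word_indices, n_words):
    """Count how often each visual word (cluster index) occurs in one image."""
    return np.bincount(word_indices, minlength=n_words)

# Hypothetical cluster assignments for two images, with a codebook of 5 words
image_words = [np.array([0, 2, 2, 4]), np.array([1, 1, 1])]
histograms = np.vstack([bovw_histogram(w, 5) for w in image_words])

print(histograms.shape)  # n_images x n_clusters
print(histograms)
```

Passing `minlength` keeps every histogram at the same length even when an image never uses the last few words.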

In [None]:
# solution
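def create_histogram_list(data):
    # data: one array of cluster indices (visual words) per image
    histogram_list = np.zeros((len(data), ncl))
    for descriptor_ind in range(len(data)):
      # Fixed bin edges 0, 1, ..., ncl so that bin k always counts cluster k;
      # bins=ncl alone would spread the bins over each image's own value range
      hist, _ = np.histogram(data[descriptor_ind], bins=np.arange(ncl + 1))
      histogram_list[descriptor_ind] = hist #/ desc.shape[0]
    return histogram_list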


# Creating Classification Model

*   The next step is to create a classification model. We will use C-Support Vector Classification (SVC) to create the model.



In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

*   Use GridSearchCV to find the optimal value of C and Gamma.
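
A minimal sketch of the grid search on synthetic BoVW histograms. The toy data, the parameter ranges and `cv=5` are assumptions for illustration, not tuned values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
class_names = ['butterfly', 'kangaroo', 'dalmatian']

# Hypothetical BoVW histograms: 60 images, 30 visual words, 3 classes
toy_labels = np.repeat(class_names, 20)
toy_hist = rng.poisson(5, size=(60, 30)).astype(float)
# Boost a different block of words per class so the classes are separable
for k, name in enumerate(class_names):
    toy_hist[toy_labels == name, 10 * k:10 * (k + 1)] += 10

# Assumed search grid; every (C, gamma) pair is scored by cross-validation
param_grid = {'C': [1, 10, 100], 'gamma': [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid=param_grid, cv=5)
search.fit(toy_hist, toy_labels)

print('Best parameters:', search.best_params_)
print('Best cross-validated accuracy:', search.best_score_)
```

After fitting, `search` behaves like the best estimator refitted on all the training data, so `search.predict(...)` can be called directly on test histograms.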

# Testing the Classification Model

*   Extract descriptors using ORB for the test split
*   Use the previously trained k-means to generate the histogram
*   Use the classifier to predict the label


In [None]:
pip install scikit-multilearn

Collecting scikit-multilearn
  Downloading scikit_multilearn-0.2.0-py3-none-any.whl (89 kB)
Installing collected packages: scikit-multilearn
Successfully installed scikit-multilearn-0.2.0


In [27]:
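# solution
# Turn a list of per-image ORB descriptors into BoVW histograms
# using the k-means codebook fitted above
def descriptors_to_histograms(descriptor_list):
  words = [kmeans.predict(descriptors) for descriptors in descriptor_list]
  return create_histogram_list(words)

train_histograms = descriptors_to_histograms(orb_train_descriptors)
test_histograms = descriptors_to_histograms(obtain_descriptors(x_test))

# Tune C and gamma of the SVC by cross-validation on the training histograms
# (same search grid as the one used for the deep features below)
param_grid = dict(gamma=[1e-4, 1e-3, 2e-3, 2e-2, 1e-2, 0.1, 0.2], C=[1, 2, 3, 4, 8, 16, 32, 64])
classifier = GridSearchCV(estimator=SVC(), param_grid=param_grid, n_jobs=-1)
# train
classifier.fit(train_histograms, y_train)
# predict
predictions = classifier.predict(test_histograms)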


*   Calculate the accuracy score for the classification model

In [None]:
# solution
print("Accuracy = ",accuracy_score(y_test,predictions))



*   Generate the confusion matrix for the classification model
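
A small sketch of how accuracy and a multi-class confusion matrix relate, using hypothetical labels and predictions. Passing `labels` fixes the row and column order, which otherwise follows the sorted label names.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

class_names = ['butterfly', 'kangaroo', 'dalmatian']

# Hypothetical ground truth and predictions
toy_true = ['butterfly', 'butterfly', 'kangaroo', 'kangaroo', 'dalmatian', 'dalmatian']
toy_pred = ['butterfly', 'kangaroo', 'kangaroo', 'kangaroo', 'dalmatian', 'butterfly']

# Rows are true classes, columns are predicted classes, both ordered as class_names
toy_confusion = confusion_matrix(toy_true, toy_pred, labels=class_names)
toy_accuracy = accuracy_score(toy_true, toy_pred)

print(toy_confusion)
print('Accuracy =', toy_accuracy)
```

The accuracy equals the sum of the diagonal of the confusion matrix divided by the number of samples.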

In [None]:
# solution
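from sklearn.metrics import multilabel_confusion_matrix
# One 2x2 (one-vs-rest) confusion matrix per category, in the order of `categories`
confusion=multilabel_confusion_matrix(y_test, predictions, labels=categories)
print(confusion)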


*   Why do we use Clustering to create the codebook? 
*   What are the other techniques that can be used to create the codebook?

**Solution**

...

*   Will adding more keypoints increase the performance of the algorithm?

**Solution**

...

# Extracting features from Deep Network

It is also possible to extract features (similar to SIFT or ORB) from different layers of a deep network.

*   Load ResNet50 model with imagenet weights and check the summary of the model
*   Create a model to extract features from the 'avg_pool' layer.
*   Extract features from the layer for all the train images.

In [None]:
# solution
clear_session()

resnet50_model = ResNet50(weights='imagenet')
resnet50_model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
Model: "resnet50"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                    

In [None]:
model = Model(inputs=resnet50_model.input, outputs=resnet50_model.get_layer('avg_pool').output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                              

*   What is the size of the feature descriptors?

In [None]:
# solution
preprocessed_train = preprocess_input(x_rgb_train)
feature_descriptors_deep = model.predict(preprocessed_train)

print('Shape of feature descriptors: ', feature_descriptors_deep.shape)

Shape of feature descriptors:  (220, 2048)


*   Create codebook using the extracted features

In [None]:
# solution

all_descriptors_deep = np.reshape(feature_descriptors_deep, (-1, 2048))
print('Shape of all descriptors: ', all_descriptors_deep.shape)

kmeans = MiniBatchKMeans(n_clusters=ncl)
y_pred_deep = kmeans.fit_predict(all_descriptors_deep)
# avg_pool yields a single descriptor per image, so each image maps to exactly
# one visual word: keep one row per image (n_images x 1), not one row in total
image_labels_deep = np.reshape(y_pred_deep, (-1, 1))
codebook = create_histogram_list(image_labels_deep)
print('Shape of codebook: ', codebook.shape)

Shape of all descriptors:  (220, 2048)
Shape of codebook:  (220, 30)


*   Train SVM classifier using the codebook

In [27]:
# solution

# Enlisting the possible values of gamma and C
gammas = np.array([1e-4, 1e-3, 2e-3, 2e-2, 1e-2, 0.1, 0.2])
C = np.array([1,2,3,4,8,16,32,64])

svc = SVC(decision_function_shape='ovr')

# Finding the best params for training
clf = GridSearchCV(estimator=svc, param_grid=dict(gamma=gammas, C=C), n_jobs=-1)

clf.fit(codebook, y_train)

print('Best accuracy:', clf.best_score_) 
print('The best value of gamma:', clf.best_estimator_.gamma)
print('The best value of C:', clf.best_estimator_.C)

*   Evaluate the test set using the above method

In [27]:
# solution
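# Encode the test images like the training images: ResNet50 features -> visual word -> histogram
test_words_deep = kmeans.predict(model.predict(preprocess_input(x_rgb_test)))
y_pred = clf.predict(create_histogram_list(np.reshape(test_words_deep, (-1, 1))))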

*   Calculate the accuracy score and confusion matrix for the classification model

In [None]:
# solution
print("Accuracy = ",accuracy_score(y_test,y_pred))
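# One 2x2 (one-vs-rest) confusion matrix per category, in the order of `categories`
confusion=multilabel_confusion_matrix(y_test, y_pred, labels=categories)
print(confusion)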

*   Compare the performance of both the BoVW models. Which model works better and why?

**Solution**

...

The latter model performs better because it achieves a higher accuracy score than the previous model.

*   Can the performance of the pre-trained model be increased further? If so, how?

**Solution**


...

*   What happens if the test image does not belong to any of the trained classes?

**Solution**

...

*   Combine the features extracted using ORB and Deep Neural Network.
*   Create a codebook with the combined features
*   Train an SVM classifier using the generated codebook and evaluate the performance using accuracy and confusion matrix.
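
One possible way to combine the two representations, sketched on random stand-in data: normalise each feature type per image and concatenate them along the feature axis. The L2 normalisation and the helper name `l2_normalise` are assumptions, not a prescribed part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-image features: 30-bin ORB histograms and 2048-d deep features
toy_orb_hist = rng.poisson(5, size=(4, 30)).astype(float)
toy_deep = rng.random((4, 2048))

def l2_normalise(features):
    """Scale every row to unit length so that neither feature type dominates."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)

# Concatenate the two normalised representations along the feature axis
combined = np.hstack([l2_normalise(toy_orb_hist), l2_normalise(toy_deep)])

print(combined.shape)
```

Without some normalisation, the feature type with the larger numeric range would dominate the distances used by k-means and by the SVM kernel.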

In [None]:
# solution - 



*   Do the combined features increase the performance of the classifier?

**Solution**

...

## t-distributed Stochastic Neighbor Embedding (Optional).

t-SNE is used to visualize high-dimensional features. It converts the affinities between data points into probabilities and then looks for a low-dimensional embedding whose probability distribution matches the original one as closely as possible. It is very helpful for visualizing the features of different layers in a neural network.

You can find more information about t-SNE [here](https://scikit-learn.org/stable/modules/manifold.html#t-distributed-stochastic-neighbor-embedding-t-sne)
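
The first step, turning affinities into probabilities, can be sketched in plain NumPy. This is a simplified illustration with one fixed bandwidth `sigma`, which is an assumption: the actual algorithm tunes one bandwidth per point to match the requested perplexity.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 10))  # hypothetical high-dimensional data

# Pairwise squared Euclidean distances
diff = points[:, None, :] - points[None, :, :]
sq_dist = np.sum(diff ** 2, axis=-1)

# Gaussian affinities with a fixed bandwidth (assumed value)
sigma = 1.0
affinity = np.exp(-sq_dist / (2 * sigma ** 2))
np.fill_diagonal(affinity, 0.0)  # a point is not its own neighbour

# Normalise each row: p_conditional[i, j] is the probability that
# point i picks point j as its neighbour
p_conditional = affinity / affinity.sum(axis=1, keepdims=True)

print(p_conditional.sum(axis=1))
```

Nearby points receive a high probability and distant points a probability close to zero, which is the structure the low-dimensional embedding tries to reproduce.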

In [None]:
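from sklearn.manifold import TSNE

# Named `tsne` so that it does not overwrite the ResNet50 feature extractor `model`
tsne = TSNE(n_components=2, random_state=0)

np.set_printoptions(suppress=True)

# `histogram` is expected to hold the (n_images x n_clusters) BoVW histograms of the train split
low_embedding = tsne.fit_transform(histogram)

plt.figure(figsize=(20,10))
# Iterate over the labels themselves so that every legend entry matches its points
for label in np.unique(y_train):
    subData = low_embedding[y_train == label]
    plt.scatter(subData[:, 0], subData[:, 1], label=label)
plt.title("TSNE visualization")
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.legend()
plt.show()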

*   What do you infer from the t-SNE plot?

**Solution**

...


---

## **End of P4_2: Image Classification using Bag of Visual Words**
Deadline for P4_2 submission in CampusVirtual is: **Monday, the 6th of December, 2021**