# Image Recognition

In this workbook I practice:
* Basic machine learning models including K-means, k-NN, SVM, and AdaBoost using scikit-learn
* Bag-of-Words model
* Applying machine learning models for image recognition

In [2]:
import numpy as np
import cv2 as cv
from sklearn.cluster import KMeans

`./resources/FoodImages` contains two folders, `Train` and `Test`, holding the training and test images respectively.
Each of these folders has three sub-folders, one per food type: `Cakes`, `Pasta`, and `Pizza`.

Each food category has 30 training images and 30 test images. I build a BoW model for food image recognition from the training images of the supplied food database.

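The `Dictionary` class below builds a BoW model using the K-means algorithm. SIFT descriptors from all training images are clustered, each cluster centroid becomes a visual word, and every image is then represented by a normalised histogram of word frequencies.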

In [3]:
class Dictionary(object):
    def __init__(self, name, img_filenames, num_words):
        self.name = name #name of your dictionary
        self.img_filenames = img_filenames #list of image filenames
        self.num_words = num_words #the number of words

        self.training_data = [] #this is the training data required by the K-Means algorithm
        self.words = [] #list of words, which are the centroids of clusters

    def learn(self):
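        #SIFT is taken from the contrib module here; OpenCV 4.4 and later also expose it as cv.SIFT_create()
        sift = cv.xfeatures2d.SIFT_create()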

        num_keypoints = [] #this is used to store the number of keypoints in each image

        #load training images and compute SIFT descriptors
        for filename in self.img_filenames:
            img = cv.imread(filename)
            img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
            list_des = sift.detectAndCompute(img_gray, None)[1]
            if list_des is None:
                num_keypoints.append(0)
            else:
                num_keypoints.append(len(list_des))
                for des in list_des:
                    self.training_data.append(des)

        #cluster SIFT descriptors using K-means algorithm
        kmeans = KMeans(self.num_words)
        kmeans.fit(self.training_data)
        self.words = kmeans.cluster_centers_

        #create word histograms for training images
        training_word_histograms = [] #list of word histograms of all training images
        index = 0
        for i in range(0, len(self.img_filenames)):
            #for each file, create a histogram
            histogram = np.zeros(self.num_words, np.float32)
            #if some keypoints exist
            if num_keypoints[i] > 0:
                for j in range(0, num_keypoints[i]):
                    histogram[kmeans.labels_[j + index]] += 1
                index += num_keypoints[i]
                histogram /= num_keypoints[i]
            #append even when the histogram is empty so the list stays aligned with the labels
            training_word_histograms.append(histogram)

        return training_word_histograms

    def create_word_histograms(self, img_filenames):
        sift = cv.xfeatures2d.SIFT_create()
        histograms = []

        for filename in img_filenames:
            img = cv.imread(filename)
            img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
            descriptors = sift.detectAndCompute(img_gray, None)[1]

            histogram = np.zeros(self.num_words, np.float32) #word histogram for the input image

            if descriptors is not None:
                for des in descriptors:
                    #find the best matching word
                    min_distance = 1111111 #this can be any large number
                    matching_word_ID = -1 #initial matching_word_ID=-1 means no matching

                    for i in range(0, self.num_words): #search for the best matching word
                        distance = np.linalg.norm(des - self.words[i])
                        if distance < min_distance:
                            min_distance = distance
                            matching_word_ID = i

                    histogram[matching_word_ID] += 1

                histogram /= len(descriptors) #normalise histogram to frequencies

            histograms.append(histogram)

        return histograms
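
The nearest-word search in `create_word_histograms` loops over every descriptor and every word in Python. As a rough sketch, the same assignment can be done with NumPy broadcasting. The helper name `word_histogram` and the random toy data are illustrative assumptions, not part of the workbook.

```python
import numpy as np

def word_histogram(descriptors, words):
    """Return the normalised word histogram for one image (hypothetical helper)."""
    num_words = len(words)
    if descriptors is None or len(descriptors) == 0:
        # No keypoints: return an all-zero histogram, as the class does
        return np.zeros(num_words, np.float32)

    # Pairwise Euclidean distances, shape (num_descriptors, num_words)
    distances = np.linalg.norm(descriptors[:, None, :] - words[None, :, :], axis=2)

    # Index of the closest word for each descriptor
    nearest = distances.argmin(axis=1)

    # Count how often each word was chosen, then normalise to frequencies
    counts = np.bincount(nearest, minlength=num_words)
    return (counts / len(descriptors)).astype(np.float32)

# Toy data standing in for real SIFT descriptors (128-dimensional) and 5 words
rng = np.random.default_rng(0)
words = rng.random((5, 128)).astype(np.float32)
descriptors = rng.random((20, 128)).astype(np.float32)

hist = word_histogram(descriptors, words)
print(hist, hist.sum())
```

The broadcast builds a temporary array of shape `(num_descriptors, num_words, 128)`, which is fine for 50 words but may need chunking for much larger vocabularies.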

First, create lists of training and test file names, with their labels, to pass into the `Dictionary` class.

In [15]:
import os

foods = ['Cakes', 'Pasta', 'Pizza']
path = './resources/FoodImages/'

training_file_names = []
training_food_labels = []

test_file_names = []
test_food_labels = []

for i in range(0, len(foods)):
    sub_path = path + 'Train/' + foods[i] + '/'
    sub_file_names = [os.path.join(sub_path, f) for f in os.listdir(sub_path)]
    sub_food_labels = [i] * len(sub_file_names)
    training_file_names += sub_file_names
    training_food_labels += sub_food_labels
    
for i in range(0, len(foods)):
    sub_path = path + 'Test/' + foods[i] + '/'
    sub_file_names = [os.path.join(sub_path, f) for f in os.listdir(sub_path)]
    sub_food_labels = [i] * len(sub_file_names)
    test_file_names += sub_file_names
    test_food_labels += sub_food_labels
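
`os.listdir` returns entries in arbitrary order and includes any non-image files in the folder (for example `.DS_Store`); `cv.imread` returns `None` for those, and the later `cv.cvtColor` call then fails. A minimal sketch of a more defensive listing follows. The helper name `list_images` and the extension list are assumptions, not something the dataset specifies.

```python
import os

# Assumed set of image extensions; adjust to match the actual dataset
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def list_images(folder):
    """Return sorted paths of the image files in folder, skipping everything else."""
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.lower().endswith(IMAGE_EXTENSIONS)
    )
```

Sorting makes the file order, and therefore the order of the histograms and labels, reproducible between runs.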

Create an instance of the `Dictionary` class, then train the BoW model and extract the word histograms for the training and test sets.

In [16]:
num_words = 50
dictionary_name = 'food'
dictionary = Dictionary(dictionary_name, training_file_names, num_words)

In [17]:
training_word_histograms = dictionary.learn()
test_word_histograms = dictionary.create_word_histograms(test_file_names)

In [18]:
import pickle

with open('food_dictionary.dic', 'wb') as f:
    pickle.dump(dictionary, f)

In [19]:
with open('food_dictionary.dic', 'rb') as f: #'rb' is for binary read
    dictionary = pickle.load(f)

## 2. k-NN

In [20]:
from sklearn.neighbors import KNeighborsClassifier

In [21]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [22]:
ks =  [5, 10, 15, 20, 25, 30]

In [23]:
for k in ks:
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(training_word_histograms, training_food_labels)
    predicted_food_labels = knn.predict(test_word_histograms)
    print(f"""accuracy for k={k}: {accuracy_score(test_food_labels, predicted_food_labels)*100:.2f}%
confusion matrix:
{confusion_matrix(test_food_labels, predicted_food_labels)}
--------------------------------------------
""")

accuracy for k=5: 73.33%
confusion matrix:
[[20  4  6]
 [ 0 27  3]
 [ 1 10 19]]
--------------------------------------------

accuracy for k=10: 70.00%
confusion matrix:
[[14  7  9]
 [ 0 27  3]
 [ 0  8 22]]
--------------------------------------------

accuracy for k=15: 66.67%
confusion matrix:
[[13  7 10]
 [ 0 27  3]
 [ 0 10 20]]
--------------------------------------------

accuracy for k=20: 64.44%
confusion matrix:
[[11  9 10]
 [ 0 26  4]
 [ 0  9 21]]
--------------------------------------------

accuracy for k=25: 62.22%
confusion matrix:
[[ 9 10 11]
 [ 0 27  3]
 [ 0 10 20]]
--------------------------------------------

accuracy for k=30: 60.00%
confusion matrix:
[[ 8 11 11]
 [ 0 27  3]
 [ 0 11 19]]
--------------------------------------------
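
Each accuracy above can be recovered from its confusion matrix: the diagonal holds the correctly classified images, and each row is one true class (`Cakes`, `Pasta`, `Pizza`, in label order). As a small worked check using the k=5 matrix printed above:

```python
import numpy as np

# Confusion matrix printed for k=5 (rows: true class, columns: predicted class)
cm = np.array([[20,  4,  6],
               [ 0, 27,  3],
               [ 1, 10, 19]])

# Overall accuracy = correctly classified images / all images
accuracy = np.trace(cm) / cm.sum()

# Per-class recall = correct predictions for a class / images of that class
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy: {accuracy * 100:.2f}%")  # accuracy: 73.33%
for food, r in zip(['Cakes', 'Pasta', 'Pizza'], recall):
    print(f"{food} recall: {r * 100:.2f}%")
# Cakes recall: 66.67%
# Pasta recall: 90.00%
# Pizza recall: 63.33%
```

Read this way, the matrices show that the accuracy drop at larger k comes mostly from `Cakes` images being assigned to the other two classes.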



## 3. SVM

In [24]:
from sklearn import svm

In [25]:
cs = [10, 20, 30, 40, 50]

In [26]:
for c in cs:
    svm_classifier = svm.SVC(C = c,
                         kernel = 'linear')
    svm_classifier.fit(training_word_histograms, training_food_labels)
    predicted_food_labels = svm_classifier.predict(test_word_histograms)
    print(f"""accuracy for C={c}: {accuracy_score(test_food_labels, predicted_food_labels)*100:.2f}%
confusion matrix:
{confusion_matrix(test_food_labels, predicted_food_labels)}
--------------------------------------------
""")

accuracy for C=10: 78.89%
confusion matrix:
[[25  3  2]
 [ 0 25  5]
 [ 1  8 21]]
--------------------------------------------

accuracy for C=20: 82.22%
confusion matrix:
[[26  2  2]
 [ 0 24  6]
 [ 1  5 24]]
--------------------------------------------

accuracy for C=30: 82.22%
confusion matrix:
[[26  2  2]
 [ 0 24  6]
 [ 1  5 24]]
--------------------------------------------

accuracy for C=40: 84.44%
confusion matrix:
[[27  1  2]
 [ 0 24  6]
 [ 1  4 25]]
--------------------------------------------

accuracy for C=50: 85.56%
confusion matrix:
[[28  1  1]
 [ 0 24  6]
 [ 1  4 25]]
--------------------------------------------



## 4. AdaBoost

In [27]:
from sklearn.ensemble import AdaBoostClassifier

In [28]:
n_estimators = [50, 100, 150, 200, 250, 300, 400, 500]

In [29]:
for n in n_estimators:
    adb_classifier = AdaBoostClassifier(n_estimators = n,
                                        random_state = 0)

    adb_classifier.fit(training_word_histograms, training_food_labels)
    predicted_food_labels = adb_classifier.predict(test_word_histograms)
    print(f"""accuracy for n={n}: {accuracy_score(test_food_labels, predicted_food_labels)*100:.2f}%
confusion matrix:
{confusion_matrix(test_food_labels, predicted_food_labels)}
--------------------------------------------
""")

accuracy for n=50: 65.56%
confusion matrix:
[[17  2 11]
 [ 1 17 12]
 [ 3  2 25]]
--------------------------------------------

accuracy for n=100: 61.11%
confusion matrix:
[[17  3 10]
 [ 1 16 13]
 [ 3  5 22]]
--------------------------------------------

accuracy for n=150: 68.89%
confusion matrix:
[[21  3  6]
 [ 1 20  9]
 [ 2  7 21]]
--------------------------------------------

accuracy for n=200: 75.56%
confusion matrix:
[[21  3  6]
 [ 0 23  7]
 [ 0  6 24]]
--------------------------------------------

accuracy for n=250: 76.67%
confusion matrix:
[[20  3  7]
 [ 0 22  8]
 [ 0  3 27]]
--------------------------------------------

accuracy for n=300: 74.44%
confusion matrix:
[[20  4  6]
 [ 0 23  7]
 [ 0  6 24]]
--------------------------------------------

accuracy for n=400: 75.56%
confusion matrix:
[[21  3  6]
 [ 0 22  8]
 [ 0  5 25]]
--------------------------------------------

accuracy for n=500: 77.78%
confusion matrix:
[[21  4  5]
 [ 0 22  8]
 [ 0  3 27]]
--------------------------------------------

## Conclusion

I tested three algorithms over a range of parameter values. The best setting for each, by test-set accuracy, was:
* k-NN: k=5 (73.33%)
* SVM: C=50 (85.56%)
* AdaBoost: n_estimators = 500 (77.78%)

Overall, the best algorithm was the SVM, which achieved the highest test-set accuracy of 85.56%. Its accuracy was still rising at C=50, the largest value tried.
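
The ranking can be re-derived from the accuracies printed in the outputs above. The snippet below copies those figures into a dictionary (the `results` layout is just an illustrative choice) and picks the best parameter for each algorithm:

```python
# Test-set accuracies (%) copied from the printed outputs above
results = {
    'k-NN (k)': {5: 73.33, 10: 70.00, 15: 66.67, 20: 64.44, 25: 62.22, 30: 60.00},
    'SVM (C)': {10: 78.89, 20: 82.22, 30: 82.22, 40: 84.44, 50: 85.56},
    'AdaBoost (n_estimators)': {50: 65.56, 100: 61.11, 150: 68.89, 200: 75.56,
                                250: 76.67, 300: 74.44, 400: 75.56, 500: 77.78},
}

# For each algorithm, keep the (parameter, accuracy) pair with the highest accuracy
best = {name: max(scores.items(), key=lambda kv: kv[1])
        for name, scores in results.items()}

for name, (param, acc) in best.items():
    print(f"{name}: best at {param} with {acc:.2f}%")
# k-NN (k): best at 5 with 73.33%
# SVM (C): best at 50 with 85.56%
# AdaBoost (n_estimators): best at 500 with 77.78%
```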