### A.	[LO 2] Build Visual Vocabulary from the dataset using the Visual Bag of Words algorithm. Explain comprehensively how the algorithms work! 


First step is to import the libraries we are using. We will need the KMeans cluster function, provided by scikit-learn in order to create the word clusters and to compute the centroid for each clusters we will need to calculate the distance from each point to the clusters, thus we also need the distance function provided by scipy.
Other libraries will be used for image processing.

In [1]:
import cv2
import os
import numpy as np 
from scipy.spatial import distance
from sklearn.cluster import KMeans

First we need to load the dataset we will use. Here I'm using a local dataset and thus we need to import them beforehand (dataset can be accessed through this link: ).
Loading the dataset will be done both for the training and testing data, storing them into a dictionary as the class labels.

In [2]:
def load_images(path):
    image_dict = {}
    for animal_type in os.listdir(path):
        animal_type_image = []
        type_path = f"{path}/{animal_type}"
        for image in os.listdir(type_path): 
            img = cv2.imread(f"{type_path}/{image}", 0)
            if img is not None: 
                img = cv2.resize(img, (100,100))
                animal_type_image.append(img)
        image_dict[animal_type] = animal_type_image
    return image_dict

In [3]:
train_set = load_images('cat-and-dog/training_set/training_set/')

In [4]:
test_set = load_images('cat-and-dog/test_set/test_set/')

Below is the function to get local features of the image using SIFT. In one dataset consists of two categories: cat and dog. These categories will be iterated separatedly, by getting the keypoints and descriptors for every image in both categories (cat and dog). The descriptor for all images will be combined in the descriptors list, while the descriptors for specific class will be stored in the sift_vectors dictionary. This function will return a list that has 2 values; the first value is the combined descriptors of all images and the second value is the visual dictionary of our descriptors from specific classes. 

In [5]:
def sift_features(dataset):
    sift_vectors = {}
    descriptors = []
    sift = cv2.SIFT_create()

    for category, animal_img in dataset.items():
        features = []
        for img in animal_img:
            kp, desc = sift.detectAndCompute(img, None)
            descriptors.extend(desc)
            features.append(desc)
        sift_vectors[category] = features

    return [descriptors, sift_vectors]

In [6]:
compute_sift = sift_features(train_set)
descriptor_list = compute_sift[0]
bovw_dictionary = compute_sift[1]

Next, after getting all the descriptors, we can pass it to the K-Means clustering algorithm to find the visual words, which are the center points for each cluster. here, we set the K (cluster groups) to 100. 

In [7]:
def kmeans(k, desc):
    kmeans = KMeans(n_clusters=k, n_init=10)
    kmeans.fit(desc)
    visual_bow = kmeans.cluster_centers_

    return visual_bow

In [8]:
visual_bow = kmeans(100, descriptor_list)

### B.	[LO 3] Use K-NN to predict the object (i.e., dog or cat) and explain the results!

As what we did with the training data, we will also compute the features for the test dataset and store it into a dictionary. 

In [9]:
test_sift = sift_features(test_set)
test_dictionary = test_sift[1]

Compute the histogram for train and test dataset so we may compare the accuracy of the prediction.

In [10]:
def data_hist(bovw, centroid):
    feature_dict = {}
    for category, value in bovw.items():
        histograms = []
        for img in value: 
            hist = cv2.calcHist([img], [0], None, [256], [0,256])
            histograms.append(hist)
        feature_dict[category] = hist
    return feature_dict

In [11]:
bovw_hist_train = data_hist(bovw_dictionary, visual_bow)
bovw_hist_test = data_hist(test_dictionary, visual_bow)

Next, we will compute the prediction for the test images using the KNN method. Below I had it done from scratch.

So first, we will initiate the test_count as 0 (to count the times we run the test), correct as 0 (to keep track the correct predictions) and initializing the cat_or_dog dictionary, to store the prediction results. 

After that we will start iterating the test dataset. Previously we stored it in a dictionary, where the label marks the category of the image and the value are the images. We also take the test_category of the images because later on we will be counting the correctness of the prediction labels for each image in the test dataset. 

So now we will conduct the iteration for all the test images. First we will also initiate three variables: predict (for the starting checkpoint), min_dist (to keep track on the minimum distance from the centroid) and pred_category (which later on will be replaced with the predicted category result).

Then we can count the distance between our train and test image keypoints using the Euclidean distance. The first iteration distance will be stored as the min_dist, while if during the next iterations we found another shorter distance than the previous one, it will replace the previous minimum distance. 
[IMPORTANT!] Don't forget to store the category of the train image that has the minimum distance with the test image. The shortest distance between the test image and the train image will be considered to be one categorized. 

After iterating through all the train images, we will check whether the current predicted category with the test category (the one we store during the start of the test_loop) has the same label, we will add one to our correct variable and to the current category we are predicting (cat/dog). 

This function will return a list consists of test_count (times we run the function), correct (number of correctly predicted labels), and the cat_or_dog dictionary (contains the number of predicted test images for each category).

In [12]:
def knn(images, test):
    test_count = 0
    correct = 0
    cat_or_dog = {}

    for test_category, test_value in test.items():
        cat_or_dog[test_category] = [0,0]

        # iterating the images in test dataset
        for test_img in test_value:
            predict = 0
            min_dist = 0
            pred_category = "animal"
            # iterating the images in train dataset 
            for train_category, train_value in images.items():
                for train_img in train_value: 
                    if (predict==0): # setting the first data 
                        min_dist = distance.euclidean(test_img, train_img)
                        pred_category = train_category
                        predict += 1
                    else: # after the first data 
                        dist = distance.euclidean(test_img, train_img)

                        if (dist < min_dist): 
                            min_dist = dist
                            pred_category = train_category

            if (test_category == pred_category):
                correct += 1
                cat_or_dog[test_category][0] += 1
            test_count += 1
            cat_or_dog[test_category][1] += 1

    return [test_count, correct, cat_or_dog]


In [13]:
result = knn(bovw_hist_train, bovw_hist_test)

Last step is to check the accuracy of our program. To do that we will count the number of correctly predicted test images divided with the number of test we made, which will result in the accuracy average of the test image. 
We will also print out the accuracy for each class category (cat & dog) by dividing the result of correctly predicted test images with the number of test made for that category.

In [14]:
def accuracy(result):
    acc_avg = (result[1] / result[0]) * 100 #correct prediction / number of test made 
    print(f"Average accuracy: {acc_avg}%")
    print("Class based on accuracy: ")
    for category, value in result[2].items():
        print(f"Class {category}: {value[0]/value[1] * 100}%")

In [15]:
accuracy(result)

Average accuracy: 50.9765625%
Class based on accuracy: 
Class cats: 84.765625%
Class dogs: 17.1875%


In [16]:
print(f"Test counts: {result[0]} || Correct Prediction: {result[1]}")

Test counts: 512 || Correct Prediction: 261


From the results above we can see that the average accuracy is 50.977%, which can still be tuned more to get better results especially for the dogs dataset, due to the accuracy for the cats category is rather well (84.766%) while the dogs category still has a low accuracy score (17.188%).