<a href="https://colab.research.google.com/github/shankch/reverse-image/blob/main/4_calcuate_accuracy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Similarity Search: Calculate Accuracy

So far we have explored the results with two visualization techniques, t-SNE and PCA. Now we will calculate the accuracy of the features obtained from the pretrained and finetuned models.

In [1]:
#from google.colab import drive
#drive.mount('/content/drive')
#!mkdir data
#!cp -a drive/MyDrive/c99t1/* data/ #copying from gdrive to data directory



In [7]:
import numpy as np
import pickle
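from tqdm import tqdm
# tqdm.tqdm_notebook is deprecated; alias the supported notebook progress bar
from tqdm.notebook import tqdm as tqdm_notebook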
import random
import time
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import PIL
from PIL import Image
from sklearn.neighbors import NearestNeighbors

import glob
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

For these experiments we will reuse the features extracted earlier from the UC Merced Land Use Dataset.

Let's start with the features from the pretrained ResNet model.

In [8]:
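# Load the pickled filenames, features and class ids, closing each file after use
with open('data/filenames-udset.pickle', 'rb') as f:
    filenames = pickle.load(f)
with open('data/features-udset-resnet.pickle', 'rb') as f:
    feature_list = pickle.load(f)
with open('data/class_ids-udset.pickle', 'rb') as f:
    class_ids = pickle.load(f)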

num_images = len(filenames)
num_features_per_image = len(feature_list[0])
print("Number of images = ", num_images)
print("Number of features per image = ", num_features_per_image)

Number of images =  2100
Number of features per image =  2048


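First, let's write a helper function that measures the accuracy of a set of features with brute-force nearest-neighbor search. For each image we retrieve its 5 nearest neighbors, skip the first one (normally the query image itself), and count how many of the remaining 4 belong to the same class as the query.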

In [9]:
# Helper function to get the class name (the parent directory of the file)
def classname(path):
    return path.split('/')[-2]


# Helper function to get the class name and filename
def classname_filename(path):
    return path.split('/')[-2] + '/' + path.split('/')[-1]


# Prints the percentage of retrieved neighbors whose class matches the query.
# Relies on the global `filenames` list being aligned with `feature_list`.
def calculate_accuracy(feature_list):
    num_nearest_neighbors = 5
    correct_predictions = 0
    incorrect_predictions = 0
    neighbors = NearestNeighbors(n_neighbors=num_nearest_neighbors,
                                 algorithm='brute',
                                 metric='euclidean').fit(feature_list)
    for i in tqdm_notebook(range(len(feature_list))):
        distances, indices = neighbors.kneighbors([feature_list[i]])
        # Start at 1 to skip the closest match, which is normally the query itself
        for j in range(1, num_nearest_neighbors):
            if classname(filenames[i]) == classname(filenames[indices[0][j]]):
                correct_predictions += 1
            else:
                incorrect_predictions += 1
    total_predictions = correct_predictions + incorrect_predictions
    print("Accuracy is ", round(100.0 * correct_predictions / total_predictions, 2))
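
The per-image loop above can also be written as a single batched query. The following is a minimal sketch rather than the notebook's own method: `neighbor_accuracy` is a hypothetical helper that takes labels directly instead of parsing them from `filenames`, and the toy data stands in for the real features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbor_accuracy(features, labels, num_nearest_neighbors=5):
    """Percentage of retrieved neighbors that share the query's label.

    Hypothetical helper, not part of the original notebook.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    neighbors = NearestNeighbors(n_neighbors=num_nearest_neighbors,
                                 algorithm='brute',
                                 metric='euclidean').fit(features)
    # Query every row at once; indices has shape (n_samples, k).
    _, indices = neighbors.kneighbors(features)
    # Drop column 0, which is normally the query itself.
    neighbor_labels = labels[indices[:, 1:]]
    matches = neighbor_labels == labels[:, None]
    return 100.0 * matches.mean()


# Toy data (assumption): two well-separated clusters of 2-D points.
rng = np.random.default_rng(0)
toy_features = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
                          rng.normal(5.0, 0.1, size=(10, 2))])
toy_labels = np.array([0] * 10 + [1] * 10)
toy_accuracy = neighbor_accuracy(toy_features, toy_labels)
print("Toy accuracy is ", round(toy_accuracy, 2))
```

Because the helper returns the value instead of printing it, the result can be stored and compared across experiments.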

### 1. Accuracy of Brute Force over UC Merced Land Use features

In [10]:
# Calculate accuracy
calculate_accuracy(feature_list[:])

Accuracy is  87.31


### 2. Accuracy of Brute Force over the PCA compressed UC Merced Land Use features

In [11]:
num_feature_dimensions = 100
pca = PCA(n_components=num_feature_dimensions)
pca.fit(feature_list)
feature_list_compressed = pca.transform(feature_list[:])
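
One way to judge how much information a given number of components keeps is to look at PCA's cumulative explained variance. The sketch below uses synthetic data as a stand-in for `feature_list` (an assumption, since the pickled features are not reproduced here); the names `demo_features` and `demo_pca` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for feature_list (assumption): 500 samples, 256 dims,
# generated from 20 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 256))
demo_features = latent @ mixing + 0.01 * rng.normal(size=(500, 256))

demo_pca = PCA(n_components=100)
demo_compressed = demo_pca.fit_transform(demo_features)

# Fraction of the total variance captured by the first k components.
cumulative_variance = np.cumsum(demo_pca.explained_variance_ratio_)
print("Compressed shape:", demo_compressed.shape)
print("Variance retained by 20 components:", round(cumulative_variance[19], 4))
print("Variance retained by 100 components:", round(cumulative_variance[-1], 4))
```

On the real features, the same `explained_variance_ratio_` attribute of the fitted `pca` object would show whether 100 dimensions is a generous or a tight budget.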

Let's calculate accuracy over the compressed features.

In [12]:
calculate_accuracy(feature_list_compressed[:])

Accuracy is  87.8


### 3. Accuracy of Brute Force over the finetuned UC Merced Land Use features

In [13]:
# Use the features from the finetuned model
with open('data/filenames-udset.pickle', 'rb') as f:
    filenames = pickle.load(f)
with open('data/features-udset-resnet-finetuned.pickle', 'rb') as f:
    feature_list = pickle.load(f)
with open('data/class_ids-udset.pickle', 'rb') as f:
    class_ids = pickle.load(f)

In [14]:
num_images = len(filenames)
num_features_per_image = len(feature_list[0])
print("Number of images = ", num_images)
print("Number of features per image = ", num_features_per_image)

Number of images =  2100
Number of features per image =  21


In [15]:
# Calculate accuracy
calculate_accuracy(feature_list[:])

Accuracy is  96.65


### 4. Accuracy of Brute Force over the PCA compressed finetuned UC Merced Land Use features

In [24]:
# Perform PCA
num_feature_dimensions = 15
pca = PCA(n_components=num_feature_dimensions)
pca.fit(feature_list)
feature_list_compressed = pca.transform(feature_list[:])

In [25]:
# Calculate accuracy over the compressed features
calculate_accuracy(feature_list_compressed[:])

Accuracy is  96.35
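
The choice of 15 dimensions could be checked by sweeping several values and comparing the resulting accuracy. The sketch below does this on synthetic clustered data from `make_blobs` (an assumption standing in for the finetuned features). The helper `neighbor_accuracy` is hypothetical and calls `kneighbors()` without arguments, which makes scikit-learn exclude each point from its own neighbor list.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors


def neighbor_accuracy(features, labels, num_neighbors=4):
    """Percentage of the nearest neighbors (self excluded) sharing the label."""
    labels = np.asarray(labels)
    neighbors = NearestNeighbors(n_neighbors=num_neighbors,
                                 algorithm='brute',
                                 metric='euclidean').fit(features)
    # With no query argument, each training point is not its own neighbor.
    _, indices = neighbors.kneighbors()
    return 100.0 * (labels[indices] == labels[:, None]).mean()


# Synthetic stand-in (assumption): 300 samples, 21 dims, 5 classes.
demo_features, demo_labels = make_blobs(n_samples=300, n_features=21,
                                        centers=5, cluster_std=2.0,
                                        random_state=0)

results = {}
for num_dims in (2, 5, 10, 15, 21):
    compressed = PCA(n_components=num_dims).fit_transform(demo_features)
    results[num_dims] = neighbor_accuracy(compressed, demo_labels)
    print(num_dims, "dimensions -> accuracy", round(results[num_dims], 2))
```

Running the same loop over the real finetuned features would show where accuracy starts to drop as dimensions are removed.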


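### Accuracy

The table below summarizes the accuracy obtained on the UC Merced Land Use Dataset in the four experiments above. Finetuning raises accuracy by roughly nine percentage points, while PCA compression (to 100 dimensions for the pretrained features and 15 for the finetuned ones) changes it by less than half a point.

| Algorithm | Accuracy using pretrained features (%) | Accuracy using finetuned features (%) |
|-------------------|-------|-------|
| Brute Force | 87.31 | 96.65 |
| PCA + Brute Force | 87.80 | 96.35 |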
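
The differences between the table entries can be computed directly. This is a small sketch that uses only the four accuracies reported above; the dictionary layout and variable names are illustrative.

```python
# Accuracies (in percent) copied from the results table.
accuracy = {
    ("Brute Force", "pretrained"): 87.31,
    ("Brute Force", "finetuned"): 96.65,
    ("PCA + Brute Force", "pretrained"): 87.80,
    ("PCA + Brute Force", "finetuned"): 96.35,
}

# Gain from finetuning, per algorithm (percentage points).
finetuning_gain = {
    algorithm: round(accuracy[(algorithm, "finetuned")]
                     - accuracy[(algorithm, "pretrained")], 2)
    for algorithm in ("Brute Force", "PCA + Brute Force")
}

# Change caused by PCA compression, per feature set (percentage points).
pca_change = {
    features: round(accuracy[("PCA + Brute Force", features)]
                    - accuracy[("Brute Force", features)], 2)
    for features in ("pretrained", "finetuned")
}

print("Gain from finetuning:", finetuning_gain)
print("Change from PCA:", pca_change)
```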
