## Nearest Neighbor experiment

A simple experiment using only a training set and a test set (no cross-validation).

In [1]:
import numpy as np
import tarfile
import pickle
import tensorflow as tf
from NearestNeighbor import NearestNeighbor
import time

Import the CIFAR-10 dataset, which `load_data()` returns already split into training and test sets. I'll be using the tensorflow==2.20 API.

I'm not setting up any GPU configuration for TensorFlow, because we won't use it for neural networks here, only for its dataset-loading API.

In [2]:
cifar10 = tf.keras.datasets.cifar10
(Xtr, Ytr), (Xte, Yte) = cifar10.load_data()

print(f'x_train shape: {Xtr.shape}')
print(f'y_train shape: {Ytr.shape}')
print(f'x_test shape: {Xte.shape}')
print(f'y_test shape: {Yte.shape}')

x_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 1)
x_test shape: (10000, 32, 32, 3)
y_test shape: (10000, 1)


In [3]:
# flatten images to be one-dimensional
Xtr = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte = Xte.reshape(Xte.shape[0], 32 * 32 * 3) 

print(f'x_train flatten shape: {Xtr.shape}')
print(f'x_test flatten shape: {Xte.shape}')

x_train flatten shape: (50000, 3072)
x_test flatten shape: (10000, 3072)
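
One thing worth checking before computing distances: `load_data()` returns the images as `uint8` arrays, and `uint8` subtraction wraps around modulo 256 instead of going negative. Whether the `NearestNeighbor` class casts its inputs internally is not shown here, so the following is only a minimal sketch of the pitfall on toy arrays (the names `a`, `b`, `wrapped` and `safe` are illustrative):

```python
import numpy as np

# Two tiny "images" stored as uint8, like the CIFAR-10 pixels
a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# uint8 arithmetic wraps: 10 - 20 becomes 246 instead of -10
wrapped = np.abs(a - b)

# Casting to a signed type first gives the intended absolute differences
safe = np.abs(a.astype(np.int32) - b.astype(np.int32))

print(wrapped.tolist())  # [246, 100]
print(safe.tolist())     # [10, 100]
```

If the distance code does not cast, converting once up front (for example `Xtr = Xtr.astype(np.int32)`) would avoid the wrap-around, at the cost of more memory.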


In [4]:
# instantiate NearestNeighbor from the NearestNeighbor module
# we now have our training and test sets, along with their labels
start_time = time.time()
nn = NearestNeighbor() 
nn.train(Xtr, Ytr, dist='l2') 
Yte_predict = nn.predict(Xte) 

acc = np.mean(Yte_predict == Yte)
end_time = time.time()
execution_time = end_time - start_time
print(f'accuracy Nearest Neighbor: {acc}')
print(f'Execution time of Nearest Neighbor: {execution_time}s')

accuracy Nearest Neighbor: 0.1
Execution time of Nearest Neighbor: 1972.184007883072s


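We can see that the accuracy is very low: 0.1 is exactly chance level for CIFAR-10's 10 balanced classes. The run also took about 33 minutes. Training can be expensive, but testing must be fast so that a model can run on any device. Nearest Neighbor is the opposite, since every prediction compares the test image against all 50,000 training images. NN is not a good choice for images!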
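
An accuracy of exactly 0.1 is also worth a sanity check on the evaluation itself. `Yte` has shape `(10000, 1)`. If `nn.predict` returns a 1-D array of shape `(10000,)` (an assumption, since the `NearestNeighbor` code is not shown here), then `Yte_predict == Yte` broadcasts to a `(10000, 10000)` matrix, and its mean no longer measures accuracy. A minimal sketch of that effect with toy labels (all names below are illustrative):

```python
import numpy as np

# Ground-truth labels as a column vector, like Yte with shape (N, 1)
y_true = np.array([[0], [1], [2], [3]])
# Hypothetical predictions as a flat vector of shape (N,)
y_pred = np.array([0, 1, 2, 0])

# (N,) == (N, 1) broadcasts to (N, N): every prediction vs. every label
broadcast = (y_pred == y_true)
naive_acc = broadcast.mean()

# Flattening the labels compares element by element, as intended
correct_acc = np.mean(y_pred == y_true.ravel())

print(broadcast.shape)  # (4, 4)
print(naive_acc)        # 0.25
print(correct_acc)      # 0.75
```

Printing `Yte_predict.shape` next to `Yte.shape` would show whether this applies to the run above. If it does, comparing against `Yte.ravel()` gives the element-wise accuracy.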

Let's try it with the L1 distance (Manhattan) and compare it with the L2 distance (Euclidean) used above. The two distances are defined as:

$d_{1}(I_{1}, I_{2}) = \sum_{p} |I^{p}_{1} - I^{p}_{2}| = l_{1}$

$d_{2}(I_{1}, I_{2}) = \sqrt{\sum_{p} (I^{p}_{1} - I^{p}_{2})^{2}} = l_{2}$
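
The two formulas can be sketched in NumPy as follows. This is not the code of the `NearestNeighbor` module (which is not shown here). The helper names `l1_distance`, `l2_distance` and `predict_one` are hypothetical, and the toy arrays stand in for flattened images.

```python
import numpy as np

def l1_distance(train, query):
    """Sum of absolute pixel differences between query and each training row."""
    diff = train.astype(np.float64) - query.astype(np.float64)
    return np.sum(np.abs(diff), axis=1)

def l2_distance(train, query):
    """Square root of the summed squared pixel differences."""
    diff = train.astype(np.float64) - query.astype(np.float64)
    return np.sqrt(np.sum(diff ** 2, axis=1))

def predict_one(train, labels, query, dist="l1"):
    """Return the label of the training row closest to query."""
    if dist == "l1":
        distances = l1_distance(train, query)
    else:
        distances = l2_distance(train, query)
    return labels[np.argmin(distances)]

# Three toy "images" with two pixels each, and one query image
train = np.array([[0, 0], [3, 4], [10, 10]], dtype=np.uint8)
labels = np.array([0, 1, 2])
query = np.array([3, 3], dtype=np.uint8)

print(l1_distance(train, query).tolist())  # [6.0, 1.0, 14.0]
print(predict_one(train, labels, query, dist="l1"))  # 1
print(predict_one(train, labels, query, dist="l2"))  # 1
```

The cast to `float64` keeps the subtraction from wrapping on `uint8` inputs. For ranking neighbors under L2, the square root could be dropped, because it does not change which training row is closest.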

In [5]:
start_time = time.time()
nn = NearestNeighbor() 
dist = 'l1'
nn.train(Xtr, Ytr, dist=dist) 
Yte_predict = nn.predict(Xte) 

acc = np.mean(Yte_predict == Yte)
end_time = time.time()
execution_time = end_time - start_time
print(f'accuracy Nearest Neighbor with {dist} distance: {acc}')
print(f'Execution time of Nearest Neighbor: {execution_time}s')

accuracy Nearest Neighbor with l1 distance: 0.1
Execution time of Nearest Neighbor: 1881.576533794403s
