## Image Classification

In [52]:
# dataset: http://www.cs.toronto.edu/~kriz/cifar.html
# http://cs231n.stanford.edu/syllabus.html
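# Load one pickled CIFAR-10 batch file and return its contents as a dictionary
def unpickle(file):
    import sys
    try:
        import cPickle as pickle  # Python 2
    except ImportError:
        import pickle  # Python 3
    # the batches are binary files, so open them in 'rb' mode
    with open(file, 'rb') as fo:
        if sys.version_info[0] >= 3:
            # the batches were pickled with Python 2; latin1 decodes their byte strings so keys like "data" stay plain strings
            return pickle.load(fo, encoding='latin1')
        return pickle.load(fo)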

# Extract training set and test set
import numpy as np
import glob

dic_1 = unpickle("cifar10/data_batch_1")
training_set = dic_1["data"]
training_label = dic_1["labels"]

test_data = unpickle("cifar10/test_batch")
test_set = test_data["data"]
test_label = test_data["labels"]

# Stack the remaining training batches (batch 1 is already loaded above)
for path in sorted(glob.glob("cifar10/data_batch*")):
    if not path.endswith("data_batch_1"):
        dic = unpickle(path)
        training_set = np.vstack((training_set, dic["data"]))
        training_label = np.hstack((training_label, dic["labels"]))

### Distance Measurement

A very important concept for the nearest neighbor classifier is the distance measurement between two objects. Two common choices are the L1 and L2 distances.

The L1 distance takes the absolute value of the elementwise difference between the two images and sums these values into a single number:

\begin{equation*}
d_1(I_1, I_2)=\sum_p |I_1^p - I_2^p|
\end{equation*}

There are many other ways of computing distances between vectors. Another common choice is the L2 distance, which has the geometric interpretation of the Euclidean distance between two vectors. It takes the form:

\begin{equation*}
d_2(I_1, I_2)=\sqrt{ \sum_p (I_1^p - I_2^p)^2 }
\end{equation*}
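
As a quick check of the two formulas, here is a minimal sketch on two tiny made-up vectors. The values and the helper names `l1_distance` and `l2_distance` are illustrative assumptions, not part of the lecture code. It also shows one pitfall worth knowing: pixel arrays stored as `uint8`, as CIFAR-10 pixels are, wrap around on subtraction unless they are cast to a signed type first.

```python
import numpy as np

def l1_distance(a, b):
    # cast to a signed type so negative differences are represented correctly
    diff = np.asarray(a, dtype=np.int32) - np.asarray(b, dtype=np.int32)
    return int(np.sum(np.abs(diff)))

def l2_distance(a, b):
    diff = np.asarray(a, dtype=np.int32) - np.asarray(b, dtype=np.int32)
    return float(np.sqrt(np.sum(np.square(diff))))

# two hypothetical 4-pixel "images"
I1 = np.array([56, 32, 10, 18], dtype=np.uint8)
I2 = np.array([10, 20, 24, 17], dtype=np.uint8)

d1 = l1_distance(I1, I2)   # |46| + |12| + |-14| + |1| = 73
d2 = l2_distance(I1, I2)   # sqrt(46^2 + 12^2 + 14^2 + 1^2) = sqrt(2457)

# without the cast, 10 - 24 wraps around to 242 in uint8 arithmetic
naive = int(np.sum(np.abs(I1 - I2)))   # 46 + 12 + 242 + 1 = 301

print(d1, d2, naive)
```

The L1 and L2 values differ in scale, but for nearest neighbor search only the ranking of the distances matters.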

In [93]:
import numpy as np

class NNClassifier(object):
    # Do nothing in initialization state
    def __init__(self):
        pass
    
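    # train the classifier: X is a matrix with one example per row, y is the vector of labels
    def train(self, X, y):
        # CIFAR-10 pixels are stored as uint8; cast to a signed type so that
        # the subtraction in predict() cannot wrap around
        self.data = np.asarray(X, dtype = np.int32)
        self.label = np.asarray(y)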
        
    # predict a label for every row (image) of X
    def predict(self, X):
        Ypred = np.zeros(X.shape[0], dtype = self.label.dtype)
        # range works in both Python 2 and 3 (xrange is Python 2 only)
        for i in range(X.shape[0]):
            # L1 distance: axis = 1 sums across each row, giving one distance per training image
            dis = np.sum(np.abs(self.data - X[i,:]), axis = 1)
            # L2 distance
            # dis = np.sqrt(np.sum(np.square(self.data - X[i, :]), axis = 1))
            
            # Returns the indices of the minimum values along an axis.
            min_idx = np.argmin(dis)
            Ypred[i] = self.label[min_idx]
            
        return Ypred
    
NNC = NNClassifier()
NNC.train(training_set, training_label)
pred_label = NNC.predict(test_set)
print('Accuracy: %f' % (np.mean(pred_label == test_label)))

### Pros and Cons of NNC

In the run recorded for these notes, this classifier achieves only 24.9% accuracy on CIFAR-10. That’s better
than guessing at random (which would give 10% accuracy since there are 10 classes), but nowhere near human
performance or state-of-the-art Convolutional Neural Networks, which achieve about 95%.

The advantage of NNC is that it needs no training and is easy to implement: it simply stores and indexes the training data. The cost is paid at test time, because every prediction compares the test image against every training image. A deep neural network is the opposite: it takes a long time to train, but classifying new examples is very cheap. The Nearest Neighbor Classifier is rarely appropriate for practical image classification. One problem is that images are high-dimensional objects, and distances over high-dimensional spaces can be very counter-intuitive.
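
One way to see why pixel-wise distances can mislead is a toy comparison on a synthetic 8x8 pattern. This is an illustrative sketch, not CIFAR-10 data, and the helper name `l1_distance` is a hypothetical choice. It compares a checkerboard against a copy shifted by one pixel and against a plain gray image.

```python
import numpy as np

def l1_distance(a, b):
    # cast to a signed type so uint8 subtraction cannot wrap around
    diff = np.asarray(a, dtype=np.int32) - np.asarray(b, dtype=np.int32)
    return int(np.sum(np.abs(diff)))

# 8x8 checkerboard with pixel values 0 and 255
rows, cols = np.indices((8, 8))
checker = (((rows + cols) % 2) * 255).astype(np.uint8)

# the same pattern shifted one pixel to the right (with wrap-around)
shifted = np.roll(checker, 1, axis=1)

# a featureless mid-gray image
gray = np.full((8, 8), 128, dtype=np.uint8)

d_shifted = l1_distance(checker, shifted)  # every pixel flips: 64 * 255
d_gray = l1_distance(checker, gray)        # 32 * 128 + 32 * 127

print(d_shifted, d_gray)
```

Here the shifted copy ends up twice as far from the original as the featureless gray image, even though a person would call the shifted copy nearly identical.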

### Flow of Lecture
The lecture covered the following topics, in order:
Image classification --> NNC --> KNN --> Validation Set --> Cross Validation --> Evaluation --> NNC disadvantages

Since I have studied Machine Learning before and want to spend more time on DNNs and related tools, I do not go into more detail on the other topics. If you want to learn more about them, you can download the related materials from the website given at the beginning of this note.

## Linear Classification