# Lecture 2


## Image classification is a core task in Computer Vision.

## Problem: Semantic Gap

Given an image of a cat, a computer sees only a giant grid of numbers, whereas we see a cat. This mismatch is the semantic gap.

Moreover, if we took a different picture of the same cat (for example, from a different viewpoint), all the pixel values would change, yet it would still be the same cat.

More challenges: illumination, deformation, occlusion (seeing only parts of an object), background clutter (the background looks similar to the object), and intraclass variation (cats come in different shapes, colours, and sizes).



In [1]:
## Naive Attempt
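def classify_image(image):
    # some magic here? There is no obvious way to hard-code this.
    class_label = None  # placeholder: no known hand-written algorithm
    return class_label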

## Previous attempts
1. find edges of an image
2. find corners and categorize them
3. get a label using an explicit set of rules

This doesn't work well.
Hand-written rules are brittle, and a new set of rules has to be written for every other object category, so the approach doesn't scale.


## Data-Driven Approach
1. Collect a dataset of images and labels
2. Use ML to train a classifier
3. Evaluate the classifier on new images


In [None]:
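def train(images, labels):
    # Machine learning: build a model from the training data
    model = None  # placeholder for the learned model
    return model

def predict(model, test_images):
    # Use the model to predict labels for new images
    test_labels = None  # placeholder for the predicted labels
    return test_labels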

Here our API has changed. We now have two functions: one that trains the model, and one that uses the model to make predictions.

## First classifier: Nearest Neighbour
An example of a simple classifier.

In the training step, we **memorize** all the data and labels.

In the prediction step, we predict the label of the most similar training image.

This essentially returns the label of a training image that LOOKS visually similar. It won't be very accurate, though: from far away, a PNG of a cat on a white background looks similar to one of a lynx.

## How do we compare images?

### L1 distance
$$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$$

Compare the images pixel by pixel and add up all the pixel-wise absolute differences.
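As a small worked example of the formula above, here is a sketch that computes the L1 distance between two made-up 4x4 single-channel "images". The pixel values and the names `test_image` and `train_image` are assumptions chosen purely for illustration.

```python
import numpy as np

# Two hypothetical 4x4 grayscale "images"; the values are illustrative only.
# A signed integer dtype is used so that subtraction cannot wrap around,
# which would happen with unsigned 8-bit pixel values.
test_image = np.array([[56, 32, 10, 18],
                       [90, 23, 128, 133],
                       [24, 26, 178, 200],
                       [2, 0, 255, 220]], dtype=np.int64)
train_image = np.array([[10, 20, 24, 17],
                        [8, 10, 89, 100],
                        [12, 16, 178, 170],
                        [4, 32, 233, 112]], dtype=np.int64)

# Pixel-wise absolute differences, then the sum over all pixels.
pixel_diffs = np.abs(test_image - train_image)
l1_distance = int(pixel_diffs.sum())
print(l1_distance)  # 456
```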

Here's an example of the implementation of the nearest neighbour classifier:

In [None]:
import numpy as np

class NearestNeighbour:
    def __init__(self):
        pass

    def train(self, X, y):
        ''' X is N x D where each row is an example. y is a 1-D array of size N '''
        # the nearest neighbour classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        ''' X is N x D where each row is an example we wish to predict the label for '''
        num_test = X.shape[0]
        # let's make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i-th test image
            # using the L1 distance (sum of absolute pixel differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # index of the smallest distance
            Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example

        return Ypred

Q. With N training examples, how fast are training and prediction?

A. Training is O(1); prediction is O(N) for each test example.

This is the wrong way around: we want classifiers that are fast at prediction, and slow training is acceptable.

Q. What does this look like?

A. 

![nearest neighbour](../pics/neighbour.jpg)

This is prone to error: a single yellow point in the middle of the green area creates a yellow island that shouldn't really be there.


This motivates the idea of K-Nearest Neighbours.

## K-Nearest Neighbours
Instead of copying the label from the nearest neighbour, take the **majority vote** among the K closest points.

![k nearest neighbours](../pics/knn.jpg)
As K increases, the decision boundaries become smoother, which tends to make the classifier more robust to noisy points.

This does produce white regions, though: regions where there is no majority among the K nearest neighbours.
You could go further and make a random guess in the white regions, but for now we'll just say there is no majority there.
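The majority vote described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `knn_predict`, the toy 2-D points, and the choice to return `None` when the top vote is tied (the "white region" case) are all assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Label one query point by majority vote among its k nearest training points (L1 distance)."""
    distances = np.sum(np.abs(X_train - x_query), axis=1)
    nearest = np.argsort(distances)[:k]  # indices of the k closest training points
    ranked = Counter(y_train[nearest].tolist()).most_common()
    top_label, top_count = ranked[0]
    # If the two most common labels have the same count there is no majority.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label

# Toy 2-D data: three points of class 0 near the origin, two of class 1 far away.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.0]), k=2))  # 1
```

Returning `None` keeps the "no majority" case explicit; a random guess among the tied labels would be an equally valid design choice.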


## Different viewpoints
1. Points in a high-dimensional space
2. Concrete images (the pixels of an image form a high-dimensional vector)


## K-Nearest Neighbours: Distance Metrics

We have seen the L1 distance, also known as the Manhattan distance.

$$d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$$

The set of points at a fixed L1 distance from the origin forms a square rotated by 45 degrees.

We introduce a new distance metric, the L2 (Euclidean) distance:

$$d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$$

The set of points at a fixed L2 distance from the origin forms a circle.
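To make the two shapes concrete, the sketch below evaluates both metrics on two hand-picked 2-D points. The helper names `l1_distance` and `l2_distance` and the points themselves are assumptions for illustration.

```python
import numpy as np

def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))

def l2_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

origin = np.zeros(2)
on_axis = np.array([1.0, 0.0])
on_diagonal = np.array([0.5, 0.5])

# Both points are at L1 distance 1 from the origin (they lie on the rotated square).
print(l1_distance(origin, on_axis), l1_distance(origin, on_diagonal))
# Only the on-axis point is at L2 distance 1; the diagonal point is closer (about 0.707).
print(l2_distance(origin, on_axis), l2_distance(origin, on_diagonal))
```

The two metrics agree along the coordinate axes and differ most along the diagonals, so L1 depends on the choice of coordinate axes while L2 does not.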


By using different distance metrics, you can generalize the KNN classifier to many different types of data.

E.g. to classify pieces of text:
the only thing you need to do is specify a distance function that measures the distance between two sentences.

So, despite its simplicity, it's something you can always try for any problem.
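As one possible illustration of the text case, the sketch below plugs a character-level edit (Levenshtein) distance into a nearest-neighbour lookup. The lecture does not name a specific text distance, so the choice of edit distance, the helper names, and the toy sentences are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,                      # delete char_a
                               current[j - 1] + 1,                   # insert char_b
                               previous[j - 1] + substitution_cost)) # substitute
        previous = current
    return previous[-1]

def nearest_text_label(train_texts, train_labels, query):
    """1-nearest-neighbour over strings using edit_distance."""
    distances = [edit_distance(query, text) for text in train_texts]
    return train_labels[distances.index(min(distances))]

train_texts = ["the cat sat", "the dog ran"]
train_labels = ["cat", "dog"]
print(nearest_text_label(train_texts, train_labels, "a cat sat"))  # cat
```

The classifier code does not change at all; only the distance function is swapped.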

Play around with it here: http://vision.stanford.edu/teaching/cs231n-demos/knn/

## The question becomes: how do you choose the best K and the best distance metric?

These are hyperparameters: choices about the algorithm that we set rather than learn.

These are problem-dependent. The best approach is to try them out and see what works best (guess and check).

## Setting Hyperparameters:

#### Idea 1:
Choose hyperparameters that work best on the data you trained on.
**This is bad.** K=1 always works perfectly on the training data.
We saw how badly K=1 behaves when a single island point appeared inside another class's region.

#### Idea 2:
Split data into **train** and **test**, choose hyperparameters that work best on test data.
**This is bad.** We have no idea how the algorithm will perform on new data.

#### Idea 3:
Split data into **train**, **val**, and **test**; choose hyperparameters on val and evaluate on test.
Better!
Train on the training set with different choices of hyperparameters and evaluate each on the validation set. Pick the hyperparameters that perform best on the validation set, and run that configuration on the test set only once.
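The train/val/test procedure can be sketched as below. The synthetic two-class data, the 60/20/20 split, the candidate values of K, and the helper `knn_predict` are all assumptions made for illustration; they are not from the lecture.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Predict a label for each row of X_query by majority vote over the k nearest (L1) training rows."""
    predictions = np.empty(X_query.shape[0], dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        distances = np.sum(np.abs(X_train - x), axis=1)
        nearest_labels = y_train[np.argsort(distances)[:k]]
        predictions[i] = np.bincount(nearest_labels).argmax()  # ties go to the smaller label
    return predictions

# Synthetic two-class data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then split 60% / 20% / 20% into train / val / test.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]
X_test, y_test = X[160:], y[160:]

# Choose K using the validation set only.
candidate_ks = [1, 3, 5, 7]
val_accuracy = {k: float(np.mean(knn_predict(X_train, y_train, X_val, k) == y_val))
                for k in candidate_ks}
best_k = max(val_accuracy, key=val_accuracy.get)

# Touch the test set exactly once, with the chosen K.
test_accuracy = float(np.mean(knn_predict(X_train, y_train, X_test, best_k) == y_test))
print(best_k, val_accuracy[best_k], test_accuracy)
```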

#### Idea 4: Cross-validation
Split the data into **folds**, try each fold as validation and average the results.

This is useful for small datasets, but it is not used very often in deep learning.



Once you run K-fold cross-validation, you can plot the accuracy of your model as a function of the hyperparameters. This helps you choose your hyperparameters.
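A minimal sketch of cross-validation over K for a KNN classifier is shown below. The synthetic data, the use of 5 folds, and the helper names `knn_predict` and `cross_validate` are assumptions for illustration; a held-out test set would still be kept aside in practice.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote over the k nearest (L1) training rows for each query row."""
    predictions = np.empty(X_query.shape[0], dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        distances = np.sum(np.abs(X_train - x), axis=1)
        nearest_labels = y_train[np.argsort(distances)[:k]]
        predictions[i] = np.bincount(nearest_labels).argmax()
    return predictions

def cross_validate(X, y, k, num_folds=5):
    """Average validation accuracy when each fold takes a turn as the validation set."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = []
    for i in range(num_folds):
        # Train on every fold except fold i, validate on fold i.
        X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        predictions = knn_predict(X_tr, y_tr, X_folds[i], k)
        accuracies.append(np.mean(predictions == y_folds[i]))
    return float(np.mean(accuracies))

# Synthetic, shuffled two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
order = rng.permutation(len(X))
X, y = X[order], y[order]

candidate_ks = [1, 3, 5, 7]
cv_accuracy = {k: cross_validate(X, y, k) for k in candidate_ks}
best_k = max(cv_accuracy, key=cv_accuracy.get)
for k in candidate_ks:
    print(k, cv_accuracy[k])  # these (k, accuracy) pairs are what you would plot
```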

## k-Nearest Neighbour on images is **never used**
- Very slow at test time
- Distance metrics on pixels are not informative.

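## Curse of dimensionality
To densely cover a space with training points, the number of points you need grows exponentially with the number of dimensions (compare 1-D, 2-D, and 3-D). For images, which have thousands of dimensions, you'll never have enough.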
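The exponential growth can be checked with a few lines of arithmetic. The density of 4 points per axis and the helper name `points_needed` are assumptions for illustration.

```python
def points_needed(points_per_axis, num_dims):
    """Number of grid points needed to cover num_dims axes at a fixed density per axis."""
    return points_per_axis ** num_dims

for num_dims in (1, 2, 3):
    print(num_dims, points_needed(4, num_dims))  # 4, then 16, then 64

# A 32x32x3 image is a point in a 3072-dimensional space.
image_dims = 32 * 32 * 3
digits = len(str(points_needed(4, image_dims)))
print(f"{image_dims} dimensions: the count is a number with {digits} digits")
```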

## Summary of k-Nearest Neighbours
In **Image classification**, we start with a **training set** of images and labels, and must predict labels on the **test set**.

The **K-Nearest Neighbours** classifier predicts labels based on the nearest training examples.

Distance metric and K are **hyperparameters**.

Choose hyperparameters using the **validation set**, only run on the test set once at the very end!

## Linear Classification
Analogy: a neural network is like a tower of lego blocks.

One of the most basic building blocks is this linear classifier.

## Recall CIFAR-10
50000 training images, each 32x32x3

10000 test images.

### Linear classifier is a parametric approach

image --> $f(x, W)$ --> 10 numbers giving class scores

$W$ = parameters or weights

$$f(x, W) = Wx + b$$

10x1    =    (10x3072) (3072x1), since x = 32x32x3 = 3072x1

b is 10x1
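The shapes above can be checked with a short sketch. The image here is random noise standing in for a CIFAR-10 image, and the weights are small random numbers rather than trained values, so the predicted class is meaningless; only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

# Random stand-in for one 32x32x3 image (not real CIFAR-10 data).
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
x = image.reshape(-1, 1)                                   # flatten to 3072 x 1

W = rng.normal(0.0, 0.001, size=(num_classes, x.shape[0]))  # 10 x 3072, untrained
b = np.zeros((num_classes, 1))                              # 10 x 1

scores = W @ x + b                                          # f(x, W) = Wx + b, 10 x 1
predicted_class = int(np.argmax(scores))                    # index of the highest score

print(x.shape, W.shape, b.shape, scores.shape)
```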

![lc_example](../pics/lc_ex.jpg)

The trained weights of a linear classifier form one "template" per class. So, when a class contains images in different orientations, the classifier has to average them all into a single template.

This is where linear classifiers fall short, and more complex models can compensate for this.

You can also view a linear classifier as a set of lines (hyperplanes in the high-dimensional pixel space), where each line separates one class from all the other classes.


## Places where linear classifiers struggle
- classifying even vs odd
- multimodal situations, i.e. one class that can appear in different regions of space (e.g. 3 blobs of blue in a sea of red: it is hard to draw a single line that separates these regions)
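A tiny parity example illustrates the first bullet. The sketch below uses the four XOR points (label 1 when exactly one input is 1) and brute-forces a coarse grid of lines; the grid, the points, and the variable names are assumptions chosen for illustration. No line classifies all four points correctly, so the best accuracy found is 3 out of 4.

```python
import itertools
import numpy as np

# Four 2-D points labelled by parity: 1 if an odd number of inputs are 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

grid = np.linspace(-2.0, 2.0, 9)  # candidate values for w1, w2 and b
best_accuracy = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    # Linear decision rule: predict class 1 when w1*x1 + w2*x2 + b > 0.
    predictions = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best_accuracy = max(best_accuracy, float(np.mean(predictions == y)))

print(best_accuracy)  # 0.75
```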

## How can we tell if this W is good or bad?
Next lecture. 
Coming up:
- loss function
- optimization
- convnets