# [Scene Recognition with Bag-of-Words](https://www.cc.gatech.edu/~hays/compvision/proj4/)
For this project, you will need to report performance for three
combinations of features and classifiers. We suggest implementing them
in this order:
1. Tiny image features and nearest neighbor classifier
2. Bag of sift features and nearest neighbor classifier
3. Bag of sift features and linear SVM classifier

The starter code returns 'placeholder' values so that it does not crash
when run unmodified and you can get a preview of how results are
presented.

## Setup

In [2]:
# Set up parameters, image paths and category list
%matplotlib notebook
%load_ext autoreload
%autoreload 2

import cv2
import numpy as np
import os.path as osp
import pickle
from random import shuffle
import matplotlib.pyplot as plt
from utils import *
import student_code as sc


# This is the list of categories / directories to use. The categories are
# somewhat sorted by similarity so that the confusion matrix looks more
# structured (indoor and then urban and then rural).
categories = ['Kitchen', 'Store', 'Bedroom', 'LivingRoom', 'Office', 'Industrial', 'Suburb',
              'InsideCity', 'TallBuilding', 'Street', 'Highway', 'OpenCountry', 'Coast',
              'Mountain', 'Forest']
# This list of shortened category names is used later for visualization
abbr_categories = ['Kit', 'Sto', 'Bed', 'Liv', 'Off', 'Ind', 'Sub',
                   'Cty', 'Bld', 'St', 'HW', 'OC', 'Cst',
                   'Mnt', 'For']

# Number of training examples per category to use. Max is 100. For
# simplicity, we assume this is the number of test cases per category, as
# well.
num_train_per_cat = 100

# This function returns lists containing the file path for each train
# and test image, as well as lists with the label of each train and
# test image. By default all four of these lists will have 1500 elements
# where each element is a string.
data_path = osp.join('..', 'data')
train_image_paths, test_image_paths, train_labels, test_labels = get_image_paths(
    data_path, categories, num_train_per_cat)

## Section 1: Tiny Image features with Nearest Neighbor classifier

### Section 1a: Represent each image with the Tiny Image feature

Each feature-construction function should return an N x d numpy array, where N is the number of paths passed to the function and d is the dimensionality of each image representation. See the starter code for each function for more details.
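A tiny image feature shrinks each image to a small fixed size and uses the raw pixels as a vector. The sketch below assumes 16 x 16 pixels, made zero-mean and unit-length, and works on grayscale arrays that are already loaded rather than on paths. `tiny_image_features` is a hypothetical helper, not the starter's `sc.get_tiny_images`, and its nearest-neighbor sampling stands in for a proper resize such as `cv2.resize`.

```python
import numpy as np

def tiny_image_features(images, size=16):
    """Hypothetical helper: map a list of 2-D grayscale arrays to an N x (size*size) array."""
    feats = np.zeros((len(images), size * size), dtype=np.float32)
    for n, img in enumerate(images):
        img = np.asarray(img, dtype=np.float32)
        h, w = img.shape
        # Nearest-neighbor downsampling: keep `size` evenly spaced rows and columns
        rows = np.linspace(0, h - 1, size).round().astype(int)
        cols = np.linspace(0, w - 1, size).round().astype(int)
        tiny = img[np.ix_(rows, cols)].ravel()
        # Zero mean and unit length reduce sensitivity to brightness and contrast
        tiny -= tiny.mean()
        norm = np.linalg.norm(tiny)
        feats[n] = tiny / norm if norm > 0 else tiny
    return feats
```

Because every vector has unit length, the distances compared later depend on image structure rather than overall intensity.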

In [2]:
print('Using the TINY IMAGE representation for images')

train_image_feats = sc.get_tiny_images(train_image_paths)
test_image_feats = sc.get_tiny_images(test_image_paths)

Using the TINY IMAGE representation for images


### Section 1b: Classify each test image by training and using the Nearest Neighbor classifier

Each classification function returns an N-element list, where N is the number of test cases and each entry is a string giving the predicted category for that test image. Each entry in `predicted_categories` must be one of the 15 strings in `categories` (the same strings used in `train_labels` and `test_labels`). See the starter code for each function for more details.
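A minimal sketch of a nearest neighbor classifier that meets this contract, assuming Euclidean distance and a majority vote among the `k` closest training images (`k=1` gives plain 1-NN). `knn_classify` is a hypothetical name, not the starter's `sc.nearest_neighbor_classify`.

```python
import numpy as np
from collections import Counter

def knn_classify(train_feats, train_labels, test_feats, k=1):
    """Hypothetical k-NN classifier returning one predicted label string per test row."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Squared Euclidean distances (M x N) via ||a||^2 - 2 a.b + ||b||^2
    d2 = (np.sum(test_feats ** 2, axis=1)[:, None]
          - 2 * test_feats @ train_feats.T
          + np.sum(train_feats ** 2, axis=1)[None, :])
    # Indices of the k closest training images, nearest first
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nearest:
        votes = Counter(train_labels[j] for j in row)
        # most_common orders ties by first occurrence, i.e. favors the closer neighbor
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Raising `k` smooths out single mislabeled or atypical training images, which is consistent with the few-point gain from K-NN mentioned below.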

In [3]:
print('Using NEAREST NEIGHBOR classifier to predict test set categories')

predicted_categories = sc.nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)

Using NEAREST NEIGHBOR classifier to predict test set categories


### Section 1c: Build a confusion matrix and score the recognition system

(You do not need to code anything in this section.)

If we wanted to evaluate our recognition method properly, we would train
and test on many random splits of the data. You are not required to do so
for this project.

The `show_results` function below creates a confusion matrix and various image
thumbnails each time it is called. Use the confusion matrix to interpret
your classifier's performance: where is it making mistakes, and are the
confusions reasonable?

Interpreting your performance with 100 training examples per category:
- accuracy  =   0 -> Your code is broken (probably not the classifier's fault! A classifier would have to be amazing to perform this badly).
- accuracy ~= .07 -> Your performance is chance. Something is broken or you ran the starter code unchanged.
- accuracy ~= .20 -> Rough performance with tiny images and nearest neighbor classifier. Performance goes up a few percentage points with K-NN instead of 1-NN.
- accuracy ~= .20 -> Rough performance with tiny images and linear SVM classifier. The linear classifiers will have a lot of trouble trying to separate the classes and may be unstable (e.g. everything classified to one category).
- accuracy ~= .50 -> Rough performance with bag of SIFT and nearest neighbor classifier. Can reach .60 with K-NN and different distance metrics.
- accuracy ~= .60 -> You've gotten things roughly correct with bag of SIFT and a linear SVM classifier.
- accuracy >= .70 -> You've also tuned your parameters well. E.g. number of clusters, SVM regularization, number of patches sampled when building vocabulary, size and step for dense SIFT features.
- accuracy >= .80 -> You've added in spatial information somehow or you've added additional, complementary image features. This represents the state of the art in Lazebnik et al. 2006.
- accuracy >= .85 -> You've done extremely well. This is the state of the art in the 2010 SUN database paper from fusing many features. Don't trust this number unless you actually measure many random splits.
- accuracy >= .90 -> You used modern deep features trained on much larger image databases.
- accuracy >= .96 -> You can beat a human at this task. This isn't a realistic number. Some accuracy calculation is broken or your classifier is cheating and seeing the test labels.
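The accuracy figures above are read here as the mean of the diagonal of the row-normalized confusion matrix, i.e. the average per-category recall. With an equal number of test images per category this equals plain accuracy, and random guessing over 15 categories gives 1/15 ≈ .07. A small numpy sketch of that computation; the helper name is hypothetical and this is not necessarily how `show_results` computes it.

```python
import numpy as np

def mean_per_class_accuracy(true_labels, pred_labels, categories):
    """Average per-category recall; confusion-matrix rows are true labels."""
    index = {c: i for i, c in enumerate(categories)}
    cm = np.zeros((len(categories), len(categories)), dtype=np.int64)
    for t, p in zip(true_labels, pred_labels):
        cm[index[t], index[p]] += 1
    # Row i divided by its sum is the distribution of predictions for class i
    row_sums = cm.sum(axis=1)
    present = row_sums > 0  # skip categories that never occur in true_labels
    per_class = np.diag(cm)[present] / row_sums[present]
    return float(per_class.mean())
```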

In [6]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

## Section 2: Bag of SIFT features with Nearest Neighbor classifier

### Section 2a: Represent each image with the Bag of SIFT feature

To create a new vocabulary, make sure `vocab_filename` is different from the filename of the old vocabulary, or delete the old vocabulary file.
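`sc.build_vocabulary` is expected to cluster SIFT descriptors sampled from the training images into `vocab_size` visual words. The sketch below shows that clustering step as plain Lloyd's k-means in numpy over an (M, 128) array of descriptors you have already extracted. The helper name, the random initialization, and the fixed iteration count are assumptions; a real implementation would more likely call a library k-means routine.

```python
import numpy as np

def kmeans_vocabulary(descriptors, vocab_size, n_iters=20, seed=0):
    """Hypothetical helper: cluster (M, D) descriptors into (vocab_size, D) centers."""
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=np.float64)
    # Initialize the centers with distinct, randomly chosen descriptors
    centers = X[rng.choice(len(X), size=vocab_size, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: nearest center by squared Euclidean distance
        d2 = (np.sum(X ** 2, axis=1)[:, None] - 2 * X @ centers.T
              + np.sum(centers ** 2, axis=1)[None, :])
        assign = np.argmin(d2, axis=1)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center
        for k in range(vocab_size):
            members = X[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers
```

Each row of the result is one visual word; the number of sampled descriptors mostly trades vocabulary quality against clustering time.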

In [7]:
print('Using the BAG-OF-SIFT representation for images')

vocab_filename = 'vocab.pkl'
if not osp.isfile(vocab_filename):
    # Construct the vocabulary
    print('No existing visual word vocabulary found. Computing one from training images')
    vocab_size = 200  # Larger values will work better (to a point) but be slower to compute
    vocab = sc.build_vocabulary(train_image_paths, vocab_size)
    with open(vocab_filename, 'wb') as f:
        pickle.dump(vocab, f)
        print('{:s} saved'.format(vocab_filename))

train_image_feats = sc.get_bags_of_sifts(train_image_paths, vocab_filename)
test_image_feats = sc.get_bags_of_sifts(test_image_paths, vocab_filename)

Using the BAG-OF-SIFT representation for images
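After the vocabulary is built, each image becomes a histogram over visual words: every descriptor votes for its nearest vocabulary entry. A sketch for a single image, assuming its descriptors are already extracted and `vocab` is a (vocab_size, 128) array like the one pickled above. The helper name and the L1 normalization are assumptions; normalizing keeps images that yield different numbers of descriptors comparable.

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocab):
    """Hypothetical helper: L1-normalized histogram of nearest visual words."""
    X = np.asarray(descriptors, dtype=np.float64)
    V = np.asarray(vocab, dtype=np.float64)
    # Squared distance from every descriptor to every visual word
    d2 = (np.sum(X ** 2, axis=1)[:, None] - 2 * X @ V.T
          + np.sum(V ** 2, axis=1)[None, :])
    words = np.argmin(d2, axis=1)
    # Count votes per word, including words that received none
    hist = np.bincount(words, minlength=len(V)).astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist
```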


### Section 2b: Classify each test image by training and using the Nearest Neighbor classifier

In [8]:
print('Using NEAREST NEIGHBOR classifier to predict test set categories')
predicted_categories = sc.nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)

Using NEAREST NEIGHBOR classifier to predict test set categories


### Section 2c: Build a confusion matrix and score the recognition system

In [9]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

## Section 3: Bag of SIFT features and SVM classifier
We will reuse the bag of SIFT features from Section 2a.

The difference is that this time we will classify them with a support vector machine (SVM).
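A linear SVM is a binary classifier, so a 15-way decision needs a scheme on top of it. The tuning code in Section 3c uses one-vs-all: one SVM per category (that category vs. everything else), with each test image assigned to the category whose SVM returns the largest decision value. A minimal scikit-learn sketch of the same idea; `one_vs_all_svm_classify` and the default `C=1.0` are placeholders, not the starter's `sc.svm_classify`.

```python
import numpy as np
from sklearn.svm import LinearSVC

def one_vs_all_svm_classify(train_feats, train_labels, test_feats, categories, C=1.0):
    """Hypothetical one-vs-all linear SVM returning one label string per test row."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    train_labels = np.asarray(train_labels)
    scores = np.zeros((len(test_feats), len(categories)))
    for i, category in enumerate(categories):
        # Binary problem: 1 for this category, 0 for all the others
        y = (train_labels == category).astype(int)
        svm = LinearSVC(C=C)
        svm.fit(train_feats, y)
        # Signed distance to the hyperplane; larger means "more like this category"
        scores[:, i] = svm.decision_function(test_feats)
    return [categories[j] for j in np.argmax(scores, axis=1)]
```

Comparing raw decision values across independently trained SVMs is a heuristic, which is one reason the regularization parameter `C` is worth tuning.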

### Section 3a: Classify each test image by training and using the SVM classifiers

In [10]:
print('Using SVM classifier to predict test set categories')
predicted_categories = sc.svm_classify(train_image_feats, train_labels, test_image_feats)

Using SVM classifier to predict test set categories


### Section 3b: Build a confusion matrix and score the recognition system

In [11]:
show_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
             predicted_categories)

### Section 3c: Parameter tuning and extra credit

In [None]:
from sklearn.metrics import confusion_matrix

def svm_classify_tuning(train_image_feats, train_labels,
                        test_image_feats, svms, categories=categories):
    """Estimate validation accuracy of one-vs-all SVMs over K random splits,
    then refit on the full training set and predict the test labels.

    `svms` maps each category name to an sklearn classifier.
    """
    train_image_feats = np.asarray(train_image_feats, dtype=np.float32)
    test_image_feats = np.asarray(test_image_feats, dtype=np.float32)
    train_labels = np.asarray(train_labels)
    N = train_image_feats.shape[0]

    def fit(feats, labels):
        # One binary problem per category: 1 for this category, 0 otherwise
        for c in categories:
            svms[c].fit(feats, (labels == c).astype(int))

    def predict(feats):
        # Stack per-category decision values and pick the highest score
        scores = np.column_stack([svms[c].decision_function(feats) for c in categories])
        return [categories[j] for j in np.argmax(scores, axis=1)]

    # Validation: K random splits, each holding out `valid_size` training images
    K, valid_size = 10, 100
    accs = []
    for _ in range(K):
        # Sample without replacement so no image is held out twice
        valid_idx = np.random.choice(N, size=valid_size, replace=False)
        is_valid = np.zeros(N, dtype=bool)
        is_valid[valid_idx] = True
        fit(train_image_feats[~is_valid], train_labels[~is_valid])
        valid_pred = predict(train_image_feats[is_valid])

        # Mean per-category accuracy, skipping categories absent from this split
        cm = confusion_matrix(train_labels[is_valid], valid_pred, labels=categories)
        row_sums = cm.sum(axis=1)
        present = row_sums > 0
        accs.append(np.mean(np.diag(cm)[present] / row_sums[present]))

    print('(parameter tuning) The current model gives accuracy: {0:.4f} and std: {1:.4f}'
          .format(np.mean(accs), np.std(accs)))

    # Refit on the full training set before predicting the test set
    fit(train_image_feats, train_labels)
    return predict(test_image_feats), svms

In [None]:
from sklearn.base import clone
from sklearn.metrics import confusion_matrix

def cross_validation(train_image_feats, train_labels,
                     test_image_feats, test_labels, svms,
                     categories=categories, size=100, K=10):
    """Repeated random splits: in each of K iterations, train fresh copies of the
    SVMs on `size` random training images, score them on `size` random test
    images, and report the mean and standard deviation of the accuracy."""
    train_image_feats = np.asarray(train_image_feats, dtype=np.float32)
    test_image_feats = np.asarray(test_image_feats, dtype=np.float32)
    assert train_image_feats.shape[1] == test_image_feats.shape[1]
    train_labels = np.asarray(train_labels)
    test_labels = np.asarray(test_labels)

    accs = []
    for _ in range(K):
        # Redraw until every category appears in the training sample,
        # since each binary SVM needs at least one positive example
        while True:
            tr = np.random.choice(len(train_labels), size=size, replace=False)
            if set(train_labels[tr]) == set(categories):
                break
        te = np.random.choice(len(test_labels), size=size, replace=False)

        scores = np.zeros((size, len(categories)))
        for i, category in enumerate(categories):
            # clone() copies the hyperparameters (e.g. C) into an unfitted estimator
            svm = clone(svms[category])
            svm.fit(train_image_feats[tr], (train_labels[tr] == category).astype(int))
            scores[:, i] = svm.decision_function(test_image_feats[te])
        pred = [categories[j] for j in np.argmax(scores, axis=1)]

        # Mean per-category accuracy, skipping categories absent from the test sample
        cm = confusion_matrix(test_labels[te], pred, labels=categories)
        row_sums = cm.sum(axis=1)
        present = row_sums > 0
        accs.append(np.mean(np.diag(cm)[present] / row_sums[present]))

    cross_acc, std = float(np.mean(accs)), float(np.std(accs))
    print('(cross validation) The current model gives accuracy: {0:.4f} and std: {1:.4f}'
          .format(cross_acc, std))
    return cross_acc, std

In [1]:
from sklearn.svm import LinearSVC, SVC

# C is the penalty parameter of the error term
C = [2, 3, 4, 5]
# C = [0.001, 0.01, 0.1, 1, 10, 50]
for c in C:
    # One binary classifier per category, all sharing the same C
    svms = {cat: LinearSVC(random_state=0, tol=1e-3, loss='hinge', C=c, penalty='l2')
            for cat in categories}
    # svms = {cat: SVC(gamma=2, C=c, kernel='rbf') for cat in categories}  # default kernel is 'rbf'
    print("current C is:", c)
    predicted_categories, curr_svm = svm_classify_tuning(train_image_feats, train_labels,
                                                         test_image_feats, svms=svms,
                                                         categories=categories)
    cross_validation(train_image_feats, train_labels,
                     test_image_feats, test_labels, svms=curr_svm)
    print("=" * 78)

In [6]:
# Accuracy (%) measured for each feature/classifier combination
combos = ['Tiny img_KNN', 'BagOfSIFT_KNN', 'BagOfSIFT_SVM', 'SIFT10000_SVM', 'BagOfSIFT_kernelSVM']
result = [22, 54, 65, 68.33, 69]
plt.figure(figsize=(8, 6))
plt.title("Accuracy Improvements")
plt.ylabel("Prediction accuracy (%)", fontsize=10)
plt.plot(combos, result, '-*')
plt.xticks(rotation=15)
plt.show()

In [None]:
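# Explicit submodule imports so vlfeat.sift / vlfeat.gmm / vlfeat.fisher resolve
import cyvlfeat.fisher
import cyvlfeat.gmm
import cyvlfeat.sift
import cyvlfeat as vlfeat

def get_bags_of_sifts_fisher(image_paths, vocab_size, gmm=None,
                             samples_per_image=200, step_size=10):
    """Fisher-vector encoding of dense SIFT descriptors.

    If `gmm` is None, a GMM with `vocab_size` components is fit on descriptors
    sampled from `image_paths` (do this for the training images). Pass the
    returned `gmm` back in when encoding the test images so both sets are
    encoded against the same model. Returns (N x d feature array, gmm).
    load_image_gray is assumed to come from `from utils import *` in the setup cell.
    """
    if gmm is None:
        # Sample up to `samples_per_image` dense SIFT descriptors per image
        samples = []
        for path in image_paths:
            _, descriptors = vlfeat.sift.dsift(load_image_gray(path), fast=True, step=step_size)
            index = np.random.permutation(descriptors.shape[0])[:samples_per_image]
            samples.append(descriptors[index].astype(np.float64))
        sift_feats = np.vstack(samples)

        # Fit the GMM; means/covariances are transposed to (dim x vocab_size) for fisher()
        means, covars, priors, _, _ = vlfeat.gmm.gmm(sift_feats, vocab_size)
        gmm = (means.T.astype(np.float32), covars.T.astype(np.float32),
               priors.ravel().astype(np.float32))

    means, covars, priors = gmm
    feats = []
    for path in image_paths:
        _, descriptors = vlfeat.sift.dsift(load_image_gray(path), fast=True, step=step_size)
        # fisher() takes one descriptor per column
        descriptors = descriptors.T.astype(np.float32)
        feats.append(vlfeat.fisher.fisher(descriptors, means, covars, priors,
                                          normalized=True, fast=True))
    return np.array(feats), gmm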
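For reference, here is what a Fisher vector encodes, as a numpy sketch for a GMM with diagonal covariances: for each component, the posterior-weighted average of the standardized residuals (the gradient with respect to the mean) and of their squares minus one (the gradient with respect to the standard deviation), concatenated into a 2·K·D vector. It follows the commonly used improved Fisher vector formulation with signed square-root and L2 normalization; it is an illustration under those assumptions, VLFeat's `fisher` may differ in scaling details and options, and `fisher_vector` is a hypothetical name.

```python
import numpy as np

def fisher_vector(descriptors, means, variances, priors, improved=True):
    """Hypothetical helper. descriptors: (N, D); means, variances: (K, D); priors: (K,)."""
    x = np.asarray(descriptors, dtype=np.float64)
    N = x.shape[0]
    sigma = np.sqrt(variances)
    # Standardized residuals for every descriptor/component pair: (N, K, D).
    # Memory grows as N*K*D, which is acceptable for a sketch.
    z = (x[:, None, :] - means[None, :, :]) / sigma[None, :, :]
    # Log-likelihood under each diagonal Gaussian plus log prior: (N, K)
    log_p = -0.5 * (np.sum(z ** 2, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_p += np.log(priors)[None, :]
    # Posteriors (soft assignments) via a numerically stable softmax over K
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients with respect to the means (u) and standard deviations (v): (K, D)
    u = np.einsum('nk,nkd->kd', gamma, z) / (N * np.sqrt(priors)[:, None])
    v = np.einsum('nk,nkd->kd', gamma, z ** 2 - 1) / (N * np.sqrt(2 * priors)[:, None])
    fv = np.concatenate([u.ravel(), v.ravel()])
    if improved:
        fv = np.sign(fv) * np.sqrt(np.abs(fv))  # signed square-root (power) normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

With `vocab_size=200` and 128-dimensional SIFT, each encoding has 2 × 200 × 128 = 51,200 dimensions, which is one reason Fisher vectors are usually paired with linear classifiers.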