CS576 Assignment #1: Image Classification using Bag of Visual Words (BoVW) 
====
Primary TA : Jaehoon Yoo

TA's E-mail : wogns98@kaist.ac.kr, whieya@kaist.ac.kr

QnA Channel: https://join.slack.com/t/kaistcs576/shared_invite/zt-o3gqak0y-yj3NCb_SQFxVkqO0U6PWYw
## Instruction
- In this assignment, we will classify the images into five categories (aeroplane, backgrounds, car, horse, motorcycle, person) using Bag of Visual Word (BoVW) and Support Vector Machine (SVM).
 
- We will extract the SIFT descriptors from the images and construct a codebook. After that, we will encode the images to histogram features using codebook, and train the classifier using those features.

- As you follow the given steps, fill in the section marked ***Problem*** with the appropriate code. There are **7 problems** in total.
    - For **Problem 1 ~ Problem 4**, you will get full credits (10pt each) if you implement correctly.  
    - For **Problem 5 ~ Problem 7**, you **have to write a discussion about the results** as well as implementing the codes. Each problem takes 5pt for the correct implementation and 5 pt for proper discussion. In other words, you will get only 5pt without proper discussion even if you correctly implement the codes. To get full credit for discussion, please follow **Discussion Guidelines**.

## Discussion Guidelines
- You should write a discussion about **Problem 5 ~ Problem 7** on the **Discussion and Analysis** section. 
- Simply reporting the scores (e.g. classification accuracy) is not considered as a discussion.
- For each problem's discussion, you should explain and compare how each method improves the results. 

## Submission guidelines
- Your code and report will be all in Colab. Copy this example to your google drive and edit it to complete your assignment. 
- <font color="red"> You will get the full credit **only if** you complete the code **and** write a discussion of the results in the discussion section at the bottom of this page. </font>
- We should be able to reproduce your results using your code. Please double-check if your code runs without error and reproduces your results. Submissions failed to run or reproduce the results will get a substantial penalty. 
- <font color="red"> **DO NOT modify any of the skeleton codes when you submit.** Please write your codes only in the designated area. </font>
- As a proof that you've ran this code by yourself, **make sure your notebook contains the output of each code block.**

## Deliverables
- Download your Colab notebook, and submit it in a format: [StudentID].ipynb.
- Your assignment should be submitted through KLMS. All other submissions (e.g., via email) will not be considered as valid submissions. 

## Due date
- **23:59:59 April 7th.**
- Late submission is allowed until 23:59:59 April 9th.
- Late submission will be applied 20% penalty.



## Questions
- Please use the SLACK channel (https://join.slack.com/t/kaistcs576/shared_invite/zt-o3gqak0y-yj3NCb_SQFxVkqO0U6PWYw) as a main communication channel. 
When you post questions, please make it public so that all students can share the information. Please use the prefix "[Assignment 1]" in the subject for all questions regarding this assignment (e.g., [Assignment 1] Regarding the grading policy).



## Step 0: Set the enviroments
For this assignment, you need the special library for extracting features & training classifier (cyvlfeat & sklearn).
This step takes about 5~15 minutes.

###  0-1: Download cyvlfeat library & conda

In [1]:
# install conda on colab
!wget -c https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
!chmod +x Anaconda3-5.3.1-Linux-x86_64.sh
!bash ./Anaconda3-5.3.1-Linux-x86_64.sh -b -f -p /usr/local

# install cyvlfeat
# Reference : https://anaconda.org/menpo/cyvlfeat
# Update URL (2021/03/22)
!conda install -c menpo cyvlfeat python==3.7 -y
!conda install cython numpy scipy -y

import sys
sys.path.append('/cyvlfeat')
sys.path.append('/usr/local/lib/python3.7/site-packages/')

!git clone https://github.com/menpo/cyvlfeat.git /cyvlfeat
!cd /cyvlfeat && CFLAGS="-I$CONDA_PREFIX/include" LDFLAGS="-L$CONDA_PREFIX/lib" pip install -e ./

--2021-04-06 12:28:17--  https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh [following]
--2021-04-06 12:28:17--  https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 667976437 (637M) [application/x-sh]
Saving to: ‘Anaconda3-5.3.1-Linux-x86_64.sh’


2021-04-06 12:28:23 (109 MB/s) - ‘Anaconda3-5.3.1-Linux-x86_64.sh’ saved [667976437/667976437]

PREFIX=/usr/local
reinstalling: python-3.7.0-hc3d631a

###  0-2: Connect to your Google Drive.

It is required for loading the data.

Enter your authorization code to access your drive.


In [2]:
# mount drive https://datascience.stackexchange.com/questions/29480/uploading-images-folder-from-my-system-into-google-colab
import os
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


### 0-3: Import modules

In [3]:
# Import libraries
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import glob
import cyvlfeat
import time
import scipy
import multiprocessing

## Helper functions

In [4]:
def euclidean_dist(x, y):
    """
    :param x: [m, d]
    :param y: [n, d]
    :return:[m, n]
    """
    m, n = x.shape[0], y.shape[0]    
    eps = 1e-6 

    xx = np.tile(np.power(x, 2).sum(axis=1), (n,1)) #[n, m]
    xx = np.transpose(xx) # [m, n]
    yy = np.tile(np.power(y, 2).sum(axis=1), (m,1)) #[m, n]
    xy = np.matmul(x, np.transpose(y)) # [m, n]
    dist = np.sqrt(xx + yy - 2*xy + eps)

    return dist

def read_img(image_path):
    img = Image.open(image_path).convert('L')
    img = img.resize((480, 480))
    return np.float32(np.array(img)/255.)

def read_txt(file_path):
    with open(file_path, "r") as f:
        data = f.read()
    return data.split()
    
def dataset_setup(data_dir):
    train_file_list = []
    val_file_list = []

    for class_name in ['aeroplane','background','car','horse','motorbike','person']:
        train_txt_path = os.path.join(data_dir, class_name+'_train.txt')
        train_file_list.append(np.array(read_txt(train_txt_path)))
        val_txt_path = os.path.join(data_dir, class_name+'_val.txt')
        val_file_list.append(np.array(read_txt(val_txt_path)))

    train_file_list = np.unique(np.concatenate(train_file_list))
    val_file_list = np.unique(np.concatenate(val_file_list))

    f = open(os.path.join(data_dir, "train.txt"), 'w')
    for i in range(train_file_list.shape[0]):
        data = "%s\n" % train_file_list[i]
        f.write(data)
    f.close()

    f = open(os.path.join(data_dir, "val.txt"), 'w')
    for i in range(val_file_list.shape[0]):
        data = "%s\n" % val_file_list[i]
        f.write(data)
    f.close()

def load_train_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'train.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)

    return imgs, idxs

def load_val_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'val.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)
    
    return imgs, idxs

def get_labels(idxs, target_idxs):
    """
    Get the labels from file index(name).

    :param idxs(numpy.array): file index(name). shape:[num_images, ]
    :param target_idxs(numpy.array): target index(name). shape:[num_target,]
    :return(numpy.array): Target label(Binary label consisting of True and False). shape:[num_images,]
    """
    return np.isin(idxs, target_idxs)

def load_train_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'train.txt')
    train_idxs = np.array(read_txt(txt_path))
    return train_idxs

def load_val_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'val.txt')
    val_idxs = np.array(read_txt(txt_path))
    return val_idxs

## Step 1: Load the data

In [None]:
''' 
Set your data path for loading images & labels.
Example) CS_DATA_DIR = '/gdrive/My Drive/data'
'''
%env CS_DATA_DIR=/gdrive/My Drive/data
!mkdir -p "$CS_DATA_DIR"
!cd "$CS_DATA_DIR" && wget http://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz && tar -zxf practical-category-recognition-2013a-data-only.tar.gz

In [5]:
category = ['aeroplane', 'car', 'horse', 'motorbike', 'person'] # DON'T MODIFY THIS. 
%env CS_DATA_DIR=/gdrive/My Drive/data
data_dir = os.path.join(os.environ["CS_DATA_DIR"], "practical-category-recognition-2013a", "data")

env: CS_DATA_DIR=/gdrive/My Drive/data


## Step 2: Bag of Visual Words (BoVW) Construction

### 2-1. (**Problem 1**): SIFT descriptor extraction & Save the descriptors (10pt)

In [None]:
def SIFT_extraction(imgs):
    """
    Extract Local SIFT descriptors from images using cyvlfeat.sift.sift().
    Refer to https://github.com/menpo/cyvlfeat
    You should set the parameters of cyvlfeat.sift.sift() as bellow.
    1.compute_descriptor = True  2.float_descriptors = True

    :param train_imgs(numpy.array): Gray-scale images in Numpy array format. shape:[num_images, width_size, height_size]
    :return(numpy.array): SIFT descriptors. shape:[num_images, ], ndarray with object(descripotrs)
    """
    # YOUR CODE HERE  
    sift_feature = [(cyvlfeat.sift.sift(image=imgs[i], compute_descriptor=True, float_descriptors=True))[1] for i in range(imgs.shape[0])]
    result = np.array(sift_feature, dtype=object)
    del sift_feature
    return result  

### 2-2. (**Problem 2**): Codebook(Bag of Visual Words) construction (10pt)
In this step, you will construct the codebook using K-means clustering.

In [None]:
def get_codebook(des , k):
  """
  Construct the codebook with visual codewords using k-means clustering.
  In this step, you should use cyvlfeat.kmeans.kmeans().
  Refer to https://github.com/menpo/cyvlfeat

  :param des(numpy.array): Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param k(int): Number of visual words.
  :return(numpy.array): Bag of visual words shape:[k, 128]
  """
  # YOUR CODE HERE
  AllSIFTfeatures = np.concatenate(des, axis=0) 
  result = cyvlfeat.kmeans.kmeans(data=AllSIFTfeatures, num_centers=k)  
  del AllSIFTfeatures
  return result

### 2-3. (**Problem 3**): Encode images to histogram feature based on codewords (10pt)

In [None]:
def extract_features(des, codebook):
  """
  Construct the Bag-of-visual-Words histogram features for images using the codebook.
  HINT: Refer to helper functions.

  :param des(numpy.array): Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param codebook(numpy.array): Bag of visual words. shape:[k, 128]
  :return(numpy.array): Bag of visual words shape:[num_images, k]

  """
  # YOUR CODE HERE
  '''
    (m=num_des_of_each_img, n=k)
    dist = [num_des_of_each_img, k]
    reference: https://kr.mathworks.com/help/vision/ug/image-classification-with-bag-of-visual-words.html
  '''
  histogram_features = [((np.apply_along_axis(lambda x: np.where(x==np.min(x), 1, 0), 1, euclidean_dist(des[i], codebook))).sum(axis=0)) for i in range(des.shape[0])]
  result = np.array(histogram_features)
  del histogram_features
  return result

## Step 3. (**Problem 4**): Train the classifiers (10pt)
Train a classifier using the sklearn library (SVC) 

In [None]:
from sklearn.svm import SVC

In [None]:
def train_classifier(features, labels, svm_params):
  """
  Train the SVM classifier using sklearn.svm.svc()
  Refer to https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

  :param features(numpy.array): Historgram representation. shape:[num_images, dim_feature]
  :param labels(numpy.array): Target label(binary). shape:[num_images,]
  :return(sklearn.svm.SVC): Trained classifier
  """
  # Your code here 
  clf = SVC(**svm_params)
  clf.fit(features, labels)
  return clf

In [None]:
def Trainer(feat_params, svm_params):
    
    """
    Train the SVM classifier.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](function pointer): function for extrat local descriptoers. (e.g. SIFT_extraction, DenseSIFT_extraction, etc)
        ['num_codewords'](int):
        ['result_dir'](str): Diretory to save codebooks & results.
        
    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.
   
    :return(sklearn.svm.SVC): trained classifier
    """
    
    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']
    
    if not os.path.isdir(result_dir):
        os.mkdir(result_dir)
    
    print("Load the training data...")
    start_time = time.time()
    train_imgs, train_idxs = load_train_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    print("Extract the local descriptors...")
    start_time = time.time()
    train_des = extractor(train_imgs)
    np.save(os.path.join(result_dir, 'train_des.npy'), train_des)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    del train_imgs
    
    print("Construct the bag of visual words...")
    start_time = time.time()
    codebook = get_codebook(train_des, k)
    np.save(os.path.join(result_dir, 'codebook.npy'), codebook)
    print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the image features...")
    start_time = time.time()    
    train_features = extract_features(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_features.npy'), train_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del train_des, codebook
    
    print('Train the classifiers...')
    accuracy = 0
    models = {}
    
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)
        
        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels) 
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name ,train_accuracy))
        accuracy += train_accuracy
    
    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models

In [None]:
feat_params = {'extractor': SIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'sift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

- Below code will take about 30~70 minutes.

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
34.3899 seconds
Extract the local descriptors...
425.9539 seconds
Construct the bag of visual words...
4141.7708 seconds
Extract the image features...
101.7935 seconds
Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  0.9426
Average train accuracy: 0.9885


## Step 4: Test the classifier on validation set



In [None]:
def Test(feat_params, models):
    """
    Test the SVM classifier.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](function pointer): function for extrat local descriptoers. (e.g. SIFT_extraction, DenseSIFT_extraction, etc)
        ['num_codewords'](int):
        ['result_dir'](str): Diretory to load codebooks & save results.
        
    :param models(dict): dict of classifiers(sklearn.svm.SVC)
    """
    
    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']
    
    print("Load the validation data...")
    start_time = time.time()
    val_imgs, val_idxs = load_val_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    print("Extract the local descriptors...")
    start_time = time.time()
    val_des = extractor(val_imgs)
    np.save(os.path.join(result_dir, 'val_des.npy'), val_des)
    print("{:.4f} seconds".format(time.time()-start_time))
    
    
    del val_imgs
    codebook = np.load(os.path.join(result_dir, 'codebook.npy'))
    
    print("Extract the image features...")
    start_time = time.time()    
    val_features = extract_features(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_features.npy'), val_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)
        
        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name ,val_accuracy))
        accuracy += val_accuracy
    
    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
Test(feat_params ,models)

Load the validation data...
100.1162 seconds
Extract the local descriptors...
402.4537 seconds
Extract the image features...
104.6940 seconds
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9406
car Classifier validation accuracy:  0.7425
horse Classifier validation accuracy:  0.9002
motorbike Classifier validation accuracy:  0.9159
person Classifier validation accuracy:  0.5829
Average validation accuracy: 0.8164


## **Problem 5**: Implement Dense SIFT (10pt)
Modify the feature extractor using the dense SIFT and evaluate the performance.

In [None]:
def DenseSIFT_extraction(imgs):
  """
  Extract Dense SIFT descriptors from images using cyvlfeat.sift.dsift().
  Refer to https://github.com/menpo/cyvlfeat
  You should set the parameters of cyvlfeat.sift.dsift() as bellow.
    1.step = 12  2.float_descriptors = True

  :param train_imgs(numpy.array): Gray-scale images in Numpy array format. shape:[num_images, width_size, height_size]
  :return(numpy.array): Dense SIFT descriptors. shape:[num_images, num_des_of_each_img, 128]
  """
  # YOUR CODE HERE
  dsift_feature = [(cyvlfeat.sift.dsift(image=imgs[i], step=12, float_descriptors=True))[1] for i in range(imgs.shape[0])] 
  result = np.zeros((imgs.shape[0], dsift_feature[0].shape[0], dsift_feature[0].shape[1])) 
  for i in range(imgs.shape[0]): 
    result[i] = np.array(dsift_feature[i])
  del dsift_feature
  return result

In [None]:
feat_params = {'extractor': DenseSIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'dsift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
35.7257 seconds
Extract the local descriptors...
618.2104 seconds
Construct the bag of visual words...
8225.8245 seconds
Extract the image features...
203.8099 seconds
Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  0.9814
Average train accuracy: 0.9963


In [None]:
Test(feat_params ,models)

Load the validation data...
35.7720 seconds
Extract the local descriptors...
570.1793 seconds
Extract the image features...
207.0575 seconds
Test the classifiers...
aeroplane Classifier validation accuracy:  0.9475
car Classifier validation accuracy:  0.7882
horse Classifier validation accuracy:  0.9155
motorbike Classifier validation accuracy:  0.9074
person Classifier validation accuracy:  0.6023
Average validation accuracy: 0.8322


## **Problem 6**: Implement the Spatial Pyramid (10pt)
Modify the feature extractor using the spatial pyramid matching and evaluate the performance.


In [None]:
def SpatialPyramid(des, codebook):
  """
  Extract image representation with Spatial Pyramid Matching using your DenseSIFT descripotrs & codebook.

  :param des(numpy.array): DenseSIFT Descriptors.  shape:[num_images, num_des_of_each_img, 128]
  :param codebook(numpy.array): Bag of visual words. shape:[k, 128]

  :return(numpy.array): Image feature using SpatialPyramid [num_images, features_dim]
  """
  # YOUR CODE HERE 
      # https://darkpgmr.tistory.com/125 
      # https://www.youtube.com/watch?v=6MwuK2wHlOg 
  '''
  Therefore, "all implementations which includes the advantage of Spatial Pyramid will be fine." 
  Simply concatenating the features would be one of them.
  ''' 
  # Level0, Level1; (num_images,1600,128),(num_images,400,128) 
  idx_cut_level1 = int(des.shape[1]/4) # 1600/4 = 400
  h0 = extract_features(des, codebook) # (num_images, k)
  h1_0 = extract_features(des[:,:idx_cut_level1,:], codebook) # (num_images, k)
  h1_1 = extract_features(des[:,idx_cut_level1:idx_cut_level1*2,:], codebook) # (num_images, k)
  h1_2 = extract_features(des[:,idx_cut_level1*2:idx_cut_level1*3,:], codebook) # (num_images, k)
  h1_3 = extract_features(des[:,idx_cut_level1*3:idx_cut_level1*4,:], codebook) # (num_images, k)
  result = np.concatenate((h0, h1_0, h1_1, h1_2, h1_3), axis=1) # (num_images, 5k) 
  del h0, h1_0, h1_1, h1_2, h1_3

  return result

In [None]:
def SP_Trainer(des_path, codebook_path, result_dir, svm_params):
    
    """
    Train the SVM classifier using SpatialPyramid representations.

    :param des_path(str): path for loading training dataset DenseSIFT descriptors.
    :param codebook(str): path for loading codebook for DenseSIFT descriptors.
    :param result_dir(str): diretory to save features.
        
    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.
   
    :return(sklearn.svm.SVC): trained classifier
    """
    train_idxs = load_train_idxs(data_dir)
    train_des = np.load(des_path)
    codebook = np.load(codebook_path)
    train_features = SpatialPyramid(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_sp_features.npy'), train_features)

    del train_des, codebook
    
    print('Train the classifiers...')
    accuracy = 0
    models = {}
    
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)
        
        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels) 
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name ,train_accuracy))
        accuracy += train_accuracy
    
    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models

In [None]:
def SP_Test(des_path, codebook_path, result_dir, models):
    """
    Test the SVM classifier.

    :param des_path(str): path for loading validation dataset DenseSIFT descriptors.
    :param codebook(str): path for loading codebook for DenseSIFT descriptors.
    :param result_dir(str): diretory to save features.      
    :param models(dict): dict of classifiers(sklearn.svm.SVC)

    """ 
    val_idxs = load_val_idxs(data_dir)
    val_des = np.load(des_path)
    codebook = np.load(codebook_path)
    val_features = SpatialPyramid(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_sp_features.npy'), val_features)


    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in category:
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)
        
        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name ,val_accuracy))
        accuracy += val_accuracy

    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
#YOUR CODE HERE for training & testing with Spatial Pyramid 
  # Train
result_dir = os.path.join(data_dir,'dsift_1024') 
des_path = result_dir+'/train_des.npy'
codebook_path = result_dir+'/codebook.npy'
svm_params = {'C': 1, 'kernel': 'linear'}   
models = SP_Trainer(des_path, codebook_path, result_dir, svm_params)

  # Test
print('\n')
des_path = result_dir+'/val_des.npy'
SP_Test(des_path, codebook_path, result_dir, models)

Train the classifiers...
aeroplane Classifier train accuracy:  1.0000
car Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
person Classifier train accuracy:  1.0000
Average train accuracy: 1.0000


Test the classifiers...
aeroplane Classifier validation accuracy:  0.9604
car Classifier validation accuracy:  0.8294
horse Classifier validation accuracy:  0.9458
motorbike Classifier validation accuracy:  0.9430
person Classifier validation accuracy:  0.6342
Average validation accuracy: 0.8626


## **Problem 7**: Improve classification using non-linear SVM (10pt)
Modify the classifier using the non-linear SVM and evaluate the performance. 


In [None]:
# YOUR CODE HERE to improve classification using non-linear SVM
# YOUR CODE should include training & testing with non-linear SVM.
svm_params = {'C': 1, 'kernel': 'rbf'}

# Train
print('Train the classifiers...')
train_imgs, train_idxs = load_train_data(data_dir)
train_features = np.load(os.path.join(data_dir,'dsift_1024')+'/train_sp_features.npy')

accuracy = 0
models = {}

for class_name in category:
    target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
    target_labels = get_labels(train_idxs, target_idxs)
    
    models[class_name] = train_classifier(train_features, target_labels, svm_params)
    train_accuracy = models[class_name].score(train_features, target_labels) 
    print('{} Classifier train accuracy:  {:.4f}'.format(class_name ,train_accuracy))
    accuracy += train_accuracy

print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
del train_features, target_labels, target_idxs 

# Test
print('\nTest the classifiers...')
val_imgs, val_idxs = load_val_data(data_dir) 
val_features = np.load(os.path.join(data_dir,'dsift_1024')+'/val_sp_features.npy')

accuracy = 0

for class_name in category:
    target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
    target_labels = get_labels(val_idxs, target_idxs)
    
    val_accuracy = models[class_name].score(val_features, target_labels)
    print('{} Classifier validation accuracy:  {:.4f}'.format(class_name ,val_accuracy))
    accuracy += val_accuracy

print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))
del val_features, target_idxs, target_labels

Train the classifiers...
aeroplane Classifier train accuracy:  0.9786
car Classifier train accuracy:  0.8997
horse Classifier train accuracy:  0.9498
motorbike Classifier train accuracy:  0.9547
person Classifier train accuracy:  0.8936
Average train accuracy: 0.9353

Test the classifiers...
aeroplane Classifier validation accuracy:  0.9568
car Classifier validation accuracy:  0.8715
horse Classifier validation accuracy:  0.9406
motorbike Classifier validation accuracy:  0.9495
person Classifier validation accuracy:  0.6952
Average validation accuracy: 0.8827


# <font color="blue"> Discussion and Analysis </font>
## Discussion Guidelines
- You should write discussion about **Problem 5 ~ Problem 7**.
- Simply reporting the results (e.g. classification accuracy) is not considered as a discussion.
- For each problem's discussion, you should explain and compare how each method improves the results.


Please write discussions on the results above.

--------------------------------------------------------------------------------
The results (SIFT + Linear SVM) were below:

**Train Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|1.0000|
|car Classifier|1.0000|
|horse Classifier|1.0000|
|motorbike Classifier|1.0000|
|person Classifier|0.9426|
|**Average**|**0.9885**|

**Test Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|0.9406|
|car Classifier|0.7425|
|horse Classifier|0.9002|
|motorbike Classifier|0.9159|
|person Classifier|0.5829|
|**Average**|**0.8164**| 

--------------------------------------------------------------------------------
**Problem 5**: Dense SIFT + Linear SVM
----------------------------------------------------
reference : https://www.quora.com/Why-does-Dense-SIFT-perform-better-or-even-comparable-than-SIFT

In general, the SIFT algorithm calculates a descriptor at a specific location of the extracted key-point. However, Dense SIFT assumes feature points that are evenly concentrated at regular intervals in the entire area of an image and calculates a descriptor from it. As a result, Dense SIFT generally performs better than SIFT because it searches as many descriptors as possible in an image.

**Train Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|1.0000|
|car Classifier|1.0000|
|horse Classifier|1.0000|
|motorbike Classifier|1.0000|
|person Classifier|0.9814|
|**Average**|**0.9963**|

**Test Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|0.9475|
|car Classifier|0.7882|
|horse Classifier|0.9155|
|motorbike Classifier|0.9074|
|person Classifier|0.6023|
|**Average**|**0.8322**| 

It was confirmed that the  test accuracy was improved from 0.8164 (SIFT + Linear SVM) to 0.8322 (Dense SIFT + Linear SVM).

--------------------------------------------------------------------------------
**Problem 6**: Dense SIFT + Spatial Pyramid + Linear SVM 
----------------------------------------------------
The Bag of Visual Words (BoVW) method has a problem of losing the geometrical positional relationship between features because it basically expresses an image as a histogram of features obtained from the entire image area. In other words, even if the codeword is the same, if the position of the codeword is different, the histrogram appears differently. To make up for this shortcoming, Spatial Pyramid was used.
The Spatial Pyramid method divides an image into several steps, obtains a histogram for each segmented area, and compares them as a whole.

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

temp = (cyvlfeat.sift.dsift(image=np.random.rand(480,480), step=12,float_descriptors=True)[0]) 

print(temp[0]) # [4.5, 4.5]

print(temp[1]) # [4.5, 16.5]

print(temp[2]) # [4.5, 28.5]

-> **Row-major order**

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


When the histogram generated at level 0 (original) of the input image is h0, and the histogram generated at level 1 (4x1 division) is h1_0, h1_1, h1_2, h1_3 (in order), concatenation is performed to make a single feature.  Then, the feature for one image is (h0, h1_0, h1_1, h1_2, h1_3) whose size is (1, 5*k) (assuming the size of the codebook is k). 

To sum up, Each Histrogram in the divided area can hold spatial information.

The results (Dense SIFT + Spatial Pyramid + Linear SVM) were below: 

**Train Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|1.0000|
|car Classifier|1.0000|
|horse Classifier|1.0000|
|motorbike Classifier|1.0000|
|person Classifier|1.0000|
|**Average**|**1.0000**|

**Test Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|0.9604|
|car Classifier|0.8294|
|horse Classifier|0.9458|
|motorbike Classifier|0.9430|
|person Classifier|0.6342|
|**Average**|**0.8626**|

It was confirmed that the test accuracy was improved from 0.8322 (Dense SIFT + Linear SVM) to 0.8626 (Dense SIFT + Spatial Pyramid + Linear SVM).



--------------------------------------------------------------------------------
**Problem 7**: Dense SIFT + Spatial Pyramid + Non-Linear SVM
----------------------------------------------------
SVM is a technique to find a hyperplane with maximized margin while classifying two categories well. However, it is difficult to classify actual data linearly.

The basic idea of ​​Kernel-SVM is to map the data from the input space to a feature space that can be linearly separated. In other words, the kernel method (non-linear) maps the given data into a high-dimensional feature space. After being mapped in a high-dimensional space, there is a way to classify it into a linear shape that was not visible in the original dimension. 

There are many types of kernels such as Polynomial kernel, Sigmoid kernel, Gaussian RBF kernel, and I chose Gaussian RBF kernel among them. Although the parameters were used as defaults without empirically finding optimal parameter values ​​through hyperparamter tuning(GridSearch), the performance of Non-Linear SVM improved compared to Linear SVM.

The results (Dense SIFT + Spatial Pyramid + Non-Linear SVM) were below: 

**Train Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|0.9786|
|car Classifier|0.8997|
|horse Classifier|0.9498|
|motorbike Classifier|0.9547|
|person Classifier|0.8936|
|**Average**|**0.9353**|

**Test Accuracy**

|Category|Accuracy|
|:--------:|:-------:|
|aeroplane Classifier|0.9568|
|car Classifier|0.8715|
|horse Classifier|0.9406|
|motorbike Classifier|0.9495|
|person Classifier|0.6952|
|**Average**|**0.8827**| 

It was confirmed that the test accuracy was improved from 0.8626 (Dense SIFT + Spatial Pyramid + Linear SVM) to 0.8827 (Dense SIFT + Spatial Pyramid + Non-Linear SVM).