# Practice With Feature Extraction

In this exercise, you will load the project dataset into Colab and extract HOG (histogram of oriented gradients) features.

First we will extract and visualize the features for a single image. Then we will average the features from multiple images in each class to see whether any patterns emerge across classes.

Working in pairs, complete each of the TO DOs listed in the notebook below. If you have time, go back and try the optional variations. If you get stuck, raise your hand and we will come around to help you.

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt
from skimage.feature import hog
from random import sample
from glob import glob

# Part 1 -- Extract HOG Features From A Single Image

### **TO DO**: Load the Dataset

Use the `drive.mount` function to mount your Google Drive, then use the shell command `!unzip "Path/to/zip/file.zip"` to unzip the dataset into your current working directory.

In [None]:
drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
!unzip "/content/drive/MyDrive/CS 369 Shared Folder/Project Resources/Intel Training Dataset.zip"

Streaming output truncated to the last 5000 lines.
  inflating: Intel Training Dataset/mountain/3872.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._3872.jpg  
  inflating: Intel Training Dataset/mountain/3641.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._3641.jpg  
  inflating: Intel Training Dataset/mountain/15160.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._15160.jpg  
  inflating: Intel Training Dataset/mountain/13277.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._13277.jpg  
  inflating: Intel Training Dataset/mountain/14518.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._14518.jpg  
  inflating: Intel Training Dataset/mountain/15606.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._15606.jpg  
  inflating: Intel Training Dataset/mountain/11460.jpg  
  inflating: __MACOSX/Intel Training Dataset/mountain/._11460.jpg  
  inflating: Intel Training Dataset/mountain/18322.jpg  
  ...

In [None]:
# If you completed the previous step correctly, this cell should print out
# the list of class names from the dataset.

# Path to Dataset
root_path = './Intel Training Dataset/'

# split into subfolders based on class label
subfolders = sorted(glob(root_path + '*'))
label_names = [p.split('/')[-1] for p in subfolders]
print(label_names)

['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']


### **TO DO**: Load and Visualize One Image

Use `plt.imread` and `plt.imshow` to load and visualize the first image (index `0`) in the first subfolder (index `0`) of the dataset.

(Hint: the `subfolders` variable contains a list of paths, one for each subfolder in the dataset. You can use `glob` and `sorted` to get a sorted list of all the filenames in a particular folder.)
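
One possible way to approach this step is sketched below. The helper names `list_images` and `show_first_image` are made up for illustration, and the sketch assumes the images are `.jpg` files, as in the unzip output above.

```python
from glob import glob

import matplotlib.pyplot as plt


def list_images(folder):
    """Return a sorted list of the .jpg file paths inside `folder`."""
    return sorted(glob(folder + '/*.jpg'))


def show_first_image(subfolders):
    """Load and display image 0 of subfolder 0, then return the image array."""
    image_path = list_images(subfolders[0])[0]
    image = plt.imread(image_path)  # NumPy array shaped (rows, cols, channels)
    plt.imshow(image)
    plt.title(image_path)
    plt.axis('off')
    plt.show()
    return image
```

In the notebook you could then call `image = show_first_image(subfolders)`. Sorting the file list matters because `glob` does not guarantee any particular order.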

### **TO DO**: Extract a HOG feature vector from the image

Use the `hog` function (imported above) to generate a HOG feature vector for this image. To start, you can use the following parameters:
```
orientations = 4
pixels_per_cell = (30, 30)
cells_per_block = (1, 1)
visualize = True
channel_axis = -1
```
NOTE: When `visualize=True`, the `hog` function returns _two_ values: the feature vector and an image that visualizes the HOG features.
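
A minimal sketch of the call, using the starter parameters above. `extract_hog` and `hog_feature_length` are hypothetical helper names. The second helper only works out how long the feature vector should be, and the 150 x 150 image size in the example is an assumption about this dataset, so check `image.shape` on your own image.

```python
def extract_hog(image):
    """Return (feature_vector, hog_image) for an RGB image array."""
    # Imported here so the arithmetic helper below also runs on a machine
    # without scikit-image; in the notebook, hog is already imported above.
    from skimage.feature import hog
    features, hog_image = hog(
        image,
        orientations=4,
        pixels_per_cell=(30, 30),
        cells_per_block=(1, 1),
        visualize=True,   # this is why two values come back
        channel_axis=-1,  # the last axis holds the color channels
    )
    return features, hog_image


def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Expected length of the flattened HOG feature vector."""
    rows, cols = image_shape[:2]
    cells_r = rows // pixels_per_cell[0]
    cells_c = cols // pixels_per_cell[1]
    # Blocks slide over the grid of cells one cell at a time.
    blocks_r = cells_r - cells_per_block[0] + 1
    blocks_c = cells_c - cells_per_block[1] + 1
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_r * blocks_c * values_per_block


# Assumed 150 x 150 image: 5 x 5 cells, 4 orientations each -> 100 values
expected_length = hog_feature_length((150, 150), 4, (30, 30), (1, 1))
print(expected_length)
```

If `features.shape` does not match the number you expect, the image is probably a different size than you assumed.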

### **TO DO**: Examine the HOG features

Print the shape of the feature vector, then use `plt.imshow` to display the original image and the HOG image side by side.
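
The side-by-side figure could look something like the sketch below. `show_side_by_side` is a hypothetical helper, and the random arrays are stand-ins so the code runs without the dataset; in the notebook you would pass your own `image` and HOG image instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_side_by_side(image, hog_image):
    """Plot the original image next to its HOG visualization."""
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))
    axes[0].imshow(image)
    axes[0].set_title('Original')
    axes[1].imshow(hog_image, cmap='gray')  # the HOG image has one channel
    axes[1].set_title('HOG')
    for ax in axes:
        ax.axis('off')
    return fig, axes


# Stand-in arrays (the shapes are assumptions, not taken from the dataset)
rng = np.random.default_rng(0)
image = rng.random((150, 150, 3))
hog_image = rng.random((150, 150))

fig, axes = show_side_by_side(image, hog_image)
```

In Colab the figure is drawn when the cell finishes; elsewhere, call `plt.show()` afterwards. For the optional variation below, the same pattern extends to `ncols=3`.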

### _Optional Variation_:

Try using different parameters for your HOG feature extraction. For example, increase the number of orientations and decrease the number of pixels per cell.

You can add a third subplot to the figure above to see the original image and the two HOG variations side by side.

# Part 2 -- Compare Average HOG Features Across Classes

### **TO DO**: Average HOG features for 10 randomly selected images per class

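Loop over the subfolders and use `glob` to get a list of file paths for all the images in each one. Then use the `sample` function (imported above) to select a random subset of 10 images from each folder. Extract a HOG feature vector from each selected image, average the vectors within each class, and store the results in `all_data`, with one row per class (the visualization cell at the end of this part expects this variable).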
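
The loop could be organized roughly as follows. This is only a sketch: `average_features_per_class` and its `extract_features` argument are hypothetical names, and it assumes the images are `.jpg` files that all produce feature vectors of the same length.

```python
from glob import glob
from random import sample

import numpy as np


def average_features_per_class(subfolders, extract_features, n_samples=10):
    """Return an array with one mean feature vector (row) per subfolder.

    `extract_features` is any function that maps a file path to a 1-D
    feature vector. In the notebook it could be, for example:
        lambda p: hog(plt.imread(p), orientations=4,
                      pixels_per_cell=(30, 30), cells_per_block=(1, 1),
                      channel_axis=-1)
    """
    class_means = []
    for folder in subfolders:
        filepaths = sorted(glob(folder + '/*.jpg'))
        # sample() raises ValueError if asked for more items than exist,
        # so cap the request at the number of files available.
        chosen = sample(filepaths, min(n_samples, len(filepaths)))
        features = np.array([extract_features(p) for p in chosen])
        class_means.append(features.mean(axis=0))  # average over images
    return np.array(class_means)
```

With this helper, `all_data = average_features_per_class(subfolders, extract_features)` gives one row per class, in the same order as `label_names`. If some images have a different size, their feature vectors will have a different length and the averaging step will fail, so you may need to skip or resize those images.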



### _Optional Variation_:

Try re-running your visualization with a different random subset. Are the results consistent? How do the results change if you increase the size of the subset (e.g. from 10 to 20)?

In [None]:
# Visualize the average feature vector for each class.
# This cell expects two variables to exist already:
#   label_names: list of class names
#   all_data: one row per class, holding that class's mean HOG feature vector

# Lay the classes out in a 2-row grid (assumes an even number of classes);
# class i goes in row i%2, column i//2.
fig, ax = plt.subplots(nrows=2, ncols=len(label_names)//2, figsize=(8,7))
for i in range(len(label_names)):
  ax[i%2, i//2].plot(all_data[i])
  ax[i%2, i//2].set_ylim([0,1])  # same y-range in every panel for comparison
  ax[i%2, i//2].set_title(label_names[i])
plt.show()