# Action Recognition @ UCF101  
**Due date: 11:59 pm on Dec. 11, 2018 (Tuesday)**

## Description
---
In this homework, you will be doing action recognition using Recurrent Neural Network (RNN), (Long-Short Term Memory) LSTM in particular. You will be given a dataset called UCF101, which consists of 101 different actions/classes and for each action, there will be 145 samples. We tagged each sample into either training or testing. Each sample is supposed to be a short video, but we sampled 25 frames from each videos to reduce the data amount. Consequently, a training sample is a tuple of 3D volume with one dimension encoding *temporal correlation* between frames and a label indicating what action it is.

To tackle this problem, we aim to build a neural network that can not only capture spatial information of each frame but also temporal information between frames. Fortunately, you don't have to do this on your own. RNN — a type of neural network designed to deal with time-series data — is right here for you to use. In particular, you will be using LSTM for this task.

Instead of training a end-to-end neural network from scratch whose computation is prohibitively expensive for CPUs. We divide this into two steps: feature extraction and modelling. Below are the things you need to implement for this homework:
- **{35 pts} Feature extraction**. Use the pretrained VGG network to extract features from each frame. Specifically, we recommend  to use the activations of the first fully connected layer `torchvision.models.vgg16` (4096 dim) as features of each video frame. This will result into a 4096x25 matrix for each video. 
    **hints**: 
    - use `scipy.io.savemat()` to save feature to '.mat' file and `scipy.io.loadmat()` load feature.
    - norm your images using `torchvision.transforms`
    ```
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    prep = transforms.Compose([ transforms.ToTensor(), normalize ])
    prep(img)
    
    ```
    More detils of image preprocessing in PyTorch can be found at http://pytorch.org/tutorials/beginner/data_loading_tutorial.html
    
- **{35 pts} Modelling**. With the extracted features, build an LSTM network which takes a 4096x25 sample as input, and outputs the action label of that sample.
- **{20 pts} Evaluation**. After training your network, you need to evaluate your model with the testing data by computing the prediction accuracy. Moreover, you need to compare the result of your network with that of support vector machine (SVM) (stacking the 4096x25 feature matrix to a long vector and train a SVM).
- **{10 pts} Report**. Details regarding the report can be found in the submission section below.

Notice that the size of the raw images is 256x340, whereas VGG16 takes 224x224 images as inputs. To solve this problem, instead of resizing the images which unfavorably changes the spatial ratio, we take a better solution: Cropping five 224x224 images at the image center and four corners and compute the 4096-dim VGG16 features for each of them, and average these five 4096-dim feature to get final feature representation for the raw image.

In order to save you computational time, we did the feature extraction of most samples for you except for class 1. For class 1, we provide you with the raw images, and you need to write code to extract the feature of the samples in class 1. Instead of training over the whole dataset on CPUs which mays cost you serval days, **use the first 15** classes of the whole dataset. The same applies to those who have access to GPUs.


## Dataset
Download dataset at [UCF101](http://vision.cs.stonybrook.edu/~yangwang/public/UCF101_dimitris_course.zip). 

The dataset is consist of the following two parts: video images and extracted features.

### 1. Video Images  

UCF101 dataset contains 101 actions and 13,320 videos in total.  

+ `annos/actions.txt`  
  + lists all the actions (`ApplyEyeMakeup`, .., `YoYo`)   
  
+ `annots/videos_labels_subsets.txt`  
  + lists all the videos (`v_000001`, .., `v_013320`)  
  + labels (`1`, .., `101`)  
  + subsets (`1` for train, `2` for test)  

+ `images_class1/`  
  + contains videos belonging to class 1 (`ApplyEyeMakeup`)  
  + each video folder contains 25 frames  


### 2. Video Features

+ `extract_vgg16_relu6.py`  
  + used to extract video features  
     + Given an image (size: 256x340), we get 5 crops (size: 224x224) at the image center and four corners. The `vgg16-relu6` features are extracted for all 5 crops and subsequently averaged to form a single feature vector (size: 4096).  
     + Given a video, we process its 25 images seuqentially. In the end, each video is represented as a feature sequence (size: 4096 x 25).  
  + written in PyTorch; supports both CPU and GPU.  

+ `vgg16_relu6/`  
   + contains all the video features, EXCEPT those belonging to class 1 (`ApplyEyeMakeup`)  
   + you need to run script `extract_vgg16_relu6.py` to complete the feature extracting process   


## Some Tutorials
- Good materials for understanding RNN and LSTM
    - http://blog.echen.me
    - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
    - http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Implementing RNN and LSTM with PyTorch
    - [LSTM with PyTorch](http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#sphx-glr-beginner-nlp-sequence-models-tutorial-py)
    - [RNN with PyTorch](http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)

In [25]:
# write your codes here
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision

import numpy as np
import scipy as sp
import pandas as pd
import cv2
from scipy.io import savemat
import os
import glob
import pdb

## Feature extraction

In [2]:
vgg16 = torchvision.models.vgg16(True)

In [3]:
del vgg16.classifier[2:]

In [4]:
normalize = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
prep = torchvision.transforms.Compose([ torchvision.transforms.ToTensor(), normalize ])


In [5]:
def imgsample_raw(img, size=(224, 224)):
    return (img[:size[0], :size[1], :], 
            img[(img.shape[0]-size[0]):, :size[1], :], 
            img[:size[0], (img.shape[1]-size[1]):, :],
            img[(img.shape[0]-size[0]):, (img.shape[1]-size[1]):, :], 
            img[(img.shape[0]-size[0])//2:(img.shape[0]+size[0])//2, (img.shape[1]-size[1])//2:(img.shape[1]+size[1])//2, :])

In [6]:
def imgsample(img, size=(224, 224)):
    return (img[:, :size[0], :size[1]], img[:, (img.shape[1]-size[0]):, :size[1]], 
            img[:, :size[0], (img.shape[2]-size[1]):],
            img[:, (img.shape[1]-size[0]):, (img.shape[2]-size[1]):], 
            img[:, (img.shape[1]-size[0])//2:(img.shape[1]+size[0])//2, (img.shape[2]-size[1])//2:(img.shape[2]+size[1])//2])

In [12]:
class1folder = 'UCF101_release/images_class1/'
targetfolder = 'UCF101_release/vgg16_relu6/'

for d in os.listdir(class1folder):
    features = []
    print(d)
    for f in os.listdir(os.path.join(class1folder, d)):
        fname = os.path.join(class1folder, d, f)
        print(fname)
        #pdb.set_trace()
        img = prep(cv2.imread(fname))
        with torch.no_grad():
            img = vgg16(torch.stack(imgsample(img)))
            features.append(img.mean(dim=0))
        
    savemat(os.path.join(targetfolder, d+'.mat'), {'Feature':torch.stack(features).numpy()})

v_000059
UCF101_release/images_class1/v_000059/i_0022.jpg
UCF101_release/images_class1/v_000059/i_0014.jpg
UCF101_release/images_class1/v_000059/i_0011.jpg
UCF101_release/images_class1/v_000059/i_0004.jpg
UCF101_release/images_class1/v_000059/i_0006.jpg
UCF101_release/images_class1/v_000059/i_0001.jpg
UCF101_release/images_class1/v_000059/i_0021.jpg
UCF101_release/images_class1/v_000059/i_0002.jpg
UCF101_release/images_class1/v_000059/i_0020.jpg
UCF101_release/images_class1/v_000059/i_0003.jpg
UCF101_release/images_class1/v_000059/i_0009.jpg
UCF101_release/images_class1/v_000059/i_0019.jpg
UCF101_release/images_class1/v_000059/i_0015.jpg
UCF101_release/images_class1/v_000059/i_0013.jpg
UCF101_release/images_class1/v_000059/i_0025.jpg
UCF101_release/images_class1/v_000059/i_0010.jpg
UCF101_release/images_class1/v_000059/i_0023.jpg
UCF101_release/images_class1/v_000059/i_0018.jpg
UCF101_release/images_class1/v_000059/i_0016.jpg
UCF101_release/images_class1/v_000059/i_0024.jpg
UCF101_rele

UCF101_release/images_class1/v_000047/i_0023.jpg
UCF101_release/images_class1/v_000047/i_0018.jpg
UCF101_release/images_class1/v_000047/i_0016.jpg
UCF101_release/images_class1/v_000047/i_0024.jpg
UCF101_release/images_class1/v_000047/i_0007.jpg
UCF101_release/images_class1/v_000047/i_0005.jpg
UCF101_release/images_class1/v_000047/i_0017.jpg
UCF101_release/images_class1/v_000047/i_0012.jpg
UCF101_release/images_class1/v_000047/i_0008.jpg
v_000045
UCF101_release/images_class1/v_000045/i_0022.jpg
UCF101_release/images_class1/v_000045/i_0014.jpg
UCF101_release/images_class1/v_000045/i_0011.jpg
UCF101_release/images_class1/v_000045/i_0004.jpg
UCF101_release/images_class1/v_000045/i_0006.jpg
UCF101_release/images_class1/v_000045/i_0001.jpg
UCF101_release/images_class1/v_000045/i_0021.jpg
UCF101_release/images_class1/v_000045/i_0002.jpg
UCF101_release/images_class1/v_000045/i_0020.jpg
UCF101_release/images_class1/v_000045/i_0003.jpg
UCF101_release/images_class1/v_000045/i_0009.jpg
UCF101_rele

UCF101_release/images_class1/v_000127/i_0002.jpg
UCF101_release/images_class1/v_000127/i_0020.jpg
UCF101_release/images_class1/v_000127/i_0003.jpg
UCF101_release/images_class1/v_000127/i_0009.jpg
UCF101_release/images_class1/v_000127/i_0019.jpg
UCF101_release/images_class1/v_000127/i_0015.jpg
UCF101_release/images_class1/v_000127/i_0013.jpg
UCF101_release/images_class1/v_000127/i_0025.jpg
UCF101_release/images_class1/v_000127/i_0010.jpg
UCF101_release/images_class1/v_000127/i_0023.jpg
UCF101_release/images_class1/v_000127/i_0018.jpg
UCF101_release/images_class1/v_000127/i_0016.jpg
UCF101_release/images_class1/v_000127/i_0024.jpg
UCF101_release/images_class1/v_000127/i_0007.jpg
UCF101_release/images_class1/v_000127/i_0005.jpg
UCF101_release/images_class1/v_000127/i_0017.jpg
UCF101_release/images_class1/v_000127/i_0012.jpg
UCF101_release/images_class1/v_000127/i_0008.jpg
v_000024
UCF101_release/images_class1/v_000024/i_0022.jpg
UCF101_release/images_class1/v_000024/i_0014.jpg
UCF101_rele

UCF101_release/images_class1/v_000105/i_0008.jpg
v_000031
UCF101_release/images_class1/v_000031/i_0022.jpg
UCF101_release/images_class1/v_000031/i_0014.jpg
UCF101_release/images_class1/v_000031/i_0011.jpg
UCF101_release/images_class1/v_000031/i_0004.jpg
UCF101_release/images_class1/v_000031/i_0006.jpg
UCF101_release/images_class1/v_000031/i_0001.jpg
UCF101_release/images_class1/v_000031/i_0021.jpg
UCF101_release/images_class1/v_000031/i_0002.jpg
UCF101_release/images_class1/v_000031/i_0020.jpg
UCF101_release/images_class1/v_000031/i_0003.jpg
UCF101_release/images_class1/v_000031/i_0009.jpg
UCF101_release/images_class1/v_000031/i_0019.jpg
UCF101_release/images_class1/v_000031/i_0015.jpg
UCF101_release/images_class1/v_000031/i_0013.jpg
UCF101_release/images_class1/v_000031/i_0025.jpg
UCF101_release/images_class1/v_000031/i_0010.jpg
UCF101_release/images_class1/v_000031/i_0023.jpg
UCF101_release/images_class1/v_000031/i_0018.jpg
UCF101_release/images_class1/v_000031/i_0016.jpg
UCF101_rele

UCF101_release/images_class1/v_000017/i_0010.jpg
UCF101_release/images_class1/v_000017/i_0023.jpg
UCF101_release/images_class1/v_000017/i_0018.jpg
UCF101_release/images_class1/v_000017/i_0016.jpg
UCF101_release/images_class1/v_000017/i_0024.jpg
UCF101_release/images_class1/v_000017/i_0007.jpg
UCF101_release/images_class1/v_000017/i_0005.jpg
UCF101_release/images_class1/v_000017/i_0017.jpg
UCF101_release/images_class1/v_000017/i_0012.jpg
UCF101_release/images_class1/v_000017/i_0008.jpg
v_000070
UCF101_release/images_class1/v_000070/i_0022.jpg
UCF101_release/images_class1/v_000070/i_0014.jpg
UCF101_release/images_class1/v_000070/i_0011.jpg
UCF101_release/images_class1/v_000070/i_0004.jpg
UCF101_release/images_class1/v_000070/i_0006.jpg
UCF101_release/images_class1/v_000070/i_0001.jpg
UCF101_release/images_class1/v_000070/i_0021.jpg
UCF101_release/images_class1/v_000070/i_0002.jpg
UCF101_release/images_class1/v_000070/i_0020.jpg
UCF101_release/images_class1/v_000070/i_0003.jpg
UCF101_rele

UCF101_release/images_class1/v_000090/i_0021.jpg
UCF101_release/images_class1/v_000090/i_0002.jpg
UCF101_release/images_class1/v_000090/i_0020.jpg
UCF101_release/images_class1/v_000090/i_0003.jpg
UCF101_release/images_class1/v_000090/i_0009.jpg
UCF101_release/images_class1/v_000090/i_0019.jpg
UCF101_release/images_class1/v_000090/i_0015.jpg
UCF101_release/images_class1/v_000090/i_0013.jpg
UCF101_release/images_class1/v_000090/i_0025.jpg
UCF101_release/images_class1/v_000090/i_0010.jpg
UCF101_release/images_class1/v_000090/i_0023.jpg
UCF101_release/images_class1/v_000090/i_0018.jpg
UCF101_release/images_class1/v_000090/i_0016.jpg
UCF101_release/images_class1/v_000090/i_0024.jpg
UCF101_release/images_class1/v_000090/i_0007.jpg
UCF101_release/images_class1/v_000090/i_0005.jpg
UCF101_release/images_class1/v_000090/i_0017.jpg
UCF101_release/images_class1/v_000090/i_0012.jpg
UCF101_release/images_class1/v_000090/i_0008.jpg
v_000046
UCF101_release/images_class1/v_000046/i_0022.jpg
UCF101_rele

UCF101_release/images_class1/v_000091/i_0012.jpg
UCF101_release/images_class1/v_000091/i_0008.jpg
v_000080
UCF101_release/images_class1/v_000080/i_0022.jpg
UCF101_release/images_class1/v_000080/i_0014.jpg
UCF101_release/images_class1/v_000080/i_0011.jpg
UCF101_release/images_class1/v_000080/i_0004.jpg
UCF101_release/images_class1/v_000080/i_0006.jpg
UCF101_release/images_class1/v_000080/i_0001.jpg
UCF101_release/images_class1/v_000080/i_0021.jpg
UCF101_release/images_class1/v_000080/i_0002.jpg
UCF101_release/images_class1/v_000080/i_0020.jpg
UCF101_release/images_class1/v_000080/i_0003.jpg
UCF101_release/images_class1/v_000080/i_0009.jpg
UCF101_release/images_class1/v_000080/i_0019.jpg
UCF101_release/images_class1/v_000080/i_0015.jpg
UCF101_release/images_class1/v_000080/i_0013.jpg
UCF101_release/images_class1/v_000080/i_0025.jpg
UCF101_release/images_class1/v_000080/i_0010.jpg
UCF101_release/images_class1/v_000080/i_0023.jpg
UCF101_release/images_class1/v_000080/i_0018.jpg
UCF101_rele

UCF101_release/images_class1/v_000115/i_0025.jpg
UCF101_release/images_class1/v_000115/i_0010.jpg
UCF101_release/images_class1/v_000115/i_0023.jpg
UCF101_release/images_class1/v_000115/i_0018.jpg
UCF101_release/images_class1/v_000115/i_0016.jpg
UCF101_release/images_class1/v_000115/i_0024.jpg
UCF101_release/images_class1/v_000115/i_0007.jpg
UCF101_release/images_class1/v_000115/i_0005.jpg
UCF101_release/images_class1/v_000115/i_0017.jpg
UCF101_release/images_class1/v_000115/i_0012.jpg
UCF101_release/images_class1/v_000115/i_0008.jpg
v_000130
UCF101_release/images_class1/v_000130/i_0022.jpg
UCF101_release/images_class1/v_000130/i_0014.jpg
UCF101_release/images_class1/v_000130/i_0011.jpg
UCF101_release/images_class1/v_000130/i_0004.jpg
UCF101_release/images_class1/v_000130/i_0006.jpg
UCF101_release/images_class1/v_000130/i_0001.jpg
UCF101_release/images_class1/v_000130/i_0021.jpg
UCF101_release/images_class1/v_000130/i_0002.jpg
UCF101_release/images_class1/v_000130/i_0020.jpg
UCF101_rele

UCF101_release/images_class1/v_000137/i_0001.jpg
UCF101_release/images_class1/v_000137/i_0021.jpg
UCF101_release/images_class1/v_000137/i_0002.jpg
UCF101_release/images_class1/v_000137/i_0020.jpg
UCF101_release/images_class1/v_000137/i_0003.jpg
UCF101_release/images_class1/v_000137/i_0009.jpg
UCF101_release/images_class1/v_000137/i_0019.jpg
UCF101_release/images_class1/v_000137/i_0015.jpg
UCF101_release/images_class1/v_000137/i_0013.jpg
UCF101_release/images_class1/v_000137/i_0025.jpg
UCF101_release/images_class1/v_000137/i_0010.jpg
UCF101_release/images_class1/v_000137/i_0023.jpg
UCF101_release/images_class1/v_000137/i_0018.jpg
UCF101_release/images_class1/v_000137/i_0016.jpg
UCF101_release/images_class1/v_000137/i_0024.jpg
UCF101_release/images_class1/v_000137/i_0007.jpg
UCF101_release/images_class1/v_000137/i_0005.jpg
UCF101_release/images_class1/v_000137/i_0017.jpg
UCF101_release/images_class1/v_000137/i_0012.jpg
UCF101_release/images_class1/v_000137/i_0008.jpg
v_000022
UCF101_rele

UCF101_release/images_class1/v_000006/i_0017.jpg
UCF101_release/images_class1/v_000006/i_0012.jpg
UCF101_release/images_class1/v_000006/i_0008.jpg
v_000111
UCF101_release/images_class1/v_000111/i_0022.jpg
UCF101_release/images_class1/v_000111/i_0014.jpg
UCF101_release/images_class1/v_000111/i_0011.jpg
UCF101_release/images_class1/v_000111/i_0004.jpg
UCF101_release/images_class1/v_000111/i_0006.jpg
UCF101_release/images_class1/v_000111/i_0001.jpg
UCF101_release/images_class1/v_000111/i_0021.jpg
UCF101_release/images_class1/v_000111/i_0002.jpg
UCF101_release/images_class1/v_000111/i_0020.jpg
UCF101_release/images_class1/v_000111/i_0003.jpg
UCF101_release/images_class1/v_000111/i_0009.jpg
UCF101_release/images_class1/v_000111/i_0019.jpg
UCF101_release/images_class1/v_000111/i_0015.jpg
UCF101_release/images_class1/v_000111/i_0013.jpg
UCF101_release/images_class1/v_000111/i_0025.jpg
UCF101_release/images_class1/v_000111/i_0010.jpg
UCF101_release/images_class1/v_000111/i_0023.jpg
UCF101_rele

UCF101_release/images_class1/v_000104/i_0013.jpg
UCF101_release/images_class1/v_000104/i_0025.jpg
UCF101_release/images_class1/v_000104/i_0010.jpg
UCF101_release/images_class1/v_000104/i_0023.jpg
UCF101_release/images_class1/v_000104/i_0018.jpg
UCF101_release/images_class1/v_000104/i_0016.jpg
UCF101_release/images_class1/v_000104/i_0024.jpg
UCF101_release/images_class1/v_000104/i_0007.jpg
UCF101_release/images_class1/v_000104/i_0005.jpg
UCF101_release/images_class1/v_000104/i_0017.jpg
UCF101_release/images_class1/v_000104/i_0012.jpg
UCF101_release/images_class1/v_000104/i_0008.jpg
v_000035
UCF101_release/images_class1/v_000035/i_0022.jpg
UCF101_release/images_class1/v_000035/i_0014.jpg
UCF101_release/images_class1/v_000035/i_0011.jpg
UCF101_release/images_class1/v_000035/i_0004.jpg
UCF101_release/images_class1/v_000035/i_0006.jpg
UCF101_release/images_class1/v_000035/i_0001.jpg
UCF101_release/images_class1/v_000035/i_0021.jpg
UCF101_release/images_class1/v_000035/i_0002.jpg
UCF101_rele

UCF101_release/images_class1/v_000125/i_0006.jpg
UCF101_release/images_class1/v_000125/i_0001.jpg
UCF101_release/images_class1/v_000125/i_0021.jpg
UCF101_release/images_class1/v_000125/i_0002.jpg
UCF101_release/images_class1/v_000125/i_0020.jpg
UCF101_release/images_class1/v_000125/i_0003.jpg
UCF101_release/images_class1/v_000125/i_0009.jpg
UCF101_release/images_class1/v_000125/i_0019.jpg
UCF101_release/images_class1/v_000125/i_0015.jpg
UCF101_release/images_class1/v_000125/i_0013.jpg
UCF101_release/images_class1/v_000125/i_0025.jpg
UCF101_release/images_class1/v_000125/i_0010.jpg
UCF101_release/images_class1/v_000125/i_0023.jpg
UCF101_release/images_class1/v_000125/i_0018.jpg
UCF101_release/images_class1/v_000125/i_0016.jpg
UCF101_release/images_class1/v_000125/i_0024.jpg
UCF101_release/images_class1/v_000125/i_0007.jpg
UCF101_release/images_class1/v_000125/i_0005.jpg
UCF101_release/images_class1/v_000125/i_0017.jpg
UCF101_release/images_class1/v_000125/i_0012.jpg
UCF101_release/image

UCF101_release/images_class1/v_000094/i_0005.jpg
UCF101_release/images_class1/v_000094/i_0017.jpg
UCF101_release/images_class1/v_000094/i_0012.jpg
UCF101_release/images_class1/v_000094/i_0008.jpg
v_000025
UCF101_release/images_class1/v_000025/i_0022.jpg
UCF101_release/images_class1/v_000025/i_0014.jpg
UCF101_release/images_class1/v_000025/i_0011.jpg
UCF101_release/images_class1/v_000025/i_0004.jpg
UCF101_release/images_class1/v_000025/i_0006.jpg
UCF101_release/images_class1/v_000025/i_0001.jpg
UCF101_release/images_class1/v_000025/i_0021.jpg
UCF101_release/images_class1/v_000025/i_0002.jpg
UCF101_release/images_class1/v_000025/i_0020.jpg
UCF101_release/images_class1/v_000025/i_0003.jpg
UCF101_release/images_class1/v_000025/i_0009.jpg
UCF101_release/images_class1/v_000025/i_0019.jpg
UCF101_release/images_class1/v_000025/i_0015.jpg
UCF101_release/images_class1/v_000025/i_0013.jpg
UCF101_release/images_class1/v_000025/i_0025.jpg
UCF101_release/images_class1/v_000025/i_0010.jpg
UCF101_rele

UCF101_release/images_class1/v_000015/i_0015.jpg
UCF101_release/images_class1/v_000015/i_0013.jpg
UCF101_release/images_class1/v_000015/i_0025.jpg
UCF101_release/images_class1/v_000015/i_0010.jpg
UCF101_release/images_class1/v_000015/i_0023.jpg
UCF101_release/images_class1/v_000015/i_0018.jpg
UCF101_release/images_class1/v_000015/i_0016.jpg
UCF101_release/images_class1/v_000015/i_0024.jpg
UCF101_release/images_class1/v_000015/i_0007.jpg
UCF101_release/images_class1/v_000015/i_0005.jpg
UCF101_release/images_class1/v_000015/i_0017.jpg
UCF101_release/images_class1/v_000015/i_0012.jpg
UCF101_release/images_class1/v_000015/i_0008.jpg
v_000095
UCF101_release/images_class1/v_000095/i_0022.jpg
UCF101_release/images_class1/v_000095/i_0014.jpg
UCF101_release/images_class1/v_000095/i_0011.jpg
UCF101_release/images_class1/v_000095/i_0004.jpg
UCF101_release/images_class1/v_000095/i_0006.jpg
UCF101_release/images_class1/v_000095/i_0001.jpg
UCF101_release/images_class1/v_000095/i_0021.jpg
UCF101_rele

UCF101_release/images_class1/v_000050/i_0004.jpg
UCF101_release/images_class1/v_000050/i_0006.jpg
UCF101_release/images_class1/v_000050/i_0001.jpg
UCF101_release/images_class1/v_000050/i_0021.jpg
UCF101_release/images_class1/v_000050/i_0002.jpg
UCF101_release/images_class1/v_000050/i_0020.jpg
UCF101_release/images_class1/v_000050/i_0003.jpg
UCF101_release/images_class1/v_000050/i_0009.jpg
UCF101_release/images_class1/v_000050/i_0019.jpg
UCF101_release/images_class1/v_000050/i_0015.jpg
UCF101_release/images_class1/v_000050/i_0013.jpg
UCF101_release/images_class1/v_000050/i_0025.jpg
UCF101_release/images_class1/v_000050/i_0010.jpg
UCF101_release/images_class1/v_000050/i_0023.jpg
UCF101_release/images_class1/v_000050/i_0018.jpg
UCF101_release/images_class1/v_000050/i_0016.jpg
UCF101_release/images_class1/v_000050/i_0024.jpg
UCF101_release/images_class1/v_000050/i_0007.jpg
UCF101_release/images_class1/v_000050/i_0005.jpg
UCF101_release/images_class1/v_000050/i_0017.jpg
UCF101_release/image

UCF101_release/images_class1/v_000132/i_0007.jpg
UCF101_release/images_class1/v_000132/i_0005.jpg
UCF101_release/images_class1/v_000132/i_0017.jpg
UCF101_release/images_class1/v_000132/i_0012.jpg
UCF101_release/images_class1/v_000132/i_0008.jpg
v_000122
UCF101_release/images_class1/v_000122/i_0022.jpg
UCF101_release/images_class1/v_000122/i_0014.jpg
UCF101_release/images_class1/v_000122/i_0011.jpg
UCF101_release/images_class1/v_000122/i_0004.jpg
UCF101_release/images_class1/v_000122/i_0006.jpg
UCF101_release/images_class1/v_000122/i_0001.jpg
UCF101_release/images_class1/v_000122/i_0021.jpg
UCF101_release/images_class1/v_000122/i_0002.jpg
UCF101_release/images_class1/v_000122/i_0020.jpg
UCF101_release/images_class1/v_000122/i_0003.jpg
UCF101_release/images_class1/v_000122/i_0009.jpg
UCF101_release/images_class1/v_000122/i_0019.jpg
UCF101_release/images_class1/v_000122/i_0015.jpg
UCF101_release/images_class1/v_000122/i_0013.jpg
UCF101_release/images_class1/v_000122/i_0025.jpg
UCF101_rele

UCF101_release/images_class1/v_000131/i_0019.jpg
UCF101_release/images_class1/v_000131/i_0015.jpg
UCF101_release/images_class1/v_000131/i_0013.jpg
UCF101_release/images_class1/v_000131/i_0025.jpg
UCF101_release/images_class1/v_000131/i_0010.jpg
UCF101_release/images_class1/v_000131/i_0023.jpg
UCF101_release/images_class1/v_000131/i_0018.jpg
UCF101_release/images_class1/v_000131/i_0016.jpg
UCF101_release/images_class1/v_000131/i_0024.jpg
UCF101_release/images_class1/v_000131/i_0007.jpg
UCF101_release/images_class1/v_000131/i_0005.jpg
UCF101_release/images_class1/v_000131/i_0017.jpg
UCF101_release/images_class1/v_000131/i_0012.jpg
UCF101_release/images_class1/v_000131/i_0008.jpg
v_000136
UCF101_release/images_class1/v_000136/i_0022.jpg
UCF101_release/images_class1/v_000136/i_0014.jpg
UCF101_release/images_class1/v_000136/i_0011.jpg
UCF101_release/images_class1/v_000136/i_0004.jpg
UCF101_release/images_class1/v_000136/i_0006.jpg
UCF101_release/images_class1/v_000136/i_0001.jpg
UCF101_rele

UCF101_release/images_class1/v_000033/i_0011.jpg
UCF101_release/images_class1/v_000033/i_0004.jpg
UCF101_release/images_class1/v_000033/i_0006.jpg
UCF101_release/images_class1/v_000033/i_0001.jpg
UCF101_release/images_class1/v_000033/i_0021.jpg
UCF101_release/images_class1/v_000033/i_0002.jpg
UCF101_release/images_class1/v_000033/i_0020.jpg
UCF101_release/images_class1/v_000033/i_0003.jpg
UCF101_release/images_class1/v_000033/i_0009.jpg
UCF101_release/images_class1/v_000033/i_0019.jpg
UCF101_release/images_class1/v_000033/i_0015.jpg
UCF101_release/images_class1/v_000033/i_0013.jpg
UCF101_release/images_class1/v_000033/i_0025.jpg
UCF101_release/images_class1/v_000033/i_0010.jpg
UCF101_release/images_class1/v_000033/i_0023.jpg
UCF101_release/images_class1/v_000033/i_0018.jpg
UCF101_release/images_class1/v_000033/i_0016.jpg
UCF101_release/images_class1/v_000033/i_0024.jpg
UCF101_release/images_class1/v_000033/i_0007.jpg
UCF101_release/images_class1/v_000033/i_0005.jpg
UCF101_release/image

UCF101_release/images_class1/v_000043/i_0024.jpg
UCF101_release/images_class1/v_000043/i_0007.jpg
UCF101_release/images_class1/v_000043/i_0005.jpg
UCF101_release/images_class1/v_000043/i_0017.jpg
UCF101_release/images_class1/v_000043/i_0012.jpg
UCF101_release/images_class1/v_000043/i_0008.jpg
v_000041
UCF101_release/images_class1/v_000041/i_0022.jpg
UCF101_release/images_class1/v_000041/i_0014.jpg
UCF101_release/images_class1/v_000041/i_0011.jpg
UCF101_release/images_class1/v_000041/i_0004.jpg
UCF101_release/images_class1/v_000041/i_0006.jpg
UCF101_release/images_class1/v_000041/i_0001.jpg
UCF101_release/images_class1/v_000041/i_0021.jpg
UCF101_release/images_class1/v_000041/i_0002.jpg
UCF101_release/images_class1/v_000041/i_0020.jpg
UCF101_release/images_class1/v_000041/i_0003.jpg
UCF101_release/images_class1/v_000041/i_0009.jpg
UCF101_release/images_class1/v_000041/i_0019.jpg
UCF101_release/images_class1/v_000041/i_0015.jpg
UCF101_release/images_class1/v_000041/i_0013.jpg
UCF101_rele

UCF101_release/images_class1/v_000142/i_0009.jpg
UCF101_release/images_class1/v_000142/i_0019.jpg
UCF101_release/images_class1/v_000142/i_0015.jpg
UCF101_release/images_class1/v_000142/i_0013.jpg
UCF101_release/images_class1/v_000142/i_0025.jpg
UCF101_release/images_class1/v_000142/i_0010.jpg
UCF101_release/images_class1/v_000142/i_0023.jpg
UCF101_release/images_class1/v_000142/i_0018.jpg
UCF101_release/images_class1/v_000142/i_0016.jpg
UCF101_release/images_class1/v_000142/i_0024.jpg
UCF101_release/images_class1/v_000142/i_0007.jpg
UCF101_release/images_class1/v_000142/i_0005.jpg
UCF101_release/images_class1/v_000142/i_0017.jpg
UCF101_release/images_class1/v_000142/i_0012.jpg
UCF101_release/images_class1/v_000142/i_0008.jpg
v_000093
UCF101_release/images_class1/v_000093/i_0022.jpg
UCF101_release/images_class1/v_000093/i_0014.jpg
UCF101_release/images_class1/v_000093/i_0011.jpg
UCF101_release/images_class1/v_000093/i_0004.jpg
UCF101_release/images_class1/v_000093/i_0006.jpg
UCF101_rele

UCF101_release/images_class1/v_000023/i_0014.jpg
UCF101_release/images_class1/v_000023/i_0011.jpg
UCF101_release/images_class1/v_000023/i_0004.jpg
UCF101_release/images_class1/v_000023/i_0006.jpg
UCF101_release/images_class1/v_000023/i_0001.jpg
UCF101_release/images_class1/v_000023/i_0021.jpg
UCF101_release/images_class1/v_000023/i_0002.jpg
UCF101_release/images_class1/v_000023/i_0020.jpg
UCF101_release/images_class1/v_000023/i_0003.jpg
UCF101_release/images_class1/v_000023/i_0009.jpg
UCF101_release/images_class1/v_000023/i_0019.jpg
UCF101_release/images_class1/v_000023/i_0015.jpg
UCF101_release/images_class1/v_000023/i_0013.jpg
UCF101_release/images_class1/v_000023/i_0025.jpg
UCF101_release/images_class1/v_000023/i_0010.jpg
UCF101_release/images_class1/v_000023/i_0023.jpg
UCF101_release/images_class1/v_000023/i_0018.jpg
UCF101_release/images_class1/v_000023/i_0016.jpg
UCF101_release/images_class1/v_000023/i_0024.jpg
UCF101_release/images_class1/v_000023/i_0007.jpg
UCF101_release/image

UCF101_release/images_class1/v_000068/i_0016.jpg
UCF101_release/images_class1/v_000068/i_0024.jpg
UCF101_release/images_class1/v_000068/i_0007.jpg
UCF101_release/images_class1/v_000068/i_0005.jpg
UCF101_release/images_class1/v_000068/i_0017.jpg
UCF101_release/images_class1/v_000068/i_0012.jpg
UCF101_release/images_class1/v_000068/i_0008.jpg
v_000116
UCF101_release/images_class1/v_000116/i_0022.jpg
UCF101_release/images_class1/v_000116/i_0014.jpg
UCF101_release/images_class1/v_000116/i_0011.jpg
UCF101_release/images_class1/v_000116/i_0004.jpg
UCF101_release/images_class1/v_000116/i_0006.jpg
UCF101_release/images_class1/v_000116/i_0001.jpg
UCF101_release/images_class1/v_000116/i_0021.jpg
UCF101_release/images_class1/v_000116/i_0002.jpg
UCF101_release/images_class1/v_000116/i_0020.jpg
UCF101_release/images_class1/v_000116/i_0003.jpg
UCF101_release/images_class1/v_000116/i_0009.jpg
UCF101_release/images_class1/v_000116/i_0019.jpg
UCF101_release/images_class1/v_000116/i_0015.jpg
UCF101_rele

## Training

In [96]:
num_subclass = 15
num_hidden = 32

In [147]:
class LSTMAction(nn.Module):
    def __init__(self, feature_dim, hidden_dim, action_size):
        super(LSTMAction, self).__init__()
        self.hidden_dim = hidden_dim
        
        self.lstm = nn.LSTM(feature_dim, hidden_dim)

        # The linear layer that maps from hidden state space to action space
        self.hidden2tag = nn.Linear(hidden_dim, action_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, video):
        lstm_out, self.hidden = self.lstm(
            video.view(len(video), 1, -1), self.hidden)
        #pdb.set_trace()
        tag_space = self.hidden2tag(self.hidden[0])
        #tag_scores = F.log_softmax(tag_space)
        return tag_space


In [72]:
# load data
anno = pd.read_csv("UCF101_release/annos/videos_labels_subsets.txt", header=None, sep='\t')

In [73]:
#anno[anno[1]==1].groupby(2).size()

In [74]:
# keep only 15 classes to train and test
anno = anno[anno[1] <= num_subclass]

In [200]:
# load into train and test data
train = []
test = []
for _, line in anno.iterrows():
    val = torch.tensor(line[1]-1)#torch.zeros(num_subclass, dtype=torch.long)
    #val[line[1]-1] = 1
    
    dat = torch.tensor(sp.io.loadmat(os.path.join(targetfolder, line[0]+'.mat'))['Feature'])
    if line[2]==1:
        # training data
        train.append((dat, val))
    else:
        test.append((dat, val))
    #print(line[0])

In [214]:
len(train), len(test)

(1442, 568)

In [148]:
model = LSTMAction(4096, num_hidden, num_subclass)

In [149]:
loss_function = nn.CrossEntropyLoss() #nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)#nn.CrossEntropyLoss() #

In [150]:
model(torch.tensor(train[0][0]))

tensor([[[ 0.0895,  0.1018, -0.0164,  0.0766, -0.0089, -0.1175,  0.3437,
           0.2227,  0.0827,  0.3466,  0.1245, -0.5249,  0.0762,  0.1486,
           0.3947]]], grad_fn=<ThAddBackward>)

In [151]:
for epoch in range(100):
    print("epoch: %d" % epoch)
    for video, tags in train:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden()

        tag_scores = model(video)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores.view(1, num_subclass), tags.view(1))#tags.view(1, len(tags)))
        loss.backward()
        optimizer.step()


epoch: 0
epoch: 1
epoch: 2
epoch: 3
epoch: 4
epoch: 5
epoch: 6
epoch: 7
epoch: 8
epoch: 9
epoch: 10
epoch: 11
epoch: 12
epoch: 13
epoch: 14
epoch: 15
epoch: 16
epoch: 17
epoch: 18
epoch: 19
epoch: 20
epoch: 21
epoch: 22
epoch: 23
epoch: 24
epoch: 25
epoch: 26
epoch: 27
epoch: 28
epoch: 29
epoch: 30
epoch: 31
epoch: 32
epoch: 33
epoch: 34
epoch: 35
epoch: 36
epoch: 37
epoch: 38
epoch: 39
epoch: 40
epoch: 41
epoch: 42
epoch: 43
epoch: 44
epoch: 45
epoch: 46
epoch: 47
epoch: 48
epoch: 49
epoch: 50
epoch: 51
epoch: 52
epoch: 53
epoch: 54
epoch: 55
epoch: 56
epoch: 57
epoch: 58
epoch: 59
epoch: 60
epoch: 61
epoch: 62
epoch: 63
epoch: 64
epoch: 65
epoch: 66
epoch: 67
epoch: 68
epoch: 69
epoch: 70
epoch: 71
epoch: 72
epoch: 73
epoch: 74
epoch: 75
epoch: 76
epoch: 77
epoch: 78
epoch: 79
epoch: 80
epoch: 81
epoch: 82
epoch: 83
epoch: 84
epoch: 85
epoch: 86
epoch: 87
epoch: 88
epoch: 89
epoch: 90
epoch: 91
epoch: 92
epoch: 93
epoch: 94
epoch: 95
epoch: 96
epoch: 97
epoch: 98
epoch: 99


In [146]:
video

tensor([[0.3634, 0.3389, 0.1276,  ..., 1.1072, 1.3218, 1.0727],
        [0.4031, 0.1988, 0.3807,  ..., 1.0267, 0.6490, 0.9582],
        [0.6928, 0.5587, 0.2917,  ..., 1.2329, 0.8478, 0.5353],
        ...,
        [0.8513, 0.0559, 0.3609,  ..., 0.5479, 1.1266, 1.1944],
        [0.6365, 0.1056, 0.2076,  ..., 1.5774, 1.2476, 0.8031],
        [0.5580, 0.5953, 0.2271,  ..., 0.9218, 0.7570, 0.8148]])

In [142]:
epoch

64

## Test

In [166]:
# ==========================================
#            Evaluating Network
# ==========================================
correct = 0
total = 0
with torch.no_grad():
    for data in test:
        video, labels = data
        outputs = model(video)
        predicted = np.argmax(outputs.numpy())#torch.max(outputs.data, 1)
        total += 1#labels.size(0)
        correct += (predicted == labels-1).sum().item()

print('Accuracy of the network on the test videos: %d %%' % (
    100 * correct / total))

Accuracy of the network on the test videos: 44 %


In [167]:
from sklearn.svm import LinearSVC

In [183]:
svctrain = []
svclabel = []
for dat, label in train:
    svctrain.append(dat.view(1, -1))
    svclabel.append(label.view(1, -1))

In [185]:
svctrain = torch.cat(svctrain).numpy()

svclabel = torch.cat(svclabel).view(-1).numpy()

In [189]:
svcmodel = LinearSVC()

In [190]:
svcmodel.fit(svctrain, svclabel)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

In [201]:
svctest = []
svctestlabel = []
for dat, label in test:
    svctest.append(dat.view(1, -1))
    svctestlabel.append(label)

In [195]:
svctest = torch.cat(svctest).numpy()

svctestlabel = np.array(svctestlabel)

In [196]:
svctest_results = svcmodel.predict(svctest)

In [213]:
print("Accuracy for SVC model: %f %%" % (100.0*np.sum(svctest_results == svctestlabel) / len(svctest_results)))

Accuracy for SVC model: 95.422535 %


## Submission
---
**Runnable source code in ipynb file and a pdf report are required**.

The report should be of 3 to 4 pages describing what you have done and learned in this homework and report performance of your model. If you have tried multiple methods, please compare your results. If you are using any external code, please cite it in your report. Note that this homework is designed to help you explore and get familiar with the techniques. The final grading will be largely based on your prediction accuracy and the different methods you tried (different architectures and parameters).

Please indicate clearly in your report what model you have tried, what techniques you applied to improve the performance and report their accuracies. The report should be concise and include the highlights of your efforts.