# Action Recognition @ UCF101  
**Due date: 11:59 pm on Nov. 19, 2019 (Tuesday)**

## Description
---
In this homework, you will perform action recognition using a Recurrent Neural Network (RNN), specifically Long Short-Term Memory (LSTM). You will be given a dataset called UCF101, which consists of 101 different actions/classes, and for each action there will be 145 samples. Each sample is tagged as either training or testing. Each sample is supposed to be a short video, but we sampled 25 frames from each video to reduce the amount of data. Consequently, a training sample is an image tuple that forms a 3D volume, with one dimension encoding *temporal correlation* between frames, together with a label indicating which action it is.

To tackle this problem, we aim to build a neural network that can not only capture spatial information of each frame but also temporal information between frames. Fortunately, you don't have to do this on your own. RNN — a type of neural network designed to deal with time-series data — is right here for you to use. In particular, you will be using LSTM for this task.

Instead of training an end-to-end neural network from scratch whose computation is prohibitively expensive, we divide this into two steps: feature extraction and modelling. Below are the things you need to implement for this homework:
- **{35 pts} Feature extraction**. Use any of the [pre-trained models](https://pytorch.org/docs/stable/torchvision/models.html) to extract features from each frame. Specifically, we recommend not using the activations of the last layer, as features tend to become task-specific toward the end of the network. 
    **hints**: 
    - A good starting point is a pre-trained VGG16 network (`torchvision.models.vgg16`); we suggest using the output of its first fully connected layer (4096-dim) as the feature of each video frame. This results in a 4096x25 matrix for each video. 
    - Normalize your images using `torchvision.transforms` 
    ```
    # The mean and std below are specific to the ImageNet data
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    prep = transforms.Compose([ transforms.ToTensor(), normalize ])
    prep(img)
    ```
    More details of image preprocessing in PyTorch can be found at http://pytorch.org/tutorials/beginner/data_loading_tutorial.html
    
- **{35 pts} Modelling**. With the extracted features, build an LSTM network which takes a **dx25** sample as input (where **d** is the dimension of the extracted feature for each frame), and outputs the action label of that sample.
- **{20 pts} Evaluation**. After training your network, you need to evaluate your model on the testing data by computing the prediction accuracy **(5 points)**. The baseline test accuracy for this data is 75%, and **10 points** out of 20 are for achieving a test accuracy greater than the baseline. Moreover, you need to compare **(5 points)** the result of your network with that of a support vector machine (SVM): stack the **dx25** feature matrix into one long vector and train an SVM on it.
- **{10 pts} Report**. Details regarding the report can be found in the submission section below.
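
The two views of a sample used in the modelling and evaluation items above can be sketched as follows. This is only an illustration in plain Python: `d` is set to 4 instead of 4096 to keep it small, the feature values are placeholders, and the variable names are not part of the assignment.

```python
T, d = 25, 4  # 25 frames per video; d is tiny here (it would be 4096 for VGG16 fc features)

# One video: a list of T per-frame feature vectors (placeholder values).
video = [[float(t * d + k) for k in range(d)] for t in range(T)]

# LSTM view: a sequence of T time steps, each a d-dimensional vector.
lstm_input_shape = (len(video), len(video[0]))

# SVM view: concatenate the frames into a single vector of length T * d.
svm_vector = [value for frame in video for value in frame]

print(lstm_input_shape)
print(len(svm_vector))
```

The LSTM sees the frames in order, one step at a time, whereas the SVM receives all frames at once as a single long vector.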

Notice that the size of the raw images is 256x340, whereas your pre-trained model might take **nxn** images as inputs. Resizing the images would distort their aspect ratio, so we use a better solution instead: crop five **nxn** images (one at the image center and four at the corners), compute the **d**-dim features for each of them, and average these five **d**-dim features to get the final feature representation of the raw image.
For example, VGG takes 224x224 images as inputs, so we take the five 224x224 croppings of the image, compute 4096-dim VGG features for each of them, and then take the mean of these five 4096-dim vectors to be the representation of the image.
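
The crop geometry described above can be checked with a small sketch. It is plain Python with illustrative helper names (`five_crop_boxes`, `mean_vector`) that are not part of the assignment; the feature extractor itself is left out.

```python
def five_crop_boxes(height, width, n):
    """Return (top, left) offsets of four corner crops and one center crop, each n x n."""
    if n > height or n > width:
        raise ValueError("crop size exceeds image size")
    top_c = (height - n) // 2
    left_c = (width - n) // 2
    return [
        (0, 0),                    # top-left corner
        (0, width - n),            # top-right corner
        (height - n, 0),           # bottom-left corner
        (height - n, width - n),   # bottom-right corner
        (top_c, left_c),           # center
    ]


def mean_vector(vectors):
    """Element-wise mean of equally long feature vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


# 256x340 frames with 224x224 crops, as in the VGG example above.
boxes = five_crop_boxes(256, 340, 224)
print(boxes)
```

Each box `(top, left)` selects rows `top:top+n` and columns `left:left+n`; the five feature vectors computed from those crops would then be passed to `mean_vector`.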

In order to save you computational time, you need to do the classification task only for **the first 25** classes of the whole dataset. The same applies to those who have access to GPUs. **Bonus 10 points for running and reporting on the entire 101 classes.**


## Dataset
Download the **dataset** (image data for each video) at [UCF101](http://vision.cs.stonybrook.edu/~yangwang/public/UCF101_images.tar). The **annos folder**, which contains the video labels and the label-to-class-name mapping, is included in the uploaded assignment folder. 


UCF101 dataset contains 101 actions and 13,320 videos in total.  

+ `annos/actions.txt`  
  + lists all the actions (`ApplyEyeMakeup`, .., `YoYo`)   
  
+ `annos/videos_labels_subsets.txt`  
  + lists all the videos (`v_000001`, .., `v_013320`)  
  + labels (`1`, .., `101`)  
  + subsets (`1` for train, `2` for test)  

+ `images/`  
  + each folder represents a video
  + the video/folder name to class mapping can be found in `annos/videos_labels_subsets.txt`, e.g. `v_000001` belongs to class 1, i.e. `ApplyEyeMakeup`
  + each video folder contains 25 frames  
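
A minimal sketch of reading the annotation file, assuming each row holds three tab-separated fields (video name, label, subset), which is what the parsing code later in this notebook implies. The sample rows and helper names below are hypothetical.

```python
def parse_annotation_line(line):
    """Split one row into (video_name, label, subset_name)."""
    name, label, subset = line.strip().split("\t")
    return name, int(label), "train" if subset == "1" else "test"


def split_by_subset(lines, max_label=25):
    """Keep only classes 1..max_label and separate train rows from test rows."""
    train, test = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, label, subset = parse_annotation_line(line)
        if label > max_label:
            continue
        (train if subset == "train" else test).append((name, label))
    return train, test


# Hypothetical rows, for illustration only.
sample_lines = ["v_000001\t1\t2", "v_000045\t1\t1", "v_013320\t101\t1"]
train_rows, test_rows = split_by_subset(sample_lines)
print(train_rows, test_rows)
```

Restricting `max_label` to 25 gives the reduced task; passing `max_label=101` would keep the full dataset for the bonus.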



## Some Tutorials
- Good materials for understanding RNN and LSTM
    - http://blog.echen.me
    - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
    - http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Implementing RNN and LSTM with PyTorch
    - [LSTM with PyTorch](http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#sphx-glr-beginner-nlp-sequence-models-tutorial-py)
    - [RNN with PyTorch](http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)

In [2]:
#importing the necessities

import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
import random # to randomize our batch data

import torch
import torchvision
import torchvision.transforms as transforms 
import torchvision.models as models 

from torch.autograd import Variable 
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import itertools
from scipy.io import savemat 
from scipy.io import loadmat 
from sklearn.svm import LinearSVC
from google.colab import drive

import time # to record our work's time

print ('OpenCV version = ' + cv2.__version__)


OpenCV version = 3.4.3


In [2]:
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
cd '/content/gdrive/My Drive/Kurakula_Mohith_112504214_HW5'

/content/gdrive/My Drive/Kurakula_Mohith_112504214_HW5


In [0]:
# Deliberately exhaust memory to crash Colab so that the runtime RAM is extended to 25 GB
#d=[]
#while(1):
 # d.append('1')

**Getting the video names of the train and test data for both the 101 classes and the first 25 classes**

In [24]:

def videos_labels_subsets(file):
  videos_labels_subsets=[]
  total=[]
  filename=str(file)
  with open(filename) as name:
    for i in name:
      split= i.strip().split(',')
      for j in split:
        row=j.strip().split('\t')
        total.append(row)  
        if (int(row[1]) <= 25):
          videos_labels_subsets.append(row)

  return data(videos_labels_subsets,total)

def data(file1,file2):
  train_data=[]
  test_data=[]
  train_labels=[]
  total_train=[]
  total_test=[]
  test_labels=[]
  video_class_name=[]
  total_train_labels=[]
  total_test_labels=[]
  for i in file1:
    video_class_name.append(i[0])
    if (i[2] == '1'):
      train_data.append(i[0])
      train_labels.append(int(i[1]))
    else:
      test_data.append(i[0])
      test_labels.append(int(i[1]))
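  for i in file2:
    # Names of the first 25 classes were already collected in the loop above;
    # appending them again here would duplicate them in video_class_name.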
    if (i[2] == '1'):
      total_train.append(i[0])
      total_train_labels.append(int(i[1]))
    else:
      total_test.append(i[0])
      total_test_labels.append(int(i[1]))

  return train_data, train_labels, test_data, test_labels,total_train,total_test,total_train_labels,total_test_labels,video_class_name




train_data, train_labels, test_data, test_labels,total_train,total_test,total_train_labels,total_test_labels,video_class = videos_labels_subsets('annos/videos_labels_subsets.txt')
print(video_class)

['v_000001', 'v_000002', 'v_000003', 'v_000004', 'v_000005', 'v_000006', 'v_000007', 'v_000008', 'v_000009', 'v_000010', 'v_000011', 'v_000012', 'v_000013', 'v_000014', 'v_000015', 'v_000016', 'v_000017', 'v_000018', 'v_000019', 'v_000020', 'v_000021', 'v_000022', 'v_000023', 'v_000024', 'v_000025', 'v_000026', 'v_000027', 'v_000028', 'v_000029', 'v_000030', 'v_000031', 'v_000032', 'v_000033', 'v_000034', 'v_000035', 'v_000036', 'v_000037', 'v_000038', 'v_000039', 'v_000040', 'v_000041', 'v_000042', 'v_000043', 'v_000044', 'v_000045', 'v_000046', 'v_000047', 'v_000048', 'v_000049', 'v_000050', 'v_000051', 'v_000052', 'v_000053', 'v_000054', 'v_000055', 'v_000056', 'v_000057', 'v_000058', 'v_000059', 'v_000060', 'v_000061', 'v_000062', 'v_000063', 'v_000064', 'v_000065', 'v_000066', 'v_000067', 'v_000068', 'v_000069', 'v_000070', 'v_000071', 'v_000072', 'v_000073', 'v_000074', 'v_000075', 'v_000076', 'v_000077', 'v_000078', 'v_000079', 'v_000080', 'v_000081', 'v_000082', 'v_000083', 'v_

**Length of the Training and Testing Data for both 101 and 25 classes**

In [5]:
print(train_data)
print(test_data)
print(total_train)
print(total_test)

print("Total Training Data for the 101 classes: {}".format(len(total_train)))
print("Total Testing Data for the 101 classes: {}".format(len(total_test)))
print("Total Training Data for the 25 classes: {}".format(len(train_data)))
print("Total Testing Data for the 25 classes: {}".format(len(test_data)))


['v_000045', 'v_000046', 'v_000047', 'v_000048', 'v_000049', 'v_000050', 'v_000051', 'v_000052', 'v_000053', 'v_000054', 'v_000055', 'v_000056', 'v_000057', 'v_000058', 'v_000059', 'v_000060', 'v_000061', 'v_000062', 'v_000063', 'v_000064', 'v_000065', 'v_000066', 'v_000067', 'v_000068', 'v_000069', 'v_000070', 'v_000071', 'v_000072', 'v_000073', 'v_000074', 'v_000075', 'v_000076', 'v_000077', 'v_000078', 'v_000079', 'v_000080', 'v_000081', 'v_000082', 'v_000083', 'v_000084', 'v_000085', 'v_000086', 'v_000087', 'v_000088', 'v_000089', 'v_000090', 'v_000091', 'v_000092', 'v_000093', 'v_000094', 'v_000095', 'v_000096', 'v_000097', 'v_000098', 'v_000099', 'v_000100', 'v_000101', 'v_000102', 'v_000103', 'v_000104', 'v_000105', 'v_000106', 'v_000107', 'v_000108', 'v_000109', 'v_000110', 'v_000111', 'v_000112', 'v_000113', 'v_000114', 'v_000115', 'v_000116', 'v_000117', 'v_000118', 'v_000119', 'v_000120', 'v_000121', 'v_000122', 'v_000123', 'v_000124', 'v_000125', 'v_000126', 'v_000127', 'v_

**Getting the cropped images**

In [0]:
def img_norm(img):
    size = (224,224)
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    prep = transforms.Compose([ transforms.ToTensor(), normalize ])
    return prep(img) 

def cropping(image, image_size):
    size_height, size_width = image_size
    crop_parts = []
    is_tensor = torch.is_tensor(image)
    if (is_tensor):
        temp = image.numpy()
        temp = np.transpose(temp, [1, 2, 0])
        image = temp
        
    height, width, color = image.shape
    crop_parts.append(image[0:size_height, 0:size_width]) 
    crop_parts.append(image[0:size_height, (width-size_width):width]) 
    crop_parts.append(image[(height-size_height):height, 0:size_width])
    crop_parts.append(image[(height-size_height):height, (width-size_width):width])
    mid_x = int(width/2 - size_width/2)
    mid_y = int(height/2 - size_height/2)
    crop_parts.append(image[mid_y:(mid_y + size_height), mid_x:(mid_x + size_width)])

    return crop_parts
def prep_image(image, image_size):
    return (cropping(img_norm(image), image_size))
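
The arithmetic behind `img_norm` can be sketched for a single pixel. `transforms.ToTensor()` rescales 8-bit values from [0, 255] to [0, 1], and `transforms.Normalize` then computes `(x - mean) / std` per channel. The helper below is illustrative plain Python and assumes the pixel is given in RGB order, matching the order of the ImageNet statistics.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # per-channel mean, RGB order
IMAGENET_STD = (0.229, 0.224, 0.225)   # per-channel std, RGB order


def normalize_pixel(rgb):
    """Apply the ToTensor scaling and the Normalize step to one 8-bit RGB pixel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print(black)
print(white)
```

A black pixel maps to `-mean / std` in each channel and a white pixel to `(1 - mean) / std`, so normalized values are roughly centered on zero for typical natural images.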

**Because Colab crashes when loading large amounts of data, I used the code below to process the first 150 videos, then the next 150, and so on**

In [25]:
##class_names = [name[len('images/'):] for name in glob.glob('images/*')]
#print(class_names)
#class_names = dict(zip(range(len(class_names)), class_names))
#class_names = dict(zip(range(len(video_class)), video_class))
#print(class_names)
#print(len(class_names))
class1=dict(zip(range(len(video_class[0:150])), video_class[0:150]))
#class2=dict(zip(range(len(video_class[900:1800])), video_class[900:1800]))
#class3=dict(zip(range(len(video_class[1800:2700])), video_class[1800:2700]))
class4=dict(zip(range(len(video_class[3300:])), video_class[3300:]))
print(class1)
print(class4)





{0: 'v_000001', 1: 'v_000002', 2: 'v_000003', 3: 'v_000004', 4: 'v_000005', 5: 'v_000006', 6: 'v_000007', 7: 'v_000008', 8: 'v_000009', 9: 'v_000010', 10: 'v_000011', 11: 'v_000012', 12: 'v_000013', 13: 'v_000014', 14: 'v_000015', 15: 'v_000016', 16: 'v_000017', 17: 'v_000018', 18: 'v_000019', 19: 'v_000020', 20: 'v_000021', 21: 'v_000022', 22: 'v_000023', 23: 'v_000024', 24: 'v_000025', 25: 'v_000026', 26: 'v_000027', 27: 'v_000028', 28: 'v_000029', 29: 'v_000030', 30: 'v_000031', 31: 'v_000032', 32: 'v_000033', 33: 'v_000034', 34: 'v_000035', 35: 'v_000036', 36: 'v_000037', 37: 'v_000038', 38: 'v_000039', 39: 'v_000040', 40: 'v_000041', 41: 'v_000042', 42: 'v_000043', 43: 'v_000044', 44: 'v_000045', 45: 'v_000046', 46: 'v_000047', 47: 'v_000048', 48: 'v_000049', 49: 'v_000050', 50: 'v_000051', 51: 'v_000052', 52: 'v_000053', 53: 'v_000054', 54: 'v_000055', 55: 'v_000056', 56: 'v_000057', 57: 'v_000058', 58: 'v_000059', 59: 'v_000060', 60: 'v_000061', 61: 'v_000062', 62: 'v_000063', 6
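
The manual slicing above could also be written as a generator that yields one chunk at a time. This is only a sketch: it assumes the first 25 classes cover 3360 consecutively numbered videos (2409 train + 951 test, as reported later in this notebook), and the name `chunks` is illustrative.

```python
def chunks(names, size):
    """Yield consecutive slices of `names`, each with at most `size` entries."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Assumed naming: v_000001 .. v_003360 for the first 25 classes.
video_names = ["v_%06d" % k for k in range(1, 3361)]

parts = list(chunks(video_names, 150))
print(len(parts), len(parts[0]), len(parts[-1]))

# Each chunk can be turned into the index -> name dict used by load_dataset.
last_chunk = dict(enumerate(parts[-1]))
```

Under that assumption the final chunk holds the last 60 videos, starting at `v_003301`.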

In [0]:
def load_dataset(class_names,path, img_size, batch_num=1, is_color=False):
  data = []
  if is_color:
    channel_num = 3
  else:
    channel_num = 1
  for id, class_name in class_names.items():
    print("loading {}".format(class_name))
    img_path_class = glob.glob(path + class_name + '/*.jpg')
    for filename in img_path_class:
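      # NOTE: cv2.imread returns BGR, while the ImageNet mean/std in img_norm are in RGB order;
      # convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) if RGB input is intended.
      img = cv2.imread(filename) 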
      crops = prep_image(img, img_size)
      for i in crops: 
        data.append(np.transpose(i, [2, 0, 1]))
                
    
  if batch_num > 1:
    batch_data = []
    for i in range(int(len(data) / batch_num)):
      minibatch_d = np.array(data[i*batch_num: (i+1)*batch_num])
      minibatch_d = np.reshape(minibatch_d, (batch_num, channel_num, img_size[0], img_size[1]))
      batch_data.append(torch.from_numpy(minibatch_d))
    data = batch_data
        
  return data

**Because Colab crashes when loading large amounts of data, I loaded 150 videos at a time and extracted their features before loading the next batch. The notebook displays the loading of the first 150 videos and the feature saving for the first 150 and the last 60 videos.**

In [0]:
start = time.perf_counter()  # time.clock() was removed in Python 3.8
img_size = (224, 224)
print(class1)
data1 = load_dataset(class4,'images/', img_size, batch_num=5, is_color=True)
end = time.perf_counter()
data1_num = len(data1)
print("Time Taken: ", int((end-start)//60), "min")

{0: 'v_000001', 1: 'v_000002', 2: 'v_000003', 3: 'v_000004', 4: 'v_000005', 5: 'v_000006', 6: 'v_000007', 7: 'v_000008', 8: 'v_000009', 9: 'v_000010', 10: 'v_000011', 11: 'v_000012', 12: 'v_000013', 13: 'v_000014', 14: 'v_000015', 15: 'v_000016', 16: 'v_000017', 17: 'v_000018', 18: 'v_000019', 19: 'v_000020', 20: 'v_000021', 21: 'v_000022', 22: 'v_000023', 23: 'v_000024', 24: 'v_000025', 25: 'v_000026', 26: 'v_000027', 27: 'v_000028', 28: 'v_000029', 29: 'v_000030', 30: 'v_000031', 31: 'v_000032', 32: 'v_000033', 33: 'v_000034', 34: 'v_000035', 35: 'v_000036', 36: 'v_000037', 37: 'v_000038', 38: 'v_000039', 39: 'v_000040', 40: 'v_000041', 41: 'v_000042', 42: 'v_000043', 43: 'v_000044', 44: 'v_000045', 45: 'v_000046', 46: 'v_000047', 47: 'v_000048', 48: 'v_000049', 49: 'v_000050', 50: 'v_000051', 51: 'v_000052', 52: 'v_000053', 53: 'v_000054', 54: 'v_000055', 55: 'v_000056', 56: 'v_000057', 57: 'v_000058', 58: 'v_000059', 59: 'v_000060', 60: 'v_000061', 61: 'v_000062', 62: 'v_000063', 6

---
---
## **Problem 1.** Feature extraction

**VGG**

In [11]:
vgg = models.vgg16(pretrained=True) 
print (vgg)
net=list(vgg.classifier)[0:2]
net=nn.Sequential(*net)
print("***************after Modifying***************")
vgg.classifier= net
print(net)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

**Feature Extraction**

In [0]:
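def features(data, i, names=None):
    # Save each group of 25 frame features (one video) to features/<video_name>.mat.
    # `names` maps index -> video name; it defaults to class1, so pass the dict that
    # matches the slice being processed (e.g. class4).
    if names is None:
        names = class1
    data = np.array(data)
    batch_num = 25  # frames per video
    for j in range(int(len(data) /batch_num)): 
        feature = np.array(data[j*batch_num: (j+1)*batch_num],dtype = 'float32')
        features_dict = {}
        features_dict['Feature'] =  feature
        video_name = names[i]
        i=i+1
        savemat('features/'+ video_name, features_dict) 

def outputs_feat(outs):
    # Average the feature vectors of the five crops of one frame.
    data = []
    for i in outs: 
        data.append((i.data).numpy())
    return (np.sum(data, axis = 0)/len(outs)).tolist() 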


In [0]:
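start_time = time.perf_counter()  # time.clock() was removed in Python 3.8

features_extract = []  # averaged features of the frames of the current video
arr= []  # features of the five crops of the current frame
i=0

for data in data1:  # each minibatch holds the five crops of one frame
    images = data 
    output = vgg(Variable(images)) 
    for out in output: 
        arr.append(out)
    del images
    del output
    
    features_extract.append(outputs_feat(arr)) 
    arr = [] 
    if (len(features_extract) == 25):  # 25 frames collected: one complete video
        features(features_extract,i) 
        print ("Video saved: {}".format(class1[i])) 
        i=i+1
        features_extract = [] 
    
end_time = time.perf_counter()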
     

	 Video saved: v_000001
	 Video saved: v_000002
	 Video saved: v_000003
	 Video saved: v_000004
	 Video saved: v_000005
	 Video saved: v_000006
	 Video saved: v_000007
	 Video saved: v_000008
	 Video saved: v_000009
	 Video saved: v_000010
	 Video saved: v_000011
	 Video saved: v_000012
	 Video saved: v_000013
	 Video saved: v_000014
	 Video saved: v_000015
	 Video saved: v_000016
	 Video saved: v_000017
	 Video saved: v_000018
	 Video saved: v_000019
	 Video saved: v_000020
	 Video saved: v_000021
	 Video saved: v_000022
	 Video saved: v_000023
	 Video saved: v_000024
	 Video saved: v_000025
	 Video saved: v_000026
	 Video saved: v_000027
	 Video saved: v_000028
	 Video saved: v_000029
	 Video saved: v_000030
	 Video saved: v_000031
	 Video saved: v_000032
	 Video saved: v_000033
	 Video saved: v_000034
	 Video saved: v_000035
	 Video saved: v_000036
	 Video saved: v_000037
	 Video saved: v_000038
	 Video saved: v_000039
	 Video saved: v_000040
	 Video saved: v_000041
	 Video saved: v

In [0]:
print("Time Taken: {} min ".format(int((end_time-start_time)//60)))

Time Taken:  145 min


In [0]:
start_time = time.perf_counter()  # time.clock() was removed in Python 3.8

features_extract = [] 
arr= [] 
i=0

for data in data1: 
    images = data 
    output = vgg(Variable(images)) 
    for out in output: 
        arr.append(out)
    del images
    del output
    
    features_extract.append(outputs_feat(arr)) 
    arr = [] 
    if (len(features_extract) == 25): 
        # NOTE: features() looks up the video name in class1 by default;
        # make sure it uses class4 when saving this slice.
        features(features_extract,i) 
        print ("\t Video saved: " + class4[i]) 
        i=i+1
        features_extract = [] 
    
end_time = time.perf_counter()
     

	 Video saved: v_003301
	 Video saved: v_003302
	 Video saved: v_003303
	 Video saved: v_003304
	 Video saved: v_003305
	 Video saved: v_003306
	 Video saved: v_003307
	 Video saved: v_003308
	 Video saved: v_003309
	 Video saved: v_003310
	 Video saved: v_003311
	 Video saved: v_003312
	 Video saved: v_003313
	 Video saved: v_003314
	 Video saved: v_003315
	 Video saved: v_003316
	 Video saved: v_003317
	 Video saved: v_003318
	 Video saved: v_003319
	 Video saved: v_003320
	 Video saved: v_003321
	 Video saved: v_003322
	 Video saved: v_003323
	 Video saved: v_003324
	 Video saved: v_003325
	 Video saved: v_003326
	 Video saved: v_003327
	 Video saved: v_003328
	 Video saved: v_003329
	 Video saved: v_003330
	 Video saved: v_003331
	 Video saved: v_003332
	 Video saved: v_003333
	 Video saved: v_003334
	 Video saved: v_003335
	 Video saved: v_003336
	 Video saved: v_003337
	 Video saved: v_003338
	 Video saved: v_003339
	 Video saved: v_003340
	 Video saved: v_003341
	 Video saved: v

***
***
## **Problem 2.** Modelling

* ##### **Print the size of your training and test data**

**Loading the extracted features from the drive**




In [0]:
def data_features(dataset_path, size, feat_name, label, batch_num):
  data = []
  labels = label
  data_names=[]
  for data_name in feat_name:
    data_names.append(data_name)
    feat = (loadmat( dataset_path + data_name +'.mat'))['Feature']
    data.append(feat)    
  if batch_num >= 1:
    batch_data = []
    batch_labels = []
    for i in range(int(len(data) / batch_num)):
      minibatch_d = np.array(data[i*batch_num: (i+1)*batch_num])
      minibatch_d = np.reshape(minibatch_d, (batch_num, size[0], size[1]))
      batch_data.append(torch.from_numpy(minibatch_d))
            
      minibatch_l = labels[i*batch_num: (i+1)*batch_num]
      batch_labels.append(torch.LongTensor(minibatch_l))
  print(len(data_names))

  return list(zip(batch_data, batch_labels))

In [29]:
start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
size = (25,4096)  # 25 frames per video, 4096-dim feature per frame
training_data = data_features('features/', size, train_data, train_labels, batch_num=10)
testing_data = data_features('features/', size, test_data, test_labels, batch_num=10)
end_time = time.perf_counter()
print("Time Taken:{} ".format(end_time-start_time))

2409
951
Time Taken:7.635409999999979 


**Length of Training and Testing Data**

In [16]:
# Don't hardcode the shape of train and test data
print('The  training data is :{} minibatches(=10) of the total {} data'.format(len(training_data),len(train_data)))
print('Shape of test data is :{} minibatches(=10) of the total {} data'.format(len(testing_data),len(test_data)))

The  training data is :240 minibatches(=10) of the total 2409 data
Shape of test data is :95 minibatches(=10) of the total 951 data
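
The minibatch counts printed above follow from integer division: `data_features` builds `int(len(data) / batch_num)` full minibatches and leaves out any remainder. A small sketch of that arithmetic, with an illustrative helper name:

```python
def batch_counts(num_samples, batch_size):
    """Return (number of full minibatches, number of samples left out)."""
    full = num_samples // batch_size
    dropped = num_samples - full * batch_size
    return full, dropped


train_batches, train_dropped = batch_counts(2409, 10)
test_batches, test_dropped = batch_counts(951, 10)
print(train_batches, train_dropped)
print(test_batches, test_dropped)
```

With a minibatch size of 10, the last 9 training samples and the last test sample are therefore not used.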


In [32]:
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.LSTM(input_size=4096, hidden_size = 64, num_layers = 2, batch_first=True)
        self.fc1 = nn.Linear(64, 25) 
    def forward(self, x):
        output, hidden = self.rnn(x, None) 
        x = self.fc1(output[:, -1, :])
        return x

rnn_model = RNN()
print(rnn_model)
optimizer = optim.SGD(rnn_model.parameters(), lr = 0.001, momentum=0.9) 
cross_entropy_loss = nn.CrossEntropyLoss()


RNN(
  (rnn): LSTM(4096, 64, num_layers=2, batch_first=True)
  (fc1): Linear(in_features=64, out_features=25, bias=True)
)
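
As a rough check on the size of the model printed above, the sketch below counts its trainable parameters by hand. It assumes PyTorch's `nn.LSTM` layout (four gates per layer, each with input weights, recurrent weights, and two separate bias vectors) and a biased `nn.Linear`, as the printout shows; the helper names are illustrative.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters of one LSTM layer: 4 gates x (W_ih + W_hh + b_ih + b_hh)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def linear_params(in_features, out_features):
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


layer1 = lstm_layer_params(4096, 64)  # first LSTM layer reads the 4096-dim frame features
layer2 = lstm_layer_params(64, 64)    # second layer reads the 64-dim hidden state of the first
head = linear_params(64, 25)          # classifier over the 25 classes
total = layer1 + layer2 + head
print(layer1, layer2, head, total)
```

Almost all parameters sit in the input weights of the first LSTM layer, which is a direct consequence of feeding 4096-dim features into a 64-unit hidden state.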


In [0]:
def svm_data(svm_t_data):
    svm_data = []
    svm_labels = []
    for data in svm_t_data:
        inps, labels = data
        for i in labels:
            svm_labels.append(int(i))  # store plain ints rather than 0-dim tensors
        for j in inps:
            np_arr = np.array(j.numpy())
            temp = len(np_arr)*len(np_arr[0])
            temp_d = np.reshape(np_arr,temp)
            svm_data.append(temp_d)
        
    return svm_data, svm_labels
svm_train,svm_train_labels = svm_data(training_data)
svm_test,svm_test_labels= svm_data(testing_data)

---
---
## **Problem 3.** Evaluation

In [34]:

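def train_model(rnn_model,epoch,training_data,optimizer,cross_entropy_loss):
  start_train = time.perf_counter()  # time.clock() was removed in Python 3.8
  for epoch in range(epoch): 
    loss = 0
    for i, data in enumerate(training_data, 0):
      inps, labels = data
      # Labels in the annotation file are 1-based; CrossEntropyLoss expects 0-based classes.
      inps, labels = Variable(inps), Variable(labels-1)
      optimizer.zero_grad()
      rnn_model.train(True) 
      outs = rnn_model(inps)
      cross_loss = cross_entropy_loss(outs, labels) 
      cross_loss.backward() 
      optimizer.step()
      loss = loss + cross_loss.data
    # NOTE: the sum of per-batch losses is divided by a fixed 60, not by len(training_data),
    # so the printed value is a scaled sum rather than the mean batch loss.
    print('Epoch:{} Loss:{:.2f}'.format(epoch+1,loss/60))
  end_train = time.perf_counter()
  time_1=end_train-start_train
  return time_1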
time_1=train_model(rnn_model,50,training_data,optimizer,cross_entropy_loss)
print("Time Taken for Training the Data :{}".format(time_1))

Epoch:1 Loss:13.10
Epoch:2 Loss:12.85
Epoch:3 Loss:12.54
Epoch:4 Loss:12.12
Epoch:5 Loss:11.50
Epoch:6 Loss:10.68
Epoch:7 Loss:9.65
Epoch:8 Loss:8.49
Epoch:9 Loss:7.30
Epoch:10 Loss:6.46
Epoch:11 Loss:5.71
Epoch:12 Loss:5.29
Epoch:13 Loss:4.57
Epoch:14 Loss:4.52
Epoch:15 Loss:4.43
Epoch:16 Loss:4.29
Epoch:17 Loss:4.45
Epoch:18 Loss:3.88
Epoch:19 Loss:3.34
Epoch:20 Loss:2.64
Epoch:21 Loss:2.18
Epoch:22 Loss:1.63
Epoch:23 Loss:1.62
Epoch:24 Loss:1.58
Epoch:25 Loss:1.59
Epoch:26 Loss:1.22
Epoch:27 Loss:1.29
Epoch:28 Loss:0.91
Epoch:29 Loss:0.83
Epoch:30 Loss:0.89
Epoch:31 Loss:1.07
Epoch:32 Loss:2.12
Epoch:33 Loss:1.79
Epoch:34 Loss:1.40
Epoch:35 Loss:1.04
Epoch:36 Loss:0.78
Epoch:37 Loss:1.04
Epoch:38 Loss:1.30
Epoch:39 Loss:0.72
Epoch:40 Loss:0.68
Epoch:41 Loss:0.41
Epoch:42 Loss:0.34
Epoch:43 Loss:0.24
Epoch:44 Loss:0.18
Epoch:45 Loss:0.16
Epoch:46 Loss:0.10
Epoch:47 Loss:0.08
Epoch:48 Loss:0.07
Epoch:49 Loss:0.07
Epoch:50 Loss:0.06
Time Taken for Training the Data :706.7208290000001
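
The scale of the printed losses can be interpreted with a short calculation. `train_model` sums the per-batch cross-entropy over all minibatches and divides by a fixed 60, while there are 240 training minibatches, so each printed value is four times the mean batch loss. The sketch below assumes the default mean reduction of `nn.CrossEntropyLoss`; the helper name is illustrative.

```python
import math

NUM_CLASSES = 25
NUM_BATCHES = 240   # training minibatches reported above
DIVISOR = 60        # constant used in the print statement of train_model

# Cross-entropy of a classifier that assigns equal probability to every class.
chance_loss = math.log(NUM_CLASSES)

# What the training loop would print for such a classifier.
printed_at_chance = chance_loss * NUM_BATCHES / DIVISOR


def to_mean_loss(printed_value):
    """Convert a printed epoch loss back to the mean loss per minibatch."""
    return printed_value * DIVISOR / NUM_BATCHES


print(round(chance_loss, 4), round(printed_at_chance, 2))
print(round(to_mean_loss(13.10), 3), round(to_mean_loss(0.06), 3))
```

Read this way, the first epoch sits close to chance level, and the final epoch corresponds to a much smaller mean loss per minibatch than the raw printed number suggests.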


In [35]:
def test_model(rnn_model,testing_data):
  running_corrects= 0
  total = 0
  test_start=time.perf_counter()  # time.clock() was removed in Python 3.8
  for data in testing_data:
    imgs, labels = data
    labels = labels - 1
    rnn_model.train(False)
    outs = rnn_model(Variable(imgs))
    _, predicted = torch.max(outs.data, 1)
    total =total + labels.size(0)
    running_corrects =running_corrects+ (predicted == labels).sum()
  test_end=time.perf_counter()
  test_time=int(test_end-test_start)
  return test_time,running_corrects,total

test_time,running_corrects,total=test_model(rnn_model,testing_data)
print("The accuracy of Model for the test Data is : {}%".format(100*running_corrects/total))
print("Total Testing Time for test data: {:.6f}".format(test_time))

test_time1,running_corrects1,total1=test_model(rnn_model,training_data)
print("The accuracy of Model for the train Data is : {}%".format(100*running_corrects1/total1))
print("Total Testing Time for train data: {:.6f}".format(test_time1))  



The accuracy of Model for the test Data is : 81%
Total Testing Time for test data: 2.000000
The accuracy of Model for the train Data is : 99%
Total Testing Time for train data: 5.000000


**SVM**

In [22]:
svm=LinearSVC()
train_start = time.time()
svm.fit((svm_train),svm_train_labels)
train_end = time.time()

test_start=time.time()
prediction=svm.predict(svm_test)
test_end=time.time()



In [36]:
train_time = train_end-train_start
test_time =  test_end -test_start
accuracy = (np.sum(prediction == svm_test_labels)) / float(len(svm_test_labels))
print("The accuracy of my Model is : {:.2f}%".format(accuracy*100))
print("Total Training Time: {:.6f}".format(train_time))
print("Total Testing Time: {:.6f}".format(test_time))

The accuracy of my Model is : 89.58%
Total Training Time: 242.238757
Total Testing Time: 0.542239


In [37]:
test_start1=time.time()
prediction1=svm.predict(svm_train)
test_end1=time.time()
test_time1 =  test_end1 -test_start1
accuracy1 = (np.sum(prediction1 == svm_train_labels)) / float(len(svm_train_labels))
print("The accuracy of my Model for Train Data is : {:.2f}%".format(accuracy1*100))
print("Total Testing Time For Train Data: {:.4f}".format(test_time1))

The accuracy of my Model for Train Data is : 100.00%
Total Testing Time For Train Data: 1.7612
