# CSE527 Homework 4
**Due date: 23:59 on Nov. 5, 2019 (Thuesday)**

In this semester, we will use Google Colab for the assignments, which allows us to utilize resources that some of us might not have in their local machines such as GPUs. You will need to use your Stony Brook (*.stonybrook.edu) account for coding and Google Drive to save your results.

## Google Colab Tutorial
---
Go to https://colab.research.google.com/notebooks/, you will see a tutorial named "Welcome to Colaboratory" file, where you can learn the basics of using google colab.

Settings used for assignments: ***Edit -> Notebook Settings -> Runtime Type (Python 3)***.


## Description
---
This project is an introduction to deep learning tools for computer vision. You will design and train deep convolutional networks for scene recognition using [PyTorch](http://pytorch.org). You can visualize the
structure of the network with [mNeuron] (http://vision03.csail.mit.edu/cnn_art/index.html)

Remember Homework 3: Scene recognition with bag of words. You worked hard to design a bag of features representations that achieved 60% to 70% accuracy (most likely) on 16-way scene classification. We're going to attack the same task with deep learning and get higher accuracy. Training from scratch won't work quite as well as homework 3 due to the insufficient amount of data, fine-tuning an existing network will work much better than homework 3.

In Problem 1 of the project you will train a deep convolutional network from scratch to recognize scenes. The starter codes gives you methods to load data and display them. You will need to define a simple network architecture and add jittering, normalization, and regularization to increase recognition accuracy to 50, 60, or perhaps 70%. Unfortunately, we only have 2,400 training examples so it doesn't seem possible to train a network from scratch which outperforms hand-crafted features

For Problem 2 you will instead fine-tune a pre-trained deep network to achieve about 85% accuracy on the task. We will use the pretrained AlexNet network which was not trained to recognize scenes at all. 

These two approaches represent the most common approaches to recognition problems in computer vision today -- train a deep network from scratch if you have enough data (it's not always obvious whether or not you do), and if you cannot then instead fine-tune a pre-trained network.

There are 2 problems in this homework with a total of 110 points including 10 bonus points. Be sure to read **Submission Guidelines** below. They are important. For the problems requiring text descriptions, you might want to add a markdown block for that.

## Dataset
---
Save the [dataset(click me)](https://drive.google.com/open?id=1NWC3TMsXSWN2TeoYMCjhf2N1b-WRDh-M) into your working folder in your Google Drive for this homework. <br>
Under your root folder, there should be a folder named "data" (i.e. XXX/Surname_Givenname_SBUID/data) containing the images.
**Do not upload** the data subfolder before submitting on blackboard due to size limit. There should be only one .ipynb file under your root folder Surname_Givenname_SBUID.

## Some Tutorials (PyTorch)
---
- You will be using PyTorch for deep learning toolbox (follow the [link](http://pytorch.org) for installation).
- For PyTorch beginners, please read this [tutorial](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) before doing your homework.
- Feel free to study more tutorials at http://pytorch.org/tutorials/.
- Find cool visualization here at http://playground.tensorflow.org.


## Starter Code
---
In the starter code, you are provided with a function that loads data into minibatches for training and testing in PyTorch.

In [0]:
# import packages here
import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
import random 
import time

import torch
import torchvision
import torchvision.transforms as transforms

from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import time
from torchvision import datasets, models, transforms



In [0]:
# Mount your google drive where you've saved your assignment folder
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
# Set your working directory (in your google drive)
# Note that 'gdrive/My Drive/Y2019Fall/CSE-527-Intro-To-Computer-Vision/hw4' is just an example, 
#   change it to your specific homework directory.
cd '/content/gdrive/My Drive/Duppala_Sai_112684112_hw4'

/content/gdrive/My Drive/Duppala_Sai_112684112_hw4


## Problem 1: Training a Network From Scratch
{Part 1: 35 points} Gone are the days of hand designed features. Now we have end-to-end learning in which a highly non-linear representation is learned for our data to maximize our objective (in this case, 16-way classification accuracy). Instead of 70% accuracy we can now recognize scenes with... 25% accuracy. OK, that didn't work at all. Try to boost the accuracy by doing the following:

**Data Augmentation**: We don't have enough training data, let's augment the training data.
If you left-right flip (mirror) an image of a scene, it never changes categories. A kitchen doesn't become a forest when mirrored. This isn't true in all domains — a "d" becomes a "b" when mirrored, so you can't "jitter" digit recognition training data in the same way. But we can synthetically increase our amount of training data by left-right mirroring training images during the learning process.

After you implement mirroring, you should notice that your training error doesn't drop as quickly. That's actually a good thing, because it means the network isn't overfitting to the 2,400 original training images as much (because it sees 4,800 training images now, although they're not as good as 4,800 truly independent samples). Because the training and test errors fall more slowly, you may need more training epochs or you may try modifying the learning rate. You should see a roughly 10% increase in accuracy by adding mirroring. You are **required** to implement mirroring as data augmentation for this part.

You can try more elaborate forms of jittering -- zooming in a random amount, rotating a random amount, taking a random crop, etc. These are not required, you might want to try these in the bonus part.

**Data Normalization**: The images aren't zero-centered. One simple trick which can help a lot is to subtract the mean from every image. It would arguably be more proper to only compute the mean from the training images (since the test/validation images should be strictly held out) but it won't make much of a difference. After doing this you should see another 15% or so increase in accuracy. This part is **required**.

**Network Regularization**: Add dropout layer. If you train your network (especially for more than the default 30 epochs) you'll see that the training error can decrease to zero while the val top1 error hovers at 40% to 50%. The network has learned weights which can perfectly recognize the training data, but those weights don't generalize to held out test data. The best regularization would be more training data but we don't have that. Instead we will use dropout regularization.

What does dropout regularization do? It randomly turns off network connections at training time to fight overfitting. This prevents a unit in one layer from relying too strongly on a single unit in the previous layer. Dropout regularization can be interpreted as simultaneously training many "thinned" versions of your network. At test, all connections are restored which is analogous to taking an average prediction over all of the "thinned" networks. You can see a more complete discussion of dropout regularization in this [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).

The dropout layer has only one free parameter — the dropout rate — the proportion of connections that are randomly deleted. The default of 0.5 should be fine. Insert a dropout layer between your convolutional layers. In particular, insert it directly before your last convolutional layer. Your test accuracy should increase by another 10%. Your train accuracy should decrease much more slowly. That's to be expected — you're making life much harder for the training algorithm by cutting out connections randomly. 

If you increase the number of training epochs (and maybe decrease the learning rate) you should be able to achieve around 50% test accuracy. In this part, you are **required** to add dropout layer to your network.

Please give detailed descriptions of your network layout in the following format:<br>
Data augmentation: [descriptions]<br>
Data normalization: [descriptions]<br>
Layer 1: [layer_type]: [Parameters]<br>
Layer 2: [layer_type]: [Parameters]<br>
...<br>
Then report the final accuracy on test set and time consumed for training and testing separately.

{Part 2: 15 points} Try **three techniques** taught in the class to increase the accuracy of your model. Such as increasing training data by randomly rotating training images, adding batch normalization, different activation functions (e.g., sigmoid) and model architecture modification. Note that too many layers can do you no good due to insufficient training data. Clearly describe your method and accuracy increase/decrease for each of the three techniques.

#PROBLEM 1
#PART I

In [0]:
#load dataset
def load_dataset2(data_path,img_size, batch_num, shuffle, augment, zero_centered, rotate_random):

    #Transform
    transform = transforms.Compose([
        transforms.Resize((img_size,img_size)),
        transforms.ToTensor(),
        #DATA NORMALIZATION
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    
    #ROTATE RANDOM
    if rotate_random == True:
        transform = transforms.Compose([
            transforms.RandomRotation(degrees = (30,-30)),
            transforms.Resize((img_size,img_size)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
        

    train_dataset = torchvision.datasets.ImageFolder(
        root=data_path,
        transform=transform
    )

    #AUGMENTATION
    if augment == True:
        transform = transforms.Compose([
            transforms.RandomHorizontalFlip(p=1),
            transforms.Resize((img_size,img_size)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
        #RANDOM ROTATION
        if rotate_random == True:
            transform = transforms.Compose([
                transforms.RandomHorizontalFlip(p=1),
                transforms.RandomRotation(degrees = (30,-30)),
                transforms.Resize((img_size,img_size)),
                transforms.ToTensor(),
                #DATA NORMALIZATION
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
            
        train_dataset_mirror = torchvision.datasets.ImageFolder(
            root=data_path,
            transform=transform
        )

        train_dataset += train_dataset_mirror


    print("size = ", len(train_dataset))
    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_num,
        num_workers=0,
        shuffle=True
    )
    return train_loader

batch_num = 50
img_size = 64



In [0]:
# trainloader_no_augment = load_dataset2('./data/train/', 64, batch_num, shuffle=True, augment=False, zero_centered=True, rotate_random= False)
trainloader = load_dataset2('./data/train/', img_size, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= False)

testloader = load_dataset2('./data/test/', img_size, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= False)

size =  4800
size =  800


In [0]:
classes = ('Forest', 'Industrial', 'Flower','Coast','InsideCity', 'Office', 'Bedroom', 'Highway', 'Street', 'TallBuilding', 'LivingRoom', 'Suburb', 'OpenCountry', 'Mountain', 'Kitchen', 'Store')

In [0]:
import matplotlib.pyplot as plt
import numpy as np

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()


#       Define Network Architecture
## Data augmentation: 
Done in load_dataset2
## Data normalization: 
Done in load_dataset2
##Network
<ul><li>
Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
</li><li>(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
</li><li>(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
</li><li>(fc1): Linear(in_features=2704, out_features=120, bias=True)
</li><li>(fc2): Linear(in_features=120, out_features=84, bias=True)
</li><li>(fc3): Linear(in_features=84, out_features=16, bias=True)
</li><li>(dropout): Dropout(p=0.5, inplace=False)
</li>
</ul>

In [0]:
# ==========================================
#       Define Network Architecture
# Data augmentation: [descriptions]
# Data normalization: [descriptions]
# Layer 1: [layer_type]: [Parameters]
# Layer 2: [layer_type]: [Parameters]
# ==========================================
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #3 input channeles, 6 output
        #5x5 conv
        self.conv1 = nn.Conv2d(3, 6, 5)
        #maxpooling
        self.pool = nn.MaxPool2d(2, 2)
        #convolve again
        self.conv2 = nn.Conv2d(6, 16, 5)
        #13*13 from image dimension 
        #and 16 layers
        self.fc1 = nn.Linear(16 * 13 * 13, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 16)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # print(x.shape)
        x = x.view(-1, self.num_flat_features(x))
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()

In [0]:
# ==========================================
#         Optimize/Train Network
# ==========================================

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


In [0]:
# ==========================================
#            Evaluating Network
# ==========================================

def EvaluateNetwork(trainloader, optimizer, net, epocs):
    net = net.cuda()
    for epoch in range(epocs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            inputs, labels = inputs.cuda(), labels.cuda()
            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            # print statistics
            running_loss += loss.item()
            if i % 20 == 19:    # print every 10 mini-batches
                print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / 20))
                running_loss = 0.0

    print('Finished Training')

In [0]:
EvaluateNetwork(trainloader, optimizer, net, 60)

In [0]:
def getAccuracy(testloader, net, outputs):
    correct = 0
    total = 0
    net = net.cuda()
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.cuda(),labels.cuda()
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 400 test images: %d %%' % (
        100 * correct / total))


In [0]:
dataiter = iter(testloader)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()

output_ = net(images)
_, predicted = torch.max(output_, 1)
getAccuracy(testloader, net, output_)

Accuracy of the network on the 400 test images: 41 %


#PART 2

#RANDOM ROTATION

In [0]:
#RANDOM ROTATION

trainloader_rot_ran = load_dataset2('./data/train/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= True)

net2 = Net()
criterion = nn.CrossEntropyLoss()
optimizer_1 = optim.SGD(net2.parameters(), lr=0.001, momentum=0.9)

EvaluateNetwork(trainloader_rot_ran,optimizer_1, net2, 60)


In [0]:
testloader_1 = load_dataset2('./data/test/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= False)

dataiter = iter(testloader_1)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()

outputs_1 = net2(images)
_, predicted = torch.max(outputs_1, 1)
getAccuracy(testloader_1, net2, outputs_1)

size =  800
Accuracy of the network on the 400 test images: 39 %


#RANDOM ROTATION AND DIFFERENT ACTIVATION FUNCTIONS


In [0]:
#RANDOM ROTATION AND DIFFERENT ACTIVATION FUNCTIONS

class Net_Activation_Functions(nn.Module):
    def __init__(self):
        super(Net_Activation_Functions, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 13 * 13, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 16)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # print(x.shape)
        x = x.view(-1, self.num_flat_features(x))
        x = self.dropout(F.relu6(self.fc1(x)))
        x = self.dropout(F.relu6(self.fc2(x)))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net_AF = Net_Activation_Functions()


In [0]:
trainloader_rot_ran_plus_AF = load_dataset2('./data/train/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= True)

criterion = nn.CrossEntropyLoss()
optimizer_2 = optim.SGD(net_AF.parameters(), lr=0.001, momentum=0.9)

EvaluateNetwork(trainloader_rot_ran_plus_AF, optimizer_2, net_AF, 60)

In [0]:
testloader_2 = load_dataset2('./data/test/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= True)

dataiter = iter(testloader_2)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()

outputs_2 = net_AF(images)
_, predicted = torch.max(outputs_2, 1)
getAccuracy(testloader_2, net_AF, outputs_2)

#RANDOM ROTATION AND BATCH NORMALIZATION


In [0]:
#RANDOM ROTATION AND BATCH NORMALIZATIOn

class Net_Activation_Functions_BN(nn.Module):
    def __init__(self):
        super(Net_Activation_Functions_BN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 13 * 13, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 16)
        self.dropout = nn.Dropout(p=0.5)
        self.bn1 = nn.BatchNorm1d(120)
        self.bn2 = nn.BatchNorm1d(84)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # print(x.shape)
        x = x.view(-1, self.num_flat_features(x))
        x = self.dropout(self.bn1(F.relu(self.fc1(x))))
        x = self.dropout(self.bn2(F.relu(self.fc2(x))))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net_AF_BN = Net_Activation_Functions_BN()

In [0]:
trainloader_rot_ran_plus_AF_plus_BN = load_dataset2('./data/train/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= True)

criterion = nn.CrossEntropyLoss()
optimizer_3 = optim.SGD(net_AF_BN.parameters(), lr=0.001, momentum=0.9)

EvaluateNetwork(trainloader_rot_ran_plus_AF_plus_BN, optimizer_3, net_AF_BN, 60)

In [0]:
testloader_3 = load_dataset2('./data/test/', 64, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= True)

dataiter = iter(testloader_3)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()

outputs_3 = net_AF_BN(images)
_, predicted = torch.max(outputs_3, 1)
getAccuracy(testloader_3, net_AF_BN, outputs_3)

size =  800
Accuracy of the network on the 400 test images: 54 %


## Problem 2: Fine Tuning a Pre-Trained Deep Network
{Part 1: 30 points} Our convolutional network to this point isn't "deep". Fortunately, the representations learned by deep convolutional networks is that they generalize surprisingly well to other recognition tasks. 

But how do we use an existing deep network for a new recognition task? Take for instance,  [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) network has 1000 units in the final layer corresponding to 1000 ImageNet categories.

**Strategy A**: One could use those 1000 activations as a feature in place of a hand crafted feature such as a bag-of-features representation. You would train a classifier (typically a linear SVM) in that 1000 dimensional feature space. However, those activations are clearly very object specific and may not generalize well to new recognition tasks. It is generally better to use the activations in slightly earlier layers of the network, e.g. the 4096 activations in the last 2nd fully-connected layer. You can often get away with sub-sampling those 4096 activations considerably, e.g. taking only the first 200 activations. 

**Strategy B**: *Fine-tune* an existing network. In this scenario you take an existing network, replace the final layer (or more) with random weights, and train the entire network again with images and ground truth labels for your recognition task. You are effectively treating the pre-trained deep network as a better initialization than the random weights used when training from scratch. When you don't have enough training data to train a complex network from scratch (e.g. with the 16 classes) this is an attractive option. Fine-tuning can work far better than Strategy A of taking the activations directly from an pre-trained CNN. For example, in [this paper](http://www.cc.gatech.edu/~hays/papers/deep_geo.pdf) from CVPR 2015, there wasn't enough data to train a deep network from scratch, but fine tuning led to 4 times higher accuracy than using off-the-shelf networks directly.

You are required to implement **Strategy B** to fine-tune a pre-trained **AlexNet** for this scene classification task. You should be able to achieve performance of 85% approximately. It takes roughly 35~40 minutes to train 20 epoches with AlexNet.

Please provide detailed descriptions of:<br>
(1) which layers of AlexNet have been replaced<br>
(2) the architecture of the new layers added including activation methods (same as problem 1)<br>
(3) the final accuracy on test set along with time consumption for both training and testing <br>

{Part 2: 20 points} Implement Strategy A where you use the activations of the pre-trained network as features to train one-vs-all SVMs for your scene classification task. Report the final accuracy on test set along with time consumption for both training and testing.

{Bonus: 10 points} Bonus will be given to those who fine-tune the [VGG network](https://pytorch.org/docs/stable/_modules/torchvision/models/vgg.html) [paper](https://arxiv.org/pdf/1409.1556.pdf) and compare performance with AlexNet. Explain why VGG performed better or worse.

**Hints**:
- Many pre-trained models are available in PyTorch at [here](http://pytorch.org/docs/master/torchvision/models.html).
- For fine-tuning pretrained network using PyTorch, please read this [tutorial](http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).

In [0]:
# reload data with a larger size
img_size_alex = 224
batch_num = 50

trainloader_alex = load_dataset2('./data/train/', img_size_alex, batch_num, shuffle=True, augment=True, zero_centered=True, rotate_random= False)
testloader_alex = load_dataset2('./data/test/', img_size_alex, batch_num, shuffle=True, augment=False, zero_centered=True, rotate_random= False)

#AlexNet Changes
From the last but 2 connected layer at last which gives 4096*2 outputs. I've made a Final Connected layer to 128 outputs. -> ReLU -> Final connected with output = 16

Architecture:
<ul>
<li>(0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
</li><li>(1): ReLU(inplace=True)
</li><li>(2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1,ceil_mode=False)
</li><li>(3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
</li><li>(4): ReLU(inplace=True)
</li><li>(5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
</li><li>(6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
</li><li>(7): ReLU(inplace=True)
</li><li>(8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
</li><li>(9): ReLU(inplace=True)
</li><li>(10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
</li><li>(11): ReLU(inplace=True)
</li><li>(12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
</li><li>And in addition
</li><li>(avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
</li><li>(0): Linear(in_features=9216, out_features=128, bias=True)
</li><li>(1): ReLU(inplace=True)
</li><li>(2): Linear(in_features=128, out_features=16, bias=True)
</ul>

In [0]:
# ==========================================
#       Fine-Tune Pretrained Network
# ==========================================

num_classes = len(classes)
alexNet = models.alexnet(pretrained=True) 
alexNet.classifier  = nn.Sequential(
            nn.Linear(256 * 6 * 6, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
)

criterion = nn.CrossEntropyLoss()
optimizer_alex = optim.SGD(alexNet.parameters(), lr=0.001, momentum=0.9)

t11 = time.time()
# Train and evaluate
EvaluateNetwork(trainloader_alex, optimizer_alex, alexNet, 20)
t21 = time.time()

In [0]:
print("Time Taken using GPU = ", int(t21-t11), "seconds")

Time Taken using GPU =  408 seconds


In [0]:
dataiter = iter(testloader_alex)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()


output_alex = alexNet(images)
_, predicted = torch.max(output_alex, 1)
getAccuracy(testloader_alex, alexNet, output_alex)

Accuracy of the network on the 400 test images: 85 %


##Part 2

In [0]:

alexNet_2 = models.alexnet(pretrained=True) 

alexNet_2.classifier  = nn.Sequential(
            nn.Linear(256 * 6 * 6, 128),
)

In [0]:
batch_size = 2400
img_size_alex = 224
trainloader_svm = load_dataset2('./data/train/', img_size_alex, 2400, shuffle=True, augment=False, zero_centered=True, rotate_random= False)
testloader_svm = load_dataset2('./data/test/', img_size_alex, 400, shuffle=True, augment=False, zero_centered=True, rotate_random= False)

size =  2400
size =  400


In [0]:

def GetFeatures(trainloader, optimizer, net):
    net = net.cuda()
    features = []
    running_loss = 0
    for i, data in enumerate(trainloader, 0):
        # get the inputs

        inputs, labels = data
        inputs, labels = inputs.cuda(), labels.cuda()
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # print statistics
        running_loss += loss.item()
        if i % 20 == 19:    # print every 10 mini-batches
            print('[%d, %5d] loss: %.3f' %
                (i + 1, running_loss / 20))
            running_loss = 0.0

        features.append(outputs)
    print('Finished Training')
    return features

In [0]:
criterion = nn.CrossEntropyLoss()
optimizer_svm = optim.SGD(alexNet_2.parameters(), lr=0.001, momentum=0.9)


features = GetFeatures(trainloader_svm, optimizer_svm, alexNet_2)

Finished Training


In [0]:
features_ = []
for e in features:
    features_.append(e.cpu().detach().numpy())


In [0]:
test_features = GetFeatures(testloader_svm, optimizer_svm, alexNet_2)

Finished Training


In [0]:
test_features_ = []
for e in test_features:
    test_features_.append(e.cpu().detach().numpy())


In [0]:
dataiter = iter(trainloader_svm)
train_images, train_labels = dataiter.next()

dataiter = iter(testloader_svm)
test_images, test_labels = dataiter.next()

In [0]:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.utils.testing import ignore_warnings
from sklearn.exceptions import ConvergenceWarning

@ignore_warnings(category=ConvergenceWarning)
def onevsrestModel(features, labels):
    model3 = OneVsRestClassifier(LinearSVC( max_iter=100000))
    model3.fit(features, labels)
    return model3

t31 = time.time()

#save svm model
svm_model = onevsrestModel(features_[0], train_labels)


#make predictions
predictions = svm_model.predict(test_features_[0])

accuracy = sum(np.array(predictions) == test_labels.numpy()) /len(test_features_[0])

t32 = time.time()

print("Accuracy of one-vs-all SVMs) =", accuracy*100)
print("Time Taken = ", t32-t31)

Accuracy of one-vs-all SVMs) = 5.25
Time Taken =  34.47607707977295


##VGG

In [0]:
torch.cuda.empty_cache()

In [0]:
vgg16Net = models.vgg16(pretrained=True) 
vgg16Net.classifier  = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 16),
)

criterion = nn.CrossEntropyLoss()
optimizer_vgg16 = optim.SGD(vgg16Net.parameters(), lr=0.001, momentum=0.9)

t11 = time.time()
# Train and evaluate
EvaluateNetwork(trainloader_alex, optimizer_vgg16, vgg16Net, 20)
t21 = time.time()

In [0]:
print("Time taken on VGG = ", int(t21 - t11), "seconds")

Time taken for VGG =  1185 seconds


In [0]:
dataiter = iter(testloader_alex)
images, labels = dataiter.next()
images, labels = images.cuda(), labels.cuda()


output_vgg16 = vgg16Net(images)
_, predicted = torch.max(output_vgg16, 1)
getAccuracy(testloader_alex, vgg16Net, output_vgg16)

Accuracy of the network on the 400 test images: 90 %


VGG is similar to AlexNet but, only 3x3 convolutions and lots of them. VGG is more complex net than alexnet and therefore it performs considerably better.

2 3x3 convolution = 18 calculations
1 5x5 = 25 calculations

## Submission guidelines
---
Extract the downloaded .zip file to a folder of your preference. The input and output paths are predefined and **DO NOT** change them, (we assume that 'Surname_Givenname_SBUID_hw4' is your working directory, and all the paths are relative to this directory).  The image read and write functions are already written for you. All you need to do is to fill in the blanks as indicated to generate proper outputs. **DO NOT** zip and upload the dataset on blackboard due to size limit.

When submitting your .zip file through blackboard, please
-- name your .zip file as **Surname_Givenname_SBUID_hw*.zip**.

This zip file should include:
```
Surname_Givenname_SBUID_hw*
        |---Surname_Givenname_SBUID_hw*.ipynb
        |---Surname_Givenname_SBUID_hw*.py
        |---Surname_Givenname_SBUID_hw*.pdf
```
where Surname_Givenname_SBUID_hw*.py is the Python code of Surname_Givenname_SBUID_hw*.ipynb, which can be dowloaded by File->Download .py.

For instance, student Michael Jordan should submit a zip file named "Jordan_Michael_111134567_hw4.zip" for homework4 in this structure:
```
Jordan_Michael_111134567_hw4
        |---Jordan_Michael_111134567_hw4.ipynb
        |---Jordan_Michael_111134567_hw4.py
        |---Jordan_Michael_111134567_hw4.pdf
```

The **Surname_Givenname_SBUID_hw*.pdf** should include a **google shared link** and **Surname_Givenname_SBUID_Pred*.pdf** should be your test set prediction file in the specified format. To generate the **google shared link**, first create a folder named **Surname_Givenname_SBUID_hw*** in your Google Drive with your Stony Brook account. The structure of the files in the folder should be exactly the same as the one you downloaded. If you alter the folder structures, the grading of your homework will be significantly delayed and possibly penalized.

Then right click this folder, click ***Get shareable link***, in the People textfield, enter two TA's emails: ***bo.cao.1@stonybrook.edu*** and ***sayontan.ghosh@stonybrook.edu***. Make sure that TAs who have the link **can edit**, ***not just*** **can view**, and also **uncheck** the **Notify people** box.

Colab has a good feature of version control, you should take advantage of this to save your work properly. However, the timestamp of the submission made in blackboard is the only one that we consider for grading. To be more specific, we will only grade the version of your code right before the timestamp of the submission made in blackboard. 

You are encouraged to post and answer questions on Piazza. Based on the amount of email that we have received in past years, there might be dealys in replying to personal emails. Please ask questions on Piazza and send emails only for personal issues.

Be aware that your code will undergo plagiarism check both vertically and horizontally. Please do your own work.

**Late submission penalty:** <br>
There will be a 10% penalty per day for late submission. However, you will have 4 days throughout the whole semester to submit late without penalty. Note that the grace period is calculated by days instead of hours. If you submit the homework one minute after the deadline, one late day will be counted. Likewise, if you submit one minute after the deadline, the 10% penaly will be imposed if not using the grace period.

<!--Write your report here in markdown or html-->
