# Computer Vision Assignment 1 Part 2
---

Semester: **Fall 2022**

Due date: **September 29th 2022, 11.59PM EST.**

## Introduction
---
This assignment requires you to participate in a Kaggle competition with the rest of the class on the [German Traffic Sign Recognition Benchmark](http://benchmark.ini.rub.de/?section=gtsrb). The objective is to produce a model that gives the highest possible accuracy on the test portion of this dataset. You can register for the competition using the private link: https://www.kaggle.com/c/nyu-computer-vision-csci-ga2271-2022/overview.

Skeleton code is provided in the colab below. This contains code for training a simple default model and evaluating it on the test set. The evaluation script produces a file `gtsrb_kaggle.csv` that lists the IDs of the test set images, along with their predicted label. This file should be uploaded to the Kaggle webpage, which will then produce a test accuracy score. 

Your goal is to implement a new model architecture that improves upon the baseline performance. You are free to implement any approach covered in class or from research papers. This part will count for 50% of the overall grade for assignment 1. Grading will depend on your Kaggle performance and rank, as well as the novelty of the architecture.

## Rules
---
You should make a copy of this Colab (`File->Save a copy in Drive`). Please start the assignment early and don’t be afraid to ask for help from either the TAs or myself. You are allowed to collaborate with other students in terms of discussing ideas and possible solutions. However, you must code up the solution yourself, i.e. you must write your own code. Copying your friend's code and just changing all the names of the variables is NOT ALLOWED! You are not allowed to use solutions from similar assignments in courses from other institutions, or those found elsewhere on the web.

Your solutions should be submitted via the Brightspace system. This should include a brief description (in the Colab) explaining the model architectures you explored, citing any relevant papers or techniques that you used. You should also include convergence plots of training accuracy vs epoch for relevant models. 
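
The convergence plots requested above need one training-accuracy value per epoch, which the provided `train` function does not record. A minimal sketch of one way to collect and plot it, assuming matplotlib is installed; `AccuracyMeter` and `plot_accuracy` are hypothetical helpers, not part of the skeleton code, and the batch counts in the example are made up:

```python
class AccuracyMeter:
    """Accumulates correct/total counts over the batches of one epoch."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, n_correct, n_total):
        # Add the counts from one batch
        self.correct += int(n_correct)
        self.total += int(n_total)

    @property
    def value(self):
        # Fraction classified correctly so far (0.0 if nothing was recorded)
        return self.correct / self.total if self.total else 0.0


def plot_accuracy(train_acc, out_path="train_accuracy.png"):
    """Plot one accuracy value per epoch and save the figure to out_path."""
    import matplotlib.pyplot as plt  # imported lazily so the meter works without it

    epochs = range(1, len(train_acc) + 1)
    plt.figure()
    plt.plot(epochs, train_acc, marker="o", label="training accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.savefig(out_path)


# Example with made-up batch counts (not real results)
meter = AccuracyMeter()
meter.update(24, 32)  # 24 of 32 correct in the first batch
meter.update(30, 32)  # 30 of 32 correct in the second batch
epoch_accuracy = meter.value  # 54 / 64
```

Inside a training loop, the per-batch counts could come from `output.argmax(dim=1).eq(target).sum().item()` and `target.size(0)`; appending `meter.value` to a list at the end of each epoch gives the input for `plot_accuracy`.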

## Important Details
---
• You are only allowed 8 submissions to the Kaggle evaluation server per day. This is to prevent over-fitting on the test dataset. So be sure to start the assignment early!

• You are NOT ALLOWED to use the test set labels during training in any way. Doing so will be regarded as cheating and penalized accordingly.

• The evaluation metric is accuracy, i.e. the fraction of test set examples where the predicted label agrees with the ground truth label.

• You should be able to achieve a test accuracy of at least 95%.

• **Extra important:** Please use your NYU NetID as your team name on Kaggle, so the TAs can figure out which user you are on the leaderboard. 
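
The accuracy metric described in the list above can be sketched in a few lines. This is an illustration of the definition only, not the Kaggle scoring code; the function name `accuracy` and the example labels are made up:

```python
def accuracy(predicted, truth):
    """Fraction of examples whose predicted label equals the ground-truth label."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Three of the four made-up predictions match the made-up labels
example_score = accuracy([3, 14, 14, 7], [3, 14, 2, 7])
```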

# Dataset Preparation
___

1.  Download `dataset.zip` from the course website to your local machine.
2.  Unzip the file. You should see a `dataset` directory with three subfolders: `training`, `validation`, and `testing`. 
3.  Go to Google Drive (on your NYU account) and make a new directory (say `cv_kaggle_assignment`).
4.  Upload each of the three subfolders to it. 
5.  Run the code block below. It will ask for permission to mount your Google Drive (NYU account) so this colab can access it. Paste the authorization code into the box as requested. 
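
After mounting, a quick check that the upload worked can save a confusing error later. The sketch below is a hypothetical helper, not part of the skeleton code. The file names are assumed from the loading code further down in this notebook, which reads from `train/` although step 2 names the folder `training`; adjust the list to match what you actually uploaded.

```python
from pathlib import Path

# Paths assumed from the data-loading and evaluation cells of this notebook
EXPECTED_FILES = [
    "train/X.pt",
    "train/y.pt",
    "validation/X.pt",
    "validation/y.pt",
    "testing/test.pt",
    "testing/file_ids.pkl",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files()` from the assignment directory should return an empty list if every file is in place.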


I had trouble getting this model to converge; I am not sure what I was doing wrong in the ResNet.
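
One frequent cause of this symptom, offered as a possibility rather than a diagnosis: the training loop in this notebook uses `F.nll_loss`, which expects log-probabilities, so a model whose `forward` returns raw scores can drive the loss toward negative infinity instead of learning. The plain-Python sketch below illustrates the difference; `log_softmax` and `nll` are illustrative helpers written for this example, not PyTorch calls, and the scores are made up.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


scores = [2.0, -1.0, 0.5]  # made-up raw scores for three classes
target = 0

# Correct usage: the loss is non-negative and reaches 0 only for a perfect prediction
loss_from_log_probs = nll(log_softmax(scores), target)

# Incorrect usage: feeding raw scores gives a loss that is unbounded below,
# so the optimizer can lower it forever just by inflating the target score
loss_from_raw_scores = nll(scores, target)
```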

In [37]:
# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')
%cd  /content/drive/'My Drive'/cv_kaggle_assignment/

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/cv_kaggle_assignment


# Dataloader

In [123]:
import torch
from torch.utils.data import Dataset
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

batch_size = 32
momentum = 0.9
lr = 0.01
epochs = 10
log_interval = 100

class MyDataset(Dataset):

    def __init__(self, X_path="X.pt", y_path="y.pt"):

        self.X = torch.load(X_path).squeeze(1)
        self.y = torch.load(y_path).squeeze(1)
    
    def __len__(self):
        return self.X.size(0)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

train_dataset = MyDataset(X_path="train/X.pt", y_path="train/y.pt")
val_dataset = MyDataset(X_path="validation/X.pt", y_path="validation/y.pt")

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)

val_loader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=False, num_workers=1)

# Model

In [124]:
import torch
import torch.nn as nn
import torch.nn.functional as F

nclasses = 43 # GTSRB has 43 classes

class ResBlock(nn.Module):
    def __init__(self,input,output,downsample):
        super().__init__()
        if downsample:
            self.conv1 = nn.Conv2d(input, output, kernel_size=3, stride=2, padding=1) #can mess with the parameters later
            self.skip = nn.Sequential(
                nn.Conv2d(input, output, kernel_size=1, stride=2),
                nn.BatchNorm2d(output)
            )
        else:
            self.conv1 = nn.Conv2d(input, output, kernel_size=3, stride=1, padding=1)
            self.skip = nn.Sequential(
                nn.Identity()
            )

        self.conv2 = nn.Conv2d(output, output, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(output)
        self.bn2 = nn.BatchNorm2d(output)

    def forward(self, input):
        skip_layer = self.skip(input)
        input = F.relu(self.bn1(self.conv1(input)))
        # No ReLU here: the standard residual block applies it only after the addition
        input = self.bn2(self.conv2(input))
        input = input + skip_layer
        return F.relu(input)

class ResBottleneckBlock(nn.Module):
    def __init__(self, input, output, downsample):
        super().__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(input, output//4, kernel_size=1, stride=1)
        self.conv2 = nn.Conv2d(output//4, output//4, kernel_size=3, stride=2 if downsample else 1, padding=1)
        self.conv3 = nn.Conv2d(output//4, output, kernel_size=1, stride=1)

        if self.downsample or input != output:
            self.shortcut = nn.Sequential(
                nn.Conv2d(input, output, kernel_size=1, stride=2 if self.downsample else 1),
                nn.BatchNorm2d(output)
            )
        else:
            self.shortcut = nn.Sequential()

        self.bn1 = nn.BatchNorm2d(output//4)
        self.bn2 = nn.BatchNorm2d(output//4)
        self.bn3 = nn.BatchNorm2d(output)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = F.relu(self.bn1(self.conv1(input)))
        input = F.relu(self.bn2(self.conv2(input)))
        # No ReLU here: the standard bottleneck block applies it only after the addition
        input = self.bn3(self.conv3(input))
        input = input + shortcut
        return F.relu(input)


class ResNet(nn.Module):
    def __init__(self, input, resblock, repeat, useBottleneck=False, final_out = nclasses):
        super().__init__()

        self.conv0 = nn.Conv2d(input,64, 7, 2, 3)
        self.bn0 = nn.BatchNorm2d(64)

        if useBottleneck:
            filters = [64,256,512,1024,2048]
        else:
            filters = [64,64,128,256,512]

        self.layer1 = nn.Sequential()
        self.layer1.add_module('conv2_1', resblock(filters[0], filters[1], downsample=False))
        for i in range(1, repeat[0]):
            self.layer1.add_module('conv2_%d'%(i+1,), resblock(filters[1], filters[1], downsample=False))

        self.layer2 = nn.Sequential()
        self.layer2.add_module('conv3_1', resblock(filters[1], filters[2], downsample=True))
        for i in range(1, repeat[1]):
            self.layer2.add_module('conv3_%d'%(i+1,), resblock(filters[2], filters[2], downsample=False))

        self.layer3 = nn.Sequential()
        self.layer3.add_module('conv4_1', resblock(filters[2], filters[3], downsample=True))
        for i in range(1, repeat[2]):
            self.layer3.add_module('conv4_%d'%(i+1,), resblock(filters[3], filters[3], downsample=False))

        self.layer4 = nn.Sequential()
        self.layer4.add_module('conv5_1', resblock(filters[3], filters[4], downsample=True))
        for i in range(1, repeat[3]):
            self.layer4.add_module('conv5_%d'%(i+1,), resblock(filters[4], filters[4], downsample=False))

        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(filters[4],final_out)

    def forward(self, input):
        input = F.relu(self.bn0(F.max_pool2d(self.conv0(input),2)))
        input = self.layer1(input)
        input = self.layer2(input)
        input = self.layer3(input)
        input = self.layer4(input)
        input = self.gap(input)
        input = torch.flatten(input,1)
        input = self.fc(input)

        # Return log-probabilities, as F.nll_loss in the training loop expects
        return F.log_softmax(input, dim=1)


In [125]:
class SkeletonNet(nn.Module):
    def __init__(self):
        super(SkeletonNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(500, 50)
        self.fc2 = nn.Linear(50, nclasses)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 500)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x,dim=1)

In [126]:
class NewerNet(nn.Module):
    def __init__(self):
        super(NewerNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=7)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=7)
        self.conv3 = nn.Conv2d(32,64,kernel_size = 3,padding=1)
        self.conv4 = nn.Conv2d(64,64,kernel_size = 3,padding=1)
        self.fc1 = nn.Linear(64, 50)
        self.fc2 = nn.Linear(50, nclasses)
        self.bn = nn.BatchNorm2d(64)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(self.bn(self.conv3(x)))
        x_skip = x
        x = F.relu(self.bn(self.conv4(x)))
        x = F.relu(self.bn(self.conv4(x)))
        x = F.relu(x_skip+x)
        x_skip = x
        x = F.relu(self.bn(self.conv4(x)))
        x = F.relu(self.bn(self.conv4(x)))
        x = F.relu(x_skip+x)
        x = F.relu(F.max_pool2d(self.conv4(x), 2))
        x = torch.flatten(x,1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x,dim=1)

In [127]:
class NewNet(nn.Module):
    def __init__(self):
        super(NewNet, self).__init__()
        #Layer 0
        self.conv_0 = nn.Conv2d(3,64,kernel_size=7, stride = 2)
        self.bn_0 = nn.BatchNorm2d(64)
        #Layer 1
        self.conv_1 = nn.Conv2d(64,64,kernel_size=3)
        self.bn_1 = nn.BatchNorm2d(64)
        #Layer 2
        self.conv_2_s = nn.Conv2d(64,128,kernel_size=1,stride=2)
        self.conv_2_e = nn.Conv2d(64,128,kernel_size=3,stride=2,padding=1)
        self.conv_2 = nn.Conv2d(128,128,kernel_size=3,padding=1)
        self.bn_2 = nn.BatchNorm2d(128)
        #Layer 3
        self.conv_3_s = nn.Conv2d(128,256,kernel_size=1,stride=2)
        self.conv_3_e = nn.Conv2d(128,256,kernel_size=3,stride=2,padding=1)
        self.conv_3 = nn.Conv2d(256,256,kernel_size=3,padding=1)
        self.bn_3 = nn.BatchNorm2d(256)
        #Layer 4
        self.conv_4_s = nn.Conv2d(256,512,kernel_size=1,stride=2)
        self.conv_4_e = nn.Conv2d(256,512,kernel_size=3,stride=2,padding=1)
        self.conv_4 = nn.Conv2d(512,512,kernel_size=3,padding=1)
        self.bn_4 = nn.BatchNorm2d(512)
        #Final Layers
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512,nclasses)



    def forward(self, x):
        #Layer 0
        x = F.relu(self.bn_0(F.max_pool2d(self.conv_0(x), 2)))

        #Layer 1
        x = F.relu(self.bn_1(self.conv_1(x)))
        x = F.relu(self.bn_1(self.conv_1(x)))
        
        #Layer 2
        x_skip = self.conv_2_s(x)
        x = F.relu(self.bn_2(self.conv_2_e(x)))
        x = F.relu(self.bn_2(self.conv_2(x)))
        x = F.relu(x_skip + x)
        x_skip = x
        x = F.relu(self.bn_2(self.conv_2(x)))
        x = F.relu(self.bn_2(self.conv_2(x)))
        x = F.relu(x_skip + x)

        #Layer 3
        x_skip = self.conv_3_s(x)
        x = F.relu(self.bn_3(self.conv_3_e(x)))
        x = F.relu(self.bn_3(self.conv_3(x)))
        x = F.relu(x_skip + x)
        x_skip = x
        x = F.relu(self.bn_3(self.conv_3(x)))
        x = F.relu(self.bn_3(self.conv_3(x)))
        x = F.relu(x_skip + x)

        #Layer 4
        x_skip = self.conv_4_s(x)
        x = F.relu(self.bn_4(self.conv_4_e(x)))
        x = F.relu(self.bn_4(self.conv_4(x)))
        x = F.relu(x_skip + x)
        x_skip = x
        x = F.relu(self.bn_4(self.conv_4(x)))
        x = F.relu(self.bn_4(self.conv_4(x)))
        x = F.relu(x_skip+x)

        #Final
        x = self.gap(x)
        x = torch.flatten(x,1)
        # Return log-probabilities (not ReLU-clipped scores), as F.nll_loss expects
        return F.log_softmax(self.fc(x), dim=1)

        

# Training

In [128]:
# useBottleneck must match the block type: True with ResBottleneckBlock, False with ResBlock
#model = ResNet(3, ResBottleneckBlock, [3,4,6,3], useBottleneck=True, final_out=nclasses)
model = NewerNet()
#model = SkeletonNet()
lr = .01
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    return loss

def validation():
    model.eval()
    validation_loss = 0
    correct = 0
    with torch.no_grad(): # gradients are not needed for evaluation
        for data, target in val_loader:
            output = model(data)
            validation_loss += F.nll_loss(output, target, reduction="sum").item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    validation_loss /= len(val_loader.dataset)
    print('\nValidation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        validation_loss, correct, len(val_loader.dataset),
        100. * correct / len(val_loader.dataset)))


for epoch in range(1, epochs + 1):
    loss = train(epoch)
    validation()
    model_file = 'model_' + str(epoch) + '.pth'
    torch.save(model.state_dict(), model_file)
    print('\nSaved model to ' + model_file + '.')



Validation set: Average loss: 0.7433, Accuracy: 3077/3870 (80%)


Saved model to model_1.pth.

Validation set: Average loss: 0.7218, Accuracy: 3081/3870 (80%)


Saved model to model_2.pth.

Validation set: Average loss: 0.7475, Accuracy: 3042/3870 (79%)


Saved model to model_3.pth.

Validation set: Average loss: 0.6348, Accuracy: 3204/3870 (83%)


Saved model to model_4.pth.

Validation set: Average loss: 0.7148, Accuracy: 3224/3870 (83%)


Saved model to model_5.pth.

Validation set: Average loss: 0.8014, Accuracy: 3160/3870 (82%)


Saved model to model_6.pth.

Validation set: Average loss: 0.7538, Accuracy: 3149/3870 (81%)


Saved model to model_7.pth.

Validation set: Average loss: 0.5430, Accuracy: 3379/3870 (87%)


Saved model to model_8.pth.

Validation set: Average loss: 0.6373, Accuracy: 3257/3870 (84%)


Saved model to model_9.pth.

Validation set: Average loss: 0.5996, Accuracy: 3271/3870 (85%)


Saved model to model_10.pth.


# Evaluate and Submit to Kaggle



In [129]:
import pickle
import pandas as pd

outfile = 'gtsrb_kaggle.csv'

dataframe_dict = {"Filename" : [], "ClassId": []}

test_data = torch.load('testing/test.pt')
with open('testing/file_ids.pkl', 'rb') as f:
    file_ids = pickle.load(f)
model.eval() # Don't forget to put your model on eval mode !

with torch.no_grad(): # gradients are not needed for inference
    for i, data in enumerate(test_data):
        data = data.unsqueeze(0) # add a batch dimension

        output = model(data)
        pred = output.argmax(dim=1).item() # index of the max log-probability
        file_id = file_ids[i][0:5]
        dataframe_dict['Filename'].append(file_id)
        dataframe_dict['ClassId'].append(pred)

# pandas writes the file itself, so no separate file handle is opened
df = pd.DataFrame(data=dataframe_dict)
df.to_csv(outfile, index=False)
print("Written to csv file {}".format(outfile))

Written to csv file gtsrb_kaggle.csv


# Submitting to Kaggle

Now download the CSV file `gtsrb_kaggle.csv` from your Google Drive and then submit it to Kaggle to check the performance of your model.
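
Since submissions are limited, it may be worth checking the file before uploading. The sketch below is a hypothetical helper, not part of the skeleton code. It assumes the column names written by the evaluation cell (`Filename`, `ClassId`) and assumes the class labels are the integers 0 to 42, based on the 43 classes; it does not know what else Kaggle validates.

```python
import csv


def check_submission(path, n_classes=43):
    """Check header and label range of a submission CSV; return the row count."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["Filename", "ClassId"]:
            raise ValueError("unexpected header: {}".format(reader.fieldnames))
        rows = list(reader)

    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        label = int(row["ClassId"])  # raises ValueError if not an integer
        if not 0 <= label < n_classes:
            raise ValueError("line {}: label {} out of range".format(line_no, label))
    return len(rows)
```

For example, `check_submission('gtsrb_kaggle.csv')` returns the number of predictions in the file, which can be compared against the number of test images.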