# Computer Vision and Pattern Recognition - Project 3 (CNN classifier)
#### Gaia Marsich [SM3500600]

* [Introduction](#intro)
* [1. Task 1](#1-bullet)
* [2. Task 2](#2-bullet)
* [3. Task 3](#3-bullet)
* [References](#ref)

## Introduction <a class="anchor" id="#intro"></a>

This project requires implementing an image classifier based on convolutional neural networks. The provided dataset (from [Lazebnik et al., 2006]) contains 15 categories (office, kitchen, living room, bedroom, store, industrial, tall building, inside city, street, highway, coast, open country, mountain, forest, suburb) and is already divided into a training set and a test set.

First of all, let's do the imports:

In [7]:
import torch
from torch import nn

import os
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import random_split, DataLoader
import torch.optim as optim
import torch.nn.init as init

# from torch.utils.data import DataLoader
# import torchvision
# import matplotlib.pyplot as plt
# import numpy as np
# from datetime import datetime
# import copy

## Task 1 <a class="anchor" id="#1-bullet"></a>

In [2]:
# set a seed for reproducibility
torch.manual_seed(10)

# Build the network

class CNN(nn.Module):

    # A model will have an __init__() function, where it instantiates its layers

    def __init__(self):
        super(CNN, self).__init__() # the constructor of the parent class (nn.Module) is called to initialize the model properly.

        # Convolutional layer 1: in_channels=1 since the images are in greyscale
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1) # from [1]: output = ((input - kernel_size + 2*padding)/stride) + 1, so a 64*64 input gives 62*62
        # ReLU activation after conv1
        self.relu1 = nn.ReLU() # output: 62*62
        # Max pooling layer 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2) # output: 31*31 (from 62/2)

        # Convolutional layer 2
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=1) # from [1] we know that output: 29*29
        # ReLU activation after conv2
        self.relu2 = nn.ReLU() # output: 29*29
        # Max pooling layer 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2) # output: 14*14 (from the test in dim_images.ipynb)

        # Convolutional layer 3
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1) # from [1] we know that output: 12*12
        # ReLU activation after conv3  
        self.relu3 = nn.ReLU() # output: 12*12

        # Fully connected layer. 32: number of channels; 12, 12: height and width of the feature map
        self.fc = nn.Linear(32 * 12 * 12, 15)
        # Classification layer
        #self.output = nn.CrossEntropyLoss() #TODO: is it correct to include this here? Removed for now

        self.initialize_weights()




    def initialize_weights(self):
        # Initialise every convolutional and linear layer: weights are drawn from a
        # Gaussian with mean 0 and standard deviation 0.01, biases are set to 0
        for module in self.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                init.normal_(module.weight, mean=0, std=0.01)
                if module.bias is not None:
                    init.constant_(module.bias, 0)




    # A model will have a forward() function
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)

        x = self.conv3(x)
        x = self.relu3(x)

        x = x.view(-1, 32 * 12 * 12)  # flatten the tensor before passing to fully connected layers (the size -1 is inferred from other dimensions)
        
        x = self.fc(x)
        #x = self.output(x) #TODO: remove if not needed

        return x
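
The feature-map sizes quoted in the comments above can be checked with a few lines of arithmetic. The sketch below applies the formula from [1] layer by layer; the helper name `conv_out` is hypothetical, and the 64*64 input size is an assumption inferred from the 62*62 comment after `conv1`.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (rounded down)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Assumption: square 64*64 greyscale inputs, as implied by the 62*62 comment after conv1
size = 64
sizes = []
for name, kernel, stride in [
    ("conv1", 3, 1), ("maxpool1", 2, 2),
    ("conv2", 3, 1), ("maxpool2", 2, 2),
    ("conv3", 3, 1),
]:
    size = conv_out(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: {size}*{size}")

# 32 channels come out of conv3, so the fully connected layer sees 32 * size * size inputs
fc_inputs = 32 * size * size
print("fc inputs:", fc_inputs)
```

The rounding down in the second pooling step (29 to 14) is why the last row and column of that feature map are dropped.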

In [6]:
# Split the provided training set in 85% for actual training set and 15% to be used as validation set

resized_train_path = '/Users/Gaia/Desktop/CVPR-project/CVPR-project/resized/train'

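# ImageFolder loads images as RGB by default, so convert them to a single channel to match in_channels=1
dataset_train = ImageFolder(root=resized_train_path, transform=transforms.Compose([transforms.Grayscale(num_output_channels=1), transforms.ToTensor()]))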

train_size = int(0.85 * len(dataset_train))
val_size = len(dataset_train) - train_size

train_dataset, val_dataset = random_split(dataset_train, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True) # batch_size=32 required by the project
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False) # the validation set does not need to be shuffled
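
The split above relies on the global seed set earlier. As a minimal sketch of an alternative, `random_split` also accepts a dedicated `torch.Generator`, which keeps the 85/15 split identical even if other random calls are added before it. The random tensors below are hypothetical stand-ins for the real images, and the helper `split` is illustrative only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in dataset: 100 random 1*64*64 "images" with labels in 0..14
images = torch.randn(100, 1, 64, 64)
labels = torch.randint(0, 15, (100,))
dataset = TensorDataset(images, labels)

train_size = int(0.85 * len(dataset))
val_size = len(dataset) - train_size

def split(seed):
    # A dedicated generator makes the split independent of the global random state
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)

train_a, val_a = split(10)
train_b, val_b = split(10)

# Same seed, same indices: the two splits are identical
same_split = train_a.indices == train_b.indices
print(len(train_a), len(val_a), same_split)
```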

In [None]:
# Instantiate the model
model = CNN()


# Training

loss_function = nn.CrossEntropyLoss()
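optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9) # lr is set explicitly (older PyTorch versions require it; 1e-3 is the default in recent ones); momentum is 0 by default, but a non-zero value is needed here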




def train_one_epoch(epoch_index, loader): # to train for just one epoch (one epoch: the network sees the whole training set)
    running_loss = 0.

    for i, data in enumerate(loader):

        inputs, labels = data # get the minibatch

        outputs = model(inputs) # forward pass

        loss = loss_function(outputs, labels) # compute the loss
        running_loss += loss.item() # sum up the loss for the minibatches processed so far

        optimizer.zero_grad() # notice that by default, the gradients are accumulated, hence we need to set them to zero
        loss.backward() # backward pass
        optimizer.step() # update the weights

    return running_loss/(i+1) # average loss per minibatch


# Training loop
EPOCHS = 5

print('Training loop...')
for epoch in range(EPOCHS):
    train_loss = train_one_epoch(epoch, train_loader)
    print(f'Epoch [{epoch + 1}/{EPOCHS}], Loss: {train_loss:.3f}')
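
The validation split created earlier is not used by the training loop. One possible way to use it is an evaluation pass after each epoch, sketched below. The function `evaluate`, the toy model and the random data are all hypothetical stand-ins so that the block runs on its own; in the notebook the call would presumably receive `model`, `val_loader` and `loss_function` instead.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, loss_function):
    """Return (average loss per minibatch, accuracy) of the model over the loader."""
    model.eval()  # switch to evaluation mode
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            outputs = model(inputs)
            total_loss += loss_function(outputs, labels).item()
            # the predicted class is the index of the largest logit
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / len(loader), correct / seen

# Hypothetical stand-ins: a tiny linear classifier and 40 random 1*64*64 inputs
torch.manual_seed(10)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 15))
toy_data = TensorDataset(torch.randn(40, 1, 64, 64), torch.randint(0, 15, (40,)))
toy_loader = DataLoader(toy_data, batch_size=32)

val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
print(f"Validation loss: {val_loss:.3f}, accuracy: {val_acc:.3f}")
```

If training continues after an evaluation pass, `model.train()` should be called again, since `evaluate` leaves the model in evaluation mode.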

## Task 2 <a class="anchor" id="#2-bullet"></a>

## Task 3 <a class="anchor" id="#3-bullet"></a>

## References <a class="anchor" id="ref"></a>

[1] https://dingyan89.medium.com/calculating-parameters-of-convolutional-and-fully-connected-layers-with-keras-186590df36c6