# CE-40719: Deep Learning
## HW3 - CNN / CNN Case Studies / CNN Applications
(23 points)

#### Name: Sadroddin Barikbin
#### Student No.: 98208824

In this assignment we go through the following topics:
- Writing custome pytorch modules
- Using `tensorboard` for logging and visualization
- Data Augmentation
- Saving / Loading Models

Please Keep in mind that:
- You can not use out-of-the-box pytorch modules (nn.Conv2d, nn.Linear, nn.BatchNorm, nn.Dropout, ...)
- You can run this notebook on your computer. If you prefer using Google Colab you may lose some of the functionalities of `tensorboard` (like Projector). You can install `tensorboard` on your computer using package manager of your choice, and download `runs` folder from Google Colab and run it locally using `tensorboard --logdir=runs`.
- Use the [documentation](https://pytorch.org/docs/stable/index.html).

In this assignment we are going to train a convolutional neural network to classify images from [fashion-mnist](https://github.com/zalandoresearch/fashion-mnist) dataset. Fashion-mnist is a simple dataset containing 60000 training and 10000 test $28 \times 28$ grayscale images of 10 different classes. Each class corresponds to a different kind of clothing. 

## 1. Setup (1.5 pts)

In [0]:
import torch
from torch import nn, optim
from torch.nn import functional as F
from torchvision import datasets, transforms, utils
from torch.utils.data import DataLoader

%load_ext tensorboard
%tensorflow_version 2.x
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

from torch.utils.tensorboard import SummaryWriter

In [0]:
print(torch.__version__)

1.4.0


To easily train your model on different gpu devices or your computer's cpu you can define a `torch.device` object corresponding to that device and use `.to(device)` method to easily move modules or tensors to different devices. Pytorch provides helper functions in [torch.cuda](https://pytorch.org/docs/stable/cuda.html) package.

In [2]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# create a cpu device if cuda is not available or cuda_device=None otherwise
# create a cuda:{cuda_device} device.
#################################################################################
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#################################################################################
#                                   THE END                                     #
#################################################################################
print(device)

cuda:0


Fashion-mnist dataset is available in `torchvision.datasets` package.

In [3]:
batch_size = 32
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# Initialize and download trainset and testset with datasets.FashionMNIST and 
# transform data into torch.Tensor. Initialize trainloader and testloader with 
# given batch_size.
#################################################################################
from torch.utils.data.sampler import SubsetRandomSampler
transform = transforms.Compose([transforms.ToTensor()])

trainset = datasets.FashionMNIST(root='./data', train=True,download=True, transform=transform)
indices=list(range(len(trainset)))
#np.random.shuffle(indices)
train_sampler = SubsetRandomSampler(indices)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, sampler=train_sampler)

testset = datasets.FashionMNIST(root='./data', train=False,download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size)

#################################################################################
#                                   THE END                                     #
#################################################################################
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))

Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))

Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz



HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))

Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))

Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
Processing...
Done!


To get a sense of data it is always helpfull to see a few of samples. We can do this using `tensorboard`. Please read [this](https://pytorch.org/docs/stable/tensorboard.html) documentation page to get familiar with tensorboard. Run the following cell to intialize a SummaryWriter and log some of the training images to tensorboard. You can run tensorboard using `tensorboard --logdir=runs` and view images.

In [0]:
writer = SummaryWriter('./runs/FashionMNIST')



dataiter = iter(trainloader)
images, labels = dataiter.next()

img_grid = utils.make_grid(images[:16], nrow=4)

writer.add_image('FashionMNIST', img_grid)

We can also visualize data (or any representation of it) using dimmensionality reduction techniques provided by tensorboard. The following cell adds raw pixel values as embeddings to visualize data. You can see visualizations in projector tab of tensorboard after running the following cell.

In [0]:
def select_n_random(data, labels, n=100):
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]

nimages, nlabels = select_n_random(trainset.data, trainset.targets)


writer.add_image('FashionMNIST', img_grid)
writer.add_embedding(nimages.view(-1, 28 * 28), 
                     metadata=[classes[label] for label in nlabels],
                     label_img=nimages.unsqueeze(1), 
                     tag='raw_pixels')
writer.flush()

## 2. Modules (7 pts)

In this part you will define all the required modules for a convolutional model. You can only use functional package `torch.nn.functional` unless stated otherwise.

### 2.1 Convolution Module (1.5 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define convolution parameters using nn.Parameter.
# initialize weihgt using nn.init.kaiming_uniform and bias by zeroes
# use F.conv2d in forward method.
#################################################################################
class Conv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(Conv2d, self).__init__()
        self.W=torch.empty(out_channels,in_channels,*kernel_size)
        nn.init.kaiming_uniform_(self.W)
        self.W=nn.Parameter(self.W)
        if bias:
            self.b=nn.Parameter(torch.zeros(out_channels))
        self.bias=bias
        self.padding=padding
        self.stride=stride

    def forward(self, x):
        out = F.conv2d(x,self.W,bias=self.b if self.bias else None,stride=self.stride,padding=self.padding)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.2 Linear (Fully-connected) Module (1.5 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define parameters using nn.Parameter.
# initialize weihgt using nn.init.kaiming_uniform and bias by zeroes
# use F.linear in forward method.
#################################################################################
class Linear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.W=torch.empty(out_features,in_features)
        nn.init.kaiming_uniform_(self.W)
        self.W=nn.Parameter(self.W)
        if bias:
            self.b=nn.Parameter(torch.zeros(out_features))
        self.bias=bias

    def forward(self, x):
        out = F.linear(x,self.W,bias=self.b if self.bias else None)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.3 1D Batch Normalization Module (2 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define and intitialize running_mean and running_var by zeroes and ones
# respectively.
# define weight and bias using nn.Parameter. initialize weights to a 
# normal distribution (std=1) and bias to zero.
# use F.batch_norm in forward method.
# use self.training to differ between training and test phase.
#################################################################################
class BatchNorm(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm, self).__init__()
        self.W=nn.Parameter(torch.randn(num_features))
        self.b=nn.Parameter(torch.randn(num_features))
        self.running_mean=torch.zeros(num_features).to(device)
        self.running_var=torch.ones(num_features).to(device)
    def forward(self, x):
        out = F.batch_norm(x,self.running_mean,self.running_var,weight=self.W,bias=self.b,training=self.training)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

### 2.4 2D Batch Normalization Module (2 pts)

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# define and intitialize running_mean and running_var by zeroes and ones
# respectively.
# define weight and bias using nn.Parameter. initialize weights to a
# normal distribution (std=1) and bias to zero.
# use F.batch_norm in forward method.
# use self.training to differ between training and test phase.
# more info on 2d batch normalization:
# https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-neural-network
#################################################################################
class BatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super(BatchNorm2d, self).__init__()
        self.W=nn.Parameter(torch.randn(num_features))
        self.b=nn.Parameter(torch.randn(num_features))
        self.running_mean=torch.zeros(num_features).to(device)
        self.running_var=torch.ones(num_features).to(device)
    def forward(self, x):
        out = F.batch_norm(x,self.running_mean,self.running_var,weight=self.W,bias=self.b,training=self.training)
        return out
#################################################################################
#                                   THE END                                     #
#################################################################################

## 3. Model (3.5 pts)

Using the modules defined in previous part define the following model:

`[Conv2d(3, 3), channels=8, stride=1, padding=1] > [BatchNorm2d] > [relu]`

`[Conv2d(5, 5), channels=16, stride=1, padding=0] > [BatchNorm2d] > [relu] > [max_pool2d(2, 2), stride=(2, 2), padding=0]`

`[Conv2d(5, 5), channels=32, stride=1, padding=0] > [BatchNorm2d] > [relu] > [max_pool2d(2, 2), stride=(2, 2), padding=0]`

`[Linear(128)] > [BatchNorm] > [relu]`

`[Linear(64)] > [BatchNorm] > [relu]`        __(features)__

`[Linear(10)]`

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################

class Model(nn.Module):
    def __init__(self, dropout=None):
        super(Model, self).__init__()
        self.dropout=dropout
        self.conv1=Conv2d(1,8,(3,3),padding=1)
        self.batch1=BatchNorm2d(8)
        self.conv2=Conv2d(8,16,(5,5))
        self.batch2=BatchNorm2d(16)
        self.maxpool1=nn.MaxPool2d(2)
        self.conv3=Conv2d(16,32,(5,5))
        self.batch3=BatchNorm2d(32)
        self.maxpool2=nn.MaxPool2d(2)
        self.line1=Linear(32*4*4,128)
        self.batch4=BatchNorm(128)
        self.line2=Linear(128,64)
        self.batch5=BatchNorm(64)
        self.line3=Linear(64,10)
    def forward(self, x):
        x=F.relu(self.batch1(self.conv1(x)))
        x=self.maxpool1(F.relu(self.batch2(self.conv2(x))))
        x=self.maxpool2(F.relu(self.batch3(self.conv3(x))))
        x=self.line1(torch.flatten(x,1))
        if self.dropout is not None:
            x=F.dropout(x,p=self.dropout,training=self.training)
        x=self.line2(F.relu(self.batch4(x)))
        if self.dropout is not None:
            x=F.dropout(x,p=self.dropout,training=self.training)
        features=F.relu(self.batch5(x))
        out=self.line3(features)
        return out, features
#################################################################################
#                                   THE END                                     #
#################################################################################

## 4. Training the Model (5 pts)

In [0]:
def train(model, optimizer, trainloader, testloader, device, num_epoches, label):
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# write the main training loop procedure:
# move data to defined device
# zero_grad optimizer
# forward
# compute loss using F.cross_entropy
# backward
# step the optimizer
# accumulate running loss
#################################################################################
    model.to(device)
    for epoch in range(num_epoches):
        print('EPOCH {:2d}:'.format(epoch + 1))
        running_loss = 0.
        for i, (x, y) in enumerate(trainloader):
            inputs, labels = x.to(device), y.to(device)
            optimizer.zero_grad()
            out,_ = model(inputs)
            loss = F.cross_entropy(out, labels)
            loss.backward()
            optimizer.step()  
            running_loss += loss.item()
#################################################################################
#                                   THE END                                     #
#################################################################################
       
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compute test loss:
# dont forget to change model mode from train to eval
# write the code in a with torch.no_grad() block to prevent computing and 
# accumulating gradients
# accumulate loss in test_loss variable
#################################################################################
            if i % 100 == 99:
                test_loss = 0.
                with torch.no_grad():
                    for data in testloader:
                        images, labels = data[0].to(device), data[1].to(device)
                        out,_ = model(images)
                        loss = F.cross_entropy(out, labels)
                        test_loss+=loss.item()
#################################################################################
#                                   THE END                                     #
#################################################################################
                writer.add_scalars('loss/'+label, 
                                   {'train': running_loss/100, 'test': test_loss/len(testloader)},
                                  global_step=epoch * len(trainloader) + i + 1)
                writer.flush()
                print('\titeration {:4d}: training_loss = {:5f}, test_loss = {:5f}'.format(i + 1, running_loss/100, test_loss/len(testloader)))
                running_loss = 0.
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compute test accuracy:
# dont forget to change model mode from train to eval
# write the code in a with torch.no_grad() block to prevent computing and 
# accumulating gradients
# accumulate number of correct predictions in correct variable and total test
# samples in total variable
# accumulate number of classwise correct predictions in class_correct list 
# and total classwise test samples in class_total list
#################################################################################
        with torch.no_grad():
            correct = 0
            total = 0
            
            class_correct = [0.] * 10
            class_total = [0.] * 10
            for data in testloader:
                images, labels = data[0].to(device), data[1].to(device)
                out,_ = model(images)
                _, predicted = torch.max(out.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()                
                for i in range(len(labels)):
                    class_total[labels[i]]+=1
                    if predicted[i]==labels[i]:
                        class_correct[labels[i]]+=1
#################################################################################
#                                   THE END                                     #
#################################################################################
        writer.add_scalars('accuracy/'+label, {'test': correct / total},
                           global_step=(epoch + 1) * len(trainloader))
        print('test_accuracy = {:5f}'.format(correct / total))
        
        writer.add_scalars('classwise_accuracy/'+label, 
                           {classes[i]: class_correct[i]/class_total[i] for i in range(10)},
                           global_step=(epoch + 1) * len(trainloader))
        for i in range(10):
            print('  >> {:11s}: {:5f}'.format(classes[i], class_correct[i]/class_total[i]))
            
        writer.flush()
        torch.save(model.state_dict(), './model_{}.chkpt'.format(label))

In [12]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# initilize model and train for 10 epoches using Adam optimizer
#################################################################################
num_epoches = 10
model=Model()
optimizer=optim.Adam(model.parameters())

train(model, optimizer, trainloader, testloader, device, num_epoches, 'base')
#################################################################################
#                                   THE END                                     #
#################################################################################

EPOCH  1:
	iteration  100: training_loss = 1.141327, test_loss = 0.792359
	iteration  200: training_loss = 0.689374, test_loss = 0.645509
	iteration  300: training_loss = 0.594193, test_loss = 0.551898
	iteration  400: training_loss = 0.532206, test_loss = 0.516130
	iteration  500: training_loss = 0.469848, test_loss = 0.504271
	iteration  600: training_loss = 0.466767, test_loss = 0.467948
	iteration  700: training_loss = 0.425413, test_loss = 0.433405
	iteration  800: training_loss = 0.416112, test_loss = 0.424147
	iteration  900: training_loss = 0.415300, test_loss = 0.422329
	iteration 1000: training_loss = 0.389058, test_loss = 0.402099
	iteration 1100: training_loss = 0.381735, test_loss = 0.407317
	iteration 1200: training_loss = 0.391601, test_loss = 0.388939
	iteration 1300: training_loss = 0.378454, test_loss = 0.372811
	iteration 1400: training_loss = 0.353905, test_loss = 0.366897
	iteration 1500: training_loss = 0.353656, test_loss = 0.367796
	iteration 1600: training_loss

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# add model features corresponding to nimages as embedding to tensorboard
#################################################################################
writer = SummaryWriter('./runs/FashionMNIST_Embeddings')
loader=torch.utils.data.DataLoader(trainset, batch_size=64)
dataiter = iter(loader)
images, labels = dataiter.next()
images,labels=images.to(device),labels.to(device)
_,images=model(images)
writer.add_embedding(images, 
                     metadata=[classes[label] for label in labels],
                     tag='raw_pixels2')
writer.flush()
#################################################################################
#                                   THE END                                     #
#################################################################################

## 5. Dropout and Data Augmentation (6 pts)

Add dropout with p=0.5 to first two linear layers of the model using `F.dropout`. You can either modify the model module to take an additional parameter or write a seperate module. 

Data Augmentation is a strategy for increasing dataset size to prevent overfitting and better generalization. Dataset can be augmented by any transformation on data that do not change its label.

Pytorch provides data augmentation transforms in `torchvision.transforms` package.

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# compose a transform using transforms.Compose that horizontally flips images 
# and use transforms.RandomResizedCrop to crop a 20 * 20 patch of the image 
# and resizing back to 28 * 28
#################################################################################
transform = transforms.Compose([transforms.RandomHorizontalFlip(),transforms.RandomResizedCrop((28,28),scale=(0.8,0.8)),transforms.ToTensor()])
trainset = datasets.FashionMNIST(root='./data', train=True,download=True, transform=transform)
testset = datasets.FashionMNIST(root='./data', train=False,download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size)

#################################################################################
#                                   THE END                                     #
#################################################################################

In [0]:
dataiter = iter(trainloader)
images, labels = dataiter.next()
images,labels=images.to(device),labels.to(device)
img_grid = utils.make_grid(images[:16], nrow=4)

writer.add_image('FashionMNIST/augmented', img_grid)
writer.flush()

In [17]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# initilize model and train using augmented dataset for 10 epoches
#################################################################################
model2=Model(dropout=0.5)
optimizer=optim.Adam(model2.parameters())

train(model2, optimizer, trainloader, testloader, device, 10, 'dropout')
writer.add_graph(model2, images)
#################################################################################
#                                   THE END                                     #
#################################################################################

EPOCH  1:
	iteration  100: training_loss = 2.252083, test_loss = 1.867695
	iteration  200: training_loss = 1.661326, test_loss = 1.491312
	iteration  300: training_loss = 1.406326, test_loss = 1.315079
	iteration  400: training_loss = 1.229969, test_loss = 1.159420
	iteration  500: training_loss = 1.119151, test_loss = 1.101767
	iteration  600: training_loss = 1.077417, test_loss = 0.996913
	iteration  700: training_loss = 0.975305, test_loss = 0.967309
	iteration  800: training_loss = 0.908823, test_loss = 0.938089
	iteration  900: training_loss = 0.893228, test_loss = 0.900953
	iteration 1000: training_loss = 0.859944, test_loss = 0.849694
	iteration 1100: training_loss = 0.816191, test_loss = 0.831686
	iteration 1200: training_loss = 0.796948, test_loss = 0.810675
	iteration 1300: training_loss = 0.780262, test_loss = 0.798497
	iteration 1400: training_loss = 0.769093, test_loss = 0.799691
	iteration 1500: training_loss = 0.794788, test_loss = 0.789790
	iteration 1600: training_loss

In [0]:
#################################################################################
#                          COMPLETE THE FOLLOWING SECTION                       #
#################################################################################
# add model features corresponding to nimages as embedding to tensorboard
#################################################################################
writer = SummaryWriter('./runs/FashionMNIST_Embeddings')
loader=torch.utils.data.DataLoader(trainset, batch_size=64)
dataiter = iter(loader)
images, labels = dataiter.next()
images,labels=images.to(device),labels.to(device)
_,images=model2(images)
writer.add_embedding(images, 
                     metadata=[classes[label] for label in labels],
                     tag='raw_pixels')
writer.flush()
writer.close()
###############################passpasspasspass##################################################
#                                   THE END                                     #
#################################################################################