# Introduction


In this assignment, you will practice building and training Convolutional Neural Networks with PyTorch to solve computer vision tasks. This assignment includes two sections, each involving different tasks:

(1) Image Classification. Predict image-level category labels on two historically notable image datasets: **CIFAR-10** and **MNIST**.

(2) Image Segmentation. Predict pixel-wise classification (semantic segmentation) on synthetic input images formed by superimposing MNIST images on top of CIFAR images.

You will design your own models in each section and build the entire training/testing pipeline with PyTorch. 
PyTorch provides optimized implementations of the building blocks and additional utilities, both of which will be necessary for experiments on real datasets. It is highly recommended to read the official [documentation](https://pytorch.org/docs/stable/index.html) and [examples](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) before starting your implementation. There are some APIs that you'll find useful:
[Layers](http://pytorch.org/docs/stable/nn.html),
[Activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity),
[Loss functions](http://pytorch.org/docs/stable/nn.html#loss-functions),
[Optimizers](http://pytorch.org/docs/stable/optim.html)

It is highly recommended to use Google Colab and run the notebook on a GPU runtime. See https://colab.research.google.com/ and look for tutorials online. To enable a GPU, go to Runtime -> Change runtime type and select GPU.


# (2) Image Segmentation
The task consists of performing pixel-wise classification on a synthetic dataset
of 32x32 RGB images. Each image was generated by placing an MNIST sample (a grayscale
image of a digit between 0 and 9) on top of a CIFAR-10 sample (an RGB image drawn from
one of 10 possible classes). Each image has an accompanying target tensor of size 32x32.
At each pixel location (i,j), the target contains either the ground-truth label of the MNIST
digit (ranging from 0 to 9) or that of the CIFAR-10 image (ranging from 10 to 19), depending
on whether the (i,j) pixel belongs to the superimposed MNIST digit or not. The
metric of interest here is pixel-wise accuracy, which is the fraction of pixels in each image
for which your model predicted the correct class (out of a total of 20 classes, as described
above).
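
To make the label layout and the metric concrete, here is a minimal sketch. The digit footprint, the two labels, and the deliberately wrong prediction are all made up for illustration; the real targets come from the provided dataset.

```python
import torch

# Hypothetical toy example: the background carries a CIFAR-10 label shifted
# into 10..19, and the digit pixels carry an MNIST label in 0..9.
mnist_label, cifar_label = 7, 3
digit_mask = torch.zeros(32, 32, dtype=torch.bool)
digit_mask[8:24, 12:20] = True  # assumed digit footprint, for illustration only

target = torch.full((32, 32), 10 + cifar_label, dtype=torch.long)
target[digit_mask] = mnist_label

# A prediction that gets the background right but labels the digit as 1.
pred = target.clone()
pred[digit_mask] = 1

# Pixel-wise accuracy: fraction of pixels whose predicted class is correct.
pixel_acc = (pred == target).float().mean().item()
print(f"pixel-wise accuracy: {pixel_acc:.4f}")
```

Because the digit usually covers a minority of the pixels, a model can score fairly well on this metric from the background label alone, which is worth keeping in mind when reading the accuracy numbers.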

Note that there are many ways to frame the above task. For example, your CNN can directly
output a 20x32x32 tensor for each input image, representing a distribution over the possible
20 classes for each of the 32x32 pixels. However, note that the problem has a lot of additional
structure: for example, each 32x32 target tensor only has two distinct numbers in it, the label
of the MNIST digit and the label of the CIFAR-10 background image -- accounting for such
structure will make training faster and likely improve your model's final performance. Your
model should be able to achieve around 70% accuracy on the test set when trained for 100 epochs.
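
For the direct framing described above (one 20-way distribution per pixel), the tensor shapes can be sketched as follows. The random tensors are stand-ins for real model outputs and targets, and this sketch does not exploit the two-label structure of the targets.

```python
import torch
import torch.nn as nn

batch_size, num_classes, height, width = 4, 20, 32, 32

# Random stand-ins for model outputs (logits) and ground-truth labels.
logits = torch.randn(batch_size, num_classes, height, width)
targets = torch.randint(0, num_classes, (batch_size, height, width))

# nn.CrossEntropyLoss accepts logits of shape (N, C, H, W) together with
# integer class targets of shape (N, H, W); by default it averages the
# loss over all pixels in the batch.
loss = nn.CrossEntropyLoss()(logits, targets)

# Per-pixel predictions come from an argmax over the class dimension.
preds = logits.argmax(dim=1)
print(loss.shape, preds.shape)
```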

To finish this section step by step, you need to:

* Prepare data by building a dataset and data loader. (already provided below)

* Implement training code (6 points) & testing code (6 points), including saving and loading of models.

* Construct a model (12 points) and choose an optimizer (3 points).

* Describe what you did, any additional features you implemented, and/or any graphs you made in training and evaluating your network. Report final test accuracy @100 epochs in a writeup: hw3.pdf (3 points)
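
Saving and loading is listed in the steps above; one common pattern is to store `state_dict`s in a checkpoint dictionary. The sketch below uses a tiny stand-in model, and the checkpoint path and dictionary keys are arbitrary choices for this example rather than anything required by the assignment.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model; in the assignment this would be your own network.
model = nn.Conv2d(3, 20, kernel_size=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Hypothetical checkpoint path, chosen only for this example.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")

# Save the learnable state rather than pickling the whole module.
torch.save(
    {"epoch": 5, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
    ckpt_path,
)

# Restore into freshly constructed objects with the same architecture.
restored = nn.Conv2d(3, 20, kernel_size=1)
restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.01, momentum=0.9)
checkpoint = torch.load(ckpt_path, map_location="cpu")
restored.load_state_dict(checkpoint["model"])
restored_optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"] + 1  # resume from the next epoch
```

Storing the optimizer state alongside the weights matters mainly when resuming training; for evaluation only, the model `state_dict` is enough.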

In [2]:
# This code connects the notebook to the Google Drive files
# from google.colab import drive
# drive.mount('/content/drive')

# #import sys
# #sys.path.insert(0,'/content/drive/hw3_release')

# %cd '/content/drive/My Drive/hw3_release'

Mounted at /content/drive
/content/drive/My Drive/hw3_release


In [3]:
import numpy as np
import os
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision
import torchvision.transforms as T
from utils import SegDataset


## Data Preparation:

Set up a Dataset for training and testing.

A Dataset loads training examples one at a time, so in practice we wrap each Dataset in a DataLoader, which groups examples into batches and can load them in parallel.
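
The division of labor between the two classes can be sketched with a toy dataset. `ToySegDataset` is a hypothetical stand-in that returns random tensors; it is not the provided `SegDataset`, whose internals are not shown here.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToySegDataset(Dataset):
    """Hypothetical stand-in for SegDataset: random image/target pairs."""

    def __init__(self, num_items=10):
        self.images = torch.rand(num_items, 3, 32, 32)
        self.targets = torch.randint(0, 20, (num_items, 32, 32))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # A Dataset returns one example at a time.
        return self.images[idx], self.targets[idx]


toy_loader = DataLoader(ToySegDataset(), batch_size=4, shuffle=False)

# The DataLoader stacks examples into batches; the last batch may be smaller.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in toy_loader]
print(batch_shapes)
```

Passing `num_workers` greater than zero to `DataLoader` is what enables loading in parallel worker processes; with the default of zero, batches are assembled in the main process.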

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
seg_train = SegDataset('./data', train=True, transform=None)
loader_train = DataLoader(seg_train, batch_size=64, shuffle=True)
seg_test = SegDataset('./data', train=False, transform=None)
loader_test = DataLoader(seg_test, batch_size=64, shuffle=False)

## Design/choose your own model structure (12 points) and optimizer (3 points).
You might want to adjust the following configurations for better performance:

(1) Network architecture:
- You can borrow some ideas from existing convnet designs, e.g., [ResNet](https://arxiv.org/abs/1512.03385), where the input from the previous layer is added to the output, or [UNet](https://arxiv.org/pdf/1505.04597.pdf), where you can stack intermediate features from previous layers.
- Note: Do not **directly copy** an existing network design.
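
The two ideas mentioned above differ in how earlier features are combined with later ones. The sketch below shows only that difference, using a single convolution as a stand-in for a deeper transformation; it is not a full ResNet or UNet block.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # features from an earlier layer
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # stand-in transformation
y = conv(x)

# ResNet-style: add the earlier features. Shapes must match, and the
# channel count stays at 64.
residual_out = x + y

# UNet-style: stack the earlier features along the channel dimension.
# The channel count grows to 128.
concat_out = torch.cat([x, y], dim=1)

print(residual_out.shape, concat_out.shape)
```

With concatenation, the layer that follows has to accept the summed channel count, which is why decoder blocks in UNet-like designs take more input channels than they output.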

(2) Architecture hyperparameters:
- Filter size, number of filters, and number of layers (depth). Make careful choices to trade off computational efficiency against accuracy.
- Pooling vs. Strided Convolution
- Batch normalization
- Choice of non-linear activation
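
When comparing pooling with strided convolution, it helps to track the spatial size each option produces. The helper below applies the standard output-size formula for convolution and pooling layers (assuming dilation 1); the function name is made up for this sketch.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 2x2 max pooling and a stride-2 3x3 convolution both halve a 32x32 input.
pooled = conv_out_size(32, kernel_size=2, stride=2)
strided = conv_out_size(32, kernel_size=3, stride=2, padding=1)

# A 3x3 convolution with padding 1 and stride 1 preserves the size.
same = conv_out_size(32, kernel_size=3, padding=1)

# Four successive halvings, as in an encoder with four downsampling stages.
sizes = [32]
for _ in range(4):
    sizes.append(conv_out_size(sizes[-1], kernel_size=2, stride=2))

print(pooled, strided, same, sizes)
```

Both options reduce resolution equally; the difference is that a strided convolution has learnable weights, while max pooling has none.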

(3) Choice of optimizer (e.g., SGD, Adam, Adagrad, RMSprop) and associated hyperparameters (e.g., learning rate, momentum).
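
The optimizers named above are all available in `torch.optim` and share the same construction pattern. The hyperparameter values in this sketch are illustrative starting points, not tuned recommendations, and the single convolution is a stand-in for a real model.

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Conv2d(3, 20, kernel_size=3, padding=1)  # stand-in model

# Illustrative, untuned hyperparameters for each optimizer family.
candidates = {
    "sgd": optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4),
    "adam": optim.Adam(net.parameters(), lr=1e-3),
    "adagrad": optim.Adagrad(net.parameters(), lr=0.01),
    "rmsprop": optim.RMSprop(net.parameters(), lr=1e-3, momentum=0.9),
}

for name, opt in candidates.items():
    print(name, opt.defaults["lr"])
```

In practice only one optimizer is attached to the model at a time; the dictionary here just makes it easy to swap between them when comparing runs.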


In [5]:
#Basic model, feel free to customize the layout to fit your model design.

##########################################################################
# TODO: YOUR CODE HERE
# (1) Complete the model
class DoubleConv(nn.Module):
    """Two (3x3 conv -> batch norm -> ReLU) stages; spatial size is preserved."""

    def __init__(self, in_channels, out_channels):
        super(DoubleConv, self).__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.double_conv(x)


class DownBlock(nn.Module):
    """DoubleConv followed by 2x2 max pooling; also returns the pre-pooling features."""

    def __init__(self, in_channels, out_channels):
        super(DownBlock, self).__init__()
        self.double_conv = DoubleConv(in_channels, out_channels)
        self.down_sample = nn.MaxPool2d(2)

    def forward(self, x):
        skip_out = self.double_conv(x)
        down_out = self.down_sample(skip_out)
        return (down_out, skip_out)


class UpBlock(nn.Module):
    """Upsample by 2, concatenate the skip features, then apply DoubleConv.

    `in_channels` is the channel count after concatenation. In this network the
    skip features have `out_channels` channels, so the upsampled input carries
    `in_channels - out_channels` channels.
    """

    def __init__(self, in_channels, out_channels, up_sample_mode):
        super(UpBlock, self).__init__()
        if up_sample_mode == 'conv_transpose':
            self.up_sample = nn.ConvTranspose2d(in_channels - out_channels, in_channels - out_channels, kernel_size=2, stride=2)
        elif up_sample_mode == 'bilinear':
            self.up_sample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            raise ValueError("Unsupported `up_sample_mode` (can take one of `conv_transpose` or `bilinear`)")
        self.double_conv = DoubleConv(in_channels, out_channels)

    def forward(self, down_input, skip_input):
        x = self.up_sample(down_input)
        x = torch.cat([x, skip_input], dim=1)
        return self.double_conv(x)
##########################################################################

class myNet(nn.Module):
    def __init__(self,out_classes=20, up_sample_mode='conv_transpose'):
        super(myNet, self).__init__()
        # Set up your own CNN.
        self.up_sample_mode = up_sample_mode
        # Downsampling Path
        self.down_conv1 = DownBlock(3, 64)
        self.down_conv2 = DownBlock(64, 128)
        self.down_conv3 = DownBlock(128, 256)
        self.down_conv4 = DownBlock(256, 512)
        # Bottleneck
        self.double_conv = DoubleConv(512, 1024)
        # Upsampling Path
        self.up_conv4 = UpBlock(512 + 1024, 512, self.up_sample_mode)
        self.up_conv3 = UpBlock(256 + 512, 256, self.up_sample_mode)
        self.up_conv2 = UpBlock(128 + 256, 128, self.up_sample_mode)
        self.up_conv1 = UpBlock(128 + 64, 64, self.up_sample_mode)
        # Final Convolution
        self.conv_last = nn.Conv2d(64, out_classes, kernel_size=1)

    def forward(self, x):
        # forward
        x, skip1_out = self.down_conv1(x)
        x, skip2_out = self.down_conv2(x)
        x, skip3_out = self.down_conv3(x)
        x, skip4_out = self.down_conv4(x)
        x = self.double_conv(x)
        x = self.up_conv4(x, skip4_out)
        x = self.up_conv3(x, skip3_out)
        x = self.up_conv2(x, skip2_out)
        x = self.up_conv1(x, skip1_out)
        x = self.conv_last(x)
        out = x
        return out
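
Before launching a long training run, it can be worth checking that a model maps a batch of 3x32x32 inputs to 20x32x32 per-pixel scores. The helper below is a hypothetical utility, demonstrated on a small stand-in network rather than on the model defined above.

```python
import torch
import torch.nn as nn


def check_segmentation_shape(model, num_classes=20, size=32, batch_size=2):
    """Hypothetical helper: push a random batch through `model`, verify the
    output shape, and return the number of parameters.

    Note: this switches the model to eval mode; call model.train() afterwards
    if you continue training outside a loop that already does so.
    """
    model.eval()
    with torch.no_grad():
        out = model(torch.randn(batch_size, 3, size, size))
    expected = (batch_size, num_classes, size, size)
    assert tuple(out.shape) == expected, f"got {tuple(out.shape)}, expected {expected}"
    return sum(p.numel() for p in model.parameters())


# Stand-in network for the demonstration; pass your own model instead.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 20, kernel_size=1),
)
num_params = check_segmentation_shape(toy_model)
print(num_params)
```

The parameter count is a rough proxy for the cost of each epoch, which is useful when trading off depth and width against training time.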

## Training (6 points)

Train a model on the given dataset using the PyTorch Module API.

Inputs:
- loader_train: The loader from which training samples will be drawn.
- loader_test: The loader from which test samples will be drawn.
- model: A PyTorch Module giving the model to train.
- optimizer: An Optimizer object we will use to train the model
- epochs: (Optional) A Python integer giving the number of epochs to train for

Returns: Nothing, but prints model accuracies during training.

In [6]:
def train(loader_train, loader_test, model, optimizer, epochs=100):
    model = model.to(device)  # use the GPU when available, otherwise fall back to the CPU
    criterion = nn.CrossEntropyLoss()# choose your loss here, if needed
    
    for e in range(epochs):
        model.train()
        for t, (x, y) in enumerate(loader_train):
            ##########################################################################
            # TODO: YOUR CODE HERE
            # (1) move data to GPU
            # (2) forward and get loss
            # (3) zero out all of the gradients for the variables which the optimizer
            # will update.
            # (4) the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            # (5) update the parameters of the model using the gradients
            # computed by the backwards pass.
            x = x.to(device)
            y = y.to(device)

            # Forward pass
            outputs = model(x)
            loss = criterion(outputs, y)

            # Backward pass and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ##########################################################################
            if t % 100 == 0:
                print('Epoch %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))
        test(loader_test, model)

## Testing (6 points)
Test a model using the PyTorch Module API.

Inputs:
- loader: The loader from which test samples will be drawn.
- model: A PyTorch Module giving the model to test.

Returns: Nothing, but prints the pixel-wise accuracy of the model on the given loader.

In [7]:
def test(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval() # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            ##########################################################################
            # TODO: YOUR CODE HERE
            # (1) move to GPU
            # (2) forward and calculate scores and predictions
            # (3) accumulate num_correct and num_samples
            x = x.to(device)
            y = y.to(device)
            outputs = model(x)
            # Predicted class per pixel: argmax over the class dimension
            predicted = outputs.argmax(dim=1)
            # Every pixel counts as one sample
            num_samples += y.numel()
            num_correct += (predicted == y).sum().item()
            ##########################################################################
        # Pixel-wise accuracy over the whole loader
        acc = float(num_correct) / num_samples
        print('Eval %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

Describe your design details in the writeup hw3.pdf. (3 points)

Finish your model and optimizer below.

In [8]:
data, labels = next(iter(loader_train))
print(data.shape)
num_samples = len(seg_train)
print(num_samples)
batch_size = data.shape[0]
print(batch_size)

torch.Size([64, 3, 32, 32])
50000
64


In [9]:
model = myNet()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.001)
train(loader_train, loader_test, model, optimizer, epochs=100)

Epoch 0, Iteration 0, loss = 2.9627


KeyboardInterrupt: ignored

Visualize the predictions of my semantic segmentation model

In [None]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Grab a single batch to visualize instead of iterating over the whole loader
images, masks = next(iter(loader_train))

def show_segmentation(image, model):
  device = next(model.parameters()).device
  #image = image.to(device)
  image = image.unsqueeze(0).to(device)
  model.eval()
  with torch.no_grad():
    output = model(image)
  predicted_mask = torch.argmax(output, dim=1).squeeze().cpu().numpy()
  plt.imshow(predicted_mask)
  plt.show()

for image in images:
  show_segmentation(image, model)