<a href="https://colab.research.google.com/github/ishandahal/first_repo/blob/master/Torch_Barebones_Api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing the packages needed to load the data (CIFAR-10)

In [None]:
!pip install git+https://github.com/deepvision-class/starter-code

Collecting git+https://github.com/deepvision-class/starter-code
  Cloning https://github.com/deepvision-class/starter-code to /tmp/pip-req-build-b9kf2xth
  Running command git clone -q https://github.com/deepvision-class/starter-code /tmp/pip-req-build-b9kf2xth
Building wheels for collected packages: Colab-Utils
  Building wheel for Colab-Utils (setup.py) ... done
  Created wheel for Colab-Utils: filename=Colab_Utils-0.1.dev0-cp36-none-any.whl size=10323 sha256=eb67dc139536cd1ea0ca5fba21e3a4c42511a438eea1afba3239d25ac85db52e
  Stored in directory: /tmp/pip-ephem-wheel-cache-emxw3lmc/wheels/63/d1/27/a208931527abb98d326d00209f46c80c9d745851d6a1defd10
Successfully built Colab-Utils


## Setup code

In [None]:
import coutils
from coutils import fix_random_seed

from collections import OrderedDict
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

# for plotting
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

In [None]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
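
As a quick sanity check on the split above, the arithmetic below counts the minibatches each loader yields per epoch. It is a minimal sketch in plain Python: the helper name `num_batches` is made up for illustration, and it assumes the `DataLoader` default of keeping the final partial batch.

```python
import math

NUM_TRAIN = 49000      # examples used for training
NUM_TOTAL = 50000      # size of the CIFAR-10 training split
BATCH_SIZE = 64

def num_batches(num_examples, batch_size):
  """Minibatches per epoch when the last partial batch is kept (hypothetical helper)."""
  return math.ceil(num_examples / batch_size)

train_batches = num_batches(NUM_TRAIN, BATCH_SIZE)
val_batches = num_batches(NUM_TOTAL - NUM_TRAIN, BATCH_SIZE)

print(train_batches)  # 766
print(val_batches)    # 16
```

With 766 training minibatches, the last iteration index in one pass over `loader_train` is 765.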


## If a GPU is available it will be used; otherwise we fall back to the CPU


In [None]:
dtype = torch.float
ltype = torch.long

if torch.cuda.is_available():
  device = torch.device('cuda:0')
else:
  device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

using device: cpu


In [None]:
### Defining flatten function for convenience
def flatten(x, start_dim=1, end_dim=-1):
  return x.flatten(start_dim=start_dim, end_dim=end_dim)
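
A quick shape check illustrates what `flatten` does to an image minibatch. This is a small sketch that redefines the helper so it runs on its own; the tensor sizes are chosen to match CIFAR-10 and are otherwise arbitrary.

```python
import torch

def flatten(x, start_dim=1, end_dim=-1):
  return x.flatten(start_dim=start_dim, end_dim=end_dim)

x = torch.zeros(64, 3, 32, 32)     # minibatch of 64 RGB images, 32x32 each
flat = flatten(x)                  # keep dim 0, collapse the rest
print(list(flat.shape))            # [64, 3072]

# Collapsing only the two spatial dimensions keeps the channel axis
per_channel = flatten(x, start_dim=2)
print(list(per_channel.shape))     # [64, 3, 1024]
```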

## Barebones PyTorch: a two-layer network

In [None]:
def two_layer_fc(x, params):
  """
  A fully-connected neural network; the architecture is:
  fully-connected layer -> ReLU -> fully-connected layer.
  Note that this function only defines the forward pass; 
  PyTorch will take care of the backward pass for us.
  
  The input to the network will be a minibatch of data, of shape
  (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
  and the output layer will produce scores for C classes.
  
  Inputs:
  - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
    input data.
  - params: A list [w1, b1, w2, b2] of PyTorch Tensors giving the weights and
    biases of the network; w1 has shape (H, D), b1 has shape (H,), w2 has
    shape (C, H), and b2 has shape (C,).
  
  Returns:
  - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
    the input data x.
  """
  # first we flatten the image
  x = flatten(x)  # shape: [batch_size, C x H x W]
  
  w1, b1, w2, b2 = params
  
  # Forward pass: compute predicted y using operations on Tensors. Since w1 and
  # w2 have requires_grad=True, operations involving these Tensors will cause
  # PyTorch to build a computational graph, allowing automatic computation of
  # gradients. Since we are no longer implementing the backward pass by hand we
  # don't need to keep references to intermediate values.
  # Note that F.linear(x, w, b) is equivalent to x.mm(w.t()) + b
  # For ReLU, you can also use `.clamp(min=0)`, equivalent to `F.relu()`

  x = F.relu(F.linear(x, w1, b1))
  x = F.linear(x, w2, b2)
  return x
    

def two_layer_fc_test():
  hidden_layer_size = 42
  x = torch.zeros((64, 3, 16, 16), dtype=dtype)  # minibatch size 64, feature dimension 3*16*16
  w1 = torch.zeros((hidden_layer_size, 3*16*16), dtype=dtype)
  b1 = torch.zeros((hidden_layer_size,), dtype=dtype)
  w2 = torch.zeros((10, hidden_layer_size), dtype=dtype)
  b2 = torch.zeros((10,), dtype=dtype)
  scores = two_layer_fc(x, [w1, b1, w2, b2])
  print('Output size:', list(scores.size()))  # you should see [64, 10]

two_layer_fc_test()

Output size: [64, 10]
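
The two equivalences noted in the comments of `two_layer_fc` can be checked numerically. The sketch below uses small random tensors whose sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 6)    # 4 examples, 6 features
w = torch.randn(3, 6)    # 3 output units; weight shape is (out, in)
b = torch.randn(3)

# F.linear multiplies by the transpose of w and adds the bias
linear_matches = torch.allclose(F.linear(x, w, b), x.mm(w.t()) + b, atol=1e-5)

# ReLU is the same as clamping negative values to zero
h = F.linear(x, w, b)
relu_matches = torch.equal(F.relu(h), h.clamp(min=0))

print(linear_matches, relu_matches)  # True True
```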


## Barebones PyTorch: a three-layer ConvNet

In [None]:
def three_layer_convnet(x, params):
  """
  Performs the forward pass of a three-layer convolutional network with the
  architecture conv -> ReLU -> conv -> ReLU -> fully-connected.

  Inputs:
  - x: A PyTorch Tensor of shape (N, C, H, W) giving a minibatch of images
  - params: A list of PyTorch Tensors giving the weights and biases for the
    network; should contain the following:
    - conv_w1: PyTorch Tensor of shape (channel_1, C, KH1, KW1) giving weights
      for the first convolutional layer
    - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
      convolutional layer
    - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
      weights for the second convolutional layer
    - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
      convolutional layer
    - fc_w: PyTorch Tensor of shape (num_classes, channel_2 * H * W) giving
      weights for the fully-connected layer; with the 5x5 and 3x3 kernels used
      in this notebook, the paddings of 2 and 1 preserve H and W
    - fc_b: PyTorch Tensor of shape (num_classes,) giving biases for the
      fully-connected layer
  
  Returns:
  - scores: PyTorch Tensor of shape (N, num_classes) giving classification
    scores for x
  """
  conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
  # Padding of 2 (5x5 kernel) and 1 (3x3 kernel) keeps the spatial size at H x W
  x = F.relu(F.conv2d(x, conv_w1, bias=conv_b1, padding=2))
  x = F.relu(F.conv2d(x, conv_w2, bias=conv_b2, padding=1))
  x = flatten(x)  # shape: [N, channel_2 * H * W]
  # fc_w has shape (num_classes, channel_2 * H * W); fc_b has shape (num_classes,)
  scores = F.linear(x, fc_w, fc_b)
  return scores

In [None]:
def three_layer_convnet_test():
  x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

  conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
  conv_b1 = torch.zeros((6,))  # out_channel
  conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
  conv_b2 = torch.zeros((9,))  # out_channel

  # the fully-connected layer sees the flattened output of the second conv layer: 9 channels x 32 x 32
  fc_w = torch.zeros((10, 9 * 32 * 32))
  fc_b = torch.zeros(10)

  scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
  print('Output size:', list(scores.size()))  # you should see [64, 10]
three_layer_convnet_test()

Output size: [64, 10]
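
The `9 * 32 * 32` in `fc_w` follows from the standard convolution output-size formula, `out = (in + 2 * padding - kernel) / stride + 1`. A minimal sketch in plain Python, assuming stride 1 and the paddings used in `three_layer_convnet` (the helper name `conv_out_size` is hypothetical):

```python
def conv_out_size(in_size, kernel, padding, stride=1):
  """Spatial output size of a convolution along one dimension."""
  return (in_size + 2 * padding - kernel) // stride + 1

h = 32
h = conv_out_size(h, kernel=5, padding=2)   # first conv: 5x5 kernel, padding 2
h = conv_out_size(h, kernel=3, padding=1)   # second conv: 3x3 kernel, padding 1
fc_in_features = 9 * h * h                  # 9 output channels after the second conv

print(h, fc_in_features)  # 32 9216
```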


## Kaiming initialization
Weights are initialized with Kaiming normal initialization and biases with zeros.


In [None]:
fix_random_seed(0)

# Create a weight of shape [3 x 5]
print(nn.init.kaiming_normal_(torch.empty(3, 5, dtype=dtype, device=device)))
print(nn.init.zeros_(torch.empty(3, 5, dtype=dtype, device=device)))

tensor([[ 0.9746, -0.1856, -1.3780,  0.3595, -0.6859],
        [-0.8845,  0.2551,  0.5300, -0.4549, -0.2551],
        [-0.3773,  0.1151, -0.5418,  0.6961, -0.6775]])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
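
With its default arguments, `nn.init.kaiming_normal_` draws from a zero-mean normal distribution with standard deviation `sqrt(2 / fan_in)`, where `fan_in` is the second dimension of a 2-D weight. The sketch below checks this empirically on a larger tensor; the shape is an arbitrary choice made so that the sample statistics are stable.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
fan_out, fan_in = 1000, 500
w = nn.init.kaiming_normal_(torch.empty(fan_out, fan_in))

expected_std = math.sqrt(2.0 / fan_in)
print('expected std: %.4f' % expected_std)    # 0.0632
print('sample std:   %.4f' % w.std().item())
print('sample mean:  %.4f' % w.mean().item())
```

The sample standard deviation should land close to the expected value and the sample mean close to zero.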


### Checking the accuracy of our models
Note: When checking accuracy, gradients do not need to be computed. To prevent the computational graph from being built, we run the forward pass inside a `torch.no_grad()` context.


In [None]:
def check_accuracy_part2(loader, model_fn, params):
  """
  Check the accuracy of a classification model.
  
  Inputs:
  - loader: A DataLoader for the data split we want to check
  - model_fn: A function that performs the forward pass of the model,
    with the signature scores = model_fn(x, params)
  - params: List of PyTorch Tensors giving parameters of the model
  
  Returns: Nothing, but prints the accuracy of the model
  """
  split = 'val' if loader.dataset.train else 'test'
  print('Checking accuracy on the %s set' % split)
  num_correct, num_samples = 0, 0
  with torch.no_grad():
    for x, y in loader:
      x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
      y = y.to(device=device, dtype=ltype)
      scores = model_fn(x, params)
      _, preds = scores.max(1)
      num_correct += (preds == y).sum()
      num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
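
Two pieces of `check_accuracy_part2` can be seen in isolation: `torch.no_grad()` stops results from being tracked by autograd, and `scores.max(1)` returns the index of the highest score in each row. A small sketch with made-up scores and labels:

```python
import torch

w = torch.ones(3, requires_grad=True)

tracked = (w * 2).sum()          # part of the computational graph
with torch.no_grad():
  untracked = (w * 2).sum()      # no graph is built here

print(tracked.requires_grad, untracked.requires_grad)  # True False

# Hypothetical scores for 4 examples over 3 classes
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [0.9, 0.8, 0.7]])
y = torch.tensor([1, 0, 2, 2])

_, preds = scores.max(1)         # index of the largest score per row
num_correct = (preds == y).sum().item()
acc = num_correct / y.size(0)
print(preds.tolist(), acc)       # [1, 0, 2, 0] 0.75
```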

### Barebones PyTorch: Training loop
A basic training loop using stochastic gradient descent without momentum. We use `torch.nn.functional.cross_entropy` to compute the loss.

In [None]:
def train_part2(model_fn, params, learning_rate):
  """
  Train a model on CIFAR-10.
  
  Inputs:
  - model_fn: A Python function that performs the forward pass of the model.
    It should have the signature scores = model_fn(x, params) where x is a
    PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
    model weights, and scores is a PyTorch Tensor of shape (N, C) giving
    scores for the elements in x.
  - params: List of PyTorch Tensors giving weights for the model
  - learning_rate: Python scalar giving the learning rate to use for SGD
  
  Returns: Nothing
  """
  for t, (x, y) in enumerate(loader_train):
    # Move the data to the proper device (GPU or CPU)
    x = x.to(device=device, dtype=dtype)
    y = y.to(device=device, dtype=ltype)

    # Forward pass: compute scores and loss
    scores = model_fn(x, params)
    loss = F.cross_entropy(scores, y)

    # Backward pass: PyTorch figures out which Tensors in the computational
    # graph have requires_grad=True and uses backpropagation to compute the
    # gradient of the loss with respect to these Tensors, and stores the
    # gradients in the .grad attribute of each Tensor.
    loss.backward()

    # Update parameters. We don't want to backpropagate through the
    # parameter updates, so we scope the updates under a torch.no_grad()
    # context manager to prevent a computational graph from being built.
    with torch.no_grad():
      for w in params:
        if w.requires_grad:
          w -= learning_rate * w.grad

          # Manually zero the gradients after running the backward pass
          w.grad.zero_()

    if t % print_every == 0 or t == len(loader_train)-1:
      print('Iteration %d, loss = %.4f' % (t, loss.item()))
      check_accuracy_part2(loader_val, model_fn, params)
      print()
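
The update step in `train_part2` can be seen on a toy problem. The sketch below minimizes `(w - 3)^2` for a single scalar parameter using the same pattern: `backward()`, an in-place update under `torch.no_grad()`, then zeroing the gradient. The objective, learning rate, and step count are made up for illustration.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for step in range(50):
  loss = (w - 3.0) ** 2          # toy objective with its minimum at w = 3
  loss.backward()                # fills w.grad with d(loss)/dw = 2 * (w - 3)

  with torch.no_grad():
    w -= learning_rate * w.grad  # plain SGD step, not tracked by autograd
    w.grad.zero_()               # otherwise the next backward() would add to it

print('%.4f' % w.item())         # 3.0000
```

Skipping `w.grad.zero_()` would make gradients accumulate across iterations, because `backward()` adds to `.grad` rather than overwriting it.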

### Training a two-layer fully connected network
Each minibatch has shape `[64, 3, 32, 32]`. 
After flattening, the input has shape `[64, 3 * 32 * 32]`. 
The output has shape `[64, 10]` because there are 10 classes. Each row holds the unnormalized class scores (logits) for one input; `F.cross_entropy` applies softmax internally to turn them into a probability distribution over the 10 classes.
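
The relationship between scores, probabilities, and the loss can be checked directly. `F.cross_entropy` takes unnormalized scores and applies log-softmax internally, so explicit probabilities are only needed for inspection. A sketch with random scores standing in for a network's output:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(64, 10)             # stand-in for the network output
y = torch.randint(0, 10, (64,))          # made-up labels

probs = F.softmax(scores, dim=1)         # each row now sums to 1
row_sums = probs.sum(dim=1)

# Cross-entropy equals the mean negative log-probability of the true class
manual_loss = -torch.log(probs[torch.arange(64), y]).mean()
loss = F.cross_entropy(scores, y)

print(torch.allclose(row_sums, torch.ones(64)))        # True
print(torch.allclose(loss, manual_loss, atol=1e-5))    # True
```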

In [None]:
fix_random_seed(0)

C, H, W = 3, 32, 32
num_classes = 10

hidden_layer_size = 4000
learning_rate = 1e-2

w1 = nn.init.kaiming_normal_(torch.empty(hidden_layer_size, C*H*W, dtype=dtype, device=device))
w1.requires_grad = True
b1 = nn.init.zeros_(torch.empty(hidden_layer_size, dtype=dtype, device=device))
b1.requires_grad = True
w2 = nn.init.kaiming_normal_(torch.empty(num_classes, hidden_layer_size, dtype=dtype, device=device))
w2.requires_grad = True
b2 = nn.init.zeros_(torch.empty(num_classes, dtype=dtype, device=device))
b2.requires_grad = True

train_part2(two_layer_fc, [w1, b1, w2, b2], learning_rate=learning_rate)

Iteration 0, loss = 3.9070
Checking accuracy on the val set
Got 148 / 1000 correct (14.80%)

Iteration 100, loss = 1.9613
Checking accuracy on the val set
Got 356 / 1000 correct (35.60%)

Iteration 200, loss = 2.1789
Checking accuracy on the val set
Got 381 / 1000 correct (38.10%)

Iteration 300, loss = 1.9417
Checking accuracy on the val set
Got 394 / 1000 correct (39.40%)

Iteration 400, loss = 2.0686
Checking accuracy on the val set
Got 411 / 1000 correct (41.10%)

Iteration 500, loss = 1.7865
Checking accuracy on the val set
Got 412 / 1000 correct (41.20%)

Iteration 600, loss = 1.6491
Checking accuracy on the val set
Got 437 / 1000 correct (43.70%)

Iteration 700, loss = 2.1760
Checking accuracy on the val set
Got 413 / 1000 correct (41.30%)

Iteration 765, loss = 1.2554
Checking accuracy on the val set
Got 416 / 1000 correct (41.60%)



### Barebones PyTorch: Training a ConvNet
The three-layer ConvNet uses:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2 
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

Weights are initialized with Kaiming normal initialization and biases with zeros.
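
The layer list above fixes the shape of every parameter tensor, so the parameter count can be worked out by hand. A sketch in plain Python, assuming 3x32x32 inputs and that both convolutions preserve the 32x32 spatial size (which the stated paddings do at stride 1):

```python
C, H, W = 3, 32, 32
num_classes = 10
channel_1, channel_2 = 32, 16
kernel_size_1, kernel_size_2 = 5, 3

# weights plus biases for each layer
conv1 = channel_1 * C * kernel_size_1 ** 2 + channel_1
conv2 = channel_2 * channel_1 * kernel_size_2 ** 2 + channel_2
fc = num_classes * channel_2 * H * W + num_classes

print(conv1, conv2, fc)        # 2432 4624 163850
print(conv1 + conv2 + fc)      # 170906
```

Nearly all of the parameters sit in the fully-connected layer, since it sees every spatial position of every channel.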


In [None]:
fix_random_seed(0)

C, H, W = 3, 32, 32
num_classes = 10

channel_1 = 32
channel_2 = 16
kernel_size_1 = 5
kernel_size_2 = 3

learning_rate = 3e-3

# Define and initialize the parameters of the three-layer ConvNet: weights use
# nn.init.kaiming_normal_ and biases start at zero. Every tensor is created on
# `device` and has requires_grad=True so that the training loop updates it.
conv_w1 = nn.init.kaiming_normal_(torch.empty(channel_1, C, kernel_size_1, kernel_size_1,
                                              dtype=dtype, device=device))
conv_w1.requires_grad = True
conv_b1 = nn.init.zeros_(torch.empty(channel_1, dtype=dtype, device=device))
conv_b1.requires_grad = True
conv_w2 = nn.init.kaiming_normal_(torch.empty(channel_2, channel_1, kernel_size_2, kernel_size_2,
                                              dtype=dtype, device=device))
conv_w2.requires_grad = True
conv_b2 = nn.init.zeros_(torch.empty(channel_2, dtype=dtype, device=device))
conv_b2.requires_grad = True
# Both convolutions preserve the H x W spatial size, so the fully-connected
# layer sees channel_2 * H * W input features.
fc_w = nn.init.kaiming_normal_(torch.empty(num_classes, channel_2 * H * W,
                                           dtype=dtype, device=device))
fc_w.requires_grad = True
fc_b = nn.init.zeros_(torch.empty(num_classes, dtype=dtype, device=device))
fc_b.requires_grad = True

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Iteration 0, loss = 3.7824
Checking accuracy on the val set
Got 98 / 1000 correct (9.80%)

Iteration 100, loss = 1.8615
Checking accuracy on the val set
Got 350 / 1000 correct (35.00%)

Iteration 200, loss = 1.9028
Checking accuracy on the val set
Got 387 / 1000 correct (38.70%)

Iteration 300, loss = 1.7744
Checking accuracy on the val set
Got 426 / 1000 correct (42.60%)

Iteration 400, loss = 1.6251
Checking accuracy on the val set
Got 444 / 1000 correct (44.40%)

Iteration 500, loss = 1.5612
Checking accuracy on the val set
Got 468 / 1000 correct (46.80%)

Iteration 600, loss = 1.5234
Checking accuracy on the val set
Got 474 / 1000 correct (47.40%)

Iteration 700, loss = 1.5810
Checking accuracy on the val set
Got 461 / 1000 correct (46.10%)

Iteration 765, loss = 1.3091
Checking accuracy on the val set
Got 487 / 1000 correct (48.70%)

