# PhD Seminar Course on advanced Artificial Intelligence algorithms 
# by Dr. Paul Rad @ UT San Antonio.
# Assignment-1 
# CNN Rotation Invariance Project
Students: Mehrad Jaloli

# Abstract

Invariance means that you can recognize an object as an object, even when its appearance varies in some way. This is generally a good thing, because it preserves the object's identity, category, etc. across changes in the specifics of the visual input, such as the relative positions of the viewer/camera and the object.
Figure 1 contains many views of the same statue. You (and well-trained neural networks) can recognize that the same object appears in every picture, even though the actual pixel values are quite different. 

# 1. CNN Model
A Convolutional Neural Network (CNN) is a deep model built on the convolution operation. It consists of convolution and max-pooling layers, which form the feature extraction part, followed by fully connected layers, which form the classification part.
During training, images are fed to the network and a set of kernels acts as filters. Each kernel slides over the whole image and computes a filter response at every position, and these responses are the features extracted from the image. The output of each convolution layer is therefore a set of feature maps.

Depending on the complexity of the problem and of the data set, the depth of the model (the number of layers) can differ. The more complex the problem is, the more convolution layers we need for good feature extraction.

One of the main concerns in using CNN models in real life is whether they keep their performance when the input images undergo transformations such as rotation. So, we looked into this question and studied one of these transformation problems, "Rotation Invariance".

![](img/invar1-1.png)

# 2. Invariance vs Equivariance

Translation invariance means that the system produces exactly the same response, regardless of how its input is shifted. For example, a face detector might report "FACE FOUND" for all three images in the top row.
Equivariance means that the system works equally well across positions, but its response shifts with the position of the target. For example, a heat map of "face-iness" would have similar bumps at the left, center, and right when it processes the first row of images.

This is sometimes an important distinction, but many people call both phenomena "invariance", especially since it is usually trivial to convert an equivariant response into an invariant one: just disregard all the position information.
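The distinction can be illustrated with a toy one-dimensional "detector". The sketch below is not part of the project code; the helper names `correlate` and `shift_right` and the sample signal are assumptions made for illustration. The position of the response peak follows the target (equivariance), while the maximum response, which discards position, stays the same (invariance).

```python
def correlate(signal, template):
    """Slide `template` over `signal` and return the response at each offset."""
    n, k = len(signal), len(template)
    return [sum(signal[i + j] * template[j] for j in range(k))
            for i in range(n - k + 1)]


def shift_right(signal, steps):
    """Shift a signal to the right, padding with zeros on the left."""
    return [0] * steps + signal[:len(signal) - steps]


template = [1, 2, 1]                      # hypothetical target pattern
signal = [0, 1, 2, 1, 0, 0, 0, 0]         # target near the left edge
shifted = shift_right(signal, 3)          # same target, moved 3 steps right

response = correlate(signal, template)
response_shifted = correlate(shifted, template)

# Equivariant output: the location of the peak moves with the target.
peak = response.index(max(response))
peak_shifted = response_shifted.index(max(response_shifted))

# Invariant output: dropping the position (taking the max) gives the same value.
score = max(response)
score_shifted = max(response_shifted)

print(peak, peak_shifted, score, score_shifted)
```

Taking the maximum over positions is the simplest way to "disregard all the position information"; pooling layers in a CNN do this locally.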

# 3. Rotation Invariance

Based on section 2, we define rotation invariance and try to solve it.

The procedure is to use the MNIST data set with LeNet5, one of the well-known CNN models for image classification: we train the model on the original data and then test it on rotated images. Our purpose is to see whether the model can still classify these rotated test images with acceptable accuracy.

To approach this problem, we first give a brief description of how each layer of a CNN works and how the layers are connected to each other.

# 3.1 CNN Operation


Suppose that we are using the MNIST data set, which consists of grayscale images of handwritten digits of size 28*28*1 (the 1 is the number of color channels, which is 1 for grayscale).

Once we feed the data into a convolution layer, a kernel of size 5*5 (in the LeNet-5 model) slides over the image and maps each 5 by 5 block of pixels of the input image to 1 pixel in the output of the first conv layer.

The output of each conv layer is K images of size (Ns-Ks+1)*(Ns-Ks+1), where Ns*Ns is the size of the images fed to the layer, Ks is the size of the kernel used for the conv layer and K is the number of kernels. Each of the images at the output of the conv layer is called a feature map.
Pooling layers are then used for down sampling, that is, for reducing the computational cost. Note that in this code the kernels are applied with stride=1 and padding=2, while the pooling layers use stride=2, so both affect the size of the output feature maps.

A pooling layer of size 2*2 divides the size of the feature maps by 2. So, for the MNIST data set, whose 28*28 images are padded to 32*32, using 6 kernels of size 5*5 with stride=1 for the first conv layer gives:

$$6 \times (32-5+1) \times (32-5+1) = 6 \times 28 \times 28$$

Then, applying a pooling layer of size 2*2 gives

$$\text{size of the images after the pooling layer} = 6 \times 14 \times 14$$
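The size arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper name `conv_out` is an assumption, it uses the general formula floor((n + 2p - k) / s) + 1 (which reduces to Ns-Ks+1 for stride 1 and no padding), and the layer settings are taken from the `LeNet5` class in the code section (5*5 kernels with stride 1 and padding 2, 2*2 pooling with stride 2).

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial size after sliding a square window over an n*n input."""
    return (n + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride, padding), as assumed from the LeNet5 class
layers = [
    ("conv1", 5, 1, 2), ("pool1", 2, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 2, 2, 0),
    ("conv3", 5, 1, 2),
]

size = 28                 # MNIST images are 28*28
trace = [size]
for name, kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    trace.append(size)
    print(name, "->", size, "x", size)

# conv3 has 120 output channels, so the flattened feature vector has:
flat = 120 * size * size
print("flattened features:", flat)
```

With these settings the spatial size goes 28, 28, 14, 14, 7, 7, which agrees with the 6*28*28 and 6*14*14 sizes derived above.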

There are different kinds of pooling layers. Max pooling takes the maximum value of the pixels covered by the pooling block, while average pooling takes the average of all the pixels covered by the pooling block.
Max pooling extracts the most prominent features, such as edges, whereas average pooling extracts features more smoothly. Although both are used for the same reason, max pooling is better at extracting extreme features. Average pooling sometimes cannot extract good features because it takes every pixel into account and produces an average value, which may or may not be important for object-detection tasks.
It should be noted that the whole purpose of the convolution and pooling layers is feature extraction.
After 3 convolution and 2 max-pooling layers in LeNet5 (the feature extractor part of the CNN) we arrive at the dimension 120*7*7. Then we have some fully connected layers.
The purpose of the fully connected layers is classification. So, we flatten the feature maps into a vector (of size 120*7*7 = 5880 in this implementation), which the first fully connected layer maps to a vector of size 120.
Then we use fully connected layers of size 84 and 10 respectively to classify the images into 10 classes (digits from 0 to 9).
For the final layer we use a softmax layer, which takes the output of the last fully connected layer (a vector of size 1*10) and produces 10 outputs corresponding to the 10 classes (digits from 0 to 9).
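To make the difference between the two pooling variants described above concrete, here is a small plain-Python sketch. The helper names `pool2x2` and `mean` and the sample feature map are assumptions for illustration only, and the helper assumes even height and width.

```python
def pool2x2(fmap, reduce):
    """Apply `reduce` to every non-overlapping 2*2 block of a 2-D list."""
    rows, cols = len(fmap), len(fmap[0])
    return [[reduce([fmap[r][c], fmap[r][c + 1],
                     fmap[r + 1][c], fmap[r + 1][c + 1]])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4*4 feature map with a few strong responses
fmap = [[0, 0, 1, 9],
        [0, 8, 1, 1],
        [2, 2, 0, 0],
        [2, 2, 0, 4]]

max_pooled = pool2x2(fmap, max)    # keeps the strongest response per block
avg_pooled = pool2x2(fmap, mean)   # smooths each block into its average

print(max_pooled)
print(avg_pooled)
```

In this example the isolated strong responses (8, 9 and 4) survive max pooling but are diluted by average pooling, which is the behaviour the paragraph above describes.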

The cost function used in the LeNet-5 model is cross-entropy, computed on the class probabilities that the softmax layer outputs.
Figure 2 below shows the architecture of the LeNet-5 model.
Furthermore, the detailed layer-by-layer architecture of the LeNet-5 model is shown in figure 3.
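In the code section the network returns log-probabilities (`F.log_softmax`) and the loss is the negative log likelihood (`F.nll_loss`). The plain-Python sketch below, with hypothetical helper names and made-up logits, illustrates why that combination is the same as cross-entropy on the softmax probabilities.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll(log_probs, target):
    """Negative log likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross-entropy computed directly from the softmax probabilities."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))


logits = [2.0, 0.5, -1.0]   # made-up scores for three classes
target = 0

log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

loss_a = nll(log_probs, target)
loss_b = cross_entropy(logits, target)
print(probs, loss_a, loss_b)
```

The predicted class is the index of the largest log-probability, which is how the test functions in the code section count correct predictions.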

![](img/fig2.png)

# 4. Visualization of Layers

To see whether the CNN model is rotation invariant or not, we should be able to visualize the output of each layer and see the changes after each conv layer.

Figure 4 visualizes the kernels of the first convolution layer, which are the weights obtained by training the model.

Moreover, the corresponding visualizations for the 2nd and 3rd convolution layers are attached in the appendix as figures 7 and 8.

To get a better understanding of what visualization means, we choose one of the images from the data set and feed it to the model. Then we look at the output of each convolution layer to see how the model arrives at its prediction for that digit.

As an example, we feed in an image of the handwritten digit 4. As we go through the model, we can follow the process by which the model identifies the digit written in the picture.

Figure 5 shows the images obtained from the output of the first convolution layer, which we call feature maps.

![](img/fig4.png)

![](img/fig5.png)

As we can see, the architecture we selected has 6 feature maps at the output of the first convolution layer, so figure 5 shows the output of the activation function applied to each of these feature maps.

The same visualization for the other convolution layers is shown in figures 9 and 10 in the appendix.


# 5. Results

As mentioned before, the goal of this project is to investigate whether a CNN model is rotation invariant or not. For this, we use visualization and inspect the output of each layer.

Thus, we define a function that takes images and rotates them by specific angles. In the code, this function is called `rotate_tensor`. Using this function, we rotate an image (one image of label 4) by 90, 180 and 270 degrees, as shown in figure 6.

![](img/fig6.png)
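Rotations by multiples of 90 degrees need no interpolation: they can be built from a transpose and a flip, which is what `rotate_tensor` does on tensors in the code section. The sketch below shows the same idea on a plain nested list; the helper names `rot90_cw` and `rotate` are assumptions for illustration, not the project's functions.

```python
def rot90_cw(img):
    """Rotate 90 degrees clockwise: transpose, then reverse each row."""
    return [list(row)[::-1] for row in zip(*img)]


def rotate(img, quarter_turns):
    """Apply `quarter_turns` successive 90-degree clockwise rotations."""
    for _ in range(quarter_turns):
        img = rot90_cw(img)
    return img


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

r90 = rotate(img, 1)
r180 = rotate(img, 2)
r270 = rotate(img, 3)
r360 = rotate(img, 4)   # four quarter turns should give back the input

for name, r in [("90", r90), ("180", r180), ("270", r270), ("360", r360)]:
    print(name, r)
```

The last check mirrors the sanity test used in this report: a 360 degree rotation must reproduce the original image, so it should also reproduce the original test accuracy.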

To investigate the rotation invariance of LeNet5, we use the model pre-trained on the original MNIST data set and then test it on our rotated data sets to measure its accuracy.
Table 1 below shows the results of training and testing LeNet-5 with the original MNIST data set.

![](img/tab1.png)

Furthermore, in table 2 we can see the results obtained from testing the pre-trained LeNet5 model with images rotated by 90, 180, 270 and 360 degrees. 

![](img/tab2.png)

Note that we also tested with images rotated by 360 degrees to check that they give the same results as the original MNIST data set. As tables 1 and 2 show, the test accuracy for these 2 data sets is the same, which indicates that our function (`rotate_tensor`) works properly.

# 6. Conclusion

To put this problem into perspective, we can say that CNN models are not rotation invariant: as shown in the previous section, a model trained on a certain data set does not achieve good accuracy on rotated images from the same data set.


# Appendix

![](img/fig7.png)

![](img/fig89.png)

# Codes

Now, let's go through the code to see the whole process.

# Import Libraries

In [None]:
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from torchsummary import summary
import numpy as np
import matplotlib.pyplot as plt
from torchvision.utils import make_grid
import math

# Define Hyperparameters

In [None]:
args={}
kwargs={}
args['batch_size']=1000
args['test_batch_size']=1000
args['epochs']=20  # The number of Epochs is the number of times you go 
                   # through the full dataset. 
args['lr']=0.01 # Learning rate: the step size of each descent update.
args['momentum']=0.5 # SGD momentum (default: 0.5) Momentum is a moving 
                     # average of our gradients (helps to keep direction).

args['seed']=1 # random seed
args['log_interval']=40
args['cuda']=True # False if you don't have a CUDA w/ NVIDIA GPU available.
args['train_now']=False
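As a rough illustration of what the `lr` and `momentum` settings do, the sketch below applies a momentum update to a one-dimensional toy problem, f(w) = w^2. It follows the update form used by PyTorch's SGD with zero dampening (velocity = momentum * velocity + grad, then w = w - lr * velocity); the function name `sgd_momentum_step` and the toy objective are assumptions made for illustration.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.01, momentum=0.5):
    """One SGD-with-momentum update on a single scalar parameter."""
    velocity = momentum * velocity + grad   # running accumulation of gradients
    w = w - lr * velocity                   # step against the accumulated direction
    return w, velocity


w, v = 1.0, 0.0          # start away from the minimum at w = 0
history = [w]
for _ in range(50):
    grad = 2.0 * w       # derivative of f(w) = w**2
    w, v = sgd_momentum_step(w, v, grad)
    history.append(w)

print(history[0], history[1], history[-1])
```

With these values the parameter moves steadily toward the minimum; a larger `momentum` would let past gradients influence each step more strongly.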

# Define Custom Rotation Function


In [None]:
class CustomRotation(object):
    """Rotate image by a fixed angle; ready for use in transforms.Compose()
    """

    def __init__(self, degrees, resample=False, expand=False, center=None):
        self.degrees = degrees
        self.resample = resample
        self.expand = expand
        self.center = center

    def __call__(self, img):
        
        return transforms.ToTensor()(
            transforms.functional.rotate(
                transforms.ToPILImage()(img), 
                self.degrees, self.resample, self.expand, self.center))

In [None]:
rotation = 0 # Specifies the rotation of images.

# Define the train and test loader
# Here we are adding our CustomRotation function to the transformations
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['test_batch_size'], shuffle=False, **kwargs)

In [None]:
class LeNet5(nn.Module):          
     
    def __init__(self):     
        super(LeNet5, self).__init__()
        # Convolution (In LeNet-5, 32x32 images are given 
        # as input. Hence padding of 2 is done below)
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, 
                                     kernel_size=5, stride=1, padding=2)
        self.max_pool_1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, 
                                     kernel_size=5, stride=1, padding=2)
        self.max_pool_2 = nn.MaxPool2d(kernel_size=2, stride=2) 
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=120, 
                                     kernel_size=5, stride=1, padding=2)
        # Flattened conv3 output: 120 feature maps of size 7x7
        # (= 5880 features), mapped to 120 features.
        self.fc1 = nn.Linear(7*7*120, 120)
        # 120 features -> 84 features
        self.fc2 = nn.Linear(120, 84)
        # 84 features -> 10 class scores
        self.fc3 = nn.Linear(84, 10)

            
    def forward(self, x):
        # convolve, then perform ReLU non-linearity
        x = F.relu(self.conv1(x))  
        # max-pooling with 2x2 grid 
        x = self.max_pool_1(x) 
        # Conv2 + ReLU
        x = F.relu(self.conv2(x))
        # max-pooling with 2x2 grid
        x = self.max_pool_2(x)
        # Conv3 + ReLU
        x = F.relu(self.conv3(x))
        x = x.view(-1, 7*7*120)
        # FC-1, then perform ReLU non-linearity
        x = F.relu(self.fc1(x))
        # FC-2, then perform ReLU non-linearity
        x = F.relu(self.fc2(x))
        # FC-3
        x = self.fc3(x)
        
        return F.log_softmax(x, dim=1)

In [None]:
model = LeNet5()
if args['cuda']:
    model.cuda()

In [None]:
summary(model, (1, 28, 28))

In [None]:
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        # Tensors track gradients directly in current PyTorch; the
        # Variable wrapper is kept only for backward compatibility.
        data, target = Variable(data), Variable(target)
        #This will zero out the gradients for this batch. 
        optimizer.zero_grad()
        output = model(data)
        # Calculate the loss The negative log likelihood loss. 
        # It is useful to train a classification problem with C classes.
        loss = F.nll_loss(output, target)
        #dloss/dx for every Variable 
        loss.backward()
        #to do a one-step update on our parameter.
        optimizer.step()
        #Print out the loss periodically. 
        if batch_idx % args['log_interval'] == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        # Evaluate without tracking gradients; the forward pass itself
        # has to sit inside the `torch.no_grad()` block.
        with torch.no_grad():
            output = model(data)
        # sum up batch loss # size_average and reduce args will 
        # be deprecated, please use reduction='sum' instead.
        test_loss += F.nll_loss(output, target, reduction='sum').data 
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1] 
        correct += pred.eq(target.data.view_as(pred)).long().cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))



In [None]:
optimizer = optim.SGD(model.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

# Training loop. 
# Change `args['log_interval']` if you want to change logging behavior.
# We test the network in each epoch.
# Setting the bool `args['train_now']` to not run training all the time.
# We'll save the weights and use the saved weights instead of 
# training the network everytime we load the jupyter notebook.
args['train_now'] = True
args['cuda'] = True

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train(epoch)
        test()
    torch.save(model.state_dict(), './model_normal_mnist.pytrh')
else:
    model = LeNet5()
    device = torch.device("cuda" if args['cuda'] else "cpu")
    # map_location lets weights saved from a GPU run load on CPU as well.
    model.load_state_dict(
        torch.load('./model_normal_mnist.pytrh', map_location=device))
    model.to(device)
    model.eval()


# Visualization

In [None]:
def custom_viz(kernels, path=None, cols=None):
    """Visualize weight and activation matrices learned 
    during the optimization process. Works for any size of kernels.
    
    Arguments
    =========
    kernels: Weight or activation matrix. Must be a high dimensional
    Numpy array. Tensors will not work.
    path: Path to save the visualizations.
    cols: TODO: Number of columns (doesn't work completely yet.)
    
    Example
    =======
    kernels = model.conv1.weight.cpu().detach().clone()
    kernels = kernels - kernels.min()
    kernels = kernels / kernels.max()
    custom_viz(kernels, 'results/conv1_weights.png', 5)
    """
    def set_size(w,h, ax=None):
        """ w, h: width, height in inches """
        if not ax: ax=plt.gca()
        l = ax.figure.subplotpars.left
        r = ax.figure.subplotpars.right
        t = ax.figure.subplotpars.top
        b = ax.figure.subplotpars.bottom
        figw = float(w)/(r-l)
        figh = float(h)/(t-b)
        ax.figure.set_size_inches(figw, figh)
    
    N = kernels.shape[0]
    C = kernels.shape[1]

    Tot = N*C

    # If the kernels have more than one channel, plot one column per
    # channel. Otherwise use `cols` columns, or a single row by default.
    if C > 1:
        columns = C
    elif cols is None:
        columns = N
    else:
        columns = cols
    # Round up so that a partially filled last row is still allocated.
    rows = math.ceil(Tot / columns)

    pos = range(1,Tot + 1)
    fig = plt.figure(1)
    fig.tight_layout()
    k=0
    for i in range(kernels.shape[0]):
        for j in range(kernels.shape[1]):
            img = kernels[i][j]
            ax = fig.add_subplot(rows,columns,pos[k])
            ax.imshow(img, cmap='gray')
            plt.axis('off')
            k = k+1

    set_size(30,30,ax)
    if path:
        plt.savefig(path, dpi=100)
    
    plt.show()


In [None]:
kernels = model.conv1.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv1_weights.png', 4)
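Before plotting, the kernels are rescaled to the range [0, 1] by subtracting the minimum and dividing by the new maximum. The plain-Python sketch below shows the same min-max normalization on a short list; the function name `min_max_normalize`, the sample weights, and the guard for constant inputs are assumptions added for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


weights = [-0.5, 0.0, 0.25, 1.5]   # made-up kernel weights
scaled = min_max_normalize(weights)
print(scaled)
```

This only changes the display range: the relative ordering of the weights, and therefore the pattern visible in the plotted kernel, is preserved.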

In [None]:
kernels = model.conv2.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv2_weights.png', cols=5)
kernels.shape

In [None]:
kernels = model.conv3.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv3_weights.png')


In [None]:
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)


# Rotation Function

In [None]:
def rotate_tensor(_in_tensor, plot=True):
    in_tensor = _in_tensor.clone()
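    # Add a batch dimension at the beginning. Tensor shape = 1,1,28,28
    in_tensor.unsqueeze_(0)
    # Convert to Pytorch variable
    in_tensor = Variable(in_tensor, requires_grad=True)
    
    # Transpose + horizontal flip turns the image 90 degrees clockwise;
    # transpose + vertical flip turns it 90 degrees counter-clockwise
    # (270 degrees clockwise). Note that torchvision's rotate() treats
    # positive angles as counter-clockwise.
    in_tensor_90 = in_tensor.transpose(2, 3).flip(3)
    in_tensor_180 = in_tensor.flip(2).flip(3)
    in_tensor_270 = in_tensor.transpose(2, 3).flip(2)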
    
    if plot:
        plt.figure(1)
        plt.subplot(221)
        plt.gca().set_title('0 degree')
        plt.imshow(in_tensor[0][0].cpu().detach().clone(), cmap='gray')
        plt.subplot(222)
        plt.gca().set_title('+90 degree')
        plt.imshow(in_tensor_90[0][0].cpu().detach().clone(), cmap='gray')
        plt.subplot(223)
        plt.gca().set_title('+270 degree')
        plt.imshow(in_tensor_270[0][0].cpu().detach().clone(), cmap='gray')
        plt.subplot(224)
        plt.gca().set_title('+180 degree')
        plt.imshow(in_tensor_180[0][0].cpu().detach().clone(), cmap='gray')
        plt.tight_layout()
        plt.show()
    return(in_tensor, in_tensor_90, in_tensor_180, in_tensor_270)

In [None]:
number, number_90, number_180, number_270 = rotate_tensor(example_data[4])


In [None]:
print("Predicted Class: ", 
      np.argmax(model.forward(number.cuda()).cpu().detach().numpy()))


In [None]:
# Mirror the model's forward pass: each conv layer is followed by ReLU.
conv1_out = F.relu(model.conv1(number.cuda()))

custom_viz(conv1_out.cpu().detach().clone(), './results/conv1_actv.png')
conv1_out.shape

In [None]:
# Pool the previous activations before conv2, as in LeNet5.forward().
conv2_out = F.relu(model.conv2(model.max_pool_1(F.relu(conv1_out))))
custom_viz(conv2_out.cpu().detach().clone(), './results/conv2_actv.png')
conv2_out.shape

In [None]:
conv3_out = F.relu(model.conv3(model.max_pool_2(F.relu(conv2_out))))
custom_viz(conv3_out.cpu().detach().clone(), './results/conv3_actv.png')
conv3_out.shape

In [None]:
# Specify the rotation
rotation = 90

# Load the data
train_loader_90 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(), 
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)


test_loader_90 = torch.utils.data.DataLoader(
datasets.MNIST('data/', train=False, transform=transforms.Compose([
    transforms.ToTensor(),
    CustomRotation(rotation),
    transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args['test_batch_size'], shuffle=False, **kwargs)

# Get some example data from test loader
examples_90 = enumerate(test_loader_90)
batch_idx, (example_data_90, example_targets_90) = next(examples_90)

In [None]:
# Specify and account for GPU usage
model_90 = LeNet5()
if args['cuda']:
    model_90.cuda()

# Training with 90 degree rotation MNIST

In [None]:
# Define train and test functions as before. 
# TODO: Consider adding model as an argument.

def train_90(epoch):
    model_90.train()
    for batch_idx, (data, target) in enumerate(train_loader_90):
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        # Tensors track gradients directly in current PyTorch; the
        # Variable wrapper is kept only for backward compatibility.
        data, target = Variable(data), Variable(target)
        #This will zero out the gradients for this batch. 
        optimizer.zero_grad()
        output = model_90(data)
        # Calculate the loss The negative log likelihood loss. 
        # It is useful to train a classification problem with C classes.
        loss = F.nll_loss(output, target)
        #dloss/dx for every Variable 
        loss.backward()
        #to do a one-step update on our parameter.
        optimizer.step()
        #Print out the loss periodically. 
        if batch_idx % args['log_interval'] == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader_90.dataset),
                100. * batch_idx / len(train_loader_90), loss.data))

def test_90():
    model_90.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader_90:
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        # Evaluate without tracking gradients; the forward pass itself
        # has to sit inside the `torch.no_grad()` block.
        with torch.no_grad():
            output = model_90(data)
        # sum up batch loss # size_average and reduce args will be 
        # deprecated, please use reduction='sum' instead.
        test_loss += F.nll_loss(output, target, reduction='sum').data 
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1] 
        correct += pred.eq(target.data.view_as(pred)).long().cpu().sum()

    test_loss /= len(test_loader_90.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader_90.dataset),
        100. * correct / len(test_loader_90.dataset)))
 

In [None]:
# Define optimizer and train the model.
# If the model is already trained, try to load the model.
# Will give an error if trained model doesn't exist.

optimizer = optim.SGD(model_90.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train_90(epoch)
        test_90()
    torch.save(model_90.state_dict(), './model_90_mnist.pytrh')
else:
    model_90 = LeNet5()
    device = torch.device("cuda" if args['cuda'] else "cpu")
    # map_location lets weights saved from a GPU run load on CPU as well.
    model_90.load_state_dict(
        torch.load('./model_90_mnist.pytrh', map_location=device))
    model_90.to(device)
    model_90.eval()

kernels = model_90.conv1.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv1_weights_90.png', 4)

In [None]:
kernels = model_90.conv2.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv2_weights_90.png')

In [None]:
kernels = model_90.conv3.weight.cpu().detach().clone()
kernels = kernels - kernels.min()
kernels = kernels / kernels.max()
custom_viz(kernels, './results/conv3_weights_90.png')


# LeNet5 defines only conv1, conv2 and conv3, so there is no conv4
# layer whose weights could be visualized here.


In [None]:
print("Predicted Class: ", 
      np.argmax(model_90.forward(number_90.cuda()).cpu().detach().numpy()))

In [None]:
# Mirror the model's forward pass: each conv layer is followed by ReLU.
conv1_out_90 = F.relu(model_90.conv1(number_90.cuda()))
custom_viz(conv1_out_90.cpu().detach().clone(), 'results/conv1_actv_90.png')

In [None]:

# Pool the previous activations before conv2, as in LeNet5.forward().
conv2_out_90 = F.relu(
    model_90.conv2(model_90.max_pool_1(F.relu(conv1_out_90))))
custom_viz(conv2_out_90.cpu().detach().clone(), 'results/conv2_actv_90.png')

In [None]:
conv3_out_90 = F.relu(
    model_90.conv3(model_90.max_pool_2(F.relu(conv2_out_90))))
custom_viz(conv3_out_90.cpu().detach().clone(), 'results/conv3_actv_90.png')

# LeNet5 has no conv4 layer, so conv3 is the last activation to visualize.

# Test Original LeNet5 model with 90 degree rotated test dataset from MNIST  

In [None]:

def test_90():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader_90:
        if args['cuda']:
            data, target = data.cuda(), target.cuda()
        # Evaluate without tracking gradients; the forward pass itself
        # has to sit inside the `torch.no_grad()` block.
        with torch.no_grad():
            output = model(data)
        # sum up batch loss # size_average and reduce args will be 
        # deprecated, please use reduction='sum' instead.
        test_loss += F.nll_loss(output, target, reduction='sum').data 
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1] 
        correct += pred.eq(target.data.view_as(pred)).long().cpu().sum()

    test_loss /= len(test_loader_90.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader_90.dataset),
        100. * correct / len(test_loader_90.dataset)))

In [None]:
optimizer = optim.SGD(model.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

# Training loop. 
# Change `args['log_interval']` if you want to change logging behavior.
# We test the network in each epoch.
# Setting the bool `args['train_now']` to not run training all the time.
# We'll save the weights and use the saved weights instead of 
# training the network everytime we load the jupyter notebook.
args['train_now'] = False
args['cuda'] = True

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train(epoch)
        test()
    torch.save(model.state_dict(), './model_normal_mnist.pytrh')
else:
    model = LeNet5()
    device = torch.device("cuda" if args['cuda'] else "cpu")
    # map_location lets weights saved from a GPU run load on CPU as well.
    model.load_state_dict(
        torch.load('./model_normal_mnist.pytrh', map_location=device))
    model.to(device)
    model.eval()

In [None]:
test_90()

# Test Original LeNet5 model with 180 degree rotated test dataset from MNIST  

In [None]:
# Specify the rotation
rotation = 180

# Load the data
train_loader_180 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(), 
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)


test_loader_180 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['test_batch_size'], shuffle=False, **kwargs)

# Get some example data from test loader
examples_180 = enumerate(test_loader_180)
batch_idx, (example_data_180, example_targets_180) = next(examples_180)

In [None]:
def test_180():
    """Evaluate the loaded model on the 180-degree rotated MNIST test set."""
    model.eval()
    test_loss = 0.0
    correct = 0
    # Gradients are not needed for evaluation, so the whole loop
    # (including the forward pass) runs under torch.no_grad().
    with torch.no_grad():
        for data, target in test_loader_180:
            if args['cuda']:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # Sum the per-sample losses; the mean is taken after the loop.
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader_180.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader_180.dataset),
        100. * correct / len(test_loader_180.dataset)))

In [None]:
optimizer = optim.SGD(model.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

# Training loop.
# Change `args['log_interval']` to adjust how often progress is logged.
# The network is tested after every epoch.
# Set `args['train_now']` to True to retrain. When it is False, the weights
# saved by an earlier run are loaded instead, so the network is not
# retrained every time the Jupyter notebook is opened.
args['train_now'] = False
# Use the GPU only when one is available, so the cell also runs on CPU.
args['cuda'] = torch.cuda.is_available()

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train(epoch)
        test()
    torch.save(model.state_dict(), './model_normal_mnist.pytrh')
else:
    device = torch.device("cuda" if args['cuda'] else "cpu")
    model = LeNet5()
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    model.load_state_dict(
        torch.load('./model_normal_mnist.pytrh', map_location=device))
    model.to(device)
    model.eval()

In [None]:
test_180()

# Test Original LeNet5 model with 270 degree rotated test dataset from MNIST  

In [None]:
# Specify the rotation
rotation = 270

# Load the data
train_loader_270 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(), 
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)


test_loader_270 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['test_batch_size'], shuffle=False, **kwargs)

# Get some example data from test loader
examples_270 = enumerate(test_loader_270)
batch_idx, (example_data_270, example_targets_270) = next(examples_270)

In [None]:
def test_270():
    """Evaluate the loaded model on the 270-degree rotated MNIST test set."""
    model.eval()
    test_loss = 0.0
    correct = 0
    # Gradients are not needed for evaluation, so the whole loop
    # (including the forward pass) runs under torch.no_grad().
    with torch.no_grad():
        for data, target in test_loader_270:
            if args['cuda']:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # Sum the per-sample losses; the mean is taken after the loop.
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader_270.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader_270.dataset),
        100. * correct / len(test_loader_270.dataset)))

In [None]:
optimizer = optim.SGD(model.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

# Training loop.
# Change `args['log_interval']` to adjust how often progress is logged.
# The network is tested after every epoch.
# Set `args['train_now']` to True to retrain. When it is False, the weights
# saved by an earlier run are loaded instead, so the network is not
# retrained every time the Jupyter notebook is opened.
args['train_now'] = False
# Use the GPU only when one is available, so the cell also runs on CPU.
args['cuda'] = torch.cuda.is_available()

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train(epoch)
        test()
    torch.save(model.state_dict(), './model_normal_mnist.pytrh')
else:
    device = torch.device("cuda" if args['cuda'] else "cpu")
    model = LeNet5()
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    model.load_state_dict(
        torch.load('./model_normal_mnist.pytrh', map_location=device))
    model.to(device)
    model.eval()

In [None]:
test_270()

# Test Original LeNet5 model with 360 degree rotated test dataset from MNIST  
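
A 360-degree rotation is geometrically the identity, so this case is mainly a sanity check against the unrotated test set. The sketch below illustrates that point with `torch.rot90`, which rotates in exact quarter turns without interpolation. The helper `rotate_quarter_turns` and the synthetic `image` tensor are hypothetical and are not part of the notebook. `CustomRotation` is defined earlier and may behave differently, for example if it interpolates pixel values.

```python
import torch

# Hypothetical stand-in for one MNIST digit with shape (channel, height, width).
# Every pixel gets a distinct value so that any real rotation changes the tensor.
image = torch.arange(28 * 28, dtype=torch.float32).reshape(1, 28, 28)


def rotate_quarter_turns(img, degrees):
    """Rotate a (C, H, W) tensor by a multiple of 90 degrees, without interpolation."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    # Rotate in the plane spanned by the height and width dimensions.
    return torch.rot90(img, k=degrees // 90, dims=(1, 2))


rot_180 = rotate_quarter_turns(image, 180)
rot_360 = rotate_quarter_turns(image, 360)

# Four quarter turns bring every pixel back to its original position.
same_after_360 = torch.equal(rot_360, image)
# A half turn moves the pixels, but applying it twice undoes it.
same_after_180 = torch.equal(rot_180, image)
two_half_turns = torch.equal(rotate_quarter_turns(rot_180, 180), image)

print(same_after_360, same_after_180, two_half_turns)
```

If the rotation used in the notebook is exact in this sense, the accuracy reported for the 360-degree loader should match the accuracy on the original test set.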

In [None]:
# Specify the rotation
rotation = 360

# Load the data
train_loader_360 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(), 
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['batch_size'], shuffle=True, **kwargs)


test_loader_360 = torch.utils.data.DataLoader(
    datasets.MNIST('data/', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       CustomRotation(rotation),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args['test_batch_size'], shuffle=False, **kwargs)

# Get some example data from test loader
examples_360 = enumerate(test_loader_360)
batch_idx, (example_data_360, example_targets_360) = next(examples_360)

In [None]:
def test_360():
    """Evaluate the loaded model on the 360-degree rotated MNIST test set."""
    model.eval()
    test_loss = 0.0
    correct = 0
    # Gradients are not needed for evaluation, so the whole loop
    # (including the forward pass) runs under torch.no_grad().
    with torch.no_grad():
        for data, target in test_loader_360:
            if args['cuda']:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # Sum the per-sample losses; the mean is taken after the loop.
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader_360.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader_360.dataset),
        100. * correct / len(test_loader_360.dataset)))

In [None]:
optimizer = optim.SGD(model.parameters(), 
                      lr=args['lr'], momentum=args['momentum'])

# Training loop.
# Change `args['log_interval']` to adjust how often progress is logged.
# The network is tested after every epoch.
# Set `args['train_now']` to True to retrain. When it is False, the weights
# saved by an earlier run are loaded instead, so the network is not
# retrained every time the Jupyter notebook is opened.
args['train_now'] = False
# Use the GPU only when one is available, so the cell also runs on CPU.
args['cuda'] = torch.cuda.is_available()

if args['train_now']:
    for epoch in range(1, args['epochs'] + 1):
        train(epoch)
        test()
    torch.save(model.state_dict(), './model_normal_mnist.pytrh')
else:
    device = torch.device("cuda" if args['cuda'] else "cpu")
    model = LeNet5()
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    model.load_state_dict(
        torch.load('./model_normal_mnist.pytrh', map_location=device))
    model.to(device)
    model.eval()

In [None]:
test_360()
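
The per-angle test functions in this section differ only in the data loader they iterate over, so they could be collapsed into a single helper that takes the loader as an argument. The following is a minimal sketch of that idea, not the notebook's own code. The names `evaluate`, `toy_model`, `toy_images`, `toy_labels`, and `toy_loaders` are hypothetical, and random tensors with a small linear model stand in for MNIST and LeNet5 so that the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return (average NLL loss, number correct, dataset size) for one loader."""
    model.eval()
    total_loss, correct = 0.0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)  # assumed to be log-probabilities
            total_loss += F.nll_loss(output, target, reduction='sum').item()
            correct += (output.argmax(dim=1) == target).sum().item()
    n = len(loader.dataset)
    return total_loss / n, correct, n


# Hypothetical stand-ins so the sketch runs without MNIST or LeNet5.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1)).to(device)
toy_images = torch.randn(64, 1, 28, 28)
toy_labels = torch.randint(0, 10, (64,))

# One loader per rotation angle. Here the rotation is done with torch.rot90
# over the height and width dimensions instead of the notebook's transform.
toy_loaders = {
    angle: DataLoader(
        TensorDataset(torch.rot90(toy_images, k=angle // 90, dims=(2, 3)),
                      toy_labels),
        batch_size=16)
    for angle in (90, 180, 270, 360)
}

results = {angle: evaluate(toy_model, loader, device)
           for angle, loader in toy_loaders.items()}
for angle, (loss, correct, n) in results.items():
    print('Rotation {:3d}: average loss {:.4f}, accuracy {}/{} ({:.0f}%)'.format(
        angle, loss, correct, n, 100. * correct / n))
```

In the notebook itself, a call such as `evaluate(model, test_loader_180, device)` would play the role of `test_180()`, assuming `model` outputs log-probabilities as the use of `F.nll_loss` suggests.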