## Deep Learning Course

### Homework 2

### Task 1:
    1.1. Implement the forward pass of SimpleConvNet's layers (conv2d_scalar, pool2d_scalar, relu_scalar, fc_layer_scalar) in scalar form.
    1.2. Ensure that the intermediate outputs of all layers are exactly the same as in the PyTorch framework. You may use the 'diff_mse' function for this purpose.
    
### Task 2:  
    Implement the forward pass of SimpleConvNet's layers in vector form. Please use the equations you derived in Homework I.
    (a) For the convolutional layer conv2d_vector described in section 2.1, in matrix form. The column length in the reshaped input matrix is Cin∗K∗K.
    (b) For the pooling layer pool2d_vector.
    (c) For the fully-connected layer fc_layer_vector described in section 2.4.
    
### Task 3:
    Train your network for 20 epochs and report the accuracy achieved on the MNIST test data. Measure and report the time per epoch for the scalar and vector variants.
    
### Comments: This notebook contains tasks 1 and 2 and part of task 3 (time measured on dummy data). For the rest of task 3, please visit https://colab.research.google.com/drive/1aI3HH3Gj35HHsopXja6bP36Y1heE50Rm


 #### Anastasiia Kasprova
 
    Link to GitHub: https://github.com/kasprova/DL_UCU/tree/master/tasks/hw2
    Link to Google Colab: https://colab.research.google.com/drive/1TMqoh8WRat9IsbrVJlDdMCfiuG6bxa90


In [0]:
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import numpy as np
import time

In [0]:
def diff_mse(x, y):
    x_vec = x.view(1, -1).squeeze()
    y_vec = y.view(1, -1).squeeze()
    return torch.mean(torch.pow((x_vec - y_vec), 2)).item()

## Simple Convolutional Network

In [0]:
class SimpleConvNet(nn.Module):
    def __init__(self, device):
        super(SimpleConvNet, self).__init__()
        self.device = device
        self.conv_layer = nn.Conv2d(in_channels=1,
                                    out_channels=20,
                                    kernel_size=5,
                                    stride=1,
                                    padding=0,
                                    dilation=1,
                                    groups=1,
                                    bias=True)

        self.fc_layer1 = nn.Linear(in_features=20 * 12 * 12, out_features=500)
        self.fc_layer2 = nn.Linear(in_features=500, out_features=10)
        self.to(device)

#### Reproducibility

In [4]:
torch.manual_seed(1)
np.random.seed(1)
no_cuda = False
use_cuda = not no_cuda and torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
device


device(type='cuda')

In [0]:
# instantiate the model
model = SimpleConvNet(device)

In [6]:
model.eval()

SimpleConvNet(
  (conv_layer): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (fc_layer1): Linear(in_features=2880, out_features=500, bias=True)
  (fc_layer2): Linear(in_features=500, out_features=10, bias=True)
)

#### Parameters initialization

In [0]:
#conv_layer
data = torch.rand([1,1,28,28]).requires_grad_(True) #batch_size=64 - crashed on fc1 layer

----------

## 1. Convolutional Layer

#### 1.1 Scalar form of equation of convolution:
\begin{equation}
z_{n,c_{out},m,l}^{(conv)} =
  \sum_{c_{in}=1}^{C_{in}}   
  \sum_{i=1}^{K}  
  \sum_{j=1}^{K} 
  x_{n,c_{in},m+i-1,l+j-1}
  w_{c_{out},c_{in},i,j}^{(conv)}
  +
  b_{c_{out}}^{(conv)}
\end{equation}

In [0]:
def conv2d_scalar(x_in, conv_weight, conv_bias, device):
    
    # read dimensions of the input tensor and the weights
    batch_size, n_channels_in, height_in, width_in = x_in.shape
    n_channels_out, n_channels_in, kernel_size, kernel_size = conv_weight.shape
    
    # calculate the dimensions of the output tensor
    height_out = height_in - kernel_size + 1
    width_out = width_in - kernel_size + 1
    
    # move to device
    x_in = x_in.to(device)
    conv_weight = conv_weight.to(device)
    conv_bias = conv_bias.to(device)
    
    # initialize an output tensor of the correct size
    z = torch.empty([batch_size,n_channels_out,height_out,width_out]).to(device)
    
    # fill z based on the scalar representation
    for n in range(batch_size):
        for c_out in range(n_channels_out):
            for m in range(height_out):
                for l in range(width_out):
                    # sum over all input channels and the K x K window, then add the bias;
                    # slicing all channels at once avoids overwriting z once per input channel
                    z[n,c_out,m,l] = (x_in[n,:,m:m+kernel_size,l:l+kernel_size]*conv_weight[c_out]).sum() + conv_bias[c_out]

    return z

In [0]:
z_conv_scalar = conv2d_scalar(data, model.conv_layer.weight, model.conv_layer.bias, device = device)

In [10]:
z_conv_scalar.shape

torch.Size([1, 20, 24, 24])
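The network here has a single input channel, so the check above only exercises the case $C_{in}=1$. As a rough sketch of how the scalar equation can be checked with several input channels, the block below spells out the triple sum term by term and compares it with `F.conv2d`. The name `conv2d_scalar_ref` and the tensor sizes are illustrative assumptions, not part of the homework code.

```python
import torch
import torch.nn.functional as F

def conv2d_scalar_ref(x, w, b):
    """Term-by-term translation of the scalar equation (stride 1, no padding)."""
    N, C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    H_out, W_out = H - K + 1, W - K + 1
    z = torch.zeros(N, C_out, H_out, W_out)
    for n in range(N):
        for c_out in range(C_out):
            for m in range(H_out):
                for l in range(W_out):
                    acc = b[c_out].item()              # start from the bias
                    for c_in in range(C_in):           # sum over input channels
                        for i in range(K):             # ... and over the K x K window
                            for j in range(K):
                                acc += x[n, c_in, m + i, l + j].item() * w[c_out, c_in, i, j].item()
                    z[n, c_out, m, l] = acc
    return z

torch.manual_seed(0)
x = torch.rand(1, 2, 6, 6)      # two input channels (illustrative size)
w = torch.rand(3, 2, 3, 3)
b = torch.rand(3)

z_ref = conv2d_scalar_ref(x, w, b)
z_torch = F.conv2d(x, w, b)
max_abs_diff = (z_ref - z_torch).abs().max().item()
print("max abs difference:", max_abs_diff)
```

Because the accumulation runs in Python floats, small rounding differences relative to `F.conv2d` are to be expected.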

#### 1.2 Vector form of convolutional layer 
Using the 'im2col' trick at the "moving window" level (NB: $\mathbf{W}$ and $\mathbf{X^{(conv)}}$ are reshaped as 'im2col' requires):
\begin{equation}
\mathbf{Z}^{(conv)} = \mathbf{W}\mathbf{X}^{(conv)}+\mathbf{B}^{(conv)}
\end{equation}

In [0]:
def im2col(X, kernel_size, device, stride = 1):
  
    # read dimensions of the 3-dimensional input tensor
    C_in, S_in, S_in = X.shape

    # calculate size_out
    S_out = (S_in - kernel_size)//stride + 1
    
    # move to device
    X = X.to(device)
    
    # initialize an output tensor of the correct size
    X_cols = torch.empty([S_out*S_out, kernel_size*kernel_size]).to(device)
    
    # NOTE: only channel 0 is read and the window start does not depend on `stride`,
    # so the columns are correct only for C_in = 1 and stride = 1
    for i in range(S_out):
        for j in range(S_out):
            X_cols[i*S_out+j] = X[0][i: i + kernel_size, j: j + kernel_size].contiguous().view(1, -1)
    
    return X_cols.t() # [K*K x S_out*S_out]

  
def conv_weight2rows(conv_weight):
    
    # read dimensions of the weight tensor
    C_out = conv_weight.shape[0]
    kernel_size = conv_weight.shape[2]
    
    # flatten each filter into a row (this view assumes C_in = 1)
    conv_weight_rows = conv_weight.view(C_out,kernel_size*kernel_size).contiguous()
    
    return conv_weight_rows # [C_out x K*K]
  

def conv2d_vector(x_in, conv_weight, conv_bias, device):
    # read dimensions of the input tensor and the weights
    batch_size, C_in, S_in, S_in = x_in.shape
    C_out, C_in, kernel_size, kernel_size = conv_weight.shape
    
    # calculate the dimensions of the output tensor
    S_out = S_in - kernel_size + 1
    
    # move to device
    x_in = x_in.to(device)
    conv_weight = conv_weight.to(device)
    conv_bias = conv_bias.to(device)
    
    # initialize an output tensor of the correct size
    z = torch.empty([batch_size,C_out,S_out,S_out]).to(device)
    
    # transformation of conv_weight
    conv_weight_rows = conv_weight2rows(conv_weight)
    
    for n in range(batch_size):
        # W X + b has shape [C_out x S_out*S_out]; reshape it to [C_out x S_out x S_out]
        z[n] = (conv_weight_rows.matmul(im2col(x_in[n], kernel_size, device, stride=1)) + conv_bias.view(-1,1)).view(C_out,S_out,S_out)
    
    return z

In [0]:
z_conv_vector = conv2d_vector(data, model.conv_layer.weight, model.conv_layer.bias, device = device)

In [13]:
z_conv_vector.shape

torch.Size([1, 20, 24, 24])
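Task 2(a) states that the column length in the reshaped input matrix is Cin∗K∗K. One way to obtain that layout for any number of input channels is `torch.nn.functional.unfold`, which extracts sliding windows channel by channel. The block below is a minimal sketch under that assumption. `conv2d_unfold` is a hypothetical name and the tensor sizes are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

def conv2d_unfold(x, w, b):
    """Convolution (stride 1, no padding) as one matrix product per sample."""
    N, C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    H_out, W_out = H - K + 1, W - K + 1
    x_cols = F.unfold(x, kernel_size=K)        # [N, C_in*K*K, H_out*W_out]
    w_rows = w.reshape(C_out, C_in * K * K)    # [C_out, C_in*K*K]
    z = w_rows @ x_cols + b.view(1, -1, 1)     # [N, C_out, H_out*W_out]
    return z.reshape(N, C_out, H_out, W_out)

torch.manual_seed(0)
x = torch.rand(2, 3, 8, 8)      # batch of 2, three input channels
w = torch.rand(4, 3, 3, 3)
b = torch.rand(4)

z_unfold = conv2d_unfold(x, w, b)
z_torch = F.conv2d(x, w, b)
print("column length:", F.unfold(x, kernel_size=3).shape[1])   # C_in*K*K
print("MSE vs F.conv2d:", torch.mean((z_unfold - z_torch) ** 2).item())
```

The whole batch is handled by a single broadcasted matrix product, so no Python loop over samples is needed in this variant.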

#### 1.3 PyTorch convolutional layer

In [0]:
z_conv_torch = model.conv_layer(data.to(device))

In [15]:
z_conv_torch.shape

torch.Size([1, 20, 24, 24])

#### 1.4 QA: convolutional layer

In [16]:
mse_conv_sv = diff_mse(z_conv_scalar,z_conv_vector)
print("MSE between conv2d_scalar and conv2d_vector: ", mse_conv_sv)

MSE between conv2d_scalar and conv2d_vector:  1.485787480800583e-15


In [17]:
mse_conv_vt = diff_mse(z_conv_vector, z_conv_torch)
print("MSE between conv2d_vector and z_conv_torch: ", mse_conv_vt)

MSE between conv2d_vector and z_conv_torch:  0.0


In [18]:
mse_conv_st = diff_mse(z_conv_scalar, z_conv_torch)
print("MSE between z_conv_scalar and z_conv_torch: ", mse_conv_st)

MSE between z_conv_scalar and z_conv_torch:  1.485787480800583e-15
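Task 1.2 asks for outputs that are "exactly the same", yet an MSE of the order of 1e-15 is what float32 rounding in a different summation order would typically produce. A tolerance-based check can make that distinction explicit. The helper below is a sketch: `report_match` and its default tolerances are assumptions for illustration, not part of the homework API.

```python
import torch

def report_match(x, y, rtol=1e-5, atol=1e-6):
    """Summarize how closely two tensors agree."""
    x, y = x.detach().flatten(), y.detach().flatten()
    diff = x - y
    return {
        "mse": torch.mean(diff ** 2).item(),
        "max_abs": diff.abs().max().item(),
        "allclose": torch.allclose(x, y, rtol=rtol, atol=atol),
    }

torch.manual_seed(0)
reference = torch.rand(1, 20, 24, 24)
rounding_noise = reference + 1e-7     # perturbation at float32 rounding scale
real_error = reference + 0.1          # a genuine discrepancy

print(report_match(reference, rounding_noise))
print(report_match(reference, real_error))
```

Reporting the maximum absolute difference alongside the MSE helps because a mean can hide a few large outliers.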


#### * To keep the MSE comparisons consistent, all three variants of the next layer take the output of the native PyTorch convolutional layer as input.

-----------

## 2. Max-pooling layer

#### 2.1 Scalar: max-pooling layer:

\begin{equation}
z_{n,c_{out},i,j}^{(pool)} =
\max(
  z_{n,c_{out},2i-1,2j-1}^{(conv)},
  z_{n,c_{out},2i-1,2j}^{(conv)},
  z_{n,c_{out},2i,2j-1}^{(conv)},
  z_{n,c_{out},2i,2j}^{(conv)})
\end{equation}

In [0]:
def pool2d_scalar(a, device, stride = 2):
    
    # read dimensions of the input tensor
    batch_size, n_channels_in, height_in, width_in = a.shape
    pooling_size = 2
    
    # calculate the dimensions of the output tensor
    height_out = (height_in-pooling_size)//stride + 1
    width_out = (width_in-pooling_size)//stride + 1
    n_channels_out = n_channels_in
    
    # move to device
    a = a.to(device)
    
    # initialize an output tensor of the correct size
    z = torch.empty([batch_size,n_channels_out,height_out,width_out]).to(device)
    
    # fill z based on the scalar representation
    for n in range(batch_size):
        for c_out in range(n_channels_out):
            for i in range(height_out):
                for j in range(width_out):
                    # maximum over the pooling window that starts at (stride*i, stride*j)
                    z[n,c_out,i,j] = a[n,c_out,stride*i:stride*i+pooling_size,stride*j:stride*j+pooling_size].max()
    
    return z

In [0]:
z_pool_scalar = pool2d_scalar(z_conv_torch, device)

In [21]:
z_pool_scalar.shape

torch.Size([1, 20, 12, 12])

#### 2.2 Vector: max-pooling layer

In [0]:
def pool2d_vector(a, device, stride = 2):
    
    # read dimensions of the input tensor
    batch_size, C_in, S_in, S_in = a.shape
    pooling_size = 2
    
    # calculate the dimensions of the output tensor
    S_out = (S_in - pooling_size)//stride + 1 
    C_out = C_in
    
    # move to device
    a = a.to(device)
    
    # initialize an output tensor of the correct size
    z = torch.empty([batch_size,C_out,S_out,S_out]).to(device)
    
    # NOTE: im2col reads only channel 0 and does not apply the stride when slicing,
    # so the single pooled map is broadcast to all channels and differs from F.max_pool2d (see the QA in 2.4)
    for n in range(batch_size):
        z[n] = im2col(a[n], pooling_size, device, stride=stride).max(dim=0).values.view(-1, S_out, S_out)
    return z 

In [0]:
z_pool_vector = pool2d_vector(z_conv_torch, device)

In [24]:
z_pool_vector.shape

torch.Size([1, 20, 12, 12])

#### 2.3 PyTorch: max-pooling layer

In [0]:
z_pool_torch = F.max_pool2d(z_conv_torch, 2, 2)

In [26]:
z_pool_torch.shape

torch.Size([1, 20, 12, 12])

#### 2.4 QA: max-pooling layer

In [27]:
mse_pool_sv = diff_mse(z_pool_scalar,z_pool_vector)
print("MSE between pool2d_scalar and pool2d_vector: ", mse_pool_sv)

MSE between pool2d_scalar and pool2d_vector:  0.13785971701145172


In [28]:
mse_pool_vt = diff_mse(z_pool_vector,z_pool_torch)
print("MSE between pool2d_vector and F.max_pool2d torch: ", mse_pool_vt)

MSE between pool2d_vector and F.max_pool2d torch:  0.13785971701145172


In [29]:
mse_pool_st = diff_mse(z_pool_scalar,z_pool_torch)
print("MSE between pool2d_scalar and F.max_pool2d torch: ", mse_pool_st)

MSE between pool2d_scalar and F.max_pool2d torch:  0.0
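The scalar variant matches `F.max_pool2d` exactly, while `pool2d_vector` differs from both by the same MSE, which points at the vector variant. In the code above, `im2col` reads only `X[0]` and slices windows starting at `i` and `j` rather than `i*stride` and `j*stride`. The block below is a sketch of a window-based max-pool that handles every channel and honours the stride, using `torch.nn.functional.unfold`. `pool2d_unfold` is a hypothetical name, and the approach is one possible fix rather than the author's method.

```python
import torch
import torch.nn.functional as F

def pool2d_unfold(a, pooling_size=2, stride=2):
    """Max-pooling via sliding-window columns, one column per output pixel."""
    N, C, H, W = a.shape
    H_out = (H - pooling_size) // stride + 1
    W_out = (W - pooling_size) // stride + 1
    # Fold channels into the batch dimension so each channel is pooled independently
    cols = F.unfold(a.reshape(N * C, 1, H, W),
                    kernel_size=pooling_size, stride=stride)   # [N*C, P*P, H_out*W_out]
    pooled = cols.max(dim=1).values                             # [N*C, H_out*W_out]
    return pooled.reshape(N, C, H_out, W_out)

torch.manual_seed(0)
a = torch.rand(2, 20, 24, 24)
z_unfold = pool2d_unfold(a)
z_torch = F.max_pool2d(a, 2, 2)
print("shape:", tuple(z_unfold.shape))
print("identical to F.max_pool2d:", torch.equal(z_unfold, z_torch))
```

Taking a maximum involves no arithmetic, so an exact match with `F.max_pool2d` is a reasonable expectation here, unlike for the convolution.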


----------

## 3.  Reshape

#### 3.1 Scalar: reshape 

Scalar equation of reshape:
\begin{equation} 
z_{n,j}^{(reshaped)} = z_{n,c_{in},m,l}^{(pool)}
\end{equation} 

\begin{equation} 
j = (c_{in}-1) \ast S_{in} \ast S_{in} + (m-1)\ast S_{in} +l
\end{equation} 

In [0]:
def reshape_scalar(a, device):
    
    # read dimensions of the input tensor
    batch_size, n_channels_in, height_in, width_in = a.shape
    
    # calculate the dimensions of the output tensor
    n_outputs = n_channels_in * height_in * width_in
    
    # move to device
    a = a.to(device)
    
    # initialize an output matrix of the correct size
    z = torch.empty([batch_size, n_outputs]).to(device)
    
    for n in range(batch_size):
        for c_in in range(n_channels_in):
            for m in range(height_in):
                for l in range(width_in):
                    # row-major flattening: consecutive rows are width_in elements apart
                    z[n,c_in*height_in*width_in+m*width_in+l] = a[n,c_in,m,l]
    
    return z

In [0]:
z_reshape_scalar = reshape_scalar(z_pool_torch, device)

In [32]:
z_reshape_scalar.shape

torch.Size([1, 2880])

#### 3.2 Vector: reshape 

In [0]:
def reshape_vector(a, device):
    
    batch_size = a.shape[0]
    
    #move to device
    a = a.to(device)
    
    z = a.clone().view(batch_size,-1)
    
    return z

In [0]:
z_reshape_vector = reshape_vector(z_pool_torch, device)

In [35]:
z_reshape_vector.shape

torch.Size([1, 2880])

#### 3.3 PyTorch: reshape

In [0]:
z_reshape_torch = z_pool_torch.view(-1, 20 * 12 * 12)

In [37]:
z_reshape_torch.shape

torch.Size([1, 2880])

#### 3.4 QA: reshape 
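Reshaping only rearranges elements without changing their values, so the MSE between any two variants equals the MSE of the max-pooling outputs they receive. Here all three variants receive `z_pool_torch`, so their outputs are identical.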
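The index equation in 3.1 is written for square feature maps of side $S_{in}$ with one-based indices. As a small worked check, the sketch below uses zero-based indices and a deliberately non-square tensor, where the distance between consecutive rows is the width. `flat_index` is a hypothetical helper introduced only for this illustration.

```python
import torch

def flat_index(c, m, l, height, width):
    """Zero-based position of element (c, m, l) after row-major flattening."""
    return c * height * width + m * width + l

C, H, W = 2, 3, 4                      # non-square on purpose
a = torch.arange(C * H * W, dtype=torch.float32).view(1, C, H, W)
flat = a.view(1, -1)

# Collect every (c, m, l) whose flattened position disagrees with .view()
mismatches = [
    (c, m, l)
    for c in range(C) for m in range(H) for l in range(W)
    if flat[0, flat_index(c, m, l, H, W)].item() != a[0, c, m, l].item()
]
print("mismatches:", mismatches)
print("index of last element:", flat_index(C - 1, H - 1, W - 1, H, W))
```

For square inputs such as the 12 x 12 maps in this notebook, height and width coincide, so the formula in 3.1 gives the same positions.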

---------------

## 4. Fully-connected layer

#### 4.1 Scalar: fully-connected layer
\begin{equation}
z_{n,j}^{(fc1)} =
  \sum_{i=1}^{D}  
  w_{j,i}^{(fc1)}\ast
  z_{n,i}^{(reshaped)}
  +
  b_{j}^{(fc1)}
\end{equation}

In [0]:
def fc_layer_scalar(a, weight, bias, device):
    
    # read dimensions of the input matrix
    batch_size, n_inputs = a.shape
    n_outputs = bias.shape[0]
    
    # move to device
    a = a.to(device)
    weight = weight.to(device)
    bias = bias.to(device)
    
    # initialize an output matrix of the correct size
    z = torch.empty([batch_size, n_outputs]).to(device)
    
    for n in range(batch_size):
        for j in range(n_outputs):
            z[n,j] = bias[j]
            for i in range(n_inputs):
                z[n,j] += weight[j,i]*a[n,i]
                
    return z

In [0]:
z_fc1_scalar = fc_layer_scalar(z_reshape_torch,model.fc_layer1.weight, model.fc_layer1.bias, device)

In [40]:
z_fc1_scalar.shape

torch.Size([1, 500])

#### 4.2 Vector: fully-connected layer
\begin{equation}
\mathbf{Z}^{(fc1)} = \mathbf{Z}^{(reshaped)}\left(\mathbf{W}^{(fc1)}\right)^{T}+\mathbf{B}^{(fc1)}
\end{equation}


In [0]:
def fc_layer_vector(a, weight, bias, device):
    
    #move to device
    a = a.to(device)
    weight = weight.to(device)
    bias = bias.to(device)
    
    z = (a.matmul(weight.t())+ bias).clone()
    
    return z

In [0]:
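# NOTE: this cell calls fc_layer_scalar rather than fc_layer_vector, so the
# "vector" results reported in 4.4 compare the scalar implementation with itself
z_fc1_vector = fc_layer_scalar(z_reshape_torch, model.fc_layer1.weight, model.fc_layer1.bias, device)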

In [43]:
z_fc1_vector.shape

torch.Size([1, 500])

#### 4.3 PyTorch: fully-connected layer

In [0]:
z_fc1_torch = model.fc_layer1(z_reshape_torch)

In [45]:
z_fc1_torch.shape

torch.Size([1, 500])

#### 4.4 QA: fully-connected layer

In [46]:
mse_fc1_sv = diff_mse(z_fc1_scalar, z_fc1_vector)
print("MSE between fc_layer_scalar output and fc_layer_vector: ", mse_fc1_sv)

MSE between fc_layer_scalar output and fc_layer_vector:  0.0


In [47]:
mse_fc1_st = diff_mse(z_fc1_scalar, z_fc1_torch)
print("MSE between fc_layer_scalar output and model.fc_layer1 torch: ", mse_fc1_st)

MSE between fc_layer_scalar output and model.fc_layer1 torch:  3.9590255580162007e-14


In [48]:
mse_fc1_vt = diff_mse(z_fc1_vector, z_fc1_torch)
print("MSE between fc_layer_vector output and model.fc_layer1 torch: ", mse_fc1_vt)

MSE between fc_layer_vector output and model.fc_layer1 torch:  3.9590255580162007e-14
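The cell that builds `z_fc1_vector` calls `fc_layer_scalar`, so the figures above do not exercise the matrix form. As a sketch of what a direct check of the matrix equation could look like, the block below compares `x.matmul(W.t()) + b` with `torch.nn.Linear` on a small layer. The layer sizes and variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
fc = torch.nn.Linear(in_features=8, out_features=4)   # small stand-in for fc_layer1
x = torch.rand(3, 8)                                   # batch of 3 flattened inputs

# Matrix form: Z = X W^T + b, with the bias broadcast over the batch dimension
z_vector = x.matmul(fc.weight.t()) + fc.bias
z_torch = fc(x)

mse = torch.mean((z_vector - z_torch) ** 2).item()
print("MSE between matrix form and nn.Linear:", mse)
```

`nn.Linear` stores its weight as `[out_features, in_features]`, which is why the transpose appears in the matrix form.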


------------

## 5. ReLU

#### 5.1 Scalar: ReLU 
\begin{equation}
z_{n,j}^{(relu)} =
  relu(
  z_{n,j}^{(fc1)})
\end{equation}

In [0]:
def relu_scalar(a, device):
  
    # read dimensions of the input matrix
    batch_size, n_inputs = a.shape
    
    # move to device
    a = a.to(device)
    
    # initialize an output matrix of the correct size
    z = torch.empty(batch_size, n_inputs).to(device)
    
    for n in range(batch_size):
        for i in range(n_inputs):
            if a[n,i]<0:
                z[n,i]=0
            else:
                z[n,i]=a[n,i]
                
    return z

In [0]:
z_relu_scalar = relu_scalar(z_fc1_torch,device)

In [51]:
z_relu_scalar.shape

torch.Size([1, 500])

#### 5.2 Vector: ReLU

In [0]:
def relu_vector(a, device):
    #move to device
    a = a.to(device)
    
    #clone input tensor
    z = a.clone().to(device)
    
    #elements < 0 replace with 0
    z[z<0] = 0
    
    return z

In [0]:
z_relu_vector = relu_vector(z_fc1_torch,device)

In [54]:
z_relu_vector.shape

torch.Size([1, 500])

#### 5.3 PyTorch: ReLU

In [0]:
z_relu_torch = F.relu(z_fc1_torch)

In [56]:
z_relu_torch.shape

torch.Size([1, 500])

#### 5.4 QA: ReLU

In [57]:
mse_relu_sv = diff_mse(z_relu_scalar, z_relu_vector)
print("MSE between z_relu_scalar output and z_relu_vector: ", mse_relu_sv)

MSE between z_relu_scalar output and z_relu_vector:  0.0


In [58]:
mse_relu_st = diff_mse(z_relu_scalar, z_relu_torch)
print("MSE between z_relu_scalar output and F.relu torch: ", mse_relu_st)

MSE between z_relu_scalar output and F.relu torch:  0.0


In [59]:
mse_relu_vt = diff_mse(z_relu_vector, z_relu_torch)
print("MSE between z_relu_vector output and F.relu torch: ", mse_relu_vt)

MSE between z_relu_vector output and F.relu torch:  0.0
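ReLU involves only a comparison with zero, so exact agreement between variants is expected. For reference, the sketch below shows two other element-wise formulations next to the masking approach used in `relu_vector`. The input values are made up for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[-1.5, 0.0, 2.0],
                  [3.0, -0.1, 0.5]])

# Masking, as in relu_vector: copy first so the input is left untouched
z_mask = x.clone()
z_mask[z_mask < 0] = 0

# Clamping from below returns a new tensor, so no explicit clone is needed
z_clamp = x.clamp(min=0)

# Element-wise maximum against a tensor of zeros
z_max = torch.maximum(x, torch.zeros_like(x))

z_torch = F.relu(x)
print(torch.equal(z_mask, z_torch), torch.equal(z_clamp, z_torch), torch.equal(z_max, z_torch))
```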


---------

## Performance comparison

In [0]:
#dummy data
#convolutional layer
data = torch.rand([1,1,28,28]).requires_grad_(True) #batch_size=64 - crashed on fc1 layer
w_conv = torch.rand([20,1,5,5]).requires_grad_(True)
b_conv = torch.rand([20]).requires_grad_(True)

#fc1 layer
w_fc1 = torch.rand([500, 2880]).requires_grad_(True)
b_fc1 = torch.rand([500]).requires_grad_(True)

#fc2 layer
w_fc2 = torch.rand([10, 500]).requires_grad_(True)
b_fc2 = torch.rand([10]).requires_grad_(True)

In [0]:
def forward_scalar(x_in, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2, device):
    z_conv = conv2d_scalar(x_in, w_conv, b_conv, device)  # use the argument, not the global `data`
    z_pool = pool2d_scalar(z_conv, device)
    z_pool_reshaped = reshape_scalar(z_pool, device)
    z_fc1 = fc_layer_scalar(z_pool_reshaped, w_fc1, b_fc1, device)
    z_relu = relu_scalar(z_fc1, device)
    z_fc2 = fc_layer_scalar(z_relu, w_fc2, b_fc2, device)
    y = F.softmax(z_fc2, dim=1)
    return y
  

In [0]:
start_scalar = time.time()
forward_scalar(data, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2, device)
end_scalar = time.time()

print("forward_scalar duration: ", end_scalar-start_scalar, "sec")

forward_scalar duration:  86.67015671730042 sec


In [0]:
def forward_vector(x_in, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2, device):
    z_conv = conv2d_vector(x_in, w_conv, b_conv, device)  # use the argument, not the global `data`
    z_pool = pool2d_vector(z_conv, device)
    z_pool_reshaped = reshape_vector(z_pool, device)
    z_fc1 = fc_layer_vector(z_pool_reshaped, w_fc1, b_fc1, device)
    z_relu = relu_vector(z_fc1, device)
    z_fc2 = fc_layer_vector(z_relu, w_fc2, b_fc2, device)
    y = F.softmax(z_fc2, dim=1)
    return y
  

In [0]:
start_vector = time.time()
forward_vector(data, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2, device)
end_vector = time.time()

print("forward_vector duration: ", end_vector-start_vector, "sec")

forward_vector duration:  0.044803619384765625 sec
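The timing above runs on random data and does not check the output. As a complementary sketch, the block below writes the same pipeline (convolution, max-pooling, reshape, fc1, ReLU, fc2, softmax) in matrix form and compares it with PyTorch's built-in operations on a batch of 4. `forward_matrix`, `forward_reference` and the weight scale 0.05 are assumptions made for this illustration.

```python
import torch
import torch.nn.functional as F

def forward_matrix(x, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2):
    N, C_in, H, W = x.shape
    C_out, _, K, _ = w_conv.shape
    H_c, W_c = H - K + 1, W - K + 1
    # convolution as a matrix product over unfolded windows
    cols = F.unfold(x, kernel_size=K)                                   # [N, C_in*K*K, H_c*W_c]
    z = (w_conv.reshape(C_out, -1) @ cols + b_conv.view(1, -1, 1)).reshape(N, C_out, H_c, W_c)
    # 2x2 max-pooling with stride 2, each channel treated as its own sample
    pooled = F.unfold(z.reshape(N * C_out, 1, H_c, W_c), kernel_size=2, stride=2).max(dim=1).values
    flat = pooled.reshape(N, -1)                                        # [N, C_out*H_p*W_p]
    h = torch.clamp(flat @ w_fc1.t() + b_fc1, min=0)                    # fc1 followed by ReLU
    return F.softmax(h @ w_fc2.t() + b_fc2, dim=1)

def forward_reference(x, w_conv, b_conv, w_fc1, b_fc1, w_fc2, b_fc2):
    z = F.max_pool2d(F.conv2d(x, w_conv, b_conv), 2, 2)
    h = F.relu(F.linear(z.flatten(1), w_fc1, b_fc1))
    return F.softmax(F.linear(h, w_fc2, b_fc2), dim=1)

torch.manual_seed(0)
x = torch.rand(4, 1, 28, 28)
params = [0.05 * torch.randn(s) for s in
          [(20, 1, 5, 5), (20,), (500, 2880), (500,), (10, 500), (10,)]]

y_matrix = forward_matrix(x, *params)
y_ref = forward_reference(x, *params)
print("output shape:", tuple(y_matrix.shape))
print("max abs difference:", (y_matrix - y_ref).abs().max().item())
```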


In [0]:
86.67015671730042//0.044803619384765625

1934.0

#### Conclusion: on generated data, the vector implementation runs more than 100 times faster on CPU and about 2000 times faster on CUDA.
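CUDA kernels are launched asynchronously, so a wall-clock measurement taken right after a call may stop before the GPU has finished. One common precaution is to call `torch.cuda.synchronize()` before reading the clock. The helper below is a sketch of that idea. `time_call` is a hypothetical name, and the matrix sizes merely mirror the fc1 layer.

```python
import time
import torch

def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # wait for pending GPU work before starting
        start = time.perf_counter()
        fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # wait for this call's kernels to finish
        best = min(best, time.perf_counter() - start)
    return best

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(64, 2880, device=device)
w = torch.rand(500, 2880, device=device)

elapsed = time_call(lambda a, b: a.matmul(b.t()), x, w)
print("best of 3:", elapsed, "sec")
```

Taking the best of several runs reduces the influence of one-off costs such as CUDA context creation on the first call.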

In [0]:
!nvidia-smi

Sun Jul 21 19:56:30 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    27W /  70W |    795MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------