<a href="https://colab.research.google.com/github/moajjem04/Pytorch_Practice/blob/main/Pytorch_CNN_pt_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Downloading the data

Installing the `wget` Python package (used below to download the archive)

In [None]:
%%capture
!pip install wget

Downloading the data.

In [None]:
import wget
url = 'https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip'
filename = wget.download(url,'data.zip') # the file will be saved as data.zip

Unzipping the data. The archive extracts into a `PetImages` folder.

In [None]:
%%capture
!unzip '/content/data.zip' -d '/content/';
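If the `unzip` shell command is not available (for example outside Colab), the same extraction can be done with Python's standard-library `zipfile` module. This is a minimal sketch; the helper name `extract_zip` is hypothetical, and the paths in the usage comment simply mirror the Colab paths used above.

```python
import zipfile
from pathlib import Path


def extract_zip(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        # Return the member names so the caller can check what was extracted
        return zf.namelist()


# Usage in Colab, equivalent to the !unzip cell above:
# extract_zip('/content/data.zip', '/content/')
```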

# Preparing the dataset

In [None]:
import os
import cv2 as cv
import numpy as np
from tqdm import tqdm

In [None]:
REBUILD_DATA = True

class DogsVSCats():
  IMG_SIZE = 50
  CATS = 'PetImages/Cat'
  DOGS = 'PetImages/Dog'
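  labels = {CATS: 0, DOGS: 1}

  def __init__(self):
    # Per-instance state, so building the data twice does not append to a shared class-level list
    self.training_data = []
    self.cat_count = 0
    self.dog_count = 0

  def make_training_data(self):
    for label in self.labels:
      print('\n', label)
      for f in tqdm(os.listdir(label)):
        try:
          path = os.path.join(label, f)
          # Load as a single-channel (grayscale) image and resize to IMG_SIZE x IMG_SIZE
          img = cv.imread(path, cv.IMREAD_GRAYSCALE)
          img = cv.resize(img, (self.IMG_SIZE, self.IMG_SIZE))
          # Store the image with a one-hot label: [1, 0] = cat, [0, 1] = dog
          self.training_data.append([np.array(img), np.eye(2)[self.labels[label]]])

          if label == self.CATS:
            self.cat_count += 1
          elif label == self.DOGS:
            self.dog_count += 1
        except Exception as e:
          # Files OpenCV cannot decode make cv.imread return None, so cv.resize raises; skip them
          pass
    print('\n')
    print('Cats:', self.cat_count)
    print('Dogs:', self.dog_count)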

In [None]:
if REBUILD_DATA:
  dogvcat = DogsVSCats()
  dogvcat.make_training_data()
  training_data = dogvcat.training_data

 PetImages/Cat
100%|██████████| 12501/12501 [00:14<00:00, 846.33it/s]

 PetImages/Dog
100%|██████████| 12501/12501 [00:17<00:00, 700.77it/s]


Cats: 12476
Dogs: 12470

In [None]:
len(training_data)

24946

In [None]:
training_data[0]

[array([[141, 152, 160, ..., 181, 164, 149],
        [148, 161, 163, ..., 183, 169, 112],
        [149, 154, 160, ..., 117, 113, 104],
        ...,
        [170, 179, 181, ..., 162, 158, 134],
        [167, 177, 177, ..., 154, 142, 105],
        [159, 164, 179, ..., 145, 126, 107]], dtype=uint8), array([1., 0.])]
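Each label is a one-hot vector built with `np.eye(2)[index]`: row 0 of the 2x2 identity matrix encodes a cat and row 1 a dog, so the `array([1., 0.])` above marks this image as a cat. The sketch below shows the encoding and how `np.argmax` recovers the class index; the `CLASS_NAMES` list and the helper names are assumptions that mirror the `labels` dictionary in `DogsVSCats`.

```python
import numpy as np

# Assumed to mirror DogsVSCats.labels: 'PetImages/Cat' -> 0, 'PetImages/Dog' -> 1
CLASS_NAMES = ['Cat', 'Dog']


def one_hot(index, num_classes=2):
    # Row `index` of the identity matrix is the one-hot vector for that class
    return np.eye(num_classes)[index]


def decode(label_vector):
    # The position of the 1 gives back the class index
    return CLASS_NAMES[int(np.argmax(label_vector))]


print(one_hot(0))                  # [1. 0.]
print(one_hot(1))                  # [0. 1.]
print(decode(np.array([1., 0.])))  # Cat
```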

# Neural Network

Importing Libraries

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

### Checking the NN shapes on a dummy network

Printing the shape of the output of each layer before defining the architecture

In [None]:
inp_size = 50

m1 = nn.Conv2d(1,32,5)
m2 = nn.Conv2d(32,64,5)
m3 = nn.Conv2d(64,128,5)
m0 = nn.Flatten()

input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape)

output = F.relu(m1(input))
print('Output after 1st Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 1st MaxPool:',output.shape)

output = F.relu(m2(output))
print('Output after 2nd Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 2nd MaxPool:',output.shape)

output = F.relu(m3(output))
print('Output after 3rd Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 3rd MaxPool:',output.shape)

output = m0(output)
print('Flat:',output.shape)

Input Shape: torch.Size([1, 1, 50, 50])
Output after 1st Conv: torch.Size([1, 32, 46, 46])
Output after 1st MaxPool: torch.Size([1, 32, 23, 23])
Output after 2nd Conv: torch.Size([1, 64, 19, 19])
Output after 2nd MaxPool: torch.Size([1, 64, 9, 9])
Output after 3rd Conv: torch.Size([1, 128, 5, 5])
Output after 3rd MaxPool: torch.Size([1, 128, 2, 2])
Flat: torch.Size([1, 512])


Using a function to check the final size of the flattened layer

In [19]:
import math
def calc_shape(inp,kernel,stride,dilation,padding):
  # Equation (1), rounded down
  out = (inp + 2*padding - dilation*(kernel-1) -1)/stride + 1
  out = math.floor(out)
  return out

input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape[-1])

temp = calc_shape(50,5,1,1,0)
print('After Conv1:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool1:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv2:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool2:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv3:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool3:',temp)

# The final convolution layer has 128 channels so the output is multiplied by 128
print('Flattened Size:',temp*temp*128 )

Input Shape: 50
After Conv1: 46
After MaxPool1: 23
After Conv2: 19
After MaxPool2: 9
After Conv3: 5
After MaxPool3: 2
Flattened Size: 512


The input tensor has shape `(1, 1, 50, 50)`. The numbers mean:
* the first `1` is the number of samples (the batch size);
* the second `1` is the number of channels (one, because the images are grayscale);
* the two `50`s are the height and width, i.e. a `50x50` image.

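With each convolution, the number of channels increases while the spatial size of the data decreases. This is by design: the number of channels is simply the number of filters chosen for each layer (`32`, `64` and `128` here), while the spatial size shrinks because, without padding, a `5x5` kernel only fits at `50 - 5 + 1 = 46` positions along each axis of a `50x50` image. \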
\
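To see why a convolution without padding shrinks the image, here is a minimal pure-Python sketch of a single-channel "valid" convolution (strictly a cross-correlation, which is what `nn.Conv2d` computes). The helper name `conv2d_valid` is hypothetical and this is not how PyTorch implements the operation; it only illustrates that a 5x5 kernel fits at 46 positions along each axis of a 50x50 input. Each filter in a layer produces one such output map, which is why a layer with 32 filters yields 32 channels.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with stride 1 and no padding (hypothetical helper)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1  # positions where the kernel fully fits
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sum of element-wise products between the kernel and the window at (r, c)
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


image = [[1.0] * 50 for _ in range(50)]  # dummy 50x50 input
kernel = [[1.0] * 5 for _ in range(5)]   # dummy 5x5 kernel
result = conv2d_valid(image, kernel)
print(len(result), len(result[0]))  # 46 46
```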
The size of the data also decreases when pooling is used (in our case, max pooling). The output size of both a convolution and a pooling layer follows this general formula, assuming that the input data is square:

\begin{equation}
Size_{Out} = \left\lfloor \frac{Size_{In} + 2 \cdot Padding - Dilation \cdot (Kernel - 1) - 1}{Stride} + 1 \right\rfloor \tag{1}
\end{equation}

Here:
* $Size_{Out}$ is the size of the output matrix
* $Size_{In}$ is the size of the input matrix 
* $Padding$ is the padding size. It defaults to `0`
* $Dilation$ is the spacing between kernel elements. It defaults to `1`
* $Kernel$ is the kernel size.
* $Stride$ is the step by which the kernel moves. It defaults to `1` for `nn.Conv2d` and to the kernel size for max pooling (`F.max_pool2d` / `nn.MaxPool2d`).
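As a worked check, here is Equation (1) evaluated by hand for the first convolution (kernel 5, stride 1, padding 0, dilation 1) and for two of the max-pooling layers (kernel 2, stride 2), rounding down as `calc_shape` does. The last line shows why rounding down matters: a 19-wide map only holds 9 complete 2x2 windows, and the leftover row and column are dropped.

\begin{align*}
\text{Conv1:} \quad Size_{Out} &= \left\lfloor \frac{50 + 2 \cdot 0 - 1 \cdot (5 - 1) - 1}{1} + 1 \right\rfloor = \lfloor 45 + 1 \rfloor = 46 \\
\text{MaxPool1:} \quad Size_{Out} &= \left\lfloor \frac{46 + 2 \cdot 0 - 1 \cdot (2 - 1) - 1}{2} + 1 \right\rfloor = \lfloor 22 + 1 \rfloor = 23 \\
\text{MaxPool2:} \quad Size_{Out} &= \left\lfloor \frac{19 + 2 \cdot 0 - 1 \cdot (2 - 1) - 1}{2} + 1 \right\rfloor = \lfloor 8.5 + 1 \rfloor = 9
\end{align*}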

We ran the dummy model and printed the output shape after each convolution and pooling step:
```
Input Shape: torch.Size([1, 1, 50, 50])
Output after 1st Conv: torch.Size([1, 32, 46, 46])
Output after 1st MaxPool: torch.Size([1, 32, 23, 23])
Output after 2nd Conv: torch.Size([1, 64, 19, 19])
Output after 2nd MaxPool: torch.Size([1, 64, 9, 9])
Output after 3rd Conv: torch.Size([1, 128, 5, 5])
Output after 3rd MaxPool: torch.Size([1, 128, 2, 2])
Flat: torch.Size([1, 512])
```
The size of the flattened layer is not easy to read off from the layer definitions. Many tutorials simply hard-code the input size of the first fully connected layer (the usual next step after the convolutional layers) without justifying the number. Here I will use Equation (1) to reproduce the number we obtained by running the dummy network. \
We will use the following function to calculate the size:
```python
import math

def calc_shape(inp,kernel,stride,dilation,padding):
  out = (inp + 2*padding - dilation*(kernel-1) -1)/stride + 1
  out = math.floor(out)
  return out
```
The function is a direct translation of Equation (1); `math.floor` rounds the result down, just as PyTorch does for convolution and (by default) max-pooling layers.
The function is used as below:

```python
input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape[-1])

temp = calc_shape(50,5,1,1,0)
print('After Conv1:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool1:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv2:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool2:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv3:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool3:',temp)

print('Flattened Size:',temp*temp*128 )
```
The spatial size after the final max-pooling layer is squared (height × width) and multiplied by `128`, the number of channels produced by the last convolution in our dummy model. Flattening a `128 x 2 x 2` feature map therefore gives `temp*temp*128 = 2*2*128 = 512` values.
Running the code gives:
```
Input Shape: 50
After Conv1: 46
After MaxPool1: 23
After Conv2: 19
After MaxPool2: 9
After Conv3: 5
After MaxPool3: 2
Flattened Size: 512
```
The computed values match those from the dummy simulation, so this method can be used beforehand to find the size of the flattened layer without running the network.
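The repeated `calc_shape` calls can be folded into one helper that walks a list of layers. This is a sketch under the assumption that every layer is described by a `(kernel, stride, dilation, padding)` tuple in the same order as `calc_shape`'s arguments; the names `flattened_size`, `LAYERS` and `PADDED` are hypothetical. The second call shows how the answer changes for a hypothetical variant of the network whose convolutions use `padding=2`.

```python
import math


def calc_shape(inp, kernel, stride, dilation, padding):
    # Equation (1), rounded down
    return math.floor((inp + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


def flattened_size(inp_size, layers, out_channels):
    """Spatial size after all layers, squared, times the final channel count."""
    size = inp_size
    for kernel, stride, dilation, padding in layers:
        size = calc_shape(size, kernel, stride, dilation, padding)
    return size * size * out_channels


# conv(5) -> max pool(2), repeated three times, as in the dummy network
LAYERS = [(5, 1, 1, 0), (2, 2, 1, 0)] * 3
print(flattened_size(50, LAYERS, 128))  # 512

# Hypothetical variant: padding=2 keeps each conv output the same size as its input
PADDED = [(5, 1, 1, 2), (2, 2, 1, 0)] * 3
print(flattened_size(50, PADDED, 128))  # 4608
```

With padding, the sizes go 50 → 50 → 25 → 25 → 12 → 12 → 6, so the flattened layer grows to `6*6*128 = 4608`.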

Defining the architecture

In [None]:
class Net(nn.Module):
  def __init__(self):
    super().__init__()

    # Layer definition
    self.conv1 = nn.Conv2d(1,32,5)
    self.conv2 = nn.Conv2d(32,64,5)
    self.conv3 = nn.Conv2d(64,128,5)
    # Run one dummy forward pass through the conv layers so that convs()
    #   records the size of the flattened output in self._to_linear
    x = torch.randn(50,50).view(-1,1,50,50)
    self._to_linear = None
    self.convs(x)

    self.flat = nn.Flatten()

    self.fc1 = nn.Linear(self._to_linear, 512) # flattened conv features -> 512 hidden units
    self.fc2 = nn.Linear(512, 2) # 512 in, 2 out because we have 2 classes (dog vs cat)

  def convs(self, x):
    # max pooling over 2x2
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

    if self._to_linear is None:
      self._to_linear = x[0].shape[0]*x[0].shape[1]*x[0].shape[2]
    return x

  def forward(self, x):
    x = self.convs(x)
    x = self.flat(x)
    x = F.relu(self.fc1(x))
    x = self.fc2(x) # output layer: no ReLU here; softmax turns the outputs into class probabilities
    return F.softmax(x, dim=1)
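The final `F.softmax(x, dim=1)` turns the two raw outputs of `fc2` into probabilities that sum to 1, which puts the prediction on the same scale as the one-hot labels (`[1, 0]` for a cat, `[0, 1]` for a dog). Below is a minimal pure-Python sketch of softmax for a single sample; the input values are made up for illustration. Subtracting the maximum first is the usual trick for numerical stability and does not change the result, because the shift cancels in the ratio.

```python
import math


def softmax(logits):
    # Shift by the max so exp() never overflows; the shift cancels in the ratio
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.0])  # hypothetical fc2 output for one image
print(probs)  # roughly [0.88, 0.12]: the model leans towards "cat" (index 0)
```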

Initializing the model and viewing the layers

In [None]:
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (conv3): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1))
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=512, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=2, bias=True)
)
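The printed layers are enough to count the trainable parameters by hand: a `Conv2d` layer has `in_channels * out_channels * kernel_h * kernel_w` weights plus one bias per output channel, and a `Linear` layer has `in_features * out_features` weights plus `out_features` biases (all layers here use the default `bias=True`). The sketch below does that arithmetic; the helper names are hypothetical. In PyTorch, `sum(p.numel() for p in net.parameters())` should give the same total.

```python
def conv2d_params(in_ch, out_ch, kernel):
    # weights: in_ch * out_ch * k * k, plus one bias per output channel
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_f, out_f):
    # weight matrix plus bias vector
    return in_f * out_f + out_f


layers = {
    'conv1': conv2d_params(1, 32, 5),
    'conv2': conv2d_params(32, 64, 5),
    'conv3': conv2d_params(64, 128, 5),
    'fc1': linear_params(512, 512),
    'fc2': linear_params(512, 2),
}
for name, count in layers.items():
    print(f'{name}: {count}')
print('Total:', sum(layers.values()))  # Total: 520706
```

Note that `fc1` alone accounts for roughly half of all parameters (262,656 of 520,706).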
