<a href="https://colab.research.google.com/github/moajjem04/Pytorch_Practice/blob/main/Pytorch_CNN_pt_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Downloading the data

## Installing `wget`

In [2]:
%%capture
!pip install wget

## Downloading the data.

In [3]:
import wget
url = 'https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip'
filename = wget.download(url,'data.zip') # the file will be saved as data.zip

## Unzipping the data into `PetImages` folder

In [4]:
%%capture
!unzip '/content/data.zip' -d '/content/';

# Preparing the dataset

## Importing Necessary Modules

In [5]:
import os
import cv2 as cv
import numpy as np
from tqdm import tqdm

## Preprocessing the Data

`REBUILD_DATA` is used as a flag in this instance. This is because, ideally, we would want to preprocess(resize, augment, etc.) once. Then we will set `REBUILD_DATA`to `False`. This is true in most cases as there is usually a lot of data and data cleaning and preprocessing takes a bunch of time. So to run it every time we do an experiment would be a waste of resources on an already resource intensive task. But in this case, I have decided to not save the data.

In [6]:
REBUILD_DATA = True

class DogsVSCats():
  IMG_SIZE = 50
  CATS = 'PetImages/Cat'
  DOGS = 'PetImages/Dog'
  labels = {CATS:0, DOGS:1}
  training_data = []
  cat_count = 0
  dog_count = 0

  def make_training_data(self):
    for label in self.labels:
      print('\n',label)
      for f in tqdm(os.listdir(label)):
        try:
          path = os.path.join(label,f)
          img = cv.imread(path, cv.IMREAD_GRAYSCALE)
          img = cv.resize(img,(self.IMG_SIZE,self.IMG_SIZE))
          self.training_data.append([np.array(img),np.eye(2)[self.labels[label]]])

          if label == self.CATS:
            self.cat_count += 1
          elif label == self.DOGS:
            self.dog_count += 1
        except Exception as e:
          #print(label, f, str(e))
          pass
    print('\n')  
    print('Cats:',self.cat_count)
    print('Dogs:',self.dog_count) 

Since `REBUILD_DATA` is `True`, the preprocessing will take place.

In [7]:
if REBUILD_DATA:
  dogvcat = DogsVSCats()
  dogvcat.make_training_data()
  training_data = dogvcat.training_data

  1%|          | 66/12501 [00:00<00:18, 659.24it/s]


 PetImages/Cat


100%|██████████| 12501/12501 [00:12<00:00, 1003.65it/s]
  1%|          | 102/12501 [00:00<00:12, 1016.79it/s]


 PetImages/Dog


100%|██████████| 12501/12501 [00:13<00:00, 942.84it/s]



Cats: 12476
Dogs: 12470





Inspecting the `training_data` by looking at the length of the list and a sample data.

In [8]:
len(training_data)

24946

In [9]:
training_data[0]

[array([[34, 35, 34, ..., 56, 57, 61],
        [35, 36, 36, ..., 69, 68, 66],
        [35, 37, 39, ..., 63, 76, 75],
        ...,
        [38, 34, 33, ..., 44, 52, 53],
        [33, 72, 48, ..., 47, 50, 48],
        [31, 34, 28, ..., 59, 53, 52]], dtype=uint8), array([1., 0.])]

### Shuffle the data so that bias inserted into the network is minimized.

In [10]:
np.random.shuffle(training_data)

# Neural Network with CPU

## Importing Libraries for NN

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F

## Checking the NN shapes on a dummy network

### Printing the shape of the output of each layer before defining the architecture

In [12]:
inp_size = 50; 

m1 = nn.Conv2d(1,32,5)
m2 = nn.Conv2d(32,64,5)
m3 = nn.Conv2d(64,128,5)
m0 = nn.Flatten()

input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape)

output = F.relu(m1(input))
print('Output after 1st Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 1st MaxPool:',output.shape)

output = F.relu(m2(output))
print('Output after 2nd Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 2nd MaxPool:',output.shape)

output = F.relu(m3(output))
print('Output after 3rd Conv:',output.shape)
output = F.max_pool2d(output,(2, 2))
print('Output after 3rd MaxPool:',output.shape)

output = m0(output)
print('Flat:',output.shape)
#print(128 * calc_shape(inp_size,kernel,stride_size,padding))

Input Shape: torch.Size([1, 1, 50, 50])
Output after 1st Conv: torch.Size([1, 32, 46, 46])
Output after 1st MaxPool: torch.Size([1, 32, 23, 23])
Output after 2nd Conv: torch.Size([1, 64, 19, 19])
Output after 2nd MaxPool: torch.Size([1, 64, 9, 9])
Output after 3rd Conv: torch.Size([1, 128, 5, 5])
Output after 3rd MaxPool: torch.Size([1, 128, 2, 2])
Flat: torch.Size([1, 512])


### Using a function to check the final size of the flattened layer

In [13]:
import math
def calc_shape(inp,kernel,stride,dilation,padding):
  out = (inp + 2*padding - dilation*(kernel-1) -1)/stride + 1
  #out = round(out)
  out = math.floor(out)
  return out

input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape[-1])

temp = calc_shape(50,5,1,1,0)
print('After Conv1:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool1:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv2:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool2:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv3:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool3:',temp)

# The final convolution layer has 128 channels so the output is multiplied by 128
print('Flattened Size:',temp*temp*128 )

Input Shape: 50
After Conv1: 46
After MaxPool1: 23
After Conv2: 19
After MaxPool2: 9
After Conv3: 5
After MaxPool3: 2
Flattened Size: 512


### Explaining the shape of the flattened layer

The initial size of the data is of `(1,1,50,50)`. The numbers are explained below:
* the 1st `1` denotes the no of samples
* the 2nd `1` denotes the no of channels
* the `50` represents the dimension of the image, i.e. a `50x50` image.

With each convolution, the no of channels increases while the size of the data decreases. This is by design. # Include appropriate discussion \
\
The size of the data also decreases when pooling is used (in our case max pooling). The output after a convolution and pooling follow this general formula assuming that the input data is square:

\begin{equation}
Size_{Out} = \frac{Size_{In} + 2 * Padding - Dilation * (Kernel - 1) -1}{Stride} + 1 \tag{1}
\end{equation}

Here:
* $Size_{Out}$ is the size of the output matrix
* $Size_{In}$ is the size of the input matrix 
* $Padding$ is the padding size. It defaults to `0`
* $Dilation$ is a parameter that controls the stride of elements in the window. It defaults to `1`
* $Kernel$ is the kernel size.
* $Stride$ is the length of stride of the kernel. It defaults to the kernel size.

We ran our dummy model. Then we printed the outputs after each convolution and pooling. 
```
Input Shape: torch.Size([1, 1, 50, 50])
Output after 1st Conv: torch.Size([1, 32, 46, 46])
Output after 1st MaxPool: torch.Size([1, 32, 23, 23])
Output after 2nd Conv: torch.Size([1, 64, 19, 19])
Output after 2nd MaxPool: torch.Size([1, 64, 9, 9])
Output after 3rd Conv: torch.Size([1, 128, 5, 5])
Output after 3rd MaxPool: torch.Size([1, 128, 2, 2])
Flattened Layer: torch.Size([1, 512])
```
The size of the flattened layer is something that is not easily retrieved from code. Most tutorials enter the size to a fully connected layer(a usual next step after conv layers) without justifying the numbers. I will try to use the formula according to Equation (1) to get the same number that we recieved from simulating a dummy network. \
We will use the following function to calculate the size:
```python
def calc_shape(inp,kernel,stride,dilation,padding):
  out = (inp + 2*padding - dilation*(kernel-1) -1)/stride + 1
  out = math.floor(out)
  return out
```
The code is just the equation (1) in code form. The output is rounded down.
The function is used as below:

```python
input = torch.randn(1, 1, inp_size, inp_size)
print('Input Shape:',input.shape[-1])

temp = calc_shape(50,5,1,1,0)
print('After Conv1:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool1:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv2:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool2:',temp)

temp = calc_shape(temp,5,1,1,0)
print('After Conv3:',temp)
temp = calc_shape(temp,2,2,1,0)
print('After MaxPool3:',temp)

print('Flattened Size:',temp*temp*128 )
```
The size from the final layer of Max Pooling is squared and multiplied by `128`. That is the number of channels according to our dummy model. So when the data is flattened we get `temp*temp*128`.
And we get the output:
```
Input Shape: 50
After Conv1: 46
After MaxPool1: 23
After Conv2: 19
After MaxPool2: 9
After Conv3: 5
After MaxPool3: 2
Flattened Size: 512
```
We can see that the values calculated by us match that of the dummy simulation. This method can be used before hand if we want to know the size of the flattened layer.

## Defining the architecture

In [14]:
class Net(nn.Module):
  def __init__(self):
    super().__init__()

    # Layer definition
    self.conv1 = nn.Conv2d(1,32,5)
    self.conv2 = nn.Conv2d(32,64,5)
    self.conv3 = nn.Conv2d(64,128,5)
    # This code is used calculate the size of the flattened layer by having one
    #   pass forward
    x = torch.randn(50,50).view(-1,1,50,50)
    self._to_linear = None
    self.convs(x)

    self.flat = nn.Flatten()

    self.fc1 = nn.Linear(self._to_linear, 512) #flattening.
    self.fc2 = nn.Linear(512, 2) # 512 in, 2 out bc we're doing 2 classes (dog vs cat).

  def convs(self, x):
    # max pooling over 2x2
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))
    #print(x[0].shape)

    if self._to_linear is None:
      self._to_linear = x[0].shape[0]*x[0].shape[1]*x[0].shape[2]
    return x

  def forward(self, x):
    x = self.convs(x)
    x = self.flat(x)
    x = F.relu(self.fc1(x))
    x = self.fc2(x) # bc this is our output layer. No activation here.
    return F.softmax(x, dim=1)

## Initializing the model and viewing the layers

In [15]:
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (conv3): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1))
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=512, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=2, bias=True)
)


## Optimizers and Loss Function

In [16]:
import torch.optim as optim

optimizer = optim.Adam(net.parameters(),lr = 1e-3)
loss_function = nn.MSELoss()

### Splitting the data into feature set (`X`) and label(`y`)

In [17]:
X = torch.Tensor([i[0] for i in training_data]).view(-1,50,50)
X = X/255.0

y = torch.Tensor([i[1] for i in training_data])

### Splitting the data into validation and train set using `val_pct` as the ratio

In [18]:
val_pct = 0.1
val_size = int(len(X) * val_pct)
print(val_size)

2494


Just a simple way of valid/train split. This is okay as the images are already shuffled.

In [19]:
train_X = X[:-val_size]
train_y = y[:-val_size]

test_X = X[-val_size:]
test_y = y[-val_size:]

In [20]:
print(len(train_y))
print(len(test_y))

22452
2494


## Training

Training for 1 epoch. Also training in batches as training all of the data at a time would be tough.

In [None]:
batch_size = 100
epochs = 3
for epoch in range(epochs):
  for i in tqdm(range(0,len(train_X),batch_size)):
    #print('\n',i,i+batch_size)
    batch_X = train_X[i:i+batch_size].view(-1,1,50,50)
    batch_y = train_y[i:i+batch_size]

    net.zero_grad() # or optimizer.zero_grad()
    outputs = net(batch_X)
    loss = loss_function(outputs,batch_y)
    loss.backward()
    optimizer.step()
  print(loss)


## Predict

In [22]:
correct = 0
total = 0
with torch.no_grad():
    for i in tqdm(range(len(test_X))):
        real_class = torch.argmax(test_y[i])
        net_out = net(test_X[i].view(-1, 1, 50, 50))[0]  # returns a list, 
        predicted_class = torch.argmax(net_out)

        if predicted_class == real_class:
            correct += 1
        total += 1
print("\nAccuracy: ", round(correct/total, 3))

100%|██████████| 2494/2494 [00:05<00:00, 497.70it/s]


Accuracy:  0.658





# Neural Network with GPU

In [23]:
torch.cuda.is_available()

True

In [24]:
print(torch.cuda.get_device_name(0))

Tesla T4


Lets define a new network creatively named `net_gpu`. We will initialize it like normal and then assign it to CUDA.

In [61]:
net_gpu = Net()
optimizer = optim.Adam(net.parameters(),lr = 1e-3)
loss_function = nn.MSELoss()
if torch.cuda.is_available():
  net_gpu = net_gpu.cuda()
  loss_function = loss_function.cuda()


In [28]:
net_gpu

Net(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (conv3): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1))
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=512, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=2, bias=True)
)

The train and test set have already been splitted

In [29]:
print('Shape of training feature set:',train_X.shape)
print('Shape of training label      :',train_y.shape)
print('Shape of testing feature set :',test_X.shape)
print('Shape of testing label       :',test_y.shape)

Shape of training feature set: torch.Size([22452, 50, 50])
Shape of training label      : torch.Size([22452, 2])
Shape of testing feature set : torch.Size([2494, 50, 50])
Shape of testing label       : torch.Size([2494, 2])


Defining a training function

In [66]:
def train_model(train_X,train_y,batch_size):
  for i in range(0,len(train_y),batch_size):
    net_gpu.train()
    #print('\n',i,i+batch_size)
    batch_X = train_X[i:i+batch_size].view(-1,1,50,50)
    batch_y = train_y[i:i+batch_size]

    if torch.cuda.is_available():
      batch_X = batch_X.cuda()
      batch_y = batch_y.cuda()

    net_gpu.zero_grad() # or optimizer.zero_grad()
    outputs = net_gpu(batch_X)
    loss = loss_function(outputs,batch_y)
    loss.backward()
    optimizer.step()
  print('\nLoss :',loss)
  #return loss


Training the model

In [67]:
batch_size = 100;
loss =0;
epochs = 5
for epoch in tqdm(range(epochs)):
  
  loss = train_model(train_X,train_y,batch_size)
  print('\nEpoch:',epoch)




  0%|          | 0/5 [00:00<?, ?it/s][A[A[A


 20%|██        | 1/5 [00:01<00:07,  1.98s/it][A[A[A


Loss : tensor(0.2505, device='cuda:0', grad_fn=<MseLossBackward>)

Epoch: 0





 40%|████      | 2/5 [00:03<00:05,  1.95s/it][A[A[A


Loss : tensor(0.2505, device='cuda:0', grad_fn=<MseLossBackward>)

Epoch: 1





 60%|██████    | 3/5 [00:05<00:03,  1.94s/it][A[A[A


Loss : tensor(0.2505, device='cuda:0', grad_fn=<MseLossBackward>)

Epoch: 2





 80%|████████  | 4/5 [00:07<00:01,  1.92s/it][A[A[A


Loss : tensor(0.2505, device='cuda:0', grad_fn=<MseLossBackward>)

Epoch: 3





100%|██████████| 5/5 [00:09<00:00,  1.91s/it]


Loss : tensor(0.2505, device='cuda:0', grad_fn=<MseLossBackward>)

Epoch: 4





In [43]:
def test(model):
  correct = 0
  total = 0
  with torch.no_grad():
      for i in tqdm(range(len(test_X))):
          real_class = torch.argmax(test_y[i]).cuda()
          new_test = test_X[i].view(-1, 1, 50, 50)
          net_out = model(new_test.cuda())[0]  # returns a list, 
          predicted_class = torch.argmax(net_out)

          if predicted_class == real_class:
              correct += 1
          total += 1
  print("\nAccuracy: ", round(correct/total, 3))

In [44]:
test(net_gpu)

100%|██████████| 2494/2494 [00:01<00:00, 1516.69it/s]


Accuracy:  0.513



