PyTorch reference for training vision models on a simple dataset, MNIST.

In [1]:
import torch
import torchvision

from torch import nn
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm

Additional packages that need to be installed separately:
 - torchinfo, shows the structure of a neural network
 - torchmetrics, implements common metrics out of the box


In [None]:
!pip install torchinfo
!pip install torchmetrics

from torchinfo import summary
from torchmetrics import Accuracy

finish this tutorial:

https://medium.com/@deepeshdeepakdd2/lenet-5-implementation-on-mnist-in-pytorch-c6f2ee306e37


**Comparison between PyTorch and FastAI concepts**

PyTorch's Dataset and DataLoader are similar in concept to fastai's DataBlock and DataLoader,
but there are some key differences in how they are used and what they provide.

In essence, fastai's DataBlock and DataLoader can be seen as a more opinionated and user-friendly wrapper around PyTorch's Dataset and DataLoader, providing a streamlined way to prepare data for deep learning models.


**FastAI DataBlock vs PyTorch Dataset:**

**TODO write**

**FastAI DataLoader vs PyTorch DataLoader:**

In both libraries, this class wraps a Dataset/DataBlock and provides an iterable over it. It handles batching, shuffling, and parallel loading of data.

FastAI DataLoader is built on top of PyTorch's DataLoader and adds more features, such as transformations applied on the GPU and progress bars.
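
On the PyTorch side, the division of labour can be sketched as follows: a Dataset returns one item at a time, and the DataLoader groups items into batches. `SquaresDataset` is a hypothetical toy class invented for this illustration, not part of PyTorch or of the tutorial.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i**2)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of individual items, not batches
        return self.size

    def __getitem__(self, index):
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index ** 2)])
        return x, y


dataset = SquaresDataset(size=10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The loader stacks individual items into batch tensors
batch_shapes = [(x.shape, y.shape) for x, y in loader]
print(len(loader))     # number of batches
print(batch_shapes)    # the last batch is smaller than the others
```

With 10 items and a batch size of 4, the last batch holds only the 2 leftover items.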

PyTorch .view():

In PyTorch, when you use .view() to reshape a tensor, the parameter -1 is a placeholder that tells PyTorch to infer the size of that dimension from the other dimensions and the total number of elements.
Only one dimension per call can be inferred (set to -1). On its own, .view(-1) reshapes the tensor into a one-dimensional tensor:

```
flattened_image = image.view(-1)
```


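.view() is memory-efficient because it returns a view that shares storage with the original tensor instead of copying the data.
It only works when the requested shape is compatible with the tensor's memory layout: on a non-contiguous tensor (for example after a transpose), call .contiguous() first or use .reshape(), which copies only when needed.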
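
A small sketch of these points, using a made-up 1x3x4 tensor as a stand-in for an image. It shows the inferred dimension, the shared storage, and the error raised when calling .view() on a non-contiguous tensor.

```python
import torch

image = torch.arange(12).reshape(1, 3, 4)   # made-up stand-in for an image

flat = image.view(-1)       # one dimension, size inferred as 12
rows = image.view(3, -1)    # second dimension inferred as 4

# A view shares storage with the original tensor: writing through the view
# changes the original as well.
flat[0] = 100
shares_memory = image[0, 0, 0].item() == 100

# transpose() returns a non-contiguous tensor, so .view(-1) is rejected
transposed = image.transpose(1, 2)
try:
    transposed.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

# .reshape() handles this case by copying when a view is not possible
safe_flat = transposed.reshape(-1)
print(flat.shape, rows.shape, shares_memory, view_failed, safe_flat.shape)
```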




In [3]:
def calculate_mean_and_std_formula(dataset):
    total_pixels_count = 0
    sum_pixels = 0

    # First pass: accumulate the pixel sum to get the mean (_ is the label).
    for image, _ in dataset:
        flattened_image = image.view(-1)
        sum_pixels += flattened_image.sum()
        total_pixels_count += flattened_image.numel()
    mean = sum_pixels / total_pixels_count

    # Second pass: accumulate the squared deviations from the mean.
    sum_sq_diff = 0
    for image, _ in dataset:
        sq_diff = (image.view(-1) - mean) ** 2
        sum_sq_diff += sq_diff.sum()

    # Population variance: divides by N, not N - 1.
    variance = sum_sq_diff / total_pixels_count
    std_dev = torch.sqrt(variance)
    return {'mean': mean.item(), 'std_dev': std_dev.item()}

In PyTorch you can get a single Python number from a PyTorch tensor containing a single value by using the .item() method.

torch.stack: joins a sequence of same-shaped tensors along a new dimension.
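
A minimal sketch of both calls, with two made-up 1x28x28 tensors standing in for MNIST images. torch.cat is included only for contrast, since it joins along an existing dimension instead of creating a new one.

```python
import torch

a = torch.zeros(1, 28, 28)   # made-up stand-ins for two MNIST images
b = torch.ones(1, 28, 28)

stacked = torch.stack([a, b], dim=0)      # new leading dimension: (2, 1, 28, 28)
concatenated = torch.cat([a, b], dim=0)   # existing dimension grows: (2, 28, 28)

# .mean() returns a zero-dimensional tensor; .item() converts it to a Python float
mean_value = stacked.mean().item()
print(stacked.shape, concatenated.shape, type(mean_value), mean_value)
```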




In [4]:
def calculate_mean_and_std(dataset):
  imgs = torch.stack([img for img, _ in dataset], dim=0)
  return {'mean': imgs.mean().item(), 'std_dev': imgs.std().item()}

In [6]:
calculate_mean_and_std(torchvision.datasets.MNIST('/files/', train=True, download=True,
                             transform=torchvision.transforms.ToTensor()))

{'mean': 0.13066047430038452, 'std_dev': 0.30810782313346863}

In [7]:
calculate_mean_and_std_formula(torchvision.datasets.MNIST('/files/', train=True, download=True,
                             transform=torchvision.transforms.ToTensor()))

{'mean': 0.13066048920154572, 'std_dev': 0.308107852935791}

Preparing the training dataset:

In [8]:
train_dataset = torchvision.datasets.MNIST('/files/', train=True, download=True,
                             transform=torchvision.transforms.Compose([
                               torchvision.transforms.ToTensor(),
                               # Mean and std computed above, rounded to 4 decimals
                               torchvision.transforms.Normalize(
                                 (0.1307,), (0.3081,)),
                             ]))

In [9]:
train_loader = torch.utils.data.DataLoader(train_dataset)

**Description of PyTorch nn classes:**

**nn.Module**

is the base class for all neural network modules. It provides a framework for building and managing neural network layers and models.

To define a model, subclass it and implement two methods. The constructor creates all the layers:

`def __init__(self):`

The forward method implements the forward pass of the neural network:

`def forward(self, x):`
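
A minimal sketch of this pattern. `TinyClassifier` and its layer sizes are hypothetical, chosen only to keep the example short, and are not taken from the tutorial.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical two-layer network, used only to illustrate nn.Module."""

    def __init__(self, in_features=784, hidden=32, num_classes=10):
        super().__init__()   # must run before any layer is assigned
        self.hidden = nn.Linear(in_features, hidden)
        self.output = nn.Linear(hidden, num_classes)

    def forward(self, x):
        x = torch.flatten(x, start_dim=1)   # (N, 1, 28, 28) -> (N, 784)
        return self.output(torch.tanh(self.hidden(x)))


model = TinyClassifier()
logits = model(torch.randn(4, 1, 28, 28))   # calling the module runs forward()

# Layers assigned as attributes are registered automatically, so their
# weights and biases show up in model.parameters()
param_count = sum(p.numel() for p in model.parameters())
print(logits.shape, param_count)
```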



**nn.Sequential**

is a container module that stacks other modules (like layers such as convolutional layers, activation functions, pooling layers, etc.) in a specific sequential order.
It ensures that the input is passed through each module in the sequence, and the output of one module becomes the input for the next.
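
The chaining can be sketched with a small three-layer block (the same layer types as the first stage of the network defined later in this notebook) and compared against passing the input through each layer by hand.

```python
import torch
from torch import nn

block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2),
    nn.Tanh(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 1, 28, 28)
out = block(x)

# The same computation written out manually: each layer feeds the next
manual = x
for layer in block:
    manual = layer(manual)

same = torch.equal(out, manual)
first_layer = block[0]   # sub-modules can be accessed by index
print(out.shape, same, type(first_layer).__name__)
```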




LeNet-5 neural network:


TODO, add reference for lenet5 images, explain padding for 28 input, copy code from reference title


Max pooling is used here instead of the average pooling of the original LeNet-5.
MNIST digits are drawn as a white foreground on a black background, so max pooling keeps the brightest pixels in each window, which belong to the digit strokes.

[reference article](https://medium.com/@bdhuma/which-pooling-method-is-better-maxpooling-vs-minpooling-vs-average-pooling-95fb03f45a9)
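
To make the difference concrete, here is a small sketch on a made-up 4x4 "image" with a black background (0.0) and two white pixels (1.0). Max pooling keeps the white pixels at full intensity, while average pooling dilutes them with the surrounding background.

```python
import torch
from torch import nn

# Made-up 4x4 image: black background with two white pixels
image = torch.tensor([[0., 0., 0., 0.],
                      [0., 1., 0., 0.],
                      [0., 0., 0., 1.],
                      [0., 0., 0., 0.]]).reshape(1, 1, 4, 4)

max_pooled = nn.MaxPool2d(kernel_size=2, stride=2)(image)
avg_pooled = nn.AvgPool2d(kernel_size=2, stride=2)(image)

print(max_pooled.squeeze())   # windows containing a white pixel stay at 1.0
print(avg_pooled.squeeze())   # the same windows drop to 0.25
```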

architecture:

https://medium.com/@siddheshb008/lenet-5-architecture-explained-3b559cb2d52b

https://www.geeksforgeeks.org/computer-vision/lenet-5-architecture/



https://medium.com/@benjybo7/7-pytorch-pool-methods-you-should-be-using-495eb00325d6


In [10]:
class LeNet5Variant(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature = nn.Sequential(
            #1
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2),   # 28*28->32*32-->28*28
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 14*14

            #2
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),  # 10*10
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 5*5

        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=16*5*5, out_features=120),
            nn.Tanh(),
            nn.Linear(in_features=120, out_features=84),
            nn.Tanh(),
            nn.Linear(in_features=84, out_features=10),
        )

    def forward(self, x):
        return self.classifier(self.feature(x))
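
The shape comments in the class above (28 -> 28 -> 14 -> 10 -> 5) follow from the usual output-size formula for convolution and pooling layers without dilation, `floor((n + 2p - k) / s) + 1`. The sketch below redoes that arithmetic in plain Python, together with the parameter counts of the two convolutions. The helper names are made up for this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel of a square Conv2d kernel."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


sizes = [28]
sizes.append(conv_output_size(sizes[-1], kernel_size=5, stride=1, padding=2))  # conv 1
sizes.append(conv_output_size(sizes[-1], kernel_size=2, stride=2))             # pool 1
sizes.append(conv_output_size(sizes[-1], kernel_size=5, stride=1))             # conv 2
sizes.append(conv_output_size(sizes[-1], kernel_size=2, stride=2))             # pool 2

flattened = 16 * sizes[-1] * sizes[-1]   # in_features of the first Linear layer
conv1_params = conv_param_count(1, 6, 5)
conv2_params = conv_param_count(6, 16, 5)
print(sizes, flattened, conv1_params, conv2_params)
```

The results agree with `in_features=16*5*5` in the classifier and with the `Param #` column that torchinfo reports for the two Conv2d layers in this notebook.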


Model summary output. It also serves as a check that the network structure is correct: if the layer shapes don't match, an error is thrown.


In [11]:
model_test = LeNet5Variant()
summary(model=model_test, input_size=(1, 1, 28, 28), col_width=20,
                  col_names=['input_size', 'output_size', 'num_params', 'trainable'], row_settings=['var_names'], verbose=0)

Layer (type (var_name))                  Input Shape          Output Shape         Param #              Trainable
LeNet5Variant (LeNet5Variant)            [1, 1, 28, 28]       [1, 10]              --                   True
├─Sequential (feature)                   [1, 1, 28, 28]       [1, 16, 5, 5]        --                   True
│    └─Conv2d (0)                        [1, 1, 28, 28]       [1, 6, 28, 28]       156                  True
│    └─Tanh (1)                          [1, 6, 28, 28]       [1, 6, 28, 28]       --                   --
│    └─MaxPool2d (2)                     [1, 6, 28, 28]       [1, 6, 14, 14]       --                   --
│    └─Conv2d (3)                        [1, 6, 14, 14]       [1, 16, 10, 10]      2,416                True
│    └─Tanh (4)                          [1, 16, 10, 10]      [1, 16, 10, 10]      --                   --
│    └─MaxPool2d (5)                     [1, 16, 10, 10]      [1, 16, 5, 5]        --                   --
├─Sequential (classifi

The MNIST dataset doesn't come with a validation split out of the box.
Here the training set is split so that 10% of it is assigned to validation.

In [12]:
TRAIN_RATIO =  0.9
BATCH_SIZE = 32
generator = torch.Generator().manual_seed(42)

In [13]:
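# Without a transform the dataset yields PIL images, which the DataLoader cannot collate into batches
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
])
train_val_dataset = torchvision.datasets.MNIST(root="/files/", train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root="/files/", train=False, download=True, transform=transform)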

train_size = int(TRAIN_RATIO * len(train_val_dataset))
val_size = len(train_val_dataset) - train_size

train_dataset, val_dataset = torch.utils.data.random_split(dataset=train_val_dataset, lengths=[train_size, val_size], generator=generator)
len(train_dataset), len(val_dataset), len(test_dataset)

(54000, 6000, 10000)
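
A small sketch of why the seeded generator matters: with the same seed, `random_split` produces the same split on every run. The 100-element `TensorDataset` is a made-up stand-in for MNIST so that the example runs without a download.

```python
import torch
from torch.utils.data import TensorDataset, random_split

toy_dataset = TensorDataset(torch.arange(100))   # made-up stand-in for MNIST


def split(seed):
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy_dataset, lengths=[90, 10], generator=generator)


train_a, val_a = split(seed=42)
train_b, val_b = split(seed=42)

# Each returned Subset stores the indices it selected from the original dataset
same_split = val_a.indices == val_b.indices
disjoint = set(train_a.indices).isdisjoint(val_a.indices)
print(len(train_a), len(val_a), same_split, disjoint)
```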

In [14]:
train_dataloader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
# Shuffling only matters for training; evaluation order does not affect the metrics
val_dataloader = DataLoader(dataset=val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_dataloader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Number of batches per loader, determined by the dataset size and the batch size
len(train_dataloader), len(val_dataloader), len(test_dataloader)

(1688, 188, 313)
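
These counts can be reproduced by hand: with the default `drop_last=False`, a DataLoader yields `ceil(dataset_size / batch_size)` batches, the last one possibly smaller. The helper below is a plain-Python sketch written for this note, not a PyTorch function.

```python
import math


def num_batches(dataset_size, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for the given sizes."""
    if drop_last:
        return dataset_size // batch_size       # the incomplete last batch is discarded
    return math.ceil(dataset_size / batch_size)  # the incomplete last batch is kept


counts = tuple(num_batches(n, 32) for n in (54000, 6000, 10000))
print(counts)  # (1688, 188, 313)
```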