<a href="https://colab.research.google.com/github/helenjoy/CV_Units/blob/main/feedforward.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Identifying hand-written digits (MNIST) using PyTorch**

We will use the famous MNIST Handwritten Digits Database as our training dataset. It consists of 28px by 28px grayscale images of handwritten digits (0-9), along with a label for each image indicating which digit it represents. MNIST stands for Modified National Institute of Standards and Technology.

In [None]:
import kagglehub
path = kagglehub.dataset_download("oddrationale/mnist-in-csv")

Using Colab cache for faster access to the 'mnist-in-csv' dataset.


In [None]:
## Imports
import torch
import torchvision ## Contains some utilities for working with the image data
from torchvision.datasets import MNIST
import matplotlib.pyplot as plt
#%matplotlib inline
import torchvision.transforms as transforms
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torch.nn.functional as F

In [None]:
dataset = MNIST(root = 'data/', download = True)
print(len(dataset))

100%|██████████| 9.91M/9.91M [00:00<00:00, 46.7MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.18MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 10.7MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 5.89MB/s]

60000





In [None]:
image, label = dataset[10]
plt.imshow(image, cmap = 'gray')
print('Label:', label)

In [None]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Convert images to tensors with pixel values scaled to [0, 1]
transform = transforms.ToTensor()

# Load the MNIST training split (the files were already downloaded to data/ above)
train_dataset = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transform
)

# Create a DataLoader that yields shuffled batches of 64 images
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)


These images are small, and recognizing the digits can sometimes be hard. PyTorch models also cannot work with these PIL images directly, so we need to convert the images into tensors. We can do this by specifying a transform while creating our dataset.

PyTorch datasets allow us to specify one or more transformation functions, which are applied to the images as they are loaded.

torchvision.transforms contains many such predefined functions, and we will use the ToTensor transform to convert images into PyTorch tensors.
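To make the conversion concrete, here is a rough, torch-only sketch of what ToTensor does for an 8-bit grayscale image. It assumes a synthetic uint8 tensor named fake_image (hypothetical, standing in for a PIL image): the pixels are converted to floats, divided by 255, and given a leading channel dimension. This follows ToTensor's documented behavior for this case but is not torchvision's actual code.

```python
import torch

# Hypothetical 28x28 8-bit grayscale "image" (values 0..255), standing in for a PIL image
fake_image = torch.zeros(28, 28, dtype=torch.uint8)
fake_image[10:18, 12:16] = 255  # a bright vertical stroke

# Manual equivalent of ToTensor for a single-channel uint8 image:
# convert to float, scale to [0, 1], and add a leading channel dimension
as_tensor = fake_image.float().div(255).unsqueeze(0)

print(as_tensor.shape, as_tensor.dtype)          # torch.Size([1, 28, 28]) torch.float32
print(as_tensor.min().item(), as_tensor.max().item())  # 0.0 1.0
```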

In [None]:
## MNIST dataset (images and labels)
mnist_dataset = MNIST(root = 'data/', train = True, transform = transforms.ToTensor())
print(mnist_dataset)

Dataset MNIST
    Number of datapoints: 60000
    Root location: data/
    Split: Train
    StandardTransform
Transform: ToTensor()


In [None]:
image_tensor, label = mnist_dataset[0]
print(image_tensor.shape, label)

torch.Size([1, 28, 28]) 5


**The image is now converted to a 1 X 28 X 28 tensor. The first dimension keeps track of the color channels. Since images in the MNIST dataset are grayscale, there's just one channel. Other datasets have color images, in which case there would be 3 color channels (Red, Green, Blue).**
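A small sketch of how the channel dimension shows up in tensor shapes, using random hypothetical batches in the usual (batch, channels, height, width) layout rather than the real dataset:

```python
import torch

# Hypothetical batches shaped like image datasets: (batch, channels, height, width)
gray_batch = torch.rand(4, 1, 28, 28)   # grayscale, e.g. MNIST: 1 channel
color_batch = torch.rand(4, 3, 28, 28)  # RGB images: 3 channels (red, green, blue)

# A single grayscale image has shape (1, 28, 28); dropping the channel
# dimension leaves the plain 28 x 28 grid of pixel intensities
single = gray_batch[0]
grid = single.squeeze(0)

print(single.shape)          # torch.Size([1, 28, 28])
print(grid.shape)            # torch.Size([28, 28])
print(color_batch[0].shape)  # torch.Size([3, 28, 28])
```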

In [None]:
print(image_tensor[:,10:15,10:15])
print(torch.max(image_tensor), torch.min(image_tensor))

tensor([[[0.0039, 0.6039, 0.9922, 0.3529, 0.0000],
         [0.0000, 0.5451, 0.9922, 0.7451, 0.0078],
         [0.0000, 0.0431, 0.7451, 0.9922, 0.2745],
         [0.0000, 0.0000, 0.1373, 0.9451, 0.8824],
         [0.0000, 0.0000, 0.0000, 0.3176, 0.9412]]])
tensor(1.) tensor(0.)


**The values range from 0 to 1, with 0 representing black, 1 representing white, and the values in between representing different shades of grey. We can also plot the tensor as an image using plt.imshow.**
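One detail when plotting the tensor: plt.imshow expects a 2-D (height, width) array for grayscale data, so the channel dimension has to be dropped first. The sketch below uses a hypothetical demo_tensor in place of image_tensor so it runs on its own; in the notebook you would pass image_tensor[0] instead.

```python
import matplotlib.pyplot as plt
import torch

# Hypothetical stand-in for image_tensor above: shape (1, 28, 28), values in [0, 1]
demo_tensor = torch.zeros(1, 28, 28)
demo_tensor[0, 5:23, 13:15] = 1.0   # a white vertical stroke, roughly like a "1"

# imshow expects (height, width) for grayscale data, so select channel 0 first;
# passing the (1, 28, 28) tensor directly would raise a shape error
pixels = demo_tensor[0].numpy()
fig, ax = plt.subplots()
ax.imshow(pixels, cmap="gray")      # 0 -> black, 1 -> white
ax.set_title("A 28 x 28 tensor drawn with imshow")
```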

In [None]:
train_data, validation_data = random_split(mnist_dataset, [50000, 10000])
## Print the length of train and validation datasets
print("length of Train Datasets: ", len(train_data))
print("length of Validation Datasets: ", len(validation_data))

length of Train Datasets:  50000
length of Validation Datasets:  10000
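The split above is random, so it changes every time the cell runs. A minimal sketch of making it reproducible, assuming random_split's generator argument and using a hypothetical toy TensorDataset of 600 random "images" in place of mnist_dataset so nothing needs to be downloaded:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for mnist_dataset: 600 random "images" with random labels 0-9
toy_images = torch.rand(600, 1, 28, 28)
toy_labels = torch.randint(0, 10, (600,))
toy_dataset = TensorDataset(toy_images, toy_labels)

# A seeded generator makes the split the same on every run
gen = torch.Generator().manual_seed(42)
train_part, val_part = random_split(toy_dataset, [500, 100], generator=gen)

print(len(train_part), len(val_part))                        # 500 100
# The two subsets never share an index
print(set(train_part.indices).isdisjoint(val_part.indices))  # True
```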


Here we will use DataLoaders to help us load the data in batches. We will use a batch size of 128. We will set shuffle = True for the training dataloader, so that the batches generated in each epoch are different; this randomization helps the model generalize and can speed up convergence.

Since the validation dataloader is used only for evaluating the model, there is no need to shuffle the images.
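The notebook's own train_loader above uses the full training set with a batch size of 64. The sketch below shows the setup this paragraph describes instead: batch size 128, a shuffled training loader, and an unshuffled validation loader. It assumes a small random TensorDataset and the hypothetical names train_dl and val_dl so it runs standalone; in the notebook you would pass train_data and validation_data.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Hypothetical toy data standing in for the 50,000 / 10,000 MNIST split above
toy_dataset = TensorDataset(torch.rand(600, 1, 28, 28), torch.randint(0, 10, (600,)))
train_part, val_part = random_split(toy_dataset, [500, 100],
                                    generator=torch.Generator().manual_seed(0))

batch_size = 128
# Reshuffle the training batches every epoch; keep the validation order fixed
train_dl = DataLoader(train_part, batch_size=batch_size, shuffle=True)
val_dl = DataLoader(val_part, batch_size=batch_size, shuffle=False)

first_images, first_labels = next(iter(train_dl))
print(first_images.shape, first_labels.shape)  # torch.Size([128, 1, 28, 28]) torch.Size([128])
print(len(train_dl), len(val_dl))              # 4 1 (the last batch may be smaller)
```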

In [None]:
import torch.nn as nn

input_size = 28 * 28
num_classes = 10

## Logistic regression model
model = nn.Linear(input_size, num_classes)
print(model.weight.shape)
print(model.weight)
print(model.bias.shape)
print(model.bias)

torch.Size([10, 784])
Parameter containing:
tensor([[-0.0015, -0.0204,  0.0230,  ..., -0.0296,  0.0121, -0.0225],
        [ 0.0020,  0.0211, -0.0060,  ...,  0.0161, -0.0141, -0.0162],
        [-0.0277, -0.0293, -0.0310,  ...,  0.0051, -0.0126,  0.0013],
        ...,
        [ 0.0201,  0.0112, -0.0169,  ...,  0.0306, -0.0232, -0.0105],
        [-0.0070, -0.0277, -0.0059,  ..., -0.0266,  0.0245, -0.0270],
        [ 0.0299,  0.0215,  0.0022,  ...,  0.0043,  0.0311, -0.0294]],
       requires_grad=True)
torch.Size([10])
Parameter containing:
tensor([ 0.0075, -0.0331,  0.0140,  0.0309, -0.0241, -0.0245, -0.0299,  0.0185,
         0.0278, -0.0005], requires_grad=True)


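A logistic regression model is structurally identical to a linear regression model: there is a weight matrix and a bias vector, and the output is obtained using simple matrix operations (pred = x @ w.t() + b). The difference lies in how the outputs are interpreted (as class scores turned into probabilities) and in the loss used to train them.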

We can use nn.Linear to create the model instead of defining and initializing the matrices manually.
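To check that nn.Linear really computes pred = x @ w.t() + b, the sketch below compares its output with the manual matrix expression on a hypothetical batch of five random, already-flattened inputs (demo_linear and x are illustrative names):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_linear = nn.Linear(28 * 28, 10)   # weight: (10, 784), bias: (10,)
x = torch.rand(5, 28 * 28)             # hypothetical batch of 5 flattened images

# nn.Linear computes x @ W.T + b
manual = x @ demo_linear.weight.t() + demo_linear.bias
print(torch.allclose(demo_linear(x), manual, atol=1e-5))  # True
```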

Since nn.Linear expects each training example to be a vector, each 1 X 28 X 28 image tensor needs to be flattened into a vector of size 784 (28 X 28) before being passed into the model.

The output for each image is a vector of size 10, with one score (logit) for each target label (0 to 9). These raw scores are not probabilities yet; applying softmax turns them into probabilities that sum to 1. The predicted label for an image is simply the one with the highest score, which is also the one with the highest probability.
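The pipeline described above (flatten, linear layer, softmax, pick the largest) can be sketched end to end on a hypothetical random batch of 8 images; the demo_* names are illustrative and the untrained model's predictions are meaningless:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
demo_model = nn.Linear(28 * 28, 10)       # logistic regression: 784 inputs -> 10 class scores
demo_images = torch.rand(8, 1, 28, 28)    # hypothetical batch of 8 grayscale images

demo_flat = demo_images.reshape(-1, 28 * 28)   # (8, 1, 28, 28) -> (8, 784)
demo_logits = demo_model(demo_flat)            # (8, 10) raw scores, one row per image
demo_probs = F.softmax(demo_logits, dim=1)     # each row now sums to 1
demo_preds = demo_probs.argmax(dim=1)          # predicted digit = index of the largest probability

print(demo_flat.shape, demo_logits.shape, demo_preds.shape)
# Softmax preserves ordering, so the argmax of the raw scores gives the same labels
print(torch.equal(demo_preds, demo_logits.argmax(dim=1)))
```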

In [None]:
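class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        # A single linear layer: 784 pixel inputs -> 10 class scores
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, xb):
        # Flatten each 1 x 28 x 28 image into a 784-element vector
        xb = xb.reshape(-1, input_size)
        print(xb)   # debug output: the flattened batch (shown further below); safe to remove
        out = self.linear(xb)
        print(out)  # debug output: the raw scores for the batch; safe to remove
        return out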

model = MnistModel()
print(model.linear.weight.shape, model.linear.bias.shape)
list(model.parameters())

torch.Size([10, 784]) torch.Size([10])


[Parameter containing:
 tensor([[-0.0258,  0.0344,  0.0134,  ..., -0.0244,  0.0040, -0.0191],
         [ 0.0231, -0.0308, -0.0225,  ...,  0.0157,  0.0063, -0.0032],
         [-0.0261,  0.0217,  0.0163,  ...,  0.0068, -0.0170, -0.0334],
         ...,
         [-0.0020,  0.0176, -0.0256,  ...,  0.0315,  0.0312, -0.0046],
         [-0.0207,  0.0212,  0.0039,  ...,  0.0238,  0.0223,  0.0074],
         [ 0.0140,  0.0151,  0.0255,  ..., -0.0247,  0.0210,  0.0050]],
        requires_grad=True),
 Parameter containing:
 tensor([ 0.0230,  0.0198,  0.0294, -0.0220, -0.0079, -0.0308,  0.0140,  0.0172,
          0.0244, -0.0102], requires_grad=True)]

Inside the constructor method, we instantiate the weights and biases using nn.Linear. Inside the forward method, which is invoked when we pass a batch of inputs to the model, we flatten the input tensor and then pass it into self.linear.

xb.reshape(-1, 28 * 28) tells PyTorch that we want the xb tensor reshaped to two dimensions, where the length along the 2nd dimension is 28 * 28 (i.e. 784). When possible, reshape returns a view that shares memory with the original tensor; otherwise it returns a copy. One argument to .reshape can be set to -1 (in this case the first dimension) to let PyTorch figure it out automatically based on the shape of the original tensor.
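A short sketch of the -1 inference and the view behavior, using a hypothetical random batch of 64 images shaped like MNIST:

```python
import torch

xb = torch.rand(64, 1, 28, 28)      # hypothetical batch shaped like MNIST images

xb_flat = xb.reshape(-1, 28 * 28)   # -1 lets PyTorch infer the batch dimension
print(xb_flat.shape)                # torch.Size([64, 784])

# xb is contiguous, so reshape returns a view that shares the same memory
print(xb_flat.data_ptr() == xb.data_ptr())  # True

# flatten(start_dim=1) gives the same result while keeping the batch dimension explicit
print(torch.equal(xb_flat, xb.flatten(start_dim=1)))  # True
```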

Note that the model no longer has .weight and .bias attributes (as they are now inside the .linear attribute), but it does have a .parameters method, which returns an iterator over the weights and bias (wrapping it in list() shows them, as above) and can be passed to a PyTorch optimizer.
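A sketch of inspecting those parameters and handing them to an optimizer. TinyMnistModel is a hypothetical copy of MnistModel without the debug prints, and the SGD learning rate is an arbitrary illustrative value, not a recommendation from this notebook:

```python
import torch
import torch.nn as nn

class TinyMnistModel(nn.Module):
    """Hypothetical stand-in for MnistModel above, without the debug prints."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(28 * 28, 10)

    def forward(self, xb):
        return self.linear(xb.reshape(-1, 28 * 28))

demo = TinyMnistModel()

# named_parameters() yields (name, tensor) pairs for every learnable parameter
for name, p in demo.named_parameters():
    print(name, tuple(p.shape))
names = [name for name, _ in demo.named_parameters()]

# Total number of learnable values: 784 * 10 weights + 10 biases
total = sum(p.numel() for p in demo.parameters())
print(total)  # 7850

# An optimizer is built directly from parameters(); lr is an arbitrary choice here
demo_optimizer = torch.optim.SGD(demo.parameters(), lr=0.001)
```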


In [None]:
for images, labels in train_loader:
    outputs = model(images)
    break

print('outputs shape: ', outputs.shape)
print('Sample outputs: \n', outputs[:2].data)


tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0824, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]])
tensor([[-1.5296e-01,  3.8717e-02, -1.3359e-02, -1.7384e-01, -8.1495e-02,
          3.2846e-01, -1.8180e-01,  1.2243e-02,  1.0970e-01,  8.6983e-02],
        [-4.7125e-01,  6.0984e-01, -8.0295e-01, -2.2474e-01,  3.9914e-01,
         -5.0947e-03, -9.5121e-01, -2.9335e-01,  2.6338e-01, -1.7028e-01],
        [-2.7910e-01,  3.4566e-01, -5.7364e-01, -2.6045e-01,  3.8741e-01,
          8.9853e-02, -6.4902e-01, -1.1373e-01,  1.2724e-01,  2.1690e-03],
        [-5.8175e-01,  5.6425e-01, -1.0862e+00, -2.1219e-01,  3.3368e-01,
          2.4260e-01, -1.0067e+00, -4.0404e-01,  1.8727e-01, -2.8301e-01],
      

In [None]:
probs = F.softmax(outputs, dim = 1)

## Look at sample probabilities
print("Sample probabilities:\n", probs[:2].data)

print("\n")
## Add up the probabilities of an output row
print("Sum: ", torch.sum(probs[0]).item())
max_probs, preds = torch.max(probs, dim = 1)
print("\n")
print(preds)
print("\n")
print(max_probs)

Sample probabilities:
 tensor([[0.0851, 0.1030, 0.0978, 0.0833, 0.0914, 0.1377, 0.0826, 0.1003, 0.1106,
         0.1081],
        [0.0659, 0.1942, 0.0473, 0.0843, 0.1573, 0.1050, 0.0408, 0.0787, 0.1374,
         0.0890]])


Sum:  1.0


tensor([5, 1, 4, 1, 5, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 5, 1,
        1, 4, 7, 4, 1, 8, 1, 1, 5, 1, 1, 1, 5, 4, 8, 1, 1, 5, 1, 1, 4, 1, 5, 1,
        5, 1, 8, 1, 8, 4, 1, 1, 1, 4, 1, 1, 5, 4, 1, 5])


tensor([0.1377, 0.1942, 0.1531, 0.1926, 0.1360, 0.1625, 0.1146, 0.1303, 0.1580,
        0.1894, 0.1889, 0.1652, 0.1306, 0.1769, 0.1900, 0.1510, 0.1599, 0.1512,
        0.1389, 0.2430, 0.1269, 0.1304, 0.1504, 0.1725, 0.1349, 0.1705, 0.1281,
        0.1848, 0.1611, 0.1382, 0.1738, 0.1755, 0.1478, 0.1666, 0.1392, 0.2149,
        0.1569, 0.1331, 0.1574, 0.1834, 0.1964, 0.1113, 0.1829, 0.2230, 0.1554,
        0.1541, 0.1287, 0.1606, 0.1376, 0.1235, 0.1134, 0.1303, 0.1341, 0.1293,
        0.1363, 0.1912, 0.1838, 0.1591, 0.1359, 0.1982, 0.1312, 0
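F.softmax exponentiates each score and divides by the sum over its row, which is why each row of probs adds up to 1. A minimal sketch of the same computation by hand on a small hypothetical tensor of scores (demo_scores), including the common trick of subtracting each row's maximum first for numerical stability:

```python
import torch
import torch.nn.functional as F

demo_scores = torch.tensor([[2.0, 1.0, 0.1],
                            [0.5, 0.5, 3.0]])   # hypothetical raw scores: 2 examples, 3 classes

# softmax: exponentiate, then divide by the row sum.
# Subtracting the row max first avoids overflow and does not change the result.
shifted = demo_scores - demo_scores.max(dim=1, keepdim=True).values
manual_probs = shifted.exp() / shifted.exp().sum(dim=1, keepdim=True)

print(torch.allclose(manual_probs, F.softmax(demo_scores, dim=1)))  # True
print(manual_probs.sum(dim=1))  # each row sums to 1
```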

**Evaluation Metric and Loss Function**
Here we evaluate our model by finding the percentage of labels that were predicted correctly i.e. the accuracy of the predictions.

The == operator performs an element-wise comparison of two tensors with the same shape and returns a boolean tensor of the same shape, containing False for unequal elements and True for equal elements. Passing the result to torch.sum counts the True values, which gives the number of labels that were predicted correctly. Finally, we divide by the total number of images to get the accuracy.
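A worked example of those three steps on five hypothetical predictions (toy_preds and toy_labels are made-up values, not notebook outputs):

```python
import torch

toy_preds = torch.tensor([3, 1, 4, 1, 5])    # hypothetical predicted labels
toy_labels = torch.tensor([3, 1, 0, 1, 9])   # hypothetical true labels

matches = toy_preds == toy_labels            # boolean tensor: True where they agree
correct = torch.sum(matches).item()          # each True counts as 1, so 3 here
acc = correct / len(toy_preds)               # 3 / 5 = 0.6
print(matches, correct, acc)
```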

In [None]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim = 1)
    return(torch.tensor(torch.sum(preds == labels).item()/ len(preds)))

print("Accuracy: ",accuracy(outputs, labels))
print("\n")
loss_fn = F.cross_entropy
print("Loss Function: ",loss_fn)
print("\n")
## Loss for the current batch
loss = loss_fn(outputs, labels)
print(loss)

Accuracy:  tensor(0.0938)


Loss Function:  <function cross_entropy at 0x7f8e943293a0>


tensor(2.4179, grad_fn=<NllLossBackward0>)
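F.cross_entropy combines log-softmax with the negative log-likelihood loss (hence the NllLossBackward0 above), so it takes the raw outputs rather than probabilities. The sketch below reproduces it by hand on hypothetical random scores (toy_scores, toy_targets), and shows that giving all 10 classes equal scores yields a loss of ln(10) ≈ 2.303, in the same neighborhood as the 2.4179 printed above for the untrained model:

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
toy_scores = torch.randn(4, 10)            # hypothetical raw outputs: 4 images, 10 classes
toy_targets = torch.tensor([3, 0, 7, 1])   # hypothetical true labels

# Cross-entropy: average over the batch of -log(softmax probability of the true class)
log_probs = F.log_softmax(toy_scores, dim=1)
manual_loss = -log_probs[torch.arange(4), toy_targets].mean()
print(manual_loss.item(), F.cross_entropy(toy_scores, toy_targets).item())

# Equal scores for all 10 classes give probability 1/10 each, so the loss is ln(10)
uniform_loss = F.cross_entropy(torch.zeros(4, 10), toy_targets).item()
print(uniform_loss, math.log(10))
```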
