# Image classifier

Building a single layer NN in PyTorch.

Using my notes from the machine learning module last year: https://github.com/hannahjayneknight/machine-learning 

Also see: https://machinelearningmastery.com/building-an-image-classifier-with-a-single-layer-neural-network-in-pytorch/ 

To do / thoughts:
- Try making my own dataset loader class?
- Reduce size of images to speed up
- Use CNN
- Further data augmentation: https://towardsdatascience.com/custom-dataset-in-pytorch-part-1-images-2df3152895 

In [1]:
import torch
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

NB: Images are 1920 x 1080, 2.9MB, 72 dpi, 32 bit

Good explanation of why we need to transform: https://www.kaggle.com/code/leifuer/intro-to-pytorch-loading-image-data

In [2]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

train = datasets.ImageFolder(root='C:/Users/hanna/Desktop/git/interiorcardamage/Data/train', transform=transform)
test  = datasets.ImageFolder(root='C:/Users/hanna/Desktop/git/interiorcardamage/Data/test', transform=transform)

trainset = torch.utils.data.DataLoader(train, batch_size=1, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=1, shuffle=True)

In [3]:
for X, y in testset:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([1, 3, 1080, 1920])
Shape of y: torch.Size([1]) torch.int64


ReLU activations and the Adamax optimizer (a variant of Adam)

In [20]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3*1080*1920, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10) # output layer: raw class scores (wx+b); the activation is applied in forward()

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)  # F.nll_loss expects log-probabilities, not probabilities

net = Net()

optimizer = optim.Adamax(net.parameters(), lr=0.001)

for epoch in range(1):
    for data in trainset:
        X, y = data
        net.zero_grad()
        output = net.forward(X.view(-1, 3*1080*1920))
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X.view(-1, 3*1080*1920))
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

Accuracy:  0.5
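
A rough parameter count for the fully connected network above, assuming the layer sizes are exactly those in `Net`. It shows that almost all weights sit in the first layer, because each full-resolution image is flattened into 3*1080*1920 inputs.

```python
# (inputs, outputs) for each nn.Linear in Net, copied from the class above.
layer_sizes = [(3 * 1080 * 1920, 64), (64, 64), (64, 64), (64, 10)]

def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

counts = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(counts)

print(counts[0])  # 398131264 parameters in fc1 alone
print(total)      # 398140234 parameters in total
print(round(total * 4 / 1e9, 2), "GB as float32")  # 1.59 GB as float32
```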


Image size: 1920 x 1080
Filter: [H, W] = [3, 3]
Padding: [1, 1]
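
The numbers above can be checked with the standard output-size formula for a convolution without dilation, `out = floor((in + 2*padding - kernel) / stride) + 1`. The helper name `conv_output_size` is made up for this sketch.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

height, width = 1080, 1920

# 3x3 filter with padding 1 keeps the image size.
print(conv_output_size(height, kernel=3, padding=1),
      conv_output_size(width, kernel=3, padding=1))   # 1080 1920

# 5x5 filter needs padding 2 for the same effect.
print(conv_output_size(height, kernel=5, padding=2))  # 1080

# Without padding, a 3x3 filter trims one pixel from each side.
print(conv_output_size(height, kernel=3, padding=0))  # 1078
```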

Kernel size:
- https://towardsdatascience.com/deciding-optimal-filter-size-for-cnns-d6f7b56f9363
- https://medium.com/analytics-vidhya/how-to-choose-the-size-of-the-convolution-filter-or-kernel-size-for-cnn-86a55a1e2d15
- https://blog.paperspace.com/padding-in-convolutional-neural-networks/

In [10]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # applies 2D convolution
        # first 2 params are the input and output channel counts,
        # 3rd is the size of the convolutional kernel (5x5 - based on what works well)
        # last param is the zero padding added to each side of the image

        # padding=2 keeps the output feature map the same height and width as the input:
        # with stride 1, output = input + 2*padding - kernel + 1 = input + 4 - 5 + 1 = input
        self.conv1 = nn.Conv2d(3, 32, 5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)

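        # 64*7*7 assumes a 7x7 feature map after both pools (e.g. 28x28 inputs);
        # 1080x1920 inputs give a 64x270x480 feature map here
        self.fc1 = nn.Linear(64*7*7, 128)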
        self.fc2 = nn.Linear(128, 10)

    def convs(self, x):
        # in each convolutional layer, ReLU is the activation function
        # max_pool2d() takes the max value of a patch of the image (patch = 2x2)
        # --> this reduces the dimension of the matrix and the number of features to learn
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))

        return x

    def forward(self, x):
        x = self.convs(x) # applies two convolutional layers
        #x = x.view(-1, 64*7*7) # flatten the 64x7x7 tensor into a vector (disabled, so fc1 receives a 4D tensor)
        #print(x)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return F.log_softmax(x, dim=1)  # F.nll_loss expects log-probabilities, not probabilities


In [11]:
net = Net()

optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(3):
    for data in trainset:
        X, y = data
        net.zero_grad()
        output = net.forward(X) # no X.view here: any flattening has to happen inside forward()
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()

    #print("loss: ", loss)

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X)
        for idx, i in enumerate(output):
            # argmax picks the index of the highest-scoring class in the network output
            # y[idx] is the true class index for that sample
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (17280x480 and 3136x128)
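
Reading the shapes in the error: after two 2x2 max-pools, a 1080 x 1920 image becomes a 64 x 270 x 480 feature map. With the flatten step commented out, `fc1` treats the last dimension (480) as the features and everything else as rows (1*64*270 = 17280), while its weight expects 64*7*7 = 3136 inputs. One possible fix, sketched below, is to pool the feature map down to 7 x 7 and then flatten. The class name `ConvNet` and the `nn.AdaptiveAvgPool2d` step are assumptions for illustration, not part of the original notebook, and small random images stand in for real data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):  # hypothetical name, to keep it apart from Net above
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)
        # Assumption: shrink any feature-map size to 7 x 7 so fc1 can stay 64*7*7.
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = self.pool(x)                    # [N, 64, 7, 7] for any input size
        x = torch.flatten(x, start_dim=1)   # [N, 64, 7, 7] -> [N, 3136]
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)  # log-probabilities for F.nll_loss

model = ConvNet()
small_batch = torch.randn(2, 3, 108, 192)  # small stand-in images, same aspect ratio
log_probs = model(small_batch)
print(log_probs.shape)  # torch.Size([2, 10])
```

The alternative of sizing `fc1` for the full 64*270*480 feature map would need over a billion weights in that one layer, so shrinking the input or the feature map first is likely the more practical route.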

To try:
- 3x3 kernel size, since this is commonly used.
- Non-square kernel, stride or padding
- Understand what dilation is
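
The items above can be explored with a small shape experiment. Dilation spaces the kernel taps apart, so a kernel of size `k` with dilation `d` covers `d*(k-1)+1` pixels along each axis; a 3x3 kernel with dilation 2 spans 5x5 and needs padding 2 to keep the size. The channel counts and the small input below are arbitrary choices for this sketch.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 54, 96)  # small stand-in for a 1080 x 1920 image

layers = {
    "3x3, padding 1": nn.Conv2d(3, 8, kernel_size=3, padding=1),
    "3x5 (non-square), padding (1, 2)": nn.Conv2d(3, 8, kernel_size=(3, 5), padding=(1, 2)),
    "3x3, dilation 2, padding 2": nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2),
    "3x3, stride 2, padding 1": nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=2),
}

# Record the output shape of each variant for the same input.
shapes = {name: tuple(layer(x).shape) for name, layer in layers.items()}
for name, shape in shapes.items():
    print(name, "->", shape)
# The first three keep 54 x 96; the strided layer halves it to 27 x 48.
```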
