## Convolutional neural networks
Let’s consider the case where the input data is color images. Convolutional neural networks were designed to process such data, so we’ll use one of the classic models for this.

We will be processing the standard CIFAR10 dataset, which includes photographs of ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each class contains 6000 images (5000 training and 1000 testing), with each image being a 32x32 pixel color image (with three RGB color channels).

Due to the nature of color images, we cannot use them in their raw form immediately: first, we need to normalize the images, converting them into images with color intensity in the range of 0 to 1. This is done using the standard Normalize() function from torchvision.transforms, which typically takes standard values (mean and standard deviation) for such normalization as parameters (0.5, 0.5, 0.5).

In other words, we need to perform a composition of transformations: first, convert the images to tensors, and then normalize them.

The composition is performed using the standard torchvision.transforms.Compose() function. We then pass the result as the transform parameter to the CIFAR10 constructor.


For minor technical reasons (the need for data alignment when transitioning from convolutional layers to linear layers), it is more convenient to represent it in the format of a classic model rather than just a composition of layers.

The first layer, Conv2d(3, 6, 5) with the ReLU activation function, creates a set of convolutional filters. The first parameter, 3, is the number of input channels for the images (three colors). The second parameter, 6, is the number of output channels, and the third parameter is the filter size (5x5). The output is 6 filters of size 3x5x5, and the model in total has (3 * 5 * 5 + 1) * 6 = 456 parameters. The output size of the layer will be 6 * 28 * 28, where 28 = ((32 - 5) + 1).

The MaxPool2d(2,2) method implements max pooling (for details on its arguments, see the link above). The kernel_size is the pooling window size, and stride is the pooling step. Thus, we reduce the output size of the layer by half: from 6 * 28 * 28 to 6 * 14 * 14.

Next, the Conv2d(6, 16, 5) function is applied again, where the six output channels of the previous function are used as inputs. Now we apply 16 filters (each size 6 * 5 * 5), and the output size of the layer will be 16 * 10 * 10, where 10 = (14 - 5) + 1. The total number of parameters at this level is (5 * 5 * 6 + 1) * 16 = 2416 parameters.

The next max pooling reduces this output by half, from 16 * 10 * 10 to 16 * 5 * 5.

Finally, three fully connected layers (Linear) are added. Note that before these layers, we need to modify the structure of the data being passed, as convolutional layers work with two-dimensional images, while linear layers work with vector sets. This transformation is done using x.view(-1, 16 * 5 * 5).

The first linear layer with 120 nodes receives 16 * 5 * 5 inputs, requiring (16 * 5 * 5 + 1) * 120 = 48120 parameters, and then the number of inputs and outputs is reduced through the following layers to our final 10 classes (the last level requires (84+1) * 10 = 850 parameters).

In [3]:
import torch
import torch.nn as nn
import torchvision
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.nn import Linear, Sigmoid

input_size = 3*32*32   # Image size in pixels * number of colors
num_classes = 10       # Number of recognized classes (10 types of images)
n_epochs = 2           # Number of epochs
batch_size = 4         # Mini-batch size of input data
lr = 0.001             # Learning rate


transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download Training and testing CIFAR10 datasets
cifar_trainset = dsets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar_testset = dsets.CIFAR10(root='./data', train=False, download=True, transform=transform)
print(len(cifar_trainset))
print(len(cifar_testset))

train_loader = torch.utils.data.DataLoader(dataset=cifar_trainset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=cifar_testset,
                                          batch_size=batch_size,
                                          shuffle=False)



device = 'cuda' if torch.cuda.is_available() else 'cpu'


# our beloved train_step func :) 
def make_train_step(model, loss_fn, optimizer):
    def train_step(x, y):
        model.train()
        yhat = model(x)
        loss = loss_fn(yhat, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()
    return train_step

import torch.nn as nn
import torch.nn.functional as F


class CifarModel(nn.Module):
    def __init__(self):
        super(CifarModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

from torch import optim, nn

model = CifarModel()
model.to(device)

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

train_step = make_train_step(model, loss_fn, optimizer)

for epoch in range(n_epochs):
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        loss = train_step(images, labels)

# print(model.state_dict())
print(loss)

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy: {} %'.format(100 * correct / total))

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(4):
    print('Accuracy for %5s : %2d %%' % (
        labels[i], 100 * class_correct[i] / class_total[i]))

Files already downloaded and verified
Files already downloaded and verified
50000
10000
0.9823124408721924
Accuracy: 56.02 %
Accuracy for tensor(3) : 63 %
Accuracy for tensor(5) : 69 %
Accuracy for tensor(1) : 53 %
Accuracy for tensor(7) : 26 %


---

Great! We should get something around 55%!

And  this is only for 2 epochs! The good news that simply increasing the number of epochs we can improve accuracy.
After 10 epochs our model will be 80% accurate.

But it is not cool, because it is too simple, right? Lets challenge ourself and try to find *another ways* to increase model accuraccy (No increasing enochs).

---

We are going to add *two additional convolutional layers*, and also increase the number of filters to 32, 64, 128, and 256, which should allow the model to capture more complex patterns. Besides that, we will add a Dropout layer after the first fully connected one to reduce the possibility of overfitting

In [5]:
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import torchvision
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.nn import Linear, Sigmoid

input_size = 3*32*32   # Image size in pixels * number of colors
num_classes = 10       # Number of recognized classes (10 types of images)
n_epochs = 2           # Number of epochs
batch_size = 64        # Increased batch size
lr = 0.001             # Learning rate

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download Training and testing CIFAR10 datasets
cifar_trainset = dsets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar_testset = dsets.CIFAR10(root='./data', train=False, download=True, transform=transform)
print(len(cifar_trainset))
print(len(cifar_testset))

train_loader = torch.utils.data.DataLoader(dataset=cifar_trainset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=cifar_testset,
                                          batch_size=batch_size,
                                          shuffle=False)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# our beloved train_step func :) 
def make_train_step(model, loss_fn, optimizer):
    def train_step(x, y):
        model.train()
        yhat = model(x)
        loss = loss_fn(yhat, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()
    return train_step

class ImprovedCifarModel(nn.Module):
    def __init__(self):
        super(ImprovedCifarModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.5)
        self.fc1 = nn.Linear(256 * 2 * 2, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = x.view(-1, 256 * 2 * 2)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = ImprovedCifarModel()
model.to(device)

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

train_step = make_train_step(model, loss_fn, optimizer)

for epoch in range(n_epochs):
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        loss = train_step(images, labels)
    print(f"Epoch [{epoch+1}/{n_epochs}], Loss: {loss:.4f}")

with torch.no_grad(): # checking on testing dataset
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy: {} %'.format(100 * correct / total))

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(len(labels)):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(4):
    print('Accuracy for %5s : %2d %%' % (
        labels[i], 100 * class_correct[i] / class_total[i]))

Files already downloaded and verified
Files already downloaded and verified
50000
10000
Epoch [1/2], Loss: 1.1228
Epoch [2/2], Loss: 1.0673
Accuracy: 63.77 %
Accuracy for tensor(7) : 87 %
Accuracy for tensor(5) : 76 %
Accuracy for tensor(8) : 48 %
Accuracy for tensor(0) : 31 %


We should achieve a total accuracy of around 61-66% now! Wow! I'm not sure what the spread is due to, but we can still increase the number of epochs.