# 4. Multi-Input & Multi-Output Architectures

Build multi-input and multi-output models that handle tasks requiring more than one input or producing more than one output. You will design and train these models in PyTorch and then turn to a crucial topic for multi-output models: loss weighting, that is, balancing the importance of the different tasks when a single model is trained to perform several of them simultaneously.

In [39]:
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
import torch.nn as nn
import torch
import torch.optim as optim
from torchmetrics import Accuracy

In [37]:
import os

## Multi-input models

### Two-input dataset

Building a multi-input model starts with crafting a custom dataset that can supply all the inputs to the model. In this exercise, you will build an Omniglot dataset class that serves triplets consisting of:
- The image of a character to be classified,
- The one-hot encoded alphabet vector of length 30: all zeros except for a single one marking the ID of the alphabet the character comes from,
- The target label, an integer between 0 and 963.
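A minimal sketch of how such a one-hot alphabet vector can be built with `torch.nn.functional.one_hot`, assuming alphabet IDs are integers in `[0, 29]`. The helper `alphabet_one_hot` and the image path are hypothetical, not part of the exercise:

```python
import torch
import torch.nn.functional as F

NUM_ALPHABETS = 30


def alphabet_one_hot(alphabet_id: int) -> torch.Tensor:
    """Hypothetical helper: length-30 float vector with a single 1 at alphabet_id."""
    # one_hot needs an integer (int64) tensor; cast to float so it can feed nn.Linear
    return F.one_hot(torch.tensor(alphabet_id), num_classes=NUM_ALPHABETS).float()


# A hypothetical (image path, alphabet vector, label) triplet
sample = ("path/to/character_image.png", alphabet_one_hot(4), 123)
img_path, alphabet, label = sample
print(alphabet.shape)          # torch.Size([30])
print(int(alphabet.argmax()))  # 4
```

Storing the vector as a tensor rather than a Python list matters for batching: the default `DataLoader` collation stacks equal-shaped tensors into a `(batch, 30)` tensor, whereas a list of numbers would be transposed into 30 separate tensors of length `batch`.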

You are provided with `train_samples`, a list of 3-tuples, each comprising an image's file path, its alphabet vector, and the target label. The necessary imports (in the cells above) have already been done for you, so let's get to it!

Instructions:
- Assign transform and samples to class attributes with the same names.
- Implement the `.__len__()` method so that it returns the number of samples stored in the class's `samples` attribute.
- Unpack the sample at index `idx` assigning its contents to `img_path`, `alphabet`, and `label`.
- Transform the loaded image with `self.transform()` and assign it to `img_transformed`.

In [4]:
class OmniglotDataset(Dataset):
    def __init__(self, transform, samples):
        # Assign transform and samples to class attributes
        self.transform = transform
        self.samples = samples

    def __len__(self):
        # Return number of samples
        return len(self.samples)

    def __getitem__(self, idx):
        # Unpack the sample at index idx
        img_path, alphabet, label = self.samples[idx]
        img = Image.open(img_path).convert("L")
        # Transform the image
        img_transformed = self.transform(img)
        return img_transformed, alphabet, label

In [None]:
dataset_train = OmniglotDataset(
    transform=transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Resize((64, 64)),
        ]
    ),
    samples=train_samples,  # provided by the exercise environment
)

dataloader_train = DataLoader(
    dataset_train,
    shuffle=True,
    batch_size=3,
)

### Two-input model

With the data ready, it's time to build the two-input model architecture! To do so, you will set up a model class with the following methods:

- `.__init__()`, in which you define sub-networks by grouping layers: one sub-network for each of the two inputs, plus the classifier that returns a classification score for each class.
- `.forward()`, in which you pass both inputs through their corresponding sub-networks, concatenate the outputs, and pass the result to the classifier.

Instructions:

- Define image, alphabet and classifier sub-networks as sequential models, assigning them to `self.image_layer`, `self.alphabet_layer` and `self.classifier`, respectively.
- Pass the image and alphabet through the appropriate model layers.
- Concatenate the outputs from image and alphabet layers and assign the result to `x`.

In [6]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define sub-networks as sequential models
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ELU(),
            nn.Flatten(),
            nn.Linear(16 * 32 * 32, 128),
        )
        self.alphabet_layer = nn.Sequential(
            nn.Linear(30, 8),
            nn.ELU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(128 + 8, 964),
        )

    def forward(self, x_image, x_alphabet):
        # Pass the x_image and x_alphabet through appropriate layers
        x_image = self.image_layer(x_image)
        x_alphabet = self.alphabet_layer(x_alphabet)
        # Concatenate x_image and x_alphabet
        x = torch.cat((x_image, x_alphabet), dim=1)
        return self.classifier(x)
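The `16 * 32 * 32` input size of the image branch's linear layer and the `128 + 8` input size of the classifier can be checked by pushing dummy tensors through the layers. A sketch with random inputs (the batch size of 3 is arbitrary):

```python
import torch
import torch.nn as nn

# Convolutional front end of image_layer, up to and including Flatten
image_features = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # (B, 1, 64, 64) -> (B, 16, 64, 64)
    nn.MaxPool2d(kernel_size=2),                 # -> (B, 16, 32, 32)
    nn.ELU(),
    nn.Flatten(),                                # -> (B, 16 * 32 * 32) = (B, 16384)
)

x_image = torch.randn(3, 1, 64, 64)
flat = image_features(x_image)
print(flat.shape)  # torch.Size([3, 16384])

# Concatenate the two branch outputs along dim=1 (the feature axis)
img_emb = nn.Linear(16 * 32 * 32, 128)(flat)      # (3, 128)
alpha_emb = nn.Linear(30, 8)(torch.randn(3, 30))  # (3, 8)
x = torch.cat((img_emb, alpha_emb), dim=1)
print(x.shape)  # torch.Size([3, 136])
```

Concatenating along `dim=0` would instead try to join the two tensors along the batch axis and fail, because their feature sizes (128 and 8) differ.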

## Multi-output models

### Two-output Dataset and DataLoader

In this and the following exercises, you will build a two-output model that predicts both the character and the alphabet it comes from, based on the character's image. As always, you will start by getting the data ready.

The `OmniglotDataset` class you created earlier is available, along with an updated `samples` list in which each entry holds the image path, the alphabet ID, and the character ID. Let's use them to build the Dataset and the DataLoader.
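With plain integer IDs in each sample, e.g. `('omniglot_train/Gujarati/character45/0462_20.png', 0, 1)`, the default `DataLoader` collation turns the labels into the `int64` class-index tensors that `nn.CrossEntropyLoss` expects. A small sketch using dummy image tensors and made-up IDs:

```python
import torch
from torch.utils.data.dataloader import default_collate

# Two hypothetical dataset items: (transformed image, alphabet_id, char_id)
batch = [
    (torch.zeros(1, 64, 64), 0, 1),
    (torch.zeros(1, 64, 64), 3, 57),
]

images, labels_alpha, labels_char = default_collate(batch)
print(images.shape)       # torch.Size([2, 1, 64, 64])
print(labels_alpha)       # tensor([0, 3])
print(labels_char.dtype)  # torch.int64
```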

In [34]:
# Build (image_path, alphabet_id, char_id) triplets from the folder tree
# <root_folder>/<alphabet>/<character>/<image>
root_folder = os.path.join("datasets", "omniglot_train")
included_extensions = (".jpg", ".jpeg", ".bmp", ".png")

samples = []
char_id = 0  # global character ID, counted across all alphabets

# Sort directory listings so that label assignment is deterministic
alphabets_folders = sorted(
    os.path.join(root_folder, name)
    for name in os.listdir(root_folder)
    if os.path.isdir(os.path.join(root_folder, name))
)

for alphabet_id, alphabet_folder in enumerate(alphabets_folders):
    characters_folders = sorted(
        os.path.join(alphabet_folder, name)
        for name in os.listdir(alphabet_folder)
        if os.path.isdir(os.path.join(alphabet_folder, name))
    )

    for character_folder in characters_folders:
        images_list = sorted(
            os.path.join(character_folder, name)
            for name in os.listdir(character_folder)
            if os.path.isfile(os.path.join(character_folder, name))
            and name.lower().endswith(included_extensions)
        )
        samples.extend((img_path, alphabet_id, char_id) for img_path in images_list)
        char_id += 1

# Expected sample format, e.g. ('omniglot_train/Gujarati/character45/0462_20.png', 0, 1)
print(samples[0][0])
print(len(samples))

datasets\omniglot_train\Alphabet_of_the_Magi\character01\0709_01.png
14400


Instructions:

- Print the element of `samples` at index `100` and examine its structure.
- Use your `OmniglotDataset` to create `dataset_train`, passing the two image transforms you used before: convert the image to a tensor and resize it to `(64, 64)`.
- Create `dataloader_train` from `dataset_train`; shuffle the training images and set batch size to `32`.

In [None]:
# Print the sample at index 100
print(samples[100])

# Create dataset_train
dataset_train = OmniglotDataset(
    transform=transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Resize((64, 64)),
        ]
    ),
    samples=samples,
)

# Create dataloader_train
dataloader_train = DataLoader(
    dataset_train,
    shuffle=True,
    batch_size=32,
)

### Two-output model architecture

In this exercise, you will construct a multi-output neural network architecture capable of predicting the character and the alphabet.

Recall the general structure: in the `.__init__()` method, you define the layers to be used later in the forward pass. In the `.forward()` method, you first pass the input image through a couple of layers to obtain its embedding, which is then fed into two separate classifier layers, one for each output.

Instructions:

- Define `self.classifier_alpha` and `self.classifier_char` as linear layers whose input size matches the output size of `image_layer` (`128`), and whose output sizes equal the number of alphabets (`30`) and the number of characters (`964`), respectively.
- Pass the image embedding `x_image` separately through each of the classifiers, assigning the results to `output_alpha` and `output_char`, respectively, and return them in this order.

In [35]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ELU(),
            nn.Flatten(),
            nn.Linear(16 * 32 * 32, 128),
        )
        # Define the two classifier layers
        self.classifier_alpha = nn.Linear(128, 30)
        self.classifier_char = nn.Linear(128, 964)

    def forward(self, x):
        x_image = self.image_layer(x)
        # Pass x_image through the classifiers and return both results
        output_alpha = self.classifier_alpha(x_image)
        output_char = self.classifier_char(x_image)
        return output_alpha, output_char
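A quick forward pass with a random batch confirms that the shared embedding yields one logit vector per task. The class below repeats `Net` under the hypothetical name `TwoOutputNet` so the block runs on its own:

```python
import torch
import torch.nn as nn


class TwoOutputNet(nn.Module):
    """Same layout as Net above: one shared image embedding, two classifier heads."""

    def __init__(self, num_alphabets=30, num_chars=964):
        super().__init__()
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ELU(),
            nn.Flatten(),
            nn.Linear(16 * 32 * 32, 128),
        )
        self.classifier_alpha = nn.Linear(128, num_alphabets)
        self.classifier_char = nn.Linear(128, num_chars)

    def forward(self, x):
        x_image = self.image_layer(x)  # shared 128-dimensional embedding
        return self.classifier_alpha(x_image), self.classifier_char(x_image)


net = TwoOutputNet()
outputs_alpha, outputs_char = net(torch.randn(4, 1, 64, 64))
print(outputs_alpha.shape)  # torch.Size([4, 30])
print(outputs_char.shape)   # torch.Size([4, 964])
```

Because both heads read the same `x_image`, gradients from both tasks flow back into `image_layer` during training; this sharing is why the relative weighting of the two losses matters.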

### Training multi-output models

When training a model with multiple outputs, the loss function must account for every output.

In this case, the model produces two outputs: predictions for the alphabet and for the character. Each has corresponding ground-truth labels, so you can compute two separate losses: one incurred by incorrect alphabet classifications and one by incorrect character classifications. Since both are multi-class classification tasks (each image belongs to exactly one alphabet and one character), the cross-entropy loss applies in both cases.

Gradient descent, however, minimizes a single scalar loss. You will therefore define the total loss as the sum of the alphabet and character losses.
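A minimal sketch of why summing works: the total is a single scalar, and one `backward()` call propagates gradients from both partial losses into the shared parameters. Random inputs and labels, and a small linear layer standing in for the shared `image_layer`, replace the real model here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Linear(8, 16)       # stand-in for the shared image_layer
head_alpha = nn.Linear(16, 30)  # alphabet head
head_char = nn.Linear(16, 964)  # character head
criterion = nn.CrossEntropyLoss()

features = shared(torch.randn(5, 8))
labels_alpha = torch.randint(0, 30, (5,))  # class indices, not one-hot
labels_char = torch.randint(0, 964, (5,))

loss_alpha = criterion(head_alpha(features), labels_alpha)
loss_char = criterion(head_char(features), labels_char)
loss = loss_alpha + loss_char  # a single scalar
loss.backward()

print(loss.dim())                      # 0
print(shared.weight.grad is not None)  # True
```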

Instructions:

- Calculate the alphabet classification loss and assign it to `loss_alpha`.
- Calculate the character classification loss and assign it to `loss_char`.
- Compute the total loss as the sum of the two partial losses and assign it to `loss`.

In [None]:
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.05)

for epoch in range(1):
    for images, labels_alpha, labels_char in dataloader_train:
        optimizer.zero_grad()
        outputs_alpha, outputs_char = net(images)
        # Compute alphabet classification loss
        loss_alpha = criterion(outputs_alpha, labels_alpha)
        # Compute character classification loss
        loss_char = criterion(outputs_char, labels_char)
        # Compute total loss
        loss = loss_alpha + loss_char
        # Weighted alternative; char_weight is a hyperparameter in [0, 1], not defined here:
        # loss = ((1 - char_weight) * loss_alpha) + (char_weight * loss_char)
        loss.backward()
        optimizer.step()
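The commented-out line in the loop hints at loss weighting. A sketch, assuming a hyperparameter `char_weight` in `[0, 1]` (not defined in this notebook) and a hypothetical helper `weighted_loss`: the weighted total is a convex combination of the two losses. The block also shows one reason to weight at all: with uniform predictions (all-zero logits), cross-entropy equals `ln(num_classes)`, so the 964-way character loss starts out roughly twice as large as the 30-way alphabet loss.

```python
import math

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# All-zero logits = uniform prediction; cross-entropy is then ln(num_classes)
loss_alpha = criterion(torch.zeros(4, 30), torch.zeros(4, dtype=torch.long))
loss_char = criterion(torch.zeros(4, 964), torch.zeros(4, dtype=torch.long))
print(round(loss_alpha.item(), 3), round(math.log(30), 3))  # 3.401 3.401
print(round(loss_char.item(), 3), round(math.log(964), 3))  # 6.871 6.871


def weighted_loss(loss_alpha, loss_char, char_weight):
    """Hypothetical helper: convex combination of the two task losses."""
    return (1 - char_weight) * loss_alpha + char_weight * loss_char


# char_weight=0.5 gives half the plain sum; a lower char_weight favors the alphabet task
loss_equal = weighted_loss(loss_alpha, loss_char, char_weight=0.5)
loss_alpha_heavy = weighted_loss(loss_alpha, loss_char, char_weight=0.2)
```

With plain SGD, `char_weight=0.5` differs from the unweighted sum only by halving the effective learning rate; shifting the weight away from 0.5 is what changes the balance between the tasks.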

## Evaluation of multi-output models and loss weighting

### Multi-output model evaluation

In this exercise, you will practice model evaluation for multi-output models. Your task is to write a function called `evaluate_model()` that takes an alphabet-and-character-predicting model as input, runs the evaluation loop, and prints the model's accuracy on each of the two tasks.

Instructions:

- Define `acc_alpha` and `acc_char` as multi-class `Accuracy()` metrics for the two outputs, alphabets and characters, with the appropriate number of classes each (there are `30` alphabets and `964` characters in the dataset).
- Define the evaluation loop by iterating over test `images`, `labels_alpha`, and `labels_char`.
- Inside the for-loop, obtain model results for the test data batch and assign them to `outputs_alpha`, `outputs_char`.
- Update the two accuracy metrics with the current batch's data.

In [None]:
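# dataloader_test is provided by the exercise environment; it is assumed to be
# built like dataloader_train, but from held-out test samples.
dataloader_test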

In [None]:
def evaluate_model(model):
    # Define accuracy metrics
    acc_alpha = Accuracy(task="multiclass", num_classes=30)
    acc_char = Accuracy(task="multiclass", num_classes=964)

    model.eval()
    with torch.no_grad():
        for images, labels_alpha, labels_char in dataloader_test:
            # Obtain model outputs
            outputs_alpha, outputs_char = model(images)
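            # Predicted class = index of the highest logit
            _, pred_alpha = torch.max(outputs_alpha, 1)
            _, pred_char = torch.max(outputs_char, 1)
            # Accumulate this batch's statistics in both accuracy metrics
            acc_alpha.update(pred_alpha, labels_alpha)
            acc_char.update(pred_char, labels_char)

    # compute() aggregates over all batches seen so far
    print(f"Alphabet: {acc_alpha.compute().item():.4f}")
    print(f"Character: {acc_char.compute().item():.4f}")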
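`Accuracy` accumulates statistics over all batches, and `compute()` returns the result for the whole test set. A plain-PyTorch sketch of that accumulation for a single output, assuming micro averaging (correct predictions divided by total samples; this is the default `average` of torchmetrics' `Accuracy` wrapper in recent versions, but passing `average="micro"` explicitly removes any doubt). The logits and targets are made up for a 3-class task:

```python
import torch

# Two hypothetical test batches of logits and integer targets
batches = [
    (torch.tensor([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0]]), torch.tensor([0, 1, 1])),
    (torch.tensor([[0.0, 0.5, 2.5], [0.3, 0.1, 1.0]]), torch.tensor([2, 0])),
]

correct, total = 0, 0
for logits, targets in batches:
    preds = logits.argmax(dim=1)  # same as torch.max(logits, 1)[1]
    correct += (preds == targets).sum().item()
    total += targets.numel()

accuracy = correct / total
print(accuracy)  # 0.6
```

Averaging per-batch accuracies instead would give the smaller last batch too much weight: here (2/3 + 1/2) / 2 ≈ 0.583 rather than 0.6.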