<a href="https://colab.research.google.com/github/jacobrdavis/CSE546_image_classification_on_cifar_10/blob/main/CSE546_image_classification_on_cifar_10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Image Classification on CIFAR-10
In this problem we will explore different deep learning architectures for image classification on the CIFAR-10 dataset. Make sure that you are familiar with torch `Tensor`s, two-dimensional convolutions (`nn.Conv2d`) and fully-connected layers (`nn.Linear`), ReLU non-linearities (`F.relu`), pooling (`nn.MaxPool2d`), and tensor reshaping (`view`).

We will use Colab because it has free GPU runtimes available; GPUs can accelerate training times for this problem by 10-100x. **You will need to enable the GPU runtime to use it**. To do so, click "Runtime" above and then "Change runtime type". Then, under "Hardware accelerator", choose "GPU".

This notebook provides some starter code for the CIFAR-10 problem on HW4, including a completed training loop to assist with some of the PyTorch setup. You'll need to modify this code to implement the layers required for the assignment, but this provides a working training loop to start from.

*Note: GPU runtimes are limited on Colab. Limit your training to short-running jobs (around 20 minutes or less) and spread training out over time, if possible. Colab WILL limit your usage of GPU time, so plan ahead and be prepared to take breaks during training.* We also suggest performing your early coding/sweeps on a small fraction of the dataset (~10%) to minimize training time and GPU usage.

In [None]:
import torch
from torch import nn
from torch.distributions import uniform
import numpy as np

from typing import Tuple, Union, List, Callable
from torch.optim import SGD
import torchvision
from torch.utils.data import DataLoader, TensorDataset, random_split
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

Let's verify that we are using a GPU:

In [None]:
# assert torch.cuda.is_available(), "GPU is not available, check the directions above (or disable this assertion to use CPU)"

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)  # this should print "cuda" when a GPU is available

To use the GPU you will need to send both the model and data to a device; this transfers the model from its default location on CPU to the GPU.

Note that torch operations on Tensors will fail if they are not located on the same device.

```python
model = model.to(DEVICE)  # Sending a model to GPU

for x, y in tqdm(data_loader):
  x, y = x.to(DEVICE), y.to(DEVICE)
```
When reading tensors, you may need to send them back to the CPU; you can do so with `x = x.cpu()`.

Let's load CIFAR-10 data. This is how we load datasets using PyTorch in the real world!

In [None]:
train_dataset = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=torchvision.transforms.ToTensor())
test_dataset = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=torchvision.transforms.ToTensor())

Here, we'll use the torch `DataLoader` to wrap our datasets. `DataLoader`s handle batching, shuffling, and iterating over data; they can also be useful for building more complex input pipelines that perform transformations such as data augmentation.
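
To make the batching behavior concrete, here is a minimal sketch that wraps a synthetic stand-in for CIFAR-10 in a `DataLoader`. The names `fake_images`, `fake_labels`, and `fake_dataset` are illustrative assumptions, not part of the starter code; only the tensor shapes mimic the real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10 (an assumption for illustration):
# 300 RGB images of size 32x32 with integer labels in [0, 10).
fake_images = torch.rand(300, 3, 32, 32)
fake_labels = torch.randint(low=0, high=10, size=(300,))
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_dataset, batch_size=128, shuffle=True)

# Iterating over the loader yields (images, labels) batches.
batch_shapes = [tuple(images.shape) for images, _ in loader]
print(len(loader))    # number of batches in one epoch
print(batch_shapes)   # the last batch holds the leftover examples
```

With 300 examples and a batch size of 128, the final batch contains only 44 images, because `DataLoader` keeps the incomplete last batch by default (`drop_last=False`).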

## For Reference: Logistic Regression

This problem is about deep learning architectures, not PyTorch. We are providing an implementation of logistic regression using SGD in torch, which can serve as a template for the rest of your implementation in this problem.

Before we get started, let's take a look at our data to get an understanding of what we are doing. CIFAR-10 is a dataset containing images split into 10 classes.

In [None]:
# imgs, labels = next(iter(train_loader))
# print(f"A single batch of images has shape: {imgs.size()}")
# example_image, example_label = imgs[0], labels[0]
# c, w, h = example_image.size()
# print(f"A single RGB image has {c} channels, width {w}, and height {h}.")

# # This is one way to flatten our images
# batch_flat_view = imgs.view(-1, c * w * h)
# print(f"Size of a batch of images flattened with view: {batch_flat_view.size()}")

# # This is another equivalent way
# batch_flat_flatten = imgs.flatten(1)
# print(f"Size of a batch of images flattened with flatten: {batch_flat_flatten.size()}")

# # The new dimension is just the product of the ones we flattened
# d = example_image.flatten().size()[0]
# print(c * w * h == d)

# # View the image
# t =  torchvision.transforms.ToPILImage()
# plt.imshow(t(example_image))

# # These are what the class labels in CIFAR-10 represent. For more information,
# # visit https://www.cs.toronto.edu/~kriz/cifar.html
# classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog",
#            "horse", "ship", "truck"]
# print(f"This image is labeled as class {classes[example_label]}")


In this problem, we will attempt to predict what class an image is labeled as.

First, let's create our model. For a linear model we could flatten the data before passing it into the model, but that is not the case for the convolutional neural network.
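
The notebook does not show the linear model itself, so the following is only a sketch of what such a model might look like. The function name `linear_model` and its default arguments are assumptions; flattening is done inside the model with `nn.Flatten` so that the same data loaders could also feed a convolutional network.

```python
import torch
from torch import nn


def linear_model(dim_in: int = 3 * 32 * 32, dim_out: int = 10) -> nn.Module:
    """Hypothetical logistic-regression model: flatten, then one linear layer."""
    return nn.Sequential(
        nn.Flatten(),               # (batch, 3, 32, 32) -> (batch, 3072)
        nn.Linear(dim_in, dim_out),  # one raw score (logit) per class
    )


model = linear_model()
logits = model(torch.rand(4, 3, 32, 32))  # a batch of 4 random "images"
print(logits.shape)
```

The model outputs raw logits rather than probabilities; `nn.CrossEntropyLoss` applies the log-softmax internally. In the notebook you would also move the model to the GPU with `.to(DEVICE)`.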

Let's define a method to train this model using SGD as our optimizer.

In [None]:
# def train(
#     model: nn.Module, optimizer: SGD,
#     train_loader: DataLoader, val_loader: DataLoader,
#     epochs: int = 20
# )-> Tuple[List[float], List[float], List[float], List[float]]:
#     """
#     Trains a model for the specified number of epochs using the loaders.

#     Returns: 
#     Lists of training loss, training accuracy, validation loss, validation accuracy for each epoch.
#     """

#     loss = nn.CrossEntropyLoss()
#     train_losses = []
#     train_accuracies = []
#     val_losses = []
#     val_accuracies = []
#     for e in tqdm(range(epochs)):
#         model.train()
#         train_loss = 0.0
#         train_acc = 0.0

#         # Main training loop; iterate over train_loader. The loop
#         # terminates when the train loader finishes iterating, which is one epoch.
#         for (x_batch, labels) in train_loader:
#             x_batch, labels = x_batch.to(DEVICE), labels.to(DEVICE)
#             optimizer.zero_grad()
#             labels_pred = model(x_batch)
#             batch_loss = loss(labels_pred, labels)
#             train_loss = train_loss + batch_loss.item()

#             labels_pred_max = torch.argmax(labels_pred, 1)
#             batch_acc = torch.sum(labels_pred_max == labels)
#             train_acc = train_acc + batch_acc.item()

#             batch_loss.backward()
#             optimizer.step()
#         train_losses.append(train_loss / len(train_loader))
#         train_accuracies.append(train_acc / len(train_loader.dataset))

#         # Validation loop; use .no_grad() context manager to save memory.
#         model.eval()
#         val_loss = 0.0
#         val_acc = 0.0

#         with torch.no_grad():
#             for (v_batch, labels) in val_loader:
#                 v_batch, labels = v_batch.to(DEVICE), labels.to(DEVICE)
#                 labels_pred = model(v_batch)
#                 v_batch_loss = loss(labels_pred, labels)
#                 val_loss = val_loss + v_batch_loss.item()

#                 v_pred_max = torch.argmax(labels_pred, 1)
#                 batch_acc = torch.sum(v_pred_max == labels)
#                 val_acc = val_acc + batch_acc.item()
#             val_losses.append(val_loss / len(val_loader))
#             val_accuracies.append(val_acc / len(val_loader.dataset))

#     return train_losses, train_accuracies, val_losses, val_accuracies


For this problem, we will be using SGD. The two hyperparameters for our linear model trained with SGD are the learning rate and momentum. Only learning rate will be searched for in this example.
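
To build intuition for how these two hyperparameters interact, here is a small pure-Python sketch of SGD with momentum on the toy objective $f(w) = w^2$. The update rule (velocity `v = momentum * v + grad`, then `w = w - lr * v`) follows the form documented for `torch.optim.SGD` with default dampening and no Nesterov term; the function name and the toy objective are assumptions for illustration.

```python
def sgd_momentum_trajectory(lr, momentum, steps=200, w0=1.0):
    """Minimize f(w) = w**2 with momentum SGD and return every iterate."""
    w, v = w0, 0.0
    trajectory = [w]
    for _ in range(steps):
        grad = 2.0 * w             # derivative of w**2
        v = momentum * v + grad    # accumulate velocity
        w = w - lr * v             # take the step
        trajectory.append(w)
    return trajectory


stable = sgd_momentum_trajectory(lr=0.1, momentum=0.9)
unstable = sgd_momentum_trajectory(lr=0.1, momentum=1.25)

print(abs(stable[-1]))                 # shrinks toward zero
print(max(abs(w) for w in unstable))   # grows without bound
```

On this toy problem the iterates converge for a momentum of 0.9 and blow up for a momentum above 1, which suggests that momentum values at or above 1 are unlikely to be productive in a search.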

Note: We ask you to plot the accuracies for the best 5 models for each structure, so you will need to return multiple sets of hyperparameters for the homework, or, if you do random search, run your hyperparameter search multiple times.
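
One way to pick the best runs after a search is to rank them by their highest validation accuracy. The sketch below assumes each run is stored as a list of per-epoch validation accuracies; the helper name `top_k_runs` and the example numbers are made up for illustration.

```python
def top_k_runs(val_acc_per_run, k=5):
    """Return the indices of the k runs with the highest peak validation accuracy."""
    best_per_run = [max(run) for run in val_acc_per_run]
    ranked = sorted(
        range(len(best_per_run)),
        key=lambda i: best_per_run[i],
        reverse=True,  # best run first
    )
    return ranked[:k]


# Made-up accuracies for four runs of three epochs each.
val_acc_per_run = [
    [0.10, 0.22, 0.31],
    [0.12, 0.35, 0.33],
    [0.09, 0.10, 0.11],
    [0.20, 0.28, 0.40],
]
print(top_k_runs(val_acc_per_run, k=2))  # [3, 1]
```

Because the result is ordered best-first, the first index is the run to retrain.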

In [None]:
# def parameter_search(
#     train_loader: DataLoader, 
#     val_loader: DataLoader, 
#     model_fn:Callable[[], nn.Module]
# ) -> float:
#     """
#     Parameter search for our linear model using SGD.

#     Args:
#     train_loader: the train dataloader.
#     val_loader: the validation dataloader.
#     model_fn: a function that, when called, returns a torch.nn.Module.

#     Returns:
#     The learning rate with the least validation loss.
#     NOTE: you may need to modify this function to search over and return
#      other parameters beyond learning rate.
#     """
#     num_iter = 10  # This will likely not be enough for the rest of the problem.
#     best_loss = torch.tensor(np.inf)
#     best_lr = 0.0

#     lrs = torch.linspace(10 ** (-6), 10 ** (-1), num_iter)

#     for lr in lrs:
#         print(f"trying learning rate {lr}")
#         model = model_fn()
#         optim = SGD(model.parameters(), lr)
        
#         train_loss, train_acc, val_loss, val_acc = train(
#             model,
#             optim,
#             train_loader,
#             val_loader,
#             epochs=20
#             )

#         if min(val_loss) < best_loss:
#             best_loss = min(val_loss)
#             best_lr = lr
        
#     return best_lr

Now that we have everything, we can train and evaluate our model.

In [None]:
# model = linear_model()
# optimizer = SGD(model.parameters(), best_lr)

In [None]:
# # We are only using 20 epochs for this example. You may have to use more.
# train_loss, train_accuracy, val_loss, val_accuracy = train(
#     model, optimizer, train_loader, val_loader, 20
# )

Plot the training and validation accuracy for each epoch.

In [None]:
# epochs = range(1, 21)
# plt.plot(epochs, train_accuracy, label="Train Accuracy")
# plt.plot(epochs, val_accuracy, label="Validation Accuracy")
# plt.xlabel("Epoch")
# plt.ylabel("Accuracy")
# plt.legend()
# plt.title("Logistic Regression Accuracy for CIFAR-10 vs Epoch")
# plt.show()

The last thing we have to do is evaluate our model on the testing data.
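
When accuracy is accumulated as a count of correct predictions per batch, the denominator deserves a quick check: the last batch is usually smaller than `batch_size`, so `batch_size * len(loader)` can exceed the number of examples. The arithmetic below is a sketch using the CIFAR-10 test set size (10,000) and a batch size of 128; the helper name is hypothetical.

```python
import math


def accuracy_denominators(num_examples, batch_size):
    """Compare the true example count with batch_size * number_of_batches."""
    # len(loader) equals this ceiling when drop_last=False (the default).
    num_batches = math.ceil(num_examples / batch_size)
    return num_batches, batch_size * num_batches


num_batches, padded_count = accuracy_denominators(10_000, 128)
print(num_batches, padded_count)  # 79 10112

# Even a perfect classifier would be reported below 100% with the padded count.
print(10_000 / padded_count)
```

Dividing by `len(loader.dataset)` instead avoids this small underestimate.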

In [None]:
# def evaluate(
#     model: nn.Module, loader: DataLoader
# ) -> Tuple[float, float]:
#     """Computes test loss and accuracy of model on loader."""
#     loss = nn.CrossEntropyLoss()
#     model.eval()
#     test_loss = 0.0
#     test_acc = 0.0
#     with torch.no_grad():
#         for (batch, labels) in loader:
#             batch, labels = batch.to(DEVICE), labels.to(DEVICE)
#             y_batch_pred = model(batch)
#             batch_loss = loss(y_batch_pred, labels)
#             test_loss = test_loss + batch_loss.item()

#             pred_max = torch.argmax(y_batch_pred, 1)
#             batch_acc = torch.sum(pred_max == labels)
#             test_acc = test_acc + batch_acc.item()
#         test_loss = test_loss / len(loader)
#         test_acc = test_acc / len(loader.dataset)
#         return test_loss, test_acc

In [None]:
# test_loss, test_acc = evaluate(model, test_loader)
# print(f"Test Accuracy: {test_acc}")

The rest is yours to code. You can structure the code any way you would like.

We do advise using code cells and functions (train, search, predict, etc.) for each subproblem, since they will make your code easier to debug.

Also note that several of the functions above can be reused for the various different models you will implement for this problem; i.e., you won't need to write a new `evaluate()`.

## CIFAR-10 neural networks

In [None]:
batch_size = 128

subset = list(range(0, 10000))
train_subset = torch.utils.data.Subset(train_dataset, indices=subset)

train_data, val_data = random_split(train_subset, [int(0.9 * len(train_subset)), int( 0.1 * len(train_subset))])

# Create separate dataloaders for the train, validation, and test sets
train_loader = DataLoader(
    train_data,
    batch_size=batch_size,
    shuffle=True
)

val_loader = DataLoader(
    val_data,
    batch_size=batch_size,
    shuffle=True
)

test_loader = DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=True
)

### Fully-connected output, 1 fully-connected hidden layer:

$x^{out} = W_2 \mathrm{relu} (W_1 (x^{in}) + b_1 ) + b_2$
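
As a rough guide to how the hidden layer size $M$ affects model size, the number of trainable parameters implied by this formula can be counted directly. The helper below is an illustrative sketch (its name is an assumption), using the flattened CIFAR-10 input dimension of $3 \times 32 \times 32 = 3072$.

```python
def count_parameters(dim_in, m, dim_out):
    """Parameters of x_out = W2 relu(W1 x + b1) + b2."""
    hidden = dim_in * m + m          # W1 and b1
    output = m * dim_out + dim_out   # W2 and b2
    return hidden + output


print(count_parameters(3072, 100, 10))  # 308310
```

Nearly all of the parameters sit in $W_1$, so the model size grows roughly linearly with $M$.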

In [None]:
def fully_connected_neural_network(dim_in, m, dim_out) -> nn.Module:
    """Fully-connected output, 1 fully-connected hidden layer."""
    model =  nn.Sequential(
            nn.Flatten(),
            nn.Linear(dim_in, m),  # [in, out]
            nn.ReLU(),
            nn.Linear(m, dim_out),
            # nn.Linear(m, dim_in),  # [in, out]
            # nn.ReLU(),
            # nn.Linear(dim_out, m),
         )
    return model.to(DEVICE)

#### Parameter search over the fully-connected output, 1 fully-connected hidden layer:

In [None]:
def train(
    model: nn.Module, optimizer: SGD,
    train_loader: DataLoader, val_loader: DataLoader,
    epochs: int = 20
)-> Tuple[List[float], List[float], List[float], List[float]]:
    """
    Trains a model for the specified number of epochs using the loaders.

    Returns: 
    Lists of training loss, training accuracy, validation loss, validation accuracy for each epoch.
    """

    loss = nn.CrossEntropyLoss()
    train_losses = []
    train_accuracies = []
    val_losses = []
    val_accuracies = []
    for e in tqdm(range(epochs)):
        model.train()
        train_loss = 0.0
        train_acc = 0.0

        # Main training loop; iterate over train_loader. The loop
        # terminates when the train loader finishes iterating, which is one epoch.
        for (x_batch, labels) in train_loader:
            # print(x_batch.shape)
            # for p in model.parameters():
            #     print(p)

            x_batch, labels = x_batch.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            labels_pred = model(x_batch)
            batch_loss = loss(labels_pred, labels)
            train_loss = train_loss + batch_loss.item()

            labels_pred_max = torch.argmax(labels_pred, 1)
            batch_acc = torch.sum(labels_pred_max == labels)
            train_acc = train_acc + batch_acc.item()

            batch_loss.backward()
            optimizer.step()
        train_losses.append(train_loss / len(train_loader))
        # Divide by the number of examples; the last batch may be smaller than batch_size.
        train_accuracies.append(train_acc / len(train_loader.dataset))

        # Validation loop; use .no_grad() context manager to save memory.
        model.eval()
        val_loss = 0.0
        val_acc = 0.0

        with torch.no_grad():
            for (v_batch, labels) in val_loader:
                v_batch, labels = v_batch.to(DEVICE), labels.to(DEVICE)
                labels_pred = model(v_batch)
                v_batch_loss = loss(labels_pred, labels)
                val_loss = val_loss + v_batch_loss.item()

                v_pred_max = torch.argmax(labels_pred, 1)
                batch_acc = torch.sum(v_pred_max == labels)
                val_acc = val_acc + batch_acc.item()
            val_losses.append(val_loss / len(val_loader))
            val_accuracies.append(val_acc / len(val_loader.dataset))

    return train_losses, train_accuracies, val_losses, val_accuracies


In [None]:
momentum_factors = torch.linspace(0.05, 1.25, 13)
momentum_factors

In [None]:
def parameter_search_fully_connected_nn(
    train_loader: DataLoader,
    val_loader: DataLoader,
    model_fn: Callable[[int, int, int], nn.Module]
) -> dict:
    """
    Random hyperparameter search for the fully-connected network trained with SGD.

    Args:
    train_loader: the train dataloader.
    val_loader: the validation dataloader.
    model_fn: a function that, when called with (dim_in, m, dim_out), returns
        a torch.nn.Module.

    Returns:
    A dict of lists holding the sampled hyperparameters and the per-epoch
    train/validation losses and accuracies of each search iteration.
    """
    dim_in = 3072
    dim_out = 10
    learning_rates = torch.logspace(-6, 0, 15)
    # hidden_layer_sizes = torch.logspace(2, 9, steps=30, base=2)
    # hidden_layer_sizes = torch.logspace(2, 8, steps=7, base=2)
    hidden_layer_sizes = torch.linspace(100, 600, 11) #  M
    momentum_factors = torch.linspace(0.05, 1.25, 13)
    n_epochs = 10 #TODO: increase

    num_searches = 20  #TODO: This will likely not be enough for the rest of the problem.
    best_loss = torch.tensor(np.inf)


    # best_results = {
    #     'learning_rate': 0.0,
    #     'momentum_factor': 0.0,
    #     'hidden_layer_size': 0.0,
    #     'train_loss': np.inf,
    #     'train_acc': np.inf,
    #     'val_loss': np.inf,
    #     'val_acc': np.inf,
    # }

    results = {
        'learning_rate': [],
        'momentum_factor': [],
        'hidden_layer_size': [],
        'train_loss': [],
        'train_acc': [],
        'val_loss': [],
        'val_acc': [],
    }

    for i in range(num_searches):
        learning_rate_sampler = torch.randint(low=0, high=len(learning_rates), size=(1,))
        hidden_layer_sampler = torch.randint(low=0, high=len(hidden_layer_sizes), size=(1,))
        momentum_factor_sampler = torch.randint(low=0, high=len(momentum_factors), size=(1,))

        lr = learning_rates[learning_rate_sampler].item()
        momentum = momentum_factors[momentum_factor_sampler].item()
        m = int(hidden_layer_sizes[hidden_layer_sampler].item())

        print(f"lr: {lr}; momentum: {momentum}; m: {m}")

        model = model_fn(dim_in, m, dim_out)
        optim = SGD(model.parameters(), lr=lr, momentum=momentum)

        train_loss, train_acc, val_loss, val_acc = train(
            model,
            optim,
            train_loader,
            val_loader,
            epochs=n_epochs
        )

        #TODO: append all to a list?
        results['learning_rate'].append(lr)
        results['momentum_factor'].append(momentum)
        results['hidden_layer_size'].append(m)
        results['train_loss'].append(train_loss)
        results['train_acc'].append(train_acc)
        results['val_loss'].append(val_loss)
        results['val_acc'].append(val_acc)

        # if min(val_loss) < best_loss:
        #     best_results['learning_rate'] = lr
        #     best_results['momentum_factor'] = momentum
        #     best_results['hidden_layer_size'] = m
        #     best_results['train_loss'] = min(train_loss)
        #     best_results['train_acc'] = min(train_acc)
        #     best_results['val_loss'] = min(val_loss)
        #     best_results['val_acc'] = min(val_acc)
        #     print(best_results)

    return results #best_results

In [None]:
parameters = parameter_search_fully_connected_nn(train_loader,
                                                 val_loader,
                                                 fully_connected_neural_network)

for key, item in parameters.items():
    parameters[key] = np.array(item)

In [None]:
# val_acc_sorted = np.take_along_axis(parameters['val_acc'], sort_by_val_acc, axis=1)
# train_acc_sorted = np.take_along_axis(parameters['train_acc'], sort_by_val_acc, axis=1)
# train_acc_sorted

In [None]:
best_search_index = np.argmax(np.max(parameters['val_acc'], axis=1))
# best_parameters = {p: parameters[p][best_search_index] for p in parameters.keys()}
# print(best_search_index)
best_3_search_indices = np.argpartition(np.max(parameters['val_acc'][:,-5:], axis=1), -3)[-3:]
best_3_search_indices = best_3_search_indices[np.argsort(np.max(parameters['val_acc'], axis=1)[best_3_search_indices])]
best_3_search_indices

In [None]:
epochs = range(0, len(parameters['val_acc'][0]))
num_searches = len(parameters['learning_rate'])

# sort_by_val_acc = np.argsort(parameters['val_acc'], axis=1)
# val_acc_sorted = np.take_along_axis(parameters['val_acc'], sort_by_val_acc, axis=1)
# train_acc_sorted = np.take_along_axis(parameters['train_acc'], sort_by_val_acc, axis=1)

# val_acc_sorted = parameters['val_acc']
# train_acc_sorted = parameters['train_acc']
# val_acc_sorted = parameters['val_acc'][range(0,10),sort_by_val_acc]
# train_acc_sorted = np.sort(parameters['train_acc'], axis=1)

fig, ax = plt.subplots(figsize=(8,8))
# for val_acc, train_acc in zip(val_acc_sorted, train_acc_sorted):
# for i in range(num_searches):
for i in best_3_search_indices:
    ax.plot(epochs, parameters['val_acc'][i], label='validation')
    # ax.plot(epochs, train_acc, label='train')
    ax.set_ylabel('validation accuracy')
    ax.set_xlabel('epoch')
    # ax.legend(frameon=False)
    ax.set_ylim([0, 0.6])
    line_label = (f"lr={np.round(parameters['learning_rate'][i], 8)}; "
                  f"mom={np.round(parameters['momentum_factor'][i], 5)}; "
                  f"m={parameters['hidden_layer_size'][i]}")
    ax.annotate(line_label, (epochs[-1], parameters['val_acc'][i][-1]))

# fig, ax = plt.subplots(figsize=(6,3))
# ax.plot(epochs, val_loss, label='validation')
# ax.plot(epochs, train_loss, label='train')
# ax.set_ylabel('loss')
# ax.set_xlabel('epoch')
# ax.legend(frameon=False)
# ax.set_ylim([1, 3])

In [None]:
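# best_3_search_indices is sorted in ascending order of validation accuracy,
# so the best-performing run is the last entry.
search_index = best_3_search_indices[-1]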

print(search_index)

lr = parameters['learning_rate'][search_index]
momentum = parameters['momentum_factor'][search_index]
m = parameters['hidden_layer_size'][search_index]

print(f"lr={np.round(lr, 5)}; "
      f"mom={np.round(momentum, 5)}; "
      f"m={m}")

print(np.max(parameters['val_acc'], axis=1)[search_index])

#### Retrain the best-performing model

In [None]:
dim_in = 3072
dim_out = 10

lr = parameters['learning_rate'][search_index]
momentum = parameters['momentum_factor'][search_index]
m = parameters['hidden_layer_size'][search_index]

# learning_rate = 0.001
# momentum = 0.6
# m = 16


print(f"lr={np.round(lr, 5)}; "
      f"mom={np.round(momentum, 5)}; "
      f"m={m}")

n_epochs = 100 #TODO: increase

model = fully_connected_neural_network(dim_in, m, dim_out)
optim = SGD(model.parameters(), lr=lr, momentum=momentum)

train_loss, train_acc, val_loss, val_acc = train(
    model,
    optim,
    train_loader,
    val_loader,
    epochs=n_epochs
)

print(f'train_loss: {train_loss}')
print(f'train_acc: {train_acc}')
print(f'val_loss: {val_loss}')
print(f'val_acc: {val_acc}')

In [None]:
epochs = range(0, n_epochs)
fig, ax = plt.subplots(figsize=(6,3))
ax.plot(epochs, val_acc, label='validation')
ax.plot(epochs, train_acc, label='train')
ax.set_ylabel('accuracy')
ax.set_xlabel('epoch')
ax.legend(frameon=False)
ax.set_ylim([0, 0.6])

fig, ax = plt.subplots(figsize=(6,3))
ax.plot(epochs, val_loss, label='validation')
ax.plot(epochs, train_loss, label='train')
ax.set_ylabel('loss')
ax.set_xlabel('epoch')
ax.legend(frameon=False)
ax.set_ylim([1, 3])

### Convolutional layer with max-pool and fully-connected output:

$x^{out} = W_2 (\mathrm{MaxPool} (\mathrm{relu} ( \mathrm{Conv2d} (x^{in}, W_1) + b_1 ))) + b_2$

Where,

$\mathrm{Conv2d} (x^{in}, W_1) \in \mathbb{R}^{(33-k) \times (33-k) \times M}$

$\mathrm{MaxPool} (\mathrm{relu} ( \mathrm{Conv2d} (x^{in}, W_1) + b_1 )) \in \mathbb{R}^{(\frac{33-k}{N}) \times (\frac{33-k}{N}) \times M}$

$W_2 \in \mathbb{R}^{10 \times M(\frac{33-k}{N})^2};\; b_2 \in \mathbb{R}^{10}$

such that $M$, $k$, $N$ are model-specific hyperparameters.
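
The shapes above can be checked with a few lines of arithmetic. This sketch assumes the `nn.Conv2d` defaults (stride 1, no padding) and the `nn.MaxPool2d` default of a stride equal to the kernel size, in which case the pooled side length is the floor of $(33-k)/N$; the helper name is hypothetical.

```python
def conv_pool_shapes(k, n, m, image_size=32):
    """Return (conv side, pooled side, fully-connected inputs) for one image."""
    conv_size = image_size - k + 1    # 33 - k for 32x32 inputs
    pool_size = conv_size // n        # non-overlapping n x n pooling windows
    fc_inputs = m * pool_size ** 2    # M feature maps, flattened
    return conv_size, pool_size, fc_inputs


for k, n in [(5, 14), (5, 7), (6, 9), (8, 5)]:
    print(k, n, conv_pool_shapes(k, n, m=150))
```

For $k = 5$, $N = 14$, and $M = 150$, this gives a $28 \times 28$ convolution output, a $2 \times 2$ pooled output, and 600 inputs to the final linear layer.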

In [None]:
dim_in = 3
m = 150
k = 5
n = 14

int(m * ((33 - k)/n)**2)

In [None]:
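def convolutional_neural_network(dim_in, m, k, n, dim_out) -> nn.Module:
    """Convolutional layer with max-pool and fully-connected output."""
    # Conv2d (stride 1, no padding) maps 32x32 inputs to (33 - k) x (33 - k);
    # MaxPool2d with kernel n (stride n) then floors the division by n.
    fc_input_size = m * ((33 - k) // n) ** 2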

    print(fc_input_size)
    model =  nn.Sequential(
            nn.Conv2d(dim_in, m, k),  # (in, # filters, kernel size)
            # nn.Conv2d(dim_in, m, kernel_size=(k, k, 3)),  # (in, # filters, kernel size)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(n, n)),
            nn.Flatten(),
            # nn.Linear(dim_in, dim_out),
            nn.Linear(fc_input_size, dim_out),
         )
    return model.to(DEVICE)

In [None]:
p = np.array([2, 3, 4, 5])
k = np.array([4, 5, 6, 8])
n = np.divide.outer(33-k, p)
n
k_p_combs = [(5, 2), (5, 4), (6, 3), (8, 5)]
k_n_combs = [(5, 14), (5, 7), (6, 9), (8, 5)]
for k, p in k_p_combs:
    print(f"{k},{p}")
    n = np.divide.outer(33-k, p)
    print(f"n={n}")

for k, n in k_n_combs:
    print(f"{k},{n}")
    p = np.divide(33-k, n)
    print(f"p={p}")

#### Parameter search over the convolutional layer with max-pool and fully-connected output:

In [None]:
def parameter_search_convolutional_nn(
    train_loader: DataLoader,
    val_loader: DataLoader,
    model_fn: Callable[[int, int, int, int, int], nn.Module]
) -> dict:
    """
    Parameter search over the convolutional layer with max-pool and fully-
    connected output.

    Args:
    train_loader: the train dataloader.
    val_loader: the validation dataloader.
    model_fn: a function that, when called, returns a torch.nn.Module.

    Returns:
    Hyperparameters and train/validation losses and accuracy over the epochs of
    a random search.
    """
    dim_in = 3
    dim_out = 10
    learning_rates = torch.logspace(-6, 0, 15)
    momentum_factors = torch.linspace(0.05, 1.5, 20)

    conv2d_filters = torch.tensor([10, 20, 50, 100, 120, 150, 200])  # m_typ = 100; = torch.logspace(2, 9, steps=30, base=2)
    k_n_combs = [(5, 14), (5, 7), (6, 9), (8, 5)]  # k_typ = 5, n_typ = 14; pool_size = np.divide(33-k, n)
    n_epochs = 8
    num_searches = 3

    results = {
        'learning_rate': [],
        'momentum_factor': [],
        'conv2d_size': [],  # k
        'conv2d_filters': [],  # M
        'maxpool_size': [],  # N
        'train_loss': [],
        'train_acc': [],
        'val_loss': [],
        'val_acc': [],
    }

    for i in range(num_searches):
        learning_rate_sampler = torch.randint(low=0, high=len(learning_rates), size=(1,))
        momentum_factor_sampler = torch.randint(low=0, high=len(momentum_factors), size=(1,))
        conv2d_filters_sampler = torch.randint(low=0, high=len(conv2d_filters), size=(1,))
        k_n_combs_sampler = torch.randint(low=0, high=len(k_n_combs), size=(1,))


        lr = learning_rates[learning_rate_sampler].item()
        momentum = momentum_factors[momentum_factor_sampler].item()
        m = int(conv2d_filters[conv2d_filters_sampler].item())
        k, n = k_n_combs[k_n_combs_sampler]

        print(f"lr: {lr}; "
              f"momentum: {momentum}; "
              f"m: {m}; "
              f"k: {k}; "
              f"n: {n}; ")

        model = model_fn(dim_in, m, k, n, dim_out)

        optim = SGD(model.parameters(), lr=lr, momentum=momentum)

        train_loss, train_acc, val_loss, val_acc = train(
            model,
            optim,
            train_loader,
            val_loader,
            epochs=n_epochs
        )

        results['learning_rate'].append(lr)
        results['momentum_factor'].append(momentum)
        results['conv2d_size'].append(k)
        results['conv2d_filters'].append(m)
        results['maxpool_size'].append(n)
        results['train_loss'].append(train_loss)
        results['train_acc'].append(train_acc)
        results['val_loss'].append(val_loss)
        results['val_acc'].append(val_acc)

    return results

In [None]:
parameters = parameter_search_convolutional_nn(train_loader,
                                               val_loader,
                                               convolutional_neural_network)

for key, item in parameters.items():
    parameters[key] = np.array(item)

#### Evaluate

In [None]:
def evaluate(
    model: nn.Module, loader: DataLoader
) -> Tuple[float, float]:
    """Computes test loss and accuracy of model on loader."""
    loss = nn.CrossEntropyLoss()
    model.eval()
    test_loss = 0.0
    test_acc = 0.0
    with torch.no_grad():
        for (batch, labels) in loader:
            batch, labels = batch.to(DEVICE), labels.to(DEVICE)
            y_batch_pred = model(batch)
            batch_loss = loss(y_batch_pred, labels)
            test_loss = test_loss + batch_loss.item()

            pred_max = torch.argmax(y_batch_pred, 1)
            batch_acc = torch.sum(pred_max == labels)
            test_acc = test_acc + batch_acc.item()
        test_loss = test_loss / len(loader)
        # Divide by the number of examples; the last batch may be smaller than batch_size.
        test_acc = test_acc / len(loader.dataset)
        return test_loss, test_acc

In [None]:
best_search_index = np.argmax(np.max(parameters['val_acc'], axis=1))
best_3_search_indices = np.argpartition(np.max(parameters['val_acc'][:,-5:], axis=1), -3)[-3:]
best_3_search_indices = best_3_search_indices[np.argsort(np.max(parameters['val_acc'], axis=1)[best_3_search_indices])]
best_3_search_indices

In [None]:
epochs = range(0, len(parameters['val_acc'][0]))
num_searches = len(parameters['learning_rate'])

fig, ax = plt.subplots(figsize=(8,6))
for i in range(num_searches):
# for i in best_3_search_indices:
    ax.plot(epochs, parameters['val_acc'][i], label='validation')
    # ax.plot(epochs, train_acc, label='train')
    ax.set_ylabel('validation accuracy')
    ax.set_xlabel('epoch')
    # ax.legend(frameon=False)
    ax.set_ylim([0, 0.6])
    line_label = (f"lr={np.round(parameters['learning_rate'][i], 5)}; "
                  f"mom={np.round(parameters['momentum_factor'][i], 2)}; "
                  f"k={np.round(parameters['conv2d_size'][i], 5)}; "
                  f"M={np.round(parameters['conv2d_filters'][i], 5)}; "
                  f"N={parameters['maxpool_size'][i]}")

    ax.annotate(line_label, (epochs[-1], parameters['val_acc'][i][-1]))

fig, ax = plt.subplots(figsize=(8,6))
for i in range(num_searches):
    ax.plot(epochs, parameters['val_loss'][i], label='validation')
    # ax.plot(epochs, parameters['train_loss'][i], label='train')
    ax.set_ylabel('loss')
    ax.set_xlabel('epoch')
    ax.legend(frameon=False)
    ax.set_ylim([1, 3])