# Prototyping Models

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fostiropoulos/ablator/blob/v0.0.1-mp/docs/source/notebooks/Prototyping-models.ipynb)

Let's say you have a novel idea for a model architecture and you want to run an ablation study on it with `ablator`. Ablator simplifies prototyping, so you can quickly build and evaluate your idea. Once a prototype runs smoothly and behaves as expected, you can scale it to a parallel ablation study of multiple trials with minimal code changes.

This chapter covers prototyping a model using Ablator. We will train a simple neural network on the popular **Fashion-MNIST** dataset.

There are three main steps to run a prototype experiment in ablator:

- Configure the prototype experiment.

- Create a model wrapper that defines the boilerplate code for training and evaluating models.

- Create the prototype trainer and launch the experiment.

Let us first import all necessary dependencies:

In [None]:
try:
    import ablator
except ImportError:
    # ablator is missing: install it, then kill the process so the runtime
    # restarts and the newly installed package is picked up.
    !pip install ablator
    print("Stopping RUNTIME! Please run again")
    import os

    os.kill(os.getpid(), 9)

In [None]:
from ablator import (ModelConfig, OptimizerConfig, TrainConfig, RunConfig,
                     ModelWrapper, ProtoTrainer, configclass)

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

from sklearn.metrics import f1_score, accuracy_score

import os

## Launch the prototype experiment

### Configure the experiment

We will follow the same steps as in the previous tutorial on [Configuration Basics](./Configuration-Basics.ipynb) to configure the experiment.

Here's a summary of how we will configure it:

- **Model Configuration**: dimensions for the layers of the model.

- **Optimizer Configuration**: Adam with a learning rate of 0.001.

- **Train Configuration**: `batch_size` = 32, `epochs` = 20.

- **Running Configuration**: CPU as hardware and a random seed for the experiment.

#### Configure the model

##### Model configuration

For the model configuration, we define the hyperparameters `input_size`, `hidden_size`, and `num_classes` as [stateful](./Configuration-Basics.ipynb#Ablator-custom-data-types-for-stateful-experiment-design) integer config attributes.

In [None]:
@configclass
class CustomModelConfig(ModelConfig):
    input_size: int
    hidden_size: int
    num_classes: int

model_config = CustomModelConfig(
    input_size=28 * 28,  # 28x28 image flattened
    hidden_size=256,
    num_classes=10,
)
model_config

CustomModelConfig(input_size=784, hidden_size=256, num_classes=10)

Since the hyperparameters are defined as stateful, we must provide concrete values when initializing the `model_config` object.

<div class="alert alert-info">

Note

In this tutorial, the model config is only used to construct the main model. If you don't plan to scale your prototype to an ablation study, you can skip the model config and construct the model directly. Defining one is still good practice, because ablator is mostly used to scale a prototype up to an ablation study. At that point the model config becomes critical: it lets you define search spaces for the hyperparameters and run the ablation study over them.

</div>

##### Creating PyTorch Model

Model Architecture (Simple Neural Network with Linear Layers):

`Linear_1(28*28, 256) -> ReLU -> Linear_2(256, 256) -> ReLU -> Linear_3(256, 10)` (where ReLU is an activation function)
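
As a quick sanity check on the architecture above, the following sketch counts the trainable parameters of the three linear layers by hand. It assumes each linear layer has a weight matrix and a bias vector (the `nn.Linear` default) and that ReLU has no parameters. The helper `linear_params` and the variable names are illustrative only and are not part of ablator.

```python
# Layer sizes taken from the model config used in this tutorial.
input_size, hidden_size, num_classes = 28 * 28, 256, 10


def linear_params(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


layer_dims = [
    (input_size, hidden_size),   # Linear_1
    (hidden_size, hidden_size),  # Linear_2
    (hidden_size, num_classes),  # Linear_3
]
per_layer = [linear_params(i, o) for i, o in layer_dims]
total_params = sum(per_layer)

# A batch of 32 grayscale images has shape [32, 1, 28, 28]; flattening
# everything except the batch dimension gives 1 * 28 * 28 features per image.
batch_shape = (32, 1, 28, 28)
flattened_features = batch_shape[1] * batch_shape[2] * batch_shape[3]

print(per_layer)           # [200960, 65792, 2570]
print(total_params)        # 269322
print(flattened_features)  # 784
```

The flattened feature count matches `input_size`, which is why the model can reshape each image into a vector before the first linear layer.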

Note that, unlike in the Configuration Basics tutorial, we construct our model as a two-level module:

- `FashionMNISTModel` is the model architecture (your novel idea). This is where we use the model config attributes to construct the model.

- `MyModel` includes the main model architecture as a sub-module, adds a loss function, performs forward computation, and returns the predicted labels and loss during model training and evaluation. 

In [None]:
class FashionMNISTModel(nn.Module):
    def __init__(self, config: CustomModelConfig):
        super(FashionMNISTModel, self).__init__()

        input_size = config.input_size 
        hidden_size = config.hidden_size
        num_classes = config.num_classes

        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)  
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

class MyModel(nn.Module):
    def __init__(self, config: CustomModelConfig) -> None:
        super().__init__()
        
        self.model = FashionMNISTModel(config)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.model(x)
        loss = None

        if labels is not None:
            loss = self.loss(out, labels)
            labels = labels.reshape(-1, 1)  # reshape as (batch, labels)

        out = out.argmax(dim=-1)
        out = out.reshape(-1, 1)    # reshape as (batch, output)

        return {"y_pred": out, "y_true": labels}, loss

<div class="alert alert-info">

Note

- Ablator requires the model's forward function to return two objects: a dictionary of the model's batched outputs (e.g. labels, predictions, logits, probabilities) and the loss value. These values must be tensors. Either of the two returned objects can also be `None`, depending on the use case.

- Depending on the evaluation metrics that you want to use, you can include in the model's dictionary output logits, probabilities, predicted labels, ground truth labels, etc. In this example, we return the predicted labels and the ground truth labels in the model's dictionary output, and these will be used later on to compute the accuracy and F1 score.

</div>

#### Configure the training process

In [None]:
optimizer_config = OptimizerConfig(
    name="adam", 
    arguments={"lr": 0.001}
)

train_config = TrainConfig(
    dataset="Fashion-mnist",
    batch_size=32,
    epochs=20,
    optimizer_config=optimizer_config,
    scheduler_config=None
)

#### Configure the running configuration

In [None]:
@configclass
class CustomRunConfig(RunConfig):
    model_config: CustomModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=model_config,
    metrics_n_batches=800,
    experiment_dir="/tmp/experiments",
    device="cpu",
    amp=False,
    random_seed=42,
)

<div class="alert alert-info">

Note

- We recommend that the experiment directory `RunConfig.experiment_dir` be empty, or at least not contain any prior experiment results.
- Make sure to redefine the running configuration class so that its `model_config` attribute has type `CustomModelConfig` instead of the default `ModelConfig` before creating the config object.

</div>

### Create the model wrapper

The model wrapper class `ModelWrapper` wraps a PyTorch model and provides a high-level interface for the tasks involved in training it. It defines the boilerplate code for training and evaluating models, which reduces the amount of code you need to write:

- It takes care of creating and using data loaders, evaluating models, importing parameters from configuration files into the model, setting up optimizers, schedulers, and checkpoints, logging metrics, handling interruptions, and much more.

- Its functions can be overridden to support custom use cases (read more about these functions in [this documentation of Model Wrapper](../training.interface.rst)).

An important function of the `ModelWrapper` is `make_dataloader_train`, which creates the data loader used to train the model. You **MUST** override `make_dataloader_train` so that it returns a train dataloader before launching the experiment.

Therefore, we will prepare the datasets first. Then, we write some evaluation functions that will be used to evaluate our model. Finally, we will create the model wrapper, pass it and the configuration to the trainer and launch the experiment.

#### Prepare the dataset

**Fashion-MNIST** is a dataset of grayscale images of fashion items, with 60,000 training images and 10,000 test images. Each image belongs to one of ten classes of clothing items and accessories, such as shirts, sneakers, and bags.

- Image dimensions: 28 pixels x 28 pixels (grayscale)

- Shape of the training data tensor: [60000, 1, 28, 28]

Here we will create two datasets: one for training and one for validation (the Fashion-MNIST test split).

In [None]:
transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

#### Defining Custom Evaluation Metrics

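Evaluation metrics let you assess the model from different angles. Here we define two evaluation functions for our classification problem: accuracy and F1 score. Because the task is multiclass, the F1 score uses `average='weighted'`, which averages the per-class F1 scores weighted by the number of true samples in each class.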
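
To make the "weighted" average concrete, here is a minimal pure-Python sketch of a support-weighted F1 score on a toy example. It is only an illustration of the idea: the function `weighted_f1` is a hypothetical helper written for this explanation, and the tutorial itself relies on `sklearn.metrics.f1_score`.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true sample count."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # With no true positives, precision and recall are both 0, so F1 is 0.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


# Toy labels: class 0 has three samples, class 1 has two, class 2 has one.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

print(round(weighted_f1(y_true, y_pred), 4))  # 0.8333
```

In this toy example the per-class F1 scores are 0.8, 0.8, and 1.0, so the weighted average is (3 × 0.8 + 2 × 0.8 + 1 × 1.0) / 6 = 5/6, whereas an unweighted average would give each class equal influence regardless of its size.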

In [None]:
def my_accuracy(y_true, y_pred):
    return accuracy_score(y_true.flatten(), y_pred.flatten())

def my_f1_score(y_true, y_pred):
    return f1_score(y_true.flatten(), y_pred.flatten(), average='weighted')

<div class="alert alert-info">

Note

Make sure that parameters to the evaluation function match the model's forward dictionary output. Since `MyModel`'s returned dictionary has keys `"y_true"` and `"y_pred"`, the evaluation function must have parameters `"y_true"` and `"y_pred"`.

</div>
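
The naming requirement in the note above can be checked before launching an experiment. The sketch below is a hypothetical helper (`missing_arguments` is not part of ablator, and this is not a description of how ablator matches arguments internally); it simply compares a metric function's parameter names with the keys of the forward output dictionary. The metric functions here are plain-list stand-ins for the tutorial's tensor-based ones.

```python
import inspect


def missing_arguments(metric_fn, output_keys):
    """Return the parameters of metric_fn that the forward output does not provide."""
    params = inspect.signature(metric_fn).parameters
    return [name for name in params if name not in output_keys]


def my_accuracy(y_true, y_pred):
    # Stand-in for the tutorial's metric, operating on plain Python lists.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bad_metric(labels, preds):
    # Parameter names do not match the keys of the forward output.
    return sum(t == p for t, p in zip(labels, preds)) / len(labels)


forward_output_keys = {"y_pred", "y_true"}  # keys returned by MyModel.forward

print(missing_arguments(my_accuracy, forward_output_keys))  # []
print(missing_arguments(bad_metric, forward_output_keys))   # ['labels', 'preds']

# When the names match, the output dictionary can be passed by keyword.
outputs = {"y_true": [1, 1, 0], "y_pred": [1, 0, 0]}
accuracy = my_accuracy(**outputs)
```

An empty list means every parameter of the metric can be filled from the model's output dictionary.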

#### Create the Model Wrapper

We will now create a model wrapper class and overwrite the following functions:

- `make_dataloader_train` and `make_dataloader_val`: to provide the training dataset and validation dataset as dataloaders (In PyTorch, a **DataLoader** is a utility class that provides an iterable over a dataset. It is commonly used for handling data loading and batching in machine learning and deep learning tasks).

- `evaluation_functions`: to provide the evaluation functions that will evaluate the model on the datasets. In this function, you must return a dictionary of callables, where the keys are the names of the evaluation metrics and the values are the functions that compute the metrics. These metrics are later used for logging and plotting (i.e. tensorboard tracking and analysis artifacts).

In [None]:
class MyModelWrapper(ModelWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def make_dataloader_train(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=32,
            shuffle=True
        )

    def make_dataloader_val(self, run_config: CustomRunConfig):
        return torch.utils.data.DataLoader(
            test_dataset,
            batch_size=32,
            shuffle=False
        )

    def evaluation_functions(self):
        return {
            "accuracy": my_accuracy,
            "f1": my_f1_score
        }

Now create the model wrapper object, passing the model class as its argument:

In [None]:
wrapper = MyModelWrapper(
    model_class=MyModel,
)

### Create the trainer and launch the experiment

For a prototype experiment, we will use the prototype trainer `ProtoTrainer` to launch the experiment.

Initialize the trainer with the model wrapper and the running configuration. Then call the `launch()` method, passing as `working_directory` the path to the main directory you are working in (the one that stores the code and modules that will be pushed to Ray). We recommend tracking this directory with git to keep a record of any code changes.

In [None]:
# Named `trainer` so it does not shadow the imported `ablator` module.
trainer = ProtoTrainer(
    wrapper=wrapper,
    run_config=run_config,
)

# Assumes the current directory is tracked by git.
metrics = trainer.launch(working_directory=os.getcwd())

## Experiment results

The `ProtoTrainer.launch()` method returns a dictionary that stores the metrics of the experiment.

A later chapter explores how to interpret the results in more detail.

In [None]:
max_key_length = max(len(str(k)) for k in metrics)

for k, v in metrics.items():
    print(f"{k:{max_key_length}} : {v}")

```shell
val_loss          : 0.5586626408626636
val_accuracy      : 0.8687149999999999
val_f1            : 0.8684085851245271
train_loss        : 0.2816645764191945
train_accuracy    : 0.8915705128205127
train_f1          : 0.891141942313593
best_iteration    : 3750
best_loss         : 0.4098668480262208
current_epoch     : 20
current_iteration : 37500
epochs            : 20
learning_rate     : 0.001
total_steps       : 37500
```
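
The iteration counts in this output can be cross-checked against the configuration. The sketch below assumes that one iteration corresponds to one training batch, which is consistent with the reported `total_steps`; the variable names are illustrative.

```python
import math

train_samples = 60_000  # size of the Fashion-MNIST training split
batch_size = 32         # TrainConfig.batch_size
epochs = 20             # TrainConfig.epochs

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

best_iteration = 3750   # value reported in the output above
best_epoch = best_iteration / steps_per_epoch

print(steps_per_epoch)  # 1875
print(total_steps)      # 37500
print(best_epoch)       # 2.0
```

Under that assumption, `best_iteration` of 3750 falls at the end of the second epoch.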

Ablator automatically records metrics so that you can visualize them in TensorBoard and observe how they change every epoch:

- Install TensorBoard with `pip install tensorboard`, and load it with `%load_ext tensorboard` if you are using a notebook.

- Run the command `%tensorboard --logdir <experiment_dir>/dashboard/tensorboard --port [port]`, where `<experiment_dir>` is the experiment directory that we passed to the running configuration (`experiment_dir = "/tmp/experiments"`).
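
For the configuration used in this tutorial, the command can be assembled as in the sketch below. The port `6006` is only an example (any free port works), and the directory layout is taken from the command shown above.

```python
from pathlib import PurePosixPath

experiment_dir = PurePosixPath("/tmp/experiments")  # run_config.experiment_dir
logdir = experiment_dir / "dashboard" / "tensorboard"
port = 6006  # example value; pick any free port

command = f"%tensorboard --logdir {logdir} --port {port}"
print(command)
# %tensorboard --logdir /tmp/experiments/dashboard/tensorboard --port 6006
```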

## Conclusion

That's it! We have successfully built and tested a prototype model using ablator. In the later chapters, we will learn how to scale a prototype to a cluster of parallel processes to explore hyperparameter optimization with more complex models.

### Additional Info

Why train with ProtoTrainer?

- It provides a robust way to handle errors during training.
- Ideal for prototyping experiments in a local environment.
- Easily adaptable for ablation studies with larger configurations and horizontal scaling.
- Quick transition to `ParallelConfig` and `ParallelTrainer` for parallel execution of trials using Ray.