# Configuration basics

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fostiropoulos/ablator/blob/v0.0.1-mp/docs/source/notebooks/Configuration-Basics.ipynb)

Ablator provides a configuration system that covers every aspect of training a machine learning model, from the model architecture to the training environment.

Think of this configuration system as a blueprint for crafting experiments. By leveraging this system, Ablator orchestrates the creation and setup of experiments, seamlessly integrating the necessary configurations. Furthermore, Ablator offers the flexibility to dynamically construct a hierarchical configuration through composition.

You can set configuration values in three ways: with Python named arguments, with `yaml` configuration files, or with dictionaries. This tutorial explains the configuration concepts in Ablator and walks through the steps needed to configure an experiment using named arguments. See [this section](#alternatives-to-constructing-configuration-objects) of this tutorial for the other two approaches.

## Configuration categories

In Ablator, configurations are divided into the following categories:

- [Model configuration](#model-configuration) (or model config).

- [Training configuration](#training-configuration) (or training config/ train config).

- [Optimizer and Scheduler configuration](#optimizer-and-scheduler-configurations) (or optimizer config and scheduler config).

- [Running configurations](#running-configurations) (or run configuration/ running config/ run config), either for training a single prototype model or training multiple models in parallel.

These configuration classes will be used together to configure an experiment.

### Model Configuration

Model configuration is required when creating the run configuration (`RunConfig.model_config` and `ParallelConfig.model_config`). This configuration class is used to define [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) specific to the model of interest. By default, it does not have any attributes. Instead, you typically inherit from it and add a custom attribute for each hyperparameter specific to your model. Later, you will use this configuration to construct the model.

There are 2 steps that are required after defining a model config class for your model:

- Pass the model config to the main model's constructor so you can construct the model using the attributes that are defined in the config.

- Create a custom [running configuration class](#running-configurations) and update `model_config` class type to the newly created model config class.

<div class="alert alert-info">

Note

Ablator requires the model's `forward` method to return two objects, in the following order:

- A dictionary of the model's batched output (e.g. labels, predictions, logits, probabilities), e.g. `{"y_pred": <model prediction>, "y_true": <true labels>}`.

- The loss value. Here the loss value will be considered an auxiliary metric that will be recorded for later analysis (e.g. tracking loss with `Tensorboard`).

These values must be tensors. You may also return `None` for either of them, depending on the use case.

</div>

In addition, you can create a search space over these parameters for your ablation experiment. A sample use case for this is when you want to test different values for model size, number of layers, activation functions, etc. You can do this by creating a search space via `SearchSpace` class for the hyperparameters that you have defined in the model config. Refer to the [Ablation experiment](./HPO-tutorial.ipynb) tutorial for more details.

### Training Configuration

This configuration class defines the training setting (e.g., batch size, number of epochs, the optimizer to use, etc.). Two important attributes to mention are `optimizer_config` and `scheduler_config`. As the names suggest, they configure the optimizer and scheduler to be used in the training process.

| Parameter         | Usage                                                                 |
|-------------------|-----------------------------------------------------------------------|
| dataset _(`str`)_           | dataset name. May be used in custom dataset loader functions.         |
| batch_size _(`int`)_        | batch size.                                                           |
| epochs _(`int`)_            | number of epochs to train.                                            |
| optimizer_config _(`OptimizerConfig`)_  | optimizer configuration.|
| scheduler_config _(`Optional[SchedulerConfig]`)_  | scheduler configuration.|

Training configuration is required when creating the run configuration (`RunConfig.train_config` or `ParallelConfig.train_config`).

### Optimizer and Scheduler Configurations

By default, Ablator takes care of creating the optimizer (and optionally the scheduler) for training models. Thus, you also need to configure them so Ablator knows which optimizer and scheduler to use.

`OptimizerConfig` is used to configure the optimizer for the training process. Currently, we support `SGD` optimizer, `Adam` optimizer, and `AdamW` optimizer.

`SchedulerConfig`, on the other hand, can be used to configure the learning rate scheduler for the training process. Currently, we support `StepLR` scheduler, `OneCycleLR` scheduler, and `ReduceLROnPlateau` scheduler.

Both of these config classes have similar arguments:

| Parameter | Usage                                                                                                                                                                       |
|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name _(`str`)_      | The type of the optimizer or scheduler, this can be any in ``['sgd', 'adam', 'adamw']`` for optimizers and <br> in ``['none', 'step', 'cycle', 'plateau']`` for schedulers.  |
| arguments _(`OptimizerArgs`)_ | The arguments for the optimizer or scheduler, specific to its type. For optimizers, this **MUST** include an item for the learning rate, e.g. `{"lr": 0.1}`. |

The table below shows the arguments that can be defined for each type of optimizer:

| Optimizer type | Arguments                                                                                                                                                                                                                                                                           |
|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| sgd            | `weight_decay` _(defaults to 0.0)_: Weight decay rate<br>`momentum` _(defaults to 0.0)_: Momentum factor<br>                                                                                                                                                                                                           |
| adam           | `betas` _(defaults to (0.9, 0.999))_: Coefficients for computing running averages of gradient and its square.<br>`weight_decay` _(defaults to 0.0)_: Weight decay rate.<br>                                                                                                    |
| adamw          | `betas` _(defaults to (0.9, 0.999))_: Coefficients for computing running averages of gradient and its square.<br>`eps` _(defaults to 1e-8)_: Term added to the denominator to improve numerical stability.<br>`weight_decay` _(defaults to 0.01)_: Weight decay rate.<br>|
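
To make the defaults in the table concrete, here is a minimal sketch of how missing optimizer arguments could be filled in. The helper `resolve_optimizer_arguments` and the `OPTIMIZER_DEFAULTS` mapping are hypothetical names used only for illustration; they are not part of Ablator's API, and the default values are simply copied from the table above.

```python
# Defaults copied from the optimizer table above; this is NOT Ablator's code.
OPTIMIZER_DEFAULTS = {
    "sgd": {"weight_decay": 0.0, "momentum": 0.0},
    "adam": {"betas": (0.9, 0.999), "weight_decay": 0.0},
    "adamw": {"betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.01},
}


def resolve_optimizer_arguments(name: str, arguments: dict) -> dict:
    """Hypothetical helper: fill in the table defaults for missing arguments."""
    if name not in OPTIMIZER_DEFAULTS:
        raise ValueError(f"Unknown optimizer type: {name!r}")
    if "lr" not in arguments:
        # The learning rate has no default and must always be provided.
        raise ValueError("Optimizer arguments must include 'lr', e.g. {'lr': 0.1}")
    # User-supplied values take precedence over the defaults.
    return {**OPTIMIZER_DEFAULTS[name], **arguments}


resolved = resolve_optimizer_arguments("sgd", {"lr": 0.1})
print(resolved)  # {'weight_decay': 0.0, 'momentum': 0.0, 'lr': 0.1}
```

Only the learning rate has to be spelled out; every other argument falls back to its documented default unless you override it.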

The table below shows the arguments that can be defined for each type of scheduler:


| Scheduler type | Arguments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cycle          | `max_lr` : Upper learning rate boundaries in the cycle.<br> `total_steps` : The total number of steps to run the scheduler in a cycle.<br> `step_when` _(defaults to `"train"`)_: The step type at which the `scheduler.step()` should be invoked: ``'train'``, ``'val'``, or ``'epoch'``.             |
| plateau        | `patience` _(defaults to 10)_: Number of epochs with no improvement after which learning rate will be reduced.<br> `min_lr` _(defaults to 1e-5)_: A lower bound on the learning rate.<br> `mode` _(defaults to "min")_: One of ``'min'``, ``'max'``, or ``'auto'``, which defines the direction of optimization, so as to adjust the learning rate <br> accordingly, i.e. when a certain metric ceases improving.<br> `factor` _(defaults to 0.0)_: Factor by which the learning rate will be reduced: ``new_lr = lr * factor``.<br> `threshold` _(defaults to 1e-4)_: Threshold for measuring the new optimum, to only focus on significant changes.<br> `verbose` _(defaults to False)_: If ``True``, prints a message to ``stdout`` for each update.<br> `step_when` _(defaults to "val")_: The step type at which the scheduler should be invoked: ``'train'``, ``'val'``, or ``'epoch'``.<br> |
| step           | `step_size` _(defaults to 1)_: Period of learning rate decay.<br> `gamma` _(defaults to 0.99)_: Multiplicative factor of learning rate decay.<br> `step_when` _(defaults to "epoch")_: The step type at which the scheduler should be invoked: ``'train'``, ``'val'``, or ``'epoch'``.<br> |
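
The effect of `step_size`, `gamma`, `factor`, and `min_lr` can be sketched with plain arithmetic. The functions below are illustrative stand-ins, not Ablator or PyTorch code: `step_lr` assumes the usual step-decay rule (multiply by `gamma` once every `step_size` scheduler steps), and `plateau_lr` applies the `new_lr = lr * factor` rule from the table with `min_lr` as a floor. The `factor=0.5` used in the example is an arbitrary illustrative value.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 1, gamma: float = 0.99) -> float:
    """Learning rate after `epoch` scheduler steps under step decay."""
    return base_lr * gamma ** (epoch // step_size)


def plateau_lr(lr: float, factor: float, min_lr: float = 1e-5) -> float:
    """One plateau reduction: new_lr = lr * factor, but never below min_lr."""
    return max(lr * factor, min_lr)


# With the default step settings, the rate shrinks by 1% at every step.
lr_after_two_steps = step_lr(0.1, epoch=2)

# A single plateau reduction with an illustrative factor of 0.5.
reduced_once = plateau_lr(0.1, factor=0.5)
```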

### Running Configurations

Running configurations define the environment of an experiment (experiment main directory, number of checkpoints to maintain, hardware device to use, etc.). There are 2 types of running configurations:

- `RunConfig` for prototype experiments
- `ParallelConfig` for ablation experiments

#### `RunConfig` for prototype experiments

The table below summarizes the parameters:

| Parameter           | Usage                                                                                                                                                                                                                                                                                                             |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| experiment_dir _(`Stateless[Optional[str]]`, defaults to None)_      | location to store experiment artifacts.                                                                                                                                                                                                                                                                           |
| random_seed _(`Optional[int]`, defaults to None)_         | random seed.                                                                                                                                                                                                                                                                                                      |
| train_config _(`TrainConfig`)_        | training configuration. (check ``TrainConfig`` for more details)                                                                                                                                                                                                                                                  |
| model_config _(`ModelConfig`)_        | model configuration. (check ``ModelConfig`` for more details)                                                                                                                                                                                                                                                     |
| keep_n_checkpoints _(`Stateless[int]`, defaults to 3)_  | number of latest checkpoints to keep.                                                                                                                                                                                                                                                                             |
| tensorboard _(`Stateless[bool]`, defaults to True)_         | whether to use tensorboardLogger.                                                                                                                                                                                                                                                                                 |
| amp _(`Stateless[bool]`, defaults to True)_                 | whether to use automatic mixed precision when running on gpu.                                                                                                                                                                                                                                                     |
| device _(`Stateless[str]`, defaults to "cuda")_              | device to run on.                                                                                                                                                                                                                                                                                                 |
| verbose _(`Stateless[Literal["console", "progress", "silent"]]`, defaults to "console")_             | verbosity level.                                                                                                                                                                                                                                                                                                  |
| eval_subsample _(`Stateless[float]`, defaults to 1)_      | fraction of the dataset to use for evaluation.                                                                                                                                                                                                                                                                    |
| metrics_n_batches _(`Stateless[int]`, defaults to 32)_   | max number of batches stored in every tag(train, eval, test) for evaluation.                                                                                                                                                                                                                                      |
| metrics_mb_limit _(`Stateless[int]`, defaults to 10_000)_    | max number of megabytes stored in every tag(train, eval, test) for evaluation.                                                                                                                                                                                                                                    |
| early_stopping_iter _(`Stateless[Optional[int]]`, defaults to None)_ | The maximum allowed difference between the current iteration and the last <br />iteration with the best metric before applying early stopping. Early stopping <br />will be triggered if the difference ``(current_itr-best_itr)`` exceeds ``early_stopping_iter``.<br />If set to ``None``, early stopping will not be applied. |
| eval_epoch _(`Stateless[float]`, defaults to 1)_          | The epoch interval between two evaluations.                                                                                                                                                                                                                                                                       |
| log_epoch _(`Stateless[float]`, defaults to 1)_           | The epoch interval between two logging.                                                                                                                                                                                                                                                                           |
| init_chkpt _(`Stateless[Optional[str]]`, defaults to None)_          | path to a checkpoint to initialize the model with.                                                                                                                                                                                                                                                                |
| warm_up_epochs _(`Stateless[float]`, defaults to 1)_      | number of epochs marked as warm up epochs.                                                                                                                                                                                                                                                                        |
| divergence_factor _(`Stateless[Optional[float]]`, defaults to 10)_   | if ``cur_loss / best_loss > divergence_factor``, the model is considered <br />to have diverged. |
| optim_metrics _(`Stateless[Optional[Dict[Optim]]]`)_         | The optimization metric to use for meta-training procedures, such as for model saving and lr scheduling, e.g. ``{"val_loss": "min"}`` |
| optim_metric_name _(`Stateless[Optional[str]]`)_         | The name of the metric to be optimized. |
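
The `early_stopping_iter` and `divergence_factor` rules from the table can be sketched as two small predicates. This is a simplified illustration rather than Ablator's implementation: the function names are hypothetical, the divergence rule is read as a ratio test (`cur_loss / best_loss > divergence_factor`), a `None` value is assumed to disable the corresponding check, and `best_loss` is assumed to be positive.

```python
from typing import Optional


def should_stop_early(current_itr: int, best_itr: int, early_stopping_iter: Optional[int]) -> bool:
    """True when the gap since the best iteration exceeds the allowed limit."""
    if early_stopping_iter is None:
        return False  # early stopping is not applied
    return (current_itr - best_itr) > early_stopping_iter


def has_diverged(cur_loss: float, best_loss: float, divergence_factor: Optional[float] = 10.0) -> bool:
    """True when the current loss is more than `divergence_factor` times the best loss."""
    if divergence_factor is None:
        return False  # assumption: None turns the check off
    return cur_loss / best_loss > divergence_factor
```

For example, with `early_stopping_iter=5`, training continues at iteration 10 if the best metric was seen at iteration 5, and stops at iteration 11.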

#### `ParallelConfig` for ablation experiments

`ParallelConfig` is a subclass of `RunConfig`. Therefore, it has all attributes `RunConfig` has. Additionally, it introduces other attributes to configure the parallel experiment:

| Parameters                                   | Usage                                                                                                                                               |
|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| total_trials _(`Optional[int]`)_ | total number of trials. |
| concurrent_trials _(`Stateless[Optional[int]]`)_ | number of trials to run concurrently. |
| search_space _(`Dict[SearchSpace]`)_ | search space for hyperparameter search, e.g. ``{"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int"),}`` |
| gpu_mb_per_experiment _(`Stateless[Optional[int]]`)_ | CUDA memory requirement per experimental trial in MB, e.g. a value of 100 is equivalent to 100MB. |
| search_algo _(`Stateless[SearchAlgo]`, defaults to SearchAlgo.random)_ | type of search algorithm: `SearchAlgo.random` for ablation studies, `SearchAlgo.tpe` for HPO. |
| ignore_invalid_params _(`Stateless[bool]`, defaults to False)_ | whether to ignore invalid parameters when sampling or raise an error. |
| remote_config _(`Stateless[Optional[RemoteConfig]]`, defaults to None)_ | remote storage configuration. |

`search_space` is used to define a set of continuous or categorical/ discrete values for a certain hyperparameter. Refer to [Search Space basics](./Search-space-tutorial.ipynb) to learn more about how to use it.
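
Search space keys such as `train_config.optimizer_config.arguments.lr` are dotted paths into the configuration hierarchy. The sketch below shows how such a path maps onto a nested dictionary; `set_by_path` is a hypothetical helper written for this illustration and says nothing about how Ablator samples or applies values internally.

```python
from copy import deepcopy
from typing import Any


def set_by_path(config: dict, dotted_key: str, value: Any) -> dict:
    """Return a copy of `config` with the value at `dotted_key` replaced."""
    updated = deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for key in parents:
        node = node[key]  # a KeyError here means the path does not match the hierarchy
    if leaf not in node:
        raise KeyError(f"{dotted_key!r} does not match the configuration hierarchy")
    node[leaf] = value
    return updated


base = {"train_config": {"optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}}}}

# One "trial" that differs from the base configuration only in its learning rate.
trial = set_by_path(base, "train_config.optimizer_config.arguments.lr", 0.01)
```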

## Configure your experiments

Now let's combine everything to configure your experiment!

<div class="alert alert-info">

Note

For predefined config classes in Ablator, the tables above summarize the attributes that you can include when creating config objects. You can also inspect these in the attribute sections of the [Configuration module](../config.rst) documentation.

</div>

In [None]:
try:
    import ablator
except ImportError:
    !pip install ablator
    # Restart the runtime after installing so that the new package is picked up.
    print("Stopping RUNTIME! Please run again")
    import os

    os.kill(os.getpid(), 9)

As a good practice, we usually configure the model first. You can skip this step if you are not running an ablation study on the model architecture.

In this example, we create a configuration class `MyModelConfig` for a simple 1-layer neural network model with the following hyperparameters: input size (to be inferred); hidden layer dimension, activation function, and dropout rate (all of which are stateful - discussed below). This configuration then will be used to construct the neural network model `MyCustomModel`:

In [None]:
from ablator import RunConfig, ModelConfig, Derived, configclass

import torch.nn as nn
import torch

@configclass
class MyModelConfig(ModelConfig):
    inp_size: Derived[int]
    hidden_dim: int
    activation: str
    dropout: float

# Construct the model using the configuration
class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
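        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()
        else:
            # Fail early instead of hitting an AttributeError in forward().
            raise ValueError(f"Unsupported activation: {config.activation}")
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x: torch.Tensor, labels: torch.Tensor):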
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)
        
        loss = self.criterion(out, labels)

        return {"preds": out, "labels": labels}, loss

my_model_config = MyModelConfig(hidden_dim=100, activation="relu", dropout=0.3)

Notice how the `forward` method returns a dictionary with the model's predictions and labels, followed by the loss value.

We next create a training configuration, which requires an optimizer config and an optional scheduler config. Here we create `optimizer_config` for an `SGD` optimizer and `scheduler_config` for a `OneCycleLR` scheduler, and then use them in `train_config`:

In [2]:
from ablator import OptimizerConfig, SchedulerConfig
from ablator import TrainConfig

optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})
scheduler_config = SchedulerConfig(name="cycle", arguments={"max_lr": 0.5, "total_steps": 50})

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=scheduler_config,
)
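
When the `cycle` scheduler steps on every training iteration (`step_when="train"`), `total_steps` is naturally tied to the number of training iterations. The sketch below shows that arithmetic under stated assumptions: `total_train_steps` is a hypothetical helper, and the dataset size of 3200 samples is an invented value chosen so that the result matches the `total_steps=50` used above.

```python
import math


def total_train_steps(n_samples: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Number of training iterations over all epochs."""
    if drop_last:
        per_epoch = n_samples // batch_size  # the incomplete final batch is skipped
    else:
        per_epoch = math.ceil(n_samples / batch_size)  # the final batch may be smaller
    return per_epoch * epochs


# Hypothetical dataset of 3200 samples: 25 batches per epoch, 2 epochs.
steps = total_train_steps(n_samples=3200, batch_size=128, epochs=2)
print(steps)  # 50
```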

The last step is to create a `run_config` object. This object combines `train_config` and `my_model_config`, along with runtime settings like verbosity and device. However, we first need to redefine the run config class to update its `model_config` attribute from `ModelConfig` (by default) to `MyModelConfig`:

In [4]:
@configclass
class CustomRunConfig(RunConfig):
    model_config: MyModelConfig

run_config = CustomRunConfig(
    train_config=train_config,
    model_config=my_model_config,
    verbose="silent",
    device="cpu",
)
run_config

CustomRunConfig(model_config={'inp_size': None, 'hidden_dim': 100, 'activation': 'relu', 'dropout': 0.3}, experiment_dir=None, random_seed=None, train_config={'dataset': 'test', 'batch_size': 128, 'epochs': 2, 'optimizer_config': {'name': 'sgd', 'arguments': {'weight_decay': 0.0, 'momentum': 0.0, 'lr': 0.1}}, 'scheduler_config': {'name': 'cycle', 'arguments': {'max_lr': 0.5, 'total_steps': 50, 'step_when': 'train'}}}, keep_n_checkpoints=3, tensorboard=True, amp=True, device='cpu', verbose='silent', eval_subsample=1.0, metrics_n_batches=32, metrics_mb_limit=10000, early_stopping_iter=None, eval_epoch=1.0, log_epoch=1.0, init_chkpt=None, warm_up_epochs=1.0, divergence_factor=10.0, optim_metrics=None, optim_metric_name=None)

That's it, we have finished configuring our experiment! With this, we are half-way to launching an ablation experiment. Refer to [Prototyping models](./Prototyping-models.ipynb) and [Ablation experiment](./HPO-tutorial.ipynb) tutorials for the next steps after configuration to launch the experiment.

<div class="alert alert-info">

Note

All configuration classes (including custom ones that you may create) must inherit from `ConfigBase` and be decorated with the `@configclass` decorator. You can see this in the `MyModelConfig` and `CustomRunConfig` classes in the example above.

</div>

## Ablator custom data types for stateful experiment design

One key feature of Ablator is the ability to run stateful experiments. To support this, Ablator defines three special types: **Stateless**, **Stateful**, and **Derived**. These custom annotations declare how each configuration attribute relates to the experiment state (which can be Complete, Running, Pending, Pruned, etc.; read more about experiment state in our [paper](https://iordanis.me/data/ablator.pdf)). In particular, they mark which attributes the experiment state is agnostic to, i.e. attributes that have no impact on it.

- **Stateless** attributes can take different values between trials or experiments. For example, learning rate should be stateless, as we can train models with different learning rates. Note that if you're declaring a variable to be Stateless, it must be assigned an initial value before launching the experiment.

- **Stateful** attributes, in contrast to **Stateless** ones, must have the same value between different experiments. For example, a binary classification model should always have an output size of 2. Stateful variables are declared with a plain data type (no annotation needed) and must be assigned values before launching the experiment.

- **Derived** attributes are **Stateful** and are undecided at the start of the experiment. Their values are determined by internal experiment processes that can depend on other experimental attributes, e.g. the model input size, which depends on the dataset.

Later, when annotating attributes, you can wrap these keywords around their data type. For example, `inp_size: Derived[int]` means that `inp_size` is a derived attribute of type `int`, and similarly for Stateless. Stateful needs no wrapper: any attribute annotated with neither Derived nor Stateless is considered Stateful.
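
The practical consequence of the Stateless/Stateful distinction can be sketched with a plain comparison of two configurations. This is a conceptual illustration only: `stateful_mismatches` is a hypothetical helper working on flat dictionaries, not how Ablator compares configurations. The keys `device` and `verbose` are treated as stateless because the `RunConfig` table above annotates them with `Stateless`.

```python
def stateful_mismatches(saved: dict, current: dict, stateless_keys: set) -> list:
    """Keys whose values differ and that are not marked as stateless."""
    return sorted(
        key
        for key in saved.keys() | current.keys()
        if key not in stateless_keys and saved.get(key) != current.get(key)
    )


saved = {"hidden_dim": 100, "dropout": 0.3, "device": "cuda", "verbose": "console"}
current = {"hidden_dim": 100, "dropout": 0.5, "device": "cpu", "verbose": "silent"}

# Changing device or verbosity is harmless; changing dropout alters the experiment.
mismatches = stateful_mismatches(saved, current, stateless_keys={"device", "verbose"})
print(mismatches)  # ['dropout']
```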

We also defined other structural data types such as List, Dict, Enum, etc. You can find more information about all custom data types (including the three above types) in the [Data types configuration](../ablator.config.types.rst) module documentation.

<div class="alert alert-info">

Note

- These annotations exist because, for stateful experiment design, the configuration should be unambiguous at initialization. Using the annotations ensures that it is.
- If you are interested in learning more about stateful experiment design, see our paper: [ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments](https://iordanis.me/data/ablator.pdf)

</div>

## Alternatives to constructing configuration objects

There are three methods to configure an experiment: named arguments, file-based, and dictionary-based. All previous code snippets are examples of the named-arguments method. Now let's look at how the file-based and dictionary-based methods work.

### File-based

File-based configuration lets you keep configuration values in simple files. Use the `<ConfigClass>.load(path/to/yaml/file)` method to create a configuration object with the values provided in the config file.

To write these config files, simply follow `yaml` syntax. Make sure that **the attributes and their hierarchy match with those in the config classes** (for both default config classes from ablator, or custom ones like `MyModelConfig`). The following example shows what a config yaml file looks like. We will name it `config.yaml`:

```yaml
experiment_dir: "/tmp/dir"
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
  scheduler_config:
    name: cycle
    arguments:
      max_lr: 0.5
      total_steps: 50
model_config:
  inp_size: 50
  hidden_dim: 100
  activation: "relu"
  dropout: 0.15
verbose: "silent"
device: "cpu"
```

Now in your code, load these values to create the config object:

```python
config = CustomRunConfig.load("path/to/yaml/file")
```

Note that since we created a custom running configuration class `CustomRunConfig` that is tied to the custom model config in the previous sections, we used `CustomRunConfig.load("path/to/yaml/file")` to load configuration from file. Otherwise, if you're not creating any subclasses, simply run `RunConfig.load("path")` or `ParallelConfig.load("path")`.

### Dictionary-based

This alternative is similar to the file-based method, but the configuration is defined in a dictionary instead of a yaml file. The dictionary is then passed (as keyword arguments) to the configuration class at initialization:

```python
configuration = {
    "experiment_dir": "/tmp/dir",
    "train_config": {
        "dataset": "test",
        "batch_size": 128,
        "epochs": 2,
        "optimizer_config":{
            "name": "sgd",
            "arguments": {
                "lr": 0.1
            }
        },
        "scheduler_config":{
            "name": "cycle",
            "arguments":{
                "max_lr": 0.5,
                "total_steps": 50
            }
        }
    },
    "model_config": {
        "inp_size": 50,
        "hidden_dim": 100,
        "activation": "relu",
        "dropout": 0.15
    },
    "verbose": "silent",
    "device": "cpu"
}

config = CustomRunConfig(
    **configuration
)
```
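
One convenience of the dictionary-based method is that variants of a configuration can be derived from a base dictionary before it is passed to the config class. The sketch below uses a hypothetical `deep_update` helper (not part of Ablator) to override a single nested value while leaving the base dictionary untouched; the shortened `base` dictionary is only for illustration.

```python
from copy import deepcopy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with nested values from `overrides` applied."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)  # recurse into sub-configs
        else:
            merged[key] = deepcopy(value)
    return merged


base = {
    "train_config": {
        "batch_size": 128,
        "optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}},
    },
    "device": "cpu",
}

# Override only the learning rate; everything else is inherited from `base`.
variant = deep_update(base, {"train_config": {"optimizer_config": {"arguments": {"lr": 0.01}}}})
```

Applied to a complete dictionary like `configuration` above, the merged result can be unpacked into the config class in the same way, e.g. `CustomRunConfig(**variant)`.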

## Conclusion

Now that you've learned how to configure experiments, you can start creating your own prototype. In the next chapter, we will learn how to write a prototype model, define the necessary configurations and model interfaces, and launch the experiment.