# Configuration basics

Ablator has the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.

Important configuration classes are presented below. Note that you can either store all these parameters in a YAML config file or define them directly with the configuration classes. You can refer to [these examples]() to see how both approaches work in practice.
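
To make the idea of composition and override concrete, here is a minimal plain-Python sketch. It does not use ablator's API: `deep_update`, `parse_value`, and `apply_cli_override` are hypothetical helpers, and the `key.path=value` argument format is an assumption made for illustration. Defaults are overridden first by values that would come from a config file, then by a command-line style argument.

```python
import copy
import json


def deep_update(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_value(raw: str):
    """Interpret numbers, booleans and null as JSON; keep anything else as a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_cli_override(config: dict, argument: str) -> dict:
    """Apply one hypothetical 'a.b.c=value' override and return a new dict."""
    path, raw_value = argument.split("=", 1)
    *parents, leaf = path.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)
    return updated


# Lowest priority: defaults. Then file values. Highest priority: command line.
defaults = {"device": "cpu", "train_config": {"batch_size": 32, "epochs": 10}}
from_file = {"train_config": {"epochs": 2}}

config = deep_update(defaults, from_file)
config = apply_cli_override(config, "train_config.batch_size=128")
print(config)
```

The later a source is applied, the higher its priority, and the original `defaults` dictionary is left untouched.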

## Configuration parameters

In our framework, configuration is divided into several categories: running configs (either ordinary running configs or parallel training configs), model configs, training configs, optimizer configs, and scheduler configs. The details of these configurations are summarized below.

### RunConfig

`RunConfig` sets up the configuration for running the experiment, for example where the experiment artifacts are stored, which device to use (GPU or CPU), and when to run validation and log training progress.

`RunConfig` is passed as an argument when initializing the trainer object:
```
config = RunConfig(
    train_config=train_config,
    model_config=CustomModelConfig(),
    verbose="silent",
    device="cpu",
    amp=False,
    ...
)

ablator = ProtoTrainer(wrapper=wrapper, run_config=config)
```
The table below summarizes the parameters that you can use. Note that `RunConfig` requires `TrainConfig` and `ModelConfig` objects to be provided at initialization.

| Parameter           | Usage                                                                                                                                                                                                                                                                                                             |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| experiment_dir      | location to store experiment artifacts.                                                                                                                                                                                                                                                                           |
| random_seed         | random seed.                                                                                                                                                                                                                                                                                                      |
| train_config        | training configuration. (check ``TrainConfig`` for more details)                                                                                                                                                                                                                                                  |
| model_config        | model configuration. (check ``ModelConfig`` for more details)                                                                                                                                                                                                                                                     |
| keep_n_checkpoints  | number of latest checkpoints to keep.                                                                                                                                                                                                                                                                             |
| tensorboard         | whether to use tensorboardLogger.                                                                                                                                                                                                                                                                                 |
| amp                 | whether to use automatic mixed precision when running on gpu.                                                                                                                                                                                                                                                     |
| device              | device to run on.                                                                                                                                                                                                                                                                                                 |
| verbose             | verbosity level.                                                                                                                                                                                                                                                                                                  |
| eval_subsample      | fraction of the dataset to use for evaluation.                                                                                                                                                                                                                                                                    |
| metrics_n_batches   | max number of batches stored in every tag(train, eval, test) for evaluation.                                                                                                                                                                                                                                      |
| metrics_mb_limit    | max number of megabytes stored in every tag(train, eval, test) for evaluation.                                                                                                                                                                                                                                    |
| early_stopping_iter | The maximum allowed difference between the current iteration and the last <br />iteration with the best metric before applying early stopping. Early stopping <br />will be triggered if the difference ``(current_itr-best_itr)`` exceeds ``early_stopping_iter``.<br />If set to ``None``, early stopping will not be applied. |
| eval_epoch          | The epoch interval between two evaluations.                                                                                                                                                                                                                                                                       |
| log_epoch           | The epoch interval between two logging.                                                                                                                                                                                                                                                                           |
| init_chkpt          | path to a checkpoint to initialize the model with.                                                                                                                                                                                                                                                                |
| warm_up_epochs      | number of epochs marked as warm up epochs.                                                                                                                                                                                                                                                                        |
| divergence_factor   | if ``cur_loss > best_loss > divergence_factor``, the model is considered <br />to have diverged.                                                                                                                                                                                                                        |
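
The `early_stopping_iter` rule from the table can be sketched in a few lines of plain Python. This is only an illustration of the stated condition `(current_itr - best_itr) > early_stopping_iter`, not ablator's implementation: `should_stop_early` is a hypothetical helper, and what counts as an "iteration" here (one evaluation per list entry) is an assumption.

```python
from typing import Optional


def should_stop_early(
    current_itr: int, best_itr: int, early_stopping_iter: Optional[int]
) -> bool:
    """Return True when the best metric is older than the allowed gap."""
    if early_stopping_iter is None:
        # Early stopping is disabled.
        return False
    return (current_itr - best_itr) > early_stopping_iter


# Made-up validation losses, one per evaluation; lower is better.
val_losses = [0.9, 0.7, 0.8, 0.75, 0.72, 0.71]

best_loss, best_itr, stopped_at = float("inf"), 0, None
for itr, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_itr = loss, itr
    if should_stop_early(itr, best_itr, early_stopping_iter=2):
        stopped_at = itr
        break

print(best_itr, stopped_at)
```

In this run the best loss is reached at iteration 1, and the loop stops at iteration 4, the first point where the gap (3) exceeds the limit (2).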

### ParallelConfig

This configuration is unique to `ablator`. It is a subclass of `RunConfig` that adds further arguments for configuring parallel training (horizontal scaling of a single experiment). It also defines the settings of distributed training in the experiment, e.g. the total number of trials, the number of trials to run concurrently, and the target metrics to optimize.

| Parameter             | Usage                                                                                                                                              |
|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| total_trials          | total number of trials.                                                                                                                            |
| concurrent_trials     | number of trials to run concurrently.                                                                                                              |
| search_space          | search space for hyperparameter search,eg. ``{"train_config.optimizer_config.arguments.lr": SearchSpace(value_range=[0, 10], value_type="int"),}`` |
| optim_metrics         | metrics to optimize, eg. ``{"val_loss": "min"}``                                                                                                   |
| search_algo           | type of search algorithm.                                                                                                                          |
| ignore_invalid_params | whether to ignore invalid parameters when sampling.                                                                                                |
| remote_config         | remote storage configuration.                                                                                                                      |
| gcp_config            | gcp configuration.                                                                                                                                 |
| gpu_mb_per_experiment | gpu resource to assign to an experiment.                                                                                                           |
| cpus_per_experiment   | cpu resource to assign to an experiment.                                                                                                           |

`search_space` deserves special mention: it defines the set of values (categorical, or a discrete/continuous range) for each hyperparameter that you want to ablate. Refer to [this]() to learn more about how to use it for ablation.
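
To illustrate how a dotted key such as `train_config.optimizer_config.arguments.lr` can address a nested hyperparameter, here is a minimal sketch in plain Python. It is not ablator's sampler: the search space is written as ordinary dictionaries rather than `SearchSpace` objects, and `sample_value`, `set_by_path`, and the `categorical_values` key are hypothetical names chosen for this example.

```python
import copy
import random


def sample_value(space: dict, rng: random.Random):
    """Draw one value from a categorical set or a numeric range."""
    if "categorical_values" in space:
        return rng.choice(space["categorical_values"])
    low, high = space["value_range"]
    if space["value_type"] == "int":
        return rng.randint(low, high)
    return rng.uniform(low, high)


def set_by_path(config: dict, dotted_path: str, value) -> None:
    """Assign `value` to the nested key named by 'a.b.c' (keys must exist)."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for name in parents:
        node = node[name]
    node[leaf] = value


base_config = {
    "train_config": {"optimizer_config": {"name": "sgd", "arguments": {"lr": 0.1}}}
}
search_space = {
    "train_config.optimizer_config.arguments.lr": {
        "value_range": [0.001, 0.1],
        "value_type": "float",
    },
    "train_config.optimizer_config.name": {"categorical_values": ["sgd", "adam"]},
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = []
for _ in range(3):
    trial = copy.deepcopy(base_config)  # each trial starts from the same base
    for path, space in search_space.items():
        set_by_path(trial, path, sample_value(space, rng))
    trials.append(trial)

for trial in trials:
    print(trial["train_config"]["optimizer_config"])
```

Each trial is an independent copy of the base configuration with only the searched parameters replaced.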

### ModelConfig

This configuration can be used to add parameters specific to the type of model you are using. One sample use case is trying different model sizes, e.g. different numbers of layers, different activation functions, or different dropout ratios. By creating a `ModelConfig` class for your model, `ablator` is able to create a search space over these parameters and hence run hyperparameter optimization.

Another example is when we have models of different sizes, each of which comes with a different set of pretrained weights. A config class can encapsulate these models and their corresponding weight sets.

There are 2 extra steps you need to take if you create a custom `ModelConfig`. First, since the config defines the model parameters to be customized, you have to pass it to the model module's constructor and build the model from its parameters. Second, as the running config (either `RunConfig` or `ParallelConfig`) has a `model_config` attribute of type `ModelConfig`, you will need to create a custom running config class as well (decorated with the `configclass` decorator) that changes the type of `model_config` to your custom model config class. Only after this can the running config object be used in the ablator launcher.

Note that in the model config class, all arguments should be defined with the `Stateless` or `Derived` data type. These are custom Python annotations that define attributes to which the experiment state is agnostic.
- Stateless configuration attributes can be used as a proxy for variables that can take different value assignments between trials or experiments. For example, the learning rate can be set as an independent variable and must be annotated as stateless. Additionally, there are variables that take different values between experiments and trials but to which the state is agnostic. For example, a random seed or a directory path that differs between execution environments can be annotated as stateless.
- Derived attributes are undecided at the start of the experiment and do not require a value assignment. Instead, the value is determined by internal experiment processes that can depend on other experimental attributes, such as the dataset. However, given the same initial state, the attribute is expected to result in the same value and is therefore deterministic. For example, the input size used in a model's architecture that depends on the dataset will be annotated as Derived during the experiment design phase.
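
Independently of ablator's annotations, the idea behind a derived attribute can be sketched with a standard-library dataclass. This is only an analogy under stated assumptions: `SketchModelSettings` and `derive_inp_size` are hypothetical names, and the "dataset" is a plain list of feature rows.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class SketchModelSettings:
    hidden_dim: int                  # chosen up front, may differ between trials
    dropout: float                   # chosen up front, may differ between trials
    inp_size: Optional[int] = None   # unknown until the dataset is inspected


def derive_inp_size(
    settings: SketchModelSettings, dataset: List[List[float]]
) -> SketchModelSettings:
    """Fill in `inp_size` from the dataset; same dataset gives the same value."""
    return replace(settings, inp_size=len(dataset[0]))


settings = SketchModelSettings(hidden_dim=16, dropout=0.1)
dataset = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # two rows, four features

derived = derive_inp_size(settings, dataset)
print(settings.inp_size, derived.inp_size)
```

The value is not assigned when the settings object is created, yet it is fully determined by the dataset, which is the property the `Derived` annotation describes.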

The code snippet below shows a concrete example of a simple 1-layer neural network model, with injected configuration for input size, hidden layer dimension, activation function, and dropout rate.

```
from ablator import RunConfig, ModelConfig, Stateless, Derived, configclass

import torch.nn as nn
import torch

class MyModelConfig(ModelConfig):
    inp_size: Derived[int]
    hidden_dim: Stateless[int]
    activation: Stateless[str]
    dropout: Stateless[float]

@configclass
class CustomRunConfig(RunConfig):
    model_config: MyModelConfig

class MyCustomModel(nn.Module):
    def __init__(self, config: MyModelConfig) -> None:
        super().__init__()
        self.linear = nn.Linear(config.inp_size, config.hidden_dim)
        self.dropout = nn.Dropout(config.dropout)
        if config.activation == "relu":
            self.activate = nn.ReLU()
        elif config.activation == "elu":
            self.activate = nn.ELU()

    def forward(self, x: torch.Tensor):
        out = self.linear(x)
        out = self.dropout(out)
        out = self.activate(out)

        return {"preds": out, "labels": out}, x.sum().abs()
```

### TrainConfig

This configuration class defines everything that is related to the main training process of your model, which includes dataset name, batch size, number of epochs, optimizer, scheduler.

| Parameter         | Usage                                                                 |
|-------------------|-----------------------------------------------------------------------|
| dataset           | dataset name. may be used in custom dataset loader functions.         |
| batch_size        | batch size.                                                           |
| epochs            | number of epochs to train.                                            |
| optimizer_config  | optimizer configuration. (check ``OptimizerConfig`` for more details) |
| scheduler_config  | scheduler configuration. (check ``SchedulerConfig`` for more details) |
| rand_weights_init | whether to initialize model weights randomly.                         |

### OptimizerConfig

`OptimizerConfig` is a config class that lets the user choose the optimizer they want: `Adam`, `SGD`, or `AdamW`.

### SchedulerConfig


## Using Configuration

The ablator trainer requires a model wrapper and a running config at initialization. After that, the experiment can be launched via `trainer.launch()`.
```
trainer = ProtoTrainer(wrapper=model_wrapper, run_config=run_config)
trainer.launch()
```
This tutorial focuses on helping you define a running configuration `run_config`.

Apart from the primitive-type parameters, many of which have default values (refer to the summary tables above for what each one expects), the running configuration requires a training configuration and a model configuration, as shown in the `RunConfig` table, so you must provide it with these configuration objects. The training configuration in turn requires an optimizer config. These are the configuration objects that you should create; they are simply a way to group parameters by their functional role in machine learning.

Take the code snippet below as an example. `optimizer_config` specifies the optimizer name and arguments, while `train_config` sets up the dataset, batch size, and number of epochs, and references the optimizer configuration. The `config` object then combines `train_config` and a model configuration with runtime settings such as verbosity and device. This `config` is what gets passed, together with a model wrapper, to the `ProtoTrainer` shown above; calling `launch()` then starts the experiment, and any metrics it returns can be stored in a variable such as `metrics`.

```
from ablator import OptimizerConfig, TrainConfig, RunConfig, ModelConfig

optimizer_config = OptimizerConfig(name="sgd", arguments={"lr": 0.1})

train_config = TrainConfig(
    dataset="test",
    batch_size=128,
    epochs=2,
    optimizer_config=optimizer_config,
    scheduler_config=None,
)

config = RunConfig(
    train_config=train_config,
    model_config=ModelConfig(),
    verbose="silent",
    device="cpu",
)
```

There are 3 ways to provide values to the configurations: Named arguments, file-based, or dictionary-based.

### Named arguments

What you saw in the example above is the named-argument method: you create the configuration objects directly and provide the config values as you initialize them.

Parallel config:

Here `ModelConfig` can be customized to your use case. Note that if you customize the model configuration as a new config class, you will need to redefine the running configuration's `model_config` to be of the newly created model config class.



### File-based

Define the configs in YAML files.

Then load them with `RunConfig.load(path)`.
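
As a rough sketch of the file-based approach, the snippet below writes a YAML file whose keys mirror the named arguments used earlier in this tutorial. The exact layout that ablator expects is an assumption here (in particular the nesting and the empty `model_config` mapping), as are the file name and the example `experiment_dir` path. The loading call is shown only as a comment because it needs ablator installed.

```python
import tempfile
from pathlib import Path

# Assumed layout: one top-level key per RunConfig parameter, with nested
# configs written as nested mappings. Values match the named-argument example.
CONFIG_YAML = """\
experiment_dir: /tmp/experiments
device: cpu
verbose: silent
model_config: {}
train_config:
  dataset: test
  batch_size: 128
  epochs: 2
  optimizer_config:
    name: sgd
    arguments:
      lr: 0.1
"""

# Write the file into a fresh temporary directory (hypothetical file name).
config_path = Path(tempfile.mkdtemp()) / "config.yaml"
config_path.write_text(CONFIG_YAML)
print(config_path.read_text())

# With ablator installed, the file would then be loaded as described above:
# from ablator import RunConfig
# config = RunConfig.load(str(config_path))
```

Keeping the values in a file makes it possible to change an experiment without touching the code that builds the trainer.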

### Dictionary based

In the next chapter, you will learn how to train a model using these configurations in ablator.