# ConfigState example: training a model with TensorFlow

This notebook shows how the config-state library can be used to design a machine learning experiment that trains an image classification model. We show how the different components (the dataset, the model and the optimizer) can be configured and modified through a config file without writing code. We also show how the experiment can be saved at regular intervals and resumed after an interruption.

### Requirements

The packages `tensorflow` and `tensorflow-datasets` are required for this example:
```
pip install tensorflow
pip install tensorflow-datasets
```

### The `MLExperiment` class

The `MLExperiment` class is a `ConfigState` subclass that defines the experiment consisting of training a machine learning model for image classification. It is composed of nested `ConfigState` objects that represent the different components such as `Dataset`, `Model` and `Optimizer`.

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # reduce tensorflow's verbosity

from examples.tensorflow.experiment import MLExperiment

### Configuring an `MLExperiment`
The directory `examples/tensorflow/configs` contains examples of configuration files that can be used to configure an experiment. For example:
```yaml
dataset:
  name: mnist
  batch_size: 32
model:
  class: MultiLayerPerceptron
  structure: [128]
  dropout_rate: 0.2
optimizer:
  class: RMSprop
  learning_rate: 0.001
```
We can load it and instantiate an experiment with it:

In [2]:
import yaml

# open the file in a context manager so that it is closed after loading
with open("tensorflow/configs/mlp.yml", 'r') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

experiment = MLExperiment(config)

print(experiment.config_summary())

dataset:
  batch_size: 32
  name: mnist
model:
  class: MultiLayerPerceptron
  dropout_rate: 0.2
  input_shape: (28, 28, 1)
  output_units: 10
  structure: [128]
optimizer:
  class: RMSprop
  epsilon: 1e-07
  learning_rate: 0.001
  momentum: 0.0
  rho: 0.9
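
The summary above contains more fields than the YAML file: fields the user did not set, such as `rho` or `momentum`, appear with default values. The sketch below illustrates that idea with plain dictionaries. It is not how config-state implements it; `DEFAULTS` and `merge_defaults` are hypothetical names, and the default values are simply copied from the printed summary.

```python
from copy import deepcopy

# Hypothetical defaults, copied from the summary printed above.
DEFAULTS = {
    "dataset": {"batch_size": 32},
    "optimizer": {"epsilon": 1e-07, "momentum": 0.0, "rho": 0.9},
}


def merge_defaults(user, defaults):
    """Return a new dict where keys missing from `user` come from `defaults`."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into nested sections such as 'optimizer'.
            merged[key] = merge_defaults(value, merged[key])
        else:
            # Values set by the user always win over defaults.
            merged[key] = deepcopy(value)
    return merged


user_config = {
    "dataset": {"name": "mnist", "batch_size": 32},
    "optimizer": {"class": "RMSprop", "learning_rate": 0.001},
}
full_config = merge_defaults(user_config, DEFAULTS)
print(full_config["optimizer"])
```

The user's dictionary is left untouched, so the same partial config can be reused to build several experiments.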



We can now start training the model for a given number of epochs:

In [3]:
experiment.run(epochs=2)

Training for 2 epochs...
Epoch 1/2
Epoch 2/2
Training finished


### Saving and restoring an experiment

The current state of the experiment can be saved to a file:

In [4]:
from config_state import Serializer
import tempfile
from pathlib import Path

# create a temporary directory
temp_dir = tempfile.TemporaryDirectory()

# file that will store the experiment
file_path = Path(temp_dir.name) / 'exp.save'

# save the experiment using the Pickle serializer
Serializer({'class': 'Pickle'}).save(experiment, file_path)

The experiment can be restored and resumed:

In [5]:
experiment = Serializer({'class': 'Pickle'}).load(file_path)

experiment.run(epochs=2)

temp_dir.cleanup()

Training for 2 epochs...
Epoch 3/4
Epoch 4/4
Training finished
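
The restored experiment continues at epoch 3 because the number of completed epochs is part of the saved state. The following self-contained sketch reproduces that behaviour with the standard `pickle` module. `ToyExperiment` is a hypothetical stand-in that only counts epochs; it does not use `Serializer` or `MLExperiment` and makes no claim about their internals.

```python
import pickle
import tempfile
from pathlib import Path


class ToyExperiment:
    """Hypothetical stand-in for an experiment: it only counts epochs."""

    def __init__(self, epochs_done=0):
        self.epochs_done = epochs_done

    def run(self, epochs):
        """Run `epochs` more epochs and return the progress lines."""
        target = self.epochs_done + epochs
        log = []
        while self.epochs_done < target:
            self.epochs_done += 1
            log.append(f"Epoch {self.epochs_done}/{target}")
        return log


exp = ToyExperiment()
first_log = exp.run(epochs=2)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "exp.save"
    # Save only the state needed to resume.
    with open(path, "wb") as f:
        pickle.dump({"epochs_done": exp.epochs_done}, f)
    # Later, possibly in another process: rebuild from the saved state.
    with open(path, "rb") as f:
        restored = ToyExperiment(**pickle.load(f))

second_log = restored.run(epochs=2)
print(second_log)
```

Only load pickle files that you created yourself: unpickling data from an untrusted source can execute arbitrary code.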


### Configuration grammar

We can customize the config and change the different components of the experiment. The set of valid configurations depends on an underlying grammar that is determined by the definition of the `ConfigState` objects. In our example, the model is defined using the `Model` class:

```python
@builder
class Model(ConfigState):
  input_shape: Tuple[int] = ConfigField(..., "Input shape of the model", type=tuple)
  output_units: Optional[int] = ConfigField(..., "Model's output units count", type=int)
    
  @abstractmethod
  def _build_keras_model(self) -> tf.keras.Model:
    """Build the keras model"""
```

It is an abstract class decorated with `@builder`, meaning that it is used as a factory to build concrete subclasses representing the different model architectures that we would like to be able to instantiate. Subclasses decorated with `@register` can be built using the factory. For example, we can define a class to build multilayer perceptron models:

```python
@register
class MultiLayerPerceptron(Model):
  structure: List[int] = ConfigField([128], "hidden structure of the MLP")
  dropout_rate: float = ConfigField(0.0, "Dropout rate applied on the last "
                                         "hidden layer.")

  def _build_keras_model(self) -> tf.keras.Model:
    ...
```
`MultiLayerPerceptron` has `ConfigField` attributes that define its configuration interface. For instance, `structure` is a config field holding a list of integers that specifies the number and size of the hidden layers. In the same way, we can define a class to build CNNs:

```python
@register
class CNN(Model):
  structure: List[Union[int, str]] = ConfigField([32, 'max', 64, 'max', 64],
                                                 "Convolutional structure. "
                                                 "Conv2D layers units "
                                                 "are integers, pooling "
                                                 "layers type are str among "
                                                 "'max' or 'average'.")
  def _build_keras_model(self) -> tf.keras.Model:
    ...
```
Now the grammar for the `structure` configuration field is different: it is a sequence of integers and strings. Integers give the number of units of `Conv2D` layers, while strings give the type of pooling layer, either 'max' or 'average' for `MaxPooling2D` or `AveragePooling2D` respectively. This is a convenient way to represent an interleaved sequence of convolutional and pooling layers. It is up to the developer to design a grammar rich enough to allow a large set of valid configurations.
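
To make the `structure` grammar concrete, here is a minimal sketch of how such a list could be validated and turned into layer descriptions. It deliberately avoids TensorFlow; `parse_cnn_structure` is a hypothetical helper written for this illustration, not the code behind `CNN._build_keras_model`.

```python
def parse_cnn_structure(structure):
    """Translate a structure list into (layer kind, argument) pairs.

    Integers become ('conv', units); 'max' and 'average' become
    ('pool', kind). Anything else is rejected.
    """
    layers = []
    for item in structure:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, int) and not isinstance(item, bool):
            if item <= 0:
                raise ValueError(f"Conv2D units must be positive, got {item}")
            layers.append(("conv", item))
        elif item in ("max", "average"):
            layers.append(("pool", item))
        else:
            raise ValueError(f"Invalid structure element: {item!r}")
    return layers


print(parse_cnn_structure([32, "max", 64, "max", 64]))
```

Rejecting invalid elements early, when the config is parsed, gives the user an error that points at the config rather than at the model-building code.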

Here are some valid model configurations that can be defined using our grammar:
```yaml
model:
  class: CNN
  structure: [32, 32, 'average', 64, 64, 'max', 64, 128]
```
```yaml
model:
  class: CNN
  structure: [16, 'average', 32, 'max', 64, 128, 'max']
```
```yaml
model:
  class: MultiLayerPerceptron
  structure: [128, 64]
  dropout_rate: 0.5
```
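
In these configurations, the `class` key selects which registered subclass is built. The general pattern can be sketched as a registry that maps class names to classes. The code below is a toy version under that assumption: `register_model` and `build_model` are hypothetical names, and the real `@builder` and `@register` decorators may work differently.

```python
# Toy registry-based factory; all names here are illustrative only.
_MODEL_REGISTRY = {}


def register_model(cls):
    """Class decorator that makes `cls` buildable from its name."""
    _MODEL_REGISTRY[cls.__name__] = cls
    return cls


def build_model(config):
    """Instantiate the registered class named by config['class']."""
    params = dict(config)  # copy so the caller's config is not modified
    class_name = params.pop("class")
    if class_name not in _MODEL_REGISTRY:
        known = sorted(_MODEL_REGISTRY)
        raise ValueError(f"Unknown class {class_name!r}; known: {known}")
    return _MODEL_REGISTRY[class_name](**params)


@register_model
class MultiLayerPerceptron:
    def __init__(self, structure=(128,), dropout_rate=0.0):
        self.structure = structure
        self.dropout_rate = dropout_rate


@register_model
class CNN:
    def __init__(self, structure=(32, "max", 64, "max", 64)):
        self.structure = structure


model = build_model({
    "class": "MultiLayerPerceptron",
    "structure": [128, 64],
    "dropout_rate": 0.5,
})
print(type(model).__name__, model.structure, model.dropout_rate)
```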

The other components of the experiment follow the same principle. As long as users can write a valid configuration, they can instantiate and run a new experiment with little to no coding:

In [6]:
config = {}

config['dataset'] = {
    'name': 'cifar10' # https://www.tensorflow.org/datasets/catalog/overview#image_classification
}
config['model'] = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}
config['optimizer'] = {
    'class': 'Adam',
    'learning_rate': 0.001
}

experiment = MLExperiment(config)

print(experiment.config_summary())

experiment.run(epochs=20)

dataset:
  batch_size: 32
  name: cifar10
model:
  class: CNN
  input_shape: (32, 32, 3)
  output_units: 10
  structure: [32, max, 64, max, 64]
optimizer:
  beta_1: 0.9
  beta_2: 0.9999
  class: Adam
  epsilon: 1e-07
  learning_rate: 0.001

Training for 20 epochs...
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Training finished


### Configuration grammar and composability

One powerful feature of `ConfigState` is the ability to compose objects. For instance, we can define a model that is an ensemble of another base model:

```python
@register
class Ensembler(Model):
  model: Model = ConfigField(type=Model, doc="The model to be ensembled")
  ensemble_size: int = ConfigField(2, "Size of the ensemble", force_type=True)
  input_shape = ConfigField(model.input_shape)
  output_units = ConfigField(model.output_units)

  def _build_keras_model(self) -> tf.keras.Model:
    ...
```

In `Ensembler`, the `model` configuration field represents another `Model` to be used as the base component of the ensemble. This way, different models can be composed to represent more complex ones. Notice the `input_shape` and `output_units` fields: they are *references* that enable configuration settings to be injected into nested `ConfigState` objects. Tying the fields together in this way constrains the configuration grammar, reduces its verbosity and helps prevent inconsistent configurations. The following are valid ensemble model configs:

```yaml
model:
  class: Ensembler
  ensemble_size: 4
  model:
    class: MultiLayerPerceptron
    structure: [128]
```
```yaml
model:
  class: Ensembler
  ensemble_size: 4
  model:
    class: CNN
    structure: [32, 'max', 64, 'max', 64]
```
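
Neither config above sets `input_shape` or `output_units` on the nested model: the references take care of it. The effect can be pictured with plain dictionaries, as in the sketch below. `inject_references` is a hypothetical helper written for this illustration; config-state resolves references through `ConfigField` and its mechanism is not shown here.

```python
def inject_references(model_config, input_shape, output_units):
    """Return a copy of `model_config` where the shared settings are
    pushed down into every nested 'model' entry."""
    resolved = dict(model_config)
    resolved["input_shape"] = input_shape
    resolved["output_units"] = output_units
    if "model" in resolved:
        # The nested model receives the same values as its parent.
        resolved["model"] = inject_references(
            resolved["model"], input_shape, output_units)
    return resolved


ensemble_config = {
    "class": "Ensembler",
    "ensemble_size": 4,
    "model": {"class": "CNN", "structure": [32, "max", 64, "max", 64]},
}
resolved = inject_references(ensemble_config, (32, 32, 3), 10)
print(resolved["model"]["input_shape"], resolved["model"]["output_units"])
```

Because the nested values are derived from the parent, the user cannot write a config in which the ensemble and its base model disagree on shapes.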

Furthermore, since `Ensembler` is itself a `Model`, we can compose it into another `Ensembler` and define models that are ensembles of ensembles, and so on:

```yaml
model:
  class: Ensembler
  ensemble_size: 4
  model:
    class: Ensembler
    ensemble_size: 4
    model:
      class: CNN
      structure: [32, 'max', 64, 'max', 64]
```
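
Nesting multiplies the number of base models: an ensemble of 4 ensembles of 4 CNNs contains 16 CNNs. The sketch below counts base models in a config and derives the batch size seen by each one. It assumes that each batch is split evenly across ensemble members, which is what the batch-size comment in the ensemble example of this notebook suggests but is not confirmed by the `Ensembler` code shown here. Both helpers are hypothetical.

```python
def count_base_models(model_config):
    """Count the non-ensemble models contained in a model config."""
    if model_config.get("class") == "Ensembler":
        inner = count_base_models(model_config["model"])
        return model_config["ensemble_size"] * inner
    return 1


def per_model_batch_size(batch_size, model_config):
    """Batch size per base model, assuming an even split (an assumption)."""
    n_models = count_base_models(model_config)
    if batch_size % n_models != 0:
        raise ValueError(
            f"batch_size {batch_size} is not divisible by {n_models} models")
    return batch_size // n_models


cnn = {"class": "CNN", "structure": [32, "max", 64, "max", 64]}
ensemble = {"class": "Ensembler", "ensemble_size": 4, "model": cnn}
nested = {"class": "Ensembler", "ensemble_size": 4, "model": ensemble}

print(count_base_models(ensemble), per_model_batch_size(128, ensemble))
print(count_base_models(nested), per_model_batch_size(512, nested))
```

Under this assumption, batch sizes of 128 and 512 both give each CNN 32 elements per batch, matching the batch size used for the single CNN.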

The following examples configure and run experiments with ensemble models.

***Configuring an ensemble of CNNs:***

In [7]:
config = {
    'optimizer': {
        'class': 'Adam',
        'learning_rate': 0.001
    },
    'dataset': {
        'name': 'cifar10',
        'batch_size': 128  # Increase the batch size so that each ensembled model trains on batches of 32 elements
    }
}

cnn_model = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}

config['model'] = {
    'class': 'Ensembler',
    'model': cnn_model,
    'ensemble_size': 4
}

experiment = MLExperiment(config)

print(experiment.config_summary())

print(experiment.model.keras_model.summary())


dataset:
  batch_size: 128
  name: cifar10
model:
  class: Ensembler
  ensemble_size: 4
  input_shape: (32, 32, 3)
  model:
    class: CNN
    input_shape: (32, 32, 3)
    output_units: 10
    structure: [32, max, 64, max, 64]
  output_units: 10
optimizer:
  beta_1: 0.9
  beta_2: 0.9999
  class: Adam
  epsilon: 1e-07
  learning_rate: 0.001

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_6 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 [(None, 32, 32, 3),  0           input_6[0][0]                    
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 10)      

In [8]:
experiment.run(epochs=20)

Training for 20 epochs...
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Training finished


***Configuring an ensemble of ensembles of CNNs:***

In [9]:
config = {
    'optimizer': {
        'class': 'Adam',
        'learning_rate': 0.001
    },
    'dataset': {
        'name': 'cifar10',
        'batch_size': 512
    }
}

cnn_model = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}

ensemble = {
    'class': 'Ensembler',
    'model': cnn_model,
    'ensemble_size': 4
}

config['model'] = {
    'class': 'Ensembler',
    'model': ensemble,
    'ensemble_size': 4
}

experiment = MLExperiment(config)

# ensemble_ensemble_exp.model.model.output_units
print(experiment.config_summary())

print(experiment.model.keras_model.summary())


dataset:
  batch_size: 512
  name: cifar10
model:
  class: Ensembler
  ensemble_size: 4
  input_shape: (32, 32, 3)
  model:
    class: Ensembler
    ensemble_size: 4
    input_shape: (32, 32, 3)
    model:
      class: CNN
      input_shape: (32, 32, 3)
      output_units: 10
      structure: [32, max, 64, max, 64]
    output_units: 10
  output_units: 10
optimizer:
  beta_1: 0.9
  beta_2: 0.9999
  class: Adam
  epsilon: 1e-07
  learning_rate: 0.001

Model: "model_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_27 (InputLayer)           [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
lambda_10 (Lambda)              [(None, 32, 32, 3),  0           input_27[0][0]                   
__________________________________

In [10]:
experiment.run(epochs=1)

Training for 1 epochs...
Training finished
