# ConfigState example: training a model with TensorFlow

This notebook shows how the config-state library can be used to design a machine learning experiment that trains an image classification model. We show how the different components of the experiment (the dataset, the model and the optimizer) can be configured and modified through a config file, without writing any code. We also show how the experiment can be saved at regular intervals and resumed after an interruption.

### Requirements

The packages `tensorflow` and `tensorflow-datasets` are required for this example:
```
pip install tensorflow
pip install tensorflow-datasets
```
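
Before running the notebook, it can help to confirm that both packages are importable. The sketch below relies only on the standard library's `importlib.util.find_spec`; the helper name `check_requirements` is hypothetical. Note that the pip package `tensorflow-datasets` is imported under the module name `tensorflow_datasets`.

```python
import importlib.util


def check_requirements(modules=("tensorflow", "tensorflow_datasets")):
    """Return a mapping from module name to whether Python can find it."""
    # find_spec returns None for a top-level module that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```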

### The `MLExperiment` class

The `MLExperiment` class is a `ConfigState` subclass that defines the experiment consisting of training a machine learning model for image classification. It is composed of nested `ConfigState` objects that represent the different components such as `Dataset`, `Model` and `Optimizer`.
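
To make the idea of nested components concrete, here is a minimal pure-Python sketch of how a nested config dictionary, whose sub-dictionaries name a `class` to instantiate, can be turned into a tree of objects. It does not use the config-state library: the `REGISTRY`, `register`, `build` and the toy classes are hypothetical stand-ins chosen for illustration only.

```python
# Hypothetical sketch, NOT the config-state API: it only illustrates building
# nested components from a config dict whose sub-dicts name a class to instantiate.
REGISTRY = {}


def register(cls):
    """Make `cls` constructible from a config dict by its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(config, default_class):
    """Instantiate the class named by config['class'] (or `default_class`) with the remaining keys."""
    params = dict(config)  # copy so the caller's dict keeps its 'class' key
    cls = REGISTRY[params.pop("class", default_class)]
    return cls(**params)


@register
class Dataset:
    def __init__(self, name, batch_size=32):
        self.name = name
        self.batch_size = batch_size


@register
class MultiLayerPerceptron:
    def __init__(self, structure=(128,), dropout_rate=0.2):
        self.structure = list(structure)
        self.dropout_rate = dropout_rate


@register
class RMSprop:
    def __init__(self, learning_rate=0.001, rho=0.9):
        self.learning_rate = learning_rate
        self.rho = rho


class ComposedExperiment:
    """Top-level object composed of nested components, each built from its own sub-config."""

    def __init__(self, config):
        self.dataset = build(config["dataset"], "Dataset")
        self.model = build(config["model"], "MultiLayerPerceptron")
        self.optimizer = build(config["optimizer"], "RMSprop")


composed = ComposedExperiment({
    "dataset": {"name": "mnist"},
    "model": {"class": "MultiLayerPerceptron", "structure": [128]},
    "optimizer": {"class": "RMSprop", "learning_rate": 0.001},
})
print(type(composed.model).__name__, composed.dataset.batch_size)  # MultiLayerPerceptron 32
```

Because each component only sees its own sub-dictionary, swapping a component amounts to replacing one section of the config, which is the pattern the later cells of this notebook follow.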

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # reduce TensorFlow's verbosity

from examples.tensorflow.experiment import MLExperiment
```

### Configuring an `MLExperiment`
The directory `examples/tensorflow/configs` contains examples of configuration files that can be used to configure an experiment. Let's load one:

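```python
import yaml

# load one of the example configurations from examples/tensorflow/configs
with open("examples/tensorflow/configs/mlp.yml", "r") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
```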

An experiment can be instantiated using this configuration:

```python
experiment = MLExperiment(config)

print(experiment.config_summary())
```

```
dataset:
  batch_size: 32
  name: mnist
model:
  class: MultiLayerPerceptron
  dropout_rate: 0.2
  input_shape: (28, 28, 1)
  output_units: 10
  structure: [128]
optimizer:
  class: RMSprop
  epsilon: 1e-07
  learning_rate: 0.001
  momentum: 0.0
  rho: 0.9
```



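The summary above lists the full configuration, including values that were not in the config file (such as `input_shape` or the optimizer's `epsilon`), as an indented tree with keys sorted alphabetically. As a rough illustration only, assuming nothing about how `config_summary()` is actually implemented, a similar listing can be rendered from a nested dictionary with a short recursive function; `format_summary` is a hypothetical helper.

```python
def format_summary(config, indent=0):
    """Render a nested dict as indented 'key: value' lines, keys sorted alphabetically."""
    lines = []
    for key in sorted(config):
        value = config[key]
        prefix = "  " * indent + f"{key}:"
        if isinstance(value, dict):
            # section header; its entries are indented one level deeper
            lines.append(prefix)
            lines.extend(format_summary(value, indent + 1))
        else:
            lines.append(f"{prefix} {value}")
    return lines


example_config = {
    "dataset": {"name": "mnist", "batch_size": 32},
    "optimizer": {"class": "RMSprop", "learning_rate": 0.001},
}
print("\n".join(format_summary(example_config)))
```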

We can start training the model for a given number of epochs:

```python
experiment.run(epochs=2)
```

```
Training for 2 epochs...
Epoch 1/2
 112/1875 [>.............................] - ETA: 2s - loss: 0.8235 - accuracy: 0.7617
Epoch 2/2
Training finished
```


### Saving and restoring an experiment

The current state of the experiment can be saved to a file:

```python
from config_state import Serializer
import tempfile
from pathlib import Path

# create a temporary directory
temp_dir = tempfile.TemporaryDirectory()

# file that will store the experiment
file_path = Path(temp_dir.name) / 'exp.save'

# save the experiment using the Pickle serializer
Serializer({'class': 'Pickle'}).save(experiment, file_path)
```

The experiment can be restored and resumed:

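```python
# restore the experiment from the file, then continue training it
experiment = Serializer({'class': 'Pickle'}).load(file_path)

experiment.run(epochs=2)

# delete the temporary directory and the saved file
temp_dir.cleanup()
```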

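Resuming works because the saved object carries its whole training state, including how many epochs have already been completed. Judging only from the last training cell of this notebook, which prints `Training for 3 epochs...` followed by `Epoch 2/4`, `run(epochs=N)` appears to train up to a total of N epochs; that reading is an assumption, not documented behaviour. The sketch below illustrates this resume pattern with the standard library's `pickle` module instead of config-state's `Serializer`; `ToyExperiment` is a hypothetical stand-in.

```python
import pickle
import tempfile
from pathlib import Path


class ToyExperiment:
    """Hypothetical stand-in for an experiment whose state includes an epoch counter."""

    def __init__(self):
        self.completed_epochs = 0
        self.history = []  # stands in for model weights and optimizer state

    def run(self, epochs):
        # Assumption: `epochs` is the target total, so a resumed run skips finished epochs.
        remaining = max(epochs - self.completed_epochs, 0)
        print(f"Training for {remaining} epochs...")
        for epoch in range(self.completed_epochs + 1, epochs + 1):
            print(f"Epoch {epoch}/{epochs}")
            self.history.append(epoch)  # pretend one epoch of training happened
            self.completed_epochs = epoch
        print("Training finished")


with tempfile.TemporaryDirectory() as temp_dir:
    file_path = Path(temp_dir) / "toy.save"

    toy = ToyExperiment()
    toy.run(epochs=1)
    file_path.write_bytes(pickle.dumps(toy))  # save the whole object, state included

    restored = pickle.loads(file_path.read_bytes())  # e.g. after an interruption
    restored.run(epochs=4)  # continues with epochs 2, 3 and 4
```

Using a `with` block for the temporary directory removes it automatically, which replaces the explicit `temp_dir.cleanup()` call used above.
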
### Configuring a new experiment

We can customize the config dictionary to design a new experiment with a different dataset, model or optimizer:

```python
config['dataset'] = {
    'name': 'cifar10' # https://www.tensorflow.org/datasets/catalog/overview#image_classification
}
config['model'] = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}
config['optimizer'] = {
    'class': 'Adam',
    'learning_rate': 0.001
}

experiment = MLExperiment(config)

print(experiment.config_summary())

experiment.run(epochs=20)
```

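The cell above modifies `config` in place, so each later cell starts from whatever the previous cells left in it; for example, if the cells are run in order, the `Adam` optimizer set here carries over into the ensemble experiments. One alternative, sketched below with plain dictionaries, is to derive each variant from an untouched base config with `copy.deepcopy`. The helper `make_variant` is hypothetical; the resulting dictionary would then be passed to `MLExperiment` as before.

```python
import copy


def make_variant(base, **sections):
    """Return a deep copy of `base` with the given top-level sections replaced."""
    variant = copy.deepcopy(base)  # nested dicts and lists are copied, so `base` is never mutated
    variant.update(sections)
    return variant


base_config = {
    "dataset": {"name": "mnist", "batch_size": 32},
    "model": {"class": "MultiLayerPerceptron", "structure": [128]},
    "optimizer": {"class": "RMSprop", "learning_rate": 0.001},
}

cnn_config = make_variant(
    base_config,
    dataset={"name": "cifar10"},
    model={"class": "CNN", "structure": [32, "max", 64, "max", 64]},
    optimizer={"class": "Adam", "learning_rate": 0.001},
)
```
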
### Composability of ConfigState objects

`ConfigState` makes it easy to compose objects. For instance, we can nest a `Model` inside an `Ensembler` model:

```python
cnn_model = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}

config['model'] = {
    'class': 'Ensembler',
    'model': cnn_model,
    'ensemble_size': 4
}

config['dataset'] = {
    'name': 'cifar10',
    'batch_size': 128  # increased so that each of the 4 ensembled models trains on batches of 32 elements
}

experiment = MLExperiment(config)

print(experiment.config_summary())

# Keras' summary() prints the table itself and returns None, so it is not wrapped in print()
experiment.model.keras_model.summary()
```


```python
experiment.run(epochs=20)
```

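The `batch_size` comment above implies that the `Ensembler` divides each incoming batch evenly among its members: a batch of 128 images gives each of the 4 CNNs 32 images. The snippet below only illustrates that arithmetic on a plain Python list; how `Ensembler` actually splits its inputs is an assumption here, and `split_batch` is a hypothetical helper.

```python
def split_batch(batch, num_members):
    """Split `batch` into `num_members` equal, contiguous chunks, one per ensemble member."""
    if len(batch) % num_members != 0:
        raise ValueError("batch size must be divisible by the number of ensemble members")
    chunk = len(batch) // num_members
    return [batch[i * chunk:(i + 1) * chunk] for i in range(num_members)]


batch = list(range(128))  # stands in for a batch of 128 images
chunks = split_batch(batch, num_members=4)
print([len(c) for c in chunks])  # [32, 32, 32, 32]
```
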
Since `Ensembler` is itself a `Model`, it can in turn be nested into another `Ensembler`, which lets us define ensembles of ensembles:

```python
cnn_model = {
    'class': 'CNN',
    'structure': [32, 'max', 64, 'max', 64]
}

ensemble = {
    'class': 'Ensembler',
    'model': cnn_model,
    'ensemble_size': 4
}

config['model'] = {
    'class': 'Ensembler',
    'model': ensemble,
    'ensemble_size': 4
}

config['dataset'] = {
    'name': 'cifar10',
    'batch_size': 512  # 4 x 4 CNNs, each trained on batches of 32 elements
}

experiment = MLExperiment(config)

print(experiment.config_summary())

# summary() prints the table itself and returns None
experiment.model.keras_model.summary()
```


```
Tracing 4
Tracing 4
dataset:
  batch_size: 512
  name: cifar10
model:
  class: Ensembler
  ensemble_size: 4
  input_shape: (32, 32, 3)
  model:
    class: Ensembler
    ensemble_size: 4
    input_shape: (32, 32, 3)
    model:
      class: CNN
      input_shape: (32, 32, 3)
      output_units: 10
      structure: [32, max, 64, max, 64]
    output_units: 10
  output_units: 10
optimizer:
  class: RMSprop
  epsilon: 1e-07
  learning_rate: 0.001
  momentum: 0.0
  rho: 0.9

Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_21 (InputLayer)           [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
lambda_8 (Lambda)               [(None, 32, 32, 3),  0           input_21[0][0]                   
_______________
```

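The batch size of 512 follows the same reasoning one level deeper: the outer `Ensembler` holds 4 inner ensembles of 4 CNNs each, that is 16 CNNs, and 512 / 16 = 32 images per CNN. Assuming each `Ensembler` level splits its batch evenly (again an assumption), the number of leaf models, and hence the batch size needed for a given per-model batch, can be computed by walking the nested model config. `count_leaf_models` is a hypothetical helper.

```python
def count_leaf_models(model_config):
    """Count the non-ensemble models at the bottom of a (possibly nested) Ensembler config."""
    if model_config.get("class") == "Ensembler":
        return model_config["ensemble_size"] * count_leaf_models(model_config["model"])
    return 1


cnn_model = {"class": "CNN", "structure": [32, "max", 64, "max", 64]}
ensemble = {"class": "Ensembler", "model": cnn_model, "ensemble_size": 4}
ensemble_of_ensembles = {"class": "Ensembler", "model": ensemble, "ensemble_size": 4}

leaves = count_leaf_models(ensemble_of_ensembles)
print(leaves, 512 // leaves)  # 16 32
```
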
```python
experiment.run(epochs=1)
```

```
Training for 3 epochs...
Epoch 2/4
Epoch 3/4
Epoch 4/4
Training finished
```
