This framework is an ML experimentation library offering a modular implementation workflow where a single Python class holds the dataset, callbacks and data-processing graphs. By implementing a derived task-specific class, users can alter any attribute of the experiment instance.
This library is still in a beta phase and should be installed by cloning the repository:

```bash
git clone git@github.com:gabriel-vanzandycke/experimentator.git
cd experimentator
pip install -e .
```
It is deep-learning-framework agnostic but, to date, only the API for TensorFlow is implemented.
Instantiating an experiment requires a configuration, which can be any dictionary, but it is recommended to work with PyConfyg, which handles Python-language configuration files.
```python
from experimentator import BaseExperiment

class SpecificExperiment(BaseExperiment):
    pass

exp = SpecificExperiment({})
```
With PyConfyg, configuration files are plain Python files, bringing all the features of modern IDEs: syntax highlighting, automatic completion, pop-up documentation on hover or while typing, and go-to-definition functionality.
The configuration must define the following attributes (a configuration sketch follows the list):

Generic attributes:
- `experiment_type`: a list of classes that the experiment will instantiate. This enables decoupling features into independent class definitions like `AsyncExperiment`, featuring asynchronous data loading, or `CallbackedExperiment`, featuring callbacks called before and after each batch, subset and epoch. It should contain `experimentator.TensorflowExperiment` when working with TensorFlow, and `experimentator.TorchExperiment` when working with PyTorch.
- `subsets`: a list of `experimentator.Subset`s, typically a training, a validation and a testing set. The dataset provided to a `Subset` must inherit from `mlworkflow.Dataset` and its items must be dictionaries of named tensors, including the machine-learning model inputs and targets, as well as any additional tensor required for training or evaluation.
- `batch_size`: an integer defining the batch size.
- `chunk_processors`: a list of `experimentator.ChunkProcessor`s applied to `chunk`, a dictionary initially filled from the batched dataset items. After being successively processed by each chunk processor, the `chunk` dictionary should contain a `loss` attribute used by the gradient-descent algorithm.

Framework-specific attributes:
- `optimizer`: a `tf.keras.optimizers.Optimizer` for TensorFlow experiments and a `torch.optim.Optimizer` for PyTorch experiments.
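To make these attributes concrete, here is a minimal configuration sketch. The attribute names match the list above, but `MyExperiment`, `MyChunkProcessor` and the chosen hyperparameter values are hypothetical placeholders, not the library's shipped example, and the `Subset` construction is elided:

```python
# Hypothetical PyConfyg configuration sketch; placeholder identifiers throughout.
import tensorflow as tf
import experimentator
from my_task import MyExperiment, MyChunkProcessor  # hypothetical task module

experiment_type = [
    MyExperiment,                         # hypothetical task-specific class
    experimentator.TensorflowExperiment,  # framework-specific implementation
]
subsets = [
    # experimentator.Subset instances wrapping an mlworkflow.Dataset;
    # their exact constructor arguments are omitted here.
]
batch_size = 32
chunk_processors = [
    MyChunkProcessor(),  # hypothetical processor that fills chunk["loss"]
]
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
```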
The specific experiment must define the following attributes (see the sketch after this list):
- `batch_inputs_names`: the initial chunk attribute names, extracted from the batched dataset items.
- `batch_metrics_names`: the chunk attribute names used to build the evaluation graph.
- `batch_outputs_names`: the chunk attribute names used to build the inference graph (the training graph is built from the chunk attribute `"loss"`).
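As an illustration only, the tensor names below being hypothetical rather than taken from the library's examples, a task-specific experiment could declare these attributes as class members:

```python
from experimentator import BaseExperiment

class MyClassificationExperiment(BaseExperiment):
    # initial chunk attributes, extracted from the batched dataset items
    batch_inputs_names = ["batch_input_image", "batch_target"]
    # chunk attributes used to build the evaluation graph
    batch_metrics_names = ["loss", "accuracy"]
    # chunk attributes used to build the inference graph
    batch_outputs_names = ["predictions"]
```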
By default, all chunk processors are executed for training, evaluation and inference. However, by setting the `mode` attribute with the `ExperimentMode` `TRAIN`, `EVAL` and `INFER` flags, it's possible to have a chunk processor executed only during specific phases. Typically, a data-augmentation chunk processor should have its `mode` set to `ExperimentMode.TRAIN`, and a chunk processor computing evaluation metrics, to `ExperimentMode.EVAL`.
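For instance, a training-only augmentation processor could be sketched as follows, assuming chunk processors implement `__call__(self, chunk)` and mutate the chunk dictionary in place, and using a hypothetical `"batch_input_image"` key:

```python
import tensorflow as tf
from experimentator import ChunkProcessor, ExperimentMode

class RandomHorizontalFlip(ChunkProcessor):
    mode = ExperimentMode.TRAIN  # skipped during evaluation and inference

    def __call__(self, chunk):
        # Randomly flip the batched images; targets are left untouched here.
        chunk["batch_input_image"] = tf.image.random_flip_left_right(chunk["batch_input_image"])
```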
A simple example can be found in the `examples` folder. The experiment-specific implementations are located in the `tasks/mnist.py` file. The configuration file, `configs/mnist.py`, defines every attribute required for the experiment and is loaded by PyConfyg with `build_experiment`:
```python
exp = build_experiment("configs/mnist.py")
exp.train(10)  # trains for 10 epochs
```
Using a PyConfyg also makes grid searching very convenient:
```bash
python -m experimentator configs/mnist.py --epochs 10 --grid "learning_rate=[0.1,0.01,0.001]"
```
The DeepSport repository makes extensive use of Experimentator with three practical cases related to ball detection and size estimation in basketball scenes.
This library has been developed through many iterations since 2017, before modern deep-learning libraries became standard. It was greatly inspired by what became ModulOM, from Maxime Istasse.
Overall, it was developed to meet the following requirements:
- Handling multiple-input models, relevant for Siamese training.
- Handling multiple-output models, typically for multi-task learning.
- Modular graph construction with easy block substitution instead of a fixed graph.
- A clean implementation structure where task-specific objects (callbacks, chunk processors, the specific experiment class) can be implemented in a single file.
Despite the numerous hours spent developing and maintaining this library, I wouldn't recommend using it, as modern libraries offer much better support with similar features.