
Training State Centered Framework vs. Engine Centered Architecture #810

@DrStoop


Hi!

I have been thinking for some time about the current architecture of Ignite and how to improve the workflow, both during application development (writing training scripts) and during feature development. When writing training scripts, I could not implement the individual training I wanted, and when trying to code the feature for my training script, I ended up writing infrastructure/architecture code instead of implementing the feature. There were some restrictions that I couldn't identify at first...

After a while, though, I found two twists, actually nothing big... So here I want to propose an under-the-hood framework to integrate into Ignite that solved all my problems and also addresses many open issues! Nice, right?

With this framework integrated into Ignite, you get an extremely clear overview during debugging, turn Ignite into a rapid feature-development tool, can handle far more complex (individual) use cases while achieving a higher degree of automation at the same time, and gain quite a few new features and many more possibilities for syntactic sugar.

But now comes the... BUT: as I tried to fix it, I unfortunately had to realize that it won't work without major revisions. So I went a long way to provide real proof and facts, something you can play with to form your own opinion, before daring to suggest a major revision...

So, if you're interested in an up-and-running "what-if-when" Ignite version with the two twists below untwisted, please have a look at the repository and the documentation and leave me some feedback. I'd really like to know your opinion.

If enough of you like it and could imagine integrating the framework into Ignite, I could open a pull request with the code on an experimental branch, and we'll see how it goes from there. (Note: I pushed it to a separate repository because, as far as I know, you cannot open a pull request that creates a new branch, which this definitely needs.)

Everything else you need to know you will find in the Ignite Framework repo and its documentation. For bugs and questions, let me know, thanks!

So, make your first coffee and enjoy playing!

Teaser from the documentation

Two issues

I am a fan of Ignite, and that's why I'm trying to contribute. However, I discovered two shortcomings in the architecture and the implementation that imposed quite a few restrictions on me and had me coding infrastructure instead of programming new features (which is what I actually wanted to do). The issues are:

  • Engine-centered architecture: In current Ignite, the Engine is the architectural center, with the training state as an attribute. This state attribute is a transient object that is only instantiated while the Engine is running and vanishes afterwards. Moreover, the state holds only a selected fraction of all the variables and parameters that make up the real training state. So the Engine is a kind of static object, and the state is transient. This does not reflect the reality of the training process. In reality, training starts with an initial state holding all variables and parameters, including e.g. model variables, hyperparameters etc., which are then modified as the state goes through different transitions. The main transitions of the state are Engines (normally more than one). So the state should be the architectural center holding ALL variables, parameters, values, transitions etc., and the Engine is (just) the main transition of the state. This small twist causes quite a few complications for features and APIs, which are listed below.
  • Event is broken into many pieces: Currently, an Event is an Enum that has to be fired explicitly (fire_event) and internally (_fire_event) so that the event handlers execute the registered callbacks. The Event is always fired after some other training value has changed, e.g. ITERATION_COMPLETED is fired after the model output was updated. Also, if you want to fire a non-standard event, you first have to create it, register it with each Engine that is supposed to use it, and then implement the firing... But in reality, an Event is nothing more than a change in the value of a training-state variable that triggers callbacks. So all the pieces above can be combined by implementing a state variable as a descriptor.
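The two twists can be illustrated with a minimal, self-contained sketch. It does not use Ignite or the author's framework: StateVariable, State, TrainerState and Engine below are hypothetical names chosen only for illustration. A state variable is a Python descriptor that stores its value on the owning state and calls every callback subscribed to it whenever the value changes, so "firing an event" is just an assignment. The Engine is merely a transition that modifies a state which exists before and after the run.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class StateVariable:
    """Hypothetical descriptor: assigning a new value triggers the callbacks
    subscribed to this variable, i.e. the value change IS the event."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        old = instance.__dict__.get(self.name)
        instance.__dict__[self.name] = value
        # Plain != comparison: fine for scalars; tensors would need special care.
        if value != old:
            for callback in instance._callbacks[self.name]:
                callback(instance, self.name, old, value)


class State:
    """Hypothetical training state: the architectural center that holds the
    variables and outlives any single engine run."""

    def __init__(self):
        self._callbacks: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, name: str, callback: Callable) -> None:
        self._callbacks[name].append(callback)


class TrainerState(State):
    # Every StateVariable is automatically an "event"; no extra registration.
    epoch = StateVariable()
    iteration = StateVariable()
    loss = StateVariable()

    def __init__(self):
        super().__init__()  # callbacks must exist before the first assignment
        self.epoch = 0
        self.iteration = 0
        self.loss = None


class Engine:
    """Hypothetical engine: just a transition that modifies the state."""

    def __init__(self, state: TrainerState, process_fn: Callable[[Any], float]):
        self.state = state
        self.process_fn = process_fn

    def run(self, data, max_epochs: int = 1) -> None:
        for _ in range(max_epochs):
            for batch in data:
                self.state.loss = self.process_fn(batch)
                self.state.iteration += 1
            self.state.epoch += 1


# Usage: subscribe to a state variable instead of registering an Event.
state = TrainerState()
log = []
state.subscribe("epoch", lambda s, name, old, new: log.append((name, new)))
trainer = Engine(state, process_fn=lambda batch: sum(batch) / len(batch))
trainer.run([[1.0, 2.0], [3.0, 5.0]], max_epochs=2)
print(log)              # [('epoch', 1), ('epoch', 2)]
print(state.iteration)  # 4
print(state.loss)       # 4.0
```

In this sketch a new, "non-standard" event only requires declaring another StateVariable on the state class, and the state (with its final values) is still available after the engine has finished.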

Improvements from an underlying framework

You will notice the framework's improvements at all three levels of work: application implementation, feature development and framework development. Separating these working areas is already the first improvement. Try out the benefits for the first two levels in detail and hands-on in Quickunderstanding Application and Quickunderstanding Feature Dev.

Before you go through the theoretical description of the enhancements, these few no-comment teasers of the training state in the debugger will give you a good idea of what's ahead. They show the Ignite example mnist_with_tensorboard.py ported to the framework architecture just before the engines are started:

[Image: teaser_state_in_debugger]

And here is the engine state.engines.trainer unfolded:

[Image: teaser_engine_in_debugger]

Or set up all of the TensorBoard charts below with these two simple commands:

# Automatically identify and generate metric charts comparing the different engines
EnginesMetricsComparisonCharts(x_axis_ref=state.trainer.n_samples_ref, n_identical_metric_name_suffixes=1)
# Automatically generate a summary of all metric charts for each engine
EnginesMetricsCharts(x_axes_refs=state.trainer.n_samples_ref, n_identical_metric_name_suffixes=1)

By the way, if you had set up 10x more metrics and a few more engines, these two commands would stay exactly the same and still provide all comparison and single-metric charts for all engines.
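Why two commands can cover any number of metrics and engines can be sketched as a grouping step: collect each engine's metric names, create one comparison chart per metric name that at least two engines report, and one summary per engine. This is a hypothetical illustration only. The helper group_metric_charts and its input layout are assumptions, the engine and metric names are illustrative, and it is not the implementation of EnginesMetricsComparisonCharts or EnginesMetricsCharts (in particular, it does not model the n_identical_metric_name_suffixes parameter).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_metric_charts(
    engine_metrics: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Hypothetical helper.

    engine_metrics maps an engine name to the names of its metrics.
    Returns (comparison_charts, per_engine_charts):
      - comparison_charts: metric name -> engines reporting it (only if >= 2)
      - per_engine_charts: engine name -> its metric names (one summary each)
    """
    engines_by_metric: Dict[str, List[str]] = defaultdict(list)
    for engine_name, metric_names in engine_metrics.items():
        for metric_name in metric_names:
            engines_by_metric[metric_name].append(engine_name)

    comparison_charts = {
        metric: sorted(engines)
        for metric, engines in engines_by_metric.items()
        if len(engines) >= 2
    }
    per_engine_charts = {
        name: sorted(metrics) for name, metrics in engine_metrics.items()
    }
    return comparison_charts, per_engine_charts


comparison, per_engine = group_metric_charts({
    "trainer": ["loss"],
    "train_evaluator": ["accuracy", "nll"],
    "validation_evaluator": ["accuracy", "nll"],
})
print(comparison)
# {'accuracy': ['train_evaluator', 'validation_evaluator'], 'nll': ['train_evaluator', 'validation_evaluator']}
```

Adding more engines or metrics only changes the input dictionary, not the calling code, which is the property the two commands above rely on.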

[Image: teaser_tensorboard]

Soooo, if you're interested, grab a coffee and press >>>PLAY<<<!
