Major refactoring of ModelTrainer #3182

Merged: 87 commits into master from pluggable_trainer_detached on Apr 14, 2023

Conversation

@alanakbik (Collaborator) commented Apr 3, 2023

This PR is mostly based on #3084 by @plonerma and carries out a major refactoring of the ModelTrainer.

Why? Since Flair 0.1, a very large number of features has been added to the ModelTrainer, causing the class to become bloated and difficult to extend and debug.

This PR refactors the trainer in several ways:

  • it adds plugin logic for all non-essential features. Users can write new plugins that attach to the trainer through callbacks at different points of training. This will hopefully give users more flexibility in adding special features to the Flair trainer (such as specific logging frameworks); see the sketch after this list.
  • after years of "feature bloat", some little-used features were removed. This also slims down the number of parameters the trainer can take.
  • similar to before, the train and fine_tune methods are "best practice" configurations for model training. In addition, a new train_custom method gives users maximal flexibility when training models.
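
To make the plugin idea concrete, here is a minimal sketch of what a custom plugin could look like. This is an illustration only: the base-class location (flair.trainers.plugins.TrainerPlugin), the TrainerPlugin.hook decorator, and the event names and signatures are assumptions inferred from this PR's file layout and may differ from the final API.

```python
import time

from flair.trainers.plugins import TrainerPlugin


class EpochTimerPlugin(TrainerPlugin):
    """Hypothetical plugin that logs the duration of every training epoch."""

    @TrainerPlugin.hook
    def before_training_epoch(self, **kwargs):
        # callback invoked by the trainer at the start of each epoch
        self._epoch_start = time.time()

    @TrainerPlugin.hook
    def after_training_epoch(self, epoch, **kwargs):
        # callback invoked by the trainer once the epoch has finished
        print(f"epoch {epoch} took {time.time() - self._epoch_start:.1f}s")
```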

We hope that a more structured ModelTrainer will allow us to better integrate new features in the near future, such as multi-GPU training and more logging support.

@helpmefindaname marked this pull request as draft April 3, 2023 11:10
@alanakbik marked this pull request as ready for review April 11, 2023 09:24
@plonerma (Collaborator) left a comment

The introduction of train_custom, in combination with the other functions that offer preset configurations, makes the Trainer more flexible. Overall, the train functions are much more compact.

I am a bit unsure about the events that are not currently used in any of the plugins. I can see that one might want to keep only those that are referenced by plugins in the repository. However, I have already started using some of them in my experiments and think they offer much greater potential for extending the train function (without the need to derive a trainer class with an overridden train method).
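
To illustrate this point, attaching a plugin at call time is enough to hook into such an event, so no ModelTrainer subclass with an overridden train method is needed. Again a hedged sketch: the plugins parameter, the exact train_custom signature, and the EpochTimerPlugin from the earlier example are assumptions, not confirmed API.

```python
from flair.trainers import ModelTrainer


def run_training(model, corpus):
    # model and corpus are assumed to be an existing Flair model and Corpus
    trainer = ModelTrainer(model, corpus)

    # hypothetical: pass plugin instances directly to the training call
    # instead of deriving a trainer class with an overridden train method
    trainer.train_custom(
        "resources/taggers/example",   # base path for checkpoints and logs
        plugins=[EpochTimerPlugin()],  # plugin from the sketch above
    )
```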

Review threads:
  • flair/datasets/base.py
  • flair/trainers/plugins/loggers/log_file.py (outdated)
  • flair/trainers/plugins/loggers/loss_file.py (outdated)
  • flair/trainers/trainer.py (3 outdated threads)
@alanakbik merged commit a94682e into master Apr 14, 2023
1 check passed
@alanakbik deleted the pluggable_trainer_detached branch April 14, 2023 15:37