feat: lr scheduler #248
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #248      +/-   ##
==========================================
+ Coverage   87.88%   88.54%   +0.66%
==========================================
  Files          37       37
  Lines        1799     1834      +35
==========================================
+ Hits         1581     1624      +43
+ Misses        218      210       -8
@@ -28,8 +29,13 @@ def fit(
        epochs: int = 10,
        batch_size: int = 256,
        loss: Union[str, 'AnyDNN'] = 'SiameseLoss',
        optimizer: Optional['AnyOptimizer'] = None,
Why can't you pass an optimizer object?
In PyTorch we move the model to a GPU if needed. However, if this is done after the optimizer is initialized, it will break the optimizer. Here is a quote from the PyTorch documentation:
If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
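A minimal sketch of the ordering this implies (illustrative only, not code from this PR):

```python
import torch

model = torch.nn.Linear(16, 4)

# Move the model to the GPU *before* constructing the optimizer, so the
# optimizer holds references to the parameter objects that are actually trained.
if torch.cuda.is_available():
    model = model.cuda()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```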
The problem is that you are moving `fit` arguments to attributes of the object. What a `Tuner` is should perhaps be determined by the model and the loss. Optimizers and learning rates should be call arguments, possibly different ones for the same object.
Well no, the problem persists, and the factory idea is good, but they should remain call arguments and not attributes
I have thought about this, and I have decided to go against it. What is an "attribute" of `Tuner` and what is not is just semantics really - I can easily argue that because this is a `Tuner` and not a `Model`, all things crucial to tuning need to go in `__init__`.
I have instead a very concrete argument - all the things whose state can change during training should go into `__init__`. This is due to checkpointing - if my training is interrupted in epoch 5, and I want to later start from that position, I need a way of restoring that state. There are two ways of doing this:
- Having the `fit` method already start with initialized states. In this case, I can, right after initializing the trainer, call `TrainingCheckpoint.load(tuner, path_to_checkpoint)`. This is my preferred option, as it decouples the checkpoint from the trainer (see the sketch after this list).
- Having the checkpoint and training tightly coupled, so that you do `tuner.fit(load_checkpoint=path_to_checkpoint)`. I don't really like this option.
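A rough sketch of how the preferred, decoupled option could look; the `Tuner`, `TrainingCheckpoint` and `train_data` names are taken from the discussion or assumed for illustration, not the final API:

```python
# Hypothetical usage of option 1: checkpoint loading is decoupled from the trainer.
tuner = Tuner(model, loss='SiameseLoss')            # optimizer/scheduler state is set up in __init__
TrainingCheckpoint.load(tuner, path_to_checkpoint)  # restore model, optimizer and scheduler state
tuner.fit(train_data, epochs=10)                    # training resumes from the restored state
```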
And the way this is done in different frameworks:
- In Keras you call `compile` on a model first (the equivalent of our trainer init), passing the optimizer with its learning rate, metrics, etc. In `fit` you only pass the data and related args (see the sketch below).
- In PyTorch Lightning you pass all the configuration (including things like number of epochs and device) in init, and pass the model with data (the optimizer is specified as part of the model definition) and the checkpoint path in `fit` - here checkpointing is tightly coupled with the training method.
- In PyTorch Ignite you have to create all objects before calling `Engine.run` - which takes only the data and number of epochs.
Based on the preferred way of implementing checkpoints, and the observation that this is indeed how it is also done in other frameworks, I would go ahead with my implementation.
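For reference, a minimal sketch of the Keras pattern mentioned in the list above (standard `tf.keras` usage, shown only to illustrate the compile-then-fit split):

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic data so the sketch is runnable.
x_train = np.random.rand(32, 8).astype('float32')
y_train = np.random.rand(32, 4).astype('float32')

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])

# Optimizer (with learning rate) and loss are fixed when the model is compiled...
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse')

# ...while fit only receives the data and run-related arguments.
model.fit(x_train, y_train, epochs=2, batch_size=8)
```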
@@ -69,26 +75,34 @@ def fit(
            - ``TripletLoss`` for Triplet network
        :param num_items_per_class: Number of items from a single class to include in
            the batch. Only relevant for class datasets
        :param configure_optimizer: A function that allows you to provide a custom
I am not sure this usage makes much sense. Why would I make the optimizer returned depend on the model? Why can't I pass the optimizer directly?
See above - basically, in PyTorch to construct the optimizer you need to pass the parameters of the model (and the same holds in PaddlePaddle).
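To illustrate, a small sketch of what such a `configure_optimizer` callback could look like with plain PyTorch, matching the signature in the diff below; how it is wired into the `Tuner` is an assumption here:

```python
import torch

def configure_optimizer(model: torch.nn.Module):
    # The callback receives the (possibly already device-placed) model, so the
    # optimizer is built from the correct parameter objects.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    return optimizer, scheduler  # returning only the optimizer would also satisfy the annotation
```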
        configure_optimizer: Optional[
            Callable[[AnyDNN], Union[AnyOptimizer, Tuple[AnyOptimizer, AnyScheduler]]]
        ] = None,
        learning_rate: float = 1e-3,
If we are talking about making `learning_rate` scheduling, it seems weird to link the identity of a `Tuner` to a `learning_rate` value; it seems more adequate for it to be a `fit` argument than an attribute.
So this is only the "default" learning rate, if the user does not provide an optimizer (in which case the learning rate stays constant). I think renaming this will suffice
The point is, `learning_rate` is an argument of `fit`, not something that the tuner should be adapted to. The same tuner identity should be able to work with different learning rates.
            Callable[[AnyDNN], Union[AnyOptimizer, Tuple[AnyOptimizer, AnyScheduler]]]
        ] = None,
        learning_rate: float = 1e-3,
        scheduler_step: str = 'batch',
same for this
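For context, a rough, self-contained sketch of what a `scheduler_step` value of `'batch'` versus `'epoch'` typically controls in a training loop; this is an assumption about the intended semantics, not code from this PR:

```python
import torch

# Tiny synthetic setup just to make the sketch runnable.
model = torch.nn.Linear(8, 1)
data_loader = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(5)]
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

scheduler_step = 'batch'  # or 'epoch'

for epoch in range(2):
    for inputs, targets in data_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        if scheduler_step == 'batch':
            scheduler.step()  # learning rate updated after every batch
    if scheduler_step == 'epoch':
        scheduler.step()      # learning rate updated once per epoch
```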
Co-authored-by: Joan Fontanals <joan.martinez@jina.ai>
I'm not sure about `configure_optimizer` - the usage looks a bit weird to me. At least, the naming sounds weird.
One question I have is about `learning_rate` and `default_learning_rate`: does it mean that if the user provides a `learning_rate`, the default will be overwritten? Why change the name here?
@bwanglzu The naming is taken from PyTorch Lightning, where they do the same thing. I see I forgot to rename
LGTM!
Implements a learning rate scheduler
TODO