feat: lr scheduler #248

Merged - 19 commits merged into main from feat-lr-scheduler on Dec 7, 2021
Conversation

@tadejsv (Contributor) commented on Nov 30, 2021

Implements a learning rate scheduler


TODO

  • PyTorch implementation
  • Add to logging
  • Keras implementation
  • Paddle implementation
  • Add tests
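For context, here is a rough sketch of how the new arguments are meant to be used with the PyTorch backend. The configure_optimizer and scheduler_step names follow the diff discussed below; the top-level finetuner.fit entry point and the train_data argument are assumptions for illustration, not taken from this PR.

```python
import torch
import finetuner


def configure_optimizer(model: torch.nn.Module):
    # Receives the (already device-placed) model and returns either an
    # optimizer or an (optimizer, scheduler) tuple.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    return optimizer, scheduler


embed_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 64),
)
train_data = ...  # placeholder: labeled training data in the format the tuner expects

finetuner.fit(
    embed_model,
    train_data=train_data,
    epochs=10,
    batch_size=256,
    configure_optimizer=configure_optimizer,
    scheduler_step='batch',  # step the scheduler after every batch rather than every epoch
)
```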

codecov bot commented on Nov 30, 2021

Codecov Report

Merging #248 (7348642) into main (3768b01) will increase coverage by 0.66%.
The diff coverage is 98.41%.


@@            Coverage Diff             @@
##             main     #248      +/-   ##
==========================================
+ Coverage   87.88%   88.54%   +0.66%     
==========================================
  Files          37       37              
  Lines        1799     1834      +35     
==========================================
+ Hits         1581     1624      +43     
+ Misses        218      210       -8     
| Flag | Coverage Δ |
| --- | --- |
| finetuner | 88.54% <98.41%> (+0.66%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| finetuner/tuner/base.py | 88.23% <94.11%> (+0.73%) ⬆️ |
| finetuner/__init__.py | 71.87% <100.00%> (ø) |
| finetuner/helper.py | 74.28% <100.00%> (+0.75%) ⬆️ |
| finetuner/tuner/__init__.py | 79.16% <100.00%> (ø) |
| finetuner/tuner/callback/wandb_logger.py | 92.00% <100.00%> (+1.09%) ⬆️ |
| finetuner/tuner/keras/__init__.py | 97.64% <100.00%> (+0.11%) ⬆️ |
| finetuner/tuner/paddle/__init__.py | 92.38% <100.00%> (+0.46%) ⬆️ |
| finetuner/tuner/pytorch/__init__.py | 99.03% <100.00%> (+0.08%) ⬆️ |
| finetuner/tuner/state.py | 100.00% <100.00%> (ø) |
| finetuner/tuner/evaluation.py | 98.36% <0.00%> (+0.17%) ⬆️ |
| ... and 1 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -28,8 +29,13 @@ def fit(
epochs: int = 10,
batch_size: int = 256,
loss: Union[str, 'AnyDNN'] = 'SiameseLoss',
optimizer: Optional['AnyOptimizer'] = None,
Member

Why can't you pass an optimizer object?

@tadejsv (Contributor, Author) commented on Nov 30, 2021

In PyTorch we move the model to a GPU if needed. However, if this is done after the optimizer is initialized, it will break the optimizer. Here is a quote from the PyTorch documentation:

If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
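A minimal standalone PyTorch illustration of that ordering constraint (this is just the pattern the quote describes, not finetuner code):

```python
import torch

model = torch.nn.Linear(4, 2)

# Wrong order: an optimizer built at this point would hold references to the
# CPU parameters, which no longer match the model once it is moved to the GPU:
#   optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
#   model.cuda()

# Correct order: move the model first, then construct the optimizer from the
# parameters that now live on the target device.
if torch.cuda.is_available():
    model = model.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```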

Member

The problem is that you are moving fit arguments into attributes of the object. What a Tuner is should perhaps be determined by the model and the loss; optimizers and learning rates should be call arguments, so that the same object can be used with different values.

Member

Well, no, the problem persists. The factory idea is good, but these should remain call arguments, not attributes.

Contributor (Author)

I have thought about this and decided against it. What is an "attribute" of the Tuner and what is not is really just semantics - I could just as easily argue that because this is a Tuner and not a Model, all things crucial to tuning need to go into __init__.

Instead, I have a very concrete argument: everything whose state can change during training should go into __init__. The reason is checkpointing - if my training is interrupted at epoch 5 and I later want to resume from that position, I need a way of restoring that state. There are two ways of doing this:

  1. Having the fit method already start with initialized states. In this case, I can, right after initializing the trainer, call TrainingCheckpoint.load(tuner, path_to_checkpoint). This is my preferred option, as it decouples checkpoint from the trainer.
  2. Having the checkpoint and training tightly coupled, so that you do tuner.fit(load_checkpoint=path_to_checkpoint). I don't really like this option.

And the way this is done in different frameworks:

  • In Keras you call compile on the model first (the equivalent of our trainer init), passing the optimizer with its learning rate, metrics, etc. In fit you only pass the data and related arguments.
  • In PyTorch Lightning you pass all the configuration (including things like the number of epochs and the device) in the Trainer init, and pass the model with the data (the optimizer is specified as part of the model definition) and the checkpoint path in fit - here checkpointing is tightly coupled with the training method.
  • In PyTorch Ignite you have to create all objects before calling Engine.run, which takes only the data and the number of epochs.

Based on the preferred way of implementing checkpoints, and the observation that this is indeed how other frameworks do it as well, I would go ahead with my implementation.
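The same argument in plain PyTorch terms: a checkpoint can only be restored into objects that already exist, so the optimizer (and scheduler) have to be constructed before loading rather than inside fit. A minimal sketch; the checkpoint file name and dictionary keys are assumptions:

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical checkpoint written at the end of epoch 5, with assumed keys.
checkpoint = torch.load('checkpoint_epoch_5.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1  # resume training from this epoch
```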

@@ -69,26 +75,34 @@ def fit(
- ``TripletLoss`` for Triplet network
:param num_items_per_class: Number of items from a single class to include in
the batch. Only relevant for class datasets
:param configure_optimizer: A function that allows you to provide a custom
Member

I am not sure this usage makes much sense. Why would the optimizer that is returned depend on the model? Why can't I pass the optimizer directly?

@tadejsv (Contributor, Author) commented on Nov 30, 2021

See above - basically, in PyTorch you need to pass the model's parameters to construct the optimizer (and the same holds in Paddle).
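To make that concrete: both backends require the model at optimizer-construction time, which is why the configure_optimizer callable receives the model as its argument. A small sketch with toy models; nothing here is finetuner-specific:

```python
import torch
import paddle

torch_model = torch.nn.Linear(4, 2)
paddle_model = paddle.nn.Linear(4, 2)

# PyTorch: the optimizer is constructed from the model's parameters ...
torch_opt = torch.optim.Adam(torch_model.parameters(), lr=1e-3)

# ... and Paddle has the same requirement, via its `parameters` argument.
paddle_opt = paddle.optimizer.Adam(learning_rate=1e-3, parameters=paddle_model.parameters())
```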

configure_optimizer: Optional[
Callable[[AnyDNN], Union[AnyOptimizer, Tuple[AnyOptimizer, AnyScheduler]]]
] = None,
learning_rate: float = 1e-3,
Member

If we are talking about learning-rate scheduling, it seems weird to tie the identity of a Tuner to a learning_rate value; it seems more appropriate as a fit argument than as an attribute.

Contributor (Author)

So this is only the "default" learning rate, used if the user does not provide an optimizer (in which case the learning rate stays constant). I think renaming it will suffice.

Member

The point is, learning_rate is an argument of fit, not something the tuner should be adapted to. The same tuner should be able to work with different learning rates.

Callable[[AnyDNN], Union[AnyOptimizer, Tuple[AnyOptimizer, AnyScheduler]]]
] = None,
learning_rate: float = 1e-3,
scheduler_step: str = 'batch',
Member

Same for this.

Review thread on finetuner/tuner/pytorch/__init__.py (outdated, resolved).
Tadej Svetina and others added 2 commits November 30, 2021 16:48
Co-authored-by: Joan Fontanals <joan.martinez@jina.ai>
@tadejsv linked an issue on Dec 3, 2021 that may be closed by this pull request
github-actions bot added the size/l and area/testing (This issue/PR affects testing) labels and removed the size/m label on Dec 6, 2021
@tadejsv marked this pull request as ready for review on December 7, 2021 08:15
@tadejsv changed the title from "Feat lr scheduler" to "feat: lr scheduler" on Dec 7, 2021
@bwanglzu (Member) left a comment

I'm not sure about configure_optimizer - the usage looks a bit weird to me. At least, the naming sounds weird.

One question I have is about learning_rate and default_learning_rate: does it mean that if the user provides a learning_rate, the default one will be overwritten? Why change the name here?

Review threads on tests/integration/keras/test_lr.py and finetuner/tuner/pytorch/__init__.py (resolved).
@tadejsv (Contributor, Author) commented on Dec 7, 2021

@bwanglzu The naming is taken from PyTorch Lightning, where they do the same thing.

I see I forgot to rename learning_rate to default_learning_rate in some places; I will fix that.

@bwanglzu (Member) left a comment

LGTM!

@tadejsv merged commit b870678 into main on Dec 7, 2021
@tadejsv deleted the feat-lr-scheduler branch on December 7, 2021 15:20

Successfully merging this pull request may close these issues.

Learning rate schedules
3 participants