
Add learning rate logging #135

Merged: 17 commits merged into master from learning_rate_monitor on Dec 25, 2020

Conversation

@cgarciae (Collaborator) commented Dec 23, 2020

Implements the same functionality from #131 using only minor modifications to elegy.Optimizer.

  • Add lr_schedule and steps_per_epoch to Optimizer.
  • Implement Optimizer.get_effective_learning_rate (see the sketch after this list).
  • Copy the logging code from #131 (learning rate logging).
  • Add documentation.
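
A minimal sketch of how Optimizer.get_effective_learning_rate could be wired up, assuming it simply evaluates lr_schedule at the current step and derives the epoch from steps_per_epoch (the names and signature below are illustrative, not necessarily the PR's actual code):

import typing as tp

import optax


class Optimizer:
    def __init__(
        self,
        optimizer: optax.GradientTransformation,
        lr_schedule: tp.Optional[tp.Callable[[int, tp.Optional[int]], float]] = None,
        steps_per_epoch: tp.Optional[int] = None,
    ):
        self.optimizer = optimizer
        self.lr_schedule = lr_schedule
        self.steps_per_epoch = steps_per_epoch

    def get_effective_learning_rate(self, step: int) -> tp.Optional[float]:
        # Only defined when a schedule was given; this is the value that gets logged.
        if self.lr_schedule is None:
            return None
        epoch = step // self.steps_per_epoch if self.steps_per_epoch else None
        return self.lr_schedule(step, epoch)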

@alexander-g Here is a proposal that is a bit simpler, closer to what I mentioned in #124. What do you think?
@charlielito should we log the learning rate automatically when available, or should we create a Callback?

@cgarciae (Collaborator, Author) commented Dec 23, 2020

The API would be like this:

model = Model(
    module=MyModule(),
    optimizer=elegy.Optimizer(
        optax.adam(1e-3),
        lr_schedule=lambda step, epoch: my_schedule(step, epoch),
        steps_per_epoch=100,
    )
)
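
Here my_schedule is left abstract; purely as a hypothetical illustration, it could be a simple step decay on the learning rate multiplier:

def my_schedule(step, epoch):
    # Hypothetical step decay: divide the multiplier by 10 every 30 epochs.
    return 0.1 ** (epoch // 30)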

Just to clarify, the Model.__init__.optimizer parameter is now of type

tp.Union[elegy.Optimizer, optax.GradientTransformation, None]

so you can still pass an optax object directly, but to enable this additional functionality the idea is to pass an elegy.Optimizer.
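
For reference, the plain optax path would still look like this (no lr_schedule, so nothing extra is logged):

model = Model(
    module=MyModule(),
    optimizer=optax.adam(1e-3),
)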

An alternative strategy could be to have Model accept the lr_schedule and steps_per_epoch parameters and forward them to Optimizer; I am fine with either.

@alexander-g (Contributor) commented Dec 23, 2020

@cgarciae My main issue with this approach is that only the learning rate multiplier is logged, not the actual learning rate (here 1e-3). That matters for debugging and especially for documentation purposes, e.g. when doing a hyperparameter search or comparing multiple training runs.
That was the main reason for wrapping optax in #131.

should we log the learning rate automatically if available or should we create a Callback?

I vote for logging automatically if a schedule is given, and not logging if the learning rate stays constant. That's how Keras does it.

@cgarciae (Collaborator, Author) commented

In my previous example I actually meant:

model = Model(
    module=MyModule(),
    optimizer=elegy.Optimizer(
        optax.adam(1.0), # set this to 1.0 so lr_schedule defines the effective learning rate
        lr_schedule=lambda step, epoch: my_schedule(step, epoch),
        steps_per_epoch=100,
    )
)

I think we can get away with documenting in the Optimizer class that you should set all other learning_rate arguments to 1.0. This solution will be easier to maintain, even if we cannot enforce it.

It's a shame that the learning_rate parameter is not stored in ScaleState but is instead just captured by a closure; if it were, we could simply inspect all the ScaleState objects. Maybe we can send a PR to optax? The change is minimal but would give us more power.
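
To illustrate the point (a minimal sketch; that ScaleState is an empty, stateless NamedTuple is my reading of current optax and may change):

import jax.numpy as jnp
import optax

tx = optax.scale(-1e-3)               # the step size lives only in this closure
state = tx.init({"w": jnp.zeros(3)})
print(state)                          # ScaleState() -- empty, nothing to inspect for the learning rate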

@cgarciae (Collaborator, Author) commented

Should we move lr_schedule and steps_per_epoch to the Model's constructor or add more documentation to Model pointing to the direct use of Optimizer?

@alexander-g (Contributor) commented

Either is fine.
But it would be nice if elegy.Optimizer could accept a variable number of optax.GradientTransformations and chain them, simply so the user has one line less to write. Something like:

elegy.Optimizer(
    optax.scale(1 / LOSS_SCALE),
    optax.additive_weight_decay(1e-4),
    optax.sgd(0.1, momentum=1e-4),
    lr_schedule=lambda step, epoch: my_schedule(step, epoch),
    steps_per_epoch=100,
)
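
For comparison, with the current single-transformation signature the same setup needs an explicit optax.chain (a sketch, assuming Optimizer keeps accepting one GradientTransformation as in the earlier example):

elegy.Optimizer(
    optax.chain(
        optax.scale(1 / LOSS_SCALE),
        optax.additive_weight_decay(1e-4),
        optax.sgd(0.1, momentum=1e-4),
    ),
    lr_schedule=lambda step, epoch: my_schedule(step, epoch),
    steps_per_epoch=100,
)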

@cgarciae (Collaborator, Author) commented

We could definitely accept *args and do the chaining.

Just to clarify, are we interpreting the use of scale as a change in the learning rate? Learning rate updates are implemented as

optax.scale(-learning_rate)

by most Optax optimizers, which makes the definition of a learning rate a bit fuzzy, since

optax.chain(
    optax.sgd(0.5),
    optax.scale(0.5),
)

is the same as

optax.sgd(0.25)
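
A quick numerical check of that equivalence (a sketch; it relies on sgd without momentum reducing to scale(-learning_rate) as described above):

import jax.numpy as jnp
import optax

grads = {"w": jnp.array([1.0, -2.0])}

def first_update(tx, grads):
    # Initialize against the gradients' pytree structure and apply a single update.
    state = tx.init(grads)
    updates, _ = tx.update(grads, state)
    return updates

chained = first_update(optax.chain(optax.sgd(0.5), optax.scale(0.5)), grads)
direct = first_update(optax.sgd(0.25), grads)
print(chained["w"], direct["w"])  # both: [-0.25  0.5]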

@alexander-g (Contributor) commented

I would simply use the learning_rate parameter of the functions defined in optax/_src/alias.py.
If the user wants to apply additional scaling, that's their own concern.

@codecov-io commented Dec 23, 2020

Codecov Report

Merging #135 (27618b6) into master (78777ff) will increase coverage by 0.18%.
The diff coverage is 98.14%.


@@            Coverage Diff             @@
##           master     #135      +/-   ##
==========================================
+ Coverage   79.25%   79.44%   +0.18%     
==========================================
  Files         110      110              
  Lines        5143     5195      +52     
==========================================
+ Hits         4076     4127      +51     
- Misses       1067     1068       +1     
Impacted Files               Coverage Δ
elegy/model/model_base.py    75.64% <95.00%> (+1.33%) ⬆️
elegy/model/__init__.py      100.00% <100.00%> (ø)
elegy/model/model_test.py    100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@cgarciae changed the title from "[WIP] learning rate logging V2" to "learning rate logging V2" on Dec 24, 2020
@cgarciae merged commit 14fba6c into master on Dec 25, 2020
@cgarciae changed the title from "learning rate logging V2" to "Add learning rate logging" on Dec 25, 2020
@cgarciae deleted the learning_rate_monitor branch on December 25, 2020 02:29
@charlielito (Collaborator) commented

@cgarciae just to answer you: I think that if there is an lr schedule, the learning rate should be logged by default, since you want to monitor it.
