Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error in keras Learning Rate Scheduler #3135

Merged
merged 1 commit into from
Aug 28, 2021

Conversation

iitmdinesh
Copy link
Contributor

  • When the LearningRateScheduleCallback is initialized with multiplier = constant, it does not work correctly (there is no learning rate decay): more specifically learning rates = initial_lr * multiplier, initial_lr * multiplier, initial_lr * multiplier,... whereas the expectation it decays as initial_lr, initial_lr * multiplier, initial_lr * multiplier^2 and such
  • Another nit is when multiplier = constant, there is no need to enforce that staircase = True. The change made to fix the above mistake can handle staircase = False as well

Checklist before submitting

  • [Y] Did you read the contributor guide?
  • [Y] Did you update the docs?
  • [N] Did you write any tests to validate this change?
  • [N] Did you update the CHANGELOG, if this change affects users?

Description

Fixes issue identified in Uber, which uses this repository. If multiplier is set to a constant, we realized by looking at the learning rate logs that it remained a constant equal to initial_lr * multiplier.

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

 - When the LearningRateScheduleCallback is initialized with multiplier = constant, it does not work correctly (there is no learning rate decay): more specifically learning rates = initial_lr * multiplier, initial_lr * multiplier, initial_lr * multiplier,... whereas the expectation it decays as  initial_lr,  initial_lr * multiplier,  initial_lr * multiplier^2 and such
 - Another nit is when multiplier = constant, there is no need to enforce that staircase = True. The change made to fix the above mistake can handle staircase = False as well

Signed-off-by: Dinesh Ramasamy <89654805+iitmdinesh@users.noreply.github.com>
if not callable(multiplier):
self.staircase = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove it? Aren't we supposed to apply multiplier only once per epoch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is we should not override what the user has configured. I can set staircase = True or staircase = False and both would work fine with this new formulation

self.staircase = True
self.multiplier = lambda epoch: multiplier
# If multiplier is a constant, it corresponds to exponential decay
self.multiplier = lambda epoch: multiplier ** epoch
Copy link
Collaborator

@irasit irasit Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good catch.

In future, instead of apply multiplier each epoch, I think we should also do add a "step" option so that we can change it every N steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. That is however currently supported by means of passing in a callable instead of a number

@github-actions
Copy link

github-actions bot commented Aug 27, 2021

Unit Test Results

     750 files  ±0       750 suites  ±0   6h 12m 43s ⏱️ ±0s
     700 tests ±0       658 ✔️ ±0       42 💤 ±0  0 ❌ ±0 
16 078 runs  ±0  11 339 ✔️ ±0  4 739 💤 ±0  0 ❌ ±0 

Results for commit 719c495. ± Comparison against base commit 719c495.

♻️ This comment has been updated with latest results.

@chongxiaoc chongxiaoc merged commit 719c495 into horovod:master Aug 28, 2021
@github-actions
Copy link

github-actions bot commented Aug 28, 2021

Unit Test Results (with flaky tests)

     843 files  ±0       843 suites  ±0   6h 29m 2s ⏱️ ±0s
     700 tests ±0       658 ✔️ ±0       41 💤 ±0  1 ❌ ±0 
18 171 runs  ±0  12 659 ✔️ ±0  5 511 💤 ±0  1 ❌ ±0 

For more details on these failures, see this check.

Results for commit 719c495. ± Comparison against base commit 719c495.

♻️ This comment has been updated with latest results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants