Fix error in keras Learning Rate Scheduler #3135

iitmdinesh · 2021-08-27T16:43:58Z

When LearningRateScheduleCallback is initialized with a constant multiplier, it does not work correctly: there is no learning rate decay. Specifically, the learning rates are initial_lr * multiplier, initial_lr * multiplier, initial_lr * multiplier, ..., whereas the expected decay is initial_lr, initial_lr * multiplier, initial_lr * multiplier^2, and so on.
Another nit: when multiplier is a constant, there is no need to enforce staircase = True. The change that fixes the above mistake handles staircase = False as well.
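
To make the difference concrete, here is a minimal pure-Python sketch of the two sequences. It does not call Horovod; the helper names `lr_before_fix` and `lr_after_fix` are hypothetical, and the sketch assumes the callback computes the learning rate as initial_lr times the value returned by the multiplier function for the current epoch.

```python
def lr_before_fix(initial_lr, multiplier, epochs):
    # Old behaviour: a constant was wrapped so that it ignored the epoch.
    schedule = lambda epoch: multiplier
    return [initial_lr * schedule(epoch) for epoch in range(epochs)]


def lr_after_fix(initial_lr, multiplier, epochs):
    # New behaviour: a constant is treated as an exponential decay rate.
    schedule = lambda epoch: multiplier ** epoch
    return [initial_lr * schedule(epoch) for epoch in range(epochs)]


before = lr_before_fix(1.0, 0.5, 3)  # every epoch gets initial_lr * multiplier
after = lr_after_fix(1.0, 0.5, 3)    # each epoch multiplies once more
print(before)
print(after)
```

With initial_lr = 1.0 and multiplier = 0.5, the first list stays flat while the second halves every epoch.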

Checklist before submitting

[Y] Did you read the contributor guide?
[Y] Did you update the docs?
[N] Did you write any tests to validate this change?
[N] Did you update the CHANGELOG, if this change affects users?

Description

Fixes an issue identified at Uber, which uses this repository. With multiplier set to a constant, the learning rate logs showed that the learning rate stayed constant at initial_lr * multiplier.

Review process to land

All tests and other checks must succeed.
At least one member of the technical steering committee must review and approve.
If any member of the technical steering committee requests changes, they must be addressed.

- When the LearningRateScheduleCallback is initialized with multiplier = constant, it does not work correctly (there is no learning rate decay): more specifically learning rates = initial_lr * multiplier, initial_lr * multiplier, initial_lr * multiplier,... whereas the expectation it decays as initial_lr, initial_lr * multiplier, initial_lr * multiplier^2 and such
- Another nit is when multiplier = constant, there is no need to enforce that staircase = True. The change made to fix the above mistake can handle staircase = False as well

Signed-off-by: Dinesh Ramasamy <89654805+iitmdinesh@users.noreply.github.com>

irasit · 2021-08-27T18:51:02Z

horovod/_keras/callbacks.py

        if not callable(multiplier):
-            self.staircase = True


Why remove it? Aren't we supposed to apply multiplier only once per epoch?

The idea is that we should not override what the user has configured. The user can set staircase = True or staircase = False, and both work fine with this new formulation.
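
A rough sketch of why both settings work with an exponential multiplier. It assumes, and this is an assumption about the callback rather than something shown in this diff, that with staircase = False the multiplier function receives a fractional epoch (epoch plus batch / steps_per_epoch), and with staircase = True it receives the whole epoch number. `sketch_lr` is a hypothetical helper, not Horovod code.

```python
def sketch_lr(initial_lr, multiplier, epoch, batch, steps_per_epoch, staircase):
    # Constant multiplier interpreted as exponential decay, as in the diff.
    schedule = lambda e: multiplier ** e
    if staircase:
        # Assumed: one adjustment per epoch, using the whole epoch number.
        effective_epoch = epoch
    else:
        # Assumed: adjustment on every batch, using a fractional epoch.
        effective_epoch = epoch + batch / steps_per_epoch
    return initial_lr * schedule(effective_epoch)


# Halfway through epoch 0 with multiplier 0.25:
stair = sketch_lr(1.0, 0.25, epoch=0, batch=1, steps_per_epoch=2, staircase=True)
smooth = sketch_lr(1.0, 0.25, epoch=0, batch=1, steps_per_epoch=2, staircase=False)
print(stair, smooth)
```

Under these assumptions the staircase variant holds the rate for the whole epoch, while the non-staircase variant decays smoothly between epoch boundaries and agrees with the staircase value at the start of each epoch.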

irasit · 2021-08-27T19:00:23Z

horovod/_keras/callbacks.py

-            self.staircase = True
-            self.multiplier = lambda epoch: multiplier
+            # If multiplier is a constant, it corresponds to exponential decay
+            self.multiplier = lambda epoch: multiplier ** epoch


This is a good catch.

In the future, instead of applying the multiplier each epoch, I think we should also add a "step" option so that we can change it every N steps.

Good point. That is, however, already supported by passing in a callable instead of a number.
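
As an illustration of the callable route, here is a minimal sketch of a multiplier function that decays once every N epochs rather than every epoch. The factory name `make_step_multiplier` and its parameters are hypothetical; the only assumption about the callback is that it calls the multiplier with the current (possibly fractional) epoch and scales initial_lr by the result.

```python
def make_step_multiplier(factor, every_n_epochs):
    """Return a callable that applies `factor` once per `every_n_epochs` epochs."""
    def multiplier(epoch):
        # Floor division also handles a fractional epoch value.
        completed_steps = int(epoch // every_n_epochs)
        return factor ** completed_steps
    return multiplier


halve_every_two = make_step_multiplier(0.5, 2)
schedule = [halve_every_two(epoch) for epoch in range(5)]
print(schedule)
```

Such a callable would be passed as the multiplier argument in place of a number; changing the rate every N batches instead would additionally depend on how the callback maps batches to fractional epochs.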

github-actions · 2021-08-27T23:11:49Z

Unit Test Results

    750 files ±0     750 suites ±0 6h 12m 43s ⏱️ ±0s
    700 tests ±0     658 ✔️ ±0     42 💤 ±0 0 ❌ ±0
16 078 runs ±0 11 339 ✔️ ±0 4 739 💤 ±0 0 ❌ ±0

Results for commit 719c495. ± Comparison against base commit 719c495.

♻️ This comment has been updated with latest results.

github-actions · 2021-08-28T08:32:32Z

Unit Test Results (with flaky tests)

    843 files ±0     843 suites ±0 6h 29m 2s ⏱️ ±0s
    700 tests ±0     658 ✔️ ±0     41 💤 ±0 1 ❌ ±0
18 171 runs ±0 12 659 ✔️ ±0 5 511 💤 ±0 1 ❌ ±0

For more details on these failures, see this check.

Results for commit 719c495. ± Comparison against base commit 719c495.

♻️ This comment has been updated with latest results.

chongxiaoc requested review from chongxiaoc, tgaddair and irasit August 27, 2021 17:09

iitmdinesh force-pushed the patch-1 branch from 9ac2468 to fbce091 Compare August 27, 2021 18:35

tgaddair approved these changes Aug 27, 2021

Tixxx approved these changes Aug 27, 2021

irasit reviewed Aug 27, 2021

irasit approved these changes Aug 28, 2021

chongxiaoc merged commit 719c495 into horovod:master Aug 28, 2021
