This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Fix LR scheduler cooldown #3719

Merged: stephenroller merged 3 commits into master from lrschedulemax on Jun 15, 2021

Conversation

stephenroller (Contributor)

Patch description
Context:

  • Originally in ParlAI, fixed LR schedulers like cosine/linear would consume (warmup_updates + max_lr_steps) updates, eventually cooling down to 0.
  • [train] New training options for logging/validation based on number of steps #3379 changed this behavior so that only max_lr_steps updates would be consumed, but did not speed up the cooldown accordingly.
  • This PR changes it so that the full cooldown is completed by the end of max_lr_steps (see the sketch below).
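To make the intended behavior concrete, here is a minimal editorial sketch (not ParlAI's implementation; warmup_updates and max_lr_steps mirror the options named above, everything else is illustrative) of a LambdaLR-style linear schedule whose cooldown reaches zero exactly at max_lr_steps:

```python
from torch import nn, optim

# Editorial sketch only -- not ParlAI's code. The model/optimizer are dummies.
warmup_updates = 100
max_lr_steps = 1000

def lr_mult(step: int) -> float:
    if step < warmup_updates:
        # linear warmup toward the base LR
        return (step + 1) / warmup_updates
    # linear cooldown that hits 0 at max_lr_steps (the post-PR behavior);
    # pre-PR, reaching 0 would have taken warmup_updates + max_lr_steps updates
    return max(0.0, 1.0 - step / max_lr_steps)

model = nn.Linear(4, 4)
optimizer = optim.SGD(model.parameters(), lr=1.0)
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_mult)

for _ in range(max_lr_steps):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # [0.0]: cooldown is finished by max_lr_steps
```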

Testing steps
Adjusted CI, new assertions.

@emilydinan (Contributor) left a comment


thanks for the fix! this lgtm

```diff
         if optim_states and saved_optim_type != opt['optimizer']:
             # we changed from adam to adamax, or sgd to adam, or similar
             logging.warning('Not loading optim state since optim class changed.')
-            return False
+            return True
         elif optim_states:
             # check for any fp16/fp32 conversions we need to do
             optimstate_fp16 = 'loss_scaler' in optim_states
```
Contributor

I can't leave a comment for it, but are the semantics correct for line 1099, the `elif not optimstate_fp16 and self.fp16` block? Are we returning True always because of the lower-precision conversion?

stephenroller (Contributor, Author)

agreed
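For readers following the exchange above, here is a hypothetical sketch of the kind of precision check under discussion (the identifiers optim_states, optimstate_fp16, 'loss_scaler', and fp16 come from the snippet and comment above; the control flow and return semantics are illustrative assumptions, not ParlAI's actual code):

```python
def optim_state_needs_reset(optim_states: dict, fp16_enabled: bool) -> bool:
    """Illustrative only: decide whether saved optimizer state should be discarded.

    A saved fp16 optimizer state carries a 'loss_scaler' entry (as in the
    snippet above); a mismatch between saved and current precision is treated
    here as grounds for a reset.
    """
    optimstate_fp16 = 'loss_scaler' in optim_states
    if optimstate_fp16 and not fp16_enabled:
        # saved in fp16, now running fp32: drop the fp16-specific state
        return True
    elif not optimstate_fp16 and fp16_enabled:
        # saved in fp32, now running fp16: the branch the question above asks
        # about -- whether always returning True here is the right semantics
        return True
    return False
```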

```diff
         self.scheduler = optim.lr_scheduler.LambdaLR(optimizer, self._linear_lr)

     def _linear_lr(self, step):
         # this multiplicative factor ensures linear decay rate
         # lr_mult = float(self.max_lr_steps - step - 1) / float(self.max_lr_steps - step)
-        lr_mult = max(0.0, 1e-6 + (1.0 - step / self.max_lr_steps) * (1 - 1e-6))
+        lr_mult = max(0.0, 1.0 - step / self.max_lr_steps)
```
Contributor

we don't need 1e-6 anymore?

stephenroller (Contributor, Author)

I made an executive call to let it actually go to 0 :P
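A quick numeric check (editorial sketch using the two formulas from the diff above, with an arbitrary max_lr_steps) shows the difference at the final update:

```python
max_lr_steps = 1000
step = max_lr_steps  # the final update

# removed formula: the 1e-6 term floors the multiplier just above zero
old_mult = max(0.0, 1e-6 + (1.0 - step / max_lr_steps) * (1 - 1e-6))
# new formula: the multiplier actually reaches zero
new_mult = max(0.0, 1.0 - step / max_lr_steps)

print(old_mult)  # 1e-06
print(new_mult)  # 0.0
```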

@stephenroller stephenroller merged commit d3713fe into master Jun 15, 2021
@stephenroller stephenroller deleted the lrschedulemax branch June 15, 2021 23:53