Fix memory-efficient-fp16 when using update_freq other than 1 (#1219)
Summary:
Tested the following model and verified that the gnorms and losses match those produced at the following commit:

commit 3b7cf75
Author: m_fomicheva <mari.fomicheva@gmail.com>
Date:   Wed Jul 8 13:04:55 2020 -0700

The loss and gnorm are identical to the number of digits reported in the logs, and the ppl agrees to many significant digits.
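
For intuition, here is a minimal standalone sketch of the failure mode as implied by the diff below (hypothetical code, not fairseq's actual classes): with gradient accumulation (`--update-freq k`), `backward()` runs `k` times per optimizer step, so dividing the deferred unscale factor inside `backward()` compounds it `k` times even though the accumulated gradients carry only one power of the loss scale.

```python
# Hypothetical sketch of the bug (simplified names, not fairseq's API).
loss_scale = 8.0
update_freq = 16

# Before the fix: zero_grad() reset the factor to 1.0 and every
# backward() divided it by the loss scale.
factor = 1.0
for _ in range(update_freq):
    factor /= loss_scale          # ran once per accumulated micro-batch
# factor == loss_scale ** -update_freq, but the summed gradients are
# scaled by loss_scale only once, so they were unscaled far too much.

# After the fix: the factor is set exactly once per update, in zero_grad().
factor = 1.0 / loss_scale         # backward() no longer touches it
```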

Thanks again to Jun Ru for reporting.

Pull Request resolved: fairinternal/fairseq-py#1219

Test Plan:
```
CUDA_VISIBLE_DEVICES=0 fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm --share-decoder-input-output-embed \
    --dropout 0.1 \
    --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --tokens-per-sample 512 --sample-break-mode none \
    --max-tokens 2048 --update-freq 16 \
    --max-update 50000 --memory-efficient-fp16 --no-progress-bar --log-interval 1 --seed 4
```

Before (commit 3b7cf75):

2020-07-15 12:17:28 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555252, wps=7165.8, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=208
2020-07-15 12:17:33 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539620, wps=7176.6, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=213
2020-07-15 12:17:37 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492042, wps=7188.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=217
2020-07-15 12:17:42 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487224, wps=7192, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=222
2020-07-15 12:17:47 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465781, wps=7182.5, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=226
2020-07-15 12:17:51 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441564, wps=7185.4, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=231

After:

2020-07-15 15:13:10 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555249, wps=7220.5, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=207
2020-07-15 15:13:14 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539617, wps=7216.3, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=212
2020-07-15 15:13:19 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492041, wps=7220.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=216
2020-07-15 15:13:24 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487228, wps=7229.4, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=221
2020-07-15 15:13:28 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465783, wps=7231.2, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=225
2020-07-15 15:13:33 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441559, wps=7224.5, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=230

# Before submitting

- [ ] Was this discussed/approved via a GitHub issue? (not required for typos or doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Reviewed By: myleott

Differential Revision: D22560914

Pulled By: msbaines

fbshipit-source-id: f2fdc3daa46de0b75f26cb4d5712e92d1a820d60
Mandeep Baines authored and facebook-github-bot committed Jul 16, 2020
1 parent 9828bf5 commit a4c6250
Showing 1 changed file with 2 additions and 2 deletions: fairseq/optim/fp16_optimizer.py

```diff
--- a/fairseq/optim/fp16_optimizer.py
+++ b/fairseq/optim/fp16_optimizer.py
@@ -341,7 +341,6 @@ def backward(self, loss):
         """
         if self.scaler is not None:
             loss = loss * self.scaler.loss_scale
-            self._multiply_factor /= float(self.scaler.loss_scale)
         loss.backward()
 
     def _unscale_grads(self):
@@ -396,7 +395,8 @@ def step(self, closure=None):
     def zero_grad(self):
         """Clears the gradients of all optimized parameters."""
         self.wrapped_optimizer.zero_grad()
-        self._multiply_factor = 1.
+        if self.scaler is not None:
+            self._multiply_factor = 1. / float(self.scaler.loss_scale)
 
 
 class MemoryEfficientFP16Optimizer(_MemoryEfficientFP16OptimizerMixin, optim.FairseqOptimizer):
```
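
For context, `_multiply_factor` defers the gradient unscale until the gradients are actually consumed. A simplified, hypothetical sketch of that consumption (the real `_unscale_grads` in this file may differ in details):

```python
import torch

def unscale_grads(params, multiply_factor):
    """Hypothetical sketch: apply the factor accumulated since
    zero_grad() to each gradient exactly once, before the step."""
    if multiply_factor != 1.0:
        for p in params:
            if p.grad is not None:
                p.grad.data.mul_(multiply_factor)
    return 1.0  # the factor has been consumed

# Usage sketch: grads accumulated at loss_scale=8 recover the true value.
w = torch.nn.Parameter(torch.ones(3))
w.grad = torch.full((3,), 8.0)
unscale_grads([w], 1.0 / 8.0)
assert torch.allclose(w.grad, torch.ones(3))
```

With the reset moved into `zero_grad()`, the factor starts every update at `1 / loss_scale` no matter how many `backward()` calls accumulate gradients in between, which is why the fixed run reproduces the known-good baseline logs above.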
