Fix memory-efficient-fp16 when using update_freq other than 1 (#1219)
Summary:
Tested the following model and verified that the gnorms and losses match those produced at the following commit:

commit 3b7cf75
Author: m_fomicheva <mari.fomicheva@gmail.com>
Date:   Wed Jul 8 13:04:55 2020 -0700

The loss and gnorm are identical to the number of digits reported in the logs, and the ppl agrees to many significant digits.
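
For intuition, here is a minimal standalone sketch of the failure mode as implied by the diff below (hypothetical code, not fairseq's actual classes): with gradient accumulation (`--update-freq k`), `backward()` runs `k` times per optimizer step, so dividing the deferred unscale factor inside `backward()` compounds it `k` times even though the accumulated gradients carry only one power of the loss scale.

```python
# Hypothetical sketch of the bug (simplified names, not fairseq's API).
loss_scale = 8.0
update_freq = 16

# Before the fix: zero_grad() reset the factor to 1.0 and every
# backward() divided it by the loss scale.
factor = 1.0
for _ in range(update_freq):
    factor /= loss_scale          # ran once per accumulated micro-batch
# factor == loss_scale ** -update_freq, but the summed gradients are
# scaled by loss_scale only once, so they were unscaled far too much.

# After the fix: the factor is set exactly once per update, in zero_grad().
factor = 1.0 / loss_scale         # backward() no longer touches it
```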

Thanks again to Jun Ru for reporting.

Pull Request resolved: fairinternal/fairseq-py#1219

Test Plan:
```
CUDA_VISIBLE_DEVICES=0 fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm --share-decoder-input-output-embed \
    --dropout 0.1 \
    --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --tokens-per-sample 512 --sample-break-mode none \
    --max-tokens 2048 --update-freq 16 \
    --max-update 50000 --memory-efficient-fp16 --no-progress-bar --log-interval 1 --seed 4
```

Before (commit 3b7cf75):

2020-07-15 12:17:28 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555252, wps=7165.8, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=208
2020-07-15 12:17:33 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539620, wps=7176.6, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=213
2020-07-15 12:17:37 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492042, wps=7188.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=217
2020-07-15 12:17:42 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487224, wps=7192, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=222
2020-07-15 12:17:47 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465781, wps=7182.5, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=226
2020-07-15 12:17:51 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441564, wps=7185.4, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=231

After:

2020-07-15 15:13:10 | INFO | train_inner | epoch 001:     45 / 3151 loss=19.083, ppl=555249, wps=7220.5, ups=0.22, wpb=32768, bsz=64, num_updates=41, lr=5.22398e-06, gnorm=6.895, loss_scale=8, train_wall=5, wall=207
2020-07-15 15:13:14 | INFO | train_inner | epoch 001:     46 / 3151 loss=19.042, ppl=539617, wps=7216.3, ups=0.22, wpb=32768, bsz=64, num_updates=42, lr=5.34895e-06, gnorm=6.662, loss_scale=8, train_wall=5, wall=212
2020-07-15 15:13:19 | INFO | train_inner | epoch 001:     47 / 3151 loss=18.908, ppl=492041, wps=7220.8, ups=0.22, wpb=32768, bsz=64, num_updates=43, lr=5.47393e-06, gnorm=6.231, loss_scale=8, train_wall=5, wall=216
2020-07-15 15:13:24 | INFO | train_inner | epoch 001:     48 / 3151 loss=18.894, ppl=487228, wps=7229.4, ups=0.22, wpb=32768, bsz=64, num_updates=44, lr=5.5989e-06, gnorm=6.078, loss_scale=8, train_wall=5, wall=221
2020-07-15 15:13:28 | INFO | train_inner | epoch 001:     49 / 3151 loss=18.829, ppl=465783, wps=7231.2, ups=0.22, wpb=32768, bsz=64, num_updates=45, lr=5.72388e-06, gnorm=5.819, loss_scale=8, train_wall=5, wall=225
2020-07-15 15:13:33 | INFO | train_inner | epoch 001:     50 / 3151 loss=18.752, ppl=441559, wps=7224.5, ups=0.22, wpb=32768, bsz=64, num_updates=46, lr=5.84885e-06, gnorm=5.521, loss_scale=8, train_wall=5, wall=230

# Before submitting

- [ ] Was this discussed/approved via a GitHub issue? (not required for typos or doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Reviewed By: myleott

Differential Revision: D22560914

Pulled By: msbaines

fbshipit-source-id: f2fdc3daa46de0b75f26cb4d5712e92d1a820d60
Mandeep Baines authored and facebook-github-bot committed Jul 16, 2020
1 parent 9828bf5 commit a4c6250
Showing 1 changed file with 2 additions and 2 deletions: fairseq/optim/fp16_optimizer.py

```diff
--- a/fairseq/optim/fp16_optimizer.py
+++ b/fairseq/optim/fp16_optimizer.py
@@ -341,7 +341,6 @@ def backward(self, loss):
         """
         if self.scaler is not None:
             loss = loss * self.scaler.loss_scale
-            self._multiply_factor /= float(self.scaler.loss_scale)
         loss.backward()
 
     def _unscale_grads(self):
@@ -396,7 +395,8 @@ def step(self, closure=None):
     def zero_grad(self):
         """Clears the gradients of all optimized parameters."""
         self.wrapped_optimizer.zero_grad()
-        self._multiply_factor = 1.
+        if self.scaler is not None:
+            self._multiply_factor = 1. / float(self.scaler.loss_scale)
 
 
 class MemoryEfficientFP16Optimizer(_MemoryEfficientFP16OptimizerMixin, optim.FairseqOptimizer):
```
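
For context, `_multiply_factor` defers the gradient unscale until the gradients are actually consumed. A simplified, hypothetical sketch of that consumption (the real `_unscale_grads` in this file may differ in details):

```python
import torch

def unscale_grads(params, multiply_factor):
    """Hypothetical sketch: apply the factor accumulated since
    zero_grad() to each gradient exactly once, before the step."""
    if multiply_factor != 1.0:
        for p in params:
            if p.grad is not None:
                p.grad.data.mul_(multiply_factor)
    return 1.0  # the factor has been consumed

# Usage sketch: grads accumulated at loss_scale=8 recover the true value.
w = torch.nn.Parameter(torch.ones(3))
w.grad = torch.full((3,), 8.0)
unscale_grads([w], 1.0 / 8.0)
assert torch.allclose(w.grad, torch.ones(3))
```

With the reset moved into `zero_grad()`, the factor starts every update at `1 / loss_scale` no matter how many `backward()` calls accumulate gradients in between, which is why the fixed run reproduces the known-good baseline logs above.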
