Add gradient cumulative optimizer #784
Conversation
Unit tests to be added.
Codecov Report
@@            Coverage Diff             @@
##           master     #784      +/-   ##
==========================================
- Coverage   62.89%   62.81%    -0.08%
  Files         144      145        +1
  Lines        8467     8702      +235
  Branches     1520     1574       +54
==========================================
+ Hits         5325     5466      +141
- Misses       2874     2971       +97
+ Partials      268      265        -3
Please fix the linting error.
- Unit tests are missing.
- The detailed implementation should be clarified. Please see my comments.
@@ -34,6 +34,27 @@ def after_train_iter(self, runner):
        runner.optimizer.step()


@HOOKS.register_module()
class GradientCumulativeOptimizerHook(OptimizerHook):
Docs are missing.
mmcv/runner/hooks/optimizer.py
Outdated
        self.cumulative_iters = cumulative_iters

    def after_train_iter(self, runner):
        runner.outputs['loss'] = runner.outputs['loss'] / self.cumulative_iters
I'm confused about the detailed implementation of this function. Could you please offer a reference for this implementation to show that it is a general case?
In my opinion, accumulative gradients are adopted to avoid large batch sizes. Is that right? If so, why should we divide by `self.cumulative_iters`?
Gradient accumulation is used to achieve an equivalent larger batch size with a small batch size, so the loss should be normalised. See more at https://discuss.pytorch.org/t/pytorch-gradient-accumulation/55955/2
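To make the normalisation argument concrete, here is a minimal self-contained sketch (not mmcv's actual hook) showing that accumulating the gradients of `loss / cumulative_iters` over several small batches reproduces the gradient of one large batch, up to BatchNorm statistics:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1, bias=False)
ref = nn.Linear(4, 1, bias=False)
ref.load_state_dict(model.state_dict())  # identical initial weights

data = torch.randn(8, 4)
target = torch.randn(8, 1)
loss_fn = nn.MSELoss()

# Reference: a single large batch of 8.
loss_fn(ref(data), target).backward()

# Accumulation: 4 small batches of 2, each loss divided by cumulative_iters.
cumulative_iters = 4
for chunk_x, chunk_y in zip(data.chunk(4), target.chunk(4)):
    loss = loss_fn(model(chunk_x), chunk_y) / cumulative_iters
    loss.backward()  # gradients sum across the 4 backward calls

# The accumulated gradient matches the large-batch gradient.
print(torch.allclose(model.weight.grad, ref.weight.grad, atol=1e-6))
```

Without the division, each small-batch loss would be averaged over only its own samples, so the summed gradient would be `cumulative_iters` times too large.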
Ok, got it. You may specify this in the docs.
In addition, please also fix the corner case where `total_iters % cumulative_iters != 0`.
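One hypothetical way to handle this corner case (names `loss_factor`, `divisible_iters`, and `remainder_iters` are illustrative, not necessarily the final implementation): the trailing iterations form a smaller equivalent batch, so their losses should be divided by the remainder count rather than by `cumulative_iters`:

```python
def loss_factor(cur_iter, max_iters, cumulative_iters):
    """Return the divisor for the loss at 0-based iteration `cur_iter`."""
    # Largest multiple of cumulative_iters that fits in max_iters.
    divisible_iters = max_iters // cumulative_iters * cumulative_iters
    # Leftover iterations that cannot fill a full accumulation window.
    remainder_iters = max_iters - divisible_iters
    if cur_iter < divisible_iters:
        return cumulative_iters
    return remainder_iters

# With max_iters=10 and cumulative_iters=4, iterations 0-7 use a factor
# of 4 and the trailing iterations 8-9 use a factor of 2.
print([loss_factor(i, 10, 4) for i in range(10)])  # [4, 4, 4, 4, 4, 4, 4, 4, 2, 2]
```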
Hello, isn't BatchNormalization an issue, as mentioned here: https://gist.github.com/thomwolf/ac7a7da6b1888c2eeac8ac8b9b05d3d3#gistcomment-3381285
@ggalan87 Yes, the behavior of batch norm is different; however, not all networks contain batch norm.
As mentioned by @ggalan87
Hi @ZhiyuanChen, many thanks for your contribution. Please fix the linting error first. It seems that you have not adopted the pre-commit hook as requested in CONTRIBUTING.
See comments.
@@ -40,11 +40,22 @@ class GradientCumulativeOptimizerHook(OptimizerHook):
    def __init__(self, grad_clip=None, cumulative_iters=1):
        super(GradientCumulativeOptimizerHook, self).__init__(grad_clip)
        self.cumulative_iters = cumulative_iters
        self.divisible_ietrs = 0
Typo: should be `self.divisible_iters`
        self.initialized = False

    def _init(self, runner):
        self.divisible_ietrs = runner.max_iters // self.cumulative_iters * self.cumulative_iters
There is another corner case where users resume from `iter=2` but `cumulative_iters=4`. It seems this implementation will produce wrong gradients. If there is no good solution, please put a warning here.
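A minimal sketch of such a warning (the helper name `check_resume` is illustrative): iterations completed before the resume point cannot be re-accumulated, so if the resume iteration is not aligned to an accumulation window, the first window after resuming will mix a partial accumulation:

```python
import warnings

def check_resume(start_iter, cumulative_iters):
    """Warn when resuming mid-way through an accumulation window."""
    if start_iter % cumulative_iters != 0:
        warnings.warn(
            'Resume iter is not divisible by cumulative_iters; gradients of '
            'the first accumulation window after resuming may be incorrect.')

check_resume(2, 4)  # warns: 2 % 4 != 0
check_resume(4, 4)  # silent: resume is aligned to a full window
```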
        self.remainder_iters = 0
        self.initialized = False

    def _init(self, runner):
Please put a warning for the usage of BN.
* Add gradient cumulative optimizer, fixes #190
* Update optimizer.py
* Update optimizer.py
* Fix loss scale improperly in last equivalent_iter
* Add `GradientCumulativeOptimizerHook` in `__init__.py`
* Add docstring of `GradientCumulativeOptimizerHook`
* Add type check, BN warning and resume warning; fix typo, lint the code
* Add unit test
* Update docstring example
* Change `GradientCumulativeOptimizerHook` `__init__` arguments
* Add `GradientCumulativeOptimizerHook` unit tests with IterBasedRunner
* Add `GradientCumulativeFp16OptimizerHook`
* Add unit tests of `GradientCumulativeFp16OptimizerHook`
* Use `!=` instead of `>` to determine resume

Co-authored-by: Zhiyuan Chen <this@zyc.ai>
closed by #1221
fixes #190