
[WIP] continue PR #784 #1221

Merged 15 commits on Aug 23, 2021

Conversation

@mzr1996 (Member) commented Jul 28, 2021

Motivation

This PR updates PR #784 and fixes #190.

Modification

Based on #784, this PR adds docstrings and unit tests, fixes a typo, and applies the requested changes.

Use cases

>>> # Use cumulative_iters to simulate a large batch size
>>> # when the hardware cannot handle one directly.
>>> loader = DataLoader(data, batch_size=64)
>>> optimizer_hook = GradientCumulativeOptimizerHook(cumulative_iters=4)
>>> # which is almost equivalent to:
>>> loader = DataLoader(data, batch_size=256)
>>> optimizer_hook = OptimizerHook()
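
For intuition, below is a minimal, self-contained sketch of the idea the hook implements (not the hook's actual code; the toy model, optimizer, and data are illustrative stand-ins):

import torch
from torch.utils.data import DataLoader

model = torch.nn.Linear(2, 1)                      # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(torch.ones(256, 2), batch_size=64)

cumulative_iters = 4
optimizer.zero_grad()
for i, data in enumerate(loader):                  # 64 samples per iteration
    loss = model(data).sum()                       # stand-in for the real loss
    (loss / cumulative_iters).backward()           # scale so gradients average, not sum
    if (i + 1) % cumulative_iters == 0:
        optimizer.step()                           # one update per 4 x 64 = 256 samples
        optimizer.zero_grad()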

To use GradientCumulativeOptimizerHook, modify the config as follows:

_base_ = [...]

data = dict(samples_per_gpu=16)  # Assume original samples_per_gpu=32
optimizer_config = dict(type='GradientCumulativeOptimizerHook', cumulative_iters=2)
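
The PR also adds an fp16 counterpart, GradientCumulativeFp16OptimizerHook. A hedged config sketch, assuming it accepts the usual Fp16OptimizerHook keyword arguments (the loss_scale value here is purely illustrative):

optimizer_config = dict(
    type='GradientCumulativeFp16OptimizerHook',
    cumulative_iters=2,
    loss_scale=512.)  # illustrative value, not taken from this PR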

Performance Validation

Faster RCNN (Without BatchNorm, thanks @BIGWangYuDong)

| samples_per_gpu | cumulative_iters | bbox_mAP |
|-----------------|------------------|----------|
| 2               | 1                | 0.3770   |
| 1               | 2                | 0.3770   |
| 1               | 1                | 0.3660   |

ResNet-34 (With BatchNorm)

| samples_per_gpu | cumulative_iters | accuracy |
|-----------------|------------------|----------|
| 32              | 1                | 73.85    |
| 16              | 2                | 73.25    |
| 16              | 1                | 73.19    |

ResNet-50 (Fp16, With BatchNorm)

| samples_per_gpu | cumulative_iters | accuracy |
|-----------------|------------------|----------|
| 32              | 1                | 76.32    |
| 16              | 2                | 75.92    |
| 16              | 1                | 75.81    |
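
For reference, the effective batch size each row simulates is samples_per_gpu × cumulative_iters × number of GPUs; a small arithmetic sketch (the GPU count of 8 is hypothetical):

num_gpus = 8                                               # hypothetical
samples_per_gpu, cumulative_iters = 16, 2
effective = num_gpus * samples_per_gpu * cumulative_iters  # 256, same as 8 x 32 x 1

This is why rows with equal samples_per_gpu × cumulative_iters are expected to score similarly.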

Checklist

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • If the modification could affect downstream projects, this PR should be tested with them, e.g. MMDet or MMCls.
  • The documentation has been modified accordingly, e.g. docstrings or example tutorials.

@zhouzaida (Member)

Please provide a performance comparison before and after using the hook.

@zhouzaida requested a review from xvjiarui on July 28, 2021 03:35
@mzr1996 changed the title from "[Feature] continue PR #784" to "[WIP] continue PR #784" on Jul 28, 2021
codecov bot commented Jul 28, 2021

Codecov Report

Merging #1221 (04e9c7d) into master (88d8c9e) will decrease coverage by 0.20%.
The diff coverage is 39.42%.


@@            Coverage Diff             @@
##           master    #1221      +/-   ##
==========================================
- Coverage   68.27%   68.06%   -0.21%     
==========================================
  Files         160      160              
  Lines       10599    10701     +102     
  Branches     1937     1965      +28     
==========================================
+ Hits         7236     7284      +48     
- Misses       2979     3026      +47     
- Partials      384      391       +7     
| Flag      | Coverage Δ                  |
|-----------|-----------------------------|
| unittests | 68.06% <39.42%> (-0.21%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files                 | Coverage Δ                   |
|--------------------------------|------------------------------|
| mmcv/runner/__init__.py        | 100.00% <ø> (ø)              |
| mmcv/runner/hooks/optimizer.py | 29.48% <38.83%> (+12.82%) ⬆️ |
| mmcv/runner/hooks/__init__.py  | 100.00% <100.00%> (ø)        |
| mmcv/runner/hooks/hook.py      | 98.14% <0.00%> (+1.85%) ⬆️   |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 88d8c9e...04e9c7d.

@mzr1996 requested a review from zhouzaida on August 2, 2021 07:51
@BIGWangYuDong
Ran Faster R-CNN with:

  • batch size = 2, without GradientCumulativeOptimizerHook or cumulative_iters: got 37.7% mAP
    bbox_mAP: 0.3770, bbox_mAP_50: 0.5860, bbox_mAP_75: 0.4120, bbox_mAP_s: 0.2210, bbox_mAP_m: 0.4130, bbox_mAP_l: 0.4850, bbox_mAP_copypaste: 0.377 0.586 0.412 0.221 0.413 0.485

  • batch size = 1, with GradientCumulativeOptimizerHook and cumulative_iters=2: got 37.7% mAP
    bbox_mAP: 0.3770, bbox_mAP_50: 0.5860, bbox_mAP_75: 0.4050, bbox_mAP_s: 0.2180, bbox_mAP_m: 0.4120, bbox_mAP_l: 0.4850, bbox_mAP_copypaste: 0.377 0.586 0.405 0.218 0.412 0.485

  • batch size = 1, with GradientCumulativeOptimizerHook but without setting cumulative_iters: got 36.6% mAP
    bbox_mAP: 0.3660, bbox_mAP_50: 0.5700, bbox_mAP_75: 0.3990, bbox_mAP_s: 0.2080, bbox_mAP_m: 0.4010, bbox_mAP_l: 0.4690, bbox_mAP_copypaste: 0.366 0.570 0.399 0.208 0.401 0.469

@mzr1996 (Member, Author) commented Aug 6, 2021

> Please provide a performance comparison before and after using the hook.

Done, updated in the comments.

@zhouzaida (Member) commented Aug 17, 2021

Hi, have you tested GradientCumulativeFp16OptimizerHook in mmcls or mmdet? BTW, maybe we need to add more unit tests to cover the code.

@mzr1996 (Member, Author) commented Aug 17, 2021

> Hi, have you tested GradientCumulativeFp16OptimizerHook in mmcls or mmdet? BTW, maybe we need to add more unit tests to cover the code.

I have updated the test results in the comment.

I have also added unit tests that cover the fp16-related hook, but codecov doesn't recognize them. I don't know why.

@zhouzaida (Member)

> I have also added unit tests that cover the fp16-related hook, but codecov doesn't recognize them. I don't know why.

Perhaps the test is skipped because of the marker below. Did you run it in your local environment?

@pytest.mark.skipif(
    not torch.cuda.is_available(), reason='requires CUDA support')
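
(For reference, one common pytest pattern that keeps such a test partially visible to CPU-only CI is to parametrize the device rather than skip the whole test; a minimal sketch with a toy model standing in for the real fixture, noting the fp16 path may genuinely require CUDA:)

import pytest
import torch

@pytest.mark.parametrize('device', [
    'cpu',
    pytest.param('cuda', marks=pytest.mark.skipif(
        not torch.cuda.is_available(), reason='requires CUDA support')),
])
def test_gradient_accumulation(device):
    # toy stand-in for the real runner-based test
    model = torch.nn.Linear(2, 1).to(device)
    loss = model(torch.ones(4, 2, device=device)).sum()
    loss.backward()
    assert model.weight.grad is not None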

@mzr1996 (Member, Author) commented Aug 17, 2021

> Perhaps the test is skipped because of the marker below. Did you run it in your local environment?

Local test results:
(screenshot of local pytest results)

@zhouzaida (Member) left a review

LGTM

@zhouzaida mentioned this pull request on Aug 19, 2021
@ZiyiZhang27

Did each experiment use the same learning rate?

@mzr1996 (Member, Author) commented Nov 22, 2021

> Did each experiment use the same learning rate?

Yes. Since we use GradientCumulativeOptimizerHook to simulate the same effective batch size, they share the same learning rate.

@ZiyiZhang27

> Yes. Since we use GradientCumulativeOptimizerHook to simulate the same effective batch size, they share the same learning rate.

Including the experiments with cumulative_iters=1? Shouldn't the learning rate be halved for those?

@ZiyiZhang27

> Including the experiments with cumulative_iters=1? Shouldn't the learning rate be halved for those?

Hi, can you answer my earlier question?

@mzr1996 (Member, Author) commented Nov 30, 2021

> Including the experiments with cumulative_iters=1? Shouldn't the learning rate be halved for those?

Right, usually we need to reduce the learning rate when reducing the batch size, but these experiments are only meant to verify that GradientCumulativeOptimizerHook works. More experiments would be needed to evaluate the effectiveness of gradient accumulation.
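
For context, the linear scaling rule the question refers to, in arithmetic form (all numbers below are hypothetical, not taken from these experiments):

base_lr = 0.02                     # tuned for total batch size 16 (8 GPUs x 2 samples)
total_batch = 8 * 1 * 1            # 8 GPUs x samples_per_gpu=1 x cumulative_iters=1
lr = base_lr * total_batch / 16    # -> 0.01, i.e. halved, as the question suggests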

Development

Successfully merging this pull request may close these issues:

  • Cumulative gradient? Using small BatchSize to simulate big BatchSize