
[Feature] Add revert_sync_batchnorm #1253

Merged 9 commits into open-mmlab:master on Sep 8, 2021

Conversation

gaotongxiao
Collaborator

Motivation

A model that uses SyncBN cannot run inference on CPU, and SyncBN can also cause other issues during inference. We introduce revert_sync_batchnorm from @kapily's work (pytorch/pytorch#41081 (comment)), which converts the SyncBN layers in any model back to BN.

Modification

Added revert_sync_batchnorm to mmcv/cnn/utils/sync_bn.py along with its unit test.

BC-breaking (Optional)

No, but there is a potential minor risk:

PyTorch provides convert_sync_batchnorm, which converts BatchNorm1d, BatchNorm2d, and BatchNorm3d to SyncBatchNorm, but it does not provide an inverse function. The reason is that SyncBatchNorm neither performs strict input-dimension checking nor stores the expected input dimension, whereas BatchNormXd strictly validates the input dimension (and this is the only difference between BatchNorm1d, 2d, and 3d). Therefore, if one converts BatchNormXd to SyncBN using PyTorch's implementation and then converts it back to BN using this implementation, the input-dimension check is no longer retained.
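The mmcv code itself is not shown in this thread, but kapily's approach that this PR adopts can be sketched roughly as below. This is a minimal illustration, not the exact mmcv implementation; note that it converts every SyncBatchNorm to BatchNorm2d, which is exactly why the original 1d/3d dimension check is lost:

```python
import torch


def revert_sync_batchnorm(module):
    """Recursively replace all SyncBatchNorm layers with BatchNorm2d.

    Sketch of the approach from pytorch/pytorch#41081; the real mmcv
    implementation may differ in details.
    """
    module_output = module
    if isinstance(module, torch.nn.SyncBatchNorm):
        # SyncBatchNorm does not record whether it came from a 1d/2d/3d BN,
        # so we can only pick one target type (here BatchNorm2d).
        module_output = torch.nn.BatchNorm2d(
            module.num_features, module.eps, module.momentum,
            module.affine, module.track_running_stats)
        if module.affine:
            with torch.no_grad():
                module_output.weight = module.weight
                module_output.bias = module.bias
        module_output.running_mean = module.running_mean
        module_output.running_var = module.running_var
        module_output.num_batches_tracked = module.num_batches_tracked
    for name, child in module.named_children():
        module_output.add_module(name, revert_sync_batchnorm(child))
    return module_output
```

After the conversion, the model can run a forward pass on CPU, which is the main goal of this PR.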

@codecov

codecov bot commented Aug 10, 2021

Codecov Report

Merging #1253 (35df788) into master (9341856) will increase coverage by 0.69%.
The diff coverage is 63.05%.

❗ Current head 35df788 differs from pull request most recent head 7f85e59. Consider uploading reports for the commit 7f85e59 to get more accurate results

@@            Coverage Diff             @@
##           master    #1253      +/-   ##
==========================================
+ Coverage   68.20%   68.90%   +0.69%     
==========================================
  Files         160      162       +2     
  Lines       10607    10775     +168     
  Branches     1938     1978      +40     
==========================================
+ Hits         7234     7424     +190     
+ Misses       2992     2962      -30     
- Partials      381      389       +8     
Flag Coverage Δ
unittests 68.90% <63.05%> (+0.69%) ⬆️

Flags with carried-forward coverage won't be shown.

Impacted Files Coverage Δ
mmcv/image/photometric.py 99.27% <ø> (ø)
mmcv/ops/carafe.py 23.30% <ø> (ø)
mmcv/ops/cc_attention.py 38.70% <0.00%> (ø)
mmcv/ops/deform_conv.py 72.26% <0.00%> (+10.21%) ⬆️
mmcv/ops/deform_roi_pool.py 28.20% <ø> (ø)
mmcv/ops/focal_loss.py 23.89% <ø> (ø)
mmcv/ops/fused_bias_leakyrelu.py 30.90% <ø> (ø)
mmcv/ops/masked_conv.py 40.00% <ø> (ø)
mmcv/ops/psa_mask.py 31.81% <ø> (ø)
mmcv/ops/sync_bn.py 19.64% <ø> (ø)
... and 32 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9341856...7f85e59. Read the comment docs.

@ZwwWayne
Collaborator

Please also consider/check/test MMSyncBN

@gaotongxiao
Collaborator Author

@ZwwWayne It seems MMSyncBN cannot even be initialized in a non-distributed environment.

@ZwwWayne
Collaborator

> It seems MMSyncBN cannot even be initialized in a non-distributed environment.

So is it also necessary to convert it if it is used in the config for training?

@gaotongxiao
Collaborator Author

@ZwwWayne A model with an MMSyncBN layer cannot be built in a non-distributed environment. I think the only solution is to regenerate a Config that replaces MMSyncBN with BN and use it to build the model, but that idea is quite different from the current implementation. I'm also not sure whether the checkpoint remains compatible after such a conversion. BTW, what was the goal of MMSyncBN?

@ZwwWayne
Collaborator

> A model with an MMSyncBN layer cannot be built in a non-distributed environment. I think the only solution is to regenerate a Config that replaces MMSyncBN with BN and use it to build the model, but that idea is quite different from the current implementation. I'm also not sure whether the checkpoint remains compatible after such a conversion. BTW, what was the goal of MMSyncBN?

It was written a long time ago, when PyTorch did not provide SyncBN. It also supports some corner cases that are currently not supported in PyTorch.

@gaotongxiao
Collaborator Author

gaotongxiao commented Aug 17, 2021

@ZwwWayne Do we have any examples using MMSyncBN?

@ZwwWayne
Collaborator

> Do we have any examples using MMSyncBN?

Simply changing the SyncBN in the norm_cfg to MMSyncBN should work.

@ZwwWayne
Collaborator

ZwwWayne commented Aug 17, 2021

I suggest implementing a function that simply modifies the config to change SyncBN/MMSyncBN back to BN. It works for both SyncBN and MMSyncBN and meets the demands this PR wants to address. It is a little late to convert the model back after the SyncBN layers have been built.
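The suggested config-level rewrite could look roughly like the sketch below. The helper name `revert_sync_bn_in_cfg` and the plain-dict config are illustrative assumptions, not mmcv's actual API; the idea is just to walk the config and swap the `type` field before the model is ever built:

```python
def revert_sync_bn_in_cfg(cfg):
    """Recursively replace SyncBN/MMSyncBN norm types with BN in a config.

    Hypothetical sketch: operates on plain dicts/lists, as mmcv Config
    objects behave like nested dicts.
    """
    if isinstance(cfg, dict):
        return {
            k: ('BN' if k == 'type' and v in ('SyncBN', 'MMSyncBN')
                else revert_sync_bn_in_cfg(v))
            for k, v in cfg.items()
        }
    if isinstance(cfg, list):
        return [revert_sync_bn_in_cfg(v) for v in cfg]
    return cfg
```

Because the swap happens before the model is constructed, no SyncBN (or MMSyncBN) layer is ever instantiated, which avoids the initialization problem in non-distributed environments.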

@gaotongxiao
Collaborator Author

@ZwwWayne Done

@ZwwWayne ZwwWayne merged commit 642d281 into open-mmlab:master Sep 8, 2021
@zhouzaida zhouzaida mentioned this pull request Sep 8, 2021