
[Feature] Turning On fast_conv_bn_eval #1202

Merged · 26 commits into open-mmlab:main · Jul 14, 2023

Conversation

@youkaichao (Contributor) commented Jun 15, 2023

Motivation

The mmcv package will introduce a fast_conv_bn_eval feature in version 2.0.1, as discussed in open-mmlab/mmcv#2807.

This PR enables mmengine to use that feature.

Modification

The Runner reads the cfg to check whether the fast_conv_bn_eval option is turned on. If it is, and the installed mmcv supports fast_conv_bn_eval, then:

  • The fast_conv_bn_eval feature will be turned on for existing ConvModule instances.
  • Consecutive conv + bn blocks will be merged into a new ConvModule that reuses the existing parameters (see the folding sketch below).
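For reference, here is a minimal sketch of the eval-mode folding that makes such a merge possible, assuming standard BatchNorm semantics. This is illustrative only, not the mmcv implementation; the feature applies a transform like this on the fly during training, so gradients still flow to the original conv and bn parameters.

```python
import torch
import torch.nn as nn

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold an eval-mode BN into the preceding conv's weight and bias.

    With BN in eval mode, bn(conv(x)) = conv(x) * scale + shift, where
    scale = gamma / sqrt(running_var + eps), so the pair reduces to a
    single conv with rescaled weight and bias.
    """
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    weight = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    bias = bn.bias + (bias - bn.running_mean) * scale
    return weight, bias
```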

BC-breaking (Optional)

This feature does not break backward compatibility; it is gated behind a new cfg option, fast_conv_bn_eval.

Use cases (Optional)

Downstream projects can leverage this feature by adding the command-line option fast_conv_bn_eval=True. They also need to specify which parts of the model to apply the feature to, e.g. fast_conv_bn_eval_modules=['backbone']. This limitation comes from the fact that whole models (e.g. detection models) might be too complex to trace for consecutive conv-bn blocks, while parts of the model are simple enough to trace. The main use case is replacing consecutive conv-bn blocks in a pre-trained backbone; a hypothetical config snippet is shown below.
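For illustration, a hypothetical mmengine-style config snippet (the option names come from this PR's description; the exact schema and command-line flags vary per downstream project):

```python
# Hypothetical downstream config; option names are taken from this PR's
# description, but the exact schema in a given project may differ.
fast_conv_bn_eval = True                  # turn the feature on
fast_conv_bn_eval_modules = ['backbone']  # only trace/merge these submodules
```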

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  • The documentation has been modified accordingly, like docstring or example tutorials.

@CLAassistant commented Jun 15, 2023

CLA assistant check
All committers have signed the CLA.

@youkaichao (Contributor, Author)

Meeting notes with @zhouzaida and @HAOCHENYE:

The logic happens in Runner.train, and the order of build_optim_wrapper / _init_model_weights / fast_conv_bn_eval matters.

Ideally, we would like the order _init_model_weights --> fast_conv_bn_eval --> build_optim_wrapper. However, this is not possible, since the current order is build_optim_wrapper --> _init_model_weights.

Some users set param groups when building optimizers, and those param groups might depend on the keys and values of parameters. To keep these users unaffected, we should place fast_conv_bn_eval after build_optim_wrapper.

Therefore, we finally agreed on the order build_optim_wrapper --> _init_model_weights --> fast_conv_bn_eval, as sketched below.
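A rough sketch of that ordering inside Runner.train (illustrative pseudocode, not the actual mmengine source; since the merge reuses the existing parameters, the optimizer built in step 1 stays valid):

```python
def train(self):
    # 1. Build the optimizer first, so user-defined param groups that key
    #    on the original parameters are unaffected by later modifications.
    self.optim_wrapper = self.build_optim_wrapper(self.optim_wrapper)
    # 2. Initialize model weights.
    self._init_model_weights()
    # 3. Turn on fast_conv_bn_eval last.
    if self.cfg.get('fast_conv_bn_eval', False):
        turn_on_fast_conv_bn_eval(self.model)
    ...
```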

@youkaichao (Contributor, Author)

Tested on MMDetection with a small fraction of COCO data on my laptop. The optimizer works fine.

@HAOCHENYE (Collaborator) left a comment


Hi~ We should provide a corresponding unit test for turn_on_fast_conv_bn_eval to check that the output is not influenced when the model is modified by it. It would be more reliable to validate that turn_on_fast_conv_bn_eval works fine with different network architectures.

Files with review threads (all resolved):

  • mmengine/model/turn_on_fast_conv_bn_eval.py
  • mmengine/runner/runner.py
  • docs/zh_cn/common_usage/save_gpu_memory.md
@HAOCHENYE (Collaborator)

BTW, we should also update this function in FlexibleRunner.

@youkaichao (Contributor, Author)

> Hi~ We should provide a corresponding unit test for turn_on_fast_conv_bn_eval to check that the output is not influenced when the model is modified by it. It would be more reliable to validate that turn_on_fast_conv_bn_eval works fine with different network architectures.

Unit tests are intended for simple functions. However, validating this feature usually requires days of GPU time to run detection experiments on the whole MSCOCO dataset. Do we have that many resources for unit testing?

@HAOCHENYE (Collaborator) commented Jul 5, 2023

> Unit tests are intended for simple functions. However, validating this feature usually requires days of GPU time to run detection experiments on the whole MSCOCO dataset. Do we have that many resources for unit testing?

We don't need to validate training accuracy for a specific model. We can construct some ToyModels with different Conv + BN structures and then validate that the output of the modified ToyModel instance matches the original one. The structures should include the cases where all BN and Conv pairs can be merged, where only some pairs can be merged, and where none of the submodules can be merged. A test along the lines of the sketch below would do.
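For example (the turn_on_fast_conv_bn_eval import path is assumed from this PR's file layout, and a real test should also cover the partially-mergeable and non-mergeable structures):

```python
import copy

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """A structure in which the single conv + bn pair is mergeable."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

def test_fast_conv_bn_eval_output_unchanged():
    model = ToyModel().eval()  # the feature targets BN in eval mode
    reference = copy.deepcopy(model)

    # Import path assumed from this PR; adjust if the module moved.
    from mmengine.model import turn_on_fast_conv_bn_eval
    turn_on_fast_conv_bn_eval(model)

    x = torch.randn(2, 3, 16, 16)
    torch.testing.assert_close(model(x), reference(x))
```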

@youkaichao (Contributor, Author)

> We don't need to validate training accuracy for a specific model. We can construct some ToyModels with different Conv + BN structures and then validate that the output of the modified ToyModel instance matches the original one. The structures should include the cases where all BN and Conv pairs can be merged, where only some pairs can be merged, and where none of the submodules can be merged.

Fixed in 23bc688.

@youkaichao requested a review from @HAOCHENYE on July 6, 2023 at 05:34
@youkaichao (Contributor, Author)

This PR enables the fast_conv_bn_eval feature for PyTorch-native conv and bn. It works for all modules whose computation graph does not change during training; the sketch below illustrates why a static graph is required.
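A minimal sketch (not the mmengine implementation) of how consecutive conv -> bn pairs can be located with torch.fx; symbolic tracing is exactly the step that fails on a dynamic graph:

```python
import torch.fx as fx
import torch.nn as nn

def find_conv_bn_pairs(model: nn.Module):
    """Locate conv layers whose output feeds straight into a BN layer."""
    graph_module = fx.symbolic_trace(model)  # raises on dynamic control flow
    modules = dict(graph_module.named_modules())
    pairs = []
    for node in graph_module.graph.nodes:
        if node.op != 'call_module':
            continue
        if not isinstance(modules[node.target], nn.modules.batchnorm._BatchNorm):
            continue
        prev = node.args[0]  # the node producing this BN's input
        if (prev.op == 'call_module'
                and isinstance(modules[prev.target], nn.modules.conv._ConvNd)
                and len(prev.users) == 1):  # conv output used only by this BN
            pairs.append((prev.target, node.target))
    return pairs
```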

This PR does not support dynamic computation graphs (graphs that change across forward calls). A possible example of a dynamic computation graph is ConvModule, whose forward function can take a norm argument to control whether to use normalization layers for each forward pass. However, ConvModule itself in mmcv does support the fast_conv_bn_eval feature even in the case of a dynamic computation graph. If users want to use fast_conv_bn_eval with ConvModule and a dynamic norm argument (a rather rare use case), they can call ConvModule.turn_on_fast_conv_bn_eval rather than from mmengine.model.fast_conv_bn_eval import turn_on_fast_conv_bn_eval.

To sum up:

  • If you don't use ConvModule, this PR is enough to use the fast_conv_bn_eval feature.
  • If you use ConvModule and the norm argument stays the same across forward calls, this PR is enough to use the fast_conv_bn_eval feature.
  • If you use ConvModule and the norm argument changes across forward calls, torch.fx.symbolic_trace might fail. Please do not use this PR; instead, call the ConvModule.turn_on_fast_conv_bn_eval function manually for each ConvModule (see the sketch below).
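A hedged sketch of that manual, per-module route (the ConvModule.turn_on_fast_conv_bn_eval API comes from open-mmlab/mmcv#2807; its exact signature is assumed here and may differ):

```python
from torch import nn
from mmcv.cnn import ConvModule

def enable_fast_conv_bn_eval_per_module(model: nn.Module) -> None:
    """Enable the feature on each ConvModule without tracing the model."""
    for module in model.modules():
        if isinstance(module, ConvModule):
            # Signature assumed from this PR's description; check the
            # mmcv API for the exact form.
            ConvModule.turn_on_fast_conv_bn_eval(module)
```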

@zhouzaida merged commit 40e49ff into open-mmlab:main on Jul 14, 2023
14 of 18 checks passed