[Feature] Add DCFF (#295)

* add ChannelGroup (#250)
* rebase new dev-1.x
* modification for adding config_template
* add docstring to channel_group.py
* add docstring to mutable_channel_group.py
* rm channel_group_cfg from Graph2ChannelGroups
* change choice type of SequentialChannelGroup from float to int
* add a warning about group-wise conv
* restore __init__ of dynamic op
* in_channel_mutable -> mutable_in_channel
* rm abstractproperty
* add a comment about VT
* rm registry for ChannelGroup
* MUTABLECHANNELGROUP -> ChannelGroupType
* refine docstring of IndexDict
* update docstring
* update docstring
* is_prunable -> is_mutable
* update docstring
* fix error in pre-commit
* update unittest
* add return type
* unify init_xxx api
* add unittest about init of MutableChannelGroup
* update according to reviews
* sequential_channel_group -> sequential_mutable_channel_group

Co-authored-by: liukai <liukai@pjlab.org.cn>

* Add BaseChannelMutator and refactor Autoslim (#289)
* add BaseChannelMutator
* add autoslim
* tmp
* make SequentialMutableChannelGroup accept both num and ratio as choice, and support divisor
* update OneShotMutableChannelGroup
* pass supernet training of autoslim
* refine autoslim
* fix bug in OneShotMutableChannelGroup
* refactor make_divisible
* fix spell error: channl -> channel
* init_using_backward_tracer -> init_from_backward_tracer; init_from_fx_tracer -> init_from_fx_tracer
* refine SequentialMutableChannelGroup
* let mutator support models with dynamicop
* support defining search space in model
* tracer_cfg -> parse_cfg
* refine
* using -> from
* update docstring
* update docstring

Co-authored-by: liukai <liukai@pjlab.org.cn>

* tmpsave
* migrate ut
* tmpsave2
* add loss collector
* refactor slimmable and add l1-norm (#291)
* refactor slimmable and add l1-norm
* make l1-norm support convnd
* update get_channel_groups
* add l1-norm_resnet34_8xb32_in1k.py
* add pretrained to resnet34-l1
* remove old channel mutator
* BaseChannelMutator -> ChannelMutator
* update according to reviews
* add readme to l1-norm
* MBV2_slimmable -> MBV2_slimmable_config

Co-authored-by: liukai <liukai@pjlab.org.cn>

* update config
* fix md & pytorch support <1.9.0 in batchnorm init
* Clean old codes. (#296)
* remove old dynamic ops
* move dynamic ops
* clean old mutable_channels
* rm OneShotMutableChannel
* rm MutableChannel
* refine
* refine
* use SquentialMutableChannel to replace OneshotMutableChannel
* refactor dynamicops folder
* let SquentialMutableChannel support float

Co-authored-by: liukai <liukai@pjlab.org.cn>

* fix ci
* ci fix py3.6.x & add mmpose
* ci fix py3.6.9 in utils/index_dict.py
* fix mmpose
* minimum_version_cpu=3.7
* fix ci 3.7.13
* fix pruning & meta ci
* support python3.6.9
* fix py3.6 import caused by circular import patch in py3.7
* fix py3.6.9
* Add channel-flow (#301)
* base_channel_mutator -> channel_mutator
* init
* update docstring
* allow omitting redundant configs for channel
* add register_mutable_channel_to_a_module to MutableChannelContainer
* update according to reviews 1
* update according to reviews 2
* update according to reviews 3
* remove old docstring
* fix error
* using->from
* update according to reviews
* support self-define input channel number
* update docstring
* chanenl -> channel_elem

Co-authored-by: liukai <liukai@pjlab.org.cn>
Co-authored-by: jacky <jacky@xx.com>

* support >=3.7
* support py3.6.9
* Rename: ChannelGroup -> ChannelUnit (#302)
* refine repr of MutableChannelGroup
* rename folder name
* ChannelGroup -> ChannelUnit
* filename in units folder
* channel_group -> channel_unit
* groups -> units
* group -> unit
* update
* get_mutable_channel_groups -> get_mutable_channel_units
* fix bug
* refine docstring
* fix ci
* fix bug in tracer

Co-authored-by: liukai <liukai@pjlab.org.cn>

* update new channel config format
* update pruning refactor
* update merged pruning
* update commit
* fix dynamic_conv_mixin
* update comments: readme & dynamic_conv_mixins.py
* update readme
* move kl softmax channel pooling to op by comments
* fix comments: fix redundant & split README.md
* dcff in ItePruneAlgorithm
* partial dynamic params for fuseconv
* add step_freq & prune_time check
* update comments
* update comments
* update comments
* fix ut
* fix gpu ut & revise step_freq in ItePruneAlgorithm
* update readme
* revise ItePruneAlgorithm
* fix docs
* fix dynamic_conv attr
* fix ci

Co-authored-by: LKJacky <108643365+LKJacky@users.noreply.github.com>
Co-authored-by: liukai <liukai@pjlab.org.cn>
Co-authored-by: zengyi.vendor <zengyi.vendor@sensetime.com>
Co-authored-by: jacky <jacky@xx.com>
open-mmlab · Nov 23, 2022 · 76c3773
1 parent 18fc50f

Showing 50 changed files with 4,746 additions and 114 deletions.
diff --git a/.circleci/test.yml b/.circleci/test.yml
@@ -70,6 +70,7 @@ jobs:
             pip install git+https://github.com/open-mmlab/mmclassification.git@dev-1.x
             pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
             pip install git+https://github.com/open-mmlab/mmsegmentation.git@dev-1.x
+            pip install git+https://github.com/open-mmlab/mmpose.git@dev-1.x
             pip install -r requirements.txt
       - run:
           name: Build and install
@@ -163,7 +164,7 @@ workflows:
           torchvision: 0.13.1
           python: 3.9.0
           requires:
-            - minimum_version_cpu
+            - lint
       - hold:
           type: approval
           requires:

diff --git a/configs/pruning/mmcls/dcff/README.md b/configs/pruning/mmcls/dcff/README.md
@@ -0,0 +1,82 @@
+# Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion
+
+## Abstract
+
+The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select “important” filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is first given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without depending on a pretrained model or introducing sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching a top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top-1 accuracy on ILSVRC-2012.
+
+![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg)
+
+## Results and models
+
+### 1. Classification
+
+| Dataset  |   Backbone   | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) |                     CPrate                      |                        Config                        |           Download           |
+| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :--------------------------: |
+| ImageNet | DCFFResNet50 |   15.16   |   2260   |  step   |   73.96   |   91.66   | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| \[log\] (\<>) |
+
+### 2. Detection
+
+| Dataset |   Method    |   Backbone   |  Style  | Lr schd | Params(M) | FLOPs(M) | bbox AP |                     CPrate                      |                              Config                               |           Download           |
+| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: |
+|  COCO   | Faster_RCNN | DCFFResNet50 | pytorch |  step   |   33.31   |  168320  |  35.8   | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| \[log\] (\<>) |
+
+### 3. Segmentation
+
+|  Dataset   |  Method   |    Backbone     | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU  |                               CPrate                                |                                Config                                 |           Download           |
+| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :--------------------------: |
+| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024  |  160k   |   18.43   |  74410   | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| \[log\] (\<>) |
+
+### 4. Pose
+
+| Dataset |     Method      |   Backbone   | crop size | total epochs | Params(M) | FLOPs(M) |  AP  |                           CPrate                           |                              Config                               |           Download           |
+| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: |
+|  COCO   | TopDown HeatMap | DCFFResNet50 |  256x192  |     300      |   26.95   |   4290   | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| \[log\] (\<>) |
+
+## Citation
+
+```latex
+@article{lin2021training,
+  title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion},
+  author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi},
+  journal={arXiv preprint arXiv:2107.06916},
+  year={2021}
+}
+```
+
+## Getting Started
+
+### Generate channel_config file
+
+Generate `resnet_cls.json` with `tools/get_channel_units.py`.
+
+```bash
+python tools/get_channel_units.py \
+  configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py \
+  -c -i --output-path=configs/pruning/mmcls/dcff/resnet_cls.json
+```
+
+Then set the per-layer pruning ratios in `target_pruning_ratio` according to `resnet_cls.json`.
+
+### Train DCFF
+
+#### Classification
+
+##### ImageNet
+
+```bash
+sh tools/slurm_train.sh $PARTITION $JOB_NAME \
+  configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py \
+  $WORK_DIR
+```
+
+### Test DCFF
+
+#### Classification
+
+##### ImageNet
+
+```bash
+sh tools/slurm_test.sh $PARTITION $JOB_NAME \
+  configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py \
+  $CHECKPOINT
+```
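The dynamic-coded criterion and filter fusion summarized in the README abstract can be illustrated with a small, dependency-free Python sketch. It fills in details the abstract only names, so treat them as assumptions: each filter is a flattened vector, its inter-similarity distribution is a softmax over negative Euclidean distances scaled by the temperature `t` (so it tends to a one-hot vector on the filter itself as `t` grows), a filter's importance is its mean KL divergence to every filter's distribution (larger means more distinct, hence kept), and each preserved filter is replaced by the similarity-weighted average of all filters. The helper names are hypothetical; this is not mmrazor's `DCFFChannelUnit` code.

```python
import math
from typing import List, Tuple

Filter = List[float]  # a flattened conv filter (illustrative representation)


def inter_similarity(filters: List[Filter], t: float) -> List[List[float]]:
    """Row i is softmax_j(-t * ||F_i - F_j||), the proxy distribution of filter i."""
    probs = []
    for fi in filters:
        logits = [-t * math.dist(fi, fj) for fj in filters]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) with a small epsilon so zero entries do not break the log."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def importance(probs: List[List[float]]) -> List[float]:
    """Mean KL divergence between each filter's distribution and all others."""
    n = len(probs)
    return [sum(kl_divergence(probs[i], probs[j]) for j in range(n)) / n
            for i in range(n)]


def fuse_filters(filters: List[Filter], t: float,
                 num_keep: int) -> Tuple[List[int], List[Filter]]:
    """Keep the num_keep highest-scoring filters, each fused from all filters."""
    probs = inter_similarity(filters, t)
    scores = importance(probs)
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:num_keep])
    dim = len(filters[0])
    fused = [[sum(probs[i][j] * filters[j][k] for j in range(len(filters)))
              for k in range(dim)] for i in keep]
    return keep, fused


if __name__ == '__main__':
    filters = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    for t in (0.0, 1.0, 1000.0):
        keep, fused = fuse_filters(filters, t=t, num_keep=2)
        print(t, keep, fused)
```

At `t = 0` every distribution is uniform, so every fused filter equals the mean of all filters; as `t` grows each fused filter approaches the original filter it stands for, which is the one-hot limit the abstract describes.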
diff --git a/configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py b/configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py
@@ -0,0 +1,5 @@
+_base_ = ['dcff_resnet_8xb32_in1k.py']
+
+# model settings
+model = _base_.model
+model['is_deployed'] = True
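The CPrate column in the README tables uses Python list notation. Expanding the ImageNet entry gives 49 per-conv rates, which fits one rate for the ResNet-50 stem conv plus three for each of the 16 bottleneck blocks (3 + 4 + 6 + 3). Its values also appear to be the complements of the keep ratios in `dcff_resnet_8xb32_in1k.py` (1 - 0.35 = 0.65 for `stage_ratio_1`, 1 - 0.1 = 0.9 for `stage_ratio_3`), and the switch after 10 blocks lines up with the config moving to `stage_ratio_4` at `layer3.3`. Both readings are inferences from the numbers, not something the README states, and the helper names below are hypothetical.

```python
from typing import List

# CPrate of the ImageNet DCFFResNet50 row, copied from the README table.
CPRATE = [0.0] + [0.35, 0.4, 0.1] * 10 + [0.3, 0.3, 0.1] * 6

# Keep ratios defined in dcff_resnet_8xb32_in1k.py (stage_ratio_1..4).
STAGE_RATIOS = [0.65, 0.6, 0.9, 0.7]


def block_rates(cprate: List[float]) -> List[List[float]]:
    """Group rates into [conv1, conv2, conv3] per bottleneck block.

    Assumes the first entry belongs to the stem conv and skips it.
    """
    body = cprate[1:]
    return [body[i:i + 3] for i in range(0, len(body), 3)]


def to_keep_ratios(rates: List[float]) -> List[float]:
    """Convert pruning rates to keep ratios (1 - rate), rounded to hide float noise."""
    return [round(1.0 - r, 6) for r in rates]


if __name__ == '__main__':
    blocks = block_rates(CPRATE)
    print(len(CPRATE), len(blocks))  # 49 16
    print(to_keep_ratios(blocks[0]), to_keep_ratios(blocks[-1]))  # [0.65, 0.6, 0.9] [0.7, 0.7, 0.9]
```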
diff --git a/configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py b/configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py
@@ -0,0 +1,81 @@
+_base_ = [
+    'mmcls::_base_/datasets/imagenet_bs32.py',
+    'mmcls::_base_/schedules/imagenet_bs256.py',
+    'mmcls::_base_/default_runtime.py'
+]
+
+stage_ratio_1 = 0.65
+stage_ratio_2 = 0.6
+stage_ratio_3 = 0.9
+stage_ratio_4 = 0.7
+
+# the config template of target_pruning_ratio can be generated by
+# python ./tools/get_channel_units.py {config_file} --choice
+target_pruning_ratio = {
+    'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1,
+    'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2,
+    'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3,
+    'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1,
+    'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2,
+    'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1,
+    'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2,
+    # block 1 [0.65, 0.6] downsample=[0.9]
+    'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1,
+    'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2,
+    'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3,
+    'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1,
+    'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2,
+    'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1,
+    'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2,
+    'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1,
+    'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2,
+    # block 2 [0.65, 0.6] downsample=[0.9]
+    'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1,
+    'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2,
+    'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3,
+    'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1,
+    'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2,
+    'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1,
+    'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2,
+    'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4,
+    'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4,
+    'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4,
+    'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4,
+    'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4,
+    'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4,
+    # block 3 [0.65, 0.6]*3+[0.7, 0.7]*3 downsample=[0.9]
+    'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4,
+    'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4,
+    'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3,
+    'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4,
+    'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4,
+    'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4,
+    'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4
+    # block 4 [0.7, 0.7] downsample=[0.9]
+}
+
+optim_wrapper = dict(
+    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001))
+param_scheduler = dict(
+    type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1)
+train_cfg = dict(by_epoch=True, max_epochs=120, val_interval=1)
+
+data_preprocessor = {'type': 'mmcls.ClsDataPreprocessor'}
+
+# model settings
+model = dict(
+    _scope_='mmrazor',
+    type='DCFF',
+    architecture=dict(
+        cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False),
+    mutator_cfg=dict(
+        type='DCFFChannelMutator',
+        channel_unit_cfg=dict(
+            type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')),
+        parse_cfg=dict(
+            type='BackwardTracer',
+            loss_calculator=dict(type='ImageClassifierPseudoLoss'))),
+    target_pruning_ratio=target_pruning_ratio,
+    step_freq=1,
+    linear_schedule=False,
+    is_deployed=False)
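The hand-written `target_pruning_ratio` dict in `dcff_resnet_8xb32_in1k.py` follows a regular pattern over the ResNet-50 stages (3, 4, 6 and 3 bottleneck blocks). The sketch below regenerates it from that pattern, assuming the unit-name format `<module>_(0, C)_C` seen in the keys; `build_target_pruning_ratio` is a hypothetical helper, not an mmrazor API, and `tools/get_channel_units.py` remains the source of truth for unit names. Only block 0 of each stage carries a `conv3` entry, presumably because the residual additions tie the `conv3` outputs of a whole stage into one channel unit.

```python
from typing import Dict

# Keep ratios copied from dcff_resnet_8xb32_in1k.py.
STAGE_RATIO_1 = 0.65
STAGE_RATIO_2 = 0.6
STAGE_RATIO_3 = 0.9
STAGE_RATIO_4 = 0.7
BLOCKS_PER_STAGE = (3, 4, 6, 3)  # bottleneck blocks in ResNet-50 layer1..layer4


def unit_name(layer: int, block: int, conv: int, channels: int) -> str:
    """Build a key such as 'backbone.layer1.0.conv1_(0, 64)_64'."""
    return f'backbone.layer{layer}.{block}.conv{conv}_(0, {channels})_{channels}'


def build_target_pruning_ratio() -> Dict[str, float]:
    """Hypothetical helper that rebuilds the target_pruning_ratio dict."""
    ratios: Dict[str, float] = {}
    for layer, num_blocks in enumerate(BLOCKS_PER_STAGE, start=1):
        width = 64 * 2 ** (layer - 1)  # conv1/conv2 width; conv3 is 4x wider
        for block in range(num_blocks):
            # The last three blocks of layer3 and all of layer4 use stage_ratio_4.
            late = layer == 4 or (layer == 3 and block >= 3)
            ratios[unit_name(layer, block, 1, width)] = (
                STAGE_RATIO_4 if late else STAGE_RATIO_1)
            ratios[unit_name(layer, block, 2, width)] = (
                STAGE_RATIO_4 if late else STAGE_RATIO_2)
            if block == 0:
                # One conv3 unit per stage (presumably shared with the residual path).
                ratios[unit_name(layer, block, 3, width * 4)] = STAGE_RATIO_3
    return ratios


if __name__ == '__main__':
    for name, ratio in build_target_pruning_ratio().items():
        print(f"'{name}': {ratio},")
```

Comparing the generated entries with the literal dict is a quick way to catch a mistyped ratio or a missing block when editing the config by hand.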