
the input size of Flops is 256x256? #50

Closed
Sunting78 opened this issue Nov 15, 2021 · 4 comments
Comments

@Sunting78

https://github.com/facebookresearch/detectron2/blob/main/tools/analyze_model.py

Hi Bowen. I calculated the FLOPs and params with this script, but my result does not match your paper.
For maskformer_swin_small_bs16_160k.yaml I get 63M params and 111G FLOPs, while your paper reports 63M params and 79G FLOPs. Is there a problem with my calculation? When the input is resized to 256x256, the result is similar to the paper's.

`python3 analyze_model.py --config-file ./configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml --tasks flop`

Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style='choice')]
[11/15 13:41:29 detectron2]: Flops table computed from only one input sample:

| module                                        | #parameters or shape   | #flops     |
|:----------------------------------------------|:-----------------------|:-----------|
| model                                         | 63.075M                | 80.909G    |
|  backbone                                     |  48.839M               |  49.38G    |
|   backbone.patch_embed                        |   4.896K               |   83.362M  |
|    backbone.patch_embed.proj                  |    4.704K              |    75.497M |
|    backbone.patch_embed.norm                  |    0.192K              |    7.864M  |
|   backbone.layers                             |   48.831M              |   49.282G  |
|    backbone.layers.0                          |    0.299M              |    4.394G  |
|    backbone.layers.1                          |    1.188M              |    4.367G  |
|    backbone.layers.2                          |    33.16M              |    35.953G |
|    backbone.layers.3.blocks                   |    14.184M             |    4.567G  |
|   backbone.norm0                              |   0.192K               |   7.864M   |
|    backbone.norm0.weight                      |    (96,)               |            |
|    backbone.norm0.bias                        |    (96,)               |            |
|   backbone.norm1                              |   0.384K               |   3.932M   |
|    backbone.norm1.weight                      |    (192,)              |            |
|    backbone.norm1.bias                        |    (192,)              |            |
|   backbone.norm2                              |   0.768K               |   1.966M   |
|    backbone.norm2.weight                      |    (384,)              |            |
|    backbone.norm2.bias                        |    (384,)              |            |
|   backbone.norm3                              |   1.536K               |   0.983M   |
|    backbone.norm3.weight                      |    (768,)              |            |
|    backbone.norm3.bias                        |    (768,)              |            |
|  sem_seg_head                                 |  14.236M               |  27.453G   |
|   sem_seg_head.pixel_decoder                  |   4.305M               |   23.56G   |
|    sem_seg_head.pixel_decoder.adapter_1       |    25.088K             |    0.424G  |
|    sem_seg_head.pixel_decoder.layer_1         |    0.59M               |    9.685G  |
|    sem_seg_head.pixel_decoder.adapter_2       |    49.664K             |    0.207G  |
|    sem_seg_head.pixel_decoder.layer_2         |    0.59M               |    2.421G  |
|    sem_seg_head.pixel_decoder.adapter_3       |    98.816K             |    0.102G  |
|    sem_seg_head.pixel_decoder.layer_3         |    0.59M               |    0.605G  |
|    sem_seg_head.pixel_decoder.layer_4         |    1.77M               |    0.453G  |
|    sem_seg_head.pixel_decoder.mask_features   |    0.59M               |    9.664G  |
|   sem_seg_head.predictor                      |   9.932M               |   3.887G   |
|    sem_seg_head.predictor.transformer.decoder |    9.473M              |    1.179G  |
|    sem_seg_head.predictor.query_embed         |    25.6K               |            |
|    sem_seg_head.predictor.input_proj          |    0.197M              |    50.332M |
|    sem_seg_head.predictor.class_embed         |    38.807K             |    23.194M |
|    sem_seg_head.predictor.mask_embed.layers   |    0.197M              |    0.118G  |
[11/15 13:41:29 detectron2]: Average GFlops for each type of operators:
[('conv', 32.83191595008), ('layer_norm', 0.22296760319999998), ('linear', 67.07614236672), ('matmul', 1.92566500224), ('group_norm', 0.0769406976), ('upsample_nearest2d', 0.00764854272), ('bmm', 0.139984896), ('einsum', 8.959275), ('upsample_bilinear2d', 0.29302461)]
[11/15 13:41:29 detectron2]: Total GFlops: 111.5±12.8
@bowenc0221
Contributor

We calculate FLOPs with the corresponding training crop size: if the table says 512x512, it means we calculate FLOPs with an image of size 512x512.

The augmentation in the config uses [ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style='choice')], which only resizes the shorter side to 512; the longer side can be larger than 512.
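
As a quick check, here is a minimal sketch of that behavior (assuming detectron2's transforms API; the 512x683 shape is a hypothetical ADE20K-like image):

```python
import numpy as np
from detectron2.data.transforms import ResizeShortestEdge

# The same augmentation as in the config.
aug = ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style="choice")

# Hypothetical validation image: shorter side 512, longer side 683.
image = np.zeros((512, 683, 3), dtype=np.uint8)
tfm = aug.get_transform(image)
print(tfm.new_h, tfm.new_w)  # 512 683: the shorter side becomes 512, the longer side stays larger
```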

You can measure the FLOPs by feeding a dummy image of size 512x512 instead of using ADE20K images.
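
A rough sketch of that measurement, assuming detectron2's FlopCountAnalysis wrapper and following the config-setup pattern in this repo's demo (the config hooks and path are taken from the repository; treat the exact plumbing as an assumption):

```python
import torch
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.projects.deeplab import add_deeplab_config
from detectron2.utils.analysis import FlopCountAnalysis
from mask_former import add_mask_former_config  # project config hook, as in demo/demo.py

cfg = get_cfg()
add_deeplab_config(cfg)
add_mask_former_config(cfg)
cfg.merge_from_file("configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml")
model = build_model(cfg)
model.eval()

# detectron2 models take a list of dicts; feed one fixed-size dummy image
# instead of a ResizeShortestEdge-augmented ADE20K image.
inputs = [{"image": torch.zeros(3, 512, 512)}]
with torch.no_grad():
    flops = FlopCountAnalysis(model, inputs)
    print(flops.total() / 1e9, "GFlops")
```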

@Sunting78
Author

Yes, I tried that. When I calculate FLOPs with an image of size 256x256, it matches the FLOPs reported in your paper; 512x512 does not. Could you check it again, please?

@bowenc0221
Contributor

I'm sure the input size is 512x512; here is my output:

[11/30 11:23:08 detectron2]: Flops table computed from only one input sample:
| module                                        | #parameters or shape   | #flops     |
|:----------------------------------------------|:-----------------------|:-----------|
| model                                         | 63.075M                | 81.079G    |
|  backbone                                     |  48.839M               |  49.38G    |
|   backbone.patch_embed                        |   4.896K               |   83.362M  |
|    backbone.patch_embed.proj                  |    4.704K              |    75.497M |
|    backbone.patch_embed.norm                  |    0.192K              |    7.864M  |
|   backbone.layers                             |   48.831M              |   49.282G  |
|    backbone.layers.0                          |    0.299M              |    4.394G  |
|    backbone.layers.1                          |    1.188M              |    4.367G  |
|    backbone.layers.2                          |    33.16M              |    35.953G |
|    backbone.layers.3.blocks                   |    14.184M             |    4.567G  |
|   backbone.norm0                              |   0.192K               |   7.864M   |
|    backbone.norm0.weight                      |    (96,)               |            |
|    backbone.norm0.bias                        |    (96,)               |            |
|   backbone.norm1                              |   0.384K               |   3.932M   |
|    backbone.norm1.weight                      |    (192,)              |            |
|    backbone.norm1.bias                        |    (192,)              |            |
|   backbone.norm2                              |   0.768K               |   1.966M   |
|    backbone.norm2.weight                      |    (384,)              |            |
|    backbone.norm2.bias                        |    (384,)              |            |
|   backbone.norm3                              |   1.536K               |   0.983M   |
|    backbone.norm3.weight                      |    (768,)              |            |
|    backbone.norm3.bias                        |    (768,)              |            |
|  sem_seg_head                                 |  14.236M               |  27.453G   |
|   sem_seg_head.pixel_decoder                  |   4.305M               |   23.56G   |
|    sem_seg_head.pixel_decoder.adapter_1       |    25.088K             |    0.424G  |
|    sem_seg_head.pixel_decoder.layer_1         |    0.59M               |    9.685G  |
|    sem_seg_head.pixel_decoder.adapter_2       |    49.664K             |    0.207G  |
|    sem_seg_head.pixel_decoder.layer_2         |    0.59M               |    2.421G  |
|    sem_seg_head.pixel_decoder.adapter_3       |    98.816K             |    0.102G  |
|    sem_seg_head.pixel_decoder.layer_3         |    0.59M               |    0.605G  |
|    sem_seg_head.pixel_decoder.layer_4         |    1.77M               |    0.453G  |
|    sem_seg_head.pixel_decoder.mask_features   |    0.59M               |    9.664G  |
|   sem_seg_head.predictor                      |   9.932M               |   3.887G   |
|    sem_seg_head.predictor.transformer.decoder |    9.473M              |    1.179G  |
|    sem_seg_head.predictor.query_embed         |    25.6K               |            |
|    sem_seg_head.predictor.input_proj          |    0.197M              |    50.332M |
|    sem_seg_head.predictor.class_embed         |    38.807K             |    23.194M |
|    sem_seg_head.predictor.mask_embed.layers   |    0.197M              |    0.118G  |
[11/30 11:23:08 detectron2]: Average GFlops for each type of operators:
[('conv', 23.630708736), ('layer_norm', 0.16134144), ('linear', 48.940320768), ('matmul', 1.413401472), ('group_norm', 0.05537792), ('upsample_nearest2d', 0.005505024), ('bmm', 0.1093632), ('einsum', 6.4485), ('upsample_bilinear2d', 0.3146752)]
[11/30 11:23:08 detectron2]: Total GFlops: 81.1±0.0

The FLOPs are 81.1G (compared to 79G in the paper); the slight increase is probably due to an update in the fvcore package.

@bowenc0221
Contributor

I have committed the script for calculating FLOPs.

Please use the following command:

`python tools/analyze_model.py --num-inputs 1 --tasks flop --use-fixed-input-size --config-file configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml MODEL.WEIGHTS ""`
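
Presumably, --use-fixed-input-size makes the script count FLOPs on a dummy input at the training crop size (512x512 for this config) instead of on ResizeShortestEdge-augmented validation images, matching the dummy-input sketch above; that would explain why it reproduces the paper's numbers where the default augmentation does not.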
