[fx] provide an accurate estimation of memory. #1587

Merged
FrankLeeeee merged 18 commits into hpcaitech:main from feature/better_flop_tensor on Sep 14, 2022

Conversation

super-dainiu
Contributor

@super-dainiu super-dainiu commented Sep 12, 2022

Improvements

After hacking the autograd graph and saved_tensors_hooks, I managed to compute accurate memory estimates for torchvision.models. The results can be compared with those in #1547.
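For readers unfamiliar with the hook mentioned above, here is a minimal sketch of how `torch.autograd.graph.saved_tensors_hooks` can be used to total the bytes autograd keeps for the backward pass. It is not the implementation in this PR: the helper name `saved_activation_bytes` is hypothetical, and de-duplicating by `data_ptr()` is a simplifying assumption that treats views of the same buffer as one tensor.

```python
import torch
import torch.nn as nn


def saved_activation_bytes(model: nn.Module, x: torch.Tensor) -> int:
    """Total bytes of tensors saved for backward, excluding parameters.

    Hypothetical helper for illustration only; not the PR's code.
    """
    # Parameters are reported separately, so skip them here.
    param_ptrs = {p.data_ptr() for p in model.parameters()}
    seen_ptrs = set()
    total = 0

    def pack(t: torch.Tensor) -> torch.Tensor:
        # Called once for every tensor autograd saves during the forward pass.
        nonlocal total
        ptr = t.data_ptr()
        if ptr not in param_ptrs and ptr not in seen_ptrs:
            seen_ptrs.add(ptr)
            total += t.numel() * t.element_size()
        return t

    def unpack(t: torch.Tensor) -> torch.Tensor:
        # Nothing was transformed in pack, so return the tensor unchanged.
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        out = model(x)  # keep the output alive so the graph is not freed early
    del out
    return total


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4)
print(f"saved for backward: {saved_activation_bytes(model, x)} bytes")
```

A count like this only covers tensors saved for backward; it does not capture temporary buffers or allocator overhead, which is one reason an estimate and a measured value can differ.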

| model | estimated_fwd_mem (MB) | estimated_param_mem (MB) | real_fwd_mem (MB) | real_param_mem (MB) | forward_flops (GFLOPs) | backward_flops (GFLOPs) |
| --- | --- | --- | --- | --- | --- | --- |
| resnet18 | 128.068 | 44.592 | 141.258 | 44.690 | 7.312 | 14.566 |
| resnet34 | 204.909 | 83.152 | 219.900 | 85.538 | 14.739 | 29.388 |
| resnet50 | 442.070 | 97.492 | 440.048 | 99.353 | 16.605 | 32.939 |
| resnet101 | 670.707 | 169.942 | 668.108 | 173.151 | 31.570 | 62.739 |
| resnet152 | 923.320 | 229.617 | 923.410 | 234.590 | 46.562 | 92.564 |
| convnext_tiny | 582.937 | 109.059 | 539.976 | 109.942 | 17.975 | 137.358 |
| convnext_small | 934.254 | 191.588 | 885.770 | 192.682 | 34.974 | 272.979 |
| convnext_base | 1327.399 | 337.950 | 1242.999 | 339.684 | 61.738 | 484.819 |
| convnext_large | 2237.441 | 754.423 | 2097.922 | 757.913 | 137.925 | 1089.767 |
| densenet121 | 520.851 | 30.437 | 537.939 | 30.859 | 11.656 | 22.993 |
| densenet161 | 1024.691 | 109.409 | 1053.578 | 111.325 | 31.510 | 62.421 |
| densenet169 | 646.706 | 53.976 | 665.817 | 54.724 | 13.826 | 27.266 |
| densenet201 | 843.069 | 76.347 | 868.206 | 77.392 | 17.666 | 34.832 |
| wide_resnet50_2 | 707.643 | 262.769 | 735.279 | 265.056 | 45.906 | 91.476 |
| wide_resnet101_2 | 1137.159 | 484.034 | 1184.539 | 501.862 | 91.476 | 182.453 |
| regnet_x_16gf | 991.103 | 207.056 | 994.317 | 209.767 | 64.299 | 238.787 |
| mnasnet0_5 | 103.387 | 8.463 | 109.946 | 8.631 | 0.480 | 12.054 |
| efficientnet_b0 | 297.973 | 20.174 | 364.534 | 20.676 | 1.689 | 63.045 |
| mobilenet_v2 | 218.564 | 13.370 | 320.066 | 13.578 | 1.338 | 23.533 |
| mobilenet_v3_small | 63.579 | 9.700 | 72.194 | 9.805 | 0.256 | 7.634 |
| mobilenet_v3_large | 175.311 | 20.916 | 193.134 | 21.086 | 0.959 | 22.808 |
| shufflenet_v2_x0_5 | 41.181 | 5.214 | 47.624 | 5.341 | 0.184 | 0.647 |
| shufflenet_v2_x1_0 | 74.148 | 8.692 | 89.263 | 8.828 | 0.620 | 2.908 |
| shufflenet_v2_x1_5 | 104.840 | 13.365 | 127.643 | 13.540 | 1.240 | 6.348 |
| shufflenet_v2_x2_0 | 149.951 | 28.206 | 184.915 | 28.395 | 2.409 | 12.286 |
| resnext50_32x4d | 515.852 | 95.478 | 539.308 | 96.390 | 17.236 | 62.806 |
| resnext101_32x8d | 1272.055 | 338.712 | 1318.516 | 351.757 | 66.320 | 368.469 |
| resnext101_64x4d | 1251.700 | 318.357 | 1280.926 | 319.183 | 62.505 | 364.654 |
| vit_b_16 | 918.792 | 330.229 | 869.416 | 330.229 | 70.436 | 90.311 |
| vit_b_32 | 471.514 | 336.549 | 460.042 | 337.311 | 17.678 | 23.616 |
| vit_h_14 | 5825.672 | 2411.063 | 5597.718 | 2480.049 | 670.212 | 864.708 |
| vit_l_16 | 2723.466 | 1160.914 | 2619.091 | 1165.363 | 246.692 | 318.905 |
| vit_l_32 | 1524.597 | 1169.340 | 1503.321 | 1174.183 | 61.619 | 81.867 |
| gpt2_medium | 56396.074 | 1353.543 | 55721.340 | 1377.555 | 3321.385 | 6634.189 |
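
To make the gap between estimated and measured forward memory easier to read, the sketch below computes the relative error for a few rows copied from the results above. The function name `relative_error` and the choice of rows are assumptions made for illustration; the PR does not define this metric.

```python
# (estimated_fwd_mem, real_fwd_mem) in MB, copied from the results above.
SAMPLES = {
    "resnet18": (128.068, 141.258),
    "resnet152": (923.320, 923.410),
    "mobilenet_v2": (218.564, 320.066),
    "vit_b_16": (918.792, 869.416),
}


def relative_error(estimated: float, real: float) -> float:
    """Signed relative error; negative means the estimate is too low."""
    return (estimated - real) / real


for name, (estimated, real) in SAMPLES.items():
    print(f"{name:>14}: {relative_error(estimated, real):+.2%}")
```

A negative value means the estimator undershoots the measured figure, and a positive value means it overshoots.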

Test

image

@FrankLeeeee
Contributor

Good work. Some overall comments:

  1. Fix the bug reported in CI.
  2. Use a better data structure for your metadata. Never return a dictionary whose keys are fixed; convert it to a dataclass instead.
  3. Do not return an empty list or dict; return None instead.
  4. Fix all incorrect type hints.
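
Review points 2 to 4 can be illustrated with a small sketch. The names `MemoryEstimate` and `summarize` are hypothetical and do not come from the PR; the point is only to show a dataclass replacing a fixed-key dictionary, `None` replacing an empty container, and type hints that match what is actually returned.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MemoryEstimate:
    """Hypothetical replacement for a dict with fixed keys."""
    fwd_mem_mb: float
    param_mem_mb: float


def summarize(node_sizes_mb: Sequence[Tuple[float, float]]) -> Optional[MemoryEstimate]:
    """Sum per-node (forward, parameter) sizes in MB.

    Returns None when there is nothing to summarize, rather than an
    empty container, and the Optional return type says so explicitly.
    """
    if not node_sizes_mb:
        return None
    fwd = sum(fwd_mb for fwd_mb, _ in node_sizes_mb)
    param = sum(param_mb for _, param_mb in node_sizes_mb)
    return MemoryEstimate(fwd_mem_mb=fwd, param_mem_mb=param)


result = summarize([(1.0, 2.0), (3.0, 4.0)])
if result is not None:
    print(result.fwd_mem_mb, result.param_mem_mb)
```

With a dataclass, a misspelled field name fails at attribute access or in a type checker instead of silently producing a missing dictionary key.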

@super-dainiu super-dainiu changed the title from "[fx] provide an accurate estimation of memory except for GPT-2." to "[fx] provide an accurate estimation of memory." on Sep 13, 2022
@FrankLeeeee FrankLeeeee merged commit 5c494d4 into hpcaitech:main Sep 14, 2022
@super-dainiu super-dainiu deleted the feature/better_flop_tensor branch September 23, 2022 06:04