
[fx] add profiler for fx nodes. #1480

Merged 43 commits into hpcaitech:main on Aug 24, 2022

Conversation

@super-dainiu (Contributor) commented Aug 23, 2022

What's new?

After patching all possible ops, we can now profile the memory cost and FLOPs with a few lines of code. Only the original torch.nn.functional and torch.nn operators are supported for now, but it is not too challenging to profile your own model using MetaInfoProp.

import torch
from colossalai.fx.profiler import profile_function, profile_module


input = torch.rand(100, 100, 100, 100, device='meta')
func = torch.nn.functional.relu
output, profile = profile_function(func)(input, inplace=False)
print(f"Profiling function {func},")
print(f"Param size: {profile.param / 1024**2:.3f} MB, Activation size: {profile.activation / 1024**2:.3f} MB, {profile.flops} FLOPs, {profile.macs} MACs")

output, profile = profile_function(func)(input, inplace=True)
print(f"Profiling function {func},")
print(f"Param size: {profile.param / 1024**2:.3f} MB, Activation size: {profile.activation / 1024**2:.3f} MB, {profile.flops} FLOPs, {profile.macs} MACs")

input = torch.rand(4, 3, 224, 224, device='meta')
mod = torch.nn.Conv2d(3, 128, 3)
output, profile = profile_module(mod)(input)
print(f"Profiling module {mod},")
print(f"Param size: {profile.param / 1024**2:.3f} MB, Activation size: {profile.activation / 1024**2:.3f} MB, {profile.flops} FLOPs, {profile.macs} MACs")

===============================================================================
Result:
Profiling function <function relu at 0x7f3b6f8ead30>,
Param size: 0.000 MB, Activation size: 381.470 MB, 100000000 FLOPs, 0 MACs
Profiling function <function relu at 0x7f3b6f8ead30>,
Param size: 0.000 MB, Activation size: 0.000 MB, 100000000 FLOPs, 0 MACs
Profiling module Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1)),
Param size: 0.014 MB, Activation size: 96.258 MB, 1387837440 FLOPs, 681302016 MACs
===============================================================================
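As a quick sanity check (not part of the PR), the Conv2d and ReLU numbers above can be reproduced analytically. For a stride-1, unpadded 3x3 convolution, MACs = N * C_out * H_out * W_out * (C_in * kH * kW), and the FLOP count here assumes 2 ops per MAC plus one add per output element for the bias:

```python
# Analytic reproduction of the profiler output above.
N, C_in, H, W = 4, 3, 224, 224       # input shape from the example
C_out, k = 128, 3                    # Conv2d(3, 128, 3)
H_out = W_out = H - k + 1            # stride 1, no padding -> 222

out_elems = N * C_out * H_out * W_out
macs = out_elems * C_in * k * k      # one MAC per (input-channel x kernel) tap
flops = 2 * macs + out_elems         # multiply+add per MAC, plus bias add

print(macs, flops)                   # 681302016 1387837440

# ReLU activation size: 100**4 float32 elements, 4 bytes each.
act_mb = 100**4 * 4 / 1024**2
print(f"{act_mb:.3f} MB")            # 381.470 MB
```

Both values match the profiled 681302016 MACs and 1387837440 FLOPs, and the 381.470 MB activation for out-of-place ReLU.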

Also, using MetaInfoProp we can trace the model entirely on device='meta' and obtain all the required results without allocating any real memory.

from typing import Tuple
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as tm
from torch.fx import symbolic_trace
import torch.fx
from torch.optim import Adam
from torch.nn import CrossEntropyLoss
from colossalai.fx.passes.meta_info_prop import MetaInfoProp


def _forward_mem(gm: torch.fx.GraphModule):
    # Sum the meta-propagated per-node sizes (bytes); return (total, params) in MB.
    node_size = 0
    param_size = 0
    for node in gm.graph.nodes:
        node_size += getattr(node, '__param__', 0) + getattr(node, '__activation__', 0)
        param_size += getattr(node, '__param__', 0)
    return node_size / 1024**2, param_size / 1024**2


def _forward_flops(gm: torch.fx.GraphModule):
    # Sum the meta-propagated per-node counts; return (GFLOPs, GMACs).
    flops = 0
    macs = 0
    for node in gm.graph.nodes:
        flops += getattr(node, '__flops__', 0)
        macs += getattr(node, '__macs__', 0)
    return flops / 1e9, macs / 1e9


def data_gen(batch_size: int, shape: Tuple[int, int, int], device='cuda'):
    data = torch.rand(batch_size, *shape, device=device)
    label = torch.empty(batch_size, dtype=torch.long, device=device).random_(1000)
    return data, label


def test_forward(gm: torch.fx.GraphModule, num_steps: int=5):
    def get_gpu_mem():
        result = torch.cuda.max_memory_allocated() / 1024**2
        torch.cuda.reset_peak_memory_stats()
        return result

    get_gpu_mem()   # reset
    forward_mem = -get_gpu_mem()
    param_mem = -get_gpu_mem()
    gm.train()
    gm.cuda()
    param_mem += get_gpu_mem()
    criterion = CrossEntropyLoss()
    optimizer = Adam(gm.parameters(), lr=1e-3)
    for n in range(num_steps):
        data, label = data_gen(1, (3, 224, 224))
        output = gm(data)
        optimizer.zero_grad()
        loss = criterion(output, label)
        forward_mem += get_gpu_mem() / num_steps
        loss.backward()
        optimizer.step()
    return forward_mem, param_mem

        
def test_meta_info_prop():
    for M in [tm.densenet121, tm.densenet161, tm.densenet169, tm.densenet201]:
        model = M()
        data = torch.rand(1, 3, 224, 224, device='meta')
        gm = symbolic_trace(model)
        MetaInfoProp(gm).run(data)
        meta_forward_mem, meta_param_mem = _forward_mem(gm)
        flops, macs = _forward_flops(gm)
        concrete_forward_mem, concrete_param_mem = test_forward(gm, num_steps=1)

        print(f'|{M}|{meta_forward_mem:.3f} MB|{meta_param_mem:.3f} MB|{concrete_forward_mem:.3f} MB|{concrete_param_mem:.3f} MB|{flops:.3f}GFLOPs|{macs:.3f}GMACs|')
    
        
if __name__ == '__main__':
    test_meta_info_prop()

===============================================================================
Result:
|<function densenet121 at 0x7f99d58f7b80>|158.786 MB|30.437 MB|156.183 MB|30.859 MB|5.717GFLOPs|2.834GMACs|
|<function densenet161 at 0x7f99d58f7d30>|347.533 MB|109.409 MB|349.309 MB|112.571 MB|15.546GFLOPs|7.728GMACs|
|<function densenet169 at 0x7f99d58f7ee0>|208.338 MB|53.976 MB|209.491 MB|54.724 MB|6.778GFLOPs|3.360GMACs|
|<function densenet201 at 0x7f99d58ff0d0>|274.686 MB|76.347 MB|277.507 MB|77.392 MB|8.659GFLOPs|4.291GMACs|
===============================================================================
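The param-memory column can be cross-checked by arithmetic alone. Assuming torchvision's densenet121 has 7,978,856 parameters (the commonly reported count; an assumption here, not taken from the PR), at 4 bytes per fp32 parameter:

```python
# Rough cross-check of the densenet121 param-memory column.
# Assumption: torchvision densenet121 has 7,978,856 parameters (fp32, 4 bytes each).
n_params = 7_978_856
param_mb = n_params * 4 / 1024**2
print(f"{param_mb:.3f} MB")   # ~30.437 MB, matching the meta-propagated value
```

This agrees with the 30.437 MB reported by MetaInfoProp; the small gap to the concrete 30.859 MB comes from CUDA allocator granularity and buffers.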

super-dainiu and others added 30 commits August 9, 2022 23:23
* [fx] activation checkpointing using Chen strategies.

* [fx] add test for ckpt_solver_chen

* [fx] add vanilla activation checkpoint search with test on resnet and densenet

* [fx] add a namespace code for solver_chen.

* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.

* [fx] fix lowercase naming conventions.

* [fx] simplify test for ckpt.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] merge development into main (#1)

* [fx] activation checkpointing using Chen strategies.

* [fx] add test for ckpt_solver_chen

* [fx] add vanilla activation checkpoint search with test on resnet and densenet

* [fx] add a namespace code for solver_chen.

* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.

* [fx] fix lowercase naming conventions.

* [fx] simplify test for ckpt.

* [fx] fix test and algorithm bugs in activation checkpointing.

* [fx] polish ckpt_test.

* [fx] add rules to linearize computation graphs for searching.
@Cypher30 (Contributor) left a comment:

🙌GREAT🙌

@FrankLeeeee (Contributor) commented:

Great work!

@super-dainiu (Author) commented:

[screenshot: local test results]
I passed all tests/test_fx tests locally on an A100.

@FrankLeeeee FrankLeeeee merged commit 32efe8e into hpcaitech:main Aug 24, 2022