[fx] add rules to linearize computation graphs for searching. #1461

Merged
merged 22 commits into from
Aug 17, 2022

Conversation

Contributor

@super-dainiu super-dainiu commented Aug 16, 2022

What's new?

Since most existing activation checkpointing frameworks assume the forward graph is linearized, I developed a small utility function that detects all potential checkpoint nodes. This way, chen_greedy() operates on a linearized graph and will not place checkpoints inside skip-connection blocks. I also removed chen_sqrtn() because it is a bit outdated.
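
To make the rule concrete, here is a minimal, illustrative sketch of what "linearizing" means for an fx graph: sweep the topologically ordered node list and only cut a segment boundary where a single activation is still needed by later nodes, so a skip connection can never be split across two segments. The linearize() helper below and its return format are assumptions for illustration only, not the code added in this PR.

import torch
from torch.fx import symbolic_trace
from torchvision.models import resnet18

def linearize(gm: torch.fx.GraphModule):
    # Illustrative sketch: split the node list into segments whose
    # boundaries are crossed by exactly one live activation.
    nodes = [n for n in gm.graph.nodes if n.op not in ('placeholder', 'output')]
    segments, current, visited = [], [], set()
    for node in nodes:
        current.append(node)
        visited.add(node)
        # activations produced so far that some unvisited node still consumes
        live = [n for n in visited if any(u not in visited for u in n.users)]
        # safe cut point: only one value flows across this boundary,
        # i.e. we are not inside a residual / skip-connection block
        if len(live) <= 1:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

gm = symbolic_trace(resnet18())
print([len(seg) for seg in linearize(gm)])  # segment sizes along the linearized graph
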
Hopefully, together with the new checkpoint function in Colossal-AI (#1460), we can now run linearized searches on arbitrary computation graphs, such as resnet18(). The generated forward looks like this:

def forward(self, x : torch.Tensor) -> torch.Tensor:
    import colossalai
    conv1 = self.conv1(x);  x = None
    def checkpoint_0(conv1):
        bn1 = self.bn1(conv1);  conv1 = None
        relu = self.relu(bn1);  bn1 = None
        maxpool = self.maxpool(relu);  relu = None
        layer1_0_conv1 = getattr(self.layer1, "0").conv1(maxpool)
        layer1_0_bn1 = getattr(self.layer1, "0").bn1(layer1_0_conv1);  layer1_0_conv1 = None
        layer1_0_relu = getattr(self.layer1, "0").relu(layer1_0_bn1);  layer1_0_bn1 = None
        layer1_0_conv2 = getattr(self.layer1, "0").conv2(layer1_0_relu);  layer1_0_relu = None
        layer1_0_bn2 = getattr(self.layer1, "0").bn2(layer1_0_conv2);  layer1_0_conv2 = None
        add = layer1_0_bn2 + maxpool;  layer1_0_bn2 = maxpool = None
        return add
    add = colossalai.utils.activation_checkpoint.checkpoint(checkpoint_0, False, conv1, use_reentrant=False)
    def checkpoint_1(add):
        layer1_0_relu_1 = getattr(self.layer1, "0").relu(add);  add = None
        layer1_1_conv1 = getattr(self.layer1, "1").conv1(layer1_0_relu_1)
        layer1_1_bn1 = getattr(self.layer1, "1").bn1(layer1_1_conv1);  layer1_1_conv1 = None
        layer1_1_relu = getattr(self.layer1, "1").relu(layer1_1_bn1);  layer1_1_bn1 = None
        layer1_1_conv2 = getattr(self.layer1, "1").conv2(layer1_1_relu);  layer1_1_relu = None
        layer1_1_bn2 = getattr(self.layer1, "1").bn2(layer1_1_conv2);  layer1_1_conv2 = None
        add_1 = layer1_1_bn2 + layer1_0_relu_1;  layer1_1_bn2 = layer1_0_relu_1 = None
        layer1_1_relu_1 = getattr(self.layer1, "1").relu(add_1);  add_1 = None
        layer2_0_conv1 = getattr(self.layer2, "0").conv1(layer1_1_relu_1)
        layer2_0_bn1 = getattr(self.layer2, "0").bn1(layer2_0_conv1);  layer2_0_conv1 = None
        layer2_0_relu = getattr(self.layer2, "0").relu(layer2_0_bn1);  layer2_0_bn1 = None
        layer2_0_conv2 = getattr(self.layer2, "0").conv2(layer2_0_relu);  layer2_0_relu = None
        layer2_0_bn2 = getattr(self.layer2, "0").bn2(layer2_0_conv2);  layer2_0_conv2 = None
        layer2_0_downsample_0 = getattr(getattr(self.layer2, "0").downsample, "0")(layer1_1_relu_1);  layer1_1_relu_1 = None
        layer2_0_downsample_1 = getattr(getattr(self.layer2, "0").downsample, "1")(layer2_0_downsample_0);  layer2_0_downsample_0 = None
        add_2 = layer2_0_bn2 + layer2_0_downsample_1;  layer2_0_bn2 = layer2_0_downsample_1 = None
        layer2_0_relu_1 = getattr(self.layer2, "0").relu(add_2);  add_2 = None
        layer2_1_conv1 = getattr(self.layer2, "1").conv1(layer2_0_relu_1)
        layer2_1_bn1 = getattr(self.layer2, "1").bn1(layer2_1_conv1);  layer2_1_conv1 = None
        layer2_1_relu = getattr(self.layer2, "1").relu(layer2_1_bn1);  layer2_1_bn1 = None
        layer2_1_conv2 = getattr(self.layer2, "1").conv2(layer2_1_relu);  layer2_1_relu = None
        layer2_1_bn2 = getattr(self.layer2, "1").bn2(layer2_1_conv2);  layer2_1_conv2 = None
        add_3 = layer2_1_bn2 + layer2_0_relu_1;  layer2_1_bn2 = layer2_0_relu_1 = None
        return add_3
    add_3 = colossalai.utils.activation_checkpoint.checkpoint(checkpoint_1, False, add, use_reentrant=False)
    layer2_1_relu_1 = getattr(self.layer2, "1").relu(add_3);  add_3 = None
    layer3_0_conv1 = getattr(self.layer3, "0").conv1(layer2_1_relu_1)
    layer3_0_bn1 = getattr(self.layer3, "0").bn1(layer3_0_conv1);  layer3_0_conv1 = None
    layer3_0_relu = getattr(self.layer3, "0").relu(layer3_0_bn1);  layer3_0_bn1 = None
    layer3_0_conv2 = getattr(self.layer3, "0").conv2(layer3_0_relu);  layer3_0_relu = None
    layer3_0_bn2 = getattr(self.layer3, "0").bn2(layer3_0_conv2);  layer3_0_conv2 = None
    layer3_0_downsample_0 = getattr(getattr(self.layer3, "0").downsample, "0")(layer2_1_relu_1);  layer2_1_relu_1 = None
    layer3_0_downsample_1 = getattr(getattr(self.layer3, "0").downsample, "1")(layer3_0_downsample_0);  layer3_0_downsample_0 = None
    add_4 = layer3_0_bn2 + layer3_0_downsample_1;  layer3_0_bn2 = layer3_0_downsample_1 = None
    layer3_0_relu_1 = getattr(self.layer3, "0").relu(add_4);  add_4 = None
    layer3_1_conv1 = getattr(self.layer3, "1").conv1(layer3_0_relu_1)
    layer3_1_bn1 = getattr(self.layer3, "1").bn1(layer3_1_conv1);  layer3_1_conv1 = None
    layer3_1_relu = getattr(self.layer3, "1").relu(layer3_1_bn1);  layer3_1_bn1 = None
    layer3_1_conv2 = getattr(self.layer3, "1").conv2(layer3_1_relu);  layer3_1_relu = None
    layer3_1_bn2 = getattr(self.layer3, "1").bn2(layer3_1_conv2);  layer3_1_conv2 = None
    add_5 = layer3_1_bn2 + layer3_0_relu_1;  layer3_1_bn2 = layer3_0_relu_1 = None
    layer3_1_relu_1 = getattr(self.layer3, "1").relu(add_5);  add_5 = None
    layer4_0_conv1 = getattr(self.layer4, "0").conv1(layer3_1_relu_1)
    layer4_0_bn1 = getattr(self.layer4, "0").bn1(layer4_0_conv1);  layer4_0_conv1 = None
    layer4_0_relu = getattr(self.layer4, "0").relu(layer4_0_bn1);  layer4_0_bn1 = None
    layer4_0_conv2 = getattr(self.layer4, "0").conv2(layer4_0_relu);  layer4_0_relu = None
    layer4_0_bn2 = getattr(self.layer4, "0").bn2(layer4_0_conv2);  layer4_0_conv2 = None
    layer4_0_downsample_0 = getattr(getattr(self.layer4, "0").downsample, "0")(layer3_1_relu_1);  layer3_1_relu_1 = None
    layer4_0_downsample_1 = getattr(getattr(self.layer4, "0").downsample, "1")(layer4_0_downsample_0);  layer4_0_downsample_0 = None
    add_6 = layer4_0_bn2 + layer4_0_downsample_1;  layer4_0_bn2 = layer4_0_downsample_1 = None
    layer4_0_relu_1 = getattr(self.layer4, "0").relu(add_6);  add_6 = None
    layer4_1_conv1 = getattr(self.layer4, "1").conv1(layer4_0_relu_1)
    layer4_1_bn1 = getattr(self.layer4, "1").bn1(layer4_1_conv1);  layer4_1_conv1 = None
    layer4_1_relu = getattr(self.layer4, "1").relu(layer4_1_bn1);  layer4_1_bn1 = None
    layer4_1_conv2 = getattr(self.layer4, "1").conv2(layer4_1_relu);  layer4_1_relu = None
    layer4_1_bn2 = getattr(self.layer4, "1").bn2(layer4_1_conv2);  layer4_1_conv2 = None
    add_7 = layer4_1_bn2 + layer4_0_relu_1;  layer4_1_bn2 = layer4_0_relu_1 = None
    layer4_1_relu_1 = getattr(self.layer4, "1").relu(add_7);  add_7 = None
    avgpool = self.avgpool(layer4_1_relu_1);  layer4_1_relu_1 = None
    flatten = torch.flatten(avgpool, 1);  avgpool = None
    fc = self.fc(flatten);  flatten = None
    return fc
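
For context, here is a rough end-to-end sketch of how a checkpointed forward like the one above could be produced. The import paths and call signatures (ColoTracer, MetaInfoProp, chen_greedy and its budget argument) are assumptions about the Colossal-AI fx API, not verbatim from this PR, and may not match the repository exactly:

import torch
from torchvision.models import resnet18

# NOTE: the imports and signatures below are assumptions; check colossalai.fx
# for the actual module layout and solver interface.
from colossalai.fx import ColoTracer
from colossalai.fx.passes.meta_info_prop import MetaInfoProp
from colossalai.fx.passes.algorithms import chen_greedy

model = resnet18()
data = torch.randn(2, 3, 224, 224, device='meta')

# trace the model into an fx graph using meta tensors (no real memory is allocated)
graph = ColoTracer().trace(model, meta_args={'x': data})
gm = torch.fx.GraphModule(model, graph, model.__class__.__name__)

# annotate every node with its activation size so the solver can reason about memory
MetaInfoProp(gm).run(data)

# run the greedy search on the linearized graph and regenerate the forward
gm = chen_greedy(gm, B=2 * 1024 ** 3)  # hypothetical 2 GiB activation budget
gm.recompile()
print(gm.code)  # prints a forward with checkpoint_0, checkpoint_1, ... segments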

super-dainiu and others added 16 commits August 9, 2022 23:23
* [fx] activation checkpointing using Chen strategies.

* [fx] add test for ckpt_solver_chen

* [fx] add vanilla activation checkpoint search with test on resnet and densenet

* [fx] add a namespace code for solver_chen.

* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.

* [fx] fix lowercase naming conventions.

* [fx] simplify test for ckpt.
* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages

* [fx] merge development into main (#1)

* [fx] activation checkpointing using Chen strategies.

* [fx] add test for ckpt_solver_chen

* [fx] add vanilla activation checkpoint search with test on resnet and densenet

* [fx] add a namespace code for solver_chen.

* [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.

* [fx] fix lowercase naming conventions.

* [fx] simplify test for ckpt.

* [fx] fix test and algorithm bugs in activation checkpointing.

* [fx] polish ckpt_test.

* [fx] add rules to linearize computation graphs for searching.

Contributor

@Cypher30 Cypher30 left a comment

Okay, I have looked through all the changes and currently I don't spot any mistakes. I will modify the codegen to check whether we need to use use_reentrant=False when calling checkpoint functions.
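
For reference, the behavior in question mirrors PyTorch's non-reentrant checkpointing: with use_reentrant=False the segment is recomputed under the ordinary autograd engine during backward instead of through a re-entrant custom autograd function. A minimal, self-contained illustration using torch.utils.checkpoint (the segment function below is a made-up stand-in, not code from this PR):

import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(16, 16)

def segment(x):
    # stand-in for one of the generated checkpoint_* segments
    return torch.relu(linear(x))

x = torch.randn(4, 16, requires_grad=True)

# non-reentrant checkpointing: activations inside `segment` are not stored
# and are recomputed during backward under the autograd engine
y = checkpoint(segment, x, use_reentrant=False)
y.sum().backward()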

Contributor

@Cypher30 Cypher30 left a comment

Just hold the PR~

@super-dainiu
Contributor Author

Hey, can anyone review this?

Contributor

@Cypher30 Cypher30 left a comment

Okay, let's check if the CI can pass.

@super-dainiu super-dainiu merged commit e7383f5 into hpcaitech:main Aug 17, 2022