[WIP/NO_MERGE] Prototype RegularizedShortcut #4549

datumbox · 2021-10-06T17:14:32Z

This is an early prototype utility based on FX.

The goal is to detect residual connections in arbitrary model architectures and modify the network to add regularization blocks (such as StochasticDepth).

Example usage:

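from functools import partial

from torch import fx
from torchvision.models import resnet18
from torchvision.models.resnet import BasicBlock
from torchvision.ops import StochasticDepth

# add_regularized_shortcut and del_regularized_shortcut are the utilities introduced by this PR

# Before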
model = resnet18()
fx.symbolic_trace(model).graph.print_tabular()

# After addition
regularizer_layer = partial(StochasticDepth, p=0.0, mode="row")
model = add_regularized_shortcut(model, BasicBlock, regularizer_layer)
fx.symbolic_trace(model).graph.print_tabular()

# After deletion
model = del_regularized_shortcut(model)
fx.symbolic_trace(model).graph.print_tabular()

Output:

Before

opcode         name                   target                                                   args                                   kwargs
-------------  ---------------------  -------------------------------------------------------  -------------------------------------  --------
placeholder    x                      x                                                        ()                                     {}
call_module    conv1                  conv1                                                    (x,)                                   {}
call_module    bn1                    bn1                                                      (conv1,)                               {}
call_module    relu                   relu                                                     (bn1,)                                 {}
call_module    maxpool                maxpool                                                  (relu,)                                {}
call_module    layer1_0_conv1         layer1.0.conv1                                           (maxpool,)                             {}
call_module    layer1_0_bn1           layer1.0.bn1                                             (layer1_0_conv1,)                      {}
call_module    layer1_0_relu          layer1.0.relu                                            (layer1_0_bn1,)                        {}
call_module    layer1_0_conv2         layer1.0.conv2                                           (layer1_0_relu,)                       {}
call_module    layer1_0_bn2           layer1.0.bn2                                             (layer1_0_conv2,)                      {}
call_function  add                    <built-in function add>                                  (layer1_0_bn2, maxpool)                {}
call_module    layer1_0_relu_1        layer1.0.relu                                            (add,)                                 {}
call_module    layer1_1_conv1         layer1.1.conv1                                           (layer1_0_relu_1,)                     {}
call_module    layer1_1_bn1           layer1.1.bn1                                             (layer1_1_conv1,)                      {}
call_module    layer1_1_relu          layer1.1.relu                                            (layer1_1_bn1,)                        {}
call_module    layer1_1_conv2         layer1.1.conv2                                           (layer1_1_relu,)                       {}
call_module    layer1_1_bn2           layer1.1.bn2                                             (layer1_1_conv2,)                      {}
call_function  add_1                  <built-in function add>                                  (layer1_1_bn2, layer1_0_relu_1)        {}
call_module    layer1_1_relu_1        layer1.1.relu                                            (add_1,)                               {}
call_module    layer2_0_conv1         layer2.0.conv1                                           (layer1_1_relu_1,)                     {}
call_module    layer2_0_bn1           layer2.0.bn1                                             (layer2_0_conv1,)                      {}
call_module    layer2_0_relu          layer2.0.relu                                            (layer2_0_bn1,)                        {}
call_module    layer2_0_conv2         layer2.0.conv2                                           (layer2_0_relu,)                       {}
call_module    layer2_0_bn2           layer2.0.bn2                                             (layer2_0_conv2,)                      {}
call_module    layer2_0_downsample_0  layer2.0.downsample.0                                    (layer1_1_relu_1,)                     {}
call_module    layer2_0_downsample_1  layer2.0.downsample.1                                    (layer2_0_downsample_0,)               {}
call_function  add_2                  <built-in function add>                                  (layer2_0_bn2, layer2_0_downsample_1)  {}
call_module    layer2_0_relu_1        layer2.0.relu                                            (add_2,)                               {}
call_module    layer2_1_conv1         layer2.1.conv1                                           (layer2_0_relu_1,)                     {}
call_module    layer2_1_bn1           layer2.1.bn1                                             (layer2_1_conv1,)                      {}
call_module    layer2_1_relu          layer2.1.relu                                            (layer2_1_bn1,)                        {}
call_module    layer2_1_conv2         layer2.1.conv2                                           (layer2_1_relu,)                       {}
call_module    layer2_1_bn2           layer2.1.bn2                                             (layer2_1_conv2,)                      {}
call_function  add_3                  <built-in function add>                                  (layer2_1_bn2, layer2_0_relu_1)        {}
call_module    layer2_1_relu_1        layer2.1.relu                                            (add_3,)                               {}
call_module    layer3_0_conv1         layer3.0.conv1                                           (layer2_1_relu_1,)                     {}
call_module    layer3_0_bn1           layer3.0.bn1                                             (layer3_0_conv1,)                      {}
call_module    layer3_0_relu          layer3.0.relu                                            (layer3_0_bn1,)                        {}
call_module    layer3_0_conv2         layer3.0.conv2                                           (layer3_0_relu,)                       {}
call_module    layer3_0_bn2           layer3.0.bn2                                             (layer3_0_conv2,)                      {}
call_module    layer3_0_downsample_0  layer3.0.downsample.0                                    (layer2_1_relu_1,)                     {}
call_module    layer3_0_downsample_1  layer3.0.downsample.1                                    (layer3_0_downsample_0,)               {}
call_function  add_4                  <built-in function add>                                  (layer3_0_bn2, layer3_0_downsample_1)  {}
call_module    layer3_0_relu_1        layer3.0.relu                                            (add_4,)                               {}
call_module    layer3_1_conv1         layer3.1.conv1                                           (layer3_0_relu_1,)                     {}
call_module    layer3_1_bn1           layer3.1.bn1                                             (layer3_1_conv1,)                      {}
call_module    layer3_1_relu          layer3.1.relu                                            (layer3_1_bn1,)                        {}
call_module    layer3_1_conv2         layer3.1.conv2                                           (layer3_1_relu,)                       {}
call_module    layer3_1_bn2           layer3.1.bn2                                             (layer3_1_conv2,)                      {}
call_function  add_5                  <built-in function add>                                  (layer3_1_bn2, layer3_0_relu_1)        {}
call_module    layer3_1_relu_1        layer3.1.relu                                            (add_5,)                               {}
call_module    layer4_0_conv1         layer4.0.conv1                                           (layer3_1_relu_1,)                     {}
call_module    layer4_0_bn1           layer4.0.bn1                                             (layer4_0_conv1,)                      {}
call_module    layer4_0_relu          layer4.0.relu                                            (layer4_0_bn1,)                        {}
call_module    layer4_0_conv2         layer4.0.conv2                                           (layer4_0_relu,)                       {}
call_module    layer4_0_bn2           layer4.0.bn2                                             (layer4_0_conv2,)                      {}
call_module    layer4_0_downsample_0  layer4.0.downsample.0                                    (layer3_1_relu_1,)                     {}
call_module    layer4_0_downsample_1  layer4.0.downsample.1                                    (layer4_0_downsample_0,)               {}
call_function  add_6                  <built-in function add>                                  (layer4_0_bn2, layer4_0_downsample_1)  {}
call_module    layer4_0_relu_1        layer4.0.relu                                            (add_6,)                               {}
call_module    layer4_1_conv1         layer4.1.conv1                                           (layer4_0_relu_1,)                     {}
call_module    layer4_1_bn1           layer4.1.bn1                                             (layer4_1_conv1,)                      {}
call_module    layer4_1_relu          layer4.1.relu                                            (layer4_1_bn1,)                        {}
call_module    layer4_1_conv2         layer4.1.conv2                                           (layer4_1_relu,)                       {}
call_module    layer4_1_bn2           layer4.1.bn2                                             (layer4_1_conv2,)                      {}
call_function  add_7                  <built-in function add>                                  (layer4_1_bn2, layer4_0_relu_1)        {}
call_module    layer4_1_relu_1        layer4.1.relu                                            (add_7,)                               {}
call_module    avgpool                avgpool                                                  (layer4_1_relu_1,)                     {}
call_function  flatten                <built-in method flatten of type object at 0x112aac6c0>  (avgpool, 1)                           {}
call_module    fc                     fc                                                       (flatten,)                             {}
output         output                 output                                                   (fc,)                                  {}

After addition

opcode         name                   target                                                   args                                   kwargs
-------------  ---------------------  -------------------------------------------------------  -------------------------------------  --------
placeholder    x                      x                                                        ()                                     {}
call_module    conv1                  conv1                                                    (x,)                                   {}
call_module    bn1                    bn1                                                      (conv1,)                               {}
call_module    relu                   relu                                                     (bn1,)                                 {}
call_module    maxpool                maxpool                                                  (relu,)                                {}
call_module    layer1_0_conv1         layer1.0.conv1                                           (maxpool,)                             {}
call_module    layer1_0_bn1           layer1.0.bn1                                             (layer1_0_conv1,)                      {}
call_module    layer1_0_relu          layer1.0.relu                                            (layer1_0_bn1,)                        {}
call_module    layer1_0_conv2         layer1.0.conv2                                           (layer1_0_relu,)                       {}
call_module    layer1_0_bn2           layer1.0.bn2                                             (layer1_0_conv2,)                      {}
call_function  stochastic_depth       <function stochastic_depth at 0x7fcc18cf60d0>            (layer1_0_bn2, 0.0, 'row', True)       {}
call_function  add                    <built-in function add>                                  (maxpool, stochastic_depth)            {}
call_module    layer1_0_relu_1        layer1.0.relu                                            (add,)                                 {}
call_module    layer1_1_conv1         layer1.1.conv1                                           (layer1_0_relu_1,)                     {}
call_module    layer1_1_bn1           layer1.1.bn1                                             (layer1_1_conv1,)                      {}
call_module    layer1_1_relu          layer1.1.relu                                            (layer1_1_bn1,)                        {}
call_module    layer1_1_conv2         layer1.1.conv2                                           (layer1_1_relu,)                       {}
call_module    layer1_1_bn2           layer1.1.bn2                                             (layer1_1_conv2,)                      {}
call_function  stochastic_depth_1     <function stochastic_depth at 0x7fcc18cf60d0>            (layer1_1_bn2, 0.0, 'row', True)       {}
call_function  add_1                  <built-in function add>                                  (layer1_0_relu_1, stochastic_depth_1)  {}
call_module    layer1_1_relu_1        layer1.1.relu                                            (add_1,)                               {}
call_module    layer2_0_conv1         layer2.0.conv1                                           (layer1_1_relu_1,)                     {}
call_module    layer2_0_bn1           layer2.0.bn1                                             (layer2_0_conv1,)                      {}
call_module    layer2_0_relu          layer2.0.relu                                            (layer2_0_bn1,)                        {}
call_module    layer2_0_conv2         layer2.0.conv2                                           (layer2_0_relu,)                       {}
call_module    layer2_0_bn2           layer2.0.bn2                                             (layer2_0_conv2,)                      {}
call_module    layer2_0_downsample_0  layer2.0.downsample.0                                    (layer1_1_relu_1,)                     {}
call_module    layer2_0_downsample_1  layer2.0.downsample.1                                    (layer2_0_downsample_0,)               {}
call_function  add_2                  <built-in function add>                                  (layer2_0_bn2, layer2_0_downsample_1)  {}
call_module    layer2_0_relu_1        layer2.0.relu                                            (add_2,)                               {}
call_module    layer2_1_conv1         layer2.1.conv1                                           (layer2_0_relu_1,)                     {}
call_module    layer2_1_bn1           layer2.1.bn1                                             (layer2_1_conv1,)                      {}
call_module    layer2_1_relu          layer2.1.relu                                            (layer2_1_bn1,)                        {}
call_module    layer2_1_conv2         layer2.1.conv2                                           (layer2_1_relu,)                       {}
call_module    layer2_1_bn2           layer2.1.bn2                                             (layer2_1_conv2,)                      {}
call_function  stochastic_depth_2     <function stochastic_depth at 0x7fcc18cf60d0>            (layer2_1_bn2, 0.0, 'row', True)       {}
call_function  add_3                  <built-in function add>                                  (layer2_0_relu_1, stochastic_depth_2)  {}
call_module    layer2_1_relu_1        layer2.1.relu                                            (add_3,)                               {}
call_module    layer3_0_conv1         layer3.0.conv1                                           (layer2_1_relu_1,)                     {}
call_module    layer3_0_bn1           layer3.0.bn1                                             (layer3_0_conv1,)                      {}
call_module    layer3_0_relu          layer3.0.relu                                            (layer3_0_bn1,)                        {}
call_module    layer3_0_conv2         layer3.0.conv2                                           (layer3_0_relu,)                       {}
call_module    layer3_0_bn2           layer3.0.bn2                                             (layer3_0_conv2,)                      {}
call_module    layer3_0_downsample_0  layer3.0.downsample.0                                    (layer2_1_relu_1,)                     {}
call_module    layer3_0_downsample_1  layer3.0.downsample.1                                    (layer3_0_downsample_0,)               {}
call_function  add_4                  <built-in function add>                                  (layer3_0_bn2, layer3_0_downsample_1)  {}
call_module    layer3_0_relu_1        layer3.0.relu                                            (add_4,)                               {}
call_module    layer3_1_conv1         layer3.1.conv1                                           (layer3_0_relu_1,)                     {}
call_module    layer3_1_bn1           layer3.1.bn1                                             (layer3_1_conv1,)                      {}
call_module    layer3_1_relu          layer3.1.relu                                            (layer3_1_bn1,)                        {}
call_module    layer3_1_conv2         layer3.1.conv2                                           (layer3_1_relu,)                       {}
call_module    layer3_1_bn2           layer3.1.bn2                                             (layer3_1_conv2,)                      {}
call_function  stochastic_depth_3     <function stochastic_depth at 0x7fcc18cf60d0>            (layer3_1_bn2, 0.0, 'row', True)       {}
call_function  add_5                  <built-in function add>                                  (layer3_0_relu_1, stochastic_depth_3)  {}
call_module    layer3_1_relu_1        layer3.1.relu                                            (add_5,)                               {}
call_module    layer4_0_conv1         layer4.0.conv1                                           (layer3_1_relu_1,)                     {}
call_module    layer4_0_bn1           layer4.0.bn1                                             (layer4_0_conv1,)                      {}
call_module    layer4_0_relu          layer4.0.relu                                            (layer4_0_bn1,)                        {}
call_module    layer4_0_conv2         layer4.0.conv2                                           (layer4_0_relu,)                       {}
call_module    layer4_0_bn2           layer4.0.bn2                                             (layer4_0_conv2,)                      {}
call_module    layer4_0_downsample_0  layer4.0.downsample.0                                    (layer3_1_relu_1,)                     {}
call_module    layer4_0_downsample_1  layer4.0.downsample.1                                    (layer4_0_downsample_0,)               {}
call_function  add_6                  <built-in function add>                                  (layer4_0_bn2, layer4_0_downsample_1)  {}
call_module    layer4_0_relu_1        layer4.0.relu                                            (add_6,)                               {}
call_module    layer4_1_conv1         layer4.1.conv1                                           (layer4_0_relu_1,)                     {}
call_module    layer4_1_bn1           layer4.1.bn1                                             (layer4_1_conv1,)                      {}
call_module    layer4_1_relu          layer4.1.relu                                            (layer4_1_bn1,)                        {}
call_module    layer4_1_conv2         layer4.1.conv2                                           (layer4_1_relu,)                       {}
call_module    layer4_1_bn2           layer4.1.bn2                                             (layer4_1_conv2,)                      {}
call_function  stochastic_depth_4     <function stochastic_depth at 0x7fcc18cf60d0>            (layer4_1_bn2, 0.0, 'row', True)       {}
call_function  add_7                  <built-in function add>                                  (layer4_0_relu_1, stochastic_depth_4)  {}
call_module    layer4_1_relu_1        layer4.1.relu                                            (add_7,)                               {}
call_module    avgpool                avgpool                                                  (layer4_1_relu_1,)                     {}
call_function  flatten                <built-in method flatten of type object at 0x112aac6c0>  (avgpool, 1)                           {}
call_module    fc                     fc                                                       (flatten,)                             {}
output         output                 output                                                   (fc,)                                  {}

After deletion

opcode         name                   target                                                   args                                   kwargs
-------------  ---------------------  -------------------------------------------------------  -------------------------------------  --------
placeholder    x                      x                                                        ()                                     {}
call_module    conv1                  conv1                                                    (x,)                                   {}
call_module    bn1                    bn1                                                      (conv1,)                               {}
call_module    relu                   relu                                                     (bn1,)                                 {}
call_module    maxpool                maxpool                                                  (relu,)                                {}
call_module    layer1_0_conv1         layer1.0.conv1                                           (maxpool,)                             {}
call_module    layer1_0_bn1           layer1.0.bn1                                             (layer1_0_conv1,)                      {}
call_module    layer1_0_relu          layer1.0.relu                                            (layer1_0_bn1,)                        {}
call_module    layer1_0_conv2         layer1.0.conv2                                           (layer1_0_relu,)                       {}
call_module    layer1_0_bn2           layer1.0.bn2                                             (layer1_0_conv2,)                      {}
call_function  add                    <built-in function add>                                  (maxpool, layer1_0_bn2)                {}
call_module    layer1_0_relu_1        layer1.0.relu                                            (add,)                                 {}
call_module    layer1_1_conv1         layer1.1.conv1                                           (layer1_0_relu_1,)                     {}
call_module    layer1_1_bn1           layer1.1.bn1                                             (layer1_1_conv1,)                      {}
call_module    layer1_1_relu          layer1.1.relu                                            (layer1_1_bn1,)                        {}
call_module    layer1_1_conv2         layer1.1.conv2                                           (layer1_1_relu,)                       {}
call_module    layer1_1_bn2           layer1.1.bn2                                             (layer1_1_conv2,)                      {}
call_function  add_1                  <built-in function add>                                  (layer1_0_relu_1, layer1_1_bn2)        {}
call_module    layer1_1_relu_1        layer1.1.relu                                            (add_1,)                               {}
call_module    layer2_0_conv1         layer2.0.conv1                                           (layer1_1_relu_1,)                     {}
call_module    layer2_0_bn1           layer2.0.bn1                                             (layer2_0_conv1,)                      {}
call_module    layer2_0_relu          layer2.0.relu                                            (layer2_0_bn1,)                        {}
call_module    layer2_0_conv2         layer2.0.conv2                                           (layer2_0_relu,)                       {}
call_module    layer2_0_bn2           layer2.0.bn2                                             (layer2_0_conv2,)                      {}
call_module    layer2_0_downsample_0  layer2.0.downsample.0                                    (layer1_1_relu_1,)                     {}
call_module    layer2_0_downsample_1  layer2.0.downsample.1                                    (layer2_0_downsample_0,)               {}
call_function  add_2                  <built-in function add>                                  (layer2_0_bn2, layer2_0_downsample_1)  {}
call_module    layer2_0_relu_1        layer2.0.relu                                            (add_2,)                               {}
call_module    layer2_1_conv1         layer2.1.conv1                                           (layer2_0_relu_1,)                     {}
call_module    layer2_1_bn1           layer2.1.bn1                                             (layer2_1_conv1,)                      {}
call_module    layer2_1_relu          layer2.1.relu                                            (layer2_1_bn1,)                        {}
call_module    layer2_1_conv2         layer2.1.conv2                                           (layer2_1_relu,)                       {}
call_module    layer2_1_bn2           layer2.1.bn2                                             (layer2_1_conv2,)                      {}
call_function  add_3                  <built-in function add>                                  (layer2_0_relu_1, layer2_1_bn2)        {}
call_module    layer2_1_relu_1        layer2.1.relu                                            (add_3,)                               {}
call_module    layer3_0_conv1         layer3.0.conv1                                           (layer2_1_relu_1,)                     {}
call_module    layer3_0_bn1           layer3.0.bn1                                             (layer3_0_conv1,)                      {}
call_module    layer3_0_relu          layer3.0.relu                                            (layer3_0_bn1,)                        {}
call_module    layer3_0_conv2         layer3.0.conv2                                           (layer3_0_relu,)                       {}
call_module    layer3_0_bn2           layer3.0.bn2                                             (layer3_0_conv2,)                      {}
call_module    layer3_0_downsample_0  layer3.0.downsample.0                                    (layer2_1_relu_1,)                     {}
call_module    layer3_0_downsample_1  layer3.0.downsample.1                                    (layer3_0_downsample_0,)               {}
call_function  add_4                  <built-in function add>                                  (layer3_0_bn2, layer3_0_downsample_1)  {}
call_module    layer3_0_relu_1        layer3.0.relu                                            (add_4,)                               {}
call_module    layer3_1_conv1         layer3.1.conv1                                           (layer3_0_relu_1,)                     {}
call_module    layer3_1_bn1           layer3.1.bn1                                             (layer3_1_conv1,)                      {}
call_module    layer3_1_relu          layer3.1.relu                                            (layer3_1_bn1,)                        {}
call_module    layer3_1_conv2         layer3.1.conv2                                           (layer3_1_relu,)                       {}
call_module    layer3_1_bn2           layer3.1.bn2                                             (layer3_1_conv2,)                      {}
call_function  add_5                  <built-in function add>                                  (layer3_0_relu_1, layer3_1_bn2)        {}
call_module    layer3_1_relu_1        layer3.1.relu                                            (add_5,)                               {}
call_module    layer4_0_conv1         layer4.0.conv1                                           (layer3_1_relu_1,)                     {}
call_module    layer4_0_bn1           layer4.0.bn1                                             (layer4_0_conv1,)                      {}
call_module    layer4_0_relu          layer4.0.relu                                            (layer4_0_bn1,)                        {}
call_module    layer4_0_conv2         layer4.0.conv2                                           (layer4_0_relu,)                       {}
call_module    layer4_0_bn2           layer4.0.bn2                                             (layer4_0_conv2,)                      {}
call_module    layer4_0_downsample_0  layer4.0.downsample.0                                    (layer3_1_relu_1,)                     {}
call_module    layer4_0_downsample_1  layer4.0.downsample.1                                    (layer4_0_downsample_0,)               {}
call_function  add_6                  <built-in function add>                                  (layer4_0_bn2, layer4_0_downsample_1)  {}
call_module    layer4_0_relu_1        layer4.0.relu                                            (add_6,)                               {}
call_module    layer4_1_conv1         layer4.1.conv1                                           (layer4_0_relu_1,)                     {}
call_module    layer4_1_bn1           layer4.1.bn1                                             (layer4_1_conv1,)                      {}
call_module    layer4_1_relu          layer4.1.relu                                            (layer4_1_bn1,)                        {}
call_module    layer4_1_conv2         layer4.1.conv2                                           (layer4_1_relu,)                       {}
call_module    layer4_1_bn2           layer4.1.bn2                                             (layer4_1_conv2,)                      {}
call_function  add_7                  <built-in function add>                                  (layer4_0_relu_1, layer4_1_bn2)        {}
call_module    layer4_1_relu_1        layer4.1.relu                                            (add_7,)                               {}
call_module    avgpool                avgpool                                                  (layer4_1_relu_1,)                     {}
call_function  flatten                <built-in method flatten of type object at 0x112aac6c0>  (avgpool, 1)                           {}
call_module    fc                     fc                                                       (flatten,)                             {}
output         output                 output                                                   (fc,)                                  {}
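
In the "After deletion" table, each rewritten add node lists its operands in the opposite order from the "Before" table (for example (maxpool, layer1_0_bn2) instead of (layer1_0_bn2, maxpool)), while the add nodes fed by a downsample branch are unchanged. Since addition is commutative, the two graphs should compute the same result. A minimal sketch of how the round trip could be checked programmatically is shown here; graph_signature and the two toy modules are hypothetical helpers written for this illustration and are not part of the PR.

```python
import operator

import torch
from torch import fx, nn


def graph_signature(gm: fx.GraphModule):
    """Summarize a traced graph as (opcode, target, args) tuples.

    Operands of operator.add are sorted by name so that a + b and b + a
    produce the same signature.
    """
    signature = []
    for node in gm.graph.nodes:
        args = [a.name if isinstance(a, fx.Node) else repr(a) for a in node.args]
        if node.op == "call_function" and node.target is operator.add:
            args = sorted(args)
        if isinstance(node.target, str):
            target = node.target
        else:
            target = getattr(node.target, "__name__", repr(node.target))
        signature.append((node.op, target, tuple(args)))
    return signature


class BranchFirst(nn.Module):
    """Toy module: branch + shortcut, as in the "Before" table."""

    def __init__(self) -> None:
        super().__init__()
        self.lin = nn.Linear(3, 3)

    def forward(self, x):
        return self.lin(x) + x


class ShortcutFirst(nn.Module):
    """Toy module: shortcut + branch, as in the "After deletion" table."""

    def __init__(self) -> None:
        super().__init__()
        self.lin = nn.Linear(3, 3)

    def forward(self, x):
        return x + self.lin(x)


sig_a = graph_signature(fx.symbolic_trace(BranchFirst()))
sig_b = graph_signature(fx.symbolic_trace(ShortcutFirst()))
print(sig_a == sig_b)
```

For the ResNet example, the same helper could be applied to the traced model before add_regularized_shortcut and after del_regularized_shortcut, assuming the node names are identical in both graphs, as they are in the tables above.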

Also tested with:

model = add_regularized_shortcut(resnet50(), Bottleneck, partial(StochasticDepth, p=0.0, mode="row"))
model = add_regularized_shortcut(mobilenet_v2(), InvertedResidual, partial(StochasticDepth, p=0.0, mode="row"))
model = add_regularized_shortcut(mobilenet_v3_small(), InvertedResidual, partial(StochasticDepth, p=0.0, mode="row"))

model = del_regularized_shortcut(efficientnet_b0(), block_types=StochasticDepth, op=None) # First delete original StochasticDepth
model = add_regularized_shortcut(model, MBConv, partial(StochasticDepth, p=0.0, mode="row"))

Affected by pytorch/pytorch#66197 and pytorch/pytorch#66335

jamesr66a

I think overall this looks OK. If I understand correctly, the procedure is:

Iterate through the named modules in the module hierarchy, and for each module whose type is one of the block_types of interest:
a. Add the shortcut module.
b. Trace the module and search for a residual connection (i.e. an add node with two inputs, one of which is a placeholder).
c. Replace the residual connection with the shortcut module.