## ConvBlock

#### Import library

In [2]:
import torch
from convblock import ConvBlock, ConvBranches, Config

Let's create a simple 2D convolutional block consisting of two convolutions, each followed by **batch norm** and a **relu** activation:

In [4]:
ConvBlock(input_shape=(32, 32, 32), layout='cna cna', c=dict(kernel_size=[3, 3], filters=[16, 16]))

ConvBlock(
  (Module_0): Conv(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_2): ReLU(inplace=True)
  (Module_3): Conv(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_5): ReLU(inplace=True)
)

The input shape is provided once per block; channel counts and padding values are computed automatically during **ConvBlock** construction. **ConvBlock** exposes the following properties:

In [10]:
block = ConvBlock(input_shape=(32, 32, 32), layout='cna cna', c=dict(kernel_size=[3, 3], stride=[1, 2], filters=[16, 16]))

In [11]:
block.stride, block.input_shape, block.output_shape, block.in_channels, block.out_channels

((2.0, 2.0), array([32, 32, 32]), array([16, 16, 16]), 32, 16)

For example, with a transposed convolution of stride 2 the block's **stride** will be 0.5 along both spatial dimensions:

In [13]:
block = ConvBlock(input_shape=(32, 32, 32), layout='cna t',
                  c=dict(kernel_size=3, stride=1, filters=16),
                  t=dict(kernel_size=3, stride=2, filters=17))

In [14]:
block.stride, block.input_shape, block.output_shape, block.in_channels, block.out_channels

((0.5, 0.5), array([32, 32, 32]), array([17, 64, 64]), 32, 17)
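The two outputs above suggest that the block's **stride** is the product of the per-layer strides, with a transposed convolution contributing the inverse of its stride. Below is a minimal plain-Python sketch of that bookkeeping for one spatial dimension. The function name `accumulate` and the `(kind, stride)` encoding are hypothetical and not part of the library; 'SAME'-style padding is assumed.

```python
from fractions import Fraction


def accumulate(input_size, ops):
    """Track total stride and spatial size through a chain of layers.

    ops is a list of (kind, stride) pairs, where kind is 'c' for a
    convolution or 't' for a transposed convolution (hypothetical encoding).
    """
    total_stride = Fraction(1)
    size = input_size
    for kind, stride in ops:
        if kind == 'c':
            total_stride *= stride
            size = -(-size // stride)  # ceil division, assuming 'SAME' padding
        elif kind == 't':
            total_stride /= stride
            size = size * stride
        else:
            raise ValueError(f"unknown layer kind: {kind!r}")
    return float(total_stride), size


print(accumulate(32, [('c', 1), ('c', 2)]))  # (2.0, 16)
print(accumulate(32, [('c', 1), ('t', 2)]))  # (0.5, 64)
```

Both results agree with the **stride** and spatial **output_shape** values printed by the two blocks above.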

Each basic operation, such as *convolution*, *pooling*, *dropout*, *batchnorm*, *instancenorm* or *activation*, has a one-char shortcut that can be used in the layout. In the example above the layout can be read as a simple sequence of **convolution**, **batch normalization** and **activation**. Note that the default activation is 'relu'. If you would like to replace **batch normalization** with **instance normalization** without affine parameters, it is as simple as the following:

In [5]:
ConvBlock(input_shape=(32, 32, 32), layout='cia cia', i=dict(affine=False), c=dict(kernel_size=[3, 3], filters=[16, 16]))

ConvBlock(
  (Module_0): Conv(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_1): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (Module_2): ReLU(inplace=True)
  (Module_3): Conv(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_4): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (Module_5): ReLU(inplace=True)
)

As you can see, the affine parameter was set to **False** for all **instance normalization** operations in the block. This automatic vectorization of layout parameters, together with spatial unsqueezing (more on that later), simplifies convolutional block design.

You can get the list of all available layout options using the following code:

In [7]:
ConvBlock.get_options()

{'shortcut': <function convblock.pytorch.layers.conv_block.res_shortcut(input_shape, output_shape, layout='cna', kernel_size=1, stride=1, dilation=1, groups=1, bias=False, pool_size=2, pool_mode='max', allow_identity=True, broadcast=True, mode='+', filters=None, downsample_mode='c', **kwargs)>,
 'c': convblock.pytorch.layers.conv.Conv,
 't': convblock.pytorch.layers.conv.ConvTransposed,
 '<': convblock.pytorch.layers.layers.Flatten,
 'u': convblock.pytorch.layers.layers.Upsample,
 'n': convblock.pytorch.layers.layers.BatchNorm,
 'i': convblock.pytorch.layers.layers.InstanceNorm,
 'f': convblock.pytorch.layers.layers.Linear,
 'l': convblock.pytorch.layers.layers.Lambda,
 's': convblock.pytorch.layers.layers.ChannelsShuffle,
 'd': convblock.pytorch.layers.layers.Dropout,
 'p': <function convblock.pytorch.layers.pool.Pool(input_shape, kernel_size=3, stride=2, dilation=1, mode='max', padding='constant', norm_type=1.0, output_size=None)>,
 '>': convblock.pytorch.layers.pool.GlobalPool,
 'g'

Most of them are fairly intuitive, except perhaps the **shortcut** option, which is special and is not a one-char option. You can find more on shortcuts in the **Residual Connections** part of this tutorial.
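To make the idea of one-char shortcuts concrete, here is a minimal plain-Python sketch that expands a layout string into a list of operation names. The mapping uses the letters listed in the options above plus 'a' for activation; the helper names `expand_layout` and `count_ops` are hypothetical, and treating spaces as purely visual separators is an assumption rather than a statement about the library's parser.

```python
from collections import Counter

# Letters taken from the options shown above, plus 'a' for activation.
NAMES = {
    'c': 'Conv', 't': 'ConvTransposed', 'n': 'BatchNorm', 'i': 'InstanceNorm',
    'a': 'Activation', 'p': 'Pool', 'd': 'Dropout', 'u': 'Upsample',
    'f': 'Linear', 'l': 'Lambda', 's': 'ChannelsShuffle',
    '<': 'Flatten', '>': 'GlobalPool',
}


def expand_layout(layout):
    """Translate each non-space character of the layout into an operation name."""
    ops = []
    for ch in layout:
        if ch.isspace():  # assumption: spaces only group letters visually
            continue
        if ch not in NAMES:
            raise ValueError(f"unknown layout option: {ch!r}")
        ops.append(NAMES[ch])
    return ops


def count_ops(layout):
    """Count how many times each option occurs in the layout."""
    return Counter(ch for ch in layout if not ch.isspace())


print(expand_layout('cna cna'))
print(count_ops('cia cia'))
```

Counting occurrences per letter is what makes it possible to match a list of parameter values against the operations of one type.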

### Automatic parameter vectorization

**ConvBlock**'s internal design means you do not have to repeat a parameter for every operation of the same type in the layout when its value is the same for all of them. Some operations also have default parameter values: stride=1 for convolution (the 'c' layout option) and activation='relu' for activation (the 'a' layout option). So for our simple block you can avoid listing filters for both convolutions and simply set it to 16. **ConvBlock** will automatically detect the number of convolutions in the layout and duplicate the parameter:

In [8]:
ConvBlock(input_shape=(32, 32, 32), layout='cna cna', c=dict(kernel_size=[3, 3], filters=16))

ConvBlock(
  (Module_0): Conv(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_2): ReLU(inplace=True)
  (Module_3): Conv(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
  (Module_4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_5): ReLU(inplace=True)
)
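The duplication described above can be sketched in a few lines of plain Python. The helper `vectorize` is hypothetical and only illustrates the idea: a value given as a list must have one entry per operation, while any other value (including a tuple such as a non-square kernel size) is repeated for every operation.

```python
def vectorize(params, count):
    """Expand a dict of shared or per-operation values into `count` dicts."""
    per_op = [dict() for _ in range(count)]
    for key, value in params.items():
        if isinstance(value, list):
            if len(value) != count:
                raise ValueError(
                    f"{key!r} has {len(value)} values but layout has {count} operations"
                )
            values = value
        else:
            # A scalar or tuple is shared by all operations of this type.
            values = [value] * count
        for op_params, v in zip(per_op, values):
            op_params[key] = v
    return per_op


print(vectorize(dict(kernel_size=[3, 3], filters=16), count=2))
# [{'kernel_size': 3, 'filters': 16}, {'kernel_size': 3, 'filters': 16}]
```

Raising an error on a length mismatch is a design choice of this sketch; it keeps a mistyped list from being silently truncated or padded.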

If you want to use a (1, 3) => (3, 1) stack of convolutions instead (as implemented in the **inception** architecture), it is as simple as the following:

In [9]:
ConvBlock(input_shape=(32, 32, 32), layout='cna cna', c=dict(kernel_size=[(1, 3), (3, 1)], filters=16))

ConvBlock(
  (Module_0): Conv(32, 16, kernel_size=(1, 3), stride=(1, 1), padding=(0, 0, 1, 1), padding_mode=constant, bias=False)
  (Module_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_2): ReLU(inplace=True)
  (Module_3): Conv(16, 16, kernel_size=(3, 1), stride=(1, 1), padding=(1, 1, 0, 0), padding_mode=constant, bias=False)
  (Module_4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Module_5): ReLU(inplace=True)
)

Note that padding is computed automatically for convolution and pooling operations to emulate the 'SAME' mode.
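For the stride-1 case, the 'SAME' padding seen in the outputs above can be reproduced with a short calculation. The sketch below is plain Python with hypothetical helper names; it covers one spatial dimension and does not claim to match the library's handling of strides greater than 1 or the order in which it prints padding values.

```python
def same_padding(kernel_size, dilation=1):
    """Return (before, after) padding that preserves the size at stride 1."""
    total = dilation * (kernel_size - 1)
    before = total // 2
    return before, total - before


def output_size(size, kernel_size, padding, stride=1, dilation=1):
    """Standard convolution output-size formula for one dimension."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + sum(padding) - effective_kernel) // stride + 1


for k in (1, 3, 5):
    pad = same_padding(k)
    print(k, pad, output_size(32, k, pad))
# 1 (0, 0) 32
# 3 (1, 1) 32
# 5 (2, 2) 32
```

With kernel size 3, padding (1, 1) and stride 2, the same formula gives 16, which agrees with the strided block shown earlier.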

### Residual Connections

Three types of residual connections are implemented in the library: add, concat and mul. Each type requires you to mark the start and end points of the residual connection with the corresponding shortcut character ('+' for add, '*' for mul and '.' for concat). So, let's add a residual connection to our simple block:

In [15]:
ConvBlock(input_shape=(32, 32, 32), layout='+ cna cna +', c=dict(kernel_size=3, filters=[16, 32]))

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_5): ReLU(inplace=True)
      )
      (1): Identity()
    )
  )
)

We use the same concept of residual connection as in the **ResNet** paper: if the output shape does not match the input shape, a 1x1 convolution is used on the shortcut; otherwise the raw input is used. Of course, there is a way to customize the shortcut behaviour in **ConvBlock**:

In [16]:
ConvBlock(input_shape=(32, 32, 32),
          layout='+ cna cna +',
          c=dict(kernel_size=3, filters=[16, 32]),
          shortcut={'allow_identity': False})

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_5): ReLU(inplace=True)
      )
      (1): ConvBlock(
        (Module_0): Conv(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
      )
    )
  )
)

As you can see, **ConvBlock** was forced to use a shortcut convolution by setting **allow_identity=False**. You can also customize the layout of the shortcut layer:

In [19]:
ConvBlock(input_shape=(32, 32, 32),
          layout='+ cna cna cn +',
          c=dict(kernel_size=[1, 3, 1], filters=[8, 8, 32], stride=[1, 2, 1]),
          shortcut=dict(layout='c', kernel_size=3))

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(32, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(8, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_4): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_5): ReLU(inplace=True)
        (Module_6): Conv(8, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_7): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_8): ReLU(inplace=True)
      )
      (1): ConvBlock(
        (Module_0): Conv(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
      )
    )
  )
)

Adding a post-activation after the residual sum is just as easy:

In [23]:
ConvBlock(input_shape=(32, 32, 32),
          layout='+ cna cna cn + a',
          c=dict(kernel_size=[1, 3, 1], filters=[8, 8, 32], stride=[1, 2, 1]),
          shortcut=dict(layout='c', kernel_size=3))

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(32, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(8, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_4): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_5): ReLU(inplace=True)
        (Module_6): Conv(8, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_7): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ConvBlock(
        (Module_0): Conv(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
      )
    )
  )
  (1): ConvBlock(
    (Module_0): ReLU(inplace=True)
  )
)

In the last two examples the shortcut convolution was set to use a 3x3 kernel, and its layout was changed to 'c' instead of the default 'cna'.
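The shortcut rule used in this section (identity when shapes match, a convolution otherwise, with the option to forbid identity) can be sketched in plain PyTorch. This is an illustration under stated assumptions, not the library's implementation: the class `ResidualAdd` and its arguments are hypothetical, and the projection here is a bare 1x1 convolution without normalization or activation.

```python
import torch
from torch import nn


class ResidualAdd(nn.Module):
    """Hypothetical 'add' residual: body(x) + shortcut(x)."""

    def __init__(self, body, in_channels, out_channels, stride=1, allow_identity=True):
        super().__init__()
        self.body = body
        shapes_match = in_channels == out_channels and stride == 1
        if allow_identity and shapes_match:
            self.shortcut = nn.Identity()
        else:
            # 1x1 projection so the shortcut matches the body's output shape.
            self.shortcut = nn.Conv2d(in_channels, out_channels,
                                      kernel_size=1, stride=stride, bias=False)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)


body = nn.Sequential(
    nn.Conv2d(32, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU(),
)
block = ResidualAdd(body, 32, 32)                         # identity shortcut
forced = ResidualAdd(body, 32, 32, allow_identity=False)  # 1x1 conv shortcut
out = block(torch.randn(2, 32, 32, 32))
print(type(block.shortcut).__name__, type(forced.shortcut).__name__, tuple(out.shape))
```

Keeping the shortcut as a separate submodule mirrors the two-branch structure visible in the printed outputs above.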

**ConvBlock** allows you to use several shortcut connections in one block, one after another. You can also combine different types of residual connections. Suppose you want to create a more complicated block with a 'concat' residual connection similar to the one used in **DenseNet**:

In [24]:
ConvBlock(input_shape=(32, 32, 32),
          layout='. cna cna .' * 4,
          c=dict(kernel_size=[1, 3] * 4,
                 filters=[8, 32] * 4))

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(32, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
        (Module_4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_5): ReLU(inplace=True)
      )
      (1): Identity()
    )
  )
  (1): Branches(
    (branches): ModuleList(
      (0): ConvBlock(
        (Module_0): Conv(64, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (Module_2): ReLU(inplace=True)
        (Module_3): Conv(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=
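The input channel counts in the output above (32 for the first branch, 64 for the second) follow from concatenating each branch's output with its input along the channel axis. A minimal plain-Python sketch of that arithmetic, assuming an identity shortcut in every 'concat' residual (the helper `concat_channels` is hypothetical):

```python
def concat_channels(in_channels, branch_out_channels):
    """Return the channel count before the first and after every concat residual."""
    history = [in_channels]
    for out in branch_out_channels:
        # Concatenation keeps the input channels and appends the branch output.
        history.append(history[-1] + out)
    return history


print(concat_channels(32, [32] * 4))  # [32, 64, 96, 128, 160]
```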

#### Residual in Residual pattern

In [27]:
ConvBlock(input_shape=(32, 32, 32),
          layout='cna . + cna cn + cn . a',
          c=dict(kernel_size=[5, 1, 3, 5],
                 filters=[8, 8, 32, 32]))

Sequential(
  (0): ConvBlock(
    (Module_0): Conv(32, 8, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2, 2, 2), padding_mode=constant, bias=False)
    (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (Module_2): ReLU(inplace=True)
  )
  (1): Branches(
    (branches): ModuleList(
      (0): Sequential(
        (0): Branches(
          (branches): ModuleList(
            (0): ConvBlock(
              (Module_0): Conv(8, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (Module_2): ReLU(inplace=True)
              (Module_3): Conv(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
              (Module_4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
            (1): ConvBlock(
              (Module_0): Conv(8, 32, kernel_size=(

Currently **ConvBlock** does not support the residual-in-residual pattern for residuals of the same type because of a design restriction: a '+ ... + ... + ... +' pattern is interpreted as a sequence of two residuals, one after another. Nesting works only when the inner and outer residuals are of different types, as in the example above.
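One way to picture this restriction is a pairing rule in which a marker closes the most recently opened residual if it has the same type and opens a new one otherwise. The sketch below implements that rule in plain Python. It is a hypothetical model that is consistent with the behaviour described here, not the library's actual parser; positions are indices among the non-space characters of the layout.

```python
MARKERS = {'+': 'add', '*': 'mul', '.': 'concat'}


def pair_markers(layout):
    """Return (type, start, end) spans for the residual markers in a layout."""
    chars = [ch for ch in layout if not ch.isspace()]
    stack = []  # residuals opened but not yet closed: (marker, position)
    spans = []
    for pos, ch in enumerate(chars):
        if ch not in MARKERS:
            continue
        if stack and stack[-1][0] == ch:
            # Same type as the innermost open residual: close it.
            _, start = stack.pop()
            spans.append((MARKERS[ch], start, pos))
        else:
            stack.append((ch, pos))
    if stack:
        raise ValueError("unclosed residual marker in layout")
    return spans


print(pair_markers('+ cna + cna + cna +'))
# [('add', 0, 4), ('add', 8, 12)]
print(pair_markers('cna . + cna cn + cn . a'))
# [('add', 4, 10), ('concat', 3, 13)]
```

Under this rule four '+' markers can only produce two consecutive residuals, while a '+' pair inside a '.' pair nests as expected.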

Finally, let's create a ResBlock with an SE (squeeze-and-excitation) block for self-attention:

In [31]:
ConvBlock(input_shape=(32, 32, 32),
          layout='+ cna cna cn * p cna cna * + a',
          p=dict(mode='avg', output_size=1),
          a=dict(activation=['relu', 'relu', 'relu', 'sigmoid', 'relu']),
          c=dict(kernel_size=[1, 3, 1, 1, 1], filters=[8, 8, 32, 8, 32]))

Sequential(
  (0): Branches(
    (branches): ModuleList(
      (0): Sequential(
        (0): ConvBlock(
          (Module_0): Conv(32, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (Module_1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (Module_2): ReLU(inplace=True)
          (Module_3): Conv(8, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1, 1, 1), padding_mode=constant, bias=False)
          (Module_4): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (Module_5): ReLU(inplace=True)
          (Module_6): Conv(8, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (Module_7): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): Branches(
          (branches): ModuleList(
            (0): ConvBlock(
              (Module_0): AdaptiveAvgPool(input_shape=[32 32 32], output_shape=[32  1  1])
              (Module_1): Conv(32, 8,
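For comparison, the '* p cna cna *' part of the layout corresponds to a squeeze-and-excitation gate: global average pooling, a channel-reducing 1x1 convolution, a channel-restoring 1x1 convolution with a sigmoid, and a multiplication with the input. Below is a minimal plain-PyTorch sketch of that pattern. The class `SqueezeExcite` is hypothetical, and unlike the layout above it omits batch normalization inside the gate.

```python
import torch
from torch import nn


class SqueezeExcite(nn.Module):
    """Hypothetical SE gate: rescales each channel by a learned factor in (0, 1)."""

    def __init__(self, channels, reduced):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze to (N, C, 1, 1)
            nn.Conv2d(channels, reduced, kernel_size=1),  # reduce channels
            nn.ReLU(),
            nn.Conv2d(reduced, channels, kernel_size=1),  # restore channels
            nn.Sigmoid(),                                 # per-channel weights
        )

    def forward(self, x):
        # The (N, C, 1, 1) gate broadcasts over the spatial dimensions.
        return x * self.gate(x)


se = SqueezeExcite(channels=32, reduced=8)
x = torch.randn(2, 32, 16, 16)
y = se(x)
print(tuple(y.shape))  # (2, 32, 16, 16)
```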

**ConvBlock** allows you to design complex convolutional modules with a non-linear (branching) structure in just a few lines of code. In the second tutorial you will learn to use the **ConvBranches** module, which allows you to create even more complex structures with several branches.