# Introducing Convolutional Layer with a Twist

I'm excited to introduce `conv_twist`, a replacement for (and an improvement on) the good old **convolutional layer** widely used in Deep Learning models for Computer Vision (aptly called ConvNets or CNNs). Famously introduced into image classification by Yann LeCun some 30 years ago, it became the source of the Deep Learning/Artificial Intelligence revolution with AlexNet in 2012. Rapid improvements to the architecture followed, most importantly ResNet in 2015. Attention has recently shifted somewhat away from image classification, but convolutional layers are still the bread and butter of any Computer Vision model. What more can be said about convolutional layers, one might ask? The answer is a little bit of mathematics.

Long story short, here is `conv_twist` in PyTorch. You can swap out the 3x3 `Conv2d` layers in your model, plug this in, and train from scratch to see if it gives any improvement.

In [5]:
import numpy as np  # needed for np.indices in forward()
import torch
import torch.nn as nn
import torchvision

class conv_twist(nn.Module):  # replacing 3x3 Conv2d
    def __init__(self, ni, nf, init_max=1.5, stride=1):
        super(conv_twist, self).__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.conv_x = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.conv_y = nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False)
        self.conv_x.weight.data = (self.conv_x.weight - self.conv_x.weight.flip(2).flip(3)) / 2  # make conv_x a "first-order operator" by symmetrizing it (strictly, antisymmetrizing: subtract the 180 degree rotation and halve)
        self.conv_y.weight.data = self.conv_x.weight.transpose(2,3).flip(3)                      # make conv_y a 90 degree rotation of conv_x
        self.center_x = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        self.center_y = nn.Parameter(torch.Tensor(nf), requires_grad=True)
        self.center_x.data.uniform_(-init_max, init_max)
        self.center_y.data.uniform_(-init_max, init_max)

    def forward(self, x):
        # re-apply the symmetrization on every call, since gradient updates do not preserve it
        self.conv_x.weight.data = (self.conv_x.weight - self.conv_x.weight.flip(2).flip(3)) / 2
        self.conv_y.weight.data = (self.conv_y.weight - self.conv_y.weight.flip(2).flip(3)) / 2
        x1 = self.conv(x)
        _, c, h, w = x1.size()
        XX = torch.from_numpy(np.indices((1,h,w))[2]*2/w).type(x.dtype).to(x.device) - self.center_x.view(-1,1,1)
        YY = torch.from_numpy(np.indices((1,h,w))[1]*2/h).type(x.dtype).to(x.device) - self.center_y.view(-1,1,1)
        return x1 + (XX * self.conv_x(x) + YY * self.conv_y(x))
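
To make the swap less tedious, here is a minimal sketch of a helper that walks a model and replaces every plain 3x3, padding-1 `Conv2d` it finds. The name `swap_conv3x3` and the `StandIn` class are hypothetical, introduced only for illustration: `StandIn` is a placeholder with the same constructor signature as `conv_twist`, so the block runs on its own. In practice you would pass `conv_twist` itself as `make_layer`.

```python
import torch
import torch.nn as nn


def swap_conv3x3(module, make_layer):
    """Replace every plain 3x3, padding-1 nn.Conv2d inside `module`, in place.

    `make_layer(ni, nf, stride=...)` is any factory with conv_twist's signature.
    Returns the number of layers that were replaced.
    Assumption: strides are square, and dropping the original bias is acceptable
    (conv_twist as written has no bias term).
    """
    n_swapped = 0
    for name, child in list(module.named_children()):
        is_plain_3x3 = (
            isinstance(child, nn.Conv2d)
            and child.kernel_size == (3, 3)
            and child.padding == (1, 1)
            and child.dilation == (1, 1)
            and child.groups == 1
        )
        if is_plain_3x3:
            new_layer = make_layer(child.in_channels, child.out_channels,
                                   stride=child.stride[0])
            setattr(module, name, new_layer)
            n_swapped += 1
        else:
            # Not a match: look inside it for nested 3x3 convolutions.
            n_swapped += swap_conv3x3(child, make_layer)
    return n_swapped


class StandIn(nn.Module):
    """Hypothetical placeholder with conv_twist's constructor signature."""

    def __init__(self, ni, nf, init_max=1.5, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, kernel_size=3, stride=stride,
                              padding=1, bias=False)

    def forward(self, x):
        return self.conv(x)


net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Sequential(nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU()),
    nn.Conv2d(16, 4, kernel_size=1),  # 1x1 convolution: left untouched
)
n_swapped = swap_conv3x3(net, StandIn)
out = net(torch.randn(2, 3, 32, 32))
print(n_swapped, tuple(out.shape))
```

The swapped layers start from fresh random weights, which is why the model has to be trained from scratch afterwards.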

Let's take a look at the weights in such a `conv_twist` model:

In [6]:
model = conv_twist(3,5)
for name, para in model.named_parameters():
    print(name, para)

center_x Parameter containing:
tensor([ 0.3023, -0.3360,  0.5309,  0.4433, -1.0316], requires_grad=True)
center_y Parameter containing:
tensor([-1.1532,  0.8339,  1.0706, -1.0671,  0.9563], requires_grad=True)
conv.weight Parameter containing:
tensor([[[[ 0.0571,  0.0091,  0.1836],
          [-0.1347,  0.1280,  0.0614],
          [-0.1534,  0.0817, -0.0577]],

         [[ 0.1923,  0.1592,  0.0034],
          [ 0.1896, -0.1555, -0.1734],
          [-0.0581, -0.1072, -0.1130]],

         [[ 0.1653,  0.0071,  0.1774],
          [-0.1583,  0.1580,  0.0131],
          [-0.1118, -0.0300, -0.1756]]],


        [[[ 0.1088, -0.1590, -0.0011],
          [-0.1373,  0.1578,  0.1774],
          [ 0.1487, -0.0590, -0.0774]],

         [[-0.0160, -0.0466,  0.1536],
          [ 0.1157,  0.0899, -0.0484],
          [-0.0265,  0.0195,  0.1479]],

         [[ 0.1583,  0.1361, -0.1898],
          [ 0.0298,  0.1011, -0.0624],
          [-0.0675, -0.0696, -0.0140]]],


        [[[ 0.0903,  0.0801,  0.0272],

If you take a look at the `conv_x` weights, you'll notice that each 3x3 kernel is symmetric, or more precisely antisymmetric: entries at opposite positions have the same magnitude but opposite signs, and the middle one is always 0. That's what I mean by "symmetrizing" `conv_x`. You can also check that the `conv_x` and `conv_y` weights are identical up to a 90 degree rotation.

In [7]:
model.conv_x.weight[0,0], model.conv_y.weight[0,0]

(tensor([[-0.0752, -0.0183,  0.0832],
         [-0.0698,  0.0000,  0.0698],
         [-0.0832,  0.0183,  0.0752]], grad_fn=<SelectBackward>),
 tensor([[-0.0832, -0.0698, -0.0752],
         [ 0.0183,  0.0000, -0.0183],
         [ 0.0752,  0.0698,  0.0832]], grad_fn=<SelectBackward>))
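
Both properties can also be checked programmatically rather than by eye. The following is a small sketch on a random weight tensor (not the layer's actual weights), using the same tensor operations as the `conv_twist` constructor. The variable names are mine.

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, 3, 3, 3)  # (out_channels, in_channels, kH, kW), shaped like a Conv2d weight

# Same operation as in conv_twist.__init__: subtract the 180 degree rotation and halve.
w_x = (w - w.flip(2).flip(3)) / 2

# Same operation as in conv_twist.__init__: transpose, then flip the last axis.
w_y = w_x.transpose(2, 3).flip(3)

# Each kernel equals minus its own 180 degree rotation...
is_antisymmetric = torch.allclose(w_x, -w_x.flip(2).flip(3))
# ...which forces the center entry to be zero.
center_is_zero = bool((w_x[:, :, 1, 1] == 0).all())
# Transpose + flip is a clockwise quarter turn, i.e. torch.rot90 with k=-1.
is_quarter_turn = torch.equal(w_y, torch.rot90(w_x, k=-1, dims=(2, 3)))
# The rotated kernel is still antisymmetric.
rotated_is_antisymmetric = torch.allclose(w_y, -w_y.flip(2).flip(3))

print(is_antisymmetric, center_is_zero, is_quarter_turn, rotated_is_antisymmetric)
```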

Why did I choose to initialize the weights this way? I'll try to explain it later. For now, it's important to note that `conv_twist` is bigger than the standard `Conv2d` layer, but not by as much as it appears. An antisymmetric 3x3 kernel has only 4 independent entries, so this particular implementation, if I had done it properly, would have about twice as many trainable weights as a single `Conv2d` layer.
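
The "about twice" figure can be checked with a little arithmetic. The sketch below is my own reading of that count: a plain 3x3 kernel has 9 weights, an antisymmetric one has 4 free weights (4 pairs of opposite entries plus a zero center), and each output channel adds one `center_x` and one `center_y`. The function name `count_weights` is hypothetical.

```python
def count_weights(ni, nf, k=3):
    """Count weights for a plain Conv2d (no bias) versus conv_twist."""
    kernels = ni * nf                        # one k x k kernel per (input, output) channel pair
    full = k * k * kernels                   # plain convolution weights
    free_antisym = (k * k - 1) // 2 * kernels  # independent entries of an antisymmetric kernel
    centers = 2 * nf                         # center_x and center_y
    return {
        "conv2d": full,
        "twist_free": full + 2 * free_antisym + centers,  # if only free entries were stored
        "twist_stored": 3 * full + centers,               # as implemented: three full Conv2d weights
    }


counts = count_weights(3, 5)  # the same sizes as conv_twist(3, 5)
ratio = counts["twist_free"] / counts["conv2d"]
print(counts, round(ratio, 2))
```

For many input channels the `center` terms become negligible and the ratio approaches 17/9, a little under 2.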

## What are convolutions?

The standard story in Neural Networks is that the convolution operator captures the spatial relations between pixels, so it is particularly suited to image-related learning tasks, and has far fewer weights than a generic linear map (a fully connected layer). The consensus has settled on using only 3x3 convolutions and going deeper (i.e., stacking many layers), hence *deep* learning.
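
To put a number on "far fewer weights", here is a rough count for one layer. The sizes (a 3-channel 32x32 input mapped to a 16-channel 32x32 output) are illustrative assumptions of mine, and biases are ignored.

```python
def conv_weights(c_in, c_out, k=3):
    """Weights in a k x k convolution: one small kernel per channel pair."""
    return c_in * c_out * k * k


def dense_weights(c_in, c_out, h, w):
    """Weights in a generic linear map between the flattened input and output."""
    return (c_in * h * w) * (c_out * h * w)


n_conv = conv_weights(3, 16)
n_dense = dense_weights(3, 16, 32, 32)
print(n_conv, n_dense, n_dense // n_conv)
```

The convolution's count does not depend on the image size at all, while the fully connected count grows with the square of the number of pixels.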

What is perhaps less well known is that different 3x3 kernels have very intuitive meanings in terms of what they do to an image. For example, the Gaussian kernel in image processing "blurs" the image. We can do a little experiment to see: