# Group Equivariant Neural Networks

---

Notebook written by *Gabriele Cesa* (cesa.gabriele@gmail.com)


# 3. Equivariant Networks

We are finally ready to build our equivariant neural network.
An equivariant neural network starts with a *lifting layer* to map an input image $x \in X$ into a function $y \in Y$ over the group $p4$.
We can now alternate a sequence of non-linearities (e.g. ReLU) and *group-convolutions*, which map $Y$ to $Y$.

Since we will apply this network to the task of image classification, in the last layer we apply a *pooling* operation as is normally done in a CNN; this time, however, we also pool over the $4$ rotational channels.
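
The effect of pooling over the rotational channels can be illustrated on a single feature map. The sketch below is only an illustration in NumPy, not the notebook's implementation. It assumes a feature over $p4$ is stored with axes (rotation, height, width), and that a $90°$ rotation of the input rotates every map spatially and cyclically shifts the rotation axis. The helper `rotate_p4` and the shift direction are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# one feature map over p4: axes are (rotation, height, width)
f = rng.normal(size=(4, 5, 5))

def rotate_p4(f, r=1):
    # assumed convention: rotate each map spatially by r quarter-turns,
    # then cyclically shift the rotation axis by r positions
    spatial = np.rot90(f, k=r, axes=(-2, -1))
    return np.roll(spatial, shift=r, axis=0)

gf = rotate_p4(f)

# pooling only over space keeps one value per rotation channel
per_rotation = f.max(axis=(1, 2))
per_rotation_g = gf.max(axis=(1, 2))

# pooling over space and over the rotation axis gives a single value
pooled = f.max()
pooled_g = gf.max()

print(np.allclose(per_rotation_g, np.roll(per_rotation, 1)))  # channels are permuted
print(pooled == pooled_g)                                     # pooled value is unchanged
```

Under these assumptions, the per-rotation features are only permuted by the rotation, whereas the value pooled over all $4$ rotational channels stays the same.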


### 3.1 Exercise. (5pt)

Explain why we need to pool over the $4$ rotational channels.
Why can't we keep the $4$ channels and use all of them as features of the final classifier?


#### 3.1 Insert Your Solution Here:

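> Because the label of an image does not change when the image is rotated, the features given to the classifier should be rotation invariant. Rotating the input cyclically permutes the $4$ rotational channels, so a linear classifier that used all of them would not be invariant; pooling over them removes this dependence.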


### 3.2 Implement A Deep Rotation Equivariant CNN (10 pt)

Finally, you can combine the layers you implemented earlier to build a rotation equivariant CNN.
Your model will take as input batches of $33 \times 33$ images with a single input channel.

The network starts with a *lifting layer* with $8$ output channels, followed by $4$ *group convolutions* with, respectively, $16$, $32$, $64$ and $128$ output channels.
All convolutions have kernel size $3$, padding $1$ and stride $1$ and should use the bias.
All convolutions are followed by `torch.nn.MaxPool3d` and `torch.nn.ReLU`.
Note that we use `MaxPool3d` rather than `MaxPool2d` since our feature tensors have $5$ dimensions (there is an additional dimension of size $4$).
In all pooling layers, we will use a kernel of size $(1, 3, 3)$, a stride of $(1, 2, 2)$ and a padding of $(0, 1, 1)$.
This ensures pooling is done only on the spatial dimensions, while the rotational dimension is preserved.
The last pooling layer, however, will also pool over the rotational dimension so it will use a kernel of size $(4, 3, 3)$, stride $(1, 1, 1)$ and padding $(0, 0, 0)$.

Finally, the features extracted from the convolutional network are used in a linear layer to classify the input in $10$ classes.
You don't need to apply a softmax layer on top.
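
The pooling parameters above determine the size of the features that reach the linear layer. The following sketch traces the spatial and rotational sizes through the five pooling layers in plain Python. It assumes the standard output-size formula for pooling with the default floor rounding. The helper `pool_out` is a name introduced only for this example.

```python
def pool_out(size, kernel, stride, padding):
    # output length of a pooling window along one axis (floor rounding)
    return (size + 2 * padding - kernel) // stride + 1

spatial, rotations = 33, 4
sizes = [spatial]

# the convolutions (kernel 3, padding 1, stride 1) keep the spatial size,
# so only the pooling layers change it
for _ in range(4):
    spatial = pool_out(spatial, kernel=3, stride=2, padding=1)
    sizes.append(spatial)

# last pooling layer: kernel (4, 3, 3), stride (1, 1, 1), padding (0, 0, 0)
spatial = pool_out(spatial, kernel=3, stride=1, padding=0)
rotations = pool_out(rotations, kernel=4, stride=1, padding=0)
sizes.append(spatial)

n_features = 128 * rotations * spatial * spatial
print(sizes, rotations, n_features)  # [33, 17, 9, 5, 3, 1] 1 128
```

With a $33 \times 33$ input, the last pooling layer therefore leaves a single value per channel, so the linear layer receives $128$ features.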

Follow the given template:

In [None]:
class C4CNN(torch.nn.Module):
  def __init__(self, n_classes=10):

    super(C4CNN, self).__init__()

    channels = [8, 16, 32, 64, 128]

    ### BEGIN SOLUTION
    layers = [
            LiftingConv2d(in_channels = 1, out_channels = channels[0], kernel_size = 3, padding=1, bias=True),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            GroupConv2d(in_channels = channels[0], out_channels = channels[1], kernel_size = 3, padding=1, bias=True),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[1], out_channels = channels[2], kernel_size = 3, padding=1, bias=True),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[2], out_channels = channels[3], kernel_size = 3, padding=1, bias=True),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[3], out_channels = channels[4], kernel_size = 3, padding=1, bias=True),
            torch.nn.MaxPool3d(kernel_size=(4, 3, 3), stride=(1, 1, 1), padding=(0, 0, 0)),
            torch.nn.ReLU(inplace=True),
            
    ]
    self.layers = torch.nn.Sequential(*layers)
    self.head = torch.nn.Linear(channels[4], n_classes)
    ### END SOLUTION

  def forward(self, input: torch.Tensor):
    
    ### BEGIN SOLUTION
    fts = self.layers(input)
    fts = fts.reshape(input.shape[0], -1)
    
    return self.head(fts)
    ### END SOLUTION


In [None]:
# Let's try our model

net = C4CNN()

x = torch.randn(5, 1, 33, 33)

y = net(x)

assert y.shape == (5, 10)

# Let's check if the model is invariant!

gx = rotate(x, 1)

gy = net(gx)

assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)

Let's now try to train our model.
We will train the network on rotated MNIST.

First of all, we need to download the dataset.

Then, we will provide a dataloader for the dataset and a training script.

In [None]:
# download the dataset
!wget -nc http://www.iro.umontreal.ca/~lisa/icml2007data/mnist_rotation_new.zip
# uncompress the zip file
!unzip -n mnist_rotation_new.zip -d mnist_rotation_new

--2021-08-09 22:36:39--  http://www.iro.umontreal.ca/~lisa/icml2007data/mnist_rotation_new.zip
Resolving www.iro.umontreal.ca (www.iro.umontreal.ca)... 132.204.26.36
Connecting to www.iro.umontreal.ca (www.iro.umontreal.ca)|132.204.26.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58424278 (56M) [application/zip]
Saving to: ‘mnist_rotation_new.zip’


2021-08-09 22:36:44 (11.7 MB/s) - ‘mnist_rotation_new.zip’ saved [58424278/58424278]

Archive:  mnist_rotation_new.zip
  inflating: mnist_rotation_new/mnist_all_rotation_normalized_float_train_valid.amat  
  inflating: mnist_rotation_new/mnist_all_rotation_normalized_float_test.amat  


In [None]:
from torch.utils.data import Dataset
from torchvision.transforms import ToTensor
import tqdm

from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'


class MnistRotDataset(Dataset):
    
    def __init__(self, mode, transform=None):
        assert mode in ['train', 'test']
            
        if mode == "train":
            file = "mnist_rotation_new/mnist_all_rotation_normalized_float_train_valid.amat"
        else:
            file = "mnist_rotation_new/mnist_all_rotation_normalized_float_test.amat"
        
        self.transform = transform

        data = np.loadtxt(file, delimiter=' ')
        
        self.labels = data[:, -1].astype(np.int64)
        self.num_samples = len(self.labels)    
        self.images = data[:, :-1].reshape(-1, 28, 28).astype(np.float32)

        # images in MNIST are only 28x28
        # we pad them to have shape 33 x 33
        self.images = np.pad(self.images, pad_width=((0,0), (2, 3), (2, 3)), mode='edge')

        assert self.images.shape == (self.labels.shape[0], 33, 33)
    
    def __getitem__(self, index):
        image, label = self.images[index], self.labels[index]
        image = Image.fromarray(image)
        if self.transform is not None:
            image = self.transform(image)
        return image, label
    
    def __len__(self):
        return len(self.labels)


train_set = MnistRotDataset('train', ToTensor())
test_set = MnistRotDataset('test', ToTensor())

def train_model(model: torch.nn.Module):

  train_loader = torch.utils.data.DataLoader(train_set, batch_size=64)
  loss_function = torch.nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, weight_decay=1e-5)

  model.to(device)
  model.train()

  for epoch in tqdm.tqdm(range(30)):
    
    for i, (x, t) in enumerate(train_loader):
        
        optimizer.zero_grad()

        x = x.to(device)
        t = t.to(device)

        y = model(x)

        loss = loss_function(y, t)

        loss.backward()

        optimizer.step()
    
  return model


def test_model(model: torch.nn.Module):
  test_loader = torch.utils.data.DataLoader(test_set, batch_size=64)
  total = 0
  correct = 0
  with torch.no_grad():
      model.eval()
      for i, (x, t) in tqdm.tqdm(enumerate(test_loader)):

          x = x.to(device)
          t = t.to(device)
          
          y = model(x)

          _, prediction = torch.max(y.data, 1)
          total += t.shape[0]
          correct += (prediction == t).sum().item()
  accuracy = correct/total*100.

  return accuracy
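
The padding step in `MnistRotDataset` can be checked in isolation. The sketch below uses a dummy array in place of the real images (an assumption made so the example runs without the dataset) and shows that padding by $2$ pixels before and $3$ pixels after each spatial axis turns $28 \times 28$ images into $33 \times 33$ ones, with `mode='edge'` repeating the border values.

```python
import numpy as np

# dummy batch standing in for the 28x28 images loaded from the .amat file
images = np.zeros((2, 28, 28), dtype=np.float32)
padded = np.pad(images, pad_width=((0, 0), (2, 3), (2, 3)), mode='edge')
print(padded.shape)  # (2, 33, 33)

# 'edge' mode repeats the outermost value on each side
row = np.pad(np.array([1, 2, 3]), (2, 3), mode='edge')
print(row.tolist())  # [1, 1, 1, 2, 3, 3, 3, 3]
```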

You can now train and test your neural network.
With the default parameters you should achieve an accuracy of roughly **93-94%**.
Feel free to adapt the training procedure to improve the performance of the model.

In [None]:
model = C4CNN()

model = train_model(model)

acc = test_model(model)
print(f'Test Accuracy: {acc :.3f}')


100%|██████████| 30/30 [37:46<00:00, 75.54s/it]
782it [02:20,  5.57it/s]

Test Accuracy: 93.240





In [None]:
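# Let's check if the trained model is still invariant!
# (we test `model`, the instance that was just trained, and keep the input on its device)

x = torch.randn(5, 1, 33, 33, device=device)

y = model(x)

gx = rotate(x, 1)

gy = model(gx)

assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)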

### 3.3 Implement Equivariant CNN with Isotropic Convolution (5pt)

For comparison, implement a similar network using the `IsotropicConv2d` module you implemented before.
The model should have the same structure (kernel size, channels, padding, etc.) as the previous model.
In all convolutional layers except the last one, you should increase the number of output channels by a factor of $4$, since the output of `IsotropicConv2d` is $4$ times smaller than the output of `GroupConv2d` when the same value of `out_channels` is used.
In the last convolutional layer you can keep the same number of channels, since there is no rotational dimension to pool over here.
Note that in this model you will use `MaxPool2d` rather than `MaxPool3d`.
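
The channel counts that result from this rule can be computed directly. The sketch below reproduces the list used in the template and compares it with the number of feature maps produced by the group convolutions, assuming (as described for `C4CNN`) that each of their output channels carries $4$ rotational copies. The name `group_maps` is introduced only for this example.

```python
old_channels = [8, 16, 32, 64, 128]

# widen every layer except the last by a factor of 4
channels = [4 * c for c in old_channels[:-1]] + [old_channels[-1]]
print(channels)  # [32, 64, 128, 256, 128]

# feature maps produced by the group convolutions: 4 rotational copies per channel
group_maps = [4 * c for c in old_channels]
print(group_maps)  # [32, 64, 128, 256, 512]
```

The first four layers then produce the same number of feature maps in both models. In `C4CNN` the last layer's $512$ maps are reduced to $128$ features by pooling over the rotational dimension, which matches the $128$ channels kept here.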

Follow the given template:

In [None]:
class IsotropicCNN(torch.nn.Module):
  def __init__(self, n_classes=10):

    super(IsotropicCNN, self).__init__()

    old_channels = [8, 16, 32, 64, 128]

    channels = [4*c for c in old_channels[:-1]] + [old_channels[-1]]

    ### BEGIN SOLUTION
    layers = [
            IsotropicConv2d(in_channels=1, out_channels=channels[0], bias=True),  
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            IsotropicConv2d(in_channels = channels[0], out_channels = channels[1], bias=True),
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(inplace=True),
            IsotropicConv2d(in_channels = channels[1], out_channels = channels[2], bias=True),
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(inplace=True),
            IsotropicConv2d(in_channels = channels[2], out_channels = channels[3], bias=True),
            torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(inplace=True),
            IsotropicConv2d(in_channels = channels[3], out_channels = channels[4], bias=True),
            torch.nn.MaxPool2d(kernel_size=3, stride=1, padding=0),
            torch.nn.ReLU(inplace=True),
            
    ]
    self.layers = torch.nn.Sequential(*layers)
    self.head = torch.nn.Linear(channels[4], n_classes)

    ### END SOLUTION

  def forward(self, input: torch.Tensor):
    
    ### BEGIN SOLUTION
    fts = self.layers(input)
    fts = fts.reshape(input.shape[0], -1)
    
    return self.head(fts)

    ### END SOLUTION


In [None]:
# Let's try our model

net = IsotropicCNN()

x = torch.randn(5, 1, 33, 33)

y = net(x)

assert y.shape == (5, 10)

# Let's check if the model is invariant!

gx = rotate(x, 1)

gy = net(gx)

assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)



You can now train and test your model.
With the default parameters you should achieve an accuracy of roughly **45-50%**.

In [None]:
model = IsotropicCNN()

model = train_model(model)

acc = test_model(model)
print(f'Test Accuracy: {acc :.3f}')


100%|██████████| 30/30 [38:31<00:00, 77.04s/it] 
782it [01:52,  6.93it/s]

Test Accuracy: 52.976





In [None]:
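# Let's check if the trained model is still invariant!
# (we test `model`, the instance that was just trained, and keep the input on its device)

x = torch.randn(5, 1, 33, 33, device=device)

y = model(x)

gx = rotate(x, 1)

gy = model(gx)

assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)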

### 3.4 Exercise: Analyse the results (5pt)
Do you notice any difference between the performance of the `IsotropicCNN` and the `C4CNN`?
What do you think causes this gap? Explain.


#### 3.4 Insert Your Solution Here:

> YOUR ANSWER

### 3.5 Batch Normalization (10 pt)

Batch normalization is a common module in most deep neural networks.
How can we define batch normalization so that it is equivariant?

Describe how an equivariant batch normalization can be constructed.
Then, implement a batch normalization layer.
Finally, build a new CNN model similar to `C4CNN` which uses your batch normalization after each convolution.
Train your model and test that it is still equivariant after training.


#### 3.5 Insert Your Solution Here:

> YOUR ANSWER

In [None]:
class C4CNNWithBatchNorm(torch.nn.Module):
  def __init__(self, n_classes=10):

    super(C4CNNWithBatchNorm, self).__init__()

    channels = [8, 16, 32, 64, 128]

    ### BEGIN SOLUTION
    layers = [
            LiftingConv2d(in_channels = 1, out_channels = channels[0], kernel_size = 3, padding=1, bias=True),
            torch.nn.InstanceNorm3d(num_features=channels[0]),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            GroupConv2d(in_channels = channels[0], out_channels = channels[1], kernel_size = 3, padding=1, bias=True),
            torch.nn.InstanceNorm3d(num_features=channels[1]),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[1], out_channels = channels[2], kernel_size = 3, padding=1, bias=True),
            torch.nn.InstanceNorm3d(num_features=channels[2]),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[2], out_channels = channels[3], kernel_size = 3, padding=1, bias=True),
            torch.nn.InstanceNorm3d(num_features=channels[3]),
            torch.nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            torch.nn.ReLU(inplace=True),
            GroupConv2d(in_channels = channels[3], out_channels = channels[4], kernel_size = 3, padding=1, bias=True),
            torch.nn.InstanceNorm3d(num_features=channels[4]),
            torch.nn.MaxPool3d(kernel_size=(4, 3, 3), stride=(1, 1, 1), padding=(0, 0, 0)),
            torch.nn.ReLU(inplace=True),
            
    ]
    self.layers = torch.nn.Sequential(*layers)
    self.head = torch.nn.Linear(channels[4], n_classes)

    ### END SOLUTION

  def forward(self, input: torch.Tensor):
    
    ### BEGIN SOLUTION
    fts = self.layers(input)
    fts = fts.reshape(input.shape[0], -1)

    return self.head(fts)

    ### END SOLUTION


# Let's try our model

net = C4CNNWithBatchNorm()

x = torch.randn(5, 1, 33, 33)

y = net(x)

assert y.shape == (5, 10)

# Let's check if the model is invariant!

gx = rotate(x, 1)

gy = net(gx)

assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)
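
The solution above uses `torch.nn.InstanceNorm3d`, which by default normalises each sample and channel with statistics computed over the last three axes, here (rotation, height, width). The NumPy sketch below illustrates why such a normalisation commutes with a rotation of the features. It is an illustration only, not the notebook's code: `rotate_p4` and `instance_norm` are helper names introduced for this example, and the rotation is assumed to act by a spatial rotation combined with a cyclic shift of the rotation axis.

```python
import numpy as np

def rotate_p4(f, r=1):
    # assumed action on p4 features: spatial rotation plus cyclic shift of the rotation axis
    spatial = np.rot90(f, k=r, axes=(-2, -1))
    return np.roll(spatial, shift=r, axis=-3)

def instance_norm(f, eps=1e-5):
    # statistics shared across the (rotation, height, width) axes of each sample and channel
    mean = f.mean(axis=(-3, -2, -1), keepdims=True)
    var = f.var(axis=(-3, -2, -1), keepdims=True)
    return (f - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
f = rng.normal(size=(2, 3, 4, 5, 5))  # (batch, channels, rotations, height, width)

lhs = instance_norm(rotate_p4(f))
rhs = rotate_p4(instance_norm(f))
equivariant = np.allclose(lhs, rhs)
print(equivariant)  # True
```

The rotation only permutes the entries within each (rotation, height, width) block, so the mean and variance are unchanged and the normalisation commutes with it. The same argument applies to a batch normalization whose statistics are additionally shared across the batch axis.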

You can now train and test your model.
With the default parameters you should achieve an accuracy of roughly **95%**.

In [None]:
model = C4CNNWithBatchNorm()

model = train_model(model)

acc = test_model(model)
print(f'Test Accuracy: {acc :.3f}')


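# Let's check if the trained model is still invariant!
# (we test `model`, the instance that was just trained, and keep the input on its device)
x = torch.randn(5, 1, 33, 33, device=device)
y = model(x)
gx = rotate(x, 1)
gy = model(gx)
assert torch.allclose(y, gy, atol=1e-5, rtol=1e-5)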



# 4 Bonus Question (15 pt)

This bonus question is not required to get 100% of the points.
Solving this question, however, will give you a few additional points.
We recommend solving this question only after you completed the rest of the notebook.

So far, we have only considered rotation equivariance.
At the beginning of the notebook, we defined the group $D_4$ of rotations and reflections.
Together with translations, this forms the group $p4m$.

In this exercise, you need to build a `D4CNN` model.
This model is similar to `C4CNN` but the group convolutions will also include reflections.
This means that, instead of $4$ rotational channels, the features will have $8$ channels (one for each of the $8$ elements of $D_4$, i.e. the $4$ rotations and their mirrored versions).
You should first implement the corresponding `D4LiftingConv2d` and `D4GroupConv2d` and then use them to build the `D4CNN`.
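
As a starting point, the $8$ elements of $D_4$ can be enumerated by their action on a small array. The sketch below is one possible parameterisation, not necessarily the one used earlier in the notebook. The helper `d4_transform`, the choice of mirroring along the last axis, and the ordering of the elements are all assumptions made for this example.

```python
import numpy as np

def d4_transform(x, flip, r):
    # hypothetical parameterisation: optional mirror, followed by r quarter-turns
    if flip:
        x = np.flip(x, axis=-1)
    return np.rot90(x, k=r, axes=(-2, -1))

elements = [(flip, r) for flip in (0, 1) for r in range(4)]

# an array with distinct entries, so that different elements give different results
x = np.arange(9).reshape(3, 3)
images = [d4_transform(x, flip, r) for flip, r in elements]
distinct = {img.tobytes() for img in images}
print(len(distinct))  # 8

# closure: composing any two transformations gives one of the same 8 results
closed = all(
    d4_transform(d4_transform(x, *a), *b).tobytes() in distinct
    for a in elements for b in elements
)
print(closed)  # True
```

Each of the $8$ pairs `(flip, r)` would index one of the $8$ channels of the features in a `D4CNN`.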

Finally, test that the model is equivariant to both rotations and translations.

