# Validating IcoCNN


## Used in sections 4.1 and 5.1 of the thesis

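This notebook runs the icosahedral validation task. Training loss and test accuracy are logged with TensorBoard. Results are stored in the Validation/ subdirectory (the code below writes its logs to Validate_rerun/).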

## General description of this notebook
The goal of this notebook is to check whether we can reproduce the results reported in "Gauge Equivariant Convolutional Networks and the Icosahedral CNN" by Cohen et al.

The results from the paper are listed below. They use the naming convention "training set type/test set type", where each type is one of {N, R, I}: N denotes no rotation, R a fully random rotation, and I a rotation from the symmetry group of the icosahedron.

Test set accuracies reported in the paper (averaged over 3 runs):

N/N: 99.43 <br>
N/I: 99.43 <br>
N/R: 69.99 <br>
I/I: 99.38 <br>
I/R: 66.26 <br>
R/R: 99.31 <br>

Our results (averaged over 3 runs):

N/N: 99.23 <br>
N/I: 99.34 <br>
N/R: 68.57 <br>
I/I: 99.27 <br>
I/R: 69.31 <br>
R/R: 99.37 <br>

In [5]:
# NN
print("NN",(99.32 + 99.33 + 99.03)/3)

# NI
print("NI",(99.41 + 99.32 + 99.30)/3)

# NR
print("NR",(69.00 + 70.72 + 66)/3)

# II
print("II",(99.17 + 99.29 + 99.36)/3)

# IR
print("IR",(72.35 + 68.18 + 67.4)/3)

# RR
print("RR",(99.29 + 99.44 + 99.38) / 3)

NN 99.22666666666665
NI 99.34333333333332
NR 68.57333333333334
II 99.27333333333333
IR 69.31
RR 99.37
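The averages above hide how much the individual runs differ. The following is a minimal sketch, using only the per-run accuracies from the cell above and the paper values listed at the top of this notebook, that also reports the sample standard deviation and the difference to the paper. The names `runs`, `paper` and `summarize` are introduced here for illustration only.

```python
from statistics import mean, stdev

# Per-run test accuracies, copied from the averaging cell above.
runs = {
    "N/N": [99.32, 99.33, 99.03],
    "N/I": [99.41, 99.32, 99.30],
    "N/R": [69.00, 70.72, 66.00],
    "I/I": [99.17, 99.29, 99.36],
    "I/R": [72.35, 68.18, 67.40],
    "R/R": [99.29, 99.44, 99.38],
}

# Averages from the paper, as listed at the top of this notebook.
paper = {"N/N": 99.43, "N/I": 99.43, "N/R": 69.99,
         "I/I": 99.38, "I/R": 66.26, "R/R": 99.31}


def summarize(runs, paper):
    """Return {setting: (mean, sample std, mean minus paper value)}."""
    rows = {}
    for setting, accs in runs.items():
        m = mean(accs)
        rows[setting] = (m, stdev(accs), m - paper[setting])
    return rows


summary = summarize(runs, paper)
for setting, (m, s, d) in summary.items():
    print(f"{setting}: mean={m:.2f} std={s:.2f} diff_to_paper={d:+.2f}")
```

With only three runs the standard deviation is a rough indicator at best, but it shows that the N/R and I/R settings vary far more between runs than the other four.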


## Network architecture
In the supplementary material of their paper Cohen et al. give details on their architecture. They state:
"Our main model consists of 7 convolution layers and 3 linear layers. The first layer is a scalar-to-regular gauge equivariant convolution layer and the following 6 layers are regular-to-regular layers. These layers have 8,16,16,24,24,32,64 output channels and stride 1, 2,1,2,1,2,1, respectively.

In between convolution layers, we use batch normalization and ReLU nonlinearities. When using batch normalization we average over groups of 6 feature maps, to make sure the operation is equivariant. [...].

After the convolution layers, we perform global pooling over spatial and orientation channels, yielding an invariant representation. We map these through 3 FC layers (with 64,32,10 channels) before applying softmax.

[...]

The models were trained for 60 epochs, or 1 epoch of the 60x augmented dataset (where each instance is transformed by each icosahedron symmetry $g \in \mathcal{I}$, or by a random rotation $g \in SO(3)$)."

The paper does not specify a batch size for this experiment. In the following, we try to reproduce their architecture as closely as possible.
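To make the quoted description easier to check against the implementation further down, here is a small sketch that tabulates the seven convolution layers. It uses only the channel counts and strides from the quote. The helper `layer_plan` and the "feature maps" column (channels times 6 orientations for regular features) are illustrative additions, not part of the paper or of the code in this notebook.

```python
# Channel counts and strides as quoted from the supplementary material.
OUT_CHANNELS = [8, 16, 16, 24, 24, 32, 64]
STRIDES = [1, 2, 1, 2, 1, 2, 1]
N_ORIENTATIONS = 6  # a regular feature consists of 6 orientation feature maps


def layer_plan(out_channels, strides, n_orient=N_ORIENTATIONS):
    """List each conv layer with its type, channels and cumulative stride."""
    plan = []
    in_ch, downsample = 1, 1  # the network input is a single scalar channel
    for idx, (out_ch, stride) in enumerate(zip(out_channels, strides), start=1):
        downsample *= stride
        plan.append({
            "layer": idx,
            "kind": "scalar-to-regular" if idx == 1 else "regular-to-regular",
            "in_channels": in_ch,
            "out_channels": out_ch,
            "stride": stride,
            "downsample": downsample,            # total spatial reduction so far
            "feature_maps": out_ch * n_orient,   # orientation copies included
        })
        in_ch = out_ch
    return plan


plan = layer_plan(OUT_CHANNELS, STRIDES)
for row in plan:
    print(row)
```

The three stride-2 layers reduce the spatial resolution by a factor of 8 in total before global pooling.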

In [1]:
# imports and definitions

import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch
import torch.utils.data as data_utils
from torch.utils.tensorboard import SummaryWriter

import gzip
import pickle

import os
from groupy.gconv.pytorch_gconv.p6_conv_axial import P6ConvZ2, P6ConvP6
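# BatchNorm3d normalizes each channel jointly over the orientation and spatial axes,
# so the 6 orientation feature maps of a regular feature share their statistics.
from torch.nn import BatchNorm3d as IcoBatchNorm2d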
from groupy.gconv.pytorch_gconv.pooling import plane_group_spatial_orientational_max_pooling

from icosahedron import Icosahedron, rand_rotation_icosahedron, rand_rotation_matrix, plot_voronoi, plot_voronoi_charts

from modules import g_padding_full

In [2]:
run_nr = 3
# select which transformations we want to be applied in the dataset that we want to open.
train_rot_type = "ico"
test_rot_type = "ico"
MNIST_PATH = "MNIST_data/sph_ico_mnist_train_{}_test_{}.gz".format(train_rot_type, test_rot_type)

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

if train_rot_type in ["ico", "rot"]:
    NUM_EPOCHS = 1
elif train_rot_type == "norot":
    NUM_EPOCHS = 60
else:    
    raise ValueError("Training set rotation type is not valid")
    
if test_rot_type in ["ico", "rot"] and train_rot_type =="norot":
    TEST_INTERVAL = 5 # in this case the test set is much bigger than the training set, so do not evaluate after every epoch
else:
    TEST_INTERVAL = 1
    
if test_rot_type not in ["ico", "rot", "norot"]:
    raise ValueError("Test set rotation type is not valid")    

# make sure that a MNIST file for the given configuration exists 
# and that we haven't run any experiments for this config yet    
if os.path.isdir('Validate_rerun/train_{}_test_{}_run_{}'.format(train_rot_type, test_rot_type, run_nr)):
    raise ValueError("A directory for this run already exists")
if not os.path.isfile(MNIST_PATH):
    raise ValueError("No MNIST file with the given specifications exists.")
    
BATCH_SIZE = 128
LEARNING_RATE = 5e-3  # optional: the training cell below uses Adam's default learning rate unless this value is passed explicitly
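The choice of `NUM_EPOCHS` above follows the paper's "60 epochs, or 1 epoch of the 60x augmented dataset". As a rough sanity check, the sketch below compares the number of gradient steps in both regimes. It assumes the standard MNIST training split of 60,000 images, an augmented training set that is exactly 60 times larger, and a data loader that keeps the last incomplete batch. The helper `gradient_steps` is illustrative.

```python
import math

MNIST_TRAIN_SIZE = 60_000   # standard MNIST training split
AUGMENTATION_FACTOR = 60    # assumed: one copy per icosahedral symmetry or random rotation
BATCH_SIZE = 128            # same value as in the configuration cell


def gradient_steps(n_samples, batch_size, n_epochs):
    """Optimizer steps when the last, possibly smaller, batch is kept."""
    return math.ceil(n_samples / batch_size) * n_epochs


steps_norot = gradient_steps(MNIST_TRAIN_SIZE, BATCH_SIZE, n_epochs=60)
steps_augmented = gradient_steps(
    MNIST_TRAIN_SIZE * AUGMENTATION_FACTOR, BATCH_SIZE, n_epochs=1)

print("norot, 60 epochs:  ", steps_norot)
print("augmented, 1 epoch:", steps_augmented)
```

Under these assumptions both regimes perform nearly the same number of updates, and the augmented count matches the `28125` iterations per epoch shown in the training log at the end of this notebook.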

In [3]:
# define some helper layers.  
class icoStridedP6ConvP6(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(icoStridedP6ConvP6, self).__init__()
        self.conv = P6ConvP6(in_channels = in_channels, 
            out_channels=out_channels, 
            kernel_size=3,
            stride=2)


    def forward(self, x):
        """
        Because we have g-padding the strided convolution is not trivial. 
        We need to add rows in order to maintain the right shape. 
        We do this by adding one row at the bottom of each chart. Afterwards we also need to g_pad the results.
        Assume x has shape (batchsize, n_channels, n_stabilizer, n_charts*height, width)
        """
        
        x = self.conv(x[...,1:,:])
        x = F.pad(x,(1,1,1,0)) # pad one column on the left and right and one row at the top of the image combining the 5 charts
        x = x.view(x.shape[0], x.shape[1], x.shape[2], 5, -1, x.shape[-1])
        x = F.pad(x,(0,0,0,1))
        x = x.view(x.shape[0], x.shape[1], x.shape[2], -1, x.shape[-1])
        return x
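The slicing and padding in `forward` is easier to follow with concrete numbers. The sketch below redoes only the shape arithmetic of the strided block in plain Python, without any tensors. The chart size of 16 x 32 pixels is a hypothetical example and not a value taken from this notebook, and the helpers `conv_out` and `strided_block_shape` are illustrative.

```python
def conv_out(n, kernel=3, stride=1, padding=0):
    """Output length of a 1D convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def strided_block_shape(rows, cols, n_charts=5):
    """Mirror the shape changes of icoStridedP6ConvP6.forward.

    rows = n_charts * (h + 2) and cols = w + 2, i.e. both already
    include the one-pixel g-padding border around every chart.
    """
    rows = conv_out(rows - 1, stride=2)   # x[..., 1:, :] drops the top row, then conv
    cols = conv_out(cols, stride=2)
    rows, cols = rows + 1, cols + 2       # F.pad(x, (1, 1, 1, 0))
    assert rows % n_charts == 0, "rows must split evenly into charts"
    rows_per_chart = rows // n_charts + 1  # F.pad(x, (0, 0, 0, 1)) per chart
    return n_charts * rows_per_chart, cols


h, w = 16, 32                      # hypothetical chart size (assumption)
shape = (5 * (h + 2), w + 2)       # padded input of the first strided layer
shapes = [shape]
for _ in range(3):                 # the network contains three strided layers
    shape = strided_block_shape(*shape)
    shapes.append(shape)
print(shapes)
```

For even `h` and `w`, each strided layer maps padded charts of size `(h + 2, w + 2)` to `(h/2 + 2, w/2 + 2)`, so the output again has room for a one-pixel border that the next `g_padding_full` call can fill.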

In [4]:
# function to give dataloaders and datasets

def load_data(path, batch_size):

    with gzip.open(path, 'rb') as f:
        dataset = pickle.load(f)

    train_data = torch.from_numpy(
        dataset["train"]["images"][:, None, :, :].astype(np.float32))
    train_labels = torch.from_numpy(
        dataset["train"]["labels"].astype(np.int64))

    train_dataset = data_utils.TensorDataset(train_data, train_labels)
    train_loader = data_utils.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    test_data = torch.from_numpy(
        dataset["test"]["images"][:, None, :, :].astype(np.float32))
    test_labels = torch.from_numpy(
        dataset["test"]["labels"].astype(np.int64))

    test_dataset = data_utils.TensorDataset(test_data, test_labels)
    test_loader = data_utils.DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

    return train_loader, test_loader, train_dataset, test_dataset
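`load_data` expects a gzip-compressed pickle holding a dictionary with `"train"` and `"test"` entries, each containing `"images"` and `"labels"`. Before starting a long run it can help to confirm that a file has this layout. The following sketch does that with the standard library only. The helper `describe_dataset` and the toy file are illustrative and use nested lists in place of the NumPy arrays in the real files. Since unpickling can execute arbitrary code, it should only be used on files from a trusted source.

```python
import gzip
import os
import pickle
import tempfile


def describe_dataset(path):
    """Return {split: {key: number of entries}} for a gzip-pickled dataset."""
    with gzip.open(path, "rb") as f:
        dataset = pickle.load(f)
    return {split: {key: len(value) for key, value in content.items()}
            for split, content in dataset.items()}


# Tiny synthetic stand-in with the same nesting as the real dataset files.
toy = {
    "train": {"images": [[[0.0]]] * 4, "labels": [0, 1, 2, 3]},
    "test": {"images": [[[0.0]]] * 2, "labels": [4, 5]},
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.gz")
    with gzip.open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    layout = describe_dataset(toy_path)

print(layout)
```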

In [5]:
# reimplement the architecture from the paper:

class icoNet_original(nn.Module):

    def __init__(self):
        super(icoNet_original, self).__init__()        
        conv_n_out_channels = [8,16,16,24,24,32,64]
        fc_n_out_channels = [64,32,10]
        
        self.conv1 = P6ConvZ2(1, 
            out_channels=conv_n_out_channels[0], 
            kernel_size=3,
            padding=1)
        self.BN1 = IcoBatchNorm2d(conv_n_out_channels[0])

        self.conv2 = icoStridedP6ConvP6(in_channels=conv_n_out_channels[0],
            out_channels=conv_n_out_channels[1])
        self.BN2 = IcoBatchNorm2d(conv_n_out_channels[1])

        
        self.conv3 = P6ConvP6(in_channels=conv_n_out_channels[1],
            out_channels=conv_n_out_channels[2],
            kernel_size=3,
            padding=1)
        self.BN3 = IcoBatchNorm2d(conv_n_out_channels[2])
        
        self.conv4 = icoStridedP6ConvP6(in_channels=conv_n_out_channels[2],
            out_channels=conv_n_out_channels[3])
        self.BN4 = IcoBatchNorm2d(conv_n_out_channels[3])

        self.conv5 = P6ConvP6(in_channels=conv_n_out_channels[3],
            out_channels=conv_n_out_channels[4],
            kernel_size=3,
            padding=1)
        self.BN5 = IcoBatchNorm2d(conv_n_out_channels[4])
        
        self.conv6 = icoStridedP6ConvP6(in_channels=conv_n_out_channels[4],
            out_channels=conv_n_out_channels[5])
        self.BN6 = IcoBatchNorm2d(conv_n_out_channels[5])
        
        self.conv7 = P6ConvP6(in_channels=conv_n_out_channels[5],
            out_channels=conv_n_out_channels[6],
            kernel_size=3,
            padding=1)
        self.BN7 = IcoBatchNorm2d(conv_n_out_channels[6])
        
        self.FC1 = nn.Linear(conv_n_out_channels[6], fc_n_out_channels[0])
        self.BN_FC1 = nn.BatchNorm1d(fc_n_out_channels[0])
        self.FC2 = nn.Linear(fc_n_out_channels[0], fc_n_out_channels[1])
        self.BN_FC2 = nn.BatchNorm1d(fc_n_out_channels[1])
        
        # this is the final layer before the output
        self.FC3 = nn.Linear(fc_n_out_channels[1], fc_n_out_channels[2])
    

    def forward(self, x):
        """ Assume input has shape (batchsize, 1, n_charts*height_chart, width), as produced by load_data"""
        
        # first we need to pad the input with zeros, to have the right shape to apply g-padding
        x = x.view(x.shape[0], 1, 5, -1, x.shape[-1])
        x = F.pad(x,(1,1,1,1))
        x = x.view(x.shape[0], 1, 1, -1, x.shape[-1])
        
        
        #convolution 1
        g_padding_full(x, in_stab_size=1) # modifies x
        x = self.conv1(x)
        x = F.relu(self.BN1(x))
        # print("layer 1:", x.shape)        
        #convolution 2
        g_padding_full(x, in_stab_size=6)
        x = self.conv2(x)
        # print(x.shape)
        x = F.relu(self.BN2(x))
        # print("layer 2:", x.shape) 
        
        #convolution 3
        g_padding_full(x, in_stab_size=6)
        x = self.conv3(x)
        x = F.relu(self.BN3(x))
        # print("layer 3:", x.shape) 
        
        #convolution 4
        g_padding_full(x, in_stab_size=6)
        x = self.conv4(x)
        x = F.relu(self.BN4(x))
        # print("layer 4:", x.shape) 
        
        #convolution 5
        g_padding_full(x, in_stab_size=6)
        x = self.conv5(x)
        x = F.relu(self.BN5(x))
        # print("layer 5:", x.shape) 
        
        #convolution 6
        g_padding_full(x, in_stab_size=6)
        x = self.conv6(x)
        x = F.relu(self.BN6(x))
        # print("layer 6:", x.shape) 
        
        #convolution 7
        g_padding_full(x, in_stab_size=6)
        x = self.conv7(x)
        x = F.relu(self.BN7(x))
        # print("layer 7:", x.shape) 
        
        # pool over orientations and space
        g_padding_full(x, in_stab_size=6)
        x = plane_group_spatial_orientational_max_pooling(x)       
        
        # FC1:
        x = F.relu(self.BN_FC1(self.FC1(x)))
        
        # FC2:
        x = F.relu(self.BN_FC2(self.FC2(x)))
        
        # FC3: (final layer, so no BN and ReLU)
        x = self.FC3(x)
        
        return x
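As a partial cross-check of the model size, the parameters of the fully connected head and of the batch normalization layers can be counted by hand. The sketch below does this in plain Python, assuming the PyTorch defaults (a bias in every `nn.Linear`, a learnable scale and shift in every batch norm layer). The total of 232089 is the value printed by the training cell further down. The remainder attributed to the convolution layers is obtained by subtraction only and is not derived from the convolution code itself.

```python
CONV_CHANNELS = [8, 16, 16, 24, 24, 32, 64]
FC_CHANNELS = [64, 32, 10]
TOTAL_PARAMS = 232_089  # printed as "#params" by the training cell


def linear_params(n_in, n_out):
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * n_features


# Input sizes of FC1..FC3: the last conv channel count, then the FC outputs.
fc_inputs = [CONV_CHANNELS[-1]] + FC_CHANNELS[:-1]
head = sum(linear_params(i, o) for i, o in zip(fc_inputs, FC_CHANNELS))
head += sum(batchnorm_params(c) for c in FC_CHANNELS[:-1])  # BN_FC1, BN_FC2

conv_bn = sum(batchnorm_params(c) for c in CONV_CHANNELS)   # BN1..BN7
conv_weights = TOTAL_PARAMS - head - conv_bn

print("FC head:             ", head)
print("conv batch norm:     ", conv_bn)
print("conv layers (by subtraction):", conv_weights)
```

By this count, nearly all parameters sit in the group convolution layers, while the head and the normalization layers together contribute only a few thousand.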

In [6]:
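# NOTE: the loop variable overrides the run_nr set in the configuration cell,
# so the directory check there and the log directory here can refer to different runs.
for run_nr in range(0,1):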

    print("Load data")

    train_loader, test_loader, train_dataset, test_dataset = load_data(
        MNIST_PATH, BATCH_SIZE)

    classifier = icoNet_original()
    classifier.to(DEVICE)

    running_loss = 0.0
    writer = SummaryWriter('Validate_rerun/train_{}_test_{}_run_{}'.format(train_rot_type, test_rot_type, run_nr))

    print("Set up model")
    print("#params", sum(x.numel() for x in classifier.parameters()))

    criterion = nn.CrossEntropyLoss()
    criterion = criterion.to(DEVICE)

    optimizer = torch.optim.Adam(classifier.parameters())

    print("Start training")
    for epoch in range(NUM_EPOCHS):
        for i, (images, labels) in enumerate(train_loader):
            classifier.train()

            images = images.to(DEVICE)
            labels = labels.to(DEVICE)

            optimizer.zero_grad()
            outputs = classifier(images)
            loss = criterion(outputs, labels)
            loss.backward()

            optimizer.step()


            running_loss += loss.item()
            if i % 100 == 99:    # every 100 mini-batches...
                # ...log the running loss
                print('\rEpoch [{0}/{1}], Iter [{2}/{3}] Loss: {4:.4f}'.format(
                    epoch+1, NUM_EPOCHS, i+1, len(train_loader),
                    running_loss/100), end="")
                writer.add_scalar('training loss',
                                running_loss / 100,
                                epoch * len(train_loader) + i)
                running_loss = 0
        running_loss = 0
        print("\n")

        if epoch % TEST_INTERVAL == TEST_INTERVAL-1:
            correct = 0
            total = 0
            for images, labels in test_loader:

                classifier.eval()

                with torch.no_grad():
                    images = images.to(DEVICE)
                    labels = labels.to(DEVICE)

                    outputs = classifier(images)
                    _, predicted = torch.max(outputs, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).long().sum().item()

            writer.add_scalar('test accuracy',
                              100 * correct / total,
                              epoch)
            print('Test Accuracy: {0}'.format(100 * correct / total))
            
    del test_loader
    del train_loader
    del test_dataset
    del train_dataset
    del classifier

Load data
Set up model
#params 232089
Start training
Epoch [1/1], Iter [28100/28125] Loss: 0.0010

Test Accuracy: 99.17
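The training loop evaluates on the test set only after some epochs and logs the loss against a global step counter. The sketch below isolates these two pieces of bookkeeping so they can be checked without training a model. The function names `evaluation_epochs` and `global_step` are illustrative, while the expressions mirror the ones used in the loop above.

```python
def evaluation_epochs(num_epochs, test_interval):
    """0-based epochs after which the loop above evaluates on the test set."""
    return [epoch for epoch in range(num_epochs)
            if epoch % test_interval == test_interval - 1]


def global_step(epoch, batch_idx, batches_per_epoch):
    """Step index used for the 'training loss' scalar in TensorBoard."""
    return epoch * batches_per_epoch + batch_idx


# Training without rotations, testing on rotated data: 60 epochs, interval 5.
norot_schedule = evaluation_epochs(60, 5)
# Training on an augmented set: a single epoch, evaluated once at the end.
augmented_schedule = evaluation_epochs(1, 1)

print(norot_schedule)
print(augmented_schedule)
print(global_step(0, 28099, 28125))  # last logged step of the run shown above
```

Because the condition is `epoch % TEST_INTERVAL == TEST_INTERVAL - 1`, the final epoch is always evaluated whenever `NUM_EPOCHS` is a multiple of `TEST_INTERVAL`, which holds for both configurations used here.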


In [7]:
writer.close()