## Permutation Group (1pt)
Consider the permutation group S(n).
A transposition, written (a,b), swaps the elements a and b; e.g., (1,3) maps the sequence 1,2,3 to 3,2,1.
Show that the set of transpositions (1,2), ..., (1,n) generates S(n).
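
A brute-force check for small n does not replace the proof, but it can serve as a sanity check on the claim. The sketch below is only an illustration: the helper names `transposition`, `compose`, and `generated_group` are hypothetical, and permutations are stored as tuples `p` where `p[i-1]` is the image of `i`.

```python
from itertools import permutations


def transposition(a, b, n):
    """Return the transposition (a,b) in S(n) as a tuple of images of 1..n."""
    p = list(range(1, n + 1))
    p[a - 1], p[b - 1] = b, a
    return tuple(p)


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i] - 1] for i in range(len(q)))


def generated_group(generators):
    """Close the identity under left multiplication by the generators.

    For a finite group this yields the whole generated subgroup,
    because every inverse is a positive power of the element itself.
    """
    n = len(generators[0])
    identity = tuple(range(1, n + 1))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


# The example from the text: (1,3) maps 1,2,3 to 3,2,1.
assert transposition(1, 3, 3) == (3, 2, 1)

n = 4
star = [transposition(1, k, n) for k in range(2, n + 1)]  # (1,2), (1,3), (1,4)
group = generated_group(star)
assert group == set(permutations(range(1, n + 1)))
print(len(group))  # 24 = 4!
```

The check only covers one fixed n; the exercise asks for an argument that works for every n.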

YOUR ANSWER HERE

## Group Action (2pts):

There are three sticks, and three coloured disks with holes in them located on the sticks (see sketch).

![sketch](sketch.png)

Take the symmetric group S(3) and let a transposition (a, b) act on the configuration as "take a disk from stick a and move it to stick b".
**a)** Does this result in a valid group action? 
**b)** What if the action is instead "exchange disks A <-> B"?

YOUR ANSWER HERE

Can you define a group action that corresponds to rearranging the disks? How many dimensions does it have?
(It is not necessary to write out _all_ matrices; a description of how to construct them is sufficient.)

YOUR ANSWER HERE

#### Hint:
This exercise is deliberately somewhat vague and uses a weird example. There are multiple, potentially contradictory, ways to answer this. The important part is not the answer itself, but how it is justified.

## Two-dimensional isometries (1pt)
Consider the group of translations and rotations in two dimensions. Using the trick presented in the lecture, write down a faithful group representation. Let (theta, (u, v)) correspond to a rotation by angle theta, followed by a shift by (u, v).
Write down R(theta, (u, v)). Show that your matrix effects the right transformation when applied to a vector, and that the product of two such matrices corresponds to the composition of the group elements.
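
To test a candidate matrix numerically, it can help to have a reference for how the group elements themselves act and compose, independent of any matrix form. The sketch below is one such reference, not the requested representation. It assumes counterclockwise rotation about the origin, and the helper names `apply` and `compose` are hypothetical.

```python
import math


def apply(g, p):
    """Apply g = (theta, (u, v)) to the point p: rotate by theta, then shift by (u, v)."""
    theta, (u, v) = g
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + u, s * x + c * y + v)


def compose(g2, g1):
    """Return the group element 'apply g1 first, then g2'.

    Angles add; the shift of g1 is rotated by the angle of g2
    before the shift of g2 is added.
    """
    theta2, (u2, v2) = g2
    theta1, t1 = g1
    ru, rv = apply((theta2, (0.0, 0.0)), t1)  # pure rotation of g1's shift
    return (theta1 + theta2, (ru + u2, rv + v2))


g1 = (math.pi / 2, (1.0, 0.0))
g2 = (math.pi / 6, (0.0, 2.0))
p = (1.0, 1.0)

step_by_step = apply(g2, apply(g1, p))
composed = apply(compose(g2, g1), p)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(step_by_step, composed))
```

A matrix R(theta, (u, v)) from your answer should reproduce `apply` when multiplied with a suitably encoded vector, and the product of two such matrices should match `compose`.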

YOUR ANSWER HERE

## Group Smoothing (2pts)

In [None]:
!pip install torch torchvision

In [None]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

Load the CIFAR-10 data. We are interested in the data-scarce regime, so we subsample the training set down to just two percent of the original data and hold out another two percent as a validation set.

In [None]:
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
batch_size = 8
trainset = torchvision.datasets.CIFAR10(root='/coursedata', train=True,
                                      download=True, transform=transform)
trainset, valset, _ = torch.utils.data.random_split(trainset, [0.02, 0.02, 0.96], generator=torch.Generator().manual_seed(42))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True)

valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size,
                                        shuffle=False)

testset = torchvision.datasets.CIFAR10(root='/coursedata', train=False,
                                     download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False)

Define a simple MLP model.

In [None]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32*32*3, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Define a training loop with early stopping based on the validation set.

In [None]:
def evaluate(net, dataloader):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in dataloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct, total

def train_and_evaluate(net):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.002, momentum=0.9)
    best = 0
    patience = 0

    for epoch in range(100):  
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):  # loop over the dataset
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        correct, total = evaluate(net, valloader)        
        # criterion returns the mean loss per batch, so average over the number of batches
        print(f'[{epoch + 1}] loss: {running_loss / len(trainloader):.3f} eval: {100 * correct / total:.2f}')
        running_loss = 0.0
        
        if correct < best:
            patience += 1
            if patience > 10:
                # early stop
                break
        else:
            torch.save(net.state_dict(), "ckp")
            best = correct
            patience = 0
    
    # reload best weights
    net.load_state_dict(torch.load("ckp", weights_only=True))

Train a standard MLP

In [None]:
net = MLP()
train_and_evaluate(net)

Fill in the code below to implement a model wrapper that takes an arbitrary PyTorch model and returns a group-smoothed version of it for the D4 group (rotations and flips).

In [None]:
class D4Net(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        ### YOUR CODE HERE
        return self.backbone(x)
        ### YOUR CODE HERE

Check that the resulting network is in fact invariant.

In [None]:
fake_image_a = torch.zeros((1, 3, 32, 32))
fake_image_a[0, 0, 0, 1] = 1.0

fake_image_b = torch.zeros((1, 3, 32, 32))
fake_image_b[0, 0, 0, 30] = 1.0

assert(np.allclose(D4Net(net)(fake_image_a).detach().numpy(), D4Net(net)(fake_image_b).detach().numpy()))

fake_image_b = torch.zeros((1, 3, 32, 32))
fake_image_b[0, 0, 30, 0] = 1.0

assert(np.allclose(D4Net(net)(fake_image_a).detach().numpy(), D4Net(net)(fake_image_b).detach().numpy()))
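
The check above works because the single lit pixels of the fake images all lie in the same D4 orbit, so an invariant network must give them identical outputs. The plain-Python sketch below illustrates this on pixel coordinates only and is not the requested PyTorch implementation. The helper names `rot90`, `hflip`, and `d4_orbit` are hypothetical, and the rotation direction is an arbitrary choice that does not affect the orbit.

```python
def rot90(p, n):
    """Rotate pixel coordinate (row, col) by 90 degrees on an n x n grid."""
    r, c = p
    return (n - 1 - c, r)


def hflip(p, n):
    """Mirror pixel coordinate (row, col) left-to-right on an n x n grid."""
    r, c = p
    return (r, n - 1 - c)


def d4_orbit(p, n):
    """Return the images of p under all 8 elements of D4 (4 rotations, each with or without a flip)."""
    orbit = set()
    q = p
    for _ in range(4):
        orbit.add(q)
        orbit.add(hflip(q, n))
        q = rot90(q, n)
    return orbit


orbit = d4_orbit((0, 1), 32)
assert (0, 30) in orbit   # lit pixel of the first fake_image_b
assert (30, 0) in orbit   # lit pixel of the second fake_image_b
print(sorted(orbit))
```

A pixel on a symmetry axis or at the grid centre would have a smaller orbit; the pixel (0, 1) is generic, so its orbit has all 8 elements.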

Evaluate both the original model and its group-smoothed version.

In [None]:
c, t = evaluate(net, testloader)
c4, t4 = evaluate(D4Net(net), testloader)
print(f"Accuracy: {100 * c / t:.2f}% vs {100 * c4 / t4:.2f}%")

Now train a group-smoothed net from scratch.

In [None]:
d4net = D4Net(MLP())
train_and_evaluate(d4net)

And evaluate this, too.

In [None]:
c4, t4 = evaluate(d4net, testloader)
print(f"Accuracy: {100 * c / t:.2f}% vs {100 * c4 / t4:.2f}%")

What is your interpretation of this result? Does it contradict the lecture?

YOUR ANSWER HERE