Assignment 12: Radial Basis Function Network
============================================


Microsoft Forms Document: https://forms.office.com/r/Zyre1SxphD


In our experiments, we will apply a radial basis function (RBF) layer as the second-to-last layer of our classification network and try to learn two-dimensional deep feature representations for the MNIST dataset that allow us to classify the digits into 10 different classes.

We will use three different parameters here:

* $K$ is the dimensionality of the deep features that feed into the RBF layer. We will use $K=2$ in our experiments to be able to visualize the computed feature space.
* $R$ is the number of basis functions, i.e., the output dimension of the RBF layer.
* $O$ is the number of outputs. Since we will work with MNIST, $O=10$ in our experiments.


Task 1: Dataset
---------------

We will make use of the default implementation of the MNIST dataset in `torchvision`.
As usual, we will need the training and validation set splits of MNIST, including data loaders.
Select appropriate batch sizes for the training and validation sets.

In [None]:
import torch
import torchvision

# training set and data loader
train_loader = ...

# validation set and data loader
validation_loader = ...

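# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")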

Task 2: Radial Basis Function Layer
-----------------------------------
Implement the RBF layer as a `torch.nn.Module`.
In `__init__`, store the weight matrix (a.k.a. the basis functions) as a `torch.nn.Parameter`, and initialize it randomly with values in the range $[-2,2]$.
In `forward`, compute and return the distances of the input to all *basis functions*, i.e., the vectors stored in our weight matrix.

In [None]:
class RBFLayer(torch.nn.Module):
  def __init__(self, K, R):
    # call base class constructor
    super(RBFLayer, self).__init__()
    # store a parameter for the basis functions
    self.W = ...
    # initialize the matrix between -2 and 2
    ...

  def forward(self, x):
    # collect the required shape parameters, B, R, K
    B, R, K = ...
    # Bring the weight matrix of shape R,K to size B,R,K by adding batch dimension (B, dim 0)
    W = ...
    # Bring the input matrix of shape B,K to size B,R,K by adding R dimension (dim=1)
    X = ...
    # compute the activation, i.e., the distance of each input to each basis function (shape B,R)
    A = ...
    return A
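
The broadcasting steps described in the comments above could look roughly like the following sketch. It assumes that the distance is the Euclidean distance and that the weight matrix has shape `(R, K)`; the class name `SketchRBFLayer` is made up to keep it apart from the template.

```python
import torch

class SketchRBFLayer(torch.nn.Module):
    def __init__(self, K, R):
        super().__init__()
        # R basis functions of dimension K, drawn uniformly from [-2, 2]
        self.W = torch.nn.Parameter(torch.empty(R, K).uniform_(-2.0, 2.0))

    def forward(self, x):
        B, K = x.shape
        R = self.W.shape[0]
        # (R, K) -> (B, R, K): add a batch dimension in front
        W = self.W.unsqueeze(0).expand(B, R, K)
        # (B, K) -> (B, R, K): add a basis-function dimension in the middle
        X = x.unsqueeze(1).expand(B, R, K)
        # assumption: Euclidean distance, reduced over the feature dimension -> (B, R)
        return torch.sqrt(torch.sum((X - W) ** 2, dim=2))
```

Note that the gradient of the square root is undefined at a distance of exactly zero, so returning squared distances (and adapting the activation accordingly) may be the numerically safer variant.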

Task 3: Radial Basis Function Activation
----------------------------------------
The activation function also requires a `Parameter`, i.e., the standard deviations of the Gaussian.
Hence, we need to implement the activation function also as a `torch.nn.Module`.
Here, we are treating the denominator of the Gaussian as a separate variable: `sigma2 = 2*sigma*sigma`.

Implement the activation function with learnable `sigma2` parameters.
Initialize all `sigma2` parameters with the value of 1.
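
Written out, and assuming that $a_r$ denotes the (unsquared) Euclidean distance between the input and the basis function $\vec w_r$, the Gaussian activation with the substituted denominator would read as follows. If the layer returns squared distances instead, the square in the numerator is dropped.

$$h_r = \exp\left(-\frac{a_r^2}{2\sigma_r^2}\right) = \exp\left(-\frac{a_r^2}{\mathtt{sigma2}_r}\right)$$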

In [None]:
class RBFActivation(torch.nn.Module):
  def __init__(self, R):
    # call base class constructor
    super(RBFActivation, self).__init__()
    # store a parameter for the sigma2 values (one per basis function)
    self.sigma = ...
  
  def forward(self, x):
    # implement the RBF activation function
    return ...
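
A minimal sketch of such an activation module is given below. It assumes that the input holds unsquared distances of shape `(B, R)`, and the class name `SketchRBFActivation` is hypothetical.

```python
import torch

class SketchRBFActivation(torch.nn.Module):
    def __init__(self, R):
        super().__init__()
        # one learnable denominator sigma2 = 2*sigma*sigma per basis function, initialized to 1
        self.sigma2 = torch.nn.Parameter(torch.ones(R))

    def forward(self, a):
        # assumption: a contains unsquared distances; sigma2 of shape (R,) broadcasts over the batch
        return torch.exp(-a ** 2 / self.sigma2)
```

Nothing in this sketch keeps `sigma2` positive during training; if that turns out to be a problem, clamping or a different parameterization would be needed.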

Test 1: RBF Layer and Activation
--------------------------------

Instantiate an RBF layer and an RBF activation function for $K=4$ and $R=10$.
Generate a random batch of size $B=12$.
Call both the RBF layer and the activation on the batch.
Make sure that the resulting output is of shape $B\times R$, i.e., one value per sample and basis function.

In [None]:
# instantiate layer and activation
test_layer = ...
test_activation = ...

# create test data batch
test_data = ...

# forward test data through the layer and the activation
a = ...
h = ...

# check that the shape is correct
...
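
As an independent reference for the expected shapes, the same computation can be sketched with `torch.cdist`, which returns pairwise Euclidean distances. The tensors `basis` and `sigma2` below are stand-ins for the parameters of the layer and the activation, not the assignment's actual modules.

```python
import torch

B, K, R = 12, 4, 10
# random test batch of B samples with K features each
test_data = torch.randn(B, K)
# stand-ins for the learnable parameters
basis = torch.empty(R, K).uniform_(-2.0, 2.0)
sigma2 = torch.ones(R)

# pairwise Euclidean distances between samples and basis functions -> (B, R)
a = torch.cdist(test_data, basis)
# Gaussian activation, assuming unsquared distances as input -> (B, R)
h = torch.exp(-a ** 2 / sigma2)

assert a.shape == (B, R)
assert h.shape == (B, R)
```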

Task 4: Radial Basis Function Network
-------------------------------------

As the network, we rely on our convolutional network from Assignment 8.
However, this time we add an RBF layer and its activation between the first and the second fully-connected layer.
We will return both the deep features of dimension $K$ and the logits of dimension $O$.
Note that the processing will happen on batch level.

In [None]:
class RBFNetwork(torch.nn.Module):
  def __init__(self, Q1, Q2, K, R, O):
    # call base class constructor
    super(RBFNetwork,self).__init__()
    # define convolutional layers
    self.conv1 = ...
    self.conv2 = ...
    # pooling and activation functions will be re-used for the different stages
    self.pool = ...
    self.act = ...
    # define first fully-connected layer
    self.flatten = ...
    self.fc1 = ...
    # define RBF layer and its activation
    self.rbf_layer = ...
    self.rbf_activation = ...
    # define second fully-connected layer
    self.fc2 = ...
  
  def forward(self,x):
    # get the deep feature layer as the output of the first fully-connected layer
    deep_feature = ...
    # apply the RBF layer and activation
    ...
    # apply the last fully-connected layer to obtain the logits
    logits = ...
    # return both the logits and the deep features
    return logits, deep_feature
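
The overall data flow might be sketched as follows. The architecture of Assignment 8 is not given here, so the $5\times5$ kernels with padding 2, the $2\times2$ max pooling and the ReLU activation are assumptions. To stay self-contained, the RBF layer and its activation are inlined via `torch.cdist` instead of using separate modules, and the class name `SketchRBFNetwork` is hypothetical.

```python
import torch

class SketchRBFNetwork(torch.nn.Module):
    def __init__(self, Q1, Q2, K, R, O):
        super().__init__()
        # assumed architecture: two 5x5 convolutions that preserve the spatial size
        self.conv1 = torch.nn.Conv2d(1, Q1, kernel_size=5, padding=2)
        self.conv2 = torch.nn.Conv2d(Q1, Q2, kernel_size=5, padding=2)
        self.pool = torch.nn.MaxPool2d(2)
        self.act = torch.nn.ReLU()
        self.flatten = torch.nn.Flatten()
        # 28x28 input -> 14x14 -> 7x7 after two pooling steps
        self.fc1 = torch.nn.Linear(7 * 7 * Q2, K)
        # inlined RBF parameters: basis functions (R, K) and denominators (R,)
        self.W = torch.nn.Parameter(torch.empty(R, K).uniform_(-2.0, 2.0))
        self.sigma2 = torch.nn.Parameter(torch.ones(R))
        self.fc2 = torch.nn.Linear(R, O)

    def forward(self, x):
        a = self.act(self.pool(self.conv1(x)))
        a = self.act(self.pool(self.conv2(a)))
        # deep features of dimension K
        deep_feature = self.fc1(self.flatten(a))
        # distances to all basis functions, then Gaussian activation -> (B, R)
        h = torch.exp(-torch.cdist(deep_feature, self.W) ** 2 / self.sigma2)
        # logits of dimension O
        logits = self.fc2(h)
        return logits, deep_feature
```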


Task 5: Training and Validation Loop
------------------------------------

The training and validation loop is as usual.
Instantiate the network with $Q_1=32$, $Q_2=64$, $K=2$, $R=100$ and $O=10$.
Instantiate loss function and optimizer.
Train the network on the training set.
Compute the validation set accuracy after each epoch of training.

Hint: The validation set accuracy after the first epoch should be more than 80%. 
If it is much lower, increase the learning rate and/or change the optimizer.
Conversely, if the accuracy is stuck around 10% and does not change over the epochs, reduce the learning rate.

In [None]:
network = ...
optimizer = ...
loss = ...

for epoch in range(20):
  for x,t in train_loader:
    # train network with the current batch
    ...

  # compute validation set accuracy
  correct = 0
  with torch.no_grad():
    for x,t in validation_loader:
      # compute accuracy for current batch
      ...

  print(f"Epoch {epoch+1}; validation accuracy: {correct/...:1.4f}")
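
For the per-batch accuracy, one common approach is to compare the index of the largest logit with the target. The helper `count_correct` and the toy tensors below are illustrative only.

```python
import torch

def count_correct(logits, targets):
    """Number of samples whose highest-scoring class equals the target."""
    predictions = logits.argmax(dim=1)
    return (predictions == targets).sum().item()

# toy batch with 3 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 0.1, 1.5],
                       [0.9, 1.0, 0.2]])
targets = torch.tensor([0, 2, 0])
correct = count_correct(logits, targets)  # the first two samples are classified correctly
```

Summing these counts over all validation batches and dividing by the number of validation samples yields the accuracy.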

Task 6: Deep Feature Extraction
-------------------------------

Extract the deep feature representations from the validation set.
Separate them by target class.


In [None]:
import numpy

# extract all deep features for all validation set samples
features = [[] for _ in range(10)]

with torch.no_grad():
  for x,t in validation_loader:
    # extract deep features
    ...
    # separate the 10 different targets into separate lists
    ...

# convert features to numpy for later processing/plotting
features = [numpy.array(f) for f in features]
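
Separating a batch of deep features by target class can be done with boolean masks, as in the sketch below. The random `deep_feature` tensor and the hand-written targets `t` merely stand in for the network output and the labels of one validation batch.

```python
import numpy
import torch

features = [[] for _ in range(10)]

# stand-ins for one batch: 6 two-dimensional deep features and their targets
deep_feature = torch.randn(6, 2)
t = torch.tensor([3, 0, 3, 7, 0, 3])

for label in range(10):
    # select the rows that belong to the current class
    selected = deep_feature[t == label]
    # move to the CPU and append as plain Python lists
    features[label].extend(selected.cpu().tolist())

features = [numpy.array(f) for f in features]
```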

Task 7: Deep Feature Visualization
----------------------------------

Obtain a list of 10 distinct colors.
Plot a dot for each deep feature in the 2-D space, for example using `pyplot.scatter`.
Plot these dots with a different color for each class.

Task 8: Basis Function Visualization
------------------------------------

Obtain the learned basis functions $\vec w_r$ and their corresponding sizes $\sigma_r$ from the trained network.
Draw black circles centered at $\vec w_r$ and with radii corresponding to $\sigma_r$ on top of the deep feature plot.
Note that the `s=` parameter of the `scatter` function is given in points squared, i.e., it sets the marker area rather than a radius in data units.

Note: Since each notebook cell uses its own drawing process, we need to combine Tasks 7 and 8 here.
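
Because `s=` is measured in points squared, a radius in data units has to be converted before it can be passed to `scatter`. The helper `radius_to_scatter_size` below is a hypothetical sketch of one way to do this. It assumes linear axes with an equal aspect ratio and axis limits that are fixed before the conversion. From `sigma2 = 2*sigma*sigma`, the radius itself would follow as `sigma = sqrt(sigma2 / 2)`.

```python
from matplotlib import pyplot

def radius_to_scatter_size(ax, radius):
    """Convert a radius in data units into the `s` value (points squared) of a circle marker."""
    # horizontal extent of `radius` data units in pixels
    x0 = ax.transData.transform((0.0, 0.0))[0]
    x1 = ax.transData.transform((radius, 0.0))[0]
    # one typographic point is 1/72 inch
    radius_pt = abs(x1 - x0) * 72.0 / ax.figure.dpi
    # for the "o" marker, sqrt(s) is the marker diameter in points
    return (2.0 * radius_pt) ** 2

fig, ax = pyplot.subplots()
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
ax.set_aspect("equal")
# draw once so that the final axes geometry is known
fig.canvas.draw()
s = radius_to_scatter_size(ax, 1.0)
ax.scatter([0.0], [0.0], s=s, edgecolors="k", facecolors="none")
```

The computed size only stays valid as long as the axis limits and the figure size do not change afterwards.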

In [None]:
from matplotlib import pyplot

# define 10 visually distinct colors
colors = numpy.array([
    [230, 25, 75],
    [60, 180, 75],
    [255, 225, 25],
    [67, 99, 216],
    [245, 130, 49],
    [145, 30, 180],
    [70, 240, 240],
    [240, 50, 230],
    [188, 246, 12],
    [250, 190, 190],
]) / 255.


# generate 10 scatter plots, one for each label
...


# get the basis functions from the rbf layer
basis_functions = ...
# get the sigmas from the rbf activation
sigmas = ...

# plot learned centers
pyplot.scatter(..., ..., color="k", marker="o", s=..., facecolors="none")

# make the plot more beautiful
...