Assignment 8: Open-Set Classification
=====================================


Microsoft Forms Document: https://forms.office.com/r/xY9sQDQdGh

We use the MNIST dataset and split its ten digit classes into three groups: known classes, known unknown classes (used as the negative class during training), and unknown unknown classes (not used for training at all).

Task 1: Target Vectors
----------------------

For our training dataset, we use four MNIST digit classes (4, 5, 8, 9) as known classes and four (0, 2, 3, 7) as known unknowns.
The remaining two classes (1, 6) are ignored during training and validation, and are only used for testing.

To train with our adapted softmax function, we need to assign the correct target vector to each class.
For the known classes 4, 5, 8 and 9, these are $(1,0,0,0)$, $(0,1,0,0)$, $(0,0,1,0)$ and $(0,0,0,1)$, respectively.
For all known unknown classes, the target vector is $\left(\frac14,\frac14,\frac14,\frac14\right)$.
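A short derivation shows why the uniform target vector is a sensible choice for known unknowns. Assuming the per-sample softmax cross-entropy $J = -\sum_o t_o \log y_o$ with softmax outputs $y_o$ (the quantity the loss in Task 5 sums over the batch), the uniform target is minimized exactly by a uniform output:

$$
J = -\sum_{o=1}^{O} t_o \log y_o
\;\overset{t_o = \frac1O}{=}\;
-\frac1O\sum_{o=1}^{O}\log y_o
\;\geq\; -\log\left(\frac1O\sum_{o=1}^{O} y_o\right) = \log O ,
$$

where the inequality is Jensen's inequality for the concave logarithm together with $\sum_o y_o = 1$, and equality holds if and only if $y_o = \frac1O$ for all $o$. For a known sample with a one-hot target, the same loss reduces to $-\log y_c$ for its target class $c$, the usual cross-entropy.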


In [1]:
import torch
import torchvision

# define the three types of classes
known_classes = (4, 5, 8, 9)
known_unknown_classes = (0, 2, 3, 7)
unknown_classes = (1, 6)

O = len(known_classes)
# define one-hot vectors
eye = torch.eye(O)
same = torch.tensor([1/O]*O)

def target_vector(index):
    # select correct one-hot vector for known classes, and the 1/O-vectors for unknown classes
    return eye[known_classes.index(index)] if index in known_classes else same

Test 1: Check your Target Vectors
---------------------------------

Test that your target vectors are correct for all types of known and unknown samples.

In [2]:
# check that the target vectors for known classes are correct
for index in known_classes:
    t = target_vector(index)
    print(index,t)
    assert max(t) == 1
    assert sum(t) == 1

# check that the target vectors for unknown classes are correct
for index in known_unknown_classes + unknown_classes:
    t = target_vector(index)
    print(index,t)
    assert max(t) == 0.25
    assert sum(t) == 1

4 tensor([1., 0., 0., 0.])
5 tensor([0., 1., 0., 0.])
8 tensor([0., 0., 1., 0.])
9 tensor([0., 0., 0., 1.])
0 tensor([0.2500, 0.2500, 0.2500, 0.2500])
2 tensor([0.2500, 0.2500, 0.2500, 0.2500])
3 tensor([0.2500, 0.2500, 0.2500, 0.2500])
7 tensor([0.2500, 0.2500, 0.2500, 0.2500])
1 tensor([0.2500, 0.2500, 0.2500, 0.2500])
6 tensor([0.2500, 0.2500, 0.2500, 0.2500])


Tasks 2 and 3: Training Dataset
-------------------------------
We rely on PyTorch's MNIST dataset implementation and adapt some parts of it.
Mainly, we let PyTorch load the dataset by calling the base class constructor, and then modify `self.data` and `self.targets` ourselves.
Additionally, we need to implement the indexing function `__getitem__` to return the data and targets in the desired format.

Since a Jupyter notebook does not allow splitting a class definition over several code cells, both tasks are solved in the same code cell.
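The class filtering in the constructor boils down to a boolean mask over the label tensor. The following is a minimal sketch on hypothetical labels (not real MNIST data); it uses `torch.isin`, which assumes PyTorch 1.10 or later, as an alternative to summing one comparison per class, and checks that both masks agree.

```python
import torch

# hypothetical labels of a tiny batch (not real MNIST labels)
labels = torch.tensor([0, 1, 4, 5, 6, 7, 8, 9])
known_classes = (4, 5, 8, 9)
known_unknown_classes = (0, 2, 3, 7)

# training/validation keep known and known unknown classes
valid_classes = known_classes + known_unknown_classes

# boolean mask: True where the label belongs to one of the valid classes
mask = torch.isin(labels, torch.tensor(valid_classes))

# the same mask via summing per-class comparisons, as done in DataSet.__init__
mask_sum = sum(labels == v for v in valid_classes).bool()

print(mask.tolist())
print(labels[mask].tolist())
```

Either form works; `torch.isin` avoids building one intermediate tensor per class.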

In [3]:
class DataSet(torchvision.datasets.MNIST):
    def __init__(self, purpose="train"):
        # call base class constructor to handle the data loading
        super(DataSet, self).__init__(
          root="./temp",
          train = purpose == "train",
          download = True
        )

        # select the valid classes based on the current purpose:
        # known unknowns for training/validation, unknown unknowns for testing
        valid_classes = known_classes + (unknown_classes if purpose=="test" else known_unknown_classes)
        # boolean mask of the samples that belong to these classes
        valid_samples = sum(self.targets == v for v in valid_classes).bool()
        # sub-select the data of valid classes
        self.data = self.data[valid_samples]
        # select the targets of valid classes
        self.targets = [target_vector(t) for t in self.targets[valid_samples]]

    def __getitem__(self, index):
        # convert the uint8 image to a float tensor of shape (1, 28, 28) with values in [0, 1]
        input = self.data[index][None].float()/255
        target = self.targets[index]
        return input, target

Test 2: Data Sets
-----------------

Instantiate the training dataset.
Implement a data loader for the training dataset with a batch size of 64.
Ensure that all inputs are of the desired type and shape.
Assert that the target vectors are in the correct format, i.e., that the target values of each sample sum to one.

In [7]:
# instantiate the training dataset
train_set = DataSet(purpose="train")
train_loader = torch.utils.data.DataLoader(train_set, 64, shuffle=True)

# assert that we have not filtered out all samples
assert len(train_loader)

# check the batch and assert valid data and sizes
for x, t in train_loader:
    assert len(x) <= 64
    assert len(t) == len(x)
    assert torch.all(torch.sum(t, axis=1) == 1)
    assert x.shape == torch.Size([x.shape[0], 1, 28, 28])
    assert x.dtype == torch.float32
    assert torch.max(x) <= 1


Task 4: Utility Function
------------------------

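Implement a function that splits a batch of samples into known and unknown parts. For the known part, also provide the target vectors.
How can we know which of the data samples are known samples, and which are unknown? Known samples have a one-hot target vector, whose maximum is 1, whereas every entry of an unknown sample's target vector is $\frac14$, so its maximum is below 1.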

In [8]:
def split_known_unknown(batch, targets):
    # select the indexes at which known and unknown samples exist
    known = torch.max(targets, dim=1)[0] == 1
    unknown = ~known
    # return the known samples, the targets of the known samples, as well as the unknown samples
    return batch[known], targets[known], batch[unknown]
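To see the split criterion at work, here is a small self-contained check on a hypothetical mini-batch of three samples with dummy inputs: two known samples with one-hot targets and one unknown sample with a uniform target.

```python
import torch

O = 4
# hypothetical mini-batch: two known samples (one-hot targets) and one unknown (uniform target)
targets = torch.stack([torch.eye(O)[0], torch.eye(O)[3], torch.full((O,), 1 / O)])
batch = torch.arange(6, dtype=torch.float32).reshape(3, 2)  # dummy inputs, one row per sample

# known samples are exactly those whose target vector has a maximum of 1;
# for unknown samples every entry is 1/O < 1, so the test is unambiguous for O >= 2
known = targets.max(dim=1).values == 1
known_batch, known_targets, unknown_batch = batch[known], targets[known], batch[~known]

print(known.tolist(), known_batch.shape, unknown_batch.tolist())
```

Comparing the maximum to exactly 1 is safe here because the one-hot entries are constructed as exact ones, not computed.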

Task 5: Loss Function Implementation
------------------------------------

We implement the loss as a custom autograd function, i.e., we define both the forward and the backward pass of the loss computation ourselves.

In [9]:
class AdaptedSoftMax(torch.autograd.Function):

    # implement the forward propagation
    @staticmethod
    def forward(ctx, logits, targets):
        # compute the log probabilities via log_softmax
        log_y = torch.nn.functional.log_softmax(logits,dim=1)
        # save required values for backward pass
        ctx.save_for_backward(log_y, targets)
        # compute loss, summed over all samples in the batch
        loss = - torch.sum(log_y * targets)
        return loss

    # implement the backward pass (vector-Jacobian product)
    @staticmethod
    def backward(ctx, grad_output):
        # get results stored from forward pass
        log_y, targets = ctx.saved_tensors
        # derivative of the loss w.r.t. the logits: softmax(z) - t (valid since each target vector sums to 1)
        dJ_dz = torch.exp(log_y) - targets
        # scale by the incoming gradient; no derivative is needed for the targets
        return grad_output * dJ_dz, None


# DO NOT REMOVE!
# here we set the adapted softmax function to be used later
adapted_softmax = AdaptedSoftMax.apply
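A hand-written backward pass is easy to get subtly wrong, so it is worth checking numerically. The sketch below repeats the autograd function (with the incoming gradient `grad_output` applied) so that it runs on its own, and compares it against finite differences with `torch.autograd.gradcheck`, which expects double-precision inputs. The random logits and the mix of one-hot and uniform targets are hypothetical test data.

```python
import torch


class AdaptedSoftMax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits, targets):
        log_y = torch.nn.functional.log_softmax(logits, dim=1)
        ctx.save_for_backward(log_y, targets)
        return -torch.sum(log_y * targets)

    @staticmethod
    def backward(ctx, grad_output):
        log_y, targets = ctx.saved_tensors
        # dJ/dz = softmax(z) - t, scaled by the upstream gradient
        return grad_output * (torch.exp(log_y) - targets), None


torch.manual_seed(0)
O = 4
logits = torch.randn(5, O, dtype=torch.float64, requires_grad=True)
# three known samples (one-hot) and two unknown samples (uniform)
targets = torch.cat([
    torch.eye(O, dtype=torch.float64)[[0, 2, 3]],
    torch.full((2, O), 1 / O, dtype=torch.float64),
])

# compares the analytical gradient with finite differences; raises on mismatch
ok = torch.autograd.gradcheck(AdaptedSoftMax.apply, (logits, targets), eps=1e-6, atol=1e-4)
print(ok)
```

Multiplying by `grad_output` matters as soon as the loss is scaled or combined with other terms; `gradcheck` also checks the case of a zero upstream gradient by default.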

Task 5a: Alternative Loss Function
----------------------------------

If the implementation of the autograd function in Task 5 is too complicated, we can also rely on PyTorch to compute the gradient for us.
In this case, we only need to define the forward pass, i.e., the loss function itself.

In [10]:
def adapted_softmax_alt(logits, targets):
    # batch-averaged cross-entropy with soft targets: log(sum_o exp(z_o)) - sum_o t_o z_o,
    # which equals -sum_o t_o log softmax(z)_o because each target vector sums to 1
    loss = torch.mean(torch.logsumexp(logits, dim=1) - torch.sum(logits * targets, dim=1))
    return loss
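As a quick sanity check, the sketch below (hypothetical random logits and targets) lets PyTorch differentiate the forward-only loss and confirms that autograd produces the batch-averaged version of the Task 5 gradient, $(\mathrm{softmax}(z) - t)/B$ for a batch of size $B$.

```python
import torch


def adapted_softmax_alt(logits, targets):
    # batch-averaged cross-entropy: mean over samples of logsumexp(z) - sum_o t_o z_o
    return torch.mean(torch.logsumexp(logits, dim=1) - torch.sum(logits * targets, dim=1))


torch.manual_seed(0)
O = 4
logits = torch.randn(6, O, requires_grad=True)
# four known samples (one-hot) and two unknown samples (uniform)
targets = torch.cat([torch.eye(O)[[1, 0, 3, 2]], torch.full((2, O), 1 / O)])

loss = adapted_softmax_alt(logits, targets)
loss.backward()

# expected gradient: the Task 5 derivative, divided by the batch size
expected = (torch.softmax(logits.detach(), dim=1) - targets) / logits.shape[0]
print(torch.allclose(logits.grad, expected, atol=1e-6))
```

Because this version averages over the batch while the autograd version in Task 5 sums, the two require different learning rates for the same training behavior.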

Task 6: Confidence Evaluation
-----------------------------

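Implement a function to compute the confidence value for a given batch of samples. Make sure to split the batch between known and unknown samples, and compute the confidence value for both separately. For known samples, the confidence is the softmax probability of the target class; for unknown samples, it is $1 - \max_o y_o + \frac1O$, where $O=4$ is the number of known classes, which reaches its maximum of 1 when all softmax outputs equal $\frac1O$.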

In [11]:
def confidence(logits, targets):
    # compute softmax confidences
    confidences = torch.nn.functional.softmax(logits, dim=1)
    # split between known and unknown
    known_confidences, known_targets, unknown_confidences = split_known_unknown(confidences, targets)
    # confidence of known samples: softmax probability of the target class
    conf_known = torch.sum(known_confidences[known_targets.bool()])
    # confidence of unknown samples: 1 - max_o y_o + 1/O, which is 1 for a uniform output
    conf_unknown = torch.sum(1 - torch.max(unknown_confidences, dim=1)[0] + 1/O)
    # return the summed confidence of all samples in the batch
    return conf_known + conf_unknown
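Since `confidence` returns a sum over the batch, it can help to look at the individual values. The sketch below uses a hypothetical helper `per_sample_confidence` that applies the same two formulas per sample; the logits and targets are made-up examples chosen to show the extreme cases.

```python
import torch


def per_sample_confidence(logits, targets):
    # hypothetical helper: per-sample variant of the confidence in Task 6
    y = torch.softmax(logits, dim=1)
    O = targets.shape[1]
    known = targets.max(dim=1).values == 1
    # known samples: softmax probability of the target class
    conf_known = torch.sum(y * targets, dim=1)
    # unknown samples: 1 - max_o y_o + 1/O, which lies in [1/O, 1]
    conf_unknown = 1 - y.max(dim=1).values + 1 / O
    return torch.where(known, conf_known, conf_unknown)


O = 4
logits = torch.tensor([[10., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [10., 0., 0., 0.]])
targets = torch.stack([
    torch.eye(O)[0],          # known, confident and correct
    torch.eye(O)[0],          # known, but uniform output
    torch.full((O,), 1 / O),  # unknown, uniform output
    torch.full((O,), 1 / O),  # unknown, confident output
])
c = per_sample_confidence(logits, targets)
print(c)
```

A uniform output is ideal for unknowns (confidence 1) but poor for knowns (confidence $\frac1O$), and a confident output behaves the other way around; the summed values are what `confidence` returns.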

Test 3: Check Confidence Implementation
---------------------------------------

Test that your confidence implementation does what it is supposed to do.

In [15]:
# select good logit vectors for known and unknown classes
logits = torch.tensor([[10.,0.,0.,0.,],[-10.,0.,-10.,-10.],[0.,0.,0.,0.]])
# select the according target vectors for these classes
targets = torch.stack([target_vector(known_classes[0]), target_vector(known_classes[1]), target_vector(known_unknown_classes[0])])

# each sample's confidence should be close to 1, so their sum should be close to 3
assert abs(3 - confidence(logits,targets)) < 1e-3


Task 7: Network Definition
--------------------------

We define our own small-scale network to classify known and unknown MNIST samples.
We use essentially the same convolutional network as in Assignment 6, with some small adaptations.
However, this time we need to implement our own network model, since the network must return the deep features in addition to the logits.

In [17]:
class Network(torch.nn.Module):
    def __init__(self, Q1, Q2, K, O):
        # call base class constructor
        super(Network, self).__init__()
        # define convolutional layers
        self.conv1 = torch.nn.Conv2d(in_channels=1, out_channels=Q1, kernel_size=5, stride=1, padding=2)
        self.conv2 = torch.nn.Conv2d(in_channels=Q1, out_channels=Q2, kernel_size=5, stride=1, padding=2)
        # pooling and activation functions will be re-used for the different stages
        self.pool = torch.nn.MaxPool2d(kernel_size=(2,2),stride=2)
        self.act = torch.nn.ReLU()
        # define fully-connected layers
        self.flatten = torch.nn.Flatten()
        self.fc1 = torch.nn.Linear(7*7*Q2, K, bias=True)
        self.fc2 = torch.nn.Linear(K, O, bias=False)

    def forward(self, x):
        # compute first layer of convolution, pooling and activation
        a = self.act(self.pool(self.conv1(x)))
        # compute second layer of convolution, pooling and activation on the output of the first layer
        a = self.act(self.pool(self.conv2(a)))
        # get the deep features as the output of the first fully-connected layer
        deep_features = self.fc1(self.flatten(a))
        # get the logits as the output of the second fully-connected layer
        logits = self.fc2(deep_features)
        # return both the logits and the deep features
        return logits, deep_features


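# run on a CUDA device if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# create network with 20 hidden neurons in the first FC layer (the deep features)
network = Network(32,32,20,4).to(device)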
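The input size `7*7*Q2` of the first fully-connected layer follows from the spatial sizes: padding 2 with a 5×5 kernel keeps 28×28, and each 2×2 pooling halves it, giving 14×14 and then 7×7. The sketch below rebuilds the same layers standalone with the sizes of `Network(32, 32, 20, 4)` and pushes a dummy all-zero batch through them to confirm the shapes.

```python
import torch

# sizes matching Network(32, 32, 20, 4)
Q1, Q2, K, O = 32, 32, 20, 4
conv1 = torch.nn.Conv2d(1, Q1, kernel_size=5, stride=1, padding=2)
conv2 = torch.nn.Conv2d(Q1, Q2, kernel_size=5, stride=1, padding=2)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
act = torch.nn.ReLU()
fc1 = torch.nn.Linear(7 * 7 * Q2, K)
fc2 = torch.nn.Linear(K, O, bias=False)

x = torch.zeros(8, 1, 28, 28)                          # dummy batch of 8 MNIST-sized images
a1 = act(pool(conv1(x)))                               # (8, Q1, 14, 14)
a2 = act(pool(conv2(a1)))                              # second stage consumes a1, not x: (8, Q2, 7, 7)
deep_features = fc1(torch.flatten(a2, start_dim=1))    # (8, K)
logits = fc2(deep_features)                            # (8, O)
print(a1.shape, a2.shape, deep_features.shape, logits.shape)
```

Feeding `x` instead of `a1` into `conv2` would fail immediately, because `conv2` expects `Q1` input channels while the raw images have one.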

Task 8: Training Loop
---------------------

Instantiate everything that you need.
Implement the training loop for 100 epochs.
Compute the running training confidence and validation confidence and print them at the end of each epoch.

In [None]:
# SGD optimizer with appropriate learning rate
optimizer = torch.optim.SGD(...)

# validation set and data loader
validation_set = DataSet("valid")
validation_loader = ...

for epoch in range(10):  # or 100
    # evaluate average confidence for training and validation set
    train_conf = validation_conf = 0.0

    for x, t in train_loader:
        # extract logits (and deep features) from network
        ...
        # compute our loss
        ...

        # perform weight update
        ...

        # compute training confidence
        train_conf += ...

    # compute validation confidence
    with torch.no_grad():
        for x, t in validation_loader:
            # extract logits (and deep features)
            ...
            # compute validation confidence
            validation_conf += ...

    # print average confidence for training and validation
    print(
        f"\rEpoch {epoch}; train: {train_conf/len(train_set):1.5f}, val: {validation_conf/len(validation_set):1.5f}"
    )
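The structure of one possible training loop can be tried out without MNIST. The sketch below rests on several assumptions, all hypothetical: synthetic Gaussian inputs replace the images, a single `torch.nn.Linear` layer that returns only logits replaces `Network` (which returns logits and deep features), the loss is the batch-averaged cross-entropy with soft targets, and the learning rate of 0.1 is an arbitrary choice that would need tuning for the real network.

```python
import torch

torch.manual_seed(0)
O, D = 4, 10  # number of known classes; hypothetical input dimension


def make_data(n_known, n_unknown):
    # hypothetical synthetic data: one-hot targets for known samples,
    # uniform 1/O targets for known unknown samples
    x = torch.randn(n_known + n_unknown, D)
    t = torch.cat([torch.eye(O)[torch.randint(0, O, (n_known,))], torch.full((n_unknown, O), 1 / O)])
    return torch.utils.data.TensorDataset(x, t)


train_set = make_data(48, 16)
validation_set = make_data(24, 8)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=16)

# hypothetical stand-in model returning only logits
model = torch.nn.Linear(D, O)
# assumed learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def loss_fn(logits, targets):
    # batch-averaged softmax cross-entropy with soft target vectors
    return torch.mean(-torch.sum(torch.log_softmax(logits, dim=1) * targets, dim=1))


def confidence_sum(logits, targets):
    # summed confidence as in Task 6: target-class probability for knowns, 1 - max + 1/O for unknowns
    y = torch.softmax(logits, dim=1)
    known = targets.max(dim=1).values == 1
    conf = torch.where(known, torch.sum(y * targets, dim=1), 1 - y.max(dim=1).values + 1 / O)
    return conf.sum().item()


x_all, t_all = train_set.tensors
with torch.no_grad():
    initial_loss = loss_fn(model(x_all), t_all).item()

for epoch in range(20):
    train_conf = validation_conf = 0.0
    for x, t in train_loader:
        logits = model(x)
        loss = loss_fn(logits, t)
        # perform weight update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # accumulate training confidence without tracking gradients
        train_conf += confidence_sum(logits.detach(), t)
    with torch.no_grad():
        for x, t in validation_loader:
            validation_conf += confidence_sum(model(x), t)
    print(f"Epoch {epoch}; train: {train_conf/len(train_set):1.5f}, val: {validation_conf/len(validation_set):1.5f}")

with torch.no_grad():
    final_loss = loss_fn(model(x_all), t_all).item()
```

With the real `Network`, the forward call returns a tuple, so the logits would be unpacked first, e.g. `logits, _ = network(x.to(device))`, and the targets moved to the same device.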

Task 9: Feature Magnitude Plot
------------------------------

Take the validation and test sets and plot the magnitudes of their deep features as histograms, based on the pre-trained network, split between known, known unknown (validation set) and unknown unknown (test set) samples.

In [None]:
# instantiate test set and according data loader
test_set = DataSet("test")
test_loader = ...

# collect feature magnitudes for known, known unknown and unknown unknown samples
known, known_unknown, unknown = [], [], []

with torch.no_grad():
    # extract deep features magnitudes for validation set
    for x, t in validation_loader:
        # extract deep features (and logits)
        ...
        # compute norms
        ...
        # split between known and unknown
        ...
        # collect norms of known samples
        known.extend(...)
        # collect norms of known unknown samples
        known_unknown.extend(...)

    for x, t in test_loader:
        # extract deep features (and logits)
        _, f = network(x.to(device))
        # compute norms
        ...
        # split between known and unknown
        ...
        # collect norms of known samples
        ...
        # collect norms of unknown unknown samples
        unknown.extend(...)


# plot the norms as histograms
from matplotlib import pyplot

pyplot.figure(figsize=(4, 2))

# use the same fixed maximum magnitude for all three histograms (it could also be computed from the data)
max_mag = 20
# plot the three histograms
pyplot.hist(
    known,
    bins=100,
    range=(0, max_mag),
    density=True,
    color="g",
    histtype="step",
    label="Known",
)
pyplot.hist(
    known_unknown,
    bins=100,
    range=(0, max_mag),
    density=True,
    color="b",
    histtype="step",
    label="Known Unknown",
)
pyplot.hist(
    unknown,
    bins=100,
    range=(0, max_mag),
    density=True,
    color="r",
    histtype="step",
    label="Unknown Unknown",
)

# beautify plot
pyplot.legend()
pyplot.xlabel("Deep Feature Magnitude")
pyplot.ylabel("Density")
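The magnitude of a deep feature vector is its Euclidean norm. The sketch below, on hypothetical hand-picked features and targets, shows one way to compute the norms with `torch.linalg.norm` and split them into known and known unknown lists using the same target-based criterion as `split_known_unknown`.

```python
import torch

O = 4
# hypothetical deep features (5 samples, K = 3) and their target vectors
features = torch.tensor([[3., 4., 0.], [0., 0., 2.], [1., 2., 2.], [0., 0., 0.], [6., 8., 0.]])
targets = torch.stack([
    torch.eye(O)[0], torch.full((O,), 1 / O), torch.eye(O)[2], torch.full((O,), 1 / O), torch.eye(O)[1],
])

# Euclidean norm of each deep feature vector
norms = torch.linalg.norm(features, dim=1)
# known samples have one-hot targets (maximum of 1)
is_known = targets.max(dim=1).values == 1

known, known_unknown = [], []
known.extend(norms[is_known].tolist())
known_unknown.extend(norms[~is_known].tolist())
print(known, known_unknown)
```

For features on the GPU, `.cpu()` before `.tolist()` is not required, since `tolist()` copies the values to Python either way.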

Task 10: Classification Evaluation
----------------------------------

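For a fixed threshold of $\tau=0.98$, compute the correct classification rate (CCR) and the false positive rate (FPR) on the test set.
The CCR is the fraction of known samples that are correctly classified with a confidence above $\tau$; the FPR is the fraction of unknown samples for which the maximum confidence of any known class exceeds $\tau$.
A well-trained network can achieve a CCR of > 90% at an FPR < 10%.
You might need to vary the threshold.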

In [None]:
tau = 0.98

# count the correctly classified and the total number of known samples
correct = known = 0
# count the incorrectly classified and the total number of unknown samples
false = unknown = 0

with torch.no_grad():
    for x, t in test_loader:
        # extract logits (and deep features)
        ...
        # compute softmax confidences
        ...
        # split between known and unknown
        ...

        # compute number of correctly classified knowns above threshold
        correct += ...
        known += ...

        # compute number of incorrectly accepted unknown samples
        false += ...
        unknown += ...

# print both rates
print(f"CCR: {correct} of {known} = {correct/known*100:2.2f}%")
print(f"FPR: {false} of {unknown} = {false/unknown*100:2.2f}%")
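The counting rules can be checked on a handful of hand-made samples. The sketch below uses a hypothetical helper `ccr_fpr_counts` that takes softmax confidences (not logits) and target vectors; the confidence values are invented so that each case (correct, below threshold, confidently wrong, rejected, falsely accepted) appears once.

```python
import torch


def ccr_fpr_counts(confidences, targets, tau):
    # hypothetical helper; confidences are softmax outputs, targets are the target vectors
    known = targets.max(dim=1).values == 1
    conf_k, t_k, conf_u = confidences[known], targets[known], confidences[~known]
    # a known sample is correct if its target class is predicted with confidence above tau
    true_conf = torch.sum(conf_k * t_k, dim=1)
    is_correct = (conf_k.argmax(dim=1) == t_k.argmax(dim=1)) & (true_conf > tau)
    # an unknown sample is a false positive if any known class exceeds tau
    is_false = conf_u.max(dim=1).values > tau
    return int(is_correct.sum()), len(conf_k), int(is_false.sum()), len(conf_u)


O = 4
uniform = torch.full((O,), 1 / O)
confidences = torch.tensor([
    [0.99, 0.01, 0.0, 0.0],        # known, class 0: confident and correct
    [0.5, 0.5, 0.0, 0.0],          # known, class 0: below threshold
    [0.0, 0.99, 0.01, 0.0],        # known, class 2: confidently wrong
    [0.25, 0.25, 0.25, 0.25],      # unknown: rejected
    [0.985, 0.005, 0.005, 0.005],  # unknown: falsely accepted
])
targets = torch.stack([torch.eye(O)[0], torch.eye(O)[0], torch.eye(O)[2], uniform, uniform])

correct, known, false, unknown = ccr_fpr_counts(confidences, targets, tau=0.98)
print(f"CCR: {correct} of {known} = {correct/known*100:2.2f}%")
print(f"FPR: {false} of {unknown} = {false/unknown*100:2.2f}%")
```

For any $\tau \geq 0.5$, a target-class confidence above $\tau$ already implies the correct argmax; the explicit argmax check only matters for lower thresholds.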