# Comparing alternative losses for classification

Classification networks commonly estimate a vector of _logits_ for a given input.
The logits are then squashed with a soft-max function into a proper probability vector.
This probability vector is compared to the true class label (represented as a one-hot vector)
through a negative log-likelihood (NLL) loss.
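As a minimal sketch of this pipeline in plain NumPy (not the repository's `src.loss.nll` module, whose internals are not shown here, nor whether the soft-max is applied inside the model or the loss), the loss for each sample is the negative log of the soft-max probability assigned to the true class. Indexing the log-probabilities with the label is equivalent to a dot product with the one-hot vector.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-soft-max over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def nll_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of the true classes.

    logits: (N, K) raw network outputs, labels: (N,) integer class indices.
    Picking log_probs[n, label] equals the dot product with the one-hot label.
    """
    log_probs = log_softmax(logits)
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.array([[2.0, 0.5, -1.0]])
labels = np.array([0])
print(np.exp(log_softmax(logits)))  # a proper probability vector (sums to 1)
print(nll_loss(logits, labels))
```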

The result is over-confident networks, which are really only uncertain near class boundaries.
Testing on _out-of-distribution_ (OOD) data tends to produce extreme logits which saturate the soft-max function,
giving extremely confident predictions.
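The saturation can be seen numerically by scaling a fixed logit vector by increasing factors, mimicking an input that drives the logits to large magnitudes. The logit values and scale factors below are illustrative assumptions, not measurements from the network.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable soft-max over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

base_logits = np.array([1.0, 0.8, -0.5])  # top two classes almost tied
for scale in (1, 10, 100):
    probs = softmax(scale * base_logits)
    # The gap between the top two logits grows linearly with the scale,
    # so the largest probability is pushed towards 1.
    print(f"scale={scale:>3}: max prob = {probs.max():.4f}")
```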

An alternative approach is to represent each class by an arbitrary direction in some high-dimensional space.
The network is then trained to map each input to the direction of its correct class.
The rationale for this representation is:

 - With $K \ll D$, where $K$ is the number of classes and $D$ the dimension of the direction space,
 randomly chosen class directions are likely to be nearly orthogonal.
 - For an OOD sample, some element(s) of the predicted direction vector are expected to be far off, making the prediction nearly orthogonal to _all_ class directions and therefore easy to flag as OOD.
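A quick numerical check of the first point, and a sketch of how the second could be used. The reference directions are drawn from an isotropic Gaussian and normalised, which is an assumption about how "arbitrary directions" might be chosen; `classify` and its `ood_threshold` are hypothetical and the threshold would need tuning on validation data.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 10, 512  # number of classes, dimension of the direction space

# Random reference directions: isotropic Gaussian samples normalised to unit length.
refs = rng.standard_normal((K, D))
refs /= np.linalg.norm(refs, axis=1, keepdims=True)

# Cosines between distinct class directions concentrate around 0
# (standard deviation roughly 1/sqrt(D)), i.e. they are nearly orthogonal.
cos = refs @ refs.T
off_diag = np.abs(cos[~np.eye(K, dtype=bool)])
print(f"max |cos| between class directions: {off_diag.max():.3f}")

def classify(y_hat: np.ndarray, refs: np.ndarray, ood_threshold: float = 0.5) -> int:
    """Index of the closest class direction, or -1 if the prediction is far
    from every class (largest cosine below the hypothetical threshold)."""
    y_hat = y_hat / np.linalg.norm(y_hat)
    cos_to_refs = refs @ y_hat
    best = int(np.argmax(cos_to_refs))
    return best if cos_to_refs[best] >= ood_threshold else -1

# In-distribution-like prediction: class 3's direction plus a little noise.
print(classify(refs[3] + 0.1 * rng.standard_normal(D) / np.sqrt(D), refs))
# OOD-like prediction: an unrelated random direction.
print(classify(rng.standard_normal(D), refs))
```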

In [None]:
"""Training on MNIST"""

# Jupyter notebook meta-settings to always reload our python code
# rather than only the first time this cell is run
%load_ext autoreload
%autoreload 2

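# Hack to get the notebook to operate from the repo root:
# walk up from the current directory until a `.git` folder is found
import os
from pathlib import Path
repo_root = Path.cwd()
while not (repo_root / ".git").exists():
    if repo_root.parent == repo_root:  # reached the filesystem root
        raise RuntimeError("Could not find the repository root (no .git directory)")
    repo_root = repo_root.parent
os.chdir(repo_root)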

# Actual dependencies
import logging
import torch
import torch.optim as optim
from src.train import train, TrainSettings, TRAIN_KEY, VAL_KEY
import src.models.prob_vec_cnn as cnn
from src.loss.nll import Loss as NllLoss
from src.dataloaders import mnist
import src.utils.pytorch as torch_utils
import src.utils.log as log_utils


experiment_name = "classical"
log_file = Path.cwd() / "logs" / log_utils.log_name(experiment_name)
log_utils.setup_logger(log_file, logging.INFO)
store_model_dir = Path.cwd() / "stored_models"

num_epochs = 1
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01
momentum = 0.5

random_seed = 1
torch.manual_seed(random_seed)
device = torch_utils.cuda_settings(use_gpu=True)

model = cnn.Net()
loss_function = NllLoss()
optimizer = optim.SGD(model.parameters(),
                      lr=learning_rate,
                      momentum=momentum)

datasets = {TRAIN_KEY: mnist.trainloader(batch_size_train), VAL_KEY: mnist.testloader(batch_size_test)}

train_settings = TrainSettings(log_interval=50, num_epochs=num_epochs, device=device)

model, store_loss = train(model, datasets, loss_function,
                          optimizer, train_settings)

torch_utils.save_model(model, store_model_dir, experiment_name)

In [None]:
import matplotlib.pyplot as plt

plt.plot(store_loss[TRAIN_KEY])

# Angle representation

It is conceptually easy to see the promise of the angle representation, but how do we express it in a loss function?

### Definitions:

- $x$, input
- $\hat{y}(x) \in \mathbb{R}^{D}$, direction estimated by the network.
- $\mathcal{Y}_{ref} = \{y_k\}_{k=1}^K$ a set of reference directions where $y_k \in \mathbb{R}^D,\ \forall k$.
- $y(x) \in \mathcal{Y}_{ref}$, true direction, i.e. the reference direction of the correct class (which reference direction belongs to which class is arbitrary)
- $\alpha_k$, angle between $\hat{y}(x)$ and $y_k$
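The angles $\alpha_k$ follow from the normalised dot product, $\cos\alpha_k = \hat{y}^\top y_k / (\lVert\hat{y}\rVert\,\lVert y_k\rVert)$. A minimal batched NumPy sketch (the function name `angles_to_refs` is hypothetical, not part of the repository):

```python
import numpy as np

def angles_to_refs(y_hat: np.ndarray, refs: np.ndarray) -> np.ndarray:
    """Angles alpha_k (radians) between each prediction and each reference direction.

    y_hat: (N, D) network outputs, refs: (K, D) reference directions -> (N, K).
    """
    y_hat = y_hat / np.linalg.norm(y_hat, axis=1, keepdims=True)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    # Clip so rounding cannot push |cos| above 1, where arccos is undefined.
    cos = np.clip(y_hat @ refs.T, -1.0, 1.0)
    return np.arccos(cos)

refs = np.eye(3)  # three mutually orthogonal reference directions
y_hat = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(np.degrees(angles_to_refs(y_hat, refs)))
```

Only the direction of $\hat{y}$ matters: the first prediction has length 2 but still makes a zero angle with $y_1$.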

### Possible losses:

- Minimising/maximising the actual angle to the correct/incorrect directions (orthogonal, $\alpha = \pi/2$, being the maximum aimed for)
- $1 / \cos(\alpha)$
- ... ?
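Both candidates can be sketched directly on top of the angles. This is a hypothetical NumPy illustration, not the repository's `src.loss.angle_loss.Loss`, whose definition is not shown; in particular, stopping the penalty on incorrect angles once they reach $\pi/2$, averaging over the $K-1$ incorrect classes, and the clamp `eps` in the $1/\cos$ variant are assumptions.

```python
import numpy as np

def angles_to_refs(y_hat, refs):
    """(N, D) predictions and (K, D) references -> (N, K) angles in radians."""
    y_hat = y_hat / np.linalg.norm(y_hat, axis=1, keepdims=True)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return np.arccos(np.clip(y_hat @ refs.T, -1.0, 1.0))

def angle_loss(y_hat, refs, labels):
    """Minimise the angle to the correct direction; push incorrect angles up to pi/2."""
    alpha = angles_to_refs(y_hat, refs)
    rows = np.arange(len(labels))
    correct = alpha[rows, labels]
    # Penalty per incorrect class shrinks to 0 once its angle reaches pi/2 (orthogonal).
    penalty = np.pi / 2 - np.minimum(alpha, np.pi / 2)
    penalty[rows, labels] = 0.0  # the correct class is handled by `correct`
    num_incorrect = refs.shape[0] - 1
    return float(np.mean(correct + penalty.sum(axis=1) / num_incorrect))

def inv_cos_loss(y_hat, refs, labels, eps=1e-3):
    """1 / cos(alpha) of the correct angle: 1 when aligned, blowing up near pi/2.
    The cosine is clamped to eps because the raw ratio turns negative beyond pi/2."""
    alpha = angles_to_refs(y_hat, refs)
    correct = alpha[np.arange(len(labels)), labels]
    return float(np.mean(1.0 / np.maximum(np.cos(correct), eps)))

refs = np.eye(3)
labels = np.array([0, 1])
perfect = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
off = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])  # 45 degrees from the correct class
print(angle_loss(perfect, refs, labels), angle_loss(off, refs, labels))
print(inv_cos_loss(perfect, refs, labels), inv_cos_loss(off, refs, labels))
```

Two design points stand out in this sketch: $1/\cos(\alpha)$ only looks at the correct class, so nothing explicitly pushes the incorrect directions towards orthogonality; and the derivative of $\arccos$ diverges as the cosine approaches $\pm 1$, so a gradient-based implementation of the angle loss may prefer to work with cosines directly or clamp them slightly inside $[-1, 1]$.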

In [None]:
import src.models.dir_cnn as cnn
from src.loss.angle_loss import Loss as AngleLoss

experiment_name = "angle"
num_epochs = 3
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01
momentum = 0.5

random_seed = 1
torch.backends.cudnn.enabled = False
torch.manual_seed(random_seed)

model = cnn.Net()
loss_function = AngleLoss()
optimizer = optim.SGD(model.parameters(),
                      lr=learning_rate,
                      momentum=momentum)

datasets = {TRAIN_KEY: mnist.trainloader(batch_size_train), VAL_KEY: mnist.testloader(batch_size_test)}

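train_settings = TrainSettings(log_interval=50, num_epochs=num_epochs, device=device)

# `train` returns the trained model together with the stored losses
model, store_loss = train(model, datasets, loss_function,
                          optimizer, train_settings)

# Store both model and optimizer state so training can be resumed later
results_dir = Path.cwd() / "results"
results_dir.mkdir(exist_ok=True)
torch.save(model.state_dict(), results_dir / f"{experiment_name}.pth")
torch.save(optimizer.state_dict(), results_dir / f"{experiment_name}_optimizer.pth")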