In [1]:
import torch

Let's first define a linear model.

In [2]:
n = 10
d = 1
model = torch.nn.Linear(n, d)
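
As a quick sanity check on what the linear model computes, here is a minimal sketch that reproduces its output by hand. The seed and the batch size of 4 are arbitrary choices made for this illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of this sketch

n, d = 10, 1
model = torch.nn.Linear(n, d)
x = torch.randn(4, n)  # small illustrative batch

# torch.nn.Linear stores weight with shape (d, n) and bias with shape (d,),
# and computes x @ weight.T + bias.
manual = x @ model.weight.T + model.bias

print(model.weight.shape, model.bias.shape)
print(torch.allclose(model(x), manual, atol=1e-6))
```

So the "linear step" is just a matrix multiply plus a bias, with no activation applied.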

Let's create fake inputs and targets, and start with regression.

In [3]:
num_samples = 20

x = torch.randn(num_samples, n)
y = torch.randn(num_samples, d)

In [4]:
pred_y = model(x)

We now need to compare our predictions against the targets.

In [5]:
loss = torch.nn.functional.mse_loss(pred_y, y)
print(loss)

tensor(1.6710, grad_fn=<MseLossBackward0>)


We can also write this ourselves.

In [6]:
def mse_loss(pred_y, y):
    return torch.mean((pred_y - y) ** 2)

loss = mse_loss(pred_y, y)
print(loss)

tensor(1.6710, grad_fn=<MeanBackward0>)


For binary classification, we need binary labels.

In [11]:
y = (torch.randn(num_samples, d) > 0).float()
print(y.T) # printed as transpose so it prints better on screen.

pred_y = (model(x) > 0).float()
print(pred_y.T)

accuracy = torch.mean((pred_y == y).float())
print(accuracy)

tensor([[0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0., 1., 0.,
         1., 1.]])
tensor([[0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1.,
         0., 0.]])
tensor(0.5500)


The loss function that we use here is binary cross entropy with logits. We want the "with logits" version because it applies the sigmoid (in log space) for us, so we can take the raw outputs of our regular linear model and pass them directly to the loss function.

In [12]:
loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)

In [13]:
# loss ~ 0.693 (ln 2) is what a constant 50/50 prediction gets.
# Lower than that means better than the 50/50 baseline.
# Higher than that means worse than the 50/50 baseline.
print(loss)

tensor(0.7071, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
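
To make the "with logits" point concrete, here is a small sketch showing that the fused loss matches applying the sigmoid first and then plain binary cross entropy, and that a model predicting 50/50 everywhere (all logits equal to zero) gets a loss of ln 2. The random logits stand in for `model(x)`; the seed and variable names are assumptions made for this illustration.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed for this sketch

logits = torch.randn(20, 1)            # stand-in for model(x)
y = (torch.randn(20, 1) > 0).float()   # binary labels, as above

# Fused version: sigmoid + binary cross entropy in one call.
fused = F.binary_cross_entropy_with_logits(logits, y)

# Two-step version: explicit sigmoid, then binary cross entropy on probabilities.
two_step = F.binary_cross_entropy(torch.sigmoid(logits), y)
print(torch.allclose(fused, two_step, atol=1e-6))

# All-zero logits mean sigmoid(0) = 0.5 for every sample, so the loss is ln 2.
baseline = F.binary_cross_entropy_with_logits(torch.zeros(20, 1), y)
print(baseline.item(), math.log(2))
```

For well-behaved logits the two versions agree; the difference only shows up for extreme values.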


For multi-class classification, the model needs one output per class, and the labels become class indices instead of 0/1 values.

In [16]:
n_classes = 3
model = torch.nn.Linear(n, n_classes)
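# NOTE: the loss function for multi-class classification expects
# integer (long) class indices, so these float labels must be
# converted with .squeeze().long() before computing the loss.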
y = torch.randint(n_classes, (num_samples, 1)).float()
print(y.T)

tensor([[1., 1., 1., 0., 0., 1., 2., 0., 2., 2., 1., 2., 0., 0., 1., 0., 0., 0.,
         1., 0.]])


In [18]:
preds = model(x)

In [19]:
loss = torch.nn.functional.cross_entropy(preds, y.squeeze().long())

In [20]:
print(loss)

tensor(1.0877, grad_fn=<NllLossBackward0>)
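
The `NllLossBackward0` in the output hints at what `cross_entropy` does with the raw logits. As a rough sketch, it behaves like a log-softmax followed by a negative log-likelihood loss. The random logits below stand in for `model(x)`, and the seed and variable names are assumptions made for this illustration.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed for this sketch

num_samples, n_classes = 20, 3
logits = torch.randn(num_samples, n_classes)           # stand-in for model(x)
labels = torch.randint(n_classes, (num_samples,))      # int64 class indices

# One call on raw logits.
fused = F.cross_entropy(logits, labels)

# Same thing in two steps: log-softmax, then negative log-likelihood.
log_probs = F.log_softmax(logits, dim=1)
two_step = F.nll_loss(log_probs, labels)

# And fully by hand: average the negative log-probability of the true class.
manual = -log_probs[torch.arange(num_samples), labels].mean()
print(torch.allclose(fused, two_step), torch.allclose(fused, manual))

# All-zero logits give a uniform prediction over 3 classes, so the loss is ln 3.
uniform = F.cross_entropy(torch.zeros(num_samples, n_classes), labels)
print(uniform.item(), math.log(n_classes))
```

By analogy with the binary case, ln 3 (about 1.0986) is the "random guessing" reference point for three classes, which is close to the loss printed above for the untrained model.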


NOTE: don't add the sigmoid or softmax to the "forward" method of your model class. In the forward method, just do the linear step and return the raw logits. Have the loss function take care of the sigmoid or softmax instead.

This is for the following reasons:
1. Numerical instability: softmax and sigmoid are built from exponentials, which can overflow for large logits or round to exactly 0 or 1 in floating point. Taking the log of such a value then gives inf or nan. Loss functions that take logits are written to avoid this, so it's better to have the loss function handle it.
2. Log-Sum-Exp Trick: Loss functions like cross-entropy often use the log-sum-exp trick to improve numerical stability. Subtracting the largest logit before exponentiating keeps the exponentials that appear in the softmax computation from overflowing, making the training process more robust.
3. More efficient gradients: When the sigmoid or softmax is part of the loss function, the gradient with respect to the logits simplifies to (predicted probability - target), which is cheap to compute and avoids backpropagating through a separate, possibly saturated, sigmoid or softmax.
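
The log-sum-exp trick from point 2 can be sketched in a few lines. The logit values are made up and deliberately large so that the naive computation overflows in float32.

```python
import torch

# Deliberately large, made-up logits.
logits = torch.tensor([1000.0, 999.0, 998.0])

# Naive: exp(1000) overflows to inf in float32, so the result is inf.
naive = torch.log(torch.sum(torch.exp(logits)))

# Trick: subtract the max first, so the largest exponent is exp(0) = 1.
m = logits.max()
stable = m + torch.log(torch.sum(torch.exp(logits - m)))

# PyTorch ships a stable implementation of the same quantity.
builtin = torch.logsumexp(logits, dim=0)

print(naive, stable, builtin)
```

The naive version returns inf, while the shifted version and `torch.logsumexp` agree on a finite value just above 1000.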

In short, it's better for us to return the raw logits in the forward step and let the optimized loss function implementations handle the numerical instability of the sigmoid and softmax functions.
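
To see the instability in the binary case, here is a minimal sketch with a single made-up example: a very confident logit of 100 paired with the opposite label. The hand-written loss stands in for what happens if you apply the sigmoid yourself and then take the log.

```python
import torch
import torch.nn.functional as F

# Made-up extreme case: the model is very sure the label is 1, but it is 0.
logit = torch.tensor([100.0])
target = torch.tensor([0.0])

# Sigmoid first: in float32, sigmoid(100) rounds to exactly 1.0 ...
p = torch.sigmoid(logit)

# ... so log(1 - p) is log(0) = -inf and the hand-written loss blows up.
naive = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))

# Working directly on the logit stays finite.
fused = F.binary_cross_entropy_with_logits(logit, target)

print(p, naive, fused)
```

The hand-written version returns inf, which would poison the gradients, while the "with logits" version returns a large but finite loss of 100.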