In machine learning, and particularly in classification problems, "logits" usually refers to the vector of raw (unnormalized) scores that a classification model produces, which are then typically passed to a normalization function. When that function is softmax, it converts the logits into a probability distribution over the classes.

For example, suppose we have a model for classifying images into 3 categories. The output layer of this model might be a fully connected layer with 3 nodes, one for each category. The values produced by this layer, before any kind of normalization like softmax, are the logits.

The term "logits" comes from the log-odds used in logistic regression. There, the logit function, logit(p) = log(p / (1 - p)), is the inverse of the logistic sigmoid function: it takes a probability strictly between 0 and 1 and maps it to a value between negative infinity and positive infinity.
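To make the inverse relationship concrete, here is a minimal sketch using only the Python standard library. The helper names `sigmoid` and `logit` are chosen for illustration and are not taken from any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Log-odds: maps a probability in (0, 1) to a real number."""
    return math.log(p / (1.0 - p))

p = 0.8
z = logit(p)         # log(0.8 / 0.2) = log(4), a positive log-odds value
p_back = sigmoid(z)  # applying the sigmoid should recover the original probability

print(z)
print(p_back)
```

Up to floating-point rounding, `p_back` equals the original `p`, which is what it means for the two functions to be inverses of each other.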

However, in modern deep learning usage, "logits" often simply refers to the output of the last layer of a network before the final normalizing function (such as softmax or sigmoid) is applied.

Here's a basic illustration using Python and PyTorch:

In [1]:
import torch
import torch.nn as nn

# suppose we have a model with 3 output classes
model = nn.Linear(10, 3)

# suppose we have some input vector x
x = torch.rand(10)

# we can compute the logits as follows
logits = model(x)

print(logits)  # these are the logits


tensor([ 0.0524,  0.1751, -0.3278], grad_fn=<AddBackward0>)


In [2]:
model

Linear(in_features=10, out_features=3, bias=True)

In [3]:
x

tensor([0.8137, 0.2957, 0.8366, 0.4001, 0.3475, 0.3923, 0.0906, 0.8936, 0.8353,
        0.3907])

In [4]:
probabilities = torch.nn.functional.softmax(logits, dim=0)

print(probabilities)  # these are the probabilities associated with each class


tensor([0.3553, 0.4017, 0.2429], grad_fn=<SoftmaxBackward0>)
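As a sanity check, the softmax step can be reproduced by hand from the logits printed above. The sketch below uses only the standard library; the `softmax` helper is written here for illustration, and the input values are the rounded numbers from the printed tensor, so the result should agree with the PyTorch output only to about four decimal places.

```python
import math

# Logits copied (already rounded) from the printed tensor above.
logits = [0.0524, 0.1751, -0.3278]

def softmax(scores):
    """Exponentiate each score and divide by the sum of the exponentials."""
    # Subtracting the maximum is a common numerical-stability trick;
    # it leaves the resulting probabilities unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax(logits)

print([round(p, 4) for p in probabilities])
print(sum(probabilities))  # the probabilities should sum to 1 (up to rounding)
```

The largest logit (the second one) receives the largest probability, and the ordering of the classes is preserved: softmax rescales the scores but never reorders them.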
