The term **logits** comes from the **logit** function, which maps a probability to its **log-odds**. That origin explains how the word is used in machine learning, particularly in classification tasks.

### Breaking Down the Name "Logits":

1. **Log-Odds**:
   - The **odds** of an event happening are the ratio of the probability of the event happening to the probability of it not happening.
     $$
     \text{Odds}(p) = \frac{p}{1 - p}
     $$
   - The **log-odds** (or **logit**) is the natural logarithm of the odds:
     $$
     \text{log-odds}(p) = \log\left(\frac{p}{1 - p}\right)
     $$
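   - This transforms a probability (strictly between 0 and 1) into a value that can range from $-\infty$ to $+\infty$. A probability of 0.5 maps to a log-odds of 0, probabilities above 0.5 map to positive values, and probabilities below 0.5 map to negative values.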

2. **Logits in Machine Learning**:
   - In the context of machine learning, **logits** refer to the unnormalized scores that are often passed to a softmax function for multi-class classification. These logits represent the model's confidence in each class before being converted into probabilities.
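   - Logits can be interpreted similarly to **log-odds**, though in most cases they aren’t exactly log-odds. Under softmax, each logit equals the log-probability of its class up to a constant shared by all classes, so only the differences between logits matter: the difference between two logits is the log of the ratio of the two class probabilities.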
   - In binary classification using logistic regression, the raw output is indeed the log-odds (hence the term **logit**), which is then converted to a probability using the **sigmoid** function. For multi-class classification, the softmax function generalizes this concept.

3. **Logistic Function**:
   - The **logistic function** (used in logistic regression) converts log-odds into probabilities:
     $$
     \text{sigmoid}(z) = \frac{1}{1 + e^{-z}}
     $$
     This function is the inverse of the log-odds transform: it maps a logit (which can be any real number) back to a probability strictly between 0 and 1.
   - In **binary classification**, the output of a model is often interpreted as log-odds, and the logistic function converts those into probabilities, making the term **logit** a natural name.
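
The relationship between probabilities, odds, log-odds, and the sigmoid can be checked numerically. The following is a minimal sketch in plain Python; the function names `logit` and `sigmoid` are chosen here for illustration and are not taken from any particular library.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, assuming 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Logistic function: maps a real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Round trip: probability -> log-odds -> probability
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p={p}  odds={p / (1 - p):.3f}  logit={z:+.3f}  back={sigmoid(z):.3f}")
```

Applying `sigmoid` to the output of `logit` recovers the original probability, which is why a model that outputs log-odds only needs a sigmoid to produce a probability.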

### Why "Logits" in Modern Deep Learning:
- While in deep learning (especially with softmax), logits are not directly the log-odds, the term is still used to refer to the **raw, unnormalized scores** that precede the probability computation.
- The name **logits** stuck because it captures the idea that these are intermediate values that can be transformed into probabilities through a suitable function (like softmax or sigmoid).
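
To make the link between softmax logits and log-odds concrete, here is a small sketch in plain Python. The logit values are made up for illustration, and `softmax` is a hand-written helper rather than a library call. It checks three things: adding a constant to every logit leaves the probabilities unchanged, the difference between two logits equals the log of the ratio of their probabilities, and with only two classes softmax reduces to a sigmoid of the logit difference.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
probs = softmax(logits)

# 1. Shifting every logit by the same constant does not change the result.
shifted_probs = softmax([z + 10.0 for z in logits])

# 2. The gap between two logits is the log of the ratio of their probabilities.
log_ratio = math.log(probs[0] / probs[1])  # close to logits[0] - logits[1]

# 3. With two classes, softmax is a sigmoid of the logit difference.
two_class = softmax([1.2, -0.3])
binary = sigmoid(1.2 - (-0.3))  # close to two_class[0]

print(probs, shifted_probs, log_ratio, two_class[0], binary)
```

So an individual softmax logit is not a log-odds value, but the gap between two logits does carry a log-odds-style meaning.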

### Summary:
The name **logits** comes from the concept of **log-odds** in statistics. Even though modern logits are not always log-odds, the term is used to describe the unnormalized scores output by models before they are transformed into probabilities using functions like softmax or sigmoid.


To explain how torch.nn.functional.nll_loss works, let's walk through a simple example with some sample data.

### Key Points about nll_loss:
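- NLL Loss (Negative Log Likelihood Loss) is used in classification problems where the model outputs log probabilities, typically the output of log_softmax.
- For each sample, the function picks the log-probability the model assigned to the true class and negates it. By default the result is averaged over the batch.
- The input must be log probabilities of shape (N, C) and the target must be class indices of shape (N), each in the range 0 to C-1. nll_loss does not take the logarithm itself, so passing raw logits or plain probabilities gives a meaningless loss.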
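
The arithmetic behind these points can be sketched without PyTorch. The snippet below is a plain-Python illustration, not the library's implementation. It assumes the default mean reduction with no class weights and no ignored indices, and the helper names `log_softmax` and `nll_loss_mean` as well as the sample numbers are made up for this example.

```python
import math

def log_softmax(logits):
    """Log-probabilities for one row of raw scores."""
    m = max(logits)  # subtracting the max keeps exp from overflowing
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss_mean(log_probs, targets):
    """Average of -log_probs[i][targets[i]] over the batch."""
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

# Two samples, three classes (made-up scores)
logits = [[2.0, 0.5, -1.0],
          [0.1, 0.2, 3.0]]
targets = [0, 2]  # true class index for each sample

log_probs = [log_softmax(row) for row in logits]
loss = nll_loss_mean(log_probs, targets)
print(loss)
```

In PyTorch, the corresponding call would be `torch.nn.functional.nll_loss(torch.nn.functional.log_softmax(x, dim=1), target)`, with `x` a float tensor of shape (N, C) and `target` an integer tensor of shape (N).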