## What is log loss?


References:

* [Understanding binary cross-entropy / log loss: a visual explanation](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) by Daniel Godoy

## Logistic regression

Multi-class logistic regression can be expressed as a shallow neural network consisting of one linear layer and a softmax activation function.

For binary classification, we can use the sigmoid function (a.k.a. the logistic function):

$$ \sigma(x) = \frac{1}{1+\exp(-x)} $$
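As a quick check of the formula, here is a minimal sketch in plain Python. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original notebook, and this naive form can overflow for very large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: maps a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative inputs: negative scores map below 0.5, positive ones above.
for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x))
```

The output is exactly 0.5 at zero, and `sigmoid(-x)` equals `1 - sigmoid(x)`, which is why a single output is enough to describe both classes.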

The softmax function transforms any real-valued vector into a probability distribution (every value lies in the range (0, 1) and the values sum to 1):
$$\text{softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
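The formula can be sketched in a few lines of plain Python. The helper name `softmax` and the input vector are assumptions made for illustration. Subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(xs):
    """Turn a list of real scores into probabilities that sum to 1."""
    # Shifting by the maximum leaves the ratios unchanged but avoids overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # illustrative scores
print(probs)       # larger scores get larger probabilities
print(sum(probs))  # approximately 1.0
```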

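We use a cross-entropy loss function, where $p_{j, true}$ is the true probability of class $j$ (1 for the correct class and 0 otherwise when labels are hard) and $p_{j, pred}$ is the predicted probability: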
$$- \sum_j p_{j, true} \log(p_{j, pred})$$
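To see how the loss behaves, here is a small sketch for a single example with a one-hot true distribution. The function name `cross_entropy` and the probability vectors are made up for illustration.

```python
import math

def cross_entropy(p_true, p_pred):
    """Cross-entropy between a true and a predicted distribution."""
    # Terms with p_true == 0 contribute nothing, so skip them to avoid log(0).
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred) if t > 0)

p_true = [0.0, 1.0, 0.0]  # the correct class is the second one

confident = cross_entropy(p_true, [0.05, 0.90, 0.05])
unsure = cross_entropy(p_true, [0.30, 0.40, 0.30])
print(confident, unsure)  # the confident, correct prediction has the lower loss
```

With a one-hot target the sum collapses to the negative log of the probability assigned to the correct class, so the loss approaches 0 as that probability approaches 1.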

Note that the model class below does not apply the softmax function explicitly: `torch.nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally. For details see [torch.nn.CrossEntropyLoss](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss).
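The following sketch checks this behaviour numerically: applying `torch.nn.CrossEntropyLoss` to raw logits should give the same value as an explicit log-softmax followed by the negative log-likelihood loss. The logits and targets are made-up values, not data from this notebook.

```python
import torch
import torch.nn.functional as F

# Illustrative raw scores (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # index of the correct class for each sample

# CrossEntropyLoss takes logits directly; no softmax layer is needed.
loss = torch.nn.CrossEntropyLoss()(logits, targets)

# The same computation with the softmax step written out explicitly.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
```

If the model already ended in a softmax layer, passing its output to `CrossEntropyLoss` would apply softmax twice, which is why the layer is left out.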

See also:

* [Cross-entropy vs. mean-squared error loss](https://www.reddit.com/r/MachineLearning/comments/8im9eb/d_crossentropy_vs_meansquared_error_loss/)
* [Cross entropy](https://pandeykartikey.github.io/machine/learning/basics/2018/05/22/cross-entropy.html) - another explanation
* [Softmax function](https://en.wikipedia.org/wiki/Softmax_function)
* [Multiclass logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression)

In [2]:
from torch import tensor

For example, let's say the predictions for 'this dot is red' are:

In [3]:
preds  = tensor([0.2, 0.05, 0.1, 0.8, 0.8, 0.9])
labels = tensor([  0,    0,   1,   1,   0,   1])