<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Sigmoïde" data-toc-modified-id="Sigmoïde-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Sigmoïde</a></span></li><li><span><a href="#Softmax" data-toc-modified-id="Softmax-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Softmax</a></span></li><li><span><a href="#Binary-Cross-Entropy" data-toc-modified-id="Binary-Cross-Entropy-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Binary Cross Entropy</a></span></li><li><span><a href="#Cross-Entropy" data-toc-modified-id="Cross-Entropy-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Cross Entropy</a></span></li><li><span><a href="#Shape" data-toc-modified-id="Shape-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Shape</a></span></li></ul></div>

|Problem type|Last-layer activation|Loss function|Example|
|------------|---------------------|-------------|-------|
|Binary classification|sigmoid|binary cross entropy|Dog vs. cat, sentiment analysis (positive/negative)|
|Multi-class, single-label classification|softmax|cross entropy|MNIST: 10 classes, one label per image (each prediction is a single digit)|
|Multi-class, multi-label classification|sigmoid|binary cross entropy|News tag classification: one blog post can have several tags|
|Regression to arbitrary values|None|MSE|Predicting a house price (an integer or floating-point value)|
|Regression to values between 0 and 1|sigmoid|MSE or binary cross entropy|Engine health assessment, where 0 is broken and 1 is new|
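
As a rough illustration of the pairings in the table, the sketch below builds one loss per problem type in PyTorch. All tensors are made-up stand-ins for model outputs and labels (the names `binary_logits`, `tag_targets`, `prices`, etc. are hypothetical). Note one assumption worth stating: `nn.CrossEntropyLoss` applies log-softmax internally, so in PyTorch the softmax of the table is typically not added as an explicit last layer during training.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary classification: one score per sample, sigmoid + binary cross entropy.
binary_logits = torch.randn(4)                    # hypothetical model outputs
binary_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])
binary_loss = nn.BCELoss()(torch.sigmoid(binary_logits), binary_targets)

# Multi-class, single-label: raw scores go straight into CrossEntropyLoss,
# which applies log-softmax internally.
class_logits = torch.randn(4, 10)                 # 4 samples, 10 classes
class_targets = torch.tensor([3, 0, 9, 1])        # one class index per sample
multiclass_loss = nn.CrossEntropyLoss()(class_logits, class_targets)

# Multi-label: an independent sigmoid per tag, multi-hot float targets.
tag_logits = torch.randn(2, 5)                    # 2 samples, 5 possible tags
tag_targets = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 0.0, 1.0, 1.0]])
multilabel_loss = nn.BCELoss()(torch.sigmoid(tag_logits), tag_targets)

# Regression to arbitrary values: no activation, mean squared error.
predicted_prices = torch.randn(4)                 # hypothetical predictions
prices = torch.tensor([250.0, 310.5, 120.0, 480.0])
regression_loss = nn.MSELoss()(predicted_prices, prices)

print(binary_loss, multiclass_loss, multilabel_loss, regression_loss)
```

Each loss comes out as a single scalar because the default `reduction='mean'` averages over all elements.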

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np

## Sigmoid

$$\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$$

In [None]:
x = np.arange(-10,10,0.01,dtype='f')
plt.plot(x,torch.sigmoid(torch.from_numpy(x)).numpy())
plt.grid()
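
To connect the plot to the formula, here is a small sketch that evaluates $1 / (1 + \exp(-x))$ by hand and compares it with `torch.sigmoid`. The grid of points and the tolerance are arbitrary choices made for illustration.

```python
import torch

# Evaluate the formula by hand on a small grid of points.
x = torch.linspace(-10.0, 10.0, steps=201)
manual = 1.0 / (1.0 + torch.exp(-x))
builtin = torch.sigmoid(x)

# The two agree up to floating-point error.
print(torch.allclose(manual, builtin, atol=1e-6))

# Two properties visible on the plot: sigmoid(0) = 0.5 and
# sigmoid(-x) = 1 - sigmoid(x).
midpoint = torch.sigmoid(torch.tensor(0.0))
symmetric = torch.allclose(torch.sigmoid(-x), 1.0 - builtin, atol=1e-6)
print(midpoint, symmetric)
```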

## Softmax

$$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

In [None]:
input = torch.randn(2, 3)
input

In [None]:
out1 = nn.Softmax(dim=0)(input)
print(out1)
print(out1.sum(0))

In [None]:
out2 = nn.Softmax(dim=1)(input)
print(out2)
print(out2.sum(1))
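
The effect of `dim` can be checked against the formula directly. The sketch below uses a hypothetical helper, `manual_softmax`, that subtracts the maximum before exponentiating; this is a common numerical-stability trick that leaves the result unchanged, and it is not necessarily how PyTorch implements softmax internally.

```python
import torch

torch.manual_seed(0)
scores = torch.randn(2, 3)

def manual_softmax(x, dim):
    """Softmax along `dim`, written out from the formula (hypothetical helper)."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)

over_columns = manual_softmax(scores, dim=0)  # each column sums to 1
over_rows = manual_softmax(scores, dim=1)     # each row sums to 1

print(over_columns.sum(0))
print(over_rows.sum(1))
print(torch.allclose(over_rows, torch.softmax(scores, dim=1)))
```

For a batch of shape `(minibatch, C)`, `dim=1` is usually the one you want, since it normalizes over the classes of each sample.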

## Binary Cross Entropy

`torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')`

Creates a criterion that measures the binary cross entropy
between the target and the output.

With no reduction applied, the loss can be described as:

$$
        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],
$$

where $N$ is the batch size. If `reduction` is not `'none'` (the default is `'mean'`), then

$$
        \ell(x, y) = \begin{cases}
            \operatorname{mean}(L), & \text{if reduction} = \text{'mean'},\\
            \operatorname{sum}(L),  & \text{if reduction} = \text{'sum'}.
        \end{cases}
$$

The older `size_average` and `reduce` arguments are deprecated in favor of `reduction`.

This is used for measuring the error of a reconstruction, for example
in an auto-encoder. Note that the targets `y` should be numbers
between 0 and 1, and the inputs `x` must be probabilities in the same range.
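
The per-element formula and the reduction step can be reproduced by hand. The sketch below uses three made-up probabilities and targets, no `weight` (so every $w_n = 1$), and compares the result with `nn.BCELoss` for each `reduction` mode.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities x and targets y.
probs = torch.tensor([0.9, 0.2, 0.6])
targets = torch.tensor([1.0, 0.0, 1.0])

# l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)], with w_n = 1.
per_element = -(targets * torch.log(probs)
                + (1.0 - targets) * torch.log(1.0 - probs))

unreduced = nn.BCELoss(reduction='none')(probs, targets)
mean_loss = nn.BCELoss(reduction='mean')(probs, targets)
sum_loss = nn.BCELoss(reduction='sum')(probs, targets)

print(torch.allclose(per_element, unreduced))
print(torch.allclose(per_element.mean(), mean_loss))
print(torch.allclose(per_element.sum(), sum_loss))
```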

In [None]:
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
input

In [None]:
target = torch.empty(3).random_(2)
target

In [None]:
output = loss(m(input), target)
output

In [None]:
output.backward()
-input.grad
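
The gradient computed by `backward()` has a simple closed form when a sigmoid feeds the loss: differentiating $l_n$ with respect to the pre-sigmoid score gives $\sigma(x_n) - y_n$, and the default mean reduction divides by $N$. The sketch below checks this on made-up data; the fixed seed and targets are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, requires_grad=True)   # scores before the sigmoid
targets = torch.tensor([1.0, 0.0, 1.0])

loss = nn.BCELoss()(torch.sigmoid(logits), targets)
loss.backward()

# Closed form: d(loss)/d(logit_n) = (sigmoid(logit_n) - y_n) / N
expected_grad = (torch.sigmoid(logits.detach()) - targets) / logits.numel()

print(logits.grad)
print(expected_grad)
print(torch.allclose(logits.grad, expected_grad, atol=1e-6))
```

The negated gradient shown in the cell above is therefore the direction that moves each sigmoid output toward its target.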

## Cross Entropy

`CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')`

This criterion combines `nn.LogSoftmax` and `nn.NLLLoss` in a single class.
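
A quick way to see this equivalence is to apply the two modules by hand and compare with `nn.CrossEntropyLoss`. The scores and labels below are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(3, 5)          # 3 samples, 5 classes
labels = torch.tensor([1, 0, 4])    # one class index per sample

# One step: CrossEntropyLoss on the raw scores.
combined = nn.CrossEntropyLoss()(scores, labels)

# Two steps: log-softmax over the class dimension, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(scores)
two_step = nn.NLLLoss()(log_probs, labels)

print(combined, two_step)
print(torch.allclose(combined, two_step))
```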

It is useful when training a classification problem with `C` classes.
If provided, the optional argument `weight` should be a 1D `Tensor`
assigning weight to each of the classes.
This is particularly useful when you have an unbalanced training set.

The `input` is expected to contain raw, unnormalized scores for each class.

`input` has to be a Tensor of size either $(minibatch, C)$ or
$(minibatch, C, d_1, d_2, ..., d_K)$
with $K \geq 1$ for the `K`-dimensional case (described later).

This criterion expects a class index (0 to `C-1`) as the
`target` for each value of a 1D tensor of size `minibatch`.

The loss can be described as:

$$
\begin{aligned}
\text{loss}(x, \text{class}) &= -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right) \\
                             &= -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)
\end{aligned}
$$

or in the case of the `weight` argument being specified:

$$
\text{loss}(x, \text{class}) = \text{weight}[\text{class}] \left(-x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)\right)
$$

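With the default `reduction='mean'`, the losses are averaged across the observations in each minibatch; when `weight` is given, this is a weighted average normalized by the sum of the weights of the target classes.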
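
Both formulas can be evaluated by hand and compared with the module. In the sketch below the scores, labels and `class_weights` are made up; the weighted case assumes the default mean reduction, which divides by the sum of the weights of the target classes rather than by the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(3, 5)                       # 3 samples, 5 classes
labels = torch.tensor([1, 0, 4])
class_weights = torch.tensor([1.0, 2.0, 0.5, 1.0, 3.0])  # hypothetical weights

# loss(x, class) = -x[class] + log(sum_j exp(x[j])), one value per sample.
picked = scores[torch.arange(scores.shape[0]), labels]
per_sample = -picked + torch.logsumexp(scores, dim=1)

# Unweighted: plain mean over the minibatch.
unweighted = nn.CrossEntropyLoss()(scores, labels)
print(torch.allclose(per_sample.mean(), unweighted))

# Weighted: each sample is scaled by the weight of its target class,
# and the mean is normalized by the sum of those weights.
sample_weights = class_weights[labels]
manual_weighted = (sample_weights * per_sample).sum() / sample_weights.sum()
weighted = nn.CrossEntropyLoss(weight=class_weights)(scores, labels)
print(torch.allclose(manual_weighted, weighted))
```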

This criterion can also be used for higher-dimensional inputs, such as 2D images, by providing
an input of size $(minibatch, C, d_1, d_2, ..., d_K)$ with $K \geq 1$,
where $K$ is the number of additional dimensions, and a target of appropriate shape
(see below).

## Shape

- Input: $(N, C)$ where `C = number of classes`, or $(N, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of `K`-dimensional loss.
- Target: $(N)$ where each value is $0 \leq \text{targets}[i] \leq C-1$, or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of `K`-dimensional loss.
- Output: scalar by default. If `reduction` is `'none'`, then the same size as the target: $(N)$, or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of `K`-dimensional loss.
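
The shape rules can be illustrated with a small image-like example, where every pixel gets its own class label (as in segmentation). The sizes `N, C, H, W` below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 2, 4, 3, 3                   # batch, classes, height, width

scores = torch.randn(N, C, H, W)          # input:  (N, C, d_1, d_2)
labels = torch.randint(0, C, (N, H, W))   # target: (N, d_1, d_2), values in [0, C-1]

# Default reduction: a single scalar.
mean_loss = nn.CrossEntropyLoss()(scores, labels)
print(mean_loss.shape)    # torch.Size([])

# reduction='none': one loss per target element, same shape as the target.
per_pixel = nn.CrossEntropyLoss(reduction='none')(scores, labels)
print(per_pixel.shape)    # torch.Size([2, 3, 3])
```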

In [None]:
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
input

In [None]:
target = torch.empty(3, dtype=torch.long).random_(5)
target

In [None]:
output = loss(input, target)
print(output)
output.backward()
print(-input.grad)
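
The gradient printed above also has a closed form: differentiating $-x[class] + \log\sum_j \exp(x[j])$ with respect to the scores gives the softmax probabilities minus the one-hot target, divided by the batch size under the default mean reduction. A minimal check on made-up data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(3, 5, requires_grad=True)
labels = torch.tensor([1, 0, 4])

loss = nn.CrossEntropyLoss()(scores, labels)
loss.backward()

# Closed form: (softmax(scores) - one_hot(labels)) / batch_size
probs = F.softmax(scores.detach(), dim=1)
one_hot = F.one_hot(labels, num_classes=5).float()
expected_grad = (probs - one_hot) / scores.shape[0]

print(torch.allclose(scores.grad, expected_grad, atol=1e-6))
```

Each row of the gradient sums to zero, since both the softmax output and the one-hot target sum to one.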