## YOLOv2 Loss function
The loss function is central to training, and it took me a while to understand it properly because it is not stated explicitly in the original YOLOv2 paper. The following discussion is based on [Yumi's blog](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html), which is in turn based on the [implementation by experiencor](https://github.com/experiencor/keras-yolo2/blob/master/Yolo%20Step-by-Step.ipynb), and fixes some typos along the way.


### Total loss function
The total loss is given by

$$
\text{loss} = \sum_{i=1}^{S^2}\sum_{j=1}^B \left(\text{loss}_{i,j}^{xywh} + \text{loss}_{i,j}^p + \text{loss}_{i,j}^c\right)
$$


Here $i=1,\dots,S^2$ indexes the grid cells and $j=1,\dots,B$ indexes the anchor box slots. Each term in the loss function is scaled by hyperparameters: $\lambda_{\text{coord}}$ for the coordinate loss, $\lambda_{\text{class}}$ for the classification loss, and $\lambda_{\text{obj}}$ and $\lambda_{\text{noobj}}$ for the confidence loss.

Let $C_{i,j}\in\{0,1\}$ be the ground-truth indicator that an object is associated with anchor box $j$ in grid cell $i$. The total number of objects in the image is then given by

$$
N_{\text{obj}} = \sum_{i=1}^{S^2}\sum_{j=1}^B C_{i,j}.
$$
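As a small illustration (a toy example of my own, not taken from the referenced implementations), the indicators $C_{i,j}$ can be stored as a NumPy array and $N_{\text{obj}}$ is just its sum. Here the grid-cell index $i$ is split into a (row, column) pair, so the array has shape `(S, S, B)`.

```Python
import numpy as np

# Toy setup (hypothetical): S = 3 grid, B = 2 anchor slots per cell.
# The grid-cell index i is split into (row, col), so C has shape (S, S, B).
S, B = 3, 2
C = np.zeros((S, S, B))
C[0, 1, 0] = 1.0  # an object assigned to anchor slot 0 of cell (0, 1)
C[2, 2, 1] = 1.0  # an object assigned to anchor slot 1 of cell (2, 2)

# N_obj sums the indicator over all grid cells and anchor slots.
N_obj = C.sum()
print(N_obj)  # 2.0
```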

### Loss due to bounding box mismatch (coordinate loss)
$$
\text{loss}_{i,j}^{xywh} = \frac{\lambda_{\text{coord}}}{N_{\text{obj}}} C_{i,j}\left[\left(x_{i,j}-\hat{x}_{i,j}\right)^2+\left(y_{i,j}-\hat{y}_{i,j}\right)^2+\left(\sqrt{w_{i,j}}-\sqrt{\hat{w}_{i,j}}\right)^2+\left(\sqrt{h_{i,j}}-\sqrt{\hat{h}_{i,j}}\right)^2\right]
$$

Here $x_{i,j}$, $y_{i,j}$, $w_{i,j}$, $h_{i,j}$ are the true coordinates of the centre and width/height of the bounding box. The corresponding predicted values $\hat{x}_{i,j}$, $\hat{y}_{i,j}$, $\hat{w}_{i,j}$, $\hat{h}_{i,j}$ are indicated with a hat. Since each term is multiplied by $C_{i,j}\in\{0,1\}$, the coordinate loss only contributes for those $i,j$ which correspond to a true bounding box.
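A minimal NumPy sketch of the coordinate loss, assuming the $S^2$ grid cells are flattened into one axis of length `N` and boxes are stored as `(x, y, w, h)` in the last axis. The function name and the clamping of $N_{\text{obj}}$ to at least 1 are my own choices, not part of the formula.

```Python
import numpy as np

def coord_loss(C, true_xywh, pred_xywh, lambda_coord=1.0):
    """Coordinate loss summed over all grid cells i and anchor slots j.

    C:         (N, B) array of 0/1 indicators C_{i,j}, with N = S*S
    true_xywh: (N, B, 4) ground-truth x, y, w, h
    pred_xywh: (N, B, 4) predicted x, y, w, h (w, h must be non-negative)
    """
    # Assumption: clamp N_obj to 1 so an image without objects does not divide by zero.
    n_obj = max(C.sum(), 1.0)
    xy_err = np.sum((true_xywh[..., :2] - pred_xywh[..., :2]) ** 2, axis=-1)
    wh_err = np.sum((np.sqrt(true_xywh[..., 2:]) - np.sqrt(pred_xywh[..., 2:])) ** 2, axis=-1)
    # Multiplying by C drops every slot that has no ground-truth box.
    return lambda_coord / n_obj * np.sum(C * (xy_err + wh_err))

# One grid cell (N = 1) with two anchor slots; only slot 0 holds an object.
C = np.array([[1.0, 0.0]])
true_xywh = np.array([[[0.5, 0.5, 4.0, 1.0], [0.0, 0.0, 1.0, 1.0]]])
pred_xywh = np.array([[[0.6, 0.5, 1.0, 1.0], [9.0, 9.0, 9.0, 9.0]]])
loss = coord_loss(C, true_xywh, pred_xywh)  # 0.01 + (2 - 1)**2 = 1.01
```

The badly wrong prediction in slot 1 does not affect the loss because its indicator is 0. The square roots mean that the same absolute error in $w$ or $h$ costs more for a small box than for a large one.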

### Classification loss

Let $p_{i,j}^{c}\in\{0,1\}$ be the ground truth probability that the object associated with $i,j$ is of class $c\in\text{classes}$. The corresponding predicted probabilities $\hat{p}_{i,j}^c$ are denoted with a hat. Then the classification loss in $i,j$ is given by the cross-entropy

$$
\text{loss}_{i,j}^{p} = -\frac{\lambda_{\text{class}}}{N_{\text{obj}}} C_{i,j} \sum_{c\in\text{classes}} p_{i,j}^c \log\left(\hat{p}_{i,j}^c\right).
$$

Again, since we multiply each term by $C_{i,j}\in\{0,1\}$, only those $i,j$ which are associated with a real object contribute.
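The classification term can be sketched the same way, assuming one-hot targets of shape `(N, B, K)` for `K` classes and predicted probabilities (for example a softmax output) of the same shape. The small `eps` is my own guard against `log(0)`; implementations that work directly with logits (a log-softmax) avoid the need for it.

```Python
import numpy as np

def class_loss(C, p_true, p_pred, lambda_class=1.0, eps=1e-12):
    """Cross-entropy classification loss summed over i, j.

    C:      (N, B) 0/1 indicators C_{i,j}
    p_true: (N, B, K) one-hot ground-truth class vectors p_{i,j}^c
    p_pred: (N, B, K) predicted class probabilities, e.g. a softmax output
    """
    n_obj = max(C.sum(), 1.0)  # assumption: avoid division by zero
    # eps is a simple guard against log(0); it is not part of the formula.
    ce = -np.sum(p_true * np.log(p_pred + eps), axis=-1)  # shape (N, B)
    return lambda_class / n_obj * np.sum(C * ce)

# One cell, two slots, K = 3 classes; the object in slot 0 belongs to class 1.
C = np.array([[1.0, 0.0]])
p_true = np.array([[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]])
p_pred = np.array([[[0.2, 0.5, 0.3], [0.0, 0.0, 1.0]]])
loss = class_loss(C, p_true, p_pred)  # equals -log(0.5), about 0.693
```

Slot 1 has a very large cross-entropy, but it is masked out by $C_{i,j}=0$.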

### Confidence loss

Define

$$
C_{i,j}^{\text{noobj}} = \begin{cases}
1 & \text{if $\max_{i',j':\,C_{i',j'}=1}\left\{\text{IoU}\left(\mathcal{B}(x_{i',j'},y_{i',j'},w_{i',j'},h_{i',j'}),\mathcal{B}(\hat{x}_{i,j},\hat{y}_{i,j},\hat{w}_{i,j},\hat{h}_{i,j})\right)\right\} < 0.6$ and $C_{i,j}=0$}\\
0 & \text{otherwise}
\end{cases}.
$$

Here $\mathcal{B}(x,y,w,h)$ denotes the bounding box with centre coordinates $x,y$ and width/height $w,h$, and $\text{IoU}\left(\mathcal{B}_a,\mathcal{B}_b\right)$ is the *intersection over union* of two bounding boxes $\mathcal{B}_a$ and $\mathcal{B}_b$.
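A sketch of the IoU for centre-format boxes and of the mask $C_{i,j}^{\text{noobj}}$, assuming all ground-truth boxes of the image are collected in one array of shape `(T, 4)` with `T >= 1`. The function names are hypothetical; the 0.6 threshold is the one from the definition above.

```Python
import numpy as np

def iou_xywh(box_a, box_b):
    """IoU of boxes given as (x_centre, y_centre, w, h); broadcasts over leading dims."""
    a_min = box_a[..., :2] - box_a[..., 2:] / 2
    a_max = box_a[..., :2] + box_a[..., 2:] / 2
    b_min = box_b[..., :2] - box_b[..., 2:] / 2
    b_max = box_b[..., :2] + box_b[..., 2:] / 2
    # Width/height of the overlap, clipped at 0 when the boxes do not intersect.
    inter_wh = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0, None)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = box_a[..., 2] * box_a[..., 3] + box_b[..., 2] * box_b[..., 3] - inter
    return inter / union

def noobj_mask(C, pred_xywh, true_boxes, threshold=0.6):
    """C^noobj_{i,j}: 1 where C_{i,j} = 0 and the best IoU with any true box is below threshold.

    C:          (N, B) 0/1 indicators
    pred_xywh:  (N, B, 4) predicted boxes
    true_boxes: (T, 4) all ground-truth boxes in the image (assumes T >= 1)
    """
    ious = iou_xywh(pred_xywh[..., None, :], true_boxes)  # shape (N, B, T)
    best_iou = ious.max(axis=-1)                          # shape (N, B)
    return ((best_iou < threshold) & (C == 0)).astype(float)

true_boxes = np.array([[0.0, 0.0, 2.0, 2.0]])
C = np.array([[1.0, 0.0, 0.0]])
pred = np.array([[[0.0, 0.0, 2.0, 2.0],    # responsible slot (C = 1)
                  [0.1, 0.0, 2.0, 2.0],    # not responsible, but IoU is about 0.9
                  [5.0, 5.0, 1.0, 1.0]]])  # far away from every true box
mask = noobj_mask(C, pred, true_boxes)  # -> [[0.0, 0.0, 1.0]]
```

Slot 1 is not responsible for the object, yet it overlaps it well, so it is excluded from the no-object penalty rather than pushed towards zero confidence.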

Further, let

$$
N^{\text{conf}} = \sum_{i=1}^{S^2}\sum_{j=1}^B\left(C_{i,j}+C_{i,j}^{\text{noobj}}\right).
$$

Then the confidence loss is

$$
\begin{aligned}
\text{loss}_{i,j}^{c} &= \frac{\lambda_{\text{obj}}}{N^{\text{conf}}} C_{i,j}\left(\text{IoU}\left(\mathcal{B}(x_{i,j},y_{i,j},w_{i,j},h_{i,j}),\mathcal{B}(\hat{x}_{i,j},\hat{y}_{i,j},\hat{w}_{i,j},\hat{h}_{i,j})\right)-\hat{C}_{i,j}\right)^2\\
&\quad+\;\; \frac{\lambda_{\text{noobj}}}{N^{\text{conf}}} C_{i,j}^{\text{noobj}}\left(0-\hat{C}_{i,j}\right)^2
\end{aligned}
$$

where $\hat{C}_{i,j}$ is the predicted confidence that an object is present in $i,j$. Note that I believe a square is missing in the second line of the corresponding formula in [Yumi's blog](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html).
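A NumPy sketch of the confidence loss, assuming the masks $C_{i,j}$ and $C_{i,j}^{\text{noobj}}$ and the IoU between each responsible prediction and its true box have already been computed. The no-object term is weighted by $C_{i,j}^{\text{noobj}}$ and both terms are normalised by $N^{\text{conf}}$. The default $\lambda$ values are placeholders, not the values used by experiencor or Yumi.

```Python
import numpy as np

def confidence_loss(C, C_noobj, iou_true_pred, conf_pred,
                    lambda_obj=1.0, lambda_noobj=1.0):
    """Confidence loss summed over i, j.

    C:             (N, B) 0/1 indicators C_{i,j}
    C_noobj:       (N, B) 0/1 indicators C^noobj_{i,j}
    iou_true_pred: (N, B) IoU between the true box and the predicted box of slot (i, j)
    conf_pred:     (N, B) predicted confidences, hat C_{i,j}
    """
    # N^conf counts the slots that enter either term; clamped to 1 (assumption).
    n_conf = max(np.sum(C + C_noobj), 1.0)
    # Responsible slots: the target confidence is the IoU of prediction and truth.
    obj_term = lambda_obj * C * (iou_true_pred - conf_pred) ** 2
    # Slots with no object and no good overlap: the target confidence is 0.
    noobj_term = lambda_noobj * C_noobj * (0.0 - conf_pred) ** 2
    return np.sum(obj_term + noobj_term) / n_conf

C = np.array([[1.0, 0.0, 0.0]])
C_noobj = np.array([[0.0, 0.0, 1.0]])  # slot 1 overlaps a true box well, so it is in neither set
iou_true_pred = np.array([[0.8, 0.0, 0.0]])
conf_pred = np.array([[0.5, 0.9, 0.2]])
loss = confidence_loss(C, C_noobj, iou_true_pred, conf_pred)  # (0.09 + 0.04) / 2 = 0.065
```

The high confidence 0.9 in slot 1 is not penalised, because that slot belongs neither to $C_{i,j}$ nor to $C_{i,j}^{\text{noobj}}$.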


### Final comments
I believe there is a bug in Yumi's `get_conf_mask()` function: the penultimate line should read
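```Python
# Weight the slots responsible for a true box by LAMBDA_OBJECT, using the 0/1 indicator true_box_conf
conf_mask = conf_mask + true_box_conf * LAMBDA_OBJECT
```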
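(i.e. `true_box_conf_IOU` needs to be replaced by `true_box_conf`) to be consistent with experiencor's implementation and with the confidence loss discussed in the blog, where the object term is weighted by $\lambda_{\text{obj}} C_{i,j}$ rather than by the IoU.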
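To make the difference concrete, here is a hypothetical NumPy re-creation of the mask logic. It is not Yumi's TensorFlow code: the function name `conf_mask_sketch`, the name `LAMBDA_NO_OBJECT` and both $\lambda$ values are my own placeholders, and the no-object part simply follows the $C_{i,j}^{\text{noobj}}$ definition above.

```Python
import numpy as np

LAMBDA_NO_OBJECT = 1.0  # placeholder value
LAMBDA_OBJECT = 5.0     # placeholder value

def conf_mask_sketch(true_box_conf, best_ious, object_weight):
    """Per-slot weights multiplying the squared confidence error.

    true_box_conf: (N, B) 0/1 indicators C_{i,j}
    best_ious:     (N, B) best IoU of each predicted box with any true box
    object_weight: (N, B) the factor multiplying LAMBDA_OBJECT for responsible slots
    """
    noobj = (best_ious < 0.6) * (1 - true_box_conf) * LAMBDA_NO_OBJECT
    return noobj + object_weight * LAMBDA_OBJECT

true_box_conf = np.array([[1.0, 0.0]])
iou_with_truth = np.array([[0.3, 0.0]])  # a poorly localised responsible prediction
true_box_conf_IOU = iou_with_truth * true_box_conf
best_ious = np.array([[0.3, 0.1]])

mask_indicator = conf_mask_sketch(true_box_conf, best_ious, true_box_conf)  # -> [[5.0, 1.0]]
mask_iou = conf_mask_sketch(true_box_conf, best_ious, true_box_conf_IOU)    # -> [[1.5, 1.0]]
```

With `true_box_conf_IOU` the weight on a responsible slot shrinks with its IoU, so a poorly localised prediction gets only a weak push on its confidence; that extra IoU factor does not appear in the confidence loss formula.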
