### Loss Functions

Loss functions define how neural network models calculate the overall error from their residuals for each training batch. This in turn affects how they adjust their coefficients when performing backpropagation, so the choice of loss function has a direct influence on model performance.

### Focal Loss
Focal loss modifies the standard cross-entropy criterion by reshaping it so that the loss assigned to well-classified examples is down-weighted. This reduces the impact of class imbalance, because the many easy examples no longer dominate training. The cross-entropy term is multiplied by a scaling factor that decays to zero as confidence in the correct class increases, so the factor automatically down-weights the contribution of easy examples at training time and focuses on the hard ones.

<img src="https://i1.wp.com/neptune.ai/wp-content/uploads/focal-loss.png?ssl=1">
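
As a rough illustration of the down-weighting described above, here is a minimal pure-Python sketch of the binary focal loss. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for this example, not something this document prescribes.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch).

    probs   -- predicted probabilities of the positive class, each in [0, 1]
    targets -- ground-truth labels, each 0 or 1
    gamma   -- focusing parameter; gamma = 0 removes the scaling factor
    alpha   -- weight of the positive class (1 - alpha for the negative class)
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t) ** gamma shrinks towards 0 as the model gets more confident
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

easy = binary_focal_loss([0.95], [1])  # well-classified positive
hard = binary_focal_loss([0.30], [1])  # poorly classified positive
print(easy < hard)  # True
```

With `gamma=0` the scaling factor disappears and the function reduces to an alpha-weighted cross-entropy.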

### Dice Loss

This loss is derived from a smoothed version of the Dice coefficient (DSC). It is one of the most commonly used losses in segmentation problems.

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/a80a97215e1afc0b222e604af1b2099dc9363d3b">

#### Dice Loss = 1 - DSC
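
A minimal sketch of this loss on flattened masks, in plain Python. The `smooth` term and its default of `1.0` are assumptions for this example; it is one common way to keep the ratio defined when both masks are empty.

```python
def dice_loss(pred, target, smooth=1.0):
    """Dice loss = 1 - DSC for two flattened masks (illustrative sketch).

    pred   -- predicted probabilities (or 0/1 values) per pixel
    target -- ground-truth 0/1 values per pixel
    smooth -- assumed smoothing constant that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    dsc = (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)
    return 1.0 - dsc

print(dice_loss([1, 0, 1, 0], [1, 0, 1, 0]))  # 0.0 (perfect overlap)
```

Masks with no overlap at all push the loss towards 1.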

### BCE-Dice Loss

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/80f87a71d3a616a0939f5360cec24d702d2593a2">
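
One common way to build this loss, assumed here, is to add binary cross-entropy and Dice loss with equal weight. The sketch below is plain Python on flattened masks; the function name and the `smooth` and `eps` defaults are hypothetical, and other weightings of the two terms are possible.

```python
import math

def bce_dice_loss(pred, target, smooth=1.0, eps=1e-7):
    """Sum of binary cross-entropy and Dice loss (assumed equal weighting)."""
    # Binary cross-entropy, averaged over pixels
    bce = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    bce /= len(pred)

    # Dice loss = 1 - smoothed Dice coefficient
    intersection = sum(p * t for p, t in zip(pred, target))
    dice = 1.0 - (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

    return bce + dice

good = bce_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
bad = bce_dice_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])
print(good < bad)  # True
```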

### Intersection over Union (IoU)-balanced Loss
The IoU-balanced classification loss aims at increasing the gradient of samples with high IoU and decreasing the gradient of samples with low IoU. In this way, the localization accuracy of machine learning models is increased.

<img src="https://i2.wp.com/neptune.ai/wp-content/uploads/IoU-balanced-Loss.png?ssl=1">


### Weighted cross-entropy
In one variant of cross-entropy, all positive examples are weighted by a certain coefficient. It is used in scenarios that involve class imbalance.

<img src="https://i2.wp.com/neptune.ai/wp-content/uploads/Weighted-cross-entropy.png?ssl=1">
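
The weighting of positive examples can be sketched in a few lines of plain Python. The name `weighted_bce` and the parameter `pos_weight` are assumptions for this example.

```python
import math

def weighted_bce(probs, targets, pos_weight=2.0, eps=1e-7):
    """Binary cross-entropy where positive examples are scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        # Only the positive term is multiplied by the coefficient
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# A missed positive costs pos_weight times more than with plain cross-entropy
plain = weighted_bce([0.2], [1], pos_weight=1.0)
weighted = weighted_bce([0.2], [1], pos_weight=3.0)
print(weighted > plain)  # True
```

Setting `pos_weight` above 1 makes missed positives more expensive, which is useful when positives are the rare class.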

### Lovász-Softmax loss
This loss directly optimizes the mean intersection-over-union loss in neural networks, based on the convex Lovász extension of submodular losses.

<img src="https://i0.wp.com/neptune.ai/wp-content/uploads/Lov%C3%A1sz-Softmax-loss.png?ssl=1">

<a href="https://neptune.ai/blog/image-segmentation-in-2020"> Reference 1</a> <br>
<a href="https://www.kaggle.com/bigironsphere/loss-function-library-keras-pytorch">Reference 2</a><br>
<a href="https://www.jeremyjordan.me/semantic-segmentation/">Reference 3</a><br>

# Metrics

### Here we go
A school is running a machine learning primary diabetes scan on all of its students.
The output is either diabetic (+ve) or healthy (-ve).
There are only 4 cases any student X could end up with.<br>
We’ll be using the following as a reference later, so don’t hesitate to re-read it if you get confused. <br>
> True positive (TP): Prediction is +ve and X is diabetic, we want that <br>
> True negative (TN): Prediction is -ve and X is healthy, we want that too <br>
> False positive (FP): Prediction is +ve and X is healthy, false alarm, bad <br>
> False negative (FN): Prediction is -ve and X is diabetic, the worst<br>

To remember that, there are 2 tricks:
- If it starts with True, then the prediction was correct, whether diabetic or not. So a true positive is a diabetic person correctly predicted & a true negative is a healthy person correctly predicted.
Conversely, if it starts with False, then the prediction was incorrect. So a false positive is a healthy person incorrectly predicted as diabetic (+) & a false negative is a diabetic person incorrectly predicted as healthy (-). <br>

- Positive or negative indicates the output of our program, while true or false says whether that output was correct.
Before I continue: true positives & true negatives are always good. We love the news the word true brings. That leaves false positives and false negatives.<br>

In our example, a false positive is just a false alarm; a 2nd, more detailed scan will correct it. A false negative, however, means that a student thinks they’re healthy when they’re not, which is, in our problem, the worst case of the 4.<br>

Whether FP & FN are equally bad, or one of them is worse than the other, depends on your problem. This piece of information has a great impact on your choice of performance metric, so give it a thought before you continue.<br>
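
To make the 4 cases concrete, here is a small sketch that counts them from two lists of labels. The encoding (1 = diabetic / +ve, 0 = healthy / -ve), the function name and the sample data are all made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = +ve, 0 = -ve)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # predicted diabetic, really diabetic
        elif pred == 0 and truth == 0:
            tn += 1  # predicted healthy, really healthy
        elif pred == 1 and truth == 0:
            fp += 1  # false alarm
        else:
            fn += 1  # missed diabetic student
    return tp, tn, fp, fn

# Hypothetical scan of 10 students
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 6, 1, 1)
```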

### Which performance metric to choose?

##### Accuracy
It’s the ratio of the correctly labeled subjects to the whole pool of subjects. <br>
Accuracy is the most intuitive one. <br>
Accuracy answers the following question: <i><b>How many students did we correctly label out of all the students?</b></i><br>
<b>Accuracy = (TP+TN)/(TP+FP+FN+TN)</b><br>
numerator: all correctly labeled subjects (all trues) <br>
denominator: all subjects <br>
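
A quick sketch of the formula on hypothetical counts (100 students, 10 of them diabetic). The second call shows why accuracy alone can mislead on an imbalanced pool: a program that labels everyone healthy still scores 0.9.

```python
def accuracy(tp, tn, fp, fn):
    """Correctly labeled subjects divided by all subjects."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts, not real data
print(accuracy(tp=8, tn=85, fp=5, fn=2))   # 0.93

# A program that labels every student healthy (-ve)
print(accuracy(tp=0, tn=90, fp=0, fn=10))  # 0.9
```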
<br>
##### Precision
Precision is the ratio of the correctly +ve labeled by our program to all +ve labeled. <br>
Precision answers the following:<i><b> How many of those who we labeled as diabetic are actually diabetic? </b></i><br>

<b>Precision = TP/(TP+FP)</b><br>

numerator: +ve labeled diabetic people.<br>
denominator: all +ve labeled by our program (whether they’re diabetic or not in reality). <br>
<br>

##### Recall (aka Sensitivity)
Recall is the ratio of the correctly +ve labeled by our program to all who are diabetic in reality. <br>
Recall answers the following question: <i><b>Of all the people who are diabetic, how many did we correctly predict?</b></i><br>
<b>Recall = TP/(TP+FN)</b><br>
numerator: +ve labeled diabetic people.<br>
denominator: all people who are diabetic (whether detected by our program or not)<br><br>
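
The precision and recall formulas can be compared side by side on the same hypothetical counts. Returning `0.0` when a denominator is zero is a convention assumed for this sketch.

```python
def precision(tp, fp):
    """Of everyone labeled +ve, the fraction that is really diabetic."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of everyone who is really diabetic, the fraction labeled +ve."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts: 8 true positives, 5 false alarms, 2 missed cases
tp, fp, fn = 8, 5, 2
print(round(precision(tp, fp), 3))  # 0.615
print(recall(tp, fn))               # 0.8
```

Both share the same numerator (TP); only the denominator changes, which is why the two can move in opposite directions.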

##### F1-score (aka F-Score / F-Measure)
F1 Score considers both precision and recall. <br>
It is the harmonic mean (a kind of average) of precision and recall. <br>
The F1 score is highest when there is some sort of balance between precision (P) & recall (R) in the system. Conversely, the F1 score isn’t so high if one measure is improved at the expense of the other. <br>

For example, if P is 1 & R is 0, the F1 score is 0. <br>
<b>F1 Score = 2*(Recall * Precision) / (Recall + Precision)</b><br><br>
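
A short sketch of the formula, showing how the harmonic mean punishes a lopsided pair of values. The zero-denominator fallback to `0.0` is an assumption of this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * (recall * precision) / (recall + precision)

print(f1_score(1.0, 0.0))            # 0.0
print(f1_score(0.5, 0.5))            # 0.5
print(round(f1_score(0.9, 0.1), 2))  # 0.18
```

The last two pairs have the same arithmetic mean (0.5), but the unbalanced pair scores much lower.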

##### Specificity
Specificity is the ratio of the correctly -ve labeled by our program to all who are healthy in reality. <br>
Specificity answers the following question: <i><b>Of all the people who are healthy, how many of those did we correctly predict?</b></i><br>

<b>Specificity = TN/(TN+FP)</b><br><br>
numerator: -ve labeled healthy people.<br>
denominator: all people who are healthy in reality (whether +ve or -ve labeled)<br>
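
The same kind of check works for specificity. The counts are hypothetical, and returning `0.0` for an empty denominator is an assumed convention.

```python
def specificity(tn, fp):
    """Of everyone who is really healthy, the fraction labeled -ve."""
    return tn / (tn + fp) if (tn + fp) else 0.0

# Hypothetical counts: 85 healthy students cleared, 5 false alarms
print(round(specificity(tn=85, fp=5), 3))  # 0.944
```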

### General Notes
<b>Accuracy is a great measure, but only when you have symmetric datasets (the counts of false negatives & false positives are close) and false negatives & false positives have similar costs.</b> <br>

If the costs of false positives and false negatives are different, then F1 is your savior. <b>F1 is best if you have an uneven class distribution.</b><br>

Precision is how sure you are of your true positives whilst recall is how sure you are that you are not missing any positives.<br>

<b>Choose recall if false positives are far more acceptable than false negatives, in other words, if the occurrence of false negatives is intolerable</b> and you’d rather get some extra false positives (false alarms) than miss some positives, like in our diabetes example. <br>
You’d rather get some healthy people labeled diabetic than leave a diabetic person labeled healthy.<br>

<b>Choose precision if you want to be more confident of your true positives.</b> For example, spam emails: you’d rather have some spam emails in your inbox than some regular emails in your spam box. So, the email company wants to be extra sure that email Y is spam before they put it in the spam box and you never get to see it. <br>

<b>Choose specificity if you want to cover all true negatives,</b> meaning you don’t want any false alarms (false positives). For example, if you’re running a drug test in which all people who test positive will immediately go to jail, you don’t want anyone drug-free going to jail. False positives here are intolerable.<br>


<a href="https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124">Reference</a>

### AUC-ROC

<a href="https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5">Reference</a>