#### Paper - On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks
- Thulasidasan et al.
- NeurIPS 2019


_Mixup_ Training

$$
\tilde{x} = \lambda x_{i} + (1-\lambda)x_{j}\newline
\tilde{y} = \lambda y_{i} + (1-\lambda)y_{j}\newline
\text{Where:} \newline
\lambda \sim \text{Beta}(\alpha,\alpha), \quad \lambda \in [0,1]
$$
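Below is a minimal NumPy sketch of the two mixup equations above. It assumes one λ is drawn per batch and that each example's partner j is picked by shuffling the batch. That is a common implementation choice, not necessarily the authors' code. The function name `mixup_batch` and the default `alpha=0.2` are illustrative assumptions.

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Return a mixed copy of (x, y) following the mixup equations.

    x: array of shape (batch, ...) with inputs.
    y: array of shape (batch, num_classes) with one-hot (or soft) labels.
    alpha: parameter of the symmetric Beta(alpha, alpha) distribution of lambda.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # lambda in [0, 1], shared by the whole batch
    perm = rng.permutation(len(x))      # partner index j for every example i
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam


# Usage on a toy batch of 4 examples with 3 features and 2 classes
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 1, 0]]             # one-hot labels
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.4, rng=rng)
```

The mixed labels are convex combinations of one-hot vectors, so each row of `y_mix` still sums to 1. It can therefore be used directly as a soft cross-entropy target.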



#### Paper - On Calibration of Modern Neural Networks
- Guo et al.
- ICML 2017

$$
X \in \mathcal{X} \hspace{5mm} (\text{inputs}) \newline
Y \in \mathcal{Y} = \{1, \ldots, K\} \hspace{5mm} (\text{labels})\newline
h(x) = (\hat{Y}, \hat{P})   \hspace{5mm} (\hat{Y} = \text{predictions}, \hat{P} = \text{confidences})\newline
\mathbb{P}(\hat{Y} = Y | \hat{P}=p)=p, \; \forall p \in [0,1] \hspace{5mm} (\text{Perfect calibration})\newline
$$  


**ECE**
$$
ECE = \sum_{m=1}^{M}\frac{|B_{m}|}{n}|acc(B_{m})-conf(B_{m})| \newline
$$

**MCE**
$$
MCE = \max_{m\in\{1,...,M\}} |acc(B_{m})-conf(B_{m})| \newline
$$
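The ECE and MCE definitions above share one binning step, so they can be computed together. This is a minimal sketch with M equal-width confidence bins on [0, 1]. The function name `calibration_errors` and the default `n_bins=15` are assumptions for illustration.

```python
import numpy as np


def calibration_errors(confidences, predictions, labels, n_bins=15):
    """Return (ECE, MCE) using n_bins equal-width confidence bins on [0, 1]."""
    conf = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    n = len(conf)
    # Interior edges; bin m covers (m/M, (m+1)/M], and a confidence of 0 joins the first bin
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(conf, inner_edges, right=True)
    ece, mce = 0.0, 0.0
    for m in range(n_bins):
        in_bin = bin_ids == m
        if not in_bin.any():
            continue  # empty bins add nothing to ECE and are skipped for MCE
        gap = abs(correct[in_bin].mean() - conf[in_bin].mean())  # |acc(B_m) - conf(B_m)|
        ece += in_bin.sum() / n * gap                            # weight |B_m| / n
        mce = max(mce, gap)
    return ece, mce


# Usage: two bins, each with accuracy 0.5
ece, mce = calibration_errors(
    confidences=[0.95, 0.95, 0.65, 0.65],
    predictions=[1, 1, 0, 0],
    labels=[1, 0, 0, 1],
    n_bins=10,
)
# ECE ≈ 0.5 * 0.45 + 0.5 * 0.15 = 0.3, MCE ≈ 0.45
```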

**Platt Scaling**
$$
P(y=1|f)=\frac{1}{1+e^{Af+B}}
$$
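A and B are fitted by minimizing the negative log-likelihood of held-out labels. The sketch below does this with SciPy's general-purpose optimizer. Platt's original procedure also smooths the 0/1 targets; that step is omitted here as a simplification. The helper names `fit_platt` and `platt_predict` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_platt(f, y):
    """Fit A, B in P(y=1|f) = 1 / (1 + exp(A*f + B)) by minimizing the mean NLL."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)

    def nll(params):
        a, b = params
        s = a * f + b
        # With P(y=1|f) = 1 / (1 + e^s):
        #   -log P(y=1|f) = log(1 + e^s),  -log P(y=0|f) = log(1 + e^-s)
        return np.mean(y * np.logaddexp(0.0, s) + (1.0 - y) * np.logaddexp(0.0, -s))

    result = minimize(nll, x0=np.array([-1.0, 0.0]), method="L-BFGS-B")
    return result.x  # array([A, B])


def platt_predict(f, a, b):
    """Calibrated probability of the positive class for raw scores f."""
    return 1.0 / (1.0 + np.exp(a * np.asarray(f, dtype=float) + b))


# Usage: synthetic scores whose true link is P(y=1|f) = 1 / (1 + exp(-2f))
rng = np.random.default_rng(0)
f = rng.normal(size=5000)
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-2.0 * f))).astype(float)
A, B = fit_platt(f, y)
probs = platt_predict(f, A, B)
```

The true link in this example corresponds to A = -2 and B = 0, so the fitted values should land close to those. A is negative because higher scores mean a higher positive-class probability under this sign convention.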

**Isotonic Regression**
$$
y_{i}=m(f_{i}) + \epsilon_{i}
$$

$$
\hat{m} = \arg\min_{z} \sum_{i} (y_{i} - z(f_{i}))^2 \quad \text{subject to } z \text{ non-decreasing}
$$
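The constrained least-squares problem above is usually solved with the pool-adjacent-violators (PAV) algorithm. This from-scratch sketch is only illustrative; `sklearn.isotonic.IsotonicRegression` is a maintained library implementation. It assumes the scores f are distinct, because `np.interp` expects increasing x-coordinates. The names `pav`, `fit_isotonic` and `predict_isotonic` are hypothetical.

```python
import numpy as np


def pav(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y (y ordered by score)."""
    blocks = []  # each block is [sum of targets, number of points]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


def fit_isotonic(f, y):
    """Sort by score f and return (sorted scores, fitted values m_hat(f_i))."""
    f = np.asarray(f, dtype=float)
    order = np.argsort(f, kind="stable")
    return f[order], pav(np.asarray(y, dtype=float)[order])


def predict_isotonic(f_new, f_sorted, fitted):
    """Linear interpolation between fitted points; clamps outside the training range."""
    return np.interp(f_new, f_sorted, fitted)


# Usage
f_sorted, fitted = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5], [1, 0, 0, 1, 1])
# fitted ≈ [1/3, 1/3, 1/3, 1, 1]: the first three labels are pooled into one block
```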

**Temperature Scaling**

$$
\hat{q}_{i} = \max_{k} \, \sigma_{SM}(z_{i}/T)^{(k)}
$$
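Temperature scaling learns a single scalar T > 0 on validation logits by minimizing NLL. Dividing by T does not change the argmax, so accuracy is preserved. The sketch assumes the search bounds (0.05, 20). It uses synthetic overconfident logits, so the fitted T has a known target near 3. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax


def nll_at_temperature(t, logits, labels):
    """Mean negative log-likelihood of the labels under softmax(logits / t)."""
    log_probs = log_softmax(logits / t, axis=1)
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels, bounds=(0.05, 20.0)):
    """Scalar T minimizing validation NLL within the (assumed) bounds."""
    result = minimize_scalar(nll_at_temperature, bounds=bounds,
                             args=(logits, labels), method="bounded")
    return result.x


def calibrated_confidence(logits, t):
    """q_hat_i = max_k softmax(z_i / T)^(k)."""
    return softmax(logits / t, axis=1).max(axis=1)


# Usage: labels are drawn from softmax(base), but the "model" outputs 3 * base,
# so it is overconfident and the fitted T should land near 3
rng = np.random.default_rng(0)
base = rng.normal(scale=2.0, size=(5000, 5))
true_probs = softmax(base, axis=1)
labels = (true_probs.cumsum(axis=1) > rng.random((5000, 1))).argmax(axis=1)
logits = 3.0 * base
T = fit_temperature(logits, labels)
q_hat = calibrated_confidence(logits, T)
```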


#### Paper - Obtaining Well Calibrated Probabilities Using Bayesian Binning
- Naeini et al.
- AAAI 2015
- Bayesian Binning into Quantiles (BBQ)
- post-processes the output of a binary classification algorithm
- In all such post-processing methods, calibration can be seen as a function that maps the output of a prediction model to probabilities that are intended to be well calibrated
- BBQ extends the simple histogram-binning calibration method by considering multiple binning models and their combination
- Each binning model uses equal-frequency bins: the training data points are distributed equally across all bins
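BBQ averages over several histogram-binning models. It helps to first see one such model on its own. In the minimal sketch below, bin edges are quantiles of the training scores, and each bin's calibrated probability is the fraction of positives in it. The names `_bin_index`, `fit_histogram_binning` and `predict_histogram_binning` are hypothetical. The fallback for empty bins is an assumption.

```python
import numpy as np


def _bin_index(scores, edges):
    """Index of the equal-frequency bin each score falls into (clamped to valid bins)."""
    n_bins = len(edges) - 1
    return np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)


def fit_histogram_binning(scores, labels, n_bins=10):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Quantile edges put roughly the same number of training points in each bin
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = _bin_index(scores, edges)
    # Calibrated probability of a bin = fraction of positives in it;
    # empty bins (possible with tied scores) fall back to the overall positive rate
    bin_probs = np.array([
        labels[bin_ids == b].mean() if np.any(bin_ids == b) else labels.mean()
        for b in range(n_bins)
    ])
    return edges, bin_probs


def predict_histogram_binning(scores, edges, bin_probs):
    return bin_probs[_bin_index(np.asarray(scores, dtype=float), edges)]


# Usage
edges, bin_probs = fit_histogram_binning(
    [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], [0, 0, 0, 1, 0, 1, 1, 1], n_bins=2)
calibrated = predict_histogram_binning([0.05, 0.45, 0.55, 0.95], edges, bin_probs)
# bin_probs ≈ [0.25, 0.75]; calibrated ≈ [0.25, 0.25, 0.75, 0.75]
```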


Score
$$
Score(M) = P(M) \cdot P(\mathcal{D}|M) \newline
\text{Where:} \newline
P(\mathcal{D}|M) = \prod_{b=1}^B \frac{\Gamma(\frac{N'}{B})}{\Gamma(N_{b}+\frac{N'}{B})}
\frac{\Gamma(m_{b}+\alpha_{b})}{\Gamma(\alpha_{b})}
\frac{\Gamma(n_{b}+\beta_{b})}{\Gamma(\beta_{b})}\newline
P(M) = \text{prior over models (uniform)}
$$
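The score above is best computed in log space with `scipy.special.gammaln`. This sketch makes several assumptions about symbols the formula leaves undefined:
- N_b is the number of training points in bin b.
- m_b and n_b are its class-1 and class-0 counts.
- α_b = (N'/B)·p_b and β_b = (N'/B)·(1 − p_b), with p_b the bin midpoint.
- N' = 2 is an equivalent-sample-size hyperparameter.

Check these choices against the paper. With a uniform P(M), the log score equals log P(D|M) up to a constant. The name `log_bbq_score` is hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def log_bbq_score(scores, labels, n_bins, n_prime=2.0):
    """log P(D | M) for one equal-frequency binning model M with n_bins bins."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)
    a = n_prime / n_bins                                   # N' / B
    log_ml = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        N_b = in_bin.sum()                                 # points in bin b
        m_b = labels[in_bin].sum()                         # class-1 points in bin b
        n_b = N_b - m_b                                    # class-0 points in bin b
        # Bin midpoint as the prior mean (assumption); clipped so alpha_b, beta_b stay > 0
        p_b = np.clip(0.5 * (edges[b] + edges[b + 1]), 1e-6, 1.0 - 1e-6)
        alpha_b, beta_b = a * p_b, a * (1.0 - p_b)
        log_ml += (gammaln(a) - gammaln(N_b + a)
                   + gammaln(m_b + alpha_b) - gammaln(alpha_b)
                   + gammaln(n_b + beta_b) - gammaln(beta_b))
    return log_ml


# Usage: score candidate models with different numbers of bins
rng = np.random.default_rng(0)
scores = rng.random(500)
labels = (rng.random(500) < scores).astype(int)
log_scores = {B: log_bbq_score(scores, labels, B) for B in (2, 5, 10, 20)}
```

Each bin's factor is the probability of its label sequence under a Beta prior. That probability is below 1, so every log score is negative, and for realistic dataset sizes it is large in magnitude.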

Calibrated Predictions

$$
P(z=1|y) = \sum_{i=1}^{T} \frac{Score(M_{i})}{\sum_{j=1}^{T}Score(M_{j})}P(z=1|y, M_{i})
$$
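The averaging formula is a softmax over log scores. Log marginal likelihoods are typically large negative numbers, so exponentiating them directly underflows. The sketch normalizes with `scipy.special.logsumexp` instead. It assumes the per-model predictions P(z=1|y, M_i) have already been computed, for example by histogram binning. The name `bbq_average` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def bbq_average(log_scores, model_probs):
    """Weighted average of per-model predictions P(z=1 | y, M_i).

    log_scores: shape (T,), log Score(M_i) for each binning model.
    model_probs: shape (T, n_points), each model's calibrated probability per point.
    """
    log_scores = np.asarray(log_scores, dtype=float)
    model_probs = np.asarray(model_probs, dtype=float)
    # Score(M_i) / sum_j Score(M_j), computed in log space to avoid underflow
    weights = np.exp(log_scores - logsumexp(log_scores))
    return weights @ model_probs


# Usage: two models whose scores are in ratio 1:3, so the weights are 0.25 and 0.75
combined = bbq_average(np.log([1.0, 3.0]), [[0.2, 0.6], [0.4, 0.8]])
# combined ≈ [0.35, 0.75]
```

Shifting all log scores by the same constant leaves the weights unchanged. This is why the uniform prior P(M) can be dropped.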

#### Paper - Trainable Calibration Measures For Neural Networks From Kernel Mean Embeddings
- Kumar et al.
- ICML 2018
- Proposes a practical and principled fix: minimize calibration error during training, jointly with classification error
- On high-capacity neural networks, minimizing NLL alone fails to minimize calibration error because of overfitting


Trainable objective that combines the classification loss (NLL) with a calibration error term (CE)


$$
\min_{\theta} \; NLL(D,\theta) + \alpha \; CE(D,\theta)
$$



$$
MMCE_{m}(D) = \left\| \sum_{(r_{i},c_{i})\in D} \frac{(c_{i}-r_{i})\; \phi(r_{i})}{m} \right\|_\mathcal{H}
$$
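MMCE can be evaluated with the kernel trick. The squared RKHS norm expands to Σ_{i,j} (c_i − r_i)(c_j − r_j) k(r_i, r_j) / m². The sketch assumes:
- r_i is the confidence of the predicted class.
- c_i ∈ {0, 1} marks whether that prediction is correct.
- k is a Laplacian kernel exp(−|r − r'| / 0.4); both the kernel and its width are assumptions here.

It only evaluates the combined objective in NumPy; training with it would need an autodiff framework. The function name `mmce` and the weight `alpha = 1.0` are hypothetical.

```python
import numpy as np


def mmce(confidences, correct, width=0.4):
    """MMCE_m(D) via the kernel trick with a Laplacian kernel."""
    r = np.asarray(confidences, dtype=float)
    c = np.asarray(correct, dtype=float)
    m = len(r)
    diff = c - r                                              # (c_i - r_i)
    kernel = np.exp(-np.abs(r[:, None] - r[None, :]) / width)  # k(r_i, r_j)
    # ||sum_i (c_i - r_i) phi(r_i) / m||_H^2 = sum_ij (c_i - r_i)(c_j - r_j) k(r_i, r_j) / m^2
    squared = diff @ kernel @ diff / m**2
    return np.sqrt(max(squared, 0.0))


# Usage: the combined objective NLL + alpha * MMCE on a toy batch
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([0, 1, 1, 1])
conf = probs.max(axis=1)                          # r_i
correct = probs.argmax(axis=1) == labels          # c_i
nll = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
alpha = 1.0
objective = nll + alpha * mmce(conf, correct)
```

When all confidences are equal, the kernel matrix is all ones and MMCE reduces to |mean(c) − mean(r)|. For four predictions at 0.9 confidence with half of them correct, that gives 0.4.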


#### Paper - Regularizing Neural Networks by Penalizing Confident Output Distributions
- Pereyra et al. (with Hinton and Lukasz Kaiser, a co-author of the Transformer paper)
- ICLR 2017

- Label smoothing + Confidence penalization


- probabilities assigned to class labels that are incorrect (according to the training data) are part of the knowledge of the network


$$
\mathcal{L}(\theta) = - \sum \log p_{\theta}(y|x) - \beta \; H(p_{\theta}(y|x))
$$

where
$$
H(p_{\theta} (y|x)) = - \sum_{i} p_{\theta}(y_{i}|x) \; \log(p_{\theta}(y_{i}|x))
$$
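The confidence penalty subtracts β times the output entropy from the NLL, so low-entropy (overconfident) predictions cost more. The minimal NumPy sketch below averages over the batch instead of summing, which only rescales the loss. `beta=0.1` is an illustrative default, not a value taken from the paper, and the function name is hypothetical.

```python
import numpy as np
from scipy.special import log_softmax


def confidence_penalty_loss(logits, labels, beta=0.1):
    """Batch mean of -log p(y|x) - beta * H(p(.|x))."""
    log_p = log_softmax(np.asarray(logits, dtype=float), axis=1)
    p = np.exp(log_p)
    nll = -log_p[np.arange(len(labels)), labels]      # -log p_theta(y|x)
    entropy = -(p * log_p).sum(axis=1)                # H(p_theta(y|x))
    return np.mean(nll - beta * entropy)


# Usage: uniform predictions over 3 classes have NLL = H = log 3
loss = confidence_penalty_loss(np.zeros((2, 3)), np.array([0, 2]), beta=0.1)
# loss ≈ 0.9 * log(3) ≈ 0.989
```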

#### Paper - Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
- Kull et al.
- NeurIPS 2019



**Classwise-ECE**
$$
\text{classwise-ECE} = \frac{1}{k} \sum_{j=1}^{k} \sum_{i=1}^{m}\frac{|B_{i,j}|}{n}|acc(B_{i,j})-conf(B_{i,j})| \newline
$$
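In classwise-ECE, each class j is binned separately on its own predicted probability. Within bin B_{i,j}, acc is read as the fraction of points whose true class is j, and conf as the mean predicted probability of class j. This reading is an assumption consistent with the formula. The sketch uses equal-width bins; the name `classwise_ece` and the default `n_bins=15` are hypothetical.

```python
import numpy as np


def classwise_ece(probs, labels, n_bins=15):
    """Average over classes j of the binned calibration error of the class-j probability."""
    probs = np.asarray(probs, dtype=float)            # shape (n, k)
    labels = np.asarray(labels)
    n, k = probs.shape
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    total = 0.0
    for j in range(k):
        p_j = probs[:, j]                              # predicted probability of class j
        is_j = (labels == j).astype(float)             # 1 where the true class is j
        bin_ids = np.digitize(p_j, inner_edges, right=True)
        for i in range(n_bins):
            in_bin = bin_ids == i                      # B_{i,j}
            if in_bin.any():
                # acc(B_ij): fraction of class j; conf(B_ij): mean predicted prob of class j
                gap = abs(is_j[in_bin].mean() - p_j[in_bin].mean())
                total += in_bin.sum() / n * gap
    return total / k


# Usage
cw_ece = classwise_ece([[0.75, 0.25], [0.75, 0.25], [0.35, 0.65], [0.35, 0.65]],
                       [0, 1, 1, 1], n_bins=10)
# cw_ece ≈ 0.3 (each class contributes 0.5 * 0.25 + 0.5 * 0.35)
```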

#### Paper - Brier Score


Brier Score

$$
BS = \frac{1}{n_{samples}}\sum_{i=0}^{n_{samples}-1}(y_{i}-p_{i})^2
$$

#### Paper - Calibrating Deep Neural Networks using Focal Loss
- Mukhoti et al.
- NeurIPS 2020

$$
\mathcal{L}_{f}=-(1-\hat{p}_{i,y_{i}})^\gamma \log \hat{p}_{i,y_{i}}
$$

$$
\mathcal{L}_{f} \geq KL(q||\hat{p}) - \gamma \mathbb{H}[\hat{p}]
$$
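For a one-hot target q, KL(q‖p̂) equals −log p̂_y, which gives one way to see the bound. For γ ≥ 1, Bernoulli's inequality gives (1 − p)^γ ≥ 1 − γp. In addition, p_y log p_y ≥ Σ_j p_j log p_j, so L_f ≥ −log p_y − γH[p̂]. The sketch below computes both sides and checks the inequality numerically on random predictions with γ = 3. The value of γ and the helper names are illustrative assumptions.

```python
import numpy as np


def focal_loss(p_true, gamma=3.0):
    """L_f = -(1 - p)^gamma * log p, where p is the predicted probability of the true class."""
    p_true = np.asarray(p_true, dtype=float)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)


def focal_lower_bound(probs, label, gamma=3.0):
    """KL(q || p_hat) - gamma * H[p_hat] for a one-hot target q on `label`."""
    probs = np.asarray(probs, dtype=float)
    kl = -np.log(probs[label])                 # KL(one-hot || p_hat) = -log p_hat[label]
    entropy = -np.sum(probs * np.log(probs))   # H[p_hat]
    return kl - gamma * entropy


# Usage: check L_f >= KL - gamma * H on random predicted distributions
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(5), size=1000)
labels = rng.integers(0, 5, size=1000)
gaps = np.array([focal_loss(p[y]) - focal_lower_bound(p, y) for p, y in zip(samples, labels)])
```

With γ = 0 the focal loss reduces to cross-entropy. Larger γ down-weights examples that the model already assigns high probability to.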


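```python
import numpy as np
from sklearn.metrics import brier_score_loss, mean_squared_error

y_true = np.array([0, 1, 1, 0])
y_true_categorical = np.array(["spam", "ham", "ham", "spam"])  # string labels (unused below)
y_prob = np.array([0.1, 0.9, 0.8, 0.3])  # predicted probability of class 1

# For 0/1 labels, the Brier score is the MSE between labels and positive-class probabilities
mean_squared_error(y_true, y_prob)
# 0.03749999999999999

brier_score_loss(y_true, y_prob)
# 0.03749999999999999

# 1 - y_prob is the probability of class 0; with pos_label=0 the score is unchanged
brier_score_loss(y_true, 1 - y_prob, pos_label=0)
# 0.0375

# Without pos_label=0, 1 - y_prob is scored as the probability of class 1
brier_score_loss(y_true, 1 - y_prob)
# 0.6875
```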