In [1]:
import numpy as np
import torch

In [2]:
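# Univariate Normal with mean 0 and variance 0.1 (Normal takes the standard deviation)
dist = torch.distributions.Normal(0.0, torch.sqrt(torch.tensor([0.1])))
x = torch.tensor(0.0)  # evaluate at the mean; log_prob takes a tensor
print("NegLL", -dist.log_prob(x))
print("Likelihood", torch.exp(dist.log_prob(x)))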

NegLL tensor([-0.2324])
Likelihood tensor([1.2616])


In [3]:
# Bivariate Normal with zero mean and diagonal covariance 0.1 * I
mean = torch.zeros(2)
dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=0.1 * torch.eye(2))
log_prob = dist.log_prob(mean)  # evaluate at the mean
print("NegLL", -log_prob)
print("Likelihood", torch.exp(log_prob))
# Dividing the log-likelihood by d = 2 makes it comparable to the 1D case
print("Normalized NegLL", -log_prob / 2)
print("Normalized Likelihood", torch.exp(log_prob / 2))

NegLL tensor(-0.4647)
Likelihood tensor(1.5915)
Normalized NegLL tensor(-0.2324)
Normalized Likelihood tensor(1.2616)


## 1D Normal Log-Likelihood

The log-likelihood of a single sample $x$ under a Normal distribution with mean $\mu$ and variance $\sigma^2$ is: <br>
$ll = -0.5 \log(2\pi) - 0.5 \log(\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}$ <br>
When we treat a $d$-dimensional sample as $d$ independent univariate Normal samples, we compute the log-likelihood of each dimension and take the mean, which gives us <br>
$LL = \frac{1}{d} \sum_{i=1}^d \left( -0.5 \log(2\pi) - 0.5 \log(\sigma_i^2) - \frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right)$
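
The mean per-dimension log-likelihood can be checked numerically. The following is a minimal sketch: the helper name `mean_normal_log_likelihood` and the example values for `x`, `mu`, and `var` are assumptions made for illustration, and `torch.distributions.Normal` serves as the reference.

```python
import math
import torch


def mean_normal_log_likelihood(x, mu, var):
    """Mean over dimensions of the univariate Normal log-likelihood."""
    per_dim = (
        -0.5 * math.log(2 * math.pi)
        - 0.5 * torch.log(var)
        - (x - mu) ** 2 / (2 * var)
    )
    return per_dim.mean()


# Illustrative values (assumed, not taken from the cells above)
x = torch.tensor([0.3, -0.2, 0.1])
mu = torch.zeros(3)
var = torch.tensor([0.1, 0.2, 0.5])

manual = mean_normal_log_likelihood(x, mu, var)
# Normal is parameterized by the standard deviation, hence the sqrt
reference = torch.distributions.Normal(mu, torch.sqrt(var)).log_prob(x).mean()
print(manual, reference)
```

Both values should agree up to floating-point error, since the helper implements the formula term by term.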

## Multivariate Normal Log-Likelihood

The log-likelihood of a single observation $x$ with $d$ observed dimensions is: <br>
$LL = -0.5 d \log(2\pi) - 0.5 \log(|\Sigma|) - 0.5 MD(x, \mu, \Sigma)^2$ <br>
where $\Sigma$ is the covariance matrix and $MD$ denotes the Mahalanobis distance, $MD(x, \mu, \Sigma) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$. <br>
If we assume a diagonal covariance, the determinant is the product of the diagonal elements, so the log-determinant is the sum of their logarithms. Writing $d \log(2\pi)$ as a sum as well gives <br>
$LL = -0.5\sum_{i=1}^d \log(2\pi) - 0.5\sum_{i=1}^d \log(\Sigma_{i,i}) - 0.5 MD(x, \mu, \Sigma)^2$ <br>
Under the same assumption, with $\Sigma_{i,i} = \sigma_i^2$, the Mahalanobis distance reduces to a standardized Euclidean distance: <br>
$LL = -0.5\sum_{i=1}^d \log(2\pi) - 0.5\sum_{i=1}^d \log(\Sigma_{i,i}) - 0.5 \left( \sqrt{\sum_{i=1}^d \frac{(x_i - \mu_i)^2}{\sigma_i^2}} \right)^2$ <br>
$LL = -0.5\sum_{i=1}^d \log(2\pi) - 0.5\sum_{i=1}^d \log(\Sigma_{i,i}) - \sum_{i=1}^d \frac{(x_i - \mu_i)^2}{2\sigma_i^2}$ <br>
Pulling out the sum, we get <br>
$LL = \sum_{i=1}^d \left( -0.5\log(2\pi) - 0.5\log(\Sigma_{i,i}) - \frac{(x_i - \mu_i)^2}{2\Sigma_{i,i}} \right)$ <br>
This is the sum of the $d$ univariate log-likelihoods, so we indeed have to normalize by $\frac{1}{d}$ to make it comparable with the mean log-likelihood of the 1D case.
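
The reduction can also be checked numerically. The sketch below assumes a diagonal covariance and illustrative values for `x`, `mu`, and `var` (none of them come from the cells above). It compares the `MultivariateNormal` log-likelihood with the sum of independent `Normal` log-likelihoods and with the closed form built from the squared Mahalanobis distance.

```python
import math
import torch

# Illustrative values (assumed for this sketch)
x = torch.tensor([0.3, -0.2, 0.1])
mu = torch.zeros(3)
var = torch.tensor([0.1, 0.2, 0.5])  # diagonal of the covariance matrix
d = x.numel()

# Joint log-likelihood under a multivariate Normal with diagonal covariance
mvn = torch.distributions.MultivariateNormal(mu, covariance_matrix=torch.diag(var))
joint_ll = mvn.log_prob(x)

# Per-dimension log-likelihoods under independent univariate Normals
per_dim_ll = torch.distributions.Normal(mu, torch.sqrt(var)).log_prob(x)

# Closed form: for a diagonal covariance the squared Mahalanobis distance
# is the sum of squared, variance-scaled differences
md_sq = ((x - mu) ** 2 / var).sum()
manual_ll = -0.5 * d * math.log(2 * math.pi) - 0.5 * torch.log(var).sum() - 0.5 * md_sq

print("joint:", joint_ll)
print("sum of 1D:", per_dim_ll.sum())
print("closed form:", manual_ll)
print("normalized joint:", joint_ll / d, "mean of 1D:", per_dim_ll.mean())
```

The joint value matches the sum of the 1D terms, and dividing it by `d` matches their mean, up to floating-point error.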