# Assigning Probabilities by Feature


Suppose we have $\mathbf{x}\in\mathbb{R}^m,\ \boldsymbol{\mu}\in\mathbb{R}^m,\text{ and }\boldsymbol{\Sigma}\in\mathbb{R}^{m\times m}$ such that $\boldsymbol{\Sigma}^\text{T}=\boldsymbol{\Sigma}$ and the eigenvalues of $\boldsymbol{\Sigma}$ are all positive, i.e. $\boldsymbol{\Sigma}$ is symmetric positive definite.
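Both conditions can be checked numerically before using $\boldsymbol{\Sigma}$ as a covariance. A minimal sketch; `is_spd` is a hypothetical helper and the eigenvalue tolerance `tol` is an assumption:

```python
import numpy as np


def is_spd(sigma: np.ndarray, tol: float = 1e-12) -> bool:
    """Hypothetical helper: True if sigma is symmetric with all eigenvalues > tol."""
    sigma = np.asarray(sigma, dtype=float)
    # must be a square matrix
    if sigma.ndim != 2 or sigma.shape[0] != sigma.shape[1]:
        return False
    # symmetry: Sigma^T == Sigma, up to floating-point noise
    if not np.allclose(sigma, sigma.T):
        return False
    # eigvalsh assumes a symmetric input and returns real eigenvalues
    return bool(np.all(np.linalg.eigvalsh(sigma) > tol))


print(is_spd(np.eye(3)))                           # True: eigenvalues are all 1
print(is_spd(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: eigenvalues are 3 and -1
```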


Recall that the negative log pdf of a multivariate normal is

$$\begin{align*}
    - \log \text{pdf}(\mathbf{x}\ |\ \boldsymbol{\mu},\ \boldsymbol{\Sigma}) &= \frac{m}{2} \log 2 \pi + \frac{1}{2}\log |\boldsymbol{\Sigma}| + \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\text{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}),\qquad & \text{let } \mathbf{U}\mathbf{S}\mathbf{U}^\text{T} = \boldsymbol{\Sigma},\\
    &= \frac{m}{2} \log 2 \pi + \frac{1}{2} \log |\mathbf{U}\mathbf{S}\mathbf{U}^\text{T}| + \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\text{T}\mathbf{U}\mathbf{S}^{-1}\mathbf{U}^\text{T}(\mathbf{x} - \boldsymbol{\mu}),\qquad & \text{recall that } |\mathbf{AB}| = |\mathbf{A}| \cdot |\mathbf{B}| \text{ and } |\mathbf{U}| \cdot |\mathbf{U}^\text{T}| = |\mathbf{U}|^2 = 1 \text{ since } \mathbf{U} \text{ is orthogonal},\\
    &= \frac{1}{2} \log |2 \pi \mathbf{I}| + \frac{1}{2} \log |\mathbf{S}| + \frac{1}{2}\mathbf{v}^\text{T}\mathbf{S}^{-1}\mathbf{v}, & \text{let } \mathbf{v} = \mathbf{U}^\text{T}(\mathbf{x} - \boldsymbol{\mu}), \\
    &= \frac{1}{2} \log |2 \pi \mathbf{S}| + \frac{1}{2}\mathbf{v}^\text{T}\mathbf{S}^{-1}\mathbf{v},\\
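    &= \frac{1}{2} \sum_{j=1}^m \log 2 \pi s_{jj} + \frac{1}{2}\text{Tr}[\mathbf{vv}^\text{T}\mathbf{S}^{-1}], \qquad & \text{let } \mathbf{K} = \mathbf{vv}^\text{T}\mathbf{S}^{-1},\\
    &= \frac{1}{2} \sum_{j=1}^m \left(\log 2 \pi s_{jj} + k_{jj}\right), & \text{where } k_{jj} = v_j^2 / s_{jj},\\
    &\therefore\ \frac{1}{2} \left(\log 2 \pi s_{jj} + k_{jj}\right) & \text{is the negative } \log \text{ marginal density of } v_j \text{, the } j\text{-th decorrelated feature.}
\end{align*}
$$

Because $\mathbf{U}$ is orthogonal, $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ implies $\mathbf{v} \sim \mathcal{N}(\mathbf{0}, \mathbf{S})$ with $\mathbf{S}$ diagonal, so the components $v_j$ are independent and the density factorises into the $m$ one-dimensional terms above. Note that $v_j$ is the coordinate of $\mathbf{x} - \boldsymbol{\mu}$ along the $j$-th eigenvector of $\boldsymbol{\Sigma}$, so each term is attributed to an eigen-direction rather than to the raw feature $x_j$.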
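Each step of the derivation can be spot-checked numerically. The sketch below uses an arbitrary random SPD matrix (not the one used later in this notebook) and asserts each identity, including that every per-component term equals the negative log density of $v_j \sim \mathcal{N}(0, s_{jj})$ as computed by `scipy.stats.norm`:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# arbitrary example data (not the notebook's): any SPD matrix will do
rng = np.random.default_rng(1)
m = 4
mu = rng.normal(size=m)
A = rng.normal(size=(m, m))
sigma = A.T @ A + m * np.eye(m)        # symmetric positive definite by construction
x = rng.normal(size=m)

# Sigma = U S U^T, with s holding the diagonal of S
s, U = np.linalg.eigh(sigma)
v = U.T @ (x - mu)

# log|Sigma| = log|S| = sum_j log s_jj
sign, logdet = np.linalg.slogdet(sigma)
assert sign > 0 and np.isclose(logdet, np.log(s).sum())

# (x - mu)^T Sigma^{-1} (x - mu) = v^T S^{-1} v = Tr[v v^T S^{-1}] = Tr[K]
K = np.outer(v, v) @ np.diag(1 / s)
maha = (x - mu) @ np.linalg.solve(sigma, x - mu)
assert np.isclose(maha, v @ (v / s)) and np.isclose(maha, np.trace(K))

# k_jj = v_j^2 / s_jj
assert np.allclose(np.diag(K), v**2 / s)

# each term is -log N(v_j | 0, s_jj), and the terms sum to -log pdf(x | mu, Sigma)
per_component = 0.5 * (np.log(2 * np.pi * s) + np.diag(K))
assert np.allclose(per_component, -norm.logpdf(v, scale=np.sqrt(s)))
assert np.isclose(per_component.sum(), -multivariate_normal(mu, sigma).logpdf(x))
print(per_component)
```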

```python
from   scipy.stats import multivariate_normal
import numpy as np

# set seed for reproducibility
np.random.seed(0)

# no. of features
m              = 3

# mean and cov
mean           = np.random.normal(size = m)
temp           = np.random.normal(size = (m, m))
cov            = temp.T @ temp                                     # symmetric PSD; positive definite almost surely

# Gaussian distribution with the chosen mean and covariance
dist           = multivariate_normal(mean, cov)

# random test point (drawn from a standard normal, not from dist)
x              = np.random.normal(size = m)

# eigenvalues (ascending order) and orthonormal eigenvectors
s, U           = np.linalg.eigh(cov)

# helper variables
v              = U.T @ (x - mean)
S_inv          = np.diag(1 / s)

# negative log likelihood
nll            = np.log(2 * np.pi * s).sum() + v.T @ S_inv @ v
nll           /= 2

# matches scipy's computation
print(nll, -dist.logpdf(x))
```

```
4.210372454378047 4.210372454378047
```


```python
# computation for the final line of the derivation above
constant       = np.log(2 * np.pi * s) / 2
quad           = np.diag(np.outer(v, v) @ S_inv) / 2               # k_jj / 2 = v_j**2 / (2 * s_jj)

nll_by_feature = constant + quad

print(nll_by_feature)

nll_by_feature.sum()
```

```
[0.61349238 1.50395575 2.09292433]
4.210372454378047
```
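Each entry above mixes two effects: $\frac{1}{2}\log 2\pi s_{jj}$ grows with the variance along that eigen-direction regardless of the sample, while $\frac{1}{2}k_{jj} = v_j^2 / 2s_{jj}$ measures how unusual the sample is along it. A minimal sketch that reports the two parts separately; `split_nll` is a hypothetical helper, it regenerates the notebook's data assuming the same seed and draw order, and it replaces the $m \times m$ product `np.outer(v, v) @ S_inv` with the equivalent elementwise `v**2 / s`:

```python
import numpy as np
from scipy.stats import multivariate_normal


def split_nll(x, mean, cov):
    """Hypothetical helper: per-eigen-direction NLL split into its two parts.

    constant_j = log(2 * pi * s_j) / 2 depends only on the eigenvalue s_j;
    quad_j = v_j**2 / (2 * s_j) depends on the sample.
    """
    s, U = np.linalg.eigh(cov)          # eigenvalues in ascending order
    v = U.T @ (x - mean)                # coordinates in the eigenbasis
    constant = np.log(2 * np.pi * s) / 2
    quad = v**2 / (2 * s)               # same as diag(outer(v, v) @ S_inv) / 2
    return constant, quad


# regenerate the notebook's example with the same seed and draw order
np.random.seed(0)
m = 3
mean = np.random.normal(size=m)
temp = np.random.normal(size=(m, m))
cov = temp.T @ temp
x = np.random.normal(size=m)

constant, quad = split_nll(x, mean, cov)
print("constant:", constant)
print("quad:    ", quad)
print("total:   ", (constant + quad).sum(), -multivariate_normal(mean, cov).logpdf(x))
```

If an entry is large mainly because of `constant`, it reflects a wide direction of $\boldsymbol{\Sigma}$ rather than an unusual sample.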

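The 3rd term is the largest, so $v_3$ has the lowest marginal density. Because `np.linalg.eigh` returns eigenvalues in ascending order, the 3rd term belongs to the eigenvector with the largest eigenvalue. Let's inspect the difference between the sample and the mean.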

```python
x - mean
```

```
array([-1.00301462, -0.27848219, -0.53487475])
```

We can see that every component of $\mathbf{x}$ is below the corresponding component of $\boldsymbol{\mu}$, and that $x_1$, not $x_3$, is the furthest from its mean $\mu_1$. Let's inspect $\boldsymbol{\Sigma}$.

```python
cov
```

```
array([[ 6.09286146,  4.10033934, -1.69091987],
       [ 4.10033934,  3.5314304 , -1.60002145],
       [-1.69091987, -1.60002145,  3.08063762]])
```

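The signs of the off-diagonal entries of $\boldsymbol{\Sigma}$ show that $(x_1 - \mu_1)$ and $(x_2 - \mu_2)$ tend to have the same sign ($\Sigma_{12} > 0$), whereas $(x_3 - \mu_3)$ tends to have the opposite sign to both ($\Sigma_{13} < 0$ and $\Sigma_{23} < 0$). In this sample all three deviations are negative, which goes against that pattern.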
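Since the per-component terms are indexed by eigen-direction, connecting them back to the raw features means looking at the eigenvectors. Writing $v_j = \mathbf{u}_j^\text{T}(\mathbf{x}-\boldsymbol{\mu}) = \sum_i u_{ij}(x_i - \mu_i)$, the products $u_{ij}(x_i - \mu_i)$ (called `contributions` below, a hypothetical name) show how much each feature feeds into $v_j$. A minimal sketch, regenerating the notebook's data assuming the same seed and draw order. The last column of `U` is the largest-eigenvalue direction, i.e. the 3rd entry of `nll_by_feature`. Eigenvectors are only defined up to sign, so compare the sign *pattern* of `U[:, -1]` with the off-diagonal signs of $\boldsymbol{\Sigma}$, not the signs themselves.

```python
import numpy as np

# regenerate the notebook's example with the same seed and draw order
np.random.seed(0)
m = 3
mean = np.random.normal(size=m)
temp = np.random.normal(size=(m, m))
cov = temp.T @ temp
x = np.random.normal(size=m)

s, U = np.linalg.eigh(cov)              # ascending eigenvalues; columns of U are eigenvectors
d = x - mean

# contributions[i, j] = U[i, j] * d[i]: how much feature i feeds into v_j
contributions = U * d[:, None]
v = contributions.sum(axis=0)           # equals U.T @ d
assert np.allclose(v, U.T @ d)

# last column <-> largest eigenvalue <-> 3rd entry of nll_by_feature
print("eigenvalues:", s)
print("u_3 (largest-eigenvalue direction):", U[:, -1])
print("per-feature contributions to v_3:", contributions[:, -1])
print("v_3:", v[-1])
```

If the loadings in `U[:, -1]` have one sign for $x_1, x_2$ and the other for $x_3$, an all-negative deviation partly cancels inside $v_3$, which is one way the largest-variance direction can end up with a small quadratic term.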