## Demonstration on 6x6 images, revisited

### Challenge: sampling from non-normalized distribution?

Going by Doshi-Velez's presentation, it looks like we can sample from a distribution that is only known up to proportionality, as long as it has a well-defined shape that we know how to sample from, typically a Gaussian.  There are probably ways of handling other distributions too, but a Gaussian should be reasonably straightforward to sample from, if we can get the distribution into that form.  Let's go right back to the first section, and see which distribution(s) we need to sample from.

Equation 22 from the Griffiths and Ghahramani tutorial states:

$$
P(z_{ik} \mid \mathbf{X}, \mathbf{Z}_{-(i,k)}, \sigma_X, \sigma_A)
\propto
p(\mathbf{X} \mid \mathbf{Z}, \sigma_X, \sigma_A)
\,
P(z_{ik} \mid \mathbf{z}_{-i,k})
$$

The first term of this equation, i.e. the likelihood of $\mathbf{X}$ given the latent variables and the hyper-parameters, is a Gaussian.  For the finite model, the second term, $P(z_{ik} \mid \mathbf{z}_{-i,k})$, is given by equation 17 in the tutorial:

$$
P(z_{ik} = 1 \mid \mathbf{z}_{-i,k})
= \frac{m_{-i,k} + \frac{\alpha}{K}}
  {N + \frac{\alpha}{K}}
$$

This does not look like a Gaussian.  How do we sample from the product of a Gaussian and this term?
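To get a feel for what this prior term is, here is a minimal sketch that evaluates the equation 17 formula on a small binary matrix.  The function name `prior_zik_is_one`, the toy matrix `Z`, and the value of `alpha` are made up for illustration; none of them come from the tutorial.

```python
import numpy as np

def prior_zik_is_one(Z, i, k, alpha):
    """P(z_ik = 1 | z_{-i,k}) for the finite model, per the formula above."""
    N, K = Z.shape
    # m_{-i,k}: how many objects other than i currently have feature k
    m_minus = Z[:, k].sum() - Z[i, k]
    return (m_minus + alpha / K) / (N + alpha / K)

# toy assignment matrix: N = 4 objects, K = 3 features (hypothetical values)
Z = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])

p_one = prior_zik_is_one(Z, i=0, k=0, alpha=2.0)
print(p_one)
```

With these toy values, $m_{-i,k} = 2$, so the result is $(2 + 2/3) / (4 + 2/3) = 4/7$: a single number between 0 and 1, rather than a density over a continuous variable.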

### Interlude: is the product of two normalized distributions also normalized?

Brainstorming a bit: we could take the Gaussian, normalize it first, and then multiply by $P(z_{ik} = 1 \mid \mathbf{z}_{-i,k})$.  Is it fair to say that the product of two normalized probability functions will itself be normalized?  Probably not.  For example, we could have the following two distributions:

$$
f(x) =
\begin{cases}
1 & \mathrm{when\ } 0 \le x \le 1 \\
0 & \mathrm{otherwise}
\end{cases}
$$

(which integrates to 1), and:

$$
g(x) =
\begin{cases}
1 & \mathrm{when\ } 2 \le x \le 3 \\
0 & \mathrm{otherwise}
\end{cases}
$$

... which integrates to 1 too.  But the two never overlap, so their product is 0 everywhere, and integrates to 0.
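As a quick numerical sanity check of this counter-example, here is a small sketch using a plain Riemann sum on a grid.  The grid range and resolution are arbitrary choices, and the integrals of `f` and `g` are only approximate.

```python
import numpy as np

def f(x):
    # uniform density on [0, 1]
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

def g(x):
    # uniform density on [2, 3]
    return np.where((x >= 2) & (x <= 3), 1.0, 0.0)

# grid that covers both supports, with spacing dx
x = np.linspace(-1.0, 4.0, 500001)
dx = x[1] - x[0]

int_f = np.sum(f(x)) * dx          # approximately 1
int_g = np.sum(g(x)) * dx          # approximately 1
int_fg = np.sum(f(x) * g(x)) * dx  # exactly 0: the supports never overlap

print(int_f, int_g, int_fg)
```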

### Integrate the un-normalized distribution over $z_{ik}$?

Actually, the equation for the probability of $z_{ik} = 1$ is not itself a probability distribution: it's the value of this probability for one specific value of $z_{ik}$, i.e. $1$.

Let's try integrating $c \cdot p(\mathbf{X} \mid \mathbf{Z}, \sigma_X, \sigma_A) \cdot P(z_{ik} \mid \mathbf{z}_{-i,k})$ over $z_{ik}$, using the full probability distribution of $z_{ik}$ rather than just one specific value.  Here $c$ is a normalization constant that makes the integrand integrate to $1$, and $P(z_{ik})$ is shorthand for $P(z_{ik} \mid \mathbf{z}_{-i,k})$:

$$
\int
c
\cdot
P(\mathbf{X} \mid \mathbf{Z}, \sigma_X, \sigma_A)
\cdot
P(z_{ik})
\,
dz_{ik}
$$

And since $z_{ik}$ is discrete, i.e. $z_{ik} \in \{0, 1\}$, we can rewrite the integral as a sum:

$$
=
c
\sum_{z_{ik}=0}^1
\left(
    P(\mathbf{X} \mid \mathbf{Z}, \sigma_X, \sigma_A)
    \cdot
    P(z_{ik})
\right)
$$

$$
=
c
\sum_{z_{ik}=0}^1
\left(
    \mathcal{N}(\mathbf{X}; \mu_{\mathbf{Z}, \sigma_A, \sigma_X}, \Sigma_{\mathbf{Z}, \sigma_A, \sigma_X})
    \cdot
    P(z_{ik})
\right)
$$
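Since the whole expression has to equal $1$, the constant $c$ follows directly.  Writing that step out, with $\mathcal{N}_0$ and $\mathcal{N}_1$ introduced here (they are not in the tutorial) as shorthand for the Gaussian evaluated with $z_{ik} = 0$ and $z_{ik} = 1$ respectively:

$$
c = \frac{1}{\mathcal{N}_0 \cdot P(z_{ik} = 0) + \mathcal{N}_1 \cdot P(z_{ik} = 1)}
$$

$$
P(z_{ik} = 1 \mid \mathbf{X}, \mathbf{Z}_{-(i,k)}, \sigma_X, \sigma_A)
=
\frac{\mathcal{N}_1 \cdot P(z_{ik} = 1)}
  {\mathcal{N}_0 \cdot P(z_{ik} = 0) + \mathcal{N}_1 \cdot P(z_{ik} = 1)}
$$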



So, it seems like maybe we can simply calculate the value of the Gaussian for each $z_{ik} \in \{0, 1\}$, multiply each by the corresponding $P(z_{ik} \mid \mathbf{z}_{-i,k})$, and then normalize the two products by their sum?  Just to make this concrete, let's say we have:

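```python
import numpy as np

# pretend Gaussian likelihood values for z_ik = 0 and z_ik = 1, not normalized
p_X_given_Z = [0.03, 0.02]
# prior P(z_ik | z_{-i,k}) for z_ik = 0 and z_ik = 1, normalized, sums to 1.0
p_zik_given_Z_minus = [0.8, 0.2]

# un-normalized posterior: likelihood times prior, for each value of z_ik
p_zik_given_X_Z = [0] * 2
for zik in [0, 1]:
    p_zik_given_X_Z[zik] = p_X_given_Z[zik] * p_zik_given_Z_minus[zik]

print(p_zik_given_X_Z)

# normalize, converting to an array so the division is element-wise
p_zik_given_X_Z = np.array(p_zik_given_X_Z) / np.sum(p_zik_given_X_Z)
print('normalized p_zik_given_X_Z', p_zik_given_X_Z)
```

```
[0.024, 0.004]
normalized p_zik_given_X_Z [ 0.85714286  0.14285714]
```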


So, the normalized values, with this toy data, are influenced by both the likelihood, and by the prior.
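Once the two values are normalized, drawing $z_{ik}$ is just a biased coin flip.  Here is a rough sketch of that step, under some assumptions of mine: the helper name `sample_zik` is hypothetical, the likelihood is passed in as log values (for real image data the Gaussian likelihood could be small enough to underflow, which is my own precaution and not something the tutorial prescribes), and the prior is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def sample_zik(log_lik, prior_one, rng):
    """Draw z_ik from its normalized conditional distribution.

    log_lik:   log p(X | Z, ...) evaluated with z_ik = 0 and with z_ik = 1
    prior_one: P(z_ik = 1 | z_{-i,k}), assumed strictly between 0 and 1
    rng:       a numpy random Generator
    """
    log_lik = np.asarray(log_lik, dtype=float)
    log_prior = np.log(np.array([1.0 - prior_one, prior_one]))
    # log of the un-normalized posterior: log likelihood + log prior
    log_post = log_lik + log_prior
    # subtract the max before exponentiating, to avoid underflow
    log_post = log_post - log_post.max()
    post = np.exp(log_post)
    post = post / post.sum()
    # Bernoulli draw: z_ik = 1 with probability post[1]
    z = int(rng.random() < post[1])
    return z, post

rng = np.random.default_rng(0)
# same toy numbers as above: likelihoods 0.03 and 0.02, prior P(z_ik = 1) = 0.2
z, post = sample_zik(np.log([0.03, 0.02]), prior_one=0.2, rng=rng)
print('posterior', post, 'sampled z_ik', z)
```

With the toy numbers, `post` matches the normalized values printed above; the sampled `z` depends on the random seed.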

Let's run with this.