# Understanding Importance Sampling

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# np.random.choice(np.arange(self.size), size=self.batch_size, replace=False, p=fair_dice)

Modeling a fair and a biased die as $f(x)$ and $g(x)$, respectively. Both functions take a zero-based face index, so `f(0)` is the probability of rolling a 1.

In [25]:
def f(x):
    # Probability of face index x (0..5) on the fair die: uniform 1/6
    fair_dice = np.ones(6)*(1/6)
    return fair_dice[x]

In [28]:
def g(x):
    # Probability of face index x (0..5) on the biased die; low faces are more likely
    biased_dice = np.array([0.3, 0.25, 0.2, 0.125, 0.075, 0.05])
    return biased_dice[x]

Since both distributions are known, we can compute their expectations exactly as population means.

$$\mathbb{E}_f[x]=\sum_{x}{xf(x)}$$

and

$$\mathbb{E}_g[x]=\sum_{x}{xg(x)}$$

In [32]:
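exp_f = 0
# fair_dice is local to f(), so loop over the six face indices directly
for i in range(6):
    exp_f += (i+1)*f(i)  # face value (i+1) times its probability
print(exp_f)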

3.5


In [34]:
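exp_g = 0
# the biased probabilities are local to g(), so loop over the six face indices directly
for i in range(6):
    exp_g += (i+1)*g(i)  # face value (i+1) times its probability
print(exp_g)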

2.575


What if we didn't have the distribution though? How would we then calculate the mean of the dice?

We could estimate it using the sample mean, $\bar{x}$, by sampling the unknown probability distribution.

$$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n}x_{i}^{f}$$
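
As a minimal sketch of the sample mean, the cell below rolls a simulated fair die and averages the results. The seed, the sample size `n`, and the names `rng`, `faces`, `fair_probs`, and `samples` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

faces = np.arange(1, 7)         # die faces 1..6
fair_probs = np.ones(6) / 6     # same probabilities that f(x) returns

n = 100_000                     # number of simulated rolls (assumed value)
samples = rng.choice(faces, size=n, p=fair_probs)  # n rolls of the fair die

sample_mean = samples.mean()    # (1/n) * sum of the rolled values
print(sample_mean)              # should land close to the exact value 3.5
```

With more rolls the estimate tends to move closer to the population mean of 3.5 computed earlier.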

Suppose we could only roll the fair die, not the biased one, and we wanted to work out the mean of the biased die. Could we nonetheless come up with a decent approximation of it?

$$\mathbb{E}_{g}[x]=\sum_{x}xg(x)=\sum_{x}x\dfrac{g(x)}{f(x)}f(x)=\mathbb{E}_{f}\left[x\dfrac{g(x)}{f(x)}\right]\approx\dfrac{1}{n}\sum_{i=1}^{n}x_{i}\dfrac{g(x_i)}{f(x_i)},\quad x_i\sim f$$

The idea is that the expectation is now taken with respect to the fair die, so we can roll the fair die $n$ times and weight each outcome $x_i$ by the ratio $g(x_i)/f(x_i)$ of the biased probability to the fair probability. Averaging these weighted outcomes approximates the mean of the biased die, even though we never roll it.
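
The estimator can be sketched as follows, assuming the same probabilities that `f` and `g` use. The arrays `fair_probs` and `biased_probs`, the seed, and the sample size `n` are illustrative choices rather than part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

fair_probs = np.ones(6) / 6                                    # f(x)
biased_probs = np.array([0.3, 0.25, 0.2, 0.125, 0.075, 0.05])  # g(x)

n = 100_000                        # number of fair-die rolls (assumed value)
idx = rng.integers(0, 6, size=n)   # uniform indices 0..5, i.e. rolls of the fair die
faces = idx + 1                    # face values 1..6

# Importance weights g(x_i) / f(x_i) for each sampled face
weights = biased_probs[idx] / fair_probs[idx]

# Average of x_i * g(x_i) / f(x_i) over samples drawn from f
is_estimate = np.mean(faces * weights)
print(is_estimate)                 # should land close to the exact value 2.575
```

Only the fair die is ever sampled here; the biased distribution enters solely through the weights, which is why $g(x)$ must still be evaluable at each sampled outcome.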