# Importance Sampling

## Overview

Section [Basic Monte Carlo Integration](basic_monte_carlo_integration.ipynb) discussed 
Monte Carlo integration in its basic form. That method requires drawing samples from a known distribution
$f$. In some cases, however, sampling from $f$ directly is difficult.

In this section, we introduce <a href="https://en.wikipedia.org/wiki/Importance_sampling">importance sampling</a>.
It generalizes the basic Monte Carlo method. Instead of sampling from $f$, we sample from a different distribution that is easier to work with, and we correct for the mismatch by re-weighting the samples.

## Importance sampling

Let us consider once again the integral 

$$I=\int_a^b h(x) dx$$

and rewrite it as 

$$I=\int_a^b \omega(x)f(x)\,dx$$

where $f$ is a probability density that is positive on $[a,b]$ and $\omega(x)=h(x)/f(x)$.

Importance sampling introduces a new probability distribution $g$, also known as the proposal distribution [2], 
that is easier to sample from. Provided that $g(x)>0$ wherever $\omega(x)f(x)\neq 0$, we can rewrite the integral as

$$I=\int_a^b \frac{\omega(x)f(x)}{g(x)}g(x)dx=E_g \left[Y \right]$$

where $Y$ is the random variable

$$Y=\frac{\omega(X)f(X)}{g(X)}, \qquad X\sim g$$

Drawing $X_1,\dots,X_N$ independently from $g$ and setting $Y_i=\omega(X_i)f(X_i)/g(X_i)$, we estimate $I$ as

$$\hat{I}=\frac{1}{N}\sum_{i=1}^{N} Y_i$$

As in the basic Monte Carlo integration section, the law of 
large numbers shows that $\hat{I}\rightarrow I$ in probability.
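The estimator can be written in a few lines of Python. This is a minimal illustration, not a reference implementation. The function `importance_sampling` and its argument names are hypothetical. The test case is also an assumption, chosen only because its answer is known: it estimates $E[Z^2]=1$ for $Z\sim N(0,1)$ with the proposal $N(0,2^2)$, integrating over the whole real line rather than $[a,b]$.

```python
import numpy as np
from scipy.stats import norm


def importance_sampling(omega, f_pdf, g_pdf, g_sample, n, rng):
    """Estimate I = ∫ omega(x) f(x) dx by sampling from the proposal g.

    Hypothetical helper: omega, f_pdf and g_pdf must accept NumPy arrays,
    and g_sample(n, rng) must return n independent draws from g.
    """
    x = g_sample(n, rng)                  # X_1, ..., X_N ~ g
    y = omega(x) * f_pdf(x) / g_pdf(x)    # Y_i = omega(X_i) f(X_i) / g(X_i)
    return y.mean()                       # I_hat = (1/N) * sum_i Y_i


# Illustrative case: I = E[Z^2] = 1 for Z ~ N(0, 1), proposal g = N(0, 2^2)
rng = np.random.default_rng(0)
estimate = importance_sampling(
    omega=lambda x: x**2,
    f_pdf=norm.pdf,
    g_pdf=lambda x: norm.pdf(x, scale=2.0),
    g_sample=lambda n, gen: gen.normal(0.0, 2.0, n),
    n=200_000,
    rng=rng,
)
print(estimate)  # close to 1
```

Passing the densities as plain callables keeps the sketch independent of any particular library. Here they come from `scipy.stats.norm`.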

In importance sampling we draw samples from $g$ and re-weight them with the importance weights $f(x)/g(x)$, so
that the estimate targets the correct distribution [2]. For this to work well, $g$ should have a shape similar to that of $f$. 
Moreover, $g$ should have thicker tails than $f$; otherwise the second moment of $Y$, and hence the variance of $\hat{I}$, may be infinite [1]. 
To see why, write this second moment out:

$$E_g\left[ Y^2 \right]=\int \left(\frac{\omega(x)f(x)}{g(x)}\right)^2 g(x)\,dx=\int \frac{\omega^2(x)f^2(x)}{g(x)}\,dx$$

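If $g$ has thinner tails than $f$, then $g(x)$ goes to zero faster than $f(x)$ as $|x|\to\infty$. The ratio $f(x)/g(x)$ then grows in the tails, so the integrand $\omega^2(x)f^2(x)/g(x)$ may fail to decay fast enough, and the integral can diverge.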
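A small numerical check makes the tail condition concrete. As an illustrative assumption, take $\omega(x)=1$ (so $I=1$), let $f$ be the $N(0,1)$ density, and let $g$ be the $N(0,\sigma^2)$ density. Then $E_g[Y^2]=\int f^2(x)/g(x)\,dx$ equals $\sigma^2/\sqrt{2\sigma^2-1}$ when $\sigma^2>1/2$ and is infinite otherwise. The sketch below uses a hypothetical helper, `second_moment`. It evaluates the integral with `scipy.integrate.quad` for a thick-tailed proposal ($\sigma=2$). For a thin-tailed proposal ($\sigma=0.5$), it shows the truncated integral growing without bound.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def second_moment(sigma, limit=np.inf):
    """E_g[Y^2] = ∫ f(x)^2 / g(x) dx over [-limit, limit] (hypothetical helper).

    Assumes omega = 1, f = N(0, 1) and g = N(0, sigma^2). The integrand is
    evaluated in log space so that g(x) never underflows to zero.
    """
    def integrand(x):
        return np.exp(2.0 * norm.logpdf(x) - norm.logpdf(x, scale=sigma))

    value, _ = quad(integrand, -limit, limit)
    return value


# Thicker tails (sigma = 2): finite, equal to sigma^2 / sqrt(2 sigma^2 - 1) = 4 / sqrt(7)
thick = second_moment(2.0)
print(thick, 4.0 / np.sqrt(7.0))

# Thinner tails (sigma = 0.5): the truncated integral keeps growing with the limit
thin = [second_moment(0.5, limit) for limit in (2.0, 4.0, 6.0)]
print(thin)
```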

All in all, a good choice for $g$ is a distribution that is similar to $f$ but with thicker tails. 
In fact, the optimal choice for $g$ is given by the following theorem [1]

----
**Theorem**

The choice of $g$ that minimizes the variance of $\hat{I}$ is

$$g(x)=\frac{|\omega(x)|f(x)}{\int |\omega(s)|f(s)\,ds}$$

----
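The theorem can be checked on a case where the optimal proposal has a closed form. Assume, for illustration, that $f$ is the $N(0,1)$ density and that $\omega(x)=1$ for $x>3$ and $0$ otherwise, so $I=P(Z>3)$. Then $|\omega(x)|f(x)$ is the standard normal density restricted to $x>3$. Normalizing it gives a standard normal truncated to $(3,\infty)$, which `scipy.stats.truncnorm` can sample. Every $Y_i$ then equals $I$, so the estimator has zero variance.

There is a catch. For nonnegative $\omega$, the normalizing constant $\int|\omega(s)|f(s)\,ds$ is the integral $I$ we are trying to compute. The theorem is therefore mainly a guide for choosing $g$, not a usable recipe.

```python
import numpy as np
from scipy.stats import norm, truncnorm

# Illustrative assumption: f = N(0, 1) and omega(x) = 1 if x > 3 else 0.
# The optimal g is then the standard normal truncated to (3, inf);
# truncnorm's a and b are given in standardized units (loc=0, scale=1 here).
g_opt = truncnorm(a=3.0, b=np.inf)

rng = np.random.default_rng(1)
x = g_opt.rvs(size=10, random_state=rng)   # X_i ~ g_opt

omega = (x > 3.0).astype(float)
y = omega * norm.pdf(x) / g_opt.pdf(x)     # Y_i = omega(X_i) f(X_i) / g(X_i)

print(y)             # all entries are the same number...
print(norm.sf(3.0))  # ...namely I = P(Z > 3)
```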

## Python example

The following example is taken from [1]. We want to estimate 
$P(Z > 3)$, where $Z\sim N(0,1)$. This is the integral

$$P(Z > 3) = \int_{3}^{+\infty}f(x)dx = \int_{-\infty}^{+\infty}h(x)f(x)dx$$

where $h(x)$ is 1 if $x > 3$ and 0 otherwise, and $f(x)$ is the PDF of the standard normal distribution. Here $h$ plays the role of $\omega$ above.
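Before turning to importance sampling, it helps to see why basic Monte Carlo struggles here. The event $Z>3$ has a probability of about $0.00135$. A sample of $N=100$ draws from $f$ therefore contains no point with $x>3$ roughly 87% of the time, and the estimate is then exactly zero. The sketch below repeats the basic estimator 1000 times. The seed and the use of `np.random.default_rng` are arbitrary choices for reproducibility, and `norm.sf` supplies the exact value for comparison.

```python
import numpy as np
from scipy.stats import norm

exact = norm.sf(3.0)  # P(Z > 3) = 1 - Phi(3)

rng = np.random.default_rng(2)
N, n_iterations = 100, 1000

# Basic Monte Carlo: sample Z ~ f = N(0, 1) directly and average h(Z)
z = rng.standard_normal((n_iterations, N))
naive = (z > 3.0).mean(axis=1)  # one estimate of P(Z > 3) per row

print(f"exact                   = {exact:.6f}")
print(f"mean of estimates       = {naive.mean():.6f}")
print(f"share of zero estimates = {np.mean(naive == 0.0):.2f}")
```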

```python
import numpy as np
from scipy.stats import norm
```

Define $h$.

```python
def h(x: float) -> float:
    """Indicator of the event x > 3."""
    return 1.0 if x > 3.0 else 0.0
```

Let $g$ be the density of $N(4,1)$, which places most of its mass in the region $x>3$ where $h$ is nonzero. We draw samples from $g$ and calculate  

$$\hat{I}=\frac{1}{N}\sum_i Y_i$$

```python
# the sample size
N = 100

# how many independent estimates of I to compute
n_iterations = 1000

integrals = []

for i in range(n_iterations):

    integral = 0.0

    # sample the points from g = N(4, 1)
    points = np.random.normal(4.0, 1.0, N)
    for p in points:
        # Y_i = h(p) f(p) / g(p), with f = N(0, 1) and g = N(4, 1)
        numerator = h(p) * norm.pdf(p, loc=0.0, scale=1.0)
        denominator = norm.pdf(p, loc=4.0, scale=1.0)
        integral += numerator / denominator

    # the estimate I_hat is the sample mean of the Y_i
    integrals.append(integral / N)

print(f"E[I]={np.mean(integrals)}")
print(f"V[I]={np.var(integrals)}")
```

```
E[I]=0.0013410013889053102
V[I]=9.159774683823069e-08
```
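For this choice of $g$, the variance can also be checked analytically. Completing the square gives $f^2(x)/g(x)=\frac{1}{\sqrt{2\pi}}e^{16-(x+4)^2/2}$, so

$$E_g[Y^2]=\int_3^{\infty}\frac{f^2(x)}{g(x)}\,dx=e^{16}P(Z>7), \qquad V[\hat{I}]=\frac{e^{16}P(Z>7)-P(Z>3)^2}{N}$$

With $N=100$, this is about $9.6\times 10^{-8}$, close to the variance printed above. The sketch below is a vectorized version of the same loop, compared against this closed form. The helper name `is_estimate` and the seed are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import norm


def is_estimate(n, rng):
    """One importance-sampling estimate of P(Z > 3) with g = N(4, 1) (hypothetical helper)."""
    x = rng.normal(4.0, 1.0, n)                    # X_1, ..., X_n ~ g
    h = (x > 3.0).astype(float)                    # h(X_i)
    y = h * norm.pdf(x) / norm.pdf(x, loc=4.0)     # Y_i = h f / g
    return y.mean()


rng = np.random.default_rng(3)
N, n_iterations = 100, 1000
estimates = np.array([is_estimate(N, rng) for _ in range(n_iterations)])

exact = norm.sf(3.0)
# Closed form from the derivation above: V[I_hat] = (e^16 P(Z > 7) - P(Z > 3)^2) / N
var_theory = (np.exp(16.0) * norm.sf(7.0) - exact**2) / N

print(f"mean={estimates.mean():.6f}  exact={exact:.6f}")
print(f"var={estimates.var():.3e}  theory={var_theory:.3e}")
```

Evaluating all $N$ densities in one array call avoids the per-point overhead of calling `norm.pdf` inside a Python loop.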


## Summary

In this section we reviewed importance sampling. Like basic Monte Carlo integration, it estimates an integral by averaging over random samples.
However, it draws the samples from a proposal distribution $g$ and re-weights them by $f/g$. Importance sampling can be used when it is difficult
to sample from $f$. 


Extensions of importance sampling include sequential importance sampling, <a href="https://en.wikipedia.org/wiki/Particle_filter">particle filtering</a> and <a href="https://en.wikipedia.org/wiki/Approximate_Bayesian_computation">approximate Bayesian computation</a>.

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.
2. <a href="https://astrostatistics.psu.edu/su14/lectures/cisewski_is.pdf">Importance sampling</a>