# Homework 3

Problem 1: Proposals.  Revisit this notebook and evaluate different proposals.  In the notebook we tried two proposals: a sort of "global proposal" (uniform on -10 to 10) and a more local proposal (x plus a normally distributed jump).  Part (a): For the second proposal type, does convergence improve if you change the scale of the normal jump?  Part (b): What if you change the jump to x +/- 0.001 with equal probability?  Is convergence better or worse?

In [26]:
import numpy as np
import pandas as pd
import plotly.express as px

In [27]:
# Here's a little function putting together everything,
# including burn-in, the random walk through
# the state space, and the data handling.
def MCMC(target, proposal, acceptance, sample_size, initial, number_of_chains=1, burn_in=100):
    # Note: `target` is not used directly here; the acceptance function
    # is expected to evaluate the target density itself.
    chains = []
    for _ in range(number_of_chains):
        # Initialize
        x = initial
        # Burn-in: take steps without recording them
        for _ in range(burn_in):
            y = proposal(x)
            alpha = acceptance(x, y)
            u = np.random.uniform()
            if u < alpha:
                x = y
        # storage for the sample
        sample = [x]
        for _ in range(sample_size):
            y = proposal(x)
            alpha = acceptance(x, y)
            # ／(^ㅅ^)＼ draw u uniformly from [0, 1)
            #           if u < alpha, accept the proposed y
            #           if u >= alpha, stay on x
            u = np.random.uniform()
            if u < alpha:
                x = y
            sample.append(x)
        # ／(^ㅅ^)＼ append this chain to the list of chains
        chains.append(sample)
    return np.array(chains)

In [28]:
# Target Density
def f(x):
    return(1/np.sqrt(2*np.pi)*np.exp(-(x**2)/2))
# Proposal Density
def g(x=0):
    return(np.random.uniform(low=-10.0,high=10.0))
# Acceptance discipline
def A(x,y):
    return( min([1,f(y)/f(x)]))

I've pulled the code to modify into its own `main` in order to make this easier. It's slightly clunky to put everything in one place, but it should work.

In [29]:
def main():
    samples = MCMC(f,g,A,5000,0.0,number_of_chains=10)

    from sklearn.neighbors import KernelDensity
    model = KernelDensity()
    model.fit(samples[0,:].reshape(-1,1))
    x_space = np.linspace(-4,4,1000)
    normal_values = f(x_space)
    def score(a_sample,bandwidth):
        model = KernelDensity(bandwidth=bandwidth)
        model.fit(a_sample.reshape(-1,1))
        model_values = np.exp(model.score_samples(x_space[:,None]))
        return( np.sum((normal_values - model_values)**2) )

    test_bandwidth = 0.5
    scores = []
    number_chains,sample_size = samples.shape
    for i in range(number_chains):
        # Start empty so every entry is a distance score, not a raw sample value
        evolution = np.array([])
        for j in range(10,sample_size,10):
            evolution = np.append(evolution,score(samples[i,0:j],test_bandwidth))
        scores = scores + [evolution]

    scores = np.array(scores)
    the_names = ['X'+str(i) for i in range(number_chains)]
    scores_df = pd.DataFrame(dict(zip(the_names,scores)))

    fig = px.line(scores_df,title="Distance from normal distribution as more steps are added")
    fig.update_xaxes(range = [0,300])
    fig.update_yaxes(range = [0,2])
    fig.show()

In [30]:
# Proposal Density
def g(x):
    return(x + np.random.normal(scale=2.0))

main()

For reference, this is the original proposal from the notebook (`scale=2.0`).

In [31]:
# Proposal Density
def g(x):
    return(x + np.random.normal(scale=20.0))

main()

Above is the result for the proposal with `scale=20`. This looks worse: the chains are completely scattered.
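
One possible way to quantify this is the acceptance rate: with a very wide jump, most proposals land far out in the tails and are rejected, so the chain mostly sits still. The sketch below is not part of the original notebook. `acceptance_rate` is a hypothetical helper that re-implements a bare random-walk Metropolis chain for the standard normal target, and the step count and seed are arbitrary choices.

```python
import numpy as np

def acceptance_rate(scale, n_steps=5000, seed=0):
    """Fraction of accepted proposals for the jump x + Normal(0, scale)
    when the target is a standard normal (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    accepted = 0
    for _ in range(n_steps):
        y = x + rng.normal(scale=scale)
        # Metropolis ratio f(y)/f(x) for the standard normal, capped at 1
        alpha = min(1.0, np.exp((x**2 - y**2) / 2))
        if rng.uniform() < alpha:
            x = y
            accepted += 1
    return accepted / n_steps

for scale in [0.5, 1.0, 2.0, 20.0]:
    print(scale, acceptance_rate(scale))
```

A much lower rate for `scale=20` than for the smaller scales would be consistent with the scattered chains in the plot.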

In [32]:
# Proposal Density
def g(x):
    return(x + np.random.normal(scale=1.0))

main()

With `scale=1`, the distance from the normal distribution is smaller. What if we try making it even smaller?

In [33]:
# Proposal Density
def g(x):
    return(x + np.random.normal(scale=.5))

main()

...yeah, no. With `scale=0.5` this is worse again.
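
A small jump is accepted often but barely moves the chain, so consecutive samples end up highly correlated. As a rough check (not from the original notebook), the sketch below estimates the lag-1 autocorrelation for each scale. `run_chain` and `lag1_autocorrelation` are hypothetical helpers, and the chain length and seed are arbitrary.

```python
import numpy as np

def run_chain(scale, n_steps=20000, seed=0):
    """Random-walk Metropolis for a standard normal target (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = x + rng.normal(scale=scale)
        # Metropolis ratio f(y)/f(x) for the standard normal, capped at 1
        alpha = min(1.0, np.exp((x**2 - y**2) / 2))
        if rng.uniform() < alpha:
            x = y
        chain[t] = x
    return chain

def lag1_autocorrelation(chain):
    """Sample correlation between consecutive draws."""
    return np.corrcoef(chain[:-1], chain[1:])[0, 1]

for scale in [0.5, 1.0, 2.0, 20.0]:
    print(scale, round(lag1_autocorrelation(run_chain(scale)), 3))
```

Lower autocorrelation means each step adds more new information, which is one plausible reading of why an intermediate scale converges fastest here.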

In [35]:
# Proposal Density
def g(x):
    if(np.random.randint(0, 2) == 0):
        return(x + 0.001)
    else:
        return(x - 0.001)

main()
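
For part (b), a back-of-the-envelope bound helps: a walk with steps of +/- 0.001 can move at most 0.001 times the number of steps away from its start, and an unbiased walk typically moves far less. The sketch below is an illustration under assumptions, not the notebook's own code. It assumes the 100 burn-in steps and 5000 recorded steps used by `main()`, and re-implements the chain inline with an arbitrary seed.

```python
import numpy as np

step = 0.001
n_steps = 5100  # 100 burn-in steps plus 5000 recorded steps, as in main()

# Farthest the walk could get if every step went the same way
max_reach = step * n_steps
# Root-mean-square displacement of an unbiased walk that accepts every step
typical_reach = step * np.sqrt(n_steps)

rng = np.random.default_rng(0)
x = 0.0
chain = np.empty(n_steps)
for t in range(n_steps):
    y = x + rng.choice([-step, step])
    # Metropolis ratio f(y)/f(x) for the standard normal, capped at 1
    alpha = min(1.0, np.exp((x**2 - y**2) / 2))
    if rng.uniform() < alpha:
        x = y
    chain[t] = x

print("largest possible |x|:", max_reach)
print("RMS displacement of a free walk:", typical_reach)
print("sample standard deviation:", chain.std(), "(the target has 1.0)")
```

If the sample standard deviation stays far below 1, the chain has not explored the target within this many steps, which would point to worse convergence than the normal jumps above.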

Problem 2: State spaces.  Revisit this notebook.  I'd like you to build a sampler for the Poisson distribution with lambda = 5.  Challenge: the Poisson distribution has a state space which is discrete, not continuous (it's a distribution of counts; we use it frequently in wildlife count data).  Experiment a bit with the proposal distribution.

In [None]:
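# Target Density
from scipy.special import gammaln

def f(x):
    l = 5
    x = np.asarray(x, dtype=float)
    # Poisson mass l^x * exp(-l) / x!, computed on the log scale;
    # gammaln(x + 1) equals log(x!) for non-negative integers.
    log_mass = x * np.log(l) - l - gammaln(np.maximum(x, 0) + 1)
    # No probability mass on negative counts
    return np.where(x >= 0, np.exp(log_mass), 0.0)

# Proposal Density
def g(x):
    # randint's upper bound is exclusive, so (0, 2) yields 0 or 1
    if(np.random.randint(0, 2) == 0):
        return(x + 1)
    else:
        return(x - 1)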

main()

In [None]:
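# Proposal Density
def g(x):
    # Round (rather than floor) so the integer jump is symmetric around x,
    # as the acceptance ratio min(1, f(y)/f(x)) assumes.
    return int(np.rint(x + np.random.normal(scale=5)))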

main()
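
Since `main()` scores chains against a continuous density on a grid from -4 to 4, a count-based check may be a better fit for a discrete target. The sketch below is a minimal, self-contained illustration and not the notebook's method. `poisson_pmf` and `sample_poisson` are hypothetical helpers, the +/-1 proposal is just one of the options tried above, and the chain length and seed are arbitrary.

```python
import math
import numpy as np

LAM = 5  # rate of the target Poisson distribution

def poisson_pmf(k, lam=LAM):
    """Poisson probability mass; zero for negative counts."""
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)

def sample_poisson(n_steps=50000, burn_in=100, seed=0):
    """Metropolis sampler with a symmetric +/-1 proposal (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = 0
    draws = []
    for t in range(n_steps + burn_in):
        y = x + (1 if rng.integers(0, 2) == 0 else -1)
        # Negative proposals have zero mass, so they are never accepted
        alpha = min(1.0, poisson_pmf(y) / poisson_pmf(x))
        if rng.uniform() < alpha:
            x = y
        if t >= burn_in:
            draws.append(x)
    return np.array(draws)

draws = sample_poisson()
counts = np.bincount(draws, minlength=15)
# Columns: count k, empirical frequency, exact Poisson mass
for k in range(15):
    print(k, round(counts[k] / len(draws), 4), round(poisson_pmf(k), 4))
print("sample mean:", draws.mean(), "sample variance:", draws.var())
```

For a Poisson distribution both the mean and the variance equal lambda, so both printed summaries should land near 5 if the sampler is working.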