## Astronomy 406 "Computational Astrophysics" (Fall 2016)

### Homework 6 (due Thursday, Nov 17)

Use the [<b>emcee</b>](http://dfm.io/emcee) MCMC ensemble sampler to map the posterior distribution of a 2D Gaussian with strong correlation between the two parameters.
The likelihood is given by
$$
    {\cal L}(p_1,p_2) = {1 \over 2\pi \sigma_1 \sigma_2 \sqrt{1-r^2}}
    \exp{\left( - {1\over 2(1-r^2)} \left[ 
      \frac{(p_1-\mu_1)^2}{\sigma_1^2} + \frac{(p_2-\mu_2)^2}{\sigma_2^2}
      -\frac{2r(p_1-\mu_1)(p_2-\mu_2)}{\sigma_1\sigma_2}
    \right]\right)}
$$
and we take $\mu_1 = 1$, $\mu_2 = 5$, $\sigma_1 = 1$, $\sigma_2 = 0.2$, $r = 0.9$.
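The log of this likelihood is straightforward to code directly from the formula. The sketch below is a minimal illustration, not a reference solution. It is written with NumPy so that it also accepts arrays, which matters because <tt>plot_samples</tt> evaluates it on whole rows of a grid. The global names `mu`, `sigma`, `r` hold the target values above. The flat-prior bounds in `P1_RANGE`/`P2_RANGE` are hypothetical choices, picked only to be many standard deviations wide.

```python
import numpy as np

# Target parameters from the assignment text
mu = np.array([1.0, 5.0])      # mu_1, mu_2
sigma = np.array([1.0, 0.2])   # sigma_1, sigma_2
r = 0.9                        # correlation coefficient

# Hypothetical flat (top-hat) prior bounds, many sigma wide around the target means
P1_RANGE = (-10.0, 10.0)
P2_RANGE = (0.0, 10.0)

def lnLikelihood(p):
    """ln L(p1, p2) of the correlated 2D Gaussian; p[0] and p[1] may be floats or NumPy arrays."""
    p1, p2 = p[0], p[1]
    z1 = (p1 - mu[0])/sigma[0]                      # standardized deviations
    z2 = (p2 - mu[1])/sigma[1]
    q = (z1**2 + z2**2 - 2.*r*z1*z2)/(1. - r**2)    # bracketed term divided by (1 - r^2)
    lnnorm = -np.log(2.*np.pi*sigma[0]*sigma[1]*np.sqrt(1. - r**2))
    return lnnorm - 0.5*q

def lnPosterior(p):
    """ln posterior (up to a constant) = ln prior + ln likelihood, for a single point p."""
    if not (P1_RANGE[0] < p[0] < P1_RANGE[1] and P2_RANGE[0] < p[1] < P2_RANGE[1]):
        return -np.inf      # zero prior probability outside the box
    return lnLikelihood(p)  # the flat prior only adds a constant inside the box
```

With emcee 2.x, `lnPosterior` would be the function passed to `emcee.EnsembleSampler(nwalkers, 2, lnPosterior)`.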

Determine how many steps to burn and how long to run the chain, using the Gelman-Rubin convergence indicator. You can use the routines below, from the Week 10 notebook. The emcee chains are stored as a single 3D array, which can be accessed as <tt>sampler.chain[walker,step,parameter]</tt>.
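For cross-checking, the Gelman-Rubin factor can also be computed directly from a chain array shaped like emcee 2.x's <tt>sampler.chain</tt>, i.e. `(nwalkers, nsteps, ndim)`. This is a sketch with the hypothetical name `gelman_rubin`. It uses the same final R formula as <tt>GR_indicator</tt>, but recomputes everything from the full array instead of using running sums. That is simpler but slower as the chain grows.

```python
import numpy as np

def gelman_rubin(chain):
    """
    Gelman-Rubin R factor for each parameter of a chain array with shape
    (nwalkers, nsteps, ndim), indexed as chain[walker, step, parameter].
    """
    nwalkers, n, ndim = chain.shape
    mean_w = chain.mean(axis=1)            # per-walker means, shape (nwalkers, ndim)
    var_w = chain.var(axis=1, ddof=1)      # unbiased per-walker variances
    W = var_w.mean(axis=0)                 # within-chain variance
    B = n*mean_w.var(axis=0, ddof=1)       # between-chain variance
    # pooled variance estimate divided by the within-chain variance
    V = (n - 1.)/n*W + (nwalkers + 1.)/(nwalkers*n)*B
    return V/W
```

A convergence test in the spirit of item 2 would then be `np.all(np.abs(gelman_rubin(sampler.chain) - 1) < 0.01)`.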

The autocorrelation time calculation should succeed without errors if you apply it only after the chain has grown to about 1000 steps per walker. I also lowered the internal parameter <tt>c</tt>, the minimum number of autocorrelation times needed to trust the estimate, from its default of 10 to 3: <tt>tacor = sampler.get_autocorr_time(c=3)</tt>.
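To see why a smaller <tt>c</tt> lets the estimate succeed on shorter chains, here is a simplified sketch of an integrated autocorrelation time with an automatic window. It is not emcee's own code. The sum over lags is cut at the smallest window $M$ with $M \ge c\,\tau(M)$. As an assumption of this sketch, only windows $M < n/(2c)$ are searched. When no window qualifies, it raises an error, much as emcee refuses to return an untrustworthy estimate. The function name `autocorr_time` is hypothetical.

```python
import numpy as np

def autocorr_time(x, c=10):
    """
    Integrated autocorrelation time of a 1D series x, truncating the sum over
    lags at the smallest window M with M >= c*tau(M), searching only M < n/(2c).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # normalized autocorrelation function via a zero-padded FFT (no wrap-around)
    nfft = 2**int(np.ceil(np.log2(2*n)))
    f = np.fft.rfft(x - x.mean(), n=nfft)
    acf = np.fft.irfft(f*np.conjugate(f), n=nfft)[:n]
    acf /= acf[0]
    # tau(M) = 1 + 2*sum_{t=1}^{M} rho(t) for window sizes M = 0, 1, 2, ...
    taus = 2.*np.cumsum(acf) - 1.
    # only trust windows much shorter than the chain
    mmax = int(n/(2.*c))
    M = np.arange(mmax)
    ok = M >= c*taus[:mmax]
    if not np.any(ok):
        raise ValueError("chain too short: no window M < n/(2c) with M >= c*tau(M)")
    return taus[np.argmax(ok)]   # tau at the first (smallest) acceptable window
```

For an AR(1) series $x_i = \phi x_{i-1} + \epsilon_i$ the exact value is $\tau = (1+\phi)/(1-\phi)$, e.g. 19 for $\phi = 0.9$, which makes a convenient check.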

1. Write the <tt>lnLikelihood</tt> and <tt>lnPosterior</tt> functions for the above distribution, choosing appropriately wide priors.

2. Start the emcee chains at random locations and extend them iteratively in chunks of $\sim 100$ steps per walker. Use at least 100 walkers. Check the Gelman-Rubin convergence indicator after each iteration and stop when $|R_{GR}-1|$ falls below 1% for both parameters. Do not discard any burn-in samples or reset the chain yet.

3. Store the values of $R_{GR}-1$ and the autocorrelation times for both parameters after each iteration, and plot them (vs. step number) after the chains converge.

4. Calculate the mean and standard deviation of the two parameters from the full chain, and the correlation coefficient between the parameters.

5. Plot the parameter distributions from the flattened chain (combined over all walkers) using the top-hat KDE, and superimpose Gaussian distributions with the calculated means and standard deviations. Is there a good correspondence? Whether your answer is yes or no, what do you think is the reason?

6. Use the <tt>plot_samples</tt> routine to plot the parameter distribution from the full chain.

7. Use the calculated autocorrelation time to determine how many steps to <b>burn</b> and how to <b>thin</b> the remaining samples. You need to burn the first <tt>nburn</tt> steps of each walker chain and keep every <tt>nthin</tt>-th step after that, which you can do as
<tt>p1t = sampler.chain[:, nburn::nthin, 0].flatten()</tt>
Here <tt>nburn</tt> is the number of steps to burn for each walker and <tt>nthin</tt> is the thinning interval. Use the <tt>plot_samples</tt> routine for the thinned chain. How does it differ from the full chain?

8. Recalculate the mean and standard deviation of the two parameters from the thinned chain, and compare them with the target values ($\mu_k$ and $\sigma_k$). Do the same for the correlation coefficient. Do they agree better than the values from the full chain?
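Items 4, 7 and 8 reduce to a few array operations. The sketch below assumes the chain is a NumPy array shaped like emcee 2.x's <tt>sampler.chain</tt>. `burn_thin` and `summary` are hypothetical helper names, and `nburn`/`nthin` are the values you choose from the autocorrelation time. Slicing the whole array at once gives the same flattened order (walker by walker) as burning and thinning each walker separately and concatenating the results.

```python
import numpy as np

def burn_thin(chain, nburn, nthin):
    """
    Drop the first nburn steps of every walker, keep every nthin-th step after that,
    and flatten over walkers. chain has shape (nwalkers, nsteps, ndim);
    the result has shape (nsamples, ndim).
    """
    kept = chain[:, nburn::nthin, :]
    return kept.reshape(-1, chain.shape[2])

def summary(samples):
    """Means, standard deviations, and the p1-p2 correlation coefficient of (nsamples, 2) samples."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    rho = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
    return mean, std, rho
```

For the full chain (item 4), `summary(burn_thin(sampler.chain, 0, 1))` uses every sample; for item 8, pass your chosen `nburn` and `nthin`.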

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

def plot_samples(xh, yh):
    """
    Plot the 2D histogram of MCMC samples, along with the likelihood contours
      of the target distribution.
    It uses the same user-supplied 'lnLikelihood' function that was used to run
      the emcee sampler (it must accept NumPy arrays), and the global array 'mu'
      of mean target parameters.
    Arguments are flattened 1D arrays of the two parameter samples.
    """
    plt.xlabel(r'$p_1$')
    plt.ylabel(r'$p_2$')

    # 2D color histogram, normalized to a probability density
    # ('density' replaces the 'normed' keyword removed in recent matplotlib)
    plt.hist2d(xh, yh, bins=100, norm=LogNorm(), density=True)
    plt.colorbar()

    # grid on which to evaluate the target likelihood
    x = np.arange(np.min(xh)-1, np.max(xh)+1, 0.05)
    y = np.arange(np.min(yh)-1, np.max(yh)+1, 0.05)
    X, Y = np.meshgrid(x, y)
    # evaluate row by row; each grid row is passed as a pair of 1D arrays
    Z = np.array([np.exp(lnLikelihood([x1, x2])) for x1, x2 in zip(X, Y)])

    # contours enclosing 99% and 68.27% of the probability (Delta chi^2 = 9.21, 2.30 in 2D);
    # contour levels must increase, so the 99% level comes first
    dlnL2 = np.array([9.21, 2.30])
    lnLmax = lnLikelihood(mu)
    lvls = np.exp(lnLmax-0.5*dlnL2)
    plt.contour(X, Y, Z, linewidths=1.5, colors='black', levels=lvls)

    plt.show()
```
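The two numbers in <tt>dlnL2</tt> follow from the 2D Gaussian itself. The quantity $\Delta\chi^2 = -2\,\Delta\ln{\cal L}$ follows a chi-square distribution with 2 degrees of freedom. The contour at $\Delta\chi^2$ therefore encloses probability $P = 1 - e^{-\Delta\chi^2/2}$, so the level for a given $P$ is $\Delta\chi^2 = -2\ln(1-P)$. A quick numerical check (the function name `delta_chi2_2d` is only illustrative):

```python
import numpy as np

def delta_chi2_2d(P):
    """Delta chi^2 = -2 Delta lnL of the 2D Gaussian contour enclosing probability P."""
    return -2.*np.log(1. - P)

# approximately 9.21 and 2.30, the values used in dlnL2
print(np.round(delta_chi2_2d(np.array([0.99, 0.6827])), 2))
```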

```python
import numpy as np

def GR_indicator(mwx, swx, mwy, swy, nchain):
    """
    Gelman-Rubin convergence indicator.
    Arguments are 1D arrays, with one entry per walker, of the sums of the
      sample values (mwx, mwy) and of their squares (swx, swy) for the two
      parameters; nchain is the number of elements in each walker chain.
    Returns the array [R_x, R_y] of Gelman-Rubin R factors.
    """
    nwalkers = len(mwx)     # number of walkers (independent chains)
    n = float(nchain)       # float avoids integer division in Python 2

    def R_factor(msum, ssum):
        # per-walker means and unbiased per-walker variances
        mw = msum/n
        sw = (ssum - n*np.power(mw,2))/(n-1.)
        # within-chain variance
        Wgr = np.sum(sw)/nwalkers
        # mean of the means over Nwalkers
        m = np.sum(mw)/nwalkers
        # between-chain variance
        Bgr = n*np.sum(np.power(mw-m,2))/(nwalkers-1.)
        # Gelman-Rubin R factor
        return (1. - 1./n + Bgr/Wgr/n)*(nwalkers+1.)/nwalkers - (n-1.)/(n*nwalkers)

    return np.array([ R_factor(mwx, swx), R_factor(mwy, swy) ])
```
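<tt>GR_indicator</tt> takes per-walker sums rather than the chain itself, so the sums can be updated cheaply each time the chain is extended by a chunk (item 2). A sketch of that bookkeeping with a hypothetical helper `update_sums`. It assumes you pass only the newly added steps, e.g. <tt>sampler.chain[:, -100:, :]</tt> after a 100-step extension with emcee 2.x.

```python
import numpy as np

def update_sums(sums, chunk):
    """
    Add a new chunk of steps to per-walker running sums.
    chunk: array (nwalkers, nsteps_chunk, ndim) holding only the newest steps.
    sums: None (first call) or the tuple (s1, s2, n) returned by the previous call,
          with per-walker sums s1, sums of squares s2 (both (nwalkers, ndim)) and step count n.
    """
    s1 = chunk.sum(axis=1)
    s2 = (chunk**2).sum(axis=1)
    n = chunk.shape[1]
    if sums is not None:
        s1, s2, n = s1 + sums[0], s2 + sums[1], n + sums[2]
    return s1, s2, n
```

After each chunk, `s1, s2, n = sums` and `GR_indicator(s1[:, 0], s2[:, 0], s1[:, 1], s2[:, 1], n)` give $R_{GR}$ for both parameters.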