**Tutorial 8b - Markov Chain Monte Carlo**

In this tutorial, we will apply a Markov Chain Monte Carlo (MCMC) sampler to Type Ia supernova data.

The data are from the Supernova Cosmology Project:
http://supernova.lbl.gov/union/descriptions.html#Magvsz

Background:

The apparent magnitude of an object with luminosity $L$ is

$m = - 2.5 \log\left( \frac{L}{4\pi D_L^2} \right) + m_o = 5 \log\left( D_L \right) - 2.5 \log\left( L \right) + 2.5 \log\left( 4\pi \right) + m_o$

where $D_L$ is the luminosity distance and the logarithms are base 10.  The peak luminosity of a Type Ia supernova is directly related to the width of its light curve and its color.  In this data set, the correction to a standard candle has already been applied, and the result is reported as the estimated distance modulus

$\mu = 5 \log\left( D_L / \mathrm{pc} \right) - 5$

This assumes a value of the Hubble constant and requires calibration against other distance indicators in local galaxies. As a result, there is an additive constant in the distance modulus (equivalently, a multiplicative constant in the brightness) that is not well constrained: the relative brightnesses of the supernovae are well measured, but their absolute brightnesses are not.
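As a quick numerical check of the distance-modulus definition, the sketch below assumes the luminosity distance is given in megaparsecs and adds a free additive offset to stand in for the poorly constrained constant. The names `distance_modulus_from_mpc` and `offset` are illustrative and are not part of the data set or of any library.

```python
import numpy as np

def distance_modulus_from_mpc(d_l_mpc, offset=0.0):
    """Distance modulus for a luminosity distance in Mpc, plus a free additive offset."""
    d_l_pc = np.asarray(d_l_mpc, dtype=float) * 1.0e6  # 1 Mpc = 1e6 pc
    return 5.0 * np.log10(d_l_pc) - 5.0 + offset

# 10 pc, 1 Mpc and 100 Mpc expressed in Mpc
demo_mu = distance_modulus_from_mpc([1.0e-5, 1.0, 100.0])
print(demo_mu)
```

A source at 10 pc has a distance modulus of zero by definition, which is a convenient sanity check for the units.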

General relativity and energy conservation predict $D_L(z)$, or $\mu(z)$, where $z$ is the cosmological redshift of the supernova.  This relationship depends on the cosmological parameters $\Omega_{m}$ (the average density of matter in units of the critical density) and $\Omega_\Lambda$ (the density of dark energy in the same units).
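For reference, the standard Friedmann-Robertson-Walker form of this prediction can be written as follows, neglecting radiation. The symbols $E(z)$, $\chi(z)$ and $\Omega_k = 1 - \Omega_m - \Omega_\Lambda$ are introduced here for illustration only; $c$ is the speed of light and $H_0$ the Hubble constant.

\begin{align}
E(z) &= \sqrt{\Omega_m (1+z)^3 + \Omega_k (1+z)^2 + \Omega_\Lambda}, \qquad \chi(z) = \int_0^z \frac{dz'}{E(z')} \\
D_L(z) &= (1+z)\,\frac{c}{H_0} \times
\begin{cases}
\sinh\left(\sqrt{\Omega_k}\,\chi\right)/\sqrt{\Omega_k} & \Omega_k > 0 \\
\chi & \Omega_k = 0 \\
\sin\left(\sqrt{|\Omega_k|}\,\chi\right)/\sqrt{|\Omega_k|} & \Omega_k < 0
\end{cases}
\end{align}

Because $H_0$ enters only as an overall factor in $D_L$, it shifts $\mu$ by a constant and cannot be separated from an additive offset in the distance modulus.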



1) Read in the supernova data from `SCPUnion2.1_mu_vs_z.txt` and plot distance modulus vs redshift with error bars.

In [None]:
import pandas as pa
import numpy as np
import matplotlib.pyplot as plt
import random

data = pa.read_csv("SCPUnion2.1_mu_vs_z.txt",sep='\t',comment='#')

z = ...
mu = ...
mu_err = ...

.
.
.


2) Make a function that takes the parameters `p` and the redshift `z`.  It should return the predicted distance modulus.  Use `astropy.cosmology.LambdaCDM()` (available as `cosmo.LambdaCDM()` with the import below) to calculate the luminosity distance.  The parameters should be `p['M']`, `p['Omega_m']` and `p['Omega_Lambda']`.  The Hubble constant cannot be measured independently of `M`, so fix it at 70 km/s/Mpc.

In [None]:
import astropy.cosmology as cosmo
def mu_model(p,z):
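    # use a name other than `cosmo` here; assigning to `cosmo` would shadow the module inside the function
    model = 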
    ...
    return 5* ... + ...


3) Make two functions. The first is `chi_squared(p)`, which computes $\chi^2$ from the data loaded above, assuming the distance moduli are normally distributed.  The second is `lnprob(p)`, which returns the log of the Gaussian likelihood and uses the $\chi^2$ function within it.
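The relation between the two functions can be sketched on a toy problem. The example below uses made-up straight-line data rather than the supernova sample, and all names prefixed with `demo_` are hypothetical. For Gaussian errors of known size, the log-likelihood is $-\chi^2/2$ plus a constant that does not depend on the parameters.

```python
import numpy as np

# Synthetic data for illustration only (not the supernova sample)
demo_x = np.array([0.0, 1.0, 2.0, 3.0])
demo_y = np.array([1.1, 2.9, 5.2, 6.8])
demo_sigma = np.array([0.2, 0.2, 0.3, 0.3])

def demo_model(theta, x):
    """Hypothetical straight-line model y = a + b*x."""
    a, b = theta
    return a + b * x

def demo_chi_squared(theta):
    # sum of squared residuals, each scaled by its measurement error
    residuals = (demo_y - demo_model(theta, demo_x)) / demo_sigma
    return np.sum(residuals**2)

def demo_lnprob(theta):
    # Gaussian log-likelihood, dropping the parameter-independent constant
    return -0.5 * demo_chi_squared(theta)

print(demo_chi_squared((1.0, 2.0)), demo_lnprob((1.0, 2.0)))
```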

In [None]:
def chi_squared(p) :
    .
    .
    .

def lnprob(p) :
    ...


4) We are going to use the library `lmfit`, which handles parameters through the class `lmfit.Parameters()`.  Create an instance of this class here.  Each parameter is specified by a tuple giving, in order, its name, its initial guess, whether it is varied, its minimum allowed value and its maximum allowed value.

In [None]:
import lmfit

params = lmfit.Parameters()
params.add_many(('Omega_m',0.2,True,0,1)
           ,('M',...)
           ,('Omega_Lambda',...)
           )

5) Now we can find the minimum of the chi-squared using `lmfit.minimize()`.

Information on the output of lmfit.minimize() can be found at https://lmfit.github.io/lmfit-py/fitting.html under "MinimizerResult – the optimization result"

In [None]:
mi = lmfit.minimize(chi_squared, params, method='Nelder')
lmfit.printfuncs.report_fit(mi.params)
mi


6) Now we can use `lmfit.minimize()` to run multiple Markov chains using the method `emcee`.  (You might have to install the `emcee` library using pip or conda.)  This is not actually a minimization but an MCMC run.  Run 100 chains (walkers) of length 2000 with a burn-in period of 500.  Use the minimum chi-squared solution above as the initial guess.

This might take a little while to run.  

`emcee` can also be run outside of `lmfit`.  We are using `lmfit` here because it gives a nice way to fix different parameters.
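For intuition about what a single Markov chain does, here is a minimal random-walk Metropolis sketch on a toy one-dimensional Gaussian target. This is an illustration only: `emcee` itself uses an affine-invariant ensemble sampler rather than this algorithm, and the function `metropolis_hastings` and the `demo_` names are hypothetical.

```python
import numpy as np

def metropolis_hastings(log_prob, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a 1-D target (illustrative only)."""
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        x_new = x + step_size * rng.normal()  # symmetric Gaussian proposal
        lp_new = log_prob(x_new)
        # accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain[i] = x  # a rejected proposal repeats the current point
    return chain

def demo_log_prob(x):
    # toy target: Gaussian with mean 1 and standard deviation 0.5
    return -0.5 * ((x - 1.0) / 0.5) ** 2

demo_rng = np.random.default_rng(0)
demo_chain = metropolis_hastings(demo_log_prob, x0=0.0, n_steps=20000,
                                 step_size=1.0, rng=demo_rng)
demo_samples = demo_chain[500:]  # discard a burn-in period
print(demo_samples.mean(), demo_samples.std())
```

The sample mean and standard deviation should come out close to the target values of 1 and 0.5, with some scatter from the finite chain length.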

In [None]:
result = lmfit.minimize(lnprob, method='emcee',
                     nan_policy='omit',  # this removes non numerical outputs
                     nwalkers=..., burn=..., steps=..., 
                     params=mi.params,
                     progress=True)
result

7) Now we can make a nice "corner plot" using the package `corner`.

In [None]:
import corner
result.params.pretty_print()
# raw strings keep the LaTeX backslashes from being read as escape sequences
labels=[r'$\Omega_m$',r'$\mu_o$',r'$\Omega_\Lambda$']
fig = corner.corner(result.flatchain,labels=labels,
                     show_titles=True, title_kwargs={"fontsize": 12})

8) **Correlation function for chains.**

Now, we want to be sure the chain has converged.  We can do this by constructing the autocorrelation function:
\begin{align}
C_{X}(n) = \frac{\frac{1}{N-n} ~\sum_{i=0}^{N-n-1} \left( X_i-\overline{X} \right) \left(X_{i+n}-\overline{X}\right) }{\frac{1}{N}\sum_{i=0}^{N-1} \left( X_i-\overline{X} \right)^2 }
\end{align}
where $n$ is called the lag and $N$ is the length of the chain.  You can see that $C_X(0)=1$ by construction.  There is one of these for each parameter (and there are also cross-correlation functions between parameters).

The `result` object has attributes `result.chain` and `result.flatchain`.  The first is separated into the different independent chains (i.e. different initial conditions), while the second is all the chains put together.  Look at their shapes with `np.shape()`. We will estimate the correlation function by averaging over the independent chains at the same lag.

Calculate the correlation function from $n=0$ to $n=500$ for each of the 3 parameters. Plot $C(n)$ vs $n$.  It is helpful to use `np.cov()` instead of doing the sum above.
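To make the definition concrete, the sketch below evaluates $C_X(n)$ by the direct sum for a single synthetic chain. It does not use `np.cov()` or average over walkers, so it is not the solution to the exercise, and the AR(1) test chain and the `demo_` names are assumptions made for illustration.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Estimate C_X(n) for n = 0 .. max_lag-1 from a single 1-D chain."""
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    dx = x - x.mean()
    variance = np.mean(dx**2)  # denominator: (1/N) * sum of squared deviations
    corr = np.empty(max_lag)
    for n in range(max_lag):
        # numerator: average of the N-n lagged products
        corr[n] = np.mean(dx[: n_samples - n] * dx[n:]) / variance
    return corr

# Synthetic correlated chain: x_i = 0.9 * x_{i-1} + noise
demo_rng = np.random.default_rng(1)
demo_x = np.zeros(5000)
for i in range(1, len(demo_x)):
    demo_x[i] = 0.9 * demo_x[i - 1] + demo_rng.normal()

demo_corr = autocorrelation(demo_x, 50)
print(demo_corr[:5])
```

For this toy chain the correlation at lag 1 should be close to 0.9 and decay roughly geometrically with the lag.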

In [None]:
s=np.shape(result.chain)
print(s)
N=s[0]

print(np.shape(result.chain[:,0,0]))
lag = np.arange(0,500,5)

variable_names = result.var_names  # names of the varied parameters, one per chain column

## there will be three loops here plus a call to np.cov()
# loop over parameters
for v in range(s[2]) :
    corr = np.zeros(len(lag))
    .
    .
    .
    # loop over lags
    for i,n in enumerate(lag) :
        .
        .
        .
        # loop over chains
        for j in range(s[1]) :
            c = np.cov(result.chain[0:N-n,j,v],result.chain[n:N,j,v])
            .
            .

    plt.plot(lag,corr,label=variable_names[v])

plt.legend()
plt.plot([0,lag[-1]],[0,0],linestyle=':')
plt.xlabel('lag')
plt.ylabel('correlation function')
plt.show()

9) If we define the correlation length as the first time the autocorrelation function hits zero, what is this length for $\Omega_\Lambda$ and how many correlation lengths are in each chain?
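One way to read off this correlation length from a sampled correlation function is sketched below. The curve and chain length are made up for illustration, and `first_zero_crossing` is a hypothetical helper, not a library function.

```python
import numpy as np

def first_zero_crossing(lags, corr):
    """Return the first lag at which corr <= 0, or None if it never gets there."""
    below = np.nonzero(np.asarray(corr) <= 0.0)[0]
    return None if below.size == 0 else lags[below[0]]

demo_lags = np.arange(0, 50, 5)
demo_corr = np.exp(-demo_lags / 10.0) - 0.1  # toy curve that crosses zero
demo_length = first_zero_crossing(demo_lags, demo_corr)

demo_chain_length = 1500  # hypothetical number of steps kept per chain
print(demo_length, demo_chain_length / demo_length)
```

The resolution of the answer is limited by the spacing of the sampled lags, so a finer lag grid gives a sharper estimate.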

10) **The maximum likelihood**

Find the entry in the chain with the largest likelihood.  The calculated log-likelihoods are stored in `result.lnprob` (`np.argmax()` is useful for this).

In [None]:
highest_prob = np.argmax(result.lnprob)
print(highest_prob)
hp_loc = np.unravel_index(highest_prob, result.lnprob.shape)
print(hp_loc)
mle_soln = result.chain[hp_loc]
# copy the fitted parameters and overwrite the varied ones with the maximum-likelihood sample
p = result.params.copy()
for i, name in enumerate(result.var_names):
    p[name].value = mle_soln[i]

print('\nMaximum Likelihood Estimation from emcee       ')
print('-------------------------------------------------')
print('Parameter  MLE Value   Median Value   Uncertainty')
fmt = '  {:12s}  {:11.5f} {:11.5f}   {:11.5f}'.format
for name in result.var_names:
    print(fmt(name, p[name].value, result.params[name].value,
              result.params[name].stderr))

11) We can find quantile ranges for the parameters from `result.flatchain`.

In [None]:
print('\nError estimates from emcee:')
print('------------------------------------------------------')
print('Parameter  -2sigma  -1sigma   median  +1sigma  +2sigma')

for name in result.var_names:
    # percentiles corresponding to -2, -1, 0, +1 and +2 sigma of a normal distribution
    quantiles = np.percentile(result.flatchain[name],
                              [2.275, 15.865, 50, 84.135, 97.725])
    median = quantiles[2]
    err_m2 = quantiles[0] - median
    err_m1 = quantiles[1] - median
    err_p1 = quantiles[3] - median
    err_p2 = quantiles[4] - median
    fmt = '  {:5s}   {:8.4f} {:8.4f} {:8.4f} {:8.4f} {:8.4f}'.format
    print(fmt(name, err_m2, err_m1, median, err_p1, err_p2))

12) What is the 99% lower credibility limit on $\Omega_\Lambda$?

13) What is the posterior probability that $\Omega_m$ is less than 0.2?
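Both kinds of question reduce to counting samples. The sketch below uses synthetic Gaussian draws in place of a real flattened chain, so the numbers mean nothing physically; the `demo_` names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for one column of a flattened chain
demo_rng = np.random.default_rng(2)
demo_samples = demo_rng.normal(loc=0.3, scale=0.05, size=100_000)

# 99% lower credible limit: 99% of the posterior samples lie above this value
demo_lower_99 = np.percentile(demo_samples, 1.0)

# posterior probability of being below a threshold: the fraction of samples below it
demo_prob_below = np.mean(demo_samples < 0.2)

print(demo_lower_99, demo_prob_below)
```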

14) **Compare a flat model to an open model**

A geometrically flat universe has $\Omega_m + \Omega_\Lambda=1$.  Thus, a flat model has one fewer parameter ($\Omega_\Lambda$ is no longer needed).  Rewrite `mu_model(p,z)`, now using `cosmo.FlatLambdaCDM()`, redefine `params` to have only two parameters, and then find the minimum chi-squared solution.

In [None]:
def mu_model(p,z):
    ...
    
params = ...

.
.
.

15) Does the Bayesian information criterion (BIC) support the conclusion that the Universe is not flat?  (Note that the BIC reported when `method='emcee'` is not valid.  Calculate it yourself.)
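A minimal sketch of the calculation, assuming Gaussian errors of known size so that $-2\ln L_{\rm max}$ equals the minimum $\chi^2$ up to a constant shared by both models. In that case the BIC is $\chi^2_{\rm min} + k \ln N$ for $k$ free parameters and $N$ data points. The numbers below are placeholders, not results from the supernova fits, and `bic_from_chi2` is a hypothetical helper.

```python
import numpy as np

def bic_from_chi2(chi2_min, n_params, n_data):
    """BIC up to a model-independent constant, for Gaussian errors of known size."""
    return chi2_min + n_params * np.log(n_data)

# Placeholder values for illustration only
demo_bic_2 = bic_from_chi2(chi2_min=100.0, n_params=2, n_data=50)
demo_bic_3 = bic_from_chi2(chi2_min=99.0, n_params=3, n_data=50)

# a positive difference means the extra parameter is not favoured by the BIC
print(demo_bic_3 - demo_bic_2)
```

The model with the lower BIC is preferred, so the extra parameter has to reduce $\chi^2_{\rm min}$ by more than $\ln N$ to be justified.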

16) Redo the MCMC with this two-parameter model.

17) Make a corner plot for this model. 

18) What is the 99% lower credibility limit on $\Omega_\Lambda$ within flat models? 