**Tutorial 9b - Bayesian Model Checking**

In this tutorial, we will do some Bayesian model checking on the type Ia supernovae data we used in tutorial 8b.

The object of this exercise is to see whether the data is consistent with the cosmological and statistical model we have been using to find the best-fit parameters. These assumptions could be wrong in any number of ways. For example, the errors in the distance moduli might not be Gaussian or might not be estimated correctly, the cosmological model we are assuming might be incorrect, or the observed redshifts, which we have treated as an independent variable, might have significant errors. We might also wonder whether the cosmological model we have assumed, with no curvature (flat) and a cosmological constant, is too simple. Might there be evidence that the cosmological constant is evolving?

In [None]:
import pandas as pa
import numpy as np
import matplotlib.pyplot as plt
import random

1) Go back to tutorial 8b.  

After question 6), save the chain `result.flatchain` first as a Pandas `DataFrame` and then to a CSV file called `open_mcmc.csv` (`df.to_csv(filename)`).

After question 16), save the chain `result.flatchain` first as a Pandas `DataFrame` and then to a CSV file called `flat_mcmc.csv`.

Read in the MCMC chains of parameters that you calculated in tutorial 8b and put them into the DataFrames `open_mcmc` and `flat_mcmc`.

In [None]:
open_mcmc = pa.read_csv(...)
flat_mcmc = ...

print(open_mcmc)
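
One detail to check when reloading the chains: by default `DataFrame.to_csv()` also writes the row index as an unnamed first column. Reading the file back without `index_col=0` turns that index into an extra `Unnamed: 0` column, which would later be picked up as if it were a parameter. The sketch below shows the round trip on a small made-up chain (the numbers are random placeholders, not output from tutorial 8b); saving with `to_csv(filename, index=False)` is an equally valid alternative.

```python
import os
import tempfile

import numpy as np
import pandas as pa

# placeholder chain with the open-model parameter names; values are random, not real MCMC output
rng = np.random.default_rng(0)
chain = pa.DataFrame({'M': rng.normal(43.0, 0.1, 5),
                      'Omega_m': rng.normal(0.3, 0.05, 5),
                      'Omega_Lambda': rng.normal(0.7, 0.05, 5)})

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'open_mcmc.csv')
    chain.to_csv(filename)                          # the index is written as the first column

    naive = pa.read_csv(filename)                   # index comes back as a data column
    reloaded = pa.read_csv(filename, index_col=0)   # index restored as the index

print(list(naive.columns))     # → ['Unnamed: 0', 'M', 'Omega_m', 'Omega_Lambda']
print(list(reloaded.columns))  # → ['M', 'Omega_m', 'Omega_Lambda']
```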

2) Read in the supernova data again and put the columns into the arrays named below.

In [None]:
data = pa.read_csv("SCPUnion2.1_mu_vs_z.txt",sep='\t',comment='#')

redshifts = data['redshift']
magnitudes = data['dist_mod']
errors = data['dist_mod_error']


3) Make functions `mu_model_flat(p)` and `mu_model_open(p)` as was done in tutorial 8b, but in this case they should use the already-defined array `redshifts` and return a vector of predicted distance moduli, one for each supernova.

The parameters should be `p['M']` and `p['Omega_m']` for `mu_model_flat(p)`, with the additional `p['Omega_Lambda']` in the case of `mu_model_open(p)`.

In [None]:

import astropy.cosmology as cosmo
def mu_model_flat(p):
    .
    .

def mu_model_open(p):
    .
    .
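
If you want to cross-check the `astropy.cosmology` results, the distance modulus can also be computed directly by integrating $1/E(z)$, with $E(z) = \sqrt{\Omega_m(1+z)^3 + \Omega_k(1+z)^2 + \Omega_\Lambda}$ and $\Omega_k = 1 - \Omega_m - \Omega_\Lambda$. The sketch below is a NumPy-only substitute, not the astropy route the exercise asks for. It assumes the parameterization $\mu = M + 5\log_{10}(d_L H_0/c)$, i.e. that `M` absorbs $H_0$ and the $10\,{\rm pc}$ normalization; check this against the definition you used in tutorial 8b. The helper names `luminosity_distance_dimensionless`, `mu_model_open_np` and `mu_model_flat_np` are hypothetical, and they take the redshifts as an argument instead of reading the global `redshifts` array.

```python
import numpy as np

def luminosity_distance_dimensionless(z, Omega_m, Omega_Lambda, n_grid=2001):
    """Luminosity distance in units of c/H0 for a (possibly curved) Lambda-CDM model.

    Integrates 1/E(z') from 0 to z with the trapezoid rule on a uniform grid.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    Omega_k = 1.0 - Omega_m - Omega_Lambda
    # grid u in [0, 1] so that z' = u * z covers [0, z] for every supernova at once
    u = np.linspace(0.0, 1.0, n_grid)
    zp = np.outer(z, u)                                   # shape (n_sn, n_grid)
    E = np.sqrt(Omega_m * (1 + zp)**3 + Omega_k * (1 + zp)**2 + Omega_Lambda)
    f = 1.0 / E
    du = u[1] - u[0]
    # trapezoid rule along the grid, rescaled by z because dz' = z du
    D_C = z * du * (0.5 * (f[:, 0] + f[:, -1]) + f[:, 1:-1].sum(axis=1))
    if Omega_k > 0:            # open: hyperbolic geometry
        sk = np.sqrt(Omega_k)
        D_M = np.sinh(sk * D_C) / sk
    elif Omega_k < 0:          # closed: spherical geometry
        sk = np.sqrt(-Omega_k)
        D_M = np.sin(sk * D_C) / sk
    else:                      # exactly flat
        D_M = D_C
    return (1 + z) * D_M

def mu_model_open_np(p, z):
    # hypothetical parameterization: mu = M + 5 log10(d_L H0 / c)
    d_L = luminosity_distance_dimensionless(z, p['Omega_m'], p['Omega_Lambda'])
    return p['M'] + 5 * np.log10(d_L)

def mu_model_flat_np(p, z):
    # flatness fixes Omega_Lambda = 1 - Omega_m
    q = {'M': p['M'], 'Omega_m': p['Omega_m'], 'Omega_Lambda': 1 - p['Omega_m']}
    return mu_model_open_np(q, z)
```

Two closed-form limits make useful checks: for $\Omega_m = 1, \Omega_\Lambda = 0$ the result should match $2(1+z)\left(1 - 1/\sqrt{1+z}\right)$, and for an empty universe ($\Omega_m = \Omega_\Lambda = 0$) it should match $z + z^2/2$.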


4) We will need to generate new mock data sets given a set of parameters.

Make two functions that take the parameters `params` and generate a new data set with the same size and redshifts as the observed one. Inside, they should use `mu_model_flat()` or `mu_model_open()`, respectively. Add noise to the distance modulus of each supernova, assuming the errors are normally distributed in magnitudes with the standard deviations given in `errors`. No looping should be necessary (hint: use `numpy.random.normal()`).

In [None]:
def data_generator_flat(params) :
    .
    .

def data_generator_open(params) :
    .
    .

# let's print a mock data set to check that it works
p = {'M':20,'Omega_m':0.3,'Omega_Lambda':0.1}
print(data_generator_open(p))
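
Because the random normal generator broadcasts over its `loc` and `scale` arguments, a single call draws an independent Gaussian deviate for every supernova. A minimal sketch, assuming a model function that returns one prediction per supernova; the stand-in predictions and errors below are made up for illustration, and the sketch uses NumPy's `Generator` interface (`default_rng`) instead of the legacy `numpy.random.normal()` so the draws are reproducible. The helper name `generate_mock` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_mock(model_prediction, errors, rng=rng):
    # one Gaussian draw per supernova: mean = model prediction, sigma = that supernova's error
    return rng.normal(loc=model_prediction, scale=errors)

# hypothetical stand-ins for the model prediction and the `errors` array
mu_true = np.linspace(35.0, 45.0, 5000)
sigma = np.full(mu_true.size, 0.2)

mock = generate_mock(mu_true, sigma)
pulls = (mock - mu_true) / sigma   # should look like draws from N(0, 1)
print(mock.shape, round(pulls.mean(), 2), round(pulls.std(), 2))
```

In the exercise, `model_prediction` would be the output of `mu_model_open(params)` or `mu_model_flat(params)`.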

5) Now we need to define the $\chi^2$ functions.

Make `chi2_flat()` and `chi2_open()` functions that take the parameters and, as a keyword argument `data`, the distance moduli. They should return the $\chi^2$, computed with the `errors` array and the corresponding model function (`mu_model_flat()` or `mu_model_open()`).

In [None]:

def chi2_flat(params,data=None) :
    .
    .
    
def chi2_open(...) :
    .
    .


6) Use the function `lmfit.minimize()` to minimize `chi2_flat()` and `chi2_open()` and find the best-fit parameters for the observed data. Note that the object returned by `lmfit.minimize` holds both the parameter values at the minimum (`result.params`) and the value of the objective function there. Store both for later use.

In [None]:
import lmfit

params = lmfit.Parameters()
params.add_many(('M', ...),
                ('Omega_m', ...))

result_flat = lmfit.minimize(chi2_flat, params, method='Nelder',kws={'data':magnitudes})
lmfit.printfuncs.report_fit(result_flat.params)

best_chi2_flat = ...

params.add(...)

result_open = ...
lmfit.printfuncs.report_fit(result_open.params)

best_chi2_open = ...

7) What is the difference between the minimum $\chi^2$ values of the two models? Does this difference signify that one model should be favored over the other?

8) Besides $\chi^2$, we could also consider other statistics. One that we will use is the maximum absolute residual with respect to the best-fit model prediction:

${\rm max}_i\left| \frac{\mu_i - \mu_{\rm model}(M,\Omega_m,\Omega_\Lambda,z_i)}{\sigma_i}  \right|$

If the errors are not Gaussian, for example because there are catastrophic outliers, we would expect this value to be larger than the Gaussian error model predicts.

Find the maximum absolute residual of the observed data set for each model and store it in `flat_max_res_observed` and `open_max_res_observed`.

In [None]:
flat_max_res_observed = ...
print("maximum absolute residual for data flat = ",flat_max_res_observed)

open_max_res_observed = ...
print("maximum absolute residual for data open = ",open_max_res_observed)
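
The statistic is a one-liner once the model prediction for each supernova is available. A sketch with made-up numbers; the helper name `max_abs_residual` is hypothetical, and in the exercise `model` would come from the best-fit parameters of question 6.

```python
import numpy as np

def max_abs_residual(data, model, errors):
    # largest |(mu_i - mu_model_i) / sigma_i| over all supernovae
    return np.max(np.abs((np.asarray(data) - np.asarray(model)) / np.asarray(errors)))

print(max_abs_residual([1.0, 2.0, 3.0], [0.5, 2.0, 4.0], [0.5, 1.0, 2.0]))  # → 1.0
```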

9) Now we have all the tools to create a sample drawn from the distribution

$ p(T) = \int d\theta ~p(\theta | D ) p(T | \theta) = \int d\theta ~p(\theta | D ) \int_{V(T)}dx~ p(x | \theta) $

where $T$ is a statistic and $V(T)$ is the volume in data-space where $T(x)=T$.  
The bootstrap approximation for this distribution is

$p(T) \simeq \frac{1}{n} \sum_{i=1}^{n} \delta\left( T - T(x_i) \right), \quad x_i \sim p(x | \theta_i), \quad \theta_i \sim p(\theta | D )$

From this we can calculate the significance of the observed value of the statistic, $T(D)$, for each of our two models.


For $T$, we will choose the minimum $\chi^2$ and the maximum absolute residual.  

Make at least 1000 simulated data sets, calculate the statistics, and store their values.
Do this only for the open model.

For each simulation, the parameters should be a random draw from the MCMC chain. The simulations should not involve the observed data directly.

In [None]:

from numpy.random import randint


chi2array = []
max_res = []

nmcmc = ...

p = ...

for i in range(1000) :
    # take a random set of parameters from the Markov chain
    params = open_mcmc.iloc[randint(...)].to_dict()
    if i % 100 == 0 :
        print(i, params)   # progress report every 100 simulations

    # generate a new random data set from the model and this set of parameters
    sim_data = ...

    # find the maximum-likelihood parameters for the new data set
    bestfit = lmfit.minimize(chi2_open, p, method='Nelder', kws={'data':sim_data})

    chi2array.append(...)
    
    max_res.append(...)
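
The loop above is a Monte Carlo draw from $p(T)$: for each simulation a parameter vector $\theta_i$ is taken from the chain, a mock data set $x_i \sim p(x|\theta_i)$ is generated, the model is refit, and the statistics are recorded. To show that structure without lmfit or the supernova data, the sketch below uses a hypothetical straight-line model $y = a + b\,z$ with known Gaussian errors, a made-up "chain" of $(a, b)$ samples, and weighted linear least squares (closed form via `numpy.linalg.lstsq`) in place of `lmfit.minimize`. None of these stand-ins come from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical setup: 50 points with known errors and a fake posterior chain for (a, b)
z = np.linspace(0.0, 1.0, 50)
sigma = np.full(z.size, 0.1)
chain = np.column_stack([rng.normal(1.0, 0.02, 4000),   # samples of a
                         rng.normal(2.0, 0.04, 4000)])  # samples of b

# design matrix for y = a + b z, divided by the errors so lstsq minimizes chi^2
A = np.column_stack([np.ones_like(z), z]) / sigma[:, None]

n_sims = 1000
chi2_sim = np.empty(n_sims)
mar_sim = np.empty(n_sims)
for i in range(n_sims):
    a, b = chain[rng.integers(len(chain))]                 # theta_i ~ p(theta | D)
    y = rng.normal(a + b * z, sigma)                       # mock data x_i ~ p(x | theta_i)
    coef, *_ = np.linalg.lstsq(A, y / sigma, rcond=None)   # best fit to the mock data
    pulls = (y - (coef[0] + coef[1] * z)) / sigma
    chi2_sim[i] = np.sum(pulls**2)                         # minimum chi^2
    mar_sim[i] = np.max(np.abs(pulls))                     # maximum absolute residual

print("mean minimum chi^2:", chi2_sim.mean(), "(expect about", z.size - 2, "for a linear fit)")
```

For a model that is linear in its parameters, the minimum $\chi^2$ follows a $\chi^2$ distribution with $N - 2$ degrees of freedom here, which is why its mean lands near 48; the cosmological model is nonlinear, so the simulated distribution is the safer reference.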


10) Make a histogram of the maximum absolute residuals (MAR).  Then make a plot of their cumulative distribution.  Mark, in this last plot, the observed MAR with a vertical line.

In [None]:

fig, ax = plt.subplots(nrows=1, ncols=2)
fig.tight_layout()

ax[0]. ...
ax[0].set_xlabel(r'max(residual)')
ax[0].set_box_aspect(aspect=1)

.
.

ax[1].plot(...)
ax[1].plot(...,[0,1],linestyle=':')
ax[1].set_ylim(0,1)

ax[1].set_xlabel(r'max(residual)')
ax[1].set_ylabel(r'F$(max(residual))$')
ax[1].set_box_aspect(aspect=1)
plt.show()
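
For the cumulative plot, sorting the simulated values gives the empirical CDF directly: the $k$-th smallest of $n$ values sits at $F = k/n$. A small sketch (the helper name `empirical_cdf` is hypothetical); the two returned arrays could be passed to `ax[1].plot()` or `ax[1].step()`, and `ax[1].axvline()` is an alternative way to draw the vertical line at the observed value.

```python
import numpy as np

def empirical_cdf(samples):
    # sorted values and the fraction of samples <= each value
    x = np.sort(np.asarray(samples))
    F = np.arange(1, x.size + 1) / x.size
    return x, F

x, F = empirical_cdf([3.0, 1.0, 2.0, 4.0])
print(x, F)
```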


11) Make a histogram of the minimum $\chi^2$ values from the simulations. Then make a plot of their cumulative distribution. Mark, in this last plot, the observed minimum $\chi^2$ with a vertical line.

In [None]:
# very similar to above


12) Calculate the right-hand, one-sided p-values for the MAR and the minimum $\chi^2$.  Is this model for cosmology and errors consistent with the data in terms of these two statistics? 
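
The right-hand, one-sided p-value is the fraction of simulated statistics at least as large as the observed one, $p = \frac{1}{n}\sum_i \mathbb{1}\left[T(x_i) \ge T(D)\right]$. A sketch with made-up numbers (the helper name `right_tail_p_value` is hypothetical); in the tutorial, `simulated` would be `max_res` or `chi2array` and `observed` the corresponding value from the real data for the open model.

```python
import numpy as np

def right_tail_p_value(simulated, observed):
    # fraction of simulations whose statistic is >= the observed value
    simulated = np.asarray(simulated)
    return np.mean(simulated >= observed)

print(right_tail_p_value([1.0, 2.0, 3.0, 4.0, 5.0], 4.0))  # → 0.4
```

With $n$ simulations the p-value can only be resolved to about $1/n$; if no simulation reaches the observed value, the most one can say is that $p$ is below roughly $1/n$.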

13) Now, we are going to do some quick nested sampling.

Install Nautilus with `pip install nautilus-sampler` or `conda install -c conda-forge nautilus-sampler`.  This program works in a similar way to `emcee`, which we used in tutorial 8b, but it does nested sampling instead of MCMC.
The documentation is at https://nautilus-sampler.readthedocs.io/en/latest/index.html.

Make a prior for the open model with a uniform prior on each parameter.

Make a log-likelihood function for the data that takes only the parameters as its argument. You can use the `mu_model_open()` function you already wrote.

Run the sampler with `verbose=True`. This may take a while.

What is the evidence? Nautilus stores its logarithm as `log_z`.

In [None]:
from scipy.stats import norm
from nautilus import Prior
from nautilus import Sampler

prior = Prior()
prior.add_parameter('M', dist=(20,50))
prior.add_parameter(...)
prior.add_parameter(...)

def likelihood_open(params) :
      .
      .

sampler = Sampler(prior, likelihood_open, n_live=1000)
sampler.run(verbose=True)

print('log of the evidence log Z: {:.2f}'.format(sampler.log_z))
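
For independent Gaussian errors with known $\sigma_i$, the log-likelihood is $\ln L = -\frac{1}{2}\chi^2 - \sum_i \ln\left(\sqrt{2\pi}\,\sigma_i\right)$. The second term does not depend on the parameters, so it does not change the posterior, but it shifts $\log Z$ by the same constant for both models; it therefore cancels in the Bayes factor only if both likelihoods are defined the same way. Summing `norm.logpdf(data, loc=model, scale=errors)` from `scipy.stats` gives the same number. Per the nautilus documentation, when the prior is a `Prior` object the likelihood receives the parameters by default as a dictionary keyed by the names passed to `add_parameter()`. The helper `gaussian_log_likelihood` below is hypothetical and checked on made-up arrays.

```python
import numpy as np

def gaussian_log_likelihood(data, model, errors):
    # independent Gaussian errors: -chi^2/2 minus the normalization of each Gaussian
    data, model, errors = map(np.asarray, (data, model, errors))
    chi2 = np.sum(((data - model) / errors)**2)
    return -0.5 * chi2 - np.sum(np.log(np.sqrt(2 * np.pi) * errors))

data = np.array([1.0, 2.0, 3.0])
model = np.array([0.5, 2.0, 4.0])
errors = np.array([0.5, 1.0, 2.0])
# cross-check against the log of a product of normal densities
direct = np.log(np.prod(np.exp(-0.5 * ((data - model) / errors)**2)
                        / (np.sqrt(2 * np.pi) * errors)))
print(gaussian_log_likelihood(data, model, errors), direct)
```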

14) Do the same as above but with the flat model.

In [None]:
prior_flat = Prior()
prior_flat.add_parameter(...)
prior_flat.add_parameter(...)

def likelihood_flat(params) :
    .
    .

sampler_flat = Sampler(prior_flat, likelihood_flat, n_live=1000)
sampler_flat.run(verbose=True)

print('log of the evidence log Z: {:.2f}'.format(sampler_flat.log_z))

15) Calculate the Bayes factor (the ratio of the evidences) for these models. Do these data favor an open model or a flat one? Is there strong evidence that the Universe is not flat?

In [None]:
print("Bayesian odds :", ...)
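
Since Nautilus reports the logarithm of the evidence, the Bayes factor is best formed from the difference of the logs, $B = Z_{\rm open}/Z_{\rm flat} = \exp\left(\ln Z_{\rm open} - \ln Z_{\rm flat}\right)$, which avoids underflow when the evidences themselves are tiny. With equal prior probabilities for the two models, $B$ equals the posterior odds. The two log-evidence values below are placeholders standing in for `sampler.log_z` and `sampler_flat.log_z`, not results from the data.

```python
import numpy as np

# placeholder values standing in for sampler.log_z and sampler_flat.log_z
log_z_open = -286.3
log_z_flat = -284.1

log_bayes = log_z_open - log_z_flat
print("ln B (open vs flat):", round(log_bayes, 2))
print("Bayesian odds open:flat =", np.exp(log_bayes))
```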


16) Make `corner` plots from the output of Nautilus.

In [None]:

import corner

points, log_w, log_l = sampler.posterior()
corner.corner(
    points, weights=np.exp(log_w), bins=20, labels=prior.keys, color='blue',
    plot_datapoints=False, range=np.repeat(0.999, len(prior.keys)))

points, log_w, log_l = sampler_flat.posterior()
corner.corner(
    points, weights=np.exp(log_w), bins=20, labels=prior_flat.keys, color='blue',
    plot_datapoints=False, range=np.repeat(0.999, len(prior_flat.keys)))
