# PerfectNS demo

This notebook demonstrates the basic functionality of the PerfectNS module; for background see the README and the dynamic nested sampling paper [(Higson, 2017a)](https://arxiv.org/abs/1704.03459).

### Running nested sampling calculations

The likelihood $\mathcal{L}(\theta)$, prior $\pi(\theta)$ and calculation settings are specified in a PerfectNSSettings object. For this example we will use a 10-dimensional spherically symmetric Gaussian likelihood with size $\sigma_\mathcal{L}=1$ and a Gaussian prior with size $\sigma_{\pi}=10$.

In [5]:
import PerfectNS.settings
import PerfectNS.likelihoods as likelihoods
import PerfectNS.priors as priors

# Input settings
settings = PerfectNS.settings.PerfectNSSettings()
settings.likelihood = likelihoods.gaussian(likelihood_scale=1)
settings.prior = priors.gaussian(prior_scale=10)
settings.n_dim = 10

The "dynamic_goal" setting determines whether dynamic nested sampling is used and, if so, how the computational effort is split between improving parameter estimation accuracy and improving evidence calculation accuracy. dynamic_goal=1 optimises purely for parameter estimation, dynamic_goal=0 optimises purely for calculating the Bayesian evidence $\mathcal{Z}$, and dynamic_goal=None selects standard nested sampling.

Let's try running a standard nested sampling calculation and a dynamic nested sampling calculation:

In [6]:
import PerfectNS.nested_sampling as nested_sampling

# Perform standard nested sampling
settings.dynamic_goal = None
standard_ns_run = nested_sampling.generate_ns_run(settings)
# Perform dynamic nested sampling
settings.dynamic_goal = 1  # optimise for parameter estimation accuracy
dynamic_ns_run = nested_sampling.generate_ns_run(settings)

We can now make posterior inferences using the samples generated by the nested sampling calculations. Here we calculate:

1\. the log Bayesian evidence $\log \mathcal{Z}=\log \left( \int \mathcal{L}(\theta) \pi(\theta) \mathrm{d}\theta \right)$,

2\. the mean of the first parameter $\theta_1$,

3\. the second moment of the posterior distribution of $\theta_1$,

4\. the median of $\theta_1$,

5\. the 84% one-tailed credible interval on $\theta_1$.
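
All five quantities are functions of the weighted samples that a nested sampling run produces. The sketch below shows one way they could be computed from a list of $\theta_1$ values and log posterior weights. It is an illustration only and not the PerfectNS implementation: the function names `log_sum_exp` and `weighted_estimates` are hypothetical, and it assumes each log weight already equals the log-likelihood plus the log prior volume assigned to that sample.

```python
import math


def log_sum_exp(log_values):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    largest = max(log_values)
    return largest + math.log(sum(math.exp(x - largest) for x in log_values))


def weighted_estimates(theta1, log_weights, cred_probs=(0.5, 0.84)):
    """Estimate log Z and posterior summaries of theta1 from weighted samples.

    Assumption: log_weights[i] = log(L_i * w_i), i.e. log-likelihood plus the
    log prior volume assigned to sample i.
    """
    log_z = log_sum_exp(log_weights)
    # Normalised posterior weights (sum to 1 up to rounding)
    post = [math.exp(lw - log_z) for lw in log_weights]
    mean = sum(p * t for p, t in zip(post, theta1))
    second_moment = sum(p * t ** 2 for p, t in zip(post, theta1))
    # One-tailed credible interval: smallest theta1 whose cumulative
    # posterior weight reaches the requested probability
    ordered = sorted(zip(theta1, post))
    cred = {}
    for prob in cred_probs:
        cumulative = 0.0
        for value, weight in ordered:
            cumulative += weight
            if cumulative >= prob:
                break
        cred[prob] = value
    return log_z, mean, second_moment, cred
```

The median is simply the credible interval at probability 0.5, which is why the estimator list in the next cell uses the same estimator class for both.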

For the Gaussian likelihood and prior we can calculate the posterior distribution analytically, so we first calculate the analytic values of each quantity for comparison. The results are displayed in a pandas DataFrame.

In [7]:
import PerfectNS.estimators as e
import PerfectNS.analyse_run as ar

estimator_list = [e.LogZ(),
                  e.ParamMean(),
                  e.ParamSquaredMean(),
                  e.ParamCred(0.5),
                  e.ParamCred(0.84)]
results = e.get_true_estimator_values(estimator_list, settings)
results.loc['standard run'] = ar.run_estimators(standard_ns_run, estimator_list)
results.loc['dynamic run'] = ar.run_estimators(dynamic_ns_run, estimator_list)
results

| | logz | theta1 | theta1squ | Median(theta1) | theta1c_0.84 |
|---|---|---|---|---|---|
| true values | -32.264988 | 0.0 | 0.99008 | 0.0 | 0.989523 |
| standard run | -32.600105 | -0.007731 | 0.992193 | -0.031522 | 0.964821 |
| dynamic run | -33.13628 | 0.005184 | 0.986081 | 0.017635 | 0.991641 |
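
As a cross-check on the "true values" row, the closed-form results for this problem can be worked out by hand. The sketch below assumes the likelihood is a normalised Gaussian and that both likelihood and prior are centred on the origin; neither detail is stated explicitly in this notebook, and `analytic_gaussian_values` is a hypothetical helper, not part of PerfectNS. Under these assumptions $\log \mathcal{Z} = -\frac{d}{2} \log\left(2 \pi (\sigma_\mathcal{L}^2 + \sigma_\pi^2)\right)$ and the posterior on $\theta_1$ is a zero-mean Gaussian with variance $(\sigma_\mathcal{L}^{-2} + \sigma_\pi^{-2})^{-1}$.

```python
import math
from statistics import NormalDist


def analytic_gaussian_values(n_dim=10, likelihood_scale=1.0, prior_scale=10.0):
    """Closed-form values for a Gaussian likelihood with a Gaussian prior.

    Assumes a normalised likelihood and zero-centred likelihood and prior.
    """
    var_sum = likelihood_scale ** 2 + prior_scale ** 2
    log_z = -0.5 * n_dim * math.log(2 * math.pi * var_sum)
    # Posterior variance of each parameter (precisions add)
    post_var = 1.0 / (likelihood_scale ** -2 + prior_scale ** -2)
    posterior = NormalDist(mu=0.0, sigma=math.sqrt(post_var))
    return {
        'logz': log_z,
        'theta1': posterior.mean,
        'theta1squ': post_var,  # second moment equals variance for zero mean
        'median': posterior.inv_cdf(0.5),
        'theta1c_0.84': posterior.inv_cdf(0.84),
    }


analytic = analytic_gaussian_values()
```

With the settings used here these expressions should agree closely with the "true values" row, although small differences are possible if the library evaluates some of the quantities numerically.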


### Estimating sampling errors

You can estimate the numerical uncertainties on these results by calculating the standard deviation of the sampling error distribution for each run, using the bootstrap resampling approach described in [Higson (2017b)](https://arxiv.org/abs/1703.09701).

In [8]:
results.loc['standard unc'] = ar.run_std_bootstrap(standard_ns_run,
                                                   estimator_list,
                                                   n_simulate=200)
results.loc['dynamic unc'] = ar.run_std_bootstrap(dynamic_ns_run,
                                                  estimator_list,
                                                  n_simulate=200)
results.loc[['standard unc', 'dynamic unc']]

| | logz | theta1 | theta1squ | Median(theta1) | theta1c_0.84 |
|---|---|---|---|---|---|
| standard unc | 0.315792 | 0.02522 | 0.045671 | 0.033014 | 0.052911 |
| dynamic unc | 0.913339 | 0.01312 | 0.024447 | 0.016673 | 0.02242 |
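
For readers unfamiliar with bootstrapping, the general idea behind an n_simulate argument can be illustrated on a plain list of numbers. The sketch below is a generic bootstrap, not the nested sampling variant from Higson (2017b), which operates on the nested sampling run itself; `bootstrap_std` and the toy data are hypothetical and exist only for this illustration.

```python
import random
import statistics


def bootstrap_std(data, estimator, n_simulate=200, seed=0):
    """Standard deviation of an estimator over bootstrap resamples of data."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_simulate):
        # Resample with replacement, keeping the original sample size
        resample = rng.choices(data, k=len(data))
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Toy data: 400 draws from a unit Gaussian
toy_rng = random.Random(1)
toy_data = [toy_rng.gauss(0.0, 1.0) for _ in range(400)]
toy_unc = bootstrap_std(toy_data, statistics.fmean)
```

For this toy data the result should be near $1/\sqrt{400} = 0.05$, the usual standard error of a mean. A larger n_simulate reduces the noise on the uncertainty estimate at the cost of more computation.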


### Generating and analysing runs in parallel

Multiple nested sampling runs can be generated and analysed in parallel (this uses the concurrent.futures module).

In [9]:
import PerfectNS.parallelised_wrappers as pw
import PerfectNS.maths_functions as mf

# Generate 100 nested sampling runs
run_list = pw.generate_runs(settings, 100, max_workers=4)
# Calculate posterior inferences for each run
values = pw.func_on_runs(ar.run_estimators, run_list, estimator_list,
                           parallelise=True)
# Show the mean and standard deviation of the calculation results
estimator_names = [est.name for est in estimator_list]
multi_run_tests = mf.get_df_row_summary(values, estimator_names)
multi_run_tests.loc[['mean', 'std']]

func_on_runs: calculating run_estimators for 100 runs

| | logz | theta1 | theta1squ | Median(theta1) | theta1c_0.84 |
|---|---|---|---|---|---|
| mean | -32.225957 | 0.000617 | 0.985842 | 0.000488 | 0.986943 |
| std | 1.638654 | 0.012395 | 0.019875 | 0.014765 | 0.022042 |
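
The pattern in the cell above (apply a function to every run in parallel, then summarise the results) can be sketched directly with concurrent.futures. This is a generic illustration with toy data, not the PerfectNS wrappers: `toy_estimators` and `toy_runs` are hypothetical, and a thread pool is used only so the snippet runs unchanged in any environment. Whether the library itself uses threads or processes is not stated in this notebook.

```python
import concurrent.futures
import statistics


def toy_estimators(run):
    """Stand-in for an analysis function: returns (mean, max) of a 'run'."""
    return statistics.fmean(run), max(run)


# Eight toy 'runs', each a short list of numbers
toy_runs = [[float(i + j) for j in range(5)] for i in range(8)]

# executor.map preserves input order, so results line up with toy_runs
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    toy_values = list(executor.map(toy_estimators, toy_runs))

# Summarise the first estimator across runs
toy_means = [value[0] for value in toy_values]
summary = {'mean': statistics.fmean(toy_means),
           'std': statistics.stdev(toy_means)}
```

For CPU-bound work such as likelihood evaluation, a process pool is usually the better choice in Python, provided the function and its arguments can be pickled.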


### Comparing dynamic and standard nested sampling performance
  
Let's now compare the performance of dynamic and standard nested sampling, using the 10-dimensional Gaussian likelihood and prior.

This is the code that was used for Table 1 of the dynamic nested sampling paper [(Higson, 2017a)](https://arxiv.org/abs/1704.03459), although we only use 100 runs instead of 5000. Tables 2, 3 and 4 can also be replicated by changing the settings; for more information about the get_dynamic_results function look at its docstring.

In [None]:
import PerfectNS.results_tables as rt

settings.likelihood = likelihoods.gaussian(likelihood_scale=1)
settings.prior = priors.gaussian(prior_scale=10)
settings.n_dim = 10
dynamic_results_table = rt.get_dynamic_results(100, [0, 1], estimator_list, settings)
dynamic_results_table

Note that every second column gives an estimated numerical uncertainty on the values in the previous column.

Looking at the final row of dynamic_results_table (above), you should see that dynamic nested sampling targeted at parameter estimation (dynamic_goal=1) has an efficiency gain (equivalent computational speedup) of a factor of around 3 to 4 over standard nested sampling for the parameter estimation quantities (all columns other than $\log \mathcal{Z}$).
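
To make "efficiency gain" concrete, the sketch below assumes it is the ratio of the variance of standard nested sampling results to the variance of dynamic nested sampling results. Because sampling errors typically scale as the inverse square root of the computational effort, this ratio can be read as an equivalent speedup. The inputs are the single-run bootstrap uncertainties from the "Estimating sampling errors" section, so the output is only a rough indication, and it further assumes the two runs used a comparable amount of computation.

```python
# Bootstrap standard deviations copied from the single-run table above
standard_unc = {'logz': 0.315792, 'theta1': 0.02522, 'theta1squ': 0.045671,
                'Median(theta1)': 0.033014, 'theta1c_0.84': 0.052911}
dynamic_unc = {'logz': 0.913339, 'theta1': 0.01312, 'theta1squ': 0.024447,
               'Median(theta1)': 0.016673, 'theta1c_0.84': 0.02242}

# Assumed definition: efficiency gain = Var[standard] / Var[dynamic]
efficiency_gain = {name: (standard_unc[name] / dynamic_unc[name]) ** 2
                   for name in standard_unc}

for name, gain in efficiency_gain.items():
    print('{}: {:.2f}'.format(name, gain))
```

For these particular single-run estimates, the gains for the mean, second moment and median of $\theta_1$ fall between roughly 3.5 and 4, while the gain for $\log \mathcal{Z}$ is well below 1. That is the expected trade-off when dynamic_goal=1 directs all the effort towards parameter estimation.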

### Comparing bootstrap error estimates to observed distributions of results

Finally, let's check whether the bootstrap estimates of parameter estimation sampling errors are accurate, using a 3-dimensional Gaussian likelihood and a Gaussian prior.

This is the code that was used for Table 5 of the dynamic nested sampling paper [(Higson, 2017a)](https://arxiv.org/abs/1704.03459), although we only use 100 runs instead of 5000. See the paper and the get_bootstrap_results function's docstring for more details.

In [None]:
settings.likelihood = likelihoods.gaussian(likelihood_scale=1)
settings.prior = priors.gaussian(prior_scale=10)
settings.n_dim = 3
bootstrap_results_table = rt.get_bootstrap_results(100, 200,
                                                   estimator_list, settings,
                                                   n_run_ci=20,
                                                   n_simulate_ci=1000,
                                                   add_sim_method=False,
                                                   cred_int=0.95,
                                                   ninit_sep=False,
                                                   parallelise=True)
bootstrap_results_table

Note that every second column gives an estimated numerical uncertainty on the values in the previous column.

You should see that the ratio of the bootstrap error estimates to the standard deviation of results (row 4 of bootstrap_results_table) is close to 1, given the estimated numerical uncertainties.