# Phase 5: Diagnosing Outputs and Generating a Report




This notebook aims to do what the software [Tracer](https://beast.community/tracer) does, with some improvements.

## Instructions

Run the code cells of this Jupyter notebook sequentially with shift+enter. Several cells produce widgets that let you select the MCMC chains that have converged. Once you have made your selection, click on the cell below the widget and press shift+enter.

## Suggested Reading

Up to and including "**x% HPD interval**" of:

Drummond, Alexei J., and Bouckaert, Remco R. ‘Ch 10: Posterior Analysis and Post Processing.’ In Bayesian Evolutionary Analysis with BEAST. Cambridge University Press, 2015. https://www.cambridge.org/core/books/bayesian-evolutionary-analysis-with-beast/81F5894F05E87F13C688ADB00178EE00.

The authors have been kind enough to make a draft copy of the book available at http://alexeidrummond.org/assets/publications/2015-drummond-bayesian.pdf.

## Parameters
<details>
    <summary>Click To See A Description of Parameters</summary>
        <pre>
            <code>
beast_xml_path: str
    Path to the BEAST 2 XML file for this run; it is passed on to the report template.

report_template: str
    Name of a valid report template used to generate the report.

kernel_name: str, default 'beast_pype'
    Name of the Jupyter Python kernel to use when running report template notebooks.

  </code>
</pre>
</details>

In [None]:
report_template = None
beast_xml_path = None
kernel_name = 'beast_pype'

Import necessary packages.

In [None]:
from copy import deepcopy
import yaml
from beast_pype.nb_utils import execute_notebook
import pandas as pd
from beast_pype.diagnostics.mcmc import BEASTDiag
from beast_pype.report_gen import add_unreported_outputs, gen_mcc_notebook
from beast_pype.diagnostics.runtime import get_beast_runtimes, get_slurm_job_stats
import warnings
import os
import importlib.resources as importlib_resources
# Suppress matplotlib warnings; `module` is a regex matched against the start of the module name.
warnings.filterwarnings("ignore", module=r"matplotlib.*")

In [None]:
# files() returns a path-like object; path() would return a context manager instead.
workflow_modules = importlib_resources.files('beast_pype') / 'workflow_modules'
save_dir = os.getcwd()

Get pipeline run info, if there is any.

In [None]:
run_info_path = f'{save_dir}/pipeline_run_info.yml'
if os.path.isfile(run_info_path):
    with open(run_info_path, "r") as file:
        # An empty YAML file loads as None, so fall back to an empty dict.
        pipeline_run_info = yaml.safe_load(file) or {}
else:
    pipeline_run_info = {}

Create the outputs_and_reports directory.

In [None]:
outputs_and_reports_dir = f'{save_dir}/outputs_and_reports'
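# exist_ok=True lets the notebook be re-run without failing on an existing directory.
os.makedirs(outputs_and_reports_dir, exist_ok=True)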

## Load Log Files

In [None]:
if os.path.exists(f'{save_dir}/beast_outputs'):
    beast_outputs_path = f'{save_dir}/beast_outputs'
else:
    beast_outputs_path = save_dir

sample_diag = BEASTDiag(beast_outputs_path)

## Selecting Burn-in and Chains to Remove

Running the cell below generates an interactive widget with three parts:
 * Top interactive part: lets you set the burn-in, remove chains, and choose which parameters are shown in the rest of the widget.
 * Middle display: KDE and trace plots, see [arviz.plot_trace documentation](https://python.arviz.org/en/stable/api/generated/arviz.plot_trace.html#arviz.plot_trace).
 * Bottom display: a table of statistics on the traces, see [arviz.summary documentation](https://python.arviz.org/en/stable/api/generated/arviz.summary.html#arviz.summary). Regarding these statistics:
    * Ideally the effective sample sizes (ESSs) should be >= 200, see [arviz.ess documentation](https://python.arviz.org/en/stable/api/generated/arviz.ess.html#arviz.ess).
    * Ideally the r_hat should be close to 1, see [arviz.rhat documentation](https://python.arviz.org/en/stable/api/generated/arviz.rhat.html#arviz.rhat).
    * Markov chain standard errors (MCSEs) are also reported, see [arviz.mcse documentation](https://python.arviz.org/en/stable/api/generated/arviz.mcse.html#arviz.mcse).

After making your selection, click on the cell below the widget and keep pressing shift+enter to run the rest of the cells in this notebook.
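
To give a feel for what the r_hat column in the table measures, the sketch below computes the classic Gelman-Rubin statistic in plain Python. This is a simplification for illustration only: ArviZ's default is a rank-normalized split version, so its values will differ somewhat. The function name `gelman_rubin_rhat` and the synthetic chains are assumptions made for this example and are not part of beast_pype.

```python
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])  # draws per chain
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    within = mean(variance(c) for c in chains)            # W: mean within-chain variance
    between_over_n = variance([mean(c) for c in chains])  # B/n: variance of the chain means
    pooled = (n - 1) / n * within + between_over_n        # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Four hypothetical chains sampling the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Same again, but the last chain is stuck around a different value.
stuck = [chain[:] for chain in mixed[:3]] + [[x + 3.0 for x in mixed[3]]]

rhat_mixed = gelman_rubin_rhat(mixed)  # close to 1
rhat_stuck = gelman_rubin_rhat(stuck)  # well above 1
print(f"mixed: {rhat_mixed:.3f}, stuck: {rhat_stuck:.3f}")
```

A chain like the last one in `stuck` is the kind you would consider deselecting in the widget.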

In [None]:
sample_diag_widget = sample_diag.generate_widget(parameters_displayed=4)
sample_diag_widget

In [None]:
pipeline_run_info["Chains Used"] = deepcopy(sample_diag.selected_chains)
pipeline_run_info["Burn-In"] = deepcopy(sample_diag.burinin_percentage)
phase_5i_params = sample_diag.merging_outputs_params(output_path=outputs_and_reports_dir)
phase_5i_log = execute_notebook(input_path=f'{workflow_modules}/Phase-5i-Merge-BEAST-outputs.ipynb',
                                  output_path=save_dir + '/Phase-5i-Merge-BEAST-outputs.ipynb',
                                  parameters=phase_5i_params,
                                  progress_bar=True,
                                  nest_asyncio=True
                                 )
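
As a rough picture of what the recorded burn-in percentage and chain selection mean for the samples, here is a minimal plain-Python sketch. It is not the beast_pype implementation (the actual merging is done by the Phase-5i notebook); `discard_burn_in`, `merge_selected`, and the toy chains are hypothetical names and data used only for illustration.

```python
def discard_burn_in(samples, burn_in_percentage):
    """Drop the first `burn_in_percentage` percent of a chain's samples."""
    if not 0 <= burn_in_percentage < 100:
        raise ValueError("burn_in_percentage must be in [0, 100)")
    n_discard = int(len(samples) * burn_in_percentage / 100)
    return samples[n_discard:]


def merge_selected(chains, selected, burn_in_percentage):
    """Concatenate the post-burn-in samples of the selected chains."""
    merged = []
    for name in selected:
        merged.extend(discard_burn_in(chains[name], burn_in_percentage))
    return merged


# Three toy chains of 100 samples each, keyed by a made-up chain name.
chains = {f"chain-{i}": list(range(i * 1000, i * 1000 + 100)) for i in range(3)}

# Keep two of the three chains and discard the first 10% of each.
merged = merge_selected(chains, selected=["chain-0", "chain-2"], burn_in_percentage=10)
print(len(merged), merged[0], merged[90])
```

Each kept chain contributes 90 of its 100 samples, so the merged list starts at the eleventh sample of the first selected chain.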

## Update the pipeline_run_info YAML File

In [None]:
with open(f'{save_dir}/pipeline_run_info.yml', 'w') as fp:
    # The with-block closes the file, so no explicit close() is needed.
    yaml.dump(pipeline_run_info, fp, sort_keys=True, indent=4)

## Get BEAST 2 runtimes

In [None]:
runtimes_df = get_beast_runtimes(beast_outputs_path,
                                 outfile_startswith='run-with-seed-',
                                 outfile_endswith='.out')

runtimes_df.to_csv(f'{outputs_and_reports_dir}/BEAST_runtimes.csv', index=False)

If BEAST 2 was run on Slurm, get the Slurm job statistics.

In [None]:
if os.path.isfile(f'{beast_outputs_path}/slurm_job_ids.txt'):
    jobs_df = pd.read_csv(f'{beast_outputs_path}/slurm_job_ids.txt', sep=';')
    jobs_df['JobID'] = jobs_df['JobID'].astype(str)
    stats_df = get_slurm_job_stats(jobs_df['JobID'].to_list())
    job_stats_df = jobs_df.merge(stats_df, on='JobID')
    job_stats_df.to_csv(f'{outputs_and_reports_dir}/BEAST_slurm_stats.csv', index=False)
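
The cast to `str` and the merge above can be seen on a small made-up example. This sketch assumes pandas is available (as in this notebook); the `Seed`, `Elapsed`, and `MaxRSS` columns and all values are invented for illustration and do not reflect the actual contents of `slurm_job_ids.txt` or the output of `get_slurm_job_stats`. Because `DataFrame.merge` defaults to an inner join, a job with no matching statistics row drops out of the result.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated job list; read_csv infers JobID as integers.
jobs_txt = "JobID;Seed\n101;1\n102;2\n103;3\n"
jobs_df = pd.read_csv(io.StringIO(jobs_txt), sep=';')
jobs_df['JobID'] = jobs_df['JobID'].astype(str)  # match the string IDs below

# Hypothetical statistics table keyed by string job IDs; job 103 is missing.
stats_df = pd.DataFrame({
    'JobID': ['101', '102'],
    'Elapsed': ['01:02:03', '00:58:41'],
    'MaxRSS': ['2100M', '1950M'],
})

# Default how='inner': only jobs present in both tables are kept.
job_stats_df = jobs_df.merge(stats_df, on='JobID')
print(job_stats_df)

# how='left' keeps every job and leaves missing statistics as NaN.
all_jobs_df = jobs_df.merge(stats_df, on='JobID', how='left')
print(all_jobs_df)
```

If every submitted job should appear in the CSV even when Slurm returns no statistics for it, a left join like the second call is one option.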

## Generate Output Report
You can now move on to visualising the outputs from BEAST using a report template.

In [None]:
report_params = {'save_dir': outputs_and_reports_dir, 'beast_xml_path':beast_xml_path}
output_report_path = f'{outputs_and_reports_dir}/BEAST_pype-Report.ipynb'
add_unreported_outputs(report_template, outputs_and_reports_dir, output_report_path)
output = execute_notebook(input_path=output_report_path,
                          output_path=output_report_path,
                          parameters=report_params,
                          progress_bar=True,
                          kernel_name=kernel_name)

### Convert Output Report from Jupyter Notebook to HTML

This also removes code cells.

In [None]:
%%bash -l -s {output_report_path}
source activate beast_pype
jupyter nbconvert --to html --no-input "$@"

## Produce MCC Tree

This can take some time, so it is done last; you can look at the report while waiting.

In [None]:
gen_mcc_notebook(outputs_and_reports_dir, 'Phase-5ii-Gen-MCC-Trees.ipynb')
mcc_tree_output = execute_notebook(input_path='Phase-5ii-Gen-MCC-Trees.ipynb',
                             output_path='Phase-5ii-Gen-MCC-Trees.ipynb',
                             progress_bar=True)