# Phase 5: Diagnosing Outputs and Generating a Report




This notebook attempts to replicate what the software [Tracer](https://beast.community/tracer) does, with some improvements.

## Instructions

Code cells of this Jupyter notebook should be run sequentially via shift+enter. Several cells produce widgets that let you select the burn-in and the MCMC chains that have converged. Once you have made your selection, click on the cell below and press shift+enter.

## Suggested Reading

Up to and including "**x% HPD interval**" of:

Drummond, Alexei J., and Bouckaert, Remco R. ‘Ch 10: Posterior Analysis and Post Processing.’ In Bayesian Evolutionary Analysis with BEAST. Cambridge University Press, 2015. https://www.cambridge.org/core/books/bayesian-evolutionary-analysis-with-beast/81F5894F05E87F13C688ADB00178EE00.

The authors have kindly made a draft copy of the book available at http://alexeidrummond.org/assets/publications/2015-drummond-bayesian.pdf.
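
The x% HPD (highest posterior density) interval is the shortest interval that contains x% of the posterior samples. The sketch below computes it by sliding a window over the sorted samples. `hpd_interval` is a hypothetical helper, not part of beast_pype, and Tracer's exact rounding of the number of samples inside the interval may differ.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], prob: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing a fraction `prob` of the samples."""
    if not 0 < prob <= 1:
        raise ValueError("prob must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of sorted samples every candidate interval must contain.
    k = max(1, math.ceil(prob * n))
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# The outlying sample is excluded because the HPD interval is the shortest one, not the central one.
print(hpd_interval([1.0, 2.0, 3.0, 100.0], prob=0.75))  # → (1.0, 3.0)
```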

## Setup

```
Parameters
-------------
save_dir: str, optional
    Location of the .log and .trees files from running several BEAST 2 MCMC chains using the same BEAST 2 xml.
    If not provided, save_dir is assumed to be the current working directory (normally the location of this notebook).

report_template: str, optional
    Path of the report template used to generate a report on the running of this workflow.
    If not provided, the BDSKY-Report.ipynb template bundled with beast_pype is used.

metadata_path: str
    Path of the metadata pertaining to the fasta file used in generating the BEAST 2 xml. Required.

add_unreported_fields: bool, default True
    If True, the metadata of the notebook report_template will be searched for the entry "BEAST outputs reported".
    Any parameter not listed there but occurring in the .log files will then be added to the notebook reporting on
    the outputs from running this workflow.
```
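
The `add_unreported_fields` option depends on knowing which parameters a .log file contains. BEAST 2 .log files are tab-separated, with optional comment lines starting with `#` followed by a header row of column names. The sketch below, assuming that layout, lists the columns not already in a "reported" list. `log_columns` and `unreported_fields` are hypothetical helpers for illustration, not the implementation of `beast_pype.report_gen.add_unreported_outputs`.

```python
from typing import Iterable, List


def log_columns(lines: Iterable[str]) -> List[str]:
    """Column names of a BEAST 2 .log file: the first non-comment, non-blank line, split on tabs."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        # The header may end with a trailing tab, so drop empty names.
        return [name for name in line.rstrip("\r\n").split("\t") if name]
    return []


def unreported_fields(lines: Iterable[str], reported: Iterable[str]) -> List[str]:
    """Columns present in the .log lines but absent from `reported`, in file order."""
    already_reported = set(reported)
    return [name for name in log_columns(lines) if name not in already_reported]


# Usage with a real file (the path is hypothetical):
# with open("merged.log") as f:
#     print(unreported_fields(f, ["posterior", "likelihood"]))
```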

In [None]:
save_dir = None
report_template = None
metadata_path = None
add_unreported_fields = True

Import necessary packages.

In [None]:
from copy import deepcopy
import json
import papermill as pm
from beast_pype.mcmc_diagnostics import BEASTDiag
from beast_pype.report_gen import add_unreported_outputs
from beast_pype.workflow import get_slurm_job_stats
import warnings
import os
import importlib.resources as importlib_resources
# Silence matplotlib warnings; `module` is a regex matched against the start of the warning's module name.
warnings.filterwarnings("ignore", module="matplotlib")

In [None]:
if report_template is None:
    # importlib.resources.files() returns a path into the installed beast_pype package.
    report_template = str(importlib_resources.files('beast_pype') / 'report_templates' / 'BDSKY-Report.ipynb')

if save_dir is None:
    save_dir = os.getcwd()

if metadata_path is None:
    raise ValueError('metadata_path is required.')

In [None]:
with open(f"{save_dir}/pipeline_run_info.json", "r") as file:
    pipeline_run_info = json.load(file)
# Start fresh lists for recording the selections made in this notebook.
pipeline_run_info["Chains Used"] = []
pipeline_run_info["Burn-In"] = []

## Get Information on the Pipeline Run
### Slurm Job Stats

In [None]:
try:
    slurm_job_stats = get_slurm_job_stats(pipeline_run_info['slurm job IDs'])
    slurm_job_stats.to_csv(f"{save_dir}/slurm_job_stats.csv", index=False)
    to_display = slurm_job_stats
except Exception:
    # Fall back to showing the sacct command the user can run by hand.
    job_ids_request = ','.join(f"{entry}.batch" for entry in pipeline_run_info['slurm job IDs'])
    request = f"sacct --jobs={job_ids_request} --format=JobID,AllocTres,Elapsed,CPUTime,TotalCPU,MaxRSS -p --delimiter='/t'"
    to_display = ('The function for summarising slurm job statistics into a table may not work properly with certain slurm configurations (formatting issues).\n' +
                  'We suggest you run the following command in the terminal from which you ran this beast_pype workflow:\n' +
                  request)

display(to_display)
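
If `get_slurm_job_stats` fails, the printed `sacct` command can be run by hand. With `-p`, `sacct` separates fields with `|` (or the string given to `--delimiter`) and ends each line with a trailing delimiter; `-P` omits the trailing one. A minimal sketch for turning that output into one dictionary per job step is below. `parse_sacct_parsable` is a hypothetical helper, not the beast_pype function.

```python
from typing import Dict, List


def parse_sacct_parsable(text: str, delimiter: str = "|") -> List[Dict[str, str]]:
    """Turn `sacct -p` or `sacct -P` output into one dict per line, keyed by the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split(delimiter)
    # `-p` ends every line with the delimiter, giving an empty last field; `-P` does not.
    trailing = header[-1] == ""
    if trailing:
        header = header[:-1]
    rows = []
    for line in lines[1:]:
        fields = line.split(delimiter)
        if trailing:
            fields = fields[:-1]
        rows.append(dict(zip(header, fields)))
    return rows


# Usage (assumes `sacct` is on PATH; the job ID is hypothetical):
# import subprocess
# out = subprocess.run(["sacct", "--jobs=123.batch", "-p"], capture_output=True, text=True).stdout
# print(parse_sacct_parsable(out))
```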

## Load Log Files

In [None]:
sample_diag = BEASTDiag(save_dir)

## Select Burn-In and Chains to Remove

In [None]:
sample_diag_widget = sample_diag.generate_widget()
sample_diag_widget
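
Tracer reports the effective sample size (ESS) of each parameter, i.e. roughly how many independent samples an autocorrelated chain is worth, and low values are a sign that a chain has not mixed well. As an illustration of what to look for when choosing burn-in and chains, the sketch below estimates $\mathrm{ESS} = N / (1 + 2\sum_{k \ge 1} \rho_k)$, truncating the sum at the first non-positive autocorrelation $\rho_k$. `effective_sample_size` is a hypothetical helper; the truncation rule is a simple choice and not necessarily the estimator used by Tracer or `BEASTDiag`, and the loop is O(N²), so it only suits short traces.

```python
import math
from typing import Sequence


def effective_sample_size(samples: Sequence[float]) -> float:
    """Estimate ESS = N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        # A constant trace carries no information about mixing.
        return math.nan
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        autocov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break  # stop summing at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau


# A trace stuck in one state for half the run is worth far fewer than its 10 samples.
print(effective_sample_size([0.0] * 5 + [1.0] * 5))
```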

## Merge Kept Chains
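
The cells below call LogCombiner with `-b`, the burn-in percentage, so each kept chain loses its leading samples before the remainders are combined into one file. A minimal sketch of that idea on in-memory traces is below, assuming each chain is a list of samples in sampling order. `combine_chains` is hypothetical, and LogCombiner's exact rounding of the burn-in count and its handling of the state column may differ.

```python
from typing import List, Sequence


def combine_chains(chains: Sequence[Sequence[float]], burnin_percentage: float) -> List[float]:
    """Drop the first `burnin_percentage`% of each chain, then concatenate the rest."""
    if not 0 <= burnin_percentage < 100:
        raise ValueError("burnin_percentage must be in [0, 100)")
    combined: List[float] = []
    for chain in chains:
        # Number of leading samples treated as burn-in (rounded down).
        n_burnin = int(len(chain) * burnin_percentage / 100)
        combined.extend(chain[n_burnin:])
    return combined


print(combine_chains([[1, 2, 3, 4], [5, 6, 7, 8]], 25))  # → [2, 3, 4, 6, 7, 8]
```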

### Log Files

In [None]:
%%bash -l -s {sample_diag.logcombiner_args(suffix='.log')}
source activate beast_pype

# $1: burn-in percentage, $2: output file, remaining arguments: input files to combine.
logcombiner -b $1 -log ${@:3} -o $2

### Tree Files

In [None]:
%%bash -l -s {sample_diag.logcombiner_args(suffix='.trees')}
source activate beast_pype

# $1: burn-in percentage, $2: output file, remaining arguments: input files to combine.
logcombiner -b $1 -log ${@:3} -o $2

### Record Chains Used and Burn-In for This Sample

In [None]:
pipeline_run_info["Chains Used"].append(deepcopy(sample_diag.selected_chains))
pipeline_run_info["Burn-In"].append(deepcopy(sample_diag.burinin_percentage))

## Update pipeline_run_info.json

In [None]:
with open(f'{save_dir}/pipeline_run_info.json', 'w') as fp:
    json.dump(pipeline_run_info, fp, sort_keys=True, indent=4)

## Generate output Report
You can now move on to visualising the outputs from BEAST using a report template.

In [None]:
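report_params = {'save_dir': save_dir, 'metadata_path': metadata_path}
output_report_path = f'{save_dir}/BEASTPype-Report.ipynb'
if add_unreported_fields:
    # Write a copy of the template to output_report_path with any unreported .log parameters added.
    add_unreported_outputs(report_template, f'{sample_diag.directory}/merged.log', output_report_path)
    input_report_path = output_report_path
else:
    # No copy is written, so execute the template itself; papermill saves the result to output_report_path.
    input_report_path = report_template
output = pm.execute_notebook(input_path=input_report_path,
                             output_path=output_report_path,
                             parameters=report_params,
                             progress_bar=True)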

### Convert Output Report from Jupyter Notebook to HTML

The `--no-input` option also removes the code cells from the HTML report.

In [None]:
%%bash -l -s {output_report_path}
source activate beast_pype
jupyter nbconvert --to html --no-input $@