# Optimising BEAST Runs

This workflow tests different configurations of BEAST 2 command line arguments when running a BEAST 2 xml, with the aim of optimising runtime and memory usage.
Different SLURM `sbatch` configurations can also be tested alongside the BEAST 2 command line arguments.

Key Features of this workflow:


<details>
    <summary>Click To See A Description of Parameters</summary>
        <pre>
            <code>

Running an Instance of this Workflow
-------------------------------------------
overall_save_dir: str
    Path to where you are saving all the runs of this workflow.
    A folder with the name you specify here is created, e.g. 'folder' creates a folder named folder in the root folder to hold all the files produced when running the workflow.

specific_run_save_dir: str, optional
    Sub-directory of overall_save_dir you wish to save all the files from this instance of this workflow.
    If None, 'None' or an empty string a timestamp of format 'YYYY-MM-DD_hour-min-sec' is used instead.

kernel_name: str, default 'beast_pype'
    Name of the Jupyter Python kernel to use when running the workflow. This is also the name of the conda environment to use in phase 4 and
    phase 2ii (as these Jupyter notebooks use the `bash` kernel).


General Inputs
----------------
ready_to_go_xml: str
    Path to a BEAST 2 xml that you wish to run unaltered.


Running BEAST 2 Configurations
-------------------------------------
thread_options: dict {'start': int, 'step': int, 'stop': int} or list [int], optional
    Number of threads to use in conjunction with configurations entries.
    See '-threads' on https://www.beast2.org/2021/03/31/command-line-options.html.
    If dict:
        A np.arange from 'start' up to (but not including) 'stop' in increments of 'step' will be used.
        Each value in this entry will be tested with each entry in configurations.
    If list:
        Each value will be tested with each entry in configurations.

instance_options: dict {'start': int, 'step': int, 'stop': int} or list [int], optional
    Number of instances to use in conjunction with configurations entries.
    See '-instances' on https://www.beast2.org/2021/03/31/command-line-options.html.
    If dict:
        A np.arange from 'start' up to (but not including) 'stop' in increments of 'step' will be used.
        Each value in this entry will be tested with each entry in configurations.
    If list:
        Each value will be tested with each entry in configurations.

cpu_options: dict {'start': int, 'step': int, 'stop': int} or list [int], optional
    Number of cpus to use in conjunction with configurations entries. Only to be used if calling slurm's sbatch.
    See '--cpus-per-task' on https://slurm.schedmd.com/sbatch.html
    If dict:
        A np.arange from 'start' up to (but not including) 'stop' in increments of 'step' will be used.
        Each value in this entry will be tested with each entry in configurations.
    If list:
        Each value will be tested with each entry in configurations.

combined_cpu_thread_instance_options: dict {'start': int, 'step': int, 'stop': int} or list [int], optional
    Number of cpus, threads and instances to use in conjunction with configurations entries. Only to be used if calling slurm's sbatch.
    See:
     * '-threads' on https://www.beast2.org/2021/03/31/command-line-options.html.
     * '-instances' on https://www.beast2.org/2021/03/31/command-line-options.html.
     * '--cpus-per-task' on https://slurm.schedmd.com/sbatch.html
    If dict:
        A np.arange from 'start' up to (but not including) 'stop' in increments of 'step' will be used.
        Each value in this entry will be tested with each entry in configurations.
    If list:
        Each value will be tested with each entry in configurations.

combined_thread_instance_options: dict {'start': int, 'step': int, 'stop': int} or list [int], optional
    Number of threads and instances to use in conjunction with configurations entries. Only to be used if calling slurm's sbatch.
    See:
     * '-threads' on https://www.beast2.org/2021/03/31/command-line-options.html.
     * '-instances' on https://www.beast2.org/2021/03/31/command-line-options.html.
    If dict:
        A np.arange from 'start' up to (but not including) 'stop' in increments of 'step' will be used.
        Each value in this entry will be tested with each entry in configurations.
    If list:
        Each value will be tested with each entry in configurations.

configurations: nested_dict
    Keys:   will be used in naming the directory in which the results of a configuration are saved.
               If used with the argument threads, `_threads_{number_of_threads}` will be used as a suffix in naming the directory.
    Values:  will be a dictionary outlining the settings of the configuration:

        number_of_beast_runs: int
            Number of chains to use (number of parallel runs to do) when running BEAST (e.g. 9).

        seeds: list of ints, optional
            Seeds to use when running BEAST. Generated if not given.
            If given, length of list should be the same as the number_of_beast_runs (number of chains), so each run has a designated seed.

        beast_options_without_a_value: list of strs
            Single word arguments to pass to BEAST 2.
            For instance to use a GPU when running BEAST 2 this would be `['-beagle_GPU']`.
            See https://www.beast2.org/2021/03/31/command-line-options.html.

        beast_options_needing_a_value: dict
            Word followed by value arguments to pass to BEAST 2.
            If the argument `threads` is used, do NOT use `-threads` in this dictionary.
            See https://www.beast2.org/2021/03/31/command-line-options.html.

        sbatch_options_without_a_value: list of strs
            Single word arguments to pass to sbatch.
            See https://slurm.schedmd.com/sbatch.html.

        sbatch_options_needing_a_value: dict
            Word followed by value arguments to pass to sbatch.
            See https://slurm.schedmd.com/sbatch.html.

        max_threads: int, default None
            The maximum number of threads to use when running BEAST.
            Only to be used if NOT using the `sbatch_options_without_a_value` or `sbatch_options_needing_a_value` arguments.
            In that case a value for max_threads does not have to be given; if omitted, the number of cores available
            minus 1 is used (`multiprocessing.cpu_count() - 1`).

  </code>
</pre>
</details>
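
As an illustration of the structure described above, the following sketch builds a `configurations` dictionary in Python and checks the one consistency rule stated for `seeds`. The configuration names, the seed values and the `check_seeds` helper are hypothetical and are not part of beast_pype.

```python
# Hypothetical example values; adjust them to your own analysis.
configurations = {
    'cpu_only': {
        'number_of_beast_runs': 3,
        'seeds': [11, 22, 33],  # one seed per chain
        'beast_options_without_a_value': [],
        'beast_options_needing_a_value': {},
    },
    'gpu': {
        'number_of_beast_runs': 3,
        # 'seeds' is omitted here, so the workflow would generate them.
        'beast_options_without_a_value': ['-beagle_GPU'],
        'beast_options_needing_a_value': {},
    },
}


def check_seeds(configurations):
    """Return names of configurations whose seeds do not match the chain count."""
    mismatched = []
    for name, config in configurations.items():
        seeds = config.get('seeds')
        if seeds is not None and len(seeds) != config['number_of_beast_runs']:
            mismatched.append(name)
    return mismatched


print(check_seeds(configurations))  # []
```

Running a check like this before launching the workflow catches a seed list of the wrong length before any BEAST 2 run has started.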

In [None]:
'''
Parameters
-------------
'''
# Running an Instance of this Workflow
overall_save_dir = None
specific_run_save_dir = None
kernel_name = 'beast_pype'

# General Inputs
ready_to_go_xml = None

# Running BEAST 2 Configurations
combined_cpu_thread_instance_options = None
combined_thread_instance_options = None
thread_options = None
instance_options = None
cpu_options = None
configurations = None

## Import libraries and define functions:

In [None]:
import numpy as np
import os
import shutil
from datetime import datetime
from beast_pype.nb_utils import execute_notebook
import importlib.resources as importlib_resources
from beast_pype.workflow_params import setup_optimising_config
from itertools import product
from warnings import warn

### Check parameters are correct.

#### Check for not being assigned

In [None]:
variable_names = [
    'overall_save_dir', 'ready_to_go_xml', 'configurations'
]

for variable_name in variable_names:
    # I tried moving this loop into a function in the available_workflows module, but eval could not find the variables there.
    # eval resolves names in the scope where it is called, so the loop has to stay in this notebook.
    if eval(f"{variable_name} is None"):
        raise Exception(
            f"{variable_name} is missing from the parameters yml file, or\n" +
            f"{variable_name} has been given a 'null' value (which is converted to None in Python).\n" +
            f"None/null values cannot be used for {variable_name}."
        )
        )
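
The same check can be written without `eval` by looking the names up in a mapping. This is a minimal sketch, assuming the parameters are available as a dictionary (in a notebook cell, `globals()` would serve); `find_unassigned` is a hypothetical helper, not part of beast_pype.

```python
def find_unassigned(namespace, required_names):
    """Return the required names that are absent from namespace or set to None."""
    return [name for name in required_names if namespace.get(name) is None]


# Stand-in for the notebook's parameters; in a notebook cell, pass globals().
example_params = {
    'overall_save_dir': 'runs',
    'ready_to_go_xml': None,
    'configurations': {'default': {}},
}
missing = find_unassigned(
    example_params, ['overall_save_dir', 'ready_to_go_xml', 'configurations']
)
print(missing)  # ['ready_to_go_xml']
```

Collecting every missing name first lets a single error message report all of them at once.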

#### If specific_run_save_dir has not been given, use a timestamp.

In [None]:
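# The parameter description also treats the string 'None' and an empty string as "not given".
if specific_run_save_dir in (None, 'None', ''):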
    now = datetime.now()
    specific_run_save_dir = now.strftime('%Y-%m-%d_%H-%M-%S')
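
The fallback described in the parameter documentation (None, 'None' or an empty string all meaning "use a timestamp") can be sketched as a small function. `resolve_run_dir` and its `now` argument are hypothetical names introduced here so the behaviour can be tested with a fixed time.

```python
from datetime import datetime


def resolve_run_dir(specific_run_save_dir, now=None):
    """Return the given sub-directory name, or a timestamp if none was given."""
    if specific_run_save_dir in (None, 'None', ''):
        now = now or datetime.now()
        return now.strftime('%Y-%m-%d_%H-%M-%S')
    return specific_run_save_dir


print(resolve_run_dir('my_run'))  # my_run
print(resolve_run_dir(None, now=datetime(2024, 1, 31, 9, 5, 7)))  # 2024-01-31_09-05-07
```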

### Creating Folders and Subfolders

In [None]:
save_dir = overall_save_dir + '/' + specific_run_save_dir
# makedirs creates overall_save_dir and save_dir as needed; exist_ok avoids an error if they already exist.
os.makedirs(save_dir, exist_ok=True)

### Set path to workflow modules

In [None]:
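# files() gives a path-like object; path() returns a context manager, which does not format as a path in the f-strings below.
workflow_modules = importlib_resources.files('beast_pype') / 'workflow_modules'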
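
A resource shipped inside an installed package can be located with `importlib.resources.files` (available from Python 3.9), which returns an object that can be joined with `/`. The sketch below uses the standard-library `json` package purely as a stand-in, on the assumption that beast_pype may not be installed where the example is run.

```python
import importlib.resources as importlib_resources

# 'json' is a stand-in package; for this workflow the anchor would be 'beast_pype'.
package_root = importlib_resources.files('json')
resource = package_root / '__init__.py'

print(resource.name)       # __init__.py
print(resource.is_file())  # True
```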

## Setting up options product

In [None]:
if combined_cpu_thread_instance_options is not None:
    for variable_name in ['thread_options', 'cpu_options', 'instance_options', 'combined_thread_instance_options']:
        if eval(f"{variable_name} is not None"):
            raise ValueError(
                f"combined_cpu_thread_instance_options and {variable_name} are mutually exclusive."
            )
    if isinstance(combined_cpu_thread_instance_options, dict):
        combined_cpu_thread_instance_options = np.arange(**combined_cpu_thread_instance_options)
    elif not isinstance(combined_cpu_thread_instance_options, list):
        raise TypeError('combined_cpu_thread_instance_options should be a list, dict or None.')
    options_product = [(item, item, item) for item in combined_cpu_thread_instance_options]
elif combined_thread_instance_options is not None:
    for variable_name in ['thread_options', 'cpu_options', 'instance_options']:
        if eval(f"{variable_name} is not None"):
            raise ValueError(
                f"combined_thread_instance_options and {variable_name} are mutually exclusive."
            )
    if isinstance(combined_thread_instance_options, dict):
        combined_thread_instance_options = np.arange(**combined_thread_instance_options)
    elif not isinstance(combined_thread_instance_options, list):
        raise TypeError('combined_thread_instance_options should be a list, dict or None.')
    options_product = [(item, item, None) for item in combined_thread_instance_options]
else:
    if thread_options is None:
        thread_options = [None]
    elif isinstance(thread_options, dict):
        thread_options = np.arange(**thread_options)
    else:
        if not isinstance(thread_options, list):
            raise TypeError('thread_options should be a list, dict or None.')

    if instance_options is None:
        instance_options = [None]
    elif isinstance(instance_options, dict):
        instance_options = np.arange(**instance_options)
    else:
        if not isinstance(instance_options, list):
            raise TypeError('instance_options should be a list, dict or None.')

    if cpu_options is None:
        cpu_options = [None]
    elif isinstance(cpu_options, dict):
        cpu_options = np.arange(**cpu_options)
    else:
        if not isinstance(cpu_options, list):
            raise TypeError('cpu_options should be a list, dict or None.')
    options_product = product(thread_options, instance_options, cpu_options)
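
The expansion of each option and the resulting product can be sketched in isolation. `expand_options` is a hypothetical helper, and it uses the built-in `range` rather than `np.arange` so the example has no dependencies; for integer inputs both produce the same values and both exclude 'stop'.

```python
from itertools import product


def expand_options(options):
    """Turn None, a list, or a {'start', 'stop', 'step'} dict into a list of values."""
    if options is None:
        return [None]
    if isinstance(options, dict):
        # Like np.arange, range excludes 'stop'.
        return list(range(options['start'], options['stop'], options.get('step', 1)))
    if isinstance(options, list):
        return options
    raise TypeError('options should be a list, dict or None.')


threads = expand_options({'start': 1, 'stop': 5, 'step': 2})
instances = expand_options(None)
cpus = expand_options([2, 4])
print(threads)  # [1, 3]
print(list(product(threads, instances, cpus)))
# [(1, None, 2), (1, None, 4), (3, None, 2), (3, None, 4)]
```

Because 'stop' is excluded, a dict of `{'start': 1, 'stop': 5, 'step': 2}` tests 1 and 3 threads but not 5.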

## Running different configurations

In [None]:
for threads_arg, instances_arg, cpu_arg in options_product:
    if (threads_arg is not None and cpu_arg is not None) and (threads_arg > cpu_arg):
        warn(f'This combination requests more threads ({threads_arg}) than CPU cores ({cpu_arg}), so it has been skipped.')
    else:
        for name, configuration in configurations.items():
            parameters = setup_optimising_config(name=name,
                                                 configuration=configuration,
                                                 save_dir=save_dir,
                                                 ready_to_go_xml=ready_to_go_xml,
                                                 threads_arg=threads_arg,
                                                 instances_arg=instances_arg,
                                                 cpu_arg=cpu_arg)
            config_save_path = parameters['save_dir']
            # Use the SBATCH notebook when the parameters include an sbatch argument string, otherwise GNU Parallel.
            if 'sbatch_arg_string' in parameters:
                phase_4_notebook = 'Phase-4-SBATCH-Running-BEAST.ipynb'
            else:
                phase_4_notebook = 'Phase-4-GNU-Parallel-Running-BEAST.ipynb'
            phase_4_log = execute_notebook(input_path=f'{workflow_modules}/{phase_4_notebook}',
                                           output_path=f'{config_save_path}/{phase_4_notebook}',
                                           parameters=parameters,
                                           progress_bar=True,
                                           nest_asyncio=True)
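
The skip rule applied in the loop above (a combination is dropped when it requests more threads than CPU cores) can be checked ahead of time, before any notebook is executed. `split_combinations` is a hypothetical helper written for this sketch.

```python
def split_combinations(options_product):
    """Separate (threads, instances, cpus) tuples into runnable and skipped lists."""
    runnable, skipped = [], []
    for threads_arg, instances_arg, cpu_arg in options_product:
        over_subscribed = (
            threads_arg is not None and cpu_arg is not None and threads_arg > cpu_arg
        )
        (skipped if over_subscribed else runnable).append(
            (threads_arg, instances_arg, cpu_arg)
        )
    return runnable, skipped


runnable, skipped = split_combinations([(2, None, 4), (8, None, 4), (None, 1, None)])
print(runnable)  # [(2, None, 4), (None, 1, None)]
print(skipped)   # [(8, None, 4)]
```

Listing the skipped combinations up front shows how many runs a given set of options will actually launch.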

## Add Stats summarising notebook.

In [None]:
shutil.copy(f'{workflow_modules}/Phase-5-Diagnosing-Runtime-and-Resource-Usage.ipynb', f'{save_dir}/Phase-5-Diagnosing-Runtime-and-Resource-Usage.ipynb')