Author: Lester Hedges<br>
Email:&nbsp;&nbsp; lester.hedges@bristol.ac.uk

___Jupyter Recap___:
* Press Shift+Enter to execute a cell and move to the cell below.
* Press Ctrl+Enter to execute a cell and remain in that cell.
* Run a shell command on the underlying operating system by prefixing the command with an exclamation mark, `!`.
* Remember that the flow is in the order that you execute cells, which is not necessarily linear in the notebook. Keep track of the numbers in brackets to the left of the cell!


# Running nodes

The previous notebook showed you how to write a node to perform minimisation of a molecular system within an interactive Jupyter notebook. This notebook introduces you to some of the other ways of running BioSimSpace nodes, showing how the same script can be used in several different ways. 

## Running nodes on the command-line

The typical way of interacting with BioSimSpace is by running a workflow component, or _node_, from the command-line. A node is just a normal Python script that is run using the `python` interpreter. Let's use the molecular minimisation example from the previous notebook, which we've provided as a Python script called `minimise.py` within the `nodes` directory. (This is just the previous notebook, downloaded as a regular Python script.)

From the command-line, we can query the node to see what it does and get information about the inputs:

In [None]:
!python nodes/minimise.py --help

In the previous notebook, inputs were set via a graphical user interface where the user could configure options and upload files. On the command-line, inputs must be set as command-line arguments. From the information provided in the node itself, i.e. the description and the definitions of its inputs and outputs, BioSimSpace has autogenerated a nicely formatted [argparse](https://docs.python.org/3/library/argparse.html) help message that describes how the node works. The message lists all of the inputs and outputs, lets us know which inputs are optional, and specifies any default values or constraints.
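
To make the idea concrete, here is a minimal sketch of how an `argparse` parser with one required input, one optional input, a default, and a constraint produces this kind of help message. It is not BioSimSpace's actual implementation: `build_parser`, `positive_int`, the help strings, and the default of 10000 steps are all assumptions made for illustration.

```python
import argparse


def positive_int(text):
    """Convert text to an int, rejecting values that are not positive."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def build_parser():
    """Build a parser resembling a node's command-line interface (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="minimise.py",
        description="Minimise a molecular system.",
    )
    # A required input: one or more molecular input files.
    parser.add_argument("--files", nargs="+", required=True,
                        help="The molecular input files.")
    # An optional input with a (hypothetical) default and a constraint.
    parser.add_argument("--steps", type=positive_int, default=10000,
                        help="The number of steps (default: %(default)s).")
    return parser


# Parse an explicit argument list rather than sys.argv, so this runs anywhere.
args = build_parser().parse_args(
    ["--steps", "1000", "--files", "inputs/ala.crd", "inputs/ala.top"]
)
print(args.steps, args.files)
```

Calling `build_parser().print_help()` prints the generated usage text, which is the same mechanism that `--help` triggers on the command-line.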

Note that it's possible to pass options to the node in various ways, e.g. directly on the command-line, using a [YAML](https://en.wikipedia.org/wiki/YAML) configuration file, or even using environment variables. This provides a lot of flexibility in the way in which BioSimSpace nodes can be run. For now we'll just pass arguments on the command-line.

Try running the node without any arguments and seeing what the output is:

In [None]:
!python nodes/minimise.py

Thankfully we've provided some files for you. As before, these are found in the `inputs` directory.

In [None]:
!ls inputs/ala*

(The files define a solvated alanine dipeptide system in [AMBER](http://ambermd.org) format.)

Let's now run the minimisation node using these files as input. In the interests of time, let's also reduce the number of steps to 1000. The files can be passed to the script in various ways. All of the following are allowed:

```bash
!python nodes/minimise.py --steps=1000 --files="inputs/ala.crd, inputs/ala.top"
!python nodes/minimise.py --steps=1000 --files inputs/ala.crd inputs/ala.top
!python nodes/minimise.py --steps=1000 --files inputs/ala.*
```
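
The three forms reach the script as either a single comma-separated string or several separate values (the shell expands `inputs/ala.*` into separate file names before Python sees them). The following sketch shows one way such arguments could be flattened into a single list. The helper `normalise_files` is hypothetical and is not how BioSimSpace necessarily handles this internally.

```python
def normalise_files(values):
    """Flatten file arguments given as separate items or comma-separated strings."""
    files = []
    for value in values:
        # Split on commas, then drop surrounding whitespace and empty entries.
        for name in value.split(","):
            name = name.strip()
            if name:
                files.append(name)
    return files


# One quoted, comma-separated value (the first form above).
print(normalise_files(["inputs/ala.crd, inputs/ala.top"]))

# Separate values (the second form, and the third after shell expansion).
print(normalise_files(["inputs/ala.crd", "inputs/ala.top"]))
```

Both calls yield the same two-element list, which is why the three command lines are interchangeable.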

In [None]:
!python nodes/minimise.py --steps=1000 --files inputs/ala.*

Once the process has finished running (the asterisk to the left of the cell will disappear) we should find that the minimised molecular system has been written to the working directory.

In [None]:
!ls minimised.*

Note that the files have been written in the same format as the original molecular system, i.e. AMBER.

We have also provided some GROMACS format input files.

In [None]:
!ls inputs/kigaki.*

Let's now run the node using these files as input. This is a larger system so the minimisation will take a little longer.

In [None]:
!python nodes/minimise.py --steps=1000 --files inputs/kigaki.*

There should now be two additional GROMACS format output files in the working directory. (Remember that they won't appear until the cell above finishes running.)

In [None]:
!ls minimised.*

## Running nodes from within BioSimSpace

BioSimSpace also provides functionality for running nodes internally. This allows you to call a node from within a script, thereby using existing nodes as building blocks for more complicated workflows. To activate nodes you can point BioSimSpace to a directory in which they are contained. As such, you can maintain your own internal nodes and make them available to users when needed.

For example:

In [None]:
import BioSimSpace as BSS

BSS.Node.setNodeDirectory("nodes")
BSS.Node.list()

To get information about a particular node we can pass its name to the help function:

In [None]:
BSS.Node.help("minimise")

To execute a node we use the `run` function. This takes a dictionary of input values and returns another dictionary containing the outputs. Let's generate a valid input dictionary:

In [None]:
# Use a name that does not shadow Python's built-in input() function.
input_dict = {"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 1000}

We can now run the `minimise` node, passing the dictionary from above:

In [None]:
output = BSS.Node.run("minimise", input_dict)

Finally, let's print the output dictionary to see the result of running the node:

In [None]:
print(output)
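
Because `run` takes a dictionary and returns a dictionary, nodes can be chained by feeding one node's output into the next node's input. The sketch below illustrates the pattern under stated assumptions: `run_chain` is a hypothetical helper, the output key `"system"` is invented for the example (print the real output dictionary, as above, to see the actual keys), and `fake_run` stands in for `BSS.Node.run` so that the code runs without BioSimSpace installed.

```python
def run_chain(run, stages, initial_files):
    """Run nodes in sequence, feeding each node's output files to the next.

    run:    a callable with the same shape as BSS.Node.run(name, input_dict).
    stages: a list of (node_name, extra_options, output_key) tuples.
    """
    files = initial_files
    outputs = {}
    for name, options, output_key in stages:
        node_input = {"files": files, **options}
        outputs = run(name, node_input)
        # Assumption: the named key holds the files to pass downstream.
        files = outputs[output_key]
    return outputs


def fake_run(name, node_input):
    """Stand-in for BSS.Node.run that just fabricates output file names."""
    return {"system": [f"{name}.crd", f"{name}.top"]}


stages = [
    ("minimise", {"steps": 1000}, "system"),
    ("equilibrate", {"restraint": "heavy"}, "system"),
]
print(run_chain(fake_run, stages, ["inputs/ala.crd", "inputs/ala.top"]))
```

With BioSimSpace available, `BSS.Node.run` could be passed in place of `fake_run`, provided each output key is replaced with the one the node actually reports.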

BioSimSpace nodes can also autogenerate their own [Common Workflow Language](https://www.commonwl.org/) (CWL) tool wrappers, allowing them to be plugged into any workflow engine that supports the standard. To generate a wrapper, simply pass the `--export-cwl` argument when running the node, e.g.:

In [None]:
!python nodes/equilibrate.py --export-cwl

Let's examine the wrapper:

In [None]:
!cat nodes/equilibrate.cwl

As a simple example of chaining BioSimSpace nodes in a command-line workflow, consider the following script:


```bash
#!/usr/bin/env bash
# scripts/workflow.sh

# Exit immediately on error.
set -e

echo "Parameterising..."
python nodes/parameterise.py --pdb inputs/methanol.pdb --forcefield gaff

echo "Solvating..."
python nodes/solvate.py --files parameterised.* --water_model tip3p

echo "Minimising..."
python nodes/minimise.py --files solvated.* --steps 1000

echo "Equilibrating..."
python nodes/equilibrate.py --files minimised.* --restraint heavy

echo "Done!"
```

Starting from a PDB topology, this script calls each of the nodes in sequence, passing the output of one as the input to the next. The output of the final node is a set of files representing the equilibrated molecular system, as well as a trajectory and PDB file that can be visualised with, e.g., the [Visual Molecular Dynamics](https://www.ks.uiuc.edu/Research/vmd/) (VMD) program.
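
The same sequence could also be driven from Python instead of bash. The sketch below mirrors the script using only the standard library. The names `STAGES`, `build_command`, and `run_workflow` are hypothetical, and the node scripts and options are simply copied from the bash script above. Since no shell is involved, the file patterns are expanded with `glob`, and `check=True` plays the role of `set -e`.

```python
import glob
import subprocess
import sys

# (label, node script, pattern for --files or None, remaining options)
STAGES = [
    ("Parameterising", "nodes/parameterise.py", None,
     ["--pdb", "inputs/methanol.pdb", "--forcefield", "gaff"]),
    ("Solvating", "nodes/solvate.py", "parameterised.*",
     ["--water_model", "tip3p"]),
    ("Minimising", "nodes/minimise.py", "solvated.*", ["--steps", "1000"]),
    ("Equilibrating", "nodes/equilibrate.py", "minimised.*",
     ["--restraint", "heavy"]),
]


def build_command(script, pattern, options, python=sys.executable):
    """Build the argument list for one node, expanding the file pattern."""
    command = [python, script]
    if pattern is not None:
        files = sorted(glob.glob(pattern))
        if not files:
            raise FileNotFoundError(f"No files match {pattern!r}")
        command += ["--files", *files]
    return command + options


def run_workflow(stages=STAGES, runner=subprocess.run):
    """Run each stage in order, stopping at the first failure."""
    for label, script, pattern, options in stages:
        print(f"{label}...")
        # Patterns are expanded here, after the previous stage has written
        # its output. check=True raises CalledProcessError on failure.
        runner(build_command(script, pattern, options), check=True)
    print("Done!")
```

Calling `run_workflow()` from the notebook's working directory would launch the four nodes in turn. The notebook itself continues with the bash version.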

Let's run the workflow:

In [None]:
!bash scripts/workflow.sh