# gmxapi flow

```
Author                : M. Eric Irrgang
Goal                  : Understand the gmxapi scripting interface
Time                  : 25 minutes
Prerequisites         : Familiarity with GROMACS command line tools.
Software requirements : GROMACS, gmxapi
Tested for            : GROMACS 2021
```

# General information

This notebook illustrates the Python interface for gmxapi with current and planned functionality and syntax.
Additional design aspects are illustrated where possible.

 - Prerequisites
     - GROMACS 2021 with shared library support (standard)
     - gmxapi 0.2 Python package
 - Version
     - 2
 - References
     - [gmxapi manual](https://manual.gromacs.org/current/gmxapi/index.html)
     - [GROMACS CLI manual](https://manual.gromacs.org/current/user-guide/cmdline.html#commands-by-name)

# Python module dependencies
Import modules we will use in this notebook.

In [None]:
# Import Python standard library tools.
import os
import shutil
import subprocess
from pathlib import Path

In [None]:
# Import gmxapi package
import gmxapi as gmx

# Self-check

In [None]:
gmx.version.api_is_at_least(0, 2)

In [None]:
# Get the path to the *gmx* command line interface tool.
cli = Path(shutil.which('gmx'))

assert cli.exists()
cli

In [None]:
# Work around resource management limitations
from gmxapi.simulation.mdrun import ResourceManager as _ResourceManager
_ResourceManager.mdrun_kwargs = {'threads': 1}

# Prepare working directories

Note that GROMACS does not provide an API for its filesystem interactions, so we need to do our own file management.

In [None]:
# Confirm the availability of inputs.
input_dir = Path('inputs').absolute()
assert input_dir.exists()

In [None]:
# Create the notebook's root working directory if it does not exist yet.
nb_root = Path('pipeline-basics').absolute()
nb_root.mkdir(exist_ok=True)

In [None]:
# Define a function to clean the output directories we are going to use.
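def clean_dir(dirname):
    """Remove *dirname* if it exists, recreate it empty, and return it as a Path.

    Relative names are resolved against *nb_root*.
    """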
    if os.path.isabs(dirname):
        d = Path(dirname)
    else:
        d = nb_root/dirname
    try:
        shutil.rmtree(d)
    except FileNotFoundError:
        # Okay. Nothing to remove.
        print(f'{d} does not exist. No problem.')
    except OSError as e:
        print('Trouble preparing working directory:')
        print(e)
    d.mkdir()
    print(f'Created clean {d}')
    return d

# Exercise 1: Construct molecular model.

We will borrow from the `alanine-msm-tutorial` by Cathrine Bergh.
Following the example of that tutorial, we will build the model with the `amber99sb-ildn`
force field, `tip3p` water, and hydrogen virtual sites.

In [None]:
# Choose a working directory that we will try to use.
wd = clean_dir('ex1')

In [None]:
# Check
assert wd.exists()
wd

In [None]:
shutil.copy('inputs/start0.pdb', wd)

In [None]:
os.listdir(wd)

## Use the command line interface to bootstrap the molecular modeling inputs.

In [None]:
!cd pipeline-basics/ex1 && gmx pdb2gmx -ff amber99sb-ildn -water tip3p -vsite hydro -f start0.pdb -p topol.top -i posre.itp -o conf.gro

In [None]:
os.listdir(wd)

## Use the Python subprocess module to call `editconf`
Prepare the box before solvation. Can we do this all in Python?

In [None]:
struct_in = wd/'conf.gro'
struct_out = wd/'box.gro'
assert struct_in.exists()
assert not struct_out.exists()

In [None]:
argv = ['gmx', 'editconf']
argv.extend(['-bt', 'dodeca'])
argv.extend(['-d', '1.0'])
argv.extend(['-f', struct_in])
argv.extend(['-o', struct_out])

In [None]:
editconf = subprocess.run(argv, check=True)

In [None]:
editconf

In [None]:
os.listdir(wd)

But... where did these files come from, again? Can I get stronger references to the data flow?
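
One lightweight way to keep track of where files come from, short of a full workflow API, is to record the declared inputs and outputs alongside each subprocess call. The sketch below is an illustration only: `run_tracked` and `CommandRecord` are hypothetical names, not part of gmxapi or GROMACS, and the Python interpreter stands in for `gmx` so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CommandRecord:
    """What was run, and which files it declared as inputs and outputs."""
    argv: list
    inputs: dict
    outputs: dict
    returncode: int


def run_tracked(executable, args, inputs, outputs):
    """Run a command with files given as flag/path pairs and record the data flow."""
    # Fail early if a declared input is missing.
    for path in inputs.values():
        if not Path(path).exists():
            raise FileNotFoundError(path)
    argv = [str(executable), *args]
    for flag, path in [*inputs.items(), *outputs.items()]:
        argv.extend([flag, str(path)])
    completed = subprocess.run(argv, check=True)
    return CommandRecord(argv, dict(inputs), dict(outputs), completed.returncode)


# Stand-in for a command line tool: a tiny script that copies -f to -o.
script = (
    "import sys, shutil; a = sys.argv[1:]; "
    "shutil.copy(a[a.index('-f') + 1], a[a.index('-o') + 1])"
)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'conf.gro'
    dst = Path(tmp) / 'box.gro'
    src.write_text('placeholder structure\n')
    record = run_tracked(sys.executable, ['-c', script],
                         inputs={'-f': src}, outputs={'-o': dst})
    copied_ok = dst.read_text() == src.read_text()
```

The record keeps the flag-to-file mapping next to the return code, so a later step can look up `record.outputs['-o']` instead of reconstructing the filename by hand.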

# Exercise 2: Reimplement with a more formal pipeline.

In [None]:
wd = str(clean_dir('ex2'))

# Check
assert os.path.exists(wd)
wd

In [None]:
os.listdir(wd)

In [None]:
# Command line positional arguments have no meaning that Python can intuit.
args = ['pdb2gmx']
args.extend(['-ff', 'amber99sb-ildn'])
args.extend(['-water', 'tip3p'])
args.extend(['-vsite', 'hydro'])
args

In [None]:
cmd_dir = str(clean_dir('ex2/make_top'))

In [None]:
# We need a special wrapper to tell an API what input and output arguments mean.
make_top = gmx.commandline_operation(
    cli,
    args,
    input_files={
        '-f': os.path.join(input_dir, 'start0.pdb')
    },
    output_files={
        '-p': os.path.join(cmd_dir, 'topol.top'),
        '-i': os.path.join(cmd_dir, 'posre.itp'),
        '-o': os.path.join(cmd_dir, 'conf.gro')
    }
)

In [None]:
# The command has not run yet.
os.listdir(cmd_dir)

In [None]:
# Demand to know the returncode output value.
if make_top.output.returncode.result() != 0:
    print(make_top.output.erroroutput.result())

In [None]:
# Now the command has run.
os.listdir(cmd_dir)

## Chain commands
Use Python references to connect the outputs of one command to the inputs of another.
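
The idea behind such references can be illustrated without gmxapi. In the toy sketch below, a hypothetical `Lazy` class (an assumption for illustration, not the gmxapi implementation) holds a function and its inputs. Nothing runs until `result()` is called, and calling it on the last step resolves the upstream steps first.

```python
class Lazy:
    """Toy deferred operation: runs once, on the first call to result()."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs
        self._done = False
        self._value = None

    def result(self):
        if not self._done:
            # Resolve upstream Lazy inputs before running this step.
            args = [i.result() if isinstance(i, Lazy) else i
                    for i in self._inputs]
            self._value = self._func(*args)
            self._done = True
        return self._value


log = []  # Records the order in which steps actually execute.


def fake_pdb2gmx(pdb):
    log.append('pdb2gmx')
    return pdb.replace('.pdb', '.gro')


def fake_editconf(gro):
    log.append('editconf')
    return 'boxed_' + gro


top = Lazy(fake_pdb2gmx, 'start0.pdb')
box = Lazy(fake_editconf, top)  # Connected by reference; nothing has run yet.
ran_before = list(log)          # Snapshot of the log before any result() call.
final = box.result()            # Triggers fake_pdb2gmx, then fake_editconf.
```

Asking only for the final result is enough to run the whole chain, which is why the cells below call `result()` on just the value they need.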

In [None]:
cmd_dir = str(clean_dir('ex2/edit'))

In [None]:
edit = gmx.commandline_operation(
    cli,
    ('editconf',
     '-bt', 'dodeca',
     '-d', '1.0'),
    input_files={
     '-f': make_top.output.file['-o']
    },
    output_files={
     '-o': os.path.join(cmd_dir, 'box.gro')
    }
)

### optional
if edit.output.returncode.result() != 0:
    print(edit.output.erroroutput.result())
os.listdir(cmd_dir)

## Solvate.

**Note:** GROMACS automatically looks in `$GMXLIB` for the file named by the `-cs` argument.

**Warning:** `gmx solvate` updates the topology file given with `-p` in place; there is no way to independently specify topology input and output files.
Such a caveat is beyond the scope of what gmxapi can solve.
I would be interested to hear how workflow systems are treating such tools, if at all, so that we might consider whether/how to evolve GROMACS tool interfaces.
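
When a tool rewrites the file it is given, a common precaution is to hand it a private copy so that the upstream output stays intact. Below is a minimal sketch of that pattern using only the standard library. `append_solvent` is a hypothetical stand-in for the in-place edit, not a GROMACS function, and the file contents are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def append_solvent(topology: Path, count: int) -> None:
    """Stand-in for a tool that modifies its topology argument in place."""
    with open(topology, 'a') as fh:
        fh.write(f'SOL {count}\n')


with tempfile.TemporaryDirectory() as tmp:
    # Pretend this file is the output of an earlier step.
    upstream = Path(tmp) / 'make_top' / 'topol.top'
    upstream.parent.mkdir()
    upstream.write_text('[ molecules ]\nProtein 1\n')

    # Copy into the step's own directory, then let the tool edit the copy.
    step_dir = Path(tmp) / 'solvate'
    step_dir.mkdir()
    private = Path(shutil.copy(upstream, step_dir))
    append_solvent(private, 500)

    original_text = upstream.read_text()
    modified_text = private.read_text()
```

The upstream file is unchanged afterwards, so the earlier step can be reused or re-examined without rerunning it.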

In [None]:
# Borrow the utility from the previous tutorial.
@gmx.function_wrapper(output={'path': str})
def cp(src: str, dst: str, output):
    infile = os.path.abspath(src)
    if not os.path.exists(infile):
        raise RuntimeError('Input file does not exist.')
    outfile = os.path.abspath(dst)
    if os.path.exists(outfile):
        if os.path.isfile(outfile):
            raise RuntimeError('Output file already exists.')
        else:
            if not os.path.isdir(outfile):
                raise RuntimeError('dst must be an existing directory or non-existing filename.')
            outfile = os.path.join(outfile, os.path.basename(infile))
    shutil.copy(infile, outfile)
    output.path = outfile

In [None]:
cmd_dir = str(clean_dir('ex2/solvate'))

In [None]:
topol = cp(src=make_top.output.file['-p'], dst=cmd_dir).output.path

In [None]:
solvate = gmx.commandline_operation(
    cli,
    ('solvate',
     '-cs', 'spc216.gro'),
    input_files={
     '-cp': edit.output.file['-o']
    },
    output_files={
     '-o': os.path.join(cmd_dir, 'solvated.gro'),
     '-p': topol.result()
    }
)

In [None]:
solvated = solvate.output.file['-o']

In [None]:
if solvate.output.returncode.result() != 0:
    print(solvate.output.erroroutput.result())
os.listdir(cmd_dir)

In [None]:
# Replace our *topol* local reference.
topol = solvate.output.file['-p']

# Exercise 3: Prepare simulation input.

In [None]:
wd = str(clean_dir(nb_root/'ex3'))

# Check
assert os.path.exists(wd)
wd

## Finish adjusting topology to resolve force field compatibility.
`pdb2gmx` will not have applied some necessary constraints at termini.

`alanine-msm-tutorial` suggests additions to make after the first `#include` line.

We will read the `topol.top` file produced by `solvate` and rewrite it as `edited_topol.top`.
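
The rewrite amounts to copying lines and emitting extra text after a marker line. As a sketch of that logic in isolation, the hypothetical helper `insert_after_first` below (not part of gmxapi) works on any iterable of lines and inserts the block only once, after the first match. The sample lines are placeholders rather than a real topology.

```python
def insert_after_first(lines, sentinel, block):
    """Yield lines, emitting *block* once after the first line that starts with *sentinel*."""
    inserted = False
    for line in lines:
        yield line
        if not inserted and line.startswith(sentinel):
            yield block
            inserted = True


source = [
    '#include "amber99sb-ildn.ff/forcefield.itp"\n',
    '#include "amber99sb-ildn.ff/tip3p.itp"\n',
    '[ system ]\n',
]
edited = list(insert_after_first(source, '#include', '; inserted text\n'))
```

Because the helper is a generator, it can be fed an open file handle and its output passed directly to `writelines` on the destination file.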

In [None]:
# Visually inspect the file.
with open(topol.result(), 'r') as fh:
    for line in fh:
        print(line.rstrip())

In [None]:
# Note the line at which to split.
sentry = '#include "amber99sb-ildn.ff/forcefield.itp"'

# Define the text to insert, per alanine-msm-tutorial.
append = """
[ constrainttypes ]
; constraints for capped termini
MCH3 C 2 0.166426
MCH3 C 2 0.166426
N MCH3 2 0.166426
N MCH3 2 0.166426
"""

In [None]:
cmd_dir = str(clean_dir(os.path.join(wd, 'em_pp')))

In [None]:
# Re-write the file, inserting the additional lines.
infile = topol.result()
topol = os.path.join(cmd_dir, 'edited_topol.top')
with open(topol, 'w') as outfh:
    with open(infile, 'r') as infh:
        for line in infh:
            outfh.write(line)
            if line.startswith(sentry):
                outfh.write(append)

In [None]:
# Check output.
with open(topol, 'r') as fh:
    for line in fh:
        print(line.rstrip())

We're going to wrap the following command line from the msm tutorial, with a few edits.

    !gmx grompp -f em -p run${i}/topol.top -c run${i}/solvated.gro -o run${i}/em.tpr

Note that GROMACS allows you to give a file argument that is not the actual filename: `-f em` above refers to `em.mdp`. To avoid confusion, I will try to be explicit.

Also note that we rewrote the topology file to a new filename.

In [None]:
grompp = gmx.commandline_operation(
    'gmx',
    ('grompp',),
    input_files={
        '-f': os.path.join(input_dir, 'em.mdp'),
        '-p': topol,
        '-c': solvated
    },
    output_files={
        '-o': os.path.join(cmd_dir, 'em.tpr'),
    }
)

In [None]:
if grompp.output.returncode.result() != 0:
    print(grompp.output.erroroutput.result())

In [None]:
run_input = grompp.output.file['-o']

### optional
assert Path(run_input.result()).exists()

In [None]:
tpr = gmx.read_tpr(run_input)

In [None]:
tpr.output.parameters.result()

The corresponding command line from the msm tutorial:

    !cd run${i} ; gmx mdrun -s em.tpr

In [None]:
em = gmx.mdrun(tpr)

In [None]:
em.output.trajectory.result()

In [None]:
em_dir = em.output._work_dir.result()
os.listdir(em_dir)

In [None]:
# Sorry. gmxapi.mdrun.output.conformation is not implemented.
conformation = os.path.join(em_dir, 'confout.gro')

# Production phase
Generate new simulation input using the final frame of the energy minimization.
Run a stochastic dynamics integrator on the new run input.

In [None]:
cmd_dir = str(clean_dir(os.path.join(wd, 'md_pp')))

The corresponding command line from the msm tutorial:

    !gmx grompp -f run -p run${i}/topol.top -c run${i}/confout.gro -o run${i}/run.tpr

In [None]:
grompp = gmx.commandline_operation(
    'gmx',
    ('grompp',),
    input_files={
        '-f': os.path.join(input_dir, 'run.mdp'),
        '-p': topol,
        '-c': conformation
    },
    output_files={
        '-o': os.path.join(cmd_dir, 'run.tpr'),
    }
)

### optional
if grompp.output.returncode.result() != 0:
    print(grompp.output.erroroutput.result())
os.listdir(cmd_dir)

Note that this run input file writes frequent output that we don't need for this tutorial. Let's modify the input to write output less often by increasing the output interval.
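
The effect of the interval on the amount of output can be estimated with simple arithmetic. The sketch below assumes that a frame is written at step 0 and at every step that is a multiple of the interval, and that an interval of 0 disables output. `frames_written` is a hypothetical helper, and the values are the `nsteps` and `nstxout-compressed` settings used later in this notebook.

```python
def frames_written(nsteps: int, interval: int) -> int:
    """Estimate the number of frames written for a run of *nsteps* steps."""
    if interval <= 0:
        # Assumption: a zero interval means no output of this kind.
        return 0
    # Step 0 plus one frame per full interval.
    return nsteps // interval + 1


frames = frames_written(nsteps=3000, interval=1000)
frames_if_frequent = frames_written(nsteps=3000, interval=100)
```

Under these assumptions the run writes 4 compressed frames, compared with 31 if the interval were 100 steps.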

In [None]:
sim_input = gmx.read_tpr(grompp.output.file['-o'])

In [None]:
sim_input.output.parameters.result()

In [None]:
modified_input = gmx.modify_input(sim_input,
                                  parameters={
                                      'nstxout-compressed': 1000,
                                      'nsteps': 3000
                                  }
                                 )

In [None]:
md = gmx.mdrun(modified_input)

## Force resolution of data dependencies

In [None]:
md.output.parameters.result()

In [None]:
print(md.output._work_dir.result())

In [None]:
os.listdir(md.output._work_dir.result())

# How do we handle multiple pipelines?

In [None]:
md = gmx.mdrun([modified_input, modified_input])

In [None]:
md.output.ensemble_width

In [None]:
md.output._work_dir.description

In [None]:
# TODO: Launch MPI ranks from within the notebook.
md.output._work_dir.result()