# Executing Simulations

## Overview

### Questions

* How can I execute a multistep workflow on many simulations?

### Objectives

* Introduce **workflows**.
* Demonstrate how to use **row** to define multiple **actions** and the dependencies between them.
* Execute the **workflow** to randomize and compress all **state points** in the **data space**.

In [1]:
import os

# Progress bars do not format well in notebook output.
os.environ["ROW_NO_PROGRESS"] = "true"
# We do not want to encourage users to use --yes, but the tutorials are not interactive.
os.environ["ROW_YES"] = "true"

## Actions

The [Introducing HOOMD-blue](../00-Introducing-HOOMD-blue/00-index.ipynb) tutorial describes distinct **actions** that must be performed in sequence on each **state point**.
These are *initialization*, *randomization*, *compression*, *equilibration*, and *analysis*.
The previous section in this tutorial *initialized* every **state point** in the **data space**.
This section will *randomize* and *compress* them and the next section will *equilibrate* them.
*Analysis* can also be implemented as an **action**, but this is left as an exercise for the reader.

This tutorial defines each **action** as a Python _function_ that takes a list of **signac jobs** as an argument, and it breaks the project up into several smaller modules.
The first module defines the `create_simulation` function, which creates a `Simulation` object given the parameters in the **signac job**.

In [2]:
%pycat create_simulation.py

import hoomd


def create_simulation(job, communicator):
    """Create a Simulation object based on the signac job."""
    cpu = hoomd.device.CPU(communicator=communicator)

    # Set the simulation seed from the state point.
    simulation = hoomd.Simulation(device=cpu, seed=job.statepoint.seed)
    mc = hoomd.hpmc.integrate.ConvexPolyhedron()
    mc.shape["octahedron"] = dict(
        vertices=[
            (-0.5, 0, 0),
            (0.5, 0, 0),
            (0, -0.5, 0),
            (0, 0.5, 0),
            (0, 0, -0.5),
            (0, 0, 0.5),
        ]
    )
    simulation.operations.integrator = mc

    return simulation


The *randomize* action implements the code previously shown in the [Introducing HOOMD-blue](../00-Introducing-HOOMD-blue/00-index.ipynb) tutorial with slight adjustments to read the `lattice.gsd` file from each **state point** and write outputs to the **job directory**.

In [3]:
%pycat randomize.py

import hoomd
from create_simulation import create_simulation


def randomize(*jobs):
    """Randomize the particle positions and orientations."""
    for job in jobs:
        communicator = hoomd.communicator.Communicator()
        simulation = create_simulation(job, communicator)

        # Read `lattice.gsd` from the signac job's directory.
        simulation.create_state_from_gsd(filename=job.fn("lattice.gsd"))

        # Apply trial moves to randomize the particle positions and orientations.
        simulation.run(10e3)

        # Write `random.gsd` to the signac job's directory.
        hoomd.write.GSD.write(
            state=simulation.state, mode="xb", filename=job.fn("random.gsd")
        )


`compress.py` similarly implements the *compress* action where the final box volume is a function of the job's **state point**:

In [4]:
%pycat compress.py

import math

import hoomd
from create_simulation import create_simulation


def compress(*jobs):
    """Compress the simulation to the target density."""
    for job in jobs:
        communicator = hoomd.communicator.Communicator()
        simulation = create_simulation(job, communicator)

        # Read `random.gsd` from the signac job directory.
        simulation.create_state_from_gsd(filename=job.fn("random.gsd"))

        a = math.sqrt(2) / 2
        V_particle = 1 / 3 * math.sqrt(2) * a**3

        initial_box = simulation.state.box
        final_box = hoomd.Box.from_box(initial_box)

        # Set the final box volume to the volume fraction for this signac job.
        final_box.volume = (
            simulation.st
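
The listing above is cut off in this copy, so the exact expression for the final box volume is not shown. As a hedged illustration of the arithmetic involved, the sketch below evaluates `a` and `V_particle` as defined in `compress.py` and then applies the usual definition of volume fraction, `volume_fraction = n_particles * V_particle / box_volume`. That relation is an assumption here, and `n_particles` and `volume_fraction` are hypothetical values, not read from any **signac job**.

```python
import math

# Edge length and volume of the octahedron, as defined in compress.py.
a = math.sqrt(2) / 2
V_particle = 1 / 3 * math.sqrt(2) * a**3

# Hypothetical values for illustration only (not taken from a signac job).
n_particles = 500
volume_fraction = 0.5

# Assumed relation: volume_fraction = n_particles * V_particle / box_volume.
target_box_volume = n_particles * V_particle / volume_fraction

print(f"V_particle = {V_particle:.6f}")
print(f"target box volume = {target_box_volume:.3f}")
```

With these constants the particle volume works out to 1/6, so a lower volume fraction corresponds to a proportionally larger target box.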

`project.py` imports these functions and implements a command line argument parser (*equilibrate* will be described in the next section).

In [5]:
%pycat project.py

import argparse

import signac
from compress import compress
from equilibrate import equilibrate
from randomize import randomize

if __name__ == "__main__":
    # Parse the command line arguments: python project.py --action <ACTION> [DIRECTORIES]
    parser = argparse.ArgumentParser()
    parser.add_argument("--action", required=True)
    parser.add_argument("directories", nargs="+")
    args = parser.parse_args()

    # Open the signac jobs
    project = signac.get_project()
    jobs = [project.open_job(id=directory) for directory in args.directories]

    # Call the action
    if args.action == "compress":

You can use `project.py` to execute any of these **actions** on the command line.
For example:
```shell
$ python project.py --action randomize 59363805e6f46a715bc154b38dffc4e4
```
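
The `project.py` listing in this copy stops after its first `if`, so the rest of the dispatch is not shown. One common way to map an action name to a function is a dictionary lookup. The minimal sketch below shows that pattern with stand-in functions that only record the directories they receive. The stand-in bodies, the `ACTIONS` dictionary, and the `main` helper are assumptions for illustration, not the tutorial's actual code, and the sketch passes directory names through instead of opening **signac jobs**.

```python
import argparse

calls = []  # Records (action, directory) pairs so the sketch has visible output.


def randomize(*jobs):
    """Stand-in for the real randomize action."""
    calls.extend(("randomize", job) for job in jobs)


def compress(*jobs):
    """Stand-in for the real compress action."""
    calls.extend(("compress", job) for job in jobs)


# Hypothetical lookup table from action name to function.
ACTIONS = {"randomize": randomize, "compress": compress}


def main(argv):
    """Parse `--action <ACTION> [DIRECTORIES]` and call the matching function."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--action", required=True, choices=sorted(ACTIONS))
    parser.add_argument("directories", nargs="+")
    args = parser.parse_args(argv)

    # The real script opens a signac job for each directory; here the
    # directory names are passed through unchanged.
    ACTIONS[args.action](*args.directories)


main(["--action", "randomize", "59363805e6f46a715bc154b38dffc4e4"])
print(calls)
```

Restricting `--action` with `choices` makes the parser reject an unknown action name before any simulation code runs.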

As your project grows, you will soon find it difficult to keep track of which actions you have executed on which directories and what needs to be done next.

## Define the workflow

**Row** is a command line tool that helps you manage your workflow.
This tutorial demonstrates how to use **row** with HOOMD-blue.
See the [row user documentation](https://row.readthedocs.io/) for full details, including an introductory tutorial.

You describe the workflow actions in the file `workflow.toml`.
Normally, you would write this file in a text editor.
This tutorial is a Jupyter notebook, so it will instead use Python code to write it (`workflow.toml` will change from one section to the next).

In [6]:
with open("workflow.toml", "w") as workflow:
    workflow.write("""
[default.action]
command = "python project.py --action $ACTION_NAME {directories}"

[[action]]
name = "randomize"
products = ["random.gsd"]
resources.walltime.per_directory = "00:05:00"

[[action]]
name = "compress"
previous_actions = ["randomize"]
products = ["compressed.gsd"]
resources.walltime.per_directory = "00:10:00"
""")

`workflow.toml` describes an array of tables in `action` where each element describes a single action.
The keys in `default.action` apply to all actions (the default can be overridden).
In this example, the default action's command executes the `project.py` file shown above.
In the command's arguments, **row** will provide concrete values for `$ACTION_NAME` and `{directories}`.
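
To make the substitution concrete, the following sketch mimics its effect with plain Python string handling. It is only an illustration of what the final command line could look like, not a description of how **row** performs the substitution internally. The variable names are hypothetical, and the directory names are the three shown elsewhere in this section.

```python
from string import Template

command = "python project.py --action $ACTION_NAME {directories}"

# Hypothetical inputs: one action name and the three directories in the workspace.
action_name = "randomize"
directories = [
    "59363805e6f46a715bc154b38dffc4e4",
    "972b10bd6b308f65f0bc3a06db58cf9d",
    "c1a59a95a0e8b4526b28cf12aa0a689e",
]

# Fill in `{directories}` first, then expand `$ACTION_NAME`.
expanded = command.replace("{directories}", " ".join(directories))
expanded = Template(expanded).substitute(ACTION_NAME=action_name)
print(expanded)
```

The result has the same shape as the manual `python project.py --action randomize ...` invocation shown earlier, with all directories passed in one command.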

The two tables in the `[[action]]` array describe the two **actions** in our workflow.
Each is given a name (in `name`), a list of files it produces (in `products`), and an estimate of the maximum walltime needed to execute the action on each directory (in `resources.walltime.per_directory`).
The _compress_ action names _randomize_ in `previous_actions`.
This tells **row** that it should not execute *compress* on a given directory until *randomize* has completed on that directory.

## Run the Workflow

Now that you have defined the **workflow**, you can check its status.
Normally, you would execute `row show status` in your shell.
Because this tutorial is a Jupyter notebook, it must prefix shell commands with `!`.

In [7]:
! row show status

Action    Completed Submitted Eligible Waiting Remaining cost
randomize         0         0        3       0    0 CPU-hours
compress          0         0        0       3    0 CPU-hours


There are 3 directories *eligible* to run the *randomize* action.
The 3 directories for *compress* are *waiting* for *randomize* to complete first.
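
The status columns can be understood with a simplified model: treat an action as complete on a directory once all of its products exist, and as eligible once every previous action is complete there. The sketch below implements only that model on temporary directories. It is an assumption-laden illustration, not **row**'s implementation, and it ignores the *submitted* state and anything else **row** tracks. The directory names `a`, `b`, and `c` are hypothetical.

```python
import tempfile
from pathlib import Path

# Simplified model of the workflow: products and prerequisites per action.
ACTIONS = {
    "randomize": {"products": ["random.gsd"], "previous_actions": []},
    "compress": {"products": ["compressed.gsd"], "previous_actions": ["randomize"]},
}


def is_complete(directory, action):
    """An action counts as complete when all of its products exist."""
    return all((directory / p).exists() for p in ACTIONS[action]["products"])


def classify(directory, action):
    """Return 'completed', 'eligible', or 'waiting' for one directory."""
    if is_complete(directory, action):
        return "completed"
    previous = ACTIONS[action]["previous_actions"]
    if all(is_complete(directory, p) for p in previous):
        return "eligible"
    return "waiting"


with tempfile.TemporaryDirectory() as tmp:
    # Three hypothetical directories that contain only lattice.gsd.
    directories = [Path(tmp) / name for name in ("a", "b", "c")]
    for directory in directories:
        directory.mkdir()
        (directory / "lattice.gsd").touch()

    before = {name: [classify(d, name) for d in directories] for name in ACTIONS}

    # Pretend randomize ran by creating its product in every directory.
    for directory in directories:
        (directory / "random.gsd").touch()
    after = {name: [classify(d, name) for d in directories] for name in ACTIONS}

print(before)
print(after)
```

In this model, creating `random.gsd` in every directory moves *randomize* to completed and *compress* from waiting to eligible.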

`row submit` with no other arguments will submit all eligible jobs.
When you are on a workstation (or you set `--cluster=none`), `row submit` will execute the actions immediately.
When you are on a cluster (see the last section in this tutorial), `row submit` will submit cluster jobs to the queue.
This notebook was executed on a workstation.

In [8]:
! row submit

Submitting 1 job that may cost up to 0 CPU-hours.
[1/1] Submitting action 'randomize' on directory 59363805e6f46a715bc154b38dffc4e4 and 2 more (0ms).


Every **directory** in the **data space** now has a `random.gsd` file produced by *randomize*:

In [9]:
! ls workspace/*

workspace/59363805e6f46a715bc154b38dffc4e4:
lattice.gsd            random.gsd             signac_statepoint.json

workspace/972b10bd6b308f65f0bc3a06db58cf9d:
lattice.gsd            random.gsd             signac_statepoint.json

workspace/c1a59a95a0e8b4526b28cf12aa0a689e:
lattice.gsd            random.gsd             signac_statepoint.json


The status now shows that *randomize* is complete and *compress* is eligible:

In [10]:
! row show status

Action    Completed Submitted Eligible Waiting Remaining cost
randomize         3         0        0       0
compress          0         0        3       0    0 CPU-hours


Execute it:

In [11]:
! row submit

Submitting 1 job that may cost up to 0 CPU-hours.
[1/1] Submitting action 'compress' on directory 59363805e6f46a715bc154b38dffc4e4 and 2 more (0ms).


Every **directory** in the **data space** now has a `compressed.gsd` file produced by *compress*:

In [12]:
! ls workspace/*

workspace/59363805e6f46a715bc154b38dffc4e4:
compressed.gsd           random.gsd               signac_statepoint.json
lattice.gsd              signac_job_document.json

workspace/972b10bd6b308f65f0bc3a06db58cf9d:
compressed.gsd           random.gsd               signac_statepoint.json
lattice.gsd              signac_job_document.json

workspace/c1a59a95a0e8b4526b28cf12aa0a689e:
compressed.gsd           random.gsd               signac_statepoint.json
lattice.gsd              signac_job_document.json


## Summary

In this section of the tutorial, you defined the **actions** to *randomize* and then *compress* the initial configuration using **row**.
Then you executed these **actions** on all **state points** in the **data space**.
The **directory** for each simulation now contains `compressed.gsd` and is ready for equilibration at the target volume fraction.

The next section in this tutorial teaches you how to write an **action** that can continue itself and complete over several submissions.

This tutorial only teaches the basics of **row**.
Read the [row documentation](https://row.readthedocs.io/) to learn more.