# Workflow Tutorial

This text is designed to help someone set up a `fireworks` server and start creating workflows. In principle, it's also possible to just follow the tutorials on the [Fireworks website](https://materialsproject.github.io/fireworks/), but this will be a more succinct version, which stays focused on setting up the workflows using Python, instead of `.yaml` files. There is also a little more info about how to set up the server.

## Setting up the server

The standard approach for using the workflows (or, at least, _my_ standard approach), is to set up a mongoDB server using mongoDB Atlas and submitting workflows to this server using the Fireworks `lpad` command. So, the first step is to set up such a server, which we can later use as a home for our workflows.

First: go the the [mongoDB Atlas website](https://www.mongodb.com/cloud/atlas) and set up an account. Next, create a new project using the context menu on the top left. Give it a suitable name (e.g. Workflow Tutorial) and press "Create Project". Next, it's time to create a cluster, by pressing the "Build a Cluster" button in the center of the screen. Now, let's go over the various click-down menu's available on the cluster creation screen:

* __Global Cluster Configuration__: Just ignore this one. This only matters in case you want a very high tier (i.e. expensive) cluster.
* __Cloud Provider & Region__: Here you can choose the location of the server, as well as the service that provides it. Choose whatever region and provider you prefer. I will choose Google Cloud Platform, because they have free tiers available in Belgium. Make sure the region you choose also has free tiers available. 
* __Cluster Tier__: For the cluster tier, choose the M0. This is the only free one.
* __Additional Settings__: You can safely ignore this tab, as you can't make any choices here, unless you're willing to pay.
* __Cluster Name__: Pretty self-explanatory.

Setting up the cluster takes a little time. This is the perfect time for some [covfefe](https://www.amazon.com/Covfefe-Mug-11oz-Presidential-Alternative/dp/B072Q1B52J).

Once the cluster is set up, you need to set up the Admin user, and whitelist all ip's to be able to connect to the cluster. Go to the "Security" tab next to the "Overview" tab, and then use the "Add New User" button to set up a new user. Type a username and password, and make sure to select "Atlas Admin" privileges. Next, go the "IP Whitelist" subtab and click on "Add IP Adress". Click on "Allow Access From Everywhere" and confirm to simply whitelist all IP addresses. On an actual cluster you use for production runs, I would just whitelist the IP adresses from whatever machine you run the workflows on.

## Setting up the Python environment

The next part of the tutorial involves setting up the python environment that can access the cluster and hence submit the workflows. This part of the tutorial will be focused on doing this on the computing resources of Lawrence Livermore National Laboratory (LLNL). I hope to add a more general tutorial later.

Connect to any LC machine. I'll use quartz for this tutorial, but in principle it should not matter. Once you're connected, it's best to install miniconda to set up the environment. Miniconda will also automatically install Python v3.7, which will be much faster to use than the python that comes with the modules on the cluster. First, create a suitable directory for installing miniconda in. I've chosen `$HOME/python/miniconda3`. Next, download the miniconda installer and run it:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

Once you accept the licence, the installation of miniconda will begin by installing Python v3.7 and the conda dependencies. This will take a while, so get more covfefe. Executing python commands on LC seems to be pretty slow in general, so you'll have to be a little patient in setting up the Python environment (PS: If you have a solution for this, please tell me. Working with python on LC is frustratingly slow). The conda installer will have added a line that redefines the `PATH` in your `~/.bashrc`. This is currently not the prefered method of activating conda, however. As LC uses `~/.bash_profile` for setting up the environment, you can in principle safely delete the newly created `.bashrc` file. (Unless you somehow are using this file to set up your linux environment). Setting up the conda environment will most likely not work after the install, as the `conda` command will not be known. This can be fixed with the following commands:

```
echo ". $HOME/miniconda3/etc/profile.d/conda.sh" >> ~/.bash_profile
source ~/.bash_profile
```

Now you should be able to activate the 'base' environment by using the `conda activate` command. However, it's best to set up specific environments for each purpose. Let's set up a python environment for this tutorial. This can be done easily with conda:

```
conda create -n tutorial
```

You can activate the environment by now using the command `conda activate tutorial`. Once you've activated the environment, I usually install pip for the installation of other python packages:

```
conda install pip
```

This took ages for me. No covfefe-break should ever take this long. Finally, we can install the `fireworks` package to finish setting up the environment for the tutorial:

```
pip install fireworks
```


## Setting up the launchpad

In order to interact with the mongoDB server, we need to set up the launchpad file. This turned out to be a little trickier when using mongoDB Atlas when compared to mlab. However, as mlab is merging with Atlas, I wanted to make it work. I finally found a solution on the [fireworks google group](https://groups.google.com/forum/#!topic/fireworkflows/0nBxQLap0Qk). Apparently setting up the MongoClient for Atlas requires an extra input argument `authsource` to be set to `admin`. I usually have a `launchpad` directory in my home directory with the appropriate subdirectories from which I manage the calculations on the server. This is done by setting up a `my_launchpad.yaml` file in this directory, and then using `lpad` to interact with the server. Below you can see the `my_launchpad.yaml` file I created to interact with the mongoDB Atlas database:

```
name: tutor
host: "mongodb+srv://tutorial-urpwq.gcp.mongodb.net" 
port: 27017
username: mbercx
password: tutorial
ssl: true
authsource: admin
```

You can adjust as required. Here's some more information and tips on the various input:

* The database name can be chosen freely; the database will be created when it is reset using `lpad reset`. Once the database has been reset, you will also find it under the "Collections" tab in your tutorial cluster on mongoDB Atlas.
* For me, the `mongodb+srv://` prefix in the host is required for the connection to work. This also most likely will require the installation of the `dnspython` python package. To find the proper host, click on connect in the cluster overview, then on Connect with the Mongo shell. Under "(2) Connect via the Mongo Shell", you can then click on "Short RSV sonnection string", and copy the hostname from the mongo shell command.
* The default port for mongoDB Atlas seems to be 27017. I have so far found no reason to change this.
* The username and password should correspond with those you have set up in the first part of the tutorial.
* Don't forget the ssl and authsource tags!

Once you've set it up, you can use `lpad reset` to see if you can successfully reset the server.

## Creating your first workflow

Now it's finally time to put some python in this notebook! Of course, if you want to use the commands in this notebook, you need to set up the conda environment properly on your local machine. This can be done in a similar fashion as for the cluster, with the addition of installing the `jupyterlab` package.

In order to set up workflows that I can submit to the server, I set up a LAUNCHPAD global variable in the workflows module of my package. (In the newer version of a package I use here, I set up a configuration file for the launchpad, so the launchpad details are not just accessible via github). First, let's load the Launchpad class and set up an instance using the details of the server, which are directly copied from the `my_launchpad.yaml` file:

In [19]:
from fireworks import LaunchPad

LAUNCHPAD = LaunchPad(host="mongodb+srv://tutorial-urpwq.gcp.mongodb.net", port=21017, name="tutor",
                      username="mbercx", password="tutorial", ssl=True, authsource="admin")

If you're running this notebook, you can adjust the details according to the mongoDB Atlas database you have set up previously. Now, you can access the server information with the methods of the `LAUNCHPAD` object:

In [20]:
LAUNCHPAD.get_fw_ids()

[]

This should be empty. However, we can set up some simple workflows to add to the server. Every workflow that is submitted to the server consists of FireWorks, which in turn consist of FireTasks. The best way to set up whatever workflow you want to perform depends on several elements. You can find more tips on this topic on [the fireworks website](https://materialsproject.github.io/fireworks/design_tips.html). Now I will simply show a very basic example of setting up a workflow:

In [21]:
from fireworks import ScriptTask, Firework, Workflow

hello_task = ScriptTask.from_str("echo 'Hello, user!'")
intro_task = ScriptTask.from_str("echo 'My name is Billy!'")

fw1 = Firework(tasks=[hello_task, intro_task], name="Introduction")

glad_task = ScriptTask.from_str("echo 'I am glad to tell you that it worked!'")

fw2 = Firework(tasks=[glad_task], name="Success Message")

workflow = Workflow(fireworks=[fw1, fw2], name="My First Workflow",
                   links_dict={"fw1":[fw2]})


A few remarks:

* The `FireTasks` are passed as a list to the `FireWork`, the `FireWorks` are passed as a list to the `WorkFlow`.
* Note the `links_dict` argument for the `WorkFlow`. This determines the 'structure' of the workflow, i.e. which commands are executed in which succession. If the `links_dict` argument was not given, both `FireWorks` would be executed simultaneously. 

Next, we can sent it to the server with the `add_wf` method of the Launchpad instance:

In [22]:
LAUNCHPAD.add_wf(workflow)

2018-12-23 16:02:18,648 INFO Added a workflow. id_map: {-6: 1, -5: 2}


{-6: 1, -5: 2}

If we now ask the launchpad for the list of fireworks:

In [23]:
LAUNCHPAD.get_fw_ids()

[1, 2]

You should see a list of 2 firework ID's. If you're running these commands in the notebook, and have adjusted the `LaunchPad` initialization arguments correctly, you can also go back to the directory on the cluster which contains the `my_launchpad.yaml` file and try executing the `lpad get_fws` command again. You should obtain the something similar to the following:

```
(tutorial) [bercx1@quartz380:tutorial]$ lpad get_fws
[
    {
        "fw_id": 1,
        "created_on": "2018-12-23T14:47:24.600753",
        "updated_on": "2018-12-23T14:47:25.762610",
        "state": "READY",
        "name": "Success Message"
    },
    {
        "fw_id": 2,
        "created_on": "2018-12-23T14:47:24.600697",
        "updated_on": "2018-12-23T14:47:25.762614",
        "state": "READY",
        "name": "Introduction"
    }
]
```

While still in the directory which contains the `my_launchpad.yaml` file, you can launch one of the `WorkFlows` on the launchpad using the `rlaunch` command. Let's fire a single rocket (i.e. launch a single firework) for now:

```
(tutorial) [bercx1@quartz380:tutorial]$ rlaunch singleshot
2018-12-23 07:02:48,035 INFO Hostname/IP lookup (this will take a few seconds)
2018-12-23 07:02:48,037 INFO Launching Rocket
2018-12-23 07:02:55,482 INFO RUNNING fw_id: 2 in directory: /g/g91/bercx1/launchpad/tutorial
2018-12-23 07:02:56,961 INFO Task started: ScriptTask.
Hello, user!
2018-12-23 07:02:57,025 INFO Task completed: ScriptTask 
2018-12-23 07:02:57,309 INFO Task started: ScriptTask.
My name is Billy!
2018-12-23 07:02:57,313 INFO Task completed: ScriptTask 
2018-12-23 07:02:59,299 INFO Rocket finished
```

You can see all the information the mongoDB server communicates as it looks for a suitable `FireWork` to run and executes the `FireTasks` within. If you now check the fireworks on the server again:

```
(tutorial) [bercx1@quartz380:tutorial]$ lpad get_fws
[
    {
        "fw_id": 1,
        "created_on": "2018-12-23T15:02:16.718093",
        "updated_on": "2018-12-23T15:02:18.514967",
        "state": "READY",
        "name": "Success Message"
    },
    {
        "fw_id": 2,
        "created_on": "2018-12-23T15:02:16.718039",
        "updated_on": "2018-12-23T15:02:58.733314",
        "state": "COMPLETED",
        "name": "Introduction"
    }
]
```

You can see that one of them has been marked as `COMPLETED`. You can run the `rlaunch singleshot` command again, which will run the `ScriptTask` in the second `FireWork`. Attempting to launch another rocket will get you the message: `No FireWorks are ready to run and match query!`.

## Running FireWorks via the queueing system

In the example above, the `rlaunch` command was simply run from the prompt after logging onto LC, and hence was executed by the login nodes. In practise, we want to use the queuing system to submit jobs to the computational nodes of whatever cluster we're working on. On LC, here is the script I've used so far:

```
(tutorial) [bercx1@quartz1532:tutorial]$ more 1node.sh 
#!/bin/bash
#MSUB -S /bin/bash
#MSUB -N cage_workflows
#MSUB -j eo
#MSUB -o FW_logs.out
#MSUB -l partition=quartz
#MSUB -l nodes=1
#MSUB -q pbatch
#MSUB -l walltime=24:00:00
#MSUB -A pls2
#MSUB -V   #pass-through environment variables
##MSUB -m e  #send email when job completes

conda activate tutorial

rlaunch -w /g/g91/bercx1/launchpad/tutorial/my_fworker.yaml -l /g/g91/bercx1/launchpad/tutorial/my_launchpad.yaml rapidfire --nlaunches infinite --sleep 10 --timeout 72000

exit

```

This script will run for 24 hrs on a single node, continuously asking for work every 10 seconds until it reaches the timeout of 20 hrs. Let's discuss the various aspects of the commands in the script:

* `conda activate tutorial` activates the conda environment. This is needed for the cluster to recognize the `rlaunch` command.
* `rlaunch` requests for FireWorks from the mongoDB Atlas server, just like we did before, but now within the queue submission script.
* The `-w` option allows us to specify the `FireWorker` file. This one is pretty much empty atm:<br/><br/>
```
(tutorial) [bercx1@quartz1532:tutorial]$ more my_fworker.yaml 
name: quartz
category: ''
query: '{}'
```
<br/>
So far I've only used the `category` tag to specify the amount of nodes for `FireWorks`. I'll get into that later. For now, you can just use however many nodes you need for the most computationally expensive step in your workflow.
* The `-l` option allows you to specify the `my_launchpad.yaml` file that should be used. Just pass it the full absolute path to the launchpad file you set up earlier.
* `rapidfire` is the subcommand that lets the `rlaunch` command to continuously keep on 'firing rockets', i.e. asking for work from the mongoDB Atlas server.
* `--nlaunches` is the option that indicates the amount of launches that should be performed consecutively. By setting it to `infinite`, the `rlaunch` command will keep on requesting work from the mongoDB Atlas server.
* `--sleep 10` just lets the launcher wait ten seconds in between each communication with the mongoDB Atlas server when requesting work.
* `--timeout 72000` means that the `rlaunch` command will stop requesting work after 20 hrs (72000 seconds). This is to try and avoid calculations running into the walltime.

Submitting this script to the cluster via `msub 1node.sh` should work fine, and once it starts running it should set up the conda environment and start running whatever FireWorks are on the tutorial server.

## Running something more serious

So far we've only used the `ScriptTask` FireTask to set up the various tasks in the workflow. Usually, the type of task I use most is the `PyTask`, as this allows me to run any Python script inside the workflow. The example code below is the one I use to optimize the geometry of the molecule before adding the cation and calculating the landscape.

```Python
import os

from cage.core import Cage
from fireworks import LaunchPad

LAUNCHPAD = LaunchPad(host="mongodb+srv://tutorial-urpwq.gcp.mongodb.net", port=21017, name="tutor",
                      username="mbercx", password="tutorial", ssl=True, authsource="admin")

RUN_NWCHEM_COMMAND = "srun -N1 -n36 /g/g91/bercx1/nwchem/nwchem-6.6/bin/LINUX64/nwchem"

def optimize_workflow(filename, charge=0):
    """
    Workflow for the optimization of a molecule

    Args:
        filename (str): Path to the structure file of the cage molecule.
        Json formats with charge assigned are preferred.

        charge (int): Charge of the molecule.

    Returns:

    """
    # Load the cage molecule from the filename provided
    molecule = Cage.from_file(filename)

    # Create the PyTask that sets up the calculation
    setup_task = PyTask(
        func="cage.cli.commands.setup.optimize",
        kwargs={"filename": filename,
                "charge": charge}
    )

    optimize_dir = os.path.join(os.getcwd(), "optimize")

    optimize_command = RUN_NWCHEM_COMMAND + " " \
                       + os.path.join(optimize_dir, "input") + " > " \
                       + os.path.join(optimize_dir, "result.out")

    run_nwchem = ScriptTask.from_str(optimize_command)

    fw = Firework(tasks=[setup_task, run_nwchem],
                  name="Run Nwchem")

    LAUNCHPAD.add_wf(
        Workflow(fireworks=[fw],
                 name="Optimize " + molecule.composition.reduced_formula)
    )

```

Note that the `FireWork` consists of two `FireTasks`:

1. A `PyTask` that sets up the NwChem calculation in the `./optimize` directory. This python method is defined in the `cage.cli.commands.setup` module.
2. A `ScriptTask` that runs the NwChem calculation. I compiled the NwChem binary (probably in a horribly inefficient manner, but hey) my first week at LLNL in the `/g/g91/bercx1/nwchem/nwchem-6.6/bin/LINUX64/` directory. You can find the command that is executed in the `ScriptTask` partially in the header of the code.

When the `FireTasks` are set up, they are combined into a `FireWork`, simply called `fw`. This is then put into a single step `WorkFlow` and added to the LAUNCHPAD. When the `rlaunch` command in the submission script is called, `fireworks` 