# 1 - Beginning Workflows

In this lesson, we'll cover the basics of using atomate to run calculations. This is a hands-on lesson: we'll run a full workflow and then break it down into its components to understand how the various moving parts let us scale from one calculation to tens of thousands.

# Building a workflow

To begin, we'll grab a structure from the Materials Project using pymatgen and the MPRester interface we learned about in a previous course.

In [37]:
from pymatgen.ext.matproj import MPRester

mpr = MPRester()

struc = mpr.get_structure_by_material_id("mp-27")
print(struc)

Full Formula (Si1)
Reduced Formula: Si
abc   :   2.736139   2.736139   2.736139
angles:  60.000000  60.000000  60.000000
Sites (1)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Si      0    0    0


Now, let's construct a workflow using atomate to optimize this structure with DFT.

In [38]:
from atomate.vasp.workflows.presets.core import wf_structure_optimization

In [39]:
wf = wf_structure_optimization(struc, {"DB_FILE": None})
print(wf)

Workflow object: (fw_ids: dict_keys([-3]) , name: Si)


Let's get some more information on the workflow:

In [40]:
wf.as_dict()

{'created_on': datetime.datetime(2018, 8, 2, 20, 16, 46, 124454),
 'fws': [{'created_on': '2018-08-02T20:16:46.124309',
   'fw_id': -3,
   'name': 'Si-structure optimization',
   'spec': {'_tasks': [{'_fw_name': 'FileWriteTask',
      'files_to_write': [{'contents': '',
        'filename': 'FW--Si-structure_optimization'}]},
     {'_fw_name': '{{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}',
      'structure': {'@class': 'Structure',
       '@module': 'pymatgen.core.structure',
       'charge': None,
       'lattice': {'a': 2.7361386705337156,
        'alpha': 60.00000010462969,
        'b': 2.7361386686542817,
        'beta': 60.000000127351925,
        'c': 2.73613867,
        'gamma': 60.000000062179055,
        'matrix': [[2.3695656, 0.0, 1.36806933],
         [0.7898552, 2.23404787, 1.36806933],
         [0.0, 0.0, 2.73613867]],
        'volume': 14.484360157964268},
       'sites': [{'abc': [0.0, 0.0, 0.0],
         'label': 'Si',
         'species': [{'element': 'Si'

# Running with Fake VASP to simulate a DFT calculation

Because of licensing restrictions, and because a real calculation would not run quickly enough on the Jupyter server, we're going to simulate running VASP with a helper function. You will learn later about powerups, which let you modify a workflow. For this exercise, we'll use a powerup that replaces the normal VASP run step with one that simply copies files we've prepared for you.

In [41]:
from atomate.vasp.powerups import use_fake_vasp

## Let's do some work to get the path to the fake VASP files

In [42]:
import os

current_path = os.path.curdir
relative_path = os.path.join(current_path, "../../mp_workshop/fake_vasp/Si_structure_opt")
absolute_path = os.path.abspath(relative_path)
print(absolute_path)

/Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/mp_workshop/fake_vasp/Si_structure_opt
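
The same path can also be resolved with `pathlib`, which makes it easy to check that the directory exists before passing it to `use_fake_vasp`. This is a minimal sketch using only the standard library; the helper name `resolve_ref_dir` is hypothetical, and the relative path is the one used in the cell above.

```python
from pathlib import Path


def resolve_ref_dir(relative, base=None):
    """Return the absolute form of `relative`, resolved against `base` (default: cwd)."""
    base = Path(base) if base is not None else Path.cwd()
    return (base / relative).resolve()


ref_dir = resolve_ref_dir("../../mp_workshop/fake_vasp/Si_structure_opt")
print(ref_dir)

# A missing directory usually means the notebook was started from a different folder.
if not ref_dir.is_dir():
    print("Warning: reference directory not found; check the relative path.")
```

Checking the directory up front tends to give a clearer error than discovering the problem when the fake VASP step runs.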


In [43]:
wf = use_fake_vasp(wf, ref_dirs={"Si-structure optimization": absolute_path})
wf.as_dict()

{'created_on': datetime.datetime(2018, 8, 2, 20, 16, 46, 124454),
 'fws': [{'created_on': '2018-08-02T20:16:46.124309',
   'fw_id': -3,
   'name': 'Si-structure optimization',
   'spec': {'_tasks': [{'_fw_name': 'FileWriteTask',
      'files_to_write': [{'contents': '',
        'filename': 'FW--Si-structure_optimization'}]},
     {'_fw_name': '{{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}',
      'structure': {'@class': 'Structure',
       '@module': 'pymatgen.core.structure',
       'charge': None,
       'lattice': {'a': 2.7361386705337156,
        'alpha': 60.00000010462969,
        'b': 2.7361386686542817,
        'beta': 60.000000127351925,
        'c': 2.73613867,
        'gamma': 60.000000062179055,
        'matrix': [[2.3695656, 0.0, 1.36806933],
         [0.7898552, 2.23404787, 1.36806933],
         [0.0, 0.0, 2.73613867]],
        'volume': 14.484360157964268},
       'sites': [{'abc': [0.0, 0.0, 0.0],
         'label': 'Si',
         'species': [{'element': 'Si'

## Now we have to get ourselves a LaunchPad so that we can submit this workflow to our database


Atomate uses FireWorks as its workflow engine. FireWorks hides the database behind an object called a LaunchPad, which lets you submit and query workflows from anywhere you have database access. We need a LaunchPad object so we can submit our workflow.

In [44]:
from fireworks.core.launchpad import LaunchPad

lp = LaunchPad()

We can use the LaunchPad to add a workflow to our database:

In [45]:
lp.add_wf(wf)

2018-08-02 13:16:51,885 INFO Added a workflow. id_map: {-3: 3}


{-3: 3}
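
The returned `id_map` records how ids were reassigned: the workflow was built with a placeholder `fw_id` of `-3`, and the database stored it as firework `3`. The toy sketch below mimics that bookkeeping in plain Python. The helper `assign_ids` is hypothetical and for illustration only; it is not the FireWorks implementation.

```python
def assign_ids(placeholder_ids, next_id):
    """Map each negative placeholder id to a fresh positive id, starting at next_id."""
    id_map = {}
    # Visit placeholders in the order -1, -2, -3, ... (an arbitrary choice for this toy).
    for old_id in sorted(placeholder_ids, reverse=True):
        id_map[old_id] = next_id
        next_id += 1
    return id_map


# Two fireworks are already stored, so the next free id is 3.
print(assign_ids([-3], next_id=3))  # {-3: 3}
```

Keeping the map around is useful when you want to look up a firework you just submitted by its new database id.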

# Monitoring Workflows

FireWorks lets you monitor the status of workflows and fireworks from both Python and the command line. Let's start by looking at the status of our workflow. For each bit of Python code, I'll include a cell with the equivalent command-line command, run using the Jupyter notebook's '!' functionality. In practice, we use the command-line tools quite a bit, so they are emphasized in this notebook.

**Command Line Access in Jupyter**: Jupyter lets you run command-line commands by prefacing them with an exclamation mark:

In [46]:
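# Let's get workflows

def get_wflows(query=None):

    # Use None as the default and swap in an empty dict here,
    # which avoids a mutable default argument
    query = query if query else {}

    # Pass the query through so callers can filter workflows
    for wf_id in lp.get_wf_ids(query):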
        for k,v in lp.get_wf_summary_dict(wf_id).items():
            print(k, ": ",v)
        print("\n")

get_wflows()

state :  FIZZLED
name :  Si
created_on :  2018-08-02 18:33:07.796000
updated_on :  2018-08-02 18:47:17.124000
states :  OrderedDict([('Si-structure optimization--1', 'FIZZLED')])
launch_dirs :  OrderedDict([('Si-structure optimization--1', ['/Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/lessons/atomate'])])


state :  COMPLETED
name :  Si
created_on :  2018-08-02 19:33:39.942000
updated_on :  2018-08-02 19:34:10.120000
states :  OrderedDict([('Si-structure optimization--2', 'COMPLETED')])
launch_dirs :  OrderedDict([('Si-structure optimization--2', ['/Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/lessons/atomate'])])


state :  READY
name :  Si
created_on :  2018-08-02 20:16:46.124000
updated_on :  2018-08-02 20:16:46.124000
states :  OrderedDict([('Si-structure optimization--3', 'READY')])
launch_dirs :  OrderedDict([('Si-structure optimization--3', [])])




This is how you get workflow information on the command line:

In [47]:
!lpad get_wflows

[
    {
        "state": "FIZZLED",
        "name": "Si--1",
        "created_on": "2018-08-02T18:33:07.796000",
        "states_list": "F"
    },
    {
        "state": "COMPLETED",
        "name": "Si--2",
        "created_on": "2018-08-02T19:33:39.942000",
        "states_list": "C"
    },
    {
        "state": "READY",
        "name": "Si--3",
        "created_on": "2018-08-02T20:16:46.124000",
        "states_list": "REA"
    }
]
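
Because `lpad` prints JSON, its output is easy to post-process. As a small sketch, the snippet below tallies workflow states from output like the listing above. The JSON text is pasted in as a string here for illustration; in practice it would be captured from the command.

```python
import json
from collections import Counter

# Output of `lpad get_wflows`, copied from the cell above.
lpad_output = """
[
    {"state": "FIZZLED", "name": "Si--1",
     "created_on": "2018-08-02T18:33:07.796000", "states_list": "F"},
    {"state": "COMPLETED", "name": "Si--2",
     "created_on": "2018-08-02T19:33:39.942000", "states_list": "C"},
    {"state": "READY", "name": "Si--3",
     "created_on": "2018-08-02T20:16:46.124000", "states_list": "REA"}
]
"""

workflows = json.loads(lpad_output)

# Count how many workflows are in each state.
state_counts = Counter(entry["state"] for entry in workflows)
for state, count in sorted(state_counts.items()):
    print(f"{state}: {count}")
```

A tally like this is a quick way to spot FIZZLED workflows once the database holds more than a handful of entries.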


In [48]:
def get_fws():
    for fw_id in lp.get_fw_ids():
        fw = lp.get_fw_dict_by_id(fw_id)
        for prop in ["fw_id","updated_on","state","name"]:
            print(prop, ": ",fw[prop])

        print("\n")
        
get_fws()

fw_id :  1
updated_on :  2018-08-02T18:47:17.124985
state :  FIZZLED
name :  Si-structure optimization


fw_id :  2
updated_on :  2018-08-02T19:34:10.120443
state :  COMPLETED
name :  Si-structure optimization


fw_id :  3
updated_on :  2018-08-02T20:16:51.878206
state :  READY
name :  Si-structure optimization




This command gets you the same information on the command line:

In [49]:
!lpad get_fws

[
    {
        "fw_id": 1,
        "created_on": "2018-08-02T18:33:07.796299",
        "updated_on": "2018-08-02T18:47:17.124985",
        "state": "FIZZLED",
        "name": "Si-structure optimization"
    },
    {
        "fw_id": 2,
        "created_on": "2018-08-02T19:33:39.942125",
        "updated_on": "2018-08-02T19:34:10.120443",
        "state": "COMPLETED",
        "name": "Si-structure optimization"
    },
    {
        "fw_id": 3,
        "created_on": "2018-08-02T20:16:46.124309",
        "updated_on": "2018-08-02T20:16:51.878206",
        "state": "READY",
        "name": "Si-structure optimization"
    }
]


In [50]:
# Let's look at what this command can do:
!lpad get_fws --help

usage: lpad get_fws [-h] [-i FW_ID [FW_ID ...]] [-n NAME]
                    [-s {ARCHIVED,FIZZLED,DEFUSED,PAUSED,WAITING,READY,RESERVED,RUNNING,COMPLETED}]
                    [-q QUERY] [-lm] [--qid QID]
                    [-d {all,more,less,ids,count,reservations}] [-m MAX]
                    [--sort {created_on,updated_on}]
                    [--rsort {created_on,updated_on}]

optional arguments:
  -h, --help            show this help message and exit
  -i FW_ID [FW_ID ...], --fw_id FW_ID [FW_ID ...]
                        fw_id
  -n NAME, --name NAME  get FWs with this name
  -s {ARCHIVED,FIZZLED,DEFUSED,PAUSED,WAITING,READY,RESERVED,RUNNING,COMPLETED}, --state {ARCHIVED,FIZZLED,DEFUSED,PAUSED,WAITING,READY,RESERVED,RUNNING,COMPLETED}
                        Select by state.
  -q QUERY, --query QUERY
                        Query (enclose pymongo-style dict in single-quotes,
                        e.g. '{"state":"COMPLETED"}')
  -lm, --launches_mode  Query t
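
The `-q` option takes a pymongo-style query written as a JSON string. Typing that string by hand invites quoting mistakes, so one option is to generate it from a Python dict. This is a sketch using only the standard library; `build_lpad_query` is a hypothetical helper name, not part of FireWorks.

```python
import json
import shlex


def build_lpad_query(query):
    """Return a shell-safe `lpad get_fws -q ...` command for a pymongo-style query dict."""
    # json.dumps produces the double-quoted form the help text asks for;
    # shlex.quote wraps it in single quotes for the shell.
    return "lpad get_fws -q " + shlex.quote(json.dumps(query))


print(build_lpad_query({"state": "COMPLETED"}))
```

The printed command can be pasted into a terminal or run from a notebook cell with a leading `!`.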

# Now let's run this workflow


There are a few different ways to run a workflow. The first is to just run it within this notebook directly.

In [34]:
from fireworks.core.rocket_launcher import launch_rocket

In [57]:
# Let's move into a temporary working directory
import os

# exist_ok=True lets this cell be re-run without raising an error
os.makedirs("temp", exist_ok=True)
os.chdir("temp")
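
Changing directory in a notebook affects every later cell. One way to contain that is a small context manager that creates the directory if needed and restores the original working directory afterwards. This is a sketch: `working_directory` is a hypothetical helper, not part of FireWorks or atomate, and the demo uses a throwaway location from `tempfile` rather than the notebook folder.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Create `path` if needed, cd into it, and cd back when the block exits."""
    original = os.getcwd()
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    try:
        yield os.getcwd()
    finally:
        # Always return to where we started, even if the body raised.
        os.chdir(original)


# Demo in a throwaway location; in the notebook this could simply be "temp".
scratch = os.path.join(tempfile.mkdtemp(), "temp")
start_dir = os.getcwd()
with working_directory(scratch) as run_dir:
    print("Running in", run_dir)
print("Back in", os.getcwd())
```

Anything launched inside the `with` block writes its files to the scratch directory, while later cells keep running from the original location.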

In [58]:
launch_rocket(lp)

2018-08-02 13:18:08,462 INFO Launching Rocket
2018-08-02 13:18:08,485 INFO RUNNING fw_id: 3 in directory: /Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/lessons/atomate/temp
2018-08-02 13:18:08,494 INFO Task started: FileWriteTask.
2018-08-02 13:18:08,496 INFO Task completed: FileWriteTask 
2018-08-02 13:18:08,499 INFO Task started: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}.
2018-08-02 13:18:08,509 INFO Task completed: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}} 
2018-08-02 13:18:08,513 INFO Task started: {{atomate.vasp.firetasks.run_calc.RunVaspFake}}.
2018-08-02 13:18:08,533 INFO atomate.vasp.firetasks.run_calc RunVaspFake: verified inputs successfully
2018-08-02 13:18:08,558 INFO atomate.vasp.firetasks.run_calc RunVaspFake: ran fake VASP, generated outputs
2018-08-02 13:18:08,560 INFO Task completed: {{atomate.vasp.firetasks.run_calc.RunVaspFake}} 
2018-08-02 13:18:08,565 INFO Task started: {{atomate.common.firetasks.glue_tasks.PassCa



2018-08-02 13:18:13,123 INFO atomate.vasp.drones Post-processing dir:/Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/lessons/atomate/temp
2018-08-02 13:18:13,128 INFO atomate.vasp.drones Post-processed /Users/shyamd/Dropbox/Projects/2018 - MP Workshop/workshop-2018/lessons/atomate/temp
2018-08-02 13:18:13,132 INFO Task completed: {{atomate.vasp.firetasks.parse_outputs.VaspToDb}} 
2018-08-02 13:18:13,152 INFO Rocket finished


True

Now, let's see how that changed our fireworks:

In [None]:
!lpad get_fws

This let me run a single firework in the notebook. What if I wanted to run multiple fireworks? First, let's reset the old fireworks and add some more workflows to our database.

In [17]:
# Let's use Python to rerun those fireworks
for fw_id in lp.get_fw_ids():
    lp.rerun_fw(fw_id)
    

In [None]:
# We can do the same thing using the command line:
!lpad rerun_fws 

In [None]:
!lpad get_fws

In [None]:
# Let's add the workflow a few more times to have multiple fireworks in the database
lp.add_wf(wf)
lp.add_wf(wf)

We can run all of the available fireworks with just two lines of Python, one import and one function call:

In [None]:
from fireworks.core.rocket_launcher import rapidfire
rapidfire(lp)
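
Conceptually, `rapidfire` keeps launching rockets until nothing is left to run. The toy model below illustrates that loop with plain Python objects. `ToyLaunchPad`, `launch_one`, and `rapidfire_toy` are hypothetical names for illustration only and do not reflect how FireWorks is implemented.

```python
class ToyLaunchPad:
    """Stand-in for a database of fireworks, keyed by fw_id."""

    def __init__(self, states):
        self.states = dict(states)

    def next_ready(self):
        """Return the lowest fw_id in the READY state, or None if there is none."""
        ready = sorted(fw_id for fw_id, s in self.states.items() if s == "READY")
        return ready[0] if ready else None


def launch_one(toy_pad):
    """Run a single READY firework; return True if something ran."""
    fw_id = toy_pad.next_ready()
    if fw_id is None:
        return False
    toy_pad.states[fw_id] = "COMPLETED"
    return True


def rapidfire_toy(toy_pad):
    """Launch until no READY fireworks remain; return how many ran."""
    launched = 0
    while launch_one(toy_pad):
        launched += 1
    return launched


toy_lp = ToyLaunchPad({1: "READY", 2: "READY", 3: "COMPLETED"})
n_launched = rapidfire_toy(toy_lp)
print(n_launched, toy_lp.states)
```

The single-launch function returning a boolean mirrors the `True` printed after `launch_rocket(lp)` earlier, which is what lets a loop know when to stop.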

This lets us run fireworks until there are none left to run. But we're still running fireworks in our Jupyter notebook. If I want to run them on another machine, I need to do something else. Normally, we would want to submit these jobs to our supercomputing queue and let it run them as resources become available.

### Using the queue launcher:

Setting up the queue launcher unfortunately takes some work. There are configuration files that tell atomate how to submit jobs, where the database is, and what special parameters to use for this supercomputer.

This has all been set up for you in this workshop. Once it is set up, using the queue is simple: we launch the fireworks to the queue.

In [None]:
!qlaunch rapidfire

Now the supercomputer will take care of running the jobs, and eventually we can check that they are working:

In [None]:
!lpad get_fws