# 2 - Workflow management with atomate

In this lesson we cover a few more advanced features of the atomate software package.  More specifically we'll go over an example that includes parents and children, talk more about where you can find preset workflows, and cover a basic example of analyzing workflow-generated materials data.

In [None]:
!pmg config --add PMG_MAPI_KEY <MAPI_KEY>
from mp_workshop.atomate import wf_to_graph, use_fake_vasp_workshop
from tqdm import tqdm_notebook


## Why use atomate?

To motivate this lesson, here is an example that illustrates the value of atomate.  Let's say you wanted to calculate the band structure of every polymorph of SiO$_2$.  Normally, you'd need to get all of the CIFs from MP or the ICSD, construct the POSCARs by hand (or use your own infrastructure to convert them), run the calculations and manage the directory structure, and aggregate the results in a way that allows you to analyze them together.  In atomate, this is achieved in a simple, five-line snippet:

In [None]:
from atomate.vasp.workflows.presets.core import get_wf
from fireworks import LaunchPad

lpad = LaunchPad.auto_load()
lpad.reset("", require_password=False)

from pymatgen import MPRester
mpr = MPRester('')


In [None]:
## Submit all the different polymorphs of SiO2


This snippet illustrates the so-called "high-throughput" approach.  It is valuable for two reasons: it lets you aggregate a lot of data quickly, in a way that is only possible in computational materials science, and it lets you examine trends across a large dataset, which is typically the most valuable way to use DFT-based simulation.

It also allows us to keep the provenance of the various calculations we performed, so they are more reproducible.

Before we move on, let's reset the workflow database.

In [None]:
lpad.reset(password="", require_password=False, max_reset_wo_password=10000)  ### NEVER DO THIS IN PRODUCTION

## The elastic tensor: A multi-step preset workflow

In addition to enabling "high-throughput" functionality, atomate allows you to execute a relatively complex set of instructions using very simple constructions.  Our canonical example of this functionality is contained in the elastic workflow.

In [None]:
# Get the structure of Si from the MPRester ("mp-149") using the conventional cubic cell


We can view the structure

In [None]:
# View structure
from mp_workshop import view


In [None]:
from atomate.vasp.workflows.presets.core import wf_elastic_constant, wf_elastic_constant_minimal

In [None]:
# create an elastic constant workflow


Here's a rendering of the control flow associated with the workflow, which you can see both here and in the fireworks dashboard.

In [None]:
from mp_workshop.atomate import wf_to_graph
wf_to_graph(wf)

From our exploration of the workflow, we can see that the workflow has a "root" firework, which is the structure optimization firework, and a single "leaf" firework, which is the analysis task.  In between, we have several fireworks which have "elastic deformation" in their titles, each of which represents a calculation which will transform the output of the optimization firework by applying a unique strain.
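The root/leaf vocabulary can be made concrete with a small sketch. The dictionary below is a hypothetical stand-in for the links of the elastic workflow (the firework names are illustrative, not what atomate generates); it maps each firework to its children, and the helper picks out the roots and leaves.

```python
# Toy model of a workflow graph: parent name -> list of child names.
# The names are illustrative assumptions, not atomate's actual firework names.
links = {
    "structure optimization": ["elastic deformation 0", "elastic deformation 1"],
    "elastic deformation 0": ["analysis"],
    "elastic deformation 1": ["analysis"],
    "analysis": [],
}


def roots_and_leaves(links):
    """Return (roots, leaves) of a parent -> children mapping."""
    all_children = {child for children in links.values() for child in children}
    roots = [fw for fw in links if fw not in all_children]  # no parents
    leaves = [fw for fw, children in links.items() if not children]  # no children
    return roots, leaves


roots, leaves = roots_and_leaves(links)
print("roots:", roots)
print("leaves:", leaves)
```

A root has no parents and can run immediately; a leaf has no children and only runs once everything upstream has completed.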

In the analysis step, the stresses from those deformations and the strains are aggregated and used to fit an elastic tensor.  Let's examine the different types of fireworks using the graph tool.

## Explore the data

We can explore the workflow further.  Since each workflow is composed of fireworks, we can examine what each firework looks like.

Remember that we can use `dir` to explore a Python object.

In [None]:
# print the name and graph of the first firework


In [None]:
# print the name and graph of the next firework


In [None]:
# print the name and graph of the last firework


Note that the standard preset workflow for the elastic tensor uses many more calculations than are strictly required to determine the elastic tensor.  This yields a higher-quality tensor, because some of the numerical noise is averaged out over the redundant calculations.  You can also generate a "minimal" elastic workflow, which uses neither the more expensive DFT parameters nor the extended calculations.  Tensors generated using this workflow are typically not as accurate, but often work for simple semiconductors with a lot of symmetry.  Let's try it with bulk silicon.

In [None]:
wf = wf_elastic_constant_minimal(structure)

In [None]:
# Display the graph for this workflow


Note that for silicon, the two deformation tasks are a single normal deformation along the x-axis (which is equivalent to those along the y- and z-axes) and a single shear deformation (equivalent to the two others normally included).  Since this is considerably simpler, we'll run this as our example.
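To see why two deformations can be enough for a cubic crystal, here is a minimal linear-elasticity sketch. It is not atomate's fitting procedure, the function name is hypothetical, and the stresses are made-up numbers chosen only for illustration. A normal strain along x gives C11 and C12 from the xx and yy stresses, and one shear strain gives C44.

```python
def cubic_elastic_constants(normal_strain, sigma_xx, sigma_yy, shear_strain_xy, sigma_xy):
    """Estimate (C11, C12, C44) for a cubic crystal from two small deformations.

    normal_strain   : strain applied along x (dimensionless)
    sigma_xx/yy     : stresses measured under the normal strain (GPa)
    shear_strain_xy : tensor shear strain applied in the xy plane (dimensionless)
    sigma_xy        : shear stress measured under the shear strain (GPa)
    """
    c11 = sigma_xx / normal_strain
    c12 = sigma_yy / normal_strain
    # Hooke's law uses the engineering shear strain, which is twice the tensor strain.
    c44 = sigma_xy / (2.0 * shear_strain_xy)
    return c11, c12, c44


# Illustrative inputs only: 1% strains and made-up stresses in GPa.
c11, c12, c44 = cubic_elastic_constants(0.01, 1.6, 0.6, 0.01, 1.6)
print(f"C11 = {c11:.1f} GPa, C12 = {c12:.1f} GPa, C44 = {c44:.1f} GPa")
```

With a single strain magnitude per deformation, any numerical noise in the stress goes straight into the constant, which is why the standard workflow uses more deformations.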

To run the workflow, we import our launchpad and rocket launcher tools.  Since we can't run VASP here, we're also going to "fake" VASP by copying files rather than running the binary.

In [None]:
from fireworks import LaunchPad
from mp_workshop.atomate import use_fake_vasp_workshop

In [None]:
wf.name

The queue that atomate submits workflows to is the FireWorks `LaunchPad`.

In [None]:
# Add the workflow to the LaunchPad using use_fake_vasp_workshop
wf = use_fake_vasp_workshop(wf)
lpad.add_wf(wf)

Now we can run the workflow either in the notebook, using the Python FireWorks rocket launcher, or with `qlaunch` from the Linux command line.

In [None]:
from fireworks.core.rocket_launcher import rapidfire
import os

nb_dir = os.path.abspath('.')
calc_dir = os.path.join(os.path.expanduser("~"), 'mp_workshop', 'fake_vasp', 'temp')
os.makedirs(calc_dir, exist_ok=True)  # create the scratch directory if needed
os.chdir(calc_dir)
try:
    rapidfire(lpad)  # run the fireworks from inside the scratch directory
finally:
    os.chdir(nb_dir)  # always return to the notebook directory

## Powerups

Atomate contains a number of tools for modifying workflows, known as "powerups": functions that modify workflows in commonly desired ways.  For example, if you want to modify the parameters of a VASP calculation, you can use the `add_modify_incar` powerup to change them for every firework in the workflow.

### add_modify_incar

In [None]:
from atomate.vasp.powerups import add_modify_incar
wf = get_wf(structure, "bandstructure.yaml")
wf_to_graph(wf.fws[0])
modified = add_modify_incar(wf, {"incar_update": {"ENCUT": 700}})
wf_to_graph(modified.fws[0])

In [1]:
# Explore the modified workflow and find the part that shows the incar_update


Prior to the execution of the VASP calculation, a `ModifyIncar` firetask is inserted, which will read the INCAR and modify the ENCUT parameter so that it matches our powerup specification.
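The idea of inserting a task ahead of the run step can be sketched with plain dictionaries. This is a simplified toy model, not atomate's implementation: the workflow structure, the function `toy_add_modify_incar`, and the shortened task names are all assumptions made for illustration.

```python
import copy

# Toy workflow: a list of fireworks, each with an ordered list of tasks.
# This is a simplified stand-in for atomate's Workflow and Firework objects.
workflow = [
    {"name": "structure optimization",
     "tasks": [{"task": "WriteInputs"}, {"task": "RunVasp"}, {"task": "VaspToDb"}]},
    {"name": "static",
     "tasks": [{"task": "WriteInputs"}, {"task": "RunVasp"}, {"task": "VaspToDb"}]},
]


def toy_add_modify_incar(workflow, incar_update):
    """Return a copy of the workflow with a ModifyIncar task before each run task."""
    modified = copy.deepcopy(workflow)  # leave the input workflow untouched
    for fw in modified:
        new_tasks = []
        for task in fw["tasks"]:
            if task["task"] == "RunVasp":
                new_tasks.append({"task": "ModifyIncar", "incar_update": dict(incar_update)})
            new_tasks.append(task)
        fw["tasks"] = new_tasks
    return modified


modified = toy_add_modify_incar(workflow, {"ENCUT": 700})
for fw in modified:
    print(fw["name"], [t["task"] for t in fw["tasks"]])
```

Because the modification happens after the inputs are written and before the run, every firework picks up the new setting without changing how its inputs are generated.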

### add_tags, add_structure_metadata

As you start running simulations and exploring your database, you will find that:
- You need to organize your calculations
- This is achieved by tagging the tasks

In [None]:
from atomate.vasp.powerups import add_tags

In [None]:
print("WF metadata: ", wf.metadata)
print("Last task: ", wf.fws[0].tasks[-1])

In [None]:
# use add_tags to insert a tag to the current workflow


In [None]:
print("WF metadata: ", wf.metadata)
print("Last task: ", wf.fws[0].tasks[-1])

## A few other "complex" workflows.

Atomate contains a few workflows that will do more complicated things, like calculations on surfaces.  Let's say I want to do a calculation to determine the hydrogen adsorption energy on every low-index facet of Pt.

In [2]:
from atomate.vasp.workflows.base.adsorption import get_wfs_all_slabs
from pymatgen import MPRester, Molecule



In [None]:
# Get the unit cell of Pt and H2 molecule


In [None]:
# Create the adsorption slab workflow using those objects


In [None]:
## Using `IPython.display.display`, we can show what the workflows for each terminating surface look like
from IPython.display import display
display("WF 0 =============================")
display(wf_to_graph(wf[0]))
display("WF 1 =============================")
display(wf_to_graph(wf[1]))
display("WF 2 =============================")
display(wf_to_graph(wf[2]))

In [None]:
# explore the wf and plot what the structure looks like for the [111] surface.
from pymatgen import Structure
fw_dict = wf[0].fws[2].tasks[0].to_dict()
ss = Structure.from_dict(fw_dict['structure'])
view(ss)

In [None]:
# explore the wf and plot what the structure looks like for the [110] surface.
fw_dict = wf[1].fws[2].tasks[0].to_dict()
ss = Structure.from_dict(fw_dict['structure'])
ss.translate_sites(list(range(len(ss))), [0,0,0.7]) # shift the slab
view(ss)

The following preset workflow runs an HSE calculation after the PBE band structure, which is what you need to obtain an HSE band gap.

In [None]:
from atomate.vasp.workflows.presets.core import wf_bandstructure_plus_hse

In [None]:
wf = wf_bandstructure_plus_hse(structure)

In [None]:
wf_to_graph(wf)

There's also support for workflows in FEFF (computational spectroscopy) and QChem (localized-basis quantum chemistry).

FEFF

Dr. Kiran Mathew
<img src="https://perssongroup.lbl.gov/img/kmathew.jpg">  


QChem

Dr. Samuel Blau
<img src="https://perssongroup.lbl.gov/img/smblau.jpg">

In [None]:
from atomate.feff.workflows.core import get_wf_xas

In [None]:
wf = get_wf_xas("Si", structure)
wf_to_graph(wf)

In [None]:
wf_to_graph(wf.fws[0])

In [None]:
from atomate.qchem.workflows.base.double_FF_opt import get_wf_double_FF_opt
from pymatgen import Molecule
molecule = Molecule("CO", [[0, 0, 0], [0, 0, 1.23]])
wf = get_wf_double_FF_opt(molecule, 0.5)
wf_to_graph(wf)
wf_to_graph(wf.fws[0])

## Create workflows from YAML files

pymatgen has a robust set of default parameters for running VASP, but they can't cover all usage scenarios.
You will often need to pass slightly modified parameters to a VASP calculation,
or string together a few basic VASP calculations for a material.

We will first submit a series of structure optimizations using the `get_wf_from_spec_dict` utility, which allows us to reuse the same calculation settings.

This is essentially like the `bandstructure.yaml` file we used for the band structure calculations earlier:

```python
wf = get_wf(structure, "bandstructure.yaml")
```

which just reads the following file:

```yml
# A typical band structure
# Author: Anubhav Jain (ajain@lbl.gov)
fireworks:
- fw: atomate.vasp.fireworks.core.OptimizeFW
- fw: atomate.vasp.fireworks.core.StaticFW
  params:
    parents: 0
- fw: atomate.vasp.fireworks.core.NonSCFFW
  params:
    parents: 1
    mode: uniform
- fw: atomate.vasp.fireworks.core.NonSCFFW
  params:
    parents: 1
    mode: line
```
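The `parents` entries are what turn this flat list into a graph. The sketch below shows one way to read them, assuming each `parents` value is an index (or list of indices) into the `fireworks` list; the helper `parent_links` is hypothetical and is not how atomate builds the workflow internally.

```python
# The same layout as the YAML above, written as a Python dict so that the
# example needs no YAML parser.
spec = {
    "fireworks": [
        {"fw": "atomate.vasp.fireworks.core.OptimizeFW"},
        {"fw": "atomate.vasp.fireworks.core.StaticFW", "params": {"parents": 0}},
        {"fw": "atomate.vasp.fireworks.core.NonSCFFW", "params": {"parents": 1, "mode": "uniform"}},
        {"fw": "atomate.vasp.fireworks.core.NonSCFFW", "params": {"parents": 1, "mode": "line"}},
    ]
}


def parent_links(spec):
    """Map each firework index to the list of indices of its children."""
    links = {i: [] for i in range(len(spec["fireworks"]))}
    for child, fw in enumerate(spec["fireworks"]):
        parents = fw.get("params", {}).get("parents", [])
        if isinstance(parents, int):  # a single parent may be given as a bare index
            parents = [parents]
        for parent in parents:
            links[parent].append(child)
    return links


links = parent_links(spec)
for parent, children in links.items():
    name = spec["fireworks"][parent]["fw"].rsplit(".", 1)[-1]
    print(parent, name, "->", children)
```

Read this way, the optimization feeds the static calculation, and the static calculation feeds both non-self-consistent runs (uniform and line mode).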

In [3]:
## Write a function that reads a YAML file and creates a workflow
import json
from monty.serialization import loadfn
from atomate.utils.utils import get_wf_from_spec_dict


Suppose we want to override a basic VASP flag:

In [None]:
%%file opti.yaml
fireworks:
# Relaxation for
- fw: atomate.vasp.fireworks.core.OptimizeFW
  params:
    override_default_vasp_params:
      user_incar_settings:
        ISPIN: 2


## Analyzing workflow results

Our final example for atomate will take a set of tasks from our database and construct a phase diagram.

We can run this workflow for materials in the Al-Cr system.


In [None]:
# get the structures in the Al-Cr system
# create workflows using those structures and add them to the LaunchPad
# remember to modify the workflows using use_fake_vasp_workshop 


Now we can run these workflows on a computing cluster.

Here, we will just run the fake version for the workshop in a `temp` directory.

In [None]:
from fireworks.core.rocket_launcher import rapidfire
from mp_workshop.atomate import use_fake_vasp_workshop
import os

nb_dir = os.path.abspath('.')
calc_dir = os.path.join(os.path.expanduser("~"), 'mp_workshop', 'fake_vasp', 'temp')
os.makedirs(calc_dir, exist_ok=True)  # create the scratch directory if needed
os.chdir(calc_dir)
try:
    rapidfire(lpad)  # run the fireworks from inside the scratch directory
finally:
    os.chdir(nb_dir)  # always return to the notebook directory

### Rerunning fireworks

When you are doing thousands of DFT calculations, a small percentage of them are going to fail.
Sometimes this can be fixed by simply rerunning the calculation.


In [None]:
# look through the documentation and find a way to rerun the FIZZLED firework
lpad.rerun_fw(21)
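Rather than hard-coding a firework id, you can look up every FIZZLED firework and rerun each one. The sketch below uses the `get_fw_ids` and `rerun_fw` methods of the FireWorks `LaunchPad`; the `FakeLaunchPad` class is a hypothetical in-memory stand-in so that the example runs without a database, and it simply marks a rerun firework as READY.

```python
def rerun_fizzled(lpad):
    """Rerun every firework whose state is FIZZLED and return their ids."""
    fw_ids = lpad.get_fw_ids({"state": "FIZZLED"})
    for fw_id in fw_ids:
        lpad.rerun_fw(fw_id)
    return fw_ids


class FakeLaunchPad:
    """Hypothetical stand-in for LaunchPad; only mimics the two calls used above."""

    def __init__(self, states):
        self.states = dict(states)  # fw_id -> state

    def get_fw_ids(self, query):
        return sorted(i for i, s in self.states.items() if s == query["state"])

    def rerun_fw(self, fw_id):
        self.states[fw_id] = "READY"  # simplification of what a real rerun does


fake = FakeLaunchPad({19: "COMPLETED", 20: "COMPLETED", 21: "FIZZLED"})
rerun_ids = rerun_fizzled(fake)
print(rerun_ids, fake.states)
```

In the notebook you would pass the real `lpad` instead of the fake object, then launch again with `rapidfire(lpad)`.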

## The calculations have finished

When atomate runs a VASP calculation, the output is stored in a collection defined by the `db.json` file.

IMPORTANT: The location of this file depends on how you set up your system, and this `db.json` file must be accessible on the computing cluster where your simulations are running.


In [None]:
!cat $HOME/mp_workshop/fireworks_config/db.json

The data from our fake vasp calculations are used to populate the `tasks` collection on `localhost:mp_workshop`. 

To access these results we can use `pymongo` or `maggma`, our wrapper for `pymongo`.

In [None]:
# We can look at these results by querying the database
from maggma.stores import MongoStore
tasks = MongoStore(database="mp_workshop", collection_name="tasks")
tasks.connect()  # open the connection before running any query


This is far too much information to read at once, so we can view things in a different way.

This allows us to explore the data step by step.

## Create a phase diagram using the data we have just computed

We can use the query functionality of `MongoStore` (which is just like `find` in `pymongo`).

If we are only interested in the outputs, we can use the `properties` argument to reduce the amount of data we have to send back and forth.
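To make the roles of `criteria` and `properties` concrete, here is a toy version that works on a plain list of dictionaries. It is only an analogy: `toy_query` is a hypothetical helper, it supports equality matching only, and the documents are made up (the field names follow the ones used in this lesson).

```python
# Hypothetical task documents; the values are made up for illustration.
docs_in_db = [
    {"task_id": 1, "chemsys": "Al", "formula_pretty": "Al", "output": {"energy": -3.7}},
    {"task_id": 2, "chemsys": "Cr", "formula_pretty": "Cr", "output": {"energy": -9.5}},
    {"task_id": 3, "chemsys": "Al-Cr", "formula_pretty": "AlCr2", "output": {"energy": -22.9}},
]


def toy_query(docs, criteria, properties):
    """Yield matching documents, keeping only the requested top-level keys."""
    for doc in docs:
        if all(doc.get(key) == value for key, value in criteria.items()):
            yield {key: doc[key] for key in properties if key in doc}


result = list(toy_query(docs_in_db, {"chemsys": "Al"}, ["output"]))
print(result)
```

The criteria decide which documents come back, and the properties decide how much of each document is sent, which matters when task documents are large.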

In [None]:
doc = tasks.query_one(criteria={"chemsys": "Al"}, properties=['output'])

In [4]:
# Query all of the "output" data in a given chemical system


Note that this returns a pymongo cursor, without actually querying the database.
We can get the full set of data using the `list` command

In [None]:
docs = list(docs)

In [None]:
# Print the formula for each entry
[d['formula_pretty'] for d in docs]

In [None]:
# get all the tasks for the 'Al', 'Cr' and 'Al-Cr' chemical systems


In [None]:
len(al_cr_tasks)

`ComputedEntry` is the object in pymatgen that tracks energy and chemical composition.

In [None]:
from pymatgen.entries.computed_entries import ComputedEntry
from pymatgen import Composition

task = al_cr_tasks[0]
energy = task['output']['energy']
composition = Composition.from_dict(task['composition_unit_cell'])
entry = ComputedEntry(composition, energy)
entry

In [None]:
# Use the tasks to populate a list of ComputedEntry


## Plot the phase diagram

In [None]:
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDPlotter

In [None]:
pd = PhaseDiagram(entries)

In [None]:
plotter = PDPlotter(pd)

In [None]:
plotter.show()

In [None]:
plotter.show_unstable = True
plotter.show()
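Conceptually, deciding which phases are stable in a binary system comes down to a lower convex hull of formation energy against composition. The sketch below illustrates that idea in plain Python. It is not pymatgen's implementation, the helper names are hypothetical, and the energies are made-up values for a generic A-B system rather than results from the workshop data.

```python
# Made-up formation energies (eV/atom) for a generic A-B system, keyed by formula.
# Each value is (x, e), where x is the atomic fraction of B.
entries = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.10),
    "AB": (0.5, -0.05),
    "AB3": (0.75, -0.12),
    "B": (1.0, 0.0),
}


def lower_hull(points):
    """Lower convex hull of (x, e) points, returned left to right."""
    hull = []
    for p in sorted(points):
        # Drop the last hull point while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e1 + (e2 - e1) * (x - x1) / (x2 - x1)
    raise ValueError("x is outside the hull range")


# Keep only the lowest-energy point at each composition before building the hull.
lowest = {}
for x, e in entries.values():
    lowest[x] = min(e, lowest.get(x, e))
hull = lower_hull(lowest.items())

for name, (x, e) in entries.items():
    e_above = e - hull_energy(hull, x)
    print(f"{name}: {e_above:.3f} eV/atom above hull")
```

Entries on the hull have zero energy above hull and are the stable phases; anything above it is what `show_unstable` adds to the plot.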