# Automated DFT

This lesson covers the key concepts needed to run automated density functional theory (DFT) workflows with our atomate code. We begin by using pymatgen to build inputs for DFT calculations and to parse their outputs. We then build an understanding of `InputSets` in pymatgen, a key capability for automating DFT. Finally, we walk through a demonstration of atomate.


## Core Concepts
- **Input/Output** - There is an entire module in pymatgen devoted to reading and writing the input and output files of other codes
- **Input Sets** - Standardizing the input options to various codes allows for repeatability and for automatically generating inputs for a new calculation
- **Atomate** - This is a set of automated recipes for computing materials and molecular properties using density functional theory

## Lesson 1: Using Pymatgen IO

For the first lesson, we'll focus on using pymatgen to interface with DFT codes in the simplest way possible. Pymatgen has an IO module with methods to parse and write files compatible with a number of external codes. These include:

- ABINIT
- EXCITING
- FEFF
- LAMMPS
- Lobster
- QChem
- VASP
- ATAT
- Gaussian
- NWChem
- ShengBTE
- Wannier90
- Zeo++

For the purpose of this tutorial, we'll focus on using VASP and QChem as these are the primary DFT codes used by the Materials Project.

### VASP Inputs
Let's begin by reading an Al-Cr alloy structure from a CIF file:

In [None]:
from pymatgen.core.structure import Structure
struc = Structure.from_file("Al16Cr10.cif")

CIF is a general crystallographic format that many DFT codes can't read directly; each code has its own input format. Next, we'll make a POSCAR file, which is how VASP represents structures:

In [None]:
from pymatgen.io.vasp.inputs import Poscar

poscar = Poscar(structure=struc)
print(poscar)

Note that the `Poscar` object has a `write_file` method that writes it out as a file. This makes it easy to construct complex structures, such as heterostructures or adsorbates on surfaces, and write them out for a DFT code like VASP.
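To make the POSCAR layout concrete, here is a stdlib-only sketch that builds a minimal VASP 5 style POSCAR string (comment line, scaling factor, three lattice vectors, element symbols, counts, then fractional coordinates). `poscar_string` is a hypothetical helper for illustration only; in practice `Poscar(structure).write_file("POSCAR")` does this for you and handles many more options.

```python
def poscar_string(comment, lattice, species, frac_coords):
    """Hypothetical helper: return a minimal POSCAR in direct coordinates.

    lattice: three lattice vectors (angstrom), each a 3-tuple.
    species: one element symbol per site, grouped by element.
    frac_coords: one fractional-coordinate triple per site.
    """
    # Collapse consecutive identical symbols into (symbol, count) runs
    symbols, counts = [], []
    for sym in species:
        if symbols and symbols[-1] == sym:
            counts[-1] += 1
        else:
            symbols.append(sym)
            counts.append(1)

    lines = [comment, "1.0"]  # line 2 is the universal scaling factor
    lines += ["  ".join(f"{x:.6f}" for x in vec) for vec in lattice]
    lines.append(" ".join(symbols))
    lines.append(" ".join(str(n) for n in counts))
    lines.append("direct")  # coordinates below are fractional
    lines += ["  ".join(f"{x:.6f}" for x in xyz) for xyz in frac_coords]
    return "\n".join(lines) + "\n"


print(poscar_string("AlCr test", [(3, 0, 0), (0, 3, 0), (0, 0, 3)],
                    ["Al", "Cr"], [(0, 0, 0), (0.5, 0.5, 0.5)]))
```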

For this example, we've provided a completed VASP calculation in `VASP_Al16Cr10_example`. Let's use pymatgen to read the inputs in this directory:

In [None]:
poscar = Poscar.from_file("./VASP_Al16Cr10_example/POSCAR.gz")
print(poscar)

The POSCAR object contains a full pymatgen `Structure`, which we can grab and manipulate in Python whenever we want to modify it:

In [None]:
poscar.structure

VASP has three other primary input files: KPOINTS, POTCAR, and INCAR. Each has a corresponding pymatgen class (`Kpoints`, `Potcar`, and `Incar`) that can read and write that file. Let's look at the POTCAR file in the `VASP_Al16Cr10_example` directory next.
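Before opening the POTCAR, it helps to see how simple the INCAR format is: plain `TAG = value` lines, with `#` or `!` comments and `;` separating several tags on one line. The sketch below parses such text into a dictionary with basic type conversion. `parse_incar` and `_convert` are hypothetical helpers written for illustration; pymatgen's `Incar.from_file` does this properly, including cases this sketch skips (such as the `N*value` shorthand in `MAGMOM` and multi-word string values).

```python
def _convert(value):
    """Convert an INCAR value string to bool, int, float, list, or str."""
    tokens = value.split()
    if len(tokens) > 1:
        # Whitespace-separated values (e.g. MAGMOM) become a list
        return [_convert(t) for t in tokens]
    low = value.lower()
    if low in (".true.", "true", "t"):
        return True
    if low in (".false.", "false", "f"):
        return False
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave anything else (e.g. ALGO = Fast) as a string


def parse_incar(text):
    """Hypothetical, simplified INCAR parser returning {TAG: value}."""
    params = {}
    for raw in text.splitlines():
        # Strip comments introduced by '#' or '!'
        line = raw.split("#")[0].split("!")[0].strip()
        # Several tags may share a line when separated by ';'
        for stmt in line.split(";"):
            if "=" not in stmt:
                continue
            key, value = (s.strip() for s in stmt.split("=", 1))
            params[key.upper()] = _convert(value)
    return params


print(parse_incar("ALGO = Fast\nISPIN = 2; LORBIT = 11\nLWAVE = .FALSE."))
```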

In [None]:
from pymatgen.io.vasp.inputs import Potcar

al_cr_potcar = Potcar.from_file("./VASP_Al16Cr10_example/POTCAR.gz")
print(al_cr_potcar.symbols)

Note the warning above, which suggests something is wrong with these POTCARs. We maintain a list of hashes of VASP POTCARs to make sure the data is not corrupted when VASP is run. For this lesson, we've included fake POTCARs, because the official POTCARs are licensed by VASP. You can always check the hash of your own POTCARs by inspecting the `spec` attribute of the `Potcar` object:

In [None]:
print(al_cr_potcar.spec)
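The principle behind that check is ordinary hash-based integrity verification: compute a digest of the file contents and compare it with digests of known-good files. The sketch below shows only that principle with SHA-256 from the standard library. `file_digest` and `looks_unmodified` are hypothetical names, and pymatgen's actual POTCAR validation uses its own hashing scheme and reference data.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hypothetical helper: SHA-256 hex digest of raw file contents."""
    return hashlib.sha256(data).hexdigest()


def looks_unmodified(data: bytes, known_digests: set) -> bool:
    """True if the contents match one of a set of trusted digests."""
    return file_digest(data) in known_digests


trusted = b"fake POTCAR contents for illustration\n"
known = {file_digest(trusted)}
print(looks_unmodified(trusted, known))          # identical bytes match
print(looks_unmodified(trusted + b" ", known))   # any change breaks the match
```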

### VASP Outputs

Once you've run VASP, you'll need to parse its outputs to get the data you want. VASP writes a number of output files:

- WAVECAR
- CHGCAR
- OUTCAR
- vasprun.xml
- PROCAR
- And more ...

Please read the [VASP documentation](https://www.vasp.at/wiki/index.php/The_VASP_Manual) for a good description of what these all are. For this lesson, we'll focus on `vasprun.xml` in the `VASP_Al16Cr10_example` directory. The vasprun.xml file contains a lot of information that can be parsed with the `Vasprun` object in pymatgen:

In [None]:
from pymatgen.io.vasp.outputs import Vasprun

vrun = Vasprun(filename="./VASP_Al16Cr10_example/vasprun.xml.gz")
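Under the hood, vasprun.xml is ordinary XML. To give a feel for what `Vasprun` automates, here is a stdlib sketch that extracts the `e_fr_energy` of each ionic step from a heavily trimmed, made-up vasprun-like snippet. The layout assumed here (a `<modeling>` root, one `<calculation>` per ionic step, and an `<energy>` child holding `<i name="...">` entries, with per-SCF-step energies nested inside `<scstep>`) reflects typical vasprun.xml files; `energies_per_ionic_step` is a hypothetical helper, and `Vasprun` handles far more fields and edge cases.

```python
import xml.etree.ElementTree as ET


def energies_per_ionic_step(xml_text):
    """Hypothetical helper: e_fr_energy of each <calculation>, in order."""
    root = ET.fromstring(xml_text)
    energies = []
    for calc in root.iter("calculation"):
        # Only the <energy> directly under <calculation> is the ionic-step
        # energy; energies inside <scstep> are electronic (SCF) iterations.
        node = calc.find("energy/i[@name='e_fr_energy']")
        if node is not None:
            energies.append(float(node.text))
    return energies


sample = """<modeling>
  <calculation>
    <scstep><energy><i name="e_fr_energy"> -10.0 </i></energy></scstep>
    <energy><i name="e_fr_energy"> -12.5 </i></energy>
  </calculation>
  <calculation>
    <energy><i name="e_fr_energy"> -12.9 </i></energy>
  </calculation>
</modeling>"""
print(energies_per_ionic_step(sample))
```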

Since vasprun.xml contains most of the information VASP can provide, this one object is often enough to parse a full VASP calculation. The parsed data is exposed as attributes of the `Vasprun` object. Use Tab completion (or Shift+Tab to view the docstring) to browse them and try some out:

In [None]:
vrun.vasp_version

We can see which k-points VASP actually used when it auto-generated its mesh:

In [None]:
vrun.actual_kpoints

We can also look at the final energy:

In [None]:
vrun.final_energy

Or even follow the progress of VASP as it relaxed the structure, starting with the number of ionic steps:

In [None]:
len(vrun.ionic_steps)

Each ionic step is a dictionary holding the energy, forces, stress, and structure for that step:

In [None]:
vrun.ionic_steps[0].keys()
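With those dictionaries you can track a relaxation step by step. The sketch below computes the energy and the largest per-atom force at each step, and flags whether the last step is below a force tolerance. It assumes each step carries `e_fr_energy` (eV) and `forces` (one `[fx, fy, fz]` list per atom, eV/Å), which the `keys()` output above should confirm; `summarize_relaxation` and the 0.05 eV/Å default are hypothetical choices for illustration. With real data you would call `summarize_relaxation(vrun.ionic_steps)`.

```python
import math


def summarize_relaxation(ionic_steps, force_tol=0.05):
    """Hypothetical helper: (energies, max_forces, converged) per ionic step.

    force_tol is an illustrative threshold in eV/Å, not an MP default.
    """
    energies = [step["e_fr_energy"] for step in ionic_steps]
    # Largest force magnitude on any atom at each step
    max_forces = [
        max(math.sqrt(sum(f * f for f in vec)) for vec in step["forces"])
        for step in ionic_steps
    ]
    converged = max_forces[-1] < force_tol
    return energies, max_forces, converged


# Synthetic two-step relaxation, made up for this example
steps = [
    {"e_fr_energy": -10.0, "forces": [[0.3, 0.0, 0.4], [0.0, 0.0, 0.1]]},
    {"e_fr_energy": -10.2, "forces": [[0.0, 0.03, 0.0], [0.0, 0.0, -0.02]]},
]
print(summarize_relaxation(steps))
```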

## Lesson 2: Input Sets

This lesson focuses on `InputSets`. These are objects that provide default parameters for a specific kind of calculation. Several `InputSets` encode the default MP parameters, but you can also define your own `InputSet` to build new calculations with the parameters you want.

Let's begin by taking the Al-Cr structure and making a simple input set to optimize it:

In [None]:
from pymatgen.io.vasp.sets import MPRelaxSet

relax_set = MPRelaxSet(structure=struc)
print(relax_set.incar)

We can see that the input set has preset values for a number of parameters. Some of these, such as `ALGO` and `IBRION`, just tell VASP what kind of calculation to perform, while others, such as `EDIFF` and `MAGMOM`, depend on the structure. The `MPRelaxSet`, which is designed to optimize a structure, uses the default MP parameters so that the resulting structure is compatible with MP data.
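One way an input set can derive structure-dependent values is to scale a per-atom tolerance by the number of atoms (for `EDIFF`) and to look up one starting moment per site (for `MAGMOM`). The sketch below illustrates that idea. The constants `EDIFF_PER_ATOM`, `DEFAULT_MAGMOM`, and `FALLBACK_MAGMOM` and the function `structure_dependent_settings` are assumptions made for this example; compare them against the INCAR printed above rather than treating them as MP's actual settings.

```python
# Illustrative values assumed for this sketch, not read from pymatgen
EDIFF_PER_ATOM = 5e-5             # eV per atom
DEFAULT_MAGMOM = {"Cr": 5.0}      # starting moment for selected elements
FALLBACK_MAGMOM = 0.6             # starting moment for everything else


def structure_dependent_settings(species):
    """Hypothetical helper: derive EDIFF and MAGMOM from a list of site species."""
    natoms = len(species)
    return {
        # Total-energy tolerance grows with system size
        "EDIFF": EDIFF_PER_ATOM * natoms,
        # One starting magnetic moment per site, in site order
        "MAGMOM": [DEFAULT_MAGMOM.get(s, FALLBACK_MAGMOM) for s in species],
    }


settings = structure_dependent_settings(["Al"] * 16 + ["Cr"] * 10)
print(settings["EDIFF"], settings["MAGMOM"][:2], settings["MAGMOM"][-2:])
```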


Many DFT calculations build on a previous calculation. For instance, it's often desirable to run a static calculation after a relaxation. `InputSet`s for these types of calculations have a `from_prev_calc` classmethod that automatically constructs the next calculation's input set from the output of a previous calculation.

In [None]:
from pymatgen.io.vasp.sets import MPStaticSet
static_set = MPStaticSet.from_prev_calc("./VASP_Al16Cr10_example/")

print(static_set.incar)

Notice how different the `MAGMOM` list is for this calculation. `MAGMOM` holds the per-site magnetic moments: VASP uses it as its starting guess and reports the resulting moments in its output, and `from_prev_calc` carries those moments into the new input. Because the magnetic moments and the charge density are closely coupled, the relaxation did not stay in the simple ferromagnetic configuration it started from; it settled into a more complex configuration with a much lower overall magnetic moment.
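The `from_prev_calc` pattern is a classmethod that builds a new input set from a previous run's results: copy the old settings, switch what the new calculation type needs, and seed values such as `MAGMOM` from the old outputs. The sketch below shows that pattern on plain Python data. `StaticSetSketch` and `from_prev_outputs` are hypothetical names; the real `MPStaticSet.from_prev_calc` reads a calculation directory and changes many more settings than the two static-run tags shown here.

```python
from dataclasses import dataclass, field


@dataclass
class StaticSetSketch:
    """Hypothetical stand-in for a static-calculation input set."""
    structure: list                     # here just element symbols, one per site
    incar: dict = field(default_factory=dict)

    @classmethod
    def from_prev_outputs(cls, prev_incar, final_structure, final_magmoms):
        # Start from a copy so the previous calculation's settings stay untouched
        incar = dict(prev_incar)
        # A static run takes no ionic steps
        incar.update({"NSW": 0, "IBRION": -1})
        # Seed the new run with the moments the previous run ended with
        incar["MAGMOM"] = list(final_magmoms)
        return cls(structure=list(final_structure), incar=incar)


prev = {"IBRION": 2, "NSW": 99, "MAGMOM": [5.0, 5.0]}
static = StaticSetSketch.from_prev_outputs(prev, ["Cr", "Cr"], [0.1, -0.2])
print(static.incar)
```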


Other inputs depend on what the InputSet is designed to do. The `MPRelaxSet` is a structure optimization that balances accuracy and computational cost. The `MPStaticSet`, on the other hand, is designed to increase accuracy once an optimized structure is found, to get a good charge density and DFT energy. As a result, these two sets produce different k-point densities:

In [None]:
relax_set.kpoints

In [None]:
static_set.kpoints
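To see why a higher target density gives a finer mesh, here is a sketch of a common heuristic: choose a total number of k-points from a target number of k-points per reciprocal atom, then split it across the axes so that divisions × lattice length is roughly constant (shorter axes get more divisions). `mesh_from_kppa` is a hypothetical helper; pymatgen's `Kpoints.automatic_density` follows a similar idea, but the MP sets specify their target density differently and the exact rounding rules may differ.

```python
def mesh_from_kppa(lattice_lengths, natoms, kppa):
    """Hypothetical helper: k-point divisions along a, b, c.

    kppa is the target number of k-points per reciprocal atom.
    """
    a, b, c = lattice_lengths
    ngrid = kppa / natoms                    # total k-points for the cell
    mult = (ngrid * a * b * c) ** (1 / 3)    # n_i * length_i is roughly mult
    return [max(1, round(mult / length)) for length in (a, b, c)]


cubic = (4.0, 4.0, 4.0)
print(mesh_from_kppa(cubic, 1, 500))     # coarser target density
print(mesh_from_kppa(cubic, 1, 1000))    # denser target gives a finer mesh
print(mesh_from_kppa((4.0, 4.0, 8.0), 1, 1000))  # longer axis gets fewer divisions
```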

### Storing Input Sets

InputSet objects follow the `MSONable` pattern used throughout pymatgen. This means that an InputSet, including its structure or molecule, can be converted to a dictionary and stored in a database, which is very useful for running high-throughput calculations and workflows.

In [None]:
relax_set.as_dict()
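The dictionary above can be turned back into an input set with the MSONable `from_dict` classmethod (for example `MPRelaxSet.from_dict(d)`). The toy class below mimics that convention with only the standard library, to show why it suits databases: `as_dict` produces plain JSON-compatible data tagged with `@module` and `@class` keys (as monty's `MSONable` does), and `from_dict` rebuilds the object. `InputSetSketch` is a hypothetical class, not part of pymatgen.

```python
import json


class InputSetSketch:
    """Hypothetical toy class following the MSONable as_dict/from_dict convention."""

    def __init__(self, species, incar):
        self.species = species
        self.incar = incar

    def as_dict(self):
        # Bookkeeping keys record which class to rebuild
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "species": self.species,
            "incar": self.incar,
        }

    @classmethod
    def from_dict(cls, d):
        # Ignore the bookkeeping keys and rebuild from the payload
        return cls(species=d["species"], incar=d["incar"])


original = InputSetSketch(["Si", "Si"], {"ENCUT": 520, "ISPIN": 2})
text = json.dumps(original.as_dict())   # what you might store in a database
restored = InputSetSketch.from_dict(json.loads(text))
print(restored.species, restored.incar)
```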

## Lesson 3: Automating DFT with Atomate

The final section of this lesson focuses on automating DFT with our `atomate` code. `atomate` is a set of recipes for computing properties of both molecules and structures. The workflows in `atomate` run on `fireworks`, our workflow management software, which stores workflow information and calculation summaries in MongoDB. Using this infrastructure, MP routinely manages 10,000 simultaneous calculations on supercomputers such as Cori at NERSC.

Let's begin by loading a basic silicon structure:

In [None]:
from pymatgen.core.structure import Structure
si = Structure.from_file("Si.CIF")

Now, we'll make a workflow to optimize our structure

In [None]:
from atomate.vasp.workflows import wf_structure_optimization

In [None]:
wf = wf_structure_optimization(structure=si)
print(wf)

Printing the workflow doesn't tell us much, but we can see that it contains only one Firework. A Firework is a single job on a supercomputer, usually involving just a few DFT calculations. To show how easily atomate handles more complex tasks, let's compute the full band structure of silicon:

In [None]:
from atomate.vasp.workflows import wf_bandstructure

wf = wf_bandstructure(structure=si)
print(wf)

Now we can see there are a lot more Fireworks in this workflow, but it's still hard to tell what is going on. Let's start by looking at the whole workflow as a graph.

We'll use a function built for this workshop to plot what the workflow looks like.

In [None]:
from mp_workshop.atomate import wf_to_graph

In [None]:
wf_to_graph(wf)

This is clearly a more complex workflow: after an intermediate step, it computes the full band structure of Si both along the high-symmetry lines of the Brillouin zone and on a uniform k-point grid.
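A workflow like this is a directed acyclic graph: each step runs once all of its parents have finished, and steps that share a parent can run in parallel. FireWorks schedules in this spirit (a Firework becomes READY once its parents are COMPLETED). The sketch below uses Python's `graphlib` (3.9+) to group steps into "waves" that could run concurrently. The step names and dependencies are illustrative assumptions, so compare them with the graph plotted above; `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def execution_waves(deps):
    """Hypothetical helper: group steps into waves whose parents are all done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as completed
    return waves


# Illustrative band-structure-style workflow: step -> set of parent steps
deps = {
    "structure optimization": set(),
    "static": {"structure optimization"},
    "nscf uniform": {"static"},
    "nscf line": {"static"},
}
for i, wave in enumerate(execution_waves(deps), start=1):
    print(i, wave)
```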


To make this workshop more useful, we've provided a set of fake VASP inputs and outputs and a helper function that lets atomate "run" these calculations. In reality, these calculations take much more CPU time and require a VASP license, which we can't provide in this workshop. Let's use this fake-VASP utility to "run" the workflow.

In [None]:
from mp_workshop.atomate import use_fake_vasp_workshop

Now let's run the workflow above. First, we have to add it to our LaunchPad. FireWorks wraps the database in an object called a LaunchPad, which lets you submit and query workflows from anywhere you have database access. Let's get a LaunchPad object so we can submit our workflow:

In [None]:
from fireworks import LaunchPad

In [None]:
lp = LaunchPad.auto_load()

Since this is a fresh database, we first have to initialize it. In everyday use, you only need to do this once, when the database is created, because `reset` erases every workflow stored in the LaunchPad.

In [None]:
# Initialize the LaunchPad database; this deletes any existing workflows
lp.reset(password=None, require_password=False)

In [None]:
wf = use_fake_vasp_workshop(wf)
lp.add_wf(wf)

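Now let's check the status of our workflow in the LaunchPad. `get_wf_summary_dict` takes the ID of a Firework in the workflow; since we just reset the database, the IDs start at 1: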

In [None]:
lp.get_wf_summary_dict(1)

Normally we don't run these calculations in a notebook, but on a supercomputer. The `qlaunch` command automatically submits jobs to the supercomputer's queue, while `rlaunch` runs them locally. Let's check how to use `rlaunch`:

In [None]:
!rlaunch --help

Let's run one job in rapidfire mode:

In [None]:
!rlaunch rapidfire --nlaunches 1
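Conceptually, each launch picks a job whose parents have all completed, runs it, and marks it completed, which in turn can make its children runnable. The in-memory sketch below mimics that loop with a cap on the number of launches, similar in spirit to `--nlaunches`. `rapidfire_sketch` is a hypothetical simulation, not FireWorks code; the real `rlaunch` works against the LaunchPad database and also accounts for priorities, job states, and failures.

```python
def rapidfire_sketch(deps, max_launches):
    """Hypothetical simulation: run up to max_launches READY jobs, one at a time.

    deps maps each job to the set of jobs it waits on. A job is READY when
    all of its parents are completed. Returns the jobs run, in order.
    """
    completed = []
    for _ in range(max_launches):
        ready = sorted(
            job for job, parents in deps.items()
            if job not in completed and parents.issubset(completed)
        )
        if not ready:
            break  # nothing left that can run
        completed.append(ready[0])  # "run" the first READY job
    return completed


deps = {
    "structure optimization": set(),
    "static": {"structure optimization"},
    "nscf uniform": {"static"},
    "nscf line": {"static"},
}
print(rapidfire_sketch(deps, 1))
```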

Try running with `nlaunches` unset and see what happens.

In [None]:
!rlaunch rapidfire 