In [None]:
import pickle
from itertools import permutations

import numpy as np
from atomate2.forcefields.flows.phonons import PhononMaker
from atomate2.forcefields.jobs import ForceFieldRelaxMaker
from fireworks import LaunchPad
from jobflow import SETTINGS, Flow, job
from jobflow.managers.fireworks import flow_to_workflow
from jobflow.managers.local import run_locally
from mp_api.client import MPRester
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.core import Composition, Structure
from pymatgen.core.periodic_table import Element
from pymatgen.entries.compatibility import MaterialsProject2020Compatibility
from pymatgen.entries.computed_entries import ComputedEntry
from pymatgen.phonon.bandstructure import PhononBandStructureSymmLine
from pymatgen.phonon.dos import PhononDos
from pymatgen.phonon.plotter import PhononBSPlotter, PhononDosPlotter
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
from tqdm import tqdm

In [None]:
""" Configure MongoDB for atomate2 & FireWorks """

# The lecturer will tell you the IP address
IP_ADDRESS = "INSERT_IP_HERE"

import os

os.environ["ATOMATE2_CONFIG_FILE"] = f"{os.getcwd()}/atomate2/config/atomate2.yaml"
os.environ["JOBFLOW_CONFIG_FILE"] = f"{os.getcwd()}/atomate2/config/jobflow.yaml"

!sed -i 's/INSERT_IP_HERE/{IP_ADDRESS}/' my_launchpad.yaml
!sed -i 's/INSERT_IP_HERE/{IP_ADDRESS}/' atomate2/config/jobflow.yaml

In this notebook, we will leverage Atomate2 to automate relaxations of all generated garnet structures and to compute the dynamic stability for the most promising candidates.

### Introduction to Atomate2 Workflows

 Atomate2 provides a modular, extensible framework built on top of several core packages—Jobflow, Custodian, Fireworks, Emmet, Maggma and Pymatgen—that together streamline high‐throughput computational materials science.

![Atomate2 workflow schema](atomate2_schema.png)
[Image source](https://members.cecam.org/storage/presentation/atomate2_intro-1742285569.pdf) by Alex Ganose

We will discuss some concepts from the most relevant user-facing libraries.

### Workflow Management with Jobflow and Fireworks

- **Jobflow**  
  Jobflow provides high-level workflow abstractions: “jobs” (atomic tasks, such as a static DFT calculation) and “flows” (collections of jobs connected by dependencies).
  
   By encapsulating each step of our calculation (structure preparation, DFT relaxation, phonon analysis) as a Jobflow job, we can chain the steps together automatically and handle branching logic and dynamic numbers of jobs. A phonon workflow, for example, generates a number of displaced-structure calculations that depends on the symmetry of the input structure.
   
Below you can find the most basic job example that adds two numbers.

In [None]:
# adapted from jobflow tutorial https://materialsproject.github.io/jobflow/tutorials/1-quickstart.html


@job
def add(a, b):
    return a + b


add_first = add(1, 2)
add_second = add(add_first.output, 3)
flow = Flow([add_first, add_second])

Calling `add(1, 2)` does not perform the addition; it returns a job object. The reference `add_first.output` can be passed as an input to a second job.

The job `add_second` can only run after `add_first` has been executed successfully.

A Flow is a collection of other jobs or Flow objects. The order in the list [add_first, add_second] does not matter, as the execution order is determined by their dependencies. In this case, add_second requires the output of add_first as an input. This dependency can be visualized using:

In [None]:
flow.draw_graph(figsize=(3, 3)).show()
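To illustrate why the order of the list passed to `Flow` does not matter, here is a minimal sketch that resolves an execution order purely from dependencies. It uses the standard-library `graphlib` module rather than Jobflow itself, and the `dependencies` mapping is a hypothetical stand-in for the flow above.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a flow: each job name maps to the jobs it depends on.
# The dictionary deliberately lists add_second first.
dependencies = {
    "add_second": {"add_first"},  # add_second consumes add_first.output
    "add_first": set(),  # add_first needs no output from other jobs
}


def execution_order(deps):
    """Return one valid execution order for a dependency mapping."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Whatever order the jobs are listed in, `add_first` is scheduled before `add_second`.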

So far the additions have not been executed. To run the flow, we can use the `run_locally` function:

In [None]:
responses = run_locally(flow)
print("Final result:", responses[add_second.uuid][1].output)

The results of the jobs are stored in the JobStore, a MongoDB database. For testing purposes we could also use a `MemoryStore` that only persists in the current process. Larger documents (MongoDB limits BSON documents to 16 MB) can be stored in a GridFS or AWS S3 store. The different stores are defined in the maggma package. The database was already set up for the tutorial, and you were provided with the YAML files that contain the credentials to access it.
While we commonly create jobs and workflows on our workstations, we usually run them on HPC resources. For this purpose we will use `fireworks`.

[**Fireworks**](https://materialsproject.github.io/fireworks/)  

<img src="fireworks_schema.png" alt="Fireworks" style="width:80%;"/>


  Fireworks serves as an interface to the HPC resources we are using. It is responsible for:


  1. **Queuing & Scheduling**: Dispatching jobs onto HPC clusters and, for example, keeping a constant number of jobs in the queue.    
  2. **Dependency Resolution**: E.g. ensuring that a phonon calculation only starts after its parent relaxation job has successfully completed.

  You will notice that dependency resolution is already included in Jobflow. Fireworks predates Jobflow, and [Jobflow Remote](https://matgenix.github.io/jobflow-remote/index.html#) is currently being developed as an alternative to Fireworks. However, as it is still in beta, we will use Fireworks in this tutorial.
  
  Fireworks defines `Firework` and `Workflow` classes that correspond to the Jobflow `Job` and `Flow`. `flow_to_workflow` and `job_to_firework` convert Jobflow objects into their Fireworks counterparts.

In [None]:
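# Keyword arguments passed to add() are forwarded to the function itself,
# so we set the job names as attributes instead
add_first = add(1, 2)
add_first.name = "add_first"
add_second = add(add_first.output, 3)
add_second.name = "add_second"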
flow = Flow([add_first, add_second])

wf = flow_to_workflow(flow)

lpad = LaunchPad.auto_load()
lpad.add_wf(wf)

Adding the workflow to the LaunchPad stores it in the FireWorks MongoDB database, where it waits for execution. You can now run
`lpad get_fws` in your terminal to see the status of all FireWorks jobs. You can also use `lpad webgui` as a graphical user interface.

You will see that each job was assigned a unique FireWorks id and that one job is **READY** while the second is waiting for the result of the first, as we expect.


You could now connect to your HPC resources and automatically submit jobs to the queue using `qlaunch rapidfire --nlaunches <N>`.

This would submit N jobs to the queue, where each job submission follows the `my_qadapter.yaml` settings in your FireWorks config directory.
In the qadapter file you can specify e.g. the run time of the job, the number of nodes, MPI settings, and any commands that should run before the job, such as loading an environment.

In our case we will run the jobs directly in an interactive session using:

`rlaunch -w /path/to/fw_config/my_specific_worker.yaml rapidfire` 


The `rlaunch` command connects to the LaunchPad database and queries for any jobs that are marked as **READY** and either have no worker assigned or are assigned to the worker specified in `/path/to/fw_config/my_specific_worker.yaml`.
If it is submitted as part of e.g. a Slurm job, it keeps running jobs until either:
1. The script exhausts its allotted wall time, or  
2. No more jobs remain in the **READY** state.
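The two stop conditions can be pictured with a toy launcher loop. This is only a sketch in plain Python, not how FireWorks is implemented; `rapidfire`, `ready_jobs`, `run_job` and `walltime_s` are hypothetical names chosen for illustration.

```python
import time


def rapidfire(ready_jobs, run_job, walltime_s):
    """Toy launcher: run READY jobs until the queue is empty or time runs out."""
    start = time.monotonic()
    completed = []
    while ready_jobs:
        if time.monotonic() - start >= walltime_s:
            break  # stop condition 1: allotted wall time exhausted
        job = ready_jobs.pop(0)
        run_job(job)
        completed.append(job)
    # stop condition 2: the loop ends on its own once no READY jobs remain
    return completed


finished = rapidfire(
    ["relax_1", "relax_2", "relax_3"], run_job=lambda job: None, walltime_s=60.0
)
print(finished)
```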

Now we are all set up in terms of workflows and management of our HPC resources. What remains is to discuss error handling and the actual calculation workflows.


### Error Handling with Custodian

- **Custodian**  
  When running large batches of calculations, most commonly DFT calculations, a number of them will fail. Custodian helps in finding these errors and automatically resolving them in many cases.
  Custodian invokes the DFT code and wraps each DFT invocation with a set of “handlers” that:
  1. **Monitor for Common Errors**: Parse the DFT output for known failure modes (e.g. incorrect smearing, frozen jobs, ...).  
  2. **Apply Automatic Fixes**: Modify INCAR or other input parameters (change `ALGO`, switch to a different `PREC`, adjust `EDIFF`) and restart the calculation.  
  3. **Check output for correctness**: E.g. too large errors in the final energy due to smearing
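The monitor, fix and restart cycle described above can be sketched in a few lines of plain Python. This is not Custodian's API: `run_with_handlers`, the `(check, fix)` handler pairs and the `max_steps` setting are hypothetical and only illustrate the control flow.

```python
def run_with_handlers(run_calculation, handlers, settings, max_errors=5):
    """Toy monitor/fix/restart loop. `handlers` is a list of (check, fix) pairs."""
    settings = dict(settings)
    for attempt in range(max_errors + 1):
        output = run_calculation(settings)
        # Collect the fixes of all handlers that detect a problem in the output
        corrections = [fix for check, fix in handlers if check(output)]
        if not corrections:
            return settings, output, attempt
        for fix in corrections:
            settings = fix(settings)
    raise RuntimeError("Too many errors; giving up.")


# Hypothetical calculation: it "converges" only once max_steps is at least 100.
def fake_calculation(settings):
    return {"converged": settings["max_steps"] >= 100}


handlers = [
    # check: not converged; fix: double the allowed number of steps
    (lambda out: not out["converged"], lambda s: {**s, "max_steps": s["max_steps"] * 2}),
]

final_settings, final_output, n_restarts = run_with_handlers(
    fake_calculation, handlers, {"max_steps": 30}
)
print(final_settings, n_restarts)
```

The `max_errors` cap matters in practice: without it, a calculation that no handler can repair would be restarted forever.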

Custodian can also be very useful to run DFT calculations without using any workflow software.
Custodian supports:
- FEFF 
- Gaussian
- JDFTx
- Lobster
- NWChem
- VASP

 **Caveats and Best Practices**  
  > **Warning: Custodian does not guarantee physically meaningful results.**  
  - For high‐throughput workflows, manual inspection of each output is impractical. Therefore, implement additional validation steps:  
    - **Statistical Outlier Detection:** Inspect distributions of total energies, band gaps, cell volumes, or forces to identify outliers.  
    - **Domain‐Knowledge Filters:** Flag chemically implausible results  
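As one possible form of statistical outlier detection, the sketch below flags values whose modified z-score, based on the median absolute deviation (MAD), exceeds a threshold. The function name, the threshold of 3.5 and the example energies are assumptions for illustration, not part of Custodian or atomate2.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (based on the MAD) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical: nothing to flag
    return [
        i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical energies per atom in eV; the fourth value mimics a failed run.
energies_per_atom = [-6.51, -6.48, -6.55, -1.20, -6.50, -6.47]
print(flag_outliers(energies_per_atom))
```

The median and MAD are used instead of the mean and standard deviation because a single badly failed calculation would otherwise distort the very statistics used to detect it.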

- **Pymatgen (Python Materials Genomics)**  
  Pymatgen provides an incredibly diverse computational materials science toolbox.
  In the case of atomate2 most relevant are classes for:
  1. **Structure I/O and Manipulation**: Reading/writing CIF/POSCAR, applying symmetry operations, substituting atomic species in the garnet prototype, and generating supercells for phonon calculations.  
  2. **DFT Input Generation**: Creating VASP input sets (INCAR, POSCAR, POTCAR, KPOINTS) with standardized parameters (e.g., recommended pseudopotentials, convergence criteria, Hubbard U for transition metals).  
  3. **Post‐Processing**: Parsing output files (OUTCAR, vasprun.xml) to extract energies, forces, phonon frequencies from Phonopy, and assembling them into convenient Python objects for downstream analysis.

Every relaxation and phonon job in our Atomate2 flow relies on Pymatgen to define the input structure, write input files, parse output files and interpret results.

### Metadata and Databases with Emmet

- **Emmet**  
  Emmet serves as Atomate2’s metadata schema and database interface. After each Jobflow job completes successfully, Emmet organizes the inputs and outputs of a calculation into a consistent `TaskDocument` schema.
  
  By checking that all results conform to the respective schema, it acts as an additional validation step.  

We will start by loading the YIG structure we retrieved from the OQMD before.
Then we will replace its elements with other element combinations from the periodic table.
We will limit ourselves to A3B5N12 compounds and exclude radioactive elements and noble gases, which are unlikely to result in a stable compound.
We will also load the compositions of all the previously calculated nitride garnets to avoid calculating any duplicates.

In [None]:
yig_conventional = Structure.from_file("../data/YIG_POSCAR")
with open("../data/garnet_compositions_N.pickle", "rb") as f:
    garnet_compositions = pickle.load(f)

In [None]:
selected_elements = []
for el in Element:
    if not el.is_radioactive and not el.is_noble_gas:
        selected_elements.append(el)

third_element = Element.N

new_structures = []
for e1, e2 in tqdm(
    permutations(selected_elements, 2),
    total=len(selected_elements) * (len(selected_elements) - 1),
    desc="Generating structures",
):
    new_struct = yig_conventional.copy()

    new_mapping = {"Y": e1.symbol, "Fe": e2.symbol, "O": third_element.symbol}
    new_struct.replace_species(new_mapping)
    # Skip compositions that were already calculated in Alexandria
    if new_struct.composition not in garnet_compositions:
        new_structures.append(new_struct)
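The loop above iterates over ordered pairs of elements, so (A, B) and (B, A) yield two different garnets. A small sketch with a hypothetical three-element subset shows the resulting count of n × (n − 1) combinations:

```python
from itertools import permutations

# Hypothetical three-element subset; the notebook loops over most of the periodic table.
symbols = ["Ca", "Si", "Al"]
pairs = list(permutations(symbols, 2))

# Ordered pairs: (A, B) gives A3B5N12, while (B, A) gives B3A5N12.
formulas = [f"{a}3{b}5N12" for a, b in pairs]
print(len(pairs), formulas)
```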

The next step is to relax the generated crystal structures using a universal force field. We can use the energy obtained in this fashion to compute the phase diagram, i.e. the thermodynamic stability of the structures.

We will use the `ForceFieldRelaxMaker` to create the relaxation jobs and attach some additional metadata to each job.
**Set `your_worker_name` to the first letter of your first name followed by your family name, e.g. `jsmith` for John Smith.**

In [None]:
# Draw 20 distinct indices (replace=False avoids picking the same structure twice)
random_indices = np.random.choice(len(new_structures), size=20, replace=False)
your_worker_name = "test"
metadata = {
    "user": your_worker_name,
    "project": "Garnet Tutorial",
    "description": "Force field relaxation starting from YIG structure with different elements.",
}

Everybody in the tutorial randomly selects 20 of the structures to relax. You can then query the database for the results of everybody. If you want you could add your username to the query to only select the structures you calculated.

In [None]:
for index in random_indices:
    # Create a force field maker, we use MACE-MP-0 as our force field
    relax_maker = ForceFieldRelaxMaker(
        force_field_name="MACE-MP-0",
        calculator_kwargs={"device": "cuda"},
        fix_symmetry=True,
    )

    relax_job = relax_maker.make(structure=new_structures[index])
    # we update the config so if you run rlaunch with your worker name,
    # it will only find your jobs
    relax_job.update_config({"manager_config": {"_fworker": your_worker_name}})
    # store a plain Python int rather than a NumPy integer
    metadata["structure_index"] = int(index)
    # We add some extra information to the job
    relax_job.update_metadata(metadata)
    workflow = flow_to_workflow(relax_job)
    lpad = LaunchPad.auto_load()
    lpad.add_wf(workflow)

Now you can use `rlaunch rapidfire` to run the jobs on the cluster. Once the first relaxation is complete, you can retrieve the initial results from the store.
We will use the metadata to find the relaxations, but we could query for any field in the document.
The query returns an iterator over dictionaries, which we collect into a list. We limit the output to the composition, energy, metadata and lattice. Requesting the full calculation output would take a minute or two. We query for the calculations of everybody running the tutorial and remove any duplicates, so you can rerun the query at any point during the school to see whether any new interesting structures were discovered, or submit more relaxation jobs if you want.

In [None]:
store = SETTINGS.JOB_STORE
store.connect()
results = list(
    store.query(
        # you could add your worker name here to filter for your calculations
        {
            "metadata.description": "Force field relaxation starting from YIG structure with different elements."
        },
        # only load the properties we need
        properties=[
            "output.composition",
            "output.output.energy",
            "metadata",
            "output.structure.lattice",
        ],
        load=True,
    )
)
# select entries with unique metadata.structure_index
results.sort(key=lambda x: x["metadata"]["structure_index"])
unique_results = {}
for result in results:
    unique_results[result["metadata"]["structure_index"]] = result

In [None]:
# Extracting energies, compositions, lattice constants, and metadata from unique results to make it easier to work with
results_energies = [
    result["output"]["output"]["energy"] for result in unique_results.values()
]
results_compositions = [
    Composition.from_dict(result["output"]["composition"])
    for result in unique_results.values()
]
results_lattice_constant = [
    result["output"]["structure"]["lattice"]["a"] for result in unique_results.values()
]
results_metadata = [result["metadata"] for result in unique_results.values()]

Using the energies we just obtained, we can assess the thermodynamic stability of the crystal structures by querying a database for all entries in the same chemical system.
We use the MP REST API (via `mpr.get_entries_in_chemsys(...)`) to retrieve a list of all existing computed entries in the specified chemical system. Each entry is a `ComputedEntry` (or subclass) with:

- A composition (e.g. LiFeO₂, Li₂O, Fe₂O₃, etc.)

- A total DFT energy (as computed with GGA or GGA+U settings)

- Sometimes additional metadata (e.g. run parameters)

The extra filter `{"thermo_types": ["GGA_GGA+U"]}` tells MP to return only those entries whose energies were calculated with GGA+U, or plain GGA if U was not required. This avoids mixing in, for example, the r2SCAN calculations in the Materials Project.

Why it matters:
Building a meaningful phase diagram requires a consistent set of energies. Mixing results from different exchange–correlation functionals (e.g., PBE vs. SCAN) can artificially distort the convex hull.
The Materials Project has developed a set of [corrections](https://docs.materialsproject.org/methodology/materials-methodology/thermodynamic-stability/thermodynamic-stability) to ensure consistency between GGA and GGA+U calculations and to reduce systematic formation energy errors.

Applying the full set of corrections would require knowledge of the +U parameters and pseudopotentials used to calculate the force field's training data. That training data should be consistent with the `MPRelaxSet` defined in pymatgen.
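To make the notion of "energy above the convex hull" concrete, here is a minimal sketch for a binary A–B system in plain Python. pymatgen's `PhaseDiagram` handles the general multi-component case; the functions and all numbers below are hypothetical and serve only as an illustration. Each point is (fraction of B, formation energy per atom).

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points via a monotone-chain scan."""
    best = {}
    for x, e in points:  # keep only the lowest energy at each composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        # Drop the last hull point while it lies on or above the line to p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(candidate, references):
    """Distance of `candidate` to the hull built from references plus candidate."""
    x, e = candidate
    hull = lower_hull(references + [candidate])
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the range spanned by the references")


# Hypothetical references: elements A and B (zero by definition) and a stable AB phase.
references = [(0.0, 0.0), (0.5, -0.4), (1.0, 0.0)]
print(energy_above_hull((0.25, -0.15), references))  # lies above the A-AB tie line
print(energy_above_hull((0.75, -0.30), references))  # lies below the AB-B tie line
```

A candidate that lies below the existing tie line becomes part of the hull itself, so its distance to the hull is zero. This is why the new entry is added to the phase diagram before its distance is evaluated.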

In [None]:
compat = MaterialsProject2020Compatibility()
results_ehull = []
computed_entries = []
# Enter your Materials Project API key here
# You can obtain one from https://materialsproject.org/dashboard/profile
with MPRester("YOUR_API_KEY") as mpr:
    for i, (comp, E_tot) in enumerate(zip(results_compositions, results_energies)):
        try:
            # 1) Extract the chemical system (unique element symbols) from this structure
            element_symbols = sorted({el.symbol for el in comp.elements})

            # 2) Query MP for all existing computed entries in that chem‐system that were calculated with (GGA/GGA+U)
            entries_in_system = mpr.get_entries_in_chemsys(
                elements=element_symbols,
                additional_criteria={"thermo_types": ["GGA_GGA+U"]},
            )
            # 3) Create a ComputedEntry for your “new” structure (with your force-field energy)
            entry = ComputedEntry(composition=comp, energy=E_tot)
            # In principle we should obtain the Hubbard parameters/pseudopotential information from the training data and set them here to enable full corrections.
            entry.parameters["software"] = "non-vasp"
            entry.parameters["run_type"] = "GGA"
            # apply corrections to the entry
            compat.process_entry(entry)

            pd = PhaseDiagram(entries_in_system + [entry])
            e_above_hull = pd.get_e_above_hull(entry)
            entry.data = {
                "e_above_hull": e_above_hull,
                "structure_index": results_metadata[i]["structure_index"],
            }
            print(results_metadata[i], "e_above_hull:", e_above_hull)
            computed_entries.append(entry)
            results_ehull.append(e_above_hull)
        except Exception as e:
            print(f"Error processing entry {i}: {e}")

We can filter the structures we calculated and see if any fall within our E-Hull and lattice constant criteria defined in the first notebook.

In [None]:
max_ehull_criterion = 0.05
ideal_lattice_constant = yig_conventional.lattice.a
filtered_results = []
for entry in computed_entries:
    # Look up the matching result by composition: entries that failed above are
    # missing from computed_entries, so positions in the two lists can differ.
    index = results_compositions.index(entry.composition)
    ehull = entry.data["e_above_hull"]
    lattice_a = results_lattice_constant[index]
    if ehull < max_ehull_criterion and abs(lattice_a - ideal_lattice_constant) < 0.5:
        filtered_results.append(
            {
                "metadata": results_metadata[index],
                "composition": results_compositions[index],
                "energy": results_energies[index],
                "lattice_a": lattice_a,
                "ehull": ehull,
            }
        )

filtered_results.sort(key=lambda x: x["ehull"])
# Entries on the hull have e_above_hull == 0 up to numerical noise
stable_structures = [result for result in filtered_results if result["ehull"] < 1e-8]

if len(filtered_results) == 0:
    print("No structures found that fit the criteria.")
else:
    if len(stable_structures) > 1:
        # Several stable candidates: pick the one closest to the ideal lattice constant
        # (list.sort() returns None, so we use min() instead)
        best_result = min(
            stable_structures,
            key=lambda x: abs(x["lattice_a"] - ideal_lattice_constant),
        )
    else:
        best_result = filtered_results[0]
    print(
        f"User {best_result['metadata']['user']} calculated the best candidate structure "
        f"{best_result['composition']}\n"
        f"with a distance to the convex hull of {best_result['ehull']} "
        f"fitting our lattice constant criteria at {best_result['lattice_a']}\n"
    )

After structural relaxation, the first derivative of the total energy with respect to the atomic positions is expected to be zero, i.e., the forces on all atoms vanish.
Building on this, we can ask: is that stationary point a true minimum, or could it be a saddle point? That is exactly what **dynamic stability** addresses.

If you are familiar with phonons, you can skip this cell.

### Stationary point versus local minimum

* **Zero forces = stationary point**
  When you relax (optimize) atomic positions in DFT (or any other method), you drive all forces on atoms to (near) zero.  Formally, if $\mathbf{R} = \{R_{i\alpha}\}$ denotes all atomic coordinates ($i$ labels atom, $\alpha\in\{x,y,z\}$), then

  $$
  \frac{\partial E_{\mathrm{tot}}}{\partial R_{i\alpha}} \;=\; 0
  \quad\forall\,i,\alpha.
  $$

Vanishing forces alone do not tell us whether the stationary point is a minimum. For that we examine the second derivatives of the energy, i.e., the **Hessian**:

  $$
  H_{i\alpha,j\beta}
  \;=\;
  \frac{\partial^2 E_{\mathrm{tot}}}{\partial R_{i\alpha}\,\partial R_{j\beta}}
  \quad\text{for all atoms }i,j\text{ and Cartesian directions }\alpha,\beta.
  $$
* In practice, one often works with the **mass‐weighted Hessian** (sometimes called the dynamical matrix) defined by

  $$
  D_{i\alpha,j\beta}
  \;=\;
  \frac{1}{\sqrt{m_i\,m_j}}
  \;H_{i\alpha,j\beta},
  $$

  where $m_i$ is the mass of atom $i$.  Diagonalizing $D$ yields squared vibrational frequencies.
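As a worked example of the mass-weighted Hessian, consider two masses joined by a single spring in one dimension. The sketch below builds the 2 × 2 matrix $D$ and evaluates its eigenvalues in closed form; the function name and the numerical values are arbitrary choices for illustration.

```python
import math


def diatomic_frequencies_squared(k, m1, m2):
    """Eigenvalues of the 2x2 mass-weighted Hessian of two masses joined by a spring."""
    # The Hessian of E = k/2 * (x1 - x2)^2 is [[k, -k], [-k, k]].
    d11, d22 = k / m1, k / m2
    d12 = -k / math.sqrt(m1 * m2)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    mean = 0.5 * (d11 + d22)
    radius = math.hypot(0.5 * (d11 - d22), d12)
    return mean - radius, mean + radius


omega2_translation, omega2_stretch = diatomic_frequencies_squared(k=10.0, m1=1.0, m2=4.0)
print(omega2_translation, omega2_stretch)
```

One eigenvalue vanishes, since a rigid translation of both masses costs no energy, and the other equals $k\,(1/m_1 + 1/m_2)$, the familiar squared stretching frequency of a diatomic molecule.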

### Phonon modes and frequencies

* A **phonon mode** at wavevector $\mathbf{q}$ is essentially a collective pattern of atomic displacements oscillating with frequency $\omega(\mathbf{q})$.
* To test **dynamic stability** in an infinite crystal, one builds the dynamical matrix $D(\mathbf{q})$ (through finite‐differences or density‐functional perturbation theory) and diagonalizes it at high-symmetry $\mathbf{q}$'s in the Brillouin zone.
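* If, for every $\mathbf{q}$, **all eigenvalues $\omega^2(\mathbf{q})$ are non-negative**, then $\omega(\mathbf{q})$ is real for all modes, meaning any small displacement will oscillate rather than spontaneously grow. (The three acoustic modes at $\mathbf{q}=0$ have $\omega = 0$ because they correspond to rigid translations of the crystal.)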
* Conversely, if there exists any $\mathbf{q}$ where $\omega^2(\mathbf{q})<0$, then $\omega(\mathbf{q})$ is imaginary.  Equivalently, the PES has a “downhill” direction for that wavevector—i.e., the structure will spontaneously distort along that phonon eigenvector.  This signals **dynamic instability**.
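The stability criterion can be tried out on a toy model: a one-dimensional monatomic chain with first- and second-neighbour springs, for which $\omega^2(q) = \frac{2}{m}\left[k_1(1-\cos qa) + k_2(1-\cos 2qa)\right]$. The sketch below is an illustration with made-up spring constants, not part of the atomate2 workflow.

```python
import math


def omega_squared(q, k1, k2, mass=1.0, a=1.0):
    """Squared frequency of a 1D monatomic chain with 1st and 2nd neighbour springs."""
    return (2.0 / mass) * (
        k1 * (1 - math.cos(q * a)) + k2 * (1 - math.cos(2 * q * a))
    )


def is_dynamically_stable(k1, k2, n_q=101, tol=1e-10):
    """Sample q on [0, pi/a] (with a = 1) and require omega^2 >= 0 everywhere."""
    qs = [math.pi * i / (n_q - 1) for i in range(n_q)]
    return all(omega_squared(q, k1, k2) >= -tol for q in qs)


print(is_dynamically_stable(k1=1.0, k2=0.0))  # nearest-neighbour springs only
print(is_dynamically_stable(k1=1.0, k2=-0.5))  # negative second-neighbour spring
```

With the negative second-neighbour spring, $\omega^2$ turns negative at small $q$ because $k_1 + 4k_2 < 0$, so that chain is dynamically unstable against long-wavelength distortions.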


We can take a look at how atomate2 automates phonon calculations by exploring the graph of the workflow.

In [None]:
maker = PhononMaker()
phonon_flow = maker.make(yig_conventional)
phonon_flow.draw_graph(figsize=(24, 12)).show()

We start in the bottom‐left corner:

* Relax the structure.
* Determine the supercell size. We will use a 2 × 2 × 2 cell, so we will skip this step. The supercell size determines at which q-points the phonon frequencies are calculated exactly.
* Generate supercells in which atoms are displaced by a small amount, so that the second derivatives can be computed numerically from the resulting forces.
* Run `run_phonon_displacements`, which creates jobs for static calculations on each displaced supercell. The number of these jobs is determined automatically based on how many displaced structures exist. The generation of displaced structures and, later, the calculation of the force constants and phonon frequencies are implemented in phonopy.
* In parallel, perform a static calculation to obtain the energy of the unperturbed structure.
* Finally, `generate_frequencies_eigenvectors` collects all the data and interpolates the phonon band structure, density of states, and thermal displacements at different temperatures.
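The idea behind the displacement step can be shown in one dimension: the force constant is the negative derivative of the force with respect to a displacement, estimated from forces at small positive and negative displacements. The potential and parameters below are hypothetical; phonopy does the equivalent for every symmetry-inequivalent displacement in the supercell.

```python
def force(x, k=2.0, c=0.3):
    """Force of a hypothetical 1D potential E(x) = k/2 * x^2 + c * x^4."""
    return -k * x - 4.0 * c * x**3


def force_constant(force_fn, displacement=0.01):
    """Central finite difference: Phi = -dF/dx evaluated at equilibrium x = 0."""
    return -(force_fn(displacement) - force_fn(-displacement)) / (2.0 * displacement)


phi = force_constant(force)
print(phi)
```

The estimate is close to the harmonic constant `k`, with a small contribution from the anharmonic term that shrinks as the displacement is reduced.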


In [None]:
# Select one of the candidate structures for a phonon calculation

result = store.query_one(
    {"metadata.structure_index": filtered_results[0]["metadata"]["structure_index"]},
    properties=["output.structure"],
    load=True,
)
structure = Structure.from_dict(result["output"]["structure"])

In [None]:
maker = PhononMaker(
    use_symmetrized_structure="primitive", create_thermal_displacements=False
)
# Build the flow for the selected candidate (not for YIG as in the cell above)
phonon_flow = maker.make(
    structure, supercell_matrix=[[2, 0, 0], [0, 2, 0], [0, 0, 2]]
)

metadata = {
    "structure": structure.reduced_formula,
    "user": your_worker_name,
    "comment": "Checking dynamical stability of Garnet structure",
    "project": "Garnet Tutorial",
}
# Update metadata and worker assignment before converting to a FireWorks workflow
phonon_flow.update_metadata(metadata)
phonon_flow.update_config({"manager_config": {"_fworker": your_worker_name}})

phonon_workflow = flow_to_workflow(phonon_flow)
lpad = LaunchPad.auto_load()
lpad.add_wf(phonon_workflow)
phonon_flow.draw_graph(figsize=(24, 12)).show()

As you can see in the graph, by fixing the supercell size we removed the get_supercell job, but by requesting that the structure be reduced to its primitive cell we added an additional job. You can now run `rlaunch rapidfire` again to start the phonon calculation. Below, we query the store for the result and examine the phonon band structure to determine whether the structure is dynamically stable. Pymatgen provides classes to handle and visualize this data.

In [None]:
phonon_results = store.query_one(
    {
        # jobflow stores the job name in the top-level "name" field
        "name": "generate_frequencies_eigenvectors",
        "metadata.user": your_worker_name,
    },
    properties=["output.phonon_dos", "output.phonon_bandstructure"],
    load=True,
)

# DOS and band structure objects can be initialized from the result dictionaries
# (query_one returns a single document, not a list)
ph_bs = PhononBandStructureSymmLine.from_dict(
    phonon_results["output"]["phonon_bandstructure"]
)
ph_dos = PhononDos.from_dict(phonon_results["output"]["phonon_dos"])

dos_plot = PhononDosPlotter()
dos_plot.add_dos(label="a", dos=ph_dos)
dos_plot.get_plot()

bs_plot = PhononBSPlotter(bs=ph_bs)
bs_plot.get_plot()

### Beyond phonons: anharmonicity and finite temperature

* **Phonon‐based analysis is fundamentally a harmonic approximation** (energy expanded to second order in displacements).  A “soft mode” (imaginary $\omega$) within the harmonic picture may in reality be stabilized at finite temperature by anharmonic interactions. You could use the quasiharmonic workflow in atomate2 to get a first idea of the temperature dependent behaviour of the materials.

To obtain an accurate thermodynamic stability at finite temperature, we would have to calculate the energies of all phases on the convex hull at that temperature.



### DFT calculations
As a follow-up step we could repeat the relaxation and phonon workflows using actual density functional theory. We would only have to change a few lines of code and import, for example, the VASP `PhononMaker` instead of the force field one.
```python
from atomate2.vasp.flows.phonons import PhononMaker
```

## Exercises:
- Use OPTIMADE to query Alexandria for the convex hull data in addition to the Materials Project data. Will the distances to the convex hull increase or decrease?
- Try to integrate the distance to the convex hull calculation in the automated workflow.
- Relax some of the garnets downloaded from Alexandria and calculate the mean absolute error for the energies and lattice constants.