# Compute Decomposition Energies

The [GNoME](https://www.nature.com/articles/s41586-023-06735-9) dataset contributes hundreds of thousands of novel stable crystals beyond those in prior datasets. Although GNoME has updated the convex hulls of many chemical systems of interest, further research will likely continue to find lower-energy structures that update these hulls again.

In this Colab, we provide examples for computing the decomposition energy of a new structure relative to the entire GNoME dataset. This can be used to check whether a new structure is stable or, if it is not, to compute its distance to the convex hull.
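
To make "distance to the convex hull" concrete, here is a minimal pure-Python sketch for a two-element system. It is not how pymatgen computes the hull; the helper names (`lower_hull`, `hull_energy`, `energy_above_hull`) and the toy energies are hypothetical and chosen only for illustration. Each point is a pair of (fraction of element B, formation energy per atom).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fraction of B, formation energy per atom in eV)


def lower_hull(points: List[Point]) -> List[Point]:
    """Return the lower convex hull of 2D points, ordered by x (monotone chain)."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or above the segment
        # joining the point before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull: List[Point], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x2 > x1 and x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return y1 + t * (y2 - y1)
    raise ValueError(f"Composition {x} is outside the range covered by the hull.")


def energy_above_hull(known: List[Point], candidate: Point) -> float:
    """Energy of the candidate relative to the hull of the known points.

    A negative value means the candidate lies below the current hull.
    """
    hull = lower_hull(known)
    return candidate[1] - hull_energy(hull, candidate[0])


# Hypothetical toy data: pure A, pure B, and two compounds.
known = [(0.0, 0.0), (0.25, -0.1), (0.5, -0.4), (1.0, 0.0)]

print(lower_hull(known))                       # hull vertices only
print(energy_above_hull(known, (0.25, -0.1)))  # known compound above the hull
print(energy_above_hull(known, (0.75, -0.3)))  # new candidate below the hull
```

The rest of this notebook does the same thing in any number of dimensions by delegating the hull construction to pymatgen.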


# Import Libraries

In [None]:
!pip install pymatgen

In [None]:
import itertools
import json
import os
import pandas as pd

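# Composition lives in pymatgen.core; recent pymatgen releases no longer
# expose it at the package root, so alias the core module as `mg`.
from pymatgen import core as mg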
from pymatgen.entries.computed_entries import ComputedEntry
from pymatgen.analysis import phase_diagram

## Download the Dataset

In [None]:
PUBLIC_LINK = "https://storage.googleapis.com/"
BUCKET_NAME = "gdm_materials_discovery"

FOLDER_NAME = "gnome_data"
FILES = (
    "stable_materials_summary.csv",
)

EXTERNAL_FOLDER_NAME = "external_data"
EXTERNAL_FILES = (
    "external_materials_summary.csv",
)

def download_from_link(link: str, output_dir: str):
  """Download a file using wget."""
  os.system(f"wget {link} -P {output_dir}")

parent_directory = os.path.join(PUBLIC_LINK, BUCKET_NAME)
for filename in FILES:
  public_link = os.path.join(parent_directory, FOLDER_NAME, filename)
  download_from_link(public_link, '.')

for filename in EXTERNAL_FILES:
  public_link = os.path.join(parent_directory, EXTERNAL_FOLDER_NAME, filename)
  download_from_link(public_link, '.')
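
The cell above builds URLs with `os.path.join`, which joins with the operating system's path separator; that is `/` on Colab's Linux runtime but not everywhere. As a hedged alternative, the sketch below joins URL parts explicitly and downloads with the standard library instead of `wget`. The helper names `build_public_url` and `download_file` are hypothetical and not part of the original notebook.

```python
import urllib.request
from pathlib import Path

PUBLIC_LINK = "https://storage.googleapis.com/"
BUCKET_NAME = "gdm_materials_discovery"


def build_public_url(*parts: str) -> str:
    """Join URL parts with single forward slashes, independent of the OS."""
    return "/".join(part.strip("/") for part in parts)


def download_file(url: str, output_dir: str = ".") -> Path:
    """Download `url` into `output_dir`, keeping the remote file name."""
    target = Path(output_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


# Build the link for the GNoME summary file (no download is triggered here).
summary_url = build_public_url(
    PUBLIC_LINK, BUCKET_NAME, "gnome_data", "stable_materials_summary.csv"
)
print(summary_url)
```

Calling `download_file(summary_url)` would then fetch the file into the current directory.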

## Preprocess the GNoME Dataset



In [None]:
gnome_crystals = pd.read_csv('stable_materials_summary.csv', index_col=0)
gnome_crystals

Unnamed: 0.1,Unnamed: 0,Composition,MaterialId,Reduced Formula,Elements,NSites,Volume,Density,Point Group,Space Group,...,Corrected Energy,Formation Energy Per Atom,Decomposition Energy Per Atom,Dimensionality Cheon,Bandgap,Is Train,Decomposition Energy Per Atom All,Decomposition Energy Per Atom Relative,Decomposition Energy Per Atom MP,Decomposition Energy Per Atom MP OQMD
0,234772,Ac10Ag8Au12,719c008190,Ac5(Ag2Au3)2,"['Ag', 'Au', 'Ac']",30,811.0171,11.2541,m,Cm,...,-122.5326,-0.6458,-0.1537,3D,,True,0.0000,-0.0038,-0.1537,-0.0826
1,312926,Ac10Al4Os6,975d473348,Ac5Al2Os3,"['Al', 'Os', 'Ac']",20,549.9835,10.6257,23,I2_13,...,-127.1471,-0.1794,-0.0108,3D,0.0040,True,0.0000,-0.0011,-0.0107,-0.0107
2,189330,Ac10As1Sn6,5b98cce302,Ac10Sn6As,"['As', 'Sn', 'Ac']",17,638.2733,7.9536,-3m,P-31m,...,-82.5133,-0.7403,-0.4838,3D,,True,0.0000,-0.0003,-0.4838,-0.2422
3,390242,Ac10As9Te1,bcad131283,Ac10TeAs9,"['As', 'Te', 'Ac']",20,672.3420,7.5869,-1,P-1,...,-116.3520,-1.5032,-1.4232,3D,,True,0.0000,-0.0086,-1.4231,-0.0086
4,356097,Ac10Au1Os3,ac2f203221,Ac10Os3Au,"['Os', 'Au', 'Ac']",14,487.9847,10.3367,1,P1,...,-80.3566,-0.1563,-0.0907,3D,,True,0.0000,-0.0023,-0.0907,-0.0464
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
529315,178951,V3W5,56959be1e1,V3W5,"['V', 'W']",8,120.7753,14.7393,mmm,Cmmm,...,-92.7376,-0.0869,-0.0094,3D,0.0046,True,-0.0000,-0.0029,-0.0094,-0.0029
529316,300749,V8W3,9179586bc7,V8W3,"['V', 'W']",11,156.3502,10.1857,2/m,C2/m,...,-112.0538,-0.0462,-0.0088,3D,,True,0.0004,0.0004,-0.0088,0.0004
529317,501620,Y1Zn4Zr1,f254757fb4,YZrZn4,"['Zn', 'Y', 'Zr']",6,114.3604,6.4145,6/mmm,P6/mmm,...,-22.2242,-0.3618,-0.0151,3D,0.0040,False,0.0000,-0.0043,-0.0151,-0.0043
529318,319699,Y6Zn44,9aa63becc5,Y3Zn22,"['Zn', 'Y']",50,823.5730,6.8783,4/mmm,I4_1/amd,...,-107.3622,-0.2627,0.0001,,,,0.0001,0.0001,0.0001,0.0001


In [None]:
# This set contains all other entries on the convex hull that are not included
# in the definition of GNoME structures, as they have a matching composition in
# Materials Project / OQMD.
reference_crystals = pd.read_csv('external_materials_summary.csv')
reference_crystals

Unnamed: 0.1,Unnamed: 0,Composition,MaterialId,Reduced Formula,Elements,NSites,Corrected Energy,Decomposition Energy Per Atom All,Decomposition Energy Per Atom MP
0,29940,Ac4,98dee9c38d,Ac,['Ac'],4.0,-16.484703,0.0,-7.584912e-07
1,23204,Ac1Ag2Ge2,132e8b7d0b,Ac(AgGe)2,"['Ac', 'Ag', 'Ge']",5.0,-21.057337,0.0,-1.165034e-01
2,44185,Ac1Ag2Sn2,c0cf7ad279,Ac(AgSn)2,"['Ac', 'Ag', 'Sn']",5.0,-19.976854,0.0,-1.451672e-01
3,6084,Ac1Ag8Al4,5338e797c2,Ac(AlAg2)4,"['Ac', 'Al', 'Ag']",13.0,-44.512306,0.0,-6.966974e-02
4,3564,Ac2Al4Au4,c6efd90a00,Ac(AlAu)2,"['Ac', 'Al', 'Au']",10.0,-43.610382,0.0,-2.132195e-01
...,...,...,...,...,...,...,...,...,...
54472,47097,Pd2Zn2Zr2,b19a5a64ff,ZrZnPd,"['Zr', 'Zn', 'Pd']",6.0,-33.405891,0.0,-3.882178e-02
54473,10048,Pd2Zn1Zr1,dc694cf185,ZrZnPd2,"['Zr', 'Zn', 'Pd']",4.0,-22.807571,0.0,-1.617029e-02
54474,56135,Pt2Zn2Zr2,5b263880de,ZrZnPt,"['Zr', 'Zn', 'Pt']",6.0,-36.715942,0.0,-7.164972e-02
54475,2488,Rh3Zn3Zr3,5f6876c555,ZrZnRh,"['Zr', 'Zn', 'Rh']",9.0,-57.113941,0.0,-2.625296e-03


In [None]:
def annotate_chemical_system(crystals: pd.DataFrame) -> pd.DataFrame:
  """Annotate a summary DataFrame with the chemical system."""
  chemical_systems = []
  for elements in crystals['Elements']:
    # Replace single quotes with double quotes so the list parses as JSON,
    # which avoids having to use Python's eval.
    chemsys = json.loads(elements.replace("'", '"'))

    # Store the chemical system in sorted order to make lookup easier.
    chemical_systems.append(tuple(sorted(chemsys)))
  crystals['Chemical System'] = chemical_systems
  return crystals
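
The `Elements` column stores each list as a Python-style string such as `"['Ag', 'Au', 'Ac']"`. The cell above parses it by swapping quote characters and calling `json.loads`. A possible alternative, sketched here under the assumption that every value is a plain list literal, is `ast.literal_eval`, which parses Python literals without executing code. The helper name `parse_chemical_system` is hypothetical.

```python
import ast
from typing import Tuple


def parse_chemical_system(elements: str) -> Tuple[str, ...]:
    """Parse a stringified list of element symbols into a sorted tuple."""
    parsed = ast.literal_eval(elements)  # safe: only evaluates literals
    return tuple(sorted(parsed))


print(parse_chemical_system("['Ag', 'Au', 'Ac']"))
```

Sorting gives every chemical system a single canonical key, which is what makes the later group lookup work regardless of the order in which elements were listed.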

In [None]:
# Collect list of all convex hull entries
gnome_crystals = annotate_chemical_system(gnome_crystals)
reference_crystals = annotate_chemical_system(reference_crystals)
all_crystals = pd.concat([gnome_crystals, reference_crystals], ignore_index=True)

In [None]:
required_columns = ['Composition', 'NSites', 'Corrected Energy', 'Formation Energy Per Atom', 'Chemical System']
minimal_entries = all_crystals[required_columns]
grouped_entries = minimal_entries.groupby('Chemical System')

## Choose a Structure

In [None]:
# @title Provide Entry Details
# @markdown To compute the decomposition energy of a provided structure, please
# @markdown fill out the composition and Corrected Energy (the total energy in
# @markdown eV for the composition as written, not per atom) in the form below.
# @markdown If no composition is provided, a random structure will be chosen.

composition = '' # @param {type:"string"}
energy = 0.0 # @param {type:"number"}

if composition == '':
  print("No composition provided. Choosing a random crystal.")
  sample = gnome_crystals.sample()
  sample_entry = ComputedEntry(
      composition=sample['Composition'].item(),
      energy=sample['Corrected Energy'].item(),
  )
  chemsys = sample['Chemical System'].item()
else:
  composition = mg.Composition(composition)
  sample_entry = ComputedEntry(
      composition=composition,
      energy=energy,
  )
  chemsys = [str(el) for el in composition.elements]

No composition provided. Choosing a random crystal.


## Gather Entries from the Chemical System

Computing the decomposition energy requires computing the convex hull of the associated chemical system. To do so, we gather all other crystals in that chemical system, and in each of its subsystems, from the GNoME dataset together with the previously known convex hull entries.

In [None]:
# Gather other entries on the convex hull

def gather_convex_hull(chemsys):
  phase_diagram_entries = []

  for length in range(len(chemsys) + 1):
    for subsystem in itertools.combinations(chemsys, length):
      subsystem_key = tuple(sorted(subsystem))
      subsystem_entries = grouped_entries.groups.get(subsystem_key, [])

      if len(subsystem_entries):
        phase_diagram_entries.append(minimal_entries.iloc[subsystem_entries])

  phase_diagram_entries = pd.concat(phase_diagram_entries)

  # Convert to pymatgen ComputedEntry objects for use with the phase_diagram tooling
  mg_entries = []

  for _, row in phase_diagram_entries.iterrows():
    composition = row['Composition']
    # 'Corrected Energy' is the total corrected energy of the entry, not a
    # formation energy.
    corrected_energy = row['Corrected Energy']
    entry = ComputedEntry(composition, corrected_energy)
    mg_entries.append(entry)

  # Add a zero-energy entry for every element so that each element is
  # represented in the phase diagram
  for element in chemsys:
    elemental_entry = ComputedEntry(element, 0.0)
    mg_entries.append(elemental_entry)

  return mg_entries
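
The subsystem loop in `gather_convex_hull` is the key step: a hull for a ternary system also needs the entries of its three binaries and three elements. The standalone sketch below shows only that enumeration; `enumerate_subsystems` is a hypothetical helper, and unlike the cell above it skips the empty combination of length zero.

```python
import itertools
from typing import Iterable, List, Tuple


def enumerate_subsystems(chemsys: Iterable[str]) -> List[Tuple[str, ...]]:
    """List every non-empty subsystem of a chemical system as a sorted tuple."""
    elements = sorted(chemsys)
    subsystems = []
    for length in range(1, len(elements) + 1):
        # Combinations of a sorted sequence are themselves sorted, so each
        # tuple matches the sorted 'Chemical System' keys used for grouping.
        subsystems.extend(itertools.combinations(elements, length))
    return subsystems


for subsystem in enumerate_subsystems(["Zr", "Y", "Zn"]):
    print(subsystem)
```

A system with n elements has 2^n - 1 non-empty subsystems, so the lookup stays cheap for the system sizes typical of this dataset.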

In [None]:
mg_entries = gather_convex_hull(chemsys)

## Compute Phase Diagram

In [None]:
# Compute the convex hull for the phase diagram
diagram = phase_diagram.PhaseDiagram(mg_entries)

## Compute Decomposition Energies

In [None]:
# View the currently sampled entry
sample_entry

None ComputedEntry - Pr2 Y2 Tm16 Bi12 (PrY(Tm4Bi3)2)
Energy (Uncorrected)     = -161.8166 eV (-5.0568  eV/atom)
Correction               = 0.0000    eV (0.0000   eV/atom)
Energy (Final)           = -161.8166 eV (-5.0568  eV/atom)
Energy Adjustments:
  None
Parameters:
Data:

In [None]:
decomposition, decomposition_energy = diagram.get_decomp_and_e_above_hull(sample_entry, allow_negative=True)

In [None]:
# For a sample from GNoME, this number (in eV/atom) is likely to be <1e-3, as
# this was the threshold set for the data release.
print(f"Decomposition Energy: {decomposition_energy}.")

Decomposition Energy: 0.0.
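
Because the call above passes `allow_negative=True`, the returned value can be negative when a new entry lies below the current hull. The sketch below is one hedged way to turn the number into a readable label; `describe_stability` is a hypothetical helper, and the default threshold simply reuses the 1e-3 value mentioned in the comment above.

```python
def describe_stability(decomposition_energy: float, threshold: float = 1e-3) -> str:
    """Label a decomposition energy (eV/atom) relative to the convex hull."""
    if decomposition_energy < 0:
        # The entry would lower the hull if it were added to the dataset.
        return "below the current hull"
    if decomposition_energy < threshold:
        return "on the hull or within the threshold"
    return "above the hull"


for value in (-0.02, 0.0, 0.0005, 0.05):
    print(value, "->", describe_stability(value))
```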


In [None]:
print(f"Decomposition: {decomposition}")

Decomposition: {None ComputedEntry - Pr2 Y2 Tm16 Bi12 (PrY(Tm4Bi3)2)
Energy (Uncorrected)     = -161.8166 eV (-5.0568  eV/atom)
Correction               = 0.0000    eV (0.0000   eV/atom)
Energy (Final)           = -161.8166 eV (-5.0568  eV/atom)
Energy Adjustments:
  None
Parameters:
Data:: 1.0}


# Run All Cells at Once

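The following cell combines the logic from the cells above, so a new entry can be evaluated without rerunning them one at a time. It assumes that the import, download, and preprocessing cells, as well as the cell defining `gather_convex_hull`, have already been run.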

In [None]:
# @title Provide Entry Details
# @markdown To compute the decomposition energy of a provided structure, please
# @markdown fill out the composition and Corrected Energy in the form below.
# @markdown A composition is required; unlike the cell above, this cell does
# @markdown not fall back to a random structure.

composition = '' # @param {type:"string"}
energy = 0.0 # @param {type:"number"}

assert composition, "Please provide entry details in the form."
composition = mg.Composition(composition)
sample_entry = ComputedEntry(
    composition=composition,
    energy=energy,
)
chemsys = [str(el) for el in composition.elements]
mg_entries = gather_convex_hull(chemsys)
diagram = phase_diagram.PhaseDiagram(mg_entries)
decomposition, decomposition_energy = diagram.get_decomp_and_e_above_hull(sample_entry, allow_negative=True)
print(f"Decomposition Energy: {decomposition_energy}.")
print(f"Decomposition: {decomposition}")