# Exploring the supporting data accompanying the a2c manuscript

A crystal structure prediction pipeline that accompanies [GNoME](https://www.nature.com/articles/s41586-023-06735-9) and uses the ML potentials therein was introduced in the [a2c preprint](https://arxiv.org/abs/2310.01117). Here we explore the supporting data for the campaigns reported in the manuscript, provided as `a2c_supporting_data.json` in the GNoME repository. For each campaign, the dataset contains:

 - Generated amorphous structures
 - The subcells extracted using a2c at the settings specified in the paper
 - Index *i* of a particular subcell from the amorphous phase that relaxed into the target phase
 - The force-field-relaxed final structure of subcell *i*
 - The DFT-relaxed final structure of subcell *i*

The final DFT results are included to validate the overall pipeline; they were run with VASP using the MPRelaxSet settings (two consecutive relaxation runs).
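The bullet list above maps onto the JSON keys used later in this notebook: `amorphous_structure`, `a2c_initial_structures`, `a2c_match_after_relax_example`, and, within each match, `index_in_a2c_initial_structures`, `formula`, `relaxed_ff` and `relaxed_dft`. As a minimal sketch, assuming that key layout and that every structure is stored as a POSCAR string, the generator below walks every campaign and yields subcell *i* together with its relaxed counterparts. `iter_a2c_matches` is a hypothetical helper, not part of the GNoME repository.

```python
from typing import Any, Dict, Iterator, Tuple

def iter_a2c_matches(a2c_data: Dict[str, Any]) -> Iterator[Tuple[str, int, str, str, str]]:
  """Yield (system, i, initial_poscar, ff_poscar, dft_poscar) for every reported match.

  Hypothetical helper; assumes the key layout used elsewhere in this notebook.
  """
  for system, entry in a2c_data.items():
    for match in entry['a2c_match_after_relax_example']:
      i = match['index_in_a2c_initial_structures']
      # Subcell i as cut out of the amorphous structure, before any relaxation
      initial_poscar = entry['a2c_initial_structures'][i]
      yield system, i, initial_poscar, match['relaxed_ff'], match['relaxed_dft']
```

Once the data is loaded, `for system, i, init, ff, dft in iter_a2c_matches(a2c_data): print(system, i)` lists every reported match.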

In [None]:
!pip3 install pymatgen

In [None]:
import os
import json
from pymatgen.core import Structure
from pymatgen.analysis.structure_matcher import StructureMatcher

# Load data

In [None]:
PUBLIC_LINK = "https://storage.googleapis.com/"
BUCKET_NAME = "gdm_materials_discovery"

FOLDER_NAME = "gnome_data/auxiliary_gnome_data"
FILES = (
    "a2c_supporting_data.json",
)

def download_from_link(link: str, output_dir: str):
  """Download the file from a public link using wget."""
  os.system(f"wget {link} -P {output_dir}")

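# URLs always use '/', so build them with string joins rather than os.path.join
# (which would insert '\' on Windows)
parent_directory = PUBLIC_LINK.rstrip("/") + "/" + BUCKET_NAME
for filename in FILES:
  public_link = "/".join([parent_directory, FOLDER_NAME, filename])
  download_from_link(public_link, '.')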
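If `wget` is not available (for example on Windows or in a minimal container), a stdlib-only alternative is sketched below. It assumes the bucket is publicly readable over HTTPS at the same URL the cell above constructs; `build_public_url` and `download_with_urllib` are hypothetical helper names.

```python
import os
import urllib.request

def build_public_url(bucket: str, folder: str, filename: str,
                     base: str = "https://storage.googleapis.com") -> str:
  """Join URL parts with '/' (os.path.join would use '\\' on Windows)."""
  return "/".join(part.strip("/") for part in (base, bucket, folder, filename))

def download_with_urllib(url: str, output_dir: str = ".") -> str:
  """Download url into output_dir without shelling out to wget; return the local path."""
  os.makedirs(output_dir, exist_ok=True)
  local_path = os.path.join(output_dir, url.rsplit("/", 1)[-1])
  urllib.request.urlretrieve(url, local_path)
  return local_path
```

For this notebook the call would be `download_with_urllib(build_public_url(BUCKET_NAME, FOLDER_NAME, "a2c_supporting_data.json"))`.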

In [None]:
with open("a2c_supporting_data.json", "r") as f:
  a2c_data = json.load(f)
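Before exporting or analyzing anything, it can help to sanity-check the loaded dictionary. The sketch below assumes the key names used throughout this notebook and checks that each reported match carries the expected fields and that its index points into that campaign's list of extracted subcells. `check_a2c_data` is a hypothetical helper; an empty return value only means these particular checks passed.

```python
from typing import Any, Dict, List

REQUIRED_MATCH_KEYS = ('index_in_a2c_initial_structures', 'formula', 'relaxed_ff', 'relaxed_dft')

def check_a2c_data(a2c_data: Dict[str, Any]) -> List[str]:
  """Return a list of human-readable problems; an empty list means the checks passed."""
  problems = []
  for system, entry in a2c_data.items():
    if 'amorphous_structure' not in entry:
      problems.append(f"{system}: missing amorphous_structure")
    n_initial = len(entry.get('a2c_initial_structures', []))
    for j, match in enumerate(entry.get('a2c_match_after_relax_example', [])):
      missing = [k for k in REQUIRED_MATCH_KEYS if k not in match]
      if missing:
        problems.append(f"{system} match {j}: missing keys {missing}")
        continue
      i = match['index_in_a2c_initial_structures']
      # The index must point into this campaign's list of extracted subcells
      if not 0 <= i < n_initial:
        problems.append(f"{system} match {j}: index {i} out of range (0..{n_initial - 1})")
  return problems
```

Running `check_a2c_data(a2c_data)` right after loading should return an empty list if the file has the layout this notebook assumes.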

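Below is a convenience function that writes, for each a2c campaign in `a2c_data`, the amorphous structure, the matching initial subcell(s), and their FF- and DFT-relaxed structures as POSCAR (`.vasp`) files, and then zips the resulting folder.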

In [None]:
import shutil

def get_saved_poscars_zip(merged_data, base_dir='a2c_supporting_data'):
  """Write each campaign's structures as POSCAR files and zip the folder."""
  # exist_ok lets the cell be re-run without raising FileExistsError
  os.makedirs(base_dir, exist_ok=True)
  for k, data_dump in merged_data.items():
    path = os.path.join(base_dir, k)
    os.makedirs(path, exist_ok=True)

    # Amorphous parent structure
    with open(os.path.join(path, 'a.vasp'), 'w') as f:
      f.write(data_dump['amorphous_structure'])

    for a2c_match in data_dump['a2c_match_after_relax_example']:
      idx = a2c_match['index_in_a2c_initial_structures']
      formula = a2c_match['formula']
      # Force-field- and DFT-relaxed structures of subcell idx
      with open(os.path.join(path, f'a2c_match_{idx}_{formula}_FF.vasp'), 'w') as f:
        f.write(a2c_match['relaxed_ff'])
      with open(os.path.join(path, f'a2c_match_{idx}_{formula}_DFT.vasp'), 'w') as f:
        f.write(a2c_match['relaxed_dft'])
      # Unrelaxed subcell as cut out of the amorphous structure
      with open(os.path.join(path, f'a2c_match_{idx}_initial.vasp'), 'w') as f:
        f.write(data_dump['a2c_initial_structures'][idx])

  # Same result as `zip -r a2c_supporting_data.zip a2c_supporting_data`
  shutil.make_archive(base_dir, 'zip', root_dir='.', base_dir=base_dir)

In [None]:
get_saved_poscars_zip(a2c_data)
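After the cell above runs, `a2c_supporting_data.zip` should sit next to the notebook. A quick way to confirm what went into it, assuming the `<base_dir>/<campaign>/<file>.vasp` layout that `get_saved_poscars_zip` writes, is the stdlib sketch below; `list_zip_by_campaign` is a hypothetical helper.

```python
import zipfile
from collections import defaultdict
from typing import Dict, List

def list_zip_by_campaign(zip_path: str) -> Dict[str, List[str]]:
  """Group the .vasp files in the archive by campaign folder."""
  groups = defaultdict(list)
  with zipfile.ZipFile(zip_path) as zf:
    for name in zf.namelist():
      # Archive member names always use '/' as the separator
      parts = name.split('/')
      # Expected layout: <base_dir>/<campaign>/<file>.vasp; directory entries are skipped
      if len(parts) == 3 and parts[2].endswith('.vasp'):
        groups[parts[1]].append(parts[2])
  return {k: sorted(v) for k, v in groups.items()}
```

For example, `list_zip_by_campaign('a2c_supporting_data.zip')['Al2O3']` lists the files written for the Al2O3 campaign.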

# Analysis

The dataset covers 14 systems spanning a range of chemical spaces and crystallization products.

In [None]:
a2c_data.keys()

For each system, we have an amorphous structure, a set of initial structures extracted from it using a2c utilities, and concrete examples of one or more of these initial subcells that relaxed into the target crystal.
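To see at a glance how many subcells were extracted per system and how many matching examples are provided, a minimal sketch that tabulates both counts is shown below. It assumes the same key names used in the cells that follow; `summarize_campaigns` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

def summarize_campaigns(a2c_data: Dict[str, Any]) -> List[Tuple[str, int, int]]:
  """Return (system, number of extracted subcells, number of reported matches), sorted by system."""
  rows = []
  for system, entry in sorted(a2c_data.items()):
    rows.append((system,
                 len(entry['a2c_initial_structures']),
                 len(entry['a2c_match_after_relax_example'])))
  return rows
```

Printing `summarize_campaigns(a2c_data)` row by row gives one line per system.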

In [None]:
a2c_data['Al2O3'].keys()

In [None]:
a2c_data['Al2O3']['amorphous_structure']
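`amorphous_structure` is stored as a POSCAR-format string. pymatgen's `Structure.from_str(..., fmt='poscar')`, used further below, is the robust way to parse it. For a quick look at the composition without building a `Structure`, a minimal sketch that reads only the header is shown below. It assumes the VASP 5 layout with an element-symbol line (line 6) followed by a counts line (line 7), as in the experimental POSCAR later in this notebook; `poscar_composition` is a hypothetical helper.

```python
from typing import Dict

def poscar_composition(poscar: str) -> Dict[str, int]:
  """Read element symbols (line 6) and counts (line 7) from a VASP 5 POSCAR string."""
  # No strip(): an empty comment line must still count as line 1
  lines = poscar.splitlines()
  # Line 1: comment, line 2: scale, lines 3-5: lattice vectors
  symbols = lines[5].split()
  counts = [int(n) for n in lines[6].split()]
  if len(symbols) != len(counts):
    raise ValueError("symbol and count lines have different lengths")
  composition: Dict[str, int] = {}
  for sym, n in zip(symbols, counts):
    # Accumulate in case an element appears more than once in the species line
    composition[sym] = composition.get(sym, 0) + n
  return composition
```

Applied here, `poscar_composition(a2c_data['Al2O3']['amorphous_structure'])` gives the element counts of the amorphous cell.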

Let's see the a2c crystallization product provided:

In [None]:
a2c_data['Al2O3']['a2c_match_after_relax_example']

We can locate the "initial subcell" cut-out from the amorphous structure:

In [None]:
idx = a2c_data['Al2O3']['a2c_match_after_relax_example'][0]['index_in_a2c_initial_structures']
initial_structure = a2c_data['Al2O3']['a2c_initial_structures'][idx]
initial_structure
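The subcell above is again a POSCAR string. One way to check whether a cut-out subcell has the stoichiometry of the target phase is to reduce its element counts by their greatest common divisor. With pymatgen this is `Structure.from_str(initial_structure, fmt='poscar').composition.reduced_formula`; the stdlib sketch below only illustrates the gcd reduction. It keeps the element order of the input dictionary, whereas pymatgen may order elements differently. `reduced_formula` is a hypothetical helper that expects a non-empty dictionary of counts.

```python
from functools import reduce
from math import gcd
from typing import Dict

def reduced_formula(composition: Dict[str, int]) -> str:
  """Divide all element counts by their greatest common divisor, e.g. Al8O12 -> Al2O3."""
  divisor = reduce(gcd, composition.values())
  parts = []
  for element, count in composition.items():
    n = count // divisor
    # Omit the subscript 1, following the usual chemical convention
    parts.append(element if n == 1 else f"{element}{n}")
  return "".join(parts)
```

For example, `reduced_formula({'Al': 8, 'O': 12})` returns `'Al2O3'`.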

Next, parse the FF and DFT relaxation outputs (the `relaxed_ff` and `relaxed_dft` strings seen above) into pymatgen `Structure` objects:

In [None]:
ff_relaxed = Structure.from_str(a2c_data['Al2O3']['a2c_match_after_relax_example'][0]['relaxed_ff'], fmt='poscar')
dft_relaxed = Structure.from_str(a2c_data['Al2O3']['a2c_match_after_relax_example'][0]['relaxed_dft'], fmt='poscar')

In [None]:
ff_relaxed

In [None]:
dft_relaxed

Let's compare the FF and DFT structures to the experimental structure.

In [None]:
experimental_structure = Structure.from_str("""Al8 O12 - Theta Alumina
1.0
   0.0000000000000000   11.7928039938604172    0.0000000000000000
   2.9074770832785912    0.0000000000000000    0.0000000000000000
   0.0000000000000000   -1.3644423914716102   -5.4468586437869098
Al O
8 12
direct
   0.0904720900000000    0.0000000000000000    0.7956463100000000 Al3+
   0.4095279100000000    0.5000000000000000    0.2043536900000000 Al3+
   0.3419676300000001    0.0000000000000000    0.6825378799999999 Al3+
   0.1580323699999999    0.5000000000000000    0.3174621200000001 Al3+
   0.5904720900000000    0.5000000000000000    0.7956463100000000 Al3+
   0.9095279100000000    0.0000000000000000    0.2043536900000000 Al3+
   0.8419676300000001    0.5000000000000000    0.6825378799999999 Al3+
   0.6580323699999999    0.0000000000000000    0.3174621200000001 Al3+
   0.1606788400000000    0.0000000000000000    0.1090139600000000 O2-
   0.3393211600000000    0.5000000000000000    0.8909860400000000 O2-
   0.4952159600000000    0.0000000000000000    0.2575462300000000 O2-
   0.0047840400000000    0.5000000000000000    0.7424537700000000 O2-
   0.3265092550000001    0.5000000000000000    0.4330948900000000 O2-
   0.1734907449999999    0.0000000000000000    0.5669051100000000 O2-
   0.6606788400000001    0.5000000000000000    0.1090139600000000 O2-
   0.8393211599999999    0.0000000000000000    0.8909860400000000 O2-
   0.9952159599999999    0.5000000000000000    0.2575462300000000 O2-
   0.5047840400000001    0.0000000000000000    0.7424537700000000 O2-
   0.8265092549999999    0.0000000000000000    0.4330948900000000 O2-
   0.6734907450000001    0.5000000000000000    0.5669051100000000 O2-""", fmt='poscar')
experimental_structure.remove_oxidation_states()
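Before structure matching, one simple number to compare across the FF, DFT and experimental structures is the volume per atom. Unlike individual lattice parameters, it does not depend on which cell setting or supercell each structure happens to be expressed in. For `Structure` objects this is `s.volume / len(s)`; the stdlib sketch below computes the same quantity directly from a POSCAR string such as the entries of `a2c_data`, assuming a VASP 5 header with a single scaling factor. `poscar_volume_per_atom` is a hypothetical helper.

```python
from typing import List

def poscar_volume_per_atom(poscar: str) -> float:
  """Cell volume per atom (POSCAR length unit cubed, normally Å^3) from a VASP 5 POSCAR string."""
  lines = poscar.splitlines()
  scale = float(lines[1].split()[0])
  lattice: List[List[float]] = [[float(x) for x in lines[k].split()[:3]] for k in (2, 3, 4)]
  a, b, c = lattice
  # Triple product a . (b x c); its absolute value is the unscaled cell volume
  cross = [b[1] * c[2] - b[2] * c[1],
           b[2] * c[0] - b[0] * c[2],
           b[0] * c[1] - b[1] * c[0]]
  volume = abs(sum(ai * xi for ai, xi in zip(a, cross)))
  # VASP convention: a positive scale multiplies lengths, a negative one is the total volume
  volume = volume * scale ** 3 if scale > 0 else -scale
  n_atoms = sum(int(n) for n in lines[6].split())
  return volume / n_atoms
```

For instance, `poscar_volume_per_atom(a2c_data['Al2O3']['a2c_match_after_relax_example'][0]['relaxed_dft'])` can be set against `experimental_structure.volume / len(experimental_structure)`.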


Confirming the structures match:

In [None]:
sm = StructureMatcher()

In [None]:
sm.fit(ff_relaxed, experimental_structure)

In [None]:
sm.fit(dft_relaxed, experimental_structure)