# Chemistry Exploration with ggen

This notebook demonstrates how to systematically explore a chemical space by:
1. Specifying a chemical system (e.g., "Li-Co-O")
2. Generating candidate structures across different stoichiometries
3. Storing all data in SQLite for persistence
4. Building a phase diagram to identify thermodynamically stable candidates


In [1]:
%pip install -e ..

Obtaining file:///Users/mmoderwell/ouro/ggen
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: ggen
  Building editable for ggen (pyproject.toml) ... done
[?25h  Created wheel for ggen: filename=ggen-0.1.0-0.editable-py3-none-any.whl size=7756 sha256=dcd8981fb456882e179dc6c326d4600a646a8cd1804c517d00d00d019110e505
  Stored in directory: /private/var/folders/zw/zcpqh2ss43v0d8_8mdqcds440000gn/T/pip-ephem-wheel-cache-9b4dpuru/wheels/ac/6f/ca/e5e77e53988e12825fc0b1323c5ae56703f9c5eab46d137fdd
Successfully built ggen
Installing collected packages: ggen
  Attempting uninstall: ggen
    Found existing installation: ggen 0.1.0
    Uninstalling ggen-0.1.0:
      Successfully uninstalled ggen-0.1.0
Successfully installed ggen-0.1.0
Note: you may need to

In [2]:
import logging
logging.basicConfig(level=logging.INFO)


In [3]:
from ggen import ChemistryExplorer


## Initialize the Explorer

Create a `ChemistryExplorer` instance. You can optionally specify:
- `calculator`: Custom ASE calculator (defaults to ORB)
- `random_seed`: For reproducibility
- `output_dir`: Where to store results


In [4]:
explorer = ChemistryExplorer(
    random_seed=42,
    output_dir="./exploration_runs"
)


## Preview Stoichiometries

Before running the full exploration, you can preview what stoichiometries will be generated:


In [5]:
# Parse the chemical system
elements = explorer.parse_chemical_system("Fe-Mn-Bi")
print(f"Elements: {elements}")

# Enumerate stoichiometries
stoichiometries = explorer.enumerate_stoichiometries(
    elements=elements,
    max_atoms=8,
    min_atoms=2,
    include_binaries=True,
    include_ternaries=True,
)

print(f"\nTotal stoichiometries: {len(stoichiometries)}")
print("\nFirst 10:")
for s in stoichiometries[:10]:
    formula = "".join(f"{el}{c if c > 1 else ''}" for el, c in sorted(s.items()))
    print(f"  {formula}: {s}")


Elements: ['Bi', 'Fe', 'Mn']

Total stoichiometries: 115

First 10:
  BiFe: {'Bi': 1, 'Fe': 1}
  BiFe2: {'Bi': 1, 'Fe': 2}
  Bi2Fe: {'Bi': 2, 'Fe': 1}
  BiFe3: {'Bi': 1, 'Fe': 3}
  Bi3Fe: {'Bi': 3, 'Fe': 1}
  BiFe4: {'Bi': 1, 'Fe': 4}
  Bi2Fe3: {'Bi': 2, 'Fe': 3}
  Bi3Fe2: {'Bi': 3, 'Fe': 2}
  Bi4Fe: {'Bi': 4, 'Fe': 1}
  BiFe5: {'Bi': 1, 'Fe': 5}
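
The listing above is consistent with enumerating every reduced composition (element counts with no common divisor) for each pair and triple of elements, up to `max_atoms` atoms per formula unit. The sketch below is one plausible reading of that behaviour, not ggen's actual implementation; `compositions` and `enumerate_reduced` are hypothetical helper names introduced only for illustration. With the same settings it yields 115 compositions, the same count as the cell output.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def compositions(total, parts):
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first, *rest)


def enumerate_reduced(elements, min_atoms=2, max_atoms=8):
    """Hypothetical stand-in for explorer.enumerate_stoichiometries (assumed logic)."""
    results = []
    for n_el in (2, 3):  # binaries first, then ternaries
        for subset in combinations(sorted(elements), n_el):
            for total in range(max(min_atoms, n_el), max_atoms + 1):
                for counts in compositions(total, n_el):
                    # Keep only reduced formulas, e.g. skip Bi2Fe2 (same as BiFe)
                    if reduce(gcd, counts) == 1:
                        results.append(dict(zip(subset, counts)))
    return results


stoichs = enumerate_reduced(["Fe", "Mn", "Bi"])
print(f"Total stoichiometries: {len(stoichs)}")
print(stoichs[:3])
```

Filtering on the greatest common divisor is what keeps the list free of duplicates such as `Bi2Fe2`, which describes the same composition as `BiFe`.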


## Run the Exploration

Now let's run the full exploration. This will:
1. Generate structures for each stoichiometry
2. Optimize them using the ORB calculator
3. Store results in SQLite + CIF files
4. Build a phase diagram


In [6]:
# Run a smaller exploration for demonstration
# In practice, you'd want larger max_atoms and more trials

result = explorer.explore(
    chemical_system="Fe-Mn-Co",
    max_atoms=20,          
    min_atoms=2,
    num_trials=15,          # Trials per stoichiometry
    optimize=True,
    include_binaries=True,
    include_ternaries=True,
    max_stoichiometries=100,
    crystal_systems=["hexagonal", "tetragonal"],
    load_previous_runs=True,      # Load from all previous runs
    skip_existing_formulas=False,  # False: regenerate formulas even if previous runs already contain them
)


INFO:ggen.explorer:Starting exploration of Co-Fe-Mn
INFO:ggen.explorer:Found 5 previous runs for Co-Fe-Mn
INFO:ggen.explorer:Loaded 0 candidates from exploration_Co-Fe-Mn_20260102_165713
  struct = parser.parse_structures(primitive=primitive)[0]
  struct = parser.parse_structures(primitive=primitive)[0]
  struct = parser.parse_structures(primitive=primitive)[0]
  struct = parser.parse_structures(primitive=primitive)[0]
  struct = parser.parse_structures(primitive=primitive)[0]
  CIF={'Mn': 4.0, 'Fe': 6.0, 'Co': 10.0}
  PMG={'Mn': 8.0, 'Fe': 16.0, 'Co': 24.0}
  ratios={'Fe': 2.6666666666666665, 'Mn': 2.0, 'Co': 2.4}
  if struct := self._get_structure(data, primitive, symmetrized, check_occu=check_occu):
  struct = parser.parse_structures(primitive=primitive)[0]
  struct = parser.parse_structures(primitive=primitive)[0]
INFO:ggen.explorer:Loaded 161 candidates from exploration_Co-Fe-Mn_20260102_160815
INFO:ggen.explorer:Loaded 65 candidates from exploration_Co-Fe-Mn_20260102_153726
INFO:

SymmetryUndeterminedError: too close distance between atoms

## Explore the Results


In [None]:
print(f"Chemical System: {result.chemical_system}")
print(f"Elements: {result.elements}") 
print(f"Total candidates attempted: {result.num_candidates}")
print(f"Successful generations: {result.num_successful}")
print(f"Failed generations: {result.num_failed}")
print(f"Phases on convex hull: {len(result.hull_entries)}")
print(f"Total time: {result.total_time_seconds:.1f}s")
print(f"\nResults saved to: {result.run_directory}")
print(f"Database: {result.database_path}")


Chemical System: Co-Fe-Mn
Elements: ['Co', 'Fe', 'Mn']
Total candidates attempted: 158
Successful generations: 158
Failed generations: 0
Phases on convex hull: 6
Total time: 2703.3s

Results saved to: exploration_runs/exploration_Co-Fe-Mn_20260102_160815
Database: exploration_runs/exploration_Co-Fe-Mn_20260102_160815/exploration.db
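
The run directory names in this output appear to follow the pattern `exploration_<system>_<YYYYMMDD>_<HHMMSS>`. Assuming that pattern holds (it is inferred from the names printed here, not from ggen's documentation), a small helper can split a run name into its chemical system and timestamp. `parse_run_name` is a hypothetical function, not part of ggen.

```python
from datetime import datetime


def parse_run_name(run_name):
    """Split 'exploration_<system>_<YYYYMMDD>_<HHMMSS>' into (system, datetime).

    The naming pattern is an assumption based on the directory names above.
    """
    prefix, date_part, time_part = run_name.rsplit("_", 2)
    system = prefix.removeprefix("exploration_")  # requires Python 3.9+
    stamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return system, stamp


system, stamp = parse_run_name("exploration_Co-Fe-Mn_20260102_160815")
print(system, stamp.isoformat())  # Co-Fe-Mn 2026-01-02T16:08:15
```

Sorting runs by the parsed timestamp is one way to find the most recent exploration of a given system.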


## View Stable Candidates

Get the phases that are on or near the convex hull:


In [None]:
# Get candidates within 150 meV/atom of the hull
stable = explorer.get_stable_candidates(result, e_above_hull_cutoff=0.15)

print(f"Found {len(stable)} stable/near-stable phases:\n")
for c in stable:
    e_above = c.generation_metadata.get('e_above_hull', 0)
    # Label each phase with the run it came from (e.g., "exploration_Co-Fe-Mn_20260102_121508"), or "current" if none is recorded
    source_run = c.generation_metadata.get('source_run', '')
    run_time = source_run if source_run else 'current'
    print(f"  {c.formula:10s}  E={c.energy_per_atom:.4f} eV/atom  "
          f"SG={c.space_group_symbol:10s}  E_hull={e_above*1000:.1f} meV  run={run_time}")

Found 114 stable/near-stable phases:

  CoFe11      E=-8.3460 eV/atom  SG=P2/m        E_hull=0.0 meV  run=current
  Co9Fe10     E=-7.8325 eV/atom  SG=Cm          E_hull=0.0 meV  run=current
  Co8Fe5Mn7   E=-8.1582 eV/atom  SG=P4/m        E_hull=0.0 meV  run=current
  Co2FeMn9    E=-8.7369 eV/atom  SG=Cm          E_hull=0.0 meV  run=exploration_Co-Fe-Mn_20260102_153726
  FeMn19      E=-9.0517 eV/atom  SG=P1          E_hull=0.0 meV  run=exploration_Co-Fe-Mn_20260102_153726
  Co9Mn8      E=-7.9985 eV/atom  SG=P1          E_hull=0.0 meV  run=exploration_Co-Fe-Mn_20260102_153726
  Co4FeMn7    E=-8.3927 eV/atom  SG=Cm          E_hull=2.6 meV  run=current
  Co18Mn      E=-7.1789 eV/atom  SG=P-1         E_hull=3.1 meV  run=current
  CoFe12Mn7   E=-8.6042 eV/atom  SG=P1          E_hull=4.6 meV  run=current
  Fe2Mn5      E=-8.8929 eV/atom  SG=C2/m        E_hull=5.9 meV  run=current
  Co8Fe11Mn   E=-7.9565 eV/atom  SG=P1          E_hull=6.3 meV  run=exploration_Co-Fe-Mn_20260102_153726
  Co13Mn7 
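
To make "energy above hull" concrete, the sketch below rebuilds the hull for the binary Co-Fe edge only, in plain Python, using energies per atom that appear in this notebook's database query output. It is a simplified illustration under that restriction and not how ggen builds the full ternary phase diagram. The helper names are hypothetical. For Co3Fe it gives about 1.10 eV/atom above the hull, in line with the stored `e_above_hull` value.

```python
E_CO, E_FE = -7.079214, -8.435699  # eV/atom, elemental reference energies

# formula: (n_Co, n_Fe, energy per atom in eV)
phases = {
    "CoFe11": (1, 11, -8.346041),
    "Co9Fe10": (9, 10, -7.832494),
    "Co3Fe": (3, 1, -6.336643),
}


def formation_point(n_co, n_fe, energy):
    """Return (Fe fraction, formation energy per atom)."""
    total = n_co + n_fe
    reference = (n_co * E_CO + n_fe * E_FE) / total
    return n_fe / total, energy - reference


def lower_hull(points):
    """Lower convex hull of (x, y) points via a monotone-chain sweep."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross > 0:  # hull[-1] lies strictly below the chord, keep it
                break
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"x={x} is outside the hull range")


points = {name: formation_point(*vals) for name, vals in phases.items()}
# The pure elements define zero formation energy at x = 0 and x = 1
hull = lower_hull([(0.0, 0.0), (1.0, 0.0), *points.values()])
e_above = {name: y - hull_energy(hull, x) for name, (x, y) in points.items()}
for name, value in e_above.items():
    print(f"{name:8s} {abs(value) * 1000:8.1f} meV/atom")
```

Phases that sit on the hull have zero distance to it (up to floating-point rounding), which is why the printout takes the absolute value before formatting.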

## View the Phase Diagram


In [None]:
# Plot the phase diagram (for ternary systems)
if result.phase_diagram is not None:
    try:
        fig = explorer.plot_phase_diagram(result, show_unstable=0.150)
        fig.show()
    except Exception as e:
        print(f"Phase diagram plotting not available: {e}")
else:
    print("No phase diagram available (need at least 2 valid candidates)")


## Export Summary


In [8]:
# Export a JSON summary of the exploration
summary = explorer.export_summary(
    result,
    output_path=result.run_directory / "summary.json"
)

print("Summary exported!")
print(f"Hull entries: {summary['hull_entries']}")


NameError: name 'result' is not defined

## Inspect Individual Structures


In [None]:
# Get the most stable structure
if result.hull_entries:
    best = result.hull_entries[0]
    print(f"Most stable: {best.formula}")
    print(f"Energy: {best.energy_per_atom:.4f} eV/atom")
    print(f"Space group: {best.space_group_symbol} (#{best.space_group_number})")
    print(f"CIF file: {best.cif_path}")
    
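    # View with pymatviz if available
    try:
        from IPython.display import display
        from pymatviz import StructureWidget
        display(StructureWidget(best.structure))
    except ImportError:
        print("pymatviz is not installed; skipping the interactive view")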

NameError: name 'result' is not defined

## Load a Previous Run

You can reload a previous exploration from its directory:


In [None]:
# Load a previous run
# loaded = ChemistryExplorer.load_run("./exploration_runs/your_run_name")
# print(f"Loaded {loaded.num_candidates} candidates")


## Query the SQLite Database Directly

The SQLite database allows flexible querying:


In [None]:
import sqlite3
import pandas as pd

# Connect to the database
conn = sqlite3.connect(str(result.database_path))

# Query all valid candidates, ordered by energy above the hull
df = pd.read_sql_query("""
    SELECT formula, energy_per_atom, space_group_symbol, 
           e_above_hull, is_on_hull, is_valid
    FROM candidates
    WHERE is_valid = 1
    ORDER BY e_above_hull ASC
""", conn)

# Release the database handle once the DataFrame is loaded
conn.close()

print("All valid candidates:")
df


All valid candidates:


|  | formula | energy_per_atom | space_group_symbol | e_above_hull | is_on_hull | is_valid |
|---|---|---|---|---|---|---|
| 0 | Co | -7.079214 | Fm-3m |  | 0 | 1 |
| 1 | Fe | -8.435699 | Im-3m |  | 0 | 1 |
| 2 | Mn | -9.025988 | Im-3m |  | 0 | 1 |
| 3 | CoFe11 | -8.346041 | P2/m | 0.000000 | 1 | 1 |
| 4 | Co9Fe10 | -7.832494 | Cm | 0.000000 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 156 | Co3Fe | -6.336643 | P6/mmm | 1.100379 | 0 | 1 |
| 157 | FeMn3 | -7.813682 | P6/mmm | 1.108315 | 0 | 1 |
| 158 | Co5Fe3Mn2 | -6.607213 | Cm | 1.313445 | 0 | 1 |
| 159 | Co2Fe7Mn7 | -6.144474 | P1 | 2.437319 | 0 | 1 |
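
The elemental rows appear first in this output with an empty `e_above_hull`. SQLite sorts `NULL` before any number in ascending order, so rows without a stored hull distance float to the top. The sketch below reproduces this with an in-memory table and then filters to near-hull phases with a parameterized cutoff. The four-column schema is a cut-down assumption for illustration; the real `candidates` table holds more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE candidates (
        formula TEXT,
        energy_per_atom REAL,
        e_above_hull REAL,   -- NULL when no hull distance was stored
        is_valid INTEGER
    )
    """
)
rows = [
    ("Co", -7.079214, None, 1),
    ("CoFe11", -8.346041, 0.0, 1),
    ("Co9Fe10", -7.832494, 0.0, 1),
    ("Co3Fe", -6.336643, 1.100379, 1),
]
conn.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?)", rows)

# Plain ascending sort: the NULL row comes first
all_sorted = [
    row[0]
    for row in conn.execute(
        "SELECT formula FROM candidates ORDER BY e_above_hull ASC, formula ASC"
    )
]

# Near-hull phases only; the IS NOT NULL test makes the NULL handling explicit
cutoff = 0.15  # eV/atom
near_hull = conn.execute(
    """
    SELECT formula, e_above_hull
    FROM candidates
    WHERE is_valid = 1
      AND e_above_hull IS NOT NULL
      AND e_above_hull <= ?
    ORDER BY e_above_hull ASC, formula ASC
    """,
    (cutoff,),
).fetchall()
conn.close()

print(all_sorted)
print(near_hull)
```

Passing the cutoff as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query reusable for different thresholds.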
