# Getting Started — Official **ALIGNN** Framework

This notebook builds on the open‑source  
[**Atomistic Line Graph Neural Network (ALIGNN)**](https://github.com/usnistgov/alignn)  
by **Kamal Choudhary et al.**  

ALIGNN passes messages on **both** the atom-bond graph **and** its line graph, whose edges encode bond angles.
Its authors report improved accuracy for materials-property prediction compared with earlier graph neural network models.

> **Reference**  
> *Choudhary K.* and *DeCost B.*  
> “Atomistic Line Graph Neural Network for improved materials property prediction.”  
> *npj Computational Materials* **7**, 185 (2021).  
> DOI: [10.1038/s41524-021-00650-1](https://www.nature.com/articles/s41524-021-00650-1)

---

#### ALIGNN version used in this study

```bash
# choose one package manager
pip install alignn==2024.01.01      # PyPI
# — or —
conda install -c conda-forge alignn=2024.01.01
```
---

## 1 · Workflow focus — **Cd<sub>28</sub>Se<sub>17</sub>Cl<sub>22</sub>**

All demonstrations below use the **Cl‑passivated quantum dot**  (Cd<sub>28</sub>Se<sub>17</sub>Cl<sub>22</sub>).  
The *same* workflow applies to Cd<sub>28</sub>Se<sub>17</sub>(OH)<sub>22</sub>;  
we show the band‑gap (HOMO–LUMO gap) of the Cl system for clarity.

---

## 2 · Context

* **AIMD length:** 10 ps • **Timestep:** 1 fs  
* **Geometry files:** `st1.vasp … st10001.vasp` (10,001 snapshots)  
* **Labels:** PBE band‑gaps for each snapshot

---

## **Steps for Train/Validation/Test**

### Build the folder **`root_dir/`**

It contains all the POSCAR files of the AIMD trajectory:

```text
st1.vasp, st2.vasp, st3.vasp, ..., st10001.vasp (10,001 POSCAR files)
```

### Create **`id_prop.csv`** of band-gap labels, sampled every 10 fs, for train/validation/test

```text
structure_id,bandgap_eV
st1,1.66
st11,1.65
st21,1.62
    :
st9999,1.77
```

(1001 rows)
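The 10 fs subsampling can be sketched as follows. This is only an illustration under the assumption that labeled frames are taken every 10th snapshot starting at `st1`. The helper `strided_ids` is hypothetical and not part of ALIGNN. The stride from `st1` ends at `st10001`, so the last id in the listing (`st9999`) suggests the actual selection may differ slightly.

```python
def strided_ids(first=1, last=10001, stride=10, prefix="st"):
    """Return snapshot ids taken every `stride` frames, endpoints included."""
    return [f"{prefix}{i}" for i in range(first, last + 1, stride)]


labeled = strided_ids()
print(len(labeled))               # number of labeled frames: 1001
print(labeled[:3], labeled[-1])   # ['st1', 'st11', 'st21'] st10001
```

With a 1 fs timestep, a stride of 10 frames corresponds to one label every 10 fs.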


## **Train/validation/test strategy**  
  * Only **10 %** of the 10 001 structures are used for model fitting  
    (split **80 / 10 / 10** → train / val / test).  
  * The remaining **90 %** of frames are later **predicted**.
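
The frame counts behind this strategy can be checked with a few lines of arithmetic. This is a sketch that assumes each share is rounded down. How ALIGNN itself rounds the split sizes is not assumed here, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_total, percents=(80, 10, 10)):
    """Floor each percentage share; report frames left over by rounding."""
    sizes = [n_total * p // 100 for p in percents]
    return sizes, n_total - sum(sizes)


n_all, n_labeled = 10001, 1001
(n_train, n_val, n_test), leftover = split_sizes(n_labeled)
n_predict = n_all - n_labeled   # frames without labels in id_prop.csv

print(n_train, n_val, n_test, leftover)   # 800 100 100 1
print(n_predict)                          # 9000
```

The 9000 unlabeled frames are about 90 % of the trajectory, matching the strategy above.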

### What the ensemble‑submission script does

* **Why an ensemble?**  
  * To cover the whole 10 ps trajectory with only 1001 labeled frames, we train **20 ALIGNN models**.  
    Each run reads the same rows of `id_prop.csv` in a different random order and uses a unique `random_seed` in `config.json`, so each model is fitted on a different 80 / 10 / 10 partition of the labeled frames.

* **What the script does for each run**  
  1. Create `run_<n>/` inside `BASE_OUTPUT_DIR`.  
  2. Copy `root_dir/` into the run folder and write a **shuffled** `id_prop.csv` into the copy (shuffled with the run's unique seed).
  3. Write `config.json` with that seed.  
  4. Generate a one-off `submit_job.sh` (Slurm, 1 GPU, 2 h).  
  5. `sbatch` the job so logs and checkpoints stay in `run_<n>/`.

  This creates the folders `run_0`, `run_1`, ..., `run_19`.
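
The effect of the per-run shuffle can be illustrated with the standard library alone. This sketch uses `random.Random(seed).shuffle` as a stand-in for the pandas `sample(frac=1, random_state=seed)` call in the script. The helper name `shuffled_rows` and the two seed values are hypothetical.

```python
import random


def shuffled_rows(rows, seed):
    """Return a seeded shuffle of `rows` without modifying the input."""
    out = list(rows)
    random.Random(seed).shuffle(out)
    return out


# Assumed ids: every 10th snapshot, as in the id_prop.csv listing.
rows = [f"st{i}" for i in range(1, 10002, 10)]
run_a = shuffled_rows(rows, seed=12345)
run_b = shuffled_rows(rows, seed=54321)

# Same frames in both runs, but in a different order, so the first
# 800 rows (a stand-in for the training share) differ between runs.
print(sorted(run_a) == sorted(run_b))
print(set(run_a[:800]) == set(run_b[:800]))
```

Reusing a seed reproduces the same order, which is what makes an individual run repeatable.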


In [11]:
# Cell [2]: Write out submit_ensemble.py
script = r'''
#!/usr/bin/env python
"""
submit_ensemble.py

This script automatically submits jobs for an ensemble of 20 ALIGNN training runs.
For each run, it:

* Creates a unique run folder (e.g., run_0, run_1, …, run_19) under a base output directory.
* Copies the base config file (config_example.json) into that run folder and updates the "random_seed" field
  with a unique random seed.
* Copies the entire "root_dir" folder (from the current working directory) into the run folder.
* Copies and shuffles the master id_prop.csv file (from the current directory) into the copied root_dir folder.
* Writes a run-specific SBATCH job script that calls the training script with --root_dir set to the run-specific "root_dir".
* Submits the job using sbatch so that outputs (e.g., slurm logs, checkpoints, and temp directories)
  remain in the run folder.

Please adjust the BASE_SOURCE_ROOT_DIR, BASE_CONFIG, BASE_OUTPUT_DIR, and other paths as needed.
"""

import os
import json
import random
import subprocess
import time
import shutil
import pandas as pd

# ----- Base directories and ensemble settings -----

BASE_SOURCE_ROOT_DIR = os.path.join(os.getcwd(), "root_dir")
BASE_ID_PROP         = os.path.join(os.getcwd(), "id_prop.csv")
BASE_CONFIG          = "/scratch/gilbreth/samantak/ALIGNN_AIMD_DFT_ML/Cl/train_to_10k/config_example.json"
BASE_OUTPUT_DIR      = "/scratch/gilbreth/samantak/ALIGNN_AIMD_DFT_ML/Cl/train_to_10k"
ENSEMBLE_SIZE        = 20

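# Draw ENSEMBLE_SIZE distinct five-digit seeds. The draw itself is unseeded,
# so re-running this script produces a different set of seeds.
unique_seeds = random.sample(range(10000, 100000), ENSEMBLE_SIZE)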

def update_config(base_config_path, new_config_path, new_seed):
    with open(base_config_path, "r") as f:
        cfg = json.load(f)
    cfg["random_seed"] = new_seed
    with open(new_config_path, "w") as f:
        json.dump(cfg, f, indent=4)

def copy_and_shuffle_idprop(source_csv, dest_csv, seed):
    try:
        df = pd.read_csv(source_csv)
    except Exception as e:
        print(f"Error reading {source_csv}: {e}")
        return False
    df_shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    df_shuffled.to_csv(dest_csv, index=False)
    return True

def write_job_script(job_script_path, run_dir, new_run_root_dir, config_path, run_index):
    job_script = f"""#!/bin/bash
#SBATCH -A standby
#SBATCH -N 1
#SBATCH --gpus=1
#SBATCH -t 02:00:00
#SBATCH --job-name cl10k_run_{run_index}
#SBATCH -o {run_dir}/slurm-%j.out

cd "{run_dir}"
export PATH=/usr/local/cuda/bin:$PATH
source ~/.bashrc
conda_setup
conda activate alignn_original

train_alignn.py --root_dir "{new_run_root_dir}" --config "{config_path}" --output_dir "./temp"
"""
    with open(job_script_path, "w") as f:
        f.write(job_script)
    os.chmod(job_script_path, 0o755)

def submit_job(job_script_path):
    subprocess.run(["sbatch", job_script_path], check=True)

def main():
    for i in range(ENSEMBLE_SIZE):
        run_dir = os.path.join(BASE_OUTPUT_DIR, f"run_{i}")
        os.makedirs(run_dir, exist_ok=True)

        new_seed = unique_seeds[i]
        new_config_path = os.path.join(run_dir, "config.json")
        update_config(BASE_CONFIG, new_config_path, new_seed)

        dest_root_dir = os.path.join(run_dir, "root_dir")
        if os.path.exists(dest_root_dir):
            shutil.rmtree(dest_root_dir)
        shutil.copytree(BASE_SOURCE_ROOT_DIR, dest_root_dir)

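        dest_idprop = os.path.join(dest_root_dir, "id_prop.csv")
        # Skip this run if the label file could not be read and shuffled;
        # otherwise a job would be submitted without an id_prop.csv.
        if not copy_and_shuffle_idprop(BASE_ID_PROP, dest_idprop, new_seed):
            print(f"Run {i}: skipped (id_prop.csv could not be prepared).")
            continue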

        job_script_path = os.path.join(run_dir, "submit_job.sh")
        write_job_script(job_script_path, run_dir, dest_root_dir, new_config_path, i)

        try:
            submit_job(job_script_path)
            print(f"Run {i}: submitted.")
        except subprocess.CalledProcessError as e:
            print(f"Run {i} submission failed: {e}")

        time.sleep(1)

if __name__ == "__main__":
    main()
'''
# write to disk
with open("submit_ensemble.py", "w") as f:
    f.write(script.strip() + "\n")  # keep the shebang on line 1 and end with a newline
print("submit_ensemble.py has been created.")


submit_ensemble.py has been created.


### Run the submission script

This submits all 20 jobs. The leading `!` runs each command in a shell from the notebook; omit it in a terminal.

```bash
!chmod +x submit_ensemble.py
!./submit_ensemble.py
```
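
After submission, it can be useful to confirm that every run folder received its own seed. The sketch below assumes the `run_<n>/config.json` layout described above. The helpers `collect_seeds` and `seeds_are_unique` are hypothetical, and the demo uses a temporary directory rather than the real `BASE_OUTPUT_DIR`.

```python
import json
import os
import tempfile


def collect_seeds(base_output_dir, ensemble_size=20):
    """Map run index -> random_seed for every run_<n>/config.json that exists."""
    seeds = {}
    for i in range(ensemble_size):
        cfg_path = os.path.join(base_output_dir, f"run_{i}", "config.json")
        if os.path.isfile(cfg_path):
            with open(cfg_path) as f:
                seeds[i] = json.load(f).get("random_seed")
    return seeds


def seeds_are_unique(seeds):
    """True if no two runs share a seed."""
    values = list(seeds.values())
    return len(values) == len(set(values))


# Demo on a throwaway directory with two mock runs (run_2 is absent).
with tempfile.TemporaryDirectory() as base:
    for i, seed in enumerate([11111, 22222]):
        os.makedirs(os.path.join(base, f"run_{i}"))
        with open(os.path.join(base, f"run_{i}", "config.json"), "w") as f:
            json.dump({"random_seed": seed}, f)
    demo = collect_seeds(base, ensemble_size=3)

print(demo, seeds_are_unique(demo))
```

For the real ensemble, `collect_seeds` would be pointed at the directory configured as `BASE_OUTPUT_DIR` in `submit_ensemble.py`; a result with fewer than 20 entries indicates run folders that were never created.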
