# 1. Dealing with atomistic data

In `aml`, each data point is treated as a graph whose default node attributes are the atomic species (`elems`) and atomic positions (`pos`). The `AtomsGraph` class is a container for atomistic graph data; it is essentially a `torch_geometric.data.Data` object with some additional features. Structures with periodic boundary conditions (PBC) are also supported when the unit cell matrix (`cell`) is given.

In [1]:
import torch
from aml.data import AtomsGraph

# Water molecule
pos = torch.tensor(
    [[0.00,  0.00,  0.12],
     [0.00,  0.76, -0.48],
     [0.00, -0.76, -0.48]]
)
elems = torch.tensor([8, 1, 1], dtype=torch.long)
data = AtomsGraph(elems=elems, pos=pos)
print(data)

AtomsGraph(elems=[3], pos=[3, 3], n_atoms=[1], cell=[1, 3, 3])


In [2]:
# Pt FCC crystal
pos = torch.tensor(
    [[0.00, 0.00, 0.00],
     [1.96, 1.96, 0.00],
     [1.96, 0.00, 1.96],
     [0.00, 1.96, 1.96]]
)
elems = torch.tensor([78, 78, 78, 78], dtype=torch.long)
cell = torch.tensor(
    [[3.92, 0.00, 0.00],
    [0.00, 3.92, 0.00],
    [0.00, 0.00, 3.92]]
)
data = AtomsGraph(elems=elems, pos=pos, cell=cell)
print(data)
# Convert to ASE Atoms object
atoms = data.to_ase()

AtomsGraph(elems=[4], pos=[4, 3], cell=[1, 3, 3], n_atoms=[1])
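
To make the role of the unit cell matrix concrete, the following is a minimal pure-Python sketch that computes the cell volume with the scalar triple product. It assumes the rows of `cell` are the lattice vectors (the convention ASE uses); the helper `cell_volume` is a hypothetical name for illustration and is not part of `aml`.

```python
def cell_volume(cell):
    """Volume spanned by three lattice vectors given as the rows of `cell`.

    Assumption: rows are lattice vectors (ASE convention). Uses |a . (b x c)|.
    """
    a, b, c = cell
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(a[i] * cross[i] for i in range(3)))


# Same cubic Pt cell as in the cell above (lengths in Angstrom)
cell = [
    [3.92, 0.00, 0.00],
    [0.00, 3.92, 0.00],
    [0.00, 0.00, 3.92],
]
n_atoms = 4

volume = cell_volume(cell)
print(f"Cell volume: {volume:.3f} A^3")
print(f"Volume per atom: {volume / n_atoms:.3f} A^3")
```

For a cubic cell this reduces to the cube of the lattice constant; the triple product also covers non-orthogonal cells.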


Any other property (e.g., energy) can be attached to the `data` object.

In [3]:
data = AtomsGraph(elems=elems, pos=pos, cell=cell, energy=torch.tensor(-42.0))
print(data)

AtomsGraph(elems=[4], pos=[4, 3], cell=[1, 3, 3], energy=[1], n_atoms=[1])


Edges are stored in `edge_index`, which is the adjacency matrix in sparse format. A typical way to construct the edges is to connect each center atom to every atom in its neighbor list, i.e. every atom within a cutoff radius. This can be done with `AtomsGraph.build_neighborlist`.

In [4]:
# Build neighbor list
data.build_neighborlist(cutoff=5.0)
edge_index = data.edge_index
idx_center, idx_neighbor = edge_index[1], edge_index[0]
print("Center atom indices:", idx_center)
print("Neighbor atom indices:", idx_neighbor)
print("Offset vectors:", data.edge_shift) # Offset vectors towards neighbors outside of unit cell

Center atom indices: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
Neighbor atom indices: tensor([1, 1, 3, 1, 2, 3, 1, 2, 3, 1, 0, 1, 3, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0, 3,
        0, 1, 2, 3, 2, 1, 3, 3, 2, 3, 0, 3, 2, 1, 2, 0, 2, 2, 0, 2, 1, 2, 2, 3,
        1, 3, 3, 1, 0, 2, 0, 2, 1, 0, 0, 0, 0, 0, 2, 3, 3, 1, 3, 3, 2, 3, 2, 0,
        3, 2, 0, 0, 2, 3, 3, 3, 2, 1, 0, 2, 3, 1, 0, 3, 1, 0, 1, 0, 0, 2, 1, 1,
        3, 3, 1, 0, 0, 3, 2, 1, 0, 2, 0, 3, 3, 3, 0, 1, 3, 1, 0, 3, 2, 1, 0
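
The idea behind a periodic neighbor list can be sketched in plain Python, independent of `aml`. This brute-force version is only an illustration and is not how `build_neighborlist` is implemented. It assumes an orthogonal cell whose rows are the lattice vectors and positions that lie inside the cell. The function name `neighbor_list` and the integer `shift` tuples are hypothetical; the format of `data.edge_shift` may differ.

```python
import itertools
import math


def neighbor_list(pos, cell, cutoff):
    """Brute-force periodic neighbor list.

    Returns (center, neighbor, shift) triples, where `shift` counts how many
    lattice vectors the neighbor image is displaced by.
    Assumptions: orthogonal cell, rows of `cell` are lattice vectors,
    positions are wrapped inside the cell.
    """
    lengths = [math.sqrt(sum(x * x for x in row)) for row in cell]
    # Number of periodic images to scan along each lattice direction
    ranges = [range(-(int(cutoff // L) + 1), int(cutoff // L) + 2) for L in lengths]
    edges = []
    for i, p_i in enumerate(pos):
        for j, p_j in enumerate(pos):
            for shift in itertools.product(*ranges):
                if i == j and shift == (0, 0, 0):
                    continue  # an atom is not its own neighbor
                # Cartesian offset of the periodic image: shift @ cell
                offset = [sum(shift[k] * cell[k][d] for k in range(3)) for d in range(3)]
                image = [p_j[d] + offset[d] for d in range(3)]
                if math.dist(p_i, image) < cutoff:
                    edges.append((i, j, shift))
    return edges


# Ideal Pt FCC cell from the earlier example (Angstrom)
pos = [[0.00, 0.00, 0.00], [1.96, 1.96, 0.00], [1.96, 0.00, 1.96], [0.00, 1.96, 1.96]]
cell = [[3.92, 0.00, 0.00], [0.00, 3.92, 0.00], [0.00, 0.00, 3.92]]

edges = neighbor_list(pos, cell, cutoff=5.0)
print("Number of edges:", len(edges))  # 168
print("Neighbors of atom 0:", sum(1 for i, _, _ in edges if i == 0))  # 42
```

Each atom has 42 neighbors within 5.0 Å (12 + 6 + 24 over the first three FCC shells), so the four-atom cell yields 168 directed edges. The cost grows quadratically with the number of atoms, so this loop is only suitable for small cells.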

An `AtomsGraph` can be created directly from an `ase.Atoms` object. If pre-computed properties (e.g., energy) are available, they are read into `data` automatically.

In [5]:
import ase.build
from ase.calculators.emt import EMT

atoms = ase.build.bulk("Pt", "fcc", a=3.92, cubic=True)
atoms.rattle(0.05)
data = AtomsGraph.from_ase(atoms, neighborlist_cutoff=5.0) # Build neighbor list
print(data)
# If energy, force, and stress are present in ASE Atoms object, they will be automatically read
atoms.calc = EMT()
data = AtomsGraph.from_ase(atoms, neighborlist_cutoff=5.0) # Build neighbor list
print("Energy:", data.energy) # eV
print("Forces:", data.force) # eV/A
print("Stress:", data.stress) # eV/A^3

AtomsGraph(n_atoms=[1], elems=[4], pos=[4, 3], cell=[1, 3, 3], batch=[4], edge_index=[2, 168], edge_shift=[168, 3])
Energy: tensor([0.0335])
Forces: tensor([[ 0.2283, -0.0933, -0.4044],
        [-0.1799,  0.2257,  0.1325],
        [-0.2301, -0.4467,  0.2327],
        [ 0.1817,  0.3143,  0.0392]])
Stress: tensor([[[-0.0073,  0.0004,  0.0012],
         [ 0.0004, -0.0061, -0.0003],
         [ 0.0012, -0.0003, -0.0082]]])


Multiple `AtomsGraph` objects can be combined into a single batch using the `Batch` class from `torch_geometric`.

In [6]:
from torch_geometric.data import Batch

atoms_1 = ase.build.bulk("Pt", "fcc", a=3.92, cubic=True)
atoms_2 = ase.build.bulk("Si", "diamond", a=5.43, cubic=True)

data_1 = AtomsGraph.from_ase(atoms_1, neighborlist_cutoff=5.0)
data_2 = AtomsGraph.from_ase(atoms_2, neighborlist_cutoff=5.0)

batch = Batch.from_data_list([data_1, data_2])
print(batch)
print(batch.batch) # 0~3: data_1, 4~11: data_2
print("Unit cell:", batch.cell)

AtomsGraphBatch(n_atoms=[2], elems=[12], pos=[12, 3], cell=[2, 3, 3], batch=[12], edge_index=[2, 392], edge_shift=[392, 3], ptr=[3])
tensor([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
Unit cell: tensor([[[3.9200, 0.0000, 0.0000],
         [0.0000, 3.9200, 0.0000],
         [0.0000, 0.0000, 3.9200]],

        [[5.4300, 0.0000, 0.0000],
         [0.0000, 5.4300, 0.0000],
         [0.0000, 0.0000, 5.4300]]])
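
The bookkeeping behind `batch.batch` and `ptr` can be illustrated without `torch`. The sketch below rebuilds both from the per-structure atom counts and shows how per-graph edge indices are shifted into global node indices when graphs are concatenated, which is what `torch_geometric` does for `edge_index`. The helper names `batch_bookkeeping` and `offset_edges` and the toy edge lists are hypothetical.

```python
from itertools import accumulate


def batch_bookkeeping(n_atoms):
    """Return (batch, ptr) for graphs with the given atom counts.

    batch[k] is the graph index of global atom k;
    ptr[g]:ptr[g + 1] is the slice of global atoms belonging to graph g.
    """
    ptr = [0, *accumulate(n_atoms)]
    batch = [g for g, n in enumerate(n_atoms) for _ in range(n)]
    return batch, ptr


def offset_edges(edge_lists, ptr):
    """Shift per-graph (i, j) index pairs into global node indices."""
    return [
        (i + ptr[g], j + ptr[g])
        for g, graph_edges in enumerate(edge_lists)
        for i, j in graph_edges
    ]


# Pt cell with 4 atoms and Si cell with 8 atoms, as in the batch above
batch, ptr = batch_bookkeeping([4, 8])
print(batch)  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(ptr)    # [0, 4, 12]

# Toy edge lists: one bond in each graph, stored in both directions
merged = offset_edges([[(0, 1), (1, 0)], [(0, 7), (7, 0)]], ptr)
print(merged)  # [(0, 1), (1, 0), (4, 11), (11, 4)]
```

Per-graph quantities such as `cell` are stacked along a new leading dimension instead, which is why `cell` has shape `[2, 3, 3]` in the batch.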
