<b>The basics:</b> converting a trajectory to HDF5

Let's start with the trajectory included in this package's test files. The file is a serialized `emmet-core` `TaskDoc`, which stores the trajectory information in its ionic steps.

In [11]:
from pathlib import Path

from emmet.core.tasks import TaskDoc
from monty.serialization import loadfn

from pymatgen.io import mp_archival
from pymatgen.io.mp_archival.trajectory import TrajArchive

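# This path only resolves in a source checkout of the repository, where tests/
# sits three directory levels above the mp_archival package directory.
task_doc_path = (
    Path(mp_archival.__file__).resolve().parent
    / "../../../tests/test_files/mp-1201400_task_doc.json.gz"
)
# loadfn reads the gzipped JSON into a dict, which is unpacked into the TaskDoc model
task_doc = TaskDoc(**loadfn(task_doc_path))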

We use the `from_task_doc` constructor of `TrajArchive` to build an archive from this document, then write it to an HDF5 file with `to_archive`:

In [14]:
archiver = TrajArchive.from_task_doc(task_doc)
archiver.to_archive("traj.h5")
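To check what `to_archive` actually wrote, you can walk the file with plain `h5py`. The helper `summarize_h5` below is a hypothetical name, not part of `TrajArchive`, and the toy `positions` and `energies` datasets are purely illustrative: this sketch makes no assumption about the dataset names or layout that `TrajArchive` uses. `File.visititems` calls the callback once for every group and dataset below the root; the helper keeps only the datasets.

```python
import h5py
import numpy as np


def summarize_h5(path):
    """Return {dataset_path: (shape, dtype)} for every dataset in an HDF5 file."""
    summary = {}

    def visit(name, obj):
        # Called for every group and dataset; returning None keeps the walk going
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Demo on a small toy file (hypothetical contents, not a TrajArchive layout)
with h5py.File("toy.h5", "w") as f:
    f.create_dataset("positions", data=np.zeros((3, 2, 3)))
    f.create_dataset("energies", data=np.arange(3, dtype=np.float64))

for name, (shape, dtype) in summarize_h5("toy.h5").items():
    print(name, shape, dtype)
```

Calling `summarize_h5("traj.h5")` on the archive written above lists its real datasets, which is a quick way to see how the trajectory is laid out on disk.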

We can convert this HDF5 file to a `pymatgen` or `ase` `Trajectory` object with a single line of code each:

In [15]:
pmg_traj = TrajArchive.to_pymatgen_trajectory("traj.h5")
ase_traj = TrajArchive.to_ase_trajectory("traj.h5")

If you already have a `pymatgen` trajectory, you can also convert it directly to a `TrajArchive`:

In [23]:
traj_copy = TrajArchive.from_pymatgen_trajectory(pmg_traj)

To achieve better data compression, you might want to store trajectories in batches, i.e. several trajectories in one HDF5 file. To do this, use the `to_group` method of `TrajArchive`, which writes a trajectory to an arbitrary hierarchical position (the `group_key`) in an open HDF5 file.

In [27]:
import h5py

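# Mode "w" truncates traj.h5, so the archive written earlier is replaced;
# open with "a" instead to add groups to an existing file.
with h5py.File("traj.h5", "w") as f:
    for idx in range(1, 6):
        # Each copy is written under its own top-level key: copy_1, ..., copy_5
        traj_copy.to_group(f, group_key=f"copy_{idx}")
    print(f.keys())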

<KeysViewHDF5 ['copy_1', 'copy_2', 'copy_3', 'copy_4', 'copy_5']>
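Once several trajectories share a file, you will usually want to enumerate them again. The sketch below uses only plain `h5py` (no `TrajArchive` calls) on a toy file; `list_trajectory_groups`, `toy_batch.h5`, and the `energies` dataset are hypothetical stand-ins. The `compression="gzip"` setting only illustrates that HDF5 can compress each dataset individually; it is not a statement about how `TrajArchive` configures compression. The sketch also shows that mode `"a"` appends to an existing file, whereas `"w"` starts from an empty one.

```python
import h5py
import numpy as np


def list_trajectory_groups(path):
    """Return top-level group names, sorted by integer suffix (copy_2 before copy_10)."""
    with h5py.File(path, "r") as f:
        names = [key for key, obj in f.items() if isinstance(obj, h5py.Group)]
    # Plain string sorting would put "copy_10" before "copy_2", so sort on the number.
    # Assumes every group name ends in "_<integer>", as in the loop above.
    return sorted(names, key=lambda name: int(name.rsplit("_", 1)[1]))


# Toy stand-in for a batched file: each group holds one gzip-compressed dataset
with h5py.File("toy_batch.h5", "w") as f:  # "w" starts from an empty file
    for idx in range(1, 11):
        grp = f.create_group(f"copy_{idx}")
        grp.create_dataset("energies", data=np.linspace(0.0, 1.0, 50), compression="gzip")

# Reopening in "a" mode keeps the existing groups and appends a new one
with h5py.File("toy_batch.h5", "a") as f:
    f.create_group("copy_11").create_dataset("energies", data=np.zeros(50), compression="gzip")

print(list_trajectory_groups("toy_batch.h5"))
```

Assuming `to_group` creates HDF5 groups, as its `group_key` argument suggests, the same helper applied to `traj.h5` should return the five `copy_*` keys in numeric order.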
