# Handling LH5 data

LEGEND stores its data in [HDF5](https://www.hdfgroup.org/solutions/hdf5) format, a high-performance data format becoming popular in experimental physics. LEGEND Data Objects (LGDO) are represented as HDF5 objects according to a custom specification, documented [here](https://legend-exp.github.io/legend-data-format-specs/dev/hdf5).

## Reading data from disk

Let's start by downloading a small test LH5 file with the [legend-testdata](https://pypi.org/project/legend-testdata/) package (the download may take a while, depending on your internet connection):

In [None]:
from legend_testdata import LegendTestData

ldata = LegendTestData()
lh5_file = ldata.get_path("lh5/LDQTA_r117_20200110T105115Z_cal_geds_raw.lh5")

We can use `pygama.lgdo.lh5_store.ls()` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.ls) to inspect the file contents:

In [None]:
from pygama.lgdo import ls

ls(lh5_file)

This particular file contains an HDF5 group (groups behave like directories). The second argument of `ls()` can be used to inspect a group: with a trailing `/` the contents of the group are listed, while without it only the group name is returned, if it exists:

In [None]:
ls(lh5_file, "geds/")  # returns ['geds/raw'], which is a group again
ls(lh5_file, "geds/raw/")
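
The trailing-slash rule can be illustrated without touching a file. The following is a minimal sketch on a hypothetical nested dictionary; `tree` and `ls_sketch` are illustrative names, not part of pygama, and this is not how `ls()` is implemented:

```python
# Hypothetical in-memory stand-in for the file hierarchy (nothing is read from disk).
tree = {"geds": {"raw": {"timestamp": None, "energy": None}}}


def ls_sketch(tree, path=""):
    """Mimic the trailing-slash behavior described above on a nested dict."""
    list_children = path == "" or path.endswith("/")
    parts = [p for p in path.split("/") if p]
    node = tree
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return []  # the requested name does not exist
        node = node[part]
    prefix = "/".join(parts)
    if not list_children:
        return [prefix]  # no trailing slash: only the name itself
    if not isinstance(node, dict):
        return []  # a leaf has no contents to list
    return [f"{prefix}/{key}" if prefix else key for key in node]


print(ls_sketch(tree, "geds"))   # the group name only
print(ls_sketch(tree, "geds/"))  # the contents of the group
```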

<div class="alert alert-info">

**Note:** As an alternative to `ls()`, `show()` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.show) prints a tree representation of the LH5 file contents (with LGDO types) on screen:

</div>

In [None]:
from pygama.lgdo import show

show(lh5_file)

The group contains several LGDOs. Let's read them in memory. We start by initializing an `LH5Store` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.LH5Store) object:

In [None]:
from pygama.lgdo import LH5Store

store = LH5Store()

`read_object()` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.LH5Store.read_object) reads an LGDO from disk and returns a tuple holding the object in memory and the number of rows read, if the object has such a property. Let's try to read `geds/raw`:

In [None]:
store.read_object("geds/raw", lh5_file)

As shown by the type signature, it is interpreted as a `Table` with 100 rows. Its contents (or "columns") can therefore be viewed as LGDO objects of the same length. For example, `timestamp`:

In [None]:
obj, n_rows = store.read_object("geds/raw/timestamp", lh5_file)
obj

is an LGDO `Array` with 100 elements.

`read_object()` also supports more selective reads. For example, let's read only 10 rows starting from row 15 (that is, rows 15 to 24):

In [None]:
obj, n_rows = store.read_object("geds/raw/timestamp", lh5_file, start_row=15, n_rows=10)
print(obj)
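
To make the row arithmetic explicit, here is a small sketch of which row indices a `start_row`/`n_rows` request covers. The helper `selected_rows` is hypothetical, and truncating the request at the end of the data is an assumption made for this illustration:

```python
def selected_rows(start_row, n_rows, total_rows):
    """Row indices covered by a (start_row, n_rows) request on total_rows rows."""
    # Assumption for this sketch: a request never runs past the last row.
    stop = min(start_row + n_rows, total_rows)
    return list(range(start_row, stop))


rows = selected_rows(start_row=15, n_rows=10, total_rows=100)
print(rows[0], rows[-1], len(rows))  # 15 24 10
```

The range is half-open: row `start_row + n_rows` itself is not included.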

Or, let's read only columns `timestamp` and `energy` from the `geds/raw` table and rows `[1, 3, 7, 9, 10, 15]`:

In [None]:
obj, n_rows = store.read_object(
    "geds/raw", lh5_file, field_mask=("timestamp", "energy"), idx=[1, 3, 7, 9, 10, 15]
)
print(obj)
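
Conceptually, `field_mask` selects columns and `idx` selects rows. The sketch below shows that combination on a hypothetical table stored as a plain dictionary of equal-length lists; the column values and the `select` helper are made up for illustration and do not reflect pygama internals:

```python
# Hypothetical table: each column is a list with one value per row.
table = {
    "timestamp": [float(i) for i in range(20)],
    "energy": [100 + i for i in range(20)],
    "channel": [i % 4 for i in range(20)],
}


def select(table, field_mask, idx):
    """Keep only the columns named in field_mask and the rows listed in idx."""
    return {name: [table[name][i] for i in idx] for name in field_mask}


subset = select(table, field_mask=("timestamp", "energy"), idx=[1, 3, 7, 9, 10, 15])
print(sorted(subset))    # ['energy', 'timestamp']
print(subset["energy"])  # [101, 103, 107, 109, 110, 115]
```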

As you might have noticed, `read_object()` loads all the requested data in memory at once. This can be a problem when dealing with large datasets. `LH5Iterator` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.LH5Iterator) makes it possible to handle data one chunk at a time (sequentially) to avoid running out of memory:

In [None]:
from pygama.lgdo import LH5Iterator

for lh5_obj, entry, n_rows in LH5Iterator(lh5_file, "geds/raw/energy", buffer_len=20):
    print(f"entry {entry}, energy = {lh5_obj} ({n_rows} rows)")
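
The idea behind chunked reading can be sketched with a plain generator. This is only an illustration of the pattern under simple assumptions (data already in a list, hypothetical helper `iter_chunks`), not the way `LH5Iterator` is implemented:

```python
def iter_chunks(values, buffer_len):
    """Yield (chunk, entry, n_rows) for consecutive chunks of at most buffer_len rows."""
    for entry in range(0, len(values), buffer_len):
        chunk = values[entry : entry + buffer_len]
        yield chunk, entry, len(chunk)


energies = list(range(100))  # hypothetical stand-in for a 100-row column
chunks = list(iter_chunks(energies, buffer_len=20))
for chunk, entry, n_rows in chunks:
    print(f"entry {entry}: {n_rows} rows")
```

With 100 rows and `buffer_len=20` the loop body runs five times. When the number of rows is not a multiple of `buffer_len`, the last chunk is shorter, which is why the row count is reported alongside each chunk.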

## Writing data to disk

Let's start by creating some LGDOs:

In [None]:
from pygama.lgdo import Array, Scalar, WaveformTable
import numpy as np

rng = np.random.default_rng(12345)

scalar = Scalar("made with pygama!")
array = Array(rng.random(size=10))
wf_table = WaveformTable(values=rng.integers(low=1000, high=5000, size=(10, 1000)))

The `write_object()` [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.LH5Store.write_object) method of `LH5Store` makes it possible to write LGDO objects to disk. Let's start by writing `scalar` with name `message` to a file named `my_objects.lh5` in the current directory:

In [None]:
store = LH5Store()

store.write_object(
    scalar, name="message", lh5_file="my_objects.lh5", wo_mode="overwrite_file"
)

Let's now inspect the file contents:

In [None]:
from pygama.lgdo import show

show("my_objects.lh5")

The string object has been written at the root of the file `/`. Let's now also write `array` and `wf_table`, this time in an HDF5 group called `closet`:

In [None]:
store.write_object(array, name="numbers", group="closet", lh5_file="my_objects.lh5")
store.write_object(
    wf_table, name="waveforms", group="closet", lh5_file="my_objects.lh5"
)
show("my_objects.lh5")

Everything looks right!

<div class="alert alert-info">

**Note:** `write_object()` allows for more advanced usage, like writing only some rows of the input object or appending to existing array-like structures. Have a look at the [[docs]](https://pygama.readthedocs.io/en/stable/api/pygama.lgdo.html#pygama.lgdo.lh5_store.LH5Store.write_object) for more information.

</div>