# Tutorial-0

This first exercise demonstrates how to create an empty HDF5eis file and add a single channel of data to it. To begin, let's import some necessary packages for this tutorial and create a directory where we can write some data files.

In [None]:
!pip install obspy

In [None]:
# Standard library imports
import io
import pathlib
import tempfile

# Third-party imports
import hdf5eis
import obspy.clients.fdsn
import pandas as pd


tmp_dir = tempfile.TemporaryDirectory()
OUTPUT_DIR = pathlib.Path(tmp_dir.name)

Core functionality for manipulating HDF5eis files is accessed via the `hdf5eis.File` class.

In [None]:
hdf5eis.File?

Let's create an empty HDF5eis file. The default `mode` is `"r"`, so to create a new file we need to specify `mode="w"` or `mode="a"`. Note that the `hdf5eis.File` class inherits from the `h5py.File` class and passes `*args` and `**kwargs` to its super-class initializer. Any valid positional or keyword arguments for the `h5py.File` initializer are therefore valid for `hdf5eis.File` as well.

In [None]:
file_out = hdf5eis.File(OUTPUT_DIR.joinpath("my_first_file.hdf5"), mode='w', overwrite=True)
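
The argument forwarding described above can be sketched without HDF5eis or h5py installed. `BaseFile` and `WrapperFile` are hypothetical stand-ins, not the real classes, and `libver` is used only as an example of an extra keyword. The sketch shows one way a subclass can keep a keyword of its own (here `overwrite`) and hand everything else to the super-class initializer.

```python
class BaseFile:
    """Hypothetical stand-in for a base class such as h5py.File."""

    def __init__(self, name, mode="r", **kwargs):
        self.name = name
        self.mode = mode
        self.extra = kwargs


class WrapperFile(BaseFile):
    """Hypothetical subclass that forwards arguments it does not use."""

    def __init__(self, *args, overwrite=False, **kwargs):
        # Keep the keyword that this subclass defines for itself...
        self.overwrite = overwrite
        # ...and pass everything else to the super-class initializer.
        super().__init__(*args, **kwargs)


handle = WrapperFile("example.hdf5", mode="w", overwrite=True, libver="latest")
print(handle.name, handle.mode, handle.overwrite, handle.extra)
```

Because unrecognized arguments pass straight through, the subclass does not need to list every option that the base class accepts.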

`hdf5eis.File` instances have three properties (`timeseries`, `metadata`, and `products`), each of which provides functionality to manipulate the group of the same name.

In [None]:
file_out.timeseries?

In [None]:
file_out.metadata?

In [None]:
file_out.products?

The `hdf5eis.File.timeseries` property has an `index` property that records the contents of the group. At present, it is empty.

In [None]:
file_out.timeseries.index

Let's download some timeseries data to add to the file. We will download one hour of data from IRIS for channel `AZ.BZN..HHZ`.

In [None]:
network = "AZ"
station = "BZN"
location = ""
channel = "HHZ"
start_time = obspy.UTCDateTime("2021-01-01T00:00:00Z")
end_time = obspy.UTCDateTime("2021-01-01T01:00:00Z")
client = obspy.clients.fdsn.Client()
stream = client.get_waveforms(
    network,
    station,
    location,
    channel,
    start_time,
    end_time
)

We can add the data using the `add` method.

In [None]:
file_out.timeseries.add?

In [None]:
for trace in stream:
    file_out.timeseries.add(
        trace.data,
        str(trace.stats.starttime),
        trace.stats.sampling_rate,
        tag=".".join((network, station, location, channel))
    )

Now we can see that there is a corresponding row in the timeseries index.

In [None]:
file_out.timeseries.index
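
The `tag` used above is built by joining the network, station, location, and channel codes with dots. A minimal sketch follows; `make_tag` is a hypothetical helper and not part of HDF5eis. It shows why an empty location code produces the double dot in `AZ.BZN..HHZ`.

```python
def make_tag(network, station, location, channel):
    """Join SEED-style codes into a dot-separated tag (hypothetical helper)."""
    return ".".join((network, station, location, channel))


tag = make_tag("AZ", "BZN", "", "HHZ")
print(tag)

# Splitting on dots recovers the four codes, including the empty location.
codes = tag.split(".")
print(codes)
```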

We can retrieve the data now using a hybrid of dictionary-like and array-slicing syntax.

In [None]:
super_gather = file_out.timeseries["AZ.BZN..HHZ", "2021-01-01T00:00:00Z": "2021-01-01T00:30:00Z"]
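
To see what Python does with this hybrid syntax, the sketch below uses a hypothetical `IndexSpy` class that simply returns whatever key it receives. It illustrates the general language mechanism only; how HDF5eis interprets the key internally is not shown here.

```python
class IndexSpy:
    """Hypothetical class that returns the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


spy = IndexSpy()
key = spy["AZ.BZN..HHZ", "2021-01-01T00:00:00Z": "2021-01-01T00:30:00Z"]

# Python packs the comma-separated parts into a tuple: (tag, slice).
tag, time_window = key
print(type(key).__name__)
print(tag)
print(time_window.start, time_window.stop)
```

The tag arrives as a plain string and the time range as a `slice` object whose `start` and `stop` are the two time strings.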

Timeseries data are returned as a dictionary in which each key is a `tag` and each value is a list of the `hdf5eis.Gather` objects associated with that tag.

In [None]:
super_gather

Each `hdf5eis.Gather` object has a number of descriptive properties (a subset is demonstrated here).

In [None]:
gather = super_gather["AZ.BZN..HHZ"][0]
print("gather.data:", gather.data)           # The raw data array
print("gather.start_time:", gather.start_time) # The UTC time of the first temporal sample.
print("gather.times:", gather.times)         # The UTC time of each temporal sample.
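
The relationship between the start time, the sampling rate, and the per-sample times can be sketched with the standard library alone. `sample_times` is a hypothetical helper, and the 100 Hz sampling rate is an assumed value chosen for illustration; it is not read from the file.

```python
from datetime import datetime, timedelta, timezone


def sample_times(start_time, sampling_rate, n_samples):
    """Return the UTC time of each sample (hypothetical helper)."""
    step = timedelta(seconds=1.0 / sampling_rate)
    return [start_time + i * step for i in range(n_samples)]


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
times = sample_times(start, sampling_rate=100.0, n_samples=4)
for t in times:
    print(t.isoformat())
```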

A dictionary is returned when retrieving data because regular expressions are permitted when specifying the `tag` value. To demonstrate this, let's add data for another station to the file.

In [None]:
station = "CRY"
stream = client.get_waveforms(
    network,
    station,
    location,
    channel,
    start_time,
    end_time
)
for trace in stream:
    file_out.timeseries.add(
        trace.data,
        str(trace.stats.starttime),
        trace.stats.sampling_rate,
        tag=".".join((network, station, location, channel))
    )
    
file_out.timeseries.index

Now we can specify a regular expression to select data from both stations.

In [None]:
super_gather = file_out.timeseries["AZ.*", "2021-01-01T00:00:00Z": "2021-01-01T00:30:00Z"]
super_gather
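
How a pattern such as `AZ.*` selects tags can be sketched with Python's `re` module. This assumes matching against the whole tag with `fullmatch`; the exact matching rule HDF5eis applies is not confirmed here. The third tag is included only for contrast.

```python
import re

tags = ["AZ.BZN..HHZ", "AZ.CRY..HHZ", "CI.PASC..HHZ"]

# "AZ.*" means "AZ" followed by any characters.
pattern = re.compile("AZ.*")
selected = [tag for tag in tags if pattern.fullmatch(tag)]
print(selected)

# An unescaped "." matches any character; escape it to match a literal dot.
strict = re.compile(r"AZ\..*\.\.HHZ")
strict_selected = [tag for tag in tags if strict.fullmatch(tag)]
print(strict_selected)
```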

Now that we can add and retrieve timeseries data, let's get the corresponding station metadata from IRIS.

In [None]:
# FDSN station queries accept comma-separated station codes,
# so a single request returns the metadata for both stations.
inventory = client.get_stations(
    network=network,
    station="BZN,CRY",
    location=location,
    channel=channel
)

We can write this metadata to STATIONXML format using a buffer.

In [None]:
buffer = io.BytesIO()
inventory.write(buffer, "STATIONXML")
buffer.seek(0)
stationxml = buffer.read()

# stationxml is now a stream of UTF-8 encoded bytes.
stationxml
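
The `buffer.seek(0)` call matters because writing leaves the buffer position at the end of the data. The sketch below shows this with a small made-up payload instead of real STATIONXML.

```python
import io

buffer = io.BytesIO()
buffer.write('<station code="BZN"/>'.encode("utf-8"))

# After writing, the position is at the end, so read() returns nothing.
at_end = buffer.read()

# Rewind to the start before reading the contents back.
buffer.seek(0)
payload = buffer.read()

# getvalue() returns the full contents regardless of the position.
full = buffer.getvalue()
print(at_end, payload, full)
```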

And we can add this byte stream to the `/metadata` group using the `hdf5eis.File.metadata.add()` method.

In [None]:
file_out.metadata.add(stationxml, "network_as_UTF8_STATIONXML", fmt='STATIONXML')

We can retrieve this metadata using dictionary-like syntax.

In [None]:
string, fmt = file_out.metadata["network_as_UTF8_STATIONXML"]
print(string)

In [None]:
fmt

And we can parse the data using `obspy.read_inventory()`.

In [None]:
buffer = io.BytesIO(string.encode("UTF-8"))
obspy.read_inventory(buffer, format=fmt)

Finally, we can convert the metadata to a `pandas.DataFrame`.

In [None]:
dataf = pd.DataFrame(
    [
        [
            network.code, 
            station.code, 
            station.latitude, 
            station.longitude, 
            station.elevation
        ]
        for network in inventory for station in network
    ],
    columns=["network", "station", "latitude", "longitude", "elevation"]
)
dataf

In [None]:
file_out.metadata.add(dataf, "network_geometry_as_table")
table, fmt = file_out.metadata["network_geometry_as_table"]
table

The `hdf5eis.File.products` property behaves exactly like the `metadata` property. Let's finish responsibly by closing our file.

In [None]:
file_out.close()

Note that using a context manager (a `with` statement) is the canonical way of opening and closing HDF5eis files.

In [None]:
with hdf5eis.File(OUTPUT_DIR.joinpath("my_first_file.hdf5"), mode="r") as file_in:
    table, fmt = file_in.metadata["network_geometry_as_table"]
    
table
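
One reason to prefer a `with` block is that the file is closed even when an error occurs inside it. The sketch below uses Python's built-in `open()` as a stand-in. Files opened with `hdf5eis.File` are assumed to behave the same way, since the class inherits from `h5py.File`.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "example.txt")

    # The file is closed when the block exits, even if an exception is raised.
    try:
        with open(path, "w") as handle:
            handle.write("partial")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    closed_after_error = handle.closed
    contents = path.read_text()

print(closed_after_error, contents)
```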

That's it! Those are the basics of adding data to, and retrieving it from, an HDF5eis file. In the next tutorial, we will learn how to add and retrieve multidimensional arrays and use HDF5eis external linking functionality.