# How to write data to an H5 file

Here we are going to use a Python library called h5py to write our data to a file.

We will then load the file back in and analyze its structure.

First, we import the libraries that we need.

In [1]:
import os
import sys
import h5py
import pandas as pd
import numpy as np

# The imports below are only needed to analyze the file's structure. There's no need to import
# them if you're just writing out the H5 file.
sys.path.append('/home/klinetry/Desktop/Phobos3/')
import PhobosFunctions as PF
from ipytree import Tree, Node

We will create a series of pandas DataFrames to store in an H5 file. Replace this step with whatever data you want to store.

To start, I'm just going to make a list of DataFrames.

In [5]:
dflist = []
for i in range(4):
    y = pd.DataFrame()
    y['Time'] = np.linspace(0,1000,500)
    y['Altitude'] = -.5*9.8*y['Time']**2
    dflist.append(y)
dflist

[            Time      Altitude
 0       0.000000 -0.000000e+00
 1       2.004008 -1.967864e+01
 2       4.008016 -7.871454e+01
 3       6.012024 -1.771077e+02
 4       8.016032 -3.148582e+02
 ..           ...           ...
 495   991.983968 -4.821758e+06
 496   993.987976 -4.841259e+06
 497   995.991984 -4.860800e+06
 498   997.995992 -4.880380e+06
 499  1000.000000 -4.900000e+06
 
 [500 rows x 2 columns], ...]

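Now that we have our data, we are ready to write it to an H5 file.

An H5 file stores data as a hierarchy of groups and datasets, much like folders and files. The following diagram shows a basic outline of how H5 stores its data. Note that "group" and "dataset" here refer to the layout in the diagram: when the file is written with h5py below, each "Dataset" becomes an h5py subgroup, and each header under it (Time, Altitude) becomes an HDF5 dataset holding one column.

NOTE: The diagram code has no bearing on writing data to H5 files; it is just a visual aid.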

In [26]:
tree = Tree(stripes=True)
fnode = Node('MyFile.h5')
tree.add_node(fnode)

# Two top-level groups, each holding two datasets; every dataset has a
# Time and an Altitude header, each pointing at a pandas Series
for g in (1, 2):
    group_node = Node(f'Top Level Group {g}')
    fnode.add_node(group_node)
    for d in (2*g - 1, 2*g):
        dset_node = Node(f'Dataset {d}')
        group_node.add_node(dset_node)
        for header in ('Time', 'Altitude'):
            header_node = Node(header)
            header_node.add_node(Node(f'Pandas Series[{header}]'))
            dset_node.add_node(header_node)

display(tree)


Technically speaking, H5 files can have an arbitrary number of groups and subgroups. However, to be compliant with Ares, NO H5 FILE SHOULD HAVE MORE THAN 1 LAYER OF GROUPS. So the tree above is as hierarchical as the file should ever be.

NOTE:  You can put as many datasets under the groups as you want.  Here we just have 2 under each group.
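The one-layer rule can be checked mechanically. Below is a minimal sketch of a hypothetical helper, `check_layout`, which is not part of h5py, PhobosFunctions, or Ares. It assumes the layout used in this notebook: the file root holds only groups, each of those holds only "Dataset" subgroups, and each "Dataset" holds only HDF5 datasets (the headers). If your Ares convention differs, adjust the checks accordingly.

```python
import os
import tempfile

import h5py
import numpy as np


def check_layout(path):
    """Hypothetical checker for the file -> group -> dataset -> header layout
    assumed in this notebook. Returns a list of problems; empty means OK."""
    problems = []
    with h5py.File(path, 'r') as hf:
        for gname, group in hf.items():
            if not isinstance(group, h5py.Group):
                problems.append(f'/{gname} is not a group')
                continue
            for dname, dset in group.items():
                if not isinstance(dset, h5py.Group):
                    problems.append(f'/{gname}/{dname} is not a subgroup')
                    continue
                for hname, header in dset.items():
                    # Headers must be plain datasets; any further group nests too deep
                    if not isinstance(header, h5py.Dataset):
                        problems.append(f'/{gname}/{dname}/{hname} nests too deep')
    return problems


# Usage: build one compliant and one non-compliant file in a temp directory
tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, 'good.h5')
bad = os.path.join(tmpdir, 'bad.h5')

with h5py.File(good, 'w') as hf:
    dset = hf.create_group('Top Level Group 1').create_group('Dataset 1')
    dset.create_dataset('Time', data=np.arange(3.0))

with h5py.File(bad, 'w') as hf:
    deep = hf.create_group('Top Level Group 1').create_group('Dataset 1').create_group('Extra')
    deep.create_dataset('Time', data=np.arange(3.0))

print(check_layout(good))  # []
print(check_layout(bad))   # ['/Top Level Group 1/Dataset 1/Extra nests too deep']
```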

Now let's start writing the file, emulating the structure above.
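The writing cell relies on a few h5py behaviors: `create_group` returns a group that can be nested further, `create_dataset` stores an array under a name, and stored objects can be addressed with slash-separated paths, much like a filesystem. Here is a minimal standalone sketch of those calls; the file name `example.h5` and its temporary location are assumptions for illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'example.h5')  # hypothetical throwaway file

with h5py.File(path, 'w') as hf:                        # 'w' creates or overwrites the file
    group = hf.create_group('Top Level Group 1')        # a group works like a folder
    sub = group.create_group('Dataset 1')               # groups can contain other groups
    sub.create_dataset('Time', data=np.linspace(0, 1000, 500))  # a dataset stores an array

with h5py.File(path, 'r') as hf:
    t_dset = hf['Top Level Group 1/Dataset 1/Time']     # path-style access
    print(t_dset.shape, t_dset.dtype)                   # (500,) float64
    print(t_dset[:3])                                   # slicing reads only the requested part
    print(list(hf['Top Level Group 1'].keys()))         # ['Dataset 1']
```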

In [27]:
OutDirectory = os.path.join(os.path.expanduser('~'), 'Desktop')
with h5py.File(os.path.join(OutDirectory, 'MyFile.h5'), 'w') as hf:
    j = 0
    for i, df in enumerate(dflist):
        # Only create a new group when i is 0 or 2, because we want 2 datasets per group. This can
        # obviously be changed in your code, however you want to set up the data.
        if i % 2 == 0:
            j += 1
            group = hf.create_group(f'Top Level Group {j}')
        # Each "Dataset" is written as a subgroup of the current top-level group
        dset = group.create_group(f'Dataset {i+1}')
        # This is where the data gets written to the H5 file: one HDF5 dataset per column.
        # It needs to be under a group!
        for key in df:
            dset.create_dataset(key, data=df[key].to_numpy())
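The `i % 2` logic above hard-codes two datasets per group. If you would rather spell out the layout explicitly, one option is a nested dictionary mapping group names to dataset names to DataFrames. Below is a sketch of a hypothetical helper, `write_dataframes`, written under that assumption; it produces the same file -> group -> dataset -> header structure as the cell above, but writes to a temporary directory instead of the Desktop.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def write_dataframes(path, layout):
    """Hypothetical helper: layout is {group_name: {dataset_name: DataFrame}}.
    Each DataFrame column becomes one HDF5 dataset named after the column."""
    with h5py.File(path, 'w') as hf:
        for group_name, datasets in layout.items():
            group = hf.create_group(group_name)
            for dset_name, df in datasets.items():
                dset = group.create_group(dset_name)
                for column in df.columns:
                    # str() because HDF5 object names must be strings
                    dset.create_dataset(str(column), data=df[column].to_numpy())


# Usage: rebuild the four example DataFrames and place two under each group
t = np.linspace(0, 1000, 500)
frames = [pd.DataFrame({'Time': t, 'Altitude': -0.5 * 9.8 * t**2}) for _ in range(4)]
layout = {
    'Top Level Group 1': {'Dataset 1': frames[0], 'Dataset 2': frames[1]},
    'Top Level Group 2': {'Dataset 3': frames[2], 'Dataset 4': frames[3]},
}
path = os.path.join(tempfile.mkdtemp(), 'MyFile.h5')
write_dataframes(path, layout)

with h5py.File(path, 'r') as hf:
    print(list(hf.keys()))                       # ['Top Level Group 1', 'Top Level Group 2']
    print(list(hf['Top Level Group 2'].keys()))  # ['Dataset 3', 'Dataset 4']
```

The dictionary makes the grouping visible at a glance, at the cost of building the mapping yourself; the loop in the original cell is shorter when the grouping follows a simple rule.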

        
    

Now let's load the file back in and check that it has the structure we created, using PhobosFunctions.

In [28]:
fpath = os.path.join(OutDirectory,'MyFile.h5')
groups = PF.get_groups(fpath)
print(f'MyFile.h5 contains the following groups {groups}\n\n')
for group in groups:
    dsets = PF.get_dsets(fpath,group)
    print(f'\t{group} contains the following datasets {dsets}')
    for dset in dsets:
        headers = PF.get_headers(fpath,group,dset)
        print(f'\t\t{dset} contains the following headers {headers}')
    print('\n')

MyFile.h5 contains the following groups ['Top Level Group 1', 'Top Level Group 2']


	Top Level Group 1 contains the following datasets ['Dataset 1', 'Dataset 2']
		Dataset 1 contains the following headers ['Altitude', 'Time']
		Dataset 2 contains the following headers ['Altitude', 'Time']


	Top Level Group 2 contains the following datasets ['Dataset 3', 'Dataset 4']
		Dataset 3 contains the following headers ['Altitude', 'Time']
		Dataset 4 contains the following headers ['Altitude', 'Time']
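If PhobosFunctions is not available (it is imported from a local path on the author's machine), plain h5py can print the same structure and read a dataset back into a DataFrame. Below is a minimal sketch; `print_structure` and `read_dataset` are hypothetical helpers, and they assume the file -> group -> dataset -> header layout written above. Note that h5py lists the members of a group in alphabetical order by default, which would explain why `Altitude` appears before `Time` even though `Time` was written first; passing `track_order=True` to `h5py.File` (for the root group) or to `create_group` (for a subgroup) when writing preserves insertion order instead.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def print_structure(path):
    """Print every group and dataset in the file, indented by depth."""
    with h5py.File(path, 'r') as hf:
        def show(name, obj):
            depth = name.count('/')
            kind = 'group' if isinstance(obj, h5py.Group) else f'dataset {obj.shape}'
            print('\t' * depth + f'{name.split("/")[-1]} ({kind})')
        hf.visititems(show)  # calls show(name, obj) for every object below the root


def read_dataset(path, group, dset):
    """Hypothetical helper: rebuild one DataFrame from the headers under group/dset."""
    with h5py.File(path, 'r') as hf:
        sub = hf[group][dset]
        # [()] reads each HDF5 dataset fully into a NumPy array before the file closes
        return pd.DataFrame({header: sub[header][()] for header in sub})


# Usage on a small sample file with the same layout as MyFile.h5
path = os.path.join(tempfile.mkdtemp(), 'sample.h5')
t = np.linspace(0, 1000, 500)
with h5py.File(path, 'w') as hf:
    dset = hf.create_group('Top Level Group 1').create_group('Dataset 1')
    dset.create_dataset('Time', data=t)
    dset.create_dataset('Altitude', data=-0.5 * 9.8 * t**2)

print_structure(path)
df = read_dataset(path, 'Top Level Group 1', 'Dataset 1')
print(df.shape)  # (500, 2)
```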


