# HDF5 file management

This is a simple example of file management using the h5py Python
module, without having LMGC90 installed.

This notebook explains how to explore the file
within an interpreter, in order to discover its structure
and find the relevant data.


## General directions

The documentation of the h5py module can be found [here](http://docs.h5py.org/en/stable/)

The first thing is to open the file to explore its content :

In [None]:
import h5py
hfile = h5py.File('../lmgc90.h5', 'r')

The `hfile` object is a hierarchical structure, like a tree,
where each branch is called *a group* and each leaf of the
tree is *a dataset*.

The different groups can be explored by using the path of the group as a dictionary key, like this :

In [None]:
print( list( hfile.keys() ) )
print( list( hfile['Simulation'].keys() ) )

On reaching a dataset instead of a group, the same approach raises an exception:

In [None]:
print( list( hfile['Simulation/nb_record'] ) )
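One way to avoid this exception is to check whether an object is a group or a dataset before listing it. The sketch below does this on a small in-memory file built only for illustration: the names `toy`, `describe` and `kinds`, as well as the stored value, are assumptions and do not reflect the real content of an LMGC90 file.

```python
import h5py

# Toy in-memory file (nothing is written to disk) mimicking a tiny part of the layout
toy = h5py.File('toy.h5', 'w', driver='core', backing_store=False)
toy.create_dataset('Simulation/nb_record', data=3)

def describe(node):
    """Return 'group' or 'dataset' depending on the kind of HDF5 object."""
    if isinstance(node, h5py.Group):
        return 'group'
    if isinstance(node, h5py.Dataset):
        return 'dataset'
    return 'other'

kinds = {}
# visititems calls the function on every object found below the root
toy.visititems(lambda name, node: kinds.update({name: describe(node)}))
print(kinds)
toy.close()
```

The same `describe` function could be applied to any object obtained from `hfile` before deciding whether to list its keys or read its value.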

To get the value contained in a dataset use the following syntax :
```python
file_object['group_path/dataset_name'][()]
```

In [None]:
hfile['Simulation/nb_record'][()]

Because data are read from a binary file and h5py has its own way of typing them, you may encounter some unexpected behaviour. For a single integer, for example, it may be necessary to cast the value to the native Python type:

In [None]:
nb_steps = hfile['Simulation/nb_record'][()]
print( type(nb_steps) )
nb_steps = int( nb_steps )
print( type(nb_steps) )

## General layout

At the root of the file there are :

* *version* : dataset storing the version number of the file
* *Simulation* : group holding some variables of the whole computation
* *Evolution* : group holding data at a given time step
* *Help* : group holding a full description of the data layout of *Evolution*

The aim is to find the relevant data by exploring the groups.
In case of uncertainty, the *Help* group provides a way to
identify what is stored in the file.

Let's explore the file step by step until information on the
interactions is obtained :

In [None]:
evolution = list( hfile['Evolution'].keys() )
print( "1st element of 'Evolution' group : ", evolution[0] )

Each time step is in fact recorded in a subgroup of *Evolution*
with the name *ID_xxxx* where *xxxx* is a time step number.

Let's check the content of this group :

In [None]:
evol_1 =  list( hfile['Evolution/ID_1'] )
print( "content of 'Evolution/ID_1' : ", evol_1)

Now the simulation time of this first record can be accessed with the *TPS* dataset :

In [None]:
print( "in first record - TPS = ", hfile['Evolution/ID_1/TPS'][()] )
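Group names are strings, so a listing may show `ID_10` before `ID_2`. Assuming every record name has exactly the form `ID_<number>` described above, a small helper can sort the names by time step number. The function `record_number` and the example names are hypothetical and are not part of h5py or LMGC90.

```python
def record_number(name):
    """Extract the integer from a name of the form 'ID_<number>'."""
    return int(name.split('_')[1])

# Example names in plain string order, as they may appear when listing a group
names = ['ID_1', 'ID_10', 'ID_2']
ordered = sorted(names, key=record_number)
print(ordered)  # ['ID_1', 'ID_2', 'ID_10']
```

With the real file, `names` would be replaced by `list( hfile['Evolution'].keys() )`.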

Since it was decided earlier to look for interactions, let's look into the *VlocRloc* group :

In [None]:
evol_1_vlocrloc = list( hfile['Evolution/ID_1/VlocRloc'] )
print( "content of 'Evolution/ID_1/VlocRloc' ", evol_1_vlocrloc)

It is now possible to check that *idata* and *rdata* are datasets, and that they contain
quite a lot of data :

In [None]:
idata = hfile['Evolution/ID_1/VlocRloc/idata'][()]
rdata = hfile['Evolution/ID_1/VlocRloc/rdata'][()]
print( "idata is of size : ", idata.shape)
print( "rdata is of size : ", rdata.shape)

In fact the *i* and *r* of *idata* and *rdata* stand for *integer* and *real*
respectively. These two arrays thus hold all the integer and real
data corresponding to the interactions at a given time step.

## Getting some help !

This is where the *Help* group comes to... help ? Let's skip the exploration
and look at what information can be obtained from it :

In [None]:
rdata_list = list( hfile['Help/VlocRloc/rdata'].keys() )
print(rdata_list)

So if one is interested in the coordinates of the contact points,
one has to get :

In [None]:
field = hfile['Help/VlocRloc/rdata/coor/name'][()]
bound = hfile['Help/VlocRloc/rdata/coor/bound'][()]
print(field)
print(bound)

This information specifies that the coordinates
of the interactions are stored at the sixth and seventh
indices of *rdata*, and that the data are *x* and *y*,
in that order.

**Warning** : the indices follow the Fortran convention, thus
starting from 1. The array obtained uses Python indices,
which start from 0.

**Warning** : the field *name* may be read as a Python
*bytes* object. To make a *string* object from
it, the *decode* method must be used.

In [None]:
print( field.decode() +' of inter 1 : ', rdata[0,5:7] )
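The index shift can be wrapped in a small function. This is only a sketch: `bound_to_slice` is a hypothetical helper, and it assumes that *bound* holds the first and last Fortran indices of the field, both included, as the example above suggests. The `row` list stands in for one line of *rdata*.

```python
def bound_to_slice(bound):
    """Convert an inclusive 1-based (first, last) pair into a Python slice."""
    first, last = int(bound[0]), int(bound[1])
    # Shift the start to 0-based; the end stays as is because slices exclude it
    return slice(first - 1, last)

# Stand-in for one row of rdata, with made-up values
row = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
coor = row[bound_to_slice([6, 7])]
print(coor)  # [5.0, 6.0]
```

On the real array, the equivalent call would be `rdata[0, bound_to_slice(bound)]`.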

For the real data, the description is straightforward to read.
For the integer data, it is more difficult to know what is inside.

For example if listing the *idata* fields :

In [None]:
idata_list = list( hfile['Help/VlocRloc/idata'].keys() )
print(idata_list)

There is some *LMGC90* lingo :
* bdyty   : type of body
* ibdyty  : index of body
* tactype : type of contactor
* itacty  : index of contactor

And so on. But these data are all integers, whereas for some of them a string is the natural human description.
Again the *Help* group provides the answer to this problem, with the *parameters* group. Let's check the content :

In [None]:
list( hfile['Help/parameters'].keys() )

If the type of interaction is of interest :

In [None]:
list( hfile['Help/parameters/inter_id'] )

In [None]:
inter_name = hfile['Help/parameters/inter_id/name'][()]
print(inter_name)

So to get the type of the first interaction, one has to do something like :

In [None]:
inter_id_index = hfile['Help/VlocRloc/idata/inter_id/bound'][()] - 1
print(inter_id_index)
print(idata[0,0])
inter_type = inter_name[ idata[0,inter_id_index[0]]-1 ]
print( inter_type.decode() )
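The same lookup can be applied to a whole column of identifiers at once. In the sketch below the names and identifiers are placeholders chosen for illustration, not values read from a real file. The only assumptions kept from the text are that identifiers start at 1 and that names are stored as *bytes*.

```python
# Placeholder values standing in for the 'name' dataset and for one idata column
inter_name = [b'TYPE_A', b'TYPE_B', b'TYPE_C']
inter_ids = [2, 1, 3, 2]

# Shift from Fortran (1-based) to Python (0-based) indices and decode bytes to str
inter_types = [inter_name[i - 1].decode() for i in inter_ids]
print(inter_types)  # ['TYPE_B', 'TYPE_A', 'TYPE_C', 'TYPE_B']
```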

With all this, it is possible to retrieve all the information stored inside
the file. Of course it is a little awkward. That is why some functions
are provided in another notebook to make the interpretation easier.

In [None]:
hfile.close()
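As an alternative to calling `close` explicitly, `h5py.File` can be used as a context manager, which closes the file when the block ends. Here is a minimal sketch on an in-memory toy file; the file name and its content are made up for illustration.

```python
import h5py

# In-memory toy file, so that nothing is written to disk
with h5py.File('toy.h5', 'w', driver='core', backing_store=False) as toy:
    toy.create_dataset('version', data=1)
    version = int(toy['version'][()])

# After the with block the file is closed; a closed file evaluates to False
print(version, bool(toy))
```

Values read inside the block, such as `version` here, remain usable afterwards because they were copied into Python objects.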

The next notebook explains how to use the *parameters* group
to define some Python dictionaries that make the data a little easier to use.