*Forenote*: although the approach presented here can be useful to extract data directly without having to reload the whole LMGC90 database, a lot of information is missing since the geometries of the bodies are not present in the file. As such, reading the documentation on [a posteriori management](../../apiPostpro.ipynb) of data with LMGC90 is still, in the authors' opinion, the best approach since it relies only on the LMGC90 API.

# Post with NumPy

It is possible to extract the data stored in the HDF5 file and to store them in a numpy array. The benefit is easy and efficient access to the data, without having to wonder how they were saved in the file.

To make things easier, most of the job has been hidden in a `get_numpy_frame` function in the *utils* module next to this notebook.

Furthermore, for efficiency's sake, some internal data of LMGC90 are represented with an integer parameter instead of a string. For example, to describe that a body is rigid in 2D, instead of using the string `RBDY2` (which is the historical keyword for this), the code just uses `1`.

So, the raw data extracted from the binary file is not straightforwardly usable. The first thing is to get the mapping between the integer number and the associated string (which is stored inside the file), which is done with the `get_parameters` function of the *utils* module.

To understand how these functions were written, the interested reader can have a look into:
* the *HDF5_basis.ipynb* notebook, which is in the *Tutorials/post/by_hand* directory and explains how to read the content of the file
* the *HDF5_coordination.ipynb* notebook, which is in the *Tutorials/post/by_hand* directory and shows a simple example of direct information extraction.

### Imports

So let us start by importing everything needed in the notebook:

In [None]:
import h5py
import numpy as np

from utils import get_parameters, get_numpy_frame

### Parameters

The first thing to get is the different parameter mappings, using the aforementioned `get_parameters` function; it is then possible to explore the content to get a rough understanding of what is stored in it.

In [None]:
parameters = get_parameters('../lmgc90.h5')

In [None]:
print( parameters.keys() )
print( parameters['bdyty'] )

It is important to remember this point if there is a need to look directly into the HDF5 file, using either the `h5dump` utility (which dumps all data as text) or a third-party graphical tool to explore the content of the file.

For example, in the *VlocRloc* section of a record, the first column of the integer data of an interaction describes its type. Being able to map this integer back to the usual LMGC90 interaction type name is more convenient:

In [None]:
parameters['inter_id'][15]
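
The remapping idea can be tried without the LMGC90 file. The following is a minimal sketch, assuming the mapping behaves like a dictionary from integer code to name; the integer codes in `inter_names` are invented for illustration, and the real ones must be read from the file with `get_parameters`.

```python
import numpy as np

# Hypothetical mapping: these integer codes are invented for illustration,
# the real ones are stored in the file and returned by get_parameters.
inter_names = {1: 'DKDKx', 2: 'DKJCx', 3: 'PLPLx'}

# Integer column as it could appear in the raw data.
raw_codes = np.array([1, 1, 3, 2, 1])

# Build a lookup table indexed by code ('?????' marks unknown codes)...
lookup = np.full(max(inter_names) + 1, '?????', dtype='U5')
for code, name in inter_names.items():
    lookup[code] = name

# ... then remap the whole column at once with integer array indexing.
names = lookup[raw_codes]
print(names)
```

Using a lookup array rather than a Python loop over the column keeps the remapping vectorized, which matters when a record holds many interactions.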

### Rough hierarchy

As the HDF name states (**H**ierarchical **D**ata **F**ormat), the file has a hierarchical structure. Without explaining everything, understanding how to extract data only requires knowing that there are three groups at the root of the file:
* *Simulation* which contains fixed data along the simulation (number of time steps, dimension, integrator...)
* *Evolution* which contains subgroups named *ID_x*, where *x* is the record number, an increasing integer starting at 1.
* *Help* which contains meta data on the content of each field and the parameters mapping.

There are also some data stored directly in the root group, which allow checking the version of LMGC90 with which the file was generated.

Then in an *ID_x* group there may be several subgroups describing:
* *RBDY2*
* *RBDY3*
* *MAILx* which in itself may contain:
  * *mecax*
  * *therx*
  * *porox*
* *VlocRloc*

Generally speaking, each of these subgroups has two associated datasets: *idata* for integer data and *rdata* for real data.
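
To get a feel for this hierarchy without a real result file, the sketch below builds a throwaway in-memory HDF5 file with `h5py` and walks through it. Only the group names follow the description above; the dataset shapes and values are made up, and a real file contains more fields.

```python
import h5py
import numpy as np

datasets = []

def collect_datasets(name, obj):
    """Store the path of every dataset met while walking the file."""
    if isinstance(obj, h5py.Dataset):
        datasets.append(name)

# driver='core' with backing_store=False keeps the file in memory only,
# so nothing is written to disk.
with h5py.File('sketch.h5', 'w', driver='core', backing_store=False) as hf:
    hf.create_dataset('Simulation/nb_record', data=2)
    for i in (1, 2):
        base = f'Evolution/ID_{i}/RBDY2'
        # made-up shapes: 3 bodies, 2 integers and 4 reals each
        hf.create_dataset(base + '/idata', data=np.zeros((3, 2), dtype=int))
        hf.create_dataset(base + '/rdata', data=np.zeros((3, 4)))
    hf.create_group('Help')

    root_groups = sorted(hf.keys())
    records = sorted(hf['Evolution'].keys())
    # visititems calls the function on every group and dataset of the file
    hf.visititems(collect_datasets)

print(root_groups)
print(records)
print(datasets)
```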

### Extracting a record

First, the user must open the file to check how many records are stored and decide which one to read.

**Warning**: when opening the HDF5 file for reading, it is really important to close it once done with it. Otherwise, even if Python is closed, the file may still be considered open and be reported as *unavailable* or *already opened* on a later access.

In [None]:
hfile = '../lmgc90.h5'

with h5py.File(hfile, 'r') as hf:
    nb_record = int( hf['Simulation/nb_record'][()] )

print( f"number of records saved: {nb_record}")

In [None]:
id_record = 1
assert 0 < id_record <= nb_record, "[ERROR] wrong record number"

Now it is possible to use the `get_numpy_frame` function to extract the data from the file, using the `parameters` dictionary to remap the integer data to intelligible strings.

The first step is now to get two numpy arrays holding all integer and real data for all the interactions of a given record:

In [None]:
basegroup = "Evolution/ID_"+str(id_record)

hgroup = 'VlocRloc/idata'
iinter = get_numpy_frame(hfile, basegroup, hgroup, parameters)

hgroup = 'VlocRloc/rdata'
rinter = get_numpy_frame(hfile, basegroup, hgroup)

Once the array is generated, each element of the array is described by the `dtype` of the array.
Slicing works as usual to access the different elements of the array.
To extract a *column* of the array, index it with a field name, or with a list of field names to get several fields at once.

In [None]:
print( iinter.shape )
iinter.dtype

In [None]:
iinter[0]

In [None]:
iinter[50:75]['inter_id']
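
The same access patterns can be tried on a small synthetic structured array. This is only a sketch: the field names below mimic those used in this notebook, but the `dtype` itself is hypothetical and the real layout is the one reported by `iinter.dtype`.

```python
import numpy as np

# Hypothetical dtype: a 5-character name, an integer, and a pair of integers.
dtype = np.dtype([('inter_id', 'S5'), ('icdan', 'i4'), ('ibdyty', 'i4', (2,))])
demo = np.array([(b'DKDKx', 1, (1, 2)),
                 (b'DKJCx', 2, (1, 5)),
                 (b'DKDKx', 3, (2, 3))], dtype=dtype)

first = demo[0]                     # one element, holding every field
ids = demo['inter_id']              # one field, for all elements
sub = demo[['inter_id', 'icdan']]   # several fields at once
candidates = demo['ibdyty'][:, 0]   # first component of a sub-array field

print(first)
print(ids)
print(sub.dtype.names)
print(candidates)
```

A field declared with a shape, like `ibdyty` here, comes back as a regular 2D array, which is why the `[:, 0]` slicing is available on it.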

From this point on, quickly extracting and manipulating the data of the array requires a good understanding of how the [indexing of the array](https://numpy.org/doc/stable/reference/arrays.indexing.html#integer-array-indexing)
works and of how to efficiently use [masks to extract data](https://numpy.org/doc/stable/reference/arrays.indexing.html#boolean-array-indexing).

In [None]:
# counting each type of interaction...
print( 'type of interactions : ', np.unique(iinter['inter_id']) )
for i_id in np.unique(iinter['inter_id']):
    mask = iinter['inter_id']==i_id
    print( i_id, np.sum( mask ) )
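
As a possible alternative to the loop, `np.unique` can return the number of occurrences of each value directly through its `return_counts` argument. The sketch below uses a made-up column of interaction names in place of `iinter['inter_id']`.

```python
import numpy as np

# Made-up column of interaction types, standing in for iinter['inter_id'].
inter_id = np.array([b'DKDKx', b'DKJCx', b'DKDKx', b'DKDKx', b'DKJCx'])

# One call gives the sorted unique values and how many times each one appears.
kinds, counts = np.unique(inter_id, return_counts=True)
count_by_kind = dict(zip(kinds.tolist(), counts.tolist()))
print(count_by_kind)
```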

In [None]:
# adjacency table of the first 12 candidates:
list_cd = np.unique(iinter['ibdyty'][:,0])
for cd in list_cd[:12]:
    mask = iinter['ibdyty'][:,0] == cd
    print( f"candidate {cd} has {np.sum( mask )} antagonist : {iinter['ibdyty'][mask,1]}" )

Finally, remember that numpy offers a set of functions to select data or compute quantities (sum, min, max) on arrays extremely efficiently. For example, to compute the mean of the normal reaction over all *DKDKx* contacts:

In [None]:
# extract the list of DKDKx
dkdkx = iinter['inter_id'] == b'DKDKx'
dkdkx = dkdkx[:,0]
# compute mean of rn
np.mean( rinter['rl'][dkdkx,1] )

In [None]:
# compute mean of rn only counting when gap is null or negative
gap_ok = rinter['gapTT'] <= 0.
gap_ok = gap_ok[:,0]
dkdk_gap_ok = np.logical_and( dkdkx, gap_ok )
np.mean( rinter['rl'][dkdk_gap_ok,1])
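
The way the two masks combine can be checked by hand on a few made-up values. In this sketch the arrays `inter_id`, `rn` and `gap` are plain 1D arrays invented for the example; they are not the fields of the real file.

```python
import numpy as np

# Made-up data for four interactions.
inter_id = np.array([b'DKDKx', b'DKJCx', b'DKDKx', b'DKDKx'])
rn = np.array([10., 4., 0., 20.])        # normal reactions
gap = np.array([-1e-4, 0., 2e-3, 0.])    # gaps

is_dkdkx = inter_id == b'DKDKx'          # [ True, False,  True,  True]
in_contact = gap <= 0.                   # [ True,  True, False,  True]
selected = is_dkdkx & in_contact         # both conditions at once

# Guard against an empty selection, for which the mean is undefined.
mean_rn = rn[selected].mean() if selected.any() else 0.
print(mean_rn)
```

On boolean arrays the `&` operator gives the same result as `np.logical_and`.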

As an example, the adjacency map and the coordination number can be generated from this list of *DKDKx* interactions with a null or negative *gapTT* (in the same way as in the *Tutorials/post/by_hand* notebook example):

In [None]:
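# bodies (candidate and antagonist) of the selected interactions only
bodies = iinter['ibdyty'][dkdk_gap_ok]
list_cd = np.unique(bodies)
coordination_number = {}
for cd in list_cd:
    # count the selected contacts in which the body is involved
    coordination_number[cd] = np.sum( bodies == cd )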

nbc = np.array( [*coordination_number.values() ])

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
plt.hist( nbc, bins=np.max(nbc) )
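
A loop-free variant of the coordination count is sketched below. It assumes that a body counts one contact each time it appears in a pair, whether as candidate or antagonist; the `pairs` array is made up and stands in for the two `ibdyty` columns of the selected interactions.

```python
import numpy as np

# Made-up (candidate, antagonist) pairs for the retained contacts.
pairs = np.array([[1, 2],
                  [1, 3],
                  [2, 3],
                  [4, 1]])

# np.unique flattens the array, so each body is counted once per contact
# it takes part in, whatever its role in the pair.
bodies, nb_contacts = np.unique(pairs, return_counts=True)
coordination = dict(zip(bodies.tolist(), nb_contacts.tolist()))
print(coordination)
```

Bodies without any retained contact do not appear in `pairs`, so they are absent from the result and have to be added with a zero count if they are needed in the statistics.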

Finally a different array can be generated to read the data of rigid bodies:

In [None]:
hgroup = 'RBDY2/idata'
ibody  = get_numpy_frame(hfile, basegroup, hgroup, mapper=parameters)

hgroup = 'RBDY2/rdata'
rbody  = get_numpy_frame(hfile, basegroup, hgroup)

# hgroup = 'MAILx/mecax/flux'
# fmeca  = get_numpy_frame(hfile, basegroup, hgroup)