# Manipulating an HDF5 file with Python

The aim of this notebook is to show how to perform basic manipulations (reading and writing) on an HDF5 file. It uses the `SWAXSanalysis.utils` module and a few basic `h5py` functions. Everything here is very low level: even the functions in the utils module rely only on `h5py`.

An example HDF5 file is provided in the **.\Data Treatment Center\Jupyter notebooks\NoteBook\Example HDF5** folder.

In [1]:
# Imports

%matplotlib ipympl

import os
import h5py

from pathlib import Path
from SWAXSanalysis.utils import explore_file, extract_from_h5, replace_h5_dataset
from SWAXSanalysis.class_nexus_file import NexusFile


# Inspecting your h5 file

The function `explore_file` can be used to visualize the structure of the HDF5 file you want to process. It shows you precisely where every element is located. Alternatively, you can use HDFView to visualize your HDF5 file.
There are three types of elements:

- GROUPS: think of them as directories; they can contain other groups, datasets or attributes
- DATASETS: think of them as files; they hold the data and can only contain attributes
- @ATTRIBUTES: think of them as metadata; they cannot contain anything and give additional information about the group or dataset they are attached to
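
To make these three element types concrete, here is a minimal sketch using plain `h5py` on a throwaway file. The file name, the `ENTRY/Q` dataset and its `"1/nm"` unit are illustrative assumptions, not taken from the example file.

```python
import os
import tempfile

import h5py
import numpy as np

# Work in a temporary directory so no real data is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "element_types_demo.h5")

    with h5py.File(demo_path, "w") as f:
        # GROUP: behaves like a directory
        entry = f.create_group("ENTRY")
        # DATASET: behaves like a file holding an array (or a scalar)
        q = entry.create_dataset("Q", data=np.linspace(0.01, 0.5, 5))
        # @ATTRIBUTE: small piece of metadata attached to the dataset
        q.attrs["units"] = "1/nm"

    with h5py.File(demo_path, "r") as f:
        is_group = isinstance(f["ENTRY"], h5py.Group)
        is_dataset = isinstance(f["ENTRY/Q"], h5py.Dataset)
        q_units = f["ENTRY/Q"].attrs["units"]

print(is_group, is_dataset, q_units)  # True True 1/nm
```
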
    
To open an HDF5 file in a Python script, use the `h5py` library with the statement `with h5py.File("path", "r") as file:`. The indented code below this line can access the `file` variable (an h5py `File` object); unindent when you're done. The `with` statement ensures that the file is properly opened and closed, even if an error occurs inside the block.
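
The sketch below, built on a small temporary file (the dataset name `value` is a made-up example), shows that the `with` form is equivalent to an explicit `try:` / `finally:` block that calls `close()`. In both cases `bool(file_object)` is `False` afterwards, which is how h5py reports a closed file.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "with_demo.h5")

    # Create a small file to open afterwards
    with h5py.File(demo_path, "w") as f:
        f.create_dataset("value", data=42)

    # Preferred form: the file is closed automatically when the block ends
    with h5py.File(demo_path, "r") as file_object:
        value_with = int(file_object["value"][()])
    open_after_with = bool(file_object)  # False once the file is closed

    # Equivalent explicit form: close() runs even if the reading fails
    file_object = h5py.File(demo_path, "r")
    try:
        value_try = int(file_object["value"][()])
    finally:
        file_object.close()
    open_after_finally = bool(file_object)

print(value_with, value_try, open_after_with, open_after_finally)  # 42 42 False False
```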

In [2]:
example_hdf5_path = Path("Example HDF5") / "testSample_SAXS_00001.h5"

with h5py.File(example_hdf5_path, "r") as file_object:
    explore_file(file_object, explore_group=True, explore_attribute=False)

Exploring HDF5 structure...

├──Group : ENTRY
|  ├──Group : ENTRY/COLLECTION
|  |  ├──Dataset : ENTRY/COLLECTION/do_absolute_intensity
|  |  ├──Dataset : ENTRY/COLLECTION/experiment_type
|  |  ├──Dataset : ENTRY/COLLECTION/exposition_time
|  |  ├──Dataset : ENTRY/COLLECTION/geometry
|  |  ├──Dataset : ENTRY/COLLECTION/sample_fixture
|  |  ├──Dataset : ENTRY/COLLECTION/username
|  ├──Group : ENTRY/DATA
|  |  ├──Dataset : ENTRY/DATA/I
|  |  ├──Dataset : ENTRY/DATA/Idev
|  |  ├──Dataset : ENTRY/DATA/Q
|  |  ├──Dataset : ENTRY/DATA/Qmean
|  |  ├──Dataset : ENTRY/DATA/mask
|  ├──Group : ENTRY/DATA_ABS
|  |  ├──Dataset : ENTRY/DATA_ABS/I
|  |  ├──Dataset : ENTRY/DATA_ABS/Idev
|  |  ├──Dataset : ENTRY/DATA_ABS/Q
|  |  ├──Dataset : ENTRY/DATA_ABS/Qmean
|  |  ├──Dataset : ENTRY/DATA_ABS/mask
|  ├──Group : ENTRY/DATA_AZI_AVG
|  |  ├──Dataset : ENTRY/DATA_AZI_AVG/Chi
|  |  ├──Dataset : ENTRY/DATA_AZI_AVG/I
|  |  ├──Dataset : ENTRY/DATA_AZI_AVG/Idev
|  |  ├──Dataset : ENTRY/DATA_AZI_AVG/Qdev
|  | 
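
For readers curious about how such a tree can be produced, here is a minimal sketch based on h5py's `Group.visititems`, which calls a function once for every group and dataset below the root. The helper `print_tree` is hypothetical and is not the actual implementation of `explore_file`; it only mimics its output format.

```python
import os
import tempfile

import h5py


def print_tree(file_object):
    """Hypothetical helper: print each group and dataset path as a tree line."""
    lines = []

    def visitor(name, obj):
        # name is the path relative to the file root, e.g. "ENTRY/DATA/I"
        depth = name.count("/")
        kind = "Group" if isinstance(obj, h5py.Group) else "Dataset"
        lines.append(f"{'|  ' * depth}├──{kind} : {name}")

    file_object.visititems(visitor)
    print("\n".join(lines))
    return lines


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "tree_demo.h5")
    with h5py.File(demo_path, "w") as f:
        # Intermediate groups ENTRY and ENTRY/DATA are created automatically
        f.create_dataset("ENTRY/DATA/I", data=[1.0, 2.0])
        f.create_dataset("ENTRY/DATA/Q", data=[0.1, 0.2])
    with h5py.File(demo_path, "r") as f:
        tree_lines = print_tree(f)
```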

# Getting the value of a dataset
We can now see what's inside the HDF5 file. To read or change an element of this file, we first need its path inside the HDF5 file.

As an example, let's get the value of the source's wavelength. This parameter, "incident_wavelength", is located in the "SOURCE" group, itself inside the "INSTRUMENT" group, itself inside the "ENTRY" group.

The path of the "incident_wavelength" element is thus:\
**ENTRY/INSTRUMENT/SOURCE/incident_wavelength**

### Warning
If the value you're extracting is a character string, it may come back as bytes rather than as a Python string. In that case you need to decode it with the `.decode()` method!
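
Here is a minimal sketch of that decoding step with plain `h5py`, on a temporary file containing a string dataset (the `"testUser"` value is made up). With h5py 3.x, a string dataset read with `[()]` is returned as `bytes`; the `isinstance` check keeps the code working if a `str` is returned instead.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "string_demo.h5")
    with h5py.File(demo_path, "w") as f:
        # A string dataset, similar in spirit to ENTRY/COLLECTION/username
        f.create_dataset("ENTRY/COLLECTION/username", data="testUser")

    with h5py.File(demo_path, "r") as f:
        raw_value = f["ENTRY/COLLECTION/username"][()]

# Decode only if we actually got bytes (UTF-8 is the default encoding)
if isinstance(raw_value, bytes):
    username = raw_value.decode()
else:
    username = raw_value
print(username)  # testUser
```

h5py 3 also provides `dataset.asstr()[()]`, which returns a `str` directly. String attributes, on the other hand, are usually already returned as `str`.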

Now that you have the path, you can use the `extract_from_h5` function to get the value. Here is how to do it:

In [3]:
with h5py.File(example_hdf5_path, "r") as file_object:
    wavelength_value = extract_from_h5(
        file_object, 
        h5path="ENTRY/INSTRUMENT/SOURCE/incident_wavelength", 
        data_type="dataset"
    )
    wavelength_unit = extract_from_h5(
        file_object,
        h5path="ENTRY/INSTRUMENT/SOURCE/incident_wavelength", 
        data_type="attribute", 
        attribute_name="units"
    )

# We can then use those values
print(f"Incident wavelength = {wavelength_value} {wavelength_unit}")

Incident wavelength = 0.9999999999999999 nm
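
For comparison, here is a sketch of the same read with plain `h5py` indexing, on a stand-in file holding only the wavelength dataset and its `units` attribute (the value `1.0` is made up). It assumes, without having checked the source of `extract_from_h5`, that the function does something equivalent. The `in` test is a convenient way to check that a path exists before reading it.

```python
import os
import tempfile

import h5py

h5path = "ENTRY/INSTRUMENT/SOURCE/incident_wavelength"

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "source_demo.h5")
    # Stand-in for the example file: one scalar dataset with a "units" attribute
    with h5py.File(demo_path, "w") as f:
        ds = f.create_dataset(h5path, data=1.0)
        ds.attrs["units"] = "nm"

    with h5py.File(demo_path, "r") as f:
        path_exists = h5path in f                   # True if the path is valid
        wavelength_value = f[h5path][()]            # [()] reads the whole dataset
        wavelength_unit = f[h5path].attrs["units"]  # attributes behave like a dict

print(f"Incident wavelength = {wavelength_value} {wavelength_unit}")
```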


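# Changing the value of a Dataset or Attribute
Let's say a mistake was made during calibration and it affected the results. If you want to change the faulty value, you can use the function `replace_h5_dataset`.

To use it, you need to provide a file opened as an h5py object, using the same command as before, except that this time we do not use `"r"`, which stands for read, but `"r+"`, which stands for read/write. Then we pass the file object, the path of the dataset to replace (`old_h5path`) and the new value (`new_dataset`) to the function. Attributes, such as the unit, are changed directly through the `.attrs` dictionary of the dataset. Note that this permanently modifies the file on disk, so work on a copy if you want to keep the original.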
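
As a plain `h5py` illustration of the file modes, here is a sketch on a temporary file (the `SDD` values mirror the notebook but the file is made up). Mode `"w"` creates a file and erases any existing content, while `"r+"` opens an existing file for reading and writing without truncating it. When the new value has the same shape and type, it can be written in place with `dataset[()] = value`.

```python
import os
import tempfile

import h5py

sdd_path = "ENTRY/INSTRUMENT/DETECTOR/SDD"

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "mode_demo.h5")

    # "w" creates the file, or TRUNCATES it if it already exists
    with h5py.File(demo_path, "w") as f:
        f.create_dataset(sdd_path, data=1.0)

    # "r+" keeps the existing content and allows writing
    with h5py.File(demo_path, "r+") as f:
        f[sdd_path][()] = 0.975  # in-place write: same shape and dtype

    # Reopen read-only to confirm the change was saved to disk
    with h5py.File(demo_path, "r") as f:
        sdd_on_disk = float(f[sdd_path][()])

print(sdd_on_disk)  # 0.975
```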

Here is an example with `replace_h5_dataset`:

In [4]:
with h5py.File(example_hdf5_path, "r+") as file_object:
    # We check the value
    sdd_value = extract_from_h5(
        file_object, 
        h5path="ENTRY/INSTRUMENT/DETECTOR/SDD", 
        data_type="dataset"
    )
    sdd_unit = extract_from_h5(
        file_object,
        h5path="ENTRY/INSTRUMENT/DETECTOR/SDD", 
        data_type="attribute", 
        attribute_name="units"
    )
    print(f"SDD before change : {sdd_value} {sdd_unit}")
    

    # We change the value
    replace_h5_dataset(
        file_object,
        old_h5path="ENTRY/INSTRUMENT/DETECTOR/SDD",
        new_dataset=0.975
    )

    # We change the unit
    attributes_dict = file_object["ENTRY/INSTRUMENT/DETECTOR/SDD"].attrs
    attributes_dict["units"] = "m"


    # We check the value again
    sdd_value = extract_from_h5(
        file_object, 
        h5path="ENTRY/INSTRUMENT/DETECTOR/SDD", 
        data_type="dataset"
    )
    sdd_unit = extract_from_h5(
        file_object,
        h5path="ENTRY/INSTRUMENT/DETECTOR/SDD", 
        data_type="attribute", 
        attribute_name="units"
    )
    print(f"SDD after change : {sdd_value} {sdd_unit}")

SDD before change : 1.0 arbitrary
SDD after change : 0.975 m
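
When the new data has a different shape or type, an in-place write raises an error, and the dataset has to be deleted and recreated. The sketch below shows one way to do that with plain `h5py` while carrying the attributes over. `replace_dataset_keep_attrs` is a hypothetical helper, not the code of `replace_h5_dataset`, whose handling of attributes is not documented here.

```python
import os
import tempfile

import h5py
import numpy as np


def replace_dataset_keep_attrs(file_object, h5path, new_data):
    """Hypothetical helper: replace a dataset by new data, copying its attributes."""
    old_attrs = dict(file_object[h5path].attrs)  # copy before deleting
    del file_object[h5path]                      # unlink the old dataset
    new_ds = file_object.create_dataset(h5path, data=new_data)
    for key, value in old_attrs.items():
        new_ds.attrs[key] = value
    return new_ds


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "replace_demo.h5")
    with h5py.File(demo_path, "w") as f:
        ds = f.create_dataset("ENTRY/DATA/I", data=np.zeros(3))
        ds.attrs["units"] = "arbitrary"

    with h5py.File(demo_path, "r+") as f:
        # New data with a different length: an in-place write would fail here
        replace_dataset_keep_attrs(f, "ENTRY/DATA/I", np.ones(5))

    with h5py.File(demo_path, "r") as f:
        new_shape = f["ENTRY/DATA/I"].shape
        kept_units = f["ENTRY/DATA/I"].attrs["units"]

print(new_shape, kept_units)  # (5,) arbitrary
```

Deleting a dataset does not shrink the file on disk; the freed space can be reclaimed with the HDF5 `h5repack` tool if file size matters.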


# Troubleshooting

1. If you forgot to use the `with` statement (or a `try:` / `finally:` block) and you get an error saying the file is already being used by another process: close the notebook and the command prompt, then go to the directory where your files are. There should be a .tmp file there; delete it and reopen the notebook. To avoid this error, always open files with the `with` statement, or inside a `try:` / `finally:` block that closes the file.