io -> Input/output


# Introduction

ESRF data come in (too many) different formats:

* Specfile
* EDF
* HDF5

and specific detector formats:

* MarCCD
* Pilatus CBF
* Dectris Eiger
* …


HDF5 is expected to become the standard ESRF data format. Some beamlines have already switched.

# Accessing ESRF data

## Libraries


* h5py
    * Access to HDF5 files
* FabIO
    * Provides access to several image data formats
    * Developed as part of the Fable project, initially an ID11 development.
    * Managed by the DAU
* silx
    * Started in 2015
    * Will provide input/output for PyMCA
    * Also provides fitting, image processing, plotting, a set of widgets…
    * Managed by the DAU


These are already available on most ESRF computers:

```bash
apt-get install python3-silx python3-fabio python3-h5py
```

They are also cross-platform (available for Windows, Linux and macOS):

```bash
pip install silx fabio h5py
```


Also available from source code (silx and FabIO under the MIT license, h5py under a BSD license):

* https://github.com/silx-kit/silx
* https://github.com/silx-kit/fabio
* https://github.com/h5py/h5py

## Spec files

* Text format produced by Spec
* Records how measurements and instrument positions evolve during a scan
* We do not recommend using this format anymore
* silx provides HDF5-like read access to Spec files

### Spec compatibility

* PyMCA was previously often used as a Python library to read Spec files
* now prefer using silx

```python
# instead of
from PyMca5.PyMca import specfilewrapper

# prefer using
from silx.io import specfilewrapper
```

### How to read a spec file

An example is given later in [spec files using silx](#Reading-spec-files-using-silx)

## EDF files


* ESRF data format
* It contains
    * A 1D/2D/3D array of floats or integers
    * A header containing various information
    * Possibly several frames (more than one image in a single file)
    * Often used as a file series
* Library
    * Use FabIO
    * silx provides a HDF5-like read access

### Reading EDF files using FabIO

In [None]:
import fabio

image = fabio.open("data/medipix.edf")

# here is the data as a numpy array
print(image.data)

# here is the header as key-value dictionary
print(image.header)

### Writing files using FabIO

In [None]:
import numpy
import fabio

data = numpy.random.rand(10, 10)
metadata = {'pixel_size': '0.2'}

# wrap the array and its header into an EDF image
image = fabio.edfimage.EdfImage(data=data, header=metadata)
image.write('edf_writing_example.edf')

### Other formats using FabIO

#### Reading other formats

FabIO supports image formats from most manufacturers: 
Mar, Rayonix, Bruker, Dectris, ADSC, Rigaku, Oxford, General Electric…

```python
import fabio

pilatus_image    = fabio.open('filename.cbf')
marccd_image     = fabio.open('filename.mccd')

tiff_image       = fabio.open('filename.tif')
fit2d_mask_image = fabio.open('filename.msk')
jpeg_image       = fabio.open('filename.jpg')

```

#### File conversion

Using FabIO you can directly convert data to another format:

```python
import fabio
image = fabio.open('data/medipix.edf')
image = image.convert('tif')
image.save('filename.tif')
```
(You can also use the `fabio-convert` command-line tool.)

# HDF5

## HDF5 introduction

HDF5 (Hierarchical Data Format, version 5) is a file format for structuring and storing large and complex data sets.

* Hierarchical collection of data (like directories and files, with UNIX-like paths)
* High performance (binary)
* Standard exchange format for heterogeneous data
* Self-describing, extensible types and rich metadata
* Supports data compression

Data can be almost anything: images, tables, graphs, documents.
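
As an illustration of the compression support, here is a minimal sketch using h5py. The file and dataset names are made up for the example, and the all-zero array is deliberately easy to compress, so the gain shown here is not representative of real detector data.

```python
import os
import tempfile

import h5py
import numpy

# Highly repetitive data compresses very well; real data may not
data = numpy.zeros((512, 512), dtype=numpy.int32)

# Illustrative file location in a temporary directory
filename = os.path.join(tempfile.mkdtemp(), "compression_example.h5")

with h5py.File(filename, mode="w") as h5file:
    # Stored as is
    h5file.create_dataset("raw", data=data)
    # Stored in chunks compressed with gzip
    h5file.create_dataset("compressed", data=data, compression="gzip")

with h5py.File(filename, mode="r") as h5file:
    raw_size = h5file["raw"].id.get_storage_size()
    compressed_size = h5file["compressed"].id.get_storage_size()
    # Compression is transparent: reading returns the original values
    same_values = numpy.array_equal(h5file["compressed"][...], data)

print("Stored bytes (raw):", raw_size)
print("Stored bytes (gzip):", compressed_size)
print("Same values after reading:", same_values)
```

Compression is chosen per dataset when it is created; code that reads the dataset does not need to know about it.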



## HDF5 description

The container is mostly structured with:

* **File**: the root of the container
* **Group**: a grouping structure containing groups or datasets
* **Dataset**: a multidimensional array of data elements
* And other features (links, attributes, datatypes)

![hdf5_class_diag](images/hdf5_model.png "hdf5 class diagram")
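
These concepts map directly onto h5py objects. The sketch below builds a tiny file containing a group, a dataset, an attribute and a soft link, then inspects it. All names (`entry`, `image`, `alias`, `units`) are hypothetical and chosen only for the example.

```python
import os
import tempfile

import h5py
import numpy

# Illustrative file location in a temporary directory
filename = os.path.join(tempfile.mkdtemp(), "structure_example.h5")

with h5py.File(filename, mode="w") as h5file:           # File: root of the container
    group = h5file.create_group("entry")                 # Group
    dataset = group.create_dataset("image", data=numpy.zeros((4, 3)))  # Dataset
    dataset.attrs["units"] = "counts"                    # Attribute (metadata)
    h5file["entry/alias"] = h5py.SoftLink("/entry/image")  # Link to the dataset

with h5py.File(filename, mode="r") as h5file:
    names = sorted(h5file["entry"].keys())
    is_group = isinstance(h5file["entry"], h5py.Group)
    is_dataset = isinstance(h5file["entry/image"], h5py.Dataset)
    units = h5file["entry/image"].attrs["units"]
    # The link resolves to the same dataset
    alias_shape = h5file["entry/alias"].shape

print(names, is_group, is_dataset, units, alias_shape)
```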



## HDF5 example

Here is an example of a file generated by pyFAI:

![hdf5_example](images/hdf5_example.png "hdf5 example")

Here we read a specific dataset

In [None]:
import h5py

# open read-only: older h5py versions default to a writable mode
h5file = h5py.File('data/test.h5', mode='r')

# print available names at the first level
print("First children:", list(h5file['/'].keys()))

In [None]:
# reaching a dataset from a sub group
dataset = h5file['/diff_map_0004/data/map']

# shape, size and dtype are available without reading the stored data
print("Dataset:", dataset.shape, dataset.size, dataset.dtype)

Datasets mimic NumPy arrays:

In [None]:
# only the requested slice is read, then the operation is applied to it
print(dataset[5, 5, 0:5])
print(2 * dataset[0, 5, 0:5])

In [None]:
# copy the whole dataset into memory as a NumPy array
b = dataset[...]
# modifying the copy does not change the data stored in the file
b[0, 0, 0:5] = 0
print(dataset[0, 0, 0:5])
print(b[0, 0, 0:5])

## h5py write example

In [None]:
import numpy
import h5py

data = numpy.arange(10000.0)
data.shape = 100, 100

# write
h5file = h5py.File('my_first_one.h5', mode='w')

# write data into a dataset from the root
h5file['/data1'] = data

# write data into a dataset from group1
h5file['/group1/data2'] = data

h5file.close()

## Useful tools for HDF5

* h5ls, h5dump, hdfview
```bash
$ h5ls -r my_first_one.h5
/                        Group
/data1                   Dataset {100, 100}
/group1                  Group
/group1/data2            Dataset {100, 100}
```

* h5py
* silx
* silx view

==> The HDF Group provides a web page listing more tools: https://support.hdfgroup.org/HDF5/doc/RM/Tools.html
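
When the command-line tools are not at hand, a similar listing can be produced from Python. This is only a sketch built on h5py's `visititems`: the `describe` callback and the output format are made up for the example and do not reproduce the exact output of `h5ls`.

```python
import os
import tempfile

import h5py
import numpy

# Build a small file with the same layout as my_first_one.h5
filename = os.path.join(tempfile.mkdtemp(), "listing_example.h5")
with h5py.File(filename, mode="w") as h5file:
    h5file["/data1"] = numpy.arange(10000.0).reshape(100, 100)
    h5file["/group1/data2"] = numpy.arange(10000.0).reshape(100, 100)

lines = []


def describe(name, obj):
    """Record one line per visited object (hypothetical helper)."""
    if isinstance(obj, h5py.Dataset):
        lines.append("/%s Dataset %s" % (name, obj.shape))
    else:
        lines.append("/%s Group" % name)


with h5py.File(filename, mode="r") as h5file:
    # visititems calls describe(name, object) for every group and dataset
    h5file.visititems(describe)

print("\n".join(lines))
```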

# silx io

* Tries to simplify the transition to HDF5
    * Provides an h5py-like API on top of the formats used at ESRF
    * A single way to access Spec/EDF/HDF5 files
    * Based on the NeXus specifications http://www.nexusformat.org/
* Read-only

## Read HDF5 using silx

For convenience, ``silx`` also provides the h5py API for HDF5 files.

In [None]:
import silx.io

h5file = silx.io.open('data/test.h5')

In [None]:
# print available names at the first level
print("First children:", list(h5file['/'].keys()))

In [None]:
# reaching a dataset from a sub group
dataset = h5file['/diff_map_0004/data/map']

# shape, size and dtype are available without reading the stored data
print("Dataset:", dataset.shape, dataset.size, dataset.dtype)

## Spec files using silx

silx can also expose Spec files with an HDF5-like mapping.

### HDF5-like mapping  (given for general information)

![mapping_spec](images/spech5_arrows.png "hdf5-like mapping for spec files")


### Reading spec files using silx

In [None]:
import silx.io
data = silx.io.open('data/oleg.dat')

# print available scans
print("First children:", list(data['/'].keys()))

# print available measurements from the scan 94.1
print("Content of measurement:", list(data['/94.1/measurement'].keys()))

# get data from measurement
xdata = data['/94.1/measurement/Epoch']
ydata = data['/94.1/measurement/bpmi']
for row in zip(xdata, ydata):
    print(row)

For more information and examples you can read the silx IO tutorial: https://github.com/silx-kit/silx-training/blob/master/silx/io/io.pdf

## EDF files using silx

silx can also expose EDF files with an HDF5-like mapping.

### HDF5-like mapping (given for general information)

![mapping_spec](images/fabioh5_arrows.png "hdf5-like mapping for EDF files")


### Read EDF file using silx

In [None]:
import silx.io
data = silx.io.open('data/ID16B_diatomee.edf')

# Access the frames
frames = data['/scan_0/instrument/detector_0/data']
print("Number of frames:", len(frames))
print("Size of an image:", frames[0].shape)

# Access motors, monitor and timestamp
srot = data['scan_0/instrument/positioners/srot'][...]
mon = data['scan_0/measurement/mon'][...]
timestamp = data['scan_0/instrument/detector_0/others/time_of_day'][...]
for row in zip(timestamp, srot, mon):
    print(row)

## Silx Tools / utils

### silx.io.utils.h5ls
`h5ls` displays the tree contained in an HDF5 file.

In [None]:
import silx.io
import silx.io.utils

h5file = silx.io.open('data/test.h5')

string = silx.io.utils.h5ls(h5file)
print(string)

### silx.io.convert.write_to_h5

Convert a Spec file to HDF5

In [None]:
from silx.io.convert import write_to_h5

write_to_h5('data/oleg.dat', 'oleg.h5', mode='w')

In [None]:
ls -al oleg.*

# Exercise


1. Read the EDF file ``medipix.edf``.
2. Data processing. The goal is to clamp the pixel values to a new range ([10%, 90%] of the existing one). To do so:

    * Create a mask of the pixels that are below the 10% 'low value'.
    * With the above mask, set the affected pixels to the 10% 'low value'.
    * Do the same for values above the 90% 'high value'.
    * Create the mask of all the modified pixels.

3. Store the source, the mask of changed pixels and the result inside ``process.h5``, as below.

   ![Output file structure](images/exercise-result.png)

4. Load ``process.h5`` and list the root content


In [None]:
# Load data/medipix.edf
# ...

# Process the data
# ...

# Save data into a new file (process.h5)
# ...

# Load process.h5 and list the root content
# ...

## Solution

In [None]:
# Load data/medipix.edf
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.load_data))

In [None]:
# process data
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.process_data))
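
Independently of the `exercicesolution` module, whose source is not reproduced here, the processing step can be sketched with NumPy alone. The function name `clamp_to_range` is hypothetical, and the sketch assumes that "10%" and "90%" refer to fractions of the interval between the minimum and maximum pixel values; other readings (percentiles, for instance) are possible.

```python
import numpy


def clamp_to_range(image, low_fraction=0.1, high_fraction=0.9):
    """Clamp pixel values to a sub-range of the existing value range.

    Returns the clamped copy and a boolean mask of the modified pixels.
    """
    vmin, vmax = image.min(), image.max()
    low = vmin + low_fraction * (vmax - vmin)
    high = vmin + high_fraction * (vmax - vmin)

    low_mask = image < low     # pixels below the low value
    high_mask = image > high   # pixels above the high value

    result = image.astype(numpy.float64)  # astype returns a copy
    result[low_mask] = low
    result[high_mask] = high
    return result, low_mask | high_mask


# Small synthetic image with values 0..10
raw = numpy.arange(11, dtype=numpy.float64).reshape(1, 11)
processed, mask = clamp_to_range(raw)

print("Modified pixels:", int(mask.sum()))
print("New range:", processed.min(), processed.max())
```

Working on a copy keeps the source image intact, which matters here because the exercise stores the source, the mask and the result side by side.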

In [None]:
# save data
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.save_data))

In [None]:
# list root
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.list_root))

In [None]:
# result
import exercicesolution
raw_data, proc_data, mask = exercicesolution.solution("data/medipix.edf")

In [None]:
%pylab

In [None]:
imshow(mask)

In [None]:
imshow(raw_data)

In [None]:
imshow(proc_data)

# Conclusion

Recommended library according to the use case and the file format:

| Formats              | Read       | Write |
|----------------------|------------|-------|
| HDF5                 | silx/h5py  | h5py  |
| Specfile             | silx       |       |
| EDF multiframe       | silx/fabio | fabio |
| EDF                  | fabio      | fabio |
| Other raster formats | fabio      | fabio |