# Data IO (input/output)


# Introduction

ESRF data comes in (too many) different formats:

* Specfile
* EDF
* HDF5

And specific detector formats:

* MarCCD
* Pilatus CBF
* Dectris Eiger
* …


# Accessing ESRF data

## Libraries


* h5py
    * Access to HDF5 files
* FabIO
    * Access to several image data formats
    * Managed by the DAU
* silx
    * Normalized way to access data
    * Helper to simplify the transition to HDF5
    * `silx view` to show the file structure
    * Data processing functions
    * Managed by the DAU

These libraries are already installed on most ESRF computers.

They are cross-platform (Windows, Linux, macOS).

The source code is also available under permissive open-source licenses (MIT for silx and FabIO, BSD for h5py):

* https://github.com/silx-kit/silx
* https://github.com/silx-kit/fabio
* https://github.com/h5py/h5py

## Spec files

* Text format written by the Spec sequencer
* Contains the evolution of measurements and instruments during a scan
* We no longer recommend using this format
* `silx` provides HDF5-like read access to Spec files

### Spec compatibility

* PyMca was often used in the past as a Python library to read Spec files
* silx is now the preferred choice

In [None]:
# instead of
from PyMca5.PyMca import specfilewrapper
# prefer using
from silx.io import specfilewrapper

### How to read a spec file

An example is given later in [Read Spec file as HDF5](#Read-Spec-file-as-HDF5).

## EDF files


* ESRF data format
* It contains
    * Header containing various information
    * 1D/2D/3D array of float/integer
    * Multi-frames (more than one image in a single file)
    * Often used as file series
* Library
    * Use `fabio`
    * `silx` provides HDF5-like read access

## Read a single EDF image

In [None]:
import fabio

image = fabio.open("data/medipix.edf")

In [None]:
# Here is the data as a Numpy array
print(image.data)
# Here is the header as a key-value dictionary
print(image.header.keys())

In [None]:
# Better to use a context manager
with fabio.open("data/medipix.edf") as image:
    print(image.header["dir"])

## Read a multi-frame EDF image

A file containing many frames.

In [None]:
import fabio

with fabio.open("data/ID16B_diatomee.edf") as image:

    print("Nb frames: %d" % image.nframes)

    for frame in image.frames():

        average = frame.data.mean()
        
        message = "Frame ID: %d    Data average: %0.2f"
        print(message % (frame.index, average))

## Read a file series of EDF images

A file series is a set of files to iterate over, each of which may contain several frames.

`open_series` can be used:

- http://www.silx.org/doc/fabio/latest/getting_started.html#fabio-file-series

In [None]:
import fabio

with fabio.open_series(first_filename="data/ID19_D2H2T2_0000.edf") as series:

    print("Nb frames: %d" % series.nframes)

    for frame in series.frames():

        average = frame.data.mean()

        message = "Filename: %s    Frame ID: %d    Data average: %0.2f"
        print(message % (frame.file_container.filename, frame.index, average))

## Write an EDF file

In [None]:
import numpy as np
import fabio

image = np.random.rand(10, 10)
metadata = {'pixel_size': '0.2'}

image = fabio.edfimage.EdfImage(data=image, header=metadata)
image.write('edf_writing_example.edf')

## Other formats using FabIO

### Reading other formats

FabIO supports image formats from most manufacturers: 
Mar, Rayonix, Bruker, Dectris, ADSC, Rigaku, Oxford, General Electric…

In [None]:
import fabio

pilatus_image    = fabio.open('filename.cbf')
marccd_image     = fabio.open('filename.mccd')

tiff_image       = fabio.open('filename.tif')
fit2d_mask_image = fabio.open('filename.msk')
jpeg_image       = fabio.open('filename.jpg')

# Module `silx.io`

* Aims to simplify the transition to HDF5
    * h5py-like API
    * A single way to access Spec/EDF/HDF5 files
    * Based on NeXus specifications http://www.nexusformat.org/
* Read-only

## General mapping from Spec file

silx can expose Spec files with an HDF5-like mapping.

![Mapping from Spec to HDF5](images/spech5_arrows.png "hdf5-like mapping for spec files")


## General mapping from EDF image

silx can expose EDF files (or any other format supported by `fabio`) with an HDF5-like mapping.

![Mapping from EDF to HDF5](images/fabioh5_arrows.png "hdf5-like mapping for EDF files")


## Display the mapping with tools

* `silx view`: a Qt application launched from the command line
* `silx.io.utils.h5ls`

In [None]:
import silx.io
import silx.io.utils

with silx.io.open('data/oleg.dat') as h5file:
    string = silx.io.utils.h5ls(h5file)
    print(string)

## Read Spec file as HDF5

In [None]:
import time
import silx.io
data = silx.io.open('data/oleg.dat')

# Available scans
print("First children:", data['/'].keys())

# Available measurements from the scan 94.1
print("Content of measurement:", data['/94.1/measurement'].keys())

# Get data from measurement
epoch = data['/94.1/measurement/Epoch']
bpmi = data['/94.1/measurement/bpmi']
# The loop variable is named `value` so that it does not shadow the opened file `data`
for t, value in zip(epoch, bpmi):
    t = time.strftime("%X", time.gmtime(int(t)))
    print("%s   BPMi: %0.4e" % (t, value))

For more information and examples you can read the silx IO tutorial: https://github.com/silx-kit/silx-training/blob/main/silx/io/io.pdf

## Read EDF image as HDF5

In [None]:
import time
import silx.io
data = silx.io.open('data/ID16B_diatomee.edf')

# Access to the frames
frames = data['/scan_0/instrument/detector_0/data']
len(frames)  # number of frames
frames[0]    # first frame
print("Number of frames:", len(frames))
print("Size of an image:", frames[0].shape)

# Access to motors, monitors, timestamp
srot = data['scan_0/instrument/positioners/srot'][...]
mon = data['scan_0/measurement/mon'][...]
timestamp = data['scan_0/instrument/detector_0/others/time_of_day'][...]
for t, s, m in zip(timestamp, srot, mon):
    t = time.strftime("%X", time.gmtime(t))
    message = "%s   Rot:% 5.1fdeg   Monitor: %0.2f"
    print(message % (t, s, m))

## Read HDF5 using silx

For convenience, ``silx`` also provides the h5py API for HDF5 files.

In [None]:
import silx.io
h5file = silx.io.open('data/test.h5')

# print available names at the first level
print("First children:", h5file['/'].keys())

# reaching a dataset from a sub group
dataset = h5file['/diff_map_0004/data/map']

# accessing shape, size and dtype does not read the full stored dataset
print("Dataset:", dataset.shape, dataset.size, dataset.dtype)

h5file.close()
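`silx.io` is read-only, so writing HDF5 files is done with `h5py` directly. The following is a minimal sketch, assuming `h5py` and `numpy` are installed; the file name, the group name `entry_0`, the dataset name `image` and the `pixel_size` attribute are illustrative and do not follow any particular ESRF layout.

```python
import os
import tempfile

import numpy as np
import h5py

image = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, written to a temporary directory
    path = os.path.join(tmpdir, "h5py_writing_example.h5")

    # Write: groups behave like directories, datasets like arrays
    with h5py.File(path, "w") as h5file:
        group = h5file.create_group("entry_0")  # illustrative group name
        dataset = group.create_dataset("image", data=image)
        dataset.attrs["pixel_size"] = 0.2  # metadata stored as an attribute

    # Read back: shape and dtype are available without loading the data
    with h5py.File(path, "r") as h5file:
        dataset = h5file["/entry_0/image"]
        stored_shape = dataset.shape
        stored_dtype = dataset.dtype
        stored_pixel_size = float(dataset.attrs["pixel_size"])
        stored_image = dataset[...]  # loads the full array into memory

print("Dataset:", stored_shape, stored_dtype, "pixel_size:", stored_pixel_size)
```

Using `h5py.File` as a context manager closes the file even if an exception is raised, which avoids leaving a partially written file open.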

# Exercise: Flat-field correction

Flat-field correction is a technique used to improve quality in digital imaging.

The goal is to normalize images and remove artifacts caused by variations in the pixel-to-pixel sensitivity of the detector and/or by distortions in the optical path. (see https://en.wikipedia.org/wiki/Flat-field_correction)

$$ normalized = \frac{raw - dark}{flat - dark} $$

* `normalized`: Image after flat field correction
* `raw`: Raw image. It is acquired with the sample.
* `flat`: Flat field image. It is the response given out by the detector for a uniform input signal. This image is acquired without the sample.
* `dark`: Also named `background` or `dark current`. It is the response given out by the detector when there is no signal. This image is acquired without the beam.
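
As a worked example of the formula, here is a small sketch on a synthetic 2x2 detector. All pixel values are made up for illustration: the detector has an uneven response (visible in `flat`), and the sample absorbs half of the signal on three pixels and nothing on the fourth.

```python
import numpy as np

# Synthetic 2x2 images (illustrative values only)
dark = np.array([[10.0, 10.0],
                 [10.0, 10.0]])   # detector offset, no beam
flat = np.array([[210.0, 110.0],
                 [60.0, 110.0]])  # uneven detector response, no sample
raw = np.array([[110.0, 60.0],
                [35.0, 110.0]])   # same detector, with the sample

# normalized = (raw - dark) / (flat - dark)
normalized = (raw - dark) / (flat - dark)
print(normalized)
```

The three pixels where the sample absorbs half of the signal all come out at 0.5 even though their raw values differ (110, 60 and 35), and the pixel without absorption comes out at 1.0: the pixel-to-pixel sensitivity has been removed.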

# Exercise: Implementation with EDF files

Here is a function implementing the flat field correction:

In [None]:
import numpy as np

def flatfield_correction(raw, flat, dark):
    """
    Apply a flat-field correction to raw data using a flat and a dark image.
    """
    # Make sure that the computation is done using float
    # to avoid type overflow or loss of precision
    raw = raw.astype(np.float32)
    flat = flat.astype(np.float32)
    dark = dark.astype(np.float32)
    # Do the computation
    return (raw - dark) / (flat - dark)
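
The conversion to float matters because detector images are often stored as unsigned integers, where `raw - dark` silently wraps around when a dark pixel is brighter than the raw one. The sketch below illustrates this on made-up values. It also shows one possible way to handle pixels where `flat == dark`; `safe_flatfield_correction` is a hypothetical helper, and marking such pixels as NaN is an assumption, not a policy from the original material.

```python
import numpy as np

# Made-up unsigned 16-bit pixel values
raw = np.array([5, 200], dtype=np.uint16)
dark = np.array([10, 10], dtype=np.uint16)
flat = np.array([10, 110], dtype=np.uint16)

# Unsigned subtraction wraps around: 5 - 10 becomes 65531
wrapped = raw - dark
print("uint16 subtraction:", wrapped)


def safe_flatfield_correction(raw, flat, dark):
    """Flat-field correction returning NaN where flat == dark (hypothetical helper)."""
    raw = raw.astype(np.float32)
    flat = flat.astype(np.float32)
    dark = dark.astype(np.float32)
    numerator = raw - dark
    denominator = flat - dark
    # Pixels with a zero denominator keep the NaN placed in `out`
    result = np.full_like(numerator, np.nan)
    np.divide(numerator, denominator, out=result, where=denominator != 0)
    return result


safe = safe_flatfield_correction(raw, flat, dark)
print("float32 correction:", safe)
```

Whether such pixels should be set to NaN, masked, or interpolated depends on the downstream processing.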

Here is a `matplotlib` helper function to display the data.

In [None]:
def imshowmany(*args, **kwargs):
    """
    Display all arrays provided as arguments as images.

    The image title is defined by the keyword argument name (or dict key).
    Positional arrays are titled "arg0", "arg1", and so on.
    """
    import math
    from matplotlib import pyplot

    images = dict(kwargs)
    for i, arg in enumerate(args):
        if isinstance(arg, dict):
            images.update(arg)
        else:
            # Store the positional array under a generated title
            images["arg%d" % i] = arg

    fig = pyplot.figure()
    columns = 3
    # At most `columns` images per row, and enough rows to hold them all
    nbcols = max(1, min(columns, len(images)))
    nbrows = max(1, math.ceil(len(images) / columns))
    for i, (key, value) in enumerate(images.items()):
        a = fig.add_subplot(nbrows, nbcols, i + 1)
        a.imshow(value)
        a.set_title(key)

Here is an implementation of a flat field correction applied to a single EDF file.

The sample is a diatom, a unicellular alga, inserted into a needle.

In [None]:
# Provides interactive display in the notebook
%pylab notebook

In [None]:
import fabio

# Read the data

with fabio.open("data/ID16_diatomee/dark.edf") as image:
    dark = image.data
with fabio.open("data/ID16_diatomee/flat.edf") as image:
    flat = image.data
with fabio.open("data/ID16_diatomee/data.edf") as image:
    raw = image.data

# Compute the result

normalized = flatfield_correction(raw, flat, dark)

# Save the result

image = fabio.edfimage.EdfImage(data=normalized)
image.save("result.edf")

# Check the saved result

with fabio.open("result.edf") as image:
    saved = image.data
imshowmany(Before=raw, After=saved)

# Conclusion

Recommended libraries, by use case and file format:

| Formats              | Read            | Write |
|----------------------|-----------------|-------|
| HDF5                 | silx/h5py       | h5py  |
| Specfile             | silx            |       |
| EDF                  | silx/fabio      | fabio |
| Other raster formats | silx/fabio      | fabio |