# Input/output


## Introduction

ESRF data come in (too many) different formats:

* Specfile
* EDF
* HDF5

and specific detector formats:

* MarCCD
* Pilatus CBF
* Dectris Eiger
* …


HDF5 is expected to become the standard ESRF data format. Some beamlines have already switched.

## Accessing ESRF data

### Libraries


* h5py
    * Access to HDF5 files
* FabIO
    * Provides access to several image data formats
    * Developed as part of the Fable project, initially an ID11 development.
    * Managed by the DAU
* silx
    * Started in 2015
    * Will provide input/output for PyMCA
    * Also provides fitting, image processing, plotting, a set of widgets…
    * Managed by the DAU


These libraries are already installed on most ESRF computers:

```bash
$ apt-get install python3-silx python3-fabio python3-h5py
```

Cross-platform (available for Windows, Linux and macOS):
```bash
$ pip install silx fabio h5py
```


Also available from source code (silx and FabIO under the MIT license, h5py under a BSD license):

* https://github.com/silx-kit/silx
* https://github.com/silx-kit/fabio
* https://github.com/h5py/h5py

## Spec files

* Text format produced by Spec
* Contains the evolution of measurements and instruments during a scan
* We no longer recommend using this format
* silx provides HDF5-like read access to Spec files

### Spec compatibility

* PyMCA was previously often used as a Python library to read Spec files
* now prefer using silx

```python
# instead of
from PyMca5.PyMca import specfilewrapper

# prefer using
from silx.io import specfilewrapper
```

## EDF files


* ESRF data format
* It contains
    * 1D/2D/3D array of float/integer
    * Header containing various information
    * Multi-frames (more than one image in a single file)
    * Often used as file series
* Library
    * Use FabIO
    * silx provides a HDF5-like read access

### Reading EDF files using FabIO

In [2]:
import fabio

image = fabio.open("data/medipix.edf")

# here is the data as a numpy array
print(image.data)

# here is the header as key-value dictionary
print(image.header)

[[0 0 0 ... 0 0 0]
 [2 0 1 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
{
  "{\nHeaderID": "EH:000001:000000:000000",
  "Image": "1",
  "ByteOrder": "LowByteFirst",
  "DataType": "UnsignedShort",
  "Dim_1": "256",
  "Dim_2": "256",
  "Size": "131072",
  "det_orient": "1",
  "d_sample_det": "1330",
  "pixel_zero_y": "80",
  "pixel_zero_x": "100",
  "pixel_size_y": "0.055",
  "pixel_size_x": "0.055",
  "offset": "0",
  "count_time": "1",
  "point_no": "101",
  "scan_no": "122",
  "preset": "0",
  "cnt_col_end": "249",
  "cnt_col_beg": "49",
  "cnt_row_end": "236",
  "cnt_row_beg": "36",
  "col_end": "255",
  "col_beg": "0",
  "row_end": "255",
  "row_beg": "0",
  "sample_pos": "4.08 4.08 4.08 90 90 90",
  "sample_mne": "U0 U1 U2 U3 U4 U5",
  "UB_pos": "1.99593e-16 2.73682e-16 -1.54 -1.08894 1.08894 1.6083e-16 1.08894 1.08894 9.28619e-17",
  "UB_mne": "UB0 UB1 UB2 UB3 UB4 UB5 UB6 UB7 UB8",
  "counter_pos": "1 175.682 2365 0 2.6729e-12 22428 

### Writing files using FabIO

In [3]:
import numpy
import fabio

image = numpy.random.rand(10, 10)
metadata = {'pixel_size': '0.2'}

image = fabio.edfimage.edfimage(data=image, header=metadata)
image.write('edf_writing_example.edf')

### Other formats using FabIO

#### Reading other formats

FabIO supports image formats from most manufacturers: 
Mar, Rayonix, Bruker, Dectris, ADSC, Rigaku, Oxford, General Electric…

```python
import fabio

pilatus_image    = fabio.open('filename.cbf')
marccd_image     = fabio.open('filename.mccd')

tiff_image       = fabio.open('filename.tif')
fit2d_mask_image = fabio.open('filename.msk')
jpeg_image       = fabio.open('filename.jpg')

```

#### File conversion

Using FabIO, you can directly convert data to another format:

```python
import fabio
image = fabio.open('data/medipix.edf')
image = image.convert('tif')
image.save('filename.tif')
```
(You can also use the command-line tool `fabio-convert`.)


## HDF5 introduction

HDF5 (Hierarchical Data Format, version 5) is a file format designed to structure and store large and complex data.

* Hierarchical collection of data (directory and file, UNIX-like path)
* High-performance (binary)
* Standard exchange format for heterogeneous data
* Self-describing extensible types, rich metadata
* Supports data compression

Data can be mostly anything: image, table, graphs, documents
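
As a rough illustration of the compression support listed above, the sketch below stores the same array twice with h5py, once uncompressed and once with the built-in gzip filter, then compares the space each dataset uses on disk. The frame content, dataset names and file name are made up for the example.

```python
import os
import tempfile

import h5py
import numpy

# Hypothetical data: a mostly empty detector frame, which compresses well
frame = numpy.zeros((512, 512), dtype=numpy.uint16)
frame[100:110, 200:210] = 1000

filename = os.path.join(tempfile.mkdtemp(), "compression_example.h5")

with h5py.File(filename, mode="w") as h5file:
    # Same content, stored without and with the gzip filter
    h5file.create_dataset("raw", data=frame)
    h5file.create_dataset("packed", data=frame, compression="gzip")

with h5py.File(filename, mode="r") as h5file:
    # Space actually used in the file by each dataset
    raw_size = h5file["raw"].id.get_storage_size()
    packed_size = h5file["packed"].id.get_storage_size()
    # Decompression is transparent when reading
    restored = h5file["packed"][...]

print("uncompressed bytes on disk:", raw_size)
print("gzip bytes on disk:", packed_size)
```

Reading code does not need to know whether a dataset is compressed: slicing `h5file["packed"]` behaves exactly like slicing `h5file["raw"]`.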



### HDF5 description

The container is mostly structured with:

* **File**: the root of the container
* **Group**: a grouping structure containing groups or datasets
* **Dataset**: a multidimensional array of data elements
* And other features (links, attributes, datatypes)

![hdf5_class_diag](images/hdf5_model.png "hdf5 class diagram")
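
The elements listed above can be sketched with h5py as follows. This is only a minimal illustration: the group, dataset, attribute and link names are hypothetical and do not follow any particular ESRF layout.

```python
import os
import tempfile

import h5py
import numpy

filename = os.path.join(tempfile.mkdtemp(), "structure_example.h5")

with h5py.File(filename, mode="w") as h5file:
    # Group: behaves like a directory
    scan = h5file.create_group("scan_0001")
    # Dataset: a multidimensional array stored inside the group
    image = scan.create_dataset(
        "image", data=numpy.zeros((4, 4), dtype=numpy.float32))
    # Attribute: small metadata attached to a group or a dataset
    image.attrs["units"] = "counts"
    # Soft link: a second path pointing to the same dataset
    h5file["latest_image"] = h5py.SoftLink("/scan_0001/image")

with h5py.File(filename, mode="r") as h5file:
    paths = ("scan_0001", "scan_0001/image")
    kinds = {path: type(h5file[path]).__name__ for path in paths}
    units = h5file["scan_0001/image"].attrs["units"]
    linked_shape = h5file["latest_image"].shape

print(kinds)
print("units:", units, "- shape through the link:", linked_shape)
```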



### HDF5 example

Here is an example of a file generated by pyFAI

![hdf5_example](images/hdf5_example.png "hdf5 example")

Here we read a specific dataset

In [5]:
import h5py

h5file = h5py.File('data/test.h5', mode='r')

# print available names at the first level
print("First children:", list(h5file['/'].keys()))

First children: ['diff_map_0000', 'diff_map_0001', 'diff_map_0002', 'diff_map_0003', 'diff_map_0004']


In [6]:
# reaching a dataset from a sub group
dataset = h5file['/diff_map_0004/data/map']

# shape, size and dtype are available without reading the stored data
print("Dataset:", dataset.shape, dataset.size, dataset.dtype)

Dataset: (29, 78, 100) 226200 float32


Datasets mimic NumPy arrays:

In [9]:
# read and apply the operation
print(dataset[5, 5, 0:5])
print(2 * dataset[0, 5, 0:5])

[104.14766  103.352615 103.01642  103.24001  103.27751 ]
[205.95827 206.2795  206.5441  206.48112 206.46625]


In [11]:
# copy the data and store it as a numpy-array
b = dataset[...]
b[0, 0, 0:5] = 0
print(dataset[0, 0, 0:5])
print(b[0, 0, 0:5])

[103.45841  103.19393  103.12445  103.15601  103.203285]
[0. 0. 0. 0. 0.]
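
The cell above modifies a copy, so the stored data stays untouched. The sketch below shows the opposite case under stated assumptions: a small hypothetical file is opened in `r+` mode, and assigning to a slice of the dataset itself writes the values to the file.

```python
import os
import tempfile

import h5py
import numpy

filename = os.path.join(tempfile.mkdtemp(), "inplace_example.h5")

# Hypothetical stand-in for the 3D map used above
with h5py.File(filename, mode="w") as h5file:
    h5file["map"] = numpy.ones((2, 3, 5), dtype=numpy.float32)

with h5py.File(filename, mode="r+") as h5file:
    dataset = h5file["map"]
    copy = dataset[...]            # NumPy array held in memory
    copy[0, 0, :] = 0              # changes the copy only
    unchanged = dataset[0, 0, :]   # re-read from the file: still ones
    dataset[1, 2, :] = 7           # assignment on the dataset writes to the file

with h5py.File(filename, mode="r") as h5file:
    stored = h5file["map"][...]

print(unchanged)
print(stored[1, 2, :])
```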


### h5py write example

In [12]:
import numpy
import h5py

data = numpy.arange(10000.0)
data.shape = 100, 100

# write
h5file = h5py.File('my_first_one.h5', mode='w')

# write data into a dataset from the root
h5file['/data1'] = data

# write data into a dataset from group1
h5file['/group1/data2'] = data

h5file.close()
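
A variant of the same write, sketched with a `with` block so that the file is closed even if an error occurs, followed by a read-back check. The file name is arbitrary, and `create_group`/`create_dataset` are used here only to show the explicit form of the assignments above.

```python
import os
import tempfile

import h5py
import numpy

data = numpy.arange(10000.0).reshape(100, 100)
filename = os.path.join(tempfile.mkdtemp(), "my_second_one.h5")

# The file is closed automatically when leaving the with block
with h5py.File(filename, mode="w") as h5file:
    # Shortcut: assigning an array creates the dataset
    h5file["/data1"] = data
    # Explicit form: create the group, then the dataset inside it
    group = h5file.create_group("group1")
    group.create_dataset("data2", data=data)

# Read back what was written
with h5py.File(filename, mode="r") as h5file:
    names = sorted(h5file.keys())
    roundtrip = h5file["/group1/data2"][...]

print(names)
print(numpy.array_equal(roundtrip, data))
```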

### Useful tools for HDF5

* h5ls, h5dump, hdfview
```bash
$ h5ls -r my_first_one.h5
/                        Group
/data1                   Dataset {100, 100}
/group1                  Group
/group1/data2            Dataset {100, 100}
```

* h5py
* silx
* silx view

The HDF Group provides a web page listing more tools: https://support.hdfgroup.org/HDF5/doc/RM/Tools.html
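
When the command-line tools are not at hand, a similar recursive listing can be approximated from Python with h5py's `visititems`. The helper name `list_tree` and the output format are choices made for this sketch, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy


def list_tree(filename):
    """Return one line per group or dataset found below the root."""
    lines = []

    def visitor(name, obj):
        # Called once for every object below the root
        if isinstance(obj, h5py.Dataset):
            lines.append("/%s Dataset %s" % (name, obj.shape))
        else:
            lines.append("/%s Group" % name)

    with h5py.File(filename, mode="r") as h5file:
        h5file.visititems(visitor)
    return lines


# Build a small file with the same layout as the write example
filename = os.path.join(tempfile.mkdtemp(), "tree_example.h5")
with h5py.File(filename, mode="w") as h5file:
    h5file["/data1"] = numpy.zeros((100, 100))
    h5file["/group1/data2"] = numpy.zeros((100, 100))

lines = list_tree(filename)
for line in lines:
    print(line)
```

Unlike `h5ls -r`, this sketch does not print the root group itself.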

## silx io

* Aims to simplify the transition to HDF5
    * Provides an h5py-like API on top of the formats used at ESRF
    * A single way to access Spec/EDF/HDF5 files
    * Based on the NeXus specifications http://www.nexusformat.org/
* Read-only

### Read HDF5 using silx

For convenience, ``silx`` also provides the h5py API for HDF5 files.

In [13]:
import silx.io

h5file = silx.io.open('data/test.h5')

In [14]:
# print available names at the first level
print("First children:", list(h5file['/'].keys()))

First children: ['diff_map_0000', 'diff_map_0001', 'diff_map_0002', 'diff_map_0003', 'diff_map_0004']


In [15]:
# reaching a dataset from a sub group
dataset = h5file['/diff_map_0004/data/map']

# shape, size and dtype are available without reading the stored data
print("Dataset:", dataset.shape, dataset.size, dataset.dtype)

Dataset: (29, 78, 100) 226200 float32


### Spec files using silx

silx can also expose Spec files through an HDF5-like mapping

#### HDF5-like mapping  (given for general information)

![mapping_spec](images/spech5_arrows.png "hdf5-like mapping for spec files")


### Reading Spec files using silx

In [16]:
import silx.io
data = silx.io.open('data/oleg.dat')

# print available scans
print("First children:", data['/'].keys())

# print available measurements from the scan 94.1
print("Content of measurement:", data['/94.1/measurement'].keys())

# get data from measurement
xdata = data['/94.1/measurement/Epoch']
ydata = data['/94.1/measurement/bpmi']
for row in zip(xdata, ydata):
    print(row)

First children: odict_keys(['94.1', '95.1', '96.1'])
Content of measurement: odict_keys(['delta', 'H', 'K', 'L', 'Epoch', 'Seconds', 'Detector', 'Ion_m1', 'Ion_m2', 'srcur', 'curratt', 'ratio', 'all1', 'psd1', 'dir1', 'refl1', 'yoneda1', 'ACEdet', 'mcaLt1', 'twago', 'bpmi', 'tlangm', 'vO2', 'apdcnt', 'apdtemp', 'Monitor', 'detcorr', 'mca_0'])
(2011829.0, 6.247502e-07)
(2011831.0, 6.253457e-07)
(2011833.0, 6.258715e-07)
(2011836.0, 6.258831e-07)
(2011838.0, 6.255509e-07)
(2011840.0, 6.253553e-07)
(2011842.0, 6.257577e-07)
(2011844.0, 6.25633e-07)
(2011846.0, 6.25749e-07)
(2011848.0, 6.257645e-07)
(2011851.0, 6.256978e-07)
(2011853.0, 6.259445e-07)
(2011855.0, 6.258398e-07)
(2011857.0, 6.257358e-07)
(2011859.0, 6.258171e-07)
(2011861.0, 6.257585e-07)
(2011863.0, 6.258035e-07)
(2011866.0, 6.25884e-07)
(2011868.0, 6.256461e-07)
(2011870.0, 6.258715e-07)
(2011872.0, 6.257519e-07)


For more information and examples you can read the silx IO tutorial: https://github.com/silx-kit/silx-training/blob/master/silx/io/io.pdf

### EDF files using silx

silx can also expose EDF files through an HDF5-like mapping

#### HDF5-like mapping (given for general information)

![mapping_spec](images/fabioh5_arrows.png "hdf5-like mapping for EDF files")


#### Read EDF file using silx

In [17]:
import silx.io
data = silx.io.open('data/ID16B_diatomee.edf')

# Access to the frames
frames = data['/scan_0/instrument/detector_0/data']
len(frames)  # number of frames
frames[0]    # first frame
print("Number of frames:", len(frames))
print("Size of an image:", frames[0].shape)

# Access to motors, monitor, timestamp
srot = data['scan_0/instrument/positioners/srot'][...]
mon = data['scan_0/measurement/mon'][...]
timestamp = data['scan_0/instrument/detector_0/others/time_of_day'][...]
for row in zip(timestamp, srot, mon):
    print(row)

Number of frames: 6
Size of an image: (540, 640)
(1465802989.9281, 0.0, 0)
(1465803040.360028, 14.371199607849121, 0)
(1465803090.780985, 28.742399215698242, 0)
(1465803140.696993, 42.969888, 0)
(1465803191.6209, 57.484798431396484, 0)
(1465803266.291432, 71.85600280761719, 0)


### Silx Tools / utils

#### silx.io.utils.h5ls
`h5ls` displays the tree contained in an HDF5 file.

In [18]:
import silx.io
import silx.io.utils

h5file = silx.io.open('data/test.h5')

string = silx.io.utils.h5ls(h5file)
print(string)

+diff_map_0000
	+data
		<HDF5 dataset "map": shape (29, 78, 100), type "<f4">
	<HDF5 dataset "program_name": shape (), type "|S5">
	+pyFAI
		<HDF5 dataset "PONIfile": shape (), type "|S4">
		<HDF5 dataset "date": shape (), type "|S25">
		<HDF5 dataset "detector": shape (), type "|S9">
		<HDF5 dataset "dim0": shape (), type "<i8">
		<HDF5 dataset "dim1": shape (), type "<i8">
		<HDF5 dataset "dim2": shape (), type "<i8">
		<HDF5 dataset "dist": shape (), type "<f8">
		<HDF5 dataset "inputfiles": shape (2230,), type "|S80">
		<HDF5 dataset "pixel1": shape (), type "<f8">
		<HDF5 dataset "pixel2": shape (), type "<f8">
		<HDF5 dataset "poni1": shape (), type "<f8">
		<HDF5 dataset "poni2": shape (), type "<f8">
		<HDF5 dataset "program": shape (1,), type "|S8">
		<HDF5 dataset "rot1": shape (), type "<f8">
		<HDF5 dataset "rot2": shape (), type "<f8">
		<HDF5 dataset "rot3": shape (), type "<f8">
		<HDF5 dataset "version": shape (), type "|S6">
		<HDF5 dataset "wavelength": shape (), type

#### silx.io.convert.write_to_h5

Convert a Spec file to HDF5

In [19]:
from silx.io.convert import write_to_h5

write_to_h5('data/oleg.dat', 'oleg.h5', mode='w')

In [20]:
ls -al oleg.*

-rw-r--r-- 1 payno soft 688824 Sep 20 16:35 oleg.h5


Exercise
========

1. Read the EDF file ``medipix.edf``.
2. Process the data.
   The goal of the processing is to clamp the pixel values to a new range of values ([10%, 90%] of the existing one). To do so:

   - Create a mask to detect the pixels which are below 10% or above 90% of the current range.
   - With the above mask, set the affected pixels to the 10% 'low value'.

3. Store the source, the mask of changed pixels and the result inside ``process.h5``, as below.

   ![Output file structure](images/exercise-result.png)

4. Load ``process.h5`` and list the root content


In [None]:
# Load data/medipix.edf
# ...

# Process the data
# ...

# Save data into a new file (process.h5)
# ...

# Load process.h5 and list the root content
# ...

Solution
========

In [None]:
# Load data/medipix.edf
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.load_data))

In [None]:
# process data
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.process_data))

In [None]:
# save data
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.save_data))

In [None]:
# list root
import exercicesolution
import inspect
print(inspect.getsource(exercicesolution.list_root))

In [2]:
# result
import exercicesolution
raw_data, proc_data, mask = exercicesolution.solution("data/medipix.edf")

root level:
['mask', 'raw', 'result']
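
The source of `exercicesolution` is not reproduced here. As a rough sketch of what the processing and storage steps might look like, the block below follows the exercise statement on a synthetic image; the random data stands in for ``medipix.edf`` and the temporary file stands in for ``process.h5``, so that the example runs without the training data.

```python
import os
import tempfile

import h5py
import numpy

# Hypothetical stand-in for the image read from medipix.edf
rng = numpy.random.RandomState(0)
raw = rng.randint(0, 1000, size=(256, 256)).astype(numpy.float32)

# 10% and 90% thresholds of the current value range
vmin, vmax = raw.min(), raw.max()
low = vmin + 0.1 * (vmax - vmin)
high = vmin + 0.9 * (vmax - vmin)

# Mask of the pixels outside [low, high]
mask = (raw < low) | (raw > high)

# Set the affected pixels to the 'low value', as the exercise asks
result = raw.copy()
result[mask] = low

# Store the source, the mask and the result
filename = os.path.join(tempfile.mkdtemp(), "process.h5")
with h5py.File(filename, mode="w") as h5file:
    h5file["raw"] = raw
    h5file["mask"] = mask
    h5file["result"] = result

# Load the file again and list the root content
with h5py.File(filename, mode="r") as h5file:
    root_names = sorted(h5file.keys())

print("root level:", root_names)
```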


In [3]:
%pylab

Using matplotlib backend: TkAgg
Populating the interactive namespace from numpy and matplotlib


In [4]:
imshow(mask)

<matplotlib.image.AxesImage at 0x7f0c14ae1710>

In [5]:
imshow(raw_data)

<matplotlib.image.AxesImage at 0x7f0c10a43eb8>

In [6]:
imshow(proc_data)

<matplotlib.image.AxesImage at 0x7f0c10a029b0>

# Conclusion

Recommended library for each use case and file format:

| Formats              | Read       | Write |
|----------------------|------------|-------|
| HDF5                 | silx/h5py  | h5py  |
| Specfile             | silx       |       |
| EDF multiframe       | silx/fabio | fabio |
| EDF                  | fabio      | fabio |
| Other raster formats | fabio      | fabio |