## Introduction

**tl;dr**: a simple function to extract the image data from the HDF5 files generated by the STXM experiment at Synchrotron SOLEIL.

At the [Hermes beamline](https://www.synchrotron-soleil.fr/en/beamlines/hermes) of [Synchrotron SOLEIL](https://www.synchrotron-soleil.fr/en), the [STXM](https://en.wikipedia.org/wiki/Scanning_transmission_X-ray_microscopy) experimental setup saves the data in an [HDF5](https://support.hdfgroup.org/HDF5/) file. That is great, because HDF5 stores all the data and metadata in an organized way. But most of our software doesn't read it properly, and many well-established macros are written for the TIFF format.

This small piece of code takes the HDF5 file, extracts the image data and the relevant metadata, and saves them in TIFF and JSON format. To keep as much precision as possible, the TIFF is encoded as 32-bit integers, so usual image previewers may not display it. It is better to open the images with [ImageJ](https://imagej.nih.gov/ij/) or similar software.
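
Before saving, the code rescales the counts linearly onto the full signed 32-bit integer range (its comments explain why 32 rather than 64 bits): the lowest count becomes -2147483648 and the highest 2147483647. A minimal pure-Python sketch of that mapping on a flat list of counts; `rescale_to_int32` is a hypothetical name, and the notebook code does the equivalent with NumPy arrays:

```python
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def rescale_to_int32(counts):
    """Map counts linearly so that min(counts) -> INT32_MIN and max(counts) -> INT32_MAX."""
    c_min, c_max = min(counts), max(counts)
    span = c_max - c_min
    if span == 0:
        # flat image: every pixel gets the lowest value (avoids dividing by zero)
        return [INT32_MIN for _ in counts]
    # divide first, so that c_max maps exactly onto INT32_MAX in floating point;
    # int() truncates toward zero, like a NumPy cast to int32
    return [int((c - c_min) / span * (INT32_MAX - INT32_MIN) + INT32_MIN)
            for c in counts]


print(rescale_to_int32([0, 50, 100]))  # [-2147483648, 0, 2147483647]
```

Only the relative intensities survive this mapping: the absolute counts cannot be recovered from the TIFF alone.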


*date:* 08/01/2017

## Prerequisites
### Standard
 - Python >= 3.6
 - NumPy
 
### Non-standard
 - [Pillow](https://pypi.python.org/pypi/Pillow/5.0.0) >= 5.0 
 - [h5py](https://pypi.python.org/pypi/h5py/2.7.1) >= 2.7.1


## Usage
`extract_image_from_hdf5(file_name, json_file_output=True, json_file_with_data=False)`

#### file_name
The name of the HDF5 file.
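
With the default name `Sample_Image_Year-Month-Day_XXX.hdf5`, the number `XXX` is parsed from the file name and reused in the output names, which follow `SI_<number>_<polarization>_<x>x<y>um_<dwell>ms`. A stdlib sketch of both steps, mirroring the notebook code; `parse_file_number` and `make_base_name` are hypothetical helper names, and the dwell time is assumed to be in seconds, as the `dwell_time*1000` followed by `"ms"` in the code suggests:

```python
import os
import re

# default STXM name: Sample_Image_Year-Month-Day_XXX.hdf5
FILE_PATTERN = re.compile(r"Sample_Image_\d+-\d+-\d+_(?P<nb>\d+)\.hdf5$")


def parse_file_number(file_name):
    """Return the trailing number as a string, or None if the name is not the default one."""
    match = FILE_PATTERN.match(os.path.basename(file_name))
    return match.group("nb") if match else None


def str_round_if_possible(nb):
    """Drop the decimals of whole numbers (10.0 -> '10')."""
    return str(round(nb)) if round(nb) == nb else str(nb)


def make_base_name(file_number, polarization_name, x_range, y_range, dwell_time):
    """Build the TIFF/JSON base name; ranges in um, dwell_time assumed in seconds."""
    parts = ["SI"]
    if file_number:
        parts.append(file_number)
    parts.append(polarization_name)
    parts.append(str_round_if_possible(x_range) + "x" + str_round_if_possible(y_range) + "um")
    parts.append(str_round_if_possible(dwell_time * 1000) + "ms")
    return "_".join(parts)


number = parse_file_number("data/Sample_Image_2018-01-08_000.hdf5")
print(number, make_base_name(number, "CL", 10.0, 5.0, 0.002))
```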

#### json_file_output 
If true, a JSON file is generated with the following metadata:
 - file_name: the name of the HDF5 file
 - base_name: the file name of the TIFF image, without the `.tiff` extension
 - file_number: the file number, if the file name follows the default pattern `Sample_Image_Year-Month-Day_XXX.hdf5`
 - data: the raw data (only if json_file_with_data is true)
 - dwell_time: the integration time for each point, in s
 - polarization: the polarization code (3 = CL, 4 = CR)
 - polarization_name: the polarization name, CL or CR ("None" otherwise)
 - start_date: the start time of the measurement (ISO 8601)
 - stop_date: the end time of the measurement (ISO 8601)
 - x_range: in µm
 - x_step: in µm
 - x_nb: the number of points along x (so x_step*x_nb == x_range)
 - y_range: in µm
 - y_step: in µm
 - y_nb: the number of points along y (so y_step*y_nb == y_range)
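
With `x_step`, `x_nb`, `y_step` and `y_nb` from the JSON file, the pixel positions in µm can be rebuilt, e.g. for plotting. A sketch, assuming positions are counted from 0 at the first pixel (the absolute origin of the scan is not among the exported metadata); `pixel_axes` is a hypothetical helper and the values below are made up:

```python
import math


def pixel_axes(meta):
    """Return (x, y) pixel positions in um, relative to the first pixel."""
    xs = [i * meta["x_step"] for i in range(meta["x_nb"])]
    ys = [j * meta["y_step"] for j in range(meta["y_nb"])]
    return xs, ys


# made-up example values, not taken from a real scan
meta = {"x_range": 2.0, "x_step": 0.5, "x_nb": 4,
        "y_range": 1.0, "y_step": 0.5, "y_nb": 2}
xs, ys = pixel_axes(meta)
print(xs, ys)

# consistency check from the list above, tolerant to floating-point rounding
print(math.isclose(meta["x_step"] * meta["x_nb"], meta["x_range"]))
```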

#### json_file_with_data
If true, the raw data are also included in the JSON file (only when json_file_output is true too).
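
When both options are true, the image can be reloaded from the JSON file alone. A stdlib sketch that reads such a file and checks that the `data` rows match `y_nb` and `x_nb` (rows are y, columns are x, as in the code); `load_image_json` is a hypothetical name and the file name in the comment is only an example:

```python
import json


def load_image_json(path):
    """Load a JSON file written with json_file_with_data=True and check its shape."""
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    data = meta.get("data")
    if data is None:
        raise ValueError("no 'data' field: was json_file_with_data set to True?")
    # one list per row (y), one value per column (x)
    if len(data) != meta["y_nb"] or any(len(row) != meta["x_nb"] for row in data):
        raise ValueError("data shape does not match x_nb / y_nb")
    return meta, data


# example: meta, data = load_image_json("SI_000_CL_10x5um_2ms.json")
```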

## Author
Sylvain Martin.

## Licence
This project is licensed under the GPL Licence; see the [LICENSE](LICENSE) file for details.

```python
# for matrices
import numpy as np

# for TIFF
from PIL import Image

# regular expressions (to parse the file name)
import re

# for file paths
import os

# for JavaScript Object Notation
import json

# for dates
from datetime import datetime

# for the HDF5 file format
import h5py


def to_str(value):
    """Decode a string read from the HDF5 file (h5py usually returns bytes)."""
    return value.decode('UTF-8') if isinstance(value, bytes) else str(value)


def parse_time(raw_time):
    """Parse a time stamp of the form YYYY-mm-ddTHH:MM:SS+HH:MM."""
    time_str = to_str(raw_time)
    # remove the colon of the UTC offset (+01:00 -> +0100):
    # before Python 3.7, %z does not accept it
    time_str = time_str[:-3] + time_str[-2:]
    return datetime.strptime(time_str, '%Y-%m-%dT%H:%M:%S%z')


def str_round_if_possible(nb):
    """Return nb as a string, without decimals if it is a whole number."""
    if round(nb) == nb:
        return str(round(nb))
    return str(nb)


def extract_image_from_hdf5(file_name,
                            json_file_output=True,
                            json_file_with_data=False):
    """Save the STXM image of an HDF5 file as a 32-bit TIFF, plus an optional JSON file of metadata."""

    # extract the number of the file (default name: Sample_Image_Year-Month-Day_XXX.hdf5)
    match = re.match(r"Sample_Image_\d+-\d+-\d+_(?P<nb>\d+)\.hdf5$",
                     os.path.basename(file_name))
    file_number = match.group('nb') if match else None

    # read everything we need, then close the HDF5 file
    with h5py.File(file_name, 'r') as sample_image:
        # raw counts (rows are y, columns are x)
        data = np.array(sample_image['entry1/Counter0/data'], dtype=np.uint32)

        # dwell time in s (written in ms in the file name)
        dwell_time = float(sample_image['entry1/Counter0/count_time'][0])

        # energy of the X-ray beam in eV (read, but not exported)
        energy = float(sample_image['entry1/collection/energy/value'][0])

        # polarization of the X-ray beam: 3 = CL, 4 = CR
        polarization = int(sample_image['entry1/collection/polarization/value'][0])

        # start and end time of the measurement
        start_date = parse_time(sample_image['entry1/start_time'][0])
        stop_date = parse_time(sample_image['entry1/end_time'][0])

        # JSON-like description of everything we can set up when we launch a scan
        scan_request = json.loads(
            to_str(sample_image['entry1/collection/scan_request/scan_request'][0]))

    polarization_name = {3: "CL", 4: "CR"}.get(polarization, "None")

    # X and Y range and step (fine X / piezo scans only): axes[1] is X, axes[0] is Y
    axes = scan_request['innerRegions'][0]['axes']
    x_range = axes[1]['trajectories'][0]['range']
    x_step = axes[1]['trajectories'][0]['step']
    y_range = axes[0]['trajectories'][0]['range']
    y_step = axes[0]['trajectories'][0]['step']

    # rescale the counts linearly onto the signed 32-bit range (-2147483648 to 2147483647),
    # because 64-bit TIFF is really uncommon and poorly handled by many programs.
    # And the probability of having more than 4 294 967 295 counts is quite low ^_^
    counts = data.astype(np.float64)
    span = counts.max() - counts.min()
    if span > 0:
        data_scale = (counts - counts.min()) / span * (2**32 - 1) - 2**31
    else:  # flat image: avoid dividing by zero
        data_scale = np.full(counts.shape, -2**31, dtype=np.float64)

    # an int32 array gives a 32-bit signed ('I') image; flipping it vertically
    # (same as a left-right mirror followed by a 180° rotation) keeps the
    # original system of coordinates
    img = Image.fromarray(data_scale.astype(np.int32))
    img = img.transpose(Image.FLIP_TOP_BOTTOM)

    base_name = "SI_"
    if file_number:
        base_name += file_number + "_"
    base_name += polarization_name + "_"
    base_name += str_round_if_possible(x_range) + "x" + str_round_if_possible(y_range) + "um_"
    base_name += str_round_if_possible(dwell_time * 1000) + "ms"

    img.save(base_name + ".tiff")

    if json_file_output:
        json_data = dict()
        json_data['file_name'] = file_name
        json_data['base_name'] = base_name
        if file_number:
            json_data['file_number'] = int(file_number)
        if json_file_with_data:
            json_data['data'] = data.tolist()  # raw counts, not the rescaled values
        json_data['dwell_time'] = dwell_time
        json_data['polarization'] = polarization
        json_data['polarization_name'] = polarization_name
        json_data['start_date'] = start_date.isoformat()
        json_data['stop_date'] = stop_date.isoformat()
        json_data['x_range'] = x_range
        json_data['x_step'] = x_step
        json_data['x_nb'] = data.shape[1]  # columns are x
        json_data['y_range'] = y_range
        json_data['y_step'] = y_step
        json_data['y_nb'] = data.shape[0]  # rows are y

        with open(base_name + ".json", "w", encoding="utf-8") as json_file:
            json.dump(json_data, json_file, indent=4, ensure_ascii=False)
```
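
The code removes the colon of the UTC offset (`+01:00` becomes `+0100`) before calling `strptime`, because `%z` does not accept the colon before Python 3.7. On Python 3.7 and later, both `strptime` and `datetime.fromisoformat` parse such a time stamp as is. A small check, assuming time stamps of the form `YYYY-mm-ddTHH:MM:SS+HH:MM` (the value below is made up):

```python
from datetime import datetime, timedelta

stamp = "2018-01-08T10:15:30+01:00"  # made-up example value

# Python 3.6 compatible: drop the colon of the UTC offset, then use %z
legacy = datetime.strptime(stamp[:-3] + stamp[-2:], "%Y-%m-%dT%H:%M:%S%z")

# Python >= 3.7: the colon is accepted directly
modern = datetime.fromisoformat(stamp)

print(legacy == modern, legacy.utcoffset() == timedelta(hours=1))
```

Both give a timezone-aware `datetime`, so `isoformat()` writes the offset back with its colon in the JSON file.
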
```python
file_name = 'Sample_Image_2018-01-08_000.hdf5'
extract_image_from_hdf5(file_name)
```
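
To convert every image of a session at once, the files can be collected with `glob` and sorted by their number. A sketch assuming all files sit in one folder and follow the default name; `convert_folder` is a hypothetical helper, and the converter is passed in as `convert` (normally `extract_image_from_hdf5`) so that the snippet runs on its own:

```python
import glob
import os
import re

# trailing file number of the default name: ..._XXX.hdf5
NUMBER = re.compile(r"_(\d+)\.hdf5$")


def convert_folder(folder, convert):
    """Call convert(path) on every Sample_Image_*.hdf5 file of folder, in file-number order."""
    candidates = glob.glob(os.path.join(folder, "Sample_Image_*.hdf5"))
    # keep only names ending with _<number>.hdf5, then sort by that number
    numbered = [p for p in candidates if NUMBER.search(os.path.basename(p))]
    numbered.sort(key=lambda p: int(NUMBER.search(os.path.basename(p)).group(1)))
    for path in numbered:
        convert(path)
    return numbered


# with the notebook function: convert_folder(".", extract_image_from_hdf5)
```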