## Introduction

**tl;dr**: a simple function to extract the image data from the HDF5 files generated by the STXM experiment at Synchrotron SOLEIL

At the [Hermes beamline](https://www.synchrotron-soleil.fr/en/beamlines/hermes) of the [Synchrotron SOLEIL](https://www.synchrotron-soleil.fr/en), the [STXM](https://en.wikipedia.org/wiki/Scanning_transmission_X-ray_microscopy) experimental setup saves the data in an [HDF5](https://support.hdfgroup.org/HDF5/) file. That is great because it stores all the data and metadata in an organized way. But most of our software doesn't read it properly, and many well-established macros that are already written rely on the TIFF format.

So this small piece of code takes the HDF5 file, extracts the data and the relevant metadata, and saves them in TIFF and JSON format. To keep as much precision as possible, the TIFF is encoded in 32 bits, so the usual preview tools may not work. It may be preferable to use [ImageJ](https://imagej.nih.gov/ij/) or similar software to read the images.


*date:* 08/01/2017

## Prerequisites
### Standard
 - Python >= 3.6
 - Numpy 
 
### Non-standard
 - [Pillow](https://pypi.python.org/pypi/Pillow/5.0.0) >= 5.0 
 - [h5py](https://pypi.python.org/pypi/h5py/2.7.1) >= 2.7.1
 - [imageio](https://pypi.python.org/pypi/imageio) >= 2.2


## Usage
*def extract_image_from_hdf5(hdf5_file, 
                            json_file_output = True,
                            json_file_with_data = False,
                            save_jpg = True,
                            ImageJ_offset = True)*

#### hdf5_file
the path (including the basename) to the hdf5 file

#### json_file_output 
if true, a json file will be generated with some relevant metadata:
 - file_name: the name of the hdf5 file
 - base_name: the file name of the tiff image without the tiff extension
 - energy: the energy of the X-ray beam in eV
 - energy_name: the name of the absorption edge (for example Ni-L3) if the energy is within the tolerance of a known edge, otherwise the rounded energy
 - file_number: the file number, if the file name follows the default pattern Sample_Image_Year-Month-Day_XXX.hdf5
 - data: the raw data (if json_file_with_data is true)
 - dwell_time: the integration time for each point
 - polarization: the polarization code
 - polarization_name: the polarization name: CL (code 3), CR (code 4), or Lin otherwise
 - start_date
 - stop_date
 - x_range: in um
 - x_step: in um
 - x_nb: the number of points along x (so x_step*x_nb==x_range)
 - y_range: in um
 - y_step: in um
 - y_nb: the number of points along y (so y_step*y_nb==y_range)
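
As an illustration of how the generated json file could be used afterwards, here is a minimal sketch that reads it back and checks the relation `step * nb == range` given in the list above. The helper names `load_scan_metadata` and `axis_is_consistent` are hypothetical (they are not part of the notebook), and the file is assumed to be readable as UTF-8.

```python
import json
import math


def load_scan_metadata(json_path):
    """Read the JSON file written next to the TIFF and return it as a dict."""
    # Assumption: the file can be decoded as UTF-8
    with open(json_path, encoding="utf-8") as json_file:
        return json.load(json_file)


def axis_is_consistent(metadata, axis, rel_tol=1e-6):
    """Check the relation step * nb == range for one axis ('x' or 'y')."""
    step = metadata[axis + "_step"]
    nb = metadata[axis + "_nb"]
    full_range = metadata[axis + "_range"]
    # Compare with a tolerance because the steps are floating point values
    return math.isclose(step * nb, full_range, rel_tol=rel_tol)
```

For example, `axis_is_consistent(load_scan_metadata("some_scan.json"), "x")` would return `True` when the x axis of that (hypothetical) file satisfies the relation.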

#### json_file_with_data
if true, the raw data will be included in the json file (if json_file_output is also true)

#### save_jpg
if true, the image will also be saved as jpeg

#### ImageJ_offset
if true, enables compatibility with ImageJ. ImageJ expects the 32-bit tiff to be encoded from $−2^{31}$ to $2^{31} − 1$, whereas macOS Preview and Photoshop expect values from $0$ to $2^{32}-1$
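
The two encodings can be sketched with NumPy as follows. This is only an illustration of the scaling idea, not the exact code of the function: the name `scale_to_32bit`, the choice of a signed or unsigned dtype per branch, and the handling of a constant image are assumptions made for this example.

```python
import numpy as np

INT32_MIN = -2**31      # lower bound of the signed range (ImageJ case)
UINT32_MAX = 2**32 - 1  # upper bound of the unsigned range


def scale_to_32bit(counts, imagej_offset=True):
    """Stretch the counts over the full 32-bit range."""
    counts = np.asarray(counts, dtype=np.float64)
    lowest, highest = counts.min(), counts.max()
    if highest == lowest:
        # Assumption: a constant image is mapped to the bottom of the range
        normalized = np.zeros_like(counts)
    else:
        normalized = (counts - lowest) / (highest - lowest)
    scaled = normalized * UINT32_MAX
    if imagej_offset:
        # Signed encoding: from -2**31 to 2**31 - 1
        return np.round(scaled + INT32_MIN).astype(np.int32)
    # Unsigned encoding: from 0 to 2**32 - 1
    return np.round(scaled).astype(np.uint32)
```

With `imagej_offset=True` the smallest count maps to $−2^{31}$ and the largest to $2^{31} − 1$; with `imagej_offset=False` they map to $0$ and $2^{32}-1$.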

## Author
Sylvain Martin.

## Licence
This project is licensed under the GPL Licence - see the [LICENSE](LICENSE) file for details

## Notes
- I deliberately did not use OpenCV, to avoid depending on a big package that can be tricky to install.


In [44]:
import os

# for matrix
import numpy as np

# for tiff
from PIL import Image
#from PIL.Image import FLIP_LEFT_RIGHT    

#for the jpeg
import imageio

# Perl-style regular expression patterns
import re

# for JavaScript Object Notation
import json

#for dates
from datetime import datetime


#for the Hdf5 file format
import h5py

# energy list of edge used in XMCD (in eV)
# from http://www.ruppweb.org/Xray/elements.html
energy_list = {'Ni-L2': 871.9, 
            'Ni-L3': 854.7,
            'Co-L2': 793.8,
            'Co-L3': 778.6,
            'Fe-L2': 721.1,
            'Fe-L3': 708.1
           }
# tolerance for the energy edge (in eV)
#(an energy matches an edge if it lies between edge - tolerance and edge + tolerance)
energy_tolerance = 5

def energy_name(energy):
    """Find the name of the edge from the energy value.
    If no edge matches, return the rounded energy as a string."""

    # an edge matches if edge - tolerance <= energy <= edge + tolerance
    for name, edge in energy_list.items():
        if edge - energy_tolerance <= energy <= edge + energy_tolerance:
            return name

    # no known edge within the tolerance
    return str(round(energy)) + "eV"
    
    
    
def extract_image_from_hdf5(hdf5_file, 
                            json_file_output = True,
                            json_file_with_data = False,
                            save_jpg = True,
                            ImageJ_offset = True) :

    #extract file name and path
    (file_path, file_name) = os.path.split(hdf5_file)
    
    #file_name = os.path.basename(file_path) 
    #extract the number of the file
    try :
        file_number = re.match(r"Sample_Image_(\d+-\d+-\d+)_(?P<nb>\d+)\.hdf5", file_name).group('nb')
    except AttributeError:
        file_number = None


    # Open the HDF5 file (read-only)
    sample_image = h5py.File(hdf5_file,'r')

    #Load the raw data
    raw_data = sample_image['entry1/Counter0/data']


    #put the raw data in a numpy array
    data = np.array(raw_data, dtype=np.uint32)

    # normalize the array over 32 bits (from -2147483648 to 2147483647)
    # because 64-bit TIFF is really not common and not well handled by many programs.
    # And the probability of having more than 4 294 967 295 counts is quite low ^_^
    if ImageJ_offset : 
        offset = 2147483648 
    else : 
        offset = 0
        
    data_scale = (data - data.min())/(data.max()-data.min())*(2147483647+ 2147483648) -  offset 

    #put it in an image and do a rotate and a mirror to keep the original system of co-ordinate
    #the [::-1,:] is to flip the image
    img = Image.fromarray(np.array(data_scale[::-1,:], dtype=np.uint32),'I')
    #img = img.transpose(FLIP_LEFT_RIGHT)
    #img = img.rotate(180)


    #Dwell time in ms
    dwell_time = sample_image['entry1/Counter0/count_time'][0]

    #Energy of the X beam in eV
    energy = sample_image['entry1/collection/energy/value'][0]

    # polarization of the X beam 3 = CL, 4 = CR
    polarization = int(sample_image['entry1/collection/polarization/value'][0])

    polarization_name = "Lin"
    if polarization == 3 : polarization_name = "CL"
    if polarization == 4 : polarization_name = "CR"    

    # start time of the measurement
    start_date = sample_image['entry1/start_time'][0].decode('UTF-8')
    start_date = start_date[:-3] + start_date[(-3+1):]
    start_date = datetime.strptime(start_date, '%Y-%m-%dT%H:%M:%S%z')

    # end time of the measurement
    stop_date = sample_image['entry1/end_time'][0].decode('UTF-8')
    stop_date = stop_date[:-3] + stop_date[(-3+1):]
    stop_date = datetime.strptime(stop_date, '%Y-%m-%dT%H:%M:%S%z')



    # JSON-like data with everything that can be set up when launching a scan
    sample_image_json = json.loads((sample_image['entry1/collection/scan_request/scan_request'][0].decode('UTF-8')))

    #http://docs.python-guide.org/en/latest/scenarios/json/

    # X range and step (if fineX/piezo only)
    x_range = sample_image_json['innerRegions'][0]['axes'][1]['trajectories'][0]['range']
    x_step = sample_image_json['innerRegions'][0]['axes'][1]['trajectories'][0]['step']


    # Y range and step (if fineX/piezo only)
    y_range = sample_image_json['innerRegions'][0]['axes'][0]['trajectories'][0]['range']
    y_step = sample_image_json['innerRegions'][0]['axes'][0]['trajectories'][0]['step']

    def str_round_if_possible(nb):
        nb2digits = nb-((nb*100)%1/100)
        if(round(nb2digits) == nb2digits) :
            return str(round(nb2digits))
        else :    
            return str(nb2digits)



    base_name  = "SI_" + start_date.strftime('%Y%m%d') +"_"
    
    if file_number :  base_name += file_number + "_" 
    base_name +=  polarization_name + "_" 
    base_name +=  str_round_if_possible(x_range) + "x" + str_round_if_possible(y_range) +"um_"
    base_name +=  energy_name(energy) #str_round_if_possible(dwell_time*1000) + "ms" 

    img.save(os.path.join(file_path, base_name + ".tiff"))
    
    if save_jpg :
        #the [::-1,:] is to flip the image
        data_scale = (data[::-1,:] - data.min())/(data.max()-data.min())*255
        #data_scale_flip = data_scale
        imageio.imwrite(os.path.join(file_path, base_name + '.jpg') , np.array(data_scale, dtype=np.uint8))

    if json_file_output :
        json_data = dict()
        json_data['file_name']          = file_name
        json_data['base_name']          = base_name
        json_data['energy']             = energy
        json_data['energy_name']        = energy_name(energy)
        if file_number :
            json_data['file_number']    = int(file_number)

        if json_file_with_data :
            json_data['data']           = data.tolist()

        json_data['dwell_time']         = dwell_time

        json_data['polarization']       = polarization
        json_data['polarization_name']  = polarization_name

        json_data['start_date']         = start_date.isoformat()
        json_data['stop_date']          = stop_date.isoformat()

        json_data['x_range']            = x_range
        json_data['x_step']             = x_step
        json_data['x_nb']               = data.shape[1] # x and y are swapped in the matrix (rows are y)
        json_data['y_range']            = y_range
        json_data['y_step']             = y_step
        json_data['y_nb']               = data.shape[0]

        with open(os.path.join(file_path, base_name + '.json'), "w") as json_file:
            json_file.write(json.dumps(json_data, indent=4, ensure_ascii=False))

    return file_name, base_name + ".tiff"

    

## To extract the image from 1 file

In [45]:
file_name = 'Sample_Image_2018-01-08_000.hdf5'
extract_image_from_hdf5(file_name, ImageJ_offset=False)

('Sample_Image_2018-01-08_000.hdf5', 'SI_20171202_000_CR_3x10um_Ni-L3.tiff')
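
Since the usual preview tools may not display the 32-bit TIFF, it can also be loaded back in Python for a quick check. The following is a minimal sketch using Pillow and NumPy, both already listed in the prerequisites; the function name `read_tiff_counts` is hypothetical and not part of the notebook.

```python
import numpy as np
from PIL import Image


def read_tiff_counts(tiff_path):
    """Load a 32-bit TIFF and return its pixels as a NumPy array."""
    # Pillow is expected to open a 32-bit integer TIFF in mode 'I',
    # which NumPy converts to a signed 32-bit integer array.
    with Image.open(tiff_path) as img:
        return np.array(img)
```

For instance, `read_tiff_counts('SI_20171202_000_CR_3x10um_Ni-L3.tiff')` would return the scaled image written above. Note that the rows were flipped with `[::-1,:]` before saving, so the first row of the array corresponds to the last row of the raw HDF5 data.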

## To extract all the files in a folder
(all the files with a name of the form Sample_Image_XXXXXXXXXXXX.hdf5)

In [27]:
from os import listdir
from os.path import isfile, join
from re import match, IGNORECASE

folder_path = '/path/to/STXM/files/'
pattern = r"Sample_Image_.*\.hdf5"
# keep only regular files whose name matches the pattern (case-insensitive)
simple_image_list = [join(folder_path, f) for f in listdir(folder_path) if (isfile(join(folder_path, f)) and match(pattern, f, flags=IGNORECASE))]

for f in simple_image_list:
    (file_name, tiff_name) = extract_image_from_hdf5(f)
    print(file_name + "-->" + tiff_name + " : Done !")
    



Sample_Image_2017-11-29_001.hdf5-->SI_20171129_001_Lin_1000x1000um_700.0eV.tiff : Done !
Sample_Image_2017-11-29_002.hdf5-->SI_20171129_002_Lin_1000x1000um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_003.hdf5-->SI_20171129_003_Lin_1000x1000um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_004.hdf5-->SI_20171129_004_Lin_200x200um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_006.hdf5-->SI_20171129_006_Lin_200x200um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_007.hdf5-->SI_20171129_007_Lin_25x25um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_009.hdf5-->SI_20171129_009_Lin_25x25um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_010.hdf5-->SI_20171129_010_Lin_100x100um_700.0eV.tiff : Done !
Sample_Image_2017-11-29_011.hdf5-->SI_20171129_011_Lin_25x25um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_013.hdf5-->SI_20171129_013_Lin_25x25um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_014.hdf5-->SI_20171129_014_Lin_5x5um_Ni-L3.tiff : Done !
Sample_Image_2017-11-29_016.hdf5-->SI_20171129_016_Lin_5x5um_Ni-L3.