# Beginner's guide to DanMAX data handling and analysis

This notebook is intended as a beginner's guide to working with DanMAX data. It showcases some simple examples of how to handle the data in Jupyter notebooks. You can use it as inspiration for processing your own data and quickly preparing plots.  

### Data structure
The data at DanMAX are saved in the HDF5 format (*.h5*), a hierarchical data format. The files are generally divided into three categories: master, raw, and processed. The master file contains the meta data (time stamp, motor positions, energy, etc.), the raw file contains the raw detector data, and the processed file contains the integrated data. Furthermore, the master file has an external link to the raw file, so the raw detector data can be accessed directly through the master file. All scans (measurements) have a scan number used to identify the relevant files. It is important to keep track of the scan numbers to know which files contain important data and which are irrelevant alignment scans. The files are generally named *scan-####.h5*, with *####* being the scan number. The raw files have the detector name as suffix, e.g. *scan-####_pilatus.h5*, and the processed files have the detector name and the data reduction method as suffix, e.g. *scan-####_pilatus_integrated.h5*.  
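Since the file names follow a fixed pattern, they can be built from a scan number (and vice versa) with plain string handling. The sketch below assumes the zero-padded four-digit pattern described above. `scan_filenames()` and `scan_number()` are hypothetical helpers, not part of `DanMAX.py`, and they only produce file names, not the folders the files are stored in.

```python
import os
import re

def scan_filenames(scan_no, detector='pilatus', method='integrated'):
    """Hypothetical helper: (master, raw, processed) file names for a scan number.

    Assumes the scan-####.h5 naming pattern with a zero-padded scan number.
    """
    stem = f'scan-{scan_no:04d}'
    return (f'{stem}.h5',                        # master file
            f'{stem}_{detector}.h5',             # raw detector file
            f'{stem}_{detector}_{method}.h5')    # processed file

def scan_number(fname):
    """Hypothetical helper: extract the scan number from a (full) file name."""
    match = re.match(r'scan-(\d+)', os.path.basename(fname))
    if match is None:
        raise ValueError(f'{fname!r} does not follow the scan-####.h5 pattern')
    return int(match.group(1))

print(scan_filenames(42))
# ('scan-0042.h5', 'scan-0042_pilatus.h5', 'scan-0042_pilatus_integrated.h5')
print(scan_number('/some/folder/scan-0042_pilatus_integrated.h5'))  # 42
```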
  
### DanMAX.py
The `DanMAX.py` file contains a wide selection of useful python functions for working with data at DanMAX. The functions are imported with `import DanMAX as DM` and called with the `DM.` prefix, e.g. `DM.findScan()`. If you wish to copy the scripts and notebooks to a local folder, make sure to include the `__init__.py` file as well. Use `help(DM)` to get a list of all the DanMAX functions.  

### Content:

[**Read data from the .h5 files**](#read_data)  
[**Read data from the .h5 files (manual)**](#read_data_man)  
[**Plotting a heatmap**](#plot_heat)  
[**Saving data in other formats**](#saving)  

#### Import relevant modules

In [1]:
%matplotlib widget
import os
import h5py as h5
import numpy as np
import matplotlib.pyplot as plt
import DanMAX as DM
print('Current proposal and visit:')
print(os.getcwd().split('scripts/')[0][:-7])

DanMAX.py Version 2.0.0
Current proposal and visit:
/data/visitors/danmax/20230160/2023101208/scripts_


#### Read data from the .h5 files <a id='read_data'></a>  
The `DanMAX.py` library already comes with functions for reading the data from the .h5 files, namely `DM.getAzintData()` and `DM.getMetaData()`. Both functions return a python *dictionary* containing the data names (keys) and values.   

The most common meta data are easily read with `DM.getMetaData(fname)`. For an extended meta data dictionary, use `DM.getMetaDic(fname)`.  
Use `DM.getMetaDic(fname).keys()` to get a list of the available meta data.  
Use `help(DM.getAzintData)`, `help(DM.getMetaData)` and `help(DM.getMetaDic)` for more information about reading the data.
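Because the returned objects are ordinary python dictionaries, the standard dictionary tools apply. The hypothetical helper `describe()` below (not a DanMAX function) lists every key with the data type and shape of its value, which gives a quick overview of what was read. It is demonstrated on a stand-in dictionary with made-up values; with real data one would call e.g. `describe(DM.getMetaDic(fname))`.

```python
import numpy as np

def describe(dic):
    """Hypothetical helper: one line per key with the dtype and shape of its value."""
    lines = []
    for key, value in dic.items():
        if value is None:
            # e.g. 'q' is None when the data were integrated in 2theta
            lines.append(f'{key}: None')
        else:
            arr = np.asarray(value)
            lines.append(f'{key}: {arr.dtype} {arr.shape}')
    return lines

# stand-in dictionary with made-up values, shaped like the integrated data used below
example = {'I': np.zeros((10, 500)), 'q': None, 'tth': np.linspace(1, 30, 500)}
for line in describe(example):
    print(line)
# I: float64 (10, 500)
# q: None
# tth: float64 (500,)
```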

First we specify the full path and file name of the master file. This can either be done manually as a string or with the `DM.findScan()` function. If called without arguments, `DM.findScan()` returns the latest completed scan in the proposal; alternatively, it can be given a scan number to find a specific scan.

In [None]:
# Insert path for the .h5 file - TIP: Use Tab for auto-complete
#fname = '/data/visitors/danmax/PROPOSAL/VISIT/raw/SAMPLE/scan-XXXX.h5'
fname = DM.findScan() # automatically find the latest scan in the proposal

The file name of the integrated data can be derived from the master file name or provided manually.

In [None]:
# get the azimuthally integrated filename from master file name
aname = DM.getAzintFname(fname)

The data are easily read with the DM functions, after which we can assign some more useful variable names.  
  
Because of the 10-minute ring current top-up, it is a good idea to normalize the diffraction data with the incident beam intensity $I_0$, as this will remove (most of) the intensity variation.  
We do the normalization in the same cell as the data import, to avoid accidentally normalizing the same data several times.

In [None]:
# read the integrated data
data = DM.getAzintData(aname)
# read common meta data from the master file
meta = DM.getMetaData(fname)

# determine if the diffraction data use Q or 2theta
if data['q'] is not None:
    x = data['q']
    Q = True
else:
    x = data['tth']
    Q = False

# assign new variable names
I = data['I']
t = meta['time'] # relative time stamp in seconds
T = meta['temp'] # temperature in Kelvin (if available, otherwise None)
I0 = meta['I0']  # relative incident beam intensity "I zero"
E = meta['energy'] # X-ray energy in keV

## normalize the diffraction intensities with the incident beam intensity I0
# the data are first transposed, then normalized, and then transposed back again
I = (I.T/I0).T
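The transpose trick in the cell above works because numpy broadcasts along the *last* axis: `I` has the shape \[*frames*, *radial bins*] and `I0` has one value per frame, so the frames must be moved to the last axis before dividing. An equivalent (and arguably more explicit) way is to give `I0` a trailing axis with `I0[:, np.newaxis]`. The sketch below checks that both give the same result on made-up data. Dividing `I / I0` directly would raise a `ValueError`, or silently divide along the wrong axis if the number of frames happens to equal the number of radial bins.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, bins = 5, 8
I_raw = rng.random((frames, bins))       # made-up diffraction data [frames, radial bins]
I0_raw = rng.uniform(0.9, 1.1, frames)   # made-up incident intensity, one value per frame

# transpose approach used in the cell above
I_t = (I_raw.T / I0_raw).T
# explicit broadcasting: I0 gets a trailing axis of length 1, so it spreads over the bins
I_b = I_raw / I0_raw[:, np.newaxis]

print(np.allclose(I_t, I_b))  # True
```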

##### Plot average diffraction pattern
We can use `numpy` to quickly calculate the average diffraction pattern and plot it using `matplotlib.pyplot`.  
We need to specify which axis we wish to take the mean along. The diffraction data `I` has the shape \[*frames*, *radial bins*], so to get the (time) average pattern, we specify the *"frame"*-axis *axis=0*.
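A small made-up array shows what the `axis` argument does: `axis=0` averages over the frames and leaves one value per radial bin (the average pattern), while `axis=1` averages over the radial bins and leaves one value per frame, which can be handy e.g. for checking how the total intensity varies with time.

```python
import numpy as np

# made-up data: 3 frames with 4 radial bins each
I_demo = np.array([[1., 2., 3., 4.],
                   [3., 4., 5., 6.],
                   [5., 6., 7., 8.]])

print(np.mean(I_demo, axis=0))  # average pattern, one value per radial bin: [3. 4. 5. 6.]
print(np.mean(I_demo, axis=1))  # average intensity per frame: [2.5 4.5 6.5]
```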

In [None]:
# calculate the average diffraction pattern
I_avg = np.mean(I,axis=0)

## plot the average pattern as function of x ##
# initialize the figure
plt.figure()
# plot the data
plt.plot(x, I_avg, label='Average pattern')
# set axes labels to Q or 2theta
if Q:
    plt.xlabel('Q')
else:
    plt.xlabel('2theta')
plt.ylabel('Intensity')
# add legend with the label specified in plt.plot()
plt.legend()

#### Read data from the .h5 files (manual) <a id='read_data_man'></a>  
While the `DM.getAzintData()` and `DM.getMetaData()` functions are very convenient for most applications, sometimes one might wish to read directly from the *.h5* files.  
We will reuse the azimuthally integrated file name (*aname*) from the previous cells, but this time we will read the data with the `h5py` module.  

It is good practice to use *context managers* when reading/writing files in python. This ensures that the file is only open when we need it and closed automatically when we are done.  

One can think of the *.h5* files as virtual folders and subfolders. To get the data, we first need to navigate to the right folder (called a *group*) and then read the data (called a *dataset*). The azimuthally integrated diffraction data are located at:  
*entry/data1d/I*  
To read the data as a numpy array, we add `[:]` at the end. If the data are not an array but a scalar, we add `[()]` instead.
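To find out which groups and datasets a file contains, `h5py` can walk through the whole file tree with `visititems()`. The hypothetical helper `list_datasets()` below collects the path and shape of every dataset. It is demonstrated on a small throw-away file with made-up contents, which also shows how `[:]` and `[()]` return an array and a scalar, respectively. Note that objects only reachable through an external link (such as the raw detector data in the master file) may not be listed, since, as far as we know, `visititems()` does not follow external links.

```python
import os
import tempfile
import h5py as h5
import numpy as np

def list_datasets(fname):
    """Hypothetical helper: return {path: shape} for every dataset in an .h5 file."""
    found = {}
    def visitor(name, obj):
        # called for every group and dataset below the root; returning None continues the walk
        if isinstance(obj, h5.Dataset):
            found[name] = obj.shape
    with h5.File(fname, 'r') as f:
        f.visititems(visitor)
    return found

# build a small throw-away file with made-up contents (intermediate groups are created automatically)
tmp_dir = tempfile.mkdtemp()
demo_fname = os.path.join(tmp_dir, 'demo.h5')
with h5.File(demo_fname, 'w') as f:
    f.create_dataset('entry/data1d/I', data=np.zeros((10, 500)))  # array dataset
    f.create_dataset('entry/sample/temperature', data=295.0)      # scalar dataset

print(list_datasets(demo_fname))

with h5.File(demo_fname, 'r') as f:
    I_demo = f['entry/data1d/I'][:]           # array  -> numpy array
    temp = f['entry/sample/temperature'][()]   # scalar -> numpy scalar
print(I_demo.shape, temp)
```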

In [None]:
# initialize the file context manager in 'read' mode
with h5.File(aname,'r') as f:
    I = f['entry/data1d/I'][:]

# as soon as we end the indentation, the file is closed by the context manager
print(I.shape)

The manual approach comes in handy when the datasets are *very* large and we start to run out of computer memory. In that case we can read the data bit by bit, so we never need to hold everything in memory at once.  
As an example, we will calculate the average diffraction image for a dataset without reading all images in the raw file at once.

In [None]:
# initialize the file context manager in 'read' mode
with h5.File(fname,'r') as f:
    # the raw detector data are reached through the external link in the master file
    dset = f['/entry/instrument/pilatus/data']
    # get the total number of frames in the file
    no_of_frames = dset.shape[0]
    print(f'{no_of_frames} frames in scan')
    # read the first frame at index zero and convert it to float,
    # so that summing many integer frames cannot overflow
    im = dset[0].astype(np.float64)
    
    # iterate through all remaining frames and add the values to the initial frame
    for i in range(1,no_of_frames):
        im += dset[i]

# divide the sum of all frames by the number of frames
im_avg = im/no_of_frames

# plot the average image
plt.figure()
plt.imshow(im_avg)
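The loop above reads one frame per iteration. A common variation, sketched below, reads a block of frames at a time, which keeps the memory use bounded while typically needing fewer (and therefore faster) reads from the file. The function `chunked_mean()` is a hypothetical helper; it works on anything that supports `.shape` and slicing along the first axis, such as a numpy array or an `h5py` dataset, and the check uses a made-up stack of images rather than real data.

```python
import numpy as np

def chunked_mean(dset, chunk_size=100):
    """Hypothetical helper: average over the first axis, reading chunk_size frames at a time."""
    no_of_frames = dset.shape[0]
    # accumulate in float64 to avoid integer overflow
    total = np.zeros(dset.shape[1:], dtype=np.float64)
    for start in range(0, no_of_frames, chunk_size):
        # only this slice is read into memory (the last slice may be shorter)
        block = np.asarray(dset[start:start + chunk_size], dtype=np.float64)
        total += block.sum(axis=0)
    return total / no_of_frames

# check against numpy on a small made-up stack of 250 "images"
stack = np.random.default_rng(1).integers(0, 1000, size=(250, 4, 5))
print(np.allclose(chunked_mean(stack, chunk_size=64), stack.mean(axis=0)))  # True
```

With a real scan it could be called inside the context manager, e.g. `chunked_mean(f['/entry/instrument/pilatus/data'], chunk_size=100)`. A larger `chunk_size` means fewer reads but more memory per block.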

#### Plotting a heatmap <a id='plot_heat'></a>  
A very common way to visualize time-resolved diffraction data is with a heatmap (not to be confused with a waterfall plot).  
Make sure to include the incident beam intensity $I_0$ normalization during the data import to remove the systematic intensity variation caused by the ring top-up.  
  
We generally recommend using `pcolormesh` for heatmaps, as it can handle non-equidistant data, which is very convenient when converting between $2\theta$ and $Q$.
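The conversion from $2\theta$ to $Q$ is $Q = 4\pi\sin(\theta)/\lambda$, with the wavelength in Å given by $\lambda = hc/E \approx 12.398/E$ for $E$ in keV. Because of the sine, an equidistant $2\theta$ grid becomes a (slightly) non-equidistant $Q$ grid, which is exactly where `pcolormesh` helps. The stand-alone functions below are a sketch of this conversion, not the DanMAX implementation (`DanMAX.py` provides `DM.keV2A()` for the wavelength), and the 35 keV energy is only an example value.

```python
import numpy as np

HC_KEV_ANGSTROM = 12.398419843320026  # h*c in keV*Angstrom

def kev_to_angstrom(energy_kev):
    """X-ray wavelength in Angstrom from photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev

def tth_to_q(tth_deg, wavelength):
    """Scattering vector Q (1/Angstrom) from 2theta (degrees) and wavelength (Angstrom)."""
    theta = np.deg2rad(np.asarray(tth_deg)) / 2
    return 4 * np.pi * np.sin(theta) / wavelength

tth = np.linspace(0, 40, 5)             # equidistant 2theta grid in degrees
q = tth_to_q(tth, kev_to_angstrom(35))  # 35 keV is just an example energy
print(np.round(np.diff(q), 4))          # the Q steps shrink as 2theta grows
```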

In [None]:
# initialize figure
fig = plt.figure()
# set figure title
plt.title('Heatmap')
# add ticks to the right axis
plt.tick_params('y',right=True)
# set axis labels
if Q:
    plt.xlabel(r'Q [$\AA^{-1}$]')
else:
    plt.xlabel(r'2$\theta$ [$\deg$]')
plt.ylabel('Time (s)')

# create plot
plt.pcolormesh(x,          # radial data (theta/Q)
               t,          # time
               I,          # diffraction data
               norm='log', # normalization option (here log scale)
              )

# add a colorbar (the color scale is logarithmic, the tick labels show I)
plt.colorbar(label='I (log scale)')

#### Saving data in other formats (*.xy .xye .dat*) <a id='saving'></a>  
We **highly** recommend working with the *.h5* format as much as possible; however, we realize that many analysis programs require column-separated file formats. This type of file tends to take up a lot of storage space and slows down file browser systems.  
It is therefore <b style="color:red;">NOT ALLOWED</b> to export to this format at MAX IV!  
Instead, perform the data export on your local system and only for the relevant data. Likewise, if you plan to sum the data to reduce the time-resolution, do this *before* exporting to column-separated files.
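Summing to a lower time-resolution simply means adding up blocks of consecutive frames. The export script uses `DM.reduceDic()` for this; its exact behavior is not described here, so the following is only a generic numpy sketch with a hypothetical `sum_frames()` helper. It reshapes the \[*frames*, *radial bins*] array into blocks of `rf` frames and sums within each block, dropping any leftover frames at the end.

```python
import numpy as np

def sum_frames(I, rf):
    """Hypothetical helper: sum blocks of rf consecutive frames of an array [frames, radial bins].

    Leftover frames that do not fill a complete block are dropped.
    """
    n_blocks = I.shape[0] // rf
    trimmed = I[:n_blocks * rf]
    # [n_blocks*rf, bins] -> [n_blocks, rf, bins], then sum within each block
    return trimmed.reshape(n_blocks, rf, *I.shape[1:]).sum(axis=1)

I_demo = np.arange(14, dtype=float).reshape(7, 2)  # made-up data: 7 frames, 2 radial bins
print(sum_frames(I_demo, 3))  # sums frames 0-2 and 3-5: [[6, 9], [24, 27]]; frame 6 is dropped
```

Time stamps should be averaged rather than summed, i.e. the same reshape followed by `.mean(axis=1)`.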

A script for exporting the data to column-separated *.xy* files could look like this:
```
import os
import numpy as np
import DanMAX as DM

#############################################################

# file name of master file
fname = 'myFolder/scan-0001.h5'
# file destination
destination = 'myFolder/xy_files'

# reduce time-resolution to improve statistics
rf = 1        # reduction factor
start = None  # first frame index (if None use default)
end =  None   # last  frame index (if None use default)

#############################################################

# create the destination folder if it does not exist yet
os.makedirs(destination, exist_ok=True)

aname = DM.getAzintFname(fname)
# read the integrated data
data = DM.getAzintData(aname)
# determine if the diffraction data use Q or 2theta and get the radial axis
Q = data['q'] is not None
x = data['q'] if Q else data['tth']
# read common meta data from the master file
meta = DM.getMetaData(fname)

# apply data reduction
data = DM.reduceDic(data,reduction_factor=rf,start=start,end=end)
meta = DM.reduceDic(meta,reduction_factor=rf,start=start,end=end)

# assign new variable names
I = data['I']
t = meta['time']             # relative time stamp in seconds
I0 = meta['I0']              # relative incident beam intensity "I zero"
E = np.mean(meta['energy'])  # mean X-ray energy in keV

## normalize the diffraction intensities with the incident beam intensity I0
# the data are first transposed, then normalized, and then transposed back again
I = (I.T/I0).T

I *= rf # multiply by the reduction factor to retain absolute counts

# calculate effective time-resolution
dt = np.mean(np.diff(t))
print(f'Effective time-resolution: {dt:.2f} s')

# common file header
header = ['DanMAX diffraction data',
          f'Energy (keV): {E:.2f}',
          f'Wavelength (A): {DM.keV2A(E):.4f}',
          f'Effective time-resolution (s): {dt:.3f}',
          ]
# column labels (last header line)
columns_header = 'Q(A-1)      I(counts)' if Q else 'tth(deg)    I(counts)'
# index of the first exported frame in the original scan
first = 0 if start is None else start

# iterate through the integrated data
for i,y in enumerate(I):
    frame_header = list(header)
    if rf > 1:
        # add information about the data reduction factor and the summed frames
        frame_header += [f'Data reduction factor: {rf}',
                         f'Frames: {first + i*rf} to {first + rf*(i+1) - 1}',
                        ]
    frame_header.append(columns_header)
    # set destination file name
    dst = os.path.join(destination,f'{DM.getScan_id(fname)}_{i:05d}.xy')
    # stack the x- and y-data in columns
    columns = np.stack([x,y]).T
    # save to file
    np.savetxt(dst,                               # file destination path
               columns,                           # column data
               delimiter=' ',                     # column delimiter
               comments = '#',                    # header prefix
               fmt= ['%6.4f','%10.2f'],           # formatting (allocated space and decimal points)
               header='\n'.join(frame_header))    # file header
```
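Before exporting a whole scan, it can be worth checking that the files read back as expected. The sketch below writes a tiny made-up file with the same `np.savetxt()` settings as the script above and reads it back with `np.loadtxt()`, which skips the `#` header lines. The file name is only an example.

```python
import os
import tempfile
import numpy as np

# write a tiny made-up file in the same layout as the export script
tmp_dir = tempfile.mkdtemp()
dst = os.path.join(tmp_dir, 'scan-0001_00000.xy')   # example file name
x = np.linspace(1.0, 10.0, 4)
y = np.array([10.0, 250.5, 80.25, 12.0])
np.savetxt(dst, np.stack([x, y]).T, delimiter=' ', comments='#',
           fmt=['%6.4f', '%10.2f'], header='DanMAX diffraction data\nQ(A-1)      I(counts)')

# np.loadtxt skips the '#' header lines; unpack=True returns one array per column
x_read, y_read = np.loadtxt(dst, comments='#', unpack=True)
print(np.allclose(x_read, x), np.allclose(y_read, y))  # True True
```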