# X-ray absorption spectroscopy (XAS)

In this exercise, you will use X-ray absorption data measured on Cr<sub>2</sub>O<sub>3</sub>. You will explore different aspects that need to be considered when performing data reduction for simple XAS measurements:  
- Locate the experimental files on disk.
- Visualize the contents of the files.
- Find the indices of the scans and the counters for the x-axis, signal, and monitor.
- Explore the different ways to aggregate the scans: sum of fractions or fraction of sums.
- Explore the different ways to normalize the reduced signal: maximum or area.

<figure>
  <img src="assets/xas.png" alt="XAS" style="width:60%">
  <figcaption style="text-align: center; font-style: italic">J.K. Kowalska et al., Israel Journal of Chemistry 56, 803 (2016).</figcaption>
</figure>

## Import the required packages

In [None]:
%matplotlib ipympl

import sys
import logging

import numpy as np
import matplotlib.pyplot as plt
import silx.io.h5py_utils

plt.rcParams["figure.figsize"] = (6, 3.4)

## Locate the experimental data on disk

You can do this using operating system commands (note that you must prefix them with `!`). If you are using JupyterHub, you can alternatively use the file explorer on the left-hand side.
    
The data file for this exercise is `experimental_data/ihch1515/id26/Cr2O3_new/Cr2O3_new_0002/Cr2O3_new_0002.h5`. Navigate to it and open it using the `h5web` plugin, which allows basic plotting of the HDF5 items.

In [None]:
!ls experimental_data

## Exploring and accessing the data using the `silx` library

You can use the `io` module of the `silx` library to access data stored in the HDF5 file. Compared to the more widely used `h5py` library, it allows you to open files while they are still being written.

In [None]:
filename = "experimental_data/ihch1515/id26/Cr2O3_new/Cr2O3_new_0002/Cr2O3_new_0002.h5"

with silx.io.h5py_utils.File(filename) as sf:
    print(sf.keys())

Explore the contents of a scan group in more detail using Python control flow statements.

In [None]:
with silx.io.h5py_utils.File(filename) as sf:
    group = sf["4.1"]
    for name in group:
        item = group[name]
        print(f"Found item {name}.")
        if silx.io.is_dataset(item):
            print(f"{name} is a dataset.")
        elif silx.io.is_group(item):
            print(f"{name} is a group.")

To extract the data stored at a given path in the HDF5 file, use the `sf["path_to_data"][()]` construct.

In [None]:
with silx.io.h5py_utils.File(filename) as sf:
    print(sf["4.1/title"][()])

## Plotting the experimental data 

Extract and plot the x-axis and signal data for scan 4. The x-axis is stored at `measurement/hdh_energy` and the signal at `measurement/det_dtc_apd`.

In [None]:
fig, ax = plt.subplots()

with silx.io.h5py_utils.File(filename) as sf:
    ##################
    # YOUR CODE HERE #
    x = sf["4.1/measurement/hdh_energy"][()]
    signal = sf["4.1/measurement/det_dtc_apd"][()]
    ##################
    ax.plot(x, signal)

That was quite some work just to plot one scan. However, you can now easily plot multiple scans at the same time. Plot scans 4 to 10.

In [None]:
fig, ax = plt.subplots()

with silx.io.h5py_utils.File(filename) as sf:  
    # Loop over the scans.
    for scan_id in range(4, 11):
        scan_name = f"{scan_id}.1"
        x = sf[f"{scan_name}/measurement/hdh_energy"][()]
        signal = sf[f"{scan_name}/measurement/det_dtc_apd"][()]
        ax.plot(x, signal, label=f"{scan_id}")
ax.legend()

## Normalizing the signal data using information from additional counters

You can use additional counters stored in the HDF5 file to assess the properties of the X-ray beam. One such counter is `I02`, which measures the current right before the sample. Extract it from the HDF5 file and plot it.

In [None]:
fig, ax = plt.subplots()  
with silx.io.h5py_utils.File(filename) as sf:  
    for scan_id in range(4, 11):
        scan_name = f"{scan_id}.1"
        ##################
        # YOUR CODE HERE #
        monitor = sf[f"{scan_name}/measurement/I02"][()]
        ##################
        ax.plot(monitor, label=f"{scan_id}")
ax.legend()

Notice that the intensity of the `I02` counter changes during the scan, which will affect the signal. Use the `I02` data to normalize the signal and plot the result. Compare the plot with the one without normalization.

In [None]:
fig, ax = plt.subplots() 
with silx.io.h5py_utils.File(filename) as sf:
    for scan_id in range(4, 11):
        scan_name = f"{scan_id}.1"
        x = sf[f"{scan_name}/measurement/hdh_energy"][()]
        signal = sf[f"{scan_name}/measurement/det_dtc_apd"][()]
        monitor = sf[f"{scan_name}/measurement/I02"][()]
        # Normalize the signal with the monitor.
        ##################
        # YOUR CODE HERE #
        signal = signal / monitor
        ##################
        ax.plot(x, signal, label=f"{scan_id}")
ax.legend()
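
To see why dividing by the monitor helps, the following sketch uses made-up numbers rather than the Cr<sub>2</sub>O<sub>3</sub> data. It assumes that the detector signal scales linearly with the incident intensity; under that assumption, a drift in the monitor distorts the raw signal, and the division recovers the underlying spectrum.

```python
import numpy as np

# Synthetic energy axis (keV) and an idealized step-like spectrum; illustrative values only.
energy = np.linspace(5.98, 6.04, 7)
true_spectrum = np.where(energy > 6.0, 2.0, 1.0)

# Assumed monitor reading that drifts down by 20% during the scan.
monitor = np.linspace(1.0, 0.8, energy.size)

# Assumption: the measured signal is proportional to the incident intensity.
signal = true_spectrum * monitor

# Dividing by the monitor removes the drift.
normalized = signal / monitor
print(normalized)
```

In real data the correction is only as good as the linearity of the two detectors, so some residual differences between scans can remain.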

Now you can combine the data from the individual scans to improve the signal-to-noise ratio (SNR). You can do this in two ways:

- Fraction of sums: $I_S = \frac{I_{S,1}(E)\ +\ I_{S,2}(E)\ +\ \cdots}{I_{M,1}(E)\ +\ I_{M,2}(E)\ +\ \cdots}$. This may not correct well for instabilities in $I_M$.
- Sum of fractions: $I_S = \frac{I_{S,1}(E)}{I_{M,1}(E)} + \frac{I_{S,2}(E)}{I_{M,2}(E)} + \cdots$. This may not give the correct statistical weight to each scan, and it loses the information about the total counts.

In [None]:
fig, ax = plt.subplots() 

signal_fos = None
monitor_fos = None

signal_sof = None

with silx.io.h5py_utils.File(filename) as sf:
    for scan_id in range(4, 11):
        scan_name = f"{scan_id}.1"
        
        # Access the data from the file
        x = sf[f"{scan_name}/measurement/hdh_energy"][()]
        signal = sf[f"{scan_name}/measurement/det_dtc_apd"][()]
        monitor = sf[f"{scan_name}/measurement/I02"][()]
        
        # Initialize the arrays holding the aggregated data.
        if signal_fos is None:
            signal_fos = np.zeros_like(signal, dtype=float)
        if monitor_fos is None:
            monitor_fos = np.zeros_like(monitor, dtype=float)
        if signal_sof is None:
            signal_sof = np.zeros_like(signal, dtype=float)

        # Accumulate the scan data for the fraction of sums.
        signal_fos += signal 
        monitor_fos += monitor
        
        # Accumulate the scan data for the sum of fractions.
        ##################
        # YOUR CODE HERE #
        signal_sof += signal / monitor
        ##################

# Calculate the final signal for the fraction of sums method.
signal_fos = signal_fos / monitor_fos
        
    
ax.plot(x, signal_fos, label="Fraction of sums")
ax.plot(x, signal_sof, label="Sum of fractions")
ax.legend()
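
The two curves above have very different scales. A small example with made-up counts (two scans, three energy points; not taken from the data file) shows why: the sum of fractions grows with the number of scans, while the fraction of sums behaves like a monitor-weighted mean of the per-scan ratios.

```python
import numpy as np

# Made-up counts for two scans at three energy points (rows are scans).
signals = np.array([[10.0, 20.0, 30.0], [33.0, 60.0, 90.0]])
monitors = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])

fraction_of_sums = signals.sum(axis=0) / monitors.sum(axis=0)
sum_of_fractions = (signals / monitors).sum(axis=0)

# The fraction of sums equals the monitor-weighted mean of the per-scan ratios.
ratios = signals / monitors
weighted_mean = (monitors * ratios).sum(axis=0) / monitors.sum(axis=0)

# Dividing the sum of fractions by the number of scans gives the unweighted mean.
unweighted_mean = sum_of_fractions / len(signals)

print(fraction_of_sums)
print(unweighted_mean)
```

The two results differ only at the first energy point, where the per-scan ratios disagree: the second scan has three times the monitor counts, so it dominates the fraction of sums but counts equally in the sum of fractions.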

## Does the final spectrum depend on the aggregation procedure?

The final signal seems to depend on how you sum up the data from the individual scans. To check if the spectra are different, normalize them to the maximum intensity and plot the difference.

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(6, 6.8)) 

ax1, ax2 = ax

# Normalize the two signals to have a maximum of one.
##################
# YOUR CODE HERE #
signal_fos = signal_fos / signal_fos.max()
signal_sof = signal_sof / signal_sof.max()
##################

ax1.plot(x, signal_fos, label="Fraction of sums")
ax1.plot(x, signal_sof, label="Sum of fractions")

ax2.plot(x, signal_fos - signal_sof, label="Difference")

ax1.legend()
ax2.legend()
plt.tight_layout()

## Redoing it all using the `daxs` library

In [None]:
from daxs.measurements import Hdf5Source, Xas

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logging.getLogger("daxs").setLevel(logging.INFO)

## Define the data source, create the measurement, and plot the average signal

In [None]:
filename = "experimental_data/ihch1515/id26/Cr2O3_new/Cr2O3_new_0002/Cr2O3_new_0002.h5"
included_scans = "4-10"
data_mappings = {
    "x": ".1/measurement/hdh_energy",
    "signal": ".1/measurement/det_dtc_apd",
    "monitor": ".1/measurement/I02",
}

source = Hdf5Source(filename, included_scans, data_mappings=data_mappings)
measurement = Xas(source)

fig, ax = plt.subplots()

ax.plot(measurement.x, measurement.signal)

ax.set_xlabel("Incident energy (keV)")
ax.set_ylabel("Intensity (arb. units)")

plt.tight_layout()
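
The mappings start with `.1/...`, which suggests that the scan number is prepended to give the same paths used in the manual loops earlier (for example `4.1/measurement/hdh_energy`). The sketch below illustrates this reading in plain Python. It is an assumption about how the selection and mappings combine, not a description of the `daxs` internals, and `expand_scans` is a hypothetical helper that only handles a single `start-stop` range.

```python
def expand_scans(selection):
    """Expand a string such as "4-10" into a list of scan numbers (hypothetical helper)."""
    start, stop = (int(part) for part in selection.split("-"))
    return list(range(start, stop + 1))


data_mappings = {
    "x": ".1/measurement/hdh_energy",
    "signal": ".1/measurement/det_dtc_apd",
    "monitor": ".1/measurement/I02",
}

# Assumption: the scan number is prepended to each mapping to form the HDF5 path.
paths = {
    scan_id: {name: f"{scan_id}{suffix}" for name, suffix in data_mappings.items()}
    for scan_id in expand_scans("4-10")
}

print(paths[4]["x"])  # 4.1/measurement/hdh_energy
```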

## Explore the different aggregation methods

In [None]:
fig, ax = plt.subplots()

ax.plot(measurement.x, measurement.signal, label="Fraction of sums")

measurement.reset()
measurement.process(aggregation="sum of fractions")
ax.plot(measurement.x, measurement.signal, label="Sum of fractions")

ax.legend()
ax.set_xlabel("Incident energy (keV)")
ax.set_ylabel("Intensity (arb. units)")

plt.tight_layout()

## Normalize the data using the signal area

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw={"height_ratios": [3, 1]})

measurement.reset()
measurement.process(aggregation="fraction of sums", normalization="area")
ax1.plot(measurement.x, measurement.signal, label="Fraction of sums")

# Save the data for the calculation of the difference signal.
x = np.copy(measurement.x)
signal_fos = np.copy(measurement.signal)

measurement.reset()
measurement.process(aggregation="sum of fractions", normalization="area")
ax1.plot(measurement.x, measurement.signal, label="Sum of fractions")

# Save the data for the calculation of the difference signal.
signal_sof = np.copy(measurement.signal)

ax1.legend()
ax1.set_xlabel("Incident energy (keV)")
ax1.set_ylabel("Intensity (arb. units)")

# Plot the difference of the two signals.
ax2.plot(x, signal_fos - signal_sof)

plt.tight_layout()
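
For comparison with the maximum normalization done by hand earlier, the two normalization schemes can be sketched with plain NumPy. This is an illustration of the idea on made-up data. It assumes that the area is obtained with the trapezoidal rule, which may differ from what `daxs` does internally, and the function names are hypothetical.

```python
import numpy as np


def normalize_max(signal):
    """Scale the signal so that its maximum is one."""
    return signal / np.max(signal)


def normalize_area(x, signal):
    """Scale the signal so that its trapezoidal-rule integral over x is one."""
    area = np.sum(0.5 * (signal[1:] + signal[:-1]) * np.diff(x))
    return signal / area


# Made-up triangular peak on an evenly spaced axis.
x = np.linspace(0.0, 1.0, 5)
signal = np.array([0.0, 2.0, 4.0, 2.0, 0.0])

by_max = normalize_max(signal)
by_area = normalize_area(x, signal)

print(by_max)
print(by_area)
```

Maximum normalization depends on a single point and is therefore sensitive to noise at the peak, whereas area normalization uses all the points but depends on the energy range that was measured.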