# Data inspection

The datasets found in this data folder are described in the publication referenced below.

They represent transient absorption spectroscopy measurements on the so-called `co` and `c2o` compounds dissolved in toluene and excited at 530 nm.

The `co` sample is a compound containing one calix[4]arene unit; the `c2o` sample is a compound containing two calix[4]arene units.

The data originate from an experiment investigating a supramolecular building block consisting of a perylene bisimide chromophore (called orange) substituted with one or two calix[4]arene units, termed oc and o2c respectively (the `co` and `c2o` samples described above). The orange chromophore is highly fluorescent, with a quantum yield close to 1 and a lifetime of around 4 ns. In contrast, oc has a very low quantum yield. The referenced study uses femtosecond transient absorption spectroscopy to investigate the quenching process, which is likely attributable to rapid photo-induced electron transfer. For more detail, see the publication.

---
**REFERENCE**

> Excited State Interactions in Calix[4]arene−Perylene Bisimide Dye Conjugates: Global and Target Analysis of Supramolecular Building Blocks
> 
> Catharina Hippius, Ivo H. M. van Stokkum, E. Zangrando, René M. Williams, and Frank Würthner
> 
> The Journal of Physical Chemistry C 2007, 111 (37), 13988-13996
> 
> DOI: [10.1021/jp0733825](https://doi.org/10.1021/jp0733825)

---

## Requirements to run the notebook

Make sure that [pyglotaran](https://pypi.org/project/pyglotaran/) version 0.8 or greater and [pyglotaran-extras](https://pypi.org/project/pyglotaran-extras/) are installed.

```shell
pip install "pyglotaran>=0.8" pyglotaran-extras
```

## Imports

Imports needed for the whole notebook.

In [None]:
# Primary import: loads the data files
from glotaran.io import load_dataset

# For plotting
from pyglotaran_extras import plot_data_overview

## First experiment

The first dataset is named `demo_data_Hippius_etal_JPCC2007_111_13988_Figs5_9.ascii`. It was also included in the demo project shipped with the original Glotaran software (now [glotaran-legacy](https://github.com/glotaran/glotaran-legacy)).

### Load data

Note that the data file paths are relative to the location of the notebook.
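Relative paths are resolved against the current working directory, which is normally the notebook's folder when the notebook is run in place. The following is a minimal sketch of a pre-flight check, not part of pyglotaran; the helper name `resolve_data_file` and its `base_dir` parameter are hypothetical and introduced only for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_data_file(file_name: str, base_dir: Optional[Path] = None) -> Path:
    """Return the absolute path of a data file, or fail with a clear message.

    Hypothetical helper: ``base_dir`` defaults to the current working directory,
    which is assumed to be the notebook's folder.
    """
    base = Path.cwd() if base_dir is None else Path(base_dir)
    path = (base / file_name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Data file not found: {path}")
    return path


# Example (commented out so the sketch runs without the data files present):
# data_path = resolve_data_file("2016co_tol.ascii")
```

Passing the resolved path on to `load_dataset` would then fail early, with the full path in the message, if the notebook is started from a different folder.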

In [None]:
dataset1 = load_dataset("demo_data_Hippius_etal_JPCC2007_111_13988_Figs5_9.ascii")
plot_data_overview(dataset1, linlog=True)
dataset1.data.coords

## Second experiment

Multiple datasets: the `co` and `c2o` samples.

In [None]:
dataset_co = load_dataset("2016co_tol.ascii")
dataset_c2o = load_dataset("2016c2o_tol.ascii")

In [None]:
plot_data_overview(dataset_co, linlog=True)
dataset_co.data.coords.keys()

In [None]:
plot_data_overview(dataset_c2o, linlog=True)
dataset_c2o.data.coords.keys()

## Compare datasets

Although the datasets `demo_data_Hippius_etal_JPCC2007_111_13988_Figs5_9.ascii` and `2016co_tol.ascii` concern the same sample and measurement, they differ in their spectral and time windows, in the normalization of the data, and in pre-processing (background subtraction). This is why the second experiment uses different data files than the first.

Because the datasets are loaded as xarray objects, they are easy to compare, as this section illustrates.

In [None]:
print(f"{dataset1.coords=}")
print(f"{dataset1.data.to_numpy().max()=}")
print(f"{dataset_co.coords=}")
print(f"{dataset_co.data.to_numpy().max()=}")
# Note the different shapes of the data arrays, and almost 1000x difference in the maximum value
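The same kind of comparison can be condensed into a small summary function. The sketch below runs on synthetic stand-in arrays rather than the measured data; the names `summarize` and `make_demo` are hypothetical, and the only assumption carried over from the notebook is that the arrays have `time` and `spectral` coordinates.

```python
import numpy as np
import xarray as xr


def summarize(data: xr.DataArray) -> dict:
    """Collect the shape, axis ranges, and maximum of a time-resolved spectrum."""
    return {
        "shape": data.shape,
        "time_range": (float(data.time.min()), float(data.time.max())),
        "spectral_range": (float(data.spectral.min()), float(data.spectral.max())),
        "max": float(data.max()),
    }


def make_demo(time: np.ndarray, spectral: np.ndarray, scale: float) -> xr.DataArray:
    """Build a synthetic decay (NOT real measurement data) on the given axes."""
    values = scale * np.outer(np.exp(-time / 10.0), np.ones(spectral.size))
    return xr.DataArray(
        values,
        coords={"time": time, "spectral": spectral},
        dims=("time", "spectral"),
    )


# Two stand-ins: same signal, but different windows and a different scale
wide = make_demo(np.linspace(0, 100, 51), np.linspace(400, 800, 41), scale=1000.0)
narrow = make_demo(np.linspace(0, 50, 26), np.linspace(450, 750, 31), scale=1.0)

for name, data in {"wide": wide, "narrow": narrow}.items():
    print(name, summarize(data))
```

Applied to the real data, `summarize(dataset1.data)` and `summarize(dataset_co.data)` should expose the same differences in window and scale that the `print` calls show.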

In [None]:
# Slice the dataset to the same time and spectral range for a quick visual inspection
dataset1_sliced = dataset1.sel(
    time=slice(dataset_co.time.min(), dataset_co.time.max()),
    spectral=slice(dataset_co.spectral.min(), dataset_co.spectral.max()),
)
plot_data_overview(dataset1_sliced, linlog=True)
plot_data_overview(dataset_co, linlog=True)

In [None]:
# Reload the data with prepare=False to avoid the additional processing
# (this overwrites dataset1 and dataset_co from the cells above)
dataset1 = load_dataset("demo_data_Hippius_etal_JPCC2007_111_13988_Figs5_9.ascii", prepare=False)
dataset_co = load_dataset("2016co_tol.ascii", prepare=False)

# Slice the dataset to the same time and spectral range
dataset1_sliced = dataset1.sel(
    time=slice(dataset_co.time.min(), dataset_co.time.max()),
    spectral=slice(dataset_co.spectral.min(), dataset_co.spectral.max()),
)

# Subtract mean spectrum for the first 20 timepoints from dataset1_sliced
mean_spectrum_dataset1 = dataset1_sliced.isel(time=slice(0, 20)).mean(dim="time")
dataset1_sliced_subtracted = dataset1_sliced - mean_spectrum_dataset1

# Subtract mean spectrum for the first 20 timepoints from dataset_co
mean_spectrum_dataset_co = dataset_co.isel(time=slice(0, 20)).mean(dim="time")
dataset_co_subtracted = dataset_co - mean_spectrum_dataset_co

# Normalize the subtracted datasets
dataset1_norm_subtracted = dataset1_sliced_subtracted / dataset1_sliced_subtracted.max()
dataset_co_norm_subtracted = dataset_co_subtracted / dataset_co_subtracted.max()

# Align the coordinates of dataset1 to dataset_co
dataset_1_interpolated_subtracted = dataset1_norm_subtracted.interp(
    time=dataset_co_norm_subtracted.time, spectral=dataset_co_norm_subtracted.spectral
)

# Calculate the difference between the aligned datasets
difference_subtracted = dataset_co_norm_subtracted - dataset_1_interpolated_subtracted

# Fill NaN values (e.g. points outside the interpolation range of dataset1) with 0
difference_subtracted = difference_subtracted.fillna(0)

# Plot the difference
plot_data_overview(difference_subtracted, linlog=True)
difference_subtracted
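The baseline subtraction and normalization steps used above can be checked in isolation on data where the expected result is known. The following is a minimal sketch on a synthetic array, not the measured data; the function name `subtract_baseline_and_normalize` is hypothetical, and the default of 20 baseline points simply mirrors the cell above.

```python
import numpy as np
import xarray as xr


def subtract_baseline_and_normalize(data: xr.DataArray, n_baseline: int = 20) -> xr.DataArray:
    """Subtract the mean of the first ``n_baseline`` time points, then scale the maximum to 1."""
    baseline = data.isel(time=slice(0, n_baseline)).mean(dim="time")
    subtracted = data - baseline
    return subtracted / subtracted.max()


# Synthetic stand-in: a constant offset of 5 plus a decay that switches on at time zero.
# With a step of 0.5, the first 20 time points all lie before time zero.
time = np.linspace(-10.0, 40.0, 101)
spectral = np.linspace(500.0, 700.0, 21)
signal = np.where(time >= 0, np.exp(-time / 15.0), 0.0)
values = 5.0 + 3.0 * np.outer(signal, np.ones(spectral.size))
demo = xr.DataArray(
    values,
    coords={"time": time, "spectral": spectral},
    dims=("time", "spectral"),
)

result = subtract_baseline_and_normalize(demo)
print(float(result.max()))
print(float(abs(result.isel(time=slice(0, 20))).max()))
```

Dividing by the maximum assumes that the dominant feature is positive. For data dominated by a negative signal, scaling by the maximum absolute value may be the more appropriate choice.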