To scale the individual compound profiles for a given sample $k$ and wavelength $j$, multiply the elution profile by the corresponding concentration loading ($k$th row of $A$) and the corresponding spectral loading ($j$th row of $C$), as below:
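
A minimal NumPy sketch of this scaling, using random stand-in arrays rather than the Zhang data. The array names, shapes and the helper `scaled_profile` are assumptions made for illustration and are not part of `pca_analysis`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_times, n_mz, rank = 4, 50, 30, 3

# Stand-in factors (assumed shapes, random values).
A = rng.random((n_samples, rank))            # concentration loadings, one row per sample
Bs = rng.random((n_samples, n_times, rank))  # per-sample elution profiles
C = rng.random((n_mz, rank))                 # spectral loadings, one row per wavelength channel


def scaled_profile(k: int, j: int, r: int) -> np.ndarray:
    """Elution profile of component r in sample k at wavelength j, after scaling."""
    return A[k, r] * Bs[k, :, r] * C[j, r]


# The same scaling for every sample, component and wavelength at once.
# Output dims: (sample, component, time, wavelength).
components = np.einsum("kr,ktr,jr->krtj", A, Bs, C)
```

Each `components[k, r, :, j]` is the scaled elution profile that `scaled_profile(k, j, r)` returns.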

In [None]:
%load_ext autoreload
%autoreload 2

from tensorly.decomposition import parafac2 as tl_parafac2
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

from pca_analysis.get_sample_data import get_zhang_data
raw_data: xr.DataArray = get_zhang_data()
ds = xr.Dataset(data_vars={"input_data": raw_data})
ds


In [None]:
parafac2_args = dict(
    return_errors=True,
    verbose=True,
    n_iter_max=1,
    nn_modes="all",
    linesearch=False,
)

_decomp, err = tl_parafac2(
    tensor_slices=raw_data.to_numpy(),
    rank=3,
    **parafac2_args,
)


In [None]:
# Create an xarray Dataset to hold the PARAFAC2 results.
# The decomposition returns A, B, C and the projections. We don't care about the
# pure B or the projections on their own, so combine them into Bs.
# For a tensor ijk, A is ir, Bs is ijr and C is kr.
# The ijk coordinates come from the raw data; r comes from the rank passed in.

import numpy as np
import xarray as xr
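
As a rough illustration of what "combine them into Bs" means, the sketch below builds per-sample profiles by applying each sample's projection matrix to the shared B. It uses random stand-in arrays, not the decomposition above, and the names `B`, `projections` and `Bs` are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_times, rank = 4, 50, 3

# Shared B factor (rank x rank) and one projection matrix per sample.
B = rng.random((rank, rank))
# QR gives each projection orthonormal columns, as the PARAFAC2 model requires.
projections = [
    np.linalg.qr(rng.standard_normal((n_times, rank)))[0] for _ in range(n_samples)
]

# Per-sample elution profiles: Bs[i] = P_i @ B, stacked to (sample, time, component).
Bs = np.stack([P @ B for P in projections])

# Because P_i.T @ P_i is the identity, every sample shares the same cross-product.
cross_products = [Bs_i.T @ Bs_i for Bs_i in Bs]
```

The constant cross-product `B.T @ B` across samples is the constraint that lets the profiles differ between samples while keeping the model identifiable.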


In [None]:
from pca_analysis import parafac2_xr as pxr

parafac2_ds = pxr.decomp_as_xr(
    input_data=raw_data,
    rank=3,
    decomp=_decomp,
)

ds = xr.merge([ds, parafac2_ds])
ds


In [None]:
ds = ds.assign(components=pxr.comp_slices_to_xr(parafac2_ds))
ds


# Individual Factors

In [None]:
## A

ds = ds.assign_coords(
    {
        "rank_sample": (
            "sample",
            list(range(len(parafac2_ds.A.coords["sample"]))),
        ),
        "rank_component": (
            "component",
            [str(x) for x in range(len(parafac2_ds.A.coords["component"]))],
        ),
    }
)

ds.A.plot.scatter(x="rank_sample", hue="rank_component")


As shown in the viz above, across all the samples (rank_sample), component 0 has the highest weighting, followed by 1 and 2. Presumably 2 is the noise component.

## Bs


In [None]:
ds.isel(sample=slice(5, 10)).Bs.plot.line(x="time", col="sample", col_wrap=3)


This is the elution profile of each sample prior to scaling, i.e. the pure profile. As we can see, component 2 corresponds to the background noise, while 1 and 3 represent the peaks.

## C

In [None]:
ds.C.plot.line(x="mz")
plt.title("The Spectral Profile")


The viz above is the pure spectral profile of the dataset. Note the extremely large maxima between mz 20 and 25 for the component corresponding to the noise, and how an optimal S/N is reached between mz 40 and 50, particularly for component 3.

# Reconstruction

The reconstruction is the recombination of the PARAFAC2 model into a three-mode tensor. As I have already prepared the component slices, the simplest approach is to sum them along the component dimension.
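
A minimal NumPy sketch of that summation, checked against the direct PARAFAC2 product for each sample. The arrays are random stand-ins with assumed names and shapes; this is not the implementation of `pxr.compute_reconstruction`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_times, n_mz, rank = 4, 50, 30, 3

# Stand-in factors (assumed shapes, random values).
A = rng.random((n_samples, rank))
Bs = rng.random((n_samples, n_times, rank))
C = rng.random((n_mz, rank))

# Component slices with dims (sample, component, time, mz).
components = np.einsum("kr,ktr,jr->krtj", A, Bs, C)

# Reconstruction: sum over the component axis, leaving (sample, time, mz).
recon = components.sum(axis=1)

# Cross-check against the per-sample model X_k = Bs_k @ diag(A[k]) @ C.T.
direct = np.stack([Bs[k] @ np.diag(A[k]) @ C.T for k in range(n_samples)])
matches = np.allclose(recon, direct)
```

With a labelled `xr.DataArray`, the equivalent reduction would be `components.sum(dim="component")`.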

In [None]:
ds = ds.assign(recon=pxr.compute_reconstruction(components=ds.components))
ds


In [None]:
# While we earlier treated the components as a variable with a "component" dim,
# we now want to treat the "component" dim as a subset of a "signal" dim, which
# input_data and recon would fall into as well. This is arguably more correct,
# as their units are the same - AU (?).
# Doing this from the xarray dataset is not straightforward.

# First, make the "component" dim categorical by casting it to str and naming it
# "signal", then concat the input data and components vars, dropping differing coords.

ds_ = ds.rename({"component": "signal"})
components_ = ds_.components.assign_coords(
    # cast the renamed "signal" labels to str so they concat with "input_data"
    signal=ds_.components.coords["signal"].astype(str)
).drop_vars("rank_component")
input_data_ = ds_.input_data.expand_dims(dim={"signal": ["input_data"]}).transpose(
    "sample", "signal", "time", "mz"
)

# secondly select the subset to plot and viz.
xr.concat(
    dim="signal",
    objs=[
        components_,
        input_data_,
    ],
).isel(mz=35, sample=slice(5, 10)).plot.line(
    x="time", col="sample", hue="signal", col_wrap=3
)

# suptitle titles the whole facet grid; plt.title would only affect the last panel.
plt.suptitle("Input Data and Components per Sample", y=1.02)


As we can see in the above viz, the decomposition looks sound. Furthermore, the visualisation is very informative, showing the pure analytes and the noise alongside the convoluted signal.
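
A visual check like this can be backed by a single number. The sketch below computes a relative reconstruction error on stand-in arrays; the function name `relative_error` and the synthetic data are assumptions for illustration, and no such figure was computed in this notebook:

```python
import numpy as np


def relative_error(data: np.ndarray, recon: np.ndarray) -> float:
    """Frobenius norm of the residual, relative to the norm of the data."""
    return float(np.linalg.norm(data - recon) / np.linalg.norm(data))


rng = np.random.default_rng(0)

# Stand-in "input" tensor with dims (sample, time, mz).
data = rng.random((4, 50, 30))
# Stand-in "reconstruction": the data plus a small perturbation.
recon = data + 0.01 * rng.standard_normal(data.shape)

err = relative_error(data, recon)
```

A value close to zero means the summed components account for nearly all of the input signal. What counts as "close enough" depends on the noise level of the data.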


# Conclusion

Using the Zhang et al. GC-MS peak data, the following features were examined: the scaled elution profiles, the pure elution profiles, A as a function of K for each component and C as a function of J. A reconstruction routine was developed and its results were visualised. The reconstruction routine and visualisation should be integrated into a pipeline.