In [None]:
%reload_ext autoreload
%autoreload 3 --print

import logging
from pca_analysis import xr_signal

from pca_analysis.definitions import PARAFAC2_TESTSET
from pca_analysis import xr_plotly
import plotly.io as pio
import xarray as xr
import darkdetect

logger = logging.getLogger(__name__)

logger.setLevel(logging.DEBUG)

xr.set_options(display_expand_data=False, display_expand_coords=False)

if darkdetect.isDark():
    pio.templates.default = "plotly_dark"

ds = xr.load_dataset(PARAFAC2_TESTSET)

# speed up development by using a subset.
ds = ds.sel(wavelength=slice(210, 260, 5), mins=slice(0, 30))
ds


## Baseline Subtraction

To simplify tool development, we first subtract the baseline from each sample. Whether a true baseline exists is debatable; however, the rise and fall roughly corresponds to the change in methanol concentration in the mobile phase, which could introduce background absorption. Either way, the data are easier to work with once the baselines are zeroed.

In [None]:
from pca_analysis.preprocessing import bcorr


ds = ds.pipe(bcorr.snip, core_dim="mins", max_half_window=30)
display(ds)
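
The internals of `bcorr.snip` are not shown in this notebook. As a rough illustration of the idea behind SNIP-style baseline estimation (iteratively clipping each point to the mean of its two neighbours over growing half-windows), here is a minimal NumPy sketch on synthetic data. The function `snip_baseline` and the synthetic signal are hypothetical and are not the package's implementation; published SNIP variants often add refinements such as a log transform or a decreasing window.

```python
import numpy as np


def snip_baseline(y, max_half_window):
    """Estimate a baseline by iterative clipping (simplified SNIP-style sketch)."""
    baseline = np.asarray(y, dtype=float).copy()
    n = baseline.size
    for w in range(1, max_half_window + 1):
        if 2 * w >= n:
            break
        # Mean of the points w samples to the left and right of each interior point.
        neighbour_mean = 0.5 * (baseline[: n - 2 * w] + baseline[2 * w :])
        # Clip: keep whichever is lower, the current value or the neighbour mean.
        baseline[w : n - w] = np.minimum(baseline[w : n - w], neighbour_mean)
    return baseline


# Synthetic chromatogram: linear drift plus one Gaussian peak (illustrative only).
x = np.linspace(0, 30, 601)
drift = 0.5 * x
y = drift + 10.0 * np.exp(-0.5 * ((x - 15.0) / 0.3) ** 2)

baseline = snip_baseline(y, max_half_window=30)
corrected = y - baseline
print(f"max corrected height: {corrected.max():.2f}")
```

The half-window has to be wider than the peaks it is meant to clip; otherwise part of the peak is absorbed into the baseline estimate.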


In [None]:
overlay_fig = (
    ds.transpose("sample", "wavelength", "mins")
    .isel(wavelength=0)
    .plotly.facet_plot_overlay(
        grouper="sample",
        var_keys=["raw_data", "baselines", "data_corr"],
        col_wrap=3,
        x_key="mins",
    )
)
overlay_fig


## Smoothing

The criterion is that, with the default `find_peaks` parameters, no peaks are detected within the first 0.77 seconds. This can be achieved through Savitzky-Golay (savgol) smoothing.

In [None]:
from pca_analysis.preprocessing import smooth

(
    ds.isel(sample=slice(2, 6))
    .assign(
        smoothed=ds.raw_data.pipe(
            smooth.savgol_smooth,
            input_core_dims=[
                ["mins"],
            ],
            output_core_dims=[["mins"]],
            window_length=60,
            polyorder=2,
        )
    )
    .sel(wavelength=260, mins=slice(0, 10))
    .plotly.facet_plot_overlay(
        grouper="sample", var_keys=["raw_data", "smoothed"], col_wrap=2
    )
)
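
The criterion can be checked numerically by counting the peaks that `find_peaks` reports before the cutoff, with and without smoothing. The sketch below does this on a synthetic noisy signal using SciPy directly; the signal, the `cutoff` value on the synthetic time axis, and the helper `n_early_peaks` are assumptions for illustration, not part of `pca_analysis`. An odd window length is used because some SciPy versions require it.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter

rng = np.random.default_rng(0)

# Synthetic signal: one Gaussian peak plus white noise (illustrative only).
t = np.arange(0, 10, 0.005)
signal = 50.0 * np.exp(-0.5 * ((t - 5.0) / 0.1) ** 2) + rng.normal(scale=0.5, size=t.size)

# Savitzky-Golay smoothing with a quadratic polynomial.
smoothed = savgol_filter(signal, window_length=61, polyorder=2)

cutoff = 0.77  # same units as the synthetic time axis


def n_early_peaks(y):
    """Count peaks found with default find_peaks parameters before the cutoff."""
    idx, _ = find_peaks(y)
    return int(np.sum(t[idx] < cutoff))


n_raw = n_early_peaks(signal)
n_smooth = n_early_peaks(smoothed)
print(f"early peaks: raw={n_raw}, smoothed={n_smooth}")
```

If the smoothed count is still non-zero, the window can be widened, at the cost of broadening and flattening the real peaks.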


## Sharpening

### Introduction

Sharpening has to be implemented by hand because there are no dedicated packages for it, presumably because the algorithms are trivial. References and notes:

- https://terpconnect.umd.edu/~toh/spectrum/ResolutionEnhancement.html
- @wahab_2019
- https://dsp.stackexchange.com/questions/71297/why-is-peak-detection-in-chromatography-not-completely-automatic
- https://bohr.wlu.ca/hfan/cp467/12/notes/cp467_12_lecture6_sharpening.pdf
- Sharpening, like smoothing, is achieved via a filter.
- unsharp masking subtracts a multiple of the Laplacian from the signal: $\text{signal} - a \cdot \nabla^2(\text{signal})$, where $a$ is the strength factor https://www.idtools.com.au/unsharp-masking-with-python-and-opencv/
- another approach to unsharp masking is to use a Gaussian filter https://stackoverflow.com/questions/4993082/how-can-i-sharpen-an-image-in-opencv
- https://dsp.stackexchange.com/questions/70955/is-unsharp-mask-usm-equivalent-to-applying-laplacian-of-gaussian-filter-direct
- another definition is given as: `enhanced_image = original + amount * (original - blurred)`
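
The two definitions listed above can be sketched in a few lines with `scipy.ndimage`. The functions `unsharp_laplacian` and `unsharp_gaussian` below are hypothetical helpers written for illustration; they are not the implementation behind the `.preproc.unsharp` accessor, and the values of `a` are arbitrary choices for a synthetic peak sampled at unit spacing.

```python
import numpy as np
from scipy import ndimage


def unsharp_laplacian(y, a):
    """Return y - a * laplacian(y) for a 1D signal."""
    y = np.asarray(y, dtype=float)
    return y - a * ndimage.laplace(y)


def unsharp_gaussian(y, a, sigma):
    """Return y + a * (y - blurred), where blurred is a Gaussian-filtered copy of y."""
    y = np.asarray(y, dtype=float)
    blurred = ndimage.gaussian_filter1d(y, sigma=sigma)
    return y + a * (y - blurred)


# Synthetic, baseline-free Gaussian peak (illustrative only).
x = np.arange(201)
peak = np.exp(-0.5 * ((x - 100) / 5.0) ** 2)

sharp_lap = unsharp_laplacian(peak, a=10.0)
sharp_gauss = unsharp_gaussian(peak, a=1.0, sigma=5.0)

print(f"apex: raw={peak.max():.3f}, laplacian={sharp_lap.max():.3f}, gaussian={sharp_gauss.max():.3f}")
print(f"minimum: laplacian={sharp_lap.min():.4f}, gaussian={sharp_gauss.min():.4f}")
```

Both forms add a scaled high-pass version of the signal back onto the original; they differ only in how that high-pass component is computed.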




### Unsharp Masking


In [None]:
from pca_analysis.preprocessing import unsharp_mask

# sharpen via laplacian
(
    ds.isel(sample=slice(2, 6))
    .preproc.unsharp.laplacian(a=0.1, core_dims=["mins"], var="raw_data")
    .isel(wavelength=3)
    # .sel(mins=slice(0, 10))
    .plotly.facet_plot_overlay(
        grouper="sample",
        var_keys=["raw_data", "sharpened", "laplace"],
        col_wrap=2,
        x_key="mins",
        trace_kwargs=dict(laplace=dict(opacity=0.3, line=dict(dash="dot"))),
    )
    .update_layout(height=1000, title=dict(text="Sharpening via the Laplacian"))
)


As we can see, while powerful, Laplacian sharpening introduces negative values into the signal, which we definitely do not want. This is unavoidable: the negatives occur where the signal changes rapidly from baseline to peak, which is by definition what an ideal chromatographic signal does. Tuning the factor $a$ can yield an acceptable filter, but overall the method is not ideal because the negatives cannot be eliminated. It is good for a first-pass sharpening, though.
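
To see where the negatives come from, consider an idealised Gaussian peak of height $h$ and width $s$. This is an assumption made here for illustration, with unit sample spacing and the continuous second derivative standing in for the discrete Laplacian:

$$
y(t) = h\,e^{-t^2 / (2 s^2)}, \qquad y''(t) = y(t)\,\frac{t^2 - s^2}{s^4}
$$

$$
y(t) - a\,y''(t) = y(t)\left(1 - a\,\frac{t^2 - s^2}{s^4}\right)
$$

At the apex ($t = 0$) the signal is scaled by $1 + a / s^2$, so the peak grows. The bracket turns negative wherever $t^2 > s^2 + s^4 / a$, which happens on both flanks for any $a > 0$. Under this model, the narrower the peak, the closer to the apex the negative lobes appear, and the larger they are.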

### Unsharp Masking with Gaussian Filter

The greater the $\sigma$, the less closely the Gaussian-filtered trace follows the signal.

In [None]:
from pca_analysis import preprocessing_xr

(
    ds.isel(
        sample=slice(2, 6),
    )
    .preproc.unsharp.gaussian(var="raw_data", core_dims=["mins"], a=0.1, sigma=10)
    .isel(wavelength=3)
    .sel(mins=slice(0, 5))
    .plotly.facet_plot_overlay(
        grouper="sample",
        var_keys=["raw_data", "sharpened", "gaussian"],
        col_wrap=2,
        x_key="mins",
        trace_kwargs=dict(gaussian=dict(opacity=0.3, line=dict(dash="dot"))),
    )
    .update_layout(height=1000, title=dict(text="Sharpening via Gaussian Filter"))
)


Unsharp masking by Gaussian filter requires two parameters: $\sigma$, which defines the filter, and $a$, the strength of the mask. Fine-tuning the parameters produces a result that is gentler than the Laplacian version but produces some odd mutations, such as peaks becoming shorter while becoming sharper.
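
To get a feel for how $\sigma$ interacts with a fixed $a$, the sketch below sweeps $\sigma$ over an isolated synthetic peak on a zero baseline and records the apex height and the minimum of the sharpened trace. The peak shape, the value of `a`, and the $\sigma$ values are arbitrary assumptions, so this may not reproduce the behaviour seen in the real chromatograms above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Isolated Gaussian peak with a width of 5 samples (illustrative only).
x = np.arange(401)
peak = np.exp(-0.5 * ((x - 200) / 5.0) ** 2)

a = 1.0  # strength of the mask (arbitrary choice)
apex_heights = []
minima = []
for sigma in (2.0, 5.0, 20.0):
    blurred = gaussian_filter1d(peak, sigma=sigma)
    sharpened = peak + a * (peak - blurred)
    apex_heights.append(sharpened.max())
    minima.append(sharpened.min())
    print(f"sigma={sigma:>4}: apex={sharpened.max():.3f}, min={sharpened.min():.4f}")
```

On this synthetic peak a wider filter removes more of the peak from the blurred copy, so more is added back at the apex and more is subtracted in the tails.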

### Sharpening After Baseline Subtraction

As we can see in the previous example, the presence of a non-zero baseline makes sharpening difficult. Let's see the effect of sharpening after gross baseline removal.


In [None]:
from scipy import ndimage

a = 0.05
(
    ds.isel(
        sample=slice(2, 6),
    )
    .pipe(bcorr.snip, core_dim="mins", max_half_window=30)
    .assign(
        laplace=lambda x: xr.apply_ufunc(
            ndimage.laplace,
            x["data_corr"],
            input_core_dims=[
                ["mins"],
            ],
            output_core_dims=[
                ["mins"],
            ],
        )
    )
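    # NOTE: this expression adds a * raw_data (which still contains the baseline)
    # and subtracts a * laplace; the Laplacian form defined in the introduction
    # would be data_corr - a * laplace.
    .assign(sharpened=lambda x: x["data_corr"] + a * (x["raw_data"] - x["laplace"]))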
    .isel(wavelength=3)
    # .sel(mins=slice(0, 10))
    .plotly.facet_plot_overlay(
        grouper="sample",
        var_keys=["data_corr", "sharpened", "laplace"],
        col_wrap=2,
        x_key="mins",
        trace_kwargs=dict(laplace=dict(opacity=0.3, line=dict(dash="dot"))),
    )
    .update_layout(height=1000)
)


As we can see, the results are not significantly different. Interestingly, the sharpening appears to add an artificial baseline in the middle of the signal, possibly because the sharpening expression adds back $a$ times `raw_data`, which still contains the baseline. My conclusion is that sharpening is useful but is neither a conclusive nor an automated solution. I can foresee a future in which a number of alternating sharpening and baseline subtraction steps are taken.