# Resolution Enhancement: 1

In what is developing into a series of preprocessing notebooks, this article addresses how to maximise peak detection in the Ringland signal. The problem is this: to fit the skewnorm distribution to a signal, the signal needs to possess roughly normal peak shapes, and all peaks need to be detected. Currently, a number of peaks are not detected because they are not prominent enough. Furthermore, sharpening peaks into greater prominence will deepen the gutters (the minima) between them, which should make automated windowing simpler.

What are the available peak sharpening methods?

## Implementation of First Derivative Symmetrization

Before delving into advanced techniques, let's observe how the first derivative symmetrization affects the signal and peak detection.


In [1]:
%reload_ext autoreload
%autoreload 2
from dataclasses import dataclass
from hplc_py.pipeline.preprocess_dashboard import DataSets, PreProcesser
from pandera.typing.polars import DataFrame, LazyFrame

import hvplot
import hvplot.pandas  # noqa
import pandera as pa
import panel as pn
import polars as pl
import sys

pl.Config.set_tbl_rows(10)
pn.extension()
pn.config.comms = "vscode"
sys.executable

dsets = DataSets()
ringland = dsets.ringland.fetch()
ringland.head()

idx,detection,color,varietal,id,code_wine,time,signal
i64,str,str,str,str,str,f64,f64
0,"""raw""","""red""","""shiraz""","""7b085f32-4d69-…","""a0301_2021 chr…",0.005833,0.001952
1,"""raw""","""red""","""shiraz""","""7b085f32-4d69-…","""a0301_2021 chr…",0.0125,0.001825
2,"""raw""","""red""","""shiraz""","""7b085f32-4d69-…","""a0301_2021 chr…",0.019167,0.002004
3,"""raw""","""red""","""shiraz""","""7b085f32-4d69-…","""a0301_2021 chr…",0.025833,0.002861
4,"""raw""","""red""","""shiraz""","""7b085f32-4d69-…","""a0301_2021 chr…",0.0325,0.003964


In [2]:
raw_data = ringland[["time", "signal"]]
raw_data

time,signal
f64,f64
0.005833,0.001952
0.0125,0.001825
0.019167,0.002004
0.025833,0.002861
0.0325,0.003964
…,…
26.9725,1.733087
26.979167,1.732111
26.985833,1.731098
26.9925,1.730151


## Enhancement by Subtraction of Derivatives (Approximated as Finite Differences)

The first derivative is approximated with a central finite difference. In a DataFrame, this can be calculated by shifting y forward one row and back one row, then performing the calculation across the rows:

$$f_i'=\frac{f_{i+1}-f_{i-1}}{2 \Delta x_i}$$

and the second derivative is:

$$f''_i=\frac{f_{i-1}-2f_i+f_{i+1}}{\Delta x^2}$$

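Where: $\Delta x = x_{i+1}-x_i$ is the sampling step.

This assumes that x is evenly spaced. Note that the expressions defined in this notebook use the two-sample span $x_{i+1}-x_{i-1} = 2\Delta x$ in place of $\Delta x$, so the computed first and second differences are 1/2 and 1/4 of these textbook values respectively. Both are constant factors, so they only rescale the weights applied later.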
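As a sanity check on the two formulas, here is a minimal pure-Python sketch. The helper names `central_first_diff` and `central_second_diff` are illustrative only and are not part of `hplc_py`. It takes `dx` to be the single-sample step and checks the estimates against f(x) = x², for which f' = 2x and f'' = 2.

```python
def central_first_diff(y, dx):
    """(y[i+1] - y[i-1]) / (2 * dx) at interior points; None at the two ends."""
    n = len(y)
    return [
        (y[i + 1] - y[i - 1]) / (2 * dx) if 0 < i < n - 1 else None
        for i in range(n)
    ]


def central_second_diff(y, dx):
    """(y[i-1] - 2*y[i] + y[i+1]) / dx**2 at interior points; None at the two ends."""
    n = len(y)
    return [
        (y[i - 1] - 2 * y[i] + y[i + 1]) / dx**2 if 0 < i < n - 1 else None
        for i in range(n)
    ]


dx = 0.5
x = [i * dx for i in range(7)]  # 0.0, 0.5, ..., 3.0
y = [xi**2 for xi in x]         # f(x) = x^2

d1 = central_first_diff(y, dx)   # interior values equal 2 * x
d2 = central_second_diff(y, dx)  # interior values equal 2
```

The end points are left as `None`, mirroring the nulls that appear in the first and last rows when the columns are shifted.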

First, check whether x (time) is evenly spaced; if it is not, resample it based on the mean step.

In [3]:
raw_data.select(pl.col("time").diff().forward_fill().backward_fill()).describe()

statistic,time
str,f64
"""count""",4050.0
"""null_count""",0.0
"""mean""",0.006667
"""std""",1.146e-15
"""min""",0.006667
"""25%""",0.006667
"""50%""",0.006667
"""75%""",0.006667
"""max""",0.006667


Actually, it looks like this is already resampled: every step is 0.006667 min, with a standard deviation on the order of 1e-15. TODO: integrate resampling into main preprocessing.
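The same check can be written as a small reusable predicate. This is only a sketch: `is_evenly_spaced` is a hypothetical helper, and the sample times are the first five values from the table above. Those values are rounded to six decimals, so the tolerance is deliberately loose.

```python
import math


def is_evenly_spaced(times, rel_tol=1e-6):
    """Return (all steps match the mean step within rel_tol, mean step)."""
    steps = [b - a for a, b in zip(times, times[1:])]
    mean_step = sum(steps) / len(steps)
    evenly = all(math.isclose(s, mean_step, rel_tol=rel_tol) for s in steps)
    return evenly, mean_step


# First five time values as displayed (rounded) in the notebook output
times = [0.005833, 0.0125, 0.019167, 0.025833, 0.0325]
evenly, step = is_evenly_spaced(times, rel_tol=1e-3)
```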

Next, shift the columns forward or back such that for each row "i", it also contains "i-1", and "i+1".

In [4]:
# Add a row index to track the shifts. shift(1) moves values down one row, so row i
# holds the value from i-1 (the "*_i-1" columns); shift(-1) gives the "*_i+1" columns.
shifted_data = (
    raw_data
    .with_row_index("idx")
    .with_columns(
        pl.col('idx').shift(1).alias("idx_i-1"),
        pl.col('time').shift(1).alias("time_i-1"),
        pl.col('signal').shift(1).alias("signal_i-1"),
        pl.col('idx').shift(-1).alias("idx_i+1"),
        pl.col('time').shift(-1).alias("time_i+1"),
        pl.col('signal').shift(-1).alias("signal_i+1")
    )
    )  # fmt: skip

shifted_data

idx,time,signal,idx_i-1,time_i-1,signal_i-1,idx_i+1,time_i+1,signal_i+1
u32,f64,f64,u32,f64,f64,u32,f64,f64
0,0.005833,0.001952,,,,1,0.0125,0.001825
1,0.0125,0.001825,0,0.005833,0.001952,2,0.019167,0.002004
2,0.019167,0.002004,1,0.0125,0.001825,3,0.025833,0.002861
3,0.025833,0.002861,2,0.019167,0.002004,4,0.0325,0.003964
4,0.0325,0.003964,3,0.025833,0.002861,5,0.039167,0.005022
…,…,…,…,…,…,…,…,…
4045,26.9725,1.733087,4044,26.965833,1.733929,4046,26.979167,1.732111
4046,26.979167,1.732111,4045,26.9725,1.733087,4047,26.985833,1.731098
4047,26.985833,1.731098,4046,26.979167,1.732111,4048,26.9925,1.730151
4048,26.9925,1.730151,4047,26.985833,1.731098,4049,26.999167,1.728989


Then define the expressions

In [5]:
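# first derivative
# NOTE: the denominator is 2 * (t[i+1] - t[i-1]), i.e. 4x the sampling step, so this
# is half the textbook central difference. The constant factor only rescales k1.

first_central_diff = (
    (pl.col("signal_i+1").sub(pl.col("signal_i-1")))
     .truediv(
         (pl.col("time_i+1").sub(pl.col("time_i-1"))).mul(pl.lit(2))
         )
     )  # fmt: skip

# second derivative
# NOTE: the denominator is (t[i+1] - t[i-1]) ** 2, so this is a quarter of the
# textbook second difference. The constant factor only rescales k2.

second_central_diff = (
    (pl.col("signal_i-1") - pl.lit(2).mul(pl.col("signal")) + pl.col("signal_i+1")
     )
    .truediv((pl.col("time_i+1").sub(pl.col("time_i-1"))).pow(2))
)  # fmt: skip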

And apply

In [6]:
preprocessed_data_df = (
    shifted_data
    .select(
        pl.col('time'),
        pl.col('signal'),
        first_central_diff.alias('first_central_diff'),
        second_central_diff.alias('second_central_diff')
    )
)  # fmt: skip
preprocessed_data_df

time,signal,first_central_diff,second_central_diff
f64,f64,f64,f64
0.005833,0.001952,,
0.0125,0.001825,0.001956,1.71829
0.019167,0.002004,0.038836,3.813766
0.025833,0.002861,0.073481,1.383014
0.0325,0.003964,0.081025,-0.251457
…,…,…,…
26.9725,1.733087,-0.068173,-0.754371
26.979167,1.732111,-0.074599,-0.209548
26.985833,1.731098,-0.073481,0.377186
26.9925,1.730151,-0.079069,-1.215376


And plot the results:

In [7]:
preprocessed_data_df.plot(x="time", y="signal")

In [8]:
preprocessed_data_df.plot(
    x="time",
    y=["signal", "first_central_diff", "second_central_diff"],
    subplots=True,
    shared_axes=False,
)

As we can see, the first and second derivatives are on much larger scales than the initial signal, which is to be expected considering the sharpness and prominence of the signal maxima. Each derivative therefore needs to be scaled by a weighting factor (`k1` for the first, `k2` for the second) before it is subtracted from the signal. Now define the weighted subtraction; as usual, the parameter space needs to be explored.

In [9]:
# application of first diff
k1 = 0.02
preprocessed_data_df = (
    preprocessed_data_df
    .with_columns(
        pl.col("first_central_diff").mul(k1).alias("weighted_first_diff")
    )
    .with_columns(
        pl.col("signal").sub(pl.col("weighted_first_diff")).alias("signal_adjusted_first_diff")
    )
)  # fmt: skip

(
    preprocessed_data_df 
        .plot(
            x="time",
            y=["signal", "weighted_first_diff", "signal_adjusted_first_diff"],
            alpha=[0.5, 0.2, 1],
        )
        .opts(height=750, width=1000)
)  # fmt: skip

A weighting of 0.02 begins to resolve some peaks but does not do nearly enough.

And the second diff?

In [10]:
# application of second diff
k2 = 6e-4
preprocessed_data_df = (
    preprocessed_data_df.with_columns(
        pl.col("second_central_diff").mul(k2).alias("weighted_second_diff")
    )
    .with_columns(
        pl.col("signal").sub(pl.col("weighted_second_diff")).alias("signal_adjusted_2nd_diff")
    )
)  # fmt: skip

(
    preprocessed_data_df 
        .plot(
            x="time",
            y=["signal", "weighted_second_diff", "signal_adjusted_2nd_diff"],
            alpha=[0.5, 0.2, 1],
        )
        .opts(height=750, width=1000)
)  # fmt: skip

A weighting of 6e-4 begins to resolve shoulder peaks without dipping below the baseline. The highly convoluted region between 4 and 5 min is still not nearly resolved enough, though.
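To illustrate why subtracting a weighted second derivative resolves shoulders, and why too large a weight dips below the baseline, here is a self-contained sketch on synthetic data. Everything in it is an assumption for illustration: two equal unit-width Gaussians 1.8 sigma apart (close enough to merge into a single maximum), and a weight of 0.5 that is meaningful only for this toy signal, not for the Ringland data.

```python
import math


def gaussian(x, center, sigma=1.0):
    return math.exp(-((x - center) ** 2) / (2 * sigma**2))


def sharpen(y, dx, k2):
    """Return y - k2 * y'' at interior points, using the central second difference."""
    return [
        y[i] - k2 * (y[i - 1] - 2 * y[i] + y[i + 1]) / dx**2
        for i in range(1, len(y) - 1)
    ]


def count_local_maxima(y):
    return sum(1 for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1])


dx = 0.01
x = [i * dx for i in range(601)]  # 0.0 to 6.0
# Synthetic signal: two overlapping peaks that merge into one maximum
y = [gaussian(xi, 2.1) + gaussian(xi, 3.9) for xi in x]

sharpened = sharpen(y, dx, k2=0.5)  # weight chosen for this toy signal only

n_before = count_local_maxima(y)         # the merged peak
n_after = count_local_maxima(sharpened)  # the two components, now separated
undershoot = min(sharpened)              # negative: the flanks dip below zero
```

The second derivative is negative at a peak apex and positive on its flanks, so the subtraction raises the apex and lowers the flanks. That is the source of both the narrowing and the undershoot.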

And what if both are subtracted at the same time?

In [11]:
preprocessed_data_df

time,signal,first_central_diff,second_central_diff,weighted_first_diff,signal_adjusted_first_diff,weighted_second_diff,signal_adjusted_2nd_diff
f64,f64,f64,f64,f64,f64,f64,f64
0.005833,0.001952,,,,,,
0.0125,0.001825,0.001956,1.71829,0.000039,0.001786,0.001031,0.000794
0.019167,0.002004,0.038836,3.813766,0.000777,0.001227,0.002288,-0.000284
0.025833,0.002861,0.073481,1.383014,0.00147,0.001391,0.00083,0.002031
0.0325,0.003964,0.081025,-0.251457,0.001621,0.002343,-0.000151,0.004115
…,…,…,…,…,…,…,…
26.9725,1.733087,-0.068173,-0.754371,-0.001363,1.73445,-0.000453,1.73354
26.979167,1.732111,-0.074599,-0.209548,-0.001492,1.733603,-0.000126,1.732237
26.985833,1.731098,-0.073481,0.377186,-0.00147,1.732567,0.000226,1.730871
26.9925,1.730151,-0.079069,-1.215376,-0.001581,1.731733,-0.000729,1.730881


In [12]:
def calculate_derivative_subtractions(
    df: pl.DataFrame, k1: float = 2e-2, k2: float = 6e-4
):
    # Add a row index to track the shifts. shift(1) moves values down one row, so row i
    # holds the value from i-1 (the "*_i-1" columns); shift(-1) gives the "*_i+1" columns.
    shifted_data = (
    df
    .with_row_index("idx")
    .with_columns(
        pl.col('idx').shift(1).alias("idx_i-1"),
        pl.col('time').shift(1).alias("time_i-1"),
        pl.col('signal').shift(1).alias("signal_i-1"),
        pl.col('idx').shift(-1).alias("idx_i+1"),
        pl.col('time').shift(-1).alias("time_i+1"),
        pl.col('signal').shift(-1).alias("signal_i+1")
    )
    )  # fmt: skip

    diffs = (
    shifted_data.select(
        pl.col('signal'),
        first_central_diff.alias('first_central_diff'),
        second_central_diff.alias('second_central_diff')
    )
    .with_columns(
        pl.col("first_central_diff").mul(k1).alias("weighted_first_diff")
    )
    .with_columns(
        pl.col("signal").sub(pl.col("weighted_first_diff")).alias("signal_adjusted_first_diff")
    )
    .with_columns(
        pl.col("second_central_diff").mul(k2).alias("weighted_second_diff")
    )
    .with_columns(
        pl.col("signal").sub(pl.col("weighted_second_diff")).alias("signal_adjusted_2nd_diff")
    )
    .with_columns(
    pl.col("signal")
    .sub(pl.col("weighted_first_diff"))
    .sub(pl.col("weighted_second_diff"))
    .alias("double_subtraction")
    )
    )  # fmt: skip

    out = pl.concat([df, diffs.drop("signal")], how="horizontal")
    display(out)
    return out


subtracted_data_without_baseline_correction = raw_data.pipe(
    calculate_derivative_subtractions
)

display(subtracted_data_without_baseline_correction)
(
    subtracted_data_without_baseline_correction.plot(
        x="time",
        y=["signal", "signal_adjusted_2nd_diff", "double_subtraction"],
        alpha=[0.5, 0.5],
    )
).opts(height=750, width=1000)

time,signal,first_central_diff,second_central_diff,weighted_first_diff,signal_adjusted_first_diff,weighted_second_diff,signal_adjusted_2nd_diff,double_subtraction
f64,f64,f64,f64,f64,f64,f64,f64,f64
0.005833,0.001952,,,,,,,
0.0125,0.001825,0.001956,1.71829,0.000039,0.001786,0.001031,0.000794,0.000755
0.019167,0.002004,0.038836,3.813766,0.000777,0.001227,0.002288,-0.000284,-0.001061
0.025833,0.002861,0.073481,1.383014,0.00147,0.001391,0.00083,0.002031,0.000562
0.0325,0.003964,0.081025,-0.251457,0.001621,0.002343,-0.000151,0.004115,0.002494
…,…,…,…,…,…,…,…,…
26.9725,1.733087,-0.068173,-0.754371,-0.001363,1.73445,-0.000453,1.73354,1.734903
26.979167,1.732111,-0.074599,-0.209548,-0.001492,1.733603,-0.000126,1.732237,1.733729
26.985833,1.731098,-0.073481,0.377186,-0.00147,1.732567,0.000226,1.730871,1.732341
26.9925,1.730151,-0.079069,-1.215376,-0.001581,1.731733,-0.000729,1.730881,1.732462


Looks good, appears to moderate the more extreme effects of the second derivative subtraction. A lot of peaks are still not resolved enough though.

Weighting pairs and observations:

1. k1 = 2e-2, k2 = 6e-4: the highest I can go without going below the baseline. Good, but plenty of shoulder peaks remain, especially in the regions 4 - 5, 5 - 6, 6 - 7, 6 - 9, 11 - 12, 12 - 12.5 min, etc.
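The "highest weight that stays above the baseline" search was done by eye here, but it could be automated along these lines. This is a sketch only: `largest_weight_without_undershoot` is a hypothetical helper, and the toy signal and second-difference values are made up for the example.

```python
def largest_weight_without_undershoot(signal, second_diff, candidates, floor=0.0):
    """Largest k in candidates for which signal - k * second_diff stays >= floor."""
    best = None
    for k in sorted(candidates):
        lowest = min(s - k * d for s, d in zip(signal, second_diff))
        if lowest < floor:
            # The minimum over samples is concave in k, so once a weight fails
            # every larger one fails too (given the unsharpened signal is >= floor).
            break
        best = k
    return best


# Made-up toy peak and its second difference (unit step)
signal = [0.0, 0.1, 0.5, 1.0, 0.5, 0.1, 0.0]
second_diff = [0.0, 0.3, 0.1, -1.0, 0.1, 0.3, 0.0]

best_k2 = largest_weight_without_undershoot(
    signal, second_diff, candidates=[0.1, 0.2, 0.3, 0.4, 0.5]
)
```

For this toy peak the limiting sample is the flank at 0.1, where the second difference is 0.3, so the weight must stay below 1/3 and the search settles on 0.3.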

What happens if we subtract the baseline first? It shouldn't make a difference to the derivatives themselves, although the weighting would be different: the base value of the derivative wouldn't change, but its magnitude relative to the signal would.
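To make that reasoning explicit, here is a short derivation under a simplifying assumption that is mine rather than the notebook's: the raw signal $s$ is a peak component $p$ sitting on a locally linear baseline $b(x) = a + cx$.

$$s(x) = p(x) + a + cx$$

$$s'(x) = p'(x) + c, \qquad s''(x) = p''(x)$$

$$s - k_1 s' - k_2 s'' = \left(p - k_1 p' - k_2 p''\right) + \left(a + cx - k_1 c\right)$$

Under that assumption the enhanced peak component is identical with or without baseline correction; only the residual baseline term differs. A real estimated baseline is not exactly linear, so any curvature or roughness in it does feed into the derivatives.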

### Resolution Enhancement After Baseline Correction

Do the same as above but correct the baseline first.


In [13]:
prepro = PreProcesser()
prepro.ingest_signal(ringland, time_col="time", amp_col="signal")
prepro.signal_adjustment(bcorr__n_iter=39)
prepro.signal

adjusted_signal = prepro.signal.select(["time", "adjusted_signal"])
adjusted_signal

Performing baseline correction: 100%|██████████| 39/39 [00:00<00:00, 592.22it/s]


time,adjusted_signal
f64,f64
0.005833,2.2204e-16
0.0125,-6.6613e-16
0.019167,-4.4409e-16
0.025833,-4.4409e-16
0.0325,0.000023
…,…
26.9725,0.000444
26.979167,0.000296
26.985833,0.000094
26.9925,0.000108


In [14]:
subtracted_data_after_baseline_sub = calculate_derivative_subtractions(
    adjusted_signal.rename({"adjusted_signal": "signal"})
)

(
    subtracted_data_after_baseline_sub.plot(
        x="time",
        y=["signal", "signal_adjusted_2nd_diff", "double_subtraction"],
        alpha=[0.5, 0.5],
    )
).opts(height=750, width=1000)

time,signal,first_central_diff,second_central_diff,weighted_first_diff,signal_adjusted_first_diff,weighted_second_diff,signal_adjusted_2nd_diff,double_subtraction
f64,f64,f64,f64,f64,f64,f64,f64,f64
0.005833,2.2204e-16,,,,,,,
0.0125,-6.6613e-16,-2.4980e-14,6.2450e-12,-4.9960e-16,-1.6653e-16,3.7470e-15,-4.4131e-15,-3.9135e-15
0.019167,-4.4409e-16,8.3267e-15,-1.2490e-12,1.6653e-16,-6.1062e-16,-7.4940e-16,3.0531e-16,1.3878e-16
0.025833,-4.4409e-16,0.000858,0.12866,0.000017,-0.000017,0.000077,-0.000077,-0.000094
0.0325,0.000023,0.050325,7.291505,0.001007,-0.000984,0.004375,-0.004352,-0.005359
…,…,…,…,…,…,…,…,…
26.9725,0.000444,-0.00539,-0.86027,-0.000108,0.000552,-0.000516,0.00096,0.001068
26.979167,0.000296,-0.01313,-0.300669,-0.000263,0.000558,-0.00018,0.000476,0.000739
26.985833,0.000094,-0.007031,1.215531,-0.000141,0.000235,0.000729,-0.000635,-0.000495
26.9925,0.000108,-0.003522,-0.689262,-0.00007,0.000179,-0.000414,0.000522,0.000592


First thing to notice is that it is a lot noisier. A direct comparison is needed:

In [15]:
inter_1 = subtracted_data_without_baseline_correction.select(
    pl.col("time"), pl.col("double_subtraction").alias("no_baseline_sub")
)
inter_2 = subtracted_data_after_baseline_sub.select(
    pl.col("double_subtraction").alias("with_baseline_sub")
)

compare = pl.concat([inter_1, inter_2], how="horizontal")
compare.plot(
    x="time",
    y=["no_baseline_sub", "with_baseline_sub"],
    alpha=[0.75, 0.75],
    height=500,
    width=1000,
)

Looks like it needs smoothing before the derivative calculations. Use a Savitzky-Golay filter; SciPy has one (`scipy.signal.savgol_filter`), so use that.
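For intuition about what the filter does, here is a pure-Python sketch of the classic 5-point quadratic Savitzky-Golay kernel, (-3, 12, 17, 12, -3) / 35. It is not SciPy's implementation: `savgol5` is a hypothetical helper that simply copies the two samples at each edge, whereas `savgol_filter` by default fits a polynomial to the edge windows.

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients (they sum to 1)
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol5(y):
    """Smooth interior points with the 5-point kernel; edge points are copied as-is."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + j - 2] for j, c in enumerate(SG5))
    return out


# A quadratic passes through unchanged, because the kernel is a local quadratic fit
parabola = [float(i * i) for i in range(7)]
smoothed_parabola = savgol5(parabola)

# An isolated single-sample spike is attenuated to 17/35 of its height
spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
smoothed_spike = savgol5(spike)
```

Preserving low-order polynomial shape while damping sample-to-sample noise is what makes this family of filters a reasonable choice ahead of a second-derivative calculation.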

In [16]:
def deriv_subtraction_with_baseline_correction_and_smoothing(input_data):
    """Baseline-correct the signal, then smooth it with a Savitzky-Golay filter."""
    from scipy import signal

    prepro = PreProcesser()
    prepro.ingest_signal(input_data, time_col="time", amp_col="signal")
    prepro.signal_adjustment(bcorr__n_iter=39)

    adjusted_signal = prepro.signal.select(["time", "adjusted_signal", "background"])

    # pull the baseline-corrected column out as a NumPy array for SciPy
    savgol_input = adjusted_signal.get_column("adjusted_signal").to_numpy()

    smoothed = signal.savgol_filter(savgol_input, window_length=20, polyorder=2)

    adjusted_signal = adjusted_signal.with_columns(
        pl.Series(name="smoothed", values=smoothed)
    )

    display(adjusted_signal)
    display(
        adjusted_signal.plot(
            x="time",
            y=["adjusted_signal", "smoothed"],
            alpha=[0.75, 0.75],
            width=1000,
            height=500,
            xlim=(20, 26),
            grid=True,
        )
    )

    return adjusted_signal


smoothed_signal = ringland.pipe(
    deriv_subtraction_with_baseline_correction_and_smoothing
)

Performing baseline correction: 100%|██████████| 39/39 [00:00<00:00, 536.48it/s]


time,adjusted_signal,background,smoothed
f64,f64,f64,f64
0.005833,2.2204e-16,0.001952,-0.005903
0.0125,-6.6613e-16,0.001825,-0.002536
0.019167,-4.4409e-16,0.002004,0.000442
0.025833,-4.4409e-16,0.002861,0.003031
0.0325,0.000023,0.003941,0.00523
…,…,…,…
26.9725,0.000444,1.732643,-0.000004
26.979167,0.000296,1.731815,-0.000128
26.985833,0.000094,1.731004,-0.000105
26.9925,0.000108,1.730043,0.000064


Good result with a window length of 5 (note that the cell above, as shown, uses `window_length=20`). Now input that into the function.

In [17]:
def calc_deriv_subtraction_after_baseline_corr_and_smooth(df):
    out = df.rename({"smoothed": "signal"}).pipe(calculate_derivative_subtractions)

    display(out)
    display(
        out.plot(
            x="time",
            y=["signal", "signal_adjusted_2nd_diff", "double_subtraction"],
            alpha=[0.5, 0.5],
        ).opts(height=750, width=1000)
    )

    return out


smoothed_double_sub = smoothed_signal.pipe(
    calc_deriv_subtraction_after_baseline_corr_and_smooth
)

time,adjusted_signal,background,signal,first_central_diff,second_central_diff,weighted_first_diff,signal_adjusted_first_diff,weighted_second_diff,signal_adjusted_2nd_diff,double_subtraction
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
0.005833,2.2204e-16,0.001952,-0.005903,,,,,,,
0.0125,-6.6613e-16,0.001825,-0.002536,0.237954,-2.190423,0.004759,-0.007295,-0.001314,-0.001221,-0.00598
0.019167,-4.4409e-16,0.002004,0.000442,0.208748,-2.190423,0.004175,-0.003733,-0.001314,0.001757,-0.002418
0.025833,-4.4409e-16,0.002861,0.003031,0.179542,-2.190423,0.003591,-0.00056,-0.001314,0.004345,0.000754
0.0325,0.000023,0.003941,0.00523,0.150337,-2.190423,0.003007,0.002223,-0.001314,0.006544,0.003538
…,…,…,…,…,…,…,…,…,…,…
26.9725,0.000444,1.732643,-0.000004,-0.014763,0.822769,-0.000295,0.000291,0.000494,-0.000498,-0.000202
26.979167,0.000296,1.731815,-0.000128,-0.003793,0.822769,-0.000076,-0.000052,0.000494,-0.000621,-0.000545
26.985833,0.000094,1.731004,-0.000105,0.007177,0.822769,0.000144,-0.000249,0.000494,-0.000599,-0.000742
26.9925,0.000108,1.730043,0.000064,0.018148,0.822769,0.000363,-0.000299,0.000494,-0.00043,-0.000793


It didn't appear to make a difference, but I can't go higher than 5 (data not shown) without losing peak resolution. Interesting problem.

Now I am curious: what would the data look like if I set all values less than, say, 1e-6 to zero? (Note that the cell below, as shown, defaults to a much larger threshold of 1e-1.)

In [18]:
def filter_small_values(df: pl.DataFrame, threshold: float = 1e-1):
    df = df.with_columns(
        pl.when(pl.col("double_subtraction").le(threshold))
        .then(pl.lit(0))
        .otherwise(pl.col("double_subtraction"))
        .alias("infintesimal_filtered")
    )

    return df


filtered = smoothed_double_sub.pipe(filter_small_values)
display(filtered)
display(
    filtered.plot(
        x="time",
        y=["double_subtraction", "infintesimal_filtered"],
        alpha=[0.75, 0.75],
        height=500,
        width=1000,
        grid=True,
    )
)

time,adjusted_signal,background,signal,first_central_diff,second_central_diff,weighted_first_diff,signal_adjusted_first_diff,weighted_second_diff,signal_adjusted_2nd_diff,double_subtraction,infintesimal_filtered
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
0.005833,2.2204e-16,0.001952,-0.005903,,,,,,,,
0.0125,-6.6613e-16,0.001825,-0.002536,0.237954,-2.190423,0.004759,-0.007295,-0.001314,-0.001221,-0.00598,0.0
0.019167,-4.4409e-16,0.002004,0.000442,0.208748,-2.190423,0.004175,-0.003733,-0.001314,0.001757,-0.002418,0.0
0.025833,-4.4409e-16,0.002861,0.003031,0.179542,-2.190423,0.003591,-0.00056,-0.001314,0.004345,0.000754,0.0
0.0325,0.000023,0.003941,0.00523,0.150337,-2.190423,0.003007,0.002223,-0.001314,0.006544,0.003538,0.0
…,…,…,…,…,…,…,…,…,…,…,…
26.9725,0.000444,1.732643,-0.000004,-0.014763,0.822769,-0.000295,0.000291,0.000494,-0.000498,-0.000202,0.0
26.979167,0.000296,1.731815,-0.000128,-0.003793,0.822769,-0.000076,-0.000052,0.000494,-0.000621,-0.000545,0.0
26.985833,0.000094,1.731004,-0.000105,0.007177,0.822769,0.000144,-0.000249,0.000494,-0.000599,-0.000742,0.0
26.9925,0.000108,1.730043,0.000064,0.018148,0.822769,0.000363,-0.000299,0.000494,-0.00043,-0.000793,0.0


Now, while the peaks are not fully resolved, and the 4 - 5 min region is looking very troubling, there seems to be some hope of reasonable window sizes. I am expecting at least 10 individual peak windows. Will this be the case? If not, I need to reassess the 'windowing' algorithm.

Side note: the new algorithm would simply demarcate via the baseline. If the value is zero after the infinitesimal filter, it's interpeak; if not, it's peak.
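That demarcation rule amounts to run-length labelling of zero and non-zero stretches. A minimal sketch follows, assuming the filtered signal is available as a plain sequence. `label_windows` is a hypothetical helper, not the `map_windows` method of `hplc_py`, and the input values are made up.

```python
from itertools import groupby


def label_windows(values):
    """Return (w_type, start, stop) for each contiguous run; stop is exclusive."""
    windows = []
    start = 0
    for is_peak, run in groupby(values, key=lambda v: v != 0):
        length = sum(1 for _ in run)
        windows.append(("peak" if is_peak else "interpeak", start, start + length))
        start += length
    return windows


# Made-up filtered signal: zeros mark interpeak regions
filtered_signal = [0, 0, 0.2, 0.9, 0.3, 0, 0, 0.4, 0.5, 0]
windows = label_windows(filtered_signal)
n_peak_windows = sum(1 for w_type, _, _ in windows if w_type == "peak")
```

Counting the `"peak"` entries gives the number of peak windows directly, which is the quantity of interest when judging whether the filter threshold is sensible.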

Side note to the side note: DAG pipelines are what you need for problems such as this, where stages need to be applied in different orders and possibly repeated. (I think?)
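As a rough sketch of what that could look like, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can derive an execution order from declared dependencies. The stage names below are hypothetical labels for the steps in this notebook, and the stage functions are placeholders that only record their own name.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each maps to the set of stages it depends on
dependencies = {
    "baseline_correction": set(),
    "smoothing": {"baseline_correction"},
    "derivative_subtraction": {"smoothing"},
    "threshold_filter": {"derivative_subtraction"},
    "windowing": {"threshold_filter"},
}

# Dependencies always come before the stages that need them
order = list(TopologicalSorter(dependencies).static_order())

# Placeholder stages: each one just appends its name to a trace
stages = {name: (lambda data, name=name: data + [name]) for name in dependencies}

trace = []
for name in order:
    trace = stages[name](trace)
```

Reordering the pipeline, for example smoothing before baseline correction, then becomes a matter of editing the `dependencies` mapping rather than rewriting the calling code. Repeating a stage would need a second, distinctly named node.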

Addendum: I am not happy with the baseline subtraction from SNIP. Since N_ITER is the only variable, it's not flexible enough to adapt to the different regions of the signal, i.e. 0.8 to 1.56. BEADS (see: https://www.sciencedirect.com/science/article/abs/pii/S0169743914002032?via%3Dihub) is a newer method with a Python implementation (see: https://github.com/skotaro/pybeads). Let's test it out [here](preprocessing_2_beads.ipynb). Results: the pybeads package is not documented well enough for use; the default settings and my tinkering have produced an underfit estimation that obliterates too much peak information.

In [19]:
def window_after_infintesimal(df: pl.DataFrame):
    """
    To skip 'signal_adjustment' I need to set "signal_adjusted" to True and set 'signal' to an input, and rename the columns in df. Might as well skip 'ingest_signal' while we're at it.
    """
    # display(df)
    input_signal = df.with_row_index("idx").select(
        pl.col("idx").cast(int),
        pl.col("time"),
        pl.col("signal"),
        pl.col("infintesimal_filtered").alias("adjusted_signal").fill_null(0),
        pl.col("background"),
    )

    prepro = PreProcesser()
    prepro.signal_injested = True
    prepro.signal_adjusted = True

    prepro.signal = prepro.signal.join(
        input_signal,
        how="outer_coalesce",
        on=["idx", "time", "signal", "adjusted_signal", "background"],
    )

    prepro.map_peaks(find_peaks_kwargs={"prominence": 0.0001})
    prepro.map_windows()
    display(prepro.viz_preprocessing().opts(height=500, width=1250))
    # display(prepro.window_mapper.window_bounds_)

    display(prepro.signal)

    display(prepro.peak_map.maxima)


window_after_infintesimal(df=filtered)

# TODO: window the peak map in preprocessor. Investigate the number of peaks per window



idx,time,signal,adjusted_signal,background,w_type,w_idx
i64,f64,f64,f64,f64,str,i64
0,0.005833,-0.005903,0.0,0.001952,"""interpeak""",0
1,0.0125,-0.002536,0.0,0.001825,"""interpeak""",0
2,0.019167,0.000442,0.0,0.002004,"""interpeak""",0
3,0.025833,0.003031,0.0,0.002861,"""interpeak""",0
4,0.0325,0.00523,0.0,0.003941,"""interpeak""",0
…,…,…,…,…,…,…
4045,26.9725,-0.000004,0.0,1.732643,"""interpeak""",26
4046,26.979167,-0.000128,0.0,1.731815,"""interpeak""",26
4047,26.985833,-0.000105,0.0,1.731004,"""interpeak""",26
4048,26.9925,0.000064,0.0,1.730043,"""interpeak""",26


Unnamed: 0,p_idx,loc,dim,value
0,0,maxima,idx,106.000000
1,1,maxima,idx,132.000000
2,2,maxima,idx,148.000000
3,3,maxima,idx,166.000000
4,4,maxima,idx,189.000000
...,...,...,...,...
165,80,maxima,X,0.371374
166,81,maxima,X,0.695790
167,82,maxima,X,0.129392
168,83,maxima,X,0.138736


Great success! Application of the above has resulted in 25 individual peak windows.