---
title: "Testing Baseline Correction Estimators"
cdt: "2024-10-20T17:36:49"
description: "a demonstration of the use of baseline correction estimators and their performance on the 0 - 100 idx subset of the shiraz dataset."
status: "closed"
conclusion: "Two estimators were developed: ASLS and SNIP. Both were able to transform the data. A SQL-based results data model was developed, with a class-based API able to provide visualisations of the results in 2D and 3D. It was observed that the subset did not require baseline subtraction. ASLS overfit the data, especially the convoluted peaks. SNIP performed very poorly without a baseline on the right side of the signal, indicating that it is unsuitable for busy signals."
project: parafac2
---


```python
%reload_ext autoreload
%autoreload 2


# get the test data as two tables: metadata and a samplewise stacked img table

import duckdb as db
from pca_analysis.definitions import DB_PATH_UV
from pca_analysis.code.get_sample_data import get_ids_by_varietal

import polars as pl

from pca_analysis.notebooks.experiments.parafac2_pipeline.orchestrator import (
    Orchestrator,
)

from pca_analysis.notebooks.experiments.parafac2_pipeline.estimators import (
    BCorr_ASLS,
    BCorr_SNIP,
)

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

con = db.connect(DB_PATH_UV)
ids = get_ids_by_varietal(con=con, varietal="shiraz")

# test subset: 0.7 to 1.39 mins and 240 to 270 nm
testdata_filter_expr = pl.col("mins").is_between(0.7, 1.39) & pl.col("nm").is_between(
    240, 270
)

orc = Orchestrator()
orc.load_data(con=con, runids=ids, filter_expr=testdata_filter_expr)
orc.input_data.plot_3d()
```


# ASLS

We will first test asymmetric least squares (ASLS) baseline correction.


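```python
# NOTE: XX is not defined in this notebook. It is assumed to hold the sample
# data loaded by the orchestrator above, in the form BCorr_ASLS.fit_transform expects.
bcorr = BCorr_ASLS(
    lam=1e5,
)
bcorr.fit_transform(XX)

bcorr.get_bcorr_results().viz_compare_signals(10)
```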


As shown in the graphic above, the baseline fit is acceptable with `lam` = 1e5. However, inspection of samples 0 and 5 shows that there is little to no baseline present, and that ASLS easily overfits the convoluted peaks. Baseline correction is therefore not advisable within this interval.
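To make the role of `lam` more concrete, here is a minimal NumPy/SciPy sketch of asymmetric least squares in the style of Eilers and Boelens. It is not the `BCorr_ASLS` implementation: `asls_baseline`, the asymmetry weight `p`, the fixed iteration count and the synthetic signal are all assumptions for illustration. It compares the `lam` used above with a much smaller value to show how a weak smoothness penalty lets the baseline climb into a peak.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Hypothetical helper (not the BCorr_ASLS API): ASLS baseline estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # second-order difference operator D, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    # roughness penalty lam * D^T D: a larger lam gives a stiffer baseline
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # weighted smoother: solve (W + lam * D^T D) z = W y
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)
        # asymmetric reweighting: points above the baseline get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


# synthetic test signal (assumed): one Gaussian peak on a sloping baseline
x = np.arange(500, dtype=float)
true_baseline = 0.002 * x + 0.5
y = true_baseline + np.exp(-0.5 * ((x - 250) / 10) ** 2)

z_stiff = asls_baseline(y, lam=1e5)     # the lam value used in the notebook
z_flexible = asls_baseline(y, lam=1.0)  # much weaker smoothness penalty
```

With `lam=1.0` the estimated baseline rises towards the peak apex, so subtracting it would remove much of the peak; with `lam=1e5` it stays close to the underlying slope.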


# SNIP

Another baseline correction method I am familiar with is the SNIP (statistics-sensitive non-linear iterative peak-clipping) algorithm.
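The core clipping loop can be sketched in a few lines of NumPy. This is a hedged illustration, not the `BCorr_SNIP` implementation: `snip_baseline` is a hypothetical helper, it uses the plain form without the log-log-square-root (LLS) transform that is sometimes applied first, and it does no edge padding. Only `max_half_window` mirrors the parameter name used below. For each half-window size `k`, every interior point is replaced by the smaller of itself and the mean of its neighbours `k` samples away, so peaks narrower than the largest window are clipped off.

```python
import numpy as np


def snip_baseline(y, max_half_window=15):
    """Hypothetical helper: plain SNIP baseline (no LLS transform, no edge padding)."""
    v = np.asarray(y, dtype=float).copy()
    n = v.size
    if n <= 2 * max_half_window:
        raise ValueError("signal must be longer than 2 * max_half_window")
    # widen the clipping window one sample at a time
    for k in range(1, max_half_window + 1):
        # mean of the neighbours k samples to the left and right of each interior point
        neighbour_mean = 0.5 * (v[: n - 2 * k] + v[2 * k :])
        # clip: keep the lower of the point itself and its neighbour mean;
        # the first and last k points are left unchanged
        v[k : n - k] = np.minimum(v[k : n - k], neighbour_mean)
    return v


# synthetic test signal (assumed): a narrow peak on a sloping baseline
x = np.arange(200, dtype=float)
true_baseline = 0.01 * x + 1.0
y = true_baseline + np.exp(-0.5 * ((x - 100) / 3) ** 2)

baseline = snip_baseline(y, max_half_window=15)
corrected = y - baseline
```

Here the peak is much narrower than the 15-sample half-window and has flat baseline on both sides, so the clipped estimate follows the slope and the corrected peak keeps its full height.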

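```python
bcorr_snip = BCorr_SNIP(
    max_half_window=15,
)

# NOTE: XX is not defined in this notebook. It is assumed to hold the sample
# data loaded by the orchestrator above, in the form BCorr_SNIP.fit_transform expects.
bcorr_snip.fit_transform(XX)

bcorr_snip.get_bcorr_results().viz_compare_signals(10)
```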


As we can see, the SNIP results are poor for this dataset, particularly on the right side of the signal, where no baseline is present.

# Conclusion

It should first be stated that baseline correction is unnecessary for the idx 0 - 100 range of the dataset. That said, ASLS is clearly inclined to overfit convoluted peaks, and SNIP performs very poorly when a baseline is not present on both sides of the signal. SNIP is thus unsuitable for densely packed signals. Overall, baseline correction should be applied to the entire signal interval rather than subset by subset, with a generous stretch of baseline included at either end.
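The edge problem can be illustrated with a stripped-down SNIP loop. This is a sketch under stated assumptions, not the `BCorr_SNIP` behaviour: `snip` and `gaussian` are hypothetical helpers, and in this simplified form the first and last `k` points are never clipped. The same peak is corrected twice, once with baseline on both sides and once with the signal cut off at the apex. Some library implementations pad the edges instead, which changes the details, but without real baseline on one side the estimate there remains largely a guess.

```python
import numpy as np


def snip(y, max_half_window):
    # hypothetical minimal SNIP: interior points clipped to the mean of the
    # neighbours k apart; the first and last k points are never modified
    v = np.asarray(y, dtype=float).copy()
    n = v.size
    for k in range(1, max_half_window + 1):
        if 2 * k >= n:
            break
        v[k : n - k] = np.minimum(v[k : n - k], 0.5 * (v[: n - 2 * k] + v[2 * k :]))
    return v


def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


# the same peak (centre 99, width 5), with and without baseline to its right
x_full = np.arange(200, dtype=float)
x_cut = np.arange(100, dtype=float)  # signal ends exactly at the peak apex
y_full = gaussian(x_full, 99, 5)
y_cut = gaussian(x_cut, 99, 5)

corrected_full = y_full - snip(y_full, max_half_window=30)
corrected_cut = y_cut - snip(y_cut, max_half_window=30)

print(round(corrected_full[99], 3))  # 1.0: peak height recovered
print(corrected_cut[-1])  # 0.0: the apex is treated as baseline and removed
```

In the truncated case the apex sits on an edge point that this loop never clips, so the "baseline" there equals the signal and the peak vanishes after subtraction.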