# Backcor as a Suitable Baseline Correction Method for Chromatographic Data

[@ning2014] defines a chromatogram as a combination of "positive peaks and a relatively flat baseline". They assert that such signals are well suited to a baseline-fitting algorithm with an asymmetric penalty function, that is, one that treats positive and negative values differently. As examples they cite ([14] P. H. C. Eilers. Unimodal smoothing. J. Chemometrics, 19(5-7):317–328, May 2005), ([32] V. Mazet, D. Brie, and J. Idier. Baseline spectrum estimation using half-quadratic minimization. In Proc. Eur. Sig. Image Proc. Conf., pages 305–308, Vienna, Austria, Sep. 7-10, 2004.), and ([33] V. Mazet, C. Carteret, D. Brie, J. Idier, and B. Humbert. Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometr. Intell. Lab. Syst., 76(2):121–133, 2005.). They note that *backcor* incorporates an asymmetric penalty function.
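To make "treats positive and negative values differently" concrete, the following is a minimal sketch of two asymmetric cost functions of the kind used by backcor. The function names, the `threshold` parameter, and the sample residuals are illustrative assumptions, and the formulas follow the general form described by Mazet et al. rather than the exact implementation of any library. A residual is taken to be `signal - baseline`, so peaks give large positive residuals.

```python
def asymmetric_truncated_quadratic(residual: float, threshold: float) -> float:
    """Quadratic below the threshold, constant above it (assumed form)."""
    if residual < threshold:
        return residual ** 2
    # Large positive residuals (peaks) all pay the same capped cost
    return threshold ** 2


def asymmetric_huber(residual: float, threshold: float) -> float:
    """Quadratic below the threshold, linear above it (assumed form)."""
    if residual < threshold:
        return residual ** 2
    # Peaks are penalised linearly rather than quadratically
    return 2 * threshold * residual - threshold ** 2


# Hypothetical residuals: negative values lie below the baseline,
# large positive values correspond to peaks
residuals = [-2.0, -0.5, 0.0, 0.5, 2.0, 10.0]
threshold = 1.0

truncated_costs = [asymmetric_truncated_quadratic(r, threshold) for r in residuals]
huber_costs = [asymmetric_huber(r, threshold) for r in residuals]

print(truncated_costs)
print(huber_costs)
```

In both cases a point below the baseline is penalised quadratically, whereas a peak of the same magnitude costs much less, so the fitted baseline is pulled towards the bottom of the signal rather than through the peaks.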

In [None]:
# setup

%load_ext autoreload
%autoreload 2

import pandas as pd
import seaborn as sns
from wine_analysis_hplc_uv import definitions
from wine_analysis_hplc_uv.signal_processing.mindex_signal_processing import SignalProcessor
from pybaselines import Baseline
import matplotlib.pyplot as plt

scipro = SignalProcessor()
df = pd.read_parquet(definitions.TEST_PARQ_PATH)
df

In [None]:
methods = [
    "asymmetric_truncated_quadratic",
    "asymmetric_huber",
    "asymmetric_indec",
]


def backcorblinefunc(df: pd.DataFrame, method: str) -> pd.DataFrame:
    """Fit a penalized-polynomial (backcor-style) baseline and subtract it."""
    # Elapsed time in seconds, taken from the "mins" timedelta index level
    x_data = df.index.get_level_values("mins").total_seconds()
    bline = Baseline(x_data=x_data, assume_sorted=True).penalized_poly(
        df["value"], cost_function=method
    )[0]
    return df.assign(bline=bline).assign(blinesub=lambda df: df.eval("value - bline"))


for method in methods:
    # `df4` is not defined in the cells shown here; it is assumed to be
    # prepared from `df` elsewhere in the notebook
    (
        df4.stack(["samplecode", "wine"])
        .groupby(["samplecode"], group_keys=False)
        .apply(backcorblinefunc, method=method)
        .unstack(["samplecode", "wine"])
        .pipe(scipro.vars_subplots)  # presumably draws one subplot per sample
    )
    plt.suptitle(method)
    plt.show()

However, we can see that for this sample dataset all three asymmetric cost functions fit poorly, especially at the ends of the signals.