## Background Analyser

Now that the results are collected, I need to define an analysis class. It needs to perform the following calculations:

1. First and second derivatives of the background
  - This turns out to be harder than expected. I need to decide between finite differences and a polynomial fit.
2. AUC of the input and adjusted signals:
  1. py_hplc approximates the AUC by summing the y values, but that ignores the area between the points, so
  the lower the sampling rate, the less accurate the estimate. A good first start, though.
3. Number of peaks per peak window
4. Window start, finish, and length
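
The first two items can be sketched on synthetic data. This is a minimal illustration, not the analysis class itself: the Gaussian `signal`, the `time` axis, and the polynomial degree of 12 are all assumptions chosen for the example. It compares finite differences with a differentiated polynomial fit, and a plain sum of y values with the trapezoid rule.

```python
import numpy as np

# Synthetic stand-in for a background curve (assumption): one Gaussian bump.
time = np.linspace(0.0, 10.0, 201)
signal = np.exp(-0.5 * (time - 5.0) ** 2)

# Option A: finite differences. np.gradient uses central differences in the
# interior and one-sided differences at the two end points.
d1_fd = np.gradient(signal, time)
d2_fd = np.gradient(d1_fd, time)

# Option B: fit a polynomial, then differentiate the fit analytically.
# The degree is an arbitrary choice for this example.
poly = np.polynomial.Polynomial.fit(time, signal, deg=12)
d1_poly = poly.deriv(1)(time)
d2_poly = poly.deriv(2)(time)

# AUC: a plain sum of y values ignores the spacing between samples, while the
# trapezoid rule includes the area between neighbouring points.
# The trapezoid rule is written out by hand to stay independent of NumPy version.
dt = np.diff(time)
auc_sum = float(signal.sum())
auc_trapezoid = float(np.sum(0.5 * (signal[1:] + signal[:-1]) * dt))

print(f"sum of y values: {auc_sum:.4f}")
print(f"trapezoid rule:  {auc_trapezoid:.4f}")
```

The plain sum only becomes an area once it is multiplied by the sampling interval, which is why its accuracy depends on the sampling rate.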

I also want to separate the times into quartiles.
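
One possible reading of the quartile split, sketched with NumPy only: compute the 25th, 50th and 75th percentiles of the time values and label each point by the bin it falls in. The `time` array here is a hypothetical stand-in for the real time column.

```python
import numpy as np

# Hypothetical time axis (assumption); the real one comes from the signal table.
time = np.linspace(0.0, 20.0, 101)

# Interior quartile boundaries: 25th, 50th and 75th percentiles of the times.
edges = np.quantile(time, [0.25, 0.5, 0.75])

# Label each time point 1..4 according to the quartile it falls in.
# A point exactly on a boundary is assigned to the lower quartile.
quartile = np.searchsorted(edges, time, side="left") + 1

for q in range(1, 5):
    members = time[quartile == q]
    print(f"Q{q}: {members.min():.1f} to {members.max():.1f} ({members.size} points)")
```

The resulting labels can be attached to the signal table as an extra column and used for grouping.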

In [None]:
import numpy as np
import polars as pl
import seaborn as sns
from IPython.display import display
from numpy.polynomial import Legendre, Polynomial

# `Results` and `Signals` are assumed to be defined in earlier cells.


class BackgroundAnalyser:
    """
    The main intent is to observe the 'roughness' of the background estimated by the SNIP
    method, providing a measure of overfitting.
    """

    def __init__(self, results: Results):
        self.results = results
        self.background = self.results.signals.filter(pl.col("n_iter") == 40).select(
            pl.col("time"), pl.col("background")
        )
        self.x = np.linspace(0, 1, len(self.background))
        self.y = self.background.select("background").to_numpy().ravel()

    def finite_diffs(self):
        """
        Calculate the first and second backward differences of the background for each `n_iter`.

        `diff` leaves the leading values of each group null. Rows are assumed to be ordered by time within each `n_iter`.
        """
        x = str(Signals.time)
        y = str(Signals.background)

        self.background_diffs: pl.DataFrame = self.results.signals.select(
            pl.col("n_iter"),
            pl.col(x),
            pl.col(y),
            # difference within each n_iter group so values do not cross group boundaries
            pl.col(y).diff().over("n_iter").alias("first_diff"),
            pl.col(y).diff().diff().over("n_iter").alias("second_diff"),
        ).melt(id_vars=["n_iter", x], variable_name="diff", value_name="values")

        # after the melt, the curves live in "values" and are labelled by "diff"
        display(
            sns.relplot(
                data=self.background_diffs.to_pandas(),
                x=x,
                y="values",
                col="n_iter",
                hue="diff",
                col_wrap=2,
            )
        )

    def fit_background_polynomial(self):
        """
        Approximate the background with a polynomial fit.

        The result is poor because the shape of the background is essentially two overlapping Gaussian peaks,
        requiring a polynomial degree > 200 to fit the general shape.

        Least-squares fit assessment is done through observation of:
        - R squared
        - F score
        - Root Mean Square Error

        These are based on the Sum of Squares Total and Sum of Squares Error. See <https://www.theanalysisfactor.com/assessing-the-fit-of-regression-models/#:~:text=Three%20statistics%20are%20used%20in,of%20Squares%20Error%20(SSE)>

        NOTE: currently only calculates R^2, and does nothing with it.

        TODO: determine how to store the results, and then what to do with them. Probably compute the average absolute deviation from the fitted line.
        """

        c: np.polynomial.legendre.Legendre
        out_diagnostics: list
        c, out_diagnostics = Legendre.fit(x=self.x, y=self.y, deg=200, full=True)
        fitted_background = c.linspace(len(self.background))

        # resid: Residual Sum of Squares - \sum^n_{i=1} { (y_i - f(x_i))^2 }

        diagnostics_keys = ["resid", "rank", "sv", "rcond"]

        diagnostics = dict(zip(diagnostics_keys, out_diagnostics))

        df = (
            self.background.with_columns(
                pl.Series(name="fitted_background", values=fitted_background[1])
            )
            .melt(id_vars="time", variable_name="f", value_name="values")
            .to_pandas()
        )

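        # R^2 = 1 - RSS / TSS

        # TSS
        tss = ((self.y - self.y.mean()) ** 2).sum()

        # RSS, computed from the fitted values directly: the "resid" diagnostic is an
        # array rather than a scalar, and it can be empty when the least-squares
        # problem is rank deficient.
        rss = ((self.y - c(self.x)) ** 2).sum()

        r_2 = 1 - (rss / tss)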

        # display(diagnostics)
        display(df)
        display(r_2)
        display(sns.lineplot(data=df, x="time", y="values", hue="f"))
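
The three fit statistics named in the docstring can be sketched from the two sums of squares. This is a minimal illustration under stated assumptions: `fit_metrics` is a hypothetical helper that is not part of the class, the data are synthetic, and RMSE is taken here as `sqrt(SSE / n)` (other conventions divide by the residual degrees of freedom).

```python
import numpy as np
from numpy.polynomial import Legendre


def fit_metrics(y, y_fit, n_params):
    """Return R^2, RMSE and the F statistic for a least-squares fit.

    n_params counts the fitted coefficients, including the constant term.
    """
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    n = y.size
    sse = float(np.sum((y - y_fit) ** 2))     # Sum of Squares Error
    sst = float(np.sum((y - y.mean()) ** 2))  # Sum of Squares Total
    df_model = n_params - 1
    df_resid = n - n_params
    return {
        "r_squared": 1.0 - sse / sst,
        "rmse": float(np.sqrt(sse / n)),
        "f_stat": ((sst - sse) / df_model) / (sse / df_resid),
    }


# Synthetic example (assumption): a noisy quadratic fitted with a degree-2 series.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

fit = Legendre.fit(x, y, deg=2)
metrics = fit_metrics(y, fit(x), n_params=3)
print(metrics)
```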

In [None]:
# background_analyser = BackgroundAnalyser(results=initial_results)
# analyzer.analyse_background()
# background_analyser.fit_background_polynomial()