## ANN Background Correction

[@kensart_2021] developed an ANN, written in Python (TensorFlow), that performs unsupervised background correction.

## Niezen Solution

[@niezen_2022] performed an in-depth analysis of a number of background correction methods and concluded that:

| method          | outcome                          |
|-----------------|----------------------------------|
| SASS and arPLS  | lowest RMSE, best looking result |
| SASS and LMV    | smallest error in peak area      |
| backcor and LMV | fastest drift correction         |
| arPLS           | slowest drift correction         |

LMV (Local Minimum Values) is a method developed by [@fu_2016].

## Notes on Baseline Algorithms

TODO:
- [ ] complete notes on:
  - [ ] splines
  - [ ] smoothing
  - [ ] classification
  - [ ] optimizer
  - [ ] misc
- [ ] complete top level table describing each class
- [ ] justify why iasls has been selected

PyBaselines groups baseline algorithms into the following categories:
* polynomial
* Whittaker-smoothing
* morphological
* spline
* smoothing
* baseline/peak classification
* optimizers
* misc.

| CLASS | DESCRIPTION |
|---|---|
| Polynomial | Relies on minimizing the least squares. |
| Whittaker | Also utilizes least squares, but includes cost functions that reward smoothness |
| Spline | Combines least squares methods with localised optimizations |
| Smoothing |test | 
| Classification |test|
| Optimizer |test|
| Misc. |  |


## Whittaker Baselines

* Whittaker-smoothing-based algorithms are also known as weighted least squares, penalized least squares, or asymmetric least squares.
* They work by making the baseline match the data while penalizing roughness to avoid overfitting.
* The core function is $$\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$$ and the linear equation to solve the minimization is: $$(W + \lambda D_d^{\top} D_d) z = W y$$
* It is generally recommended to use the second-order difference matrix, but some adaptations use both the first and second order.
* The algorithm works iteratively: solve for the baseline $z$, update the weights, and solve for $z$ again, repeating until a preset criterion is reached. (JS - how are the weights reset? how is the criterion determined?)
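The iteration described above can be sketched in a few lines of NumPy. This is a minimal dense-matrix sketch of the asls variant, not the PyBaselines implementation: the function name `asls_baseline`, the default parameter values, and the exit criterion (relative change in the weights) are assumptions made for illustration.

```python
import numpy as np


def asls_baseline(y, lam=1e5, p=0.01, diff_order=2, max_iter=50, tol=1e-3):
    """Asymmetric least squares baseline (dense sketch, short signals only)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # D_d: finite-difference matrix of order d, shape (n - d, n)
    D = np.diff(np.eye(n), n=diff_order, axis=0)
    penalty = lam * D.T @ D
    w = np.ones(n)
    for _ in range(max_iter):
        # Solve (W + lam * D^T D) z = W y for the current weights
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # asls weight update: p above the baseline, 1 - p at or below it
        w_new = np.where(y > z, p, 1.0 - p)
        # Assumed exit criterion: relative change in the weight vector
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z


# Synthetic example: linear drift plus one Gaussian peak
x = np.arange(200, dtype=float)
true_baseline = 1.0 + 0.02 * x
signal = true_baseline + 5.0 * np.exp(-((x - 100.0) ** 2) / (2.0 * 4.0**2))
estimated = asls_baseline(signal)
```

The dense solve keeps the sketch short; for long signals a sparse banded solver would be the usual choice.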

| Abbreviation | Full Name                | Equation                                                                                 | Weighting | 
|--------------|--------------------------|------------------------------------------------------------------------------------------|---|
| asls         | Asymmetric Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ |$w_i = \begin{array}{cr}p & y_i > z_i \\1 - p & y_i \le z_i\end{array}$ |
| iasls | Improved Asymmetric Least Squares | $\sum\limits_{i}^N (w_i (y_i - z_i))^2 + \lambda \sum\limits_{i}^{N - 2} (\Delta^2 z_i)^2 + \lambda_1 \sum\limits_{i}^{N - 1} (\Delta^1 (y_i - z_i))^2$ | $w_i = \begin{array}{cr}p & y_i > z_i \\1 - p & y_i \le z_i\end{array}$ |
| airpls | Adaptive Iteratively Reweighted Penalized Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ | $w_i = \begin{array}{cr}0 & y_i \ge z_i \\ exp{\left(\frac{t (y_i - z_i)}{\|\mathbf{r}^-\|}\right)} & y_i < z_i\end{array}$ |
| arpls | Asymmetrically Reweighted Penalized Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ | $w_i = \frac{1}{1 + exp{\left(\frac{2(r_i - (-\mu^- + 2 \sigma^-))}{\sigma^-}\right)}}$ |
| drpls | Doubly Reweighted Penalized Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2+ \lambda \sum\limits_{i}^{N - 2}(1 - \eta w_i) (\Delta^2 z_i)^2+ \sum\limits_{i}^{N - 1} (\Delta^1 (z_i))^2$ | $w_i = \frac{1}{2}\left(1 -\frac{exp(t)(r_i - (-\mu^- + 2 \sigma^-))/\sigma^-}{1 + abs[exp(t)(r_i - (-\mu^- + 2 \sigma^-))/\sigma^-]}\right)$ |
| iarpls | Improved Asymmetrically Reweighted Penalized Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ | $w_i = \frac{1}{2}\left(1 -\frac{exp(t)(r_i - 2 \sigma^-)/\sigma^-}{\sqrt{1 + [exp(t)(r_i - 2 \sigma^-)/\sigma^-]^2}}\right)$ |
| aspls | Adaptive Smoothness Penalized Least Squares | $\sum\limits_{i}^N w_i (y_i - z_i)^2+ \lambda \sum\limits_{i}^{N - d} \alpha_i (\Delta^d z_i)^2$ | $w_i = \frac{1}{1 + exp{\left(\frac{0.5 (r_i - \sigma^-)}{\sigma^-}\right)}}$ |
| psalsa | Peaked Signal's Asymmetric Least Squares Algorithm |  $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ | $w_i = \begin{array}{cr} p \cdot exp{\left(\frac{-(y_i - z_i)}{k}\right)} & y_i > z_i \\ 1 - p & y_i \le z_i \end{array}$ |
| derpsalsa | Derivative Peak-screening Asymmetric Least Squares Algorithm | $\sum\limits_{i}^N w_i (y_i - z_i)^2 + \lambda \sum\limits_{i}^{N - d} (\Delta^d z_i)^2$ | $w_i = w_{0i} \cdot w_{1i} \cdot w_{2i}$ (components given below the table) |

For derpsalsa, the weight components are $w_{0i} = \begin{array}{cr}p \cdot exp{\left(\frac{-[(y_i - z_i)/k]^2}{2}\right)} & y_i > z_i \\1 - p & y_i \le z_i\end{array}$, $w_{1i} = exp{\left(\frac{-[y_{sm_i}' / rms(y_{sm}')]^2}{2}\right)}$, and $w_{2i} = exp{\left(\frac{-[y_{sm_i}'' / rms(y_{sm}'')]^2}{2}\right)}$.
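To make the difference between the weighting schemes concrete, the sketch below evaluates the asls and arpls weights for a given baseline estimate. The table does not define $r_i$, $\mu^-$ or $\sigma^-$; the code assumes the usual arPLS convention that $r_i = y_i - z_i$ and that $\mu^-$ and $\sigma^-$ are the mean and standard deviation of the negative residuals. The function names are hypothetical.

```python
import numpy as np


def asls_weights(y, z, p=0.01):
    """asls: fixed weight p above the baseline, 1 - p at or below it."""
    return np.where(y > z, p, 1.0 - p)


def arpls_weights(y, z):
    """arpls: logistic weight based on the negative residuals.

    Assumes r = y - z and that mu^-, sigma^- are the mean and standard
    deviation of the residuals below zero. Requires at least two distinct
    negative residuals so that sigma^- is non-zero.
    """
    r = y - z
    neg = r[r < 0]
    mu_neg, sigma_neg = neg.mean(), neg.std()
    arg = 2.0 * (r - (-mu_neg + 2.0 * sigma_neg)) / sigma_neg
    # Clip the exponent to avoid floating-point overflow on large peaks
    return 1.0 / (1.0 + np.exp(np.clip(arg, -500.0, 500.0)))


# Example: four near-baseline points and one large peak point
y = np.array([0.0, -0.1, 0.1, -0.2, 10.0])
z = np.zeros_like(y)
w_asls = asls_weights(y, z)
w_arpls = arpls_weights(y, z)
```

Under these assumptions asls assigns the same small weight to every point above the baseline, whereas arpls keeps points within the noise band close to weight 1 and drives only clear peaks towards 0.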


## Polynomial Baselines

### Fitting a polynomial

For the polynomial: $$p(x) = \beta_0 x^0 + \beta_1 x^1 + \beta_2 x^2 + ... + \beta_m x^m = \sum\limits_{j = 0}^m {\beta_j x^j}$$

Fitting the polynomial can be achieved by minimizing the least-squares: $$\sum\limits_{i}^N w_i^2 (y_i - p(x_i))^2$$ 

where $x_i$ and $y_i$ are the measured data, $p(x_i)$ is the polynomial estimate at $x_i$, and $w_i$ is the weighting.
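As a minimal sketch of this weighted fit, the coefficients $\beta_j$ can be obtained by scaling each row of the Vandermonde matrix and each $y_i$ by $w_i$ and solving an ordinary least-squares problem, which minimizes $\sum w_i^2 (y_i - p(x_i))^2$. The function name `fit_polynomial` and the example data are assumptions for illustration.

```python
import numpy as np


def fit_polynomial(x, y, w, order):
    """Return beta_0..beta_m minimizing sum(w_i^2 * (y_i - p(x_i))^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Vandermonde matrix with columns x^0, x^1, ..., x^m
    V = np.vander(x, order + 1, increasing=True)
    # Scaling rows by w_i turns the weighted problem into an ordinary one
    beta, *_ = np.linalg.lstsq(w[:, None] * V, w * y, rcond=None)
    return beta


x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x - 0.02 * x**2
w = np.linspace(0.5, 1.5, 50)
beta = fit_polynomial(x, y, w, order=2)

# np.polyfit also applies w to the unsquared residuals, so it solves the
# same problem; it returns the coefficients with the highest power first.
beta_polyfit = np.polyfit(x, y, deg=2, w=w)[::-1]
```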

### Fitting a baseline

To fit a polynomial to the baseline only, the least-squares fit must be altered to disregard peaks. This can be achieved through selective masking, thresholding, or penalizing outliers.

#### Selective masking

Selective masking is the oldest method of fitting a baseline: all peak regions are removed from the dataset and the remaining points are fitted. It is not a recommended approach because manual selection of the baseline regions is time-consuming and subjective, which reduces the reproducibility of results.
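A minimal sketch of selective masking, assuming the peak regions are supplied by hand as `(start, end)` intervals in the units of `x`. The function name `masked_poly_baseline` is hypothetical.

```python
import numpy as np


def masked_poly_baseline(x, y, peak_regions, order=2):
    """Fit a polynomial to the points outside the given peak regions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(x.shape, dtype=bool)
    for start, end in peak_regions:
        # Exclude every point that falls inside a user-selected peak region
        keep &= ~((x >= start) & (x <= end))
    coeffs = np.polyfit(x[keep], y[keep], deg=order)
    # Evaluate the fitted polynomial over the full x range
    return np.polyval(coeffs, x)


x = np.linspace(0.0, 100.0, 201)
true_baseline = 2.0 + 0.01 * x
y = true_baseline + 5.0 * np.exp(-((x - 50.0) ** 2) / (2.0 * 2.0**2))
baseline = masked_poly_baseline(x, y, peak_regions=[(40.0, 60.0)], order=1)
```

The result depends entirely on the chosen intervals, which is the subjectivity noted above.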

#### Thresholding

Thresholding is an iterative process. A least-squares fit is computed, each data point is replaced by the minimum of the data point and the fit, and the fit is recomputed on the adjusted data. This continues until an exit criterion is reached.
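The loop can be sketched as follows. This is a simplified illustration in the spirit of modpoly rather than the PyBaselines code; the function name, the defaults, and the exit criterion (relative change between successive fits) are assumptions.

```python
import numpy as np


def thresholded_poly_baseline(x, y, order=2, max_iter=250, tol=1e-3):
    """Iteratively clip the data to the current fit and refit."""
    x = np.asarray(x, dtype=float)
    y_work = np.asarray(y, dtype=float).copy()
    fit = np.polyval(np.polyfit(x, y_work, order), x)
    for _ in range(max_iter):
        # Keep the smaller of the (already clipped) data and the current fit
        y_work = np.minimum(y_work, fit)
        new_fit = np.polyval(np.polyfit(x, y_work, order), x)
        # Assumed exit criterion: relative change between successive fits
        converged = np.linalg.norm(new_fit - fit) <= tol * np.linalg.norm(fit)
        fit = new_fit
        if converged:
            break
    return fit


x = np.linspace(0.0, 100.0, 201)
true_baseline = 2.0 + 0.01 * x
y = true_baseline + 5.0 * np.exp(-((x - 50.0) ** 2) / (2.0 * 2.0**2))
baseline = thresholded_poly_baseline(x, y, order=1)
```

Each pass flattens the peaks a little further, so the fit approaches the baseline from above.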

#### Penalizing outliers

This approach lends less weight to outliers (i.e. peaks) when fitting the baseline.
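One simple way to illustrate the idea is iteratively reweighted least squares with an asymmetric truncated-quadratic cost: points lying more than a threshold above the current fit receive zero weight, and all other points keep weight 1. This is a stand-in sketch, not the half-quadratic scheme used by backcor or PyBaselines; the function name and the `threshold` parameter are assumptions.

```python
import numpy as np


def truncated_quadratic_baseline(x, y, order=2, threshold=0.1, max_iter=100):
    """Polynomial baseline via reweighting with an asymmetric cut-off."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y)
    for _ in range(max_iter):
        fit = np.polyval(np.polyfit(x, y, order, w=w), x)
        # Residuals far above the fit are treated as peaks and ignored
        w_new = np.where(y - fit > threshold, 0.0, 1.0)
        if np.array_equal(w_new, w):
            break  # weights are stable, so the fit is stable too
        w = w_new
    return fit


x = np.linspace(0.0, 100.0, 201)
true_baseline = 2.0 + 0.01 * x
y = true_baseline + 5.0 * np.exp(-((x - 50.0) ** 2) / (2.0 * 2.0**2))
baseline = truncated_quadratic_baseline(x, y, order=1)
```

In practice the threshold would be chosen relative to the noise level of the data.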

### Algorithms

| abbrev. | Full Name | Description |
|---|---|---|
| poly | Regular Polynomial | Least squares polynomial fitting, use with selective masking |
| modpoly/ModPolyFit | Modified Polynomial | Polynomial fitting with thresholding |
| imodpoly/IModPolyFit | Improved Modified Polynomial | Adaptation of modpoly for noisy data that includes the standard deviation of the residual during thresholding |
| penalized_poly/backcor | Penalized Polynomial | Uses non-quadratic cost functions (Huber, truncated quadratic, Indec) |
| loess/rbe | Locally Estimated Scatterplot Smoothing/Robust Baseline Estimate | element-wise calculation of baseline by applying polynomial regression on the k-nearest neighbours of the element. Outliers are reduced by iterative reweighting |
| quant_reg | Quantile Regression | Uses quantile regression |
| goldindec | Goldindec Method | Uses asymmetric non-quadratic cost functions |

Source: [pybaselines](https://pybaselines.readthedocs.io/en/latest/algorithms/polynomial.html)

## Spline Baselines

- PyBaselines uses 'B-splines'

$$z(x) = \sum\limits_{i}^N \sum\limits_{j}^M {B_j(x_i) c_j}$$

where $N$ is the number of points in $x$, $M$ is the number of spline basis functions, $B_j(x_i)$ is the j-th basis function evaluated at $x_i$, and $c_j$ is the coefficient for the j-th basis (which is analogous to the height of the j-th basis). $M$ is calculated as the number of knots plus the spline degree minus 1.

- A B-spline is fitted by minimizing the least squares: $$\sum\limits_{i}^N w_i (y_i - \sum\limits_{j}^M {B_j(x_i) c_j})^2$$

- To control the smoothness of the baseline, a penalty term is added to the least-squares objective. B-splines fitted with this penalty are called P-splines: $$\sum\limits_{i}^N w_i (y_i - \sum\limits_{j}^M {B_j(x_i) c_j})^2+ \lambda \sum\limits_{i}^{M - d} (\Delta^d c_i)^2$$ where $\lambda$ is the penalty scale factor and $\Delta^d$ is the finite difference operator of order $d$. This is solved with the linear equation: $$(B^{\top} W B + \lambda D_d^{\top} D_d) c = B^{\top} W y$$ where $W$ is the diagonal matrix of the weights, $B$ is the matrix containing all of the spline basis functions, and $D_d$ is the matrix version of $\Delta^d$.

- P-splines can be thought of as an adaptation of Whittaker smoothing: a P-spline with $M = N$ and a spline degree of 0 has a linear equation identical to that of Whittaker smoothing.
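The P-spline system above can be sketched with plain NumPy. The basis is built with the Cox-de Boor recursion on uniformly spaced knots that extend past both ends of the data. The function names, the `n_segments` parameter and the defaults are assumptions for illustration; `knots` here is the full extended knot vector, so the number of basis functions is `len(knots) - degree - 1`, which is a different way of counting than the formula quoted earlier.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion; returns B with B[i, j] = B_j(x_i).

    Assumes strictly increasing knots.
    """
    x = np.asarray(x, dtype=float)
    n_intervals = len(knots) - 1
    # Degree 0: indicator function of each half-open knot interval
    B = np.zeros((x.size, n_intervals))
    for j in range(n_intervals):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    for k in range(1, degree + 1):
        B_next = np.zeros((x.size, n_intervals - k))
        for j in range(n_intervals - k):
            left = (x - knots[j]) / (knots[j + k] - knots[j])
            right = (knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1])
            B_next[:, j] = left * B[:, j] + right * B[:, j + 1]
        B = B_next
    return B


def pspline_fit(x, y, w=None, n_segments=20, degree=3, lam=1.0, diff_order=2):
    """Solve (B^T W B + lam * D^T D) c = B^T W y and return z = B c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Uniform knots, extended by `degree` intervals on each side of the data
    h = (x.max() - x.min()) / n_segments
    knots = x.min() + h * np.arange(-degree, n_segments + degree + 1)
    B = bspline_basis(x, knots, degree)
    n_basis = B.shape[1]
    # Difference penalty acts on the coefficients, not on the data points
    D = np.diff(np.eye(n_basis), n=diff_order, axis=0)
    BtW = B.T * w  # equivalent to B^T W for a diagonal W
    c = np.linalg.solve(BtW @ B + lam * D.T @ D, BtW @ y)
    return B @ c


x = np.linspace(0.0, 10.0, 200)
y = np.sin(x)
smooth = pspline_fit(x, y, lam=1e-3)
```

With `degree=0` and one knot interval per data point, `bspline_basis` returns the identity matrix, so $B^{\top} W B = W$ and the system reduces to the Whittaker equation $(W + \lambda D_d^{\top} D_d) z = W y$. Turning this into a baseline fit would additionally require one of the iterative weighting schemes.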

### Algorithms

| Abbrev. | Full Name | Description |
|---|---|---|
| mixture_model | Mixture Model | test |
| irsqr | Iterative Reweighted Spline Quantile Regression | Quantile regression through P-splines with iterative reweighted least squares |
| corner_cutting | Corner-Cutting Method | test |