In this notebook we explore the architecture of the `PySINDy` package, so that we can easily implement our Bayesian version of the SINDy algorithm with a sparsity-inducing prior.

We will write a custom `optimizer` module that performs Maximum A Posteriori (_MAP_) estimation on the distribution derived from the data and a (possibly sparsity-inducing) prior distribution.
The main difference from the optimizers shipped with the package is that this object will retain information about the whole probability distribution of the coefficients $\boldsymbol{\xi}$, not just their best estimates, so that we can evaluate the uncertainties on these parameters.

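The goal of this algorithm is to compute the posterior distribution of the coefficients, which by Bayes' theorem is, up to a normalization constant that does not depend on $\boldsymbol{\xi}$,

$$
P(\boldsymbol{\xi}|\boldsymbol{\dot{u}},\boldsymbol{\Theta}) \propto P(\boldsymbol{\dot{u}}|\boldsymbol{\xi},\boldsymbol{\Theta})P(\boldsymbol{\xi})
$$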

We assume that the likelihood of observing $\boldsymbol{\dot{u}}$ given a coefficient vector $\boldsymbol{\xi}$ is a Gaussian whose mean is given by the linear relation:

$$
P(\boldsymbol{\dot{u}}|\boldsymbol{\xi},\boldsymbol{\Theta}) = \mathcal{N}(\boldsymbol{\dot{u}};\, \boldsymbol{\Theta}^T\boldsymbol{\xi},\, \sigma^2 \boldsymbol{I})
$$


The prior $P(\boldsymbol{\xi})$ will be sparsity-inducing, so that the original goal of explaining the data with as few terms as possible is still, at least approximately, achieved.
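
To see what this model implies in practice, it can help to take the negative logarithm of the posterior, which turns the product of likelihood and prior into a sum. With the Gaussian likelihood above, and collecting every term that does not depend on $\boldsymbol{\xi}$ into a constant:

$$
-\log P(\boldsymbol{\xi}|\boldsymbol{\dot{u}},\boldsymbol{\Theta}) = \frac{1}{2\sigma^2}\left\lVert \boldsymbol{\dot{u}} - \boldsymbol{\Theta}^T\boldsymbol{\xi} \right\rVert_2^2 - \log P(\boldsymbol{\xi}) + \text{const}
$$

As an illustration only, assume a Laplace prior $P(\boldsymbol{\xi}) \propto \exp(-\lambda \lVert \boldsymbol{\xi} \rVert_1)$. Both this choice of prior and the scale $\lambda$ are assumptions made for this example and are not fixed anywhere in this notebook. The expression then becomes

$$
-\log P(\boldsymbol{\xi}|\boldsymbol{\dot{u}},\boldsymbol{\Theta}) = \frac{1}{2\sigma^2}\left\lVert \boldsymbol{\dot{u}} - \boldsymbol{\Theta}^T\boldsymbol{\xi} \right\rVert_2^2 + \lambda \lVert \boldsymbol{\xi} \rVert_1 + \text{const}
$$

which is a least-squares misfit plus an $L_1$ penalty: under this particular prior, maximizing the posterior coincides with solving a LASSO problem.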

#### The `BaseOptimizer` class

This is the wrapper class for every optimizer algorithm that the package provides; we will build our optimizer module as a subclass of this wrapper. The <a href="https://pysindy.readthedocs.io/en/latest/_modules/pysindy/optimizers/base.html#BaseOptimizer">source code</a> is available in the documentation.

#### Bayesian Regression implementation

The class will find the best coefficients by performing gradient descent on the negative posterior distribution, obtained from Bayes' theorem with the likelihood presented above and a prior of choice:

$$
\boldsymbol{\xi}_{\text{best}} = \operatorname*{arg\,min}_{\boldsymbol{\xi}} \left[ - P(\boldsymbol{\xi}|\boldsymbol{\dot{u}},\boldsymbol{\Theta}) \right]
$$

The initial guess for the coefficients will be the result of an OLS fit (already provided by the `BaseOptimizer` class).
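
Before wiring this into `PySINDy`, the procedure can be sketched with plain NumPy. The snippet below is a minimal illustration under stated assumptions, not the implementation of the class that follows. It descends the negative *log*-posterior, which has the same minimizer as the negative posterior because the logarithm is monotone. It assumes a Laplace prior with a hypothetical scale `lam`, so the prior term is handled with a subgradient. It also assumes that `X` stores one sample per row, as in the class docstring, so the mean of the likelihood is `X @ w`. The function names and the choice of rescaling `lr` by a curvature bound are specific to this sketch.

```python
import numpy as np


def neg_log_posterior(w, X, y, sigma=1.0, lam=1.0):
    """Negative log-posterior up to an additive constant.

    Gaussian likelihood centered at X @ w plus an (assumed) Laplace prior.
    """
    resid = y - X @ w
    return 0.5 * (resid @ resid) / sigma**2 + lam * np.sum(np.abs(w))


def map_gradient_descent(X, y, sigma=1.0, lam=1.0, lr=0.1, n_iter=500):
    """Subgradient descent on neg_log_posterior, started from the OLS solution."""
    # OLS initial guess.
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    # Curvature bound of the quadratic term. In this sketch lr is interpreted
    # relative to it, so that lr <= 1 keeps the iteration stable.
    lipschitz = np.linalg.norm(X, 2) ** 2 / sigma**2
    step = lr / lipschitz
    for _ in range(n_iter):
        grad_likelihood = -X.T @ (y - X @ w) / sigma**2
        subgrad_prior = lam * np.sign(w)  # subgradient of lam * ||w||_1
        w = w - step * (grad_likelihood + subgrad_prior)
    return w


# Synthetic example: only two of the three candidate terms are active.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, 0.0, -2.0])
y = X @ w_true + 0.1 * rng.normal(size=200)

w_map = map_gradient_descent(X, y, sigma=0.1, lam=200.0)
print(w_map)
```

Plain subgradient steps shrink the inactive coefficient towards zero but do not set it exactly to zero; it keeps oscillating within roughly one step size of the origin. Obtaining exact zeros would require a thresholding or proximal step, which is a different technique from the gradient descent described here.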


In [3]:
from pysindy.optimizers import BaseOptimizer



class BayesianRegression(BaseOptimizer):
    """
    Bayesian Regression Optimizer.
    
    Evaluates the probability distribution over the coefficients w
    by assuming a Gaussian likelihood for the data y, centered at X @ w,
    together with a sparsity-inducing prior.

    Parameters
    ----------
    fit_intercept : boolean, optional (default False)
        Whether to calculate the intercept for this model. If set to false, no
        intercept will be used in calculations.

    normalize_columns : boolean, optional (default False)
        Normalize the columns of x (the SINDy library terms) before regression
        by dividing by the L2-norm. Note that the 'normalize' option in sklearn
        is deprecated in sklearn versions >= 1.0 and will be removed.

    copy_X : boolean, optional (default True)
        If True, X will be copied; else, it may be overwritten.
    
    lr : float, optional (default 0.1)
        Learning rate for the gradient descent. 
    """