In [1]:
# imports
import numpy as np
import matplotlib.pyplot as plt
import pysindy as ps
# numerical differentiation is done with np.gradient below
# (scipy.misc.derivative is deprecated and removed in recent SciPy releases)

# PySINDy

We will first use the `PySINDy` package, developed by the authors, to apply the PDE-FIND method to our generated data.

The <a href=https://pysindy.readthedocs.io/en/latest/index.html#>documentation</a> describes the package's basic workflow, which revolves around the `SINDy` object: it performs the regression and consists of three components, one for each crucial step of the algorithm:

1. `differentiation_method`: determines how the derivatives are computed (they can also be supplied manually)
2. `feature_library`: determines the library of candidate terms the algorithm uses to identify the underlying PDE
3. `optimizer`: implements the actual sparse regression algorithm

The `SINDy` object follows the syntax of `sklearn` estimators (it was written with `sklearn` compatibility in mind), and once fitted it can, for instance, simulate the identified model from a different initial condition, predict derivatives, and much more.

Let's use it to try to recover the original Lorenz system from snapshots of its simulated dynamics. First, we load the data:

In [8]:
# load data
r = np.load("./data/lorenz_r.npy")
t = np.load("./data/lorenz_t.npy")

# r is shaped like (n_points,n_dimensions)
print("Data vector:")
print(r[:5])

# t is time axis
print("\n\nTime vector:")
print(t[:5])

Data vector:
[[1.         1.         1.        ]
 [1.00012952 1.02598928 0.99834857]
 [1.0005163  1.05196112 0.99672775]
 [1.00115757 1.07792254 0.99513773]
 [1.00205076 1.1038803  0.99357873]]


Time vector:
[0.         0.00100001 0.00200002 0.00300003 0.00400004]
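For reference, the target is the Lorenz system. We assume the standard parameter values $\sigma=10$, $\rho=28$, $\beta=8/3$. This is an assumption on our part, although it is roughly consistent with the first samples printed above: at $(1,1,1)$ these values give $\dot y = \rho - 2 = 26$ and $\dot z = 1 - \beta \approx -1.67$, close to the observed slopes of about $26$ and $-1.65$.

$$
\begin{aligned}
\dot x &= \sigma\,(y - x),\\
\dot y &= x\,(\rho - z) - y,\\
\dot z &= x y - \beta z.
\end{aligned}
$$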


Then we instantiate a `SINDy` model object. 

In order to do that, we first need to specify an `optimizer` object that performs the regression: the <a href=https://pysindy.readthedocs.io/en/latest/api/pysindy.optimizers.html>documentation</a> again gives an exhaustive description of the available optimizers. For now, we simply let the model fall back to its default, `STLSQ`, a sequentially thresholded least-squares algorithm. It minimizes the loss $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$ by iterating a least-squares regression followed by a mask that zeroes out the coefficients of $w$ whose magnitude falls below a given threshold. This is the main algorithm proposed in the authors' paper, and since its penalty is the squared L2 norm, each least-squares step is a ridge regression (as opposed to the L1 penalty of Lasso regression).
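To make the STLSQ loop concrete, here is a minimal NumPy sketch of sequentially thresholded ridge least squares for a single target column. The function `stlsq` and its defaults are hypothetical (chosen to echo the `threshold` and `alpha` parameters described above); it illustrates the idea and is not PySINDy's actual implementation.

```python
import numpy as np


def stlsq(Theta, y, threshold=0.1, alpha=0.05, max_iter=20):
    """Sequentially thresholded ridge least squares for one target vector y."""
    n_features = Theta.shape[1]
    active = np.ones(n_features, dtype=bool)  # start with every term allowed
    w = np.zeros(n_features)
    for _ in range(max_iter):
        w = np.zeros(n_features)
        if not active.any():
            break
        A = Theta[:, active]
        # ridge solve on the active terms: (A^T A + alpha I) w = A^T y
        w[active] = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)
        # mask: keep only coefficients whose magnitude reaches the threshold
        new_active = np.abs(w) >= threshold
        if np.array_equal(new_active, active):
            break  # support has stopped changing
        active = new_active
    w[~active] = 0.0
    return w


# Synthetic check: y depends on columns 1 and 4 only, plus a little noise
rng = np.random.default_rng(0)
Theta = rng.normal(size=(200, 6))
true_w = np.array([0.0, 2.0, 0.0, 0.0, -3.0, 0.0])
y = Theta @ true_w + 0.01 * rng.normal(size=200)
w = stlsq(Theta, y, threshold=0.1)
print(np.round(w, 2))
```

The threshold sets the sparsity: coefficients that a plain least-squares fit would leave small but nonzero (fitting noise) are pruned, and the regression is repeated on the surviving terms only.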

Then we need to define the feature library, i.e. the set of candidate terms (combinations of the state variables and, possibly, of their derivatives) among which we look for those most informative about the system's evolution. Again, PySINDy provides several library classes; the default one used when instantiating the model is `PolynomialLibrary` (polynomials up to degree 2 by default).
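Conceptually, a polynomial library just evaluates every monomial of the state variables up to a given degree at each time sample. The sketch below builds such a matrix by hand with NumPy; the function `polynomial_library` and its naming scheme are hypothetical illustrations, not PySINDy's API. For three variables and degree 2 it yields ten columns: a bias, three linear terms, three squares and three cross products.

```python
import numpy as np
from itertools import combinations_with_replacement


def polynomial_library(X, names, degree=2):
    """Return (Theta, term_names): all monomials of X's columns up to `degree`, bias included."""
    columns = [np.ones(X.shape[0])]
    term_names = ['1']
    for d in range(1, degree + 1):
        # every multiset of d variable indices, e.g. (0, 0) -> x*x, (0, 1) -> x*y
        for combo in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))
            term_names.append('*'.join(names[i] for i in combo))
    return np.column_stack(columns), term_names


# Two toy samples of (x, y, z)
X = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
Theta, term_names = polynomial_library(X, ['x', 'y', 'z'])
print(term_names)
print(Theta)
```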

In [None]:
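# optimizer: sequentially thresholded least squares (STLSQ, PySINDy's default)
optimizer = ps.STLSQ(threshold=0.1)
# feature library: polynomials up to degree 2 (PySINDy's default)
library = ps.PolynomialLibrary(degree=2)

feature_names = ['x','y','z']

model = ps.SINDy(optimizer=optimizer, feature_library=library, feature_names=feature_names)
# fit on the loaded snapshots r with time vector t, then print the identified equations
model.fit(r, t=t)
model.print()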

# PDE-FIND implementation

### Data loading

In [15]:
x,y,z = np.load("./data/lorenz_r.npy").T
t = np.load("./data/lorenz_t.npy")
dt = t[1] - t[0]  # the time grid is evenly spaced

In [17]:
# np.gradient uses second-order accurate central differences at interior points
# and one-sided differences (first-order by default) at the two endpoints

xdot = np.gradient(x,dt)
ydot = np.gradient(y,dt)
zdot = np.gradient(z,dt)
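Since the accuracy of these derivatives drives the whole regression, a quick sanity check on a signal with a known derivative can help. This sketch uses $\sin t$ (our choice, unrelated to the Lorenz data) to confirm that at interior points `np.gradient` reduces to the central difference $(x_{i+1}-x_{i-1})/(2\,\Delta t)$ with $O(\Delta t^2)$ error, while the endpoints are only first-order accurate unless `edge_order=2` is passed.

```python
import numpy as np

# Known test signal: x(t) = sin(t), whose exact derivative is cos(t)
dt = 0.001
t = np.arange(1.0, 3.0, dt)
x = np.sin(t)

xdot = np.gradient(x, dt)                   # default edge_order=1
xdot_e2 = np.gradient(x, dt, edge_order=2)  # second-order one-sided edges

# Interior points: central difference (x[i+1] - x[i-1]) / (2 dt)
central = (x[2:] - x[:-2]) / (2 * dt)
print("matches central formula:", np.allclose(xdot[1:-1], central))

interior_err = np.max(np.abs(xdot[1:-1] - np.cos(t[1:-1])))  # O(dt^2)
edge_err_1 = abs(xdot[0] - np.cos(t[0]))                     # O(dt)
edge_err_2 = abs(xdot_e2[0] - np.cos(t[0]))                  # O(dt^2)
print(interior_err, edge_err_1, edge_err_2)
```

With $\Delta t \approx 10^{-3}$ as in our data, the first-order edge error is orders of magnitude larger than the interior one, which only affects the first and last samples.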

In other words, we want to fit a linear regression in a feature space made of the original state variables plus their nonlinear combinations (and, for a PDE, their derivatives), with the time derivatives as targets. We do this three times, once for each equation of the system.
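In matrix form, let $\Theta(X)$ be the matrix whose columns are the candidate features (for instance the ten terms $1, x, y, z, xy, yz, zx, x^2, y^2, z^2$) evaluated at every time sample. Each equation is then a separate regression with its own coefficient vector $\boldsymbol{\xi}_k$. With the same ridge loss used in each STLSQ least-squares step, the coefficients have a closed form:

$$
\dot{\mathbf{x}} \approx \Theta(X)\,\boldsymbol{\xi}_1, \qquad
\dot{\mathbf{y}} \approx \Theta(X)\,\boldsymbol{\xi}_2, \qquad
\dot{\mathbf{z}} \approx \Theta(X)\,\boldsymbol{\xi}_3,
$$

$$
\boldsymbol{\xi}_k = \arg\min_{\boldsymbol{\xi}} \lVert \dot{\mathbf{q}}_k - \Theta(X)\,\boldsymbol{\xi} \rVert_2^2 + \alpha \lVert \boldsymbol{\xi} \rVert_2^2
= \left(\Theta^\top \Theta + \alpha I\right)^{-1} \Theta^\top \dot{\mathbf{q}}_k,
$$

where $\dot{\mathbf{q}}_k$ stands for $\dot{\mathbf{x}}$, $\dot{\mathbf{y}}$ or $\dot{\mathbf{z}}$. For the Lorenz system, the ideal $\boldsymbol{\xi}_1$ has only two nonzero entries: $-\sigma$ on $x$ and $+\sigma$ on $y$.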

In [None]:
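import pandas as pd

features = ['bias','x','y','z','xy','yz','zx','x^2','y^2','z^2']
# build dataframe with these columns: the candidate library Theta(X)
Theta = pd.DataFrame(np.column_stack([np.ones_like(x), x, y, z, x*y, y*z, z*x, x**2, y**2, z**2]), columns=features)
# targets: one column per equation; xi will hold one coefficient vector per column
Xdot = np.column_stack([xdot, ydot, zdot])
# ridge regression on them: xi = (Theta^T Theta + alpha I)^{-1} Theta^T Xdot
alpha = 0.05  # ridge penalty (assumed value)
A = Theta.to_numpy()
xi = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ Xdot)
print(pd.DataFrame(xi, index=features, columns=['xdot', 'ydot', 'zdot']))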