Using `glm_utils` in an sklearn `Pipeline` (`sklearn.pipeline.Pipeline`).

In [4]:
from tempfile import TemporaryDirectory
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures
from sklearn.pipeline import Pipeline

from glm_utils.preprocessing import time_delay_embedding

# Generate dummy data.
x = np.random.random((1000, 1))
y = np.random.random((1000,))

# Steps that modify both X *and* y must run before the pipeline.
X, y = time_delay_embedding(x, y, window_size=100)

# Transformations that only change X are included in the pipeline.
steps = [('quad_exp', PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)),
         ('ridge', BayesianRidge())]

with TemporaryDirectory() as tempdir:
    model = Pipeline(steps, memory=tempdir)
    model.fit(X, y)
    y_pred, y_pred_std = model.predict(X, return_std=True)
    print(f'r2={model.score(X, y):1.2}')

r2=1.0


## Notes
Processing steps that manipulate both `X` *and* `y`, such as time delay embedding, balancing, or a train-test split, are currently not supported by sklearn pipelines (see [this scikit-learn issue](https://github.com/scikit-learn/scikit-learn/issues/4143)). Perform these steps before the pipeline.
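
To illustrate why such steps sit outside the pipeline, here is a minimal NumPy sketch of a delay embedding that changes the number of rows of both `X` and `y`. The function `delay_embed` is a hypothetical stand-in, not the implementation of `glm_utils.preprocessing.time_delay_embedding`, and its alignment convention (each window is paired with the target at the window's last sample) is an assumption.

```python
import numpy as np


def delay_embed(x, y, window_size):
    """Hypothetical stand-in: stack sliding windows of x and trim y to match."""
    x = np.asarray(x).ravel()
    y = np.asarray(y)
    # Shape (n_samples - window_size + 1, window_size); row i covers x[i:i + window_size].
    X = np.lib.stride_tricks.sliding_window_view(x, window_size)
    # Assumed convention: pair each window with the target at its last sample.
    y_trimmed = y[window_size - 1:]
    return X, y_trimmed


rng = np.random.default_rng(0)
x = rng.random((1000, 1))
y = rng.random(1000)

X, y_emb = delay_embed(x, y, window_size=100)
print(X.shape, y_emb.shape)  # (901, 100) (901,)
```

Because the embedding drops the first `window_size - 1` targets, `y` has to be trimmed together with `X`. A pipeline transformer only returns a new `X`, so it cannot do this.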

Processing steps that only affect `X`, such as feature normalization or scaling, basis function projections, PCA, or polynomial expansions, can be included in the pipeline. Custom functions should follow the signature `X = func(X, **kwargs)` and can be integrated into a pipeline with the [FunctionTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html#sklearn.preprocessing.FunctionTransformer). Alternatively, use a custom class implementing the [Transformer interface](https://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator). Custom classes have the advantage of exposing the parameters of the transform, e.g. the basis vectors used in a basis function projection.
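
Both options can be sketched as follows. The function `scale_features` and the class `BasisProjection` are hypothetical names invented for this illustration and are not part of `glm_utils`. The raised-cosine basis and its width are arbitrary choices, used only to show a fitted attribute being read back from the pipeline.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def scale_features(X, factor=1.0):
    """Hypothetical X-only function following `X = func(X, **kwargs)`."""
    return X * factor


class BasisProjection(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: project each row of X onto raised-cosine bumps."""

    def __init__(self, n_bases=8):
        self.n_bases = n_bases

    def fit(self, X, y=None):
        n_features = np.asarray(X).shape[1]
        t = np.linspace(0, 1, n_features)
        centers = np.linspace(0, 1, self.n_bases)
        width = 2.0 / self.n_bases  # arbitrary bump width, chosen for illustration
        phase = np.clip((t[:, None] - centers[None, :]) / width, -0.5, 0.5)
        # Basis vectors are stored as columns, shape (n_features, n_bases).
        self.basis_ = 0.5 * (1 + np.cos(2 * np.pi * phase))
        return self

    def transform(self, X):
        return np.asarray(X) @ self.basis_


rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.random(200)

model = Pipeline([
    ('scale', FunctionTransformer(scale_features, kw_args={'factor': 2.0})),
    ('basis', BasisProjection(n_bases=8)),
    ('ridge', BayesianRidge()),
])
model.fit(X, y)
y_pred, y_pred_std = model.predict(X, return_std=True)

# The fitted basis vectors remain accessible through the pipeline step.
basis = model.named_steps['basis'].basis_
print(basis.shape)  # (50, 8)
```

The `FunctionTransformer` step is stateless, which suits simple element-wise operations. The class-based step stores `basis_` during `fit`, so the basis can be inspected or plotted after training.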
