## Pipelines

The `compose.Pipeline` class contains all the logic for building and applying pipelines. A pipeline is essentially a list of estimators that are applied in sequence. The only requirement is that the first ``n - 1`` steps be transformers. The last step can be a regressor, a classifier, a clusterer, a transformer, etc. Here is an example:

In [31]:
from river import compose
from river import linear_model
from river import preprocessing
from river import feature_extraction

model = compose.Pipeline(
    preprocessing.StandardScaler(),
    feature_extraction.PolynomialExtender(),
    linear_model.LinearRegression()
)

You can also use the ``|`` operator, like so:

In [3]:
model = (
    preprocessing.StandardScaler() |
    feature_extraction.PolynomialExtender() |
    linear_model.LinearRegression()
)

In [4]:
model
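
To make the ``|`` syntax less magical, here is a minimal sketch of how operator overloading can chain steps into a pipeline. The classes `Step`, `ToyPipeline`, `Scale`, `Extend` and `Regress` are hypothetical names invented for this illustration; this is not river's actual implementation, only one plausible way to get the same surface behavior.

```python
class Step:
    """Hypothetical base class: gives every step a `|` operator."""

    def __or__(self, other):
        # `a | b` builds a new pipeline holding both steps, in order.
        return ToyPipeline(self, other)


class ToyPipeline(Step):
    """Hypothetical stand-in for a pipeline: an ordered list of steps."""

    def __init__(self, *steps):
        self.steps = []
        for step in steps:
            # Flatten nested pipelines so that `a | b | c` yields three steps.
            if isinstance(step, ToyPipeline):
                self.steps.extend(step.steps)
            else:
                self.steps.append(step)


class Scale(Step):
    pass


class Extend(Step):
    pass


class Regress(Step):
    pass


# Both spellings build the same ordered list of steps.
piped = Scale() | Extend() | Regress()
explicit = ToyPipeline(Scale(), Extend(), Regress())

piped_names = [type(s).__name__ for s in piped.steps]
explicit_names = [type(s).__name__ for s in explicit.steps]
print(piped_names)  # ['Scale', 'Extend', 'Regress']
```

In this sketch the flattening step is what keeps a chained expression from producing a pipeline nested inside another pipeline.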

In [32]:
from river import datasets
dataset = datasets.TrumpApproval()
x, y = next(iter(dataset))
x, y 

({'ordinal_date': 736389,
  'gallup': 43.843213,
  'ipsos': 46.19925042857143,
  'morning_consult': 48.318749,
  'rasmussen': 44.104692,
  'you_gov': 43.636914000000004},
 43.75505)

`compose.Pipeline` inherits from `base.Estimator`, which means that it has a `learn_one` method. You might expect `learn_one` to update every estimator, but that is not what happens. Instead, the transformers are updated when `predict_one` (or `predict_proba_one`) is called. The reason is that, in online machine learning, the unsupervised parts of a model can be updated as soon as a sample arrives; there is no need to wait for the ground truth before updating estimators that don't depend on it. In other words, in a pipeline, `learn_one` updates the supervised parts, whilst `predict_one` updates the unsupervised parts. It's important to be aware of this behavior, as it differs from what is done in libraries that rely on batch machine learning.

Note to self: I had not fully grasped the distinction between the unsupervised part and the supervised part. In online machine learning a model has two parts: an unsupervised part, which learns only from the incoming sample, and a supervised part, which needs to know the ground truth.

Coming back to the methods: in a pipeline, `predict_one` updates the **unsupervised** part, while `learn_one` updates the **supervised** part.
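
The split described above can be illustrated with a small, self-contained sketch. Everything here (`RunningMeanCenterer`, `ToyRegressor`, `ToyPipeline`, the `n_updates` counter) is a hypothetical toy written for this note and does not reproduce river's code; it only mimics the behavior the text describes, where `predict_one` updates the unsupervised step and `learn_one` updates the supervised one.

```python
class RunningMeanCenterer:
    """Toy unsupervised transformer: subtracts a running mean per feature."""

    def __init__(self):
        self.n = 0
        self.means = {}

    def learn_one(self, x):
        # Incremental mean update; no label is required.
        self.n += 1
        for key, value in x.items():
            mean = self.means.get(key, 0.0)
            self.means[key] = mean + (value - mean) / self.n

    def transform_one(self, x):
        return {k: v - self.means.get(k, 0.0) for k, v in x.items()}


class ToyRegressor:
    """Toy supervised model: linear regression trained with plain SGD."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.weights = {}
        self.n_updates = 0

    def predict_one(self, x):
        return sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.n_updates += 1


class ToyPipeline:
    def __init__(self, transformer, model):
        self.transformer = transformer
        self.model = model

    def predict_one(self, x):
        # No label needed: the unsupervised step learns from x right away.
        self.transformer.learn_one(x)
        return self.model.predict_one(self.transformer.transform_one(x))

    def learn_one(self, x, y):
        # The label has arrived: only the supervised step is updated.
        self.model.learn_one(self.transformer.transform_one(x), y)


pipeline = ToyPipeline(RunningMeanCenterer(), ToyRegressor())

pipeline.predict_one({"a": 2.0})
pipeline.predict_one({"a": 4.0})
seen_after_predict = pipeline.transformer.n       # 2: the centerer saw both samples
updates_after_predict = pipeline.model.n_updates  # 0: the regressor is untouched

pipeline.learn_one({"a": 4.0}, 1.0)
seen_after_learn = pipeline.transformer.n         # still 2
updates_after_learn = pipeline.model.n_updates    # 1
```

The counters make the point visible: two predictions move the running mean without touching the weights, and one `learn_one` call moves the weights without touching the running mean.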

In [40]:
model.predict_one(x)

for x, y in dataset:
    model.learn_one(x, y)
    model.predict_one(x)
# Calling predict_one updates the transformers (the unsupervised part),
# but it does not update the linear regression, i.e. the supervised model.

In [41]:
model['StandardScaler'].means

defaultdict(float,
            {'ordinal_date': 736888.5029821073,
             'gallup': 40.80096647912525,
             'ipsos': 40.709485920382456,
             'morning_consult': 41.08428728370769,
             'rasmussen': 40.74683911729628,
             'you_gov': 40.79994094314171})

In [42]:
model.transform_one(x)

{'ordinal_date': 1.7201331526842627,
 'gallup': 1.1796051730721018,
 'ipsos': -0.05278731877136413,
 'morning_consult': -1.2183386193740897,
 'rasmussen': -0.24187513836415178,
 'you_gov': 0.4355392002918189,
 'ordinal_date*ordinal_date': 2.958858062963501,
 'gallup*ordinal_date': 2.0290779652791797,
 'ipsos*ordinal_date': -0.09080121705993574,
 'morning_consult*ordinal_date': -2.0957046503809447,
 'ordinal_date*rasmussen': -0.41605744431027064,
 'ordinal_date*you_gov': 0.749185417715549,
 'gallup*gallup': 1.391468364338463,
 'gallup*ipsos': -0.062268194295307194,
 'gallup*morning_consult': -1.4371585379671987,
 'gallup*rasmussen': -0.2853171644518838,
 'gallup*you_gov': 0.5137642937399158,
 'ipsos*ipsos': 0.002786501023069612,
 'ipsos*morning_consult': 0.06431282907236374,
 'ipsos*rasmussen': 0.012767940031696286,
 'ipsos*you_gov': -0.022990946603229256,
 'morning_consult*morning_consult': 1.484348991458363,
 'morning_consult*rasmussen': 0.2946858221354976,
 'morning_consult*you_gov':

In many cases, you might want to connect a step to multiple steps. For instance, you might want to extract different kinds of features from a single input. An elegant way to do this is to use a `compose.TransformerUnion`. Essentially, the latter is a list of transformers whose results are merged into a single dict when `transform_one` is called. As an example, let's say that we want to apply a `feature_extraction.RBFSampler` as well as a `feature_extraction.PolynomialExtender`. This may be done like so:

In [28]:
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    (feature_extraction.PolynomialExtender() + feature_extraction.RBFSampler()),
    linear_model.LinearRegression()
)

model

In [29]:
# Equivalent to the cell above, with the union spelled out instead of using `+`.
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    compose.TransformerUnion(
        feature_extraction.PolynomialExtender(),
        feature_extraction.RBFSampler()
    ),
    linear_model.LinearRegression()
)

model
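
As a rough illustration of what "merged into a single dict" means, the sketch below applies two toy transformers to the same input and combines their outputs. `AddOne`, `Square` and `ToyUnion` are hypothetical names made up for this example; the sketch does not claim to match how `compose.TransformerUnion` is implemented internally.

```python
class AddOne:
    """Toy transformer: emits each feature plus one, under a new name."""

    def transform_one(self, x):
        return {f"{k}+1": v + 1 for k, v in x.items()}


class Square:
    """Toy transformer: emits the square of each feature, under a new name."""

    def transform_one(self, x):
        return {f"{k}^2": v * v for k, v in x.items()}


class ToyUnion:
    """Toy union: runs every transformer on the same input and merges outputs."""

    def __init__(self, *transformers):
        self.transformers = transformers

    def transform_one(self, x):
        merged = {}
        for transformer in self.transformers:
            # Each transformer sees the original x, not the previous output.
            merged.update(transformer.transform_one(x))
        return merged


union = ToyUnion(AddOne(), Square())
features = union.transform_one({"a": 2.0, "b": 3.0})
print(features)
# {'a+1': 3.0, 'b+1': 4.0, 'a^2': 4.0, 'b^2': 9.0}
```

Note that in this sketch a later transformer silently overwrites an earlier one if both emit the same key, so giving the outputs distinct names matters.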