## Pipelines
* Pipeline of transforms with a final estimator.
* Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement `fit` and `transform` methods. The final estimator only needs to implement `fit`.
* Without a pipeline, every transformation has to be applied by hand twice: once to the training data and once to the test data.
* Benefits: reproducibility, cleaner code, and a guaranteed order of transformations.
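
To make the points above concrete, here is a minimal sketch comparing the manual approach with a pipeline. It uses synthetic data rather than the iris file loaded below, and the names `X_train`, `X_test`, `manual_pred` and `pipe_pred` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (assumption: not the iris data used below).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 3))
X_test = rng.normal(size=(20, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)

# Manual approach: fit the scaler on the training data, then remember to
# reuse the same fitted scaler (transform only) on the test data.
scaler = StandardScaler()
model = LinearRegression()
model.fit(scaler.fit_transform(X_train), y_train)
manual_pred = model.predict(scaler.transform(X_test))

# Pipeline approach: one object applies both steps in the declared order.
pipe = Pipeline([("scaler", StandardScaler()), ("regression", LinearRegression())])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Both routes perform the same computations, so the predictions should agree.
same = np.allclose(manual_pred, pipe_pred)
print(same)
```

The pipeline version removes the step that is easiest to get wrong by hand: calling `transform` (not `fit_transform`) on the test data with the scaler fitted on the training data.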

In [42]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

In [28]:
df = pd.read_csv("data/iris.csv")
df.head()

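| Unnamed: 0 | sepal.length | sepal.width | petal.length | petal.width | variety |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |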


In [44]:
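# Features: every column except the species label and the target.
# The `columns=` keyword replaces the positional axis argument, which recent pandas versions no longer accept.
x = df.drop(columns=["variety", "petal.width"])
# Target: petal width.
y = df["petal.width"]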

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

In [45]:
scaler = StandardScaler()
lr = LinearRegression()

In [46]:
pipe = Pipeline([('normalizer', scaler), ("regression", lr)])

In [47]:
pipe.fit(x_train, y_train)

Pipeline(memory=None,
         steps=[('normalizer',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('regression',
                 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                                  normalize=False))],
         verbose=False)

In [48]:
pipe.predict(x_test)

array([1.80468006, 1.84626861, 1.96384163, 0.16258293, 0.1672353 ,
       0.18792255, 1.36008695, 0.11286992, 1.315206  , 0.28228527,
       1.19741891, 0.0140517 , 1.61225128, 2.01198062, 1.67411613,
       2.15238738, 2.03889426, 1.2297199 , 1.26338086, 1.13050793,
       1.83560389, 1.52000068, 0.22697085, 1.65247986, 1.14326759,
       1.79634153, 0.29157283, 0.26392421, 1.33084712, 1.00150085])
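
A fitted pipeline can also be scored and inspected, not just used for prediction. The following is a minimal sketch under stated assumptions: it uses synthetic stand-in data instead of `data/iris.csv`, a fixed `random_state` so the split is repeatable, and illustrative variable names (`r2`, `scaler_means`, `coefficients`) that do not appear in the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the iris file loaded above).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 3))
y = X @ np.array([0.3, -0.1, 0.4]) + rng.normal(scale=0.05, size=150)

# A fixed random_state makes the train/test split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = Pipeline([("normalizer", StandardScaler()), ("regression", LinearRegression())])
pipe.fit(X_train, y_train)

# score() passes X_test through the fitted scaler, then returns the
# final estimator's score (R^2 for LinearRegression).
r2 = pipe.score(X_test, y_test)

# Fitted steps stay accessible under the names given at construction.
scaler_means = pipe.named_steps["normalizer"].mean_
coefficients = pipe.named_steps["regression"].coef_
print(r2, scaler_means, coefficients)
```

Looking up steps through `named_steps` is why the step names chosen when building the pipeline matter: they are the handles for reaching the fitted scaler statistics and the regression coefficients afterwards.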