[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F08_pipeline.ipynb)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/08_pipeline.ipynb)

# 🤖⚡ scikit-learn tip #8 ([video](https://www.youtube.com/watch?v=1Y6O9nCo0-I&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=8))

Q: What does "pipeline" do?

A: It chains together multiple steps, so the output of each step is used as the input to the next step.

This makes it easy to apply the same preprocessing to both the training data and the test data!
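
As a rough mental model (a simplified sketch, not the library's internal code), a two-step pipeline behaves like fitting the first step on the training data, feeding its output into the second step, and then reusing the already-fitted first step on new data. The toy data below mirrors the example in this tip; `manual_preds` and `pipe_preds` are illustrative names:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data with the same values as the example in this tip
X = pd.DataFrame({'feat1': [10, 20, np.nan, 2], 'feat2': [25., 20, 5, 3]})
y = ['A', 'A', 'B', 'B']
X_new = pd.DataFrame({'feat1': [30., 5, 15], 'feat2': [12, 10, np.nan]})

# Manual version: fit the imputer on the training data only,
# then pass its output to the classifier
imputer = SimpleImputer()
clf = LogisticRegression()
clf.fit(imputer.fit_transform(X), y)
# transform (not fit_transform) reuses the means learned from X
manual_preds = clf.predict(imputer.transform(X_new))

# Pipeline version: the same two steps wrapped in a single object
pipe = make_pipeline(SimpleImputer(), LogisticRegression())
pipe.fit(X, y)
pipe_preds = pipe.predict(X_new)

print((manual_preds == pipe_preds).all())
```

The pipeline saves you from remembering to call `fit_transform` on the training data but only `transform` on new data.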

See example 👇

In [1]:
import pandas as pd
import numpy as np

In [2]:
train = pd.DataFrame({'feat1':[10, 20, np.nan, 2], 'feat2':[25., 20, 5, 3], 'label':['A', 'A', 'B', 'B']})
test = pd.DataFrame({'feat1':[30., 5, 15], 'feat2':[12, 10, np.nan]})

In [3]:
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

In [4]:
imputer = SimpleImputer()
clf = LogisticRegression()

In [5]:
# 2-step pipeline: impute missing values, then pass the results to the classifier
pipe = make_pipeline(imputer, clf)
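
`make_pipeline` names each step automatically, using the lowercased class name. The sketch below compares that with building a `Pipeline` by hand; the names `'imputer'` and `'clf'` are arbitrary choices made for illustration, not names scikit-learn requires:

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline

# make_pipeline derives step names from the class names
pipe = make_pipeline(SimpleImputer(), LogisticRegression())
print(list(pipe.named_steps))  # ['simpleimputer', 'logisticregression']

# Equivalent Pipeline with hand-picked (hypothetical) step names
pipe_explicit = Pipeline([
    ('imputer', SimpleImputer()),
    ('clf', LogisticRegression()),
])
print(list(pipe_explicit.named_steps))  # ['imputer', 'clf']

# Step names also prefix parameter names: <step name>__<parameter name>
pipe.set_params(logisticregression__C=0.5)
print(pipe.get_params()['logisticregression__C'])  # 0.5
```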

In [6]:
train

|   | feat1 | feat2 | label |
|---|-------|-------|-------|
| 0 | 10.0  | 25.0  | A     |
| 1 | 20.0  | 20.0  | A     |
| 2 | NaN   | 5.0   | B     |
| 3 | 2.0   | 3.0   | B     |


In [7]:
test

|   | feat1 | feat2 |
|---|-------|-------|
| 0 | 30.0  | 12.0  |
| 1 | 5.0   | 10.0  |
| 2 | 15.0  | NaN   |


In [8]:
features = ['feat1', 'feat2']

In [9]:
X, y = train[features], train['label']
X_new = test[features]

In [10]:
# pipeline applies the imputer to X before fitting the classifier
pipe.fit(X, y)

# pipeline applies the imputer to X_new before making predictions
# note: pipeline uses imputation values learned during the "fit" step
pipe.predict(X_new)

array(['A', 'B', 'A'], dtype=object)
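
To see which imputation values the pipeline learned during `fit`, you can pull the fitted imputer out of `named_steps` and inspect its `statistics_` attribute. This sketch rebuilds the same toy data so it runs on its own; with the default `strategy='mean'`, the learned values are the training column means, and they are reused unchanged when transforming new data:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Same toy data as in this tip
X = pd.DataFrame({'feat1': [10, 20, np.nan, 2], 'feat2': [25., 20, 5, 3]})
y = ['A', 'A', 'B', 'B']
X_new = pd.DataFrame({'feat1': [30., 5, 15], 'feat2': [12, 10, np.nan]})

pipe = make_pipeline(SimpleImputer(), LogisticRegression())
pipe.fit(X, y)

# Column means learned from the TRAINING data:
# feat1 = (10 + 20 + 2) / 3, feat2 = (25 + 20 + 5 + 3) / 4 = 13.25
imputer = pipe.named_steps['simpleimputer']
print(imputer.statistics_)

# The missing feat2 value in X_new is filled with the training mean (13.25),
# not with the mean of X_new's own feat2 column ((12 + 10) / 2 = 11.0)
print(imputer.transform(X_new))
```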

### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌

© 2020 [Data School](https://www.dataschool.io). All rights reserved.