# Pipeline

The `Pipeline` class simplifies building models in which a feature-transformation phase precedes the training phase.

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html

In [1]:
import numpy as np
import pandas as pd

In [2]:
from sklearn.preprocessing import PolynomialFeatures

In [3]:
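# Ground-truth function that the polynomial model will approximate
true_fun = lambda X: np.cos(1.5 * np.pi * X)
n_samples = 20
X = np.sort(np.random.rand(n_samples))
# Noisy observations of the ground-truth function
y = true_fun(X) + np.random.randn(n_samples) * 0.1
# Reshape X into a single-column 2D array, as scikit-learn expects
X = np.vstack(X)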

In [4]:
poly = PolynomialFeatures(5)

In [5]:
X_poly = pd.DataFrame(poly.fit_transform(X))
X_poly.head()

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.058542 | 0.003427 | 0.000201 | 1.2e-05 | 6.876268e-07 |
| 1 | 1.0 | 0.069404 | 0.004817 | 0.000334 | 2.3e-05 | 1.61031e-06 |
| 2 | 1.0 | 0.137382 | 0.018874 | 0.002593 | 0.000356 | 4.893757e-05 |
| 3 | 1.0 | 0.248296 | 0.061651 | 0.015308 | 0.003801 | 0.0009437236 |
| 4 | 1.0 | 0.251581 | 0.063293 | 0.015923 | 0.004006 | 0.001007827 |


In [6]:
from sklearn.linear_model import LinearRegression

In [7]:
reg = LinearRegression().fit(X_poly, y)

In [8]:
reg.score(X_poly, y)

0.9875129528191277
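With this manual approach, any new data has to pass through the same fitted transformer before it reaches the regressor. A minimal sketch of that extra step, assuming freshly generated synthetic data (the names `rng`, `X_demo`, `poly_demo`, `reg_demo`, and `X_new` are illustrative and not part of the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data, generated the same way as in the cells above
rng = np.random.default_rng(0)
X_demo = np.sort(rng.random(20)).reshape(-1, 1)
y_demo = np.cos(1.5 * np.pi * X_demo.ravel()) + rng.normal(scale=0.1, size=20)

# Fit the transformer and the regressor as two separate objects
poly_demo = PolynomialFeatures(5)
reg_demo = LinearRegression().fit(poly_demo.fit_transform(X_demo), y_demo)

# New raw inputs must be expanded with transform() (not fit_transform())
# before they can be passed to the regressor
X_new = np.array([[0.25], [0.75]])
y_new = reg_demo.predict(poly_demo.transform(X_new))
print(y_new)
```

Forgetting the `transform` call would hand the regressor one column instead of the six it was trained on, which is the kind of bookkeeping a pipeline removes.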

Alternatively, the same result can be obtained with a pipeline:

In [9]:
from sklearn.pipeline import make_pipeline

In [10]:
model = make_pipeline(PolynomialFeatures(5), LinearRegression())

The `make_pipeline` function returns an object of class `Pipeline`.

In [11]:
type(model)

sklearn.pipeline.Pipeline
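As a rough illustration of what `make_pipeline` does, the sketch below compares it with calling the `Pipeline` constructor directly. `make_pipeline` derives step names from the lowercased class names, while the constructor takes explicit `(name, estimator)` pairs; the names `"poly"` and `"linreg"` are arbitrary choices made for this example.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# make_pipeline generates the step names automatically
auto = make_pipeline(PolynomialFeatures(5), LinearRegression())
auto_names = [name for name, _ in auto.steps]
print(auto_names)

# The Pipeline constructor expects (name, estimator) pairs instead
explicit = Pipeline([
    ("poly", PolynomialFeatures(5)),   # illustrative step name
    ("linreg", LinearRegression()),    # illustrative step name
])
explicit_names = [name for name, _ in explicit.steps]
print(explicit_names)
```

Explicit names are mostly a matter of convenience, for example when individual steps are accessed later through `named_steps`.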

In [12]:
model.fit(X, y)
model.score(X, y)

0.9875129528191277
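The two scores match because the pipeline performs the same two steps internally. A small self-contained check of that equivalence, assuming seeded synthetic data (all variable names below are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: noisy cosine sampled at 20 random points
rng = np.random.default_rng(42)
X_chk = np.sort(rng.random(20)).reshape(-1, 1)
y_chk = np.cos(1.5 * np.pi * X_chk.ravel()) + rng.normal(scale=0.1, size=20)

# Manual route: transform first, then fit and score on the transformed data
X_chk_poly = PolynomialFeatures(5).fit_transform(X_chk)
manual_score = LinearRegression().fit(X_chk_poly, y_chk).score(X_chk_poly, y_chk)

# Pipeline route: fit and score directly on the raw data
pipe_chk = make_pipeline(PolynomialFeatures(5), LinearRegression())
pipe_score = pipe_chk.fit(X_chk, y_chk).score(X_chk, y_chk)

print(np.isclose(manual_score, pipe_score))
```

The pipeline can also predict from raw inputs, for example `pipe_chk.predict(np.array([[0.5]]))`, without a separate transformation call.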

## Exercise

Create a pipeline consisting of:
 - Degree-5 polynomial feature expansion
 - Standardization of the data
 - A linear model with L2 regularization and parameter alpha = 100

Split the 'boston' dataset into a training part and a test part, then fit the model on the training set and evaluate it on both the training and the test part, using R^2 as the metric.

In [13]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version
from sklearn.datasets import load_boston

In [14]:
data = load_boston()
X = data["data"]
X = pd.DataFrame(X)
X.columns = data["feature_names"]
y = data["target"]

In [15]:
X.head()

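| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |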


In [31]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn import metrics

# Seed NumPy's global RNG so that train_test_split produces a reproducible split
np.random.seed(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Degree-5 polynomial features -> standardization -> ridge regression (L2, alpha=100)
model = make_pipeline(PolynomialFeatures(5), StandardScaler(), Ridge(alpha=100))
model.fit(X_train, y_train)

# R^2 on the training and test sets
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

# Mean squared error on the training and test sets
print(metrics.mean_squared_error(y_train, model.predict(X_train)))
print(metrics.mean_squared_error(y_test, model.predict(X_test)))


0.942841686960729
0.827958094558203
4.398952305779598
17.992270295394665
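Because `load_boston` is no longer available in scikit-learn 1.2 and later, the exercise cannot be rerun as written on a current installation. The sketch below repeats the same pipeline on a synthetic regression problem instead. The dataset from `make_regression`, its size, and the `_syn` names are assumptions made purely for illustration, so the scores are not comparable with the Boston results above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the Boston data (illustrative assumption)
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# Same three steps as in the exercise
pipe_syn = make_pipeline(PolynomialFeatures(5), StandardScaler(), Ridge(alpha=100))
pipe_syn.fit(X_tr, y_tr)

# R^2 on the training and test parts
r2_train = pipe_syn.score(X_tr, y_tr)
r2_test = pipe_syn.score(X_te, y_te)
print(r2_train, r2_test)
```

Passing `random_state` to `train_test_split` is an alternative to seeding NumPy's global generator and keeps the split reproducible regardless of other random calls.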
