**Table of contents**<a id='toc0_'></a>    
- [Building a Classifier with a Pipeline](#toc1_)    
- [Pipeline steps](#toc2_)    
- [set_params](#toc3_)    
- [Splitting the Data](#toc4_)    
- [Accuracy Score](#toc5_)    


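**A pipeline chains the scaler and the classifier (clf) so the whole procedure can be used like `a single command`. One `fit` call scales the data and trains the classifier. One `predict` call applies the same scaling before predicting. The train/test split is not a pipeline step: it is done beforehand, and the pipeline is then fit on the training set.**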
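The sketch below shows that `pipe.fit` and `pipe.predict` do the same work as calling each step by hand. It uses synthetic data from `make_classification` in place of the wine data. That substitution is an assumption made so the example runs offline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data (assumption: stands in for the wine data)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pipeline version: one fit/predict call runs every step in order
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier(random_state=0)),
])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# Manual version: the same steps written out by hand
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)              # learn mean/std, then scale
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_scaled, y)
manual_pred = clf.predict(scaler.transform(X))  # reuse the learned mean/std

print(np.array_equal(pipe_pred, manual_pred))   # True
```

At prediction time the scaler only calls `transform`. It reuses the statistics it learned during `fit`.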

In [2]:
import pandas as pd

In [None]:
red_url = 'https://raw.githubusercontent.com/PinkWink/forML_study_data/refs/heads/main/data/winequality-red.csv'
white_url = 'https://raw.githubusercontent.com/PinkWink/forML_study_data/refs/heads/main/data/winequality-white.csv'

red_wine = pd.read_csv(red_url, sep=';')      # the CSV files are ';'-separated
white_wine = pd.read_csv(white_url, sep=';')

red_wine['color'] = 1    # label: 1 = red
white_wine['color'] = 0  # label: 0 = white

wine = pd.concat([red_wine, white_wine])

X = wine.drop(['color'], axis=1)
y = wine['color']
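`pd.concat` keeps each frame's original row labels, so the combined `wine` frame has repeated index values. The later steps in this notebook do not look rows up by label. Still, the repeated labels can be confusing if you later use `.loc`. The sketch below illustrates this on two tiny hypothetical frames, not the real wine data, and also checks the class balance of the label column.

```python
import pandas as pd

# Two tiny stand-in frames (hypothetical values, not the wine data)
red = pd.DataFrame({'alcohol': [9.4, 9.8]})
white = pd.DataFrame({'alcohol': [8.8, 9.5, 10.1]})
red['color'] = 1
white['color'] = 0

# Default concat keeps each frame's original index, so labels repeat
wine_default = pd.concat([red, white])
print(wine_default.index.tolist())   # [0, 1, 0, 1, 2]

# ignore_index=True renumbers the rows 0..n-1 instead
wine = pd.concat([red, white], ignore_index=True)
print(wine.index.tolist())           # [0, 1, 2, 3, 4]

# Class balance of the label column (3 white rows, 2 red rows)
print(wine['color'].value_counts())
```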

# <a id='toc1_'></a>[Building a Classifier with a Pipeline](#toc0_)

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler

estimators = [
    ('scaler', StandardScaler()), 
    ('clf', DecisionTreeClassifier())
]

pipe = Pipeline(estimators)
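The cell above names each step explicitly (`'scaler'`, `'clf'`). For comparison, scikit-learn also provides `make_pipeline`, which generates step names automatically from the lowercased class names. A short sketch:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# make_pipeline derives each step name from its lowercased class name
pipe_auto = make_pipeline(StandardScaler(), DecisionTreeClassifier())

step_names = [name for name, _ in pipe_auto.steps]
print(step_names)   # ['standardscaler', 'decisiontreeclassifier']
```

With auto-generated names, the parameter keys become longer, for example `decisiontreeclassifier__max_depth`. This is one reason to name the steps yourself with `Pipeline`.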

# <a id='toc2_'></a>[Pipeline steps](#toc0_)

In [5]:
pipe.steps

[('scaler', StandardScaler()), ('clf', DecisionTreeClassifier())]

In [7]:
pipe.steps[0]

('scaler', StandardScaler())

In [8]:
pipe.steps[1]

('clf', DecisionTreeClassifier())

In [None]:
pipe[0]
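`pipe.steps[0]` returns a `(name, estimator)` tuple, while indexing the pipeline directly (`pipe[0]`) returns the estimator object itself. The sketch below shows the other ways scikit-learn's `Pipeline` lets you reach a step.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier()),
])

# Several equivalent ways to reach the classifier step
by_position = pipe[1]               # integer index -> the estimator itself
by_name = pipe['clf']               # step name used as a key
by_attr = pipe.named_steps['clf']   # dict-like access to the steps
last = pipe[-1]                     # negative index -> the final estimator

print(by_position is by_name is by_attr is last)   # True

# Slicing returns a new Pipeline containing only the selected steps
preprocessing = pipe[:1]
print(type(preprocessing).__name__, [n for n, _ in preprocessing.steps])
# Pipeline ['scaler']
```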

# <a id='toc3_'></a>[set_params](#toc0_)

In [None]:
pipe.set_params(clf__max_depth=2)
pipe.set_params(clf__random_state=13)
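`set_params` addresses a parameter of a step as `<step name>__<parameter>`, with two underscores. For example, `clf__max_depth` sets `max_depth` on the step named `clf`. The sketch below sets both values in one call and reads them back with `get_params()`.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier()),
])

# Several nested parameters can be set in one call;
# set_params returns the pipeline itself, so calls can be chained
returned = pipe.set_params(clf__max_depth=2, clf__random_state=13)
print(returned is pipe)   # True

# The nested values are visible through get_params() ...
params = pipe.get_params()
print(params['clf__max_depth'], params['clf__random_state'])   # 2 13

# ... and on the step object itself
print(pipe['clf'].max_depth)   # 2
```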

# <a id='toc4_'></a>[Splitting the Data](#toc0_)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13, stratify=y)
pipe.fit(X_train, y_train)
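Because `pipe.fit` is called only on `X_train`, the scaler inside the pipeline learns its mean and standard deviation from the training split alone. This keeps test-set statistics out of preprocessing. The sketch below checks this and compares the class ratios produced by `stratify=y`. It uses synthetic, imbalanced data (roughly 75% / 25%) as an assumed stand-in for the wine data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced toy data (assumption: stands in for the wine data)
X, y = make_classification(n_samples=400, n_features=5,
                           weights=[0.75], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13, stratify=y)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier(max_depth=2, random_state=13)),
])
pipe.fit(X_train, y_train)

# The scaler's statistics come from the training split only
print(np.allclose(pipe['scaler'].mean_, X_train.mean(axis=0)))   # True

# stratify=y keeps the class-1 ratio nearly the same in both splits
print(y_train.mean(), y_test.mean())
```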

# <a id='toc5_'></a>[Accuracy Score](#toc0_)

In [15]:
from sklearn.metrics import accuracy_score

y_pred_tr = pipe.predict(X_train)
y_pred_test = pipe.predict(X_test)

accuracy_score(y_train, y_pred_tr), accuracy_score(y_test, y_pred_test)

(0.9657494708485664, 0.9576923076923077)
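For a classifier, the pipeline's own `score` method runs the scaler first and then returns the final step's accuracy. It therefore matches `accuracy_score` on the same predictions. The sketch below runs on synthetic data, an assumption, so its numbers will differ from the wine results above. It also uses `cross_val_score`, which refits the whole pipeline, scaler included, on each fold.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data (assumption: stands in for the wine data)
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13, stratify=y)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier(max_depth=2, random_state=13)),
])
pipe.fit(X_train, y_train)

# Pipeline.score scales X_test, then returns the classifier's accuracy
test_acc = pipe.score(X_test, y_test)
print(test_acc == accuracy_score(y_test, pipe.predict(X_test)))   # True

# Each of the 5 folds fits a fresh copy of the full pipeline
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print(cv_scores.mean())
```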