
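#**[Concept Summary]**

##**[Voting-Based Classifiers]**

- Hard voting: a hard voting classifier combines the predictions of several classifiers and predicts the class that receives the most votes (majority vote)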
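The majority-vote rule can be sketched in a few lines of NumPy. This is a minimal illustration, assuming non-negative integer class labels; the `predictions` matrix is made-up data, not output from this notebook, and `hard_vote` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical predictions: rows = 3 classifiers, columns = 5 instances
predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
])

def hard_vote(preds):
    """Return the most frequent label in each column (majority vote)."""
    # np.bincount counts the votes for each class label;
    # argmax picks the label with the most votes (ties go to the smallest label)
    return np.array([np.bincount(col).argmax() for col in preds.T])

print(hard_vote(predictions))  # [0 1 0 0 1]
```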

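##**[Bagging and Pasting]**

- Bagging: sampling from the training set with replacement (repeated for each predictor)

- Pasting: sampling without replacement

- Aggregation function: the statistical mode (the most frequent prediction, as in hard voting) for classification, the average for regression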
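A minimal NumPy sketch of the two sampling schemes and the two aggregation functions. The training-set size, sample sizes, and per-predictor outputs are illustrative assumptions, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 10                       # training-set size (illustrative)
indices = np.arange(m)

# Bagging: draw m indices WITH replacement, so some repeat and some are left out
bagging_sample = rng.choice(indices, size=m, replace=True)

# Pasting: draw WITHOUT replacement, so each index appears at most once
pasting_sample = rng.choice(indices, size=8, replace=False)

# Aggregation of the predictors' outputs for one instance
class_preds = np.array([1, 0, 1, 1, 0])          # hypothetical votes from 5 classifiers
reg_preds = np.array([2.0, 2.5, 3.0, 1.5, 2.0])  # hypothetical outputs from 5 regressors

class_vote = np.bincount(class_preds).argmax()   # statistical mode -> 1
reg_mean = reg_preds.mean()                      # average -> 2.2

print(bagging_sample, pasting_sample, class_vote, reg_mean)
```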

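###**[Bagging and Pasting in Scikit-Learn]**

- Scikit-Learn offers a simple API for both through BaggingClassifier (BaggingRegressor for regression); setting bootstrap=False gives pasting instead of bagging

- Bagging is generally preferred, since it often produces better models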
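The only switch between bagging and pasting in `BaggingClassifier` is the `bootstrap` parameter. The sketch below compares both on a synthetic moons dataset; the dataset size, noise, `n_estimators=100`, and `max_samples=0.8` are assumptions chosen to keep it fast. With pasting, `max_samples` must be below 1.0, otherwise every tree would see the identical training set.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
# bootstrap=True -> bagging (with replacement); bootstrap=False -> pasting
for name, bootstrap in [("bagging", True), ("pasting", False)]:
    clf = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=100,
        max_samples=0.8,         # each tree is trained on 80% of the training set
        bootstrap=bootstrap, random_state=42)
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # test-set accuracy

print(scores)
```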

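##**[OOB Evaluation]**

- Setting oob_score=True makes the BaggingClassifier automatically run an out-of-bag (OOB) evaluation once training is finished; the result is stored in oob_score_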
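OOB evaluation works because a bootstrap sample of size m leaves out a sizable share of the training instances: about (1 - 1/m)^m, which approaches e^-1 (roughly 37%) for large m. A quick simulation of one bootstrap draw, with m as an assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                              # training-set size (assumed)

# one bootstrap sample: m indices drawn with replacement
sample = rng.integers(0, m, size=m)
in_bag = np.zeros(m, dtype=bool)
in_bag[sample] = True

oob_fraction = 1 - in_bag.mean()        # share of instances this predictor never saw
theoretical = (1 - 1 / m) ** m          # close to exp(-1), about 0.368
print(oob_fraction, theoretical)
```

Each predictor can therefore be scored on its own left-out instances, and averaging these scores gives an estimate of generalization performance without a separate validation set.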

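##**[Extra-Trees]**

- Extremely randomized trees (Extra-Trees): a random forest made of extremely random trees, which use random split thresholds for each feature instead of searching for the best possible thresholds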
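Scikit-Learn exposes this as `ExtraTreesClassifier`, with the same interface as `RandomForestClassifier`. A minimal comparison sketch on synthetic moons data; the dataset and hyperparameters are assumptions, and which model wins varies with the data, so cross-validation is used to compare them.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

# Extra-Trees: random thresholds per candidate feature (and bootstrap=False by default)
et_clf = ExtraTreesClassifier(n_estimators=100, max_leaf_nodes=16, random_state=42)
# Random forest: searches for the best threshold among a random subset of features
rf_clf = RandomForestClassifier(n_estimators=100, max_leaf_nodes=16, random_state=42)

et_scores = cross_val_score(et_clf, X, y, cv=3)
rf_scores = cross_val_score(rf_clf, X, y, cv=3)
print("Extra-Trees:", et_scores.mean(), "Random forest:", rf_scores.mean())
```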

#**[Code Transcription]**

In [6]:
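# Nonlinear SVM on the moons dataset: add polynomial features, scale them, then fit a linear SVM
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# 100 two-feature samples shaped like two interleaving half-moons (no random_state, so results vary per run)
x,y=make_moons(n_samples=100, noise=0.15)
polynomial_svm_clf=Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),   # add polynomial terms up to degree 3
    ("scaler",StandardScaler()),                       # standardize every feature
    ("svm_clf", LinearSVC(C=10, loss="hinge", max_iter=10000))  # larger max_iter helps liblinear converge
])
polynomial_svm_clf.fit(x,y)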

In [7]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

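# hold out 20% of the moons data as a test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# three diverse classifiers to combine
log_clf=LogisticRegression()
rnd_clf=RandomForestClassifier()
svm_clf=SVC()

# hard voting: the ensemble predicts the class chosen by most of the three classifiers
voting_clf=VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(x_train,y_train)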

In [11]:
from sklearn.metrics import accuracy_score
# train each classifier and the ensemble, then report each one's test-set accuracy
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(x_train, y_train)
    y_pred=clf.predict(x_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.95
RandomForestClassifier 0.95
SVC 0.95
VotingClassifier 0.95
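With `voting='hard'`, the ensemble's prediction should equal the majority vote of its fitted members. This self-contained check rebuilds a similar ensemble on a seeded moons dataset (the dataset size, noise, and seeds are assumptions) and compares `predict` with a manual majority vote over `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

voting = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC())],
    voting="hard")
voting.fit(X_train, y_train)

# predictions of the three fitted members, shape (3, n_test); labels here are 0 and 1
member_preds = np.array([est.predict(X_test) for est in voting.estimators_])
# majority vote per test instance
manual_vote = np.array([np.bincount(col).argmax() for col in member_preds.T])

print(np.array_equal(manual_vote, voting.predict(X_test)))  # True
```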


In [17]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

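# bagging: 500 decision trees, each trained on 80 instances drawn with replacement
# (bootstrap=True); x_train has 80 instances, so each sample matches the training-set size
# n_jobs=-1 uses all available CPU cores
bag_clf=BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=80, bootstrap=True, n_jobs=-1)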
bag_clf.fit(x_train, y_train)
y_pred=bag_clf.predict(x_test)

In [18]:
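# same kind of bagging ensemble, but oob_score=True also evaluates each tree
# on the training instances it never saw (its out-of-bag instances)
bag_clf= BaggingClassifier(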
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf.fit(x_train, y_train)
bag_clf.oob_score_

0.9625

In [19]:
from sklearn.metrics import accuracy_score
# compare the OOB estimate above with the actual test-set accuracy
y_pred=bag_clf.predict(x_test)
accuracy_score(y_test, y_pred)

0.95

In [20]:
# OOB class-probability estimates for each training instance (columns: class 0, class 1)
bag_clf.oob_decision_function_

array([[0.01142857, 0.98857143],
       [0.12121212, 0.87878788],
       [1.        , 0.        ],
       [0.63020833, 0.36979167],
       [0.02564103, 0.97435897],
       [1.        , 0.        ],
       [0.        , 1.        ],
       [0.96907216, 0.03092784],
       [0.00537634, 0.99462366],
       [0.6284153 , 0.3715847 ],
       [0.30939227, 0.69060773],
       [0.03664921, 0.96335079],
       [1.        , 0.        ],
       [1.        , 0.        ],
       [0.00483092, 0.99516908],
       [0.80628272, 0.19371728],
       [0.85964912, 0.14035088],
       [0.03867403, 0.96132597],
       [0.04347826, 0.95652174],
       [0.79661017, 0.20338983],
       [0.00636943, 0.99363057],
       [0.05670103, 0.94329897],
       [0.44919786, 0.55080214],
       [0.0060241 , 0.9939759 ],
       [0.        , 1.        ],
       [0.96648045, 0.03351955],
       [0.11173184, 0.88826816],
       [0.        , 1.        ],
       [0.96703297, 0.03296703],
       [1.        , 0.        ],
       [1.
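Each row of `oob_decision_function_` is a probability distribution over the classes, and taking the most probable class per row gives the OOB predictions whose accuracy is `oob_score_`. A self-contained check on a seeded moons dataset (dataset and `n_estimators=100` are assumptions); it also assumes the labels are 0 and 1, so a column index equals its class label.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=42)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True, oob_score=True, random_state=42)
bag.fit(X, y)

proba = bag.oob_decision_function_    # shape (n_train, n_classes)
row_sums = proba.sum(axis=1)           # each row should sum to 1
oob_pred = proba.argmax(axis=1)        # OOB prediction = most probable class
manual_oob_score = np.mean(oob_pred == y)

print(np.allclose(row_sums, 1.0), manual_oob_score, bag.oob_score_)
```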

In [21]:
from sklearn.ensemble import RandomForestClassifier

# random forest: 500 trees, each limited to at most 16 leaf nodes
rnd_clf=RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(x_train, y_train)

y_pred_rf=rnd_clf.predict(x_test)

In [22]:
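# roughly equivalent to the random forest above: bagged trees that each consider only a
# random subset of sqrt(n_features) features at every split
# (max_features="auto" is deprecated/removed in recent scikit-learn; "sqrt" is its classifier equivalent)
bag_clf=BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1)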
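One way to check the "roughly equivalent" claim is to fit both ensembles and measure how often their test predictions agree. This sketch uses a seeded moons dataset and 200 trees instead of 500 to stay fast; both choices are assumptions, and the agreement will not generally be 100% because the random draws differ.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=200, max_leaf_nodes=16, random_state=42)
bag = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=200, max_samples=1.0, bootstrap=True, random_state=42)

rf.fit(X_train, y_train)
bag.fit(X_train, y_train)

# fraction of test instances on which both ensembles predict the same class
agreement = np.mean(rf.predict(X_test) == bag.predict(X_test))
print(rf.score(X_test, y_test), bag.score(X_test, y_test), agreement)
```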

In [23]:
from sklearn.datasets import load_iris
iris=load_iris()
# train a random forest on the iris data and print the importance score of each feature
rnd_clf=RandomForestClassifier(n_estimators=500, n_jobs=-1)
rnd_clf.fit(iris["data"], iris["target"])
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)

sepal length (cm) 0.08991125063655515
sepal width (cm) 0.02218112465052495
petal length (cm) 0.44982232294354024
petal width (cm) 0.43808530176937965
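The importances printed above are easier to read when ranked. A minimal sketch that sorts them from most to least important, with `n_estimators=100` and `random_state=42` as assumed settings (so the exact scores will differ from the output above); the forest normalizes the importances so they sum to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])

importances = rnd_clf.feature_importances_
# feature indices ordered from most to least important
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{iris['feature_names'][i]:<20} {importances[i]:.3f}")

# normalized, so this is 1.0 up to floating-point rounding
print(importances.sum())
```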
