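#### Ensemble - Voting
- An ensemble trains several models in parallel, using either different models or the same model on resampled datasets
- Voting method
 + Setup: the same dataset + models that use different learning algorithms
 + Deriving the result: Hard (direct: majority vote over predicted labels), Soft (indirect: average of predicted class probabilities)

- Build a breast cancer classification model ==> features: 30, target: 2 classes (binary classification)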
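
The difference between Hard and Soft voting can be sketched with plain NumPy. The probabilities below are made-up numbers for a single sample, not output from the models in this notebook; they are chosen so that the two rules disagree.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# (columns: class 0, class 1). Values are invented for illustration only.
probas = np.array([
    [0.45, 0.55],  # model A -> votes for class 1
    [0.40, 0.60],  # model B -> votes for class 1
    [0.95, 0.05],  # model C -> votes for class 0 (very confident)
])

# Hard voting: every model casts one vote for its predicted label.
votes = probas.argmax(axis=1)                          # [1, 1, 0]
hard_pred = np.bincount(votes, minlength=2).argmax()   # majority label

# Soft voting: average the probabilities, then pick the most probable class.
mean_proba = probas.mean(axis=0)                       # [0.6, 0.4]
soft_pred = mean_proba.argmax()

print(f'hard: {hard_pred}, soft: {soft_pred}')  # hard: 1, soft: 0
```

Hard voting ignores how confident each model is, while soft voting lets one very confident model outweigh two lukewarm ones.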

[1] Data preparation

In [61]:
# Load modules
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np

In [62]:
# Load the dataset
dataSet=load_breast_cancer(as_frame=True)
print(dataSet.keys())

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])


In [63]:
# # Check the data
# print(f"target_name:{dataSet['target_names']}")

In [64]:
# Store the features and the target
featureDF = dataSet['data']
targetSR = dataSet['target']

In [65]:
featureDF.head(2)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902


In [66]:
targetSR.head(2)

0    0
1    0
Name: target, dtype: int32

[2] Prepare the training dataset

In [67]:
from sklearn.model_selection import train_test_split

In [68]:
X_train, X_test, y_train, y_test = train_test_split(featureDF,
                                                    targetSR,
                                                    stratify= targetSR,
                                                    random_state=10)
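
As a quick sanity check on `stratify`, the class ratio in each split can be compared with the ratio in the full dataset. This is a minimal sketch that reloads the data so it runs on its own; the variable names `X` and `y` are not from the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Reload the same dataset so the snippet is self-contained
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=10)  # default test_size is 0.25

# With stratify=y, the share of each class should be nearly identical
# in the full data, the train split, and the test split.
for name, part in [('all', y), ('train', y_train), ('test', y_test)]:
    ratio = part.value_counts(normalize=True).sort_index()
    print(f'{name:>5} : n={len(part)}, class ratio={ratio.round(3).to_dict()}')
```

Without `stratify`, the ratios can drift between splits, which matters more as a dataset gets smaller or more imbalanced.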

[3] Training > ensemble with the voting method
 - Same dataset for every model
 - Algorithms: KNeighborsClassifier, LogisticRegression, DecisionTreeClassifier

In [69]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

In [70]:
# Create the estimator instances
k_model = KNeighborsClassifier()
dt_model = DecisionTreeClassifier(random_state=10)
lr_model = LogisticRegression(solver='liblinear')

In [71]:
# Create the voting instances
v_model = VotingClassifier(estimators=[('k_model', k_model),
                                        ('dt_model', dt_model),
                                        ('lr_model', lr_model)],
                                        voting='hard')

vs_model = VotingClassifier(estimators=[('k_model', k_model),
                                        ('dt_model', dt_model),
                                        ('lr_model', lr_model)],
                                        voting='soft')

In [72]:
# Train
v_model.fit(X_train.values, y_train.values) # hard (direct)
vs_model.fit(X_train.values, y_train.values) # soft (indirect)
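
To make the soft rule concrete, the sketch below rebuilds a soft-voting model with the same settings and checks that its `predict_proba` matches the plain average of the fitted sub-estimators' probabilities. It refits everything so it runs standalone; the names `soft` and `manual_proba` are illustrative, not from the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # plain ndarrays
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=10)

soft = VotingClassifier(
    estimators=[('k_model', KNeighborsClassifier()),
                ('dt_model', DecisionTreeClassifier(random_state=10)),
                ('lr_model', LogisticRegression(solver='liblinear'))],
    voting='soft')
soft.fit(X_train, y_train)

# Average the probabilities of the fitted clones stored in estimators_
manual_proba = np.mean(
    [est.predict_proba(X_test) for est in soft.estimators_], axis=0)
manual_pred = soft.classes_[manual_proba.argmax(axis=1)]

print(np.allclose(manual_proba, soft.predict_proba(X_test)))
print(np.array_equal(manual_pred, soft.predict(X_test)))
```

`VotingClassifier` fits clones of the estimators it is given, which is why the notebook can pass the same `k_model`, `dt_model`, and `lr_model` instances to both the hard and the soft ensemble.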

In [73]:
# Inspect the fitted model attributes
print(f'[ v_model.classes_ ] : {v_model.classes_}')
print(f'[ v_model.estimators_ ] : {v_model.estimators_}')
print(f'[ v_model.named_estimators_ ] : {v_model.named_estimators_}')
print()
print(f'[ v_model.n_features_in_ ] : {v_model.n_features_in_}')
# print(f'[ v_model.feature_names_in_ ] : {v_model.feature_names_in_}') >>> only set when fit() receives a DataFrame; here an array was passed

[ v_model.classes_ ] : [0 1]
[ v_model.estimators_ ] : [KNeighborsClassifier(), DecisionTreeClassifier(random_state=10), LogisticRegression(solver='liblinear')]
[ v_model.named_estimators_ ] : {'k_model': KNeighborsClassifier(), 'dt_model': DecisionTreeClassifier(random_state=10), 'lr_model': LogisticRegression(solver='liblinear')}

[ v_model.n_features_in_ ] : 30


[4] Check performance ==> there is no separate validation set, so compare the train and test sets

In [74]:
train_score = v_model.score(X_train.values, y_train.values)
test_score = v_model.score(X_test.values, y_test.values)

soft_train_score = vs_model.score(X_train.values, y_train.values)
soft_test_score = vs_model.score(X_test.values, y_test.values)

In [75]:
print(f'[HARD Voting]train_score : test_score = {train_score} : {test_score}')

print(f'[SOFT Voting]train_score : test_score = {soft_train_score} : {soft_test_score}')

[HARD Voting]train_score : test_score = 0.9741784037558685 : 0.951048951048951
[SOFT Voting]train_score : test_score = 0.9882629107981221 : 0.951048951048951
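
It can also help to see how each base model scores on its own next to the ensemble. The following is a sketch under the same split and settings, refitted so it runs by itself; the `scores` dictionary and the `voting(hard)` key are names introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=10)

hard = VotingClassifier(
    estimators=[('k_model', KNeighborsClassifier()),
                ('dt_model', DecisionTreeClassifier(random_state=10)),
                ('lr_model', LogisticRegression(solver='liblinear'))],
    voting='hard')
hard.fit(X_train, y_train)

# Test accuracy of each fitted sub-estimator, then of the ensemble.
# The labels are already 0/1, so the sub-estimators can be scored directly.
scores = {name: est.score(X_test, y_test)
          for name, est in hard.named_estimators_.items()}
scores['voting(hard)'] = hard.score(X_test, y_test)

for name, value in scores.items():
    print(f'{name:>12} : {value:.4f}')
```

An ensemble is not guaranteed to beat its best member on a given split, so a side-by-side comparison like this is a reasonable check before adopting it.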


In [76]:
#### 'Flags' object has no attribute 'c_contiguous'
### => Resolved by passing ndarray data (e.g. .values) instead of the DataFrame