## Stacking Classifier
- Base: KNN, Random Forest, Gaussian NB
- Meta: Logistic Regression
- Data: Iris (3-fold cross-validation)

In [4]:
import numpy as np
import warnings
from mlxtend.classifier import StackingClassifier
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
## Gaussian Naive Bayes: classifier that models each feature with a Gaussian distribution
from sklearn.naive_bayes import GaussianNB 
from sklearn.neighbors import KNeighborsClassifier

In [5]:
warnings.filterwarnings('ignore')

In [6]:
iris = datasets.load_iris()
## keep only two features: columns 1 and 2 (sepal width, petal length)
X, Y = iris.data[:, 1:3], iris.target

In [21]:
print(list(iris.feature_names))
print(iris.data[0:5])

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]


In [22]:
print(list(iris.target_names))
print(list(np.unique(iris.target)))

['setosa', 'versicolor', 'virginica']
[0, 1, 2]


In [8]:
print(X.shape, Y.shape)

(150, 2) (150,)


In [23]:
## base learners
clf1 = KNeighborsClassifier(n_neighbors = 1)
clf2 = RandomForestClassifier(random_state = 1)
clf3 = GaussianNB()

## meta learner
lr = LogisticRegression()

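## stacking: with use_probas = True the meta learner is trained on the
## base learners' predicted class probabilities instead of their labels
sclf = StackingClassifier(classifiers = [clf1, clf2, clf3],
                          use_probas = True,
                          meta_classifier = lr)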
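
As a rough illustration of what `use_probas = True` implies, the sketch below builds the meta-features by hand with scikit-learn only. It assumes that the base learners are fit on the same data the meta learner is trained on, and that their class probabilities are concatenated rather than averaged. The names `X_demo`, `y_demo`, `base_learners`, and `meta_features` are illustrative, and `max_iter=1000` is an assumption added only to avoid convergence warnings. It is not a reproduction of the library's internals.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_demo, y_demo = iris.data[:, 1:3], iris.target  # same two features as the notebook

# Level 1: the three base learners (hypothetical list name).
base_learners = [
    KNeighborsClassifier(n_neighbors=1),
    RandomForestClassifier(random_state=1),
    GaussianNB(),
]

# Fit each base learner and collect its predicted class probabilities.
probas = [clf.fit(X_demo, y_demo).predict_proba(X_demo) for clf in base_learners]

# Meta-features: probabilities side by side, 3 learners x 3 classes = 9 columns.
meta_features = np.hstack(probas)

# Level 2: the meta learner is trained on the stacked probabilities.
meta_learner = LogisticRegression(max_iter=1000)  # max_iter is an assumption
meta_learner.fit(meta_features, y_demo)
stacked_pred = meta_learner.predict(meta_features)

print(meta_features.shape)
```

Because the base learners are scored here on the data they were fit to, the meta learner sees optimistic probabilities and can overfit. Building the meta-features from out-of-fold predictions is a common remedy.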

In [25]:
## 3-fold cross-validation; the value after +/- is the standard deviation across folds
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN', 'RF', 'GNB', 'Stacking']):
    scores = model_selection.cross_val_score(clf, X, Y,
                                             cv = 3, scoring='f1_macro')
    print('%s scores: %0.2f (+/- %0.2f)'
          % (label, scores.mean(), scores.std()))

KNN scores: 0.91 (+/- 0.01)
RF scores: 0.93 (+/- 0.05)
GNB scores: 0.92 (+/- 0.03)
Stacking scores: 0.94 (+/- 0.03)
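
To make the `cv = 3` setting more concrete, here is a minimal sketch of the splits it produces. For a classifier, scikit-learn treats an integer `cv` as a `StratifiedKFold` with that many splits and no shuffling. The names `X_cv`, `y_cv`, and `fold_sizes` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
X_cv, y_cv = iris.data[:, 1:3], iris.target

# Equivalent splitter for cv=3 with a classifier: stratified, unshuffled folds.
skf = StratifiedKFold(n_splits=3)

fold_sizes = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X_cv, y_cv)):
    # Count how many samples of each class land in the held-out fold.
    class_counts = np.bincount(y_cv[test_idx], minlength=3)
    fold_sizes.append(len(test_idx))
    print('fold %d: %d train, %d test, test class counts %s'
          % (fold, len(train_idx), len(test_idx), class_counts))
```

Each model is therefore trained three times on about two thirds of the data and scored on the remaining third, with the class proportions kept roughly equal in every fold.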


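The Stacking Classifier achieves the highest score, a macro-averaged F1 of 0.94 (+/- 0.03), although its margin over the Random Forest (0.93 +/- 0.05) is within one standard deviation.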
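
The `scoring='f1_macro'` setting used in the cross-validation reports the macro-averaged F1 score, which is not the same quantity as accuracy. The sketch below computes it by hand on a small set of made-up labels and compares it with scikit-learn's `f1_score`. The arrays `y_true` and `y_pred` are invented for illustration and are unrelated to the Iris results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels for three classes, for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

per_class_f1 = []
for c in np.unique(y_true):
    tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
    fp = np.sum((y_pred == c) & (y_true != c))  # false positives for class c
    fn = np.sum((y_pred != c) & (y_true == c))  # false negatives for class c
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    per_class_f1.append(2 * precision * recall / (precision + recall))

# Macro averaging: unweighted mean of the per-class F1 scores.
macro_f1 = float(np.mean(per_class_f1))

print('macro F1: %0.4f' % macro_f1)
print('sklearn : %0.4f' % f1_score(y_true, y_pred, average='macro'))
print('accuracy: %0.4f' % accuracy_score(y_true, y_pred))
```

On these labels the macro F1 (about 0.656) differs from the accuracy (about 0.667), because each class contributes equally to the average regardless of how its errors are distributed.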

<br>

***

#### Reference
[Stacking Classifier](https://www.youtube.com/watch?v=sBrQnqwMpvA) by Bhavesh Bhatt