<span style="font-size:36px"><b>Advanced Machine Learning</b></span>

Copyright 2019 Gunawan Lumban Gaol

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Import Packages

In [12]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from mlxtend.classifier import StackingClassifier

# Import Data

Load the iris dataset from scikit-learn's `datasets` module, then hold out 40% of the samples as a test set.

In [13]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
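As a quick sanity check on the split above, the following sketch prints the array shapes and the class counts on each side. It reloads the data so it can run on its own; the use of `np.bincount` to count classes is an addition for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same split as in the notebook: 40% of the 150 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

print(X.shape)        # (150, 4): 150 flowers, 4 measurements each
print(X_train.shape)  # (90, 4)
print(X_test.shape)   # (60, 4)
print(list(iris.target_names))

# The split is not stratified, so class counts can differ between the two sets.
print(np.bincount(y_train), np.bincount(y_test))
```

If balanced class proportions matter, passing `stratify=y` to `train_test_split` is one option, though it changes which samples land in each set.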

# Create Stacking Classifier

## 1st Layer

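* k-nearest neighbors (`n_neighbors=10`)
* decision tree (`max_depth=2`)
* logistic regression (multinomial, `lbfgs` solver)

All other hyperparameters are left at their defaults.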

In [35]:
clf_knn = KNeighborsClassifier(10)
clf_dt = DecisionTreeClassifier(max_depth=2)
clf_lr = LogisticRegression(random_state=42, solver='lbfgs', multi_class='multinomial')

## 2nd Layer

* Logistic regression as the meta-classifier, trained on the class probabilities predicted by the first layer.

In [36]:
clf_meta = LogisticRegression(random_state=42, solver='lbfgs', multi_class='multinomial')
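To make "using the probabilities from the previous layer" concrete, here is a minimal sketch that builds the meta-features by hand with scikit-learn only. It assumes the stacking scheme is: fit each first-layer model on the training set, concatenate their `predict_proba` outputs column-wise, and fit the meta-classifier on that matrix. The helper `build_meta_features` is a hypothetical name introduced here, and `max_iter=1000` is an assumption added to avoid convergence warnings; the library's internals may differ in detail.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=42)

# First-layer models, mirroring the settings used in this notebook.
base_clfs = [
    KNeighborsClassifier(10),
    DecisionTreeClassifier(max_depth=2),
    LogisticRegression(random_state=42, max_iter=1000),  # max_iter is an assumption
]


def build_meta_features(clfs, X):
    """Hypothetical helper: stack each model's class probabilities side by side."""
    return np.hstack([clf.predict_proba(X) for clf in clfs])


for clf in base_clfs:
    clf.fit(X_train, y_train)

meta_train = build_meta_features(base_clfs, X_train)
meta_test = build_meta_features(base_clfs, X_test)

# Second layer: logistic regression on the stacked probabilities.
meta_clf = LogisticRegression(random_state=42, max_iter=1000)
meta_clf.fit(meta_train, y_train)

print(meta_train.shape)  # (90, 9): 3 models x 3 class probabilities
print(meta_clf.score(meta_test, y_test))
```

Because the meta-classifier sees probabilities predicted on the same data the first layer was trained on, this scheme can overfit; cross-validated stacking is a common alternative when that is a concern.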

In [37]:
clf_stack = StackingClassifier(
    classifiers=[clf_knn, clf_dt, clf_lr],
    meta_classifier=clf_meta,
    use_probas=True,
    use_features_in_secondary=False)
clf_stack.fit(X_train, y_train)

StackingClassifier(average_probas=False,
                   classifiers=[KNeighborsClassifier(algorithm='auto',
                                                     leaf_size=30,
                                                     metric='minkowski',
                                                     metric_params=None,
                                                     n_jobs=None,
                                                     n_neighbors=10, p=2,
                                                     weights='uniform'),
                                DecisionTreeClassifier(class_weight=None,
                                                       criterion='gini',
                                                       max_depth=2,
                                                       max_features=None,
                                                       max_leaf_nodes=None,
                                                       min_impurity_decrease=0.0,
               

# Evaluation

In [38]:
train_preds = clf_stack.predict(X_train)
test_preds = clf_stack.predict(X_test)

In [39]:
from sklearn.metrics import roc_auc_score, classification_report

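# Disabled: for a multiclass target, roc_auc_score expects class probabilities
# (for example from predict_proba) together with a multi_class setting, not hard labels.
# roc_auc_train = roc_auc_score(y_train, train_preds)
# roc_auc_test = roc_auc_score(y_test, test_preds)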

print(classification_report(y_train, train_preds))
print()
print(classification_report(y_test, test_preds))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        27
           1       0.94      0.94      0.94        31
           2       0.94      0.94      0.94        32

    accuracy                           0.96        90
   macro avg       0.96      0.96      0.96        90
weighted avg       0.96      0.96      0.96        90


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        23
           1       0.95      1.00      0.97        19
           2       1.00      0.94      0.97        18

    accuracy                           0.98        60
   macro avg       0.98      0.98      0.98        60
weighted avg       0.98      0.98      0.98        60
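The reports above are computed from hard label predictions. For a multiclass ROC AUC, one possible approach is to score the predicted class probabilities in a one-vs-rest fashion. The sketch below uses a plain logistic regression as a stand-in model so that it runs with scikit-learn alone; it is an assumption that the same call pattern applies to the stacked model's `predict_proba` output, and the numbers it prints are not results from this notebook.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=42)

# Stand-in model (an assumption): any classifier exposing predict_proba would do.
clf = LogisticRegression(random_state=42, max_iter=1000)
clf.fit(X_train, y_train)

# Shape (n_samples, n_classes); each row sums to 1.
train_probas = clf.predict_proba(X_train)
test_probas = clf.predict_proba(X_test)

# One-vs-rest ROC AUC, macro-averaged over the three classes by default.
roc_auc_train = roc_auc_score(y_train, train_probas, multi_class="ovr")
roc_auc_test = roc_auc_score(y_test, test_probas, multi_class="ovr")

print(f"train ROC AUC (ovr): {roc_auc_train:.3f}")
print(f"test ROC AUC (ovr): {roc_auc_test:.3f}")
```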

