## Ensemble Learning

#### 1. Voting Classifier

(a) Load the MNIST dataset, a set of 70,000 small images (28x28 pixels)
of handwritten digits. Each image is labeled with the digit it represents.

In [36]:
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)

(b) Split it into a training set, a validation set, and a test set (e.g., use the first 50,000
instances for training, the subsequent 10,000 for validation, and the last 10,000 for
testing).

In [37]:
from sklearn.model_selection import train_test_split

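# Hold out 10,000 randomly chosen instances for testing (a random split, not the positional one suggested above).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

# Hold out another 10,000 of the remaining 60,000 for validation, leaving 50,000 for training.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000, random_state=0)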
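
The exercise text suggests splitting by position, whereas the cell above splits at random. A minimal sketch of the positional alternative follows, using a small stand-in array instead of MNIST. The helper name `sequential_split` and the toy sizes are assumptions made for illustration only.

```python
import numpy as np

def sequential_split(X, y, n_train, n_val):
    """Split by position: first n_train rows, next n_val rows, remainder as test."""
    X, y = np.asarray(X), np.asarray(y)
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, None)
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Toy stand-in for MNIST: 70 rows instead of 70,000.
X_demo = np.arange(70 * 4).reshape(70, 4)
y_demo = np.arange(70) % 10

(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = sequential_split(X_demo, y_demo, 50, 10)
print(X_tr.shape, X_va.shape, X_te.shape)
```

With the real data the call would presumably be `sequential_split(X, y, 50000, 10000)`.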

(c) Then train various classifiers, such as a Random Forest classifier, an Extra-Trees
classifier, and a Softmax Regression classifier.

In [38]:
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

knn = KNeighborsClassifier(n_neighbors=5)
rf = RandomForestClassifier(random_state=0)
et = ExtraTreesClassifier(random_state=0)

knn.fit(X_train, y_train)
rf.fit(X_train, y_train)
et.fit(X_train, y_train)


ExtraTreesClassifier(random_state=0)

I used KNeighborsClassifier instead of Softmax Regression because the softmax model did not
converge in a reasonable amount of time, even with a scaler applied to the features.
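
For reference, here is a minimal sketch of what a scaled Softmax Regression could look like. It runs on scikit-learn's small bundled 8x8 digits dataset rather than MNIST, so it says nothing about convergence time on the full data. The dataset choice, `max_iter=1000`, and all variable names are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small 8x8 digits dataset bundled with scikit-learn (a stand-in, not MNIST).
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X_small, y_small, test_size=0.25, random_state=0)

# With the default lbfgs solver, LogisticRegression fits a multinomial (softmax)
# model for multiclass targets; max_iter is raised to give it room to converge.
softmax_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
softmax_clf.fit(X_tr, y_tr)
softmax_acc = softmax_clf.score(X_va, y_va)
print(f"Softmax Regression validation accuracy: {softmax_acc:.4f}")
```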

(d) Next, try to combine them into an ensemble that outperforms them all on
the validation set, using a soft or hard voting classifier.

In [39]:
from sklearn.ensemble import VotingClassifier
import pandas as pd

voting_clf = VotingClassifier(estimators=[('knn', knn), ('rf', rf), ('et', et)], voting='soft')

# Evaluate on the validation set.
val_predictions = pd.DataFrame()  # reused in exercise 2
for clf in (knn, rf, et, voting_clf):
  if clf is voting_clf:
    clf.fit(X_train, y_train)  # the ensemble has not been fitted yet
    y_pred = clf.predict(X_val)
  else:
    y_pred = clf.predict(X_val)
    val_predictions[clf.__class__.__name__] = y_pred  # keep base-model predictions only
  print(clf.__class__.__name__, accuracy_score(y_val, y_pred))

KNeighborsClassifier 0.9679
RandomForestClassifier 0.9679
ExtraTreesClassifier 0.9706
VotingClassifier 0.9729
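
The ensemble above uses `voting='soft'`, which averages the predicted class probabilities, whereas hard voting counts predicted labels. The two can disagree, as this small sketch shows. The probability values are made up for illustration.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one sample
# with three classes (rows: classifiers, columns: classes).
probas = np.array([
    [0.45, 0.55, 0.00],
    [0.40, 0.60, 0.00],
    [0.90, 0.10, 0.00],
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_winner = mean_proba.argmax()

print("hard:", hard_winner, "soft:", soft_winner)
```

Two classifiers narrowly prefer class 1, so hard voting picks it. The third is very confident in class 0, which tips the averaged probabilities and makes soft voting pick class 0.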


(e) Once you have found one, try it on the test set. How much better does it
perform compared to the individual classifiers?

In [40]:
test_predictions = pd.DataFrame()  # reused in exercise 2
for clf in (knn, rf, et, voting_clf):
  y_pred = clf.predict(X_test)
  if clf is not voting_clf:
    test_predictions[clf.__class__.__name__] = y_pred  # keep base-model predictions only
  print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

KNeighborsClassifier 0.9718
RandomForestClassifier 0.9675
ExtraTreesClassifier 0.9681
VotingClassifier 0.9768
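
To answer the question in (e) numerically, the gap can be computed from the accuracies printed above. This is a small sketch that only reuses those printed numbers; the variable names are illustrative.

```python
# Test-set accuracies copied from the output above.
test_acc = {
    "KNeighborsClassifier": 0.9718,
    "RandomForestClassifier": 0.9675,
    "ExtraTreesClassifier": 0.9681,
}
voting_acc = 0.9768

for name, acc in test_acc.items():
    gain_points = (voting_acc - acc) * 100            # percentage points
    error_cut = (voting_acc - acc) / (1 - acc) * 100  # relative error reduction, %
    print(f"{name}: +{gain_points:.2f} points, {error_cut:.1f}% fewer errors")

# Gap to the best individual classifier, in percentage points.
best_gain = (voting_acc - max(test_acc.values())) * 100
```

The voting classifier is about half a percentage point ahead of the best individual classifier (KNeighborsClassifier) on the test set.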


#### 2. Stacking

(a) Run the individual classifiers from the previous exercise to make predictions
on the validation set, and create a new training set with the resulting predictions:
each training instance is a vector containing the set of predictions from all the
classifiers for one image, and the target is the image's class.

In [49]:
X_train_blended = val_predictions  # one column of predictions per base classifier
y_train_blended = pd.DataFrame(y_val)
y_train_blended.shape


(10000, 1)
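
The layout of the blended training set is easier to see on a tiny example: one row per validation image, one column per base classifier. The sketch below uses made-up predictions for five images, and all names in it are hypothetical.

```python
import numpy as np

# Hypothetical predictions of three base classifiers on five validation images.
knn_pred = np.array([7, 2, 1, 0, 4])
rf_pred = np.array([7, 2, 1, 0, 9])
et_pred = np.array([7, 3, 1, 0, 4])
y_val_demo = np.array([7, 2, 1, 0, 4])  # true labels of the same five images

# One row per image, one column per classifier: this is the blender's input.
X_blend_demo = np.column_stack([knn_pred, rf_pred, et_pred])
print(X_blend_demo.shape)   # (5, 3)
print(X_blend_demo[1])      # [2 2 3]
```

The blender then learns a mapping from each row of `X_blend_demo` to the matching entry of `y_val_demo`.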

(b) Train a classifier on this new training set.
You have just trained a blender, and together with the classifiers they form
a stacking ensemble!

In [50]:
blended_rf = RandomForestClassifier(random_state=0)
# scikit-learn expects a 1-D target array; np.ravel flattens the single-column
# DataFrame and avoids a DataConversionWarning.
blended_rf.fit(X_train_blended, np.ravel(y_train_blended))

RandomForestClassifier(random_state=0)

(c) Now let’s evaluate the ensemble on the test set. For each image in the test
set, make predictions with all your classifiers, then feed the predictions to
the blender to get the ensemble’s predictions. How does it compare to the
voting classifier you trained earlier?

In [51]:
X_test_blended = test_predictions
y_test_blended = pd.DataFrame(y_test)
y_pred = blended_rf.predict(X_test_blended)
print(f"Stacked score: {accuracy_score(y_test_blended, y_pred)}")

Stacked score: 0.9708


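In this case the soft voting classifier (0.9768 on the test set) outperformed the stacking
ensemble (0.9708), although all models perform well. The stacking ensemble also scored slightly
below the best individual classifier, KNeighborsClassifier (0.9718).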
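
scikit-learn also provides `StackingClassifier`, which trains the blender on out-of-fold predictions from cross-validation instead of a separate hold-out validation set, so it is a related technique rather than a drop-in copy of the manual approach above. Here is a minimal sketch on the small bundled digits dataset (not MNIST). The reduced `n_estimators`, `cv=3`, and the variable names are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small 8x8 digits dataset bundled with scikit-learn (a stand-in, not MNIST).
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_small, y_small, test_size=0.25, random_state=0)

stack_clf = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    # The blender; by default it is trained on the base models' predict_proba outputs.
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=3,  # out-of-fold predictions replace the hold-out validation set
)
stack_clf.fit(X_tr, y_tr)
stack_acc = stack_clf.score(X_te, y_te)
print(f"Stacking accuracy on the small digits set: {stack_acc:.4f}")
```

Unlike the manual blender, which sees one predicted label per classifier, this version feeds class probabilities to the final estimator.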