# Practice 10A: Ensemble Learning
Ref: Aurélien Géron. "Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow"

In [12]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression 
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

# (1) Voting Classifier.

### (a) Load the MNIST dataset, 
which is a set of 70,000 small images (28x28 pixels) of handwritten digits. Each image is labeled with the digit it represents.

In [2]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)

### (b) Split the dataset into a training set, a validation set, and a test set 
(e.g., use the first 50,000 instances for training, the subsequent 10,000 for validation, and the last 10,000 for testing).

In [3]:
X_train_val, X_test, y_train_val, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

In [4]:
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=10000, random_state=42) 

### (c) Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and a Softmax Regression classifier.

In [6]:
# Random Forest
rnd_forest_clf = RandomForestClassifier(n_estimators=10, random_state=42)
rnd_forest_clf.fit(X_train, y_train)

RandomForestClassifier(n_estimators=10, random_state=42)

In [7]:
# Extra-Trees
# we might increase the number of trees
ext_clf = ExtraTreesClassifier(n_estimators=10, random_state=42)
ext_clf.fit(X_train, y_train)


ExtraTreesClassifier(n_estimators=10, random_state=42)

In [8]:
# Softmax Regression
# consider scaling the features
softmax_clf = LogisticRegression(multi_class="multinomial",solver="lbfgs", C=10)
softmax_clf.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(C=10, multi_class='multinomial')
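The convergence warning above suggests either raising `max_iter` or scaling the features. A minimal sketch of both remedies follows. It assumes the small 8x8 `load_digits` dataset as a quick stand-in for MNIST, and the names `scaled_softmax_clf`, `X_tr`, `X_va` are illustrative, not part of the notebook. `multi_class` is left at its default here, since the `lbfgs` solver already fits a multinomial model for multiclass targets.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in dataset (8x8 digits) so the sketch runs in seconds.
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42
)

# Standardize the pixel features, then fit softmax regression with a
# larger iteration budget than the default of 100.
scaled_softmax_clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
scaled_softmax_clf.fit(X_tr, y_tr)

scaled_softmax_score = scaled_softmax_clf.score(X_va, y_va)
print(scaled_softmax_score)
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics tied to the training data, so the same object can later be passed to a voting ensemble unchanged.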

In [13]:
# Support Vector Classifier
svm_clf = LinearSVC(random_state=42)
svm_clf.fit(X_train, y_train)



LinearSVC(random_state=42)

In [14]:
mlp_clf = MLPClassifier(random_state=42)
mlp_clf.fit(X_train, y_train)

MLPClassifier(random_state=42)

In [15]:
estimators = [rnd_forest_clf, ext_clf, softmax_clf, svm_clf, mlp_clf]
for estimator in estimators:
  print(estimator.score(X_val, y_val))

0.9446
0.9507
0.9219
0.8639
0.9621


The linear SVM is far outperformed by the other classifiers. However, let's keep it for now since it may improve the voting classifier's performance.

### (d) Next, try to combine them into an ensemble that outperforms them all on the validation set, using a soft or hard voting classifier.

In [16]:
from sklearn.ensemble import VotingClassifier

named_estimators = [
    ("random_forest_clf", rnd_forest_clf),
    ("extra_trees_clf", ext_clf),
    ("softmax_clf", softmax_clf),
    ("svm_clf", svm_clf),
    ("mlp_clf", mlp_clf),
]

In [17]:
voting_clf = VotingClassifier(named_estimators)

In [18]:
voting_clf.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


VotingClassifier(estimators=[('random_forest_clf',
                              RandomForestClassifier(n_estimators=10,
                                                     random_state=42)),
                             ('extra_trees_clf',
                              ExtraTreesClassifier(n_estimators=10,
                                                   random_state=42)),
                             ('softmax_clf',
                              LogisticRegression(C=10,
                                                 multi_class='multinomial')),
                             ('svm_clf', LinearSVC(random_state=42)),
                             ('mlp_clf', MLPClassifier(random_state=42))])

In [19]:
voting_clf.score(X_val, y_val)

0.9608

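The hard voting classifier scores 0.9608 on the validation set, about the same as the best individual classifier in the ensemble (the MLP, at 0.9621), so it does not yet outperform them all.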

In [20]:
voting_clf.set_params(svm_clf=None)
voting_clf.score(X_val, y_val)

0.9608

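The score is unchanged, but not because the SVM is irrelevant: `set_params` only updates the list of configured estimators, while `score` still uses the already fitted classifiers stored in `estimators_`. The SVM keeps voting until the ensemble is refit or its fitted copy is removed from `estimators_`.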
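The behavior of `set_params` on an already fitted `VotingClassifier` can be checked on a small example. This is only a sketch: it assumes the 8x8 `load_digits` dataset and three lightweight classifiers as stand-ins, and it uses the string `"drop"` (accepted by scikit-learn 0.22 and later) where the notebook passes `None`.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42
)

demo_clf = VotingClassifier([
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
])
demo_clf.fit(X_tr, y_tr)
score_before = demo_clf.score(X_va, y_va)

# Mark one estimator as dropped WITHOUT refitting: only the configuration
# changes, the fitted copies in estimators_ are untouched.
demo_clf.set_params(nb="drop")
n_fitted_after_set_params = len(demo_clf.estimators_)
score_after_set_params = demo_clf.score(X_va, y_va)

# Refitting rebuilds estimators_ from the updated configuration.
demo_clf.fit(X_tr, y_tr)
n_fitted_after_refit = len(demo_clf.estimators_)

print(score_before, score_after_set_params)
print(n_fitted_after_set_params, n_fitted_after_refit)
```

Refitting is the safer route on small data; on MNIST it is slow, which is presumably why the notebook edits `estimators_` directly instead.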

In [25]:
voting_clf.set_params(softmax_clf=None)
voting_clf.set_params(svm_clf=None)
voting_clf.estimators  # configured (name, estimator) pairs; the fitted copies live in estimators_

[('random_forest_clf',
  RandomForestClassifier(n_estimators=10, random_state=42)),
 ('extra_trees_clf', ExtraTreesClassifier(n_estimators=10, random_state=42)),
 ('softmax_clf', None),
 ('svm_clf', None),
 ('mlp_clf', MLPClassifier(random_state=42))]

In [28]:
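# Remove the fitted softmax and SVM copies (indices 2 and 3) in one slice;
# deleting index 2 and then index 3 would remove the MLP instead of the SVM.
del voting_clf.estimators_[2:4]
voting_clf.estimators_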

[RandomForestClassifier(n_estimators=10, random_state=42),
 ExtraTreesClassifier(n_estimators=10, random_state=42),
 MLPClassifier(random_state=42)]

In [29]:
voting_clf.voting = "soft"
voting_clf.score(X_val, y_val)

0.9718

Soft voting over the three remaining classifiers reaches 0.9718 on the validation set, a clear improvement over both hard voting (0.9608) and the best individual classifier, the MLP (0.9621).
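The difference between the two voting modes can be illustrated with plain NumPy. The probabilities below are made up for illustration and the variable names are hypothetical. Soft voting needs class probabilities from every member, which `LinearSVC` does not provide since it has no `predict_proba` method.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# (rows: classifiers, columns: classes 0, 1, 2). Values are invented.
probas = np.array([
    [0.45, 0.55, 0.00],
    [0.40, 0.60, 0.00],
    [0.95, 0.05, 0.00],
])

# Hard voting: each classifier casts one vote for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities across classifiers, then pick
# the class with the highest mean probability.
soft_prediction = probas.mean(axis=0).argmax()

print(hard_prediction, soft_prediction)
```

Here two lukewarm votes for class 1 win under hard voting, while the single highly confident classifier tips soft voting toward class 0. That extra weight for confident predictions is one plausible reason soft voting scores higher above.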

### (e) Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?

In [30]:
voting_clf.score(X_test, y_test)

0.971

In [33]:
for estimator in voting_clf.estimators_:
  print(estimator.score(X_test, y_test))

0.9432
0.9467
0.9626


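On the test set, the soft voting classifier reaches 97.10% accuracy, compared with 96.26% for the best individual model (the MLP). Its error rate (2.90%) is lower than that of every individual classifier (5.68%, 5.33%, and 3.74%).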
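To put the gain in perspective, the relative reduction in error rate can be computed directly from the test accuracies printed above. The variable names are illustrative only.

```python
# Test-set accuracies copied from the outputs above.
soft_voting_acc = 0.9710
best_single_acc = 0.9626  # the MLP, the strongest individual classifier

soft_voting_err = 1 - soft_voting_acc
best_single_err = 1 - best_single_acc

# Fraction of the best single model's errors that the ensemble removes.
relative_reduction = (best_single_err - soft_voting_err) / best_single_err
print(f"{relative_reduction:.1%}")
```

In other words, the ensemble removes roughly a fifth of the errors made by the best single model.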

# (2) Stacking.

### (a) Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class.


In [34]:
X_val_predictions = np.empty((len(X_val), len(estimators)), dtype=np.float32)

for index, estimator in enumerate(estimators):
    X_val_predictions[:, index] = estimator.predict(X_val)

In [35]:
X_val_predictions

array([[7., 7., 7., 7., 7.],
       [3., 3., 3., 3., 3.],
       [8., 8., 8., 8., 8.],
       ...,
       [9., 9., 9., 9., 9.],
       [8., 8., 8., 8., 8.],
       [2., 3., 3., 8., 1.]], dtype=float32)

### (b) Train a classifier on this new training set. You have just trained a blender, and together with the classifiers they form a stacking ensemble!

In [36]:
rnd_forest_blender = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rnd_forest_blender.fit(X_val_predictions, y_val)

RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)

In [37]:
rnd_forest_blender.oob_score_

0.963

We might fine-tune this blender or try other types of blenders, then select the best one using cross-validation, as always.
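Selecting a blender by cross-validation could look like the sketch below. To stay self-contained it builds a synthetic stand-in for `X_val_predictions` (five simulated base classifiers, each right about 90% of the time), and the names `candidate_blenders`, `X_demo_predictions`, and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_samples, n_base = 600, 5

# Synthetic targets and simulated base-classifier predictions: each
# prediction equals the target with probability 0.9, else a random digit.
y_demo = rng.integers(0, 10, size=n_samples)
is_correct = rng.random((n_samples, n_base)) < 0.9
random_digits = rng.integers(0, 10, size=(n_samples, n_base))
X_demo_predictions = np.where(
    is_correct, y_demo[:, None], random_digits
).astype(np.float32)

candidate_blenders = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=2000),
}

# Mean 3-fold cross-validated accuracy for each candidate blender.
cv_scores = {
    name: cross_val_score(blender, X_demo_predictions, y_demo, cv=3).mean()
    for name, blender in candidate_blenders.items()
}
best_blender_name = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_blender_name)
```

Because the inputs are predicted class labels stored as numbers, a tree-based blender is a natural fit; a linear blender would treat the digit values as ordered quantities unless the predictions were one-hot encoded first.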

### (c) Now let’s evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. How does it compare to the voting classifier you trained earlier?

In [38]:
X_test_predictions = np.empty((len(X_test), len(estimators)), dtype=np.float32)

for index, estimator in enumerate(estimators):
    X_test_predictions[:, index] = estimator.predict(X_test)

In [39]:
y_pred = rnd_forest_blender.predict(X_test_predictions)

In [40]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.9607

This stacking ensemble (96.07% on the test set) does not perform as well as the soft voting classifier we trained earlier (97.10%); it is roughly on par with the best individual classifier, the MLP (96.26%).
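For comparison, scikit-learn (0.22 and later) also ships a built-in `StackingClassifier`. The sketch below is not the hold-out procedure used above: it trains the blender on cross-validated out-of-fold predictions and, for classifiers that support it, feeds class probabilities rather than hard labels. The 8x8 `load_digits` dataset and the small forests are assumptions chosen to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42
)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=42)),
        ("et", ExtraTreesClassifier(n_estimators=20, random_state=42)),
    ],
    # The blender is trained on out-of-fold predictions of the base models.
    final_estimator=LogisticRegression(max_iter=2000),
    cv=3,
)
stacking_clf.fit(X_tr, y_tr)

stacking_score = stacking_clf.score(X_te, y_te)
print(stacking_score)
```

Using out-of-fold predictions lets the base classifiers and the blender share the whole training set, instead of reserving a separate validation set for the blender as in the manual approach.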