# Exercises Chapter 7 - Ensemble Learning and Random Forests

1. If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?





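Yes, there is a chance. Combining the five models into a voting ensemble can outperform each individual model, provided their errors are independent or at least not too strongly correlated.

Because all five models are already very accurate and were trained on the same data, their errors are likely to be correlated, so large improvements are unlikely, though not impossible. The gain tends to be larger when the models are very different from one another (e.g. an SVM, a Decision Tree, and a Logistic Regression classifier).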
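
To get a feel for the potential gain, here is a back-of-the-envelope calculation. It is only a sketch under strong assumptions that are not part of the exercise: a binary task, the 95% figure treated as accuracy, and fully independent errors. Models trained on the same data will rarely satisfy the last assumption, so the number is optimistic. `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes a binary task and statistically independent errors.
    """
    min_votes = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(min_votes, n_models + 1)
    )


# Five independent models that are each right 95% of the time
print(round(majority_vote_accuracy(5, 0.95), 4))  # 0.9988
```

The more the errors are correlated, the closer the ensemble falls back towards the 95% of a single model.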

2. What is the difference between hard and soft voting classifiers?

*Hard voting*: Each model votes for one class, and the ensemble predicts the class that receives the most votes.

*Soft voting*: The predicted class probabilities are averaged across all models, and the ensemble predicts the class with the highest average probability. This requires every model to be able to estimate class probabilities (for an SVM in Scikit-Learn, this means setting `probability=True`).

Soft voting has the advantage that confident predictions automatically carry more weight than uncertain ones.
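
A tiny numeric example can make the difference concrete. The probabilities below are made up for illustration: two classifiers lean slightly towards class 1, while a third is very confident in class 0.

```python
import numpy as np

# Rows: classifiers, columns: estimated probability of class 0 and class 1
# (illustrative values, not taken from any trained model)
proba = np.array([
    [0.45, 0.55],
    [0.45, 0.55],
    [0.95, 0.05],
])

# Hard voting: each classifier casts one vote for its most likely class
hard_votes = proba.argmax(axis=1)
hard_prediction = np.bincount(hard_votes, minlength=proba.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class
soft_prediction = proba.mean(axis=0).argmax()

print(hard_prediction, soft_prediction)  # 1 0
```

Hard voting follows the two lukewarm votes, whereas soft voting lets the single confident classifier tip the average towards class 0.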

3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, Random Forests, or stacking ensembles?

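Boosting is sequential: each learner is trained to correct the errors of its predecessors, so training cannot be sped up by distributing it across multiple servers. In bagging ensembles, pasting ensembles, and Random Forests, every learner is trained independently of the others, so training can be distributed. In a stacking ensemble, the learners within a layer are independent of each other and can be trained in parallel, but the blender can only be trained once the learners it builds on have been trained.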
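
The independence of bagged learners can be sketched by hand. The example below is an illustration, not how Scikit-Learn distributes work internally: it uses a thread pool on one machine as a stand-in for multiple servers, a small synthetic binary dataset, and a hypothetical helper `fit_one_tree`.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary problem (labels are 0 and 1)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)


def fit_one_tree(seed: int) -> DecisionTreeClassifier:
    """Fit one tree on its own bootstrap sample; it needs nothing from other trees."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])


# Every call is independent, so the work could equally go to processes or servers
with ThreadPoolExecutor(max_workers=4) as pool:
    trees = list(pool.map(fit_one_tree, range(10)))

# Aggregate with a majority vote (ties go to class 1)
votes = np.stack([tree.predict(X) for tree in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Training accuracy:", (ensemble_pred == y).mean())
```

A boosting loop cannot be split up this way, because the training set (or targets) for learner *k* depends on the predictions of learners 1 to *k-1*.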

4. What is the benefit of out-of-bag evaluation?

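Each predictor in a bagging ensemble is trained on a bootstrap sample, so some training instances are left out of that sample (the out-of-bag instances). Evaluating each predictor on its out-of-bag instances gives a "free" out-of-sample estimate of the generalization error, without setting aside an additional hold-out set.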
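
In Scikit-Learn, the out-of-bag estimate is available by setting `oob_score=True`. The following is a minimal sketch on a synthetic dataset; the dataset and hyperparameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# bootstrap=True (the default) is required for out-of-bag evaluation
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X, y)

# Accuracy estimated only from instances each tree did not see during training
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
```

All 1,000 instances are used for training, yet `oob_score_` still behaves like a validation score.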

5. What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?

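Extra-Trees use a random threshold for each candidate feature at every node, rather than searching for the best threshold as regular Random Forests do. The extra randomness creates a more diverse set of trees, which acts as a form of regularization: it trades a slightly higher bias for a lower variance and can help when a Random Forest overfits. Extra-Trees are much faster to train than Random Forests, because searching for the optimal threshold is one of the most time-consuming parts of growing a tree. At prediction time they are neither faster nor slower.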
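
The difference between the two split strategies can be sketched for a single feature. This is a simplified illustration with made-up data, not Scikit-Learn's actual implementation; the helper names `gini`, `split_impurity`, `best_threshold`, and `random_threshold` are hypothetical.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of an array of non-negative integer class labels."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return float(1.0 - np.sum(p**2))


def split_impurity(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    """Weighted Gini impurity of the two children produced by x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def best_threshold(x: np.ndarray, y: np.ndarray) -> float:
    """Random-Forest style: scan every midpoint and keep the purest split."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    return float(min(candidates, key=lambda t: split_impurity(x, y, t)))


def random_threshold(x: np.ndarray, rng: np.random.Generator) -> float:
    """Extra-Trees style: draw one threshold uniformly, with no search."""
    return float(rng.uniform(x.min(), x.max()))


x = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
rng = np.random.default_rng(0)

print(best_threshold(x, y))  # 7.0
print(random_threshold(x, rng))
```

The exhaustive search evaluates every candidate split, while the random draw costs a single call, which is where the training-time saving comes from.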

6. If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?

Underfitting indicates that the model's capacity is insufficient to capture the structure of the training data. I would increase the number of learners, increase the capacity of the base learner (e.g. for a tree-based learner, increase the depth of each tree or otherwise reduce its regularization), and slightly increase the learning rate.
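
The sketch below compares a heavily constrained AdaBoost ensemble with a more flexible one (more learners, deeper trees, higher learning rate) on a synthetic dataset. The dataset and all hyperparameter values are arbitrary choices for illustration. The base learner is passed positionally because the keyword was renamed between Scikit-Learn versions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Constrained: few decision stumps and small steps, likely to underfit
constrained = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
).fit(X, y)

# Flexible: more learners, deeper trees, larger learning rate
flexible = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=1.0,
    random_state=42,
).fit(X, y)

print("Constrained training accuracy:", constrained.score(X, y))
print("Flexible training accuracy:   ", flexible.score(X, y))
```

Training accuracy is used here on purpose, since underfitting shows up on the training set itself; a validation set would still be needed to check that the flexible model does not overfit.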

7. If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?

Overfitting implies that more regularization is needed. Hence the learning rate should be decreased. The capacity of the learners can also be reduced, and early stopping can be used to find the right number of learners (there are probably too many).
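
One way to combine a lower learning rate with early stopping in Scikit-Learn is the `n_iter_no_change` option of `GradientBoostingClassifier`. This is a minimal sketch on synthetic data; the specific values are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbrt = GradientBoostingClassifier(
    learning_rate=0.05,       # smaller steps: each tree contributes less
    n_estimators=500,         # upper limit on the number of trees
    max_depth=2,              # shallow trees limit the capacity of each learner
    validation_fraction=0.1,  # held out internally to monitor the loss
    n_iter_no_change=10,      # stop once the validation loss stalls for 10 rounds
    random_state=42,
)
gbrt.fit(X, y)

print("Trees actually trained:", gbrt.n_estimators_)
```

A lower learning rate usually needs more trees to fit the data, which is why it is paired with a generous `n_estimators` limit and early stopping rather than a fixed small number of trees.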

8. Load the MNIST data (introduced in Chapter 3), and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing). Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM classifier. Next, try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?

In [34]:
import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

In [4]:
mnist = fetch_openml('mnist_784', version=1)

In [14]:
X = mnist.data / 256.
y = mnist.target.astype(np.uint8)

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000)

In [37]:
rf = RandomForestClassifier(n_jobs = -1)
_ = rf.fit(X_train,y_train)

In [37]:
y_pred = rf.predict(X_val)
print('F1 Score: {:4.4f}'.format(f1_score(y_val, y_pred, average='macro')))

F1 Score: 0.9660


In [38]:
et = ExtraTreesClassifier(n_jobs = -1)
_ = et.fit(X_train, y_train)

In [39]:
y_pred = et.predict(X_val)
print('F1 Score: {:4.4f}'.format(f1_score(y_val, y_pred, average='macro')))

F1 Score: 0.9694


In [41]:
svm = SVC(probability=True)
_ = svm.fit(X_train, y_train)

In [42]:
y_pred = svm.predict(X_val)
print('F1 Score: {:4.4f}'.format(f1_score(y_val, y_pred, average='macro')))

F1 Score: 0.9783


In [43]:
clf = VotingClassifier(
    estimators=[('rf', rf),('et', et),('svm', svm)],
    voting='hard', n_jobs=-1)
_ = clf.fit(X_train, y_train)

In [53]:
y_pred = clf.predict(X_val)
print('F1 Score: {:4.4f}'.format(f1_score(y_val, y_pred, average='macro')))

F1 Score: 0.9733


In [45]:
clf.voting = 'soft'

In [51]:
y_pred = clf.predict(X_val)
print('F1 Score: {:4.4f}'.format(f1_score(y_val, y_pred, average='macro')))

F1 Score: 0.9789


In [52]:
clf.voting = 'hard'
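
The exercise also asks for an evaluation on the test set, which the cells above do not include. The sketch below shows one way to do it. To stay small and self-contained it uses Scikit-Learn's 8x8 `load_digits` dataset as a stand-in for MNIST, so its scores are not comparable to the validation scores above; the same pattern should apply to `X_test` and `y_test` from the MNIST split.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for MNIST: 1,797 digit images of 8x8 pixels with values 0..16
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

voting = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("et", ExtraTreesClassifier(random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
voting.fit(X_train, y_train)

# The sub-estimators are fitted on label-encoded targets; for digit labels
# 0..9 the encoding leaves the labels unchanged, so they can be scored directly.
test_scores = {
    name: f1_score(y_test, model.predict(X_test), average="macro")
    for name, model in voting.named_estimators_.items()
}
test_scores["voting"] = f1_score(y_test, voting.predict(X_test), average="macro")

for name, score in test_scores.items():
    print(f"{name:>6}: {score:.4f}")
```

The test set should be touched only once, after the ensemble has been chosen on the validation set, so that the reported score remains an honest estimate.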

9. Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class. Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers it forms a stacking ensemble! Now evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. How does it compare to the voting classifier you trained earlier?
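
The procedure described in the exercise can be sketched as follows. This is an illustration under assumptions, not a worked answer: it uses the small `load_digits` dataset as a stand-in for MNIST, a Random Forest as the blender, and a hypothetical helper `stack_predictions`, so no conclusion about MNIST should be drawn from its score.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

# First layer: base classifiers trained on the training set only
base_models = [
    RandomForestClassifier(random_state=42),
    ExtraTreesClassifier(random_state=42),
    SVC(random_state=42),
]
for model in base_models:
    model.fit(X_train, y_train)


def stack_predictions(models, X):
    """Build the blender's input: one row per instance, one column per model."""
    return np.column_stack([model.predict(X) for model in models])


# Blender: trained on the base models' predictions for the validation set
blender = RandomForestClassifier(n_estimators=200, random_state=42)
blender.fit(stack_predictions(base_models, X_val), y_val)

# Evaluation: base models predict first, then the blender combines them
y_pred = blender.predict(stack_predictions(base_models, X_test))
stacking_f1 = f1_score(y_test, y_pred, average="macro")
print(f"Stacking F1 on the test set: {stacking_f1:.4f}")
```

Training the blender on validation-set predictions matters: the base models have not seen those instances, so their predictions there reflect how they behave on new data. Scikit-Learn also provides `StackingClassifier`, which builds the blender's training set from cross-validated predictions instead of a hold-out set.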