# Chapter 7: Ensemble Learning and Random Forests

## Exercises

### Q1. If you have trained five different models on the exact same training data and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

- Yes. Combining the models into a voting ensemble will often give better results than the best individual model, as long as the models are sufficiently different from one another. The more independent their errors are, the larger the gain. If the ensemble contains models of the same type, they are likely to make the same errors, so the overall result might not improve.
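The effect can be illustrated with a small simulation. This is only a sketch under an idealized assumption: the five models make fully independent errors on a binary task, which models trained on the same data will not achieve in practice, and the 95% figure is treated as per-instance accuracy for simplicity. The sample size and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 100_000   # hypothetical number of evaluation instances
n_models = 5
accuracy = 0.95       # each simulated model is right 95% of the time

# correct[i, j] is True when model j classifies instance i correctly.
# Assumption: errors are fully independent across models.
correct = rng.random((n_samples, n_models)) < accuracy

# On a binary task the majority vote is right whenever
# at least 3 of the 5 models are right.
majority_correct = correct.sum(axis=1) >= 3

single_acc = correct[:, 0].mean()
ensemble_acc = majority_correct.mean()

print(f"single model : {single_acc:.4f}")
print(f"majority vote: {ensemble_acc:.4f}")
```

With correlated errors the gain shrinks, which is why diversity among the models matters.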

### Q2. What is the difference between hard and soft voting classifiers?

- A hard voting classifier counts the class predicted by each classifier in the ensemble and picks the class with the most votes.
- A soft voting classifier averages the estimated class probabilities across all classifiers and picks the class with the highest average probability. This often gives better results because highly confident predictions carry more weight. However, it only works if every classifier can estimate class probabilities, and not all models support that.
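The difference is easiest to see on a single instance where the two schemes disagree. The probabilities below are made up for illustration; they stand in for what three hypothetical classifiers might return from `predict_proba()`.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE instance.
# Columns are [P(class 0), P(class 1)].
probas = np.array([
    [0.55, 0.45],   # classifier A: weakly prefers class 0
    [0.60, 0.40],   # classifier B: weakly prefers class 0
    [0.05, 0.95],   # classifier C: strongly prefers class 1
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the highest average.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("hard voting ->", hard_prediction)   # class 0 wins 2 votes to 1
print("soft voting ->", soft_prediction)   # class 1 wins on average probability
```

Two lukewarm votes outnumber one confident vote under hard voting, while soft voting lets the confident classifier tip the result.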

### Q3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forests or stacking ensembles?

- In bagging, each predictor is trained on a random sample of the training set drawn with replacement. Since each predictor is independent of the others, the predictors can be trained in parallel across multiple servers. The same holds for pasting ensembles and Random Forests.
- In boosting, each predictor tries to correct the mistakes made by its predecessor, so a predictor cannot be trained until the previous one has been trained and has made its predictions. Training is therefore sequential, and distributing it across multiple servers gains very little.
- Stacking trains a layer of predictors followed by one or more *blending* layers. The predictors within a given layer are independent of one another, so they can be trained in parallel across multiple servers, but only one layer at a time: the next layer can only be trained after the previous layer's predictions have been made.
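On a single machine, the same independence is what lets Scikit-Learn train bagged predictors in parallel through the `n_jobs` parameter. The sketch below uses a small synthetic dataset as a stand-in; it parallelizes across CPU cores, not across servers, so it only illustrates the principle.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    bootstrap=True,    # bagging; bootstrap=False would give pasting
    n_jobs=2,          # train the independent predictors in 2 parallel jobs
    random_state=42,
)
bag.fit(X, y)

print(len(bag.estimators_), "predictors trained")
```

Boosting estimators such as `AdaBoostClassifier` have no equivalent option for training their predictors in parallel, because each predictor depends on the previous one.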

### Q4. What is the benefit of out-of-bag evaluation?

- During bagging, not all training instances get sampled for any given predictor. With some math, it can be shown that only about 63% of the training instances are sampled on average for each predictor. The remaining 37% or so, called out-of-bag instances, were never seen by that predictor during training, so they can be used to evaluate it. This gives a fairly unbiased evaluation of the ensemble without the need for a separate validation set, which leaves more instances available for training.
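The 63% figure can be checked numerically. When `m` instances are drawn with replacement from a training set of size `m`, the chance that a given instance is never picked is `(1 - 1/m) ** m`, which approaches `1/e ≈ 0.37` as `m` grows. The sketch below compares that formula with one simulated bootstrap sample; the training set size is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 10_000  # hypothetical training set size

# One bootstrap sample: m indices drawn with replacement from range(m).
sample = rng.integers(0, m, size=m)

# Fraction of distinct training instances that made it into the sample.
in_bag_fraction = np.unique(sample).size / m

# Probability that a given instance is sampled at least once.
theoretical = 1 - (1 - 1 / m) ** m   # tends to 1 - 1/e as m grows

print(f"simulated in-bag fraction  : {in_bag_fraction:.3f}")
print(f"theoretical in-bag fraction: {theoretical:.3f}")
```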

### Q5. What makes Extra Trees more random than regular Random Forests? How can this extra randomness help? Are Extra Trees slower or faster than regular random forests?

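- During the training of Random Forests, only a random subset of features is considered for each split. Extra Trees go one step further: instead of searching for the best possible threshold for each feature, they use random thresholds. This extra randomness acts as a form of regularization, trading a little more bias for lower variance, so Extra Trees may perform better when a Random Forest overfits the training data. Since finding the best threshold is one of the most time-consuming parts of growing a tree, Extra Trees are faster to train than regular Random Forests. At prediction time they are neither faster nor slower.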

### Q6. If your AdaBoost ensemble underfits the training data, what hyperparameters should you tweak and how?

- If AdaBoost underfits the training data, increase *n_estimators* or reduce the regularization of the base estimator (for example, allow deeper trees). Slightly increasing the learning rate can also help.
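One way to see the effect of these hyperparameters is to compare a heavily constrained AdaBoost ensemble with a looser one on the same data. This is a rough sketch on a synthetic dataset; the specific values (5 vs. 200 estimators, depth 1 vs. 3, learning rate 0.1 vs. 1.0) are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Constrained ensemble: few decision stumps and a small learning rate.
constrained = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
)

# Looser ensemble: more estimators, a less regularized base estimator,
# and a higher learning rate.
loosened = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=1.0,
    random_state=42,
)

constrained_acc = constrained.fit(X, y).score(X, y)
loosened_acc = loosened.fit(X, y).score(X, y)

print("constrained, training accuracy:", constrained_acc)
print("loosened, training accuracy   :", loosened_acc)
```

Only training accuracy is compared here, since the question is about underfitting; a validation set is still needed to check that the looser ensemble does not overfit.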

### Q7. If your Gradient Boosting ensemble overfits the training data, should you increase or decrease the learning rate?

- The learning rate hyperparameter scales how much each estimator in the ensemble contributes. If the Gradient Boosting ensemble is overfitting, decrease the learning rate. Early stopping can also be used to find the right number of estimators, since an overfitting ensemble probably has too many.
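Both ideas can be combined in Scikit-Learn's `GradientBoostingClassifier`, which supports early stopping through `n_iter_no_change` and `validation_fraction`. The sketch below is only illustrative: the synthetic dataset and the specific values are assumptions, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic dataset, used here only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gb = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.05,       # lower rate: each tree contributes less
    validation_fraction=0.2,  # held out internally to monitor the loss
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=42,
)
gb.fit(X, y)

# Number of trees actually kept after early stopping.
print("trees used:", gb.n_estimators_)
```

A lower learning rate usually needs more trees to fit the training set, so the two hyperparameters are best tuned together.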

### Q8.

#### Load the MNIST data.

- 50,000 - Training
- 10,000 - Validation
- 10,000 - Testing

In [1]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

In [2]:
mnist = fetch_openml('mnist_784')

In [3]:
X_train_val, X_test, y_train_val, y_test = train_test_split(mnist.data,
                                                            mnist.target,
                                                            test_size=10000,
                                                            random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train_val,
                                                  y_train_val,
                                                  test_size=10000,
                                                  random_state=42)

In [4]:
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)

(50000, 784) (50000,)
(10000, 784) (10000,)
(10000, 784) (10000,)


#### Train various classifiers such as RandomForest, ExtraTrees and LogisticRegression

In [6]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score

In [7]:
# instantiate the classifiers to train
rf = RandomForestClassifier(random_state=42)
et = ExtraTreesClassifier(random_state=42)
log_reg = LogisticRegression(random_state=42)

In [None]:
classifiers = [rf, et, log_reg]

for classifier in classifiers:
    print("Training ", classifier)
    classifier.fit(X_train, y_train)

Training  RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators='warn',
                       n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)




Training  ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
                     max_depth=None, max_features='auto', max_leaf_nodes=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators='warn',
                     n_jobs=None, oob_score=False, random_state=42, verbose=0,
                     warm_start=False)




Training  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=42, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)




In [None]:
# predictor scores
for classifier in classifiers:
    print("Classifier :", classifier)
    print(classifier.score(X_val, y_val))



In [None]:
# ensemble
classifiers = [
    ('RandomForest', rf),
    ('ExtraTrees', et),
    ('LogisticRegression', log_reg)
]

In [None]:
from sklearn.ensemble import VotingClassifier

In [None]:
# hard voting ensemble
# VotingClassifier has no random_state parameter; seeds are set on the base estimators
hard_voting = VotingClassifier(classifiers,
                               voting='hard')

In [None]:
# soft voting ensemble
# VotingClassifier has no random_state parameter; seeds are set on the base estimators
soft_voting = VotingClassifier(classifiers,
                               voting='soft')

In [None]:
# fit
voting_ensembles = [hard_voting, soft_voting]

for ensemble in voting_ensembles:
    print("Fitting ensemble : ", ensemble)
    ensemble.fit(X_train, y_train)