**<font color='blue'>Q1: If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?</font>**

Yes! You can try <font color='crimson'>combining them into a voting ensemble</font>, which will often give you even better results. It works better if the models are very different (e.g., an SVM classifier, a Decision Tree classifier, a Logistic Regression classifier, and so on).

It is even better if they are trained on different training instances (that’s the whole point of bagging and pasting ensembles), but if not, this will still be effective as long as the models are very different.
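As a rough illustration of the idea above, the sketch below combines three deliberately different models into a hard voting ensemble. The dataset (`make_moons`), its size, and the choice of models are assumptions made for the example, not part of the original question.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, assumed purely for illustration
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three models from different families, all trained on the same data
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",
)
voting.fit(X_train, y_train)

# Compare each fitted member with the ensemble
for name, clf in voting.named_estimators_.items():
    print(name, clf.score(X_test, y_test))
print("voting", voting.score(X_test, y_test))
```

The ensemble is not guaranteed to beat every member on every dataset; the gain depends on how uncorrelated the members' errors are.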

**<font color='blue'>Q2: What is the difference between hard and soft voting classifiers?</font>**

- A hard voting classifier just counts the votes of each classifier in the ensemble and picks the class that gets the most votes.


- A soft voting classifier computes the average estimated class probability for each class and picks the class with the highest probability.

**<font color='crimson'>Soft voting gives high-confidence votes more weight and often performs better</font>**, but it works only if every classifier is able to estimate class probabilities (e.g., for the SVM classifiers in Scikit-Learn you must set `probability=True`).
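A small worked example may make the difference concrete. The probabilities below are made up for illustration: two classifiers lean weakly toward class B while a third is very confident about class A, so the two voting schemes disagree.

```python
# Hypothetical predicted probabilities for classes (A, B) from three classifiers
probas = [
    [0.45, 0.55],  # votes B, weakly
    [0.40, 0.60],  # votes B, weakly
    [0.95, 0.05],  # votes A, confidently
]
classes = ["A", "B"]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hard voting: each classifier casts one vote for its most likely class
votes = [classes[argmax(p)] for p in probas]
hard_winner = max(classes, key=votes.count)

# Soft voting: average the probabilities per class, then pick the highest average
avg = [sum(p[k] for p in probas) / len(probas) for k in range(len(classes))]
soft_winner = classes[argmax(avg)]

print("hard:", hard_winner)  # B wins two votes to one
print("soft:", soft_winner)  # A wins with an average probability of 0.6 vs 0.4
```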



**<font color='blue'>Q3: Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, Random Forests, or stacking ensembles?</font>**

It is <font color='crimson'>quite possible to speed up training of a bagging ensemble by distributing it across multiple servers</font>, since each predictor in the ensemble is independent of the others. The same goes <font color='crimson'>for pasting ensembles and Random Forests, for the same reason</font>.

However, each predictor in a <font color='crimson'>boosting ensemble</font> is built based on the previous predictor, so training is necessarily sequential, and you will not <font color='crimson'>gain anything by distributing training across multiple servers</font>. 

Regarding <font color='crimson'>stacking ensembles</font>, all the predictors in a given layer are independent of each other, so they can be trained in parallel on multiple servers. However, the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.
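The independence argument for bagging can be sketched as follows. Each tree is fit on its own bootstrap sample and needs nothing from the other trees, so the fits can be handed to separate workers. In this sketch, threads on one machine stand in for servers, and the helper `fit_one`, the dataset, and the number of trees are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary dataset (labels 0/1), assumed for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)


def fit_one(seed):
    """Fit one tree on its own bootstrap sample, independently of all other trees."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement (bagging)
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])


# Worker threads stand in for separate servers here
with ThreadPoolExecutor(max_workers=4) as pool:
    trees = list(pool.map(fit_one, range(20)))

# Aggregate by majority vote; with 0/1 labels the mean of the votes is enough
votes = np.stack([tree.predict(X) for tree in trees])
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged trees:", (bagged_pred == y).mean())
```

A boosting ensemble could not be written this way, because the sample weights or residuals used to fit predictor *k* depend on the fitted predictor *k - 1*.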

**<font color='blue'>Q4: What is the benefit of out-of-bag evaluation?</font>**

With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on (they were held out).

This makes it possible to **have a fairly unbiased evaluation of the ensemble without the need for an additional validation set**. Thus, you have **more instances available for training, and your ensemble can perform slightly better**.
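As a minimal sketch, assuming a toy dataset and a bagging ensemble of Decision Trees, the out-of-bag estimate is requested with `oob_score=True` and read from `oob_score_` after fitting. A held-out test set is kept here only to check that the two numbers are in the same range.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, assumed for illustration
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True is required: OOB instances only exist when sampling with replacement
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    random_state=42,
)
bag_clf.fit(X_train, y_train)

print("OOB estimate: ", bag_clf.oob_score_)
print("test accuracy:", bag_clf.score(X_test, y_test))
```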

**<font color='blue'>Q5: What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?</font>**

When you are growing a tree in a Random Forest, only a random subset of the features is considered for splitting at each node. This is true as well for Extra-Trees, but they go one step further: <font color='crimson'>rather than searching for the best possible thresholds, they use random thresholds for each feature</font>.

**<font color='crimson'>Extra randomness acts like a form of regularization: if a Random Forest overfits the training data, Extra-Trees might perform better.</font>** Moreover, since Extra-Trees don’t search for the best possible thresholds, they are <font color='crimson'>much faster to train than Random Forests</font>. However, they are <font color='crimson'>neither faster nor slower than Random Forests when making predictions</font>.
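One rough way to check the training-speed claim on your own machine is to time both classes on the same data. The dataset and sizes below are assumptions, and the timings will vary from run to run, so no expected numbers are given. Note also that Scikit-Learn's `ExtraTreesClassifier` defaults to `bootstrap=False` while `RandomForestClassifier` defaults to `bootstrap=True`, so this is not a perfectly controlled comparison.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Toy dataset, assumed for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

timings = {}
for cls in (RandomForestClassifier, ExtraTreesClassifier):
    clf = cls(n_estimators=100, random_state=42)
    start = time.perf_counter()
    clf.fit(X, y)
    timings[cls.__name__] = time.perf_counter() - start

for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f}s to train")
```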

**<font color='blue'>Q6: If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?</font>**

- increase the number of estimators


- reduce the regularization hyperparameters of the base estimator


- try slightly increasing the learning rate
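The three adjustments listed above can be sketched on a toy problem. The dataset and the specific hyperparameter values are assumptions chosen to make the contrast visible, not recommended settings. The base estimator is passed positionally because its keyword name differs between Scikit-Learn versions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, assumed for illustration
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# A heavily constrained ensemble that is likely to underfit:
# few estimators, decision stumps, and a small learning rate
underfit_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
)

# The same ensemble with each knob loosened:
# more estimators, a less regularized base estimator, a higher learning rate
adjusted_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)

scores = {}
for name, clf in [("underfit", underfit_clf), ("adjusted", adjusted_clf)]:
    clf.fit(X, y)
    scores[name] = clf.score(X, y)  # training accuracy, since underfitting is the concern
    print(name, scores[name])
```

Training accuracy is used on purpose here: the question is about fitting the training data better, and a validation set would still be needed to check that the adjusted model does not overfit.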

**<font color='blue'>Q7: If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?</font>**

<font color='crimson'>If the Gradient Boosting ensemble overfits the training set, you should try decreasing the learning rate.</font>

You could also use early stopping to find the right number of predictors (you probably have too many).
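A minimal sketch of both ideas, assuming a toy dataset: a fairly low `learning_rate` combined with Scikit-Learn's built-in early stopping (`n_iter_no_change` and `validation_fraction`), which stops adding trees once the score on an internal validation split stops improving. The specific values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy dataset, assumed for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gb_clf = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.05,       # lower rate: each tree contributes less
    n_iter_no_change=10,      # stop after 10 rounds without validation improvement
    validation_fraction=0.1,  # share of the training data held out for that check
    random_state=42,
)
gb_clf.fit(X, y)

print("trees requested:     ", gb_clf.n_estimators)
print("trees actually built:", gb_clf.n_estimators_)
```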

**<font color='blue'>Q8: Voting classifier</font>**

1. Load the MNIST data, and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing).

2. Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM classifier.

3. Next, try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?

In [1]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

np.random.seed(42)

In [2]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(np.uint8)

x_train, x_test, y_train, y_test = train_test_split(
    mnist['data'], mnist['target'], test_size=10000, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=10000, random_state=42)

x_train.shape, x_val.shape, x_test.shape

((50000, 784), (10000, 784), (10000, 784))

In [3]:
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

In [4]:
# Train various classifiers
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
extra_tree_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
linear_svc = LinearSVC(random_state=42)
mlp_clf = MLPClassifier(random_state=42)

estimators = [rf_clf, extra_tree_clf, linear_svc, mlp_clf]

for estimator in estimators:
    print('Train the', estimator)
    estimator.fit(x_train, y_train)

Train the RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)
Train the ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=100,
                     n_



Train the MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=42, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)


In [5]:
[estimator.score(x_val, y_val) for estimator in estimators]

[0.9692, 0.9715, 0.8641, 0.963]

The linear SVM is far outperformed by the other classifiers.

In [6]:
from sklearn.ensemble import VotingClassifier

<font color='crimson'>A hard voting classifier:</font>

In [7]:
# List of (str, estimator) tuples
named_estimators = [
    ('rf_clf', rf_clf),
    ('extra_tree_clf', extra_tree_clf),
    ('linear_svc', linear_svc),
    ('mlp_clf', mlp_clf)
]
voting_clf = VotingClassifier(estimators=named_estimators, voting='hard', n_jobs=-1)
voting_clf.fit(x_train, y_train)

VotingClassifier(estimators=[('rf_clf',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     max_samples=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
    

In [8]:
voting_clf.score(x_val, y_val)

0.9718

In [9]:
# estimators_: The collection of fitted sub-estimators as defined in estimators that are not ‘drop’.
[estimator.score(x_val, y_val) for estimator in voting_clf.estimators_]

[0.9692, 0.9715, 0.8641, 0.9663]

**Remove the SVM to see if the performance improves.**

In [10]:
voting_clf.set_params()

VotingClassifier(estimators=[('rf_clf',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     max_samples=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
    

Note: using `None` to drop an estimator is deprecated as of Scikit-Learn 0.22, and support will be dropped in 0.24. Use the string `'drop'` instead.

In [11]:
# An estimator can be set to `'drop'` using `set_params`.
voting_clf.set_params(linear_svc='drop')
# voting_clf.set_params(linear_svc=None)

VotingClassifier(estimators=[('rf_clf',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=None,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     max_samples=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
    

In [12]:
# The updated list of the estimators
voting_clf.estimators

[('rf_clf',
  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                         criterion='gini', max_depth=None, max_features='auto',
                         max_leaf_nodes=None, max_samples=None,
                         min_impurity_decrease=0.0, min_impurity_split=None,
                         min_samples_leaf=1, min_samples_split=2,
                         min_weight_fraction_leaf=0.0, n_estimators=100,
                         n_jobs=None, oob_score=False, random_state=42, verbose=0,
                         warm_start=False)),
 ('extra_tree_clf',
  ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0,

**However, it did not update the list of trained estimators.**

In [13]:
voting_clf.estimators_

[RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                        criterion='gini', max_depth=None, max_features='auto',
                        max_leaf_nodes=None, max_samples=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, n_estimators=100,
                        n_jobs=None, oob_score=False, random_state=42, verbose=0,
                        warm_start=False),
 ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                      criterion='gini', max_depth=None, max_features='auto',
                      max_leaf_nodes=None, max_samples=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs

**So we can either fit the `VotingClassifier` again, or just remove the SVM from the list of trained estimators:**

In [14]:
# Remove the linear SVM from the list of trained estimators
del voting_clf.estimators_[2]

In [15]:
# Evaluate the Voting Clf again
voting_clf.score(x_val, y_val)

0.9742

A bit better! The SVM was hurting performance.

<font color='crimson'>A soft voting classifier:</font>

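Try using a soft voting classifier. We do not actually need to retrain the classifier; we can just set `voting` to `'soft'`. This works here because the remaining estimators all implement `predict_proba` (`LinearSVC`, which we removed, does not).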

In [16]:
voting_clf.voting = 'soft'

In [17]:
voting_clf.estimators

[('rf_clf',
  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                         criterion='gini', max_depth=None, max_features='auto',
                         max_leaf_nodes=None, max_samples=None,
                         min_impurity_decrease=0.0, min_impurity_split=None,
                         min_samples_leaf=1, min_samples_split=2,
                         min_weight_fraction_leaf=0.0, n_estimators=100,
                         n_jobs=None, oob_score=False, random_state=42, verbose=0,
                         warm_start=False)),
 ('extra_tree_clf',
  ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0,

In [18]:
voting_clf.estimators_

[RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                        criterion='gini', max_depth=None, max_features='auto',
                        max_leaf_nodes=None, max_samples=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, n_estimators=100,
                        n_jobs=None, oob_score=False, random_state=42, verbose=0,
                        warm_start=False),
 ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                      criterion='gini', max_depth=None, max_features='auto',
                      max_leaf_nodes=None, max_samples=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs

In [19]:
voting_clf.score(x_val, y_val)

0.9709

Hard voting wins in this case.

In [20]:
# Evaluate the best one (hard voting) on the test set
voting_clf.voting = 'hard'
voting_clf.score(x_test, y_test)

0.9706

In [21]:
[estimator.score(x_test, y_test) for estimator in voting_clf.estimators_]

[0.9645, 0.9691, 0.9636]

The voting classifier only very slightly reduced the error rate of the best model in this case.

**<font color='blue'>Q9: Stacking ensemble</font>**

- <font color='crimson'>Step 1:</font> Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class.

In [22]:
estimators

[RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                        criterion='gini', max_depth=None, max_features='auto',
                        max_leaf_nodes=None, max_samples=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, n_estimators=100,
                        n_jobs=None, oob_score=False, random_state=42, verbose=0,
                        warm_start=False),
 ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                      criterion='gini', max_depth=None, max_features='auto',
                      max_leaf_nodes=None, max_samples=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs

In [23]:
x_val_preds = np.empty(shape=(len(x_val), len(estimators)), dtype=np.float32)

for index, estimator in enumerate(estimators):
    x_val_preds[:, index] = estimator.predict(x_val)

In [24]:
x_val_preds

array([[5., 5., 5., 5.],
       [8., 8., 8., 8.],
       [2., 2., 2., 2.],
       ...,
       [7., 7., 7., 7.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]], dtype=float32)

- <font color='crimson'>Step 2:</font> Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers it forms a stacking ensemble!

In [25]:
rnd_blender = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rnd_blender.fit(x_val_preds, y_val)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=200,
                       n_jobs=None, oob_score=True, random_state=42, verbose=0,
                       warm_start=False)

In [26]:
rnd_blender.oob_score_

0.969

- <font color='crimson'>Step 3:</font> Evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. 

In [27]:
x_test_preds = np.empty(shape=(len(x_test), len(estimators)), dtype=np.float32)

for idx, estimator in enumerate(estimators):
    x_test_preds[:, idx] = estimator.predict(x_test)

y_test_pred = rnd_blender.predict(x_test_preds)

In [28]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_test_pred)  # stacking ensemble

0.967

In [29]:
[estimator.score(x_test, y_test) for estimator in estimators]  # individual classifiers

[0.9645, 0.9691, 0.8642, 0.9573]

In [30]:
voting_clf.score(x_test, y_test)  # hard voting classifier

0.9706

This stacking ensemble does not perform as well as the voting classifier we trained earlier, and it is not quite as good as the best individual classifier either.
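For comparison, Scikit-Learn (0.22 and later) also provides a `StackingClassifier` that automates these steps. The sketch below is not the approach used above: it trains the blender on cross-validated predictions rather than on a separate validation set, and by default it feeds the blender class probabilities rather than hard class labels. The small `load_digits` dataset, the two base estimators, and the Logistic Regression blender are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small 8x8 digits dataset, used here instead of MNIST to keep the run short
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the blender
    cv=3,  # out-of-fold predictions become the blender's training set
)
stacking_clf.fit(X_train, y_train)

stacking_score = stacking_clf.score(X_test, y_test)
print("stacking test accuracy:", stacking_score)
```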