## Ensemble Learning
If you aggregate the predictions of a group of predictors (such as classifiers or
regressors), you will often get better predictions than with the best individual
predictor. A group of predictors is called an **ensemble**; thus, this technique is
called **Ensemble Learning**, and an Ensemble Learning algorithm is called an
**Ensemble method**.

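For example, you can train a group of Decision Tree classifiers, each on a different random subset of the training set, and predict the class that gets the most votes. Such an ensemble of Decision Trees is called a **Random Forest**.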

Popular Ensemble methods include **bagging, boosting,** and **stacking**.

## Voting Classifiers

<br>

<img src="images/hard_voting_clf.jpg" width='600' />

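A voting classifier aggregates the predictions of several classifiers. With **hard voting**,
it predicts the class that gets the most votes. A voting classifier often achieves a higher
accuracy than the best classifier in the ensemble.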

Suppose you build an ensemble containing 1,000 classifiers that are individually
correct only 51% of the time (barely better than random guessing). If you predict
the majority-voted class, you can hope for roughly 75% accuracy! However, this is
only true if all classifiers are perfectly independent and make uncorrelated errors,
which is clearly not the case here, since they are trained on the same data. They are
likely to make the same types of errors, so there will be many majority votes for the
wrong class, reducing the ensemble’s accuracy.
- Ensemble methods work best when the predictors are as independent
from one another as possible. One way to get diverse classifiers
is to train them using very different algorithms. This increases the
chance that they will make very different types of errors, improving
the ensemble’s accuracy.
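
The 51% example above can be checked with a short binomial calculation. This is a minimal sketch that assumes perfectly independent classifiers with identical accuracy, which is the idealized case described in the text; the helper name `majority_vote_accuracy` is hypothetical and introduced only for illustration.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p_correct, ties_count=True):
    """Probability that the majority vote of independent classifiers is correct.

    Assumes every classifier is right with the same probability and that
    their errors are fully independent (the idealized case).
    """
    half = n_classifiers / 2
    total = 0.0
    for k in range(n_classifiers + 1):
        # k is the number of classifiers that voted for the correct class.
        wins = k >= half if ties_count else k > half
        if wins:
            total += comb(n_classifiers, k) * p_correct**k * (1 - p_correct) ** (n_classifiers - k)
    return total


with_ties = majority_vote_accuracy(1000, 0.51, ties_count=True)
strict = majority_vote_accuracy(1000, 0.51, ties_count=False)
print(f"ties counted as correct: {with_ties:.3f}")
print(f"strict majority only:    {strict:.3f}")
```

Under these assumptions both values land in the low-to-mid 70% range, and the result depends slightly on whether a 500/500 tie is counted as a correct vote. Real classifiers trained on the same data are correlated, so the actual gain is usually smaller.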

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [1]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier


In [2]:
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

In [4]:
voting_clf = VotingClassifier(estimators= [('lr', log_clf), ('rf', rnd_clf), ('svm', svm_clf)], 
                              voting='hard')
voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()), ('svm', SVC())])

In [7]:
# Let’s look at each classifier’s accuracy on the test set

from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.888
SVC 0.896
VotingClassifier 0.904


If all classifiers are able to estimate class probabilities (i.e., they have a
predict_proba() method), then we can predict the class with the
highest class probability, averaged over all the individual classifiers. This is called **soft
voting.** <br>
It often achieves higher performance than hard voting because it gives more
weight to highly confident votes.
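
As a sketch of soft voting on the same moons dataset, the three classifiers used earlier can be combined with voting="soft". One detail is an addition not stated in the text: SVC only exposes predict_proba() when it is created with probability=True. The random_state values and the names `soft_voting_clf` and `soft_acc` are choices made for this illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# SVC does not provide predict_proba() unless probability=True is set,
# which adds an extra probability-calibration step during training.
soft_voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
soft_voting_clf.fit(X_train, y_train)

# The ensemble averages the class probabilities and picks the largest one.
soft_acc = accuracy_score(y_test, soft_voting_clf.predict(X_test))
print("soft voting accuracy:", soft_acc)
```

Whether soft voting beats hard voting depends on the data and on how well calibrated the individual probability estimates are, so it is worth comparing both on a validation set.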


## Bagging and Pasting

One way to get a diverse set of classifiers is to use very different training algorithms. <br>
**Bagging and pasting:** Another approach is to use the same training algorithm for every predictor, but to train them on **different random subsets of the training set.**
- When sampling is performed with replacement, this method is called **bagging** (short for
bootstrap aggregating). <br>
- When sampling is performed without replacement, it is called **pasting.**

**Sample with replacement:** The sample values are independent: the outcome of the first draw does not affect the probability of the outcome of the second draw. Mathematically, this implies that the covariance between the two is zero.

<br>

When we **sample without replacement**, the items in the sample are dependent because the outcome of one random draw is affected by the previous draw.
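
The difference between the two sampling schemes can be illustrated with NumPy. This is a small sketch, not what BaggingClassifier does internally; the array `indices` is a stand-in for the indices of ten training instances.

```python
import numpy as np

rng = np.random.default_rng(42)
indices = np.arange(10)  # stand-in for the indices of 10 training instances

# Bagging-style draw: an index can be picked more than once.
with_replacement = rng.choice(indices, size=10, replace=True)

# Pasting-style draw: each index can be picked at most once.
without_replacement = rng.choice(indices, size=6, replace=False)

print("with replacement:   ", np.sort(with_replacement))
print("without replacement:", np.sort(without_replacement))
```

The draw without replacement never contains duplicates, while the draw with replacement usually repeats some indices and leaves others out.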

<img src="images/bagging_1.png" width='700' />

In [12]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

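# The estimator is passed positionally: the keyword was renamed from
# base_estimator to estimator in newer Scikit-Learn releases.
bag_clf = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=500,   # train 500 Decision Trees
                            max_samples=100,    # each on 100 training instances
                            bootstrap=True,     # sampled with replacement (bagging)
                            n_jobs=-1)          # use all available CPU cores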

In [13]:
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)

In [14]:
y_pred

array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
       0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int64)

The BaggingClassifier automatically performs soft voting
instead of hard voting if the base classifier can estimate class probabilities
(i.e., if it has a predict_proba() method), which is the case
with Decision Tree classifiers.
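
To make the soft-voting behaviour concrete, the sketch below averages the class probabilities of the individual trees by hand and compares the result with the ensemble's own predict_proba(). It assumes every tree saw both classes during training and was trained on all features (the default settings); the names `manual_proba` and `manual_pred` are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bag_clf.fit(X_train, y_train)

# Average the class probabilities of the individual trees by hand.
manual_proba = np.mean([tree.predict_proba(X_test) for tree in bag_clf.estimators_], axis=0)

# Soft voting: pick the class with the highest averaged probability.
manual_pred = bag_clf.classes_[np.argmax(manual_proba, axis=1)]

print("probabilities match:", np.allclose(manual_proba, bag_clf.predict_proba(X_test)))
print("share of identical predictions:", np.mean(manual_pred == bag_clf.predict(X_test)))
```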

## Out-of-Bag Evaluation

With bagging, some instances may be sampled several times for any given predictor,
while others may not be sampled at all. <br>
The remaining training instances that are not sampled are called out-of-bag (oob) instances. <br>
Since a predictor never sees the oob instances during training, it can be evaluated on
these instances, without the need for a separate validation set. We can evaluate the
ensemble itself by averaging out the oob evaluations of each predictor.
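
How many instances end up out-of-bag can be estimated with a short calculation. The sketch assumes each predictor draws m instances with replacement from a training set of size m (the BaggingClassifier default when max_samples is left at 1.0); the helper name `expected_sampled_fraction` is hypothetical.

```python
def expected_sampled_fraction(m):
    """Expected fraction of distinct instances in a bootstrap sample of size m
    drawn with replacement from m training instances."""
    # A single draw misses a given instance with probability 1 - 1/m,
    # so all m draws miss it with probability (1 - 1/m) ** m.
    return 1 - (1 - 1 / m) ** m


for m in (10, 100, 375, 10_000):
    sampled = expected_sampled_fraction(m)
    print(f"m={m:>6}: sampled ~{sampled:.3f}, out-of-bag ~{1 - sampled:.3f}")
```

As m grows, the sampled fraction approaches 1 - 1/e, so on average about 63% of the training instances are sampled for each predictor and about 37% are out-of-bag.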


In [15]:
# automatic oob evaluation after training
bag_clf = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=500,
                            bootstrap=True, n_jobs=-1, oob_score=True)

bag_clf.fit(X_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=500,
                  n_jobs=-1, oob_score=True)

In [16]:
bag_clf.oob_score_

0.9013333333333333

In [17]:
# Let’s compare the oob estimate with the accuracy on the test set

from sklearn.metrics import accuracy_score
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.92

In [18]:
# The oob decision function for each training instance is also available through the oob_decision_function_ attribute.
bag_clf.oob_decision_function_

array([[0.39411765, 0.60588235],
       [0.39325843, 0.60674157],
       [1.        , 0.        ],
       [0.        , 1.        ],
       [0.00546448, 0.99453552],
       [0.08121827, 0.91878173],
       [0.32967033, 0.67032967],
       [0.015625  , 0.984375  ],
       [0.98895028, 0.01104972],
       [0.96825397, 0.03174603],
       [0.81578947, 0.18421053],
       [0.        , 1.        ],
       [0.79234973, 0.20765027],
       [0.83236994, 0.16763006],
       [0.95212766, 0.04787234],
       [0.05847953, 0.94152047],
       [0.        , 1.        ],
       [0.97752809, 0.02247191],
       [0.94767442, 0.05232558],
       [0.98930481, 0.01069519],
       [0.01932367, 0.98067633],
       [0.28961749, 0.71038251],
       [0.87628866, 0.12371134],
       [1.        , 0.        ],
       [0.94886364, 0.05113636],
       [0.        , 1.        ],
       [1.        , 0.        ],
       [1.        , 0.        ],
       [0.        , 1.        ],
       [0.61340206, 0.38659794],
       [0.

## Random Patches and Random Subspaces
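The BaggingClassifier class supports sampling the features as well. This is controlled
by two hyperparameters:
- max_features
- bootstrap_features

They work the same way as max_samples and bootstrap, but for feature sampling instead of instance sampling.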

Thus, each predictor will be trained on a random subset of the input features.
This is particularly useful when you are dealing with high-dimensional inputs (such
as images). <br>
Sampling both training instances and features is called the **Random
Patches method**. <br>
Keeping all training instances (i.e., bootstrap=False and max_samples=1.0)
but sampling features (i.e., bootstrap_features=True and/or max_features
smaller than 1.0) is called the **Random Subspaces method**.
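
Both methods can be sketched with BaggingClassifier. The example below uses a synthetic 20-feature dataset from make_classification instead of the two-feature moons data, so that feature sampling has something to choose from; the dataset, the sampling ratios, and the names `patches_clf` and `subspaces_clf` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20 features (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=42)

# Random Patches: sample both training instances and features.
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=0.7, bootstrap=True,
    max_features=0.5, bootstrap_features=True,
    random_state=42,
)

# Random Subspaces: keep all training instances, sample only the features.
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=1.0, bootstrap=False,
    max_features=0.5, bootstrap_features=False,
    random_state=42,
)

for name, clf in (("random patches", patches_clf), ("random subspaces", subspaces_clf)):
    clf.fit(X, y)
    # estimators_features_ lists the feature indices each predictor was given.
    print(name, "- features of first predictor:", clf.estimators_features_[0])
```

With max_features=0.5, each predictor receives 10 of the 20 features. When bootstrap_features=True the features are drawn with replacement, so the same feature index can appear more than once.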