# Chapter 7. Ensemble Learning and Random Forests

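Aggregating the predictions of a group of predictors (such as classifiers or regressors) often gives better
predictions than the best individual predictor. A group of predictors is called an *ensemble*; thus, this technique
is called *ensemble learning*, and an ensemble learning algorithm is called an *ensemble method*.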

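For example, you can train a group of decision tree classifiers, each on a different random subset of the training set, and then predict the class that gets the most votes. Such an ensemble of decision trees is called a *random forest*, and despite its simplicity, it is one of the most powerful machine learning algorithms available today.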
In this chapter we will examine the most popular ensemble methods, including:
1. voting classifiers,
2. bagging and pasting ensembles,
3. random forests,
4. boosting, and
5. stacking ensembles.

## Voting Classifiers
Even if each classifier is a weak learner (only slightly better than random
guessing), the ensemble can still be a strong learner. Suppose you build an
ensemble containing 1,000 classifiers that are individually correct only 51%
of the time. If you predict the majority-voted class, you can hope for up to
75% accuracy! However, this is only true if all classifiers are perfectly
independent, making uncorrelated errors, which is clearly not the case
because they are trained on the same data. They are likely to make the same
types of errors, so there will be many majority votes for the wrong class,
reducing the ensemble’s accuracy.
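
The 75% figure can be checked with the binomial distribution: if each of n independent classifiers is correct with probability p, the number of correct votes is binomial, and the majority vote is right whenever more than half of the votes are correct. The following is a minimal sketch using only the standard library. The helper name `majority_vote_accuracy` is made up for illustration, and it assumes perfectly independent classifiers, which real ensembles are not.

```python
from math import exp, lgamma, log


def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of independent classifiers is correct."""
    n, p = n_classifiers, p_correct
    total = 0.0
    # Sum the binomial probabilities P(k correct votes) for every k > n/2.
    for k in range(n // 2 + 1, n + 1):
        # Work in log space so the huge binomial coefficients do not overflow.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))
        total += exp(log_pmf)
    return total


print(f"{majority_vote_accuracy(1_000, 0.51):.3f}")
print(f"{majority_vote_accuracy(10_000, 0.51):.3f}")
```

With 1,000 classifiers the result is roughly 0.73 when a tie counts as a miss (about 0.75 if ties are counted as correct), and it keeps rising as more classifiers are added.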

**Ensemble methods work best when the predictors are as independent from one another as
possible.**
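
To get a feel for why independence matters, here is a small simulation sketch. It is a toy model, not something from the notes above: the function name `simulate_majority_accuracy` and the `shared_fraction` parameter are assumptions introduced for illustration. Each classifier either copies a single shared outcome or draws its own, so its individual accuracy stays at 51% while the errors become correlated.

```python
import random


def simulate_majority_accuracy(n_classifiers, p_correct, shared_fraction,
                               n_instances=1_000, seed=42):
    """Estimate majority-vote accuracy when classifiers partly share their errors.

    For each instance a single shared outcome is drawn. Each classifier copies
    it with probability `shared_fraction` and otherwise draws its own outcome,
    so every classifier is still correct with probability `p_correct`.
    """
    rng = random.Random(seed)
    majority_correct = 0
    for _instance in range(n_instances):
        shared_outcome = rng.random() < p_correct
        n_correct = 0
        for _clf in range(n_classifiers):
            if rng.random() < shared_fraction:
                n_correct += shared_outcome  # correlated vote
            else:
                n_correct += rng.random() < p_correct  # independent vote
        # The ensemble is right only if a strict majority voted correctly.
        majority_correct += n_correct > n_classifiers / 2
    return majority_correct / n_instances


independent = simulate_majority_accuracy(1_000, 0.51, shared_fraction=0.0)
correlated = simulate_majority_accuracy(1_000, 0.51, shared_fraction=0.5)
print(f"independent errors: {independent:.2f}")
print(f"correlated errors:  {correlated:.2f}")
```

In this toy model the independent ensemble should land well above the correlated one, which stays close to the accuracy of a single classifier even though each classifier is individually just as accurate in both runs.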

In [1]:
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

In [2]:
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [3]:
voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42)),
        ('rf', RandomForestClassifier(random_state=42)),
        ('svc', SVC(random_state=42))
    ]
)

In [4]:
voting_clf.fit(X_train, y_train)

In [5]:
for name, clf in voting_clf.named_estimators_.items():
    print(name, "=", clf.score(X_test, y_test))

lr = 0.864
rf = 0.896
svc = 0.896


In [6]:
voting_clf.predict(X_test[:1])

array([1], dtype=int64)

The voting classifier predicts class 1 for the first instance
of the test set, because two out of three classifiers predict that class:

In [7]:
[clf.predict(X_test[:1]) for clf in voting_clf.estimators_]

[array([1], dtype=int64), array([1], dtype=int64), array([0], dtype=int64)]

Now look at the performance of the voting classifier on the test set:

In [9]:
voting_clf.score(X_test, y_test)

0.912

If all classifiers are able to estimate class probabilities (i.e., if they all have a
predict_proba() method), then you can tell Scikit-Learn to predict the class
with the highest class probability, averaged over all the individual classifiers.
This is called **soft voting**. *It often achieves higher performance than hard
voting because it gives more weight to highly confident votes.* By default, SVC
does not estimate class probabilities, so its probability hyperparameter must be
set to True, which adds a predict_proba() method at the cost of slower training.
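
The difference between the two voting schemes can be seen on a single instance. The sketch below uses made-up probability estimates (they are not output from the classifiers trained above) and plain Python rather than Scikit-Learn, only to illustrate the arithmetic.

```python
from collections import Counter

# Hypothetical probability estimates [P(class 0), P(class 1)] from three
# classifiers for a single instance (made-up numbers, not model output).
probas = [
    [0.55, 0.45],  # weakly prefers class 0
    [0.55, 0.45],  # weakly prefers class 0
    [0.05, 0.95],  # strongly prefers class 1
]

# Hard voting: each classifier votes for its most likely class; majority wins.
votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
hard_prediction = Counter(votes).most_common(1)[0][0]

# Soft voting: average the probabilities per class, then pick the largest.
n_classes = len(probas[0])
mean_probas = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
soft_prediction = max(range(n_classes), key=mean_probas.__getitem__)

print("hard voting predicts class", hard_prediction)  # class 0 (two votes to one)
print("soft voting predicts class", soft_prediction)  # class 1 (mean probability ~0.62)
```

Here the single confident classifier outweighs the two hesitant ones under soft voting, while hard voting counts every vote equally.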

In [10]:
voting_clf.voting = "soft"
voting_clf.named_estimators["svc"].probability = True
voting_clf.fit(X_train, y_train)
voting_clf.score(X_test, y_test)

0.92