# Ensemble Learning and Random Forests

In [3]:
from pathlib import Path

import matplotlib.pyplot as plt

# Directory where this chapter's figures are saved; created if missing.
IMAGES_PATH = Path() / "images" / "ensembles"
IMAGES_PATH.mkdir(parents=True, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    """Save the current Matplotlib figure under IMAGES_PATH."""
    path = IMAGES_PATH / f"{fig_id}.{fig_extension}"
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

## Voting Classifiers

**Voting classifiers** combine the predictions of several models so that the ensemble often performs better than its best individual member. There are two main variants: _Hard Voting_ and _Soft Voting_.

**Hard Voting** aggregates the predictions of the models: each model votes for a class, and the class that receives the most votes is the _ensemble's prediction_.


<div class="alert alert-block alert-info">
<b>Note:</b> Even if each model is a <i>weak learner</i> (only slightly better than random guessing), the ensemble can still be a <i>strong learner</i> (achieving high accuracy), provided there are enough <i>weak learners</i> and they are sufficiently diverse.
</div>
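The effect described in the note can be illustrated with a small simulation. This is only a sketch under an idealized assumption: every learner is correct with the same probability and their errors are fully independent, which real models trained on the same data never quite achieve. The ensemble size, sample count, and 51% accuracy are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_learners = 1000   # hypothetical ensemble size
n_samples = 2000    # hypothetical number of test instances
p_correct = 0.51    # each weak learner is right 51% of the time (assumption)

# correct[i, j] is True when learner j classifies instance i correctly.
# Errors are drawn independently, which is the idealized assumption here.
correct = rng.random((n_samples, n_learners)) < p_correct

# The majority vote is right when more than half of the learners are right.
majority_correct = correct.sum(axis=1) > n_learners / 2

single_accuracy = correct[:, 0].mean()
ensemble_accuracy = majority_correct.mean()
print(f"single learner: {single_accuracy:.3f}")
print(f"majority vote:  {ensemble_accuracy:.3f}")
```

Under these assumptions the majority vote should come out clearly more accurate than any single learner. With correlated errors, as in practice, the gain is smaller.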


Scikit-Learn provides the `VotingClassifier` class for this. When fitted, it **clones the estimators and fits the clones**, leaving the originals untouched. The fitted clones are available via `estimators_` or `named_estimators_`, and the original estimators via `named_estimators`.

In [6]:
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42)),
        ('rf', RandomForestClassifier(random_state=42)),
        ('svc', SVC(random_state=42))
    ]
)
voting_clf.fit(X_train, y_train)

The individual performance of the classifiers is:

In [8]:
for name, clf in voting_clf.named_estimators_.items():
    print(name, "=", clf.score(X_test, y_test))

lr = 0.864
rf = 0.896
svc = 0.896


In [9]:
voting_clf.predict(X_test[:1])

array([1], dtype=int64)

And the voting performance is:

In [11]:
voting_clf.score(X_test, y_test)

0.912

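If all the classifiers can estimate class probabilities (that is, they have a `predict_proba()` method), the ensemble can average these probabilities and predict the class with the highest average. This is **Soft Voting**, and it gives more weight to highly confident votes. `SVC` does not estimate probabilities by default, so its `probability` hyperparameter must be set to `True`.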

In [13]:
voting_clf.voting = "soft"
voting_clf.named_estimators["svc"].probability = True
voting_clf.fit(X_train, y_train)
voting_clf.score(X_test, y_test)

0.92
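To make the difference between the two voting modes concrete, here is a minimal sketch for a single instance. The probabilities are made up for illustration and do not come from the classifiers trained above. It reproduces the idea behind hard and soft voting with plain NumPy rather than calling `VotingClassifier` itself.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one instance
# (columns: class 0, class 1). These numbers are illustrative only.
probas = np.array([
    [0.55, 0.45],  # classifier A: weakly favors class 0
    [0.60, 0.40],  # classifier B: weakly favors class 0
    [0.05, 0.95],  # classifier C: strongly favors class 1
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("hard voting predicts class", hard_prediction)
print("soft voting predicts class", soft_prediction)
```

In this example two lukewarm votes for class 0 win the hard vote, while the single very confident vote for class 1 tips the averaged probabilities and wins the soft vote.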

## Bagging and Pasting

Instead of combining different algorithms, another approach uses the same training algorithm for every predictor but trains each one on a different random subset of the training set. **Bagging** and **Pasting** follow this approach and differ only in how the subsets are sampled:

- **Bagging**: sampling is performed _with_ replacement
- **Pasting**: sampling is performed _without_ replacement
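The two sampling schemes can be sketched with NumPy. The tiny 10-instance training set and the subset sizes below are hypothetical, and the snippet only illustrates how the subsets are drawn, not how a full ensemble is trained.

```python
import numpy as np

rng = np.random.default_rng(42)
indices = np.arange(10)  # hypothetical training set of 10 instances

# Bagging: sample with replacement, so an instance may be drawn several times
# for the same predictor.
bagging_sample = rng.choice(indices, size=10, replace=True)

# Pasting: sample without replacement, so every drawn instance is distinct.
pasting_sample = rng.choice(indices, size=7, replace=False)

print("bagging:", np.sort(bagging_sample))
print("pasting:", np.sort(pasting_sample))
```

In both cases each predictor gets its own random draw, so the same instance can still appear in the subsets of several predictors. Only bagging allows repeats within a single predictor's subset.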

### Bagging and Pasting in Scikit-Learn

### Out-of-Bag Evaluation

### Random Patches and Random Subspaces

## Random Forests

### Extra-Trees

### Feature Importance

## Boosting

### AdaBoost

### Gradient Boosting

### Histogram-Based Gradient Boosting

## Stacking