# Ensemble Learning and Random Forests

If you aggregate the predictions of a group of predictors (such as
classifiers or regressors), you will often get better predictions than with the best individual predictor. A
group of predictors is called an ensemble; thus, this technique is called Ensemble Learning, and an
Ensemble Learning algorithm is called an Ensemble method.

For example, you can train a group of Decision Tree classifiers, each on a different random subset of the
training set. To make predictions, you just obtain the predictions of all individual trees, then predict the
class that gets the most votes. Such an ensemble of Decision Trees is called a Random Forest.

### Voting Classifiers

Suppose you have trained a few classifiers, each one achieving about 80% accuracy. You may have a
Logistic Regression classifier, an SVM classifier, a Random Forest classifier, and a K-Nearest Neighbors
classifier, as shown in the following figure.

![alt text](images/im1.png)

A very simple way to create an even better classifier is to aggregate the predictions of each classifier and
predict the class that gets the most votes. This majority-vote classifier is called a hard voting classifier.

![alt text](images/im2.png)
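
To make the majority vote concrete, here is a minimal sketch that aggregates predictions by hand with NumPy. The prediction matrix and the `hard_vote` helper are made up for illustration; they are not part of Scikit-Learn, and ties are simply resolved in favor of the lowest class label.

```python
import numpy as np

# Hypothetical predictions from three classifiers on five instances
# (rows = classifiers, columns = instances; class labels are 0, 1, 2).
predictions = np.array([
    [0, 1, 2, 1, 0],   # e.g. a Logistic Regression classifier
    [0, 2, 2, 1, 1],   # e.g. an SVM classifier
    [1, 1, 2, 0, 1],   # e.g. a Random Forest classifier
])

def hard_vote(preds, n_classes):
    """Return the most-voted class per instance (ties go to the lowest label)."""
    # Count the votes for each class, one column (instance) at a time.
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    # counts has shape (n_classes, n_instances); pick the top class per column.
    return counts.argmax(axis=0)

majority = hard_vote(predictions, n_classes=3)
print(majority)  # [0 1 2 1 1]
```

Each column is decided independently, so the ensemble can be right on an instance even when one of its members is wrong.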


In [1]:
# Example: a hard voting classifier on the Iris dataset
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]  # petal length, petal width
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Individual classifiers that will each cast one vote
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()

# voting='hard' predicts the class that receives the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf)],
    voting='hard'
)
voting_clf.fit(X_train, y_train)



VotingClassifier(estimators=[('lr',
                              LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='warn',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=None,
                                                 solver='warn', tol=0.0001,
                                                 verbose=0, warm_start=False)),
                             ('rf',
                              RandomForestClassifier(bootstrap=True,
                                                     class_weight=None,
                                                     criterion='gini',
                                           

In [2]:
from sklearn.metrics import accuracy_score
for clf in (log_clf, rnd_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.8
RandomForestClassifier 1.0
VotingClassifier 0.98




### Bagging and Pasting

One way to get a diverse set of classifiers is to use very different training algorithms. Another approach is to use the same training algorithm for every predictor, but to train them on different random subsets of the training set. When sampling is performed with replacement, this method is called bagging. When sampling is performed without replacement, it is called pasting.

Both bagging and pasting allow training instances to be sampled several times across
multiple predictors, but only bagging allows training instances to be sampled several times for the same
predictor.

![alt text](images/im3.png)
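
The difference between the two sampling schemes can be sketched with NumPy alone. The dataset size, subset size, and seed below are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_instances, sample_size = 10, 6
indices = np.arange(n_instances)

# Bagging: sample with replacement, so an index may repeat within one subset.
bagging_subset = rng.choice(indices, size=sample_size, replace=True)

# Pasting: sample without replacement, so indices are unique within one subset.
pasting_subset = rng.choice(indices, size=sample_size, replace=False)

print("bagging:", np.sort(bagging_subset))
print("pasting:", np.sort(pasting_subset))
```

In both cases a new subset is drawn for every predictor, which is why the same instance can still appear in the training sets of several predictors.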


In [3]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Baseline: a single shallow Decision Tree, trained on the training set only
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X_train, y_train)
y_pred = tree_clf.predict(X_test)
print(" Decision Tree accuracy                               ", accuracy_score(y_test, y_pred))

# Bagging: 500 trees, each trained on 100 instances sampled with replacement
# (set bootstrap=False to use pasting instead)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1
)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
print(" BaggingClassifier of Decision Tree accuracy          ", accuracy_score(y_test, y_pred))


 Decision Tree accuracy                                0.98
 BaggingClassifier of Decision Tree accuracy           1.0


### Random Forests

A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size of the training set. Rather than building a BaggingClassifier and passing it a DecisionTreeClassifier, you can use the RandomForestClassifier class, which is more convenient and optimized for Decision Trees.

In [4]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)
accuracy_score(y_test, y_pred_rf)

1.0
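
For comparison, the sketch below builds the BaggingClassifier alternative mentioned above next to a RandomForestClassifier. It assumes that setting max_features="sqrt" on the tree roughly mimics the per-split feature sampling of a Random Forest. The two models are similar, not identical, and the hyperparameter values and seeds are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Bagging over trees that consider a random subset of features at each split
# (assumption: this approximates what RandomForestClassifier does internally).
bag_of_trees = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=100, max_samples=1.0, bootstrap=True, random_state=42,
)
forest = RandomForestClassifier(
    n_estimators=100, max_leaf_nodes=16, random_state=42
)

# Fit both ensembles and record their test accuracy.
scores = {}
for name, clf in (("bagging", bag_of_trees), ("forest", forest)):
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(name, scores[name])
```

Here max_samples=1.0 means each tree is trained on a bootstrap sample as large as the training set.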

### Boosting

Boosting (originally called hypothesis boosting) refers to any Ensemble method that can combine several
weak learners into a strong learner. The general idea of most boosting methods is to train predictors
sequentially, each trying to correct its predecessor. There are many boosting methods available; one of
the most popular is AdaBoost.

### AdaBoost

One way for a new predictor to correct its predecessor is to pay a bit more attention to the training
instances that the predecessor underfitted. This results in new predictors focusing more and more on the
hard cases. This is the technique used by AdaBoost.

For example, to build an AdaBoost classifier, a first base classifier (such as a Decision Tree) is trained
and used to make predictions on the training set. The relative weight of misclassified training instances is
then increased. A second classifier is trained using the updated weights; it again makes predictions on
the training set, the weights are updated again, and so on, as shown in the following figure.

![alt text](images/im4.png)
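
The weight update can be illustrated with a small numeric sketch. It follows the textbook AdaBoost update rule, which is an assumption here: the text above does not spell out the formula, and Scikit-Learn's implementation may differ in its details. The misclassification pattern and learning rate are made up.

```python
import numpy as np

# Hypothetical outcome of the first predictor on five training instances:
# True marks an instance it misclassified.
misclassified = np.array([False, False, True, False, False])
learning_rate = 1.0  # assumed value for this illustration

# Every instance starts with the same weight.
weights = np.full(len(misclassified), 1 / len(misclassified))

# Weighted error rate of the predictor (0.2 here).
error_rate = weights[misclassified].sum() / weights.sum()

# Predictor weight: larger when the predictor is more accurate.
alpha = learning_rate * np.log((1 - error_rate) / error_rate)

# Boost the misclassified instances, then renormalize so the weights sum to 1.
new_weights = np.where(misclassified, weights * np.exp(alpha), weights)
new_weights /= new_weights.sum()

print(new_weights)  # approximately [0.125 0.125 0.5 0.125 0.125]
```

After one round, the single misclassified instance carries half of the total weight, so the next predictor is pushed to get it right.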


The following code trains an AdaBoost classifier based on 200 Decision Stumps using Scikit-Learn’s
AdaBoostClassifier class. A Decision Stump is a Decision Tree with max_depth=1 — in other words, a tree composed of a single decision node plus two leaf nodes. This is the default base estimator for the AdaBoostClassifier class:


In [5]:
from sklearn.ensemble import AdaBoostClassifier
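# 200 Decision Stumps (max_depth=1) boosted sequentially
# Note: recent scikit-learn releases deprecate algorithm="SAMME.R";
# omit the `algorithm` argument if your version rejects it.
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5
)
ada_clf.fit(X_train, y_train)
y_pred_ada = ada_clf.predict(X_test)  # predict with the AdaBoost model itself
accuracy_score(y_test, y_pred_ada)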

1.0