Based on the idea of the wisdom of the crowd, Ensemble Methods aggregate the predictions of a group of predictors and often get better predictions than the best individual predictor. The winning solutions in Machine Learning competitions often involve several Ensemble Methods (e.g. the Netflix Prize, netflixprize.com).

# Voting classifiers

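Suppose we have trained several classifiers. A simple way to aggregate their predictions is to predict the class that gets the most votes: this is called a hard voting classifier. If all the classifiers can estimate class probabilities (i.e., they have a `predict_proba()` method), we can instead average these probabilities over all classifiers and predict the class with the highest average: this is called soft voting. It often performs better than hard voting because it gives more weight to highly confident votes.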
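The two aggregation rules can be sketched in a few lines of NumPy. This is a minimal illustration, not Scikit-Learn's implementation: the helper names `hard_vote` and `soft_vote` are hypothetical, labels are assumed to be integers `0..k-1`, and the probability arrays are assumed to be shaped like stacked `predict_proba()` outputs (soft voting averages these probabilities instead of counting votes).

```python
import numpy as np


def hard_vote(predictions):
    """Hypothetical helper: majority vote.

    predictions: integer labels of shape (n_classifiers, n_samples).
    Ties go to the smallest class label (argmax returns the first maximum).
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # votes[c, i] = number of classifiers predicting class c for sample i
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)


def soft_vote(probas):
    """Hypothetical helper: average class probabilities, then take the argmax.

    probas: array of shape (n_classifiers, n_samples, n_classes).
    """
    return np.asarray(probas).mean(axis=0).argmax(axis=1)


# Three classifiers (rows) voting on three samples (columns)
preds = [[0, 1, 1],
         [0, 0, 1],
         [1, 1, 0]]
print(hard_vote(preds))  # [0 1 1]

# Two hesitant votes for class 1 versus one confident vote for class 0
probas = [[[0.45, 0.55]],
          [[0.45, 0.55]],
          [[0.95, 0.05]]]
print(hard_vote(np.argmax(probas, axis=2)))  # [1]
print(soft_vote(probas))                     # [0]
```

The last example shows how the two rules can disagree: hard voting only counts labels, while soft voting lets one confident classifier outweigh two hesitant ones.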

This aggregated model often achieves a higher accuracy than the best individual classifier in the ensemble. In fact, even if each classifier is a weak learner (only slightly better than random guessing), the ensemble can still be a strong learner (achieving high accuracy), provided there are enough weak learners and they are sufficiently diverse.

Ensemble methods work best when the predictors are as independent from one another as possible. One way to get diverse predictors is to train them using very different algorithms, which increases the chance that they make different types of errors.
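The weak-learner claim can be checked with a little arithmetic, under one strong assumption: the classifiers make perfectly independent errors. If each of `n` classifiers is right with probability `p`, the number of correct votes follows a binomial distribution, and the majority vote is right when more than half of the votes are correct. The helper `majority_accuracy` is a hypothetical name; it sums the binomial tail in log space so that large `n` does not overflow.

```python
import math


def majority_accuracy(n, p):
    """Hypothetical helper: P(strictly more than half of n voters are correct).

    Assumes each voter is correct independently with probability p
    (an idealization; real classifiers have correlated errors).
    For even n, a tie counts as an error.
    """
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # log of the binomial term C(n, k) * p**k * (1 - p)**(n - k)
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_term)
    return total


for n in (1, 1_000, 10_000):
    print(n, round(majority_accuracy(n, 0.51), 3))
```

With `p = 0.51`, this gives about 0.73 for 1,000 classifiers and about 0.98 for 10,000. Classifiers trained on the same data have correlated errors, so real gains are much smaller, which is why diversity matters so much.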

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Toy dataset: two interleaving half-circles with Gaussian noise
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Example of a voting classifier built from three different algorithms
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(probability=True)  # probability=True exposes predict_proba, required for soft voting

# voting='soft' averages the predicted class probabilities and picks the most likely class;
# voting='hard' would use a plain majority vote instead
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft')

# Fit each individual classifier and the ensemble, then compare their test accuracies
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```

```
LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.896
VotingClassifier 0.912
```
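To see what `voting='soft'` does under the hood, the sketch below refits a soft-voting ensemble on the same data and recomputes its predictions by hand, averaging the `predict_proba()` outputs of the fitted clones stored in `voting_clf.estimators_`. The `random_state` arguments are additions for reproducibility; this is a consistency check of the documented behavior, not a replacement for `VotingClassifier`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(random_state=42)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft')
voting_clf.fit(X_train, y_train)

# estimators_ holds the fitted clones; stack their probabilities: (n_estimators, n_samples, n_classes)
probas = np.stack([est.predict_proba(X_test) for est in voting_clf.estimators_])
avg_proba = probas.mean(axis=0)  # unweighted average over the estimators
manual_pred = voting_clf.classes_[avg_proba.argmax(axis=1)]

print(np.allclose(avg_proba, voting_clf.predict_proba(X_test)))
print((manual_pred == voting_clf.predict(X_test)).all())
```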


# Bagging and Pasting 

Instead of training different algorithms on the same training set, we can train the same algorithm on different random subsets of the training set.

If the subsets are sampled with replacement, the method is called bagging (short for bootstrap aggregating); if they are sampled without replacement, it is called pasting.
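The only difference between the two methods is how each predictor's training indices are drawn. A minimal NumPy sketch, assuming a toy training set of 10 instances and subsets of 6 instances (both numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, subset_size, n_predictors = 10, 6, 3

for i in range(n_predictors):
    # Bagging: draw WITH replacement, so an index may appear several times
    bag_idx = rng.choice(n_train, size=subset_size, replace=True)
    # Pasting: draw WITHOUT replacement, so each index appears at most once
    paste_idx = rng.choice(n_train, size=subset_size, replace=False)
    print(f"predictor {i}: bagging {np.sort(bag_idx)}, pasting {np.sort(paste_idx)}")
```

Each predictor would then be trained on `X_train[idx], y_train[idx]` for its own set of indices.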

## Bagging and Pasting in Scikit-Learn

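The following code trains an ensemble of 500 Decision Tree classifiers, each on 100 training instances randomly sampled from the training set with replacement (set `bootstrap=False` to use pasting instead). `n_jobs=-1` tells Scikit-Learn to use all available CPU cores, and `oob_score=True` requests an automatic out-of-bag evaluation after training.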

```python
import time

from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

tic = time.time()
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf.fit(X_train, y_train)
toc = time.time()
# Timings noted for 50k trees: 67 s without parallel processing, 29.4 s with parallel processing
print(toc - tic)

# Bagged ensemble vs. a single, fully grown Decision Tree
y_pred = bag_clf.predict(X_test)
print("Bagged: ", accuracy_score(y_test, y_pred))
print("OoB Score: ", bag_clf.oob_score_)  # accuracy estimated on out-of-bag instances

tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print("Tree: ", accuracy_score(y_test, y_pred))
```

```
29.451801538467407
Bagged:  0.912
OoB Score:  0.9253333333333333
Tree:  0.856
```
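The out-of-bag score works because, with bagging, each predictor never sees some of the training instances: these out-of-bag instances act as a free validation set for that predictor. The fraction can be estimated with a quick simulation. The numbers below assume the split used above (375 training instances, since `train_test_split` holds out 25% of the 500 samples by default) and `max_samples=100`; `oob_fraction` is a hypothetical helper.

```python
import numpy as np


def oob_fraction(n_train, n_draws, n_trials=2_000, seed=0):
    """Hypothetical helper: average fraction of training instances that a
    single bootstrap sample of n_draws indices never picks."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_trials):
        idx = rng.integers(0, n_train, size=n_draws)  # sampling with replacement
        fractions.append(1 - np.unique(idx).size / n_train)
    return float(np.mean(fractions))


n_train = 375
for n_draws in (100, n_train):
    simulated = oob_fraction(n_train, n_draws)
    expected = (1 - 1 / n_train) ** n_draws  # P(a given instance is never drawn)
    print(n_draws, round(simulated, 3), round(expected, 3))
```

With `max_samples=100`, roughly 77% of the training set is out-of-bag for each tree; when each predictor draws as many samples as there are training instances, the fraction tends to 1/e, about 37%.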


```python
import multiprocessing as mp

# Number of CPU cores on this machine
print("Number of processors: ", mp.cpu_count())
```

```
Number of processors:  12
```


# Random Patches and Random Subspaces

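The `BaggingClassifier` class supports sampling the features as well, controlled by the `max_features` and `bootstrap_features` hyperparameters, so that each predictor is trained on a random subset of the input features. This is particularly useful when dealing with high-dimensional inputs such as images. Sampling both training instances and features is called the Random Patches method. Keeping all training instances (`bootstrap=False`, `max_samples=1.0`) but sampling features is called the Random Subspaces method.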
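A sketch of both settings with `BaggingClassifier`. The moons data has only two features, so this example assumes a synthetic 20-feature dataset from `make_classification`; the values of `max_samples` and `max_features` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random Patches: sample both training instances and features
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=0.7, bootstrap=True,             # 70% of instances, with replacement
    max_features=0.5, bootstrap_features=True,   # 10 of 20 features, with replacement
    random_state=42)

# Random Subspaces: keep every training instance, sample only features
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=1.0, bootstrap=False,            # all instances, no resampling
    max_features=0.5, bootstrap_features=False,  # 10 distinct features per tree
    random_state=42)

for name, clf in (("Random Patches", patches_clf), ("Random Subspaces", subspaces_clf)):
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    # estimators_features_ lists the feature indices each tree was trained on
    print(name, acc, clf.estimators_features_[0])
```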

# Random Forests

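The Random Forest algorithm introduces extra randomness when growing trees: when splitting a node, instead of searching for the very best feature among all features, it searches for the best feature among a random subset of features. This yields more diverse trees, trading a slightly higher bias for a lower variance.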
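In Scikit-Learn, the size of the random feature subset is set by the `max_features` hyperparameter (`"sqrt"` means the square root of the number of features). A Random Forest is roughly equivalent to bagging Decision Trees that each use this per-split feature sampling; the sketch below compares the two on the moons data. Since that dataset has only two features, `"sqrt"` leaves a single candidate feature per split. The estimator counts and `random_state` values are arbitrary choices.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each split considers only a random subset of sqrt(n_features) features
rnd_clf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rnd_clf.fit(X_train, y_train)

# Approximately the same idea written as bagging of feature-randomized trees
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=200, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)

rf_acc = accuracy_score(y_test, rnd_clf.predict(X_test))
bag_acc = accuracy_score(y_test, bag_clf.predict(X_test))
print("Random Forest:", rf_acc)
print("Bagging of randomized trees:", bag_acc)
```

`RandomForestClassifier` is more convenient and optimized for Decision Trees, so it is usually preferred over the bagging formulation.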

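Extremely Randomized Trees (Extra-Trees) are another way to build a forest: in addition to using a random subset of features at each node, they use a random threshold for each candidate feature rather than searching for the very best threshold. Since finding the best threshold is one of the most time-consuming parts of growing a tree, Extra-Trees are typically much faster to train than regular Random Forests.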
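Scikit-Learn provides this algorithm as `ExtraTreesClassifier`, with the same interface as `RandomForestClassifier` (note that it does not bootstrap by default, i.e. `bootstrap=False`). A minimal comparison on the moons data with arbitrary hyperparameters; the measured fit times depend on the machine and are only indicative.

```python
import time

from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
for clf in (RandomForestClassifier(n_estimators=200, random_state=42),
            ExtraTreesClassifier(n_estimators=200, random_state=42)):
    start = time.perf_counter()
    clf.fit(X_train, y_train)  # Extra-Trees draw split thresholds at random
    elapsed = time.perf_counter() - start
    name = clf.__class__.__name__
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy={scores[name]:.3f}, fit time={elapsed:.3f}s")
```

Which of the two generalizes better is hard to tell in advance, so in practice both are usually tried and compared, for example with cross-validation.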

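Random Forests also measure the relative importance of each feature by looking at how much the tree nodes that use that feature reduce impurity, on average across all trees. Scikit-Learn computes this score automatically after training and stores it in the `feature_importances_` attribute, scaled so that the importances sum to 1. This is a handy way to quickly understand which features actually matter, in particular when you need to perform feature selection.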
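A minimal sketch on the Iris dataset, which ships with Scikit-Learn; the hyperparameters are arbitrary. `feature_importances_` holds one score per input feature, in the same order as the columns of the training data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# One impurity-based importance score per feature, normalized to sum to 1
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))
```

On Iris, the petal measurements typically come out as far more important than the sepal measurements.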