# Ensemble Learning 

The idea behind ensemble learning is to combine multiple models so that the combination generalises better than any single model. The simplest way to do this is majority voting, which picks the class label predicted most often across the classifiers:

$$
\hat{y} = \operatorname{mode}\{C_{1}(\mathbf{x}), C_{2}(\mathbf{x}), \ldots, C_{m}(\mathbf{x})\}
$$
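As a minimal sketch of the rule above, the vote can be computed with the standard library alone. The function name `majority_vote` and the example labels are made up for illustration; ties go to the label seen first, which is an assumption of this sketch rather than part of the formula.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the classifiers' predictions."""
    # most_common(1) returns [(label, count)] for the most frequent label;
    # on a tie, the label that appears first in `predictions` wins.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions C_1(x), ..., C_5(x) for a single sample x
predictions = ["setosa", "versicolor", "setosa", "virginica", "setosa"]
y_hat = majority_vote(predictions)
print(y_hat)
```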

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

iris = datasets.load_iris()
# Use only two features (sepal width and petal length)
X, y = iris.data[:, 1:3], iris.target

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

# voting='hard' takes a majority vote over predicted class labels;
# voting='soft' averages the predicted class probabilities instead.
# The optional weights parameter gives some classifiers more influence on the vote.
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')

for clf, label in zip([clf1, clf2, clf3, eclf], ['Logistic Regression', 'Random Forest', 'naive Bayes', 'Ensemble']):
    scores = cross_val_score(clf, X, y, cv=10, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
```

```
Accuracy: 0.92 (+/- 0.04) [Logistic Regression]
Accuracy: 0.95 (+/- 0.05) [Random Forest]
Accuracy: 0.91 (+/- 0.05) [naive Bayes]
Accuracy: 0.95 (+/- 0.04) [Ensemble]
```
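
To illustrate the difference between `voting='hard'` and `voting='soft'` noted in the code comments above, the following sketch works through a single sample by hand. The probabilities and weights are invented for illustration, and `soft_vote` is a hypothetical helper, not a scikit-learn function.

```python
def soft_vote(probabilities, weights=None):
    """Return the index of the class with the highest weighted mean probability."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[k] for w, p in zip(weights, probabilities)) / total
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: averaged[k])


# Hypothetical class probabilities from three classifiers for one sample
probabilities = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]

# Hard voting: each classifier casts one vote for its most probable class
labels = [max(range(2), key=lambda k: p[k]) for p in probabilities]
hard = max(set(labels), key=labels.count)

soft = soft_vote(probabilities)
weighted = soft_vote(probabilities, weights=[3, 3, 1])
print(hard, soft, weighted)
```

Two classifiers lean weakly towards class 0 while the third is confident in class 1, so hard voting picks class 0 and unweighted soft voting picks class 1. Weighting the first two classifiers more heavily flips the soft vote back to class 0.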


## Bagging 

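Bagging (bootstrap aggregating) trains copies of a single model on multiple random subsets of the training data, usually drawn with replacement, and aggregates their predictions by majority vote. This is akin to cross-validation in which every model trained on a subset contributes to the final prediction.

Bagging mainly reduces variance, so it is a good way to reduce overfitting, and it can be wrapped around an already developed model without changing the model itself.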
```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# Each base KNN is trained on a random 50% of the samples and 50% of the features.
bagging = BaggingClassifier(KNeighborsClassifier(), max_samples=0.5, max_features=0.5)
```
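
To make the mechanism concrete, here is a rough from-scratch sketch of bagging using only the standard library. The toy base learner (nearest class mean on one feature), the function names, and the data are all assumptions chosen for brevity; this is not how `BaggingClassifier` is implemented internally.

```python
import random
from collections import Counter


def fit_centroids(sample):
    """Toy base learner: store the mean feature value of each class."""
    sums, counts = {}, Counter()
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_centroids(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def bagging_fit(data, n_estimators=25, seed=0):
    """Train one base learner per bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        bootstrap = rng.choices(data, k=len(data))  # sample with replacement
        models.append(fit_centroids(bootstrap))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature training set: (feature value, class label)
data = [(0.9, "a"), (1.1, "a"), (1.3, "a"), (2.9, "b"), (3.1, "b"), (3.4, "b")]
models = bagging_fit(data)
print(bagging_predict(models, 1.0), bagging_predict(models, 3.2))
```

An individual bootstrap sample may miss a class entirely and produce a poor base learner; the vote across many learners is what smooths this out.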

## Adaptive Boosting 

In adaptive boosting (AdaBoost), a number of simple classifiers (also known as weak learners) are trained iteratively on the data. After each iteration, samples are reweighted by their difficulty: misclassified samples gain weight, so later learners focus more heavily on the most difficult samples.

However, this repeated focus on hard training samples can hurt generalization when the data contains noise or outliers, and because each learner depends on the previous one, AdaBoost is not particularly parallelizable.
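
The reweighting step described above can be sketched for a single boosting round. This follows the textbook binary AdaBoost update; scikit-learn's `AdaBoostClassifier` uses a multi-class variant, so treat this as an illustration of the idea rather than the library's implementation. The function name `adaboost_round` and the example weights are hypothetical.

```python
import math


def adaboost_round(weights, misclassified):
    """One reweighting step of classic binary AdaBoost.

    weights: current sample weights, summing to 1
    misclassified: booleans, True where the weak learner was wrong
    """
    # Weighted error of the weak learner (assumed to satisfy 0 < err < 0.5)
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    # Learner weight: the lower the error, the larger its say in the final vote
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Increase the weights of misclassified samples, decrease the rest
    updated = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    # Renormalise so the weights sum to 1 again
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five samples with uniform weights; the weak learner gets the third one wrong
weights = [0.2] * 5
misclassified = [False, False, True, False, False]
alpha, new_weights = adaboost_round(weights, misclassified)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, which is why the next weak learner is pushed to get it right.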

```python
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

iris = load_iris()
# The `estimator` parameter (named `base_estimator` before scikit-learn 1.2) sets the type
# of weak learner; the default is a DecisionTreeClassifier with max_depth=1.
clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.1)
scores = cross_val_score(clf, iris.data, iris.target)
scores.mean()
```

```
0.95996732026143794
```

## Additional Complexity

While ensemble methods often increase the accuracy of a model, they usually do so at the cost of considerably increased complexity, and in many real-world applications the accuracy gain does not justify that cost.