# Ensemble Methods

The classes covered in this section, with links to their documentation:

- [BaggingClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html#sklearn.ensemble.BaggingClassifier)
- [BaggingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html#sklearn.ensemble.BaggingRegressor)
- [RandomForestClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)
- [RandomForestRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor)
- [ExtraTreesClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier)
- [ExtraTreesRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor)

```python
from sklearn import ensemble, neighbors, datasets
from sklearn.metrics import confusion_matrix
import numpy as np
```

## Bagging
Short for bootstrap aggregation. In scikit-learn, bagging methods are offered as a unified `BaggingClassifier` meta-estimator (resp. `BaggingRegressor`) that takes a user-specified base estimator along with parameters specifying how the random subsets are drawn. In particular, `max_samples` and `max_features` control the size of the subsets (in terms of samples and features), while `bootstrap` and `bootstrap_features` control whether samples and features are drawn with or without replacement. When samples are drawn with replacement (`bootstrap=True`, the default), the generalization accuracy can be estimated on the out-of-bag samples by setting `oob_score=True`. **Bagging can also be parallelized using the `n_jobs` argument.**
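To make the mechanism concrete, here is a minimal hand-rolled sketch of bootstrap aggregation, assuming decision trees as the base estimator and integer class labels `0..K-1`. The helper `bagging_predict` is hypothetical and not part of scikit-learn. It combines the trees by majority vote, whereas `BaggingClassifier` averages predicted probabilities when the base estimator provides `predict_proba` and only falls back to voting otherwise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def bagging_predict(X_train, y_train, X_test, n_estimators=25, random_state=0):
    """Hypothetical helper: bag decision trees and combine them by majority vote.

    Assumes class labels are the integers 0..K-1.
    """
    rng = np.random.default_rng(random_state)
    n_samples = X_train.shape[0]
    n_classes = int(y_train.max()) + 1
    votes = np.zeros((X_test.shape[0], n_classes), dtype=int)
    rows = np.arange(X_test.shape[0])
    for _ in range(n_estimators):
        # Bootstrap sample: n_samples indices drawn with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
        # Each tree casts one vote per test sample.
        votes[rows, tree.predict(X_test)] += 1
    # Aggregate: the class with the most votes wins.
    return votes.argmax(axis=1)


X, y = load_iris(return_X_y=True)
pred = bagging_predict(X, y, X)
print((pred == y).mean())  # accuracy on the training data
```

Each tree sees a different bootstrap sample, so the trees disagree on some inputs; the vote smooths out those individual errors, which is where the variance reduction of bagging comes from.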

#### Fitting a bagging model
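Fit a bagging ensemble of KNN classifiers on the iris data set. Each KNN model is trained on 50% of the samples (drawn with replacement by default) and 50% of the features (drawn without replacement by default).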

```python
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Bag KNN classifiers (default n_neighbors=5), each fitted on half the samples and half the features.
bagging = ensemble.BaggingClassifier(neighbors.KNeighborsClassifier(),
                                     max_samples=0.5,
                                     max_features=0.5,
                                     oob_score=True,
                                     random_state=1976)  # fixed seed for reproducibility
```

```python
bagging.fit(X, y)
print(bagging.estimator_)   # the base estimator (named `base_estimator_` before scikit-learn 1.2)
print(bagging.classes_)     # class labels
print(bagging.oob_score_)   # out-of-bag accuracy
```

#### Prediction using bagging

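```python
pred = bagging.predict(X)
cmat = confusion_matrix(y, pred)
# Percentage of correctly classified samples (diagonal of the confusion matrix).
# These are the training samples, so this is optimistic compared with oob_score_.
accuracy = round(100 * cmat.diagonal().sum() / cmat.sum())
print(cmat)
print(accuracy)
```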

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
[0 1 2]
0.946666666667
[[50  0  0]
 [ 0 49  1]
 [ 0  1 49]]
99.0
```


### Regression using bagging
Regression works the same way: pass a regression model as the base estimator to `ensemble.BaggingRegressor`. One example is provided [here](http://scikit-learn.org/stable/auto_examples/ensemble/plot_bias_variance.html#sphx-glr-auto-examples-ensemble-plot-bias-variance-py).
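A minimal `BaggingRegressor` sketch, assuming the built-in diabetes data set and a KNN regressor as the base estimator (both illustrative choices, not taken from the linked example). For regressors, both `oob_score_` and `score` report the coefficient of determination R² rather than accuracy.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = BaggingRegressor(
    KNeighborsRegressor(),  # base estimator, passed positionally as in the classifier example
    n_estimators=50,
    max_samples=0.5,
    max_features=0.5,
    oob_score=True,         # requires bootstrap=True, which is the default
    random_state=0,
)
reg.fit(X_train, y_train)
print(reg.oob_score_)             # out-of-bag R^2 on the training set
print(reg.score(X_test, y_test))  # R^2 on held-out data
```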

## Random Forests

`RandomForestClassifier` and `RandomForestRegressor` are used for classification and regression respectively. After fitting, both expose the importance of each feature through the `feature_importances_` attribute.

Each tree in the ensemble is built from a bootstrap sample drawn from the training set. When splitting a node, the best split is searched for among a random subset of the features instead of among all features.
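The sketch below, assuming the iris data used earlier, spells out both sources of randomness as explicit parameters: `bootstrap=True` (one bootstrap sample per tree) and `max_features="sqrt"` (a random subset of features at each split). With `oob_score=True`, each training sample is scored only by the trees that did not see it during fitting.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # random subset of sqrt(n_features) features per split
    bootstrap=True,       # each tree is fitted on a bootstrap sample
    oob_score=True,       # estimate accuracy on the out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print(len(forest.estimators_))  # one fitted DecisionTreeClassifier per tree
print(forest.oob_score_)        # out-of-bag accuracy
```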

## Extra Trees

Short for Extremely Randomized Trees. `ExtraTreesClassifier` and `ExtraTreesRegressor` are used for classification and regression respectively.

In extremely randomized trees, randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, a threshold is drawn at random for each candidate feature and the best of these randomly generated thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias.
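The split rule described above can be sketched for a single node as follows. This is a minimal illustration, not scikit-learn's implementation: the helpers `gini` and `random_split` are hypothetical, Gini impurity is assumed as the split criterion, and each threshold is drawn uniformly between the feature's minimum and maximum in the node.

```python
import numpy as np
from sklearn.datasets import load_iris


def gini(y):
    """Gini impurity of a label array (hypothetical helper)."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size
    return 1.0 - np.sum(p ** 2)


def random_split(X, y, max_features, rng):
    """Hypothetical extremely-randomized split for one node.

    Picks `max_features` candidate features at random, draws ONE random
    threshold per feature, and keeps the candidate with the lowest
    weighted Gini impurity of the two children.
    """
    n_samples, n_features = X.shape
    features = rng.choice(n_features, size=max_features, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature in this node: no split possible
        threshold = rng.uniform(lo, hi)  # random, not optimized, threshold
        left = X[:, f] <= threshold
        impurity = (left.sum() * gini(y[left])
                    + (~left).sum() * gini(y[~left])) / n_samples
        if best is None or impurity < best[2]:
            best = (f, threshold, impurity)
    return best  # (feature index, threshold, weighted impurity), or None


X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
feature, threshold, impurity = random_split(X, y, max_features=2, rng=rng)
print(feature, threshold, impurity)
```

A random forest would instead scan every possible threshold of each candidate feature; skipping that search is what makes extra-trees cheaper to fit and more random.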

## Best Choices for Parameters

The main parameters to adjust when using these methods are `n_estimators` and `max_features`.

`n_estimators` is the number of trees in the forest. More trees generally give better results but take longer to compute, and results stop getting significantly better beyond a critical number of trees.

`max_features` is the size of the random subset of features considered when splitting a node. The lower it is, the greater the reduction in variance, but also the greater the increase in bias. **Empirically good default values are `max_features=n_features` for regression problems and `max_features=sqrt(n_features)` for classification** tasks, where `n_features` is the number of features in the data. In scikit-learn these correspond to `max_features=None` (all features) and `max_features="sqrt"`.

Good results are often achieved by setting **`max_depth=None`** together with **`min_samples_split=2`** (i.e. fully developing the trees). Bear in mind, though, that these values are usually not optimal and can produce models that consume a lot of RAM. **The best parameter values should always be cross-validated.**

Note also that **random forests use bootstrap samples by default (`bootstrap=True`), while the default strategy for extra-trees is to use the whole dataset (`bootstrap=False`)**. When bootstrap sampling is used, the generalization accuracy can be estimated on the left-out (out-of-bag) samples by setting `oob_score=True`.
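A minimal sketch of cross-validating these parameters with `GridSearchCV`, assuming the iris data and a deliberately small grid (the grid values are illustrative, not recommendations). The second model shows that extra-trees need `bootstrap=True` before `oob_score=True` can be used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [50, 200],
    "max_features": ["sqrt", None],  # None = consider all features at each split
}
search = GridSearchCV(
    # Fully developed trees, as suggested above.
    ExtraTreesClassifier(max_depth=None, min_samples_split=2, random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation for every parameter combination
)
search.fit(X, y)
print(search.best_params_, search.best_score_)

# Extra-trees default to bootstrap=False; OOB estimation needs bootstrap samples.
et = ExtraTreesClassifier(n_estimators=200, bootstrap=True, oob_score=True,
                          random_state=0).fit(X, y)
print(et.oob_score_)
```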

## Parallelization

Finally, this module also features the parallel construction of the trees and the parallel computation of the predictions through the `n_jobs` parameter. If `n_jobs=k`, computations are partitioned into `k` jobs and run on `k` cores of the machine. If `n_jobs=-1`, all cores available on the machine are used.
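A short sketch, assuming the iris data: the same forest is built serially and on all cores. With a fixed `random_state`, `n_jobs` only changes how the work is scheduled, so the two fitted forests should produce the same predicted probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Identical models apart from the degree of parallelism.
serial = RandomForestClassifier(n_estimators=100, n_jobs=1, random_state=0).fit(X, y)
parallel = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)

# allclose rather than exact equality: parallel accumulation may sum in a different order.
same = np.allclose(serial.predict_proba(X), parallel.predict_proba(X))
print(same)
```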

## Feature importance evaluation

[Example](http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html#sphx-glr-auto-examples-ensemble-plot-forest-importances-py)

The relative rank (i.e. depth) of a feature used as a decision node in a tree can be used to assess the relative importance of that feature with respect to the predictability of the target variable: features used near the top of a tree take part in the decision for a larger fraction of the input samples. In practice, these estimates are stored in an attribute named `feature_importances_` on the fitted model. It is an array of shape (`n_features`,) whose values are non-negative and sum to 1.0. The higher the value, the more the corresponding feature contributes to the prediction function.
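A minimal sketch, assuming the iris data and an `ExtraTreesClassifier`, that ranks the features by `feature_importances_` (impurity-based importances in scikit-learn). The spread across the individual trees is computed in a similar spirit to the linked example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

iris = load_iris()
forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(iris.data, iris.target)

importances = forest.feature_importances_
# Standard deviation of each feature's importance across the individual trees.
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

# Print features from most to least important.
for i in np.argsort(importances)[::-1]:
    print(f"{iris.feature_names[i]:<20} {importances[i]:.3f} +/- {std[i]:.3f}")
```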