### Bagging Meta Estimator

In [1]:
# Two variants: BaggingClassifier and BaggingRegressor

from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# The base estimator is passed positionally because its keyword name differs
# across scikit-learn versions (base_estimator in older releases, estimator in newer ones).
bagging = BaggingClassifier(KNeighborsClassifier(),
                            n_estimators=20,  # number of base estimators in the ensemble
                            max_samples=0.5,  # fraction of samples drawn from X to train each base estimator
                            max_features=0.5,  # fraction of features drawn from X to train each base estimator
                            bootstrap=True,  # if True, samples are drawn with replacement
                            bootstrap_features=True,  # if True, features are drawn with replacement
                            oob_score=True,  # if True, use out-of-bag samples to estimate the generalization error
                            n_jobs=1,  # number of parallel jobs; -1 uses all available cores
                            random_state=None  # seed or RandomState for reproducible sampling; None leaves it unseeded
                           )
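
The configuration above only constructs the estimator. As a minimal sketch of how it could be fitted and its out-of-bag estimate read back, the cell below uses a synthetic dataset from `make_classification`; the dataset, the `_demo` variable names, and the fixed `random_state=0` are assumptions made for illustration and are not part of the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption: any small classification dataset would do here)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

bagging_demo = BaggingClassifier(
    KNeighborsClassifier(),  # base estimator, passed positionally
    n_estimators=20,
    max_samples=0.5,         # each estimator sees half of the samples
    max_features=0.5,        # each estimator sees half of the features
    bootstrap=True,          # required for out-of-bag scoring
    oob_score=True,
    random_state=0,
)
bagging_demo.fit(X_demo, y_demo)

# Accuracy estimated on the samples each estimator did not see during training
print(bagging_demo.oob_score_)
# Feature indices used by the first base estimator
print(bagging_demo.estimators_features_[0])
```

The out-of-bag score gives a rough generalization estimate without holding out a separate validation set, which is why `oob_score=True` needs `bootstrap=True`.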

### Forests of randomized trees

In [2]:
# Random forests: RandomForestClassifier / RandomForestRegressor are ensembles specialized for decision trees.
# They combine configuration options of DecisionTreeClassifier and BaggingClassifier.
# Each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample)
# from the training set. In addition, when splitting a node during the construction of the tree, the split
# that is chosen is no longer the best split among all features. Instead, the split that is picked is the best
# split among a random subset of the features.
from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = RandomForestClassifier(n_estimators=10)
print(clf.fit(X, Y))


# Extremely randomized trees (ExtraTrees)
# As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative
# thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly generated
# thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more,
# at the expense of a slightly greater increase in bias.
from sklearn.ensemble import ExtraTreesClassifier
clf = ExtraTreesClassifier(n_estimators=10)
clf.fit(X,Y)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)


ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
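
The two-point toy dataset above is too small to show any difference between the two forest types. A rough sketch of how they might be compared with cross-validation is given below; the synthetic dataset, the `_cmp` names, and the choice of 5 folds are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption: stands in for a real training set)
X_cmp, y_cmp = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=10, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=10, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each forest type
scores = {
    name: cross_val_score(model, X_cmp, y_cmp, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(name, round(score, 3))
```

Which variant scores higher depends on the data, so the comparison is worth repeating on the actual problem rather than trusting a single synthetic run.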

The main parameters are n_estimators and max_features.

n_estimators: the larger the better, but the longer it takes to compute. Note also that results stop getting significantly better beyond a critical number of trees.

max_features: the size of the random subsets of features to consider when splitting a node.

The lower the value, the greater the reduction of variance, but also the greater the increase in bias.

Empirical good default values are:

    max_features=n_features for regression problems, 
    max_features=sqrt(n_features) for classification tasks (where n_features is the number of features in the data).
    max_depth=None in combination with min_samples_split=2 (i.e., fully developed trees)
    
Note: The best parameter values should always be cross-validated. 

n_jobs: enables parallel computation; with n_jobs=-1 all available cores are used.
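
Since the best parameter values should be cross-validated, one way to do that is a small grid search over n_estimators and max_features. The sketch below assumes `GridSearchCV` from `sklearn.model_selection`, a synthetic dataset, and an arbitrary grid; none of these choices come from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption: replace with the real training set)
X_cv, y_cv = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)

# Illustrative grid: "sqrt" is the usual classification default,
# None means all features are considered at every split
param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", 0.5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,      # 3-fold cross-validation for each parameter combination
    n_jobs=1,  # set to -1 to use all available cores
)
search.fit(X_cv, y_cv)

print(search.best_params_)
print(search.best_score_)
```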

The relative rank (i.e. depth) of a feature used as a decision node in a tree can be used to assess the relative importance of that feature with respect to the predictability of the target variable. Features used at the top of the tree contribute to the final prediction decision of a larger fraction of the input samples. The expected fraction of the samples they contribute to can thus be used as an estimate of the relative importance of the features.

    


One of the outputs available from a fitted ExtraTreesClassifier is the feature_importances_ attribute.
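
As a minimal sketch of reading these importances, the cell below fits an ExtraTreesClassifier on synthetic data and ranks the features. The dataset is an assumption: `make_classification` with `shuffle=False` places the three informative features in the first three columns, so they would typically (not necessarily) rank near the top.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: columns 0-2 are informative, the remaining five are noise
X_imp, y_imp = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)

forest = ExtraTreesClassifier(n_estimators=50, random_state=0)
forest.fit(X_imp, y_imp)

# One non-negative value per feature; the values are normalized to sum to 1
importances = forest.feature_importances_

# Feature indices sorted from most to least important
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print("feature %d: %.3f" % (idx, importances[idx]))
```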

To continue from:

http://scikit-learn.org/stable/modules/ensemble.html#b1999

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_iris.html#sphx-glr-auto-examples-ensemble-plot-forest-iris-py

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances_faces.html#sphx-glr-auto-examples-ensemble-plot-forest-importances-faces-py

http://scikit-learn.org/stable/auto_examples/plot_multioutput_face_completion.html#sphx-glr-auto-examples-plot-multioutput-face-completion-py


Feature importance

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html#sphx-glr-auto-examples-ensemble-plot-forest-importances-py

RandomTreesEmbedding - Feature Transformation
http://scikit-learn.org/stable/auto_examples/ensemble/plot_random_forest_embedding.html#sphx-glr-auto-examples-ensemble-plot-random-forest-embedding-py




