# Decision trees

### References:
1. scikit-learn documentation https://scikit-learn.org/stable/modules/tree.html

# Random forest and ensemble methods

The goal of ensemble methods is to *combine the predictions of several base estimators* built with a given learning algorithm in order to **improve generalizability / robustness over a single estimator**.

Two families of ensemble methods are usually distinguished:

- In **averaging methods**, the driving principle is to build several estimators independently and then average their predictions. The combined estimator is usually better than any single base estimator because its variance is reduced.

Examples: Bagging methods, Forests of randomized trees, …

- By contrast, in **boosting methods**, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.

Examples: AdaBoost, Gradient Tree Boosting, …
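The two families above can be compared side by side. The following is a minimal sketch, not a benchmark: the synthetic dataset, the choice of `BaggingClassifier` and `AdaBoostClassifier` as representatives, and every hyperparameter value are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

models = {
    # Baseline: one fully grown tree (low bias, high variance).
    "single tree": DecisionTreeClassifier(random_state=0),
    # Averaging: 50 fully grown trees, each fit independently on a
    # bootstrap sample; their predictions are combined by voting.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=0),
    # Boosting: 50 depth-1 trees (weak, high-bias learners) fit one after
    # another, each reweighting the samples its predecessors misclassified.
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                   n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
mean_scores = {name: cross_val_score(model, X, y, cv=5).mean()
               for name, model in models.items()}

for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
```

Bagging starts from strong, high-variance trees and averages them, while boosting starts from weak, high-bias stumps and adds them sequentially. The exact scores depend on the data and the scikit-learn version.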

```python
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

# 10,000 samples with 10 features, drawn from 100 Gaussian blobs (one class per blob).
X, y = make_blobs(n_samples=10000, n_features=10, centers=100,
                  random_state=0)

# Baseline: a single, fully grown decision tree.
clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
                             random_state=0)
# Accuracy on each of the 5 cross-validation folds.
scores = cross_val_score(clf, X, y, cv=5)
scores.mean()
```

```
0.9823000000000001
```

```python
scores
```

```
array([0.982 , 0.982 , 0.9855, 0.982 , 0.98  ])
```
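The cell above imports `RandomForestClassifier` and `ExtraTreesClassifier` but only evaluates a single decision tree. Below is a minimal sketch of how the same cross-validation could be repeated for the two forest ensembles on the same data. The value `n_estimators=10` is an assumption chosen to keep the run short; no output is shown because the scores depend on the scikit-learn version.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Same synthetic data as the single-tree baseline.
X, y = make_blobs(n_samples=10000, n_features=10, centers=100,
                  random_state=0)

# Random forest: each tree is fit on a bootstrap sample and considers a
# random subset of features at every split.
forest = RandomForestClassifier(n_estimators=10, max_depth=None,
                                min_samples_split=2, random_state=0)
forest_scores = cross_val_score(forest, X, y, cv=5)

# Extremely randomized trees: split thresholds are also drawn at random,
# which adds further randomization to each tree.
extra = ExtraTreesClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)
extra_scores = cross_val_score(extra, X, y, cv=5)

print("random forest mean accuracy:", forest_scores.mean())
print("extra trees mean accuracy:  ", extra_scores.mean())
```

Comparing these means with the single-tree score shows whether averaging several randomized trees helps on this dataset.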

### References:
1. scikit-learn documentation https://scikit-learn.org/stable/modules/ensemble.html#forest