## Random Forests vs Bagged trees
---

With bagged decision trees, we fit many trees on bootstrap samples that look very similar to one another, so the trees themselves are **strongly correlated**. Averaging correlated trees removes far less variance than averaging independent ones, so the bagged ensemble can still have relatively high variance.

By "de-correlating" our trees from one another, we can drastically reduce the variance of our model.
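To make the effect of correlation concrete, here is a small sketch based on a standard result: the average of $B$ identically distributed predictors, each with variance $\sigma^2$ and pairwise correlation $\rho$, has variance $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. The function name `ensemble_variance` and the example values are illustrative assumptions, not part of the original notebook.

```python
def ensemble_variance(rho, sigma2, n_trees):
    """Variance of the average of n_trees predictors.

    Assumes every predictor has variance sigma2 and every pair of
    predictors has the same correlation rho (an idealized setting).
    """
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# With 100 trees, the first term (rho * sigma2) dominates:
# adding more trees cannot push the variance below it.
for rho in (0.9, 0.5, 0.1):
    print(f"rho={rho}: variance of the average = {ensemble_variance(rho, 1.0, 100):.3f}")
```

The second term shrinks as trees are added, but the first does not, which is why lowering the correlation between trees matters more than simply growing a larger ensemble.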

That's the difference between bagged decision trees and random forests: random forests de-correlate the trees. This reduces variance at the expense of a small increase in bias, which often improves the overall performance of the final model.

### So how do we "de-correlate" our trees?

Random forests differ from bagged decision trees in only one way: they use a modified tree learning algorithm that considers, at each split in the learning process, only a **random subset of the features**. This process is sometimes called the *random subspace method*.

TL;DR: At each split, consider only a random subset of the features rather than all of them, as bagged trees do.

For a problem with $p$ features, it is typical to use:

- $\sqrt{p}$ (rounded down) features at each split for a classification problem.
- $p/3$ (rounded down) features at each split, with a minimum node size of 5, as the default for a regression problem.
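The rules of thumb above can be sketched in a few lines. This is only an illustration: `features_per_split` and `sample_split_features` are hypothetical helper names, and the floor of 1 is an added assumption so that very small $p$ still leaves one feature to split on. In scikit-learn, the number of candidate features is controlled by the `max_features` argument of the forest estimators.

```python
import math

import numpy as np


def features_per_split(p, task):
    """Rule-of-thumb number of candidate features at each split."""
    if task == "classification":
        return max(1, math.isqrt(p))  # floor(sqrt(p))
    if task == "regression":
        return max(1, p // 3)  # floor(p / 3)
    raise ValueError(f"unknown task: {task!r}")


def sample_split_features(p, task, rng):
    """Draw the random subset of feature indices considered at one split."""
    m = features_per_split(p, task)
    return rng.choice(p, size=m, replace=False)


rng = np.random.default_rng(42)
p = 16
clf_candidates = sample_split_features(p, "classification", rng)
reg_candidates = sample_split_features(p, "regression", rng)
print("classification candidates:", sorted(clf_candidates.tolist()))
print("regression candidates:", sorted(reg_candidates.tolist()))
```

A fresh subset is drawn at every split, so two trees grown on similar data are still pushed toward different structures.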

Random forests, a step beyond bagged decision trees, are **very widely used** classifiers and regressors. They are relatively simple to use because they have few hyperparameters to tune, and they generally perform well.



## Extremely Randomized Trees (ExtraTrees)
Adding another step of randomization (and thus de-correlation) yields extremely randomized trees, or _ExtraTrees_. Like Random Forests, these are trained using the random subspace method (sampling of features). However, each tree is typically trained on the entire training set instead of a bootstrapped sample. An additional layer of randomness is introduced in the way the nodes are split. Instead of searching for the locally optimal cut point (based on, e.g., information gain or the Gini impurity) for each feature under consideration, a random cut point is drawn uniformly from that feature's empirical range in the training data. The best of these randomly generated splits is then used to split the node.
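The split rule described above can be sketched as follows. This is a minimal illustration for a single node, not scikit-learn's implementation: the helpers `gini`, `weighted_gini`, and `random_split`, the synthetic data, and the choice of Gini impurity as the scoring criterion are all assumptions made for the example.

```python
import numpy as np


def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)


def weighted_gini(y, left_mask):
    """Size-weighted impurity of the two children produced by a split."""
    n = len(y)
    n_left = int(left_mask.sum())
    if n_left == 0 or n_left == n:
        return gini(y)  # degenerate split: one child is empty
    n_right = n - n_left
    return (n_left * gini(y[left_mask]) + n_right * gini(y[~left_mask])) / n


def random_split(X, y, feature_ids, rng):
    """Draw one random threshold per candidate feature and keep the best."""
    best = None
    for j in feature_ids:
        lo, hi = X[:, j].min(), X[:, j].max()
        threshold = rng.uniform(lo, hi)  # random cut within the empirical range
        score = weighted_gini(y, X[:, j] <= threshold)
        if best is None or score < best[2]:
            best = (int(j), float(threshold), float(score))
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 2] > 0).astype(int)  # synthetic labels driven by feature 2
feature_ids = rng.choice(X.shape[1], size=2, replace=False)
feature, threshold, impurity = random_split(X, y, feature_ids, rng)
print(f"split on feature {feature} at {threshold:.3f} (weighted Gini {impurity:.3f})")
```

Because no threshold search is performed, each split is cheap to compute, and the chosen cut points vary much more from tree to tree than in a random forest.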

This usually reduces the variance a little further, at the cost of a slight increase in bias.
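As a rough, self-contained way to compare the three ensembles discussed so far, the sketch below cross-validates each of them on synthetic data. The dataset from `make_classification`, the sample sizes, and the `random_state` values are assumptions chosen only so the example runs on its own. `BaggingClassifier` is used with its default base estimator, which is a decision tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption, not the notebook's dataset)
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=42
)

models = {
    "bagged trees": BaggingClassifier(n_estimators=100, random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "extra trees": ExtraTreesClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validated accuracy for each ensemble
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Which ensemble comes out ahead depends on the data, so the ranking on this synthetic problem should not be read as a general result.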

In [2]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV

In [3]:
# Instantiate both ensembles with 100 trees each
rf = RandomForestClassifier(n_estimators=100)
et = ExtraTreesClassifier(n_estimators=100)

In [None]:
# Mean 5-fold cross-validated accuracy of the random forest
# (X_train and y_train are assumed to be defined earlier in the notebook)
cross_val_score(rf, X_train, y_train, cv=5).mean()

In [4]:
# Grid search over hyperparameters
rf_params = {
    'n_estimators': [100, 150, 200],
    'max_depth': [None, 1, 2, 3, 4, 5],
}

# Fit one model per parameter combination, each scored with 5-fold CV
gs = GridSearchCV(rf, param_grid=rf_params, cv=5)
gs.fit(X_train, y_train)
print(gs.best_score_)
gs.best_params_