##### Random forest is an extension of the bagging ensemble.

##### Like bagging, the random forest ensemble fits a decision tree on each of several bootstrap samples of the training dataset. Unlike bagging, random forest also samples the features (columns) of the dataset.
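
##### As a rough illustration of what a bootstrap sample is, the sketch below draws row indices with replacement using NumPy. The dataset size of 8 rows and the helper name bootstrap_indices are assumptions made for this example only; this is not how scikit-learn implements the sampling internally.

```python
# minimal sketch of drawing one bootstrap sample of row indices
import numpy as np

def bootstrap_indices(rng, n_rows):
    # hypothetical helper: sample n_rows indices with replacement,
    # so some rows repeat and others are left out entirely
    return rng.integers(0, n_rows, size=n_rows)

rng = np.random.default_rng(1)
n_rows = 8  # assumed toy dataset size
boot_idx = bootstrap_indices(rng, n_rows)
# rows that were never drawn for this sample
left_out = np.setdiff1d(np.arange(n_rows), boot_idx)
print('bootstrap rows:', boot_idx.tolist())
print('rows left out:', left_out.tolist())
```

##### Each tree in the ensemble would be fit on a different draw like this one, which is what makes the trees differ from each other.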

##### Specifically, each decision tree is built by choosing split points in the data. Rather than considering all features when choosing a split point, random forest restricts the candidates to a random subset of the features, drawn anew at each split, for example 3 out of 10.
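
##### The feature sampling can be pictured with a short sketch. It assumes the 10-feature, 3-candidate case mentioned above, and the helper name draw_candidate_features is hypothetical. It only shows the random draw, not the split search that a real tree performs on the chosen features.

```python
# minimal sketch of drawing candidate features for a few split points
import numpy as np

def draw_candidate_features(rng, n_features, subset_size):
    # hypothetical helper: pick subset_size distinct feature indices
    chosen = rng.choice(n_features, size=subset_size, replace=False)
    return sorted(chosen.tolist())

rng = np.random.default_rng(1)
n_features = 10   # assumed number of columns
subset_size = 3   # assumed number of candidates per split
for split in range(3):
    # a fresh subset is drawn for every split point
    candidates = draw_candidate_features(rng, n_features, subset_size)
    print('split %d: candidate features %s' % (split, candidates))
```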

##### The random forest ensemble is available in scikit-learn via the RandomForestClassifier and RandomForestRegressor classes. You can specify the number of trees to create via the n_estimators argument and the number of randomly selected features to consider at each split point via the max_features argument. For RandomForestClassifier, max_features defaults to the square root of the number of features in your dataset; in recent scikit-learn releases, RandomForestRegressor considers all features by default.
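
##### A brief sketch of setting both arguments explicitly may help. The dataset sizes, the value of 50 trees, and the choice of 3 features per split are assumptions for illustration, not recommendations.

```python
# minimal sketch of configuring n_estimators and max_features
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# assumed toy datasets with 10 features each
X_clf, y_clf = make_classification(n_samples=200, n_features=10, random_state=1)
X_reg, y_reg = make_regression(n_samples=200, n_features=10, random_state=1)

# classifier: 50 trees, 3 candidate features at each split
clf = RandomForestClassifier(n_estimators=50, max_features=3, random_state=1)
clf.fit(X_clf, y_clf)

# regressor: request the square-root rule explicitly
reg = RandomForestRegressor(n_estimators=50, max_features='sqrt', random_state=1)
reg.fit(X_reg, y_reg)

# the fitted trees record how many features they consider per split
print('trees:', len(clf.estimators_))
print('classifier features per split:', clf.estimators_[0].max_features_)
print('regressor features per split:', reg.estimators_[0].max_features_)
```

##### Passing an integer to max_features fixes the count directly, while a string such as 'sqrt' derives it from the number of columns.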

##### The complete example of evaluating a random forest ensemble for classification is listed below.

# example of evaluating a random forest ensemble for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import RandomForestClassifier
# create the synthetic classification dataset
X, y = make_classification(random_state=1)
# configure the ensemble model
model = RandomForestClassifier(n_estimators=50)
# configure the resampling method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the ensemble on the dataset using the resampling method
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report ensemble performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))


Mean Accuracy: 0.957 (0.067)
