Whereas voting trains several different models on the full training set, bagging (bootstrap aggregating) trains the same kind of model on random sub-samples of the training set, drawn with replacement.

So if there are 100 training instances, we'll draw 10 at random (with replacement, so the same instance can land in a bag more than once), train a model on them, and set that model aside. Then we'll draw a fresh 10 from the full 100 and train a second model, and so on.

Finally, each model makes its own prediction. The ensemble aggregates these predictions (for classification, usually by taking the mode, i.e. the most frequent class).
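To make the sampling and voting steps concrete, here is a minimal sketch of bagging written with only the standard library. It is not how scikit-learn implements it: the toy base learner `fit_stump`, the helper names, and the 1-D dataset are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(points):
    """Toy base learner (hypothetical): pick the threshold on x that best separates labels."""
    best_threshold, best_correct = None, -1
    for threshold, _ in points:
        correct = sum((x >= threshold) == bool(label) for x, label in points)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def bagging_fit(points, n_models=25, bag_size=10, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # random.choices samples WITH replacement, so a bag may repeat instances
        bag = rng.choices(points, k=bag_size)
        models.append(fit_stump(bag))
    return models

def bagging_predict(models, x):
    # Each stump votes 0 or 1; the ensemble returns the mode of the votes
    votes = [int(x >= threshold) for threshold in models]
    return Counter(votes).most_common(1)[0][0]

# 100 toy instances: the label is 1 when x >= 50
data = [(x, int(x >= 50)) for x in range(100)]
models = bagging_fit(data)
print(bagging_predict(models, 5), bagging_predict(models, 95))
```

Each stump only ever sees 10 of the 100 instances, so individual thresholds are noisy, but the majority vote smooths that out.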

In [31]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(n_samples=10000, noise=0.6, random_state=15)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=15)

In [32]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

In [43]:
# Build a bagging ensemble of 500 decision trees
# Each "bag" holds half of the training set (max_samples=0.5),
# drawn with replacement (bootstrap=True),
# and we will use all available cores (n_jobs=-1)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=0.5, bootstrap=True, n_jobs=-1)

In [44]:
bag_clf.fit(X_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                                                        class_weight=None,
                                                        criterion='gini',
                                                        max_depth=None,
                                                        max_features=None,
                                                        max_leaf_nodes=None,
                                                        min_impurity_decrease=0.0,
                                                        min_impurity_split=None,
                                                        min_samples_leaf=1,
                                                        min_samples_split=2,
                                                        min_weight_fraction_leaf=0.0,
                                                        presort='deprecated',
                                                        random_state=None,


In [45]:
y_pred = bag_clf.predict(X_test)

In [46]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.7527272727272727

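Because each predictor is trained on a bootstrap sample, some training instances are never seen by it (about 37% when `max_samples` is left at its default of 1.0). Setting `oob_score=True` uses these out-of-bag instances as a validation set, so you get an estimate of how the ensemble will perform without holding out extra data.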

In [47]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf.fit(X_train, y_train)
bag_clf.oob_score_


0.7567164179104477
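The share of instances left out of each bag can be checked with a quick simulation. This is a standard-library sketch, not part of scikit-learn; `oob_fraction` is a hypothetical helper, and 6700 is the training-set size implied by the 67/33 split of 10000 samples above.

```python
import random

def oob_fraction(n_instances, seed=0):
    """Draw one bootstrap sample of size n and return the share of instances never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_instances) for _ in range(n_instances)}
    return 1 - len(drawn) / n_instances

# Should land close to 1/e (roughly 0.37) for any reasonably large n
print(round(oob_fraction(6700), 3))
```

Since every tree leaves out a different ~37%, each training instance can be scored by the trees that never saw it, which is what `oob_score_` reports.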

You can also sample without replacement (known as pasting) by setting `bootstrap=False`.

In [48]:
bag_clf_2 = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=100, bootstrap=False, n_jobs=-1)
bag_clf_2.fit(X_train, y_train)
# Predict with the new ensemble; reusing y_pred would only re-score bag_clf
y_pred_2 = bag_clf_2.predict(X_test)
accuracy_score(y_test, y_pred_2)

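You can also sample random subsets of the features, with or without replacement, using two more hyperparameters:

* `max_features`: how many features each estimator is trained on (an int count or a float fraction)
* `bootstrap_features`: whether those features are drawn with replacement

These combine with the sample settings into two named methods:

* Random Patches = sampling both training instances and features
* Random Subspaces = keeping all training instances (`bootstrap=False`, `max_samples=1.0`) and sampling only features (`bootstrap_features=True` and/or `max_features` < 1.0)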
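As a rough sketch of how those two setups might look in code: `make_moons` only has two features, so this example assumes a hypothetical 10-feature dataset from `make_classification`. The estimator counts and the 0.5 fractions are arbitrary illustrative choices, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with 10 features, so feature sampling has something to do
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=15)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=15)

# Random Patches: sample both instances and features
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50,
    max_samples=0.5, bootstrap=True,
    max_features=0.5, bootstrap_features=True,
    random_state=15)

# Random Subspaces: every tree sees all instances, but only a subset of features
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50,
    max_samples=1.0, bootstrap=False,
    max_features=0.5, bootstrap_features=True,
    random_state=15)

for name, clf in [("patches", patches_clf), ("subspaces", subspaces_clf)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```

After fitting, `estimators_features_` lists which feature indices each tree was given, which is a handy way to confirm the sampling behaves as expected.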