## Bagging Classifier

`BaggingClassifier` is an ensemble method in the Scikit-learn library that implements bagging (Bootstrap Aggregating) for classification tasks. It trains multiple base classifiers on different random subsets of the training data and aggregates their predictions into a final prediction. The sections below cover its parameters, how it works, and out-of-bag scoring.

### Key Features and Parameters:

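1. **Base Estimator**:
   - The base estimator is the machine learning algorithm used to train each base classifier in the ensemble.
   - It can be any classifier from Scikit-learn, such as decision trees, support vector machines, k-nearest neighbors, etc. If none is given, a decision tree is used.
   - The parameter is named `estimator` in Scikit-learn 1.2 and later; earlier releases call it `base_estimator`.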

2. **n_estimators**:
   - The number of base classifiers (estimators) to include in the ensemble.
   - Increasing the number of estimators typically improves performance up to a point, with diminishing returns, and increases training and prediction time roughly in proportion.

3. **max_samples**:
   - The number (int) or proportion (float) of samples to draw from the training data for each base classifier.
   - It controls the size of the sample used to train each base classifier.
   - The default is 1.0, so each base classifier is trained on a sample as large as the training dataset.

4. **max_features**:
   - The number (int) or proportion (float) of features to draw for each base classifier.
   - It controls the size of the random feature subset used to train each base classifier.
   - If set to 1.0 (default), all features are used for each base classifier.
   - If set to a float less than 1.0, it specifies the proportion of features to draw.
   - If set to an integer, it specifies the exact number of features to draw.
   - Unlike the tree-based estimators, BaggingClassifier does not accept the strings 'sqrt' or 'log2' for this parameter.

5. **bootstrap**:
   - Whether to use bootstrap sampling (with replacement) when creating the training datasets for each base classifier.
   - If set to True (default), bootstrap sampling is used.
   - If set to False, pasting (sampling without replacement) is used.

6. **bootstrap_features**:
   - Whether features are drawn with replacement when selecting the feature subset for each base classifier.
   - If set to True, features are drawn with replacement, so the same feature can appear more than once in a subset.
   - If set to False (default), features are drawn without replacement.

7. **n_jobs**:
   - The number of jobs to run in parallel when fitting the base classifiers and when predicting.
   - The default is None, which normally means a single job; if set to -1, all available CPU cores are used.
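
As a rough illustration of how the sampling parameters above translate into what each base classifier sees, the sketch below fits a small ensemble on synthetic data and inspects the row and column indices drawn for the first estimator. The dataset and the specific parameter values are assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, illustrative only: 1000 rows, 10 features
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),   # passed positionally (keyword name differs across versions)
    n_estimators=20,
    max_samples=0.5,            # each estimator draws 50% of the rows
    max_features=0.5,           # each estimator draws 50% of the columns
    bootstrap=True,             # rows drawn with replacement
    bootstrap_features=False,   # columns drawn without replacement
    random_state=0,
).fit(X, y)

rows = bag.estimators_samples_[0]    # row indices drawn for the first estimator
cols = bag.estimators_features_[0]   # column indices drawn for the first estimator

print(rows.shape)                    # (500,)
print(cols.shape)                    # (5,)
print(len(set(rows)) < len(rows))    # repeated rows are expected with replacement
print(len(set(cols)) == len(cols))   # no repeated columns without replacement
```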

### How BaggingClassifier Works:

1. **Training**:
   - The BaggingClassifier first draws a random sample of the training instances for each base classifier: with replacement when `bootstrap=True` (bagging), without replacement when `bootstrap=False` (pasting).
   - Then, it trains a copy of the base estimator independently on each sample.
   - If `max_features` is below 1.0, each base classifier also sees only a random subset of the features, so it learns from the rows and columns present in its own sample.

2. **Prediction Aggregation**:
   - During prediction, each base classifier makes its own prediction on the unseen data.
   - If the base estimator implements `predict_proba`, the BaggingClassifier averages the predicted class probabilities and returns the class with the highest mean probability (soft voting).
   - Otherwise, it falls back to majority voting: the class that receives the most votes among the base classifiers is the final prediction.
   - For regression tasks, the counterpart `BaggingRegressor` averages the predictions of the base regressors.
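
The two steps above can be sketched by hand with NumPy and plain decision trees. This is a minimal illustration under assumed settings (synthetic binary data, 25 trees, full-size bootstrap samples), not a reproduction of Scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with labels 0/1 (illustrative only)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25                 # odd, so a hard vote cannot tie
n_train = X_train.shape[0]

# Training: one tree per bootstrap sample (rows drawn with replacement)
trees = []
for _ in range(n_estimators):
    idx = rng.integers(0, n_train, size=n_train)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Aggregation, option 1: hard majority vote over predicted 0/1 labels
votes = np.stack([t.predict(X_test) for t in trees])     # shape (n_estimators, n_test)
hard_pred = (votes.mean(axis=0) > 0.5).astype(int)

# Aggregation, option 2: average class probabilities, then take the argmax
proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
soft_pred = trees[0].classes_[proba.argmax(axis=1)]

print("hard-vote accuracy:", np.mean(hard_pred == y_test))
print("soft-vote accuracy:", np.mean(soft_pred == y_test))
```

With fully grown trees the leaf probabilities are mostly 0 or 1, so the two aggregation rules usually agree; they differ more when the base estimator outputs graded probabilities.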

### Advantages:

- **Variance Reduction**: BaggingClassifier reduces overfitting and variance by combining predictions from multiple base classifiers trained on different subsets of the data.
  
- **Robustness**: By training models on diverse subsets of the data, BaggingClassifier is more robust to outliers and noisy data.

- **Parallelization**: BaggingClassifier supports parallel training of base classifiers, allowing for efficient utilization of computational resources.
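
The variance-reduction claim can be made concrete with a standard result, stated here under simplifying assumptions that are not part of the original text: the $B$ base models $f_1, \dots, f_B$ have the same variance $\sigma^2$ at a point $x$ and the same pairwise correlation $\rho$.

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

The second term shrinks as $B$ grows while the first does not, which suggests why adding estimators gives diminishing returns and why making the base models less correlated (smaller row or feature samples) can help further.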

### Example Usage:

```python
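from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data so the snippet runs on its own (illustrative only)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifier
base_classifier = DecisionTreeClassifier()

# Create BaggingClassifier; the base estimator is passed positionally because
# its keyword is `estimator` in scikit-learn >= 1.2 and `base_estimator` before that
bagging_classifier = BaggingClassifier(
    base_classifier, n_estimators=10, max_samples=0.8, max_features=0.8
)

# Train BaggingClassifier
bagging_classifier.fit(X_train, y_train)

# Make predictions
y_pred = bagging_classifier.predict(X_test)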
```

In this example, we create a BaggingClassifier with a base decision tree classifier and train it on the training data (X_train, y_train). We then make predictions on the test data (X_test) using the trained BaggingClassifier.

### Out-of-Bag (OOB) Score:

1. **Definition**:
   - When a bootstrap sample is drawn for a base classifier, some training instances are left out because sampling is done with replacement. With a sample as large as the training set, roughly 37% of the instances are left out of each sample.
   - These left-out instances are the out-of-bag (OOB) instances for that base classifier.
   - Since they were not used to train it, they act as a validation set for that base classifier.

2. **Calculation**:
   - For each instance in the training dataset, an OOB prediction is obtained by aggregating the predictions of only those base classifiers whose bootstrap sample did not contain that instance.
   - The OOB score is the accuracy of these OOB predictions over all instances in the training dataset.

3. **Usage**:
   - The OOB score provides an estimate of the model's performance on unseen data without the need for a separate validation set.
   - It is especially useful when the dataset is limited, and splitting it into separate training and validation sets may reduce the amount of data available for training.

4. **Interpretation**:
   - A higher OOB score indicates better predictive performance of the BaggingClassifier.
   - Comparing the OOB score to the accuracy obtained on a separate validation set can help assess the generalization performance of the model.
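
The calculation described above can be reproduced by hand from a fitted ensemble's `estimators_samples_` and `estimators_features_` attributes. The sketch below assumes synthetic 0/1-labelled data and a decision-tree base estimator; it is meant to illustrate the idea, and the manual value should come out the same as, or very close to, `oob_score_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),   # passed positionally (keyword name differs across versions)
    n_estimators=50,
    bootstrap=True,             # OOB scoring requires sampling with replacement
    oob_score=True,
    random_state=0,
).fit(X, y)

n_samples = X.shape[0]
proba_sum = np.zeros((n_samples, len(bag.classes_)))

for est, rows, cols in zip(bag.estimators_, bag.estimators_samples_, bag.estimators_features_):
    oob = np.ones(n_samples, dtype=bool)
    oob[rows] = False                                   # True where this estimator never saw the row
    proba_sum[oob] += est.predict_proba(X[oob][:, cols])

covered = proba_sum.sum(axis=1) > 0                     # rows that were OOB for at least one estimator
oob_pred = bag.classes_[proba_sum.argmax(axis=1)]
manual_oob = np.mean(oob_pred[covered] == y[covered])
oob_fraction = oob.mean()                               # share of rows left out by the last estimator

print("manual OOB accuracy:", manual_oob)
print("sklearn oob_score_: ", bag.oob_score_)
print("fraction of rows out-of-bag for one estimator:", oob_fraction)
```

The last printed value should sit near 0.37, matching the share of instances a full-size bootstrap sample is expected to leave out.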

### Advantages of the OOB Score:

- **Efficiency**: The OOB score utilizes all available training data, making efficient use of the dataset without the need for a separate validation set.
  
- **Nearly Unbiased Estimate**: Since each instance is scored only by base classifiers that never saw it during training, the OOB score gives an honest estimate of performance on unseen data. It tends to be slightly pessimistic, because each OOB prediction uses only part of the ensemble.

- **Convenience**: OOB score computation is built into Scikit-learn's BaggingClassifier, making it convenient to assess model performance during training. It is only available when `bootstrap=True`.

### Example Usage (OOB Score):

```python
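from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data so the snippet runs on its own (illustrative only)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifier
base_classifier = DecisionTreeClassifier()

# Create BaggingClassifier with OOB scoring enabled (requires bootstrap=True, the default).
# 100 estimators makes it very likely that every training instance is out-of-bag at least once;
# with only a handful of estimators scikit-learn may warn that some inputs have no OOB score.
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=100, oob_score=True)

# Train BaggingClassifier
bagging_classifier.fit(X_train, y_train)

# Obtain OOB score
oob_score = bagging_classifier.oob_score_
print("Out-of-Bag (OOB) Score:", oob_score)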
```

In this example, we create a BaggingClassifier with a base decision tree classifier and set the `oob_score` parameter to `True` to enable OOB scoring. After training the BaggingClassifier, we can access the OOB score using the `oob_score_` attribute.

In [50]:
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

In [54]:
X,y = make_classification(n_samples=10000, n_features=10,n_informative=3)

In [55]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [56]:
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train,y_train)
y_pred = dt.predict(X_test)

print("Decision Tree accuracy",accuracy_score(y_test,y_pred))

Decision Tree accuracy 0.8755


# Bagging

In [64]:
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=0.5,
    bootstrap=True,
    random_state=42
)

In [65]:
bag.fit(X_train,y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                                                        class_weight=None,
                                                        criterion='gini',
                                                        max_depth=None,
                                                        max_features=None,
                                                        max_leaf_nodes=None,
                                                        min_impurity_decrease=0.0,
                                                        min_impurity_split=None,
                                                        min_samples_leaf=1,
                                                        min_samples_split=2,
                                                        min_weight_fraction_leaf=0.0,
                                                        presort='deprecated',
                                                        random_state=None,


In [66]:
y_pred = bag.predict(X_test)

In [67]:
accuracy_score(y_test,y_pred)

0.9165

In [68]:
bag.estimators_samples_[0].shape

(4000,)

In [69]:
bag.estimators_features_[0].shape

(10,)

# Bagging using SVM

In [70]:
bag = BaggingClassifier(
    SVC(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=0.25,
    bootstrap=True,
    random_state=42
)

In [71]:
bag.fit(X_train,y_train)
y_pred = bag.predict(X_test)
print("Bagging using SVM",accuracy_score(y_test,y_pred))

Bagging using SVM 0.901


# Pasting

In [72]:
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=0.25,
    bootstrap=False,
    random_state=42,
    verbose=1,
    n_jobs=-1
)

In [73]:
bag.fit(X_train,y_train)
y_pred = bag.predict(X_test)
print("Pasting classifier",accuracy_score(y_test,y_pred))

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    8.3s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    8.3s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


Pasting classifier 0.9165


[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.5s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.5s finished


# Random Subspaces

In [74]:
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=1.0,
    bootstrap=False,
    max_features=0.5,
    bootstrap_features=True,
    random_state=42
)

In [75]:
bag.fit(X_train,y_train)
y_pred = bag.predict(X_test)
print("Random Subspaces classifier",accuracy_score(y_test,y_pred))

Random Subspaces classifier 0.911


In [76]:
bag.estimators_samples_[0].shape

(8000,)

In [77]:
bag.estimators_features_[0].shape

(5,)

# Random Patches

In [78]:
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=0.25,
    bootstrap=True,
    max_features=0.5,
    bootstrap_features=True,
    random_state=42
)

In [79]:
bag.fit(X_train,y_train)
y_pred = bag.predict(X_test)
print("Random Patches classifier",accuracy_score(y_test,y_pred))

Random Patches classifier 0.909


# OOB Score

In [80]:
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # positional: the keyword is `estimator` in scikit-learn >= 1.2, `base_estimator` before
    n_estimators=500,
    max_samples=0.25,
    bootstrap=True,
    oob_score=True,
    random_state=42
)

In [81]:
bag.fit(X_train,y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                                                        class_weight=None,
                                                        criterion='gini',
                                                        max_depth=None,
                                                        max_features=None,
                                                        max_leaf_nodes=None,
                                                        min_impurity_decrease=0.0,
                                                        min_impurity_split=None,
                                                        min_samples_leaf=1,
                                                        min_samples_split=2,
                                                        min_weight_fraction_leaf=0.0,
                                                        presort='deprecated',
                                                        random_state=None,


In [82]:
bag.oob_score_

0.90425

In [83]:
y_pred = bag.predict(X_test)
print("Accuracy",accuracy_score(y_test,y_pred))

Accuracy 0.9195


# Bagging Tips

- Bagging often gives better results than pasting, but the difference can be small, so compare both on your data
- Good results often come around the 25% to 50% row sampling mark
- Random patches and random subspaces are worth trying when dealing with high-dimensional data
- To find good hyperparameter values we can use GridSearchCV or RandomizedSearchCV
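
As a complement to an exhaustive grid, a randomized search samples a fixed number of parameter combinations. The sketch below is one way this might look; the dataset, the distributions, and the small `n_iter` and `cv` values are assumptions picked to keep the example fast.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic dataset so the search finishes quickly (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

# scipy's uniform(loc, scale) samples from [loc, loc + scale]
param_distributions = {
    "n_estimators": randint(10, 50),      # integers from 10 to 49
    "max_samples": uniform(0.25, 0.25),   # row fraction between 0.25 and 0.5
    "max_features": uniform(0.5, 0.5),    # feature fraction between 0.5 and 1.0
    "bootstrap": [True, False],           # bagging vs. pasting
}

search = RandomizedSearchCV(
    BaggingClassifier(random_state=0),    # default base estimator is a decision tree
    param_distributions,
    n_iter=5,                             # number of sampled combinations
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The sampled ranges follow the 25% to 50% row-sampling tip above; widening them or raising `n_iter` trades run time for coverage.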

# Applying GridSearchCV

In [9]:
from sklearn.model_selection import GridSearchCV

In [10]:
parameters = {
    'n_estimators': [50,100,500], 
    'max_samples': [0.1,0.4,0.7,1.0],
    'bootstrap' : [True,False],
    'max_features' : [0.1,0.4,0.7,1.0]
    }

In [11]:
search = GridSearchCV(BaggingClassifier(), parameters, cv=5)

In [12]:
search.fit(X_train,y_train)

GridSearchCV(cv=5, error_score=nan,
             estimator=BaggingClassifier(base_estimator=None, bootstrap=True,
                                         bootstrap_features=False,
                                         max_features=1.0, max_samples=1.0,
                                         n_estimators=10, n_jobs=None,
                                         oob_score=False, random_state=None,
                                         verbose=0, warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'bootstrap': [True, False],
                         'max_features': [0.1, 0.4, 0.7, 1.0],
                         'max_samples': [0.1, 0.4, 0.7, 1.0],
                         'n_estimators': [50, 100, 500]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [13]:
search.best_score_

0.8986249999999998

In [14]:
search.best_params_

{'bootstrap': True,
 'max_features': 0.7,
 'max_samples': 0.4,
 'n_estimators': 500}