
# Bagging

In this example, we use bagging (bootstrap aggregating) for binary classification and compare it with a single logistic regression model.
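As a rough illustration of the idea, bagging fits several copies of a base model, each on a bootstrap sample of the training data, and combines their predictions. The sketch below does this by hand with a hard majority vote. The toy data (`X_demo`, `y_demo`), the choice of decision trees, and `n_models = 10` are assumptions for illustration and are not part of this notebook. scikit-learn's `BaggingClassifier` averages predicted class probabilities when the base estimator provides them, so its output can differ from a hard vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, separate from the dataset used in this notebook.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

rng = np.random.default_rng(0)
n_models = 10  # assumed ensemble size for the sketch
models = []
for _ in range(n_models):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    tree = DecisionTreeClassifier(random_state=0)
    models.append(tree.fit(X_demo[idx], y_demo[idx]))

# Aggregate: hard majority vote across the base models (labels are 0 and 1).
votes = np.stack([m.predict(X_demo) for m in models])  # shape (n_models, n_samples)
y_vote = (votes.mean(axis=0) >= 0.5).astype(int)       # ties go to class 1
print(y_vote.shape)
```

Each tree sees a slightly different sample, so their individual errors are partly independent, which is what the vote averages out.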

# Set the root directory for processing.

In [1]:
import os

root_dir = '/content/'
os.chdir(root_dir)

!ls -al

total 16
drwxr-xr-x 1 root root 4096 Feb 29 14:23 .
drwxr-xr-x 1 root root 4096 Mar  5 05:49 ..
drwxr-xr-x 4 root root 4096 Feb 29 14:22 .config
drwxr-xr-x 1 root root 4096 Feb 29 14:23 sample_data


# Create a synthetic dataset.

### Import required Python modules.

In [2]:
from sklearn.datasets import make_classification

### Create a synthetic dataset for a binary classification problem with 1,000 examples and 20 input features.

In [3]:
seed = 7

In [4]:
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=seed)
print(X.shape, y.shape)

(1000, 20) (1000,)
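Before training, it can help to confirm that the two classes are roughly balanced, since accuracy is easier to interpret on balanced data. A minimal sketch that regenerates the same dataset under separate names (`X_chk`, `y_chk`, chosen here only to avoid overwriting the notebook's variables) and counts the labels:

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters and seed as the cell above, under hypothetical names.
X_chk, y_chk = make_classification(n_samples=1000, n_features=20, n_informative=15,
                                   n_redundant=5, random_state=7)

# Number of examples per class label (index 0 -> class 0, index 1 -> class 1).
counts = np.bincount(y_chk)
print(counts)
```

Note that `n_informative + n_redundant` equals `n_features` here, so the dataset has no pure-noise features.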


# Split the dataset into training and test sets.

### Import required Python modules.

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
test_size = 0.25

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(750, 20)
(250, 20)
(750,)
(250,)
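The split above is random, so the class proportions in the two parts can differ slightly. One possible variation, not used in this notebook, is to pass `stratify` so that both parts keep the class ratio of the full dataset. The variable names below (`X_s`, `X_tr`, and so on) are hypothetical and chosen to avoid overwriting the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset parameters and seed as in the notebook, under hypothetical names.
X_s, y_s = make_classification(n_samples=1000, n_features=20, n_informative=15,
                               n_redundant=5, random_state=7)

# stratify=y_s keeps the class proportions the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_s, y_s, test_size=0.25, random_state=7, stratify=y_s
)

print(np.bincount(y_tr), np.bincount(y_te))
```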


# Create a logistic regression model.

### Import required Python modules.

See the [scikit-learn `LogisticRegression` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) for more details.

In [8]:
from sklearn.linear_model import LogisticRegression

### Create and train the logistic regression model.

In [9]:
model = LogisticRegression(max_iter=500, random_state=16)
model.fit(X_train, y_train)

# Evaluate the logistic regression model.

### Import required Python modules.

In [10]:
from sklearn import metrics
from sklearn.metrics import classification_report

### Evaluate the model on the training dataset.

In [11]:
y_train_predict = model.predict(X_train)

confusion_matrix = metrics.confusion_matrix(y_train, y_train_predict)
confusion_matrix

array([[331,  48],
       [ 50, 321]])

### Evaluate the model on the test dataset.

In [12]:
y_test_predict = model.predict(X_test)

confusion_matrix = metrics.confusion_matrix(y_test, y_test_predict)
confusion_matrix

array([[104,  18],
       [ 18, 110]])
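To make the link between the confusion matrix and the usual metrics explicit, the sketch below recomputes accuracy, precision, and recall from the test-set matrix printed above. In scikit-learn's convention, rows are true classes and columns are predicted classes. The name `cm` is introduced here only for illustration.

```python
import numpy as np

# Test-set confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[104, 18],
               [18, 110]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # per class: correct / actually that class

print(round(accuracy, 3))  # 214 / 250 = 0.856
print(precision.round(2), recall.round(2))
```

Because the two off-diagonal counts are equal here, precision and recall coincide for each class.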

### Show classification report.

In [13]:
target_names = ['Class-0', 'Class-1']
print(classification_report(y_test, y_test_predict, target_names=target_names))

              precision    recall  f1-score   support

     Class-0       0.85      0.85      0.85       122
     Class-1       0.86      0.86      0.86       128

    accuracy                           0.86       250
   macro avg       0.86      0.86      0.86       250
weighted avg       0.86      0.86      0.86       250



# Create a bagging classifier model.

### Import required Python modules.

In [14]:
from sklearn.ensemble import BaggingClassifier

### Create and train the bagging classifier model.

In [15]:
model = BaggingClassifier(random_state=seed)
model.fit(X_train, y_train)
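The cell above relies on the defaults of `BaggingClassifier`: a decision tree as the base estimator, 10 estimators, and bootstrap samples the same size as the training set. The sketch below spells those defaults out. The base estimator is passed positionally because its keyword name differs between scikit-learn versions (`base_estimator` in older releases, `estimator` in newer ones). The names `X_b`, `y_b`, and `bagging` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Same dataset parameters and seed as in the notebook, under hypothetical names.
X_b, y_b = make_classification(n_samples=1000, n_features=20, n_informative=15,
                               n_redundant=5, random_state=7)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base estimator (the default when none is given)
    n_estimators=10,           # number of base models in the ensemble
    max_samples=1.0,           # each bootstrap sample is as large as the training set
    bootstrap=True,            # sample rows with replacement
    random_state=7,
)
bagging.fit(X_b, y_b)

print(len(bagging.estimators_))  # one fitted tree per estimator
```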

# Evaluate the bagging classifier model.

### Evaluate the model on the training dataset.

In [16]:
y_train_predict = model.predict(X_train)

confusion_matrix = metrics.confusion_matrix(y_train, y_train_predict)
confusion_matrix

array([[378,   1],
       [  3, 368]])

### Evaluate the model on the test dataset.

In [17]:
y_test_predict = model.predict(X_test)

confusion_matrix = metrics.confusion_matrix(y_test, y_test_predict)
confusion_matrix

array([[113,   9],
       [ 19, 109]])
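The bagged trees classify 746 of 750 training examples correctly (about 99.5%) but 222 of 250 test examples (88.8%), so training accuracy is an optimistic estimate. One way to get a less optimistic estimate without touching the test set is the out-of-bag (OOB) score, which evaluates each training example only with the trees that did not see it. The sketch below is an assumption-laden variation, not part of this notebook: it uses `n_estimators=50` so that every example is likely to be left out of at least one bootstrap sample, and it uses hypothetical variable names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Same dataset and split parameters as in the notebook, under hypothetical names.
X_o, y_o = make_classification(n_samples=1000, n_features=20, n_informative=15,
                               n_redundant=5, random_state=7)
X_o_train, X_o_test, y_o_train, y_o_test = train_test_split(
    X_o, y_o, test_size=0.25, random_state=7
)

# oob_score=True asks the ensemble to score each training row
# using only the base models whose bootstrap sample excluded it.
oob_model = BaggingClassifier(n_estimators=50, oob_score=True, random_state=7)
oob_model.fit(X_o_train, y_o_train)

print(oob_model.oob_score_)                  # out-of-bag accuracy estimate
print(oob_model.score(X_o_test, y_o_test))   # held-out test accuracy
```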

### Show classification report.

In [18]:
target_names = ['Class-0', 'Class-1']
print(classification_report(y_test, y_test_predict, target_names=target_names))

              precision    recall  f1-score   support

     Class-0       0.86      0.93      0.89       122
     Class-1       0.92      0.85      0.89       128

    accuracy                           0.89       250
   macro avg       0.89      0.89      0.89       250
weighted avg       0.89      0.89      0.89       250
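A single train/test split gives one accuracy figure per model, and the difference between 0.86 and 0.89 could partly reflect the particular split. A hedged way to compare the two models more robustly is k-fold cross-validation. The sketch below assumes 5 folds and uses hypothetical names (`X_cv`, `y_cv`, `results`); it is not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Same dataset parameters and seed as in the notebook, under hypothetical names.
X_cv, y_cv = make_classification(n_samples=1000, n_features=20, n_informative=15,
                                 n_redundant=5, random_state=7)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=500, random_state=16),
    "bagging": BaggingClassifier(random_state=7),
}

results = {}
for name, clf in candidates.items():
    # Mean accuracy over 5 folds; each fold serves once as the validation set.
    scores = cross_val_score(clf, X_cv, y_cv, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(name, round(results[name], 3), round(scores.std(), 3))
```

Reporting the standard deviation alongside the mean shows whether the gap between the models is larger than the fold-to-fold variation.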

