# Bagging Classifier Model

A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
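The fit-then-aggregate idea can be sketched from scratch in a few lines. This is a simplified illustration, not scikit-learn's implementation: it assumes binary 0/1 labels, uses hard majority voting, and trains on synthetic data from `make_classification` rather than the SMS dataset. The helper names `fit_bagging` and `predict_bagging` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: fit n_estimators trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_estimators):
        # Draw n_samples row indices with replacement (a bootstrap sample)
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_bagging(trees, X):
    """Hypothetical helper: aggregate by majority vote (assumes 0/1 labels)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_estimators, n_samples)
    # With 0/1 labels, the majority class is 1 when more than half of the votes are 1
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
trees = fit_bagging(X, y)
preds = predict_bagging(trees, X)
print("training accuracy:", (preds == y).mean())
```

An odd number of trees (25) avoids tied votes in the binary case.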

Such a meta-estimator is typically used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
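Why averaging reduces variance can be made concrete with a standard identity. Assume, as an idealization, that the $B$ base predictors $f_1(x), \dots, f_B(x)$ are identically distributed, each with variance $\sigma^2$, and that every pair has correlation $\rho$:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
= \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\rho\sigma^2\right)
= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
$$

As $B$ grows, the second term shrinks toward zero but the first does not. Without randomization every base model would be identical ($\rho = 1$) and averaging would leave the variance at $\sigma^2$; the random subsets are what push $\rho$ below 1.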

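Here, each base learner is a decision tree: when no base estimator is given, scikit-learn's BaggingClassifier uses a DecisionTreeClassifier. Each tree is trained on its own random sample of the data, and the ensemble combines their predictions into a single learner that is usually more stable than any individual tree.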

### Import Libraries

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
```

### Read and prepare the dataset

```python
# Read in our dataset (tab-separated: label, then message text)
df = pd.read_table('SMSSpamCollection',
                   sep='\t',
                   header=None,
                   names=['label', 'sms_message'])

# Map the text labels to numbers: ham -> 0, spam -> 1
df['label'] = df.label.map({'ham': 0, 'spam': 1})

# Split the dataset into training and testing data (25% held out by default)
X_train, X_test, y_train, y_test = train_test_split(df['sms_message'],
                                                    df['label'],
                                                    random_state=1)

# Instantiate the CountVectorizer, which turns each message into word counts
count_vector = CountVectorizer()

# Learn the vocabulary from the training data and return the document-term matrix
training_data = count_vector.fit_transform(X_train)

# Transform the testing data with the same vocabulary. Note that we do not fit the CountVectorizer on the testing data
testing_data = count_vector.transform(X_test)
```

### Import Model

```python
from sklearn.ensemble import BaggingClassifier
```

### Instantiate a Bagging Classifier
Instantiate a Bagging Classifier with 200 base estimators (n_estimators) and leave every other parameter at its default value.

```python
baggingModel = BaggingClassifier(n_estimators=200)
```
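The call above relies on scikit-learn's defaults. The sketch below spells the main sampling defaults out and adds two parameters that are not in the original: `oob_score=True`, which estimates accuracy on each training row using only the trees that did not see it, and `random_state`, which makes the bootstrap draws reproducible. It runs on synthetic data (an assumption) so it is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = BaggingClassifier(
    n_estimators=200,
    max_samples=1.0,   # default: each sample has as many rows as the training set
    max_features=1.0,  # default: every base estimator sees all features
    bootstrap=True,    # default: draw rows with replacement
    oob_score=True,    # added: out-of-bag accuracy estimate
    random_state=0,    # added: reproducible bootstrap draws
)
model.fit(X, y)
print("out-of-bag accuracy:", model.oob_score_)
```

The out-of-bag score comes for free from the bootstrap, so it can serve as a rough validation estimate without a separate hold-out set.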

### Fit the training data to the model

```python
baggingModel = baggingModel.fit(training_data, y_train)
```
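After fitting, the ensemble exposes its trees (`estimators_`) and the row indices each tree was trained on (`estimators_samples_`, which returns index arrays in recent scikit-learn releases). The sketch below, on synthetic data (an assumption), checks that a bootstrap sample contains on average about 63% distinct rows, since each row is left out with probability $(1 - 1/n)^n \approx 1/e$.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)

# One fitted base estimator per bootstrap sample
print(len(model.estimators_))  # 50

# Fraction of distinct training rows in each estimator's bootstrap sample
unique_fractions = [len(np.unique(idx)) / X.shape[0] for idx in model.estimators_samples_]
print(round(float(np.mean(unique_fractions)), 3))  # close to 1 - 1/e, about 0.632
```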

### Predict on the testing data

```python
preds = baggingModel.predict(testing_data)
```
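Because decision trees implement `predict_proba`, BaggingClassifier aggregates by averaging the trees' class probabilities and then predicts the class with the highest mean probability (it falls back to voting only for base estimators without `predict_proba`). The sketch below reproduces that averaging by hand on synthetic data, an assumption made so the example is self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
model = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)

# Average the class probabilities of all base estimators by hand.
# estimators_features_ records the columns each estimator was trained on
# (all of them here, because max_features defaults to 1.0).
manual_proba = np.mean(
    [est.predict_proba(X[:, feats])
     for est, feats in zip(model.estimators_, model.estimators_features_)],
    axis=0,
)

print(np.allclose(manual_proba, model.predict_proba(X)))  # True
# predict() returns the class with the highest mean probability
print(np.array_equal(model.classes_[manual_proba.argmax(axis=1)], model.predict(X)))  # True
```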

### Print the scores the model achieved

```python
print('Accuracy score: ', format(accuracy_score(y_test, preds)))
print('Precision score: ', format(precision_score(y_test, preds)))
print('Recall score: ', format(recall_score(y_test, preds)))
print('F1 score: ', format(f1_score(y_test, preds)))
```

```
Accuracy score:  0.9734386216798278
Precision score:  0.9065934065934066
Recall score:  0.8918918918918919
F1 score:  0.8991825613079019
```
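All four scores are derived from the confusion matrix, with spam (label 1) as the positive class. The sketch below uses hypothetical labels, not the model's predictions, to compute precision, recall, F1 and accuracy by hand and check them against scikit-learn's functions.

```python
import math

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 1 = spam, 0 = ham
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 2 1 3

precision = tp / (tp + fp)  # share of predicted spam that really is spam
recall = tp / (tp + fn)     # share of real spam that was caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(precision, recall, round(f1, 4), accuracy)  # 0.6 0.75 0.6667 0.625

# The hand-computed values match scikit-learn's metric functions
checks = [
    math.isclose(precision, precision_score(y_true, y_pred)),
    math.isclose(recall, recall_score(y_true, y_pred)),
    math.isclose(f1, f1_score(y_true, y_pred)),
    math.isclose(accuracy, accuracy_score(y_true, y_pred)),
]
print(all(checks))  # True
```

Since ham messages far outnumber spam in this kind of dataset, precision and recall on the spam class are more informative than accuracy alone.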
