# AdaBoost Classifier

An AdaBoost classifier is a meta-estimator. It first fits a classifier on the original dataset, then fits additional copies of that classifier on the same dataset, each time adjusting the weights of incorrectly classified instances so that subsequent classifiers focus more on the difficult cases.
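To make the reweighting idea concrete, here is a minimal sketch of a single round of the classic discrete AdaBoost weight update, assuming labels in {-1, +1}. The function name `reweight` and the toy arrays are hypothetical, and scikit-learn's own implementation differs in its details, so treat this as an illustration of the principle rather than the library's code.

```python
import numpy as np

def reweight(y_true, y_pred, weights):
    """One round of the classic discrete AdaBoost weight update.

    Hypothetical helper for illustration; assumes labels in {-1, +1}
    and a weighted error strictly between 0 and 1.
    """
    miss = y_true != y_pred
    err = np.sum(weights[miss])               # weighted error of this weak learner
    alpha = 0.5 * np.log((1.0 - err) / err)   # the learner's say in the final vote
    # Increase weights of misclassified points, decrease the rest, then renormalize
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_w / new_w.sum()

# Toy example: the weak learner gets the last two of six points wrong
y_true = np.array([1, 1, -1, -1, 1, 1])
y_pred = np.array([1, 1, -1, -1, -1, -1])
weights = np.full(6, 1 / 6)

alpha, new_weights = reweight(y_true, y_pred, weights)
print(alpha)
print(new_weights)
```

After the update, the two misclassified points each carry weight 0.25 while the four correct ones carry 0.125 each, so the next weak learner is pushed to get those two points right.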

### Import Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

### Read the dataset

In [2]:
# Read in our dataset
df = pd.read_table('SMSSpamCollection',
                   sep='\t', 
                   header=None, 
                   names=['label', 'sms_message'])

# Encode the response as integers: ham -> 0, spam -> 1
df['label'] = df.label.map({'ham':0, 'spam':1})

# Split our dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['sms_message'], 
                                                    df['label'], 
                                                    random_state=1)

# Instantiate the CountVectorizer
count_vector = CountVectorizer()

# Learn the vocabulary from the training data and return the document-term matrix
training_data = count_vector.fit_transform(X_train)

# Transform the testing data with the vocabulary learned above.
# Note that we do not fit the CountVectorizer on the testing data.
testing_data = count_vector.transform(X_test)
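The difference between `fit_transform` on the training data and `transform` on the testing data can be seen on a toy example. The sketch below uses made-up messages (not rows from the SMS dataset) and shows that words unseen during fitting are simply dropped at transform time.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up messages for illustration only
train_docs = ["free prize now", "call me now"]
test_docs = ["free call tomorrow"]

vec = CountVectorizer()
train_matrix = vec.fit_transform(train_docs)   # learns the vocabulary
test_matrix = vec.transform(test_docs)         # reuses it, learns nothing new

# Columns follow the sorted vocabulary learned from the training messages
vocab = sorted(vec.vocabulary_)
print(vocab)                    # ['call', 'free', 'me', 'now', 'prize']
print(test_matrix.toarray())    # [[1 1 0 0 0]]
```

The word "tomorrow" never appeared in the training messages, so it has no column and contributes nothing to the test matrix. Fitting on the testing data instead would change the columns and make the matrix incompatible with a model trained on `train_matrix`.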

### Import Model

In [3]:
from sklearn.ensemble import AdaBoostClassifier

### Instantiate an AdaBoost Classifier
Instantiate an AdaBoost classifier with 200 weak learners (n_estimators) and everything else as default values.

In [4]:
adaboostModel = AdaBoostClassifier(n_estimators = 200)

### Fit the training data to the model

In [6]:
adaboostModel = adaboostModel.fit(training_data, y_train)

### Predict on the testing data

In [7]:
preds = adaboostModel.predict(testing_data)
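Because boosting adds weak learners one at a time, it can be useful to look at predictions after each round, not only from the full ensemble. The sketch below uses `staged_predict` for this. It runs on synthetic data from `make_classification` as a stand-in (an assumption made to keep the example self-contained), with 50 learners instead of 200.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; this is not the SMS dataset)
X, y = make_classification(n_samples=500, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

model = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after each boosting round
staged_acc = [accuracy_score(y_te, p) for p in model.staged_predict(X_te)]

print('1 learner ->', staged_acc[0])
print(len(staged_acc), 'learners ->', staged_acc[-1])
```

Plotting `staged_acc` against the round number is one way to judge whether a smaller `n_estimators` would have been enough.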

### Print the scores the model achieved

In [8]:
print('Accuracy score: ', format(accuracy_score(y_test, preds)))
print('Precision score: ', format(precision_score(y_test, preds)))
print('Recall score: ', format(recall_score(y_test, preds)))
print('F1 score: ', format(f1_score(y_test, preds)))

Accuracy score:  0.9827709978463748
Precision score:  0.9653179190751445
Recall score:  0.9027027027027027
F1 score:  0.9329608938547486
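The four scores all derive from the counts of true and false positives and negatives. As a rough illustration, the sketch below computes them by hand from a confusion matrix and compares the results with scikit-learn's functions. The label arrays are toy values invented for the example, not the notebook's predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels for illustration only (1 = spam, 0 = ham)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all messages classified correctly
precision = tp / (tp + fp)                   # of messages flagged as spam, share that were spam
recall = tp / (tp + fn)                      # of actual spam, share that was flagged
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```

Read this way, a precision above the recall, as in the output above, means the model rarely flags ham as spam but lets a larger share of spam through.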
