## Hyperparameters
When we define an ensemble model, we can specify its hyperparameters. In practice, the most common ones are:

* base_estimator: The model used for the weak learners (this parameter is named estimator in scikit-learn 1.2 and later). Warning: Don't forget to import the model that you decide to use for the weak learner.
* n_estimators: The maximum number of weak learners used.
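
As a minimal sketch of how these two hyperparameters are passed, the snippet below fits an `AdaBoostClassifier` on a small synthetic dataset generated with `make_classification` (an assumption for illustration, not the spam data). The depth-2 decision tree and the value of 50 estimators are illustrative choices. The weak learner is passed positionally because its keyword is `base_estimator` in older scikit-learn releases and `estimator` in 1.2 and later.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier  # the weak learner must be imported too

# Synthetic stand-in data (assumption: this is not the SMS spam dataset)
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Weak learner: a shallow decision tree (illustrative choice)
weak_learner = DecisionTreeClassifier(max_depth=2)

# The first positional argument is the weak learner;
# n_estimators caps how many weak learners are fit
model = AdaBoostClassifier(weak_learner, n_estimators=50, random_state=1)
model.fit(X_tr, y_tr)

# Boosting can stop early, so the fitted count is at most n_estimators
n_fitted = len(model.estimators_)
test_preds = model.predict(X_te)
print(n_fitted, test_preds.shape)
```

Because `n_estimators` is a maximum, `len(model.estimators_)` may be smaller than the requested value if boosting terminates early.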

This is the code from the Spam project that uses the Naive Bayes classifier:

In [8]:
# Import our libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Read in our dataset
df = pd.read_table('Data/SMSSpamCollection', 
                   sep='\t', header=None, names=['label', 'sms_message'])

# Fix our response value
df['label'] = df.label.map({'ham':0, 'spam':1})

# Split our dataset between training and testing
X_train, X_test, y_train, y_test = train_test_split(df['sms_message'], 
                                                    df['label'], random_state=1)

# Initialize the CountVectorizer
count_vector = CountVectorizer()

# Fit the vectorizer on the training data, then transform the test data with the same vocabulary
training_data = count_vector.fit_transform(X_train)
testing_data = count_vector.transform(X_test)

# init model
naive_bayes = MultinomialNB()

# Fit the data
naive_bayes.fit(training_data, y_train)

# Predict
predictions = naive_bayes.predict(testing_data)

# Score our model
print('Accuracy score: ', format(accuracy_score(y_test, predictions)))
print('Precision score: ', format(precision_score(y_test, predictions)))
print('Recall score: ', format(recall_score(y_test, predictions)))
print('F1 score: ', format(f1_score(y_test, predictions)))

Accuracy score:  0.9885139985642498
Precision score:  0.9720670391061452
Recall score:  0.9405405405405406
F1 score:  0.9560439560439562


## Implementing Ensembles

In [9]:
# importing ensemble methods
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier

In [10]:
# Instantiate a BaggingClassifier with:
# 200 weak learners (n_estimators) and everything else as default values
bagging_class = BaggingClassifier(n_estimators=200)


# Instantiate a RandomForestClassifier with:
# 200 weak learners (n_estimators) and everything else as default values
randomForest_class = RandomForestClassifier(n_estimators=200)

# Instantiate an AdaBoostClassifier with:
# 300 weak learners (n_estimators) and a learning_rate of 0.2
adaBoost_class = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)

In [11]:
# Fit your BaggingClassifier to the training data
bagging_class.fit(training_data, y_train)

# Fit your RandomForestClassifier to the training data
randomForest_class.fit(training_data, y_train)

# Fit your AdaBoostClassifier to the training data
adaBoost_class.fit(training_data, y_train)


AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=0.2,
                   n_estimators=300, random_state=None)

In [16]:
# Predict using BaggingClassifier on the test data
bagging_pred = bagging_class.predict(testing_data)

# Predict using RandomForestClassifier on the test data
random_pred = randomForest_class.predict(testing_data)

# Predict using AdaBoostClassifier on the test data
ada_pred = adaBoost_class.predict(testing_data)


In [17]:
def print_metrics(y_true, preds, model_name=None):
    '''
    INPUT:
    y_true - the y values that are actually true in the dataset (NumPy array or pandas series)
    preds - the predictions for those values from some model (NumPy array or pandas series)
    model_name - (str - optional) a name associated with the model if you would like to add it to the print statements 
    
    OUTPUT:
    None - prints the accuracy, precision, recall, and F1 score
    '''
    if model_name is None:
        print('Accuracy score: ', format(accuracy_score(y_true, preds)))
        print('Precision score: ', format(precision_score(y_true, preds)))
        print('Recall score: ', format(recall_score(y_true, preds)))
        print('F1 score: ', format(f1_score(y_true, preds)))
        print('\n\n')
    
    else:
        print('Accuracy score for ' + model_name + ' :' , format(accuracy_score(y_true, preds)))
        print('Precision score ' + model_name + ' :', format(precision_score(y_true, preds)))
        print('Recall score ' + model_name + ' :', format(recall_score(y_true, preds)))
        print('F1 score ' + model_name + ' :', format(f1_score(y_true, preds)))
        print('\n\n')
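
To make the four numbers printed by `print_metrics` concrete, here is a minimal sketch that computes them by hand from confusion-matrix counts. The labels and predictions are made-up toy values (not taken from the spam data), and `metrics_from_counts` is a hypothetical helper introduced only for illustration. It treats label 1 (spam) as the positive class and returns 0.0 when a ratio is undefined.

```python
def metrics_from_counts(y_true, preds):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, preds))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam predicted as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham predicted as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam predicted as ham
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham predicted as ham

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Toy data (assumption): 4 spam and 6 ham messages
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

toy_accuracy, toy_precision, toy_recall, toy_f1 = metrics_from_counts(toy_true, toy_preds)
print(toy_accuracy, toy_precision, toy_recall, toy_f1)
```

With these toy values there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 8/10 while precision, recall, and F1 are each 3/4.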

In [23]:
# Print Bagging scores
print_metrics(y_test, bagging_pred, "bagging")

# Print Random Forest scores
print_metrics(y_test, random_pred, "Random Forest")


# Print AdaBoost scores
print_metrics(y_test, ada_pred, "AdaBoost")


# Naive Bayes Classifier scores
print_metrics(y_test, predictions, "Naive Bayes")




Accuracy score for bagging : 0.9748743718592965
Precision score bagging : 0.9120879120879121
Recall score bagging : 0.8972972972972973
F1 score bagging : 0.9046321525885558



Accuracy score for Random Forest : 0.9834888729361091
Precision score Random Forest : 1.0
Recall score Random Forest : 0.8756756756756757
F1 score Random Forest : 0.9337175792507205



Accuracy score for AdaBoost : 0.9770279971284996
Precision score AdaBoost : 0.9693251533742331
Recall score AdaBoost : 0.8540540540540541
F1 score AdaBoost : 0.9080459770114943



Accuracy score for Naive Bayes : 0.9885139985642498
Precision score Naive Bayes : 0.9720670391061452
Recall score Naive Bayes : 0.9405405405405406
F1 score Naive Bayes : 0.9560439560439562
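
The scores printed above can be easier to compare side by side. The sketch below copies the reported values (rounded to four decimals) into a dictionary and ranks the models by F1 score. The variable names are illustrative and not part of the original notebook.

```python
# Reported test-set scores from the output above, rounded to four decimals
reported = {
    "bagging":       {"accuracy": 0.9749, "precision": 0.9121, "recall": 0.8973, "f1": 0.9046},
    "Random Forest": {"accuracy": 0.9835, "precision": 1.0000, "recall": 0.8757, "f1": 0.9337},
    "AdaBoost":      {"accuracy": 0.9770, "precision": 0.9693, "recall": 0.8541, "f1": 0.9080},
    "Naive Bayes":   {"accuracy": 0.9885, "precision": 0.9721, "recall": 0.9405, "f1": 0.9560},
}

# Rank model names from highest to lowest F1 score
ranked = sorted(reported, key=lambda name: reported[name]["f1"], reverse=True)

# Print one aligned row per model
print(f"{'model':<15}{'accuracy':>10}{'precision':>11}{'recall':>9}{'f1':>9}")
for name in ranked:
    s = reported[name]
    print(f"{name:<15}{s['accuracy']:>10.4f}{s['precision']:>11.4f}{s['recall']:>9.4f}{s['f1']:>9.4f}")
```

On this particular run, the single Naive Bayes model has the highest accuracy, recall, and F1 score, while Random Forest has the highest precision. The bagging and random forest cells do not set random_state, so their scores may differ slightly between runs.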



