# BERT for Email Spam Detection

As per the paper, we use the simpletransformers library to instantiate our BERT model. More information, including the other available models, can be found in the simpletransformers documentation: https://simpletransformers.ai/docs/classification-specifics/

We run this notebook on Kaggle with a P100 GPU.

In [18]:
!pip install simpletransformers



In [32]:
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import os
import torch
import numpy as np

torch.cuda.is_available()

True

In [20]:
# Load your training data into a pandas DataFrame
train_df = pd.read_csv("/kaggle/input/email-spam/train.csv") 
train_df.rename(columns={'spam': 'labels'}, inplace=True)
train_df = train_df[['text', 'labels']]
train_df.head()

|   | text | labels |
|---|------|--------|
| 0 | subject institute international finance annual... | 0 |
| 1 | subject mortgage even worst credit zwzm detail... | 1 |
| 2 | subject partnership mr edward moko independenc... | 1 |
| 3 | subject de la part de enfants ama rue de marty... | 1 |
| 4 | subject synfuel option valuation lenny believe... | 0 |


In [21]:
# Load your test data into a pandas DataFrame
test_df = pd.read_csv("/kaggle/input/email-spam/test.csv") 
test_df.rename(columns={'spam': 'labels'}, inplace=True)
test_df = test_df[['text', 'labels']]
test_df.head()

|   | text | labels |
|---|------|--------|
| 0 | subject perfect logo charset koi r thinking br... | 1 |
| 1 | subject storage model security stinson added t... | 0 |
| 2 | subject wall street micro news report homeland... | 1 |
| 3 | subject logo stationer website design much lt ... | 1 |
| 4 | subject video conference ross mcintyre vince r... | 0 |


## Instantiate Model

We set the hyperparameters according to the paper's guidelines.

In [22]:
train_args = ClassificationArgs()

train_args.learning_rate = 4e-5
train_args.num_train_epochs = 3
train_args.train_batch_size = 32
train_args.max_seq_length = 300
train_args.optimizer = "AdamW"
train_args.eval_batch_size = 32

# https://github.com/ThilinaRajapakse/simpletransformers/issues/638#issuecomment-1060211019
train_args.use_multiprocessing = False
train_args.use_multiprocessing_for_evaluation = False
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [23]:
# Instantiate the BERT model
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    num_labels=2,  # Binary Classification
    args=train_args,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [24]:
# Train the model
model.train_model(train_df)

Epoch:   0%|          | 0/3 [00:00<?, ?it/s]

Running Epoch 1 of 3:   0%|          | 0/157 [00:00<?, ?it/s]

Running Epoch 2 of 3:   0%|          | 0/157 [00:00<?, ?it/s]

Running Epoch 3 of 3:   0%|          | 0/157 [00:00<?, ?it/s]
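
The 157 in the progress bars above is the number of batches per epoch. A minimal sketch of that arithmetic, assuming the 5,000 training rows and 226 test rows shown in the classification reports and the batch size of 32 set in `train_args`. The helper name `batches_per_epoch` is ours for illustration and is not part of simpletransformers.

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches needed to cover the dataset once; the last batch may be partial."""
    return math.ceil(num_examples / batch_size)


# Row counts taken from the support columns of the classification reports.
train_batches = batches_per_epoch(5000, 32)
eval_batches = batches_per_epoch(226, 32)

# Total batches processed over the 3 training epochs.
total_train_batches = train_batches * 3

print(train_batches, eval_batches, total_train_batches)
```

The same calculation accounts for the 8 batches shown when evaluating on the test set.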

## Results

In [42]:
# Run evaluation on the training dataset
result, model_outputs, wrong_predictions = model.eval_model(train_df)

# Extract predicted labels and true labels
predictions = np.argmax(model_outputs, axis=1)
true_labels = train_df['labels'] 

# Compute the confusion matrix and the per-class classification report
conf_matrix = confusion_matrix(true_labels, predictions)
class_report = classification_report(true_labels, predictions)

print("Train Set Results:")
print(f"Accuracy: {result['accuracy']}")
print(f"F1 Score: {result['f1_score']}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

print(f"Raw Result information: {result}")

Running Evaluation:   0%|          | 0/157 [00:00<?, ?it/s]

Train Set Results:
Accuracy: 0.9998
F1 Score: 0.9997499374843711
Confusion Matrix:
[[3000    0]
 [   1 1999]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      3000
           1       1.00      1.00      1.00      2000

    accuracy                           1.00      5000
   macro avg       1.00      1.00      1.00      5000
weighted avg       1.00      1.00      1.00      5000

Raw Result information: {'mcc': 0.9995833853920756, 'accuracy': 0.9998, 'f1_score': 0.9997499374843711, 'tp': 1999, 'tn': 3000, 'fp': 0, 'fn': 1, 'auroc': 1.0, 'auprc': 0.9999999999999998, 'eval_loss': 0.0004956607890736525}
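
The `np.argmax(model_outputs, axis=1)` step picks, for each email, the class with the larger raw score. A minimal sketch of that step in plain Python, assuming `model_outputs` holds one pair of raw class scores per row. The scores below are made up for illustration, and `softmax` is a helper defined here only to show how such scores could be turned into probabilities.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs: one [not-spam score, spam score] pair per email.
example_outputs = [[2.0, -1.0], [-0.5, 3.0], [0.1, 0.3]]

# Equivalent to np.argmax(example_outputs, axis=1): index of the larger score.
predicted_labels = [row.index(max(row)) for row in example_outputs]

# Probability assigned to the spam class (label 1) for each email.
spam_probabilities = [softmax(row)[1] for row in example_outputs]

print(predicted_labels)
print(spam_probabilities)
```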


In [43]:
# Then run evaluation on the test dataset
result, model_outputs, wrong_predictions = model.eval_model(test_df)

# Extract predicted labels and true labels
predictions = np.argmax(model_outputs, axis=1)
true_labels = test_df['labels'] 

# Compute the confusion matrix and the per-class classification report
conf_matrix = confusion_matrix(true_labels, predictions)
class_report = classification_report(true_labels, predictions)

print("Test Set Results:")
print(f"Accuracy: {result['accuracy']}")
print(f"F1 Score: {result['f1_score']}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

print(f"Raw Result information: {result}")

Running Evaluation:   0%|          | 0/8 [00:00<?, ?it/s]

Test Set Results:
Accuracy: 0.9778761061946902
F1 Score: 0.9775784753363229
Confusion Matrix:
[[112   1]
 [  4 109]]
Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98       113
           1       0.99      0.96      0.98       113

    accuracy                           0.98       226
   macro avg       0.98      0.98      0.98       226
weighted avg       0.98      0.98      0.98       226

Raw Result information: {'mcc': 0.9560892129252899, 'accuracy': 0.9778761061946902, 'f1_score': 0.9775784753363229, 'tp': 109, 'tn': 112, 'fp': 1, 'fn': 4, 'auroc': 0.9962017385856371, 'auprc': 0.995752021635639, 'eval_loss': 0.6934738447889686}
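
The headline test metrics can be cross-checked from the four confusion-matrix counts in the raw result. A minimal sketch using the standard definitions of accuracy, precision, recall, F1 and the Matthews correlation coefficient, with spam (label 1) as the positive class. Only the counts come from the output above; the variable names are ours.

```python
import math

# Counts from the raw test-set result above.
tp, tn, fp, fn = 109, 112, 1, 4

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted spam that is actually spam
recall = tp / (tp + fn)      # share of actual spam that was caught
f1 = 2 * tp / (2 * tp + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f} mcc={mcc:.4f}")
```

The four missed spam emails (`fn`) against one false alarm (`fp`) are why recall for class 1 sits slightly below its precision in the classification report.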
