# Email Fraud Detector: BERT Model Build

#### Ross Willett

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Email-Fraud-Detector:-BERT-Model-Build" data-toc-modified-id="Email-Fraud-Detector:-BERT-Model-Build-1">Email Fraud Detector: BERT Model Build</a></span><ul class="toc-item"><li><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Ross-Willett" data-toc-modified-id="Ross-Willett-1.0.0.1">Ross Willett</a></span></li></ul></li></ul></li><li><span><a href="#File-Introduction" data-toc-modified-id="File-Introduction-1.1">File Introduction</a></span></li><li><span><a href="#Preparing-the-Model" data-toc-modified-id="Preparing-the-Model-1.2">Preparing the Model</a></span></li><li><span><a href="#Building-the-Model" data-toc-modified-id="Building-the-Model-1.3">Building the Model</a></span></li></ul></li></ul></div>

## File Introduction

In this file, a BERT model loaded from [TensorFlow Hub](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4), with additional layers added on top of it for email classification, will be trained. BERT will be used as the base of a transfer learning model for two reasons. First, it represents words as embeddings, which place words with similar meanings close together in vector space. This means the model does not have to rely on specific words to identify a fraudulent email, which should make it more generalizable. Second, BERT takes into account the context of each word in relation to the other words in the input. BERT supports inputs of up to 512 tokens, although the preprocessing layer used later in this file pads or truncates each email to 128 tokens, as the model summary shows. This allows the model to numerically express the context of the text content, which should improve its ability to classify an email as fraudulent or not.
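
To make the idea of "close together in vector space" concrete, the sketch below compares toy vectors using cosine similarity. The three-dimensional vectors and the words chosen are invented purely for illustration; real BERT embeddings have 768 dimensions and depend on the surrounding context of each word.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", made up for this example only.
toy_embeddings = {
    "payment": [0.9, 0.1, 0.3],
    "transfer": [0.8, 0.2, 0.4],
    "picnic": [0.1, 0.9, 0.2],
}

sim_related = cosine_similarity(toy_embeddings["payment"], toy_embeddings["transfer"])
sim_unrelated = cosine_similarity(toy_embeddings["payment"], toy_embeddings["picnic"])

print(f"payment vs transfer: {sim_related:.3f}")
print(f"payment vs picnic:   {sim_unrelated:.3f}")
```

In this toy setup the two finance-related words score much higher than the unrelated pair, which is the property that lets a classifier generalize beyond the exact words seen during training.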

## Preparing the Model

Before the model can be built and trained, the data needs to be prepared.

In [12]:
# Import data manipulation libraries
import pandas as pd
import numpy as np

# Model selection libraries
from sklearn.model_selection import train_test_split

# Model Evaluation Libraries
from sklearn.metrics import accuracy_score

# Import TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Import TensorFlow Hub and TensorFlow Text (required libraries for the pre-trained BERT model)
import tensorflow_hub as hub
import tensorflow_text

# Import library for saving files
import joblib

In [6]:
# Import warnings and suppress them
import warnings
warnings.filterwarnings('ignore')

In [7]:
# Configure Pandas to show all columns / rows
pd.options.display.max_columns = 2000
pd.options.display.max_rows = 2000
# Set column max width larger
pd.set_option('display.max_colwidth', 200)

In [9]:
# Load X remainder
X_remainder = pd.read_csv('./data/X_remainder.csv')
# Load X test
X_test = pd.read_csv('./data/X_test.csv')
# Load y remainder
y_remainder = pd.read_csv('./data/y_remainder.csv')
# Load y test
y_test = pd.read_csv('./data/y_test.csv')

## Building the Model

Now that the data has been appropriately separated, the model can be built and trained. First, the BERT preprocessor and the pre-trained BERT encoder to be built on need to be loaded from TensorFlow Hub. The model used here is BERT-Base, the smaller of the two configurations released with the original BERT model, which allows for faster training and testing. It consists of 12 hidden layers (i.e. Transformer blocks), a hidden size of 768, and 12 attention heads. For the purposes of this project, the BERT encoding layers will be frozen so that only the layers added on top of the output are trained. This is done because the BERT model has already been pre-trained on a large text corpus and already outputs useful representations of the relationships between words, so it should not require further training for this task.

In [11]:
# Load the BERT preprocessing (tokenizing) layer from TensorFlow Hub
preprocessor = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
# Load the pre-trained BERT encoder from TensorFlow Hub and freeze its weights
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=False)



In [2]:
# Instantiate the input layer of the model
text_input = layers.Input(shape=(), dtype=tf.string)
# Pass the input layer to the BERT tokenizing layer
encoder_inputs = preprocessor(text_input)
# Pass the tokenized input to the BERT encoder
outputs = encoder(encoder_inputs)
# Get the pooled output of the BERT encoder
pooled_output = outputs["pooled_output"]
# Pass the pooled output of the BERT encoder to a 128 node relu layer
relu_layer = layers.Dense(128, activation='relu')(pooled_output)
# Pass the relu layer to a 1 node sigmoid layer for classification
output = layers.Dense(1, activation='sigmoid')(relu_layer)


In [3]:
# Instantiate the model
bert_model = tf.keras.Model(inputs=text_input, outputs=output)


In [13]:
# Examine the layers of the model
bert_model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_mask': (Non  0           ['input_1[0][0]']                
                                e, 128),                                                          
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128)}                                                  
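
Since the encoder is loaded with `trainable=False`, only the two `Dense` layers added on top should contribute trainable weights. As a rough check, their parameter counts can be worked out by hand. The sketch below assumes the pooled output has 768 features (the hidden size of this encoder); the helper name `dense_param_count` is introduced here for illustration only.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


pooled_output_size = 768  # hidden size H of the BERT encoder used in this file

relu_params = dense_param_count(pooled_output_size, 128)  # 128-node ReLU layer
sigmoid_params = dense_param_count(128, 1)                # 1-node sigmoid layer
trainable_params = relu_params + sigmoid_params

print(relu_params, sigmoid_params, trainable_params)  # 98432 129 98561
```

Under these assumptions, roughly 98.5 thousand parameters are updated during training, which is a very small fraction of the frozen encoder's weights.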

In [17]:
# Compile the BERT model using the adam optimizer, binary cross entropy loss and record binary accuracy and recall
bert_model.compile(
    # Optimizer
    optimizer=keras.optimizers.Adam(),
    # Loss function to minimize
    loss=keras.losses.BinaryCrossentropy(),
    # Metrics used to evaluate the model
    metrics=[keras.metrics.BinaryAccuracy(), keras.metrics.Recall()]
)
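
To illustrate what the loss and the two metrics measure, the following plain-Python sketch computes binary cross-entropy, binary accuracy, and recall for a handful of made-up predictions. It is a simplified stand-in and not the Keras implementation; the labels, the probabilities, and the 0.5 decision threshold are assumptions chosen for the example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def recall(y_true, y_prob, threshold=0.5):
    """Fraction of actual positives (fraudulent emails) that were predicted positive."""
    preds = [1 if p > threshold else 0 for p in y_prob]
    true_pos = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    false_neg = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 1)
    return true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0


# Made-up labels (1 = fraudulent) and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]

bce = binary_cross_entropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
rec = recall(y_true, y_prob)
print(f"loss={bce:.4f} accuracy={acc:.2f} recall={rec:.2f}")
```

Recall is tracked alongside accuracy because, for fraud detection, a missed fraudulent email (a false negative) is usually the more costly kind of error.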

In [20]:
# Fit the model
history = bert_model.fit(X_remainder['content'], y_remainder, epochs=5, verbose=1, validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
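
One detail of `validation_split` is worth keeping in mind: Keras takes the validation samples from the end of the data passed to `fit`, before any shuffling. If the rows of `X_remainder` happen to be ordered (for example by class), the validation set may not be representative. The sketch below mimics that index arithmetic in plain Python; the function name and the sample count of 1000 are assumptions for illustration.

```python
import math


def tail_validation_split(n_samples, validation_split):
    """Return (train, validation) index ranges, holding out the last fraction of samples."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    train_idx = range(0, split_at)
    val_idx = range(split_at, n_samples)
    return train_idx, val_idx


# Hypothetical data set size; the real size depends on X_remainder.
train_idx, val_idx = tail_validation_split(1000, 0.2)

print("training samples:  ", len(train_idx))
print("validation samples:", len(val_idx))
print("first validation index:", val_idx[0])
```

If the data order is a concern, shuffling the rows once before calling `fit` is one possible remedy.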


In [None]:
# Save the model in the TensorFlow SavedModel format
bert_model.save('bert_model_5_relu_sig', save_format='tf')

In [None]:
# Save the model history to a pickle file
joblib.dump(history, 'bert_model_5_relu_sig_hist.pkl')
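
A Keras `History` object keeps a reference to the model it was trained with, which can make it awkward to serialize in some environments. Its `history` attribute, by contrast, is a plain dictionary mapping each metric name to a list of per-epoch values. As a possible alternative, the sketch below stores such a dictionary as JSON. The dictionary contents are placeholders rather than results from this notebook, and the helper names and file name are hypothetical.

```python
import json
import os
import tempfile

# Stand-in for `history.history`; the values are placeholders, not real training results.
example_history = {
    "loss": [0.5, 0.4],
    "val_loss": [0.6, 0.5],
}


def save_history(history_dict, path):
    """Write a metric-name -> per-epoch-values dictionary to a JSON file."""
    serializable = {name: [float(v) for v in values] for name, values in history_dict.items()}
    with open(path, "w") as f:
        json.dump(serializable, f)


def load_history(path):
    """Read the dictionary written by save_history."""
    with open(path) as f:
        return json.load(f)


path = os.path.join(tempfile.gettempdir(), "example_history.json")
save_history(example_history, path)
restored = load_history(path)
print(restored)
```

A JSON file like this can be read back without TensorFlow installed, which may be convenient when plotting training curves in a separate file.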

Now that the model has been built and saved, it will be loaded and further evaluated in a separate file.