<a href="https://colab.research.google.com/github/sjung-stat/Customer-Support-Chat-Intent-Classification/blob/main/Model%20Building.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Before building a text classification model using BERT, we need to preprocess the dataset. First of all, we do one-hot encoding on our target variable. And after that, we use one of the companions of BERT models which is designed to work alongside BERT models, and its purpose is to transform unprocessed textual inputs into the appropriate input format required by BERT. More details can be found below.

In [1]:
# Import necessary libraries

from google.colab import drive
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

In [2]:
# Load the preprocessed training and testing data
drive.mount('/content/drive')

with open('/content/drive/MyDrive/df_training_complete.pkl', 'rb') as f:   # Load `df_training_complete` from Google Drive
    df_training_complete = pickle.load(f)

with open('/content/drive/MyDrive/df_testing_copy.pkl', 'rb') as f:   # Load `df_testing_copy` from Google Drive
    df_testing_complete = pickle.load(f)

Mounted at /content/drive


# Preprocessing

## One Hot Encoding

In [3]:
# Split training and validation sets and reformat testing set. 
trainfeatures, validfeatures, trainlabels, validlabels = train_test_split(df_training_complete['text'],df_training_complete['category'], stratify=df_training_complete['category'], test_size=0.2)

testfeatures=df_testing_complete.copy()
testlabels=testfeatures.pop("category")


# One-Hot-Encoding of class-labels
binarizer=LabelBinarizer()  

trainlabels=binarizer.fit_transform(trainlabels.values)
validlabels=binarizer.transform(validlabels.values)
testlabels=binarizer.transform(testlabels.values)
trainfeatures = pd.DataFrame(trainfeatures)
validfeatures = pd.DataFrame(validfeatures)

In [4]:
print("Total number of training examples: ", trainfeatures.shape[0])
print("Total number of validation examples: ", validfeatures.shape[0])
print("Total number of testing examples: ", testfeatures.shape[0])

Total number of training examples:  12733
Total number of validation examples:  3184
Total number of testing examples:  3080


In [5]:
!pip install -q tensorflow_text

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.8/5.8 MB[0m [31m14.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [6]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
from sklearn.preprocessing import LabelBinarizer

In [7]:
# Note that the following preprocessing model cannot take pandas dataframe as input
# Need to convert the input data (text) into list
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3") # The text data we will be working with will undergo preprocessing using a TensorFlow model.
                                                                                              # As this preprocessor is a TensorFlow model, it can be easily integrated directly into your own model.



# Model Building

For this intent classification project, we use BERT which is a pretrained language model. BERT is useful for intent classification because it is pre-trained on a large corpus of text data, allowing it to understand the nuances of natural language. Additionally, BERT uses a bidirectional approach, which means it can analyze a text input in both directions, allowing it to better understand the context and meaning of the input. By using BERT as a pre-processing step, you can improve the accuracy and efficiency of your intent classification model.

In [8]:
bert_model = hub.KerasLayer("https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1")

In [None]:
def intent_classification_bert():
  
  # Initializing the BERT layers
  input_text = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # define an input tensor --> the input data to the model will be a string of variable length
  text_preprocessed = bert_preprocess(input_text) # This layer converts the input string into a format that can be understood by the BERT model
  output_bert = bert_model(text_preprocessed) # This layer encodes the input text using the BERT model and returns the encoded outputs
                                              # After that, this will be fed into the neural network layers.

  # Initializing the neural network layers
  encoded_text = output_bert['pooled_output']
  layer_1 = tf.keras.layers.Dense(512, activation='relu')(encoded_text)
  layer_2 = tf.keras.layers.Dense(256, activation='relu')(layer_1)
  layer_3 = tf.keras.layers.Dense(128, activation='relu')(layer_2)
  layer_4 = tf.keras.layers.Dropout(0.1)(layer_3) # This layer will be used to prevent model overfitting
                                                                                 # We will use 0.1% of the neurons to handle overfitting
  output = tf.keras.layers.Dense(trainlabels.shape[1], activation='softmax')(layer_4)  # It only has one neuron. We also initialize the activation function as sigmoid. 
                                                                                 # sigmoid is used when we have output values that between 0 and 1. 
                                                                                 # In our case, when making predictions, 
                                                                                 # the prediction probability will lie between 0 and 1. That’s why it is best suited.
                                                                                 # We also name the layer as output because this is our output layer.
  return tf.keras.Model(input_text, output)

In [None]:
classifier_model = intent_classification_bert()

loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) # Since this is a non-binary classification problem and the model outputs probabilities, 
                                                                 # you’ll use losses.CategoricalCrossentropy loss function.
metrics = tf.metrics.CategoricalAccuracy()

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089


In [None]:
epochs=20
optimizer=tf.keras.optimizers.Adam(1e-5)
classifier_model.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)

In [None]:
history = classifier_model.fit(x=trainfeatures, y=trainlabels,
                               validation_data=(validfeatures,validlabels),
                               batch_size=32,
                               #class_weight=class_weights_dict,
                               epochs=epochs)

Epoch 1/20


  output, from_logits = _get_logits(




  output, from_logits = _get_logits(


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [9]:
# Model building version 2

In [10]:
def intent_classification_bert_v2():
  
  # Initializing the BERT layers
  input_text = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # define an input tensor --> the input data to the model will be a string of variable length
  text_preprocessed = bert_preprocess(input_text) # This layer converts the input string into a format that can be understood by the BERT model
  output_bert = bert_model(text_preprocessed) # This layer encodes the input text using the BERT model and returns the encoded outputs
                                              # After that, this will be fed into the neural network layers.

  # Initializing the neural network layers
  encoded_text = output_bert['pooled_output']
  layer_1 = tf.keras.layers.Dense(512, activation='relu')(encoded_text)
  layer_2 = tf.keras.layers.Dense(256, activation='relu')(layer_1)
  layer_3 = tf.keras.layers.Dense(128, activation='relu')(layer_2)
  #layer_4 = tf.keras.layers.Dropout(0.1)(layer_3) # This layer will be used to prevent model overfitting
                                                                                 # We will use 0.1% of the neurons to handle overfitting
  output = tf.keras.layers.Dense(trainlabels.shape[1], activation='softmax')(layer_3)  # It only has one neuron. We also initialize the activation function as sigmoid. 
                                                                                 # sigmoid is used when we have output values that between 0 and 1. 
                                                                                 # In our case, when making predictions, 
                                                                                 # the prediction probability will lie between 0 and 1. That’s why it is best suited.
                                                                                 # We also name the layer as output because this is our output layer.
  return tf.keras.Model(input_text, output)

In [11]:
classifier_model_v2 = intent_classification_bert_v2()

loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) # Since this is a non-binary classification problem and the model outputs probabilities, 
                                                                 # you’ll use losses.CategoricalCrossentropy loss function.
metrics = tf.metrics.CategoricalAccuracy()

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089


In [12]:
epochs=20
optimizer=tf.keras.optimizers.Adam(1e-5)
classifier_model_v2.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)

In [13]:
history_v2 = classifier_model_v2.fit(x=trainfeatures, y=trainlabels,
                               validation_data=(validfeatures,validlabels),
                               batch_size=32,
                               #class_weight=class_weights_dict,
                               epochs=epochs)

Epoch 1/20


  output, from_logits = _get_logits(




  output, from_logits = _get_logits(


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
# Model Building #3

In [14]:
def intent_classification_bert_v3():
  
  # Initializing the BERT layers
  input_text = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # define an input tensor --> the input data to the model will be a string of variable length
  text_preprocessed = bert_preprocess(input_text) # This layer converts the input string into a format that can be understood by the BERT model
  output_bert = bert_model(text_preprocessed) # This layer encodes the input text using the BERT model and returns the encoded outputs
                                              # After that, this will be fed into the neural network layers.

  # Initializing the neural network layers
  encoded_text = output_bert['pooled_output']
  layer_1 = tf.keras.layers.Dense(512, activation='relu')(encoded_text)
  layer_2 = tf.keras.layers.Dense(256, activation='relu')(layer_1)
  layer_3 = tf.keras.layers.Dense(128, activation='relu')(layer_2)
  layer_4 = tf.keras.layers.Dense(64, activation='relu')(layer_3)

  #layer_4 = tf.keras.layers.Dropout(0.1)(layer_3) # This layer will be used to prevent model overfitting
                                                                                 # We will use 0.1% of the neurons to handle overfitting
  output = tf.keras.layers.Dense(trainlabels.shape[1], activation='softmax')(layer_4)  # It only has one neuron. We also initialize the activation function as sigmoid. 
                                                                                 # sigmoid is used when we have output values that between 0 and 1. 
                                                                                 # In our case, when making predictions, 
                                                                                 # the prediction probability will lie between 0 and 1. That’s why it is best suited.
                                                                                 # We also name the layer as output because this is our output layer.
  return tf.keras.Model(input_text, output)

In [15]:
classifier_model_v3 = intent_classification_bert_v3()

loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) # Since this is a non-binary classification problem and the model outputs probabilities, 
                                                                 # you’ll use losses.CategoricalCrossentropy loss function.
metrics = tf.metrics.CategoricalAccuracy()

In [16]:
epochs=20
optimizer=tf.keras.optimizers.Adam(1e-5)
classifier_model_v3.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)

In [17]:
history_v3 = classifier_model_v3.fit(x=trainfeatures, y=trainlabels,
                               validation_data=(validfeatures,validlabels),
                               batch_size=32,
                               #class_weight=class_weights_dict,
                               epochs=epochs)

Epoch 1/20


  output, from_logits = _get_logits(




  output, from_logits = _get_logits(


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [18]:
from tensorflow.keras.layers import Input, Dense, Dropout, GlobalMaxPooling1D, concatenate
from tensorflow.keras.models import Model

In [19]:
def intent_classification_bert_v4():
  
  # Initializing the BERT layers
  input_text = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # define an input tensor --> the input data to the model will be a string of variable length
  text_preprocessed = bert_preprocess(input_text) # This layer converts the input string into a format that can be understood by the BERT model
  output_bert = bert_model(text_preprocessed) # This layer encodes the input text using the BERT model and returns the encoded outputs
                                              # After that, this will be fed into the neural network layers.

  # Initializing the neural network layers
  encoded_text = output_bert['pooled_output']




  x1 = GlobalMaxPooling1D()(output_bert["sequence_output"])
  x2 = GlobalMaxPooling1D()(output_bert["pooled_output"])
  x = concatenate([x1, x2])
  x = Dense(512, activation="relu")(x)
  x = Dense(256, activation="relu")(x)






  
  x = tf.keras.layers.Dense(512, activation='relu')(x)
  x = tf.keras.layers.Dense(256, activation='relu')(x)
  x = tf.keras.layers.Dense(128, activation='relu')(x)

  #layer_4 = tf.keras.layers.Dropout(0.1)(layer_3) # This layer will be used to prevent model overfitting
                                                                                 # We will use 0.1% of the neurons to handle overfitting
  output = tf.keras.layers.Dense(trainlabels.shape[1], activation='softmax')(x)  # It only has one neuron. We also initialize the activation function as sigmoid. 
                                                                                 # sigmoid is used when we have output values that between 0 and 1. 
                                                                                 # In our case, when making predictions, 
                                                                                 # the prediction probability will lie between 0 and 1. That’s why it is best suited.
                                                                                 # We also name the layer as output because this is our output layer.
  return tf.keras.Model(input_text, output)

In [21]:
input_text = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text') # define an input tensor --> the input data to the model will be a string of variable length
text_preprocessed = bert_preprocess(input_text) # This layer converts the input string into a format that can be understood by the BERT model
output_bert = bert_model(text_preprocessed) # This layer encodes the input text using the BERT model and returns the encoded outputs
                                              # After that, this will be fed into the neural network layers.

  # Initializing the neural network layers
encoded_text = output_bert['pooled_output']




x1 = GlobalMaxPooling1D()(output_bert["sequence_output"])
x2 = GlobalMaxPooling1D()(output_bert["pooled_output"])
x = concatenate([x1, x2])
x = Dense(512, activation="relu")(x)
x = Dense(256, activation="relu")(x)
x = Dense(trainlabels.shape[1], activation="softmax")(x)

# Define the model
model = Model(inputs=input_text, outputs=x)

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

ValueError: ignored

In [23]:
# Load the pre-trained BERT model from TensorFlow Hub
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_model = hub.KerasLayer("https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1", trainable=True)

# Define the input layers
input_text = Input(shape=(), dtype=tf.string, name="input_text")
preprocessed_text = bert_preprocess(input_text)
bert_outputs = bert_model(preprocessed_text)

# Add some additional layers
x1 = GlobalMaxPooling1D()(bert_outputs["sequence_output"])
x2 = GlobalMaxPooling1D()(bert_outputs["pooled_output"])
x = concatenate([x1, x2])
x = Dense(256, activation="relu")(x)
x = Dropout(0.2)(x)
x = Dense(trainlabels.shape[1], activation="softmax")(x)

# Define the model
model = Model(inputs=input_text, outputs=x)

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Load your training data
train_texts =  trainfeatures # your training texts
train_labels = trainlabels # your training labels
batch_size = 32

# Convert your labels to one-hot encoding
train_labels_one_hot = tf.keras.utils.to_categorical(train_labels, num_classes=trainlabels.shape[1])

# Define the training data as a TensorFlow Dataset
train_data = tf.data.Dataset.from_tensor_slices((train_texts, train_labels_one_hot))
train_data = train_data.batch(batch_size)

# Train the model
model.fit(train_data, epochs=40)





ValueError: ignored

In [20]:
classifier_model_v4 = intent_classification_bert_v4()

loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) # Since this is a non-binary classification problem and the model outputs probabilities, 
                                                                 # you’ll use losses.CategoricalCrossentropy loss function.
metrics = tf.metrics.CategoricalAccuracy()

ValueError: ignored

In [None]:
epochs=20
optimizer=tf.keras.optimizers.Adam(1e-5)
classifier_model_v4.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)

In [None]:
history_v4 = classifier_model_v4.fit(x=trainfeatures, y=trainlabels,
                               validation_data=(validfeatures,validlabels),
                               batch_size=32,
                               #class_weight=class_weights_dict,
                               epochs=epochs)