# Keras NLP - BERT Base Multi Sentiment Analysis example for cross-validation

An example of sentiment analysis with a BERT Base Multi model using Keras NLP / Keras Hub. This notebook is intended for Google Colab; use at least a T4 GPU. It was used for cross-validation, where the folds had already been created and saved and are then loaded one by one. (This is not optimal; it would be better to preprocess the data first and then create the folds.)
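As a rough illustration of the alternative mentioned above (creating the folds in code after preprocessing), here is a minimal pure-Python sketch of a stratified split. The function names, the default of 10 folds and the seed are assumptions made for this example; they are not taken from how the saved folds were actually produced.

```python
import random
from collections import defaultdict


def make_stratified_folds(labels, n_folds=10, seed=42):
    """Return n_folds lists of sample indices with a similar class balance."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_indices(folds, test_fold):
    """Use one fold as the test set and all remaining folds as training data."""
    test = sorted(folds[test_fold])
    train = sorted(
        index
        for fold_number, fold in enumerate(folds)
        if fold_number != test_fold
        for index in fold
    )
    return train, test
```

Each fold serves as the test set exactly once, so every sample is evaluated once across the whole cross-validation run.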

First, install keras_nlp (or keras_hub, which is currently the same library).

In [None]:
!pip install keras_nlp



#### Imports

In [None]:
import os
import keras
import tensorflow as tf
import numpy as np
from keras import layers
import keras_nlp
import keras_hub

#### Setup

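KERAS_BACKEND specifies which backend is used for computation; the valid values are tensorflow, torch (PyTorch) and jax. The variable only takes effect if it is set before keras is first imported. The second line sets the global precision policy to mixed_float16, which runs most computations in 16-bit floats while keeping the model variables in 32-bit floats.

In [None]:
# Only takes effect if set before `import keras`; move this line above the imports to switch backends.
os.environ["KERAS_BACKEND"] = "tensorflow"
keras.mixed_precision.set_global_policy("mixed_float16")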

#### Loading data

First, unzip the data (or load it in a different way). The file is available at https://github.com/immm00/diplomka/blob/main/datasets/splits/extracted/cross_val_folds_extracted.zip.

In [None]:
!unzip cross_val_folds_extracted.zip

Data is loaded using text_dataset_from_directory, which expects a specific directory structure. Each fold is split into a train and a test folder. Each of these contains one subfolder per class, in this case positive, negative and neutral, and each subfolder holds the individual text files. Each text file is one instance (one line of text).
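To check that a split folder matches the layout described above before loading it, a small helper like the following can be used. The function name `summarize_split` is made up for this sketch; it only relies on the standard library. Since text_dataset_from_directory assigns label indices in alphanumerical order of the class folder names when no explicit class list is given, the sorted keys also show which index belongs to which class (here negative = 0, neutral = 1, positive = 2).

```python
from pathlib import Path


def summarize_split(split_dir):
    """Map each class subfolder name to the number of .txt files inside it."""
    split_path = Path(split_dir)
    # Sorted order mirrors the alphanumerical label order of the loader.
    class_dirs = sorted(p for p in split_path.iterdir() if p.is_dir())
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.suffix == ".txt")
        for class_dir in class_dirs
    }
```

For example, `summarize_split("cross_val_folds/fold_1/train")` would return one entry per class with its file count, which makes class imbalance visible at a glance.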

Validation data is not used since this notebook was used for cross-validation.

Data will be processed in batches. Batch size is set to 32 here.

In [None]:

batch_size = 32
raw_train_ds = keras.utils.text_dataset_from_directory(
    "cross_val_folds/fold_1/train",
    batch_size=batch_size
)

raw_test_ds = keras.utils.text_dataset_from_directory(
    "cross_val_folds/fold_1/test",
    batch_size=batch_size
)

print(f"Number of batches in raw_train_ds: {raw_train_ds.cardinality()}")
print(f"Number of batches in raw_test_ds: {raw_test_ds.cardinality()}")


Found 6750 files belonging to 3 classes.
Found 750 files belonging to 3 classes.
Number of batches in raw_train_ds: 211
Number of batches in raw_test_ds: 24
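The batch counts in the output follow directly from the file counts and the batch size, because the last, smaller batch is kept. A short sketch of the arithmetic (the helper name `num_batches` is only for illustration):

```python
import math


def num_batches(num_samples, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


print(num_batches(6750, 32))  # 211 training batches
print(num_batches(750, 32))   # 24 test batches
print(6750 % 32)              # 30 samples in the last training batch
print(750 % 32)               # 14 samples in the last test batch
```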


#### Initializing the model

Using keras_nlp, a specific pretrained model is loaded as a part of a classifier. Available pretrained models are listed here: https://keras.io/keras_hub/presets/.

The multilingual model bert_base_multi is used, as no pretrained model specifically for Czech is available among the presets. It is pretrained on Wikipedia in many languages.

The number of classes is set to 3 (positive, negative, neutral).

The summary shows the layers, parameters, etc. The classifier includes a preprocessor (BertTokenizer), so tokenization happens automatically and there is no need to preprocess the data beforehand.

On the extracted economics dataset, the three epochs of fine-tuning take around 10 minutes on a T4 GPU.

In [None]:
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_base_multi",
    num_classes=3,
)

# Uncomment to print the layers and parameter counts.
# classifier.summary()


#### Fine-tuning

The pretrained model needs to be fine-tuned for the sentiment analysis task. Only the training data is used for this, since there is no separate validation set. The number of epochs is the number of times the model goes through the entire training dataset during training. It is set to 3 here.

The output will show step number, time, and evaluation metrics (loss function and sparse categorical accuracy). Since there is no validation data, it only shows metrics for training data.

In [None]:
classifier.fit(
    raw_train_ds,
    epochs=3
)

Epoch 1/3
  3/211 ━━━━━━━━━━━━━━━━━━━━ 3:05 892ms/step - loss: 1.1235 - sparse_categorical_accuracy: 0.2899
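The progress line can be read as plain arithmetic: remaining steps multiplied by the time per step. The sketch below reproduces the displayed estimate from the numbers in the output above. The helper name `remaining_time` is made up for this example, and the way Keras computes its estimate internally may differ.

```python
def remaining_time(total_steps, done_steps, seconds_per_step):
    """Estimate the remaining epoch time as an m:ss string."""
    remaining = int((total_steps - done_steps) * seconds_per_step)
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes}:{seconds:02d}"


print(remaining_time(211, 3, 0.892))  # 3:05
print(211 * 3)                        # 633 training steps over 3 epochs
```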

#### Evaluation

Since the metrics provided by the evaluate function are limited, the others have to be calculated manually. Predictions are made for the test data, and then metrics like recall, precision and F-score are calculated from the real and predicted labels.

In [None]:
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

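predictions = []
true_labels = []

# Predict batch by batch so that predictions and labels stay aligned,
# even though the dataset is shuffled on each pass.
for texts, labels in raw_test_ds:
    logits = classifier.predict(texts, verbose=0)
    # The class with the highest score is the predicted label.
    preds = np.argmax(logits, axis=-1)
    predictions.extend(preds)
    true_labels.extend(labels.numpy())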

accuracy = accuracy_score(true_labels, predictions)
f1 = f1_score(true_labels, predictions, average='macro')

precision_total = precision_score(true_labels, predictions, average='weighted')
recall_total = recall_score(true_labels, predictions, average='weighted')

precision_per_class = precision_score(true_labels, predictions, average=None)
recall_per_class = recall_score(true_labels, predictions, average=None)
f1_per_class = f1_score(true_labels, predictions, average=None)

print(f"Accuracy: {accuracy:.4f}")
print(f"Macro F1 Score: {f1:.4f}")
print(f"Total Precision (Weighted): {precision_total:.4f}")
print(f"Total Recall (Weighted): {recall_total:.4f}")

for i, (prec, rec, f1_val) in enumerate(zip(precision_per_class, recall_per_class, f1_per_class)):
    print(f"Class {i}: Precision = {prec:.4f}, Recall = {rec:.4f}, F1 Score = {f1_val:.4f}")
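The cell above mixes macro averaging (for F1) with weighted averaging (for precision and recall). To make the difference concrete, here is a small pure-Python sketch of both. The function names are chosen for this example only and are not part of scikit-learn. Macro averaging treats every class equally, while weighted averaging scales each class by its number of true samples; as a consequence, weighted recall always equals accuracy.

```python
def per_class_scores(true_labels, predictions):
    """Compute precision, recall, F1 and support for every class."""
    classes = sorted(set(true_labels) | set(predictions))
    pairs = list(zip(true_labels, predictions))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn}
    return scores


def macro_average(scores, key):
    """Unweighted mean over classes: every class counts the same."""
    return sum(s[key] for s in scores.values()) / len(scores)


def weighted_average(scores, key):
    """Mean over classes weighted by the number of true samples per class."""
    total = sum(s["support"] for s in scores.values())
    return sum(s[key] * s["support"] for s in scores.values()) / total
```

With imbalanced classes the two averages can differ noticeably, so it is worth stating which one is reported when comparing folds.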