# Visualising BERT Attention with `bertviz`

- Please run `bert_model.ipynb` before running this notebook.

## Load the tokenizer

In [1]:
from transformers import BertTokenizer, TFBertForSequenceClassification, BertConfig
import os
import tensorflow as tf

tokenizer_model_path = './models/BERT/'
bert_model_path = './models/BERT/'

# Load the tokenizer
if os.path.exists(tokenizer_model_path):
    print("Loading the existing tokenizer...")
    tokenizer = BertTokenizer.from_pretrained(tokenizer_model_path)
else:
    print("No existing tokenizer found... Please run the `bert_model.ipynb` notebook first.")


2024-04-20 14:19:12.384847: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-20 14:19:12.409535: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Loading the existing tokenizer...


## Load the BERT model

In [2]:
config = BertConfig.from_pretrained('bert-base-uncased', output_attentions=True)

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5, epsilon=1e-08)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

# Load the BERT model
if os.path.exists(bert_model_path):
    print("Loading the existing model...")
    model_bert = TFBertForSequenceClassification.from_pretrained(bert_model_path, config=config)
    model_bert.compile(optimizer=optimizer, loss=loss, metrics=[metric])
else:
    print("No existing model found... Please run the `bert_model.ipynb` notebook first.")
    

2024-04-20 14:19:13.701007: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.

Loading the existing model...


All model checkpoint layers were used when initializing TFBertForSequenceClassification.

All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at ./models/BERT/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


## Visualise the attention score for the sentence `love it, a great upgrade from the original.  I've had mine for a couple of years`

In [3]:
from bertviz import head_view
import torch

sentence = 'love it, a great upgrade from the original.  I\'ve had mine for a couple of years'
encoded_dict = tokenizer.encode_plus(
    sentence,
    add_special_tokens=True,
    max_length=50,
    padding='max_length',   # replaces the deprecated pad_to_max_length=True
    truncation=True,        # explicit, so no truncation warning is emitted
    return_attention_mask=True,
    return_tensors='tf',
)

# Forward pass; the config sets output_attentions=True, so attentions are returned
outputs = model_bert(
    input_ids=encoded_dict['input_ids'],
    attention_mask=encoded_dict['attention_mask'],
    return_dict=True,
)

# One tensor per layer, each of shape (batch, heads, seq_len, seq_len)
attention = outputs.attentions
# bertviz expects PyTorch tensors, so convert each TensorFlow tensor via NumPy
attention = [torch.from_numpy(layer_attn.numpy()) for layer_attn in attention]
tokens = tokenizer.convert_ids_to_tokens(encoded_dict['input_ids'][0])
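To make the padding step concrete, here is a minimal pure-Python sketch of what padding to `max_length` and building an attention mask amount to. The helper `pad_and_mask` and the content ids `7, 8, 9` are hypothetical and exist only for illustration; the ids 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids in the `bert-base-uncased` vocabulary. The tokenizer does this work itself, so this is not how the library is implemented.

```python
def pad_and_mask(content_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Hypothetical helper: wrap ids in [CLS]/[SEP], truncate, pad, and build a mask."""
    # Reserve two slots for [CLS] and [SEP], then truncate the content if needed.
    kept = list(content_ids[: max_length - 2])
    ids = [cls_id] + kept + [sep_id]
    # Real tokens get mask 1, padding gets mask 0.
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# Placeholder content ids, padded to a length of 8.
ids, mask = pad_and_mask([7, 8, 9], max_length=8)
print(ids)
print(mask)
```

The mask is what lets the model ignore the padded positions, which is why the trailing attention values in the output are zero.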


In [4]:
attention

[tensor([[[[6.1886e-02, 2.4570e-02, 4.2420e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [1.0106e-01, 2.7977e-02, 2.1189e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [6.5129e-02, 3.8453e-02, 2.5244e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           ...,
           [9.9843e-03, 9.7306e-02, 2.5506e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [1.0021e-02, 9.3434e-02, 2.3530e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [1.0887e-02, 9.5544e-02, 2.5241e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00]],
 
          [[1.1200e-01, 8.3530e-04, 2.9107e-03,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [8.0928e-03, 2.8304e-02, 5.7806e-03,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           [6.2635e-03, 8.2858e-02, 1.1790e-02,  ..., 0.0000e+00,
            0.0000e+00, 0.0000e+00],
           ...,
           [7.5797e-02, 2.8596e-02, 7.
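The printed structure is a list with one tensor per layer, each of shape `(batch, heads, seq_len, seq_len)`; for `bert-base-uncased` with `max_length=50` that should be 12 tensors of shape `(1, 12, 50, 50)`. Each row is a probability distribution over the positions a token attends to, and the trailing zeros correspond to `[PAD]` positions. The sketch below illustrates, in plain Python, why masked positions end up with zero weight. The function `masked_softmax`, the scores and the bias of `-1e4` are illustrative assumptions, not the library's code.

```python
import math


def masked_softmax(scores, mask, bias=-1e4):
    """Softmax over one row of attention scores; masked positions get a large negative bias."""
    biased = [s if m else s + bias for s, m in zip(scores, mask)]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(biased)
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


# Two real tokens followed by two padded positions (made-up scores).
weights = masked_softmax([2.0, 1.0, 0.5, 0.3], mask=[1, 1, 0, 0])
print(weights)
```

The weights over the real tokens still sum to 1, so padding changes the size of the matrix but not the distribution over real tokens.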

In [5]:
head_view(attention, tokens)

<IPython.core.display.Javascript object>
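Because the input is padded to 50 tokens, the view also lists the `[PAD]` tokens. If that is distracting, one option is to crop the tokens and attention matrices to the real tokens before calling `head_view`. The sketch below shows the idea on a single nested-list matrix; `crop_padding` is a hypothetical helper and it assumes padding sits on the right. With the tensors from this notebook, the equivalent crop would be `[layer[:, :, :n, :n] for layer in attention]` together with `tokens[:n]`.

```python
def crop_padding(tokens, matrix, pad_token="[PAD]"):
    """Hypothetical helper: drop right-side padding from tokens and a square attention matrix."""
    # Index of the first padding token, or the full length if there is none.
    n = tokens.index(pad_token) if pad_token in tokens else len(tokens)
    cropped = [row[:n] for row in matrix[:n]]
    return tokens[:n], cropped


# Toy example: three real tokens and one padded position (made-up weights).
toy_tokens = ["[CLS]", "love", "[SEP]", "[PAD]"]
toy_matrix = [
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.6, 0.3, 0.0],
    [0.2, 0.2, 0.6, 0.0],
    [0.3, 0.3, 0.4, 0.0],
]
real_tokens, real_matrix = crop_padding(toy_tokens, toy_matrix)
print(real_tokens)
print(real_matrix)
```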