# Visualizing Multi-head Attention with the BertViz Library
----
© 2023, Zaka AI, Inc. All Rights Reserved.

In this notebook, we'll learn how to visualize the attention mechanism in an NLP context. We will use the BertViz library, which simplifies this process.

# Installing Packages and Data Loading
Let's start by installing the BertViz library and the Transformers library from Hugging Face. The latter lets us easily load a pre-trained model.

In [None]:
! pip install bertviz transformers

Collecting bertviz
  Downloading bertviz-1.4.0-py3-none-any.whl (157 kB)
Collecting transformers
  Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
Collecting boto3 (from bertviz)
  Downloading boto3-1.28.80-py3-none-any.whl (135 kB)
Collecting sentencepiece (from bertviz)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)

Now we import the required functions and classes from these libraries:

In [None]:
from bertviz import model_view, head_view
from transformers import AutoTokenizer, AutoModel, utils

utils.logging.set_verbosity_error()  # Suppress standard warnings

## Setting up our Model
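We now load our tokenizer and model using the Transformers library from Hugging Face. We will use the "distilbert-base-uncased" checkpoint for both, loading them with the AutoTokenizer and AutoModel classes. Passing output_attentions=True makes the model return its attention weights along with its other outputs.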

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

Let's now set up our own input for testing and pass it to the model:

In [None]:
inputs = tokenizer.encode("Alice is feeling cold and is wearing a jacket", return_tensors='pt')
outputs = model(inputs)
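
To make the encoding step more concrete, here is a minimal sketch of the idea behind `tokenizer.encode`: the sentence is mapped to integer IDs and wrapped in the special `[CLS]` and `[SEP]` tokens, which is why those tokens show up in the visualizations. The vocabulary, the IDs, and the helpers `toy_encode` and `toy_decode` are invented for illustration; the real tokenizer uses the WordPiece vocabulary of the checkpoint and can split a word into several sub-word pieces.

```python
# Toy vocabulary: these IDs are made up for illustration and are NOT the
# real distilbert-base-uncased vocabulary.
TOY_VOCAB = {"[CLS]": 0, "[SEP]": 1, "[UNK]": 2, "alice": 3, "is": 4, "cold": 5}


def toy_encode(sentence):
    """Lowercase, split on whitespace, and wrap the IDs in [CLS] ... [SEP]."""
    words = sentence.lower().split()
    ids = [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in words]
    return [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]


def toy_decode(ids):
    """Map IDs back to token strings, similar in spirit to convert_ids_to_tokens."""
    id_to_token = {token_id: token for token, token_id in TOY_VOCAB.items()}
    return [id_to_token[token_id] for token_id in ids]


toy_ids = toy_encode("Alice is cold")
toy_tokens = toy_decode(toy_ids)
print(toy_ids)
print(toy_tokens)
```

Words missing from the toy vocabulary fall back to `[UNK]` here, whereas a WordPiece tokenizer would usually break them into known sub-word units instead.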

## Visualizing Attention Weights
Finally, we extract the attention weights from the model outputs and convert the input IDs back into their token strings. We then pass both to the head_view function:

In [None]:
attention = outputs[-1]  # Output includes attention weights when output_attentions=True
tokens = tokenizer.convert_ids_to_tokens(inputs[0])
head_view(attention, tokens)
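
The lines drawn by the head view correspond to attention weights: for each token, a set of non-negative numbers that sum to 1 and say how strongly that token attends to every other token. As a minimal sketch of how a single head arrives at such weights, the snippet below applies a softmax to scaled dot products between query and key vectors. The 2-dimensional vectors are made up for illustration; in the real model, queries and keys come from learned projections of the token embeddings.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_weights(queries, keys):
    """Return one row of weights per query: softmax(q . k / sqrt(d))."""
    dim = len(keys[0])
    weights = []
    for query in queries:
        scores = [
            sum(q_i * k_i for q_i, k_i in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights.append(softmax(scores))
    return weights


# Made-up query/key vectors for three tokens (illustrative only).
toy_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

toy_weights = attention_weights(toy_queries, toy_keys)
for row in toy_weights:
    print([round(weight, 3) for weight in row])
```

Each printed row plays the role of one token's outgoing lines in the visualization: a larger weight would be drawn as a more opaque line.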

Alternatively, you can use the model_view function, which shows a bird's-eye view of attention across all layers and heads at once, with each head drawn in its own cell:

In [None]:
model_view(attention, tokens)
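
The grid in the model view has one row per layer and one column per head. When `output_attentions=True`, Transformers returns the attention weights as a tuple with one tensor per layer, each of shape `(batch_size, num_heads, sequence_length, sequence_length)`. A small helper like the one below can confirm those dimensions before plotting. The name `describe_attention` is hypothetical, and the stand-in objects and their sizes are assumptions chosen so that the sketch runs without downloading a model.

```python
from types import SimpleNamespace


def describe_attention(attention):
    """Summarize a tuple of per-layer tensors shaped (batch, heads, seq, seq)."""
    batch, heads, seq_len, seq_len_2 = attention[0].shape
    if seq_len != seq_len_2:
        raise ValueError("expected square attention matrices")
    return {
        "layers": len(attention),
        "heads": heads,
        "batch": batch,
        "seq_len": seq_len,
    }


# Stand-in objects exposing only a .shape attribute, so no model is needed.
# The sizes (6 layers, 12 heads, 11 tokens) are illustrative assumptions.
fake_attention = tuple(SimpleNamespace(shape=(1, 12, 11, 11)) for _ in range(6))

summary = describe_attention(fake_attention)
print(summary)
```

Because PyTorch tensors also expose `.shape`, calling `describe_attention(attention)` on the notebook's own `attention` variable should report the real layer and head counts.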
