https://github.com/jessevig/bertviz/blob/master/notebooks/model_view_encoder_decoder.ipynb

In [5]:
!pip install bertviz



https://github.com/jessevig/bertviz/blob/master/notebooks/model_view_bart.ipynb

In [None]:
from transformers import AutoTokenizer, AutoModel, utils
from bertviz import model_view

utils.logging.set_verbosity_error()  # Remove line to see warnings

# Initialize tokenizer and model. Be sure to set output_attentions=True.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModel.from_pretrained("Helsinki-NLP/opus-mt-en-de", output_attentions=True)

# Tokenize the source sentence to get the encoder input ids
encoder_input_ids = tokenizer("She sees the small elephant.", return_tensors="pt", add_special_tokens=True).input_ids

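# Tokenize the target sentence with the target-language tokenizer to get the decoder input ids.
# Note: as_target_tokenizer() is deprecated in recent transformers releases in favor of tokenizer(text_target=...).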
with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("Sie sieht den kleinen Elefanten.", return_tensors="pt", add_special_tokens=True).input_ids

outputs = model(input_ids=encoder_input_ids, decoder_input_ids=decoder_input_ids)

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text
)
 

In [None]:

from transformers import AutoTokenizer, AutoModel, utils
from bertviz import model_view

utils.logging.set_verbosity_error()  # Remove line to see warnings

# Initialize tokenizer and model. Be sure to set output_attentions=True.
# Load BART fine-tuned for summarization on CNN/Daily Mail dataset
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

# Tokenize the source text to get the encoder input ids
encoder_input_ids = tokenizer("The House Budget Committee voted Saturday to pass a $3.5 trillion spending bill", return_tensors="pt", add_special_tokens=True).input_ids

# Tokenize the target summary to get the decoder input ids
decoder_input_ids = tokenizer("The House Budget Committee passed a spending bill.", return_tensors="pt", add_special_tokens=True).input_ids

outputs = model(input_ids=encoder_input_ids, decoder_input_ids=decoder_input_ids)

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text
)

https://github.com/jessevig/bertviz/blob/master/notebooks/neuron_view_bert.ipynb

But I cannot get this to show the neuron view that the repository's examples show. Instead, the output looks
pretty much like the head view.

In [None]:
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

In [None]:
model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
model = BertModel.from_pretrained(model_version)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show(model, model_type, tokenizer, sentence_a, sentence_b, display_mode='dark', layer=2, head=0)

https://github.com/jessevig/bertviz/blob/master/notebooks/neuron_view_gpt2.ipynb

In [6]:
from bertviz.transformers_neuron_view import GPT2Model, GPT2Tokenizer
from bertviz.neuron_view import show

In [7]:
model_type = 'gpt2'
model_version = 'gpt2'
model = GPT2Model.from_pretrained(model_version)
tokenizer = GPT2Tokenizer.from_pretrained(model_version)
text = "At the store, she bought apples, oranges, bananas,"
show(model, model_type, tokenizer, text, display_mode='dark')


My impression: Obviously this works, since results with transformers have been game-changing. But it seems like a case of throwing spaghetti against a wall. The vast majority of the attention heads don't seem to show anything reasonable. But there are enough of them that eventually one of them might indicate useful relationships.
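One rough way to put a number on that impression is the entropy of each head's attention rows: a head that spreads its weight almost uniformly has entropy near log(n) for n tokens, while a sharply focused head is near 0. The sketch below assumes one layer's attention is given as nested lists indexed `[head][query][key]`; `head_entropies` is a hypothetical helper written for these notes, not part of bertviz, and the toy numbers are made up. Entropy alone says nothing about whether a head is useful, only how diffuse it is.

```python
import math

def head_entropies(layer_attention):
    """Mean row entropy (in nats) for each head of one layer.

    layer_attention: nested lists indexed [head][query][key], where each
    row over keys sums to 1 (one layer's attention for one example).
    """
    entropies = []
    for head in layer_attention:
        row_entropies = []
        for row in head:
            # Skip zero weights: p * log(p) tends to 0 as p -> 0.
            row_entropies.append(-sum(p * math.log(p) for p in row if p > 0))
        entropies.append(sum(row_entropies) / len(row_entropies))
    return entropies

# Toy layer with two heads over three tokens (made-up numbers).
uniform_head = [[1 / 3, 1 / 3, 1 / 3]] * 3
focused_head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_entropies = head_entropies([uniform_head, focused_head])
print(toy_entropies)
```

With a real model, something like `outputs.encoder_attentions[layer][0].detach().tolist()` should give the expected nested-list shape for a single example.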

It would be interesting to test this (1) without weights, to see how the base calculations for Q, K, V perform on their own, and (2) without the base calculations, starting from random weights alone. It almost seems like the latter case is what is really happening. Perhaps the base calculations provide a useful nudge.

The entire value of Q, K, V rests on the embeddings usefully encoding relationships between words (tokens). For instance, random embeddings would essentially be case (2) above.
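The two cases can be sketched with NumPy under heavy simplifying assumptions: a single head, no positional encodings, no training, and toy embeddings invented for the example. Case (1) uses the embeddings directly as queries and keys (identity projections), so the attention pattern comes only from embedding similarity. Case (2) pushes the same embeddings through random, untrained projection matrices. The function names and the embedding construction are hypothetical, not taken from bertviz or transformers.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention_weights(embeddings, w_q=None, w_k=None):
    """Scaled dot-product attention weights for a single head.

    With w_q and w_k left as None the embeddings are used directly as
    queries and keys, i.e. case (1), attention without learned weights.
    """
    q = embeddings if w_q is None else embeddings @ w_q
    k = embeddings if w_k is None else embeddings @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores)

rng = np.random.default_rng(0)
d_model = 16

# Toy embeddings (an assumption): tokens 0 and 1 are near-duplicates,
# token 2 is unrelated.
base = rng.normal(size=d_model)
embeddings = np.stack([
    base,
    base + 0.1 * rng.normal(size=d_model),
    rng.normal(size=d_model),
])

# Case (1): no projection weights, raw embedding similarity only.
no_weights = attention_weights(embeddings)

# Case (2): random, untrained projections.
w_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
w_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
random_weights = attention_weights(embeddings, w_q, w_k)

print(np.round(no_weights, 3))
print(np.round(random_weights, 3))
```

Comparing the two printed matrices shows whether the structure present in the embeddings survives a random projection. Replacing `embeddings` with purely random vectors would remove that structure from both cases, which is the situation described above.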

I've seen explanations of why separator symbols develop strong attention connections in this system, but that still usually seems like a waste to me: there is very little value in a relationship with an artificial symbol that is not part of the language. Knowing where the sentences break is useful, for instance, but I'm not convinced that is what these relationships represent.
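To check how much attention actually goes to these artificial symbols, one could measure the share of each head's attention mass that lands on them. This is a minimal sketch, assuming one layer's attention as nested lists indexed `[head][query][key]` and BERT-style special tokens; `special_token_share` is a hypothetical helper, and the default token names would need changing for other models (BART, for example, uses `<s>` and `</s>`). The toy numbers are made up.

```python
def special_token_share(layer_attention, tokens, special=("[CLS]", "[SEP]")):
    """Fraction of each head's attention mass that lands on special tokens.

    layer_attention: nested lists indexed [head][query][key].
    tokens: token strings aligned with the key positions.
    """
    special_positions = [i for i, tok in enumerate(tokens) if tok in special]
    shares = []
    for head in layer_attention:
        total = sum(sum(row) for row in head)
        on_special = sum(row[i] for row in head for i in special_positions)
        shares.append(on_special / total)
    return shares

# One made-up head over four tokens, heavily weighted toward [SEP].
toy_tokens = ["[CLS]", "the", "cat", "[SEP]"]
toy_head = [
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.1, 0.8],
    [0.25, 0.25, 0.25, 0.25],
]
toy_shares = special_token_share([toy_head], toy_tokens)
print(toy_shares)
```

A high share across many heads would support the "waste" reading. It would not by itself show whether those connections encode sentence boundaries or something else.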