## Analysis of attention

This notebook dives into the attention layers of the model and looks for explanations of its predictions. Furthermore, it seeks to understand whether the expected connections exist between 

It uses the bertviz library.

Please note that the output is written to files, because its size crashes most notebooks.
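Since the views are written to disk rather than rendered inline, a small helper can keep the file handling in one place. The sketch below is an assumption and not part of bertviz: `save_view` is a hypothetical name, and it only relies on the returned object exposing its markup through a `.data` attribute, as the cells in this notebook do.

```python
from pathlib import Path


def save_view(html_obj, path):
    """Write a returned view to an HTML file and return the path.

    `save_view` is a hypothetical helper. It accepts either a plain string
    or an object that carries its markup in a `.data` attribute.
    """
    markup = html_obj if isinstance(html_obj, str) else html_obj.data
    out_path = Path(path)
    # Create the target folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(markup, encoding="utf-8")
    return out_path
```

The helper creates the output folder on demand, so a missing `Figures_For_report` directory would not stop the cell.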

# Model View
<b>The model view provides a bird's-eye view of attention throughout the entire model</b>. Each cell shows the attention weights for a particular head, indexed by layer (row) and head (column). The lines in each cell represent the attention from one token (left) to another (right), with line weight proportional to the attention value (ranges from 0 to 1). For a more detailed explanation, please refer to the [blog](https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1).
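The 0 to 1 range follows from how attention weights are produced: the scores of each source token are passed through a softmax, so the weights leaving one token sum to 1. A minimal sketch with made-up scores (the numbers are illustrative and not taken from the model):

```python
import math


def softmax(scores):
    """Turn raw attention scores for one source token into weights."""
    peak = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores from one source token to four target tokens.
raw_scores = [2.0, 0.5, -1.0, 0.0]
weights = softmax(raw_scores)

# Every weight lies in [0, 1] and the row sums to 1.
print([round(w, 3) for w in weights], round(sum(weights), 6))
```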

## Usage
👉 **Click** on any **cell** for a detailed view of attention for the associated attention head (or to unselect that cell). <br/>
👉 Then **hover** over any **token** on the left side of detail view to filter the attention from that token.

________________________________________________________________________________________________

# Head View
<b>The head view visualizes attention in one or more heads from a single Transformer layer.</b> Each line shows the attention from one token (left) to another (right). Line weight reflects the attention value (ranges from 0 to 1), while line color identifies the attention head. When multiple heads are selected (indicated by the colored tiles at the top), the corresponding  visualizations are overlaid onto one another.  For a more detailed explanation of attention in Transformer models, please refer to the [blog](https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1).

## Usage
👉 **Hover** over any **token** on the left/right side of the visualization to filter attention from/to that token. <br/>
👉 **Double-click** on any of the **colored tiles** at the top to filter to the corresponding attention head.<br/>
👉 **Single-click** on any of the **colored tiles** to toggle selection of the corresponding attention head. <br/>
👉 **Click** on the **Layer** drop-down to change the model layer (zero-indexed).
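Hovering over a token filters the lines to that token's attention. The same question can be asked numerically. The sketch below assumes the attention of one head is available as a square matrix (rows are source tokens, columns are target tokens); `top_attended` and the toy values are hypothetical.

```python
def top_attended(attn_matrix, tokens, source_index, k=3):
    """Return the k target tokens that a source token attends to most."""
    row = attn_matrix[source_index]
    ranked = sorted(zip(tokens, row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy example: four tokens and a made-up attention matrix for one head.
tokens = ["<s>", "How", "many", "</s>"]
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.25, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.20, 0.20, 0.20],
]
print(top_attended(attn, tokens, source_index=2, k=2))
```

With the tensors used in this notebook, `attention[layer][0, head].tolist()` should give a matrix of this form for batch index 0.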


### SQuAD

In [4]:
# Squad
from transformers import RobertaModel, RobertaTokenizer, AutoModelForQuestionAnswering, AutoTokenizer
from bertviz import head_view, model_view
import torch


model_version = 'deepset/roberta-base-squad2'
model = AutoModelForQuestionAnswering.from_pretrained(model_version, output_attentions=True)
tokenizer = RobertaTokenizer.from_pretrained(model_version)

sentence_b = "In 1872, the Central Pacific Railroad established a station near Easterby's—by now a hugely productive wheat farm—for its new Southern Pacific line. Soon there was a store around the station and the store grew the town of Fresno Station, later called Fresno. Many Millerton residents, drawn by the convenience of the railroad and worried about flooding, moved to the new community. Fresno became an incorporated city in 1885. By 1931 the Fresno Traction Company operated 47 streetcars over 49 miles of track."
sentence_a = 'How many streetcars did the Fresno Traction Company operate in 1931?'
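# Encode the question (sentence_a) and the context (sentence_b) as one pair
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
outputs = model(input_ids)
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
input_id_list = input_ids[0].tolist() # Batch index 0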

tokens = tokenizer.convert_ids_to_tokens(input_id_list)
answer_start_scores, answer_end_scores = outputs.start_logits, outputs.end_logits
answer_start = torch.argmax(
    answer_start_scores
)  # Get the most likely beginning of answer with the argmax of the score
answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score
answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids.squeeze()[answer_start:answer_end]))

print(f"Answer: {answer}\n {answer_start_scores.max().tolist()+answer_end_scores.max().tolist()}")

Answer:  47
 12.300061702728271
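Taking the two argmaxes independently can, in principle, yield an end position that lies before the start position. A common alternative is to score every valid (start, end) pair and keep the best one. The sketch below is an assumption about how that could look, not what the cell above does; `best_span` and `max_answer_len` are hypothetical names.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) pair with the highest summed logit.

    Only spans with start <= end and at most `max_answer_len` tokens
    are considered. `end` is inclusive.
    """
    best = None
    best_score = float("-inf")
    for start, s_logit in enumerate(start_logits):
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):
            score = s_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best, best_score


# Toy logits where the independent argmaxes would give end < start.
start_logits = [0.1, 0.2, 5.0, 0.3]
end_logits = [4.0, 0.1, 1.0, 3.0]
span, score = best_span(start_logits, end_logits)
print(span, score)
```

With the model outputs from this notebook, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`.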


In [2]:
html_data = head_view(attention, tokens, input_id_list.index(2), html_action='return')
with open("./Figures_For_report/head_view_squad.html", 'w') as file:
    file.write(html_data.data)

In [5]:
html_data = model_view(attention, tokens, input_id_list.index(2), html_action='return')
with open("./Figures_For_report/model_view_squad.html", 'w') as file:
    file.write(html_data.data)
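The third argument to `head_view` and `model_view` marks where the second sentence starts. RoBERTa encodes a pair as `<s> A </s></s> B </s>`, so the first `</s>` (id 2) closes the question and the context begins two positions later. Which of the two positions to pass is a judgment call; the cells above use the first separator. A small sketch for locating both, assuming that layout; `first_sep_and_b_start` is a hypothetical helper and the ids are made up.

```python
def first_sep_and_b_start(input_ids, sep_id=2, num_seps_between=2):
    """Return (index of first separator, index of first token of sentence B).

    Assumes the pair layout `<s> A </s></s> B </s>`, i.e. two separator
    tokens between the sentences.
    """
    first_sep = input_ids.index(sep_id)
    return first_sep, first_sep + num_seps_between


# Made-up ids: <s>=0, </s>=2, everything else is a placeholder.
ids = [0, 11, 12, 13, 2, 2, 21, 22, 2]
print(first_sep_and_b_start(ids))
```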

### CUAD

For ease of use, an existing CUAD checkpoint is used here, but this should be replaced by the checkpoint from the thesis.

In [13]:
model_version = 'Rakib/roberta-base-on-cuad'
model = AutoModelForQuestionAnswering.from_pretrained(model_version, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_version)

sentence_b = "EXHIBIT 10.6 DISTRIBUTOR AGREEMENT THIS DISTRIBUTOR AGREEMENT is made by and between Electric City Corp., a Delaware corporation ('Company') and Electric City of Illinois LLC ('Distributor') this 7th day of September, 1999. RECITALS A."
sentence_a = 'Highlight the parts (if any) of this contract related to "Document Name" that should be reviewed by a lawyer. Details: The name of the contract'
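# Encode the question (sentence_a) and the contract excerpt (sentence_b) as one pair
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
outputs = model(input_ids)
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
input_id_list = input_ids[0].tolist() # Batch index 0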

tokens = tokenizer.convert_ids_to_tokens(input_id_list)
answer_start_scores, answer_end_scores = outputs.start_logits, outputs.end_logits
answer_start = torch.argmax(
    answer_start_scores
)  # Get the most likely beginning of answer with the argmax of the score
answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score
answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids.squeeze()[answer_start:answer_end]))

print(f"Answer: {answer}\n {answer_start_scores.max().tolist()+answer_end_scores.max().tolist()}")


Answer:  DISTRIBUTOR AGREEMENT
 11.535788536071777


In [14]:
html_data = head_view(attention, tokens, input_id_list.index(2), html_action='return')
with open("./Figures_For_report/head_view_cuad.html", 'w') as file:
    file.write(html_data.data)

In [15]:
html_data = model_view(attention, tokens, input_id_list.index(2), html_action='return')
with open("./Figures_For_report/model_view_cuad.html", 'w') as file:
    file.write(html_data.data)

### Legal-BERT

In [20]:
model_version = 'nlpaueb/legal-bert-base-uncased'
model = AutoModelForQuestionAnswering.from_pretrained(model_version, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_version)

sentence_b = "EXHIBIT 10.6 DISTRIBUTOR AGREEMENT THIS DISTRIBUTOR AGREEMENT is made by and between Electric City Corp., a Delaware corporation ('Company') and Electric City of Illinois LLC ('Distributor') this 7th day of September, 1999. RECITALS A."
sentence_a = 'Highlight the parts (if any) of this contract related to "Document Name" that should be reviewed by a lawyer. Details: The name of the contract'
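# Encode the question (sentence_a) and the contract excerpt (sentence_b) as one pair
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
outputs = model(input_ids)
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
input_id_list = input_ids[0].tolist() # Batch index 0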
tokens = tokenizer.convert_ids_to_tokens(input_id_list)

print("This model is not fine-tuned for QA")


Some weights of the model checkpoint at nlpaueb/legal-bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized fr

This model is not fine-tuned for QA


In [22]:
html_data = head_view(attention, tokens, input_id_list.index(102), html_action='return')
with open("./Figures_For_report/head_view_legal.html", 'w') as file:
    file.write(html_data.data)
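BERT-style tokenizers also return `token_type_ids`, which are 0 for the first sentence and 1 for the second. That offers a way to find the start of the context without hard-coding the `[SEP]` id. A sketch, assuming such a list is available (for example from `inputs['token_type_ids'][0].tolist()`); `sentence_b_start` is a hypothetical helper and the values below are made up.

```python
def sentence_b_start(token_type_ids):
    """Index of the first token that belongs to the second sentence."""
    return token_type_ids.index(1)


# Layout: [CLS] A A A [SEP] B B [SEP]
type_ids = [0, 0, 0, 0, 0, 1, 1, 1]
print(sentence_b_start(type_ids))
```

Note that this points at the first context token, one position after the `[SEP]` index used in the cell above.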