# Demo: Using BertViz To Detect Bias

Auto-regressive models trained on human-generated data (e.g., text from the internet) tend to reproduce many of the biases present in that data. Being able to visualize what happens inside a transformer is therefore important for understanding this bias. One tool for visualizing attention is [BertViz](https://github.com/jessevig/bertviz?tab=readme-ov-file#self-attention-models-bert-gpt-2-etc).

## The doctor asked the nurse...

We run two sentences through GPT-2, an auto-regressive model:

* "The doctor asked the nurse a question. She"
* "The doctor asked the nurse a question. He"

In layer 5, attention head 10 (the zero-based indices passed to `head_view` below), the pronoun "She" attends to "nurse", while the pronoun "He" attends to "doctor".
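
The same pattern can also be read off numerically instead of visually. The following is a minimal sketch, not part of BertViz: the helper names `last_token_attention` and `top_attended` are hypothetical, and the code assumes `attention` is laid out as `attention[layer][batch][head][query][key]`, which matches the tuple of per-layer tensors used in the cells below.

```python
def last_token_attention(attention, tokens, layer=5, head=10):
    """Pair each token with the attention weight the final token assigns to it.

    Assumes attention[layer][batch][head][query][key] indexing; nested lists
    with the same layout work as well as tensors.
    """
    row = attention[layer][0][head][-1]  # batch item 0, last query position
    # GPT-2's byte-level BPE marks a leading space with "Ġ"; strip it for display.
    return [(tok.lstrip("Ġ"), float(w)) for tok, w in zip(tokens, row)]


def top_attended(pairs, k=3):
    """Return the k (token, weight) pairs with the largest weights."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
```

After any of the cells below has run, `top_attended(last_token_attention(attention, tokens))` lists the tokens that the final pronoun attends to most strongly in that head.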

In [1]:
from transformers import AutoModel, AutoTokenizer, utils

utils.logging.set_verbosity_error()  # Suppress standard warnings

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

In [2]:
from bertviz import head_view

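# Tokenize the prompt into a (1, seq_len) tensor of token ids
inputs = tokenizer.encode(
    "The doctor asked the nurse a question. She", return_tensors="pt"
)
outputs = model(inputs)
# The last output is a tuple holding one attention tensor per layer
attention = outputs[-1]
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# Open the head view on layer 5, head 10 (zero-based indices)
head_view(attention, tokens, layer=5, heads=[10])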

In [3]:
inputs = tokenizer.encode(
    "The doctor asked the nurse a question. He", return_tensors="pt"
)
outputs = model(inputs)
attention = outputs[-1]
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

head_view(attention, tokens, layer=5, heads=[10])

## The teacher asked the inspector...

The same analysis can be applied to the following two sentences:

* "The teacher asked the inspector if the school was structurally sound. He"
* "The teacher asked the inspector if the school was structurally sound. She"

In this case, the attention head shows "He" attending to "teacher" and "inspector" more or less equally, whereas "She" attends disproportionately to "teacher".
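
One way to make "more or less equally" and "disproportionately" concrete is to compare how the attention a pronoun places on the two nouns is split between them. The sketch below is an illustration under stated assumptions: the function name `attention_share` is hypothetical, its input is a list of `(token, weight)` pairs for the pronoun's attention row, and each noun is assumed to be a single GPT-2 token.

```python
def attention_share(pairs, targets):
    """Split the attention that falls on `targets` into per-target shares.

    `pairs` is a list of (token, weight) tuples for one query token.
    Returns a dict mapping each target to its fraction of the combined
    weight on all targets (all zeros if no target received any weight).
    """
    weights = {t: 0.0 for t in targets}
    for tok, w in pairs:
        word = tok.lstrip("Ġ")  # drop GPT-2's leading-space marker
        if word in weights:
            weights[word] += float(w)
    total = sum(weights.values())
    if total == 0:
        return weights
    return {t: w / total for t, w in weights.items()}
```

With the `attention` and `tokens` variables from one of the cells in this section, the pairs could be built as `list(zip(tokens, attention[5][0, 10, -1].detach().tolist()))` and passed in with `targets=["teacher", "inspector"]`. Running this for both the "He" and the "She" prompt gives two splits that can be compared directly.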

In [4]:
inputs = tokenizer.encode(
    "The teacher asked the inspector if the school was structurally sound. He",
    return_tensors="pt",
)
outputs = model(inputs)
attention = outputs[-1]
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

head_view(attention, tokens, layer=5, heads=[10])

In [5]:
inputs = tokenizer.encode(
    "The teacher asked the inspector if the school was structurally sound. She",
    return_tensors="pt",
)
outputs = model(inputs)
attention = outputs[-1]
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

head_view(attention, tokens, layer=5, heads=[10])

## Useful Links

* [BertViz GitHub Repository](https://github.com/jessevig/bertviz)
* [BertViz Paper](https://aclanthology.org/P19-3007.pdf)
* [BertViz Colab Tutorial](https://colab.research.google.com/drive/1hXIQ77A4TYS4y3UthWF-Ci7V7vVUoxmQ?usp=sharing)