## CS310 Natural Language Processing
## Lab 10: Explore BERT

In this lab, we will practice using pre-trained BERT models provided by the Hugging Face `transformers` library.

In [2]:
from pprint import pprint
from typing import List
import torch
import torch.nn.functional as F

## T1. Explore Pretrained BERT Model

In this task, you will explore the pretrained BERT model using the Hugging Face Transformers library. 

First, you will load a pretrained BERT model and the corresponding tokenizer. If you use the default model string `'bert-base-uncased'`, the library will download the model automatically.

In our case, to avoid any network issue, you can follow these steps to load the model locally:
- Download the `bert-base-uncased.zip` file from the course website and unzip it to the folder `bert-base-uncased` in the same directory as this notebook. 
- When you load the model, you simply specify the folder path `bert-base-uncased/` (which contains all model files) to the `from_pretrained()` function. 
- *Note*: do not omit the trailing `/` in the path.

In [3]:
from transformers import BertTokenizer, BertModel

bert_model = BertModel.from_pretrained('bert-base-uncased/') # Make sure you download the model files first
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased/')

Let's start by counting the model's parameter tensors, and then its total number of scalar parameters.

In [4]:
n_tensors = 0
for param in bert_model.parameters():
    n_tensors += 1

print("Number of tensors: ", n_tensors)

Number of tensors:  199


In [5]:
n_params = 0
for param in bert_model.parameters():
    n_params += param.numel()

print("Number of parameters: ", n_params)

Number of parameters:  109482240
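As a cross-check, both counts can be reproduced by hand. The sketch below assumes the standard `bert-base` shapes (hidden size 768, 12 layers, vocabulary 30522, 512 positions); the feed-forward size of 3072 and the final pooler layer are assumptions taken from the usual `bert-base` configuration rather than from anything printed in this notebook.

```python
# Hand count of BERT-base parameters.
# Assumptions: intermediate size 3072 and a pooler layer, as in the
# standard bert-base configuration.
vocab_size, max_positions, type_vocab = 30522, 512, 2
hidden, intermediate, num_layers = 768, 3072, 12


def linear(n_in, n_out):
    """Parameters of a linear layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # one scale vector and one shift vector

# Three embedding tables followed by one LayerNorm
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

per_layer = (
    4 * linear(hidden, hidden)      # query, key, value, attention output
    + layer_norm                    # LayerNorm after attention
    + linear(hidden, intermediate)  # feed-forward expansion
    + linear(intermediate, hidden)  # feed-forward projection
    + layer_norm                    # LayerNorm after feed-forward
)

pooler = linear(hidden, hidden)

total_params = embeddings + num_layers * per_layer + pooler
# Tensors: 5 in the embeddings, 16 per layer, 2 in the pooler
total_tensors = 5 + num_layers * 16 + 2

print(total_params)   # 109482240
print(total_tensors)  # 199
```

Under these assumptions, the twelve transformer layers account for roughly 85 million of the parameters and the word embedding table for about 23 million.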


Next, if you are interested in how the parameters are organized, you can print the model's `_modules` attribute.

In [6]:
print(bert_model._modules)

OrderedDict([('embeddings', BertEmbeddings(
  (word_embeddings): Embedding(30522, 768, padding_idx=0)
  (position_embeddings): Embedding(512, 768)
  (token_type_embeddings): Embedding(2, 768)
  (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
)), ('encoder', BertEncoder(
  (layer): ModuleList(
    (0-11): 12 x BertLayer(
      (attention): BertAttention(
        (self): BertSdpaSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (output): BertSelfOutput(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (intermedi

In theory, you can access the parameters at any layer of the model by specifying the layer name and index.

For example, if you want to check the query matrix $W^Q$ in the self-attention layer of the first transformer block, you can do the following:

In [7]:
pprint(bert_model._modules['encoder']._modules['layer'][0]._modules['attention']._modules['self']._modules['query'])

Linear(in_features=768, out_features=768, bias=True)


As you can see, the $W^Q$ matrix is implemented as an `nn.Linear` module.

The same lookup can be written more simply with the `get_submodule()` function.

In [8]:
W_q = bert_model.get_submodule('encoder.layer.0.attention.self.query')
print(W_q)
print(W_q.weight.shape)

Linear(in_features=768, out_features=768, bias=True)
torch.Size([768, 768])


## T2. Get Contextual Embeddings from BERT

Let's move on and use the BERT model to get contextual embeddings for given texts.

First, we prepare some sentences:

In [9]:
text = (
        'I have a new CPU!\n'
        'I have a new Intel CPU!\n'
        'I have a new GPU!\n'
        'I have a new NVIDIA GPU!'
    )

sentences = text.split('\n')
pprint(sentences)

['I have a new CPU!',
 'I have a new Intel CPU!',
 'I have a new GPU!',
 'I have a new NVIDIA GPU!']


Try the `tokenize()` function of the previously initialized BERT tokenizer on each sentence:

In [10]:
### START YOUR CODE ###
tokens_in_string = [bert_tokenizer.tokenize(s) for s in sentences]
### END YOUR CODE ###

# Test
pprint(tokens_in_string)
# You should expect to see the following output:
# [['i', 'have', 'a', 'new', 'cpu', '!'],
#  ['i', 'have', 'a', 'new', 'intel', 'cpu', '!'],
#  ['i', 'have', 'a', 'new', 'gp', '##u', '!'],
#  ['i', 'have', 'a', 'new', 'n', '##vid', '##ia', 'gp', '##u', '!']]

[['i', 'have', 'a', 'new', 'cpu', '!'],
 ['i', 'have', 'a', 'new', 'intel', 'cpu', '!'],
 ['i', 'have', 'a', 'new', 'gp', '##u', '!'],
 ['i', 'have', 'a', 'new', 'n', '##vid', '##ia', 'gp', '##u', '!']]


**Note** that "CPU" and "Intel" are recognized as whole words, but "NVIDIA" and "GPU" are not, so they are split into subwords such as "##u" and "##vid" in the results.
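To see where such splits come from, here is a minimal sketch of greedy longest-match-first subword splitting, the strategy WordPiece tokenizers use at inference time. The function name `wordpiece` and the tiny `toy_vocab` are hypothetical and only for illustration; the real tokenizer uses the full 30522-entry vocabulary and also handles lowercasing and punctuation.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercased word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word continuation
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary containing only the pieces seen above
toy_vocab = {"cpu", "intel", "gp", "##u", "n", "##vid", "##ia"}

print(wordpiece("cpu", toy_vocab))     # ['cpu']
print(wordpiece("gpu", toy_vocab))     # ['gp', '##u']
print(wordpiece("nvidia", toy_vocab))  # ['n', '##vid', '##ia']
```

A word stays whole only when the full string is itself a vocabulary entry, which is the case for "cpu" and "intel" here.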

The above results are token strings, not integer token IDs yet, so now use the `encode()` function, with the argument `return_tensors='pt'`, to convert each sentence to integer token IDs. Here `'pt'` stands for PyTorch tensors.

**Note**:
- Each token is represented as an integer in `torch.int64` data type.
- By default, the tokenizer adds the special tokens `[CLS]` and `[SEP]` to the beginning and end of each sentence, which correspond to the token IDs `101` and `102`, respectively.

In [11]:
### START YOUR CODE ###
token_ids_list = [bert_tokenizer.encode(tokens, return_tensors='pt') for tokens in tokens_in_string]
### END YOUR CODE ###


# Test
print(token_ids_list[0].dtype)
pprint(token_ids_list)

# You should expect to see the following output:
# torch.int64
# [tensor([[  101,  1045,  2031,  1037,  2047, 17368,   999,   102]]),
#  tensor([[  101,  1045,  2031,  1037,  2047, 13420, 17368,   999,   102]]),
#  tensor([[  101,  1045,  2031,  1037,  2047, 14246,  2226,   999,   102]]),
#  tensor([[  101,  1045,  2031,  1037,  2047,  1050, 17258,  2401, 14246,  2226,
#            999,   102]])]

torch.int64
[tensor([[  101,  1045,  2031,  1037,  2047, 17368,   999,   102]]),
 tensor([[  101,  1045,  2031,  1037,  2047, 13420, 17368,   999,   102]]),
 tensor([[  101,  1045,  2031,  1037,  2047, 14246,  2226,   999,   102]]),
 tensor([[  101,  1045,  2031,  1037,  2047,  1050, 17258,  2401, 14246,  2226,
           999,   102]])]


So now `"CPU"` is mapped to `17368` and `"Intel"` to `13420`, while `"GPU"` becomes `[14246, 2226]` and `"NVIDIA"` becomes `[1050, 17258, 2401]`.

You can use the `ids_to_tokens` dictionary to map integer token IDs back to token strings, and use `decode()` function to convert a list of token IDs back to a sentence.

In [12]:
print(bert_tokenizer.ids_to_tokens[101])
print(bert_tokenizer.ids_to_tokens[102])
print(bert_tokenizer.ids_to_tokens[17368])
print(bert_tokenizer.decode(token_ids_list[0].squeeze().tolist()))

[CLS]
[SEP]
cpu
[CLS] i have a new cpu! [SEP]


Note that in the last example above, we `squeeze` the token IDs first, because the encoded IDs have shape $1\times N$, where $N$ is the sentence length: PyTorch uses the first dimension as the batch dimension.

This suggests that we can tokenize multiple sentences in one batch with the `batch_encode_plus()` function, specifying the argument `padding=True` to pad all sentences to the same length.

In [13]:
encoded_sentences = bert_tokenizer.batch_encode_plus(sentences, return_tensors='pt', padding=True, return_attention_mask=False, return_token_type_ids=False)
print(encoded_sentences)

{'input_ids': tensor([[  101,  1045,  2031,  1037,  2047, 17368,   999,   102,     0,     0,
             0,     0],
        [  101,  1045,  2031,  1037,  2047, 13420, 17368,   999,   102,     0,
             0,     0],
        [  101,  1045,  2031,  1037,  2047, 14246,  2226,   999,   102,     0,
             0,     0],
        [  101,  1045,  2031,  1037,  2047,  1050, 17258,  2401, 14246,  2226,
           999,   102]])}


As you can see, the returned dictionary contains an item keyed by `'input_ids'`, which is exactly the token IDs we need. 

**Note**:
- It is a tensor of shape $B\times N$, where $B$ is the batch size (here, $B=4$) and $N$ is the maximum sentence length in the batch.
- The default padding token is `0`.


In the above example, we deliberately set `return_attention_mask=False` to show simpler results. 

If you set it to `True`, then the returned dictionary will also contain an item keyed by `'attention_mask'`, which is a tensor of shape $B\times N$ with `1` for real tokens and `0` for padding tokens. This information is useful for follow-up computations.
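The relationship between padding and the attention mask can be sketched in plain Python. The helper `pad_batch` below is hypothetical and only mimics what `padding=True` produces for lists of token IDs; it is not the tokenizer's actual implementation.

```python
PAD_ID = 0  # padding token ID, as noted above


def pad_batch(batch, pad_id=PAD_ID):
    """Pad every ID list to the longest one and build the matching mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask


# Token IDs of the first and the fourth sentence from the outputs above
batch = [
    [101, 1045, 2031, 1037, 2047, 17368, 999, 102],
    [101, 1045, 2031, 1037, 2047, 1050, 17258, 2401, 14246, 2226, 999, 102],
]

input_ids, attention_mask = pad_batch(batch)
print(input_ids[0])       # [101, 1045, 2031, 1037, 2047, 17368, 999, 102, 0, 0, 0, 0]
print(attention_mask[0])  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

The mask lets the model ignore the padded positions, so the padding does not influence the embeddings of the real tokens.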

Try to get the attention mask tensor:

In [14]:
### START YOUR CODE ###
encoded_sentences = bert_tokenizer.batch_encode_plus(sentences, return_tensors='pt', padding=True, return_attention_mask=True, return_token_type_ids=False)
attn_mask = encoded_sentences['attention_mask']
### END YOUR CODE ###

# Test
print(attn_mask)
# You should expect to see the following output:
# tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
#         [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
#         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])


Now, let's obtain the contextual embeddings for the four target words `"CPU"`, `"Intel"`, `"NVIDIA"`, and `"GPU"` in our sentences.

First, pass the token IDs in one batch to the BERT model to get the output object, which has a `last_hidden_state` attribute that contains the contextual embeddings.

**Note**:
- You can manually specify `input_ids` and `attention_mask` as the input arguments to the model.
- Or you can directly pass the dictionary returned by `batch_encode_plus()` to the model using the `**` operator, as most tutorials do:
  - `outputs = model(**encoded_sentences)`
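The two calling styles are equivalent because `**` unpacks a dictionary into keyword arguments. A small illustration with a hypothetical stand-in function, `fake_model`, whose parameter names happen to match the dictionary keys:

```python
def fake_model(input_ids=None, attention_mask=None):
    """Hypothetical stand-in that mimics a model's keyword arguments."""
    return {
        "n_sequences": len(input_ids),
        "has_mask": attention_mask is not None,
    }


encoded = {"input_ids": [[101, 102]], "attention_mask": [[1, 1]]}

# Explicit keyword arguments ...
explicit = fake_model(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
)
# ... and dictionary unpacking give the same call
unpacked = fake_model(**encoded)

print(explicit == unpacked)  # True
```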

In [15]:
bert_model.eval()

with torch.no_grad():
    ### START YOUR CODE ###
    outputs = bert_model(**encoded_sentences)
    ### END YOUR CODE ###


# Test
print(outputs.last_hidden_state.shape)
# You should expect to see the following output:
# torch.Size([4, 12, 768])

torch.Size([4, 12, 768])


Next, for `"CPU"` and `"Intel"`, you can directly use the output vectors at the corresponding positions, because they are recognized as whole words.

Compute the average vector of `"CPU"`s in the first two sentences, and compute its cosine similarity with the vector of `"Intel"`.

*Hint*:
- Use `F.cosine_similarity()` function
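For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. Here is a plain-Python sketch with made-up toy vectors; it also shows that rescaling a vector, for instance averaging two vectors instead of summing them, leaves the cosine unchanged.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not real BERT embeddings
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
ref = [1.0, 0.0, 1.0]

summed = [x + y for x, y in zip(a, b)]
averaged = [s / 2 for s in summed]

# Sum and average differ only by a scale factor, so the cosine matches
same = math.isclose(
    cosine_similarity(summed, ref),
    cosine_similarity(averaged, ref),
)
print(same)  # True
```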

In [16]:
### START YOUR CODE ###
vec_cpu1 = outputs.last_hidden_state[0, 5, :] # bert tokenizer adds [CLS] token at the beginning
vec_cpu2 = outputs.last_hidden_state[1, 6, :]
vec_cpu_avg = (vec_cpu1 + vec_cpu2) / 2
vec_intel = outputs.last_hidden_state[1, 5, :]
cos_cpu_intel = F.cosine_similarity(vec_cpu_avg, vec_intel, dim=-1)
### END YOUR CODE ###

# Test
print('cos_cpu_intel:', cos_cpu_intel.item())
# You should expect to see the following output:
# cos_cpu_intel: 0.7551645636558533

cos_cpu_intel: 0.7551644444465637


For `"NVIDIA"` and `"GPU"`, it's a bit trickier, as you need to use the sum of subword vectors to get the vector of the whole word.

In sentence 3, `"GPU"` is tokenized to `[14246, 2226]`, so you need to sum the vectors at these two positions.

In sentence 4, `"NVIDIA"` is tokenized to `[1050, 17258,  2401]`, so you need to sum the vectors at these three positions.
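Rather than counting positions by eye, the subword positions can be located by searching for the ID pattern in the encoded sentence. The helpers `find_span` and `sum_vectors` below are hypothetical, and the two-dimensional `toy_vectors` stand in for real 768-dimensional hidden states.

```python
def find_span(sequence, pattern):
    """Return (start, end) of the first match of pattern; end is exclusive."""
    n = len(pattern)
    for start in range(len(sequence) - n + 1):
        if sequence[start:start + n] == pattern:
            return start, start + n
    raise ValueError("pattern not found")


def sum_vectors(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


# Token IDs of sentence 4, including [CLS] (101) and [SEP] (102)
sentence4 = [101, 1045, 2031, 1037, 2047, 1050, 17258, 2401, 14246, 2226, 999, 102]

nvidia_span = find_span(sentence4, [1050, 17258, 2401])
gpu_span = find_span(sentence4, [14246, 2226])
print(nvidia_span)  # (5, 8)
print(gpu_span)     # (8, 10)

# Toy stand-in for hidden states: one small vector per token position
toy_vectors = [[float(i), 1.0] for i in range(len(sentence4))]
gpu_vector = sum_vectors(toy_vectors[gpu_span[0]:gpu_span[1]])
print(gpu_vector)  # [17.0, 2.0]
```

The spans are offset by one relative to the `tokenize()` output because `[CLS]` occupies position 0.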

In [17]:
### START YOUR CODE ###
vec_gpu1 = outputs.last_hidden_state[2, 5:7, :].sum(dim=0)
vec_gpu2 = outputs.last_hidden_state[3, 8:10, :].sum(dim=0)
vec_gpu = vec_gpu1 + vec_gpu2
vec_nvidia = outputs.last_hidden_state[3, 5:8, :].sum(dim=0)
cos_gpu_nv = F.cosine_similarity(vec_gpu, vec_nvidia, dim=0)
### END YOUR CODE ###

# Test
print('cos_gpu_nv:', cos_gpu_nv.item())
# You should expect to see the following output:
# cos_gpu_nv: 0.7273837327957153

cos_gpu_nv: 0.7273839116096497


Now let's see if `"NVIDIA"` is closer to `"GPU"` than to `"CPU"`, and vice versa for `"Intel"`.

In [18]:
### START YOUR CODE ###
cos_cpu_nv = F.cosine_similarity(vec_cpu_avg, vec_nvidia, dim=-1)
cos_gpu_intel = F.cosine_similarity(vec_gpu, vec_intel, dim=-1)
### END YOUR CODE ###

# Test
print('cos_cpu_nv:', cos_cpu_nv.item())
print('cos_gpu_intel:', cos_gpu_intel.item())
# You should expect to see the following output:
# cos_cpu_nv: 0.5931224226951599
# cos_gpu_intel: 0.5778647661209106

cos_cpu_nv: 0.5931226015090942
cos_gpu_intel: 0.5778647661209106


That's interesting, right?

How about the similarity between the two products `"CPU"` and `"GPU"`, or between the two companies `"Intel"` and `"NVIDIA"`? Check it out yourself.

In [19]:
### START YOUR CODE ###
cos_cpu_gpu = F.cosine_similarity(vec_cpu_avg, vec_gpu, dim=-1)
cos_intel_nv = F.cosine_similarity(vec_intel, vec_nvidia, dim=-1)
### END YOUR CODE ###

# Test
print('cos_cpu_gpu:', cos_cpu_gpu.item())
print('cos_intel_nv:', cos_intel_nv.item())
# You should expect to see the following output:
# cos_cpu_gpu: 0.6914964914321899
# cos_intel_nv: 0.6179742813110352

cos_cpu_gpu: 0.6914966106414795
cos_intel_nv: 0.6179742813110352


## T3. Access all hidden states

Let's be more adventurous and access all hidden states returned by the BERT model.

*Hint*: Simply set the argument `output_hidden_states=True` when calling the model.

In [20]:
bert_model.eval()

with torch.no_grad():
    ### START YOUR CODE ###
    outputs = bert_model(**encoded_sentences, output_hidden_states=True)
    ### END YOUR CODE ###

# Test
print(type(outputs.hidden_states))
print(len(outputs.hidden_states))
print(outputs.hidden_states[-1].shape)
print(outputs.hidden_states[-2].shape)
print(outputs.hidden_states[-3].shape)
print(outputs.hidden_states[-4].shape)

# You should expect to see the following output:
# <class 'tuple'>
# 13
# torch.Size([4, 12, 768])
# torch.Size([4, 12, 768])
# torch.Size([4, 12, 768])
# torch.Size([4, 12, 768])

<class 'tuple'>
13
torch.Size([4, 12, 768])
torch.Size([4, 12, 768])
torch.Size([4, 12, 768])
torch.Size([4, 12, 768])


Compute the average vector of the word `"CPU"` in the first sentence, using the hidden states of the last **four** layers.
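The tuple length of 13 corresponds to the embedding output plus one entry for each of the 12 encoder layers, so the last four entries are the outputs of layers 9 to 12. The indexing and averaging can be sketched with a hypothetical toy in which each "hidden state" is a two-element list instead of a tensor:

```python
num_layers = 12

# Toy stand-in: entry 0 plays the role of the embedding output,
# entries 1..12 the outputs of the encoder layers
hidden_states = [[float(i), 2.0 * i] for i in range(num_layers + 1)]
print(len(hidden_states))  # 13

# Take the last four layers and average them element-wise
last4 = hidden_states[-4:]
avg_last4 = [sum(column) / len(last4) for column in zip(*last4)]
print(avg_last4)  # [10.5, 21.0]
```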

In [21]:
### START YOUR CODE ###
vec_last4 = outputs.hidden_states[-4:]
cpu_last4 = [tensor[0,5,:] for tensor in vec_last4]
vec_cpu_avg_last4 = torch.stack(cpu_last4).mean(dim=0)
### END YOUR CODE ###

# Test
cos = F.cosine_similarity(vec_cpu_avg_last4, vec_cpu_avg, dim=0)
print('cos:', cos.item())

# You should expect to see the following output:
# cos: 0.9002149701118469

cos: 0.900215208530426


## T4. Fine-tune a BERT model for a text classification task

For the last task, we will practice fine-tuning a BERT-based model for a text classification task -- sentiment analysis on IMDB movie reviews.

First, load the IMDB dataset:

In [22]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

from datasets import load_dataset
imdb = load_dataset('imdb')


Using device: cuda


There are two fields in this dataset:
- `text`: a string, the review text
- `label`: an integer, 0 for negative, 1 for positive

In [23]:
pprint(imdb['test'][0])

{'label': 0,
 'text': 'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV '
         'are usually underfunded, under-appreciated and misunderstood. I '
         'tried to like this, I really did, but it is to good TV sci-fi as '
         'Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap '
         "cardboard sets, stilted dialogues, CG that doesn't match the "
         'background, and painfully one-dimensional characters cannot be '
         "overcome with a 'sci-fi' setting. (I'm sure there are those of you "
         "out there who think Babylon 5 is good sci-fi TV. It's not. It's "
         'clichéd and uninspiring.) While US viewers might like emotion and '
         'character development, sci-fi is a genre that does not take itself '
         'seriously (cf. Star Trek). It may treat important issues, yet not as '
         "a serious philosophy. It's really difficult to care about the "
         'characters here as they are not simply foolish, ju

Load a DistilBERT tokenizer to process the text:

In [24]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased/')

Create a preprocessing function that tokenizes the `text` field of an example with truncation, so that the result does not exceed the maximum input length of the model (512 tokens).
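Conceptually, truncating a single sequence keeps the leading tokens and still reserves room for the two special tokens. The sketch below is a simplification under that assumption, not the tokenizer's actual code; the function name `encode_with_truncation` is hypothetical, and the IDs `101` and `102` are those of the BERT vocabulary shown earlier, assumed to be shared by the DistilBERT tokenizer.

```python
CLS_ID, SEP_ID = 101, 102  # assumed special-token IDs (see lead-in)


def encode_with_truncation(token_ids, max_length=512):
    """Keep the leading tokens so that [CLS] + tokens + [SEP] fits max_length."""
    budget = max_length - 2  # two slots are reserved for [CLS] and [SEP]
    return [CLS_ID] + token_ids[:budget] + [SEP_ID]


long_review = list(range(1000, 1600))  # 600 made-up token IDs
short_review = [1045, 2031]

print(len(encode_with_truncation(long_review)))   # 512
print(len(encode_with_truncation(short_review)))  # 4
```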

In [25]:
def preprocess_imdb(examples):
    return tokenizer(examples['text'], truncation=True)

Use the `.map()` function to apply the preprocessing function to the entire dataset, and speed it up using `batched=True`.

(takes a few seconds to run)

In [26]:
tokenized_imdb = imdb.map(preprocess_imdb, batched=True)

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

Now every `text` field is tokenized to `input_ids`:

In [27]:
pprint(tokenized_imdb['test'][0]['text'])
pprint(tokenized_imdb['test'][0]['input_ids'])

('I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are '
 'usually underfunded, under-appreciated and misunderstood. I tried to like '
 'this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek '
 '(the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, '
 "CG that doesn't match the background, and painfully one-dimensional "
 "characters cannot be overcome with a 'sci-fi' setting. (I'm sure there are "
 "those of you out there who think Babylon 5 is good sci-fi TV. It's not. It's "
 'clichéd and uninspiring.) While US viewers might like emotion and character '
 'development, sci-fi is a genre that does not take itself seriously (cf. Star '
 "Trek). It may treat important issues, yet not as a serious philosophy. It's "
 'really difficult to care about the characters here as they are not simply '
 'foolish, just missing a spark of life. Their actions and reactions are '
 'wooden and predictable, often painful to watch. The maker

Next, use `DataCollatorWithPadding` to pad the sequences in one batch to the longest sequence in the batch *dynamically*. 

This is more efficient than padding at tokenization time, because each batch is padded only to the length of its own longest sequence.
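A rough way to see the saving is to count how many token slots each strategy allocates. The sequence lengths below are made up for illustration, and `padded_size` is a hypothetical helper rather than part of the library.

```python
def padded_size(lengths, batch_size, dynamic):
    """Total token slots after padding, batch by batch."""
    global_max = max(lengths)
    total = 0
    for i in range(0, len(lengths), batch_size):
        chunk = lengths[i:i + batch_size]
        # Dynamic: pad to the longest sequence in this batch only
        width = max(chunk) if dynamic else global_max
        total += width * len(chunk)
    return total


lengths = [12, 15, 500, 480, 30, 28, 64, 70]  # hypothetical token counts

static_cost = padded_size(lengths, batch_size=2, dynamic=False)
dynamic_cost = padded_size(lengths, batch_size=2, dynamic=True)
print(static_cost)   # 4000
print(dynamic_cost)  # 1230
```

In this toy case dynamic padding allocates less than a third of the slots, since only the batch containing the long reviews needs a wide tensor.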

In [28]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

Next, we can define a DistilBERT model as an instance of `AutoModelForSequenceClassification` with 2 output classes (positive and negative).

In [29]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased/", num_labels=2)
model.gradient_checkpointing_enable()  # Enable gradient checkpointing
model.to(device)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased/ and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

Let's see how it works on one example:

In [30]:
with torch.no_grad():
    input_tensor = torch.tensor(tokenized_imdb['test'][0]['input_ids']).unsqueeze(0).to(device)
    attention_mask = torch.tensor(tokenized_imdb['test'][0]['attention_mask']).unsqueeze(0).to(device)
    outputs = model(input_ids=input_tensor, attention_mask=attention_mask)
    print(outputs.logits)

tensor([[0.1252, 0.0530]], device='cuda:0')
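The two logits can be turned into class probabilities with a softmax. A plain-Python sketch using the values printed above (the helper `softmax` is written out here only for illustration):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.1252, 0.0530]  # values printed above
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print(predicted)  # 0
```

The probabilities are close to 50/50, which is expected at this point because the classification head is newly initialized and has not been trained yet.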


Now, let's load the `TrainingArguments` and `Trainer` from the `transformers` library to fine-tune the model.

- Training hyperparameters are set in `TrainingArguments`
- `Trainer` takes model, tokenizer, dataset, data_collator, and training arguments as input
- Call `trainer.train()` to start fine-tuning

In [31]:
from transformers import TrainingArguments, Trainer

In [32]:
training_args = TrainingArguments(
    output_dir='output',
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
    fp16=torch.cuda.is_available(),  # Mixed precision when a GPU is available
    gradient_accumulation_steps=8,
    report_to="none",  # Disable logging to external services
)



Before launching the trainer, we will need an evaluation metric. 

In [33]:
import evaluate
import numpy as np

accuracy = evaluate.load('accuracy')
# If you have problems connecting to Hugging Face, you can git clone the evaluate repo https://github.com/huggingface/evaluate.git
# and copy the `metrics/accuracy` folder to your current directory

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
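What `compute_metrics` does can be mirrored in plain Python: take the argmax of each row of logits and compare it with the reference label. The helper names and the toy logits below are hypothetical; this is a sketch of the idea, not the `evaluate` implementation.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for four reviews, two classes each
toy_logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 1.4], [-0.2, 0.9]]
toy_labels = [0, 1, 1, 1]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```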

In [34]:
trainer = Trainer(
    model=model,
    args=training_args,
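    # Caution: the IMDB splits appear to be stored sorted by label, so the first 5000 rows of a split
    # may all belong to one class; consider calling `.shuffle(seed=42)` before `.select(...)`.
    train_dataset=tokenized_imdb["train"].select(range(5000)),  # Use a smaller subset for faster training
    eval_dataset=tokenized_imdb["test"].select(range(5000)),  # Use a smaller subset for faster evaluation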
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
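With these settings, the number of batches and optimizer steps can be estimated by simple arithmetic. The sketch assumes a single device and that the number of update steps per epoch is the floor of batches divided by accumulation steps; both are assumptions, and the exact rounding may differ between library versions.

```python
import math

train_examples = 5000   # size of the training subset
eval_examples = 5000    # size of the evaluation subset
per_device_batch = 4
grad_accum_steps = 8
n_devices = 1           # assumption: a single GPU

# Forward/backward passes per epoch
train_batches = math.ceil(train_examples / (per_device_batch * n_devices))
# Weights are updated once every `grad_accum_steps` batches (floor is an assumption)
optimizer_steps = train_batches // grad_accum_steps
# Number of examples contributing to each weight update
effective_batch = per_device_batch * n_devices * grad_accum_steps
eval_batches = math.ceil(eval_examples / (per_device_batch * n_devices))

print(train_batches)    # 1250
print(optimizer_steps)  # 156
print(effective_batch)  # 32
print(eval_batches)     # 1250
```

Gradient accumulation therefore trades memory for time: each step only holds 4 examples in memory, while the update behaves like one computed on a batch of 32.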

Now we are good to go!

Note that training runs very slowly on a CPU, so it is better to wrap all the code into one Python script and run it on a GPU server.

In [35]:
trainer.train()
model.save_pretrained("output/final_model")
tokenizer.save_pretrained("output/final_model")

  0%|          | 0/156 [00:00<?, ?it/s]

  0%|          | 0/1250 [00:00<?, ?it/s]

{'eval_loss': 0.0011771568097174168, 'eval_accuracy': 1.0, 'eval_runtime': 646.4633, 'eval_samples_per_second': 7.734, 'eval_steps_per_second': 1.934, 'epoch': 1.0}
{'train_runtime': 3380.5811, 'train_samples_per_second': 1.479, 'train_steps_per_second': 0.046, 'train_loss': 0.02827310256468944, 'epoch': 1.0}


('output/final_model\\tokenizer_config.json',
 'output/final_model\\special_tokens_map.json',
 'output/final_model\\vocab.txt',
 'output/final_model\\added_tokens.json',
 'output/final_model\\tokenizer.json')