Models expect a batch of inputs

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

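# Tokenize manually, then map each token to its vocabulary ID
# (this path does not add special tokens)
tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)

# Wrap the IDs in a list to add the batch dimension the model expects
input_ids = torch.tensor([ids])
print("Input IDs:", input_ids)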

output = model(input_ids)
print("Logits:", output.logits)

Input IDs: tensor([[ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012]])
Logits: tensor([[-2.7276,  2.8789]], grad_fn=<AddmmBackward0>)


Padding
* Sequences in a batch can have different lengths, but a tensor must be rectangular, so shorter sequences are padded

In [2]:
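# A ragged batch like this cannot be converted to a tensor
batched_ids = [
    [200, 200, 200],
    [200, 200]
]

# Pad the shorter sequence so that every row has the same length
padding_id = 100
batched_ids = [
    [200, 200, 200],
    [200, 200, padding_id]
]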

In [3]:
sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)

tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward0>)
tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)
tensor([[ 1.5694, -1.3895],
        [ 1.3374, -1.2163]], grad_fn=<AddmmBackward0>)


The second row of the batched output should match the logits for the second sequence on its own, but it does not. This is because the Transformer model's attention layers contextualize every token, including the padding token. To get the same result, the attention layers need to be told to ignore the padding tokens, which is done with an attention mask.
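To make the idea of "ignoring" a token concrete, here is a minimal pure-Python sketch of a masked softmax. It assumes the common approach of setting masked attention scores to negative infinity before the softmax; the scores are made up, `masked_softmax` is a hypothetical helper, and this is not necessarily how the checkpoint above implements masking internally.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over scores, giving zero weight to positions where mask == 0.

    Assumes at least one position is unmasked.
    """
    # Masked positions become -inf, so exp() maps them to exactly 0.
    kept = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in kept]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 3.0]  # made-up attention scores; last position is padding
without_mask = masked_softmax(scores, [1, 1, 1])
with_mask = masked_softmax(scores, [1, 1, 0])
print(without_mask)  # the padding position receives some weight
print(with_mask)     # the padding position receives zero weight
```

With the mask applied, the weight that would have gone to the padding position is redistributed over the real tokens, so the padded sequence is treated the same way as the unpadded one.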

In [4]:
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

# 1 = attend to this token, 0 = ignore it (padding)
attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]

outputs = model(torch.tensor(batched_ids), attention_mask=torch.tensor(attention_mask))
print(outputs.logits)

tensor([[ 1.5694, -1.3895],
        [ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)
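Writing the padded IDs and the mask by hand does not scale beyond toy batches. The sketch below builds both from a ragged batch in plain Python. `pad_with_mask` is a hypothetical helper and the `pad_id` value is a placeholder; in practice you would pass `tokenizer.pad_token_id`.

```python
def pad_with_mask(batch, pad_id):
    """Right-pad each sequence to the longest length and build the matching mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 for real tokens, 0 for the padding that was just added
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_with_mask([[200, 200, 200], [200, 200]], pad_id=0)
print(ids)   # [[200, 200, 200], [200, 200, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

Both lists are rectangular, so they can be passed to `torch.tensor` and then to the model as in the cell above.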


Longer sequences
* Some models handle at most 512 or 1024 tokens
* They crash when asked to process longer sequences
* Option 1: use a model that supports longer sequences
* Option 2: truncate the sequences
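Truncation can be sketched as a simple slice over the token IDs. The limit of 512 is just the example figure from the list above, and `truncate` is a hypothetical helper; the tokenizer call also accepts `truncation=True` and `max_length` to do this for you.

```python
def truncate(ids, max_length):
    """Keep only the first max_length token IDs."""
    return ids[:max_length]

max_sequence_length = 512      # example limit; check the model you actually use
long_ids = list(range(1000))   # stand-in for a long tokenized sequence
short_ids = truncate(long_ids, max_sequence_length)
print(len(short_ids))  # 512
```

Slicing drops everything after the limit, so any information in the tail of the sequence is lost.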