# Handling Multiple Sequences

* Outlines how to handle multiple sequences using the `AutoTokenizer` and `AutoModelForSequenceClassification` classes from the Hugging Face `Transformers` library
* Apart from `torch`, all classes and functions are imported from the `Transformers` library

## Setup

In [1]:
model_provider = "distilbert"
model_name = "distilbert-base-uncased"
model = f"{model_provider}/{model_name}"

---

## Models Expect a Batch of Inputs

* Previously, one saw how sequences get translated into lists of numbers (token IDs)
* Next, one converts this list of numbers to a tensor and sends it to the model

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor(ids)

# This line will fail
model(input_ids)

IndexError: too many indices for tensor of dimension 1

* The problem is that one sent a single sequence to the model, whereas Transformers models expect a batch of sequences by default, i.e., a 2-D tensor of shape `(batch_size, sequence_length)`
* Here one tried to reproduce by hand everything the tokenizer does behind the scenes when one applies it to a sequence
* But if one looks closely, one will see that the tokenizer did not just convert the list of input IDs into a tensor; it also added a dimension on top of it:

In [3]:
tokenized_inputs = tokenizer(sequence, return_tensors="pt")

In [4]:
print(tokenized_inputs["input_ids"])

tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102]])
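To make the extra dimension concrete without loading a model, the sketch below uses plain Python lists as stand-ins for tensors. `shape_of` is a hypothetical helper and is not part of Transformers. The IDs are copied from the output above. Notice also that the tokenizer added the special tokens `101` (`[CLS]`) and `102` (`[SEP]`) at the start and end of the sequence.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list (hypothetical helper)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)

# IDs returned by tokenizer(sequence, return_tensors="pt"), copied from the output above.
tokenizer_output = [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
                     2607, 2026, 2878, 2166, 1012, 102]]
# The same IDs as a flat list, i.e., without the batch dimension.
flat_ids = tokenizer_output[0]

print(shape_of(flat_ids))          # (16,)   one dimension, no batch axis
print(shape_of(tokenizer_output))  # (1, 16) a batch containing one sequence
print(shape_of([flat_ids]))        # (1, 16) wrapping in a list adds the batch axis
```

Wrapping the ID list in an outer list is exactly what `torch.tensor([ids])` does in the fix that follows.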


Add a new dimension to fix this error:

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([ids])
output = model(input_ids)

In [None]:
print("Input IDs:", input_ids)
print("Logits:", output.logits)

* Batching is the act of sending multiple sentences through the model, all at once
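* If one only has one sentence, one can just build a batch containing that single sequence (`[ids]`, as above), or repeat it to form a batch of two identical sequences:

In [None]:
batched_ids = [ids, ids]  # a batch of two identical sequences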

* Batching allows the model to work when one feeds it multiple sentences
* Using multiple sequences is just as simple as building a batch with a single sequence
* There is a second issue, though:
	* When one is trying to batch together two (or more) sentences, they might be of different lengths
	* If one has ever worked with tensors before, one knows that they need to be of rectangular shape, so one will not be able to convert the list of input IDs into a tensor directly
	* To work around this problem, one usually pads the inputs
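The "rectangular shape" requirement can be checked directly on lists of IDs: a batch converts cleanly to a tensor only when every row has the same length. PyTorch raises an error for ragged input like the second batch below. `is_rectangular` is a hypothetical helper used only for illustration.

```python
def is_rectangular(batch):
    """True if every row of a list of lists has the same length (hypothetical helper)."""
    return len({len(row) for row in batch}) <= 1

same_length = [[200, 200, 200], [200, 200, 200]]
different_lengths = [[200, 200, 200], [200, 200]]

print(is_rectangular(same_length))        # True
print(is_rectangular(different_lengths))  # False
```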

---

## Padding the Inputs

The following list of lists cannot be converted to a tensor:

In [None]:
batched_ids = [
    [200, 200, 200],
    [200, 200]
]

* To work around this, one will use padding to make the tensors have a rectangular shape
* Padding makes sure all sentences have the same length by adding a special word called the padding token to the sentences with fewer values
* For example, if one has 10 sentences with 10 words and 1 sentence with 20 words, padding will ensure all the sentences have 20 words
* In our example, the resulting tensor looks like this:

In [None]:
padding_id = 100

batched_ids = [
    [200, 200, 200],
    [200, 200, padding_id],
]
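The padding rule described above (extend shorter sequences with the padding ID up to the length of the longest one) can be sketched in a few lines. `pad_batch` is a hypothetical helper, and `100` is the same illustrative padding ID as above, not necessarily the value a real tokenizer uses. This sketch pads on the right. Tokenizers expose the side they pad on through their `padding_side` attribute, and some pad on the left instead.

```python
def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one (hypothetical helper)."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

padding_id = 100  # illustrative value; in practice use tokenizer.pad_token_id

print(pad_batch([[200, 200, 200], [200, 200]], padding_id))
# [[200, 200, 200], [200, 200, 100]]

# Ten 10-token sequences plus one 20-token sequence all end up with 20 tokens.
mixed = [[1] * 10 for _ in range(10)] + [[1] * 20]
print({len(seq) for seq in pad_batch(mixed, padding_id)})  # {20}
```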

* The padding token ID can be found in `tokenizer.pad_token_id`
* Use it and send two sentences through the model individually and batched together:

In [3]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

In [None]:
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)

* There is something wrong with the logits in the batched predictions:
	* The second row should be the same as the logits for the second sentence
	* But one has received completely different values
* This is because the key feature of Transformer models is attention layers, which contextualize each token
* These layers take the padding tokens into account, since they attend to all the tokens of a sequence
* To get the same result when passing individual sentences of different lengths through the model as when passing a batch with the same sentences and padding applied, one needs to tell those attention layers to ignore the padding tokens
* This is done by using an attention mask

---

## Attention Masks

**Attention masks are tensors with the exact same shape as the input IDs tensor:**

* Filled with $0$s and $1$s
	* $1$ indicates the corresponding tokens should be attended to
	* $0$ indicates the corresponding tokens should not be attended to
		* i.e., they should be ignored by the attention layers of the model
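Why padding changes the logits, and how a $0$ in the mask fixes it, can be seen in a toy, single-query attention step written in plain Python. It uses scalar values and made-up scores and is not how Transformers implements attention internally. In this sketch, masked positions get a score of `-inf` before the softmax, so they receive zero weight. `masked_softmax` and `attend` are hypothetical helpers.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over `scores`, giving zero weight where mask == 0 (toy sketch)."""
    # Masked positions get -inf so that exp() maps them to 0.
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values, mask):
    """Weighted sum of `values` using masked softmax weights (toy single-query attention)."""
    weights = masked_softmax(scores, mask)
    return sum(w * v for w, v in zip(weights, values))

# Toy second sentence: two real tokens with made-up scores and scalar values.
short_scores, short_values = [0.5, 1.0], [2.0, 4.0]
# The same sentence with one padding position appended (also made-up numbers).
padded_scores, padded_values = [0.5, 1.0, 0.8], [2.0, 4.0, -3.0]

unpadded = attend(short_scores, short_values, [1, 1])
padded_no_mask = attend(padded_scores, padded_values, [1, 1, 1])
padded_masked = attend(padded_scores, padded_values, [1, 1, 0])

print(math.isclose(unpadded, padded_no_mask))  # False: the padding token leaks in
print(math.isclose(unpadded, padded_masked))   # True: the mask removes it
```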

Use the previous example with an attention mask:

In [None]:
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]

outputs = model(
    torch.tensor(batched_ids),
    attention_mask=torch.tensor(attention_mask),
)

In [None]:
print(outputs.logits)

Now one receives the same logits for the second sentence in the batch.

Notice that the last value of the second sequence is a padding ID, so the corresponding value in the attention mask is $0$.
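Padding and the attention mask always go together, so it is natural to build them in one pass. The sketch below derives the mask from each sequence's original length rather than by searching for the padding ID, so a real token that happens to share that ID is never masked by mistake. `pad_and_mask` is a hypothetical helper. The padding ID `0` is an assumption, so check `tokenizer.pad_token_id` for the real value. In practice, calling the tokenizer with `padding=True` returns both `input_ids` and `attention_mask`.

```python
def pad_and_mask(batch, pad_id):
    """Right-pad sequences and build the matching attention mask (hypothetical helper).

    Returns (input_ids, attention_mask) as lists of lists with identical shapes:
    1 marks a real token, 0 marks padding.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

padded_ids, mask = pad_and_mask([[200, 200, 200], [200, 200]], pad_id=0)  # pad_id assumed
print(padded_ids)  # [[200, 200, 200], [200, 200, 0]]
print(mask)        # [[1, 1, 1], [1, 1, 0]]
```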

---

## Longer Sequences

* With Transformer models, there is a limit to the lengths of the sequences one can pass to the models
* Most models handle sequences of up to 512 or 1024 tokens, and will crash when asked to process longer sequences
* There are two solutions to this problem:
	* Use a model with a longer supported sequence length
	* Truncate one's sequences
* Models have different supported sequence lengths, and some specialize in handling very long sequences
	* Longformer
	* LED
* If one is working on a task that requires very long sequences, those models are worth a look
* Otherwise, truncate sequences to a chosen maximum length, `max_sequence_length`:

In [None]:
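max_sequence_length = 512  # assumed limit; check tokenizer.model_max_length for the model's actual value
sequence = sequence[:max_sequence_length]  # truncates tokens if `sequence` is a list of token IDs (characters if it is a string)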
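Truncation should be applied to token IDs rather than raw text, because the model's limit counts tokens, not characters. The sketch below applies the same slicing to every sequence in a batch. `truncate_batch` is a hypothetical helper, and the token IDs are made up. Transformers tokenizers can also do this themselves when called with `truncation=True` (optionally together with `max_length`).

```python
def truncate_batch(batch, max_sequence_length):
    """Cut every list of token IDs down to at most `max_sequence_length` (hypothetical helper)."""
    return [seq[:max_sequence_length] for seq in batch]

batch = [list(range(600)), list(range(300))]  # made-up token IDs: 600 and 300 tokens
truncated = truncate_batch(batch, max_sequence_length=512)
print([len(seq) for seq in truncated])  # [512, 300]
```

Sequences already under the limit are left untouched, so the result may still need padding before it forms a rectangular batch.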