# Attention masks
_Source: [What are attention masks](https://lukesalamone.github.io/posts/what-are-attention-masks/)_

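- A GPT model can make predictions for several inputs in parallel by processing them as one batch.
- However, batching requires all inputs to have the same length.
    - Same length here means the same number of tokens.
    - `It will rain in the` and `My dog is` have different token counts (5 and 3 with the GPT-2 tokenizer).
- The solution is to pad the shorter inputs with a padding token until every input has the same length.
- The attention mask tells the transformer which tokens are real (`1`) and which are padding (`0`), so the padding can be ignored.
- Padding is added to the right side by default but can be added to the left side by setting `padding_side`.
- The tokenizer needs a padding token. GPT-2 does not define one, so the example below reuses the end-of-sequence token (`eos_token`).
- The Hugging Face documentation defines `pad_token` as: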
    > A special token used to make arrays of tokens the same size for batching purpose. Will then be ignored by attention mechanisms or loss computation. Will be associated to `self.pad_token` and `self.pad_token_id`.
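
To make the padding and masking steps above concrete, here is a minimal pure-Python sketch of what a tokenizer does when it pads a batch. The helper `pad_batch` is a hypothetical name used only for illustration and is not part of the `transformers` API; the token IDs and the pad ID `50256` are taken from the GPT-2 tokenizer output shown later in this note.

```python
def pad_batch(sequences, pad_id, side="right"):
    """Pad lists of token IDs to a common length and build the attention mask.

    Hypothetical helper for illustration only; not a `transformers` function.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")

    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padding = [pad_id] * n_pad   # filler tokens
        real = [1] * len(seq)        # 1 = real token, attend to it
        ignored = [0] * n_pad        # 0 = padding, ignore it
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append(ignored + real)
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append(real + ignored)
    return input_ids, attention_mask


# GPT-2 token IDs for "It will rain in the" and "My dog is"
batch = [[1026, 481, 6290, 287, 262], [3666, 3290, 318]]
PAD_ID = 50256  # GPT-2's end-of-sequence token, reused as the pad token

ids, mask = pad_batch(batch, PAD_ID, side="left")
print(ids)
print(mask)
```

With `side="left"` the shorter sentence gets two pad tokens in front and two zeros at the start of its mask; with `side="right"` they go at the end instead.
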

```python
import torch  # backend required for return_tensors="pt"
from transformers import AutoTokenizer

tokeniser = AutoTokenizer.from_pretrained("gpt2")
tokeniser.padding_side = "left"  # pad on the left instead of the default right
tokeniser.pad_token = tokeniser.eos_token  # GPT-2 has no pad token, so reuse EOS

sentences = ["It will rain in the", "My dog is"]

tokeniser(sentences, return_tensors="pt", padding=True)
```

```
{'input_ids': tensor([[ 1026,   481,  6290,   287,   262],
        [50256, 50256,  3666,  3290,   318]]), 'attention_mask': tensor([[1, 1, 1, 1, 1],
        [0, 0, 1, 1, 1]])}
```
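
As a rough illustration of how a mask like `[0, 0, 1, 1, 1]` makes padding "ignored by attention mechanisms", the sketch below applies it to one row of attention scores before the softmax. The function `masked_softmax` and the score values are made up for this example; real implementations work on tensors and typically add a large negative value to the masked scores, so treat this as a simplified picture rather than the library's actual code.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores`, giving zero weight to positions where mask == 0.

    Illustrative helper only; assumes at least one position is unmasked.
    """
    if not any(mask):
        raise ValueError("at least one position must be unmasked")

    # Masked positions get a score of -inf, so exp() turns them into 0.
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 1.2, 2.0, 0.1, 1.0]  # made-up attention scores for one query
mask = [0, 0, 1, 1, 1]              # second row of the attention mask above

weights = masked_softmax(scores, mask)
print(weights)
```

The two padding positions receive a weight of exactly zero, and the remaining weights still sum to one, so the padded tokens contribute nothing to the attention output.
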