# USING TRANSFORMERS

In [None]:
!pip install datasets evaluate transformers[sentencepiece]

# Behind the pipeline

In [None]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')

In [52]:
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

## Preprocessing with a tokenizer

The *tokenizer* is responsible for
* splitting the input into words, subwords, or symbols that are called *tokens*
* mapping each token to an integer
* adding additional inputs that may be useful to the model

The default checkpoint of the `sentiment-analysis` pipeline is `distilbert-base-uncased-finetuned-sst-2-english`.

In [53]:
from transformers import AutoTokenizer

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we have the tokenizer, we can pass the sentences to it and get back a dictionary that is ready to feed to our model. The only thing left to do is to convert the list of input IDs to tensors.

To specify the type of tensors we want to get back, we use the `return_tensors` argument:

In [54]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

inputs = tokenizer(
    raw_inputs,
    padding=True,
    truncation=True,
    return_tensors='pt',
)

print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}


## Going through the model

In [55]:
from transformers import AutoModel

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

model = AutoModel.from_pretrained(checkpoint)

### A high-dimensional vector?

The vector output by the Transformer module generally has three dimensions:
* **Batch size**: the number of sequences processed at a time
* **Sequence length**: the length of the numerical representation of the sequence
* **Hidden size**: the vector dimension of each model input

The hidden size can be very large: 768 is common for smaller models such as this one, and larger models can reach 3072 or more.

In [56]:
outputs = model(**inputs)

print(outputs.last_hidden_state.shape) # shape: (batch, seq_length, hidden_size)

torch.Size([2, 16, 768])


In [57]:
# another way to access
outputs['last_hidden_state'].shape

torch.Size([2, 16, 768])

In [58]:
# or index by position: last_hidden_state is the first element of the output
outputs[0].shape

torch.Size([2, 16, 768])

### Model heads: Making sense out of numbers

The model heads take the high-dimensional vector of hidden states as input and project them onto a different dimension.

In our example, we need a model with a sequence classification head (to be able to classify the sentences as positive or negative). Hence, we will not actually use the `AutoModel` class, but the `AutoModelForSequenceClassification` class:

In [59]:
from transformers import AutoModelForSequenceClassification

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

The model head takes as input the high-dimensional vectors we saw before, and outputs vectors containing two values (one per label):

In [60]:
outputs.logits.shape

torch.Size([2, 2])

Since we have just two sentences and two labels, the result we get is of shape 2 x 2.
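To make the change of shape concrete, here is a minimal pure-Python sketch of what a classification head does to the hidden states. It assumes the head reads only the first token's hidden vector and applies a single linear layer. The hidden size of 4, the weights, and the names `classification_head`, `WEIGHTS` and `BIAS` are invented for illustration; real heads may contain additional layers.

```python
# Illustrative only: a toy "head" that maps hidden states to two logits.
# Hidden size 4 and all weights below are invented for this sketch.
WEIGHTS = [
    [0.5, -0.2, 0.1, 0.0],   # weights producing the logit for label 0
    [-0.3, 0.4, 0.0, 0.2],   # weights producing the logit for label 1
]
BIAS = [0.1, -0.1]


def classification_head(hidden_states):
    """Map (batch, seq_len, hidden) nested lists to (batch, num_labels)."""
    logits = []
    for sequence in hidden_states:
        first_token = sequence[0]  # assumption: only the first token is used
        row = [
            sum(w * h for w, h in zip(weight_row, first_token)) + b
            for weight_row, b in zip(WEIGHTS, BIAS)
        ]
        logits.append(row)
    return logits


# A batch of 2 sequences, 3 tokens each, hidden size 4.
hidden_states = [
    [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    [[0.0, 2.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
]
logits = classification_head(hidden_states)
print(len(logits), len(logits[0]))  # batch size and number of labels
```

Whatever the sequence length and hidden size, the output has one row per sentence and one column per label.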

## Postprocessing the output

In [61]:
outputs.logits

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)

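All Transformers models output the logits, because the loss function used for training generally fuses the last activation function (such as softmax) with the actual loss (such as cross entropy). To turn the logits into probabilities, we apply a softmax: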

In [62]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)


These are now recognizable probability scores: the model predicts [0.0402, 0.9598] for the first sentence and [0.9995, 0.0005] for the second.
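The softmax step can also be reproduced by hand, which makes it easier to see where the probabilities come from. The sketch below is plain Python rather than the PyTorch call used above; the `softmax` helper is written for this note, and the logits are copied (rounded) from the earlier output.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the output above (rounded to four decimals).
logits = [
    [-1.5607, 1.6123],
    [4.1692, -3.3464],
]
probabilities = [softmax(row) for row in logits]
for row in probabilities:
    print([round(p, 4) for p in row])
```

Subtracting the largest logit before exponentiating does not change the result; it only avoids overflow for large values.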

To get the labels corresponding to each position, we can inspect the `id2label` attribute of the model config:

In [63]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
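Combining the probabilities with `id2label` gives the same kind of result the pipeline returned at the start. A minimal sketch, using values copied from the outputs above; the helper name `to_labels` is made up for this note.

```python
# Probabilities and label mapping copied from the outputs above.
probabilities = [
    [4.0195e-02, 9.5980e-01],
    [9.9946e-01, 5.4418e-04],
]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def to_labels(rows, mapping):
    """Pick the most probable class per row and attach its label."""
    results = []
    for row in rows:
        best = max(range(len(row)), key=lambda i: row[i])
        results.append({"label": mapping[best], "score": row[best]})
    return results


predictions = to_labels(probabilities, id2label)
print(predictions)
```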

# Models

## Creating a Transformer

To initialize a BERT model, we first need to load a configuration object:

In [64]:
from transformers import BertConfig, BertModel

# build the config
config = BertConfig()

# build the model from the config
model = BertModel(config)

The configuration contains many attributes that are used to build the model:

In [65]:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



For example, the `hidden_size` defines the size of the `hidden_states` vector, and the `num_hidden_layers` defines the number of layers the Transformer model has.
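To see how much these attributes determine, the sketch below estimates the number of parameters of a `BertModel` from the configuration values alone. It assumes the standard BERT layout (token, position and segment embeddings, then per layer four attention projections and a two-layer feed-forward block, each followed by layer normalization, plus a pooler). The function name `count_bert_parameters` is made up for this note.

```python
# Values copied from the printed BertConfig above.
config = {
    "vocab_size": 30522,
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}


def count_bert_parameters(cfg):
    """Estimate BertModel parameters, assuming the standard BERT layout."""
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and shift vectors

    embeddings = (
        cfg["vocab_size"] * h                 # token embeddings
        + cfg["max_position_embeddings"] * h  # position embeddings
        + cfg["type_vocab_size"] * h          # segment embeddings
        + layer_norm
    )
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V and output projections
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + layer_norm
    pooler = h * h + h

    layers = cfg["num_hidden_layers"] * (attention + feed_forward)
    return embeddings + layers + pooler


total = count_bert_parameters(config)
print(f"{total:,} parameters")
```

With the default values this comes to roughly 109 million parameters, most of them in the 12 layers, so changing `num_hidden_layers` or `hidden_size` changes the model size substantially.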

### Different loading methods

Creating a model from the default configuration initializes it with random values:

In [66]:
config = BertConfig()
model = BertModel(config)
# model is randomly initialized

Load a Transformer model that is already trained:

In [67]:
model = BertModel.from_pretrained('bert-base-cased')


The weights have been downloaded and cached (so future calls to the `from_pretrained()` method will NOT re-download them) in the cache folder, which defaults to *~/.cache/huggingface/hub* in recent versions of the library (older versions used *~/.cache/huggingface/transformers*).

To customize our cache folder, we need to set the `HF_HOME` environment variable.
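One way to do this from Python is sketched below. The directory name `~/my_hf_cache` is a made-up example. The variable is read when the library is imported, so the safest approach is to set it before the first `import transformers` (or to export it in the shell instead).

```python
import os

# Hypothetical location; replace it with a directory of your choice.
os.environ["HF_HOME"] = os.path.expanduser("~/my_hf_cache")

# Import transformers only after this point so that the setting is picked up.
print(os.environ["HF_HOME"])
```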

### Saving methods

In [68]:
saving_directory = './my_bert_model'
model.save_pretrained(saving_directory)

This saves two files:
* *config.json* - the attributes necessary to build the model architecture. This file also contains some metadata, such as where the checkpoint originated and which HuggingFace Transformers version we were using when we last saved the checkpoint.
* *model.safetensors* (*pytorch_model.bin* in older versions of the library) - the *state dictionary* containing all model weights.

## Using a Transformer model for inference

Transformer models can only process numbers - numbers that the tokenizer generates.

Tokenizers can take care of casting the inputs to the appropriate framework's tensors.

In [69]:
sequences = [
    'Hello!',
    'Cool.',
    'Nice!'
]

# tokenizer converts to vocabulary indices called input IDs:
encoded_sequences = [
    [101, 7592, 999, 102], # Hello!
    [101, 4658, 1012, 102], # Cool.
    [101, 3835, 999, 102], # Nice!
]

In [70]:
# convert them to a tensor
import torch

model_inputs = torch.tensor(encoded_sequences)

### Using the tensors as inputs to the model

In [71]:
output = model(model_inputs)

# Tokenizers

Tokenizers translate text into data that can be processed by the model.

## Loading and saving

Loading the BERT tokenizer trained with the same checkpoint as BERT is done the same way as loading the model:

In [72]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

In [73]:
# load from AutoTokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

In [74]:
tokenizer("Using a Transformer network is simple")

{'input_ids': [101, 7993, 170, 13809, 23763, 2443, 1110, 3014, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}

Save a tokenizer:

In [75]:
saving_directory = './my_bert_tokenizer'
tokenizer.save_pretrained(saving_directory)

('./my_bert_tokenizer/tokenizer_config.json',
 './my_bert_tokenizer/special_tokens_map.json',
 './my_bert_tokenizer/vocab.txt',
 './my_bert_tokenizer/added_tokens.json',
 './my_bert_tokenizer/tokenizer.json')

## Encoding

Encoding means translating text into numbers. It is a two-step process: tokenization, followed by the conversion to input IDs.

Tokenization splits the text into words (or parts of words, punctuation symbols, etc.), usually called *tokens*.

The second step is to convert those tokens into numbers, so we can build a tensor out of them and feed them to the model. To do this, the tokenizer has a *vocabulary*, which is the part we download when we instantiate it with the `from_pretrained()` method.

### Tokenization

The tokenization process is done by the `tokenize()` method of the tokenizer:

In [76]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

In [77]:
sequence = "Using a transformer network is simple!"
tokens = tokenizer.tokenize(sequence)

print(tokens)

['Using', 'a', 'transform', '##er', 'network', 'is', 'simple', '!']


This tokenizer is a subword tokenizer: it splits the words until it obtains tokens that can be represented by its vocabulary. For example, the word `transformer` is split into `transform` and `##er`.
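The splitting rule can be sketched in a few lines of plain Python. This is a simplified illustration in the spirit of the greedy longest-match-first strategy used by WordPiece-style tokenizers, not the library's implementation. The tiny `VOCAB` and the helper `split_word` are invented for this note.

```python
# Toy vocabulary, invented for this sketch; a real one has ~30,000 entries.
VOCAB = {"using", "a", "transform", "##er", "##s", "network", "is", "simple"}


def split_word(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word continuation
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


print(split_word("transformer", VOCAB))
print(split_word("transformers", VOCAB))
print(split_word("zebra", VOCAB))
```

Because `transformer` is not in the toy vocabulary but `transform` and `##er` are, the word is covered by two known pieces instead of being mapped to an unknown token.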

### From tokens to input IDs

The conversion to input IDs is handled by the `convert_tokens_to_ids()` tokenizer method:

In [78]:
ids = tokenizer.convert_tokens_to_ids(tokens)

print(ids)

[7993, 170, 11303, 1200, 2443, 1110, 3014, 106]


These outputs, once converted to the appropriate framework tensor, can be used as inputs to a model.
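Conceptually, this conversion is a dictionary lookup in the vocabulary. The sketch below rebuilds a small slice of the vocabulary from the token and ID lists printed above. The fallback ID 100 for unknown tokens is an assumption about this checkpoint's `[UNK]` token, and `tokens_to_ids` is a helper written for this note.

```python
# Token/ID pairs copied from the two outputs above.
VOCAB = {
    "Using": 7993, "a": 170, "transform": 11303, "##er": 1200,
    "network": 2443, "is": 1110, "simple": 3014, "!": 106,
}
UNK_ID = 100  # assumption: ID of the [UNK] token in this vocabulary


def tokens_to_ids(tokens, vocab, unk_id=UNK_ID):
    """Look each token up, falling back to the unknown-token ID."""
    return [vocab.get(token, unk_id) for token in tokens]


tokens = ["Using", "a", "transform", "##er", "network", "is", "simple", "!"]
ids = tokens_to_ids(tokens, VOCAB)
print(ids)
```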

## Decoding

The `decode()` method converts the input IDs back into a string:

In [79]:
decoded_string = tokenizer.decode(ids)

print(decoded_string)

Using a transformer network is simple!


In [80]:
print(tokenizer.convert_ids_to_tokens(ids))

['Using', 'a', 'transform', '##er', 'network', 'is', 'simple', '!']
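Decoding therefore does more than a reverse lookup: pieces that start with `##` are glued back onto the previous token. A minimal sketch of that idea, using the ID/token pairs from the outputs above; the punctuation handling is a simplification written for this note and not the library's actual clean-up logic.

```python
# ID/token pairs copied from the outputs above.
ID_TO_TOKEN = {
    7993: "Using", 170: "a", 11303: "transform", 1200: "##er",
    2443: "network", 1110: "is", 3014: "simple", 106: "!",
}


def decode(ids, id_to_token):
    """Turn IDs back into text, gluing '##' pieces onto the previous token."""
    text = ""
    for token in (id_to_token[i] for i in ids):
        if token.startswith("##"):
            text += token[2:]   # continuation of the previous word
        elif token in {"!", ".", ",", "?"} or not text:
            text += token       # no space before punctuation or at the start
        else:
            text += " " + token
    return text


ids = [7993, 170, 11303, 1200, 2443, 1110, 3014, 106]
print(decode(ids, ID_TO_TOKEN))
```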


# Handling multiple sequences

## Models expect a batch of inputs

In [81]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [82]:
sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor(ids)
print(input_ids, input_ids.shape)

tensor([ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
         2026,  2878,  2166,  1012]) torch.Size([14])


In [83]:
model(input_ids)

IndexError: too many indices for tensor of dimension 1

Transformers models expect multiple sentences by default, so the input must include a batch dimension. The tokenizer adds this dimension for us:

In [84]:
tokenized_inputs = tokenizer(sequence, return_tensors='pt')

print(tokenized_inputs['input_ids'], tokenized_inputs['input_ids'].shape)

tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102]]) torch.Size([1, 16])


Hence, we need to add a new dimension ourselves when we build the tensor by hand:

In [85]:
tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([ids])
print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)

Input IDs: tensor([[ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012]])
Logits: tensor([[-2.7276,  2.8789]], grad_fn=<AddmmBackward0>)
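The only difference between the failing call and the working one is the extra pair of brackets around `ids`. The sketch below shows the effect on the shape using plain nested lists; `shape_of` is a small helper written for this note that mimics what `tensor.shape` reports.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, like tensor.shape."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


# IDs copied from the output above.
ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037,
       17662, 12172, 2607, 2026, 2878, 2166, 1012]

print(shape_of(ids))         # one dimension: sequence length only
print(shape_of([ids]))       # two dimensions: batch of one sequence
print(shape_of([ids, ids]))  # two dimensions: batch of two sequences
```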


In [86]:
# Batching
batched_ids = [ids, ids, ids]

input_ids = torch.tensor(batched_ids)
print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)

Input IDs: tensor([[ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012],
        [ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012],
        [ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012]])
Logits: tensor([[-2.7276,  2.8789],
        [-2.7276,  2.8789],
        [-2.7276,  2.8789]], grad_fn=<AddmmBackward0>)


## Padding the inputs

In [87]:
batched_ids = [
    [200, 200, 200],
    [200, 200]
]

This cannot be converted to a tensor, because the two rows have different lengths.

We need to use *padding* to make our tensors have a rectangular shape. Padding makes sure all our sentences have the same length by adding a special word called the *padding token* to the sentences with fewer values.

In [88]:
padding_id = 100

batched_ids = [
    [200, 200, 200],
    [200, 200, padding_id],
]

The padding token ID can be found in `tokenizer.pad_token_id`.
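Padding by hand generalizes to a small helper. The sketch below is plain Python; `pad_batch` is a name invented for this note, and `pad_id=0` is an assumption that should be replaced by the value of `tokenizer.pad_token_id` for the checkpoint in use.

```python
def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


# pad_id=0 is an assumption here; read the real value from
# tokenizer.pad_token_id for the checkpoint you use.
padded = pad_batch([[200, 200, 200], [200, 200]], pad_id=0)
print(padded)
```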

In [89]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]

batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

In [90]:
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)

tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward0>)
tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)
tensor([[ 1.5694, -1.3895],
        [ 1.3374, -1.2163]], grad_fn=<AddmmBackward0>)


The second row should be the same as the logits for the second sentence, but we got completely different values! This is because the key feature of Transformer models is attention layers that *contextualize* each token. These will take into account the padding tokens since they attend to all of the tokens of a sequence. To get the same result when passing individual sentences of different lengths through the model or when passing a batch with the same sentences and padding applied, we need to tell those attention layers to ignore the padding tokens. This is done by using an attention mask.

## Attention masks

*Attention masks* are tensors with the exact same shape as the input IDs tensor, filled with 0s and 1s: 1s indicate the corresponding tokens should be attended to, and 0s indicate the corresponding tokens should not be attended to.
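A rough picture of how the mask takes effect: attention weights come from a softmax over scores, and a masked position can be excluded by setting its score to negative infinity before the softmax. The sketch below assumes that mechanism and uses invented scores; `masked_softmax` is a helper written for this note, not a library function.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores, giving zero weight where mask is 0."""
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Invented attention scores of one token towards three positions;
# the third position is padding.
scores = [2.0, 1.0, 3.0]

unmasked = masked_softmax(scores, [1, 1, 1])
masked = masked_softmax(scores, [1, 1, 0])
without_padding = masked_softmax(scores[:2], [1, 1])

print([round(w, 4) for w in unmasked])
print([round(w, 4) for w in masked])
print([round(w, 4) for w in without_padding])
```

With the mask applied, the padding position receives zero weight and the remaining weights equal those of the unpadded sequence, which is why the padded batch can produce the same logits as the single sentence.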

In [91]:
sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]

batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]
# add attention mask
attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]

In [92]:
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)

outputs = model(torch.tensor(batched_ids),
                attention_mask=torch.tensor(attention_mask))
print(outputs.logits)

tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward0>)
tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)
tensor([[ 1.5694, -1.3895],
        [ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)


Now the logits for the second sentence in the batch match the logits we get when passing it through the model on its own.

# Putting it all together

In [93]:
checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

model_inputs = tokenizer(sequence)

The `model_inputs` variable contains everything that's necessary for a model to operate well. For DistilBERT, that includes the input IDs as well as the attention mask.

In [94]:
sequence = "I've been waiting for a HuggingFace course my whole life."

model_inputs = tokenizer(sequence)
print(model_inputs)

{'input_ids': [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [95]:
# multiple sequences
sequences = ["I've been waiting for a HuggingFace course my whole life.", "So have I!"]

model_inputs = tokenizer(sequences)
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], [101, 2061, 2031, 1045, 999, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}


The tokenizer can pad according to several objectives:

In [97]:
# pad the sequences up to the max sequence length
model_inputs = tokenizer(sequences, padding='longest')
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], [101, 2061, 2031, 1045, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}


In [98]:
# pad the sequences up to the model max length
# (512 for BERT or DistilBERT)
model_inputs = tokenizer(sequences, padding='max_length')
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, ...

In [99]:
# pad the sequences up to the specified max length
model_inputs = tokenizer(sequences, padding='max_length', max_length=8)
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], [101, 2061, 2031, 1045, 999, 102, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 0, 0]]}


It can also truncate sequences:

In [100]:
# truncate the sequences that are longer than the model max length
# (512 for BERT or DistilBERT)
model_inputs = tokenizer(sequences, truncation=True)
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], [101, 2061, 2031, 1045, 999, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}


In [101]:
# truncate the sequences that are longer than the specified max length
model_inputs = tokenizer(sequences, max_length=8, truncation=True)
print(model_inputs)

{'input_ids': [[101, 1045, 1005, 2310, 2042, 3403, 2005, 102], [101, 2061, 2031, 1045, 999, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}
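The truncated output still starts with ID 101 and ends with ID 102 (the Special tokens section below identifies them as `[CLS]` and `[SEP]`), so truncation has to leave room for them. The sketch below reproduces the output above under the assumption that tokens are dropped from the end of the sequence; `truncate_with_special_tokens` is a helper invented for this note.

```python
CLS_ID, SEP_ID = 101, 102  # the IDs seen at both ends of the outputs above


def truncate_with_special_tokens(content_ids, max_length):
    """Keep the leading tokens so that [CLS] + tokens + [SEP] fits."""
    room = max(max_length - 2, 0)  # two slots are reserved for [CLS] and [SEP]
    return [CLS_ID] + content_ids[:room] + [SEP_ID]


# Token IDs of the first sentence without special tokens, as printed earlier.
content_ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037,
               17662, 12172, 2607, 2026, 2878, 2166, 1012]

print(truncate_with_special_tokens(content_ids, max_length=8))
print(truncate_with_special_tokens([2061, 2031, 1045, 999], max_length=8))
```

The first sentence is cut to six content tokens plus the two special tokens; the second already fits and is left unchanged.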


The `tokenizer` can handle the conversion to specific framework tensors, which can be sent to the model.

In [102]:
# return PyTorch tensors
model_inputs = tokenizer(sequences, padding=True, return_tensors='pt')
print(model_inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  2061,  2031,  1045,   999,   102,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}


In [103]:
# return TensorFlow tensors
model_inputs = tokenizer(sequences, padding=True, return_tensors='tf')
print(model_inputs)

{'input_ids': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
array([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662,
        12172,  2607,  2026,  2878,  2166,  1012,   102],
       [  101,  2061,  2031,  1045,   999,   102,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}


In [104]:
# return numpy arrays
model_inputs = tokenizer(sequences, padding=True, return_tensors='np')
print(model_inputs)

{'input_ids': array([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662,
        12172,  2607,  2026,  2878,  2166,  1012,   102],
       [  101,  2061,  2031,  1045,   999,   102,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0]]), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}


## Special tokens

In [105]:
sequence = "I've been waiting for a HuggingFace course my whole life."

model_inputs = tokenizer(sequence)
print(model_inputs['input_ids'])

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102]
[1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]


The two lists of input IDs differ slightly: the token ID 101 was added at the beginning, and the token ID 102 at the end.

In [106]:
# decode the results
print(tokenizer.decode(model_inputs['input_ids']))
print(tokenizer.decode(ids))

[CLS] i've been waiting for a huggingface course my whole life. [SEP]
i've been waiting for a huggingface course my whole life.


The tokenizer added the special word `[CLS]` at the beginning and the special word `[SEP]` at the end.

This is because the model was pretrained with those, so to get the same results for inference we need to add them as well.

## Wrapping up: From tokenizer to model

The tokenizer can handle multiple sequences (padding!), very long sequences (truncation!), and multiple types of tensors with its main API:

In [107]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequences = [
    'I’ve been waiting for a HuggingFace course my whole life.',
    'So have I!',
]

tokens = tokenizer(sequences,
                   padding=True,
                   truncation=True,
                   return_tensors='pt')

output = model(**tokens)
print(output.logits)



tensor([[-1.5979,  1.6390],
        [-3.6183,  3.9137]], grad_fn=<AddmmBackward0>)
