In [None]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [None]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

## Using a model

- architecture: the blueprint of a model, i.e. the definition of its layers and the operations they perform (e.g. GPT or BERT)
- checkpoint: a set of weights that can be loaded into a given architecture
- model: an umbrella term that can refer to either an architecture or a checkpoint

As an example, BERT is an architecture, and "bert-base-cased-finetuned-mrpc" is a checkpoint: a specific set of weights for that architecture.
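
A minimal sketch of the distinction, assuming `transformers` and PyTorch are installed. It builds the same small DistilBERT architecture twice from a configuration, so no checkpoint is downloaded and each model starts with its own random weights. The tiny sizes are illustrative assumptions, not values from any published checkpoint.

```python
import torch
from transformers import DistilBertConfig, DistilBertModel

# Illustrative, deliberately tiny sizes (assumptions, not a real checkpoint).
config = DistilBertConfig(vocab_size=1000, dim=64, hidden_dim=128, n_layers=2, n_heads=4)

# One architecture, two independent random initializations.
model_a = DistilBertModel(config)
model_b = DistilBertModel(config)


def count_parameters(model):
    """Total number of weight values in the model."""
    return sum(p.numel() for p in model.parameters())


# Same architecture: both models have the same number of parameters.
same_architecture = count_parameters(model_a) == count_parameters(model_b)

# Different weights: the word-embedding matrices do not match.
emb_a = model_a.get_input_embeddings().weight
emb_b = model_b.get_input_embeddings().weight
same_weights = torch.equal(emb_a, emb_b)

print(same_architecture, same_weights)
```

Loading a checkpoint with `from_pretrained` fills an architecture like this with trained weights instead of random ones.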

We can download a model from the Hugging Face Hub. Loaded with `AutoModel`, it contains only the base Transformer module: given some inputs, it outputs
hidden states (also known as features). For each model input, we get a high-dimensional vector representing the Transformer model's contextual
understanding of that input.

> The hidden states are the activations (outputs) of the model's hidden layers for a given input, not the weights themselves. They can be used as
> features for downstream tasks. The weights that produce them were learned by the model during pretraining without supervision.

In [None]:
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

## The head

While hidden states (features) can be useful on their own, they are usually passed to another part of the model called a
head. Different tasks, such as NER or summarization, require different heads.

### Output vector

The output tensor of the base model is usually large and has three dimensions:

- batch size (number of samples in the batch, 2 in our example)
- sequence length (length of the numerical representation of the sequence, 16 in our example)
- hidden size (dimensionality of the hidden state, also known as embedding dimensionality)

The head takes the high-dimensional output tensor and projects it to a different, usually smaller, dimension. The
smaller output is then used for the downstream task. A head is usually composed of one or a few linear layers.
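
The projection can be sketched with plain PyTorch. The sketch below uses random numbers in place of real hidden states, assumes a hidden size of 768 and two output labels, and takes the first token's vector as a summary of each sequence. All of these are illustrative assumptions, and `head` is a hypothetical module rather than the head of any particular library model.

```python
import torch
from torch import nn

# Batch size and sequence length follow the example; hidden_size is assumed.
batch_size, sequence_length, hidden_size = 2, 16, 768
num_labels = 2  # assumed number of output classes

# Random stand-in for outputs.last_hidden_state.
hidden_states = torch.randn(batch_size, sequence_length, hidden_size)

# Hypothetical head: a couple of linear layers that shrink the last dimension.
head = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, num_labels),
)

# Keep one vector per sequence (here, the first token's hidden state).
sequence_summary = hidden_states[:, 0, :]

# Project from hidden_size down to num_labels.
logits = head(sequence_summary)

print(hidden_states.shape, sequence_summary.shape, logits.shape)
```

Each sequence goes from a 16 x 768 block of hidden states to a vector with one value per label, which is the form a classification task needs.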


In [None]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)