#### Pipeline

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

#### Pipeline from scratch
1. Tokenize raw text
2. Pass tokenized text into model
3. Receive logits from model prediction
4. Turn logits into prediction probabilities

#### Tokenizer Preprocessing
- Each model checkpoint comes with its own tokenizer configuration (vocabulary, special tokens, casing), so the tokenizer must match the model.
- The `AutoTokenizer` class fetches the matching tokenizer via its `from_pretrained()` method.

In [2]:
from transformers import AutoTokenizer

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [3]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, 
                   padding=True, # Pad every sequence to the length of the longest one in the batch
                   truncation=True, # Truncate sequences longer than the model's maximum input length
                   return_tensors="pt" # Return PyTorch tensors
                   )
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
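The relationship between `input_ids` and `attention_mask` in the output above can be sketched in plain Python. This is only an illustration of the idea, not the tokenizer's actual implementation: `pad_batch` is a hypothetical helper, and the pad id of 0 is read off the output above rather than taken from the tokenizer's configuration.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # real tokens, then padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token ids copied from the tokenizer output above, with the padding removed
first = [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037,
         17662, 12172, 2607, 2026, 2878, 2166, 1012, 102]
second = [101, 1045, 5223, 2023, 2061, 2172, 999, 102]

batch = pad_batch([first, second])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

The second sentence is padded from 8 to 16 tokens, and its mask marks the last 8 positions with 0 so the model can ignore them.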


#### AutoModel

Just as the tokenizer was loaded from the checkpoint, the pretrained model architecture and weights can be loaded with `AutoModel.from_pretrained()`.

In [4]:
from transformers import AutoModel

model = AutoModel.from_pretrained(checkpoint)

In [6]:
outputs = model(**inputs) # **inputs unpacks the dictionary into keyword arguments (input_ids, attention_mask)
print(outputs.last_hidden_state.shape)
# Shape is (batch_size, sequence_length, hidden_size)

torch.Size([2, 16, 768])


#### Model Heads

A single pretrained model can serve many different tasks: the base architecture stays the same, and only the model head on top of it changes. `AutoModel` loads just the base model, which outputs hidden states, so it is rarely used on its own; a task-specific class is used instead. For sentiment analysis, that class is `AutoModelForSequenceClassification`, which adds a sequence classification head.

In [7]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape) # 2 sentences, 2 labels

torch.Size([2, 2])


In [8]:
print(outputs.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)


In [9]:
# Apply softmax over the last dimension to turn logits into probabilities
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predictions

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)
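To see what the softmax step does, the same calculation can be reproduced with the standard library alone. This is a minimal sketch, with the logits copied (already rounded) from the output above; the `softmax` helper is written here for illustration and is not the PyTorch implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied (rounded to four decimals) from the output above
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
probs = [softmax(row) for row in logits]
for row in probs:
    print([round(p, 4) for p in row])
```

Each row sums to 1, and the values should agree closely with the tensor above; any small difference in the last digits comes from using the rounded logits.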

In [10]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

In [16]:
pred_ids = torch.argmax(predictions, dim=-1)
for text, pred_id in zip(raw_inputs, pred_ids):
    print(text)
    print(model.config.id2label[pred_id.item()])

I've been waiting for a HuggingFace course my whole life.
POSITIVE
I hate this so much!
NEGATIVE
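As a final check, the probabilities and the label mapping can be combined into the same list-of-dicts shape that the `pipeline` call returned in the first cell. The sketch below uses plain Python, with the probabilities and `id2label` values copied from the outputs above; `to_pipeline_output` is a hypothetical helper, and the real pipeline's post-processing may differ in its details.

```python
def to_pipeline_output(probabilities, id2label):
    """Pick the most likely class per row and report its label and score."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # index of the largest probability
        results.append({"label": id2label[best], "score": row[best]})
    return results


# Values copied from the softmax and id2label outputs above
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
probabilities = [[4.0195e-02, 9.5980e-01], [9.9946e-01, 5.4418e-04]]

results = to_pipeline_output(probabilities, id2label)
print(results)
# [{'label': 'POSITIVE', 'score': 0.9598}, {'label': 'NEGATIVE', 'score': 0.99946}]
```

Up to rounding, these labels and scores match what the `sentiment-analysis` pipeline reported for the same two sentences.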
