<a href="https://colab.research.google.com/github/sdaigo/playground-transformers/blob/main/transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [70]:
!pip install transformers --quiet

In [71]:
from transformers import pipeline

# No model is specified, so the pipeline falls back to its default sentiment-analysis checkpoint.
classifier = pipeline("sentiment-analysis")
classifier(
    [
     "I've been waiting for a Hugging Face course my whole life.",
     "I hate this so much!"
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9982948899269104},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

## Preprocessing with a tokenizer

In [73]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [74]:
raw_inputs = [
  "I've been waiting for a HuggingFace course my whole life.",
  "I hate this so much!",
]

In [75]:
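# Pad to the longest sentence in the batch, truncate anything over the model's maximum length,
# and return PyTorch tensors ("pt").
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")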
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
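
The second row of `input_ids` ends in zeros and its `attention_mask` ends in zeros as well. The following is a minimal pure-Python sketch of that padding step, not the tokenizer's actual implementation. The helper `pad_batch` and the constant `PAD_ID` are hypothetical names, and the pad id of 0 is an assumption read off the printed output.

```python
# Token ids copied from the tokenizer output above, before padding.
sequences = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
     2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]

PAD_ID = 0  # assumption: inferred from the zeros in the printed input_ids


def pad_batch(seqs, pad_id=PAD_ID):
    """Hypothetical helper: right-pad every sequence to the longest one."""
    max_len = max(len(s) for s in seqs)
    # Append pad ids so every row has the same length.
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(sequences)
print(input_ids[1])
print(attention_mask[1])
```

Both rows come out 16 tokens long, and the mask for the shorter sentence has eight ones followed by eight zeros, matching the tensors printed above.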


In [76]:
from transformers import AutoModelForSequenceClassification

# Load the sequence-classification model for the same checkpoint and run a forward pass.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

In [77]:
outputs.logits.shape

torch.Size([2, 2])

In [78]:
# Logits are raw, unnormalized scores, not probabilities.
outputs.logits

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)

In [79]:
import torch

# Softmax over the last dimension turns each row of logits into probabilities that sum to 1.
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

In [80]:
predictions

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)
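
To make the softmax step concrete, here is a small sketch that recomputes the probabilities by hand with the standard library only. The logits are the rounded values printed earlier, so the results should land close to the tensor above but may differ in the last digits. The function name `softmax` is just an illustrative local helper.

```python
import math

# Logits copied from the rounded values printed above.
logits = [
    [-1.5607, 1.6123],
    [4.1692, -3.3464],
]


def softmax(row):
    """Exponentiate each score and normalize so the row sums to 1."""
    shift = max(row)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probs = [softmax(row) for row in logits]
for row in probs:
    print([f"{p:.4f}" for p in row])
```

Subtracting the row maximum before exponentiating does not change the result. It only avoids overflow for large logits.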

In [81]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

In [91]:
# Print every class probability as a percentage, one line per input sentence.
for pred in predictions:
    print(", ".join(f"{label}: {pred[idx] * 100:.2f}%" for idx, label in model.config.id2label.items()))

NEGATIVE: 4.02%, POSITIVE: 95.98%
NEGATIVE: 99.95%, POSITIVE: 0.05%
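
The per-class percentages can be reduced to one label and score per sentence, which is the shape of the result the `pipeline` call returned at the top. This is a sketch of that reduction, with the probabilities and the label mapping copied from the outputs above. The variable names are illustrative, and this is not necessarily how the pipeline does it internally.

```python
# Values copied from the outputs above; the names are illustrative.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
probabilities = [
    [4.0195e-02, 9.5980e-01],
    [9.9946e-01, 5.4418e-04],
]

results = []
for row in probabilities:
    # argmax: the index of the highest probability in the row
    best = max(range(len(row)), key=lambda i: row[i])
    results.append({"label": id2label[best], "score": row[best]})

print(results)
```

The first score differs from the one in the pipeline cell, presumably because that cell spells the sentence with "Hugging Face" while `raw_inputs` uses "HuggingFace".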
