# Transformers

In [1]:
from transformers import AutoTokenizer

## Loading a pretrained AutoTokenizer
Download and cache the tokenizer that matches a checkpoint using the `from_pretrained` method of `AutoTokenizer`

In [3]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint) 

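The tokenizer loaded above is a WordPiece tokenizer (DistilBERT reuses BERT's uncased vocabulary). The following is a minimal, simplified sketch of greedy longest-match-first WordPiece in pure Python, not the library's implementation. `TOY_VOCAB` is a hypothetical 17-entry subset whose IDs are copied from the tokenizer output printed further down (plus `[UNK]`), and the hypothetical `pre_tokenize` only lowercases and splits off punctuation; the real tokenizer also normalizes accents and handles other cases.

```python
import re

# Hypothetical toy subset of the uncased vocabulary, for illustration only.
# IDs match the tokenizer output shown later in this notebook.
TOY_VOCAB = {
    "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
    "i": 1045, "'": 1005, "ve": 2310, "been": 2042, "waiting": 3403,
    "for": 2005, "a": 1037, "hugging": 17662, "##face": 12172,
    "course": 2607, "my": 2026, "whole": 2878, "life": 2166, ".": 1012,
}


def pre_tokenize(text):
    # Simplified: lowercase (uncased model), then split into words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def wordpiece(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first: take the longest prefix found in the vocab,
    # looking up continuation pieces with a "##" prefix.
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split: the whole word maps to [UNK]
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab):
    tokens = [p for w in pre_tokenize(text) for p in wordpiece(w, vocab)]
    # Wrap the sequence in the special [CLS] ... [SEP] tokens.
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]


ids = encode("I've been waiting for a HuggingFace course my whole life.", TOY_VOCAB)
print(ids)
```

With this toy vocabulary, "HuggingFace" is split into `hugging` and `##face`, which is why the real output contains the two IDs 17662 and 12172 for that one word.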

## Tokenize raw text
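Calling the tokenizer with `return_tensors="pt"` returns PyTorch tensors in a dictionary with two entries. `padding=True` pads every sentence to the length of the longest one in the batch, and `truncation=True` cuts sentences that exceed the model's maximum input length.
- `input_ids` - the token IDs for each sentence, starting with the `[CLS]` token (101), ending with `[SEP]` (102), and with 0 (the padding token ID) in the padded positions
- `attention_mask` - 1 for real tokens and 0 for padding, so the model does not attend to the padded positions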


In [5]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.", 
    "I hate this so much!",
]

input = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(input)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
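The padded `input_ids` and `attention_mask` above can be reproduced with a few lines of plain Python. This is a minimal sketch of what `padding=True` does, not the tokenizer's code: `pad_batch` is a hypothetical helper, the pad ID of 0 is taken from the output above, and the token IDs are copied from that output.

```python
def pad_batch(batch, pad_id=0):
    # padding=True: pad every sequence to the length of the longest one in the batch
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding that the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]
encoded = pad_batch(batch)
print(encoded["input_ids"][1])
print(encoded["attention_mask"][1])
```

The second sentence has 8 real tokens, so it gets 8 padding IDs and a mask of eight 1s followed by eight 0s, matching the printed tensors.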


## Download Model
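Download and cache the model (pretrained weights and configuration) using the `from_pretrained` method of `AutoModel`. `AutoModel` loads only the base transformer without the classification head, which is why the warning below reports that the classifier weights were not used.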

In [11]:
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertModel: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


The base model's `last_hidden_state` has three dimensions: the batch size (2 in the example above), the sequence length (16 tokens after padding) and the hidden size (768 for DistilBERT).

In [12]:
output = model(**input)
print(output.last_hidden_state.shape)

torch.Size([2, 16, 768])


### AutoModelForSequenceClassification
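Since we are working on sentiment classification, which is a sequence classification problem, we can use `AutoModelForSequenceClassification` instead of `AutoModel`. It loads the same base model plus a classification head that turns the hidden states into one logit per label.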
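To see how a head reduces the `[2, 16, 768]` hidden states to one score per label, here is a minimal PyTorch sketch modeled on the DistilBERT sequence-classification head: take the first (`[CLS]`) token's hidden state, apply `pre_classifier`, ReLU, dropout and `classifier`. The layer names mirror the unused weights in the warning above, but `SequenceClassificationHead` is a hypothetical class, its weights are random rather than loaded from the checkpoint, and the dropout rate of 0.2 is an assumption.

```python
import torch
from torch import nn


class SequenceClassificationHead(nn.Module):
    """Sketch of a DistilBERT-style head: first token -> Linear -> ReLU -> Dropout -> Linear."""

    def __init__(self, hidden_size=768, num_labels=2, dropout=0.2):
        super().__init__()
        self.pre_classifier = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state):
        pooled = last_hidden_state[:, 0]                  # (batch, hidden): first ([CLS]) token
        pooled = torch.relu(self.pre_classifier(pooled))  # (batch, hidden)
        pooled = self.dropout(pooled)                     # no-op in eval mode
        return self.classifier(pooled)                    # (batch, num_labels) logits


head = SequenceClassificationHead().eval()
with torch.no_grad():
    # Random stand-in with the same shape as the base model's last_hidden_state
    logits = head(torch.randn(2, 16, 768))
print(logits.shape)
```

The head keeps the batch dimension and collapses the sequence and hidden dimensions into `num_labels` scores, which matches the logits shape checked in the cells below.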

In [13]:
from transformers import AutoModelForSequenceClassification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [15]:
# Send the tokenized input to the model and get the output
output = model(**input)

In [18]:
# Check the output shape: one row per sentence (2) and one logit per label (2)
output.logits.shape

torch.Size([2, 2])

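## Postprocessing
The model returns logits: raw, unnormalized scores for each label. A softmax converts them into probabilities that sum to 1 for each sentence.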

In [19]:
print(output.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)


In [21]:
# Convert logits to probabilities
import torch

predictions = torch.nn.functional.softmax(output.logits, dim=-1)
print(predictions)

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)
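As a check on what `softmax(..., dim=-1)` computes, here is a minimal pure-Python version applied to the logits printed earlier. The hypothetical `softmax` helper subtracts the row maximum before exponentiating, a common trick for numerical stability that does not change the result.

```python
import math


def softmax(row):
    # exp(x_i) / sum_j exp(x_j), shifted by max(row) to avoid overflow
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]  # values printed above
probs = [softmax(row) for row in logits]
print(probs)
```

Each row of `probs` sums to 1 and agrees with the PyTorch output to about four decimal places.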


In [23]:
# Check how label IDs map to label names
print(model.config.id2label)

{0: 'NEGATIVE', 1: 'POSITIVE'}
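Combining the probabilities with `id2label` gives the predicted label for each sentence. A minimal sketch in plain Python, using the values printed above: for each row, take the index of the largest probability and look it up in the mapping.

```python
id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # as printed by model.config.id2label
probs = [[4.0195e-02, 9.5980e-01], [9.9946e-01, 5.4418e-04]]  # probabilities printed above

labels = []
for row in probs:
    label_id = max(range(len(row)), key=row.__getitem__)  # index of the largest probability
    labels.append(id2label[label_id])
print(labels)  # ['POSITIVE', 'NEGATIVE']
```

With the tensors themselves, `predictions.argmax(dim=-1)` gives the same indices, which can then be mapped through `model.config.id2label`.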
