# Behind the Pipeline

* Outlines what the `pipeline` function from the Hugging Face `Transformers` library does behind the scenes
* Uses a sentiment analysis example
* Uses the `distilbert/distilbert-base-uncased-finetuned-sst-2-english` model
* All classes and functions are imported from the `Transformers` library

## Setup

In [17]:
model_provider = "distilbert"
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = f"{model_provider}/{model_name}"
task_name = "sentiment-analysis"

## Sentiment Analysis Pipeline

In [18]:
from transformers import pipeline

classifier = pipeline(task=task_name, model=model)
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

## Preprocessing with Tokenizer

* Raw text needs to be tokenized and converted into integers for input into the model
* This preprocessing must be done the same way as when the model was pretrained
* Use the `AutoTokenizer` class via its `from_pretrained` method with the model's checkpoint name to:
    * Automatically fetch the data associated with the model's tokenizer
    * Cache the data so it can be reused as needed
* Once the `tokenizer` object is set, text sentences can be passed to it
* The `raw_inputs` variable defines a list of the text sentences to pass to the tokenizer
* Transformer models only accept *tensors* as input
* The `return_tensors` argument of the tokenizer specifies the type of tensors to return
    * `pt` returns PyTorch tensors
* The `inputs` variable is the output of the tokenizer to be used as input to the model
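    * It is a dictionary-like object with 2 keys:
        * `input_ids`: Contains 2 rows of integers (one per sentence) that are the unique identifiers of the tokens in each sentence
        * `attention_mask`: Contains 2 rows of 1s and 0s (one per sentence) marking which positions are real tokens (1) and which are padding (0)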
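
To make padding and the attention mask concrete, the following sketch rebuilds both by hand in plain Python. The token IDs are copied from the tokenizer output printed later in this section. `pad_batch` is a hypothetical helper, not part of `Transformers`, and the padding ID of 0 is an assumption read off the zeros in that output.

```python
# Token IDs for the two example sentences, copied from the notebook's tokenizer output.
first = [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
         2607, 2026, 2878, 2166, 1012, 102]
second = [101, 1045, 5223, 2023, 2061, 2172, 999, 102]


def pad_batch(sequences, pad_id=0):
    """Hypothetical helper: pad to the longest sequence and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([first, second])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

The shorter sentence is extended to the length of the longer one, which is what lets both rows fit into a single rectangular tensor.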

In [19]:
checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

In [20]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [21]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

In [22]:
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

In [23]:
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}


## Going through Model

* The pretrained model can be downloaded the same way as the tokenizer
* Use the `AutoModel` class via its `from_pretrained` method with the model's checkpoint name to:
    * Download the checkpoint's weights, or load them from the cache if they were downloaded before
    * Instantiate a model
* This architecture only contains the base Transformer module:
    * Given inputs, it outputs *hidden states* or *features*
    * For each model input, a high-dimensional vector will be retrieved representing the contextual understanding of that input by the Transformer model
    * Hidden states are useful on their own, but they are usually inputs to another part of the model known as the *head*
* The vector output by the Transformer module is usually large and has 3 dimensions:
    * Batch size: Number of sequences processed at a time (2 from the example)
    * Sequence length: Length of the numerical representation of the sequence (16 from the example)
    * Hidden size: Dimension of the vector that represents each token (768 from the example)
* The vector is high-dimensional because of the last value:
    * The hidden size can be very large (768 is common for smaller models, and larger models can reach 3072 or more)
* The `outputs` variable contains attributes that show the 3 dimensions of the vector
    * This is visible as a `torch.Size` from the `shape` attribute of the `last_hidden_state` attribute
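
As a rough illustration of how large this output is, the arithmetic below multiplies out the three dimensions reported in the example. The 4 bytes per value is an assumption (float32 storage); the actual data type depends on how the model was loaded.

```python
# Shape reported by the notebook for outputs.last_hidden_state.
batch_size, sequence_length, hidden_size = 2, 16, 768

# One hidden-size vector per token position, per sequence.
num_vectors = batch_size * sequence_length
num_values = num_vectors * hidden_size

# Assumption: float32 storage, i.e. 4 bytes per value.
num_bytes = num_values * 4

print(num_vectors)  # 32
print(num_values)   # 24576
print(num_bytes)    # 98304
```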

In [24]:
from transformers import AutoModel

model = AutoModel.from_pretrained(checkpoint)

In [25]:
outputs = model(**inputs)

In [26]:
print(outputs.last_hidden_state.shape)

torch.Size([2, 16, 768])


### Model Heads

* Model heads take the high-dimensional vector of hidden states as input and project them onto a different dimension
* They are usually composed of one or a few linear layers
* The output of the Transformer model is sent directly to the model head to be processed
* The model is represented by its embeddings layer and the subsequent layers
* The embeddings layer converts each input ID in the tokenized input into a vector that represents the associated token
* The subsequent layers manipulate those vectors using the attention mechanism to produce the final representation of the sentences

![Transfer Network with Head](Transfer%20Network%20with%20Head.png)

* For the example, a model with a sequence classification head is used
    * Used to classify sentences as positive or negative
* In place of the `AutoModel` class, the `AutoModelForSequenceClassification` class is used
* The output of this model has a much lower dimensionality:
    * The model head takes the high-dimensional vectors as input
    * It outputs vectors that contain 2 values, one per label
* Since there are two sentences and two labels, the result from the model is of shape 2 x 2
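
The projection performed by a head can be sketched as a single linear layer in plain Python. This is a toy illustration only: `linear_head` is a hypothetical function, the hidden vector has 4 values instead of 768, and the weights and bias are made up. The real classification head may contain additional layers.

```python
def linear_head(hidden_vector, weights, bias):
    """Toy classification head: one linear layer that maps a hidden vector to one logit per label."""
    return [
        sum(w * h for w, h in zip(row, hidden_vector)) + b
        for row, b in zip(weights, bias)
    ]


# Toy hidden vector of size 4 (the real model uses 768).
hidden_vector = [0.5, -1.0, 2.0, 0.0]

# Made-up parameters: one row of weights and one bias per label.
weights = [
    [1.0, 0.0, -1.0, 0.5],   # label 0
    [-1.0, 0.5, 1.0, 0.0],   # label 1
]
bias = [0.1, -0.1]

logits = linear_head(hidden_vector, weights, bias)
print(logits)
```

Whatever the hidden size, the head returns one score per label, which is why a batch of two sentences yields a 2 x 2 result.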

In [27]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [28]:
outputs = model(**inputs)

In [29]:
print(outputs.logits.shape)

torch.Size([2, 2])


## Postprocessing Output

* The model outputs logits, not probabilities: the raw, unnormalized scores produced by the last layer of the model
* To convert them to probabilities, the logits need to go through a SoftMax layer
    * All models in the `Transformers` library output logits
    * This is because the loss function used for training generally fuses the last activation function, such as SoftMax, with the actual loss function, such as cross entropy

In [30]:
print(outputs.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)


In [31]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

In [32]:
print(predictions)

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)
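
The same conversion can be checked by hand without PyTorch. The sketch below is a plain-Python softmax applied to the logits printed above; `softmax` here is a local helper written for illustration. Because the logits are copied at four-decimal precision, the results should agree with the PyTorch output only to about four decimals.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    max_logit = max(logits)
    # Subtracting the maximum avoids overflow and does not change the result.
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied (rounded to four decimals) from the notebook output.
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
probabilities = [softmax(row) for row in logits]

for row in probabilities:
    print([round(p, 4) for p in row])
```

Each row sums to 1, which is what makes the values readable as probabilities over the two labels.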


* With the logits converted to probabilities, the model predicted:
    * [0.0402, 0.9598] for the first sentence
    * [0.9995, 0.0005] for the second sentence
* To get the labels corresponding to each position, use the `id2label` attribute of the `model` variable's `config` attribute

In [33]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

One can conclude that the model predicted the following:

* First sentence: `NEGATIVE: 0.0402, POSITIVE: 0.9598`
* Second sentence: `NEGATIVE: 0.9995, POSITIVE: 0.0005`
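
As a final step, the probabilities can be turned into the same kind of result the pipeline printed at the top of this notebook. The sketch below uses plain Python with values copied from the outputs above, rounded to four decimals. Keeping only the highest-scoring label per sentence is an assumption based on the shape of that pipeline output.

```python
# Values copied from the notebook: softmax probabilities and model.config.id2label.
probabilities = [[0.0402, 0.9598], [0.9995, 0.0005]]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

results = []
for row in probabilities:
    # Pick the index of the highest probability for this sentence.
    best_index = max(range(len(row)), key=lambda i: row[i])
    results.append({"label": id2label[best_index], "score": row[best_index]})

print(results)
```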