# pipeline

the most basic object in the `transformers` library

it lets you run a `task` using a `model`

task: e.g. `fill-mask`, `text-classification`, `text-generation`, `summarization`, `translation`

In [None]:
from transformers import pipeline

In [None]:
pipe_obj = pipeline("zero-shot-classification")

In [None]:
pipe_obj(
    "I purchased a LinkedIn subscription and my expenses were over my budget",
    candidate_labels=["personal-finance", "money", "exercise"]
    )

In [None]:
sentiment = pipeline(task="sentiment-analysis")

In [None]:
sentiment(["Attention mechanism is wild!", "What a disgrace to not understand the basics"])

In [None]:
gpt = pipeline("text-generation")

In [None]:
gpt("In the cold winter of Toronto, the mailman was out and about without")

**Good practice**: choose the model for the task explicitly (otherwise the pipeline falls back to a default model)

In [None]:
distilgpt = pipeline(
    task="text-generation",
    # model="deepseek-ai/DeepSeek-R1" # ImportError: cannot import name 'is_torch_greater_or_equal_than_1_13' from 'transformers.pytorch_utils' 
    # model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B" # model-00001-of-000004.safetensors = 8.71G
    model="distilbert/distilgpt2"
    )

In [None]:
distilgpt(
    text_inputs="In this huggingface NLP course we will",
    max_length=30,
    num_return_sequences=2
)

In [None]:
import tiktoken
vocab = tiktoken.get_encoding(encoding_name="gpt2")
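# n_vocab is the vocabulary size (50257); the end-of-text token id is n_vocab - 1 = 50256,
# which is the eos_token_id set in the above gpt-2 text generation
vocab.n_vocab, vocab.eot_token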

# transformer model families

* **GPT like**: autoregressive transformer models
* **BERT like**: autoencoding transformer models
* **BART/T5 like**: sequence-to-sequence transformer models

## self-supervised learning

- type of learning used to pretrain transformers
- the objective is computed automatically from the input, so no human-annotated labels are needed (think of the autoregressive next-token target used in `Vaswani et al. 2017`)
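
a toy sketch of "objective computed from the input": for next-token prediction the targets are just the input shifted by one position. The token list is made up for illustration, and real models work on token ids rather than words.

```python
# made-up, already-tokenized sentence (illustration only)
tokens = ["the", "mailman", "was", "out", "and", "about"]

inputs = tokens[:-1]   # everything except the last token
targets = tokens[1:]   # same sequence shifted left by one: no manual labels needed

# each training pair is (context seen so far, next token to predict)
pairs = [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

for context, target in pairs:
    print(context, "->", target)
```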


# transfer learning

- in the context of LLMs: self-supervised pretraining alone is not enough for specific tasks, so general pretrained models are **fine-tuned** on a task-specific dataset; this is **transfer learning**

pretraining objectives that the fine-tuned models start from:
- causal language modeling
  - task accomplished: next-word prediction (given the previous n words)
  - GPT-2 was pretrained using this technique
- masked language modeling
  - task accomplished: predict a masked word in the sentence
  - BERT was pretrained using this technique
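
a simplified sketch of how training examples differ between causal and masked language modeling. The helper names and the `[MASK]` string are hypothetical, chosen only for illustration; actual BERT pretraining picks the masked positions at random and uses a more involved replacement scheme.

```python
MASK = "[MASK]"  # placeholder mask symbol, for illustration only

def causal_lm_examples(tokens):
    """Return (previous tokens, next token) pairs: predict each word from its left context."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_example(tokens, mask_positions):
    """Hide the tokens at mask_positions; labels keep the original token only where masked."""
    masked = [MASK if i in mask_positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in mask_positions else None for i, tok in enumerate(tokens)]
    return masked, labels

tokens = ["attention", "is", "all", "you", "need"]

print(causal_lm_examples(tokens))
print(masked_lm_example(tokens, mask_positions={2}))
```

note that the masked example sees context on both sides of the hidden word, while each causal example only sees the words to its left.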

# carbon footprint

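training a single large language model has been estimated to emit as much carbon as about 5 cars over their entire lifetimes; the actual figure depends heavily on model size, hardware, training time, and energy source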

useful libraries: `codecarbon`

online resources: [ML CO<sub>2</sub> Impact](https://mlco2.github.io/impact/)
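
a back-of-the-envelope sketch of the kind of arithmetic behind such estimates: energy used times the carbon intensity of the electricity. The function name, the default values, and the example run are all hypothetical placeholders, not measurements and not the formula of any specific tool.

```python
def estimate_emissions_kg(power_watts, hours, num_devices=1, pue=1.0,
                          carbon_intensity_kg_per_kwh=0.4):
    """Rough CO2-equivalent estimate in kg.

    power_watts: average draw per device (assumed constant)
    pue: data-center overhead factor (1.0 means no overhead; assumption)
    carbon_intensity_kg_per_kwh: placeholder grid value; varies a lot by region
    """
    energy_kwh = power_watts / 1000 * hours * num_devices * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# hypothetical run: 8 devices drawing 300 W each for 24 hours
print(estimate_emissions_kg(300, 24, num_devices=8))
```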

**Q.** Why use a pretrained model instead of training a model from scratch on the task-specific dataset?

**A.**

- The pretrained model already has a statistical understanding of the language it was trained on.
- The pretrained model was trained on a much larger dataset than the fine-tuning dataset, so fine-tuning needs less data, time, and compute to get satisfactory results.

# Encoder Decoder

- Encoder-decoder, a.k.a. sequence-to-sequence
- **Encoder**: bidirectional self-attention (each token attends to the tokens on both sides)
- **Decoder**: masked self-attention, autoregressive, unidirectional (each token attends only to earlier tokens)
- The two can be used together or separately

- **Encoder only model:** 
  - High dimensional representation of the inputs
  - Good for tasks that require understanding the data: `text-classification (sentiment-analysis)`, `token-classification (ner)`
- **Decoder only model:**
  - Good for generative tasks: `text-generation`
- **Encoder-Decoder model:**
  - Generative tasks that require an input: `translation`, `summarization`
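
a minimal sketch of the bidirectional vs. masked self-attention difference, written as attention masks in plain Python (1 = may attend, 0 = blocked). The function names are hypothetical, and padding masks are ignored here.

```python
def bidirectional_mask(n):
    """Encoder-style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Decoder-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

print("encoder (bidirectional):")
for row in bidirectional_mask(4):
    print(row)

print("decoder (causal):")
for row in causal_mask(4):
    print(row)
```

the lower-triangular decoder mask is what makes generation autoregressive: a token never sees the tokens that come after it.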