# HuggingFace - Pipeline, Models, APIs - an exploration of the library and what it does

## Introduction

The purpose of this notebook is to explore the following in a bit more depth:

1. HF's `pipeline` and `HuggingFacePipeline`
  * What do these objects do?
  * When should you use one or the other?
2. HF model loading
  * Does this load model artifacts from the HF API server?
  * What is the impact of a model not fitting into the serverless API?

## References

1. Accompanying notebook: [HuggingFace - Naive RAG and LLM Judge.ipynb](https://colab.research.google.com/drive/1iZpEjLO_6JS6F8oWSuwYJ2UiBppXN8p8?usp=sharing)
2. HF documentation:
  * [NLP Learn](https://huggingface.co/learn/nlp-course/chapter1/1?fw=pt)
  * [Pipelines](https://huggingface.co/docs/transformers/en/main_classes/pipelines)

## TODOs

* Model cache?

## Prep

### Install dependencies

In [1]:
!pip install torch transformers


### Imports

In [2]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

## Breakdown of pipeline

`Pipeline` consists of:
1. `Tokenizer` object
2. `Model` object
3. Post-processing step (turns the model's raw output into labels and scores)



### Build/use a pipeline by calling `Tokenizer` and `Model` separately

Reference: https://huggingface.co/learn/nlp-course/chapter2/6?fw=pt

In [5]:
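# `!export` only affects a throwaway subshell; `%env` sets the variable in the kernel (do this before importing `transformers`)
%env HF_HOME=/.hf_cache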
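
`from_pretrained` downloads the model files from the Hub into a local cache and then loads them from disk, so `HF_HOME` decides where those files end up. The sketch below mirrors the cache-path defaults as documented for `huggingface_hub`. The helper names `resolve_hub_cache` and `repo_cache_dir` are hypothetical, written only for illustration, and the code does not touch the network or the file system.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Return the directory where Hub downloads are expected to be cached.

    Hypothetical helper: it mirrors the documented defaults
    (HF_HUB_CACHE, then HF_HOME/hub, then ~/.cache/huggingface/hub).
    """
    env = os.environ if env is None else env
    if "HF_HUB_CACHE" in env:
        return Path(env["HF_HUB_CACHE"])
    if "HF_HOME" in env:
        hf_home = Path(env["HF_HOME"])
    else:
        xdg_cache = env.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        hf_home = Path(xdg_cache) / "huggingface"
    return hf_home / "hub"


def repo_cache_dir(repo_id, env=None):
    """Return the expected cache folder for a model repo (hypothetical helper)."""
    # Cached model repos are stored in folders named "models--<org>--<name>"
    return resolve_hub_cache(env) / ("models--" + repo_id.replace("/", "--"))


checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
cache_dir = repo_cache_dir(checkpoint, env={"HF_HOME": "/.hf_cache"})
print(cache_dir)
```

Listing that folder after the first `from_pretrained` call is one way to check whether the model was actually cached where you expected.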

In [16]:
# Model card: https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english
# Model type: text-classification
model_checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)

sequences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "So have I!"
]

# return_tensors="pt" returns torch.Tensor objects; "tf" and "np" are also supported
inputs = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")

assert isinstance(inputs['input_ids'], torch.Tensor)
assert 'input_ids' in list(inputs.keys())
assert 'attention_mask' in list(inputs.keys())

inputs

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  2061,  2031,  1045,   999,   102,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}

In [28]:
inputs['input_ids'].numpy()

array([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662,
        12172,  2607,  2026,  2878,  2166,  1012,   102],
       [  101,  2061,  2031,  1045,   999,   102,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0]])

In [35]:
print(tokenizer.decode(inputs['input_ids'][0])) # decode the token IDs back into text
print(tokenizer.decode(inputs['input_ids'][1])) # the shorter sequence is padded with [PAD]

print(tokenizer.pad_token_id)
print(tokenizer.pad_token)

[CLS] i've been waiting for a huggingface course my whole life. [SEP]
[CLS] so have i! [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
0
[PAD]
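
To make the effect of `padding=True` concrete, here is a minimal pure-Python sketch of right-padding a batch and building the matching attention mask. The function `pad_batch` is a hypothetical stand-in, not the tokenizer's actual implementation. The token IDs are copied from the output above.

```python
def pad_batch(token_id_rows, pad_token_id=0):
    """Right-pad every row to the longest row and build an attention mask."""
    longest = max(len(row) for row in token_id_rows)
    input_ids, attention_mask = [], []
    for row in token_id_rows:
        n_pad = longest - len(row)
        input_ids.append(row + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(row) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded token IDs for the two sentences (taken from the tokenizer output above)
rows = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
     2607, 2026, 2878, 2166, 1012, 102],
    [101, 2061, 2031, 1045, 999, 102],
]

batch = pad_batch(rows)
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

The second row gains ten padding IDs (`0`, the `pad_token_id` printed above), and the mask is `0` at exactly those positions.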


In [33]:
output = model(**inputs)
output

SequenceClassifierOutput(loss=None, logits=tensor([[-1.5607,  1.6123],
        [-3.6183,  3.9137]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

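### Model output: logits to classification

Logits are the raw, unnormalized scores the model outputs before an activation function is applied. Sigmoid scores each logit independently, while softmax normalizes across the classes so that each row sums to 1.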

In [44]:
torch.sigmoid(output.logits)

tensor([[0.1735, 0.8337],
        [0.0261, 0.9804]], grad_fn=<SigmoidBackward0>)

In [64]:
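# Sigmoid scores each logit independently; thresholding at 0.5 gives a per-class boolean mask
mask_of_probabilities = torch.sigmoid(output.logits) > 0.5

# For single-label classification, softmax over the class dimension gives the class probabilities
torch.softmax(output.logits, dim=1)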

tensor([[4.0195e-02, 9.5980e-01],
        [5.3534e-04, 9.9946e-01]], grad_fn=<SoftmaxBackward0>)
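
The difference between the two activations can be checked by hand. This is a minimal pure-Python sketch using the logits printed above. The helper functions `softmax` and `sigmoid` are written here for illustration and are not the PyTorch implementations.

```python
import math


def softmax(logits):
    """Normalize a list of logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash a single logit into (0, 1), independently of other logits."""
    return 1.0 / (1.0 + math.exp(-x))


# Logits copied from the model output above (one row per sentence)
logits = [[-1.5607, 1.6123], [-3.6183, 3.9137]]

for row in logits:
    soft = softmax(row)
    sig = [sigmoid(x) for x in row]
    print("softmax:", [round(p, 4) for p in soft], "sum:", round(sum(soft), 4))
    print("sigmoid:", [round(p, 4) for p in sig], "sum:", round(sum(sig), 4))
```

With exactly two classes, the softmax probability of class 1 equals the sigmoid of the logit difference, `sigmoid(logit_1 - logit_0)`. This is why the per-logit sigmoid values differ from the softmax values.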

In [39]:
# The softmax above puts the highest probability at index 1 for both sentences, which label2id maps to POSITIVE
model.config.label2id

{'NEGATIVE': 0, 'POSITIVE': 1}
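
The last step, going from probabilities to a named label, can be sketched in plain Python. The function `to_predictions` is a hypothetical helper, and the probabilities are copied from the softmax output above. The result has the same label/score shape that a text-classification pipeline returns.

```python
# Mapping copied from model.config.label2id above, inverted to look up by index
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
id2label = {idx: label for label, idx in label2id.items()}

# Softmax probabilities copied from the notebook output (one row per sentence)
probabilities = [
    [4.0195e-02, 9.5980e-01],
    [5.3534e-04, 9.9946e-01],
]


def to_predictions(prob_rows):
    """Pick the most probable class per row and attach its label."""
    predictions = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        predictions.append({"label": id2label[best], "score": row[best]})
    return predictions


predictions = to_predictions(probabilities)
print(predictions)
```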

## Pipeline object

In [36]:
sentiment_classification_pipeline = pipeline(
    task="sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    # Text-generation options such as temperature, do_sample, repetition_penalty,
    # return_full_text and max_new_tokens do not apply to a classification task.
)

sentiment_classification_pipeline(sequences)

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'POSITIVE', 'score': 0.9994646906852722}]