# Run a Large Language Model using the [HuggingFace `Transformers`](https://huggingface.co/docs/transformers/index) API.  

The cells below are good defaults for development.  

The `autoreload` lines reload libraries on the fly whenever their source changes. This works well with the editable install we created via `pip install -e .`  
It means we can edit the source code directly and see the change reflected live in the notebook.  

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Introduction

Imagine we have a list of product reviews from our users, and we want to find out whether those reviews are good or bad. Going through and checking each one manually would take a lot of effort. With an LLM, we can automatically get a label for a given product review.

How would this be useful? We could use it to find the more negative reviews to see where our product needs improving. Or, we can look at the more positive ones to see what we're doing right.  

The broader task in NLP of figuring out a statement's tone is called `Sentiment Analysis`.

## First, a Pipeline

A HuggingFace model is based on 3 key pieces: 
1. Config file.  
2. Preprocessor file.   
3. Model file.   

The HuggingFace API gives us a way of automatically using these pieces directly: the `pipeline`.  

Let's get right to it and create a Sentiment Analysis `pipeline`.

In [None]:
# load in the pipeline object from huggingface
from transformers import pipeline

# create the sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading (…)lve/main/config.json: 100%|██████████| 629/629 [00:00<00:00, 1.06MB/s]
Downloading model.safetensors: 100%|██████████| 268M/268M [00:34<00:00, 7.76MB/s] 
Downloading (…)okenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 432kB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 9.00MB/s]


We can see in the output message above that HuggingFace automatically picked a reasonable default model for us, since we didn't specify one. Specifically, it chose a [distilbert model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).  
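The log message also warns against relying on the default model in production. Below is a minimal sketch of pinning the model explicitly, reusing the model name and revision printed in the log. The `PINNED_PIPELINE` dictionary and the `build_classifier` helper are names introduced here for illustration only.

```python
# Model name and revision copied from the log message in the output.
PINNED_PIPELINE = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported inside the helper so that defining it does not load the library.
    from transformers import pipeline

    return pipeline(**PINNED_PIPELINE)
```

Calling `build_classifier()` should give the same kind of classifier as before, without the "No model was supplied" message.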

We will learn more about what exactly `distilbert` is and how it works later on. For now, think of it as a useful NLP genie who can tell us how it feels about a given sentence. 

In [None]:
# example from the HuggingFace tutorial
classifier("We are very happy to show you the 🤗 Transformers library.")

[{'label': 'POSITIVE', 'score': 0.9997795224189758}]

In [None]:
# passing in several sentences at once, inside a python list
results = classifier([
    "We are very happy to show you the 🤗 Transformers library.",
    "We hope you don't hate it.",
    "I love Fractal! I'm so glad it's not a cult!", 
])

# print the label and score of each result
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
label: POSITIVE, with score: 0.999
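To connect this back to the product-review idea, here is a small sketch of ranking texts from most negative to most positive. It reuses the sentences and rounded scores printed above, so it runs without the model. The `signed_score` helper is a name made up for this example.

```python
# Sentences and rounded classifier output copied from the cell above.
reviews = [
    "We are very happy to show you the 🤗 Transformers library.",
    "We hope you don't hate it.",
    "I love Fractal! I'm so glad it's not a cult!",
]
results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.5309},
    {"label": "POSITIVE", "score": 0.999},
]


def signed_score(result):
    """Map a result onto [-1, 1]: NEGATIVE labels get a negative score."""
    sign = 1 if result["label"] == "POSITIVE" else -1
    return sign * result["score"]


# Sort from most negative to most positive.
ranked = sorted(zip(reviews, results), key=lambda pair: signed_score(pair[1]))

for review, result in ranked:
    print(f"{signed_score(result):+.4f}  {review}")
```

The reviews at the top of this ranking are the ones to read first when looking for problems with a product.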


# Inspecting the `classifier`, notebook style.

What is the `classifier`, exactly?

In [None]:
classifier

<transformers.pipelines.text_classification.TextClassificationPipeline>

In [None]:
## showing the lookup's auto-complete
# classifier.

In [None]:
## viewing all of a class' methods and properties
dir(classifier)

['__abstractmethods__',
 '__call__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_batch_size',
 '_ensure_tensor_on_device',
 '_forward',
 '_forward_params',
 '_num_workers',
 '_postprocess_params',
 '_preprocess_params',
 '_sanitize_parameters',
 'binary_output',
 'call_count',
 'check_model_type',
 'default_input_names',
 'device',
 'device_placement',
 'ensure_tensor_on_device',
 'feature_extractor',
 'forward',
 'framework',
 'function_to_apply',
 'get_inference_context',
 'get_iterator',
 'image_processor',
 'iterate',
 'model',
 'modelcard',
 'postprocess',
 'predict',
 'preprocess',
 'return_all_scores',
 'run_mul
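Many of the names in a `dir()` listing are dunder or private attributes. A short sketch of keeping only the public names is shown below; it uses a tiny stand-in class (`Example`, invented for this demonstration) so that it runs without loading a model.

```python
class Example:
    """Tiny stand-in class, used only to demonstrate the filtering."""

    def __init__(self):
        self.model = "some model"
        self._batch_size = 1

    def predict(self, text):
        return text


def public_names(obj):
    """Return the attribute names of `obj` that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names(Example()))  # ['model', 'predict']
```

The same helper can be called as `public_names(classifier)` to get a shorter list to explore.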

Jupyter notebooks have powerful ways of inspecting and analyzing the code, as we're running it. 

In [None]:
## refresher
classifier

<transformers.pipelines.text_classification.TextClassificationPipeline>

In [None]:
## the power of asking questions
# classifier? # help(classifier)

Signature:      classifier(*args, **kwargs)
Type:           TextClassificationPipeline
String form:    <transformers.pipelines.text_classification.TextClassificationPipeline object>
File:           ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Docstring:
Text classification pipeline using any `ModelForSequenceClassification`. See the [sequence classification
examples](../task_summary#sequence-classification) for more information.

Example:

```python
>>> from transformers import pipeline

>>> classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
>>> classifier("This movie is disgustingly good !")
[{'label': 'POSITIVE', 'score': 1.0}]

>>> classifier("Director tried too much.")
[{'label': 'NEGATIVE', 'score': 0.996}]
```

Learn more about the ba

In [None]:
## again, with feeling
# classifier?? # ?? shows you the source code of the object

Signature:      classifier(*args, **kwargs)
Type:           TextClassificationPipeline
String form:    <transformers.pipelines.text_classification.TextClassificationPipeline object>
File:           ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Source:
@add_end_docstrings(
    PIPELINE_INIT_ARGS,
    r"""
        return_all_scores (`bool`, *optional*, defaults to `False`):
            Whether to return all prediction scores or just the one of the predicted class.
        function_to_apply (`str`, *optional*, defaults to `"default"`):
            The function to apply to the model outputs in order to retrieve the scores. Accep

In [None]:
classifier.forward # what actually runs the inputs

<bound method Pipeline.forward of <transformers.pipelines.text_classification.TextClassificationPipeline object>>

In [None]:
classifier.forward??

Signature: classifier.forward(model_inputs, **forward_params)
Docstring: <no docstring>
Source:
    def forward(self, model_inputs, **forward_params):
        with self.device_placement():
            if self.framework == "tf":
                model_inputs["training"] = False
                model_outputs = self._forward(model_inputs, **forward_params

In [None]:
classifier._forward??

Signature: classifier._forward(model_inputs)
Docstring:
_forward will receive the prepared dictionary from `preprocess` and run it on the model. This method might
involve the GPU or the CPU and should be agnostic to it. Isolating this function is the reason for `preprocess`
and `postprocess` to exist, so that the hot path, this method generally can run as fast as possible.

It is not meant to be called directly, `forward` is preferred. It is basically the same but contains additional
code surrounding `_forward` making sure tensors and models are on the same device, disabling the training part
of the code (leading to faster inference).
Source:
    def _forward(self, model_inputs):
        # `XXXForSequenceClassification` models should not use `use_cache=True` even

In [None]:
classifier.model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

> Summary: `?` and `??` are very powerful and useful inspection tools for Jupyter notebooks. 
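Outside of a notebook, the standard library's `inspect` module offers a rough equivalent. The sketch below inspects `json.dumps`, chosen only because it ships with Python; the same calls can be tried on `classifier.forward`.

```python
import inspect
import json

# Roughly what `json.dumps?` shows: the signature and the docstring.
print(inspect.signature(json.dumps))
print(inspect.getdoc(json.dumps).splitlines()[0])

# Roughly what `json.dumps??` shows: the source code.
source = inspect.getsource(json.dumps)
print(source[:300])
```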

# Peeking inside the `pipeline`

We can see the pipeline loaded the model. 

It then handled the three key pieces (Config, Preprocessor, Model) under the hood. What exactly is `pipeline` doing?  

Let's build our own pipeline from scratch, stepping one small level below the abstraction. To do this, we will create each of the key pieces manually.  

### Config class

In [None]:
from transformers import DistilBertConfig

### Preprocessor class

In [None]:
from transformers import DistilBertTokenizer

### Model class

In [None]:
# from transformers import DistilBertModel
from transformers import DistilBertForSequenceClassification

Now we can use the model's name from up above and build each piece ourselves. HuggingFace uses the `from_pretrained` method to make this quick and easy. 

In [None]:
# the model we are using
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'

In [None]:
# creating the config
config = DistilBertConfig.from_pretrained(model_name)

# creating the preprocessor 
tokenizer = DistilBertTokenizer.from_pretrained(model_name)

# creating the model
model = DistilBertForSequenceClassification.from_pretrained(model_name)

Next we build a simple pipeline with these manual pieces.  

In [None]:
def preprocess(text: str):
    """
    Sends `text` through the LLM's tokenizer.  
    The tokenizer turns words and characters into numeric token IDs for the LLM.
    """
    tokenized_inputs = tokenizer(text, return_tensors='pt')
    return tokenized_inputs


In [None]:
text = "I love Fractal! I'm so glad it's not a cult!"
preprocess(text)

{'input_ids': tensor([[  101,  1045,  2293, 25312, 25572,  2140,   999,  1045,  1005,  1049,
          2061,  5580,  2009,  1005,  1055,  2025,  1037,  8754,   999,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
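The `attention_mask` is all ones here because a single sentence needs no padding. When sentences of different lengths are batched together, the shorter ones are padded and the mask marks which positions hold real tokens. The sketch below is a hand-rolled illustration of that idea, not the tokenizer's actual implementation. The pad ID of 0 is an assumption based on `padding_idx=0` in the model printout, and `short_ids` is a hypothetical second sequence built from IDs that appear in the output above.

```python
# Token IDs for the sentence above, copied from the tokenizer output.
long_ids = [101, 1045, 2293, 25312, 25572, 2140, 999, 1045, 1005, 1049,
            2061, 5580, 2009, 1005, 1055, 2025, 1037, 8754, 999, 102]
# A shorter, hypothetical sequence that reuses IDs from the same output.
short_ids = [101, 1045, 2293, 2009, 999, 102]

PAD_ID = 0  # assumption: matches `padding_idx=0` in the model printout


def pad_batch(sequences, pad_id=PAD_ID):
    """Pad every sequence to the longest one and build the attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([long_ids, short_ids])
print(batch["attention_mask"][1])
```

In practice the tokenizer does this for us when called on a list of texts with `padding=True`.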

In [None]:
def forward(text):
    """
    First we preprocess the `text` into tokens.
    Then we send the `token_inputs` to the model.
    """
    token_inputs = preprocess(text)
    outputs = model(**token_inputs)
    return outputs

In [None]:
outputs = forward(text); outputs

SequenceClassifierOutput(loss=None, logits=tensor([[-3.3825,  3.5515]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [None]:
config.id2label[1]

'POSITIVE'

In [None]:
outputs.logits

tensor([[-3.3825,  3.5515]], grad_fn=<AddmmBackward0>)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [None]:
f"{count_parameters(model):,}"

'66,955,010'
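As a sanity check, this number can be rebuilt by hand from the layer sizes in the model printout. The printout above is cut off, so three things in this sketch are assumptions: the feed-forward width of 3072, a second LayerNorm at the end of each block, and a classification head made of a 768-to-768 layer followed by a 768-to-2 layer.

```python
def linear(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


vocab, max_pos, dim = 30522, 512, 768  # from the model printout
n_layers, n_labels = 6, 2              # six blocks, two output logits
ffn_dim = 3072                         # assumption: not visible in the truncated printout

layer_norm = 2 * dim  # one scale and one shift value per feature
embeddings = vocab * dim + max_pos * dim + layer_norm
attention = 4 * linear(dim, dim)  # q_lin, k_lin, v_lin, out_lin
ffn = linear(dim, ffn_dim) + linear(ffn_dim, dim)
block = attention + layer_norm + ffn + layer_norm  # second LayerNorm is assumed
head = linear(dim, dim) + linear(dim, n_labels)    # assumed classification head

total = embeddings + n_layers * block + head
print(f"{total:,}")  # 66,955,010
```

Under these assumptions the hand count matches `count_parameters(model)`, with roughly a third of the parameters sitting in the word embeddings.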

In [None]:
def process_outputs(outs):
    """
    Here is where HuggingFace does the most for us via `pipeline`.  
    """
    # grab the raw "scores" from the model for the Negative and Positive labels
    logits = outs.logits

    # find the strongest label score, aka the model's decision
    pred_idx = logits.argmax(1).item()

    # use the `config` object to find the class label
    pred_label = config.id2label[pred_idx]  

    # calculate the human-readable number for the score
    pred_score = logits.softmax(-1)[:, pred_idx].item()

    return {
        'label': pred_label,
        'score': pred_score, 
    }
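To make the softmax step concrete, here is the same calculation in plain Python, using the logits printed earlier for the Fractal sentence. It assumes index 0 is the negative label, which is consistent with `config.id2label[1]` being `'POSITIVE'`.

```python
import math

# Logits copied from the model output above: [NEGATIVE, POSITIVE].
logits = [-3.3825, 3.5515]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
pred_idx = probs.index(max(probs))  # plays the role of argmax
print(pred_idx, round(probs[pred_idx], 4))  # 1 0.999
```

The result lines up with the score of 0.999 that the original `pipeline` reported for this sentence.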


In [None]:
def simple_pipeline(text):
    model_outs = forward(text)
    preds = process_outputs(model_outs)
    return preds

Let's call this pipeline on the same example text from before.

In [None]:
text = "We are very happy to show you the 🤗 Transformers library."

In [None]:
simple_pipeline(text)

{'label': 'POSITIVE', 'score': 0.9997795224189758}

# More HF magic

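The `Auto` classes choose the matching config, tokenizer, and model class from the model name, so we no longer need to import the DistilBERT-specific classes ourselves.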
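Conceptually, an `Auto` class reads the model's config and uses it to pick a concrete class. The sketch below is a heavily simplified illustration of that lookup idea, not the library's actual code. The `MODEL_CLASS_BY_TYPE` registry and the `resolve_model_class` function are hypothetical names.

```python
# Hypothetical, heavily simplified registry: model type -> class name.
MODEL_CLASS_BY_TYPE = {
    "distilbert": "DistilBertForSequenceClassification",
    "roberta": "RobertaForSequenceClassification",
}


def resolve_model_class(config: dict) -> str:
    """Pick a class name from the `model_type` field of a config dictionary."""
    model_type = config["model_type"]
    if model_type not in MODEL_CLASS_BY_TYPE:
        raise ValueError(f"Unknown model type: {model_type!r}")
    return MODEL_CLASS_BY_TYPE[model_type]


print(resolve_model_class({"model_type": "distilbert"}))
```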

In [None]:
from transformers import AutoConfig
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

In [None]:
model_name = "finiteautomata/bertweet-base-sentiment-analysis"

In [None]:
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

Downloading (…)lve/main/config.json: 100%|██████████| 949/949 [00:00<00:00, 3.91MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 338/338 [00:00<00:00, 1.30MB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 843k/843k [00:00<00:00, 5.35MB/s]
Downloading (…)solve/main/bpe.codes: 100%|██████████| 1.08M/1.08M [00:00<00:00, 17.1MB/s]
Downloading (…)in/added_tokens.json: 100%|██████████| 22.0/22.0 [00:00<00:00, 103kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 167/167 [00:00<00:00, 717kB/s]
emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0
Downloading pytorch_model.bin: 100%|██████████| 540M/540M [00:30<00:00, 17.6MB/s] 


In [None]:
simple_pipeline(text)

{'label': 'POS', 'score': 0.9929807186126709}
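Notice that this model reports `'POS'` where the first one reported `'POSITIVE'`: label names come from each model's own config. When comparing models, it can help to map them onto one vocabulary. The sketch below does this with a small alias table; the `NEG` and `NEU` entries are assumptions about this model's other labels, and `normalize_label` is a name made up for the example.

```python
# Assumed aliases between the two models' label names.
LABEL_ALIASES = {"POS": "POSITIVE", "NEG": "NEGATIVE", "NEU": "NEUTRAL"}


def normalize_label(prediction):
    """Return a copy of `prediction` with its label mapped to the long form."""
    label = prediction["label"]
    return {**prediction, "label": LABEL_ALIASES.get(label, label)}


# Predictions copied from the two pipeline outputs in this notebook.
distilbert_pred = {"label": "POSITIVE", "score": 0.9997795224189758}
bertweet_pred = {"label": "POS", "score": 0.9929807186126709}

print(normalize_label(distilbert_pred)["label"])  # POSITIVE
print(normalize_label(bertweet_pred)["label"])    # POSITIVE
```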