# Transformer-Based Models

Jesus Felix B. Valenzuela, Ph.D.    
17 January 2025

## Preliminaries

In [1]:
# Uncomment and run these to install HuggingFace in case you haven't yet
#!pip install transformers datasets huggingface_hub[cli]
# Install this to simplify usage of BERT derivatives when doing text embeddings
#!pip install sentence-transformers 
# Install this for T5Tokenizer, which relies on SentencePiece (BartTokenizer does not need it)
#!pip install sentencepiece
# Install this to use Flan-T5
#!pip install accelerate

In [2]:
import torch

## Using BERT

In [17]:
from transformers import AutoTokenizer, BertModel
modelname = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(modelname)
bertmodel = BertModel.from_pretrained(modelname)

In [18]:
input_text = ["Natural Language Processing is a very wide field."]
tokens = tokenizer(input_text, return_tensors="pt")
print(tokens)

{'input_ids': tensor([[ 101, 3019, 2653, 6364, 2003, 1037, 2200, 2898, 2492, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


In [19]:
tokenizer.tokenize("Natural Language Processing is a very wide field.")

['natural', 'language', 'processing', 'is', 'a', 'very', 'wide', 'field', '.']
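Note that the tokenizer call returned 11 IDs, while `tokenize` lists only 9 tokens. The difference is the two special tokens BERT places around every sequence: `[CLS]` (ID 101) at the start and `[SEP]` (ID 102) at the end. A minimal sketch that lines the two outputs up, using only the values printed above (the variable names are illustrative, and no model is needed to run it):

```python
# Values copied from the two cell outputs above.
input_ids = [101, 3019, 2653, 6364, 2003, 1037, 2200, 2898, 2492, 1012, 102]
tokens = ['natural', 'language', 'processing', 'is', 'a', 'very', 'wide', 'field', '.']

# BERT wraps each sequence in [CLS] ... [SEP]; tokenize() does not show them.
tokens_with_special = ["[CLS]"] + tokens + ["[SEP]"]
assert len(tokens_with_special) == len(input_ids)

# Print each ID next to the token it stands for.
for token_id, token in zip(input_ids, tokens_with_special):
    print(f"{token_id:>5}  {token}")
```

With the real tokenizer loaded, `tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].tolist())` should give the same alignment.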

In [20]:
bertmodel.eval()  # Inference mode: disables dropout so the outputs are deterministic
with torch.no_grad():  # Skip gradient tracking since no backprop is needed (optional, saves memory)
    outputs = bertmodel(**tokens)
outputs

BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[-0.1693, -0.2122, -0.3980,  ..., -0.2662, -0.2869,  0.7585],
         [-0.1032, -0.0960, -0.8725,  ..., -0.4872,  0.2756,  0.5792],
         [-0.7408,  0.2634,  0.2827,  ..., -0.7560, -0.5950,  0.1122],
         ...,
         [ 0.2949,  0.2133,  0.2558,  ..., -0.5548, -0.0925,  0.2125],
         [ 0.7430, -0.1533, -0.7244,  ..., -0.0267, -0.7870, -0.0513],
         [-0.2513, -0.2722, -0.2822,  ...,  0.3031, -0.7245, -0.0602]]]), pooler_output=tensor([[-0.9398, -0.6118, -0.9636,  0.8349,  0.8394, -0.5285,  0.8046,  0.4773,
         -0.8753, -1.0000, -0.6853,  0.9650,  0.9809,  0.5288,  0.8542, -0.8130,
         -0.5902, -0.6530,  0.4613, -0.1233,  0.6875,  1.0000, -0.3017,  0.5543,
          0.5611,  0.9905, -0.8589,  0.9317,  0.9538,  0.7726, -0.7375,  0.5178,
         -0.9925, -0.3157, -0.9516, -0.9908,  0.7284, -0.7133, -0.0892, -0.3199,
         -0.8978,  0.5811,  1.0000, -0.6053,  0.6963, -0.5386, -1.0000,  0.

In [21]:
outputs.last_hidden_state.shape

torch.Size([1, 11, 768])
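The shape reads as (batch size, number of tokens, hidden size): one sentence, 11 tokens including `[CLS]` and `[SEP]`, and a 768-dimensional vector per token. To get a single vector per sentence, one common approach is to average the token vectors while ignoring padded positions. The sketch below shows this masked mean pooling in plain Python on toy numbers. `mean_pool` is a hypothetical helper, not part of `transformers`; real code would apply the same arithmetic to the tensors directly.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:  # skip padding positions (mask == 0)
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in totals]

# Toy example: 3 token positions, 2 dimensions, last position is padding.
toy_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
sentence_vector = mean_pool(toy_vectors, toy_mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padded third position does not affect the result, which is the point of passing the attention mask along.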

## Using BERT derivatives with `sentence-transformers`

The `sentence-transformers` library wraps tokenization, the forward pass, and pooling in a single `encode` call, which simplifies the use of BERT-derived models when we need them to generate text embeddings.

In [22]:
from sentence_transformers import SentenceTransformer


In [23]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
outputs2 = model.encode(input_text)
outputs2

array([[ 4.60704230e-02, -5.71745783e-02,  2.92225182e-02,
        -3.62470038e-02,  9.62645281e-03, -2.18184409e-03,
        -4.17182669e-02,  3.77881713e-02, -8.20113812e-03,
         3.52722667e-02, -3.65186408e-02,  2.49261223e-02,
         1.84047557e-02,  1.01558911e-02,  8.39706510e-02,
         6.14678897e-02, -3.13437469e-02, -4.06650752e-02,
        -1.20070934e-01, -1.11588858e-01,  2.77423579e-02,
         8.83724242e-02, -3.47970128e-02, -4.72619645e-02,
         4.66794195e-03,  6.27674013e-02, -1.79232489e-02,
        -1.11432225e-01,  6.68309256e-02, -1.01601956e-02,
        -1.00937234e-02,  6.43252060e-02,  4.29243594e-02,
         1.00725211e-01, -2.45182272e-02,  4.68435474e-02,
        -2.21674461e-02,  4.87085208e-02,  3.31762955e-02,
         7.10165454e-03, -7.10705668e-02, -8.55962373e-03,
        -2.11429363e-03, -2.88667176e-02,  1.13575406e-01,
        -1.17396256e-02, -1.06823646e-01, -3.02070621e-02,
        -3.97232659e-02,  1.96568333e-02, -1.31978035e-0

In [24]:
outputs2.shape

(1, 384)

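**IMPORTANT.** Do **NOT** mix embeddings computed with different models! Each model maps text into its own vector space (here, 768 dimensions per token for `bert-base-uncased` versus 384 per sentence for `all-MiniLM-L6-v2`), so distances or similarities between vectors from different models are meaningless, even when the dimensions happen to match.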
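To see what comparing embeddings looks like in practice, here is a minimal cosine-similarity sketch in plain Python. The helper name `cosine_similarity` and the toy vectors are illustrative assumptions. The length check catches only the most obvious mix-up (for example, a 768-dimensional vector against a 384-dimensional one); vectors of equal length from different models are still not comparable.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embeddings from the same model.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In the notebook, the same function could be applied to two rows returned by one `model.encode(...)` call, since those rows come from the same model.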

## Using BART

In [25]:
from transformers import BartTokenizer, BartModel

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
bartmodel = BartModel.from_pretrained('facebook/bart-large')

inputs = tokenizer(input_text, return_tensors="pt")
bartmodel.eval()  # Inference mode: disables dropout so the outputs are deterministic
with torch.no_grad():  # Skip gradient tracking since no backprop is needed (optional, saves memory)
    outputs = bartmodel(**inputs)
outputs.last_hidden_state

tensor([[[-1.5298e-02,  7.2315e-01, -1.3559e+00,  ...,  3.0942e-01,
           1.5447e-01,  3.9448e-01],
         [-1.5299e-02,  7.2315e-01, -1.3559e+00,  ...,  3.0942e-01,
           1.5447e-01,  3.9448e-01],
         [ 1.0334e-01,  1.2594e+00, -3.9142e+00,  ..., -5.4476e-01,
          -1.0373e+00,  2.1848e+00],
         ...,
         [ 2.8136e-01,  7.0862e+00, -5.0988e+00,  ...,  6.2025e-01,
           9.3308e-01, -8.3185e-01],
         [ 3.6580e-01,  4.7318e+00, -1.5565e+00,  ...,  4.6762e-01,
          -1.6339e-01,  8.5705e-01],
         [ 2.6700e-01,  1.8109e+00, -7.2498e-01,  ...,  3.4869e-03,
          -1.0636e-01,  4.0109e-01]]])

## Using T5

In [33]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
prompt = "Translate English to German: " 
input_ids = tokenizer(prompt + input_text[0], return_tensors="pt").input_ids

model.eval()  # Inference mode: disables dropout
with torch.no_grad():  # Skip gradient tracking since no backprop is needed (optional)
    outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Die natürliche Sprachenverarbeitung ist ein sehr weites Feld.


## Using Flan-T5

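`Flan-T5` is an instruction-tuned variant of T5: the T5 checkpoints were further fine-tuned on a large collection of tasks phrased as natural-language instructions.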

In [35]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")
prompt = "Translate English to Spanish: " 
input_ids = tokenizer(prompt + input_text[0], return_tensors="pt").input_ids

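model.eval()  # Inference mode: disables dropout
with torch.no_grad():  # Skip gradient tracking since no backprop is needed (optional)
    # generate() stops at a default length limit unless max_new_tokens is set,
    # which is likely why the output below is cut off mid-word
    outputs = model.generate(input_ids)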
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

El procesamiento natural de idiomas es un campo muy ampli


## Using `transformers` Pipelines

In [38]:
from transformers import pipeline

In [36]:
# Sentiment analysis
classifier = pipeline("sentiment-analysis")
classifier(input_text[0])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9986552000045776}]
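The log message above recommends naming the model and revision explicitly rather than relying on the task default. A minimal sketch of doing so, reusing the model name and revision that the message itself reports; the constant and function names are hypothetical, and the import is placed inside the function only so that the definition can be read and loaded without triggering a download:

```python
# Model name and revision copied from the log message above.
SENTIMENT_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
SENTIMENT_REVISION = "714eb0f"

def build_sentiment_classifier():
    """Create a sentiment pipeline pinned to an explicit model and revision."""
    # Imported here so that defining the function does not need the library yet.
    from transformers import pipeline
    return pipeline(
        "sentiment-analysis",
        model=SENTIMENT_MODEL,
        revision=SENTIMENT_REVISION,
    )
```

Calling `build_sentiment_classifier()(input_text[0])` should then behave like the cell above, without the "No model was supplied" warning.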

In [37]:
# Summarization
summarizer = pipeline("summarization")
to_summarize = "ChatGPT is credited with starting the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence (AI).[3] By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users and contributing to the growth of OpenAI's current valuation of $86 billion.[4][5] ChatGPT's release spurred the release of competing products, including Gemini, Claude, Llama, Ernie, and Grok.[6] Microsoft launched Copilot, initially based on OpenAI's GPT-4. In June 2024, a partnership between Apple Inc. and OpenAI was announced in which ChatGPT is integrated into the Apple Intelligence feature of Apple operating systems.[7] Some observers raised concern about the potential of ChatGPT and similar programs to displace or atrophy human intelligence, enable plagiarism, or fuel misinformation.[8][9]"
summarizer(to_summarize)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'summary_text': " ChatGPT is credited with starting the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence . By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users and contributing to the growth of OpenAI's current valuation of $86 billion ."}]
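The passage passed to the summarizer still carries bracketed citation markers such as `[3]` and `[4][5]`. One optional cleanup step, sketched here under the assumption that the markers are always digits in square brackets, is to strip them with a regular expression before summarizing. `strip_citation_markers` is a hypothetical helper, and a summary produced from the cleaned text may differ from the one shown above.

```python
import re

def strip_citation_markers(text):
    """Remove bracketed numeric markers such as [3] or [4][5]."""
    return re.sub(r"\[\d+\]", "", text)

# Short excerpt of the passage used above.
sample = "attention to the field of artificial intelligence (AI).[3] By January 2023, it had become"
print(strip_citation_markers(sample))
# attention to the field of artificial intelligence (AI). By January 2023, it had become
```

In the notebook this would be used as `summarizer(strip_citation_markers(to_summarize))`.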