# Pre-Trained Models with Pipelines (Part I)
In this tutorial, we show how to run inference with pre-trained models from the *transformers* library in a very convenient way: using *pipelines*.

Pipelines are available for many tasks: feature extraction, text classification, token classification (e.g. NER), question answering, fill-mask, summarization, text generation, etc.

Have fun!

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.3-py3-none-any.whl (4.7 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.3


# 1. Feature Extraction
The `feature-extraction` pipeline is a convenient way to get contextual token embeddings. However, it only returns the output of the last hidden layer. If you want a different layer, you have to take the manual approach from Tutorial 1.
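One way to reach the other layers, which may differ from the approach in Tutorial 1, is to ask the model for all hidden states with `output_hidden_states=True`. The sketch below builds a tiny, randomly initialized BERT from `BertConfig` so it runs offline; the config sizes and token ids are arbitrary assumptions. For real embeddings you would load `AutoModel.from_pretrained('bert-base-uncased', output_hidden_states=True)` and pass tokenizer output instead.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random BERT (assumed sizes) so the sketch needs no download.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)
model.eval()

input_ids = torch.tensor([[1, 5, 7, 2]])  # hypothetical token ids
with torch.no_grad():
    outputs = model(input_ids=input_ids, output_hidden_states=True)

# hidden_states = (embedding output, layer 1 output, ..., last layer output)
print(len(outputs.hidden_states))           # num_hidden_layers + 1
second_to_last = outputs.hidden_states[-2]  # a layer other than the last one
print(second_to_last.shape)                 # (batch, seq_len, hidden_size)
```

The last entry of `hidden_states` is the same tensor the feature-extraction pipeline gives you, so `hidden_states[-2]`, `hidden_states[-4]`, etc. are the layers the pipeline does not expose.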

In [1]:
import numpy as np
from transformers import AutoTokenizer, AutoModel, pipeline

model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
fe = pipeline('feature-extraction', model=model, tokenizer=tokenizer)


In [2]:
features = fe('Do you like cookies?')
features = np.squeeze(features)
print(features.shape)

(7, 768)


Remember our earlier exercise measuring similarity between sentences? Let's try it here.

In [3]:
import torch
from scipy.spatial.distance import cosine

In [4]:
sents = ["What's the time now in Singapore?",
         "What is the weather in Seattle today?",
         "Apple is looking at buying the U.K. startup for $1 billion."]

vec0 = torch.tensor(np.squeeze(fe(sents[0])))
sent0 = torch.mean(vec0, dim=0)
print(sent0.size())

vec1 = torch.tensor(np.squeeze(fe(sents[1])))
sent1 = torch.mean(vec1, dim=0)

vec2 = torch.tensor(np.squeeze(fe(sents[2])))
sent2 = torch.mean(vec2, dim=0)

torch.Size([768])


In [5]:
sim_01 = 1 - cosine(sent0, sent1)

sim_02 = 1 - cosine(sent0, sent2)

print('Vector similarity for example 0 & 1:  %.2f' % sim_01)
print('Vector similarity for example 0 & 2:  %.2f' % sim_02)

Vector similarity for example 0 & 1:  0.78
Vector similarity for example 0 & 2:  0.55
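The sentence vectors above average every token vector, including the ones for `[CLS]` and `[SEP]`. The NumPy sketch below, on hypothetical toy vectors, shows mean pooling with an optional mask that drops such positions, plus the cosine similarity that `1 - cosine(...)` from SciPy computes. The helper names `mean_pool` and `cosine_similarity` are illustrative, not part of *transformers*.

```python
import numpy as np

def mean_pool(token_vecs, keep_mask=None):
    """Average token vectors (seq_len, hidden_size) into one sentence vector.
    Rows where keep_mask is False are left out of the average."""
    token_vecs = np.asarray(token_vecs, dtype=float)
    if keep_mask is None:
        return token_vecs.mean(axis=0)
    return token_vecs[np.asarray(keep_mask, dtype=bool)].mean(axis=0)

def cosine_similarity(a, b):
    """Same quantity as 1 - scipy.spatial.distance.cosine(a, b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "token embeddings": 4 tokens, hidden size 3.
# Rows 0 and 3 stand in for [CLS] and [SEP].
toy = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 2.0],
                [1.0, 0.0, 0.0]])
all_tokens = mean_pool(toy)                                  # includes [CLS]/[SEP]
content_only = mean_pool(toy, [False, True, True, False])    # excludes them
print(all_tokens, content_only)
print(round(cosine_similarity(all_tokens, content_only), 4))
```

With real text, a mask could come from `tokenizer(text, return_special_tokens_mask=True)["special_tokens_mask"]`, which marks special tokens with 1, so `keep_mask = np.array(mask) == 0`. Whether excluding them improves similarity scores is something to check empirically.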


# 2. Sentiment Classification
Initialize the pipeline with the task name "sentiment-analysis". When no model is specified, it downloads a default model fine-tuned for sentiment classification, `distilbert-base-uncased-finetuned-sst-2-english`. This model uses the DistilBERT architecture and has been fine-tuned on SST-2 (the Stanford Sentiment Treebank) for sentiment analysis.

Each call returns a list with one dict per input; the dict holds the predicted `label` and its `score`, the probability the model assigns to that label.
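To see where `label` and `score` come from, the NumPy sketch below mimics that post-processing for a two-class classifier: softmax over the logits, pick the most probable class, and map its index to a name. The logits are hypothetical, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption; the real one can be read from `sa.model.config.id2label`.

```python
import numpy as np

def postprocess(logits, id2label):
    """Turn one row of classification logits into a pipeline-style dict."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()              # softmax
    best = int(probs.argmax())           # index of the most probable class
    return {"label": id2label[best], "score": float(probs[best])}

# Assumed label mapping for the SST-2 model
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits for a clearly positive sentence
result = postprocess([-3.0, 4.0], id2label)
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
```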

In [6]:
#using fine-tuned models
from transformers import pipeline

#for sentiment classification
sa = pipeline("sentiment-analysis")

result = sa("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = sa("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = sa("This story is terribly good")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = sa("This dress is pretty ugly")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


label: NEGATIVE, with score: 0.9991
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9999
label: NEGATIVE, with score: 0.9998


# 3. Sequence Classification
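Sequence classification can also take a pair of sentences A and B and assign the pair to predefined classes, for example whether B is a paraphrase of A. Here we use a model fine-tuned on the GLUE MRPC dataset (Microsoft Research Paraphrase Corpus), calling the tokenizer and model directly instead of through a pipeline.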

In [7]:
#====sequence classification=========
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

classes = ["not paraphrase", "is paraphrase"]

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

# Should be paraphrase
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
paraphrase_classification_logits = model(**paraphrase)[0]
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]



In [8]:
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 10%
is paraphrase: 90%


In [9]:
# Should not be paraphrase
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")
not_paraphrase_classification_logits = model(**not_paraphrase)[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")


not paraphrase: 94%
is paraphrase: 6%


In [10]:
print(not_paraphrase_results)
not_paraphrase_classification_logits

[0.94038325548172, 0.05961676687002182]


tensor([[ 0.5386, -2.2197]], grad_fn=<AddmmBackward>)
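The last cell prints both the raw logits and the probabilities. The probabilities are simply the softmax of the logits, which you can check by hand from the printed values (they are rounded to four decimals, so expect agreement to roughly three or four digits). With two classes, the softmax reduces to a sigmoid of the logit difference.

```python
import numpy as np

# Logits printed above for the (sequence_0, sequence_1) pair, rounded to 4 decimals
logits = np.array([0.5386, -2.2197])

# Softmax: exponentiate, then normalize so the probabilities sum to 1
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(4))  # compare with not_paraphrase_results above

# Two-class special case: p(class 0) = sigmoid(logit_0 - logit_1)
p_not_paraphrase = 1.0 / (1.0 + np.exp(-(logits[0] - logits[1])))
print(round(float(p_not_paraphrase), 4))
```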

# 4. Question Answering (Extractive)
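Extractive question answering identifies the span of text in the `context` that best answers the given `question`. The default model is fine-tuned on SQuAD to predict the start and end positions of the answer span in the context. The pipeline returns the `answer` text, a confidence `score`, and the `start`/`end` character offsets of the answer within the context.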

In [11]:
#====Extractive question answering
from transformers import pipeline

qa = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

result = qa(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer: 'the task of extracting an answer from a text given a question', score: 0.6226, start: 34, end: 95
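Because `start` and `end` are character offsets into `context`, slicing the string recovers the answer exactly. The check below re-declares the same context (index 0 is the newline right after the opening triple quote) and uses the offsets printed above.

```python
# Same context string as in the cell above
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

# Offsets reported by the pipeline above
start, end = 34, 95
print(repr(context[start:end]))
```

This makes it easy to highlight the answer in the original text, e.g. by wrapping `context[start:end]` in markers.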


In [12]:
result = qa(question="What is a good example of a question answering dataset?", context=context)
print(result['answer'], result['score'])
result = qa(question="What do you need if you want to finetune a model?", context=context)
print(result['answer'], result['score'])

SQuAD dataset 0.5052592158317566
leverage the examples/question-answering/run_squad.py script 0.36913201212882996


If you want to use a specific model and ask many questions about the same text, you can call the tokenizer and model directly, as in the example code below.

In [13]:
#====QA with a specific model and multiple questions====
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")



In [14]:
#A first example. Note: the next cell redefines text and questions, so the loop below answers the second example.
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""

questions = [
    "How many pretrained models are available in 🤗 Transformers?",
    "What does 🤗 Transformers provide?",
    "🤗 Transformers provides interoperability between which frameworks?",
]


In [15]:
#Another example.
text = r"""
SINGAPORE: Singapore reported nine new COVID-19 cases as of noon on Thursday (Dec 3), including one locally transmitted infection.
The local case lives in a dormitory, said the Ministry of Health (MOH) in its preliminary update.
There were no new cases in the community.
The rest of the infections are imported cases, all of whom were placed on stay-home notice upon arrival in Singapore.
More details on the new cases will be provided later tonight, MOH added.
"""

questions = [
    "How many new cases are reported?",
    "How many local transmitted cases are there?",
    "Are there new cases in the community?",
    "when will details be released?"
]

In [16]:
for question in questions:
    # Encode the (question, context) pair; special tokens such as [CLS]/[SEP] are added
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")

    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    answer_start_scores = outputs['start_logits']
    answer_end_scores = outputs['end_logits']

    # Most likely start and end token positions, each chosen independently by argmax
    answer_start = torch.argmax(answer_start_scores)
    answer_end = torch.argmax(answer_end_scores) + 1  # +1 so the slice includes the end token

    # Map the predicted token span back to a string
    input_ids = inputs["input_ids"].tolist()[0]
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print(f"Question: {question}")
    print(f"Answer: {answer}")


Question: How many new cases are reported?
Answer: nine
Question: How many local transmitted cases are there?
Answer: one
Question: Are there new cases in the community?
Answer: there were no new cases
Question: when will details be released?
Answer: later tonight
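The loop above picks the start and end positions with two independent `argmax` calls, so nothing prevents the end from landing before the start, which would give an empty answer. A common remedy, roughly what the question-answering pipeline does internally, is to score every (start, end) pair with start <= end and a length cap, then keep the best pair. The NumPy sketch below uses hypothetical logits; `best_span` and `max_answer_len=30` are illustrative choices, and a fuller version would also exclude positions belonging to the question and to special tokens.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end_exclusive) maximizing start_logit + end_logit,
    subject to start <= end and end - start < max_answer_len."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    # scores[i, j] = start_logits[i] + end_logits[j]
    scores = start_logits[:, None] + end_logits[None, :]
    i, j = np.indices(scores.shape)
    valid = (j >= i) & (j - i < max_answer_len)
    scores = np.where(valid, scores, -np.inf)  # rule out invalid spans
    s, e = np.unravel_index(np.argmax(scores), scores.shape)
    return int(s), int(e) + 1  # exclusive end, matching the slice in the loop

# Hypothetical logits where independent argmax would put the end (1) before the start (3)
start_logits = [0.0, 1.0, 0.5, 3.0, 0.2]
end_logits = [0.1, 4.0, 0.3, 1.0, 2.5]
print(best_span(start_logits, end_logits))
```

Inside the loop, it could be called as `best_span(outputs['start_logits'][0].detach().numpy(), outputs['end_logits'][0].detach().numpy())` and its result used in place of `answer_start`/`answer_end`.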


# 5. Fill in the Blank ([MASK])
Masked language modeling (MLM) lets the model perform this cloze task: fill in the blank using context from both the left and the right.

In [19]:
#=======Masked Language Modelling============
from transformers import pipeline
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
from pprint import pprint

fm = pipeline("fill-mask")


No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [39]:
fm("see <mask>")

[{'score': 0.18094217777252197,
  'token': 901,
  'token_str': ' More',
  'sequence': 'see More'},
 {'score': 0.12158650904893875,
  'token': 55,
  'token_str': ' more',
  'sequence': 'see more'},
 {'score': 0.06664672493934631,
  'token': 742,
  'token_str': ']',
  'sequence': 'see]'},
 {'score': 0.05112472176551819,
  'token': 9313,
  'token_str': ' »',
  'sequence': 'see »'},
 {'score': 0.04805416986346245,
  'token': 67,
  'token_str': ' also',
  'sequence': 'see also'}]

In [20]:
pprint(fm(f"HuggingFace is creating a {fm.tokenizer.mask_token} that the community uses to solve NLP tasks."))

[{'score': 0.17927458882331848,
  'sequence': 'HuggingFace is creating a tool that the community uses to solve '
              'NLP tasks.',
  'token': 3944,
  'token_str': ' tool'},
 {'score': 0.11349401623010635,
  'sequence': 'HuggingFace is creating a framework that the community uses to '
              'solve NLP tasks.',
  'token': 7208,
  'token_str': ' framework'},
 {'score': 0.05243555083870888,
  'sequence': 'HuggingFace is creating a library that the community uses to '
              'solve NLP tasks.',
  'token': 5560,
  'token_str': ' library'},
 {'score': 0.03493533283472061,
  'sequence': 'HuggingFace is creating a database that the community uses to '
              'solve NLP tasks.',
  'token': 8503,
  'token_str': ' database'},
 {'score': 0.028602583333849907,
  'sequence': 'HuggingFace is creating a prototype that the community uses to '
              'solve NLP tasks.',
  'token': 17715,
  'token_str': ' prototype'}]


In [21]:
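#Load a specific model to inspect its scores directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")  # replaces the deprecated AutoModelWithLMHead
sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
input_ids = tokenizer.encode(sequence, return_tensors="pt")  # avoids shadowing the built-in input()
mask_token_index = torch.where(input_ids == tokenizer.mask_token_id)[1]  # position(s) of the mask token
with torch.no_grad():
    token_logits = model(input_ids)[0]  # shape: (batch, seq_len, vocab_size)
token_logits.size()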


torch.Size([1, 30, 28996])

In [22]:
#get the logits for the masked token
mask_token_logits = token_logits[0, mask_token_index, :]
print(mask_token_logits.size())

top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

torch.Size([1, 28996])
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
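The fill-mask pipeline reports a `score` for each candidate, which is a softmax probability over the vocabulary at the mask position, while the manual cell ranks raw logits with `torch.topk`. Since softmax is monotonic, both give the same ranking; probabilities just add an interpretable score. The NumPy sketch below uses hypothetical logits over a toy vocabulary (the words merely echo the output above) to attach probabilities to the top-k ids.

```python
import numpy as np

def top_k_with_probs(mask_logits, k=5):
    """Return the k most likely vocabulary ids with their softmax probabilities."""
    mask_logits = np.asarray(mask_logits, dtype=float)
    exp = np.exp(mask_logits - mask_logits.max())
    probs = exp / exp.sum()           # softmax over the whole vocabulary
    top_ids = np.argsort(-probs)[:k]  # same order as sorting the raw logits
    return [(int(i), float(probs[i])) for i in top_ids]

# Hypothetical logits at the mask position over a toy 6-word vocabulary
toy_vocab = ["reduce", "increase", "decrease", "offset", "improve", "banana"]
toy_logits = [5.0, 3.5, 3.0, 2.0, 1.5, -4.0]

for idx, prob in top_k_with_probs(toy_logits, k=3):
    print(f"{toy_vocab[idx]}: {prob:.3f}")
```

Note that a high score only means the word fits the context well; as "increase" in the real output shows, the model can rank a grammatical but semantically wrong filler highly.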


# Reference
Transformers documentation: https://huggingface.co/transformers/index.html