# Pipelines for inference


In this notebook, we look at a few tasks for which we can run inference on pretrained models using predefined pipelines. The complete list of tasks for which a pipeline is available can be found here: https://huggingface.co/docs/transformers/main_classes/pipelines.

If you are using Google Colab, make sure that you are using a GPU (Runtime > Change runtime type > Hardware accelerator > GPU).
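Selecting a GPU runtime does not by itself decide where a pipeline runs; that is controlled by the `device` argument of `pipeline`, where `-1` means CPU and `0` means the first CUDA GPU. The sketch below is one way to pick that value. `pick_device` is a hypothetical helper name (not part of transformers), and it assumes PyTorch is the backend in use.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if one is visible, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1  # no PyTorch available, fall back to CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("device index:", device)
```

Once `pipeline` has been imported, the value can be passed along when a pipeline is created, e.g. `pipeline("sentiment-analysis", device=device)`.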

In [1]:
# First, we need to install the transformers library and import the pipeline.
!pip install transformers
from transformers import pipeline





### 1 Sentiment analysis pipeline

To start with, we look at a sentiment analysis example.

Run the code below, then try it on sentences whose sentiment is less obvious. Are you still getting good results?

In [2]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [6]:
texts = [
    "Transformers are cool models with great performance on most NLP tasks.",
    "I don't want to learn about all these different models, it's boring.",
    "Some times i enjoy going out but i like to stay at home too.",
    "I love my dog, but I hate him too.",
    "I hate my dog, but I love him too"
]

for text in texts:
  print("Text: ", text)
  result = classifier(text)[0]
  print(f"label: {result['label']}, with score: {round(result['score'], 4)}\n")

Text:  Transformers are cool models with great performance on most NLP tasks.
label: POSITIVE, with score: 0.9997

Text:  I don't want to learn about all these different models, it's boring.
label: NEGATIVE, with score: 0.9995

Text:  Some times i enjoy going out but i like to stay at home too.
label: POSITIVE, with score: 0.9983

Text:  I love my dog, but I hate him too.
label: NEGATIVE, with score: 0.9669

Text:  I hate my dog, but I love him too
label: POSITIVE, with score: 0.9988
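The output above suggests that the score does not measure how clear-cut the sentiment is: the two dog sentences express the same mixed feelings, yet they receive opposite labels with scores above 0.96. The following sketch makes such cases easier to compare by mapping each result to a signed value. `signed_score` is a hypothetical helper, and the two result dicts are copied from the output above rather than recomputed.

```python
# Results copied from the output above, in the format the pipeline returns.
results = [
    {"label": "NEGATIVE", "score": 0.9669},  # "I love my dog, but I hate him too."
    {"label": "POSITIVE", "score": 0.9988},  # "I hate my dog, but I love him too"
]


def signed_score(result: dict) -> float:
    """Map a {label, score} dict to [-1, 1]; NEGATIVE labels get a negative sign."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]


scores = [signed_score(r) for r in results]
print(scores)  # [-0.9669, 0.9988]
```

Swapping "love" and "hate" moves the signed score from one end of the range to the other, which is a hint that the model is sensitive to word order rather than weighing both clauses equally.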



### 2 Summarization pipeline

Next, we try the text summarization pipeline.

Run the code below. Try it also with your own text and/or different summary lengths.

In [7]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [8]:
text = """
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data.
It is used primarily in the fields of natural language processing (NLP) and computer vision (CV).

Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language,
with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once.
The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence,
the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.

Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems,
 replacing RNN models such as long short-term memory (LSTM). The additional training parallelization allows training on larger datasets.
 This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers)
 and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl,
 and can be fine-tuned for specific tasks.
"""
print(summarizer(text, max_length=130, min_length=30))

[{'summary_text': ' A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data . It is used primarily in the fields of natural language processing (NLP) and computer vision .'}]
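When experimenting with different summary lengths, keep in mind that `max_length` and `min_length` are counted in tokens, not words or characters. The sketch below scales both bounds to the length of the input. `summary_length_bounds` is a hypothetical helper, and the default ratios are arbitrary starting points rather than recommended values.

```python
def summary_length_bounds(n_input_tokens: int,
                          max_ratio: float = 0.5,
                          min_ratio: float = 0.1) -> tuple:
    """Return (min_length, max_length) in tokens, scaled to the input length."""
    max_length = max(2, int(n_input_tokens * max_ratio))
    # Keep min_length at least 1 and strictly below max_length.
    min_length = max(1, min(int(n_input_tokens * min_ratio), max_length - 1))
    return min_length, max_length


print(summary_length_bounds(300))  # (30, 150)
```

The token count of an input can be obtained from the pipeline's own tokenizer, e.g. `len(summarizer.tokenizer(text)["input_ids"])`, and the resulting bounds passed as `summarizer(text, min_length=..., max_length=...)`.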


### 3 Question answering pipeline

The question answering pipeline does not generate answers on its own, but extracts them from the supplied text (in the example below: the Wikipedia article on the Transformer).

Run the code and try it with your own examples if you like.

In [9]:
question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [10]:
context = r"""
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data.
It is used primarily in the fields of natural language processing (NLP) and computer vision (CV).

Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language,
with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once.
The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence,
the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.

Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems,
 replacing RNN models such as long short-term memory (LSTM). The additional training parallelization allows training on larger datasets.
 This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers)
 and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl,
 and can be fine-tuned for specific tasks.
"""

questions = [
    "What is a transformer?",
    "In what fields are transformers primarily used?",
    "When were transformers introduced?"
]

for question in questions:
  print("Question: ", question)
  result = question_answerer(question=question, context=context)
  print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\n")

Question:  What is a transformer?
Answer: 'a deep learning model', score: 0.5247, start: 18, end: 39

Question:  In what fields are transformers primarily used?
Answer: 'natural language processing (NLP) and computer vision (CV)', score: 0.5768, start: 197, end: 255

Question:  When were transformers introduced?
Answer: '2017', score: 0.9723, start: 855, end: 859
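Because the answers are extracted rather than generated, `start` and `end` are character offsets into the context, and slicing the context with them gives back the answer. The sketch below illustrates this with a shortened context and a hand-built result dict that stands in for real pipeline output.

```python
context = "Transformers were introduced in 2017 by a team at Google Brain."

# Hand-built stand-in for a pipeline result (not real model output).
answer = "2017"
start = context.find(answer)
result = {"answer": answer, "start": start, "end": start + len(answer)}

# Slicing the context with the offsets recovers the answer exactly.
assert context[result["start"]:result["end"]] == result["answer"]

# The offsets also make it easy to show the answer with some surrounding text.
window = 20
snippet = context[max(0, result["start"] - window): result["end"] + window]
print(snippet)
```

The same check can be applied to the real results above, e.g. `context[result['start']:result['end']]` inside the loop.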



### 4 Text generation pipeline

In [11]:
text_generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



Try running the following cell several times and see how the output changes. Are you getting reasonable results? Try it with your own prompts! You can also change the maximum length of the generated text.

In [12]:
print(text_generator("A transformer is a deep learning model that", max_length=100))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'A transformer is a deep learning model that describes the computation of discrete neural networks with recurrent weights (Figure 1B). In an example, the training block for the first neuron is computed by averaging the inputs onto a recurrent neural network to train that network. When a neural network has tens of thousands to millions of neurons in it, it can compute the tens of thousands of neural networks needed to solve for weights (Supplementary Fig. 1). We call for some way of solving for it: an NNN'}]
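The output changes between runs because the next token is drawn at random from the model's predicted distribution (to our knowledge, the default `gpt2` configuration enables sampling for this pipeline). The toy sketch below shows the idea without a model: the vocabulary and probabilities are made up for illustration, and a real model computes a new distribution at every step.

```python
import random

# Hypothetical next-token distribution (made-up values for illustration).
next_token_probs = {"uses": 0.4, "describes": 0.3, "learns": 0.2, "predicts": 0.1}


def sample_token(rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(next_token_probs)
    weights = list(next_token_probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy_token() -> str:
    """Always pick the most probable token (deterministic)."""
    return max(next_token_probs, key=next_token_probs.get)


# Two generators with the same seed produce the same sequence of draws.
rng_a, rng_b = random.Random(0), random.Random(0)
draws_a = [sample_token(rng_a) for _ in range(5)]
draws_b = [sample_token(rng_b) for _ in range(5)]
print(draws_a == draws_b)  # True
print(greedy_token())      # uses
```

With the real pipeline, calling `set_seed` (imported from `transformers`) before generating should make runs repeatable, and passing `do_sample=False` to `text_generator` switches to greedy decoding.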


### 5 Running a pipeline with your chosen pretrained model

As you may have noticed, the pipeline for each task is associated with a default model, which is automatically downloaded if no model is specified. However, it is also possible to specify the model that you want to use explicitly. Run inference on one of the above (or other) tasks using a different model.

The list of available pretrained models can be found here: https://huggingface.co/models. Tip: filter the models by language and task first (sentiment analysis falls under text classification).

If you click on your chosen model, the model card should show instructions on how to use it, e.g. https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english.

How do the results compare to those of the default model?
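The warning printed when a default pipeline is created recommends naming the model and revision explicitly. A minimal sketch of that, using the model name and short revision hash from the sentiment analysis warning above; `PINNED` and `load_pinned` are hypothetical names, and any other model from the hub could be substituted.

```python
# Model name and revision as printed in the sentiment-analysis warning above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def load_pinned(spec: dict):
    """Create a pipeline for an explicit model and revision (downloads on first use)."""
    # Imported here so that the spec itself can be inspected without transformers.
    from transformers import pipeline
    return pipeline(spec["task"], model=spec["model"], revision=spec["revision"])


print(PINNED["model"], "@", PINNED["revision"])
```

Calling `classifier = load_pinned(PINNED)` then gives a pipeline that keeps using the same weights even if the default model for the task changes later.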

In [14]:
from transformers import pipeline
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example     of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
print(
    f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
)


Answer: 'SQuAD dataset', score: 0.5152, start: 151, end: 164


In [15]:
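from transformers import pipeline
# Note: this is the same model that the question-answering pipeline uses by default,
# now named explicitly, so the answers below match the earlier run.
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')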

context = r"""
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data.
It is used primarily in the fields of natural language processing (NLP) and computer vision (CV).

Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language,
with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once.
The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence,
the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.

Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems,
 replacing RNN models such as long short-term memory (LSTM). The additional training parallelization allows training on larger datasets.
 This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers)
 and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl,
 and can be fine-tuned for specific tasks.
"""

questions = [
    "What is a transformer?",
    "In what fields are transformers primarily used?",
    "When were transformers introduced?"
]

for question in questions:
  print("Question: ", question)
  result = question_answerer(question=question, context=context)
  print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\n")

Question:  What is a transformer?
Answer: 'a deep learning model', score: 0.5247, start: 18, end: 39

Question:  In what fields are transformers primarily used?
Answer: 'natural language processing (NLP) and computer vision (CV)', score: 0.5768, start: 197, end: 255

Question:  When were transformers introduced?
Answer: '2017', score: 0.9723, start: 855, end: 859

