Take a look at the pipeline() documentation for a complete list of supported tasks and available parameters.

In [1]:
from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This restaurant is awesome")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
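
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model id and revision hash that the warning itself reports. The constant and function names are illustrative, and the import sits inside the function only so that the helper can be defined without loading anything:

```python
# Model id and revision copied from the default-model warning above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def load_pinned_classifier():
    """Build the text-classification pipeline with an explicit model and revision."""
    # Imported here so that defining this helper does not itself load transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, revision=REVISION)
```

Calling `load_pinned_classifier()("This restaurant is awesome")` should then use the same checkpoint as the default, without the default-model warning.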

In [3]:
pipe = pipeline(model="FacebookAI/roberta-large-mnli")
pipe("This restaurant is awesome")

Some weights of the model checkpoint at FacebookAI/roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'label': 'NEUTRAL', 'score': 0.7313135266304016}]

In [4]:
pipe = pipeline("text-classification")
pipe(["This restaurant is awesome","This restaurant is awful"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
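
When a list goes in, a list of predictions comes back in the same order, so inputs and predictions can be paired with `zip`. A small sketch that uses the two predictions printed above as literal data, so no model is needed to run it (`texts`, `preds`, and `labelled` are illustrative names):

```python
texts = ["This restaurant is awesome", "This restaurant is awful"]
# Predictions copied from the output above.
preds = [
    {"label": "POSITIVE", "score": 0.9998743534088135},
    {"label": "NEGATIVE", "score": 0.9996669292449951},
]

# Pair each input with its prediction, keeping the label and a rounded score.
labelled = [
    (text, pred["label"], round(pred["score"], 4))
    for text, pred in zip(texts, preds)
]

for text, label, score in labelled:
    print(f"{label:<8} {score:.4f}  {text}")
```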

In [6]:
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

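pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset yields only the "file" field of each dataset item,
# since the target part of the dataset is not needed for inference.
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)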

Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You sho

KeyError: 'tags'

### Pipeline usage

In [7]:
from transformers import pipeline

transcriber = pipeline(task="automatic-speech-recognition")

No model was supplied, defaulted to facebook/wav2vec2-base-960h and revision 55bb623 (https://huggingface.co/facebook/wav2vec2-base-960h).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebo

In [8]:
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")

ValueError: ffmpeg was not found but is required to load audio files from filename
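
The error above means the `ffmpeg` executable could not be found when the pipeline tried to decode the audio file. One way to fail earlier with a clearer message is to check for the executable before calling the transcriber. A sketch using only the standard library; `ffmpeg_available` is a hypothetical helper name:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; install it before passing file names or URLs to the pipeline")
```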

### Using pipelines on a dataset

In [16]:
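def data():
    # Yield prompts lazily so the pipeline can stream them instead of loading a list.
    for i in range(100):
        yield f"My example is {i}"

pipe = pipeline(model="openai-community/gpt2", device=0)
# Passing a generator makes the pipeline return results one at a time.
for out in pipe(data()):
    print(out[0]["generated_text"])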

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


My example is 0.8 to the left in the image, the 1/0 of the right one is 0.8, and the 1/1 above is 0.8.

Figure 16. Differential scale of a standard deviation is


My example is 1 x 1 = 1 = 1.

The number of ways in which we can determine the number of ways in which we can detect whether a given quantity has a certain number of values. The numbers are as follows:




My example is 2.5mm (6.7 in) which is a very short range 1/9″ diameter.

I use 3-7-8.5mm for the 5/8″ line. This may not be a


My example is 3x the time which means i cannot change my weight for 20 minutes.

But here's the bad thing about my example, it's really simple.

Once I've changed my weight i put the weight back in and


My example is 4 times the size of an English tree. The largest one would be an entire acre. There are lots of things to keep in mind about our native, living, thriving area.

For example, consider the size of your school


My example is 5-7 years old. I love it. My 5-7 year old loves it.

When the kids grow older they don't want to be happy because that is one thing that makes them not happy. They want to


My example is 6 people at the cafe - there should be between 17 and 30.

When I told her that, the person smiled. "What is it?" he asked her.

I started laughing the whole time. "This is


My example is 7.4.6 (4.2.9.4), so we can add this to the table, as shown below (Note: There are several other variants of the same variable which does not change the values):




My example is 8,100,000 lines of code, and the line sizes are 6,846,000,000 lines...


If you could tell an average person who is writing about your code what you are writing now, it would


My example is 9x9 at 8x9. We call the number 9x9 and the number 9x9 equals 9x7. Now we can find the line in the first section that says 9x11 (which is 9x11


My example is 10 years old. I don't know if I've ever experienced it in my career. But, now, I have. The same things I do. I've changed. I've come. But, you know, it's going


My example is 11.3 in the previous edition and 4 in every other, except for 4.3 in 2008, so it's certainly not all that big. So, the fact remains, the cost of making a big system bigger can be high


My example is 12,000 girls are raped or molested.

This is a time when the first step of understanding the concept of rape is to look at the historical context and remember that the world goes on in a very different way than we


My example is 13 years ago; when I was 13-15, my mother was visiting with her brother in the same church in a small town in Ohio. Our church was in Ohio, and a little while ago, I went to meet my mother


My example is 14. It is only when our task is clear that the rest can be learned. This task is called "a learning task". A learning task starts by learning a number of words about yourself based on the context you describe. Your task


My example is 15 minutes ago.

As always, if you happen to be visiting on a trip and you need help, please leave a comment or email for the contact info for any assistance and to see if your child was able to get permission


My example is 16 years old, we are at age 40. We do something. We are getting the experience we've been waiting for."

While it hasn't been made official by the school district yet, they have been working with the school


My example is 17+ years old. My mother is very good with her son now.

My father came up to me and asked me something. I said the right thing. He said how can you do this thing? I said I have


My example is 18 years old and still living.

I would want my family to have no money on the loan we both took away in the case of 'fraud'; or, perhaps because they still don't have one.

You


My example is 19th century Russia where the czarist and tsarist governments were corrupt and at all times corrupt in power and power fell in line of international law, to be applied for state aid and support. And it was in fact the Russian government


My example is 20 x 14 x 16.5"

The model for these measurements is 2.85" wide by 1.5 inches deep, but the same model is sold as wide by 1.5".


The first example is


My example is 21st century. What I am describing here is an extremely rare and very dangerous epidemic.

Dr. Seuss's view on the emergence of this epidemic, and many others will recall, had never existed when Dr. Seuss


My example is 22 years old at the time of writing. She is a woman of color in America. On a family budget I have a child of color who was born and raised in Louisiana, living on welfare and fighting injustice all by herself at home


KeyboardInterrupt: 
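
The loop above walks through all 100 prompts and was interrupted by hand. To try only the first few, the generator can be capped while staying lazy. In the sketch below, `take` and `fake_pipe` are written for this example and are not part of transformers; `fake_pipe` merely mimics the shape of the text-generation output so the snippet runs without a model. The cap is written as a generator function rather than a bare `itertools.islice` on the assumption that the pipeline recognises generator objects specifically as streamed input; verify that against your installed version.

```python
from itertools import islice


def data():
    for i in range(100):
        yield f"My example is {i}"


def take(iterable, n):
    """Yield at most n items; being a generator function, it stays lazy."""
    yield from islice(iterable, n)


def fake_pipe(prompts):
    """Stand-in for a pipeline: consumes prompts lazily, yields one result each."""
    for prompt in prompts:
        yield [{"generated_text": prompt + " ..."}]


outputs = [out[0]["generated_text"] for out in fake_pipe(take(data(), 3))]
print(outputs)
```

With the real pipeline, the assumed equivalent is `for out in pipe(take(data(), 3)):`.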

In [17]:
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

pipe = pipeline(model="hf-internal-testing/tiny-random-wav2vec2", device=0)
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:10]")
for out in pipe(KeyDataset(dataset, "audio")):
    print(out)

Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at hf-internal-testing/tiny-random-wav2vec2 and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.feature_extractor.conv_layers.1.layer_norm.bias', 'wav2vec2.feature_extractor.conv_layers.1.layer_norm.weight', 'wav2vec2.feature_extractor.conv_layers.2.layer_norm.bias', 'wav2vec2.feature_extractor.conv_layers.2.layer_norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


KeyError: 'tags'
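
`KeyDataset` is what lets the pipeline read a single column: each item it returns is the value stored under the chosen key, here `"audio"`. The class below is a simplified stand-in written for illustration; `KeyView` and the toy rows are hypothetical and are not the library's implementation:

```python
class KeyView:
    """Minimal stand-in for the idea behind KeyDataset: expose one field per row."""

    def __init__(self, rows, key):
        self.rows = rows
        self.key = key

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # Return only the requested field of the row, dropping everything else.
        return self.rows[index][self.key]


# Toy rows standing in for dataset items; the values are placeholders.
rows = [
    {"audio": "clip-0", "text": "first transcript"},
    {"audio": "clip-1", "text": "second transcript"},
]
audio_only = KeyView(rows, "audio")
print([audio_only[i] for i in range(len(audio_only))])
```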

### Vision pipeline

In [18]:
from transformers import pipeline

vision_classifier = pipeline(model="google/vit-base-patch16-224")
preds = vision_classifier(
    images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)

In [21]:
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.4335, 'label': 'lynx, catamount'},
 {'score': 0.0348,
  'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'},
 {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'},
 {'score': 0.0239, 'label': 'Egyptian cat'},
 {'score': 0.0229, 'label': 'tiger cat'}]

In [22]:
from transformers import pipeline

classifier = pipeline(model="facebook/bart-large-mnli")
classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)

{'sequence': 'I have a problem with my iphone that needs to be resolved asap!!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.503635048866272,
  0.4788002669811249,
  0.012600133195519447,
  0.002655784599483013,
  0.002308758208528161]}
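
The zero-shot result keeps `labels` and `scores` as parallel lists, which in the output above are ordered from highest to lowest score. Zipping them gives a lookup table. The sketch uses the printed result as literal data; `result`, `by_label`, and `top_label` are illustrative names:

```python
# Result copied from the output above, so no model is needed to run this.
result = {
    "sequence": "I have a problem with my iphone that needs to be resolved asap!!",
    "labels": ["urgent", "phone", "computer", "not urgent", "tablet"],
    "scores": [
        0.503635048866272,
        0.4788002669811249,
        0.012600133195519447,
        0.002655784599483013,
        0.002308758208528161,
    ],
}

# Map each candidate label to its score, then pick the highest-scoring one.
by_label = dict(zip(result["labels"], result["scores"]))
top_label = max(by_label, key=by_label.get)
print(top_label, round(by_label[top_label], 4))
```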

In [28]:
from transformers import pipeline

vqa = pipeline(model="impira/layoutlm-document-qa")

output = vqa(
    image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
    question="What is the invoice number?",
)

ValueError: If you provide an image without word_boxes, then the pipeline will run OCR using Tesseract, but pytesseract is not available

In [26]:
output

NameError: name 'output' is not defined
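
The `ValueError` above comes from a missing optional dependency: without `word_boxes`, the pipeline needs `pytesseract` to run OCR. Because that call failed, `output` was never assigned, hence the `NameError`. A standard-library check for whether a package can be imported (`has_module` is a hypothetical helper name):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if not has_module("pytesseract"):
    print("pytesseract is not installed; the document-QA pipeline cannot run OCR")
```

The Python package is a wrapper around the Tesseract binary, which has to be installed separately.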

### Using pipeline on large models with 🤗 Accelerate

In [29]:
!pip install accelerate

In [30]:
import torch
from transformers import pipeline

pipe = pipeline(model="facebook/opt-1.3b", torch_dtype=torch.bfloat16, device_map="auto")
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
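
`torch_dtype=torch.bfloat16` halves the memory needed for the weights relative to 32-bit floats. A back-of-the-envelope estimate, assuming the "1.3b" in the model name means about 1.3 billion parameters and counting weights only (activations and buffers are ignored):

```python
# Bytes used per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

# Assumption: "opt-1.3b" has roughly 1.3 billion parameters.
NUM_PARAMS = 1.3e9


def weight_gigabytes(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_gigabytes(NUM_PARAMS, dtype):.1f} GB")
```

`device_map="auto"` relies on Accelerate to decide where those weights are placed across the available devices.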
