# **Understanding Pipelines for Inference in Transformers**

#### **What are Pipelines for Inference?**  
Pipelines in the Hugging Face `transformers` library provide a high-level API that simplifies using models for inference. These pipelines abstract the complexity of preprocessing, model loading, and post-processing, allowing you to run inference without deep knowledge of the underlying model architecture.  

Hugging Face pipelines support various **tasks**, including:  
- **Natural Language Processing (NLP)** (e.g., text classification, question answering)  
- **Computer Vision** (e.g., image classification, object detection)  
- **Speech Processing** (e.g., speech-to-text, text-to-speech)  
- **Multimodal Applications** (e.g., visual question answering)  

In [1]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install torch diffusers

Note: you may need to restart the kernel to use updated packages.


**The next cell silences warnings and informational log messages so the outputs below stay readable. For example, calling `pipeline()` without an explicit `model` falls back to a default checkpoint and normally logs a warning about it.**

In [3]:
import warnings

from transformers.utils import logging as hf_logging

# Only show error-level messages from transformers
hf_logging.set_verbosity_error()

# Suppress Python warnings raised by any library
warnings.filterwarnings("ignore")

# **Basic examples to get started with Pipelines**

## **1- Text Summarization**
**Summarizing long articles into concise summaries.**

In [4]:
from transformers import pipeline

summarizer = pipeline("summarization")
text = """Machine learning is a field of artificial intelligence that uses statistical techniques to give computers the ability to learn from data. The field has seen immense growth in the last decade, with applications in various industries such as healthcare, finance, and automation."""
print(summarizer(text, max_length=30, min_length=20))


[{'summary_text': ' Machine learning is a field of artificial intelligence that uses statistical techniques to give computers the ability to learn from data . The field has seen immense'}]


## **2- Sentiment Analysis**
**Classifying text sentiment as positive or negative (the default model used here has no neutral class).**

In [5]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love how user-friendly this library is!")
print(result)   

[{'label': 'POSITIVE', 'score': 0.999618649482727}]


## **3- Named Entity Recognition (NER)**
**Detecting named entities like people, locations, and organizations.**

In [6]:
from transformers import pipeline

ner = pipeline("ner")
text = "Elon Musk is the CEO of Tesla and SpaceX, headquartered in the United States."
print(ner(text))

[{'entity': 'I-PER', 'score': 0.9996457, 'index': 1, 'word': 'El', 'start': 0, 'end': 2}, {'entity': 'I-PER', 'score': 0.9992587, 'index': 2, 'word': '##on', 'start': 2, 'end': 4}, {'entity': 'I-PER', 'score': 0.9994752, 'index': 3, 'word': 'Mu', 'start': 5, 'end': 7}, {'entity': 'I-PER', 'score': 0.99885345, 'index': 4, 'word': '##sk', 'start': 7, 'end': 9}, {'entity': 'I-ORG', 'score': 0.99793833, 'index': 9, 'word': 'Te', 'start': 24, 'end': 26}, {'entity': 'I-ORG', 'score': 0.9967206, 'index': 10, 'word': '##sla', 'start': 26, 'end': 29}, {'entity': 'I-ORG', 'score': 0.99916244, 'index': 12, 'word': 'Space', 'start': 34, 'end': 39}, {'entity': 'I-ORG', 'score': 0.9990557, 'index': 13, 'word': '##X', 'start': 39, 'end': 40}, {'entity': 'I-LOC', 'score': 0.99965036, 'index': 18, 'word': 'United', 'start': 63, 'end': 69}, {'entity': 'I-LOC', 'score': 0.99969244, 'index': 19, 'word': 'States', 'start': 70, 'end': 76}]
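
The output above is one prediction per word piece (`El`, `##on`, `Mu`, `##sk`), which is awkward to read. The pipeline accepts an `aggregation_strategy` argument (for example `pipeline("ner", aggregation_strategy="simple")`) that groups these pieces for you. The sketch below shows roughly what such grouping involves, using the `index`, `start`, and `end` fields from the output above. The `merge_tokens` helper is hypothetical and written for illustration; it is not the library's implementation.

```python
# Token-level predictions copied from the output above (scores omitted for brevity).
text = "Elon Musk is the CEO of Tesla and SpaceX, headquartered in the United States."
tokens = [
    {"entity": "I-PER", "index": 1, "start": 0, "end": 2},
    {"entity": "I-PER", "index": 2, "start": 2, "end": 4},
    {"entity": "I-PER", "index": 3, "start": 5, "end": 7},
    {"entity": "I-PER", "index": 4, "start": 7, "end": 9},
    {"entity": "I-ORG", "index": 9, "start": 24, "end": 26},
    {"entity": "I-ORG", "index": 10, "start": 26, "end": 29},
    {"entity": "I-ORG", "index": 12, "start": 34, "end": 39},
    {"entity": "I-ORG", "index": 13, "start": 39, "end": 40},
    {"entity": "I-LOC", "index": 18, "start": 63, "end": 69},
    {"entity": "I-LOC", "index": 19, "start": 70, "end": 76},
]


def merge_tokens(tokens, text):
    """Merge adjacent tokens that share a label into (label, surface text) pairs."""
    spans = []
    for tok in tokens:
        label = tok["entity"].split("-")[-1]  # "I-PER" -> "PER"
        last = spans[-1] if spans else None
        if last and last["label"] == label and tok["index"] == last["index"] + 1:
            # Same entity continues: extend the span to this token's end offset.
            last["end"] = tok["end"]
            last["index"] = tok["index"]
        else:
            # A new entity starts here.
            spans.append({"label": label, "start": tok["start"],
                          "end": tok["end"], "index": tok["index"]})
    # Slice the original text so casing and spacing are preserved.
    return [(s["label"], text[s["start"]:s["end"]]) for s in spans]


entities = merge_tokens(tokens, text)
print(entities)
# [('PER', 'Elon Musk'), ('ORG', 'Tesla'), ('ORG', 'SpaceX'), ('LOC', 'United States')]
```

Tesla and SpaceX stay separate because their token indices are not consecutive (the word "and" sits between them).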


## **4- Automatic Speech Recognition (ASR)**
**Transcribing speech from an audio file.**

In [7]:
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")
print(asr("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"))

{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}


## **5- Machine Translation**
**Translating text from one language to another.**

In [8]:
from transformers import pipeline

translator = pipeline("translation_en_to_fr")
print(translator("Hello, how are you today?"))

[{'translation_text': "Bonjour, comment vous êtes-vous aujourd'hui?"}]


## **6- Question Answering**
**Extracting answers from a given context.**

In [9]:
from transformers import pipeline

qa = pipeline("question-answering")
context = "Hugging Face is a company based in New York that specializes in NLP models."
question = "Where is Hugging Face located?"
print(qa(question=question, context=context))

{'score': 0.9973986744880676, 'start': 35, 'end': 43, 'answer': 'New York'}
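
The `start` and `end` values in this result are character offsets into `context`, so the answer can be recovered by slicing. The short sketch below checks that against the output above; the `highlight` helper and its `window` parameter are hypothetical, added only to show the answer in its surrounding text.

```python
context = "Hugging Face is a company based in New York that specializes in NLP models."
# Result copied from the output above.
result = {"score": 0.9973986744880676, "start": 35, "end": 43, "answer": "New York"}

# Slicing the context with the offsets reproduces the answer string.
span = context[result["start"]:result["end"]]
print(span)  # New York


def highlight(context, start, end, window=10):
    """Return the answer in brackets with up to `window` characters on each side."""
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"...{left}[{context[start:end]}]{right}..."


print(highlight(context, result["start"], result["end"]))
```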


## **7- Image Classification**
**Classifying images into categories.**

In [10]:
from transformers import pipeline

image_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
image_url = "https://cdn.pixabay.com/photo/2024/12/31/01/02/costa-rica-9301364_960_720.jpg"
print(image_classifier(image_url))

[{'label': 'toucan', 'score': 0.9976680874824524}, {'label': 'hornbill', 'score': 0.0002660394529812038}, {'label': 'jacamar', 'score': 0.00021827677846886218}, {'label': 'macaw', 'score': 7.138107321225107e-05}, {'label': 'coucal', 'score': 4.063433880219236e-05}]


In [11]:
# Reuse the image_classifier pipeline created in the previous cell
image_url = "https://cdn.pixabay.com/photo/2025/01/13/19/40/horse-9331340_1280.jpg"
print(image_classifier(image_url))

[{'label': 'muzzle', 'score': 0.26967158913612366}, {'label': 'sorrel', 'score': 0.17724226415157318}, {'label': 'cowboy hat, ten-gallon hat', 'score': 0.13888441026210785}, {'label': 'sombrero', 'score': 0.019579362124204636}, {'label': 'bolo tie, bolo, bola tie, bola', 'score': 0.015707332640886307}]
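
The two results differ sharply in confidence: the first image gets `toucan` at about 0.998, while the second gets `muzzle` at only about 0.27. One simple way to handle this is to accept the top label only above a cutoff. The sketch below assumes a threshold of 0.5, which is an arbitrary illustrative value, and `top_prediction` is a hypothetical helper rather than part of `transformers`.

```python
def top_prediction(predictions, threshold=0.5):
    """Return the best label, or None when the top score is below `threshold`."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Top two entries from each output above.
toucan = [
    {"label": "toucan", "score": 0.9976680874824524},
    {"label": "hornbill", "score": 0.0002660394529812038},
]
horse = [
    {"label": "muzzle", "score": 0.26967158913612366},
    {"label": "sorrel", "score": 0.17724226415157318},
]

print(top_prediction(toucan))  # toucan
print(top_prediction(horse))   # None
```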


## **8- Zero-Shot Text Classification**
**Classifying text into candidate labels that the model was never explicitly trained on.**

In [12]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
text = "I recently bought an iPhone and it keeps freezing."
labels = ["electronics", "fashion", "sports"]
print(classifier(text, candidate_labels=labels))

{'sequence': 'I recently bought an iPhone and it keeps freezing.', 'labels': ['electronics', 'sports', 'fashion'], 'scores': [0.9932963252067566, 0.0034089444670826197, 0.0032946793362498283]}
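
The result keeps labels and scores in two parallel lists, ordered by score rather than by the order the labels were passed in. A small sketch of turning that into a lookup table, using the output above as input:

```python
# Result copied from the output above.
result = {
    "sequence": "I recently bought an iPhone and it keeps freezing.",
    "labels": ["electronics", "sports", "fashion"],
    "scores": [0.9932963252067566, 0.0034089444670826197, 0.0032946793362498283],
}

# Pair each label with its score.
scores_by_label = dict(zip(result["labels"], result["scores"]))

# Pick the label with the highest score.
best_label = max(scores_by_label, key=scores_by_label.get)
print(best_label)  # electronics

# In this output the scores sum to approximately 1 across the candidate labels.
print(round(sum(result["scores"]), 6))  # 1.0
```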


## **9- Visual Question Answering (VQA)**
**Answering questions about an image.**

In [13]:
from transformers import pipeline

vqa = pipeline("vqa", model="dandelin/vilt-b32-finetuned-vqa")  

result = vqa(
    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    question="What is the animal in the image?"
)  
print(result) 

[{'score': 0.2977306544780731, 'answer': 'bear'}, {'score': 0.18058913946151733, 'answer': 'cat'}, {'score': 0.02547476626932621, 'answer': 'polar bear'}, {'score': 0.018054964020848274, 'answer': 'yes'}, {'score': 0.014309295453131199, 'answer': 'tiger'}]


## **10- Text-to-Speech (TTS) Pipeline**
**Converting text into spoken audio.**

In [14]:
from transformers import pipeline
import IPython.display as ipd

# Load a compatible text-to-speech model
tts = pipeline("text-to-speech", model="facebook/mms-tts-eng")

# Generate speech
speech = tts("Hugging Face provides powerful AI models! Hi my name is Code Cavalier")

# Play the audio
# Use the sampling rate reported by the pipeline instead of a hard-coded value
ipd.Audio(speech["audio"], rate=speech["sampling_rate"])



## **11- Fill-in-the-Blank (Masked Language Modeling) Pipeline**
**Predict the missing word in a sentence using BERT-based models.**

In [15]:
mlm = pipeline("fill-mask", model="bert-base-uncased")  

result = mlm("Hugging Face is making AI [MASK] for everyone!")  
print(result)  

[{'score': 0.18804790079593658, 'token': 2571, 'token_str': '##le', 'sequence': 'hugging face is making aile for everyone!'}, {'score': 0.11470410972833633, 'token': 4737, 'token_str': 'worry', 'sequence': 'hugging face is making ai worry for everyone!'}, {'score': 0.04575171694159508, 'token': 4244, 'token_str': '##les', 'sequence': 'hugging face is making ailes for everyone!'}, {'score': 0.038535021245479584, 'token': 2140, 'token_str': '##l', 'sequence': 'hugging face is making ail for everyone!'}, {'score': 0.03673723340034485, 'token': 5390, 'token_str': 'cry', 'sequence': 'hugging face is making ai cry for everyone!'}]


## **12- Feature Extraction (Text Embeddings) Pipeline**
**Extract numerical embeddings for text, one vector per token.**

In [16]:
feature_extractor = pipeline("feature-extraction", model="bert-base-uncased")  

embedding = feature_extractor("Hugging Face makes AI accessible.")  
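# The result is nested as [batch][token][hidden size]. len(embedding[0]) is the number of
# tokens (8 here, counting [CLS] and [SEP]); each token vector, embedding[0][0], has 768 values.
print(len(embedding[0]))  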

8
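
Because the pipeline returns one vector per token, a common way to get a single vector for the whole sentence is to average the token vectors (mean pooling). The sketch below uses a tiny made-up matrix in place of `embedding[0]` so it runs without downloading a model; `mean_pool` is a hypothetical helper, and mean pooling is only one of several possible pooling choices.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    n = len(token_vectors)
    # zip(*rows) iterates over columns, i.e. one embedding dimension at a time.
    return [sum(column) / n for column in zip(*token_vectors)]


# Toy stand-in for embedding[0]: 3 tokens with 4 dimensions each.
toy_tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]

sentence_vector = mean_pool(toy_tokens)
print(sentence_vector)  # [2.0, 2.0, 2.0, 2.0]
```

With the real output, `mean_pool(embedding[0])` would return a list of 768 values.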


## **13- Custom Model Loading with Pipeline**
**Use your own fine-tuned Hugging Face model.**

In [17]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # Replace with your own model  
model = AutoModelForSequenceClassification.from_pretrained(model_name)  
tokenizer = AutoTokenizer.from_pretrained(model_name)  

custom_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer)  
print(custom_pipeline("I love this product!"))  

[{'label': 'POSITIVE', 'score': 0.9998855590820312}]


## **14- Multimodal AI: Image + Text Analysis**
**Use CLIP to find the best matching caption for an image.**

In [18]:
clip = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch16")  

result = clip(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    candidate_labels=["a fluffy cat", "a dog", "a bird"]
)  
print(result)  

[{'score': 0.9993884563446045, 'label': 'a fluffy cat'}, {'score': 0.00039234315045177937, 'label': 'a dog'}, {'score': 0.00021921691950410604, 'label': 'a bird'}]


## **15- Using a Tokenizer Directly**
**Call a tokenizer on its own to inspect the tensors that a pipeline passes to the model.**

In [19]:
from transformers import AutoTokenizer  

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  
tokens = tokenizer("Hugging Face is democratizing AI!", return_tensors="pt")  
print(tokens)  

{'input_ids': tensor([[  101, 17662,  2227,  2003,  7672,  6026,  9932,   999,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
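
In the output above every `attention_mask` entry is 1 because there is a single sentence and nothing to pad. When several sentences of different lengths are batched (for example by calling the tokenizer with `padding=True`), shorter ones are filled with a padding id and the mask marks those positions with 0. The pure-Python sketch below imitates that behavior for illustration; `pad_batch` is a hypothetical helper, not the tokenizer's own code.

```python
PAD_ID = 0  # id of the [PAD] token in the bert-base-uncased vocabulary


def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad lists of token ids to equal length and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Ids taken from the output above.
long_ids = [101, 17662, 2227, 2003, 7672, 6026, 9932, 999, 102]
# A shorter sequence built from the same ids: [CLS] hugging face [SEP]
short_ids = [101, 17662, 2227, 102]

batch = pad_batch([long_ids, short_ids])
print(batch["input_ids"][1])       # [101, 17662, 2227, 102, 0, 0, 0, 0, 0]
print(batch["attention_mask"][1])  # [1, 1, 1, 1, 0, 0, 0, 0, 0]
```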
