# Using Hugging Face Pipelines 🚀

Pipelines are the easiest way to run pre-trained models with the Hugging Face Transformers library. They provide a high-level interface that automatically handles tokenization (preprocessing), model inference, and post-processing.

## What are Pipelines?

Pipelines are **one-line solutions** for common NLP tasks:
- **Input**: Raw text
- **Output**: Ready-to-use results
- **Magic**: Everything happens behind the scenes!

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love this!")
# → [{'label': 'POSITIVE', 'score': 0.9998}]
```
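The post-processing stage that produces that `{'label', 'score'}` dict is less magical than it looks. For a single-label classifier such as the default sentiment model, it roughly amounts to a softmax over the model's logits followed by a lookup in the model's `id2label` mapping. The sketch below reproduces just that step in plain Python; the logits and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping are illustrative assumptions, not real model output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Turn raw classifier logits into a pipeline-style result dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for one input; a real model would compute these
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-4.2, 4.5], id2label))
```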

## Learning Objectives

By the end of this notebook, you'll know how to:
1. Use all major pipeline types
2. Customize pipeline behavior and models
3. Handle batch processing and performance optimization
4. Build custom pipelines for specific needs
5. Compare pipeline performance and choose the right one
6. Integrate pipelines into real applications

Let's dive into the world of pipelines! 🌊

In [None]:
# Import essential libraries
import torch
from transformers import (
    pipeline,
    AutoTokenizer,
    AutoModel
)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings
from IPython.display import display, HTML
warnings.filterwarnings('ignore')

# Select the device for pipeline(): 0 = first CUDA GPU, -1 = CPU
device = 0 if torch.cuda.is_available() else -1
device_name = "GPU" if torch.cuda.is_available() else "CPU"
print(f"Using device: {device_name}")

# Set style for plots
plt.style.use('default')
sns.set_palette("husl")

print("All libraries imported successfully!")

## 1. Overview of Available Pipelines

Let's start with an overview of the main pipeline tasks, grouped by category:

In [None]:
# Display a selection of supported pipeline tasks (not an exhaustive list)
print("🎯 Available Pipeline Tasks:")
print("=" * 40)

# Group tasks by category with examples
task_categories = {
    "Text Analysis": [
        "sentiment-analysis",
        "text-classification", 
        "zero-shot-classification",
        "token-classification",
        "ner"  # alias for "token-classification"
    ],
    "Text Generation": [
        "text-generation",
        "text2text-generation",
        "fill-mask",
        "conversational"  # deprecated and removed in newer transformers releases
    ],
    "Question & Answer": [
        "question-answering",
        "table-question-answering"
    ],
    "Text Processing": [
        "summarization",
        "translation",
        "feature-extraction"
    ],
    "Audio & Vision": [
        "automatic-speech-recognition",
        "audio-classification",
        "image-classification",
        "object-detection"
    ]
}

for category, tasks in task_categories.items():
    print(f"\n{category}:")
    for task in tasks:
        print(f"  ✅ {task}")

print(f"\nTotal tasks shown: {sum(len(tasks) for tasks in task_categories.values())}")
print("We'll explore the most popular ones in this notebook!")

## 2. Sentiment Analysis Pipeline

Let's begin with sentiment analysis, one of the most widely used pipelines:

In [None]:
# Create sentiment analysis pipeline
print("🎭 Sentiment Analysis Pipeline")
print("=" * 32)

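# Default sentiment pipeline: with no model given, transformers loads a default
# English checkpoint (DistilBERT fine-tuned on SST-2) that predicts only POSITIVE or NEGATIVE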
sentiment_pipeline = pipeline("sentiment-analysis", device=device)

# Test with various examples
test_texts = [
    "I absolutely love this new AI technology!",
    "This movie was terrible and boring.",
    "The weather is okay today, nothing special.",
    "Amazing breakthrough in machine learning! So excited!",
    "I'm not sure how I feel about this change...",
    "The service was disappointing and slow.",
    "Perfect! Everything worked exactly as expected.",
    "Meh, it's just average."
]

print("Testing sentiment analysis:")
print("-" * 30)

results = sentiment_pipeline(test_texts)

for text, result in zip(test_texts, results):
    label = result['label']
    score = result['score']
    
    # Add emoji based on sentiment
    emoji = "😊" if label == "POSITIVE" else "😞" if label == "NEGATIVE" else "😐"
    
    print(f"{emoji} '{text}'")
    print(f"   → {label} (confidence: {score:.3f})")
    print()
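Because the default model only knows POSITIVE and NEGATIVE, the 😐 branch above never fires, and mixed sentences such as "Meh, it's just average." are still forced into one of the two classes. A common workaround is to treat low-confidence predictions as neutral. The helper below is a hypothetical sketch of that idea, not part of transformers, and its 0.75 threshold is an arbitrary assumption you would tune on your own data.

```python
def label_with_neutral(result, threshold=0.75):
    """Map a pipeline-style {'label', 'score'} dict to NEUTRAL when unsure.

    `threshold` is an illustrative cut-off, not a value from transformers.
    """
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

# Hand-written results in the pipeline's output format
examples = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.61},
]
for r in examples:
    print(r, "->", label_with_neutral(r))
```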

In [None]:
# Try different sentiment models
print("Comparing Different Sentiment Models")
print("=" * 40)

sentiment_models = [
    ("Default", "sentiment-analysis", None),
    ("Twitter-tuned", "sentiment-analysis", "cardiffnlp/twitter-roberta-base-sentiment-latest"),
    ("Financial", "sentiment-analysis", "ProsusAI/finbert")
]

test_sentence = "The stock market is showing positive trends today."
print(f"Test sentence: '{test_sentence}'\n")

for name, task, model in sentiment_models:
    try:
        if model:
            pipe = pipeline(task, model=model, device=device)
        else:
            pipe = sentiment_pipeline  # Use the one we already created
        
        result = pipe(test_sentence)[0]
        print(f"{name} model:")
        print(f"  Label: {result['label']}")
        print(f"  Score: {result['score']:.3f}")
        print()
        
    except Exception as e:
        print(f"{name} model: Error - {str(e)[:50]}...\n")
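Different checkpoints name their labels differently: the default model returns upper-case `POSITIVE`/`NEGATIVE`, other Hub models may use lower-case names, and some expose only generic `LABEL_0`, `LABEL_1`, ... names. Normalizing labels makes side-by-side comparisons easier. A minimal sketch, where `normalize_label` is a hypothetical helper and the `LABEL_n` mapping is an assumption you would need to check against each model card:

```python
def normalize_label(label, generic_map=None):
    """Return a lower-case sentiment label such as 'positive'.

    `generic_map` translates generic names like 'LABEL_2'; its contents are
    model-specific and must come from the model card (assumed here).
    """
    if generic_map and label in generic_map:
        return generic_map[label]
    return label.lower()

# Hypothetical mapping for a three-class model that exposes LABEL_n names
three_class = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

for raw in ["POSITIVE", "neutral", "LABEL_2"]:
    print(raw, "->", normalize_label(raw, three_class))
```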

## 3. Zero-Shot Text Classification Pipeline

Beyond sentiment, zero-shot classification sorts text into any set of categories you supply at inference time, with no task-specific training data:

In [None]:
# Zero-shot classification - classify without training data!
print("Zero-Shot Classification Pipeline")
print("=" * 38)

zero_shot_classifier = pipeline("zero-shot-classification", device=device)

# Define candidate labels
candidate_labels = ["technology", "sports", "politics", "entertainment", "science", "business"]

test_articles = [
    "Apple announced a new iPhone with revolutionary AI capabilities.",
    "The Lakers won their championship game last night in overtime.",
    "Scientists discovered a new planet in a distant galaxy.",
    "The new Marvel movie breaks box office records worldwide.",
    "Stock markets rally as tech companies report strong earnings.",
    "New research shows promising results for cancer treatment."
]

print(f"Classifying into categories: {candidate_labels}\n")

for i, article in enumerate(test_articles, 1):
    result = zero_shot_classifier(article, candidate_labels)
    
    print(f"{i}. '{article}'")
    print(f"   🏆 Top prediction: {result['labels'][0]} ({result['scores'][0]:.3f})")
    
    # Show top 3 predictions
    print("   📊 Top 3 predictions:")
    for label, score in zip(result['labels'][:3], result['scores'][:3]):
        print(f"      • {label}: {score:.3f}")
    print()
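Zero-shot classification turns each candidate label into an NLI hypothesis (the pipeline's default template is `"This example is {}."`) and asks an entailment model whether the text entails it. By default the entailment logits are then softmaxed *across* labels, so the scores over all candidate labels sum to 1; with `multi_label=True` each label is instead scored independently with a softmax over its own contradiction/entailment pair. The plain-Python sketch below mimics both modes; the logits are made up for illustration and this is not the library's code.

```python
import math

def zero_shot_scores(nli_logits, multi_label=False):
    """Score candidate labels from per-label NLI logits.

    nli_logits maps label -> (contradiction_logit, entailment_logit).
    """
    labels = list(nli_logits)
    if multi_label or len(labels) == 1:
        # Independent softmax over [contradiction, entailment] for each label
        scores = {}
        for label in labels:
            c, e = nli_logits[label]
            scores[label] = math.exp(e) / (math.exp(c) + math.exp(e))
    else:
        # Softmax over entailment logits across all labels (scores sum to 1)
        ent = [nli_logits[label][1] for label in labels]
        m = max(ent)
        exps = [math.exp(x - m) for x in ent]
        total = sum(exps)
        scores = {label: x / total for label, x in zip(labels, exps)}
    # Sort best-first, like the pipeline's 'labels'/'scores' output
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical NLI logits for one article and three labels
logits = {"technology": (-2.0, 3.0), "business": (-1.0, 2.5), "sports": (3.0, -2.0)}
print(zero_shot_scores(logits))                    # scores sum to 1
print(zero_shot_scores(logits, multi_label=True))  # each label scored on its own
```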

## 4. Named Entity Recognition (NER) Pipeline

Extract people, organizations, locations, and other entities from text:

In [None]:
# Named Entity Recognition
print("Named Entity Recognition Pipeline")
print("=" * 37)

ner_pipeline = pipeline("ner", aggregation_strategy="simple", device=device)

test_texts_ner = [
    "Apple Inc. was founded by Steve Jobs in Cupertino, California.",
    "Elon Musk announced that Tesla will build a new factory in Berlin, Germany.",
    "The meeting between Joe Biden and Emmanuel Macron was held in Paris.",
    "Microsoft's headquarters in Redmond employs over 50,000 people.",
    "Amazon reported $469 billion in revenue last year."
]

for i, text in enumerate(test_texts_ner, 1):
    entities = ner_pipeline(text)
    
    print(f"{i}. Text: '{text}'")
    print("   Entities found:")
    
    if entities:
        # Group entities by type
        entity_groups = {}
        for entity in entities:
            entity_type = entity['entity_group']
            if entity_type not in entity_groups:
                entity_groups[entity_type] = []
            entity_groups[entity_type].append(entity)
        
        for entity_type, entities_list in entity_groups.items():
            print(f"   📍 {entity_type}:")
            for entity in entities_list:
                print(f"      • {entity['word']} (confidence: {entity['score']:.3f})")
    else:
        print("   No entities found.")
    print()
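Passing `aggregation_strategy="simple"` is what turns the model's per-token tags into the whole-entity results printed above. Token-classification models typically emit BIO-style tags (`B-PER` begins a person, `I-PER` continues one), and the "simple" strategy merges consecutive tokens of the same entity type and averages their scores. The function below is a simplified, hypothetical re-implementation for illustration only: it works on hand-written token dicts, assumes every non-`O` tag has a `B-`/`I-` prefix, and skips details the real pipeline handles, such as decoding sub-word pieces.

```python
def group_bio_entities(tokens, text):
    """Merge BIO-tagged tokens into entity groups (simplified 'simple' strategy).

    tokens: dicts with 'entity' (e.g. 'B-PER', 'I-PER', 'O'), 'score',
    and 'start'/'end' character offsets into `text`.
    """
    groups = []
    current = None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            current = None  # an outside token closes any open entity
            continue
        prefix, entity_type = tag.split("-", 1)
        if current is not None and prefix == "I" and entity_type == current["entity_group"]:
            # Continuation of the same entity: extend the span, keep the score
            current["scores"].append(tok["score"])
            current["end"] = tok["end"]
        else:
            # B- tag, or I- tag of a new type: start a new entity
            current = {"entity_group": entity_type, "scores": [tok["score"]],
                       "start": tok["start"], "end": tok["end"]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean token score
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]

text = "Steve Jobs founded Apple"
# Hypothetical per-token predictions (one token per word for simplicity)
tokens = [
    {"entity": "B-PER", "score": 0.99, "start": 0, "end": 5},
    {"entity": "I-PER", "score": 0.97, "start": 6, "end": 10},
    {"entity": "O", "score": 0.99, "start": 11, "end": 18},
    {"entity": "B-ORG", "score": 0.95, "start": 19, "end": 24},
]
for ent in group_bio_entities(tokens, text):
    print(ent)
```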

In [None]:
# Visualize NER results
import html

def visualize_ner(text, entities):
    """Create a simple HTML visualization of NER results."""
    
    # Color mapping for different entity types
    colors = {
        'PER': '#FFB6C1',      # Person - Light Pink
        'ORG': '#87CEEB',      # Organization - Sky Blue
        'LOC': '#98FB98',      # Location - Pale Green
        'MISC': '#DDA0DD',     # Miscellaneous - Plum
        'PERSON': '#FFB6C1',
        'ORGANIZATION': '#87CEEB',
        'LOCATION': '#98FB98'
    }
    
    # Build the HTML left to right, copying plain text between entities.
    # Slicing the original text (instead of using entity['word']) keeps the
    # exact casing and spacing; html.escape guards against stray markup.
    parts = []
    cursor = 0
    for entity in sorted(entities, key=lambda x: x['start']):
        start, end = entity['start'], entity['end']
        entity_type = entity['entity_group']
        score = entity['score']
        color = colors.get(entity_type, '#FFFFE0')  # Default to light yellow
        
        parts.append(html.escape(text[cursor:start]))
        parts.append(
            f'<span style="background-color: {color}; padding: 2px 4px; border-radius: 3px; font-weight: bold;" '
            f'title="{entity_type}: {score:.3f}">{html.escape(text[start:end])}</span>'
        )
        cursor = end
    
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)

# Example visualization
example_text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
example_entities = ner_pipeline(example_text)

print("NER Visualization Example:")
print("=" * 30)
print(f"Original: {example_text}")
print("\nVisualized (colors represent entity types):")

# Show color legend
legend_html = """
<div style='margin: 10px 0;'>
    <b>Legend:</b><br>
    <span style='background-color: #FFB6C1; padding: 2px 4px; border-radius: 3px;'>Person</span>
    <span style='background-color: #87CEEB; padding: 2px 4px; border-radius: 3px;'>Organization</span>
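    <span style='background-color: #98FB98; padding: 2px 4px; border-radius: 3px;'>Location</span>
    <span style='background-color: #DDA0DD; padding: 2px 4px; border-radius: 3px;'>Miscellaneous</span>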
</div>
"""

if example_entities:
    visualization = visualize_ner(example_text, example_entities)
    display(HTML(legend_html + "<div style='font-size: 14px; line-height: 1.5;'>" + visualization + "</div>"))
else:
    print("No entities found in the example.")

## 5. Question Answering Pipeline

Answer questions by extracting the relevant span from a given context passage:

In [None]:
# Question Answering Pipeline
print("Question Answering Pipeline")
print("=" * 32)

qa_pipeline = pipeline("question-answering", device=device)

# Provide context about Hugging Face
context = """
Hugging Face is an American company founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf. 
The company is headquartered in New York City with additional offices in Paris, France. 
Hugging Face is known for developing the Transformers library, which provides thousands of pre-trained models 
for natural language processing, computer vision, and audio tasks. The library supports both PyTorch and 
TensorFlow frameworks. In 2021, the company raised $40 million in Series B funding. 
The Hugging Face Hub hosts over 100,000 models and 10,000 datasets, making it the largest repository 
of open-source AI models. The company's mission is to democratize artificial intelligence.
"""

questions = [
    "When was Hugging Face founded?",
    "Who are the founders of Hugging Face?", 
    "Where is Hugging Face headquartered?",
    "What is the Transformers library?",
    "How much funding did they raise in 2021?",
    "How many models are on the Hugging Face Hub?",
    "What is Hugging Face's mission?",
    "Which frameworks does the library support?"
]

print(f"Context: {context.strip()[:100]}...\n")
print("Questions and Answers:")
print("-" * 25)

for i, question in enumerate(questions, 1):
    result = qa_pipeline(question=question, context=context)
    
    answer = result['answer']
    score = result['score']
    
    # Add confidence indicator
    confidence_emoji = "🟢" if score > 0.8 else "🟡" if score > 0.5 else "🔴"
    
    print(f"{i}. {confidence_emoji} Q: {question}")
    print(f"   A: {answer} (confidence: {score:.3f})")
    print()
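Extractive QA models don't generate text. For every context token they predict how likely it is to be the start and the end of the answer, and the pipeline picks the highest-scoring valid span (end not before start, at most `max_answer_len` tokens long, 15 by default) before mapping it back to character offsets, which is why each result also carries `start` and `end` keys. A plain-Python sketch of that span search, scoring a span as `p_start * p_end` over a toy token list with made-up logits:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start_idx, end_idx, score) maximizing p_start * p_end.

    Only spans with start <= end and length <= max_answer_len are allowed.
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best

tokens = ["Hugging", "Face", "was", "founded", "in", "2016", "."]
# Hypothetical logits a QA model might assign for "When was it founded?"
start_logits = [0.1, 0.0, -1.0, -0.5, 0.3, 5.0, -2.0]
end_logits = [-1.0, 0.2, -1.0, 0.0, -0.5, 4.5, 0.5]

i, j, score = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]), round(score, 3))
```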

## 6. Text Generation Pipeline

Generate creative text continuations:

In [None]:
# Text Generation Pipeline
print("Text Generation Pipeline")
print("=" * 28)

# Load text generation pipeline
generator = pipeline("text-generation", model="gpt2", device=device)

prompts = [
    "Artificial intelligence will",
    "The future of technology",
    "Machine learning models are",
    "In the year 2030,",
    "The most exciting thing about AI is"
]

print("Generating text completions...\n")

for i, prompt in enumerate(prompts, 1):
    print(f"{i}. Prompt: '{prompt}'")
    
    # Generate multiple variations
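    results = generator(
        prompt,
        max_new_tokens=30,  # Generate up to 30 new tokens after the prompt
        num_return_sequences=2,  # Return 2 different samples per prompt
        temperature=0.7,  # Values below 1.0 make sampling more conservative
        pad_token_id=generator.tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
        do_sample=True,  # Sample instead of greedy decoding (temperature only applies when sampling)
        truncation=True  # Truncate prompts longer than the model's maximum input length
    )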
    
    print("   Generated completions:")
    for j, result in enumerate(results, 1):
        generated_text = result['generated_text']
        # Extract only the new part (after the prompt)
        completion = generated_text[len(prompt):].strip()
        print(f"   {j}. '{prompt} {completion}'")
    print()
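The `temperature` argument matters here only because `do_sample=True`: before sampling, the next-token logits are divided by the temperature and passed through a softmax. Values below 1.0 sharpen the distribution (more predictable text) and values above 1.0 flatten it (more surprising text). A small sketch with made-up logits for three candidate tokens:

```python
import math

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
for t in (0.5, 0.7, 1.0, 1.5):
    probs = temperature_probs(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As an aside, passing `return_full_text=False` to the text-generation pipeline returns only the continuation, which makes the `generated_text[len(prompt):]` slicing above unnecessary.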

## 7. Fill-Mask Pipeline

Fill in missing words in sentences:

In [None]:
# Fill-Mask Pipeline
print("Fill-Mask Pipeline")
print("=" * 22)

fill_mask = pipeline("fill-mask", device=device)

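# The mask placeholder depends on the model ("<mask>" for the default RoBERTa-style
# model, "[MASK]" for BERT models), so read it from the tokenizer instead of hardcoding it
mask = fill_mask.tokenizer.mask_token

masked_sentences = [
    f"The capital of France is {mask}.",
    f"Machine learning is a branch of {mask}.",
    f"The {mask} is the largest planet in our solar system.",
    f"Python is a popular {mask} language.",
    f"{mask} revolutionized the field of natural language processing.",
    f"The iPhone was invented by {mask}."
]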

for i, sentence in enumerate(masked_sentences, 1):
    print(f"{i}. Sentence: '{sentence}'")
    
    # Ask for the top 3 predictions directly instead of slicing the results
    results = fill_mask(sentence, top_k=3)
    
    print("   Top predictions:")
    for j, result in enumerate(results, 1):
        token = result['token_str'].strip()
        score = result['score']
        filled_sentence = result['sequence']
        
        print(f"   {j}. '{token}' (score: {score:.3f})")
        print(f"      → '{filled_sentence}'")
    print()
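For each masked position, the fill-mask pipeline scores every vocabulary entry, applies a softmax, and returns the `top_k` most probable tokens (5 by default), each paired with the completed sentence. A toy version of that selection step; `fill_mask_top_k`, the four-word vocabulary, and its logits are all hypothetical:

```python
import heapq
import math

def fill_mask_top_k(template, mask_token, vocab_logits, top_k=5):
    """Return pipeline-style predictions for a single masked position.

    vocab_logits maps candidate token -> logit (made-up values here).
    """
    tokens = list(vocab_logits)
    m = max(vocab_logits.values())
    exps = {t: math.exp(vocab_logits[t] - m) for t in tokens}
    total = sum(exps.values())
    best = heapq.nlargest(top_k, tokens, key=lambda t: exps[t])
    return [
        {
            "token_str": t,
            "score": exps[t] / total,
            "sequence": template.replace(mask_token, t, 1),
        }
        for t in best
    ]

logits = {"Paris": 9.1, "Lyon": 5.2, "Marseille": 4.8, "London": 3.0}
for pred in fill_mask_top_k("The capital of France is <mask>.", "<mask>", logits, top_k=3):
    print(pred)
```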

## 8. Summarization Pipeline

Create concise summaries of long texts:

In [None]:
# Summarization Pipeline
print("Summarization Pipeline")
print("=" * 26)

summarizer = pipeline("summarization", device=device)

# Long article about AI
long_article = """
Artificial Intelligence (AI) has become one of the most transformative technologies of the 21st century, 
fundamentally changing how we work, communicate, and solve complex problems. The field of AI encompasses 
machine learning, deep learning, natural language processing, computer vision, and robotics. 

Machine learning, a subset of AI, enables computers to learn and improve from experience without being 
explicitly programmed. Deep learning, which uses neural networks with multiple layers, has been 
particularly successful in tasks such as image recognition, speech recognition, and language translation.

Natural language processing (NLP) has made significant strides, with models like GPT and BERT 
demonstrating remarkable capabilities in understanding and generating human language. These advances 
have led to applications such as chatbots, virtual assistants, automated translation services, and 
content generation tools.

Computer vision technology has enabled machines to interpret and understand visual information, 
leading to applications in autonomous vehicles, medical imaging, security systems, and quality 
control in manufacturing.

The impact of AI extends across various industries including healthcare, finance, education, 
entertainment, and transportation. In healthcare, AI assists in drug discovery, medical diagnosis, 
and personalized treatment plans. In finance, it helps with fraud detection, algorithmic trading, 
and risk assessment.

However, the rapid advancement of AI also raises important ethical considerations and challenges. 
Issues such as bias in AI systems, privacy concerns, job displacement, and the need for AI governance 
and regulation are becoming increasingly important topics of discussion among researchers, policymakers, 
and society at large.
"""

print(f"Original article length: {len(long_article)} characters")
print(f"Word count: approximately {len(long_article.split())} words")
print()

# Generate summaries of different lengths (max_length/min_length count tokens, not words or characters)
summary_configs = [
    {"max_length": 100, "min_length": 50, "name": "Short Summary"},
    {"max_length": 150, "min_length": 80, "name": "Medium Summary"},
    {"max_length": 200, "min_length": 120, "name": "Detailed Summary"}
]

for config in summary_configs:
    print(f"📄 {config['name']}:")
    
    summary = summarizer(
        long_article,
        max_length=config['max_length'],
        min_length=config['min_length'],
        do_sample=False  # Use deterministic summarization
    )
    
    summary_text = summary[0]['summary_text']
    
    print(f"   → {summary_text}")
    print(f"   Length: {len(summary_text)} characters")
    print(f"   Word count: {len(summary_text.split())} words\n")
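Summarization models have a fixed input limit (the default checkpoint for this pipeline is a DistilBART model, which accepts up to 1,024 tokens), so much longer documents are usually split into chunks that are summarized separately, with the partial summaries then joined or summarized again. Below is a minimal sketch of the splitting step. `chunk_words` is a hypothetical helper that counts words as a rough stand-in for tokens, which is an assumption; for exact limits you would count with the pipeline's own tokenizer.

```python
def chunk_words(text, max_words=400, overlap=50):
    """Split text into overlapping windows of at most max_words words.

    Word counts only approximate token counts; max_words and overlap are
    illustrative defaults, not values taken from transformers.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks

# Usage idea (not run here): summarize each chunk, then join the partial summaries
# partial = [summarizer(c, max_length=80, min_length=30)[0]["summary_text"] for c in chunk_words(long_article)]
sample = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(sample, max_words=400, overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])
```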