# 01 - Hugging Face Pipeline Basics

In this notebook, you'll learn:
- What a pipeline is and why it exists
- The internal workflow of a pipeline
- How to create and configure pipelines
- Key parameters and options
- Device management (CPU/GPU)

## Setup

First, let's install and import the necessary libraries.

In [None]:
# Install if needed (uncomment)
# !pip install transformers torch accelerate  # accelerate is needed for device_map="auto"

In [None]:
from transformers import pipeline, AutoTokenizer, AutoModel, AutoModelForSequenceClassification
import torch

## 1. What is a Pipeline?

A **Pipeline** is a high-level API that abstracts away the complexity of:
1. Loading the correct model and tokenizer
2. Preprocessing (tokenization, tensor conversion)
3. Running inference
4. Post-processing (decoding, formatting)

### The Pipeline Workflow

```
Raw Input → Tokenizer → Model → Post-processor → Output
    ↓           ↓         ↓           ↓            ↓
 "Hello"   [101,7592]   logits     softmax    POSITIVE 99%
```
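
The stages in the diagram can be sketched without any library. The toy class below is an illustration only: `ToyPipeline`, its word lists, and its keyword-counting "model" are hypothetical stand-ins, not how `transformers` implements a pipeline. It just shows how preprocessing, the forward pass, and post-processing chain together.

```python
import math


class ToyPipeline:
    """Toy stand-in for a sentiment pipeline (hypothetical, for illustration)."""

    POSITIVE_WORDS = {"love", "great", "fantastic"}
    NEGATIVE_WORDS = {"hate", "terrible", "worst"}
    LABELS = ["NEGATIVE", "POSITIVE"]

    def preprocess(self, text):
        # "Tokenize": lowercase, drop simple punctuation, split on whitespace
        return text.lower().replace("!", " ").replace(".", " ").split()

    def forward(self, tokens):
        # "Model": count keyword hits to produce one raw score per label
        negative = sum(token in self.NEGATIVE_WORDS for token in tokens)
        positive = sum(token in self.POSITIVE_WORDS for token in tokens)
        return [float(negative), float(positive)]

    def postprocess(self, logits):
        # Softmax turns raw scores into probabilities; report the best label
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        probs = [value / total for value in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": self.LABELS[best], "score": probs[best]}

    def __call__(self, text):
        # Same output shape as a real classifier: a list of dictionaries
        return [self.postprocess(self.forward(self.preprocess(text)))]


toy = ToyPipeline()
toy_result = toy("I love this!")
print(toy_result)
```

A real pipeline replaces each toy stage with a trained tokenizer and model, but the call pattern (one callable object, text in, list of dictionaries out) is the same.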

## 2. Creating Your First Pipeline

The simplest way to create a pipeline is to specify just the task.

In [None]:
# Create a sentiment analysis pipeline
# This automatically downloads the default model for this task
classifier = pipeline("sentiment-analysis")

# Use it!
result = classifier("I love learning about transformers and NLP!")
print(result)

### Understanding the Output

The output is a list of dictionaries. Each dictionary contains:
- `label`: The predicted class (e.g., POSITIVE, NEGATIVE)
- `score`: The model's probability for that label, between 0 and 1

In [None]:
# Let's examine the output structure
for item in result:
    print(f"Label: {item['label']}")
    print(f"Score: {item['score']:.4f}")
    print(f"Confidence: {item['score']*100:.2f}%")
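
Because the output is a plain list of dictionaries, ordinary Python is enough to work with it. As a small sketch, the helper below keeps only predictions above a confidence threshold. The function name `confident_predictions`, the threshold, and the sample scores are made up for illustration; only the `label`/`score` shape comes from the pipeline.

```python
def confident_predictions(results, threshold=0.9):
    """Return only the predictions whose score is at least `threshold`."""
    return [item for item in results if item["score"] >= threshold]


# Made-up results in the same shape the classifier returns
sample_results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.6120},
]

kept = confident_predictions(sample_results, threshold=0.9)
print(kept)
```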

## 3. What's Happening Under the Hood?

Let's break down what the pipeline does internally.

In [None]:
# WITHOUT a pipeline - doing it manually:

# Step 1: Load tokenizer and model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Step 2: Tokenize input
text = "I love learning about transformers!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

print("Tokenized inputs:")
print(f"Input IDs: {inputs['input_ids']}")
print(f"Attention Mask: {inputs['attention_mask']}")

In [None]:
# Step 3: Run model inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

print(f"Raw logits: {logits}")

In [None]:
# Step 4: Post-process (softmax to get probabilities)
probabilities = torch.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0][predicted_class].item()

# Map to label
labels = model.config.id2label
print(f"\nPredicted: {labels[predicted_class]} ({confidence:.4f})")

print("\n" + "="*50)
print("That was ~15 lines of code. With pipeline: 2 lines!")

## 4. Pipeline Configuration Options

Pipelines are highly configurable. Here are the key parameters:

In [None]:
# Option 1: Specify a different model
classifier_v2 = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment"  # 5-star rating model
)

result = classifier_v2("This product is absolutely fantastic!")
print(f"5-star model result: {result}")

In [None]:
# Option 2: Specify model AND tokenizer separately
classifier_custom = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    tokenizer="distilbert-base-uncased-finetuned-sst-2-english"
)

# This is useful when you have custom tokenizers or fine-tuned models

In [None]:
# Option 3: Use pre-loaded model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Pass them to pipeline
classifier_preloaded = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer
)

print(classifier_preloaded("This is really useful for production!"))

## 5. Device Management (CPU/GPU)

Pipelines can run on CPU or GPU. Here's how to control this:

In [None]:
# Check if GPU is available
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Method 1: Explicit device ID
# device=0 means first GPU, device=-1 means CPU
if torch.cuda.is_available():
    classifier_gpu = pipeline("sentiment-analysis", device=0)
    print("Running on GPU!")
else:
    classifier_cpu = pipeline("sentiment-analysis", device=-1)
    print("Running on CPU")
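
One way to avoid keeping two differently named pipelines is to compute the device index first and pass it in. The helper below is a sketch; `pick_device` is a hypothetical name, and it takes the availability flag as an argument so it can be tried without a GPU.

```python
def pick_device(cuda_available, gpu_index=0):
    """Map a CUDA-availability flag to the integer the `device` argument expects."""
    # -1 selects the CPU; a non-negative integer selects that GPU
    return gpu_index if cuda_available else -1


print(pick_device(True))
print(pick_device(False))
```

In the notebook this would be used as `pipeline("sentiment-analysis", device=pick_device(torch.cuda.is_available()))`.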

In [None]:
# Method 2: Automatic device mapping (great for large models)
# device_map="auto" places the model on the available devices and splits it
# across them if it does not fit on one. Requires the `accelerate` package.
generator = pipeline(
    "text-generation",
    model="gpt2",
    device_map="auto"  # Let accelerate decide where the weights go
)

print("Model loaded with automatic device mapping")

## 6. Batch Processing

Pipelines can efficiently process multiple inputs at once.

In [None]:
# Single input
single_result = classifier("I love this!")

# Batch input - just pass a list!
texts = [
    "I absolutely love this product!",
    "This is terrible, worst purchase ever.",
    "It's okay, nothing special.",
    "Amazing quality, highly recommended!",
    "Not worth the money."
]

batch_results = classifier(texts)

# Display results
for text, result in zip(texts, batch_results):
    print(f"{result['label']:8} ({result['score']:.2f}): {text[:40]}...")

In [None]:
# Control batch size for memory management
large_batch = texts * 100  # 500 texts

# Process with explicit batch size
results = classifier(large_batch, batch_size=16)  # Process 16 at a time

print(f"Processed {len(results)} texts")
print(f"Sample result: {results[0]}")
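
Conceptually, `batch_size=16` means the inputs are consumed in slices of 16. The sketch below shows only that slicing idea in plain Python; `chunked` is a hypothetical helper, and the library's own batching also handles padding and other details that are left out here.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


inputs = [f"text {i}" for i in range(500)]
batches = list(chunked(inputs, batch_size=16))

# 500 inputs in slices of 16: 31 full batches plus one partial batch
print(len(batches), len(batches[-1]))
```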

## 7. Pipeline Parameters at Inference Time

Many parameters can be set when calling the pipeline:

In [None]:
# Example with text generation - many inference parameters
generator = pipeline("text-generation", model="gpt2")

# Basic generation
result = generator("The future of AI is")
print("Basic:", result[0]['generated_text'])
print()

In [None]:
# With parameters
result = generator(
    "The future of AI is",
    max_length=50,           # Maximum total length in tokens (prompt + generated)
    num_return_sequences=3,  # Generate 3 different completions
    temperature=0.7,         # Control randomness (lower = more deterministic)
    top_k=50,               # Only sample from top 50 tokens
    top_p=0.95,             # Nucleus sampling
    do_sample=True          # Enable sampling (vs greedy decoding)
)

print("Generated sequences:")
for i, seq in enumerate(result):
    print(f"\n{i+1}. {seq['generated_text']}")
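
To build intuition for `temperature`, `top_k`, and `top_p`, here is a simplified pure-Python sketch of how they shape the choice of a single next token. It is not the library's implementation: the four logits are invented, and `softmax` and `filter_top_k_top_p` are hypothetical helpers written for this example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the result."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k tokens, then the smallest prefix reaching top_p mass."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)  # renormalize after the top-k cut
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # invented scores for a 4-token vocabulary
probs = softmax(toy_logits, temperature=0.7)
candidates = filter_top_k_top_p(probs, top_k=3, top_p=0.95)

# Sample one token index from the surviving candidates
rng = random.Random(0)
next_token = rng.choices(list(candidates), weights=list(candidates.values()))[0]
print(candidates, next_token)
```

Greedy decoding (`do_sample=False`) would skip the sampling step and always take the highest-probability token.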

## 8. Accessing Pipeline Components

You can access the underlying model and tokenizer:

In [None]:
classifier = pipeline("sentiment-analysis")

# Access the model
print(f"Model: {type(classifier.model).__name__}")
print(f"Model config: {classifier.model.config.model_type}")

# Access the tokenizer
print(f"\nTokenizer: {type(classifier.tokenizer).__name__}")
print(f"Vocab size: {classifier.tokenizer.vocab_size}")

# Access device info
print(f"\nDevice: {classifier.device}")

In [None]:
# You can even use the tokenizer directly
tokens = classifier.tokenizer("Hello world!")
print(f"Tokens: {tokens}")

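# Decode back (the result includes special tokens such as [CLS] and [SEP];
# pass skip_special_tokens=True to decode() to drop them)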
decoded = classifier.tokenizer.decode(tokens['input_ids'])
print(f"Decoded: {decoded}")

## 9. Memory Management and Model Loading

Tips for managing memory when working with pipelines:

In [None]:
import gc

# Tip 1: Use half precision to reduce memory
if torch.cuda.is_available():
    classifier_fp16 = pipeline(
        "sentiment-analysis",
        torch_dtype=torch.float16,  # Half precision
        device=0
    )
    print("Loaded model in FP16 (half precision)")
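
A rough back-of-the-envelope estimate shows why half precision helps: each weight takes 2 bytes instead of 4. The sketch below counts weights only (no activations or framework overhead), and the 66 million parameter figure is an approximate size for a DistilBERT-base model used here as an assumption; `weight_memory_mb` is a hypothetical helper.

```python
def weight_memory_mb(num_parameters, bytes_per_parameter):
    """Approximate memory for the weights alone, in megabytes (10^6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


approx_params = 66_000_000  # approximate DistilBERT-base size (assumption)

fp32_mb = weight_memory_mb(approx_params, 4)  # float32: 4 bytes per weight
fp16_mb = weight_memory_mb(approx_params, 2)  # float16: 2 bytes per weight
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")
```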

In [None]:
# Tip 2: Delete pipeline and free memory when done
del classifier
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
print("Memory freed!")

In [None]:
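# Tip 3: Use smaller models for development
# DistilBERT is ~40% smaller than BERT and retains about 97% of its
# language-understanding performance (as reported by its authors).
# Use a checkpoint fine-tuned for classification: the base
# "distilbert-base-uncased" has no trained classification head.
small_classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english"  # Small, fine-tuned model
)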

## 10. Summary: Pipeline Creation Patterns

Here's a reference of the different ways to create pipelines:

In [None]:
# Pattern 1: Task only (uses default model)
pipe1 = pipeline("sentiment-analysis")

# Pattern 2: Task + specific model
pipe2 = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Pattern 3: Task + model + tokenizer
pipe3 = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    tokenizer="distilbert-base-uncased-finetuned-sst-2-english"
)

# Pattern 4: Full configuration
pipe4 = pipeline(
    task="sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    tokenizer="distilbert-base-uncased-finetuned-sst-2-english",
    device=0 if torch.cuda.is_available() else -1,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    batch_size=8
)

print("All patterns work! Choose based on your needs.")

## 🎯 Key Takeaways

1. **Pipelines are abstractions** - They wrap tokenizer + model + post-processing
2. **Minimal code** - 2-3 lines vs 15-20 lines manually
3. **Configurable** - Choose model, device, precision, batch size
4. **Automatic downloads** - Models come from Hugging Face Hub
5. **Batch processing** - Pass a list for efficient processing
6. **Access internals** - `.model` and `.tokenizer` attributes available

## Next Steps

Continue to [02_nlp_pipelines.ipynb](02_nlp_pipelines.ipynb) to explore all NLP pipeline types!