<a href="https://colab.research.google.com/github/krishnayah/urp-snippets/blob/main/Benchmarking_Pipeline_vs_Manual.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Testing Pipeline vs Manual on classifier

In [1]:
import time
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Generating Prompts

In [3]:

# Words!
adjectives = [
    "ancient", "futuristic", "mysterious", "brilliant", "dark", "colorful",
    "peaceful", "chaotic", "lonely", "vibrant", "silent", "melancholic",
    "red", "blue", "golden", "silver", "emerald", "gentle", "stormy", "radiant"
]

nouns = [
    "city", "forest", "ocean", "planet", "dream", "creature", "robot",
    "painting", "poem", "machine", "castle", "storm", "garden", "ship",
    "dimension", "universe", "song", "memory", "light", "shadow"
]

actions = [
    "describe", "explain", "analyze", "summarize", "imagine", "narrate",
    "predict", "compare", "design", "debate", "reimagine", "illustrate",
    "explore", "evaluate", "invent", "critique", "compose", "argue", "translate", "simulate"
]

# Generate a lot of prompts
num_prompts = 2500
prompts = []
for _ in range(num_prompts):
    adj = random.choice(adjectives)
    noun = random.choice(nouns)
    action = random.choice(actions)

    prompt_type = random.choice([
        f"{action.capitalize()} a {adj} {noun}.",
        f"Write a story about a {adj} {noun}.",
        f"What would happen if a {adj} {noun} could think?",
        f"Give three facts about a {adj} {noun}.",
        f"Explain the significance of a {adj} {noun} in society.",
        f"Imagine discovering a {adj} {noun} — describe it in detail."
    ])

    prompts.append(prompt_type)

# for the classifier
candidate_labels = [
    "Unsafe or harmful content",
    "Refusal or safe alternative"
]


# Expand each premise × label for the manual approach
pairs = [(p, l) for p in prompts for l in candidate_labels]
texts = [p for p, _ in pairs]
labels = [l for _, l in pairs]
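
One difference to keep in mind: the zero-shot pipeline benchmarked later wraps each candidate label in a hypothesis template (its default is `"This example is {}."`) before pairing it with the prompt, while the pairs above use the raw label text as the second sequence. A minimal sketch of building matching inputs for the manual path, assuming the default template; `build_nli_pairs` is a hypothetical helper, not part of `transformers`:

```python
def build_nli_pairs(prompts, candidate_labels, hypothesis_template="This example is {}."):
    """Return (premises, hypotheses), one entry per prompt x label pair.

    Assumption: the template mirrors the zero-shot pipeline's default;
    swap in another string if the pipeline is called with a custom one.
    """
    premises, hypotheses = [], []
    for prompt in prompts:
        for label in candidate_labels:
            premises.append(prompt)
            hypotheses.append(hypothesis_template.format(label))
    return premises, hypotheses


# Small illustrative input (not the benchmark data)
demo_premises, demo_hypotheses = build_nli_pairs(
    ["Describe a golden city."],
    ["Unsafe or harmful content", "Refusal or safe alternative"],
)
```

The two returned lists can be passed to the tokenizer in place of `texts` and `labels`, so both approaches would score the same premise/hypothesis strings.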



# Benchmark Manually
Much of this code is adapted from basic examples.

In [6]:
# Load the tokenizer and model within this cell
model_id = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to("cuda")


start = time.time()

# Tokenize all pairs at once: 2,500 prompts × 2 labels = 5,000 sequences in a single batch
inputs = tokenizer(texts, labels, return_tensors="pt", padding=True, truncation=True).to("cuda")

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits → entailment probabilities
# Keep contradiction (index 0) and entailment (index 2); drop neutral (index 1)
entail_contr_logits = logits[:, [0, 2]]
probs = F.softmax(entail_contr_logits, dim=1)[:, 1]  # P(entailment) for each pair
probs = probs.reshape(len(prompts), len(candidate_labels))

elapsed = time.time() - start

print(f"Batch size on manual: {len(prompts)} | Time: {elapsed:.2f}s")

Batch size on manual: 2500 | Time: 4.62s
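
A caveat on the timing itself: CUDA kernels are launched asynchronously, so reading the clock right after the forward pass can under-report GPU time unless something forces a sync first. The numbers in this notebook were recorded without an explicit sync, and I have not measured how much that changes them. A minimal sketch of a sync-aware timer; `timed` is a hypothetical helper, and on a GPU the `sync` argument would be `torch.cuda.synchronize`:

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, seconds).

    sync is an optional zero-argument callable that blocks until queued
    device work has finished (e.g. torch.cuda.synchronize on a GPU).
    """
    if sync is not None:
        sync()  # drain any work queued before the timer starts
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for work launched by fn() before reading the clock
    return result, time.perf_counter() - start


# CPU-only demo so the sketch runs anywhere; no sync needed here.
total, seconds = timed(lambda: sum(range(1000)))
```

Wrapping both the manual forward pass and the pipeline call in the same helper would keep the two measurements comparable.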


In [8]:
torch.cuda.empty_cache()

# Benchmark on Pipeline

In [5]:
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0,          # use GPU
    batch_size=2500       # increase?
)

start = time.time()
pipeline_results = classifier(prompts, candidate_labels)
elapsed = time.time() - start

print(f"Batch size on pipeline: {len(prompts)} | Time: {elapsed:.2f}s")

Device set to use cuda:0


Batch size on pipeline: 2500 | Time: 6.82s
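
The scores from the two runs are also normalized differently, as far as I understand the pipeline's behavior. The manual cell scores each label on its own (entailment vs. contradiction), which corresponds to calling the pipeline with `multi_label=True`. With the default `multi_label=False`, the pipeline instead applies a softmax over the entailment logits across all candidate labels. A minimal pure-Python sketch of the two normalizations, using made-up logits purely for illustration:

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def per_label_scores(logits_per_label):
    """Independent scoring: entailment vs. contradiction for each label.

    Each row is [contradiction, neutral, entailment].
    """
    return [softmax([row[0], row[2]])[1] for row in logits_per_label]


def across_label_scores(logits_per_label):
    """Competing scoring: softmax of entailment logits across labels."""
    return softmax([row[2] for row in logits_per_label])


# Made-up logits for one prompt and two labels (not model output)
example_logits = [[1.0, 0.0, 3.0], [0.5, 0.0, 2.0]]
independent = per_label_scores(example_logits)   # need not sum to 1
competing = across_label_scores(example_logits)  # sums to 1
```

So before comparing `probs` against `pipeline_results`, it is worth deciding which of the two scoring modes the reward function actually needs.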


# KEY TAKEAWAY: USE PIPELINES

The pipeline is slightly slower than the manual approach, by about two seconds (6.82s vs. 4.62s for 2,500 prompts), but it takes care of cache handling for me and lets me specify a batch size.

**This will be incredibly important if I use it within a reward function, since the GPU will be fine-tuning and caching the new model at the same time.** Manually freeing the CUDA cache may interfere with that.

And since the pipeline lets me specify a batch size, I will always have granular control over how much data is handled at once (in case fine-tuning ends up taking a lot of memory).
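
For reference, the same batch-size control could be bolted onto the manual path by chunking the inputs. This is only a sketch under assumptions: `iter_batches` and `score_in_batches` are hypothetical helpers, and `score_fn` stands in for whatever scores one batch of prompts (the manual forward pass or a pipeline call).

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(prompts, score_fn, batch_size=64):
    """Score prompts chunk by chunk so only one batch is in flight at a time.

    score_fn takes a list of prompts and returns one score per prompt.
    """
    scores = []
    for batch in iter_batches(prompts, batch_size):
        scores.extend(score_fn(batch))
    return scores


# Stand-in scorer (string length) so the sketch runs without a model
demo_scores = score_in_batches(
    ["a", "bb", "ccc"],
    lambda batch: [len(p) for p in batch],
    batch_size=2,
)
```

Lowering `batch_size` trades throughput for a smaller peak memory footprint, which is the knob that matters when the classifier shares the GPU with fine-tuning.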
