# Lab 2: Few-Shot Learning Experiments

**Week 2 - Prompt Engineering & LLM Basics**

**Provided by:** ADC ENGINEERING & CONSULTING LTD

## Objectives

In this lab, you will:
- Master zero-shot, one-shot, and few-shot learning techniques
- Learn when to use each learning paradigm
- Design effective few-shot examples
- Build few-shot classifiers and extractors
- Understand example selection strategies
- Implement dynamic few-shot prompting
- Measure and optimize few-shot performance

## Prerequisites

- Completed Lab 1: Basic Prompt Engineering
- Understanding of prompt structure
- OpenAI API key configured
- Python 3.9+

## Setup and Installation

In [None]:
# Install required packages
!pip install openai python-dotenv tiktoken numpy scikit-learn --quiet

In [None]:
import os
import json
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass
import random

from openai import OpenAI
from dotenv import load_dotenv
import tiktoken

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✓ Setup complete!")

## Part 1: Understanding Learning Paradigms

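Let's explore the three main prompting paradigms and when to use each. In all three, the "learning" happens in context: the examples live in the prompt, and the model's weights are never updated.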

### Zero-Shot Learning

No examples provided - the model relies entirely on its pre-training.

**Best for:**
- Simple, well-defined tasks
- Tasks the model has seen during training
- When you don't have examples
- General knowledge questions

In [None]:
def zero_shot_example():
    """Demonstrate zero-shot learning."""
    
    tasks = [
        {
            "name": "Sentiment Analysis",
            "prompt": "Classify the sentiment of this review as positive, negative, or neutral:\n\nReview: The product works well but shipping took forever.\n\nSentiment:"
        },
        {
            "name": "Translation",
            "prompt": "Translate to French: 'Good morning, how are you?'"
        },
        {
            "name": "Question Answering",
            "prompt": "Question: What is the capital of France?\nAnswer:"
        },
        {
            "name": "Text Classification",
            "prompt": "Classify this email as 'spam' or 'not spam':\n\nEmail: Congratulations! You've won $1,000,000! Click here now!\n\nClassification:"
        }
    ]
    
    for task in tasks:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": task["prompt"]}],
            temperature=0.3,
            max_tokens=100
        )
        
        print(f"\n{'='*80}")
        print(f"Task: {task['name']}")
        print(f"{'='*80}")
        print(f"Prompt:\n{task['prompt']}\n")
        print(f"Response:\n{response.choices[0].message.content}")

zero_shot_example()

### One-Shot Learning

Provide a single example to demonstrate the task.

**Best for:**
- Tasks where format needs clarification
- Showing desired output structure
- Simple pattern matching

In [None]:
def one_shot_example():
    """Demonstrate one-shot learning."""
    
    # Task: Extract structured information from text
    
    prompt = """
    Extract the person's name, age, and occupation from the text.
    
    Example:
    Text: "John Smith is a 35-year-old software engineer living in Seattle."
    Output: {"name": "John Smith", "age": 35, "occupation": "software engineer"}
    
    Now extract from this text:
    Text: "Maria Garcia, age 28, works as a data scientist."
    Output:
    """
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3
    )
    
    print("One-Shot Learning Example:")
    print("="*80)
    print(f"Prompt:\n{prompt}\n")
    print(f"Response:\n{response.choices[0].message.content}")

one_shot_example()
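
One detail worth noticing in the cells above: a triple-quoted prompt written inside a function keeps the function body's indentation, so every line sent to the model starts with four spaces. This usually does no harm, but it spends tokens and can leak into formatted output. A minimal sketch of normalizing such a prompt with the standard library's `textwrap.dedent` (the names `clean_prompt` and `demo` are illustrative and not part of the lab code):

```python
import textwrap


def clean_prompt(raw: str) -> str:
    """Remove common leading indentation and surrounding blank lines."""
    return textwrap.dedent(raw).strip()


def demo() -> str:
    # The prompt is indented to match the function body, as in the lab cells.
    raw_prompt = """
    Extract the person's name and age from the text.

    Text: "Maria Garcia, age 28, works as a data scientist."
    Output:
    """
    return clean_prompt(raw_prompt)


cleaned = demo()
print(cleaned)
```

Whether the extra whitespace changes model behavior depends on the model and the task, so treat this as a tidiness measure rather than a guaranteed accuracy improvement.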

### Few-Shot Learning

Provide multiple examples (typically 2-10) to establish a pattern.

**Best for:**
- Complex or ambiguous tasks
- Domain-specific requirements
- Consistent formatting needs
- Tasks with edge cases
- Custom classification categories

In [None]:
def few_shot_example():
    """Demonstrate few-shot learning."""
    
    # Task: Classify customer support tickets by priority
    
    prompt = """
    Classify customer support tickets as 'high', 'medium', or 'low' priority.
    
    Examples:
    
    Ticket: "The entire application is down and users cannot log in."
    Priority: high
    
    Ticket: "Can you add a dark mode feature?"
    Priority: low
    
    Ticket: "Payment processing is slow, taking 30+ seconds."
    Priority: medium
    
    Ticket: "Some users reporting intermittent issues with file uploads."
    Priority: medium
    
    Ticket: "Typo in the footer copyright year."
    Priority: low
    
    Now classify:
    Ticket: "Database connection failing, affecting all transactions."
    Priority:
    """
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3
    )
    
    print("Few-Shot Learning Example:")
    print("="*80)
    print(f"Prompt:\n{prompt}\n")
    print(f"Response:\n{response.choices[0].message.content}")

few_shot_example()
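
The three paradigms differ only in how many worked examples precede the query. A minimal sketch of a single builder that covers all three cases; the function name `build_shot_prompt` and its `Input:`/`Output:` layout are assumptions made for illustration, not part of the lab code:

```python
from typing import List, Tuple


def build_shot_prompt(instruction: str,
                      examples: List[Tuple[str, str]],
                      query: str) -> str:
    """Build a zero-, one-, or few-shot prompt depending on len(examples)."""
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # The query uses the same layout as the examples, with the output left open.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


tickets = [
    ("The entire application is down and users cannot log in.", "high"),
    ("Can you add a dark mode feature?", "low"),
    ("Payment processing is slow, taking 30+ seconds.", "medium"),
]
instruction = "Classify the support ticket as 'high', 'medium', or 'low' priority."
query = "Database connection failing, affecting all transactions."

zero_shot = build_shot_prompt(instruction, [], query)
one_shot = build_shot_prompt(instruction, tickets[:1], query)
few_shot = build_shot_prompt(instruction, tickets, query)

for name, prompt in [("zero", zero_shot), ("one", one_shot), ("few", few_shot)]:
    print(f"{name}-shot prompt contains {prompt.count('Input:') - 1} example(s)")
```

Keeping the instruction and query fixed while varying only the example list makes it easier to attribute any change in the model's answer to the examples themselves.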

### Exercise 1.1: Compare Learning Paradigms

For the task of extracting action items from meeting notes, implement all three paradigms and compare:

In [None]:
# TODO: Implement all three paradigms for extracting action items

meeting_note = """
Team discussed the Q4 roadmap. Sarah will prepare the budget proposal by Friday. 
John mentioned we need to upgrade the server infrastructure. Mike volunteered to 
research cloud providers and present options next week. We should schedule a 
follow-up meeting for December 5th.
"""

# Zero-shot version
zero_shot_prompt = """
# TODO: Create a zero-shot prompt to extract action items
"""

# One-shot version
one_shot_prompt = """
# TODO: Create a one-shot prompt with one example
"""

# Few-shot version
few_shot_prompt = """
# TODO: Create a few-shot prompt with 3-4 examples
"""

# Test each version
# for name, prompt in [("Zero-shot", zero_shot_prompt), 
#                       ("One-shot", one_shot_prompt), 
#                       ("Few-shot", few_shot_prompt)]:
#     if prompt.strip() and not prompt.strip().startswith("# TODO"):
#         response = client.chat.completions.create(
#             model="gpt-3.5-turbo",
#             messages=[{"role": "user", "content": prompt}],
#             temperature=0.3
#         )
#         print(f"\n{name}:")
#         print("="*80)
#         print(response.choices[0].message.content)

## Part 2: Designing Effective Few-Shot Examples

The quality of your examples largely determines the quality of the results. Three principles help: diversity, consistent formatting, and representativeness.

### Principle 1: Diverse Examples

Examples should cover different scenarios and edge cases.

In [None]:
def compare_example_diversity():
    """Compare homogeneous vs diverse examples."""
    
    # Task: Sentiment classification
    
    # Homogeneous examples (all similar)
    homogeneous_prompt = """
    Classify sentiment as positive, negative, or neutral.
    
    Examples:
    Text: "Great product!"
    Sentiment: positive
    
    Text: "Love it!"
    Sentiment: positive
    
    Text: "Amazing quality!"
    Sentiment: positive
    
    Now classify:
    Text: "It's okay but shipping was slow and packaging damaged."
    Sentiment:
    """
    
    # Diverse examples (varied scenarios)
    diverse_prompt = """
    Classify sentiment as positive, negative, or neutral.
    
    Examples:
    Text: "Great product, fast shipping!"
    Sentiment: positive
    
    Text: "Terrible quality, broke after one use."
    Sentiment: negative
    
    Text: "It works but nothing special."
    Sentiment: neutral
    
    Text: "Good quality but overpriced."
    Sentiment: neutral
    
    Text: "Arrived damaged, disappointed."
    Sentiment: negative
    
    Now classify:
    Text: "It's okay but shipping was slow and packaging damaged."
    Sentiment:
    """
    
    for name, prompt in [("Homogeneous Examples", homogeneous_prompt),
                         ("Diverse Examples", diverse_prompt)]:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        
        print(f"\n{'='*80}")
        print(f"{name}:")
        print(f"{'='*80}")
        print(f"Response: {response.choices[0].message.content}")

compare_example_diversity()
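
Before sending a prompt, it can help to check how its examples are distributed across the labels. A minimal sketch, using the homogeneous and diverse example sets from the cell above; the helper `label_coverage` is hypothetical and only counts labels, so it says nothing about diversity of wording or topic:

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_coverage(examples: List[Tuple[str, str]],
                   labels: List[str]) -> Dict[str, object]:
    """Count examples per label and list the labels that have no example."""
    counts = Counter(label for _, label in examples)
    return {
        "counts": {label: counts.get(label, 0) for label in labels},
        "missing": [label for label in labels if counts.get(label, 0) == 0],
    }


labels = ["positive", "negative", "neutral"]

homogeneous = [
    ("Great product!", "positive"),
    ("Love it!", "positive"),
    ("Amazing quality!", "positive"),
]
diverse = [
    ("Great product, fast shipping!", "positive"),
    ("Terrible quality, broke after one use.", "negative"),
    ("It works but nothing special.", "neutral"),
    ("Good quality but overpriced.", "neutral"),
    ("Arrived damaged, disappointed.", "negative"),
]

homogeneous_report = label_coverage(homogeneous, labels)
diverse_report = label_coverage(diverse, labels)
print("Homogeneous:", homogeneous_report)
print("Diverse:", diverse_report)
```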

### Principle 2: Clear and Consistent Format

Examples must follow a consistent structure.

In [None]:
def test_format_consistency():
    """Test consistent vs inconsistent formatting."""
    
    # Inconsistent format
    inconsistent_prompt = """
    Extract product name and price:
    
    "Buy the SuperWidget for $49.99"
    Product: SuperWidget, Price: $49.99
    
    "The MegaGadget costs 79 dollars"
    MegaGadget: $79.00
    
    From: "Get the UltraDevice now, only $129"
    Output:
    """
    
    # Consistent format
    consistent_prompt = """
    Extract product name and price:
    
    Input: "Buy the SuperWidget for $49.99"
    Product: SuperWidget
    Price: $49.99
    
    Input: "The MegaGadget costs 79 dollars"
    Product: MegaGadget
    Price: $79.00
    
    Input: "Premium bundle at $299.99 includes the ProTool"
    Product: ProTool
    Price: $299.99
    
    Input: "Get the UltraDevice now, only $129"
    Product:
    """
    
    for name, prompt in [("Inconsistent", inconsistent_prompt),
                         ("Consistent", consistent_prompt)]:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        
        print(f"\n{'='*80}")
        print(f"{name} Format:")
        print(f"{'='*80}")
        print(response.choices[0].message.content)

test_format_consistency()
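
A consistent `Key: value` layout also makes the reply easier to parse. The sketch below assumes the model answers with one `Key: value` pair per line; that is an assumption, since real replies can deviate. Because the consistent prompt ends with `Product:`, the reply may begin with the bare product name, so the hypothetical helper `parse_key_value_reply` assigns a first line without a key to `Product`:

```python
from typing import Dict


def parse_key_value_reply(reply: str, first_key: str = "Product") -> Dict[str, str]:
    """Parse 'Key: value' lines; a first line with no key goes to first_key."""
    fields: Dict[str, str] = {}
    for index, line in enumerate(reply.strip().splitlines()):
        line = line.strip()
        if not line:
            continue
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
        elif index == 0:
            # The prompt already supplied "Product:", so this is the bare value.
            fields[first_key] = line
    return fields


# Hand-written replies for illustration, not real model output.
bare_reply = "UltraDevice\nPrice: $129.00"
full_reply = "Product: UltraDevice\nPrice: $129.00"

print(parse_key_value_reply(bare_reply))
print(parse_key_value_reply(full_reply))
```

With the inconsistent examples there is no single layout to parse against, which is one practical cost of mixing formats.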

### Principle 3: Representative Examples

Examples should match your real use cases.

In [None]:
def test_representative_examples():
    """Test examples that match vs don't match real use case."""
    
    # Real use case: Classifying technical support tickets
    real_ticket = "API returning 503 errors intermittently since morning deployment."
    
    # Non-representative examples (general categories)
    non_representative = """
    Classify ticket category:
    
    Ticket: "How do I reset my password?"
    Category: account
    
    Ticket: "My payment didn't go through"
    Category: billing
    
    Ticket: "The app looks different"
    Category: ui
    
    Classify:
    Ticket: "API returning 503 errors intermittently since morning deployment."
    Category:
    """
    
    # Representative examples (technical categories)
    representative = """
    Classify technical ticket category:
    
    Ticket: "Database queries timing out after 30 seconds"
    Category: performance
    
    Ticket: "API returning 503 errors intermittently"
    Category: infrastructure
    
    Ticket: "User authentication tokens expiring too quickly"
    Category: security
    
    Ticket: "Memory leak in background job processor"
    Category: performance
    
    Classify:
    Ticket: "API returning 503 errors intermittently since morning deployment."
    Category:
    """
    
    for name, prompt in [("Non-Representative", non_representative),
                         ("Representative", representative)]:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        
        print(f"\n{'='*80}")
        print(f"{name} Examples:")
        print(f"{'='*80}")
        print(response.choices[0].message.content)

test_representative_examples()

### Exercise 2.1: Design Quality Examples

Create high-quality few-shot examples for this task:

In [None]:
# TODO: Design quality few-shot examples

# Task: Extract structured data from job postings
# Extract: job_title, company, location, salary_range, required_experience

job_posting = """
Senior Python Developer - TechCorp - Remote (US only)
We're looking for an experienced Python developer with 5+ years of experience.
Competitive salary: $120k-$160k depending on experience.
Must have Django and PostgreSQL experience.
"""

# TODO: Create a few-shot prompt with 3-4 diverse, consistent, representative examples
# Consider:
# - Different job types (junior, senior, remote, on-site)
# - Various salary formats ($X-Y, $X+, negotiable, not specified)
# - Different experience requirements
# - Consistent JSON output format

few_shot_prompt = """
# TODO: Your few-shot prompt here
"""

# Test your prompt
# if few_shot_prompt.strip() and not few_shot_prompt.strip().startswith("# TODO"):
#     response = client.chat.completions.create(
#         model="gpt-3.5-turbo",
#         messages=[{"role": "user", "content": few_shot_prompt}],
#         temperature=0.3
#     )
#     print("Extracted Data:")
#     print("="*80)
#     print(response.choices[0].message.content)

## Part 3: Building Few-Shot Classifiers

Create robust classifiers using few-shot learning.

In [None]:
@dataclass
class Example:
    """A training example for few-shot learning."""
    input_text: str
    output_label: str
    metadata: Optional[Dict] = None

class FewShotClassifier:
    """
    A few-shot learning classifier.
    """
    
    def __init__(self, task_description: str, labels: List[str]):
        """
        Initialize classifier.
        
        Args:
            task_description: Description of the classification task
            labels: List of possible labels
        """
        self.task_description = task_description
        self.labels = labels
        self.examples: List[Example] = []
    
    def add_example(self, input_text: str, output_label: str, metadata: Optional[Dict] = None):
        """Add a training example."""
        if output_label not in self.labels:
            raise ValueError(f"Label {output_label} not in {self.labels}")
        
        self.examples.append(Example(input_text, output_label, metadata))
    
    def add_examples(self, examples: List[Tuple[str, str]]):
        """Add multiple examples."""
        for input_text, output_label in examples:
            self.add_example(input_text, output_label)
    
    def build_prompt(self, input_text: str) -> str:
        """Build the few-shot prompt."""
        prompt_parts = [
            f"Task: {self.task_description}",
            f"Labels: {', '.join(self.labels)}",
            "",
            "Examples:",
            ""
        ]
        
        # Add examples
        for example in self.examples:
            prompt_parts.append(f"Input: {example.input_text}")
            prompt_parts.append(f"Label: {example.output_label}")
            prompt_parts.append("")
        
        # Add input to classify
        prompt_parts.append("Now classify:")
        prompt_parts.append(f"Input: {input_text}")
        prompt_parts.append("Label:")
        
        return "\n".join(prompt_parts)
    
    def classify(self, input_text: str, temperature: float = 0.3) -> Dict:
        """
        Classify input text.
        
        Returns:
            Dictionary with the input, predicted label, prompt, and total token count
        """
        prompt = self.build_prompt(input_text)
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=50
        )
        
        predicted_label = response.choices[0].message.content.strip()
        
        return {
            "input": input_text,
            "label": predicted_label,
            "prompt": prompt,
            "tokens": response.usage.total_tokens
        }
    
    def classify_batch(self, texts: List[str]) -> List[Dict]:
        """Classify multiple texts."""
        return [self.classify(text) for text in texts]

# Test the classifier
classifier = FewShotClassifier(
    task_description="Classify customer reviews by product category",
    labels=["electronics", "clothing", "books", "home", "sports"]
)

# Add training examples
classifier.add_examples([
    ("The headphones have great sound quality and battery life.", "electronics"),
    ("This sweater is soft and fits perfectly.", "clothing"),
    ("Fascinating biography, couldn't put it down.", "books"),
    ("The coffee maker broke after two weeks.", "electronics"),
    ("Perfect yoga mat, great grip and cushioning.", "sports"),
    ("These curtains match my decor beautifully.", "home"),
    ("The thriller kept me guessing until the end.", "books"),
    ("Running shoes are comfortable for long distances.", "sports")
])

# Test classification
test_reviews = [
    "The smartwatch tracks my workouts accurately.",
    "This cookbook has amazing recipes.",
    "The throw pillows added perfect color to my couch."
]

print("Few-Shot Classifier Results:")
print("="*80)

for review in test_reviews:
    result = classifier.classify(review)
    print(f"\nReview: {result['input']}")
    print(f"Category: {result['label']}")
    print(f"Tokens used: {result['tokens']}")
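
`classify` returns whatever text the model produced, which may not match a label exactly (for example `Electronics.` or `Label: books`). A minimal sketch of mapping a raw reply onto the allowed labels; the helper `normalize_label` is hypothetical, and the fallback rule (accept a label only if it is the single one mentioned) is one possible design choice:

```python
from typing import List, Optional


def normalize_label(raw: str, labels: List[str]) -> Optional[str]:
    """Map a raw model reply onto one of the allowed labels, or None."""
    cleaned = raw.strip().lower()
    # Drop an echoed "label:" prefix, then surrounding quotes and punctuation.
    if cleaned.startswith("label:"):
        cleaned = cleaned[len("label:"):]
    cleaned = cleaned.strip(" \t\n\"'.`")
    for label in labels:
        if cleaned == label.lower():
            return label
    # Fall back to a label mentioned in the reply, but only if it is unambiguous.
    mentioned = [label for label in labels if label.lower() in cleaned]
    return mentioned[0] if len(mentioned) == 1 else None


labels = ["electronics", "clothing", "books", "home", "sports"]

# Hand-written replies for illustration, not real model output.
for reply in ["Electronics.", "Label: books", "I think this is clothing", "furniture"]:
    print(f"{reply!r} -> {normalize_label(reply, labels)}")
```

Returning `None` for unrecognized replies keeps invalid labels out of downstream counts and makes them easy to log or retry.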

### Exercise 3.1: Build Your Own Classifier

Create a classifier for a custom task:

In [None]:
# TODO: Build a custom classifier

# Task ideas:
# 1. Email priority classifier (urgent, normal, low)
# 2. Code language detector (python, javascript, java, etc.)
# 3. Writing tone classifier (formal, casual, technical, friendly)
# 4. News category classifier (politics, technology, sports, etc.)

# Example structure:
# my_classifier = FewShotClassifier(
#     task_description="...",
#     labels=[...]
# )
# 
# my_classifier.add_examples([
#     ("...", "label1"),
#     ("...", "label2"),
#     ...
# ])
#
# results = my_classifier.classify_batch(test_inputs)
# for result in results:
#     print(f"{result['input']}: {result['label']}")

## Part 4: Few-Shot Information Extraction

Extract structured data using few-shot learning.

In [None]:
class FewShotExtractor:
    """
    Extract structured information using few-shot learning.
    """
    
    def __init__(self, task_description: str, output_schema: Dict):
        """
        Initialize extractor.
        
        Args:
            task_description: What to extract
            output_schema: Structure of output (field names and types)
        """
        self.task_description = task_description
        self.output_schema = output_schema
        self.examples: List[Tuple[str, Dict]] = []
    
    def add_example(self, input_text: str, output_data: Dict):
        """Add an extraction example."""
        self.examples.append((input_text, output_data))
    
    def build_prompt(self, input_text: str) -> str:
        """Build extraction prompt."""
        prompt_parts = [
            f"Task: {self.task_description}",
            f"Output format: {json.dumps(self.output_schema, indent=2)}",
            "",
            "Examples:",
            ""
        ]
        
        # Add examples
        for example_input, example_output in self.examples:
            prompt_parts.append(f"Input: {example_input}")
            prompt_parts.append(f"Output: {json.dumps(example_output)}")
            prompt_parts.append("")
        
        # Add input to extract from
        prompt_parts.append("Now extract:")
        prompt_parts.append(f"Input: {input_text}")
        prompt_parts.append("Output:")
        
        return "\n".join(prompt_parts)
    
    def extract(self, input_text: str) -> Dict:
        """
        Extract structured data from input.
        
        Returns:
            Extracted data dictionary
        """
        prompt = self.build_prompt(input_text)
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
            max_tokens=300
        )
        
        try:
            extracted_data = json.loads(response.choices[0].message.content)
        except json.JSONDecodeError:
            # Fallback if response isn't valid JSON
            extracted_data = {"raw_response": response.choices[0].message.content}
        
        return {
            "input": input_text,
            "extracted": extracted_data,
            "tokens": response.usage.total_tokens
        }

# Test the extractor
extractor = FewShotExtractor(
    task_description="Extract event information from text",
    output_schema={
        "event_name": "string",
        "date": "string",
        "time": "string",
        "location": "string",
        "attendees": "number or null"
    }
)

# Add examples
extractor.add_example(
    "Team meeting on Monday, Jan 15 at 2:00 PM in Conference Room B. All 8 team members should attend.",
    {
        "event_name": "Team meeting",
        "date": "Monday, Jan 15",
        "time": "2:00 PM",
        "location": "Conference Room B",
        "attendees": 8
    }
)

extractor.add_example(
    "Annual company party scheduled for December 20th, 6 PM at the Grand Hotel ballroom.",
    {
        "event_name": "Annual company party",
        "date": "December 20th",
        "time": "6 PM",
        "location": "Grand Hotel ballroom",
        "attendees": None
    }
)

extractor.add_example(
    "Client presentation next Thursday at 10 AM via Zoom.",
    {
        "event_name": "Client presentation",
        "date": "next Thursday",
        "time": "10 AM",
        "location": "Zoom",
        "attendees": None
    }
)

# Test extraction
test_text = "Product launch webinar on Friday, Nov 3 at 3:00 PM. Register at company.com/webinar."

result = extractor.extract(test_text)

print("Few-Shot Extraction Results:")
print("="*80)
print(f"Input: {result['input']}")
print("Extracted Data:")
print(json.dumps(result['extracted'], indent=2))
print(f"Tokens used: {result['tokens']}")
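
`extract` falls back to `raw_response` whenever the reply is not pure JSON, which also happens when the model wraps valid JSON in a Markdown fence or adds a sentence around it. A minimal sketch of a more tolerant parser plus a schema check; `parse_json_reply` and `missing_fields` are hypothetical helpers, and taking the span from the first `{` to the last `}` is a simplifying assumption that fails if the reply contains several separate objects:

```python
import json
from typing import Dict, List, Optional


def parse_json_reply(reply: str) -> Optional[Dict]:
    """Parse the span from the first '{' to the last '}' as a JSON object."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def missing_fields(data: Dict, schema: Dict) -> List[str]:
    """List the schema keys that are absent from the extracted data."""
    return [key for key in schema if key not in data]


schema = {
    "event_name": "string",
    "date": "string",
    "time": "string",
    "location": "string",
    "attendees": "number or null",
}

# Hand-written reply wrapped in a Markdown fence, not real model output.
fence = "`" * 3
reply = f'{fence}json\n{{"event_name": "Product launch webinar", "date": "Friday, Nov 3"}}\n{fence}'

data = parse_json_reply(reply)
print(data)
print("Missing:", missing_fields(data, schema))
```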

### Exercise 4.1: Build Custom Extractors

Create extractors for these scenarios:

In [None]:
# TODO: Build custom extractors

# Scenario 1: Resume parser
# Extract: name, email, phone, education, experience_years, skills
resume_text = """
John Doe - johndoe@email.com - (555) 123-4567
Education: BS Computer Science, MIT (2015)
5 years experience in software development
Skills: Python, Java, AWS, Docker, Kubernetes
"""

# TODO: Create and test resume extractor
# resume_extractor = FewShotExtractor(...)
# result = resume_extractor.extract(resume_text)


# Scenario 2: Recipe parser
# Extract: dish_name, prep_time, cook_time, servings, difficulty, main_ingredients
recipe_text = """
Chocolate Chip Cookies
Prep: 15 mins, Bake: 12 mins, Makes 24 cookies
Easy recipe perfect for beginners
Main ingredients: flour, butter, sugar, chocolate chips, eggs
"""

# TODO: Create and test recipe extractor
# recipe_extractor = FewShotExtractor(...)
# result = recipe_extractor.extract(recipe_text)


# Scenario 3: Product listing parser
# Extract: product_name, price, condition, seller_rating, shipping
listing_text = """
iPhone 13 Pro - $699 - Excellent condition
Seller rating: 4.8/5.0 (250 reviews)
Free 2-day shipping included
"""

# TODO: Create and test product extractor
# product_extractor = FewShotExtractor(...)
# result = product_extractor.extract(listing_text)

## Part 5: Dynamic Few-Shot Selection

Dynamically select the best examples for each input.

In [None]:
class DynamicFewShotClassifier:
    """
    Classifier that dynamically selects relevant examples.
    """
    
    def __init__(self, task_description: str, labels: List[str], max_examples: int = 5):
        """
        Initialize dynamic classifier.
        
        Args:
            task_description: Classification task description
            labels: Possible labels
            max_examples: Maximum examples to include in prompt
        """
        self.task_description = task_description
        self.labels = labels
        self.max_examples = max_examples
        self.example_pool: List[Example] = []
    
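    def add_to_pool(self, input_text: str, output_label: str, metadata: Optional[Dict] = None):
        """Add example to the pool, rejecting labels outside self.labels."""
        if output_label not in self.labels:
            raise ValueError(f"Label {output_label} not in {self.labels}")
        self.example_pool.append(Example(input_text, output_label, metadata))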
    
    def select_examples(self, input_text: str, strategy: str = "random") -> List[Example]:
        """
        Select relevant examples for the input.
        
        Args:
            input_text: The text to classify
            strategy: 'random', 'diverse', or 'similar'
        
        Returns:
            Selected examples
        """
        if strategy == "random":
            # Random selection
            if len(self.example_pool) <= self.max_examples:
                return list(self.example_pool)  # copy so callers cannot mutate the pool
            return random.sample(self.example_pool, self.max_examples)
        
        elif strategy == "diverse":
            # Ensure at least one example per label
            selected = []
            examples_by_label = {}
            
            # Group by label
            for example in self.example_pool:
                if example.output_label not in examples_by_label:
                    examples_by_label[example.output_label] = []
                examples_by_label[example.output_label].append(example)
            
            # Take one from each label first
            for label in self.labels:
                if label in examples_by_label and len(selected) < self.max_examples:
                    selected.append(random.choice(examples_by_label[label]))
            
            # Fill remaining slots randomly
            remaining = [e for e in self.example_pool if e not in selected]
            while len(selected) < self.max_examples and remaining:
                selected.append(remaining.pop(random.randint(0, len(remaining) - 1)))
            
            return selected
        
        elif strategy == "similar":
            # TODO: Implement similarity-based selection
            # This would use embeddings to find most similar examples
            return self.select_examples(input_text, strategy="random")
        
        return self.example_pool[:self.max_examples]
    
    def classify(self, input_text: str, strategy: str = "diverse") -> Dict:
        """Classify with dynamically selected examples."""
        
        # Select relevant examples
        selected_examples = self.select_examples(input_text, strategy=strategy)
        
        # Build prompt
        prompt_parts = [
            f"Task: {self.task_description}",
            f"Labels: {', '.join(self.labels)}",
            "",
            "Examples:",
            ""
        ]
        
        for example in selected_examples:
            prompt_parts.append(f"Input: {example.input_text}")
            prompt_parts.append(f"Label: {example.output_label}")
            prompt_parts.append("")
        
        prompt_parts.append("Now classify:")
        prompt_parts.append(f"Input: {input_text}")
        prompt_parts.append("Label:")
        
        prompt = "\n".join(prompt_parts)
        
        # Get prediction
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        
        return {
            "input": input_text,
            "label": response.choices[0].message.content.strip(),
            "examples_used": len(selected_examples),
            "strategy": strategy
        }

# Test dynamic selection
dynamic_classifier = DynamicFewShotClassifier(
    task_description="Classify programming questions by topic",
    labels=["syntax", "algorithm", "debugging", "design", "performance"],
    max_examples=4
)

# Add large pool of examples
examples_pool = [
    ("How do I declare a variable in Python?", "syntax"),
    ("What's the best sorting algorithm for large datasets?", "algorithm"),
    ("Why am I getting a NullPointerException?", "debugging"),
    ("Should I use inheritance or composition?", "design"),
    ("How can I optimize this database query?", "performance"),
    ("What does the += operator do?", "syntax"),
    ("Explain bubble sort algorithm", "algorithm"),
    ("My code runs but gives wrong output", "debugging"),
    ("How to structure a REST API?", "design"),
    ("Why is my app using too much memory?", "performance"),
    ("What are Python decorators?", "syntax"),
    ("Implement binary search in Python", "algorithm"),
    ("Getting 'index out of range' error", "debugging"),
    ("MVC vs MVVM architecture?", "design"),
    ("Reduce API response time", "performance")
]

for text, label in examples_pool:
    dynamic_classifier.add_to_pool(text, label)

# Test with different strategies
test_question = "How do I improve query execution time in PostgreSQL?"

print("Dynamic Few-Shot Selection:")
print("="*80)

for strategy in ["random", "diverse"]:
    result = dynamic_classifier.classify(test_question, strategy=strategy)
    print(f"\nStrategy: {strategy}")
    print(f"Question: {result['input']}")
    print(f"Predicted: {result['label']}")
    print(f"Examples used: {result['examples_used']}")
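
The `similar` strategy above currently falls back to random selection. As an offline stand-in, the sketch below ranks examples by word overlap (Jaccard similarity) with the query. This is a deliberately simpler technique than the embedding-based selection asked for in Exercise 5.1, and the function names and the tiny stopword list are assumptions made for illustration:

```python
import re
from typing import List, Set, Tuple

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"how", "do", "i", "a", "in", "the", "this", "can", "am",
             "why", "should", "or", "my", "is", "to"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_by_overlap(query: str,
                      pool: List[Tuple[str, str]],
                      max_examples: int = 3) -> List[Tuple[str, str]]:
    """Return the pool entries whose wording overlaps most with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(pool,
                    key=lambda item: jaccard(query_tokens, tokenize(item[0])),
                    reverse=True)
    return ranked[:max_examples]


pool = [
    ("How do I declare a variable in Python?", "syntax"),
    ("How can I optimize this database query?", "performance"),
    ("Why am I getting a NullPointerException?", "debugging"),
    ("Should I use inheritance or composition?", "design"),
]

top = select_by_overlap("How do I improve query execution time in PostgreSQL?",
                        pool, max_examples=2)
for text, label in top:
    print(f"- {text} ({label})")
```

Word overlap only matches shared surface tokens, so it misses paraphrases such as "improve" versus "optimize"; that gap is the motivation for the embedding-based approach in the exercise that follows.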

### Exercise 5.1: Implement Similarity-Based Selection

Implement example selection based on semantic similarity:

In [None]:
# TODO: Implement similarity-based example selection

def get_embedding(text: str) -> List[float]:
    """
    Get embedding for text using OpenAI's embedding model.
    
    TODO: Implement this function
    Hint: Use client.embeddings.create() with an embedding model such as
    "text-embedding-3-small" (or the older "text-embedding-ada-002")
    """
    pass

def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate cosine similarity between two vectors.
    
    TODO: Implement this function
    Hint: Use numpy or manual calculation
    """
    pass

def select_similar_examples(
    input_text: str,
    example_pool: List[Example],
    max_examples: int = 5
) -> List[Example]:
    """
    Select examples most similar to input text.
    
    TODO: Implement this function
    Steps:
    1. Get embedding for input_text
    2. Get embeddings for all examples
    3. Calculate similarity scores
    4. Return top max_examples
    """
    pass

# Test your implementation
# test_text = "How do I optimize my Python code?"
# similar_examples = select_similar_examples(test_text, dynamic_classifier.example_pool, max_examples=3)
# for ex in similar_examples:
#     print(f"- {ex.input_text} ({ex.output_label})")

## Part 6: Evaluating Few-Shot Performance

Measure and optimize few-shot classifier performance.

In [None]:
class FewShotEvaluator:
    """
    Evaluate few-shot classifier performance.
    """
    
    def __init__(self, classifier):
        """Initialize with a classifier."""
        self.classifier = classifier
        self.results = []
    
    def evaluate(self, test_cases: List[Tuple[str, str]]) -> Dict:
        """
        Evaluate classifier on test cases.
        
        Args:
            test_cases: List of (input, expected_label) tuples
        
        Returns:
            Evaluation metrics
        """
        correct = 0
        total = len(test_cases)
        total_tokens = 0
        
        for input_text, expected_label in test_cases:
            result = self.classifier.classify(input_text)
            predicted_label = result['label']
            
            is_correct = predicted_label.strip().lower() == expected_label.strip().lower()
            if is_correct:
                correct += 1
            
            total_tokens += result.get('tokens', 0)
            
            self.results.append({
                'input': input_text,
                'expected': expected_label,
                'predicted': predicted_label,
                'correct': is_correct
            })
        
        accuracy = correct / total if total > 0 else 0
        avg_tokens = total_tokens / total if total > 0 else 0
        
        return {
            'accuracy': accuracy,
            'correct': correct,
            'total': total,
            'avg_tokens_per_prediction': avg_tokens,
            'total_tokens': total_tokens
        }
    
    def get_confusion_cases(self) -> List[Dict]:
        """Get incorrectly classified cases."""
        return [r for r in self.results if not r['correct']]
    
    def print_report(self, metrics: Dict):
        """Print evaluation report."""
        print("\n" + "="*80)
        print("EVALUATION REPORT")
        print("="*80)
        print(f"Accuracy: {metrics['accuracy']*100:.1f}%")
        print(f"Correct: {metrics['correct']}/{metrics['total']}")
        print(f"Avg tokens per prediction: {metrics['avg_tokens_per_prediction']:.1f}")
        print(f"Total tokens used: {metrics['total_tokens']}")
        
        # Show misclassified cases
        confusion_cases = self.get_confusion_cases()
        if confusion_cases:
            print(f"\nMisclassified cases ({len(confusion_cases)}):")
            for case in confusion_cases:
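                # Only add an ellipsis when the input was actually truncated
                preview = case['input'] if len(case['input']) <= 60 else case['input'][:60] + "..."
                print(f"  Input: {preview}")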
                print(f"  Expected: {case['expected']} | Predicted: {case['predicted']}")
                print()

# Create test dataset
test_cases = [
    ("This laptop has amazing performance!", "electronics"),
    ("The novel's plot twist was unexpected", "books"),
    ("These running shoes are very comfortable", "sports"),
    ("The blender works great for smoothies", "home"),
    ("This dress fits perfectly", "clothing"),
    ("The tablet screen is crystal clear", "electronics"),
    ("Excellent mystery thriller", "books"),
    ("Yoga mat provides good cushioning", "sports"),
]

# Evaluate
evaluator = FewShotEvaluator(classifier)
metrics = evaluator.evaluate(test_cases)
evaluator.print_report(metrics)
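
Overall accuracy can hide a label that the classifier consistently gets wrong. A minimal sketch of a per-label breakdown follows; it assumes result dictionaries with the same `'expected'` and `'correct'` keys that `FewShotEvaluator` stores in `self.results`. The function name `per_label_accuracy` and the sample rows are hypothetical and only illustrate the shape of the data.

```python
from collections import defaultdict
from typing import Dict, List


def per_label_accuracy(results: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Group evaluation results by expected label and compute accuracy per label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        label = row['expected'].strip().lower()
        totals[label] += 1
        if row['correct']:
            hits[label] += 1
    return {
        label: {
            'correct': hits[label],
            'total': totals[label],
            'accuracy': hits[label] / totals[label],
        }
        for label in totals
    }


# Illustrative rows only (not real model output)
sample_results = [
    {'expected': 'electronics', 'correct': True},
    {'expected': 'electronics', 'correct': False},
    {'expected': 'books', 'correct': True},
]
for label, stats in per_label_accuracy(sample_results).items():
    print(f"{label}: {stats['correct']}/{stats['total']} ({stats['accuracy'] * 100:.0f}%)")
```

With the real evaluator, you could pass `evaluator.results` after calling `evaluate()`.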

### Exercise 6.1: Optimize Classifier Performance

Improve classifier accuracy through experimentation:

In [None]:
# TODO: Experiment with different configurations to improve accuracy

# Ideas to try:
# 1. Add more diverse examples
# 2. Improve example quality (clearer, more representative)
# 3. Adjust number of examples (3 vs 5 vs 10)
# 4. Try different temperature values
# 5. Improve prompt formatting
# 6. Make the task instructions clearer

# Create baseline
# baseline_classifier = FewShotClassifier(...)
# baseline_evaluator = FewShotEvaluator(baseline_classifier)
# baseline_metrics = baseline_evaluator.evaluate(test_cases)

# Create improved version
# improved_classifier = FewShotClassifier(...)
# improved_evaluator = FewShotEvaluator(improved_classifier)
# improved_metrics = improved_evaluator.evaluate(test_cases)

# Compare
# print("Baseline Accuracy:", baseline_metrics['accuracy'])
# print("Improved Accuracy:", improved_metrics['accuracy'])
# print(f"Improvement: {(improved_metrics['accuracy'] - baseline_metrics['accuracy']) * 100:.1f} percentage points")
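
When you try more than two configurations, a small comparison helper keeps the experiment organized. The sketch below is an assumption about how you might structure it: `compare_configs` and the two stub classifiers are hypothetical, the stubs use keyword rules instead of API calls, and their token counts are made-up placeholders.

```python
from typing import Callable, Dict, List, Tuple

# Each config is a function returning {'label': ..., 'tokens': ...},
# the same shape the evaluator reads from classifier.classify().
ClassifyFn = Callable[[str], Dict]


def compare_configs(configs: Dict[str, ClassifyFn],
                    cases: List[Tuple[str, str]]) -> List[Dict]:
    """Run every config on the same test cases and rank the results."""
    rows = []
    for name, classify in configs.items():
        correct = 0
        tokens = 0
        for text, expected in cases:
            result = classify(text)
            if result['label'].strip().lower() == expected.strip().lower():
                correct += 1
            tokens += result.get('tokens', 0)
        rows.append({
            'config': name,
            'accuracy': correct / len(cases) if cases else 0.0,
            'avg_tokens': tokens / len(cases) if cases else 0.0,
        })
    # Highest accuracy first; cheaper config wins a tie
    rows.sort(key=lambda r: (-r['accuracy'], r['avg_tokens']))
    return rows


# Hypothetical stand-ins for real classifiers (placeholder token counts)
def three_shot_stub(text: str) -> Dict:
    label = 'electronics' if 'laptop' in text.lower() else 'books'
    return {'label': label, 'tokens': 120}


def five_shot_stub(text: str) -> Dict:
    lowered = text.lower()
    label = 'electronics' if ('laptop' in lowered or 'tablet' in lowered) else 'books'
    return {'label': label, 'tokens': 180}


demo_cases = [
    ("This laptop has amazing performance!", "electronics"),
    ("The tablet screen is crystal clear", "electronics"),
    ("Excellent mystery thriller", "books"),
]
ranking = compare_configs({'3-shot': three_shot_stub, '5-shot': five_shot_stub}, demo_cases)
for row in ranking:
    print(f"{row['config']}: accuracy={row['accuracy']:.2f}, avg_tokens={row['avg_tokens']:.0f}")
```

Keeping the test cases fixed across configurations is what makes the accuracy numbers comparable.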

## Challenge Projects

### Challenge 1: Multi-Label Few-Shot Classifier

Build a classifier that can assign multiple labels to a single input:

In [None]:
class MultiLabelFewShotClassifier:
    """
    Classify inputs into multiple categories simultaneously.
    
    TODO: Implement a multi-label classifier where:
    1. Each input can have multiple labels
    2. Examples show inputs with multiple labels
    3. Outputs are formatted as lists
    4. Evaluation handles multi-label metrics
    
    Example use case: Article tagging (technology, tutorial, beginner)
    """
    
    def __init__(self, task_description: str, available_labels: List[str]):
        self.task_description = task_description
        self.available_labels = available_labels
        self.examples = []
    
    # TODO: Implement methods
    
    pass

# Usage example:
# tagger = MultiLabelFewShotClassifier(
#     task_description="Tag articles with relevant categories",
#     available_labels=["technology", "tutorial", "beginner", "advanced", "web", "mobile"]
# )
# tagger.add_example(
#     "Introduction to React hooks for beginners",
#     ["technology", "tutorial", "beginner", "web"]
# )
# result = tagger.classify("Build your first iOS app with SwiftUI")
# print(result['labels'])  # ['technology', 'tutorial', 'mobile']
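
Two pieces of this challenge are independent of the prompt itself: turning the model's reply into a clean label list, and scoring a predicted label set against the expected one. The sketch below shows one way to do both. `parse_labels` and `multilabel_scores` are hypothetical helper names, and the parser assumes the model replies with a comma-separated or bracketed list.

```python
from typing import Dict, Iterable, List


def parse_labels(raw: str, available_labels: Iterable[str]) -> List[str]:
    """Parse a comma-separated reply, keeping only known labels in order."""
    allowed = {label.lower() for label in available_labels}
    parsed: List[str] = []
    for part in raw.strip().strip("[]").split(","):
        label = part.strip().strip("'\"").lower()
        if label in allowed and label not in parsed:
            parsed.append(label)
    return parsed


def multilabel_scores(predicted: Iterable[str], expected: Iterable[str]) -> Dict[str, float]:
    """Set-based precision, recall, F1, and Jaccard for one input."""
    pred, gold = set(predicted), set(expected)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    union = len(pred | gold)
    jaccard = overlap / union if union else 1.0
    return {'precision': precision, 'recall': recall, 'f1': f1, 'jaccard': jaccard}


labels = ["technology", "tutorial", "beginner", "advanced", "web", "mobile"]
predicted = parse_labels("['technology', 'tutorial', 'Mobile', 'cooking']", labels)
print(predicted)
print(multilabel_scores(predicted, ["technology", "tutorial", "mobile", "beginner"]))
```

Filtering against `available_labels` drops any label the model invents, which keeps the metrics meaningful.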

### Challenge 2: Few-Shot Chain Classifier

Build a classifier that makes decisions in multiple steps:

In [None]:
class ChainedFewShotClassifier:
    """
    Multi-stage classification with few-shot learning.
    
    TODO: Implement a chained classifier where:
    1. First stage classifies into broad categories
    2. Second stage classifies into specific subcategories
    3. Each stage uses relevant few-shot examples
    4. Results are hierarchical
    
    Example: Email -> Category (work/personal/spam) -> Priority (urgent/normal/low)
    """
    
    def __init__(self):
        self.stages = []
    
    # TODO: Implement chained classification
    
    pass

# Usage example:
# chain = ChainedFewShotClassifier()
# chain.add_stage("category", ["work", "personal", "spam"])
# chain.add_stage("priority", ["urgent", "normal", "low"])
# result = chain.classify("Meeting moved to 2 PM today")
# print(result)  # {"category": "work", "priority": "urgent"}
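
The control flow of a chained classifier can be sketched separately from the few-shot prompts. In the sketch below, each stage receives the text plus the labels chosen by earlier stages. Note the assumptions: `ChainSketch` is a hypothetical class, its `add_stage` takes an extra `classify_fn` argument that the usage example above does not have, and the keyword-based stage functions stand in for real few-shot calls.

```python
from typing import Callable, Dict, List, Tuple

# A stage function receives the text and the labels chosen so far
StageFn = Callable[[str, Dict[str, str]], str]


class ChainSketch:
    """Run classification stages in order, passing earlier results forward."""

    def __init__(self):
        self.stages: List[Tuple[str, List[str], StageFn]] = []

    def add_stage(self, name: str, labels: List[str], classify_fn: StageFn) -> None:
        self.stages.append((name, labels, classify_fn))

    def classify(self, text: str) -> Dict[str, str]:
        result: Dict[str, str] = {}
        for name, labels, classify_fn in self.stages:
            label = classify_fn(text, dict(result))
            if label not in labels:
                raise ValueError(f"Stage '{name}' returned unknown label: {label}")
            result[name] = label
        return result


# Hypothetical keyword rules standing in for few-shot model calls
def category_stub(text: str, context: Dict[str, str]) -> str:
    return "work" if "meeting" in text.lower() else "personal"


def priority_stub(text: str, context: Dict[str, str]) -> str:
    if context.get("category") == "work" and "today" in text.lower():
        return "urgent"
    return "normal"


chain_demo = ChainSketch()
chain_demo.add_stage("category", ["work", "personal", "spam"], category_stub)
chain_demo.add_stage("priority", ["urgent", "normal", "low"], priority_stub)
print(chain_demo.classify("Meeting moved to 2 PM today"))
```

Passing the earlier labels into later stages lets each stage pick few-shot examples that match the branch already chosen.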

### Challenge 3: Active Learning Few-Shot System

Build a system that improves by selecting which examples to add:

In [None]:
class ActiveLearningFewShot:
    """
    Few-shot system that actively selects examples to add.
    
    TODO: Implement active learning where:
    1. System identifies uncertain predictions
    2. Requests labels for uncertain cases
    3. Adds them to example pool
    4. Improves over time
    5. Tracks performance improvements
    
    Use uncertainty metrics like:
    - Prediction confidence
    - Similarity to existing examples
    - Label distribution balance
    """
    
    def __init__(self, classifier):
        self.classifier = classifier
        self.uncertainty_threshold = 0.7
        self.performance_history = []
    
    # TODO: Implement active learning loop
    
    pass

# Usage example:
# active_system = ActiveLearningFewShot(classifier)
# for text in unlabeled_data:
#     if active_system.is_uncertain(text):
#         label = get_human_label(text)
#         active_system.add_to_pool(text, label)
# active_system.show_improvement_curve()
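
One simple way to estimate prediction confidence is to classify the same input several times and measure how often the majority label appears. The sketch below assumes you have already collected those repeated labels (for example by sampling at a non-zero temperature). `vote_confidence` and `is_uncertain_from_votes` are hypothetical helpers, and treating the class's `uncertainty_threshold` of 0.7 as a minimum confidence is an assumption.

```python
from collections import Counter
from typing import List, Tuple


def vote_confidence(labels: List[str]) -> Tuple[str, float]:
    """Return the majority label and the fraction of votes it received."""
    if not labels:
        raise ValueError("Need at least one sampled label")
    counts = Counter(label.strip().lower() for label in labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


def is_uncertain_from_votes(labels: List[str], threshold: float = 0.7) -> bool:
    """Flag an input for human labelling when agreement falls below the threshold."""
    _, confidence = vote_confidence(labels)
    return confidence < threshold


# Illustrative samples only (not real model output)
mixed_votes = ["work", "work", "personal", "work", "spam"]
stable_votes = ["work", "Work", "work", "work", "work"]
print(vote_confidence(mixed_votes), is_uncertain_from_votes(mixed_votes))
print(vote_confidence(stable_votes), is_uncertain_from_votes(stable_votes))
```

Repeated sampling multiplies token cost, so it is worth applying only to inputs you are considering for the example pool.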

## Summary

In this lab, you've learned:

1. ✅ Three learning paradigms (zero-shot, one-shot, few-shot)
2. ✅ When to use each paradigm
3. ✅ Principles of effective few-shot examples (diverse, consistent, representative)
4. ✅ Building few-shot classifiers and extractors
5. ✅ Dynamic example selection strategies
6. ✅ Evaluating and optimizing few-shot performance

### Key Takeaways

- **Few-shot learning** adapts a model to a custom task with only a handful of labelled examples
- **Example quality** usually matters more than quantity
- **Diversity** in examples helps the model generalize to unseen inputs
- **Consistency** in formatting keeps outputs predictable and easy to parse
- **Representative examples** improve real-world performance
- **Dynamic selection** picks the most relevant examples for each input, which can shorten prompts
- **Evaluation** on a fixed test set guides optimization

### Best Practices

1. **Start with zero-shot** - try without examples first
2. **Add examples gradually** - see when improvement plateaus
3. **Ensure diversity** - cover different scenarios and edge cases
4. **Maintain format consistency** - use the same structure for every example
5. **Use representative examples** - match real use cases
6. **Balance label distribution** - equal representation when possible
7. **Evaluate systematically** - track accuracy and token usage on a fixed test set
8. **Iterate based on errors** - analyze misclassifications
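
Balancing the label distribution is easy to automate. The sketch below takes up to `per_label` examples for each label and interleaves them so no single label dominates the start of the prompt. `balanced_examples` is a hypothetical helper, and examples are assumed to be `(text, label)` tuples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def balanced_examples(examples: List[Tuple[str, str]], per_label: int) -> List[Tuple[str, str]]:
    """Pick up to per_label examples for each label, interleaved round-robin."""
    by_label: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    selected: List[Tuple[str, str]] = []
    for index in range(per_label):
        for label_examples in by_label.values():
            if index < len(label_examples):
                selected.append(label_examples[index])
    return selected


pool = [
    ("This laptop has amazing performance!", "electronics"),
    ("The tablet screen is crystal clear", "electronics"),
    ("Great noise-cancelling headphones", "electronics"),
    ("Excellent mystery thriller", "books"),
]
for text, label in balanced_examples(pool, per_label=2):
    print(f"{label}: {text}")
```

Labels with fewer examples than `per_label` simply contribute what they have, which is a signal to collect more examples for that label.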

### Comparison Matrix

| Paradigm | Examples | Best For | Token Cost | Typical Accuracy |
|----------|----------|----------|------------|------------------|
| Zero-shot | 0 | Simple, well-known tasks | Low | Medium |
| One-shot | 1 | Format clarification | Medium | Good |
| Few-shot | 2-10 | Complex, custom tasks | High | Often best |

### Next Steps

- Complete the challenge projects
- Build classifiers for your own use cases
- Experiment with example selection strategies
- Move on to Lab 3: Chain-of-Thought Implementation
