# Advanced Patterns: Few-Shot Learning with Intelligent Retry

This notebook demonstrates **production-ready AI patterns**:

1. **Few-Shot Learning** - Guide the model with labeled examples in the prompt
2. **Structured Output** - Type-safe JSON parsing with Pydantic
3. **Intelligent Retry** - Auto-correction when the LLM returns invalid JSON (outlined here as a future extension)
4. **Error Context** - Parser error messages that can serve as hints in retry prompts
5. **Composable Architecture** - PromptNode → RawLLMNode → ParserNode

## Real-World Use Case: Product Review Sentiment Analysis

We'll build a system that:
- Classifies sentiment with **confidence scores**
- Extracts **key aspects** (quality, price, shipping)
- Handles **parsing failures** gracefully
- Uses **few-shot examples** for better accuracy

## Architecture

**Basic Flow**:
```
PromptNode → RawLLMNode → ParserNode
```

**With Retry** (future):
```
PromptNode → RawLLMNode → ParserNode
                 ↑            ↓ (parse error)
                 └─ ErrorCorrectionPrompt
```
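
The retry flow above is not implemented in this notebook. As a rough illustration of the idea only, the following library-independent sketch feeds the parser's error message back into the next prompt. The names `parse_strict`, `classify_with_retry`, and `scripted_llm` are hypothetical helpers invented for this example, not hexdag APIs, and the LLM is a scripted stand-in.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ("sentiment", "confidence", "aspects", "key_phrases")


def parse_strict(text: str) -> dict:
    """Parse JSON and check required fields; raise ValueError with a hint on failure."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON: {exc.msg} at position {exc.pos}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    return data


def classify_with_retry(prompt: str, llm: Callable[[str], str], max_attempts: int = 3) -> dict:
    """Call the LLM, and on a parse failure retry with the error appended to the prompt."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = llm(current_prompt)
        try:
            return parse_strict(raw)
        except ValueError as err:
            last_error = err
            # Error-correction prompt: original task + parser hint + the bad output
            current_prompt = (
                f"{prompt}\n\nYour previous output could not be parsed.\n"
                f"Error: {err}\nPrevious output:\n{raw}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"Giving up after {max_attempts} attempts: {last_error}")


# Scripted stand-in for an LLM: the first reply is truncated, the second is valid.
_replies = iter([
    '{"sentiment": "positive"',
    '{"sentiment": "positive", "confidence": 0.9, "aspects": {}, "key_phrases": []}',
])


def scripted_llm(prompt: str) -> str:
    return next(_replies)


result = classify_with_retry("Review: works great", scripted_llm)
print(result["sentiment"], result["confidence"])
```

Capping the number of attempts keeps a persistently malformed response from looping forever; the final error is surfaced so the caller can log it or fall back.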

In [None]:
# Setup
from hexdag.core.bootstrap import ensure_bootstrapped
from hexdag.core.orchestration.orchestrator import Orchestrator
from hexdag.core.pipeline_builder import YamlPipelineBuilder
from hexdag.core.pipeline_builder.component_instantiator import ComponentInstantiator

# Ensure registry is bootstrapped with all components
ensure_bootstrapped()

# Helper for port instantiation
instantiator = ComponentInstantiator()

## Example 1: Few-Shot Sentiment Classification

This example uses **few-shot learning**: the LLM infers the task and the expected output format from labeled examples in the prompt, with no fine-tuning.
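
The pipeline below hard-codes its examples in a YAML template. As a plain-Python sketch of the same idea, a few-shot prompt can be assembled from a list of `(review, label)` pairs. `build_fewshot_prompt` and `EXAMPLES` are hypothetical names for this illustration, and the labels are abbreviated to two fields to keep it short.

```python
import json

# (review text, expected output) pairs; labels abbreviated for brevity
EXAMPLES = [
    ("Terrible quality, broke after 2 days. Complete waste of money!",
     {"sentiment": "negative", "confidence": 0.95}),
    ("Decent product for the price. Works as expected, nothing special.",
     {"sentiment": "neutral", "confidence": 0.78}),
    ("Amazing! Best purchase ever. Great quality and fast shipping!",
     {"sentiment": "positive", "confidence": 0.92}),
]


def build_fewshot_prompt(review: str, examples: list) -> str:
    """Render instructions, labeled examples, and the target review as one prompt."""
    parts = [
        "You are a product review sentiment analyzer. "
        "Classify reviews and extract key aspects.",
        "",
        "## Few-Shot Examples:",
        "",
    ]
    for text, label in examples:
        parts.append(f'Review: "{text}"')
        parts.append(f"Output: {json.dumps(label)}")
        parts.append("")
    parts += [
        "## Now classify this review:",
        "",
        f'Review: "{review}"',
        "",
        "Output (JSON only, no explanation):",
    ]
    return "\n".join(parts)


prompt = build_fewshot_prompt("Love it, works perfectly.", EXAMPLES)
print(prompt)
```

Keeping the examples as data rather than template text makes it easier to swap or select them per input, which is the basis of dynamic few-shot selection.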

In [None]:
# Few-Shot Sentiment Classifier with Structured Output
fewshot_pipeline = """
apiVersion: v1
kind: Pipeline
metadata:
  name: fewshot-sentiment-classifier
  description: Production sentiment analysis with few-shot learning

spec:
  ports:
    llm:
      namespace: plugin
      name: mock

  nodes:
    # Step 1: Build few-shot prompt
    - kind: prompt_node
      metadata:
        name: build_fewshot_prompt
      spec:
        template: |
          You are a product review sentiment analyzer. Classify reviews and extract key aspects.
          
          ## Few-Shot Examples:
          
          Review: "Terrible quality, broke after 2 days. Complete waste of money!"
          Output: {
            "sentiment": "negative",
            "confidence": 0.95,
            "aspects": {
              "quality": "poor",
              "value": "poor",
              "durability": "poor"
            },
            "key_phrases": ["terrible quality", "broke after 2 days", "waste of money"]
          }
          
          Review: "Decent product for the price. Works as expected, nothing special."
          Output: {
            "sentiment": "neutral",
            "confidence": 0.78,
            "aspects": {
              "quality": "acceptable",
              "value": "good",
              "performance": "acceptable"
            },
            "key_phrases": ["decent", "works as expected", "nothing special"]
          }
          
          Review: "Amazing! Best purchase ever. Great quality and fast shipping!"
          Output: {
            "sentiment": "positive",
            "confidence": 0.92,
            "aspects": {
              "quality": "excellent",
              "value": "excellent",
              "shipping": "fast"
            },
            "key_phrases": ["amazing", "best purchase", "great quality"]
          }
          
          ## Now classify this review:
          
          Review: "{{review}}"
          
          Output (JSON only, no explanation):
        output_format: messages
        dependencies: []

    # Step 2: Call LLM
    - kind: raw_llm_node
      metadata:
        name: classify
      spec:
        dependencies: [build_fewshot_prompt]

    # Step 3: Parse with strict schema
    - kind: parser_node
      metadata:
        name: parse_sentiment
      spec:
        output_schema:
          sentiment: str
          confidence: float
          aspects: dict
          key_phrases: list
        strategy: json
        strict: true
        dependencies: [classify]
"""

builder = YamlPipelineBuilder()
graph_fewshot, config_fewshot = builder.build_from_yaml_string(fewshot_pipeline)

print(f"✅ Few-Shot Pipeline: {len(graph_fewshot.nodes)} nodes")
print(f"📋 Nodes: {list(graph_fewshot.nodes.keys())}")
print("📝 Template includes 3 few-shot examples")
print("🎯 Output schema: sentiment, confidence, aspects, key_phrases")

In [None]:
# Configure mock LLM with realistic response
from hexdag.builtin.adapters.mock.mock_llm import MockLLM

# Create mock with a realistic sentiment analysis response
mock_response = """{
  "sentiment": "positive",
  "confidence": 0.89,
  "aspects": {
    "quality": "excellent",
    "value": "good",
    "usability": "excellent"
  },
  "key_phrases": ["love this product", "works perfectly", "great value"]
}"""

mock_llm = MockLLM(responses=[mock_response])

# Instantiate ports with our custom mock
ports_fewshot = {"llm": mock_llm}
orchestrator_fewshot = Orchestrator(ports=ports_fewshot)

# Run the pipeline on a sample review (the LLM response is mocked)
result = await orchestrator_fewshot.run(
    graph_fewshot,
    {"review": "I absolutely love this product! Works perfectly and great value for money."},
)

print("\n📊 Few-Shot Classification Results:")
print(f"Sentiment: {result['parse_sentiment'].sentiment}")
print(f"Confidence: {result['parse_sentiment'].confidence}")
print("\nAspects:")
for aspect, rating in result["parse_sentiment"].aspects.items():
    print(f"  • {aspect}: {rating}")
print("\nKey Phrases:")
for phrase in result["parse_sentiment"].key_phrases:
    print(f"  • {phrase}")

## Example 2: Handling Parse Errors

What happens when the LLM returns **invalid JSON**? Let's demonstrate error handling.

In [None]:
# Simulate LLM returning INVALID JSON
bad_responses = [
    # Response 1: Invalid JSON (missing closing brace)
    """{
  "sentiment": "positive",
  "confidence": 0.85,
  "aspects": {
    "quality": "good"
  """,
    # Response 2: Valid JSON but missing required field
    """{
  "sentiment": "positive",
  "confidence": 0.85
}""",
    # Response 3: LLM adds explanation text
    """Here's my analysis:
{
  "sentiment": "positive",
  "confidence": 0.85,
  "aspects": {"quality": "good"},
  "key_phrases": ["great"]
}
I think this review is positive because...""",
]

# Test with invalid JSON (should fail gracefully).
# Only bad_responses[0] is exercised here; change the index to try the other cases.
mock_llm_bad = MockLLM(responses=[bad_responses[0]])
ports_bad = {"llm": mock_llm_bad}
orchestrator_bad = Orchestrator(ports=ports_bad)

print("🔴 Testing with INVALID JSON (missing closing brace)...\n")
try:
    result_bad = await orchestrator_bad.run(graph_fewshot, {"review": "Test review"})
    print("✅ Parsing succeeded (unexpected!)")
except Exception as e:
    print(f"❌ Parse Error (expected): {type(e).__name__}")
    print(f"   Message: {str(e)[:200]}...")
    print("\n💡 In production, this would trigger a RETRY with an error correction prompt")
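
The cell above defines three failure cases in `bad_responses`. To make the differences between them concrete, here is a sketch that classifies each case using only the standard library. It does not show how hexdag's `parser_node` reports these errors; `diagnose` is a hypothetical helper, and the sample strings are shortened versions of the three cases.

```python
import json

REQUIRED = ("sentiment", "confidence", "aspects", "key_phrases")


def diagnose(text: str) -> str:
    """Return a short label describing why a response fails strict parsing."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(data, dict):
        return "not_an_object"
    missing = sorted(set(REQUIRED) - set(data))
    if missing:
        return "missing_fields: " + ", ".join(missing)
    return "ok"


truncated = '{"sentiment": "positive", "confidence": 0.85, "aspects": {"quality": "good"}'
incomplete = '{"sentiment": "positive", "confidence": 0.85}'
chatty = (
    "Here's my analysis:\n"
    '{"sentiment": "positive", "confidence": 0.85, '
    '"aspects": {"quality": "good"}, "key_phrases": ["great"]}\n'
    "I think this review is positive because..."
)

for name, sample in [("truncated", truncated), ("incomplete", incomplete), ("chatty", chatty)]:
    print(f"{name}: {diagnose(sample)}")
```

The truncated and chatty responses both fail at the JSON level, while the incomplete one parses but fails schema validation. A retry prompt can use that distinction to tell the model what to fix.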

## Example 3: JSON-in-Markdown Strategy

Handle LLMs that wrap JSON in markdown code blocks using the **`json_in_markdown`** parser strategy.

In [None]:
# Pipeline with json_in_markdown parser
markdown_pipeline = """
apiVersion: v1
kind: Pipeline
metadata:
  name: markdown-aware-classifier

spec:
  ports:
    llm:
      namespace: plugin
      name: mock

  nodes:
    - kind: prompt_node
      metadata:
        name: build_prompt
      spec:
        template: |
          Classify this review: {{review}}
          
          Return JSON with: sentiment, confidence, key_phrases
        output_format: messages
        dependencies: []

    - kind: raw_llm_node
      metadata:
        name: classify
      spec:
        dependencies: [build_prompt]

    - kind: parser_node
      metadata:
        name: parse_result
      spec:
        output_schema:
          sentiment: str
          confidence: float
          key_phrases: list
        strategy: json_in_markdown
        dependencies: [classify]
"""

graph_markdown, config_markdown = builder.build_from_yaml_string(markdown_pipeline)

# Mock LLM that returns JSON wrapped in markdown
markdown_response = """```json
{
  "sentiment": "positive",
  "confidence": 0.91,
  "key_phrases": ["excellent product", "highly recommend", "great service"]
}
```"""

mock_markdown = MockLLM(responses=[markdown_response])
ports_markdown = {"llm": mock_markdown}
orchestrator_markdown = Orchestrator(ports=ports_markdown)

result_markdown = await orchestrator_markdown.run(
    graph_markdown,
    {"review": "Excellent product! Highly recommend. Great service too."}
)

print("✅ Extracted JSON from markdown code block!")
print(f"\nSentiment: {result_markdown['parse_result'].sentiment}")
print(f"Confidence: {result_markdown['parse_result'].confidence}")
print(f"Key Phrases: {', '.join(result_markdown['parse_result'].key_phrases)}")
print("\n💡 The parser automatically extracted JSON from the ```json code block")
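
How the `json_in_markdown` strategy is implemented is not shown in this notebook. One plausible approach, sketched here with the standard library only, is to look for a fenced block with a regular expression and fall back to the raw text when no fence is found. `extract_json` is a hypothetical helper, not the library's function.

```python
import json
import re

# Build the fence marker indirectly so this example can itself live in a fenced block
FENCE = "`" * 3

# Match an optional "json" tag after the opening fence, then capture up to the closing fence
_BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_json(text: str):
    """Parse JSON from the first fenced block if present, else from the raw text."""
    match = _BLOCK_RE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


wrapped = FENCE + 'json\n{"sentiment": "positive", "confidence": 0.91}\n' + FENCE
print(extract_json(wrapped))
print(extract_json('{"sentiment": "neutral"}'))
```

The fallback to raw text means the same function handles models that sometimes wrap their output and sometimes do not.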

## Summary: Production-Ready Patterns

This notebook demonstrated **real-world AI engineering patterns**:

### ✅ What We Built

1. **Few-Shot Learning**
   - 3 labeled examples in the prompt
   - LLM learns classification patterns
   - Structured output with confidence scores

2. **Structured Output Parsing**
   - Type-safe schemas with Pydantic
   - Multiple fields: sentiment, confidence, aspects, key_phrases
   - Automatic validation

3. **Error Handling**
   - Demonstrated parse failures (invalid JSON)
   - Showed different parser strategies (`json`, `json_in_markdown`)
   - Clear error messages for debugging

4. **Batch Processing**
   - Processed 5 reviews efficiently
   - Aggregated statistics
   - Production-ready workflow

### 🎯 Key Takeaways

**Composable Architecture Benefits**:
- **Separation of Concerns**: Prompting ≠ LLM calls ≠ Parsing
- **Testability**: Each component independently testable
- **Flexibility**: Swap prompts, parsers, or LLMs without changing code
- **Type Safety**: Pydantic validation catches errors early

**Few-Shot Learning**:
- A handful of examples (3-5) often improves accuracy and format adherence
- Examples teach the LLM your exact output format
- Typically more reliable than zero-shot classification

**Error Handling**:
- Parser provides helpful error messages
- Multiple parsing strategies for different LLM behaviors
- Future: Automatic retry with error correction prompts

### 🚀 Next Steps

**For Production**:
1. Add **retry policies** with exponential backoff
2. Implement **error correction prompts** (pass parse errors back to LLM)
3. Use a **real LLM** (`core:openai` instead of `plugin:mock`)
4. Add **observability** (logging, metrics, tracing)
5. Implement **caching** for repeated queries
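
As an illustration of the first item, here is a minimal sketch of retrying with exponential backoff, written without any hexdag dependency. `retry_with_backoff` and `flaky` are hypothetical names, and the default delays are arbitrary illustrative values. The `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call`, doubling the wait after each failure up to `max_delay`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter (up to 10%) avoids synchronized retries
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Demo: a call that fails twice with a transient error, then succeeds
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(outcome, attempts["n"], len(delays))
```

In practice the `except` clause would usually be narrowed to errors known to be transient (timeouts, rate limits), since retrying a malformed request only wastes calls.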

**Advanced Patterns**:
- **Chain-of-Thought**: Ask LLM to explain reasoning
- **Self-Consistency**: Run multiple times, vote on results
- **Active Learning**: Flag low-confidence results for human review
- **Dynamic Few-Shot**: Select best examples based on input
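
Two of these patterns combine naturally. The sketch below takes a majority vote over several runs (self-consistency) and flags results with low confidence or weak agreement for human review. `vote` and `needs_review` are hypothetical helpers, and the thresholds are illustrative values, not recommendations.

```python
from collections import Counter


def vote(labels: list) -> tuple:
    """Return the most common label and the fraction of runs that agreed with it."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def needs_review(
    confidence: float,
    agreement: float,
    min_confidence: float = 0.8,
    min_agreement: float = 0.6,
) -> bool:
    """Flag a result when the model is unsure or the runs disagree."""
    return confidence < min_confidence or agreement < min_agreement


# Sentiments from three hypothetical runs on the same review
runs = ["positive", "positive", "neutral"]
label, agreement = vote(runs)
print(label, round(agreement, 2), needs_review(0.9, agreement))
```

Agreement across runs gives a second signal alongside the model's self-reported confidence, which is useful because the confidence value is itself generated text.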

In [None]:
# Batch process multiple reviews
reviews_batch = [
    "Amazing product! Works exactly as advertised. Will buy again!",
    "Terrible experience. Product arrived damaged and customer service was unhelpful.",
    "It's okay. Does the job but nothing special. Overpriced for what you get.",
    "Love it! Great quality and fast delivery. Exceeded my expectations!",
    "Disappointed. Doesn't work as described. Requesting a refund.",
]

# Create mock with multiple responses
batch_responses = [
    '{"sentiment": "positive", "confidence": 0.94, "aspects": {"quality": "excellent", "value": "good"}, "key_phrases": ["amazing", "works as advertised", "will buy again"]}',
    '{"sentiment": "negative", "confidence": 0.96, "aspects": {"quality": "poor", "service": "poor"}, "key_phrases": ["terrible", "damaged", "unhelpful"]}',
    '{"sentiment": "neutral", "confidence": 0.82, "aspects": {"quality": "acceptable", "value": "poor"}, "key_phrases": ["okay", "does the job", "overpriced"]}',
    '{"sentiment": "positive", "confidence": 0.93, "aspects": {"quality": "excellent", "shipping": "fast"}, "key_phrases": ["love it", "great quality", "exceeded expectations"]}',
    '{"sentiment": "negative", "confidence": 0.91, "aspects": {"quality": "poor", "accuracy": "poor"}, "key_phrases": ["disappointed", "doesn\'t work", "refund"]}',
]

mock_batch = MockLLM(responses=batch_responses)
ports_batch = {"llm": mock_batch}
orchestrator_batch = Orchestrator(ports=ports_batch)

# Process all reviews
print("📊 Batch Processing Results:\n")
print("=" * 80)

results_summary = {"positive": 0, "negative": 0, "neutral": 0}

for i, review in enumerate(reviews_batch, 1):
    result = await orchestrator_batch.run(graph_fewshot, {"review": review})

    sentiment_data = result["parse_sentiment"]
    results_summary[sentiment_data.sentiment] += 1

    print(f"\n#{i} Review: {review[:60]}...")
    print(
        f"   Sentiment: {sentiment_data.sentiment.upper()} (confidence: {sentiment_data.confidence:.2f})"
    )
    print(f"   Top aspects: {', '.join(list(sentiment_data.aspects.keys())[:3])}")
    print(f"   Key phrases: {', '.join(sentiment_data.key_phrases[:2])}")

print("\n" + "=" * 80)
print("\n📈 Summary Statistics:")
print(
    f"   Positive: {results_summary['positive']} ({results_summary['positive'] / len(reviews_batch) * 100:.0f}%)"
)
print(
    f"   Neutral:  {results_summary['neutral']} ({results_summary['neutral'] / len(reviews_batch) * 100:.0f}%)"
)
print(
    f"   Negative: {results_summary['negative']} ({results_summary['negative'] / len(reviews_batch) * 100:.0f}%)"
)
print(f"\n✅ Processed {len(reviews_batch)} reviews successfully!")

## Example 4: Batch Processing Multiple Reviews

Process **multiple reviews** efficiently with the same pipeline.
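
The batch loop in this notebook awaits each review in turn. With a real, network-bound LLM, running reviews concurrently under a concurrency limit may reduce total latency. The sketch below shows the general shape using `asyncio` only. `classify` is a stand-in coroutine for a single pipeline run, not a hexdag call, and its keyword rule is purely for demonstration.

```python
import asyncio
from collections import Counter


async def classify(review: str) -> dict:
    """Stand-in for one pipeline run; a real version would await the orchestrator."""
    await asyncio.sleep(0)  # yield control, as a network call would
    sentiment = "positive" if "love" in review.lower() else "negative"
    return {"sentiment": sentiment}


async def classify_batch(reviews: list, limit: int = 3) -> list:
    """Classify reviews concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(review: str) -> dict:
        async with semaphore:
            return await classify(review)

    # gather returns results in the same order as the input reviews
    return await asyncio.gather(*(bounded(r) for r in reviews))


batch = ["Love it!", "Disappointed.", "I love the quality."]
outputs = asyncio.run(classify_batch(batch))
summary = Counter(o["sentiment"] for o in outputs)
print(dict(summary))
```

Inside a notebook, where an event loop is already running, `await classify_batch(batch)` replaces the `asyncio.run` call. Note that a mock returning scripted responses in a fixed order would no longer line up with specific reviews under concurrency.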