# Section 2.3: Patterns for Reasoning

| **Aspect** | **Details** |
|-------------|-------------|
| **Goal** | Master few-shot exemplars, chain-of-thought reasoning, and reference citations. |
| **Time** | ~25 minutes |
| **Prerequisites** | Complete Sections 2.1–2.2 and understand role prompting + structured inputs. |
| **Next Steps** | Continue to Section 2.4: Automation and Evaluation |

---

## 🔧 Quick Setup Check

Since you completed Sections 2.1-2.2, setup is already done! We just need to import it.

In [ ]:
# Quick setup check - imports setup_utils
try:
    import importlib
    import setup_utils
    importlib.reload(setup_utils)
    from setup_utils import *
    print(f"✅ Setup loaded! Using {AVAILABLE_PROVIDERS} with {get_default_model()}")
    print("🚀 Ready to learn reasoning patterns!")
except ImportError:
    print("❌ Setup not found!")
    print("💡 Please run 2.1-setup-and-foundations.ipynb first to set up your environment.")

---

### 📚 Tactic 3: Few-Shot Examples

**Teach AI your preferred styles and standards through carefully crafted examples**

Providing examples is your secret weapon for consistent, accurate outputs. This technique (called "few-shot" or "multishot" prompting) is especially effective for structured outputs and specific formats.

**What "Styles and Standards" Means:**
Coding conventions • Documentation formats • Error message patterns • Commit message styles • Test case patterns • API response structures

**Few-Shot Terminology:**

The number of "shots" = how many examples you provide:
- **Zero-shot:** No examples (instructions only)
- **One-shot:** Single example
- **Few-shot:** 2-5 examples (sweet spot for most tasks)
- **N-shot:** Many examples for complex patterns

**Why Examples Work:**

Showing AI how you want it to behave (or *not* behave) is powerful for:
1. **Right answer:** Clarifies ambiguous requirements
2. **Right format:** Demonstrates exact structure and style
3. **Pattern learning:** AI infers rules you'd struggle to describe
4. **Less confusion:** Eliminates guesswork

**Best Practices:**
- Mirror your actual use case
- Include 3-5 diverse examples for best results
- Wrap in `<example>` tags (or `<examples>` for multiple)
- Cover edge cases without creating unintended patterns
- More examples = better performance for complex tasks

*Reference: [Claude Documentation - Multishot Prompting](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting)*

Let's teach the AI to extract structured data from logs in a consistent format:

In [None]:
# Shared setup helpers (run Section 2.1 first to install dependencies)
from setup_utils import get_chat_completion


In [None]:
# Few-shot examples for consistent log parsing
few_shot_messages = [
    {"role": "system", "content": "Extract service names and error types from log entries following the examples provided."},
    
    # Example 1
    {"role": "user", "content": 'Extract from: "[ERROR] payment-service: Database connection pool exhausted"'},
    {"role": "assistant", "content": "Service: payment-service, Error: connection_pool"},
    
    # Example 2  
    {"role": "user", "content": 'Extract from: "[WARN] user-auth: Rate limit exceeded for API endpoint"'},
    {"role": "assistant", "content": "Service: user-auth, Error: rate_limit"},
    
    # Example 3
    {"role": "user", "content": 'Extract from: "[ERROR] notification-hub: Message queue timeout after 30s"'},
    {"role": "assistant", "content": "Service: notification-hub, Error: timeout"},
    
    # New log entry following the established pattern
    {"role": "user", "content": 'Extract from: "[ERROR] inventory-manager: Cache invalidation failed during peak load"'}
]

few_shot_response = get_chat_completion(few_shot_messages)
print("📚 CONSISTENT LOG PARSING RESPONSE:")
print(few_shot_response)
print("\n" + "="*70 + "\n")

🎯 **Perfect!** Notice how the AI learned the exact format and style from the examples and applied it consistently.

**Why This Works - System Messages + Examples:**

This example demonstrates how **few-shot learning differs from role prompting**:

- **Role Prompting (Tactic 1):** Assigns AI a *persona* or *expertise* (e.g., "You are a security engineer") → Focuses on domain knowledge and perspective
- **Few-Shot Examples (Tactic 3):** Teaches AI a *pattern* or *style* through concrete examples → Focuses on format, structure, and consistency

**The Mechanism:**
- The `system` message "Answer in a consistent style using the examples provided" primes the AI to look for patterns
- The `user`/`assistant` pairs demonstrate the desired input-output relationship
- The AI learns: "concise format," "specific structure," "technical accuracy" from seeing 2-3 examples
- When given a new question, it applies the learned pattern automatically

**Why Both Matter:**
- Use role prompting when you need **specialized knowledge** (security vulnerabilities, performance optimization)
- Use few-shot examples when you need **consistent formatting** (commit messages, error patterns, documentation style)
- Combine both for powerful results: "You are a senior engineer" (role) + 3 examples of your team's code review format (pattern)


---

### 🎯 Try It Yourself: Few-Shot Examples

**Common Misconception:** AI automatically knows your preferred categorization and format without examples.

**The Reality:** Examples teach AI exactly what you want, enforcing consistency across outputs.

**Your Task:** You want AI to categorize infrastructure alerts into severity categories. Currently, it might use inconsistent categories. Add 3 few-shot examples to establish your categorization system, then test it!

**Your Team's Categories:**
- **SECURITY:** Authentication, certificates, access control
- **AVAILABILITY:** Service health, uptime, connectivity
- **MAINTENANCE:** Backups, updates, routine tasks

In [None]:
# ❌ BAD: No examples (generic, inconsistent categorization)
bad_messages = [
    {"role": "system", "content": "Categorize infrastructure alerts."},
    {"role": "user", "content": 'Categorize this alert: "Failed login attempts increased 400% on admin portal"'}
]

bad_response = get_chat_completion(bad_messages)
print("=" * 70)
print("WITHOUT EXAMPLES (Inconsistent Categorization):")
print("=" * 70)
print(bad_response)
print("\n")

# ✅ YOUR TURN: Add few-shot examples for alert categorization
# TODO: Uncomment and complete this section
# good_messages = [
#     {"role": "system", "content": "Categorize infrastructure alerts following the examples provided."},
#     
#     # Example 1
#     {"role": "user", "content": 'Categorize: "SSL certificate expires in 7 days for payment gateway"'},
#     {"role": "assistant", "content": "Category: SECURITY"},
#     
#     # Example 2
#     {"role": "user", "content": 'Categorize: "Load balancer health check failing for 2/8 backend servers"'},
#     {"role": "assistant", "content": "Category: AVAILABILITY"},
#     
#     # Example 3
#     {"role": "user", "content": 'Categorize: "Backup job completed with 3 file permission warnings"'},
#     {"role": "assistant", "content": "Category: MAINTENANCE"},
#     
#     # Now the actual request
#     {"role": "user", "content": 'Categorize: "Failed login attempts increased 400% on admin portal"'}
# ]
# 
# good_response = get_chat_completion(good_messages)
# print("=" * 70)
# print("WITH FEW-SHOT EXAMPLES (Consistent Categorization):")
# print("=" * 70)
# print(good_response)
# 
# print("\n💡 See how examples enforce your exact categories? The AI correctly identifies this as SECURITY!")

---

### ⛓️‍💥 Tactic 4: Chain-of-Thought Reasoning

**Guide AI through systematic step-by-step problem breakdown**

**Core Principle:** When faced with complex tasks like research, analysis, or problem-solving, having AI models break down problems into explicit, sequential steps dramatically improves performance. This technique, known as chain of thought (CoT) prompting, encourages the AI to work through problems methodically rather than jumping straight to conclusions, leading to more accurate and nuanced outputs.

Think of it like showing your work in math class—by making the intermediate reasoning steps visible, you catch errors, verify logic, and produce more reliable results.

**Why This Works:**
- **Accuracy:** Breaking problems into steps reduces errors, especially in math, logic, analysis, or complex tasks with multiple considerations
- **Coherence:** Structured thinking leads to more cohesive, well-organized responses  
- **Debugging:** Seeing the AI's thought process helps you pinpoint where prompts may be unclear or where reasoning breaks down
- **Transparency:** Makes AI decision-making auditable and explainable

**When to Use CoT:**
- Use for tasks that a human would need to think through carefully
- Examples: complex math, multi-step analysis, writing complex documents, decisions with many factors
- Especially valuable for: debugging workflows, architectural decisions, security analysis, test generation
- **Note:** Increased output length may impact latency, so use judiciously

**How to Implement CoT (from least to most complex):**

1. **Basic prompt:** Include "Think step-by-step" in your prompt
2. **Guided prompt:** Outline specific steps for the AI to follow in its thinking process  
3. **Structured prompt:** Use XML tags like `<thinking>` and `<answer>` to separate reasoning from the final answer

**Important:** Always have the AI output its thinking. Without outputting its thought process, no thinking occurs!

**Examples Below:** This section demonstrates three CoT patterns: forcing evaluation after solving, systematic multi-step code analysis, and a comprehensive review combining multiple techniques.

Test generation, code reviews, debugging workflows, architecture decisions, and security analysis are all critical areas where methodical analysis prevents missed issues.

*Reference: [Claude Documentation - Chain of Thought](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought)*



#### **CoT Technique: Force AI to Analyze Before Diagnosing**

When troubleshooting production incidents, jumping to conclusions is dangerous. AI models exhibit similar behavior—they'll suggest a root cause prematurely if asked to diagnose immediately.

**The Problem:** Asking "What's the root cause?" directly causes AI to:
- Jump to obvious symptoms without deeper analysis
- Miss metric correlations and timeline patterns
- Confuse symptoms with actual root causes

**The Solution:** Force systematic analysis first:
1. Correlate all metrics with timeline
2. Identify primary vs. secondary symptoms
3. Trace causal relationships (A → B → C)
4. Only then diagnose root cause

**The Pattern:** *"Don't diagnose until you've systematically analyzed all metrics and their relationships."*

This technique is critical for incident response and performance debugging. Let's see it in action:

In [None]:
# Example scenario: Root Cause Analysis (RCA) for production incidents
# We'll compare two approaches: immediate judgment vs. systematic analysis

problem = """
Problem: API errors spiked from 1% to 15% at 2 PM. CPU normal, memory up 30%, disk I/O high.

Step 1: Correlate metrics with timeline
- Error spike: 2 PM
- Memory increase: Started 1:55 PM
- Disk I/O spike: Started 2 PM
- CPU usage: Normal throughout

Step 2: Identify primary symptom
- Main issue: API errors (functional impact)
- Secondary: Resource usage changes (memory, disk I/O)

Step 3: Find root cause
- Disk I/O correlates exactly with error spike (both at 2 PM)
- Memory leak started earlier (1:55 PM), likely caused disk swapping
- CPU normal rules out compute bottleneck
- Causal chain: Memory leak → disk swapping → API timeouts

Conclusion / Root cause:
Memory leak triggered disk swapping at 2 PM, causing API request timeouts and error spike.
"""

new_incident = """
A web service's response time spiked from 200ms to 2000ms at 3 PM. 
CPU usage is normal, memory usage increased by 20%, and database query count doubled.
"""

# ❌ BAD APPROACH: Asking AI to diagnose immediately
# Risk: AI may jump to conclusions without systematic analysis
print("=" * 70)
print("BAD APPROACH: Immediate Diagnosis")
print("=" * 70)

bad_messages = [
    {
        "role": "system",
        "content": "You are a site reliability engineer."
    },
    {
        "role": "user",
        "content": f"""Diagnose this incident:

{new_incident}

What's the likely root cause?"""
    }
]

bad_response = get_chat_completion(bad_messages)
print(bad_response)
print("\n")

# ✅ GOOD APPROACH: Force AI to follow RCA methodology first, THEN diagnose
# Benefit: AI develops systematic understanding and can identify root cause accurately
print("=" * 70)
print("GOOD APPROACH: Systematic RCA Analysis")
print("=" * 70)

good_messages = [
    {
        "role": "system",
        "content": "You are a site reliability engineer skilled in root cause analysis."
    },
    {
        "role": "user",
        "content": f"""Study this RCA methodology example:

{problem}

Now apply the SAME 3-step format to diagnose this new incident:

{new_incident}

Use this exact structure:

Step 1: Correlate metrics with timeline
[List each metric with its timing]

Step 2: Identify primary symptom
[Distinguish main issue from secondary symptoms]

Step 3: Find root cause
[Analyze correlations, rule out non-factors, identify causal chain]

Conclusion / Root cause:
[Provide final diagnosis in one clear sentence]

Important: Follow the example's format exactly. Work through all steps before concluding."""
    }
]

good_response = get_chat_completion(good_messages)
print(good_response)

**📌 Key Takeaway: Systematic Analysis Beats Quick Diagnosis**

Notice the difference:
- **Bad approach:** AI jumps to conclusions without rigorous analysis
- **Good approach:** AI systematically correlates metrics, identifies causal chains, then diagnoses

**What Triggers Chain-of-Thought Reasoning?**

Any prompt structure that forces sequential, visible reasoning will trigger CoT. The more explicit your steps, the more systematic the analysis.

**5 Ways to Activate CoT:**

1. **Simple phrases:** "Think step-by-step" • "Before answering, first..." • "Show your reasoning"

2. **Structured instructions:**
   ```
   Step 1: Analyze inputs
   Step 2: Consider edge cases
   Step 3: Develop solution
   ```

3. **XML tags:** `<thinking>`, `<analysis>`, `<solution>` • Combine: "In <thinking> tags, work through this step-by-step"

4. **Numbered requirements:** "List three approaches and evaluate each" • "Identify all risks numbered 1-N"

5. **"Before X, First Y" patterns:** "Before diagnosing, first correlate all metrics" • "Before recommending fixes, first reproduce the issue"

#### **CoT Technique: Structured Analysis with XML Tags**

The previous example showed systematic RCA methodology. Now let's see another powerful CoT approach: **using XML tags to structure multi-step analysis**.

This example demonstrates **XML-structured Chain-of-Thought** where we:
- Use `<thinking>` tags to define explicit analysis steps
- Use `<analysis>` tags to document detailed findings
- Use `<solution>` tags to provide actionable recommendations
- Separate reasoning from conclusions for clear, auditable decision-making

This approach is particularly effective for:
- **Production troubleshooting** where you need clear reasoning trails
- **Complex debugging** with multiple potential root causes
- **Code reviews** requiring systematic analysis (security, performance, maintainability)
- **Incident post-mortems** where documentation and reproducibility matter

Let's see XML-structured CoT in action with a production bug investigation:

In [None]:
# Chain-of-thought for systematic troubleshooting
system_message = """You are a senior engineer debugging production issues. Use systematic step-by-step analysis.

Structure your response using XML tags:

<thinking>
Step 1: Analyze the symptoms and reproduce the issue
Step 2: Examine relevant logs and stack traces
Step 3: Identify potential root causes
Step 4: Trace the execution flow to pinpoint the problem
</thinking>

<analysis>
Provide detailed findings for each step, explaining what you discovered and why it matters
</analysis>

<solution>
Recommend specific fixes with code changes and verification steps
</solution>"""

user_message = """
Debug this production issue:

**Symptoms:**
- User reports: "Items randomly disappear from shopping cart"
- Happens intermittently, ~10% of users affected
- No errors in application logs
- Issue started after deploying new caching layer

**Code:**
```python
# Cart service with Redis cache
class CartService:
    def add_item(self, user_id, item_id, quantity):
        cart = redis.get(f'cart:{user_id}') or []
        cart.append({'item': item_id, 'qty': quantity})
        redis.set(f'cart:{user_id}', cart, ex=3600)  # 1 hour TTL
        return cart
    
    def get_cart(self, user_id):
        return redis.get(f'cart:{user_id}') or []
    
    def remove_item(self, user_id, item_id):
        cart = redis.get(f'cart:{user_id}') or []
        cart = [item for item in cart if item['item'] != item_id]
        redis.set(f'cart:{user_id}', cart, ex=3600)
        return cart
```

**Environment:**
- 5 application servers behind load balancer
- Single Redis instance (no clustering)
- Average request rate: 500 req/sec
"""

chain_messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message}
]

chain_response = get_chat_completion(chain_messages)
print("🔗 SYSTEMATIC TROUBLESHOOTING ANALYSIS:")
print(chain_response)

🚀 **Excellent!** The AI followed each step methodically, providing structured, comprehensive analysis of the production issue.

---

### 🎯 Try It Yourself: Chain-of-Thought Reasoning

**Common Misconception:** AI can analyze complex code and spot all issues instantly without systematic thinking.

**The Reality:** Step-by-step reasoning with structured outputs catches issues that instant analysis misses.

**Your Task:** Below is code with multiple issues. The first prompt asks for instant analysis. Fix it by adding chain-of-thought reasoning with XML tags:
1. Use `<security>` tags for security analysis
2. Use `<performance>` tags for performance review
3. Use `<quality>` tags for code quality assessment
4. Use `<recommendations>` tags for prioritized fixes

Compare which approach finds all the issues!

In [None]:
# Code with multiple issues (SQL injection, connection pooling, error handling)
code_to_review = """
from flask import Flask, request, jsonify
import sqlite3

app = Flask(__name__)

@app.route('/user/<user_id>')
def get_user(user_id):
    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    user = cursor.fetchone()
    conn.close()
    
    if user:
        return jsonify({
            "id": user[0],
            "name": user[1], 
            "email": user[2]
        })
    else:
        return jsonify({"error": "User not found"}), 404
"""

context = """
This is a user lookup endpoint for a web application that serves user profiles.
The application handles 1000+ requests per minute during peak hours.
"""

# ❌ BAD: Instant analysis (might miss issues)
bad_messages = [
    {
        "role": "user",
        "content": f"Review this code:\n\n{code_to_review}\n\nContext: {context}"
    }
]

bad_response = get_chat_completion(bad_messages)
print("=" * 70)
print("INSTANT ANALYSIS (No Chain-of-Thought):")
print("=" * 70)
print(bad_response)
print("\n")

# ✅ YOUR TURN: Add chain-of-thought reasoning with XML tags
# TODO: Uncomment and complete this section
# good_messages = [
#     {
#         "role": "system",
#         "content": """You are a senior software engineer conducting a comprehensive code review.
# 
# Analyze the code systematically using XML tags:
# 
# <security>
# Identify security vulnerabilities with severity levels
# </security>
# 
# <performance>
# Analyze efficiency and optimization opportunities
# </performance>
# 
# <quality>
# Evaluate readability, maintainability, and best practices
# </quality>
# 
# <recommendations>
# Provide specific, prioritized fixes with code examples
# </recommendations>"""
#     },
#     {
#         "role": "user",
#         "content": f"""Review this code:
# 
# <code>
# {code_to_review}
# </code>
# 
# <context>
# {context}
# </context>
# 
# Perform a comprehensive code review using the structured XML format."""
#     }
# ]
# 
# good_response = get_chat_completion(good_messages)
# print("=" * 70)
# print("CHAIN-OF-THOUGHT ANALYSIS WITH XML TAGS:")
# print("=" * 70)
# print(good_response)
# 
# print("\n💡 The systematic approach catches SQL injection, connection pooling issues, and more!")

---

<div style="margin:20px 0; padding:16px 24px; background:linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius:10px; color:#fff; text-align:center; box-shadow:0 4px 15px rgba(102,126,234,0.3);">
  <strong style="font-size:1.05em;">☕ Halfway there! Your mind absorbs best with rest.</strong><br>
  <span style="font-size:0.92em; opacity:0.95; margin-top:4px; display:block;">Take a short break—stretch, hydrate, and return refreshed for the next tactics.</span>
</div>

---

### 📖 Tactic 5: Reference Citations

**Ground responses in actual documentation to reduce hallucinations**

**Core Principle:** When working with long documents or multiple reference materials, asking AI models to quote relevant parts of the documents first before carrying out tasks helps them cut through the "noise" and focus on pertinent information. This technique is especially powerful when working with extended context windows.

**Why This Works:**
- The AI identifies and focuses on relevant information before generating responses
- Citations make outputs verifiable and trustworthy
- Reduces hallucination by grounding responses in actual source material
- Makes it easy to trace conclusions back to specific code or documentation sections

**Best Practices for Long Context:**
- **Put longform data at the top:** Place long documents (~20K+ tokens) near the top of your prompt, above queries and instructions (can improve response quality by up to 30%)
- **Structure with XML tags:** Use `<documents>`, `<document>`, `<source>`, and `<document_content>` tags to organize multiple documents
- **Request quotes first:** Ask the AI to extract relevant quotes in `<quotes>` tags before generating the final response

**Working with External Documentation in IDEs:**

When using AI coding assistants like **GitHub Copilot**, **Claude Code**, or **OpenAI Codex**, you often need to reference documentation outside your codebase. Here's how to provide context effectively:

**1. Structure documents with XML tags (Recommended):**

Use this format for local files or external documentation. Keep a `docs/` folder with frequently-used references:
- `docs/api-conventions.md` - Your team's API standards
- `docs/external-apis/stripe.md` - Third-party API summaries  
- `docs/architecture.md` - System design decisions
- `docs/error-codes.md` - Standard error codes and handling

Then reference them in your prompts:

```xml
<documents>
  <document index="1">
    <source>docs/api-guide.md</source>
    <document_content>
    # Stripe Payment API
    POST /v1/payment_intents
    Creates a PaymentIntent object.
    
    Required fields:
    - amount (integer): Amount in cents
    - currency (string): 3-letter ISO code
    </document_content>
  </document>
  
  <document index="2">
    <source>docs/authentication.md</source>
    <document_content>
    Authentication uses Bearer tokens in the Authorization header.
    Example: Authorization: Bearer sk_test_abc123
    </document_content>
  </document>
</documents>

Based on the documentation above, implement a payment creation function.
```

**2. IDE-specific file shortcuts:**
- **GitHub Copilot Chat:** Use `#file:docs/api-guide.md` to quickly reference specific files
- **Claude Code:** Use `@filename` (e.g., `@docs/api-guide.md`) or open files in your editor as context
- **VS Code with Copilot:** Use `@workspace` or keep documentation files open in tabs

**3. Reference external URLs with key excerpts:**

For web documentation, extract and structure the relevant parts:

```xml
<document>
  <source>https://docs.stripe.com/api/payment_intents</source>
  <document_content>
  Key requirements:
  - Authentication: Bearer token in Authorization header  
  - amount must be in cents (e.g., $10.00 = 1000)
  - currency must be 3-letter ISO code
  - Successful response returns 200 with payment_intent object
  </document_content>
</document>
```

**Pro Tip:** Maintain local markdown summaries of frequently-used external APIs in your repo. This gives AI assistants grounded reference material and prevents hallucination of API details.

Code review with large codebases, documentation generation from source files, security audit reports, and analyzing API documentation all become more effective with proper citations.

*Reference: [Claude Documentation - Long Context Tips](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/long-context-tips)*

#### Example 1: Code Review with Multiple Files

Let's demonstrate how to structure multiple code files and ask the AI to extract relevant quotes before providing analysis:


In [None]:
# Example: Multi-file code review with quote extraction
auth_service = """
class AuthService:
    def __init__(self, db_connection):
        self.db = db_connection
    
    def authenticate_user(self, username, password):
        # TODO: Add password hashing
        query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
        result = self.db.execute(query)
        return result.fetchone() is not None
    
    def create_session(self, user_id):
        session_id = str(uuid.uuid4())
        # Session expires in 24 hours
        expiry = datetime.now() + timedelta(hours=24)
        self.db.execute(f"INSERT INTO sessions VALUES ('{session_id}', {user_id}, '{expiry}')")
        return session_id
"""

user_controller = """
from flask import Flask, request, jsonify
from auth_service import AuthService

app = Flask(__name__)
auth = AuthService(db_connection)

@app.route('/login', methods=['POST'])
def login():
    username = request.json.get('username')
    password = request.json.get('password')
    
    if auth.authenticate_user(username, password):
        user_id = get_user_id(username)
        session_id = auth.create_session(user_id)
        return jsonify({'session_id': session_id, 'status': 'success'})
    else:
        return jsonify({'status': 'failed'}), 401
"""

# Structure the prompt with documents at the top, query at the bottom
messages = [
    {
        "role": "system",
        "content": "You are a senior security engineer reviewing code for vulnerabilities."
    },
    {
        "role": "user",
        "content": f"""<documents>
<document index="1">
<source>auth_service.py</source>
<document_content>
{auth_service}
</document_content>
</document>

<document index="2">
<source>user_controller.py</source>
<document_content>
{user_controller}
</document_content>
</document>
</documents>

Review the authentication code above for security vulnerabilities. 

First, extract relevant code quotes that demonstrate security issues and place them in <quotes> tags with the source file indicated.

Then, provide your security analysis in <analysis> tags, explaining each vulnerability and its severity.

Finally, provide specific remediation recommendations in <recommendations> tags."""
    }
]

response = get_chat_completion(messages)
print("🔒 SECURITY REVIEW WITH CITATIONS:")
print(response)


#### Example 2: API Documentation Analysis

Now let's analyze API documentation to extract specific information with citations:


In [None]:
# Example: Analyzing API documentation with quote grounding
api_docs = """
# Payment API Documentation

## Authentication
All API requests require an API key passed in the `X-API-Key` header.
Rate limit: 1000 requests per hour per API key.

## Create Payment
POST /api/v2/payments

Creates a new payment transaction.

**Request Body:**
- amount (required, decimal): Payment amount in USD
- currency (optional, string): Currency code, defaults to "USD"
- customer_id (required, string): Customer identifier
- payment_method (required, string): One of: "card", "bank", "wallet"
- metadata (optional, object): Additional key-value pairs

**Rate Limit:** 100 requests per minute

**Response:**
{
  "payment_id": "pay_abc123",
  "status": "pending",
  "amount": 99.99,
  "created_at": "2024-01-15T10:30:00Z"
}

## Retrieve Payment
GET /api/v2/payments/{payment_id}

Retrieves details of a specific payment.

**Security Note:** Only returns payments belonging to the authenticated API key's account.

**Response Codes:**
- 200: Success
- 404: Payment not found
- 401: Invalid API key
"""

integration_question = """
I need to integrate payment processing into my e-commerce checkout flow.
The checkout needs to:
1. Create a payment when user clicks "Pay Now"
2. Handle USD and EUR currencies
3. Store order metadata with the payment
4. Check payment status after creation

What do I need to know from the API documentation?
"""

messages = [
    {
        "role": "system",
        "content": "You are a technical integration specialist helping developers implement APIs."
    },
    {
        "role": "user",
        "content": f"""<documents>
<document index="1">
<source>payment_api_docs.md</source>
<document_content>
{api_docs}
</document_content>
</document>
</documents>

<integration_requirements>
{integration_question}
</integration_requirements>

First, find and quote the relevant sections from the API documentation that address the integration requirements. Place these quotes in <quotes> tags with the section name indicated.

Then, provide a step-by-step integration guide in <integration_guide> tags that references the quoted documentation."""
    }
]

response = get_chat_completion(messages)
print("📚 API INTEGRATION GUIDE WITH CITATIONS:")
print(response)


#### Key Takeaways: Reference Citations

**Best Practices Demonstrated:**
1. **Document Structure:** Used `<documents>` and `<document>` tags with `<source>` and `<document_content>` metadata
2. **Documents First:** Placed all reference materials at the top of the prompt, before the query
3. **Quote Extraction:** Asked AI to extract relevant quotes first, then perform analysis
4. **Structured Output:** Used XML tags like `<quotes>`, `<analysis>`, and `<recommendations>` to organize responses


---

### 🎯 Try It Yourself: Reference Citations

**Common Misconception:** AI can accurately implement APIs from general knowledge without documentation.

**The Reality:** Without reference citations, AI invents plausible-sounding but incorrect details. Quote extraction grounds responses in actual documentation.

**Documentation Setup:**

This exercise uses actual documentation files in the `docs/` directory:
```
module-02-fundamentals/
├── docs/
│   ├── stripe-api-guide.md       # API endpoint details, required fields
│   └── stripe-authentication.md  # Authentication methods, security
├── 2.1-setup-and-foundations.ipynb
├── 2.2-roles-and-structure.ipynb
├── 2.3-patterns-for-reasoning.ipynb
├── 2.4-automation-and-evaluation.ipynb
└── 2.5-hands-on-practice.ipynb
```

**Your Task:** Compare two approaches to the same task:

**Bad approach (without citations):**
- Vague prompt with no documentation
- AI relies on general knowledge about Stripe
- Results in hallucinations (wrong fields, wrong endpoints, wrong auth)

**Good approach (with citations):**
1. Load documentation from actual files: `docs/stripe-api-guide.md`, `docs/stripe-authentication.md`
2. Structure using XML tags: `<documents>`, `<document>`, `<source>`, `<document_content>`
3. Request quote extraction in `<quotes>` tags first
4. AI implements using ONLY the quoted information

**What you'll learn:** A simple keyword-based validator flags likely hallucinations in the bad response. With proper citations, those flags should largely disappear.

In [ ]:
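# Load the payment API documentation from the files in docs/
# (explicit UTF-8 so the read does not depend on the platform's default encoding)
with open('docs/stripe-api-guide.md', 'r', encoding='utf-8') as f:
    api_guide = f.read()

with open('docs/stripe-authentication.md', 'r', encoding='utf-8') as f:
    authentication_guide = f.read()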

# Helper function to validate responses against documentation
def validate_response(response, label):
    """Check if response matches documentation (detects hallucinations)"""
    required_fields = ["amount", "currency", "payment_method"]
    endpoint = "/v1/payment_intents"
    
    print(f"\n{'='*70}")
    print(f"🔍 HALLUCINATION CHECK - {label}:")
    print('='*70)
    
    # Check key requirements
    checks = {
        "Has required fields (amount, currency, payment_method)": all(field in response.lower() for field in required_fields),
        "Uses correct endpoint /v1/payment_intents": endpoint in response,
        "Uses Bearer token authentication": "bearer" in response.lower(),
    }
    
    print("\nValidating against documentation:")
    for check, passed in checks.items():
        print(f"  {'✅' if passed else '❌'} {check}")
    
    # Detect common hallucinations
    hallucinations = []
    if any(field in response.lower() for field in ["customer_name", "card_number", "stripe_key", "api_secret"]):
        hallucinations.append("Invents fields not in docs")
    if "x-api-key" in response.lower() or ("api key" in response.lower() and "bearer" not in response.lower()):
        hallucinations.append("Wrong authentication method")
    
    if hallucinations:
        print(f"\n⚠️  Hallucinations detected: {', '.join(hallucinations)}")
        return len(hallucinations)
    else:
        print("\n✅ No hallucinations detected!")
        return 0

# ❌ BAD: Vague request without structured citations
bad_messages = [{
    "role": "user",
    "content": """Create a Python function to process Stripe payments. 
    
The function should:
- Accept payment details like amount, card info, customer details
- Handle authentication with Stripe API
- Return the payment status

Make it production-ready with proper error handling."""
}]

print("="*70)
print("WITHOUT REFERENCE CITATIONS:")
print("="*70)
bad_response = get_chat_completion(bad_messages)
print(bad_response)
bad_count = validate_response(bad_response, "WITHOUT CITATIONS")

print("\n" + "="*70 + "\n")

# ✅ YOUR TURN: Use multi-document XML structure with quote extraction
# The prompt below is a complete reference version; edit the steps or documents to experiment
good_messages = [{
    "role": "user",
    "content": f"""<documents>
  <document index="1">
    <source>docs/stripe-api-guide.md</source>
    <document_content>
      {api_guide}
    </document_content>
  </document>
  
  <document index="2">
    <source>docs/stripe-authentication.md</source>
    <document_content>
      {authentication_guide}
    </document_content>
  </document>
</documents>

Task: Create a Python function to process Stripe payments.

The function should:
- Accept payment details like amount, card info, customer details
- Handle authentication with Stripe API
- Return the payment status

Make it production-ready with proper error handling.

Step 1: Extract relevant quotes from the documentation above.
In <quotes> tags, extract:
- Required API fields from document 1
- Authentication format from document 2
- Correct endpoint from document 1

Step 2: Using ONLY the quoted information, provide implementation in <code> tags."""
}]

print("="*70)
print("WITH REFERENCE CITATIONS:")
print("="*70)
good_response = get_chat_completion(good_messages)
print(good_response)
good_count = validate_response(good_response, "WITH CITATIONS")

# Show comparison
print(f"\n{'='*70}")
print("📊 COMPARISON:")
print('='*70)
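# Choose the status icon from the actual count instead of hard-coding it
print(f"Without citations: {bad_count} hallucination(s) {'⚠️' if bad_count else '✅'}")
print(f"With citations: {good_count} hallucination(s) {'⚠️' if good_count else '✅'}")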
print("\n💡 Without documentation, AI invents plausible but incorrect details!")
print("   With citations, AI stays grounded in actual API requirements.")
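
The keyword checks above only look at the final answer. A complementary check is to confirm that each extracted quote really appears in the source documents. The sketch below is one possible approach. It assumes the model puts one quote per line inside `<quotes>` tags; the helper names `extract_tag`, `normalize`, and `check_quotes`, and the sample strings, are hypothetical.

```python
import re


def extract_tag(text, tag):
    """Return the contents of every <tag>...</tag> block in text."""
    pattern = rf"<{tag}>(.*?)</{tag}>"
    return [match.strip() for match in re.findall(pattern, text, flags=re.DOTALL)]


def normalize(text):
    """Collapse whitespace and lowercase so line wrapping does not cause misses."""
    return " ".join(text.split()).lower()


def check_quotes(response, documents):
    """Map each quoted line to True if it appears in at least one document."""
    haystack = [normalize(doc) for doc in documents]
    results = {}
    for block in extract_tag(response, "quotes"):
        for line in block.splitlines():
            # Drop list bullets and surrounding quotation marks
            quote = line.strip().lstrip("-•* ").strip("\"' ")
            if not quote:
                continue
            results[quote] = any(normalize(quote) in doc for doc in haystack)
    return results


# Hypothetical sample data, for illustration only
sample_docs = ["Send the key as a Bearer token.\nThe `amount` field is required."]
sample_response = """<quotes>
- "Send the key as a Bearer token."
- "The customer_name field is required."
</quotes>
<code>...</code>"""

for quote, found in check_quotes(sample_response, sample_docs).items():
    print(f"{'✅' if found else '❌'} {quote}")
```

In this notebook you could call `check_quotes(good_response, [api_guide, authentication_guide])`. Treat the result as a heuristic: a model may paraphrase, reformat, or prefix its quotes, which produces false negatives even when the content is grounded.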

---

<div style="margin:24px 0; padding:20px 24px; background:linear-gradient(135deg, #f8fafc 0%, #e2e8f0 100%); border-radius:12px; border-left:5px solid #f59e0b; box-shadow:0 2px 8px rgba(0,0,0,0.1);">
  <div style="color:#1e293b; font-size:0.85em; font-weight:600; text-transform:uppercase; letter-spacing:1px; margin-bottom:8px;">⏭️ Next Section</div>
  <div style="color:#0f172a; font-size:1.15em; font-weight:700; margin-bottom:6px;">Section 2.4: Automation and Evaluation</div>
  <div style="color:#475569; font-size:0.95em; line-height:1.5; margin-bottom:12px;">Build prompt chains, self-correction loops, and LLM-as-judge workflows for automated quality assurance.</div>
  <a href="./2.4-automation-and-evaluation.ipynb" style="display:inline-block; padding:8px 16px; background:#f59e0b; color:#fff; text-decoration:none; border-radius:6px; font-weight:600; font-size:0.9em; transition:all 0.2s;">Continue to Section 2.4 →</a>
</div>