# Chapter 6: Prompt Engineering - Medium Tasks (Solutions)

Complete solutions for all Medium Tasks.

## Solutions Summary

### Task 1: Prompt Builder

**Key insights:**
- All of the provided code runs as-is
- Students experiment with different component combinations
- Format and Audience typically have the biggest impact
- Different scenarios benefit from different components

**Example prompts created:**
- Customer service: All 7 components for structured, empathetic responses
- Technical documentation: Persona, Instruction, Audience, Format, Tone
- Creative tasks: Fewer components for flexibility
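To make the component combinations concrete, here is a minimal sketch of a prompt builder. It is not the notebook's code: `build_prompt` and `COMPONENT_ORDER` are hypothetical names, and `context` and `data` are assumed labels for the two components this summary does not name. The example texts are illustrative only.

```python
# Assumed component names; "context" and "data" are guesses for the two
# components that the summary above does not list explicitly.
COMPONENT_ORDER = ["persona", "instruction", "context", "data",
                   "format", "audience", "tone"]

def build_prompt(components):
    """Join the non-empty components in a fixed order, skipping omitted ones."""
    parts = [components[name].strip() for name in COMPONENT_ORDER
             if components.get(name, "").strip()]
    return "\n\n".join(parts)

# Technical documentation scenario: five of the seven components.
technical_docs = {
    "persona": "You are a senior technical writer.",
    "instruction": "Document the function below.",
    "audience": "The readers are junior developers.",
    "format": "Use a short summary followed by a bulleted parameter list.",
    "tone": "Keep the tone neutral and precise.",
}
prompt = build_prompt(technical_docs)
print(prompt)
```

Because omitted or empty components are simply skipped, the same function covers the "fewer components" case for creative tasks.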

### Task 2: Self-Consistency

**How it works:**
1. Sample multiple reasoning paths (temperature > 0)
2. Extract answer from each path
3. Take majority vote
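The three steps above can be sketched as follows. This is a simplified illustration, not the notebook's implementation: `extract_answer`, `self_consistency`, and `sample_fn` are hypothetical names, the "last number in the text" rule is an assumed extraction strategy, and the canned strings stand in for real model samples drawn with temperature > 0.

```python
import re
from collections import Counter

def extract_answer(text):
    """Return the last number in a reasoning trace, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None

def self_consistency(sample_fn, prompt, n_samples=5):
    """Sample n reasoning paths, extract each answer, return the majority vote."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n_samples)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None, Counter()
    votes = Counter(answers)
    winner, _ = votes.most_common(1)[0]
    return winner, votes

# Stand-in for a real model call (canned outputs, for illustration only).
canned = iter([
    "The ball costs 0.10",
    "Let x be the ball. x + (x + 1.00) = 1.10, so x = 0.05",
    "2x + 1.00 = 1.10, so x = 0.05",
])
answer, votes = self_consistency(lambda p: next(canned), "bat and ball", n_samples=3)
print(answer, dict(votes))
```

Returning the vote counts alongside the winner makes it easy to inspect how strongly the samples agreed.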

**Results:**
- Simple problems: All samples typically agree
- Tricky problems (bat and ball): Self-consistency catches common mistakes
- More samples make the vote more reliable, but each extra sample adds cost

**Trade-offs:**
- Pro: Higher accuracy on complex problems
- Con: Roughly N times the cost for N samples
- Con: Slower response time

**When to use:**
- High-stakes decisions
- Counter-intuitive problems
- When single-path accuracy is insufficient

### Task 3: Constrained JSON Output

**Key technique:**
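```python
# `llm` is assumed to be an already-loaded chat model; `prompt` is the user message.
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # constrain the output to JSON
    temperature=0,   # favor deterministic output
    max_tokens=300   # output that hits this limit may be cut off mid-object
)
```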

**Benefits:**
- Valid JSON by construction, so parsing errors largely disappear (a response cut off at `max_tokens` can still be incomplete)
- Reliable integration with applications
- Consistent structure
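The response still arrives as a string, so it has to be parsed before an application can use it. The sketch below assumes an OpenAI-style response layout (`response["choices"][0]["message"]["content"]`); `parse_json_response` is a hypothetical helper, and both sample responses are hand-written stand-ins rather than real model output.

```python
import json

def parse_json_response(response):
    """Pull the message text out of a chat-completion response and parse it.

    Assumes the layout response["choices"][0]["message"]["content"].
    Returns (data, error); exactly one of the two is None.
    """
    text = response["choices"][0]["message"]["content"]
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg}"

# Hand-written responses standing in for real model output.
ok = {"choices": [{"message": {"content": '{"name": "Ada", "age": 36}'}}]}
cut_off = {"choices": [{"message": {"content": '{"name": "Ada", "ag'}}]}

print(parse_json_response(ok))
print(parse_json_response(cut_off))
```

Keeping the `try`/`except` around `json.loads` is cheap insurance against truncated output.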

**Schema validation:**
```python
def validate_schema(data, required_fields):
    """Return (is_valid, message) after checking that every required field is present."""
    missing = [field for field in required_fields if field not in data]
    if missing:
        return False, f"Missing fields: {', '.join(missing)}"
    return True, "Valid schema"
```

**When to use:**
- Building applications that consume LLM output
- When structure is critical
- API integrations

### Task 4: Prompt Optimization

**Systematic approach:**
1. Start with baseline
2. Add definitions
3. Add examples
4. Measure accuracy on consistent test set
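Step 4 only gives comparable numbers if every prompt version is scored against the same labelled examples. A minimal sketch of such a harness follows; `accuracy` is a hypothetical helper, the tickets and labels are made-up placeholders, and the two classifiers are simple stand-ins for "prompt version plus model call".

```python
def accuracy(classify, test_set):
    """Fraction of (text, expected_label) pairs that classify() gets right."""
    if not test_set:
        raise ValueError("test_set is empty")
    correct = sum(1 for text, expected in test_set if classify(text) == expected)
    return correct / len(test_set)

# Tiny hand-made test set; the labels are placeholders.
test_set = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "technical"),
    ("How do I change my plan?", "billing"),
    ("Password reset email never arrives", "technical"),
]

# Stand-ins for two prompt versions (a real run would call the model here).
def baseline_classifier(text):
    return "technical"

def keyword_classifier(text):
    return "billing" if "charged" in text or "plan" in text else "technical"

for name, fn in [("v1", baseline_classifier), ("v2", keyword_classifier)]:
    print(f"{name}: accuracy = {accuracy(fn, test_set):.2f}")
```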

**Results:**
- V1 (basic): Baseline accuracy
- V2 (definitions): Usually a 10-20% accuracy improvement over V1
- V3 (examples): Usually a further 5-15% improvement over V2

**What helps most:**
- Definitions: Clarify boundaries between categories
- Examples: Show the desired format and edge cases
- Both together: Best results

**Iterative improvement:**
- Identify failing cases
- Add examples that cover those cases
- Refine definitions
- Test again
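The first step of this loop, identifying failing cases, might look like the sketch below. The helper names `failing_cases` and `confusion_counts` are hypothetical, the tickets and the `urgent`/`low` labels are illustrative placeholders, and the always-`low` classifier is a deliberately weak stand-in for a prompt version.

```python
from collections import Counter

def failing_cases(classify, test_set):
    """Return (text, expected, predicted) for every misclassified example."""
    failures = []
    for text, expected in test_set:
        predicted = classify(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    return failures

def confusion_counts(failures):
    """Count how often each (expected, predicted) confusion occurs."""
    return Counter((expected, predicted) for _, expected, predicted in failures)

# Illustrative data and a deliberately weak stand-in classifier.
test_set = [
    ("Site is down for all users", "urgent"),
    ("Typo on the pricing page", "low"),
    ("Cannot log in before a deadline today", "urgent"),
]

def always_low(text):
    return "low"

failures = failing_cases(always_low, test_set)
print(confusion_counts(failures).most_common())
```

Grouping failures by confusion pair shows which category boundary needs a sharper definition or an extra example.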

## Questions Answered

### Task 1

1. **Which component had the biggest impact on output quality?**
   - Format and Audience typically have the most impact
   - Format: Controls structure
   - Audience: Affects language level and complexity

2. **How did adding audience change the language used?**
   - Simpler vocabulary for younger audiences
   - More technical language for expert audiences
   - Different tone and formality levels

3. **When would you intentionally omit certain components?**
   - Simple tasks: Instruction may be enough
   - Creative tasks: Too much structure limits creativity
   - Token budget: Remove least impactful components

### Task 2

1. **Which problem benefited most from self-consistency?**
   - Counter-intuitive problems (bat and ball)
   - Problems where a single reasoning path often goes wrong
   - Complex multi-step calculations

2. **Did all samples agree? What does disagreement tell you?**
   - Simple problems: High agreement
   - Complex problems: More disagreement
   - Disagreement indicates problem difficulty or ambiguity

3. **What is the trade-off of using more samples?**
   - More reliable results
   - Higher cost (roughly N times for N samples)
   - Slower response time
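Disagreement between samples can be turned into a number by measuring what share of the samples voted for the winning answer. The sketch below is one simple way to do that; `agreement` is a hypothetical helper and the sample answers are made up for illustration.

```python
from collections import Counter

def agreement(answers):
    """Return (majority_answer, share of samples that voted for it)."""
    if not answers:
        raise ValueError("answers is empty")
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

# Made-up extracted answers from four samples.
winner, share = agreement(["0.05", "0.05", "0.10", "0.05"])
print(winner, share)
```

A low share is a hint that the problem is hard or ambiguous, and it can be used to decide whether drawing more samples is worth the extra cost.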

### Task 3

1. **Why is guaranteed JSON output important for applications?**
   - Removes most parsing errors, since the output is constrained to valid JSON
   - Reliable integration
   - Predictable structure
   - Output can be passed to downstream code without manual cleanup

2. **What happens if you request fields the model cannot infer from the prompt?**
   - May generate placeholder values
   - May omit the field
   - May hallucinate reasonable-sounding values

3. **How would you handle optional vs required fields?**
   - Validate required fields with schema checker
   - Check optional fields separately
   - Provide defaults for missing optional fields
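One possible way to combine these three points is a single helper that rejects records with missing required fields and fills in defaults for missing optional ones. This is a sketch under assumptions: `check_fields` is a hypothetical name, and the field names and default values are illustrative.

```python
def check_fields(data, required, optional_defaults):
    """Fail on missing required fields; fill missing optional ones with defaults.

    Returns (record, error); exactly one of the two is None.
    """
    missing = [field for field in required if field not in data]
    if missing:
        return None, f"Missing required fields: {', '.join(missing)}"
    # Defaults go first so that values from the model override them.
    return {**optional_defaults, **data}, None

record, error = check_fields(
    {"name": "Ada", "email": "ada@example.com"},
    required=["name", "email"],
    optional_defaults={"phone": None, "priority": "normal"},
)
print(record, error)
```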

### Task 4

1. **Which improvement (definitions or examples) had a bigger impact?**
   - Definitions usually have the largest single impact
   - Examples provide additional improvement
   - Both together are best

2. **Are there tickets that all versions got wrong? What makes them difficult?**
   - Borderline cases between categories
   - Context-dependent urgency
   - Ambiguous wording

3. **What would you try next to improve further?**
   - Add more diverse examples
   - Refine definitions based on failures
   - Add Chain-of-Thought reasoning
   - Use self-consistency for difficult cases