# Chapter 6: Prompt Engineering - Hard Tasks (Solutions)

Complete solutions with improved prompts for each of the four advanced techniques.

## Task 1: Tree-of-Thought - SOLUTION

The key is to improve the prompts so that they generate better options and more reliable evaluations.

### Solution: Improved `generate_next_steps` Prompt

In [None]:
def generate_next_steps(current_state, problem, num_options=3):
    """
    SOLUTION: Improved prompt with more specific guidance.
    
    Changes from original:
    - Added reminder about constraints
    - Requested specific format (who/what crosses)
    - Asked for actionable next moves
    """
    prompt = f"""{problem}

Current situation: {current_state}

Remember the constraints:
- Boat holds you + one item only
- Fox eats chicken if left alone
- Chicken eats grain if left alone

What are {num_options} possible next moves? 
For each move, specify exactly who/what crosses the river.

List them:
1."""
    
    output = generate_text(prompt, temperature=0.7, max_tokens=200)
    
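    # Scan every line rather than only the first num_options: blank lines
    # in the output would otherwise use up slots and return fewer options.
    lines = output.strip().split('\n')
    options = []
    
    for line in lines: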
        clean = line.strip()
        for prefix in ['1.', '2.', '3.', '4.', '-', '•']:
            if clean.startswith(prefix):
                clean = clean[len(prefix):].strip()
        if clean:
            options.append(clean)
    
    return options[:num_options]
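
The parsing half of `generate_next_steps` can be checked without any model call. The following is a minimal sketch of that idea, using a regular expression instead of the prefix list above; `parse_options` and the sample text are hypothetical and only illustrate the expected behavior.

```python
import re

# Hypothetical helper: matches one leading list marker such as "1.", "2)", "-", or "•"
_LIST_MARKER = re.compile(r'^\s*(?:\d+[.)]|[-•*])\s*')

def parse_options(output, num_options=3):
    """Turn a numbered or bulleted list from the model into clean option strings."""
    options = []
    for line in output.splitlines():
        clean = _LIST_MARKER.sub('', line).strip()
        if clean:  # skip blank lines so they do not count as options
            options.append(clean)
    return options[:num_options]

# Made-up model output; the first line has no marker because the prompt ends with "1."
sample = (
    "Take the chicken across\n"
    "\n"
    "2. Take the fox across\n"
    "3) Take the grain across\n"
    "4. Cross alone"
)
print(parse_options(sample))
```

The blank line in the sample does not cost an option slot, and the fourth item is dropped because only three options were requested.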

### Solution: Safety-Focused Evaluation

In [None]:
import re

def evaluate_option(option, problem, criterion="progress"):
    """
    SOLUTION: Improved safety criterion prompt.
    """
    if criterion == "safety":
        # Improved prompt focusing on constraint violations
        prompt = f"""{problem}

Proposed move: {option}

On a scale of 0-10, how SAFE is this move?

Consider:
- Will the fox be left alone with the chicken? (unsafe)
- Will the chicken be left alone with the grain? (unsafe)
- Does this respect the boat capacity constraint?

Score: 10 = completely safe, 0 = violates constraints

Score (0-10):"""
    else:
        prompt = f"""{problem}

Proposed move: {option}

On a scale of 0-10, how promising is this move?
Consider: Does it make progress? Does it violate constraints?

Score (0-10):"""
    
    output = generate_text(prompt, temperature=0, max_tokens=50)
    
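    # Take the first integer in the reply and clamp it to the 0-10 range
    match = re.search(r'(\d+)', output)
    if match:
        score = int(match.group(1))
        return min(max(score, 0), 10)
    # Neutral fallback when no number is found; 5 is not below the
    # pruning threshold (score < 5), so unparseable replies are kept
    return 5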

### Questions Answered

**1. How many branches were pruned (score < 5)?**

Typically 30-50% of branches, though the exact share varies between runs because options are generated at temperature 0.7. Each pruned branch saves:
- 1 LLM call to generate next steps
- N LLM calls to evaluate those steps (N = branch_factor)
- All recursive exploration from that branch

Pruning is essential for computational efficiency.
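
To put a number on the savings, the sketch below computes an upper bound on the calls avoided by pruning one branch. It assumes that, without pruning, every node beneath that branch would have been expanded for a fixed number of further levels; `calls_saved` and its parameters are hypothetical names used only for this illustration.

```python
def calls_saved(branch_factor, levels_remaining):
    """Upper bound on LLM calls avoided by pruning one branch.

    Assumes every node below the pruned branch would otherwise have been
    expanded (no further pruning) for `levels_remaining` more levels.
    """
    # Nodes that would have been expanded: 1 + b + b^2 + ... + b^(levels - 1)
    expanded_nodes = sum(branch_factor ** level for level in range(levels_remaining))
    # Each expansion costs 1 generation call plus one evaluation call per option
    return expanded_nodes * (1 + branch_factor)

for levels in (1, 2, 3):
    print(levels, calls_saved(branch_factor=3, levels_remaining=levels))
```

With a branch factor of 3, this gives 4, 16, and 52 calls for one, two, and three remaining levels, so pruning near the root saves far more than pruning near the leaves.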

**2. Compare "progress" vs "safety" criteria solutions.**

- **Progress criterion**: May try risky moves that advance toward the goal
- **Safety criterion**: Prioritizes avoiding constraint violations
- For this problem, the safety criterion often finds the correct solution faster because it immediately rejects dangerous moves
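
For this puzzle the safety rules are simple enough to check in code, which gives a deterministic baseline for comparing the LLM's safety scores. A minimal sketch, assuming a river bank is represented as a set of item names; `is_safe` and `UNSAFE_PAIRS` are hypothetical names and not part of the original solution.

```python
# Pairs that must not be left together without the farmer (from the puzzle rules)
UNSAFE_PAIRS = [{"fox", "chicken"}, {"chicken", "grain"}]

def is_safe(bank_items, farmer_present):
    """Return True if nothing on this bank gets eaten."""
    if farmer_present:
        return True
    items = set(bank_items)
    # A bank is unsafe if it contains both members of any unsafe pair
    return not any(pair <= items for pair in UNSAFE_PAIRS)

print(is_safe({"fox", "grain"}, farmer_present=False))    # True
print(is_safe({"fox", "chicken"}, farmer_present=False))  # False
```

A move that leaves an unsafe bank behind should receive a low safety score from `evaluate_option`; disagreement between the two is a signal that the prompt needs work.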

**3. Why use separate functions instead of one big script?**

Benefits:
- **Testing**: Can test `generate_next_steps()` and `evaluate_option()` independently
- **Swapping**: Easy to change evaluation criteria by modifying one function
- **Reusability**: Functions work for different problems with similar structure
- **Debugging**: If evaluation is wrong, you know where to look
- **Collaboration**: Different people can work on different functions
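
Because generation and evaluation are separate functions, a search loop can take them as arguments and be tested with deterministic stand-ins instead of LLM calls. The sketch below is one possible driver, not the chapter's implementation: `tree_search`, the stub functions, and the single-argument `evaluate(option)` signature are assumptions made for illustration. It prunes any option scoring below 5, matching the threshold in question 1.

```python
def tree_search(state, generate, evaluate, is_goal, depth=3, threshold=5):
    """Depth-first Tree-of-Thought search; returns a list of moves or None."""
    if is_goal(state):
        return []
    if depth == 0:
        return None
    scored = [(evaluate(option), option) for option in generate(state)]
    # Try the best-scoring options first; skip (prune) anything below threshold
    for score, option in sorted(scored, reverse=True):
        if score < threshold:
            continue
        rest = tree_search(state + [option], generate, evaluate, is_goal,
                           depth - 1, threshold)
        if rest is not None:
            return [option] + rest
    return None

# Deterministic stubs standing in for the LLM-backed functions
explored = []

def fake_generate(state):
    explored.append(list(state))  # record which states were expanded
    return ["a", "b", "c"]

def fake_evaluate(option):
    return {"a": 2, "b": 9, "c": 6}[option]

def fake_is_goal(state):
    return state == ["b", "c"]

solution = tree_search([], fake_generate, fake_evaluate, fake_is_goal)
print(solution)
print(explored)
```

With these stubs the search returns `['b', 'c']`, and no explored state contains option `a`, because its score of 2 falls below the threshold.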

## Task 2: Chain Prompting - SOLUTION

This solution improves Stage 3 so that it produces more detailed response strategies.

### Solution: Improved Stage 3 Prompt

In [None]:
def plan_response_strategy(sentiment_analysis, extracted_info):
    """
    SOLUTION: More detailed strategy planning prompt.
    
    Improvements:
    - Structured into specific sections
    - Asks for concrete actions
    - Considers customer retention
    """
    prompt = f"""Given this sentiment analysis:

{sentiment_analysis}

And these extracted facts:

{extracted_info}

Plan a detailed response strategy:

1. ACKNOWLEDGMENT:
   - Which specific points from the review should be acknowledged?
   - What positive aspects should be reinforced?

2. APOLOGY (if needed):
   - What issues require an apology?
   - Specific or general apology?

3. SOLUTIONS:
   - What concrete actions should be offered?
   - Immediate fixes vs long-term improvements?
   - Compensation needed?

4. NEXT STEPS:
   - What should the customer do next?
   - Who will follow up and when?

5. TONE:
   - Based on sentiment, what tone is appropriate?
   - How formal/casual should the response be?

Strategy:"""
    
    return generate_text(prompt, temperature=0, max_tokens=400)

### Questions Answered

**1. What information from Stage 1 was used in Stage 3?**

Stage 3 uses:
- Issues mentioned (to know what to apologize for)
- Positive aspects (to know what to reinforce)
- Duration and price (to contextualize offers)

Without Stage 1, Stage 3 would miss important details from the original review.

**2. Which stage added the most value?**

It depends on the use case:
- **Stage 1**: Prevents information loss from long reviews
- **Stage 2**: Ensures response matches customer mood
- **Stage 3**: Creates structured, actionable plan (often most valuable)
- **Stage 4**: Executes plan with appropriate language

**3. Why 4 functions instead of copying prompts 4 times?**

With functions:
- Change extraction logic once, affects all reviews
- Can cache Stage 1 results and try different Stage 3 strategies
- Can A/B test Stage 4 without re-running analysis
- Can monitor which stage fails and needs improvement
- DRY principle: Don't Repeat Yourself
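
The caching point can be illustrated with `functools.lru_cache`. In this sketch the two stage functions are stubs that return strings instead of calling a model, and their names, the `style` parameter, and the sample review are all hypothetical.

```python
from functools import lru_cache

call_log = []  # records each time the (stubbed) Stage 1 actually runs

@lru_cache(maxsize=None)
def extract_info_stub(review):
    """Stand-in for Stage 1; a real version would call the LLM here."""
    call_log.append("extract")
    return f"facts from: {review}"

def plan_strategy_stub(extracted_info, style):
    """Stand-in for Stage 3; `style` represents a prompt variant under test."""
    return f"[{style}] plan using {extracted_info}"

review = "Great battery, but delivery took three weeks."  # made-up example
plans = [
    plan_strategy_stub(extract_info_stub(review), style)
    for style in ("brief", "detailed")
]
print(len(plans), call_log)
```

Two Stage 3 variants are produced, but Stage 1 runs only once because the second lookup is served from the cache.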

## Task 3: Output Verification - SOLUTION

This solution makes the verification prompt more thorough so that it catches more issues.

### Solution: More Thorough Verification Prompt

In [None]:
def verify_code(code, requirements):
    """
    SOLUTION: More thorough verification prompt.
    
    Improvements:
    - Specific checklist format
    - Checks multiple aspects
    - Asks for examples of missing items
    """
    prompt = f"""Check if this code meets ALL requirements:

CODE:
```python
{code}
```

REQUIREMENTS:
- Function name must be: {requirements['name']}
- Must have parameters: {', '.join(requirements['parameters'])}
- Must include: {', '.join(requirements['must_have'])}

CHECK EACH ITEM:

1. Function name:
   - Is it exactly '{requirements['name']}'? (Yes/No)

2. Docstring:
   - Is there a docstring explaining what the function does? (Yes/No)
   - Does it explain parameters and return value? (Yes/No)

3. Input validation:
   - Does it check if inputs are valid? (Yes/No)
   - What happens with negative numbers or invalid types? (Describe)

4. Return statement:
   - Is there a return statement? (Yes/No)
   - Does it return the correct type? (Yes/No)

5. Edge cases:
   - What happens if discount is 0? If it's 100? If it's over 100?

SUMMARY:
List all issues found (or write 'No issues'):"""
    
    return generate_text(prompt, temperature=0, max_tokens=300)

### Questions Answered

**1. What issues did verification catch?**

Common issues:
- Missing or incomplete docstrings
- No input validation (accepting negative prices)
- No handling of edge cases (discount > 100%)
- Unclear variable names
- Missing type hints

**2. How would you catch these with code-based checks?**

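```python
# Code-based checks
import ast

def verify_code_programmatically(code, requirements):
    tree = ast.parse(code)  # raises SyntaxError if the code does not parse
    
    # Check that a top-level function exists and has the required name
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    assert functions, "no function definition found"
    func = functions[0]
    assert func.name == requirements['name'], f"unexpected name: {func.name}"
    
    # Check that the parameter names match
    params = [arg.arg for arg in func.args.args]
    assert params == list(requirements['parameters']), f"unexpected parameters: {params}"
    
    # Check docstring
    assert ast.get_docstring(func) is not None, "missing docstring"
    
    # Check that a return statement exists anywhere in the function
    has_return = any(isinstance(n, ast.Return) 
                     for n in ast.walk(func))
    assert has_return, "missing return statement"
```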

**3. Why separate verify and correct functions?**

Separation allows:
- **Iteration**: Can run verify → correct → verify → correct multiple times
- **Logging**: Track what issues were found and how many iterations needed
- **Human-in-loop**: Show verification results to human before correcting
- **Different correctors**: Could use different models or strategies for correction
- **Exit criteria**: Stop when verification passes, regardless of iterations
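
The iteration and exit-criteria points can be sketched as a small loop that takes the verifier and corrector as arguments. This is an illustration under assumptions, not the chapter's code: `verify_and_correct` is a hypothetical name, and the stub verifier returns a list of issues, whereas `verify_code` above returns free text that would need parsing first.

```python
def verify_and_correct(code, verify, correct, max_iterations=3):
    """Alternate verify and correct until no issues remain or the budget runs out.

    Returns (code, passed, history), where history holds the issues per pass.
    """
    history = []
    for _ in range(max_iterations):
        issues = verify(code)
        history.append(issues)
        if not issues:
            return code, True, history
        code = correct(code, issues)
    # Budget exhausted: run one last check on the final correction
    return code, not verify(code), history

# Stubs standing in for the LLM-backed verifier and corrector
def stub_verify(code):
    issues = []
    if '"""' not in code:
        issues.append("missing docstring")
    if "return" not in code:
        issues.append("missing return")
    return issues

def stub_correct(code, issues):
    # Fixes only the first issue per pass, to force more than one iteration
    if issues[0] == "missing docstring":
        return code.replace(":\n", ':\n    """Apply a discount."""\n', 1)
    return code + "    return result\n"

draft = "def apply_discount(price, discount):\n    result = price * (100 - discount) / 100\n"
final_code, passed, history = verify_and_correct(draft, stub_verify, stub_correct)
print(passed, [len(issues) for issues in history])
```

The `history` list is what makes logging and human review possible: each entry shows what the verifier reported before the next correction was applied.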

## Task 4: Self-Reflection - SOLUTION

Better reflection prompts ask more critical questions.

### Solution: Improved Self-Reflection Prompt

In [None]:
def self_reflect(problem, initial_reasoning):
    """
    SOLUTION: More critical reflection questions.
    
    Improvements:
    - Specific calculation checks
    - Alternative perspective questions
    - Assumption challenges
    """
    prompt = f"""{problem}

Here was my initial reasoning:
{initial_reasoning}

Now critically examine this reasoning:

1. CALCULATION CHECK:
   - Are all arithmetic operations correct?
   - Did I account for all costs (revenue - costs)?
   - Double-check: For Strategy A: 10,000 customers × $10 = ?
   - Double-check: For Strategy B: 5,000 customers × $25 = ?

2. COSTS:
   - Did I subtract support costs?
   - Support cost = customers × $2/month
   - Strategy A support: 10,000 × $2 = ?
   - Strategy B support: 5,000 × $2 = ?

3. ASSUMPTIONS:
   - Am I assuming customer counts are guaranteed?
   - What if projections are wrong?
   - What other factors matter (churn, acquisition cost, lifetime value)?

4. ALTERNATIVE VIEWS:
   - Would I reach a different conclusion if I focused on profit margin?
   - What about long-term vs short-term?

5. MISSED FACTORS:
   - What did I not consider at all?

Critical reflection:"""
    
    return generate_text(prompt, temperature=0, max_tokens=400)

### Questions Answered

**1. What did reflection catch that initial reasoning missed?**

Common mistakes caught:
- Forgot to subtract support costs
- Calculated revenue but not profit
- Arithmetic errors in multiplication
- Missing considerations (churn rate, scalability)
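
The first two mistakes are easy to see once the arithmetic from the reflection prompt is written out. The sketch below uses only the figures that appear in that prompt and assumes the $10 and $25 prices are monthly, like the support cost, and that support is the only cost; `monthly_profit` is a hypothetical helper.

```python
SUPPORT_COST_PER_CUSTOMER = 2  # dollars per customer per month, from the prompt

def monthly_profit(customers, price):
    """Revenue minus support cost, under the assumptions stated above."""
    revenue = customers * price
    support = customers * SUPPORT_COST_PER_CUSTOMER
    return revenue - support

profit_a = monthly_profit(10_000, 10)  # 100,000 revenue - 20,000 support
profit_b = monthly_profit(5_000, 25)   # 125,000 revenue - 10,000 support
print(profit_a, profit_b)
```

Under these assumptions Strategy A yields 80,000 and Strategy B yields 115,000, so subtracting support costs widens the gap from 25,000 (revenue only) to 35,000.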

**2. Compare with vs without reflection stage.**

**Without reflection**:
- Often makes calculation errors
- May recommend wrong strategy
- Leaves uncertain assumptions unexamined

**With reflection**:
- Catches mistakes before finalizing
- More thorough analysis
- Explicitly states assumptions
- Higher quality final answer

**3. Why have 4 separate functions? List 3 advantages.**

Any three of the following are valid answers:

1. **Debugging**: If final answer is wrong, inspect each stage to find where reasoning failed

2. **Iteration**: Can improve reflection questions without changing initial reasoning or final answer formatting

3. **Measurement**: Can quantify how often reflection changes the answer (initial vs revised reasoning)

4. **Reusability**: Same pipeline works for different decision problems

5. **Caching**: Can cache initial reasoning and try different reflection strategies

6. **Testing**: Each function can be unit tested with known inputs/outputs
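
The measurement advantage can be made concrete with a few lines of code. This sketch assumes the initial and final answers have already been reduced to comparable labels such as "A" or "B"; `reflection_change_rate` and the sample pairs are hypothetical.

```python
def reflection_change_rate(answer_pairs):
    """Fraction of cases where the revised answer differs from the initial one."""
    if not answer_pairs:
        return 0.0
    changed = sum(1 for initial, revised in answer_pairs if initial != revised)
    return changed / len(answer_pairs)

# Made-up (initial, revised) answers from four runs of the pipeline
pairs = [("A", "B"), ("B", "B"), ("A", "A"), ("A", "B")]
rate = reflection_change_rate(pairs)
print(rate)
```

Here two of the four answers changed after reflection, giving a rate of 0.5. A rate near zero across many runs would suggest the reflection stage is not earning its extra LLM call.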

## Key Takeaways

1. **Tree-of-Thought**: Specific prompts generate better options; pruning saves costs
2. **Chain Prompting**: Structured stages prevent information loss and enable debugging
3. **Verification**: Detailed checklists catch more issues than general requests
4. **Self-Reflection**: Specific critical questions catch mistakes better than general "check your work"
5. **Modular Design**: Functions enable testing, reuse, iteration, and measurement

## When to Use Each Technique

| Technique | Best For | Cost | Complexity |
|-----------|----------|------|------------|
| **Tree-of-Thought** | Strategic problems with multiple valid paths | High (many LLM calls) | High |
| **Chain Prompting** | Complex multi-step tasks requiring different expertise | Medium (N sequential calls) | Medium |
| **Output Verification** | When correctness is critical (code, calculations) | Medium (generate + verify + correct) | Low |
| **Self-Reflection** | High-stakes decisions, counter-intuitive problems | Medium (2-4x single call) | Low |