# Lab 6 — Agentic Plan-and-Execute

**Module reference:** [Module 6, §6.2](https://github.com/kunalsuri/prompt-engineering-playbook/blob/main/learn/06-agentic-patterns.md) — Plan-and-Execute Architecture

This lab implements a minimal **plan-and-execute agent**, a foundational agentic architecture,
in pure Python (no agent framework required). You'll compare it against a single-prompt baseline
to see what decomposition buys you and what it costs.

**Architecture:**
```
User Task
   │
   ▼
Planner  →  [Step 1, Step 2, ... Step N]
   │
   ▼
Executor (1 LLM call per step)
   │
   ▼
Synthesizer  →  Final Answer
```
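
The three stages in the diagram can be sketched in a few lines. This is a minimal illustration, not the code in `lab_06_agentic_plan_execute.py`: the names `plan`, `run_sketch`, and `fake_llm`, the prompt wording, and the `LLM` callable interface are all assumptions made for the example. The canned `fake_llm` stands in for a real model so the sketch runs without an API key.

```python
import re
from typing import Callable, Dict, List

# Hypothetical interface: a function that takes a prompt and returns text.
LLM = Callable[[str], str]


def plan(task: str, llm: LLM) -> List[str]:
    """Planner: one LLM call whose numbered-list reply is parsed into steps."""
    raw = llm(f"Break this task into numbered steps:\n{task}")
    steps = []
    for line in raw.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps


def run_sketch(task: str, llm: LLM) -> Dict[str, object]:
    """Planner -> Executor (one call per step) -> Synthesizer."""
    steps = plan(task, llm)  # 1 call
    results: List[str] = []
    for i, step in enumerate(steps, start=1):  # N calls
        context = "\n".join(results)
        prompt = (
            f"Task: {task}\n"
            f"Previous results:\n{context}\n"
            f"Now do step {i}: {step}"
        )
        results.append(llm(prompt))
    # Synthesizer: 1 call that sees every step result at once.
    answer = llm("Combine these step results into one answer:\n" + "\n".join(results))
    return {"answer": answer, "steps": steps, "calls": len(steps) + 2}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model (assumption for this sketch only)."""
    if prompt.startswith("Break this task"):
        return "1. Gather facts\n2. Compare options\n3. Recommend one"
    if prompt.startswith("Combine"):
        return "final answer"
    return "done"


result = run_sketch("Pick a database for a small web app", fake_llm)
print(result["steps"])
print(result["calls"])
```

Passing the model as a plain callable keeps the control flow testable offline. Swapping in a real client only requires a function with the same prompt-in, text-out shape.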

**What you'll measure:** response quality, reasoning depth, and token cost relative to the single-prompt baseline.

---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kunalsuri/prompt-engineering-playbook/blob/main/learn/labs/lab_06_agentic_plan_execute.ipynb)

## Setup — Install dependencies & set API key

**Recommended free providers:**
- [Google Gemini](https://aistudio.google.com/apikey) — 15 RPM, 1 M tokens/day ⭐
- [Groq](https://console.groq.com) — 30 RPM free tier

> **Cost note:** The plan-and-execute agent makes N+2 LLM calls per task (1 planner + N executor + 1 synthesizer), so a 3-step plan costs 5 calls.
> With a free-tier provider this is fine; with commercial APIs, budget roughly 10× the cost of a single-prompt call.
> Start with short tasks (2–3 step plans) on your first run.
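
The N+2 arithmetic in the cost note can be checked with a short helper. The function names below are hypothetical and exist only for this example. The sketch counts API calls only; actual token cost is usually higher than the call ratio suggests when later calls re-send earlier results as context.

```python
def agent_call_count(n_steps: int) -> int:
    """Planner (1) + one executor call per step (N) + synthesizer (1)."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return n_steps + 2


def call_multiplier(n_steps: int, baseline_calls: int = 1) -> float:
    """How many times more API calls the agent makes than the baseline."""
    return agent_call_count(n_steps) / baseline_calls


# Short plans recommended for a first run.
for n in (2, 3):
    print(f"{n}-step plan: {agent_call_count(n)} calls, "
          f"{call_multiplier(n):.0f}x the baseline call count")
```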

In [None]:
%pip install -q openai python-dotenv

import os
# os.environ['GOOGLE_API_KEY'] = 'your-key-here'
# os.environ['GROQ_API_KEY']   = 'your-key-here'
# os.environ['OPENAI_API_KEY'] = 'your-key-here'

key = os.getenv('GOOGLE_API_KEY') or os.getenv('GROQ_API_KEY') or os.getenv('OPENAI_API_KEY')
print(f'✓ API key found ({len(key)} chars)' if key else '⚠ No API key found. Set one above.')

In [None]:
import os
if not os.path.exists('lab_utils.py'):
    base = 'https://raw.githubusercontent.com/kunalsuri/prompt-engineering-playbook/main/learn/labs/'
    !wget -q {base}lab_utils.py
    !wget -q {base}lab_06_agentic_plan_execute.py
    !wget -q {base}requirements.txt
    %pip install -q -r requirements.txt
    print('Lab files downloaded.')
else:
    print('Lab files already present.')

## Run the Full Experiment

This cell runs three benchmark tasks through both the plan-and-execute agent and the single-prompt baseline, then prints a comparison table.

In [None]:
from lab_06_agentic_plan_execute import run_experiment
run_experiment()

## Step-by-Step Walkthrough

Run just one task to see the full agent trace: planner output, each executor step, and final synthesis.

In [None]:
from lab_06_agentic_plan_execute import run_agent, TASKS

task = TASKS[0]  # Change index to try different tasks
print(f'Task: {task}\n{"="*60}')
result = run_agent(task)
print('\n--- FINAL ANSWER ---')
print(result['answer'])
print(f'\nAPI calls made: {result["calls"]}')

## Try Your Own Task

The agent works best on tasks with 3–6 natural sub-steps (research questions, writing tasks, analysis).
A single open-ended question with no natural sub-steps is served equally well by the baseline, which needs far fewer calls.

In [None]:
from lab_06_agentic_plan_execute import run_agent, run_single_prompt_baseline

my_task = (
    "Compare transformer and LSTM architectures for NLP, "
    "explain their key trade-offs, and recommend which to use for "
    "a small-data text classification problem."
)

print('=== PLAN-AND-EXECUTE AGENT ===')
agent_result = run_agent(my_task)
print(agent_result['answer'])

print('\n=== SINGLE-PROMPT BASELINE ===')
baseline_result = run_single_prompt_baseline(my_task)
print(baseline_result['answer'])

print(f'\nAgent calls: {agent_result["calls"]}  |  Baseline calls: {baseline_result["calls"]}')

## Reflection Questions

1. On which tasks did the agent outperform the baseline? What property made those tasks suited to decomposition?
2. How many API calls does the agent make for a 4-step plan? What is the cost multiplier vs. the baseline?
3. The planner prompt returns numbered steps in natural language. What could go wrong with that parsing, and how would you harden it?
4. The synthesizer receives all step results in a single context window. What happens when a task requires many steps and the context overflows?
5. How would you add a **reflection step** where the agent checks its own plan before executing? (Hint: see Module 6 §6.3.)
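
As a starting point for question 3, here is one possible way to harden plan parsing. It is a sketch under stated assumptions, not the lab's implementation: `parse_plan`, its `max_steps` cap, and the accepted formats (a JSON array, or lines such as `1.`, `2)`, `Step 3:`) are all choices made for this example.

```python
import json
import re
from typing import List

# Matches "1. text", "2) text", "Step 3: text" (case-insensitive).
_STEP_LINE = re.compile(r"^\s*(?:step\s*)?(\d+)\s*[.):]\s*(.+?)\s*$", re.IGNORECASE)


def parse_plan(raw: str, max_steps: int = 8) -> List[str]:
    """Extract plan steps from planner output, tolerating a few formats."""
    steps: List[str] = []

    # First choice: the planner returned a JSON array of step strings.
    try:
        data = json.loads(raw)
        if isinstance(data, list):
            steps = [str(item).strip() for item in data if str(item).strip()]
    except json.JSONDecodeError:
        pass

    # Fallback: scan for numbered lines, skipping preamble and blank lines.
    if not steps:
        for line in raw.splitlines():
            match = _STEP_LINE.match(line)
            if match:
                steps.append(match.group(2))

    # Fail loudly rather than executing an empty plan.
    if not steps:
        raise ValueError("planner output contained no recognizable steps")

    # Cap the plan length so a runaway planner cannot trigger dozens of calls.
    return steps[:max_steps]


messy = "Here is the plan:\n1. Gather facts\nStep 2: Compare options\n3) Recommend one"
print(parse_plan(messy))
```

Raising on an empty plan and capping the step count are both defensive choices. A stricter variant could instead ask the planner to retry with an explicit format instruction.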

**See also:** [Module 6](https://github.com/kunalsuri/prompt-engineering-playbook/blob/main/learn/06-agentic-patterns.md) §6.2–§6.4 and the [failure-gallery](../failure-gallery/) for real-world breakdowns to watch for.