In [9]:
import json
from IPython.display import display, Markdown
import tiktoken
from cadence.agents.reasoning import ReasoningAgent

In [5]:
! python tools/collect_code.py --root src/cadence tests tools scripts --ext .py --out agent_context/code.json
! python tools/collect_code.py --root docs --ext .md .json .mermaid .py .txt --out agent_context/docs.json

Wrote 49 files → agent_context/code.json
Wrote 6 files → agent_context/docs.json


In [4]:
from cadence.context.provider import SnapshotContextProvider
snap = SnapshotContextProvider().get_context(
    *["src/cadence", "tests", "tools", "docs", "scripts"],
    exts=(".py", ".md", ".json", ".mermaid", ".txt"),
)
print("Bytes:", len(snap.encode()))
print("Contains locking.py?", "locking.py" in snap)


Bytes: 216272
Contains locking.py? True


In [3]:
from cadence.agents.reasoning import ReasoningAgent
ag = ReasoningAgent()
assert "src/cadence/dev/locking.py" in ag.messages[-1]["content"]


In [2]:
ag.messages[-1]['content']



In [4]:
ag.reset_context()
ag.messages[-1]['content']



In [6]:

with open("agent_context/code.json", "r") as f:
    codebase = json.load(f)
with open("agent_context/docs.json", "r") as f:
    docs = json.load(f)
with open("agent_context/module_contexts.json", "r") as f:
    contexts = json.load(f)
with open("backlog/backlog.json", "r") as f:
    backlog = json.load(f)

system_prompt = f"""
You are an AI-enhanced co-developer and co-designer along with your human counterpart.  Your role is to optimize the project by mining the best, most creative ideas from him and leveraging each of them into a high-quality, high-powered, autonomous, multi-agent development platform. You balance ambitious ideas with practical progress.
You have access to the docs, a JSON data structure that includes context on each module, the codebase itself, and a backlog of task to-dos.
=================
Project Docs:
{docs}
=================
Module Contexts (agent-related):
{contexts}
=================
Actual Code (agent-related):
{codebase}
=================
Backlog:
{backlog}
"""


enc = tiktoken.get_encoding("o200k_base")
tokens = enc.encode(system_prompt)
len(tokens)

30041

In [7]:
print(system_prompt)


You are an AI-enhanced co-developer and co-designer along with your human counterpart.  Your role is to optimize the project by mining the best, most creative ideas from him and leveraging each of them into a high-quality, high-powered, autonomous, multi-agent development platform. You balance ambitious ideas with practical progress.
You have access to the docs, a JSON data structure that includes context on each module, the codebase itself, and a backlog of task to-dos.
Project Docs:
{'docs/DEV_PROCESS.md': '\n# CADENCE DEVELOPMENT\xa0PROCESS\n\n*Last‑updated: 2025‑06‑20*\n\n## 1\xa0·\xa0Overview\n\nOne canonical document defines *what* must happen and *in what order*. All other docs reference this file to avoid drift.\n\n## 2\xa0·\xa0Core Workflow Phases\n\n| Phase        | Role\xa0(Class)     | Critical\xa0Interfaces                           | Fail\xa0Criterion                 |\n| ------------ | ---------------- | --------------------------------------------- | ------------------

In [8]:
agent = ReasoningAgent()
agent.reset_context(system_prompt)
agent.messages = agent.messages[:1]
agent.messages

def chat(message):
    response = agent.run_interaction(f"{message}")
    display(Markdown(response))

In [None]:
chat("""You are an AI-enhanced co-developer and co-designer along with your human counterpart.  Your role is to optimize the project by mining the best, most creative ideas from him and leveraging each of them into a high-quality, high-powered, autonomous, multi-agent development platform. You balance ambitious ideas with practical progress.
You have access to the docs, a JSON data structure that includes context on each module, the codebase itself, and a backlog of task to-dos.
=================
Project Docs:
{'docs/DEV_PROCESS.md': '\n# CADENCE DEVELOPMENT\xa0PROCESS\n\n*Last‑updated: 2025‑06‑20*\n\n## 1\xa0·\xa0Overview\n\nOne canonical document defines *what* must happen and *in what order*. All other docs reference this file to avoid drift.\n\n## 2\xa0·\xa0Core Workflow Phases\n\n| Phase        | Role\xa0(Class)     | Critical\xa0Interfaces                           | Fail\xa0Criterion                 |\n| ------------ | ---------------- | --------------------------------------------- | ------------------------------ |\n| **Backlog**  | `BacklogManager` | `list_items`, `add_item`, `archive_completed` | Empty backlog blocks pipeline. |\n| **Generate** | `TaskGenerator`  | `generate_tasks`, `overwrite_tasks`           | Ill‑formed tasks.              |\n| **Execute**  | `TaskExecutor`   | `build_patch`, `refine_patch`                 | Patch invalid or cannot apply. |\n| **Test**     | `ShellRunner`    | `run_pytest`, `git_apply`                     | Test suite fails.              |\n| **Review**   | `TaskReviewer`   | `review_patch`                                | Review rejects diff.           |\n| **Commit**   | `ShellRunner`    | `git_commit`                                  | Commit fails or skipped.       |\n| **Record**   | `TaskRecord`     | `save`, `append_iteration`                    | State not persisted.           |\n| **Meta**     | `MetaAgent`      | `analyse`, `alert`                            | Drift >\xa0policy threshold.      |\n\n*Sequence is strict; no phase may be skipped or merged.*\n\n## 3\xa0·\xa0Guard\xa0Rails\n\n* Tests **and** review must pass before commit.\n* Overrides require explicit rationale and are logged.\n* All artefacts (tasks, diffs, logs) are immutable once archived.\n\n## 4\xa0·\xa0Failure\xa0Criteria\n\n* Roles perform multiple responsibilities.\n* Orchestration happens outside `DevOrchestrator`.\n* Silent state transitions or missing logs.\n* Context injection exceeds model window constraints (see DEV\\_AGENTS).\n\n## 5\xa0·\xa0Reference Architecture Diagram\n\nSee `docs/architecture.mmd` for the system flow.\n\n---\n\n*Change‑log:* 2025‑06‑20\xa0— merged DEV\\_WORKFLOW & DEV\\_PROCESS; added strict phase table.\n', 'docs/DEV_AGENTS.md': '\n# DEV AGENTS — Model Assignment, Profiles & Context Rules  \n\n*Last-updated: 2025-06-20*\n\n## 1 · Why This File Exists  \nThis is the **single source of truth** for how Cadence maps logical roles to concrete LLM\nmodels, context windows, review policy and—new in this revision—`AgentProfile`\nobjects that keep those concerns *out of the Python class hierarchy*.\n\n## 2 · Key Concepts  \n\n| Term            | Description                                                                    |\n| --------------- | ------------------------------------------------------------------------------ |\n| **AgentProfile**| Immutable dataclass declaring model, context limit, review policy, etc.        |\n| **Core Agent**  | Final class that *implements* a capability (Reasoning / Execution / Efficiency)|\n| **Persona**     | Thin wrapper that *delegates* to a Core Agent but presents a different prompt. |\n| **Capability**  | A mix-in or helper that adds specific behaviour (e.g. `CodeContextCapability`).|\n\nSeparation of concerns:  \n\n```\n+------------------+     +--------------+     +----------+\n|  Persona (Sidekick)----> Core Agent ----->  AgentProfile\n+------------------+     +--------------+     +----------+\n                   delegates            references\n```\n\n## 3 · Profiles (Canonical)\n\n| Profile Name | Role Tag          | Model          | Context Limit | Review Path                                   |\n| ------------ | ----------------- | -------------- | ------------- | --------------------------------------------- |\n| `REASONING`  | `plan-review`     | `o3-2025-04-16`| 200 K         | Cannot commit code; must review Execution diff|\n| `EXECUTION`  | `implement`       | `gpt-4.1`      | 1 M           | Needs review by Reasoning or Efficiency       |\n| `EFFICIENCY` | `lint-summarise`  | `o4-mini`      | 200 K         | Reviews Execution unless diff is non-code     |\n\nAll profiles live in `cadence/agents/profile.py`.\n\n## 4 · Core Agents (final)\n\n| Class Name                | Uses Profile | Responsibilities                              |\n| ------------------------- | ------------ | --------------------------------------------- |\n| `ReasoningAgent`          | `REASONING`  | Planning, architecture review, policy checks  |\n| `ExecutionAgent`          | `EXECUTION`  | Code generation / refactor                    |\n| `EfficiencyAgent`         | `EFFICIENCY` | Linting, summarisation, static analysis       |\n\nThese classes are **final**—do not subclass for personas.\n\n## 5 · Personas\n\nA persona combines a *profile* + *prompt* + optional extra helpers by **delegating**\nto one of the Core Agents. Example: `Sidekick` (advisor persona) delegates to\n`ReasoningAgent` but overrides only the system prompt.\n\n## 6 · Context Injection Rules (unchanged)\n\n1. ExecutionAgent may receive full codebase when ≤ 1 M tokens; otherwise chunk.  \n2. Reasoning/Efficiency agents limited to ≤ 200 K tokens per call.  \n3. Module summaries (`# MODULE CONTEXT SUMMARY`) are mandatory for every file.  \n\n## 7 · Governance (unchanged)\n\n* All agent calls log: timestamp, model, prompt token count, hash of output.\n* CI step `lint_docs.py` verifies correct model names and context annotations.\n\n---\n\n*Change-log:*  \n2025-06-20 — Introduced AgentProfile pattern; Core Agents made final; personas use delegation.', 'docs/NORTH_STAR.md': '\n# CADENCE PLATFORM — NORTH\xa0STAR BLUEPRINT\n\n*Last‑updated: 2025‑06‑20 (UTC‑05:00)*\n\n## 1\xa0·\xa0Mission\n\n> **Industrialise high‑quality software delivery through an auditable, agent‑human workflow that enables continuous, self‑improving execution across diverse projects—at maximum reliability and minimal risk.**\n\n## 2\xa0·\xa0End‑State Vision\n\n| Axis                          | Description                                                                                 |\n| ----------------------------- | ------------------------------------------------------------------------------------------- |\n| **Unified Orchestrator**      | One controller owns coordination; roles are hot‑swappable (human ⇄ agent) without refactor. |\n| **Immutable Auditability**    | Tamper‑proof logs of every state‑transition and decision.                                   |\n| **Continuous Meta‑Learning**  | Meta‑agent detects bottlenecks and policy drift in real time.                               |\n| **Universal Applicability**   | Same pipeline covers ML, infra, analytics, etc.—no bespoke flows.                           |\n| **Transparent Collaboration** | All rationale, context, and hand‑offs observable by any stakeholder.                        |\n\n## 3\xa0·\xa0Objectives\xa0&\xa0Key\xa0Results (12‑month)\n\n| Objective                   | Key Results                                                   |\n| --------------------------- | ------------------------------------------------------------- |\n| **O1\xa0·\xa0Launch MVP**         | KR1\xa0— autonomous red→green run in *safe\\_inmemory* mode.      |\n| **O2\xa0·\xa0Scale Velocity**     | KR2\xa0— median task cycle ≤\u202f1\xa0day; ≥\u202f5 tasks/week/dev.          |\n| **O3\xa0·\xa0Assure Reliability** | KR3\xa0— 0 regressions\xa0post‑commit (tests gate merges).          |\n| **O4\xa0·\xa0Expand Autonomy**    | KR4\xa0— ≥\u202f3 workflow phases fully autonomous, overrides ≤\u202f10\u202f%. |\n| **O5\xa0·\xa0Meta‑optimise**      | KR5\xa0— monthly analytics on bottlenecks & rollback rate.       |\n\n## 4\xa0·\xa0Guiding Principles\n\n1. **Explicit Contracts** — single‑responsibility roles with strict I/O.\n2. **Audit by Default** — every action is logged, nothing silent.\n3. **Fail‑Fast Feedback** — surface errors immediately; automate retries where safe.\n4. **No Hidden\xa0State** — all state serialised and reconstructable.\n5. **Human‑First Overrides** — allowed, but always logged and reviewed.\n\n## 5\xa0·\xa0Glossary\n\n| Term           | Definition                                              |\n| -------------- | ------------------------------------------------------- |\n| **Task**       | Serializable JSON describing work, history, and status. |\n| **Patch**      | Unified diff representing proposed code change.         |\n| **Agent\xa0Slot** | Named interface that may be filled by human or agent.   |\n| **MetaAgent**  | Oversight component that analyses workflow telemetry.   |\n\n---\n\n*Change‑log:* 2025‑06‑20\xa0— consolidated vision; removed marketing prose.', 'docs/architecture.mermaid': '\nflowchart TD\nsubgraph Orchestrator\nORCH\\[DevOrchestrator]\nend\n\n```\nsubgraph Roles\n    BACK[BacklogManager]\n    GEN[TaskGenerator]\n    EXEC[TaskExecutor]\n    SHELL[ShellRunner]\n    REV[TaskReviewer]\n    RECORD[TaskRecord]\n    META[MetaAgent]\nend\n\nORCH --> BACK\nORCH --> GEN\nORCH --> EXEC\nORCH --> SHELL\nORCH --> REV\nORCH --> RECORD\nORCH --> META\n\nEXEC --> SHELL\nSHELL --> REV\nREV --> ORCH\n\nSHELL --> RECORD\nREV --> RECORD\nMETA --> ORCH\nMETA --> RECORD\n```\n\n%% End of architecture.mmd\n'}
=================
Module Contexts (agent-related):
{'tools/collect_code.py': {'purpose': 'Collects, filters, and serializes code and documentation from the Cadence codebase into a unified format (JSON or similar). Enables downstream agents, planners, or tools to access up-to-date, structured representations of the codebase for analysis, context injection, or system introspection. Central to maintaining an accurate global code snapshot for orchestration and LLM prompting.', 'public_api': ['tools.collect_code.collect', 'tools.collect_code.main', 'tools.collect_code.parse_args'], 'depends_on': ['argparse', 'json', 'pathlib', 'sys'], 'used_by': [], 'direct_imports': ['__future__', 'argparse', 'json', 'pathlib', 'sys'], 'related_schemas': ['CodePayload: {"<relative/path>": "<utf-8 file contents>"}'], 'context_window_expected': 'Very large – output may exceed 100k tokens for full repo snapshots.', 'escalation_review': 'None – utility script; no sensitive logic. Monitor memory usage on enormous repos.', 'filepath': 'tools/collect_code.py'}, 'tools/gen_prompt.py': {'purpose': 'Generates rich, contextually appropriate prompts for LLM or agent tasks by assembling relevant code, documentation, and context. It supports selective inclusion/exclusion of source files, size-based filtering, and automatic Markdown / code fencing so that downstream agents receive a single, copy-pastable prompt block. A small CLI wrapper makes it easy to invoke from shell scripts, CI pipelines, or higher-level orchestrators. Typical use-cases include: (1) creating a snapshot for an expert-review agent, (2) feeding an execution agent the entire codebase plus an explicit NEXT TASK header, or (3) archiving project state for later replay or fine-tuning.', 'public_api': ['tools.gen_prompt._collect_files', 'tools.gen_prompt._build_prompt', 'tools.gen_prompt._parse_args', 'tools.gen_prompt.main'], 'depends_on': [], 'used_by': [], 'direct_imports': ['__future__', 'argparse', 'pathlib', 'sys', 'textwrap'], 'related_schemas': [], 'context_window_expected': 'Potentially very large (docs + full source snapshot); downstream consumers should be prepared for 50-100k+ tokens.', 'escalation_review': '', 'filepath': 'tools/gen_prompt.py'}, 'tools/module_contexts.py': {'purpose': 'Extracts, maintains, and synchronizes structured metadata (“context”) for every module in the codebase, including public API, dependencies, and purpose. Provides the canonical source for module-level documentation and graph relationships, facilitating dependency analysis, agent task planning, and robust context slicing. Ensures context is current and consistent across JSON and other representations.', 'public_api': ['tools.module_contexts.extract_and_strip_shebang_and_futures', 'tools.module_contexts.find_existing_context', 'tools.module_contexts.get_module_import_path', 'tools.module_contexts.load_all_contexts', 'tools.module_contexts.parse_module', 'tools.module_contexts.print_context', 'tools.module_contexts.relpath', 'tools.module_contexts.render_context_block', 'tools.module_contexts.render_pretty_list', 'tools.module_contexts.scan_all_internal_modules', 'tools.module_contexts.scan_python_modules', 'tools.module_contexts.strip_duplicate_headers_at_top', 'tools.module_contexts.sync_contexts', 'tools.module_contexts.write_all_contexts'], 'depends_on': [], 'used_by': ['tools.collect_code', 'tools.gen_prompt'], 'direct_imports': ['ast', 'json', 'os', 're', 'sys'], 'related_schemas': ['module_contexts.json'], 'context_window_expected': 'Medium – function-level inspection to full-file parsing (typically <2k LOC).', 'escalation_review': 'None', 'filepath': 'tools/module_contexts.py'}, 'cadence/__init__.py': {'purpose': 'Declares the Cadence root package for Python module discovery. Contains no implementation but is necessary for package structure and internal import resolution.', 'public_api': [], 'depends_on': [], 'used_by': ['cadence/dev/command_center.py', 'cadence/agents/base.py', 'examples/test_001/mvp_loop.py'], 'direct_imports': [], 'related_schemas': [], 'context_window_expected': 'Tiny (<5 lines); only package declaration.', 'escalation_review': 'None – purely structural, no executable logic.', 'filepath': 'cadence/__init__.py'}, 'cadence/dev/record.py': {'purpose': 'Implements persistent, immutable task and state recording for all development workflow phases. Tracks the complete lifecycle of each task (creation, assignment, iterations, review outcomes, and archival), ensuring auditability and full historical replay. Guarantees that every action, override, or failure is logged for compliance and analytics.', 'public_api': ['cadence.dev.record.TaskRecord', 'cadence.dev.record.TaskRecordError'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator'], 'direct_imports': ['copy', 'datetime', 'json', 'os', 'threading', 'typing'], 'related_schemas': [], 'context_window_expected': '≈120 source lines; safe to provide in full within a 4-8 k token prompt.', 'escalation_review': 'Modifying persistence logic or on-disk format requires compliance/audit review to avoid breaking historical replay guarantees.', 'filepath': 'cadence/dev/record.py'}, 'cadence/dev/shell.py': {'purpose': 'Provides a safe abstraction for executing shell commands, running local tests, and managing file operations within the orchestrated workflow. Encapsulates error handling and isolation of subprocesses, allowing higher-level roles (e.g., test runners, commit hooks) to interact with the environment in a controlled and auditable manner.', 'public_api': ['cadence.dev.shell.ShellCommandError', 'cadence.dev.shell.ShellRunner'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator', 'examples.test_001.mvp_loop'], 'direct_imports': ['os', 'subprocess', 'tempfile', 'typing'], 'related_schemas': [], 'context_window_expected': '≈250 tokens (short, self-contained utility module)', 'escalation_review': '', 'filepath': 'cadence/dev/shell.py'}, 'cadence/dev/executor.py': {'purpose': 'Core implementation of the “TaskExecutor” role: receives a precise task description and produces a proposed code change as a unified diff (patch). Encapsulates patch construction, validation, and integration, acting as the “implementation” phase of the agentic loop. Ensures no code is modified directly—only via auditable diffs.', 'public_api': ['cadence.dev.executor.PatchBuildError', 'cadence.dev.executor.TaskExecutor'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator', 'examples.test_001.mvp_loop'], 'direct_imports': ['difflib', 'os', 'tempfile', 'typing'], 'related_schemas': ['task.json (must contain `diff` with `file`, `before`, `after`)', 'unified_diff (GNU patch-compatible format)'], 'context_window_expected': '~150 lines (moderate)', 'escalation_review': 'Required for any change that alters patch formatting logic or validation heuristics, because downstream components (ShellRunner, Reviewer, Orchestrator) rely on strict diff semantics.', 'filepath': 'cadence/dev/executor.py'}, 'cadence/dev/command_center.py': {'purpose': 'Aggregates interactive interfaces (CLI, UI, dashboards) for monitoring, guiding, or directly managing the development loop. Orchestrates real-time views of agent activity, backlog, and state, acting as a bridge between human oversight and autonomous agent execution.', 'public_api': ['cadence.dev.command_center.CONFIG', 'cadence.dev.command_center.orch'], 'depends_on': ['cadence.dev.orchestrator'], 'used_by': [], 'direct_imports': ['streamlit', 'cadence.dev.orchestrator', 'pandas'], 'related_schemas': [], 'context_window_expected': 'small', 'escalation_review': '', 'filepath': 'cadence/dev/command_center.py'}, 'cadence/dev/orchestrator.py': {'purpose': 'Implements the top-level DevOrchestrator – the single source of truth for sequencing every phase of the Cadence development loop (backlog → generation → patch execution → review → test → commit → archival). Exposes both programmatic and CLI entry-points so humans or agents can drive the end-to-end workflow. Persists all intermediate artefacts (patches, reviews, test results, git SHAs) through TaskRecord for full auditability.', 'public_api': ['cadence.dev.orchestrator.DevOrchestrator'], 'depends_on': ['cadence.dev.backlog.BacklogManager', 'cadence.dev.generator.TaskGenerator', 'cadence.dev.executor.TaskExecutor', 'cadence.dev.executor.PatchBuildError', 'cadence.dev.reviewer.TaskReviewer', 'cadence.dev.shell.ShellRunner', 'cadence.dev.shell.ShellCommandError', 'cadence.dev.record.TaskRecord', 'cadence.dev.record.TaskRecordError'], 'used_by': ['cadence.dev.command_center'], 'direct_imports': ['argparse', 'sys', 'tabulate', '.backlog', '.generator', '.executor', '.reviewer', '.shell', '.record'], 'related_schemas': ['backlog_file_schema', 'task_record_schema', 'config_schema'], 'context_window_expected': '≈180-220 LOC (medium)', 'escalation_review': 'None – orchestrator logic is straightforward, but security/guardrail audit recommended before production use.', 'filepath': 'cadence/dev/orchestrator.py'}, 'cadence/dev/reviewer.py': {'purpose': 'Implements the “TaskReviewer” role: given a code diff, performs automated or agent-based review for correctness, completeness, style, and policy adherence. Issues pass/fail outcomes and review rationale, blocking or allowing downstream commit/merge actions. Central for maintaining code quality and enforcing multi-agent approval. Provides a pluggable ruleset architecture (built-ins + external JSON) so future LLM/human or policy engines can extend the review surface without touching core logic.', 'public_api': ['cadence.dev.reviewer.PatchReviewError', 'cadence.dev.reviewer.TaskReviewer'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator'], 'direct_imports': ['json', 'os', 'typing'], 'related_schemas': ['External ruleset JSON (list[dict{type:str, pattern:str, message?:str}])'], 'context_window_expected': 'Small (<250 LOC) ‑ entire file plus ruleset easily fits typical LLM context.', 'escalation_review': 'Acts as a quality/security gatekeeper; modifications require senior review.', 'filepath': 'cadence/dev/reviewer.py'}, 'cadence/dev/__init__.py': {'purpose': 'Declares the Cadence dev subpackage for structured module resolution. Contains no runtime implementation but is required so that `import cadence.dev.*` works correctly across the code-base. Although empty, it implicitly enables every module inside `cadence/dev/` (e.g., BacklogManager, TaskExecutor, DevOrchestrator) to be imported as part of the Cadence package hierarchy.', 'public_api': [], 'depends_on': [], 'used_by': ['cadence.dev.backlog', 'cadence.dev.command_center', 'cadence.dev.executor', 'cadence.dev.generator', 'cadence.dev.orchestrator', 'cadence.dev.record', 'cadence.dev.reviewer', 'cadence.dev.shell', 'examples.test_001.mvp_loop'], 'direct_imports': [], 'related_schemas': [], 'context_window_expected': '≤10 lines (file is intentionally empty).', 'escalation_review': 'None—trivial placeholder for package initialization.', 'filepath': 'cadence/dev/__init__.py'}, 'cadence/dev/generator.py': {'purpose': 'Implements the “TaskGenerator” role: proposes, structures, and validates new micro-tasks (from LLMs, rules, or human suggestions). Guarantees that every task entering the backlog is well-formed, actionable, and fully specified, maximizing clarity and minimizing ambiguity for execution agents. Also supports template-driven generation and CLI-style ad-hoc creation for demos or manual workflows.', 'public_api': ['cadence.dev.generator.TaskGenerator', 'cadence.dev.generator.TaskGenerator.__init__', 'cadence.dev.generator.TaskGenerator.generate_tasks', 'cadence.dev.generator.TaskGenerator.overwrite_tasks', 'cadence.dev.generator.TaskTemplateError', 'cadence.dev.generator.REQUIRED_FIELDS'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator'], 'direct_imports': ['datetime', 'json', 'os', 'typing', 'uuid'], 'related_schemas': ['cadence.dev.backlog.Task'], 'context_window_expected': '≈200-250 source lines', 'escalation_review': '', 'filepath': 'cadence/dev/generator.py'}, 'cadence/dev/backlog.py': {'purpose': 'Implements the “BacklogManager” role: maintains the prioritized queue of micro-tasks, stories, and epics for development. Provides CRUD operations, status transitions, archival, and JSON persistence. Guarantees that every task object respects a minimal schema and that all state-changes are safely written to disk for downstream agents, UIs, and analytics. Serves as the single source of truth for work-in-progress within the Cadence development loop.', 'public_api': ['cadence.dev.backlog.BacklogEmptyError', 'cadence.dev.backlog.BacklogManager', 'cadence.dev.backlog.TaskNotFoundError', 'cadence.dev.backlog.TaskStructureError'], 'depends_on': [], 'used_by': ['cadence.dev.orchestrator'], 'direct_imports': ['copy', 'datetime', 'json', 'os', 'tabulate', 'typing', 'uuid'], 'related_schemas': [{'BacklogTask': {'id': 'str (UUID4, auto-generated if absent)', 'title': 'str (human-readable summary)', 'type': 'str (one of "micro", "story", "epic"; default "micro")', 'status': 'str (one of "open", "in_progress", "done", "archived"; default "open")', 'created_at': 'str (UTC ISO-8601; auto-generated if absent)', '...': 'additional arbitrary fields permitted'}}], 'context_window_expected': '≈120 LOC – comfortably fits within standard 8-16 k token LLM windows.', 'escalation_review': 'None – module is self-contained and side-effect-limited (file IO only).', 'filepath': 'cadence/dev/backlog.py'}, 'cadence/agents/base.py': {'purpose': 'Core superclass that now consumes an AgentProfile instead of raw model strings. Handles messaging stack, LLM calls, context gathering, and persistence. No subclass should hard-code models any more.', 'public_api': ['cadence.agents.base.BaseAgent'], 'depends_on': ['cadence.agents.profile', 'cadence.llm.client', 'cadence.context.provider'], 'used_by': ['cadence.agents.reasoning', 'cadence.agents.execution', 'cadence.agents.efficiency'], 'direct_imports': ['typing', 'pathlib', 'cadence.llm.client', 'cadence.context.provider', 'cadence.agents.profile'], 'context_window_expected': '≈100 LOC', 'filepath': 'cadence/agents/base.py'}, 'cadence/agents/efficiency.py': {'purpose': 'Final EfficiencyAgent using EFFICIENCY_PROFILE; low-cost linting & summarisation.', 'public_api': ['cadence.agents.efficiency.EfficiencyAgent'], 'depends_on': ['cadence.agents.base', 'cadence.agents.profile'], 'filepath': 'cadence/agents/efficiency.py'}, 'cadence/agents/reasoning.py': {'purpose': 'Final ReasoningAgent using REASONING_PROFILE; injects code/doc snapshot on reset.', 'public_api': ['cadence.agents.reasoning.ReasoningAgent'], 'depends_on': ['cadence.agents.base', 'cadence.agents.profile'], 'direct_imports': ['cadence.agents.base', 'cadence.agents.profile'], 'filepath': 'cadence/agents/reasoning.py'}, 'cadence/agents/__init__.py': {'purpose': 'Declares the Cadence agents subpackage for structured module resolution. Contains no implementation but is required so that all sub-modules (base, execution, efficiency, reasoning, registry) can be imported with the fully-qualified "cadence.agents.*" path.', 'public_api': [], 'depends_on': [], 'used_by': ['cadence.agents.base', 'cadence.agents.efficiency', 'cadence.agents.execution', 'cadence.agents.reasoning', 'cadence.agents.registry'], 'direct_imports': [], 'related_schemas': [], 'context_window_expected': 'negligible (file is empty)', 'escalation_review': 'none—simple package marker', 'filepath': 'cadence/agents/__init__.py'}, 'cadence/agents/registry.py': {'purpose': 'Central registry now exposes both Core Agents and AgentProfiles for external callers.', 'public_api': ['cadence.agents.registry.get_agent', 'cadence.agents.registry.get_profile'], 'depends_on': ['cadence.agents.reasoning', 'cadence.agents.execution', 'cadence.agents.efficiency', 'cadence.agents.profile'], 'filepath': 'cadence/agents/registry.py'}, 'cadence/agents/execution.py': {'purpose': 'Final ExecutionAgent using EXECUTION_PROFILE; dedicated to code generation/refactor tasks.', 'public_api': ['cadence.agents.execution.ExecutionAgent'], 'depends_on': ['cadence.agents.base', 'cadence.agents.profile'], 'filepath': 'cadence/agents/execution.py'}, 'cadence/llm/client.py': {'purpose': 'Abstracts and manages all LLM client connections, credentials, and request handling for Cadence agents. Provides unified access to various model endpoints, context window management, and API configuration, allowing agents to query or delegate to LLMs without bespoke integration code.', 'public_api': ['cadence.llm.client.LLMClient', 'cadence.llm.client.get_default_client', 'cadence.llm.client.get_env'], 'depends_on': [], 'used_by': ['cadence.agents.base'], 'direct_imports': ['asyncio', 'dotenv', 'logging', 'openai', 'os', 'typing'], 'related_schemas': [], 'context_window_expected': 'Dependent on selected OpenAI model (defaults to ~8K tokens for GPT-4-class models). Callers should chunk/summarize if larger context is required.', 'escalation_review': '', 'filepath': 'cadence/llm/client.py'}, 'cadence/llm/__init__.py': {'purpose': 'Declares the Cadence LLM subpackage for structured module resolution. Contains no implementation but ensures correct imports for LLM client components. Serves as the namespace anchor for any sub-modules under `cadence.llm` so that external code can reliably import `cadence.llm.*` symbols.', 'public_api': [], 'depends_on': [], 'used_by': ['cadence.llm.client', 'cadence.agents.base'], 'direct_imports': [], 'related_schemas': [], 'context_window_expected': '≤5 LOC (trivial stub)', 'escalation_review': 'None – inert stub file', 'filepath': 'cadence/llm/__init__.py'}, 'cadence/agents/profile.py': {'purpose': 'Introduces immutable AgentProfile dataclass holding model, context-limit, review policy, etc. Decouples runtime behaviour from static configuration and is the canonical source for profile data used by Core Agents and governance utilities.', 'public_api': ['cadence.agents.profile.AgentProfile', 'cadence.agents.profile.REASONING_PROFILE', 'cadence.agents.profile.EXECUTION_PROFILE', 'cadence.agents.profile.EFFICIENCY_PROFILE', 'cadence.agents.profile.BUILTIN_PROFILES'], 'depends_on': [], 'used_by': ['cadence.agents.base', 'cadence.agents.reasoning', 'cadence.agents.execution', 'cadence.agents.efficiency', 'cadence.agents.sidekick', 'cadence.agents.registry'], 'direct_imports': ['dataclasses', 'typing'], 'context_window_expected': 'trivial (<40 LOC)', 'filepath': 'cadence/agents/profile.py'}, 'cadence/agents/sidekick.py': {'purpose': 'Persona wrapper that delegates to ReasoningAgent rather than subclassing it, maintaining separation between capability and persona.', 'public_api': ['cadence.agents.sidekick.Sidekick'], 'depends_on': ['cadence.agents.reasoning', 'cadence.agents.profile'], 'filepath': 'cadence/agents/sidekick.py'}}
=================
Actual Code (agent-related):
\{'src/cadence/__init__.py': '\n', 'src/cadence/context/provider.py': '# src/cadence/context/provider.py\nimport subprocess, sys, json\nfrom abc import ABC, abstractmethod\nfrom pathlib import Path\nclass ContextProvider(ABC):\n    @abstractmethod\n    def get_context(self, *roots: Path, exts=(".py", ".md"), max_bytes=50_000) -> str: ...\nclass SnapshotContextProvider(ContextProvider):\n    def get_context(self, *roots, exts=(".py", ".md"), max_bytes=50_000, out="-") -> str:\n        args = [sys.executable, "tools/collect_code.py"]\n        for r in roots: args += ["--root", str(r)]\n        for e in exts:  args += ["--ext", e]\n        args += ["--max-bytes", str(max_bytes), "--out", out]\n        return subprocess.run(args, capture_output=True, text=True, check=True).stdout\n', 'src/cadence/context/__init__.py': '', 'src/cadence/dev/record.py': '\n# src/cadence/dev/record.py\n\n"""\nCadence TaskRecord\n-----------------\nSingle Responsibility: Append/persist task processtates for full audit/repro. \nWrite/read only here. \nFile format: JSON list of dicts, one per unique task_id; each has history of states/iterations.\n"""\n\nimport os\nimport json\nimport threading\nimport copy\nfrom typing import List, Dict, Optional\nfrom datetime import datetime, UTC\n\nclass TaskRecordError(Exception):\n    """Custom error for task record issues."""\n    pass\n\nclass TaskRecord:\n    def __init__(self, record_file: str):\n        self.record_file = record_file\n        self._lock = threading.Lock()\n        # Always keep in-memory up to date with file\n        self._records: List[Dict] = []\n        self._idmap: Dict[str, Dict] = {}  # task_id -> record dict\n        self._load()\n\n    def save(self, task: dict, state: str, extra: dict = None) -> None:\n        """\n        Records a snapshot for given task_id and state (e.g. "patch_proposed", "patch_reviewed", etc).\n        Extra provides arbitrary info (patch, review, commit_sha, test_results, etc).\n        If task does not exist (task_id is new), creates new record.\n        """\n        with self._lock:\n            record = self._find_or_create_record(task)\n            snapshot = {\n                "state": state,\n                "timestamp": self._now(),\n                "task": copy.deepcopy(task),\n                "extra": copy.deepcopy(extra) if extra else {},\n            }\n            record["history"].append(snapshot)\n            self._sync_idmap()\n            self._persist()\n\n    def load(self) -> List[Dict]:\n        """\n        Returns a (deep) copy of all records (full history).\n        """\n        with self._lock:\n            return copy.deepcopy(self._records)\n\n    def append_iteration(self, task_id: str, iteration: dict) -> None:\n        """\n        Appends a new step/edit/review (dict) to a task\'s record—usually finer-grained than save().\n        """\n        with self._lock:\n            record = self._find_record(task_id)\n            if record is None:\n                raise TaskRecordError(f"No record for task id={task_id}")\n            iter_snapshot = {\n                "timestamp": self._now(),\n                **copy.deepcopy(iteration)\n            }\n            record.setdefault("iterations", []).append(iter_snapshot)\n            self._persist()\n\n    # ========== Internal Below ==========\n\n    def _find_or_create_record(self, task: dict) -> Dict:\n        """\n        Finds or creates a new record for given task.\n        """\n        tid = self._get_task_id(task)\n        rec = self._idmap.get(tid)\n        if rec is None:\n            rec = {\n                "task_id": tid,\n                "created_at": self._now(),\n                "history": [],\n                "iterations": []\n            }\n            self._records.append(rec)\n            self._idmap[tid] = rec\n        return rec\n\n    def _find_record(self, task_id: str) -> Optional[Dict]:\n        return self._idmap.get(task_id)\n\n    def _get_task_id(self, task: dict) -> str:\n        tid = task.get("id")\n        if not tid:\n            raise TaskRecordError("Task dict missing \'id\'. Cannot save record.")\n        return tid\n\n    def _persist(self) -> None:\n        """\n        Writes in-memory records to disk, atomic/overwrite (JSON).\n        """\n        tmp = self.record_file + ".tmp"\n        with open(tmp, "w", encoding="utf8") as f:\n            json.dump(self._records, f, indent=2)\n        os.replace(tmp, self.record_file)\n\n    def _load(self) -> None:\n        """Loads file into memory iff exists. Otherwise, empty record."""\n        if not os.path.exists(self.record_file):\n            self._records = []\n            self._idmap = {}\n            return\n        with open(self.record_file, "r", encoding="utf8") as f:\n            self._records = json.load(f)\n        self._sync_idmap()\n\n    def _sync_idmap(self):\n        """Ensures self._idmap is up to date with self._records."""\n        self._idmap = {rec["task_id"]: rec for rec in self._records}\n\n    def _now(self):\n        return datetime.now(UTC).isoformat()\n\n# Example CLI/sanity use (not for prod)\nif __name__ == "__main__":\n    rec = TaskRecord("dev_record.json")\n    tid = "a1b2c3"\n    # Save new record\n    task = {"id": tid, "title": "Do something", "status": "open"}\n    rec.save(task, state="patch_proposed", extra={"patch": "--- foo"})\n    # Append an iteration (e.g., reviewer comment)\n    rec.append_iteration(tid, {"reviewer": "alice", "opinion": "looks good"})\n    # Print record for tid\n    print(json.dumps(rec.load(), indent=2))', 'src/cadence/dev/shell.py': '# src/cadence/dev/shell.py\n"""\nCadence ShellRunner\n-------------------\nSingle Responsibility\n    • Provide *safe*, auditable wrappers around git / pytest / shell\n      commands.  \n    • **NEW** – Support reverse-applying a patch (rollback) via the same\n      entry-point by passing `reverse=True`.\n\nNever creates code or diffs; only executes them.\n"""\n\nfrom __future__ import annotations\n\nimport os\nimport subprocess\nimport tempfile\nfrom typing import Optional, Dict\n\n\nclass ShellCommandError(Exception):\n    """Raised when a shell/git/pytest command fails."""\n    pass\n\n\nclass ShellRunner:\n    def __init__(self, repo_dir: str = "."):\n        self.repo_dir = os.path.abspath(repo_dir)\n        if not os.path.isdir(self.repo_dir):\n            raise ValueError(f"repo_dir \'{self.repo_dir}\' does not exist or is not a directory.")\n\n    # ------------------------------------------------------------------ #\n    # Git patch helpers\n    # ------------------------------------------------------------------ #\n    def git_apply(self, patch: str, *, reverse: bool = False) -> bool:\n        """\n        Apply a unified diff to the working tree.\n\n        Args:\n            patch:   Unified diff string (UTF-8).\n            reverse: If True, apply the patch in *reverse* (equivalent to\n                     `git apply -R`) – used for automatic rollback.\n\n        Returns:\n            True on success.\n\n        Raises:\n            ShellCommandError on any git failure or invalid patch input.\n        """\n        if not patch or not isinstance(patch, str):\n            raise ShellCommandError("No patch supplied to apply.")\n\n        # Write patch to a temporary file so git can consume it.\n        with tempfile.NamedTemporaryFile(mode="w+", suffix=".patch", delete=False) as tf:\n            tf.write(patch)\n            tf.flush()\n            tf_path = tf.name\n\n        try:\n            cmd = ["git", "apply"]\n            if reverse:\n                cmd.append("-R")\n            cmd.append(tf_path)\n\n            result = subprocess.run(\n                cmd,\n                cwd=self.repo_dir,\n                stdout=subprocess.PIPE,\n                stderr=subprocess.PIPE,\n                encoding="utf-8",\n                check=False,\n            )\n            if result.returncode != 0:\n                direction = "reverse " if reverse else ""\n                raise ShellCommandError(\n                    f"git {direction}apply failed: {result.stderr.strip() or result.stdout.strip()}"\n                )\n            return True\n        finally:\n            os.remove(tf_path)\n\n    # ------------------------------------------------------------------ #\n    # Testing helpers\n    # ------------------------------------------------------------------ #\n    def run_pytest(self, test_path: Optional[str] = None) -> Dict:\n        """\n        Run pytest on the given path (default: ./tests).\n\n        Returns:\n            {\'success\': bool, \'output\': str}\n        """\n        path = test_path or os.path.join(self.repo_dir, "tests")\n        if not os.path.exists(path):\n            raise ShellCommandError(f"Tests path \'{path}\' does not exist.")\n\n        try:\n            result = subprocess.run(\n                ["pytest", "-q", path],\n                cwd=self.repo_dir,\n                stdout=subprocess.PIPE,\n                stderr=subprocess.PIPE,\n                encoding="utf-8",\n                check=False,\n            )\n            passed = result.returncode == 0\n            output = (result.stdout or "") + "\\n" + (result.stderr or "")\n            return {"success": passed, "output": output.strip()}\n        except FileNotFoundError as e:\n            raise ShellCommandError("pytest is not installed or not in PATH.") from e\n\n    # ------------------------------------------------------------------ #\n    # Commit helper\n    # ------------------------------------------------------------------ #\n    def git_commit(self, message: str) -> str:\n        """\n        Commit **all** staged/changed files with the given commit message.\n\n        Returns:\n            The new commit SHA string.\n\n        Raises:\n            ShellCommandError on failure (e.g., nothing to commit).\n        """\n        # Stage all changes (MVP behaviour)\n        result = subprocess.run(\n            ["git", "add", "-A"],\n            cwd=self.repo_dir,\n            stdout=subprocess.PIPE,\n            stderr=subprocess.PIPE,\n            encoding="utf-8",\n            check=False,\n        )\n        if result.returncode != 0:\n            raise ShellCommandError(f"git add failed: {result.stderr.strip()}")\n\n        # Commit\n        result = subprocess.run(\n            ["git", "commit", "-m", message],\n            cwd=self.repo_dir,\n            stdout=subprocess.PIPE,\n            stderr=subprocess.PIPE,\n            encoding="utf-8",\n            check=False,\n        )\n        if result.returncode != 0:\n            if "nothing to commit" in (result.stderr + result.stdout).lower():\n                raise ShellCommandError("git commit: nothing to commit.")\n            raise ShellCommandError(f"git commit failed: {result.stderr.strip()}")\n\n        # Retrieve last commit SHA\n        result = subprocess.run(\n            ["git", "rev-parse", "HEAD"],\n            cwd=self.repo_dir,\n            stdout=subprocess.PIPE,\n            stderr=subprocess.PIPE,\n            encoding="utf-8",\n            check=True,\n        )\n        return result.stdout.strip()\n\n\n# --------------------------------------------------------------------------- #\n# Dev-only sanity CLI\n# --------------------------------------------------------------------------- #\nif __name__ == "__main__":\n    runner = ShellRunner(".")\n    # Example usage:\n    # runner.git_apply(patch_string)\n    # runner.git_apply(patch_string, reverse=True)  # rollback', 'src/cadence/dev/executor.py': '\n# src/cadence/dev/executor.py\n\n"""\nCadence TaskExecutor\n-------------------\nSingle Responsibility: Given a task, produce a code/text patch (unapplied). Never applies, commits, or tests.\nExtensible: can be subclassed or composed with LLM/crowd agents for codegen/refinement.\n"""\n\nimport os\nimport difflib\nimport tempfile\nfrom typing import Dict, Optional, List\n\nclass PatchBuildError(Exception):\n    """Raised if patch/diff cannot be produced."""\n    pass\n\n\nclass TaskExecutor:\n    def __init__(self, src_root: str):\n        if not os.path.isdir(src_root):\n            raise ValueError(f"src_root \'{src_root}\' is not a directory.")\n        self.src_root = os.path.abspath(src_root)\n\n    def build_patch(self, task: Dict) -> str:\n        """\n        Given selected task (dict), produce diff/patch string.\n        - For simplicity, expects \'file\', \'before\', \'after\' in task[\'diff\'].\n        - Never applies patch.\n        - Returns unified diff as UTF-8 str.\n        """\n        try:\n            diff_info = task.get(\'diff\')\n            if not diff_info:\n                raise PatchBuildError("Task missing \'diff\' key. Task must include code diff directives.")\n\n            file_rel = diff_info.get(\'file\')\n            before = diff_info.get(\'before\')\n            after = diff_info.get(\'after\')\n            if not file_rel or before is None or after is None:\n                raise PatchBuildError("Diff dict must have \'file\', \'before\', and \'after\' (as strings).")\n\n            file_abs = os.path.join(self.src_root, file_rel)\n            # Ensure trailing newline for correct diff context\n            if not before.endswith("\\n"):\n                before += "\\n"\n            if not after.endswith("\\n"):\n                after += "\\n"\n\n            before_lines = before.splitlines(keepends=True)\n            after_lines  = after.splitlines(keepends=True)\n\n            diff_lines = list(difflib.unified_diff(\n                before_lines,\n                after_lines,\n                fromfile=f"a/{file_rel}",\n                tofile=f"b/{file_rel}",\n                # use default lineterm=\'\\n\' for consistent newlines\n            ))\n            patch = "".join(diff_lines)\n            if not patch.strip():\n                raise PatchBuildError("Generated patch is empty.")\n\n            # Logically, do NOT write/apply - that\'s ShellRunner\'s responsibility.\n            return patch\n        except Exception as e:\n            raise PatchBuildError(f"Failed to build patch: {e}")\n\n    def refine_patch(self, task: Dict, feedback: str) -> str:\n        """\n        Propose a revised patch, given task and feedback (from reviewer/human).\n        Here, we\'re stubbed for simplicity - can be extended to call LLM/code agent.\n        - Returns new diff/patch string.\n        """\n        # In a future agentic system, call out to LLM or microservice here with context.\n        # Example hook: (pseudo) agent.generate_patch(task, feedback)\n        # For now, just raise if not implemented.\n        raise NotImplementedError("Patch refinement requires agent integration or human intervention.")\n\n    # Optionally: you can add utility for validating a patch (not apply!).\n    def validate_patch(self, patch: str) -> bool:\n        """\n        Returns True if patch is nontrivial and properly formatted.\n        (Simple heuristic only; actual application/testing is ShellRunner\'s job.)\n        """\n        return bool(patch and patch.startswith(\'---\'))\n\n# Example CLI/dev usage\nif __name__ == "__main__":\n    # Example simulated task:\n    executor = TaskExecutor(src_root="cadence")\n    sample_task = {\n        "id": "testid",\n        "diff": {\n            "file": "sample_module.py",\n            "before": "# Old code\\nprint(\'Hello\')\\n",\n            "after":  "# Old code\\nprint(\'Hello, world!\')\\n"\n        }\n    }\n    patch = executor.build_patch(sample_task)\n    print("--- PATCH OUTPUT ---")\n    print(patch)', 'src/cadence/dev/command_center.py': '\n# src/cadence/dev/command_center.py\n\nimport streamlit as st\n\n# You may need to adjust the import path according to your setup\nfrom src.cadence.dev.orchestrator import DevOrchestrator\n\n# ---- Basic Config (map to your dev environment) ----\nCONFIG = dict(\n    backlog_path="dev_backlog.json",\n    template_file="dev_templates.json",\n    src_root="cadence",\n    ruleset_file=None,\n    repo_dir=".",\n    record_file="dev_record.json"\n)\norch = DevOrchestrator(CONFIG)\n\n# ---- Session State Initialization ----\nif "selected_task_id" not in st.session_state:\n    st.session_state["selected_task_id"] = None\nif "phase" not in st.session_state:\n    st.session_state["phase"] = "Backlog"\n\n# ---- Sidebar: Phase Navigation ----\nst.sidebar.title("Cadence Dev Center")\nphase = st.sidebar.radio(\n    "Workflow phase",\n    ["Backlog", "Task Detail", "Patch Review", "Run Test", "Archive"],\n    index=["Backlog", "Task Detail", "Patch Review", "Run Test", "Archive"].index(st.session_state["phase"])\n)\nst.session_state["phase"] = phase\n\n# ---- Main: Backlog View ----\nif phase == "Backlog":\n    st.title("Task Backlog")\n    open_tasks = orch.backlog.list_items(status="open")\n    if not open_tasks:\n        st.info("No open tasks! Add tasks via CLI/Notebook.")\n    else:\n        import pandas as pd\n        df = pd.DataFrame(open_tasks)\n        st.dataframe(df[["id", "title", "type", "status", "created_at"]])\n        selected = st.selectbox(\n            "Select a task to work on",\n            options=[t["id"] for t in open_tasks],\n            format_func=lambda tid: f\'{tid[:8]}: {next(t["title"] for t in open_tasks if t["id"] == tid)}\'\n        )\n        if st.button("Continue to task detail"):\n            st.session_state["selected_task_id"] = selected\n            st.session_state["phase"] = "Task Detail"\n            st.experimental_rerun()\n\n# ---- Task Detail View ----\nelif phase == "Task Detail":\n    st.title("Task Details")\n    task_id = st.session_state.get("selected_task_id")\n    if not task_id:\n        st.warning("No task selected.")\n        st.stop()\n    task = orch.backlog.get_item(task_id)\n    st.markdown(f"**Title:** {task[\'title\']}\\n\\n**Type:** {task[\'type\']}\\n\\n**Status:** {task[\'status\']}\\n\\n**Created:** {task[\'created_at\']}")\n    st.code(task.get("description", ""), language="markdown")\n    st.json(task)\n    if st.button("Proceed to Patch Review"):\n        st.session_state["phase"] = "Patch Review"\n        st.experimental_rerun()\n    if st.button("Back to backlog"):\n        st.session_state["phase"] = "Backlog"\n        st.experimental_rerun()\n\n# ---- Patch Review ----\nelif phase == "Patch Review":\n    st.title("Patch Review & Approval")\n    task_id = st.session_state.get("selected_task_id")\n    if not task_id:\n        st.warning("No task selected.")\n        st.stop()\n    task = orch.backlog.get_item(task_id)\n    try:\n        patch = orch.executor.build_patch(task)\n        st.code(patch, language="diff")\n        review = orch.reviewer.review_patch(patch, context=task)\n        st.markdown("### Review Comments")\n        st.markdown(review["comments"] or "_No issues detected._")\n        if review["pass"]:\n            if st.button("Approve and Apply Patch"):\n                # Apply patch, save, and proceed\n                orch.shell.git_apply(patch)\n                orch._record(task, "patch_applied")\n                st.success("Patch applied.")\n                st.session_state["phase"] = "Run Test"\n                st.experimental_rerun()\n        else:\n            st.error("Patch failed review; please revise before continuing.")\n            if st.button("Back to task detail"):\n                st.session_state["phase"] = "Task Detail"\n                st.experimental_rerun()\n    except Exception as ex:\n        st.error(f"Patch build/review failed: {ex}")\n        if st.button("Back to task detail"):\n            st.session_state["phase"] = "Task Detail"\n            st.experimental_rerun()\n\n# ---- Run Test ----\nelif phase == "Run Test":\n    st.title("Run Pytest")\n    task_id = st.session_state.get("selected_task_id")\n    if not task_id:\n        st.warning("No task selected.")\n        st.stop()\n    st.markdown("Apply code patch complete. Run tests to confirm correctness.")\n    if st.button("Run tests now"):\n        test_result = orch.shell.run_pytest()\n        st.text_area("Test Output", test_result["output"], height=200)\n        if test_result["success"]:\n            st.success("Tests passed!")\n            if st.button("Proceed to Archive/Done"):\n                # Commit and archive task\n                task = orch.backlog.get_item(task_id)\n                sha = orch.shell.git_commit(f"[Cadence] {task[\'id\'][:8]} {task.get(\'title\', \'\')}")\n                orch.backlog.update_item(task_id, {"status": "done"})\n                orch.backlog.archive_completed()\n                # commit snapshot (task is still \'done\' here)\n                orch.record.save(task, state="committed", extra={"commit_sha": sha})\n                # refresh snapshot so we accurately log \'archived\'\n                updated_task = orch.backlog.get_item(task_id)\n                orch.record.save(updated_task, state="archived", extra={})\n                st.session_state["phase"] = "Archive"\n                st.experimental_rerun()\n        else:\n            st.error("Tests failed, fix required before progressing.")\n    if st.button("Back to patch review"):\n        st.session_state["phase"] = "Patch Review"\n        st.experimental_rerun()\n\n# ---- Archive / Task Complete ----\nelif phase == "Archive":\n    st.title("Task Archived")\n    st.success("Task flow completed. You may return to the backlog.")\n    if st.button("Back to backlog"):\n        st.session_state["selected_task_id"] = None\n        st.session_state["phase"] = "Backlog"\n        st.experimental_rerun()', 'src/cadence/dev/orchestrator.py': '# src/cadence/dev/orchestrator.py\n"""\nCadence DevOrchestrator\n-----------------------\nThe *single* source of truth for phase-ordering in the development loop.\n\nNEW FUNCTIONALITY (2025-06-21)\n    • Atomic rollback: if any failure occurs **after** a patch is applied\n      but **before** commit succeeds, we automatically revert the working\n      tree to its pristine state using `git apply -R`.\n"""\n\nfrom __future__ import annotations\n\nfrom .backlog import BacklogManager\nfrom .generator import TaskGenerator\nfrom .executor import TaskExecutor, PatchBuildError\nfrom .reviewer import TaskReviewer\nfrom .shell import ShellRunner, ShellCommandError\nfrom .record import TaskRecord, TaskRecordError\n\nimport sys\nfrom typing import Any, Dict\n\n\nclass DevOrchestrator:\n    def __init__(self, config: dict):\n        self.backlog = BacklogManager(config["backlog_path"])\n        self.generator = TaskGenerator(config.get("template_file"))\n        self.executor = TaskExecutor(config["src_root"])\n        self.reviewer = TaskReviewer(config.get("ruleset_file"))\n        self.shell = ShellRunner(config["repo_dir"])\n        self.record = TaskRecord(config["record_file"])\n\n    # ------------------------------------------------------------------ #\n    # Internal helper – ALWAYS log, never raise\n    # ------------------------------------------------------------------ #\n    def _record(self, task: dict, state: str, extra: Dict[str, Any] | None = None) -> None:\n        try:\n            self.record.save(task, state=state, extra=extra or {})\n        except TaskRecordError as e:\n            print(f"[Record-Error] {e}", file=sys.stderr)\n\n    # ------------------------------------------------------------------ #\n    # Pretty-printing helpers\n    # ------------------------------------------------------------------ #\n    def show(self, status: str = "open", printout: bool = True):\n        items = self.backlog.list_items(status)\n        if printout:\n            print(self._format_backlog(items))\n        return items\n\n    def _format_backlog(self, items):\n        if not items:\n            return "(Backlog empty)"\n        from tabulate import tabulate\n\n        rows = [\n            (\n                t["id"][:8],\n                t.get("title", "")[:48],\n                t.get("type", ""),\n                t.get("status", ""),\n                t.get("created_at", "")[:19],\n            )\n            for t in items\n            if t.get("status") != "archived"\n        ]\n        headers = ["id", "title", "type", "status", "created"]\n        return tabulate(rows, headers, tablefmt="github")\n\n    # ------------------------------------------------------------------ #\n    # Main workflow\n    # ------------------------------------------------------------------ #\n    def run_task_cycle(self, select_id: str | None = None, *, interactive: bool = True):\n        """\n        End-to-end flow for ONE micro-task.\n\n        NEW BEHAVIOUR\n            • Generates a reverse diff immediately after patch apply.\n            • Any failure **before** commit triggers automatic rollback.\n        """\n        rollback_patch: str | None = None   # Will hold the *forward* patch\n        task: dict | None = None            # current task for logging\n\n        try:\n            # 1. Select Task --------------------------------------------------\n            open_tasks = self.backlog.list_items(status="open")\n            if not open_tasks:\n                raise RuntimeError("No open tasks in backlog.")\n\n            if select_id:\n                task = next((t for t in open_tasks if t["id"] == select_id), None)\n                if not task:\n                    raise RuntimeError(f"Task id \'{select_id}\' not found in open backlog.")\n            elif interactive:\n                print(self._format_backlog(open_tasks))\n                print("---")\n                idx = self._prompt_pick(len(open_tasks))\n                task = open_tasks[idx]\n            else:\n                task = open_tasks[0]  # default: pick first open\n\n            print(f"\\n[Selected task: {task[\'id\'][:8]}] {task.get(\'title\')}\\n")\n\n            # 2. Build patch --------------------------------------------------\n            self._record(task, "build_patch")\n            try:\n                patch = self.executor.build_patch(task)\n                rollback_patch = patch  # Save for potential rollback\n                self._record(task, "patch_built", {"patch": patch})\n                print("--- Patch built ---\\n", patch)\n            except PatchBuildError as ex:\n                self._record(task, "failed_build_patch", {"error": str(ex)})\n                print(f"[X] Patch build failed: {ex}")\n                return {"success": False, "stage": "build_patch", "error": str(ex)}\n\n            # 3. Review -------------------------------------------------------\n            review1 = self.reviewer.review_patch(patch, context=task)\n            self._record(task, "patch_reviewed", {"review": review1})\n            print("--- Review 1 ---")\n            print(review1["comments"] or "(no comments)")\n            if not review1["pass"]:\n                self._record(task, "failed_patch_review", {"review": review1})\n                print("[X] Patch failed review, aborting.")\n                return {"success": False, "stage": "patch_review", "review": review1}\n\n            # 4. Apply patch --------------------------------------------------\n            try:\n                self.shell.git_apply(patch)\n                self._record(task, "patch_applied")\n                print("[✔] Patch applied.")\n            except ShellCommandError as ex:\n                self._record(task, "failed_patch_apply", {"error": str(ex)})\n                print(f"[X] git apply failed: {ex}")\n                return {"success": False, "stage": "patch_apply", "error": str(ex)}\n\n            # ------------------------------- #\n            # --- CRITICAL SECTION BEGIN --- #\n            # Any failure after this point MUST rollback before returning.\n            # ------------------------------- #\n\n            # 5. Run tests ----------------------------------------------------\n            test_result = self.shell.run_pytest()\n            self._record(task, "pytest_run", {"pytest": test_result})\n            print("--- Pytest ---")\n            print(test_result["output"])\n\n            if not test_result["success"]:\n                print("[X] Tests FAILED. Initiating rollback.")\n                self._record(task, "failed_test", {"pytest": test_result})\n                self._attempt_rollback(task, rollback_patch, src_stage="test")\n                return {"success": False, "stage": "test", "test_result": test_result}\n\n            # 6. (Optional) extra review could go here ------------------------\n\n            # 7. Commit -------------------------------------------------------\n            commit_msg = f"[Cadence] {task[\'id\'][:8]} {task.get(\'title\', \'\')}"\n            try:\n                sha = self.shell.git_commit(commit_msg)\n                self._record(task, "committed", {"commit_sha": sha})\n                print(f"[✔] Committed as {sha}")\n            except ShellCommandError as ex:\n                self._record(task, "failed_commit", {"error": str(ex)})\n                print(f"[X] git commit failed: {ex}")\n                # Commit failure ⇒ rollback\n                self._attempt_rollback(task, rollback_patch, src_stage="commit")\n                return {"success": False, "stage": "commit", "error": str(ex)}\n\n            # 8. Mark task done + archive ------------------------------------\n            self.backlog.update_item(task["id"], {"status": "done"})\n            task = self.backlog.get_item(task["id"])\n            self._record(task, "status_done")\n\n            self.backlog.archive_completed()\n            task = self.backlog.get_item(task["id"])\n            self._record(task, "archived")\n            print("[✔] Task marked done and archived.")\n\n            return {"success": True, "commit": sha, "task_id": task["id"]}\n\n        except Exception as ex:\n            # Catch-all safety net: attempt rollback if patch was applied\n            if task and rollback_patch:\n                self._attempt_rollback(task, rollback_patch, src_stage="unexpected", quiet=True)\n            print(f"[X] Cycle failed: {ex}")\n            return {"success": False, "error": str(ex)}\n\n    # ------------------------------------------------------------------ #\n    # Rollback helper\n    # ------------------------------------------------------------------ #\n    def _attempt_rollback(self, task: dict, patch: str | None, *, src_stage: str, quiet: bool = False):\n        """\n        Try to undo an applied patch.  Records outcome to TaskRecord.\n\n        Args:\n            task:       The current task dict (for logging).\n            patch:      The *forward* patch previously applied.\n            src_stage:  Where rollback was triggered (e.g., "test", "commit").\n            quiet:      If True, suppress stdout noise (used in unexpected fail).\n        """\n        if not patch:\n            # Defensive: nothing to rollback\n            self._record(task, "rollback_skip_no_patch")\n            return\n\n        try:\n            self.shell.git_apply(patch, reverse=True)\n            self._record(task, f"failed_{src_stage}_and_rollback")\n            if not quiet:\n                print("[↩] Rollback successful – working tree restored.")\n        except ShellCommandError as rb_ex:\n            # CRITICAL: rollback failed – manual intervention required\n            self._record(\n                task,\n                "critical_rollback_failure",\n                {"trigger": src_stage, "rollback_error": str(rb_ex)},\n            )\n            print(f"[!!] Rollback FAILED – manual fix required: {rb_ex}")\n\n    # ------------------------------------------------------------------ #\n    # CLI dispatch helpers\n    # ------------------------------------------------------------------ #\n    def cli_entry(self, command: str, **kwargs):\n        try:\n            if command in ("backlog", "show"):\n                return self.show(status=kwargs.get("status", "open"))\n            if command == "start":\n                return self.run_task_cycle(select_id=kwargs.get("id"))\n            if command == "evaluate":\n                return self.run_task_cycle(select_id=kwargs.get("id"))\n            if command == "done":\n                if "id" not in kwargs:\n                    print("You must supply a task id for \'done\'.")\n                    return\n                self.backlog.update_item(kwargs["id"], {"status": "done"})\n                self.backlog.archive_completed()\n                print(f"Task {kwargs[\'id\']} marked as done and archived.")\n                return\n            print(f"Unknown command: {command}")\n        except Exception as ex:\n            print(f"[X] CLI command \'{command}\' failed: {ex}")\n\n    # ------------------------------------------------------------------ #\n    # Notebook / interactive helper\n    # ------------------------------------------------------------------ #\n    def _prompt_pick(self, n):\n        while True:\n            ans = input(f"Select task [0-{n-1}]: ")\n            try:\n                ix = int(ans)\n                if 0 <= ix < n:\n                    return ix\n            except Exception:\n                pass\n            print("Invalid. Try again.")\n\n\n# --------------------------------------------------------------------------- #\n# Stand-alone execution (developer convenience)\n# --------------------------------------------------------------------------- #\nif __name__ == "__main__":\n    CONFIG = dict(\n        backlog_path="dev_backlog.json",\n        template_file="dev_templates.json",\n        src_root="cadence",\n        ruleset_file=None,\n        repo_dir=".",\n        record_file="dev_record.json",\n    )\n    orch = DevOrchestrator(CONFIG)\n\n    import argparse\n\n    parser = argparse.ArgumentParser()\n    parser.add_argument("command", nargs="?", help="show|start|evaluate|done")\n    parser.add_argument("--id", default=None, help="Task id to use")\n    args = parser.parse_args()\n\n    orch.cli_entry(args.command or "show", id=args.id)', 'src/cadence/dev/reviewer.py': '\n# src/cadence/dev/reviewer.py\n\n"""\nCadence TaskReviewer\n-------------------\nSingle Responsibility: Adjudicates patch/diff quality via rules/LLM/manual. Never applies code or diffs.\nFuture extensible: can host local ruleset, shell out to LLM agent, or use human-in-the-loop.\n"""\n\nimport os\nimport json\nfrom typing import Optional, Dict\n\nclass PatchReviewError(Exception):\n    """Raised if review input is malformed or review fails outright (e.g. ruleset not found/valid)."""\n    pass\n\nclass TaskReviewer:\n    def __init__(self, ruleset_file: str = None):\n        """\n        Optionally specify path to ruleset file (JSON list of rules),\n        or leave blank to use default built-in rules.\n        """\n        self.ruleset_file = ruleset_file\n        self.rules = self._load_ruleset(ruleset_file) if ruleset_file else self._default_ruleset()\n\n    def review_patch(self, patch: str, context: Optional[dict] = None) -> Dict:\n        """\n        Review a diff/patch string (unapplied) and optional context (task, commit message, etc).\n        Returns dict {\'pass\': bool, \'comments\': str}\n        This uses static (offline) heuristics but can be swapped for agent/LLM in future.\n        """\n        # Guard: Patch required\n        if not patch or not isinstance(patch, str):\n            return {\'pass\': False, \'comments\': \'Patch missing or not a string.\'}\n\n        # Apply rules in order. If any hard-fail, review fails.\n        comments = []\n        passed = True\n\n        for rule in self.rules:\n            ok, msg = rule(patch, context)\n            if not ok:\n                passed = False\n            if msg:\n                comments.append(msg)\n            if not ok:\n                # For now, fail-hard (but comment all)\n                break\n\n        return {\'pass\': passed, \'comments\': "\\n".join(comments).strip()}\n\n    def _default_ruleset(self):\n        """\n        Returns a list of static rule functions: (patch, context) → (bool, str)\n        """\n        def not_empty_rule(patch, _):\n            if not patch.strip():\n                return False, "Patch is empty."\n            return True, ""\n        def startswith_rule(patch, _):\n            if not patch.startswith(("---", "diff ", "@@ ")):\n                return False, "Patch does not appear to be a valid unified diff."\n            return True, ""\n        def contains_todo_rule(patch, _):\n            if "TODO" in patch:\n                return False, "Patch contains \'TODO\'—code review must not introduce placeholders."\n            return True, ""\n\n        # Optionally check for too-huge diffs, or forbidden patterns, via rules below.\n        def size_limit_rule(patch, _):\n            line_count = patch.count("\\n")\n            if line_count > 5000:  # Arbitrary large patch guard\n                return False, f"Patch too large for standard review ({line_count} lines)."\n            return True, ""\n        return [\n            not_empty_rule, \n            startswith_rule,\n            contains_todo_rule,\n            size_limit_rule,\n        ]\n\n    def _load_ruleset(self, path: str):\n        """\n        Loads a simple external ruleset (for human/agent extension), e.g. as list of forbidden strings.\n        For extensibility only; advanced policies/LLMs should be subclassed onto this interface.\n        """\n        if not os.path.exists(path):\n            raise PatchReviewError(f"Ruleset file \'{path}\' not found.")\n        with open(path, "r", encoding="utf8") as f:\n            obj = json.load(f)\n        # Expect a list of {\'type\':..., \'pattern\':..., ...} dicts for pattern rules\n        rules = []\n        def make_rule(ruleobj):\n            typ = ruleobj.get(\'type\')\n            pattern = ruleobj.get(\'pattern\')\n            msg = ruleobj.get(\'message\', f"Patch contains forbidden pattern: {pattern}")\n            if typ == \'forbid\':\n                def _inner(patch, _):\n                    if pattern in patch:\n                        return False, msg\n                    return True, ""\n                return _inner\n            elif typ == \'require\':\n                def _inner(patch, _):\n                    if pattern not in patch:\n                        return False, msg\n                    return True, ""\n                return _inner\n            else:\n                # Ignore unknown rule types\n                def _inner(patch, _):\n                    return True, ""\n                return _inner\n        for ruleobj in obj:\n            rules.append(make_rule(ruleobj))\n        # Default rules always included\n        return self._default_ruleset() + rules\n\n# Standalone/example/test run\nif __name__ == "__main__":\n    reviewer = TaskReviewer()\n    # Good patch\n    patch = """--- sample.py\n+++ sample.py\n@@ -1 +1,2 @@\n-print(\'hello\')\n+print(\'hello world\')\n"""\n    result = reviewer.review_patch(patch)\n    print("Result (should pass):", result)\n\n    bad_patch = "TODO: refactor\\n"\n    result = reviewer.review_patch(bad_patch)\n    print("Result (should fail):", result)', 'src/cadence/dev/__init__.py': '\n', 'src/cadence/dev/generator.py': '\n# src/cadence/dev/generator.py\n\n"""\nCadence TaskGenerator\n-------------------\nSingle Responsibility: Propose/generate well-formed tasks, optionally from template or rules/LLM seed.\nNever applies code or diffs. Future extensible to LLM/human agent.\n"""\n\nimport os, json, uuid, datetime, warnings\nfrom typing import List, Dict, Optional\n\nclass TaskTemplateError(Exception):\n    """Raised if template file is not valid or incomplete."""\n    pass\n\nREQUIRED_FIELDS = ("title", "type", "status", "created_at")\n\n\nclass TaskGenerator:\n    def __init__(self, template_file: str | None = None, *, strict: bool = False):\n        """\n        Optionally supply a JSON / MD template file.  \n        If `strict` is False (default) and the file does **not** exist, we\n        continue with an empty template dictionary and merely warn.\n        """\n        self.template_file = template_file\n        self._template_cache: Dict = {}\n        if template_file:\n            if os.path.exists(template_file):\n                self._template_cache = self._load_template(template_file)\n            elif strict:\n                # Original behaviour – hard-fail\n                raise TaskTemplateError(f"Template file not found: {template_file}")\n            else:\n                warnings.warn(\n                    f"Template file \'{template_file}\' not found; "\n                    "proceeding with minimal fallback templates.",\n                    RuntimeWarning,\n                )\n    \n    def generate_tasks(self, mode: str = "micro", count: int = 1, human_prompt: Optional[str]=None) -> List[Dict]:\n        """\n        Return a list of well-formed tasks. \n        - mode: "micro", "story", "epic", etc.\n        - count: number of tasks to generate\n        - human_prompt: if provided, use as summary/title for each (e.g., "Add new test", for human CLI prompt workflow)\n        If template_file is used, will fill in mode-related templates.\n        """\n        tasks = []\n        base_tpl = self._get_template_for_mode(mode)\n        now = datetime.datetime.utcnow().isoformat()\n        for i in range(count):\n            task = dict(base_tpl)\n            # Minimal fields: id, title, type, status, created_at\n            task["id"] = str(uuid.uuid4())\n            task["type"] = mode\n            task.setdefault("status", "open")\n            task.setdefault("created_at", now)\n            if human_prompt:\n                # Provide a default/barebones title/desc from human input\n                task["title"] = human_prompt if count == 1 else f"{human_prompt} [{i+1}]"\n                task.setdefault("description", human_prompt)\n            else:\n                # Fallback: title must be present; if not, use template/title from mode or \'Untitled\'\n                task["title"] = task.get("title", f"{mode.capitalize()} Task {i+1}")\n                task.setdefault("description", "")\n            self._validate_task(task)\n            tasks.append(task)\n        return tasks\n\n    def overwrite_tasks(self, new_tasks: List[Dict], output_path: Optional[str]=None) -> None:\n        """\n        Replace all backlog tasks with given well-formed list (writes to output_path, else self.template_file).\n        """\n        path = output_path or self.template_file\n        if not path:\n            raise TaskTemplateError("No output path specified to write tasks.")\n        with open(path, "w", encoding="utf8") as f:\n            json.dump([self._validate_task(t) for t in new_tasks], f, indent=2)\n\n    def _get_template_for_mode(self, mode: str) -> Dict:\n        """\n        Get template for the given mode; falls back to default/minimal template.\n        """\n        if self._template_cache and mode in self._template_cache:\n            return dict(self._template_cache[mode])  # deep copy\n        # Fallback: minimal template\n        return {\n            "title": "",\n            "type": mode,\n            "status": "open",\n            "created_at": "",\n            "description": "",\n        }\n\n    def _load_template(self, path: str) -> Dict:\n        """\n        Loads a JSON template file mapping mode→template-dict.\n        If Markdown file with front-matter, parse the JSON front-matter.\n        """\n        if not os.path.exists(path):\n            raise TaskTemplateError(f"Template file not found: {path}")\n        if path.endswith(".md"):\n            with open(path, "r", encoding="utf8") as f:\n                lines = f.readlines()\n            start, end = None, None\n            for i, line in enumerate(lines):\n                if line.strip() == "```json":\n                    start = i + 1\n                elif line.strip().startswith("```") and start is not None and end is None:\n                    end = i\n                    break\n            if start is not None and end is not None:\n                json_str = "".join(lines[start:end])\n                tpl = json.loads(json_str)\n            else:\n                raise TaskTemplateError("Markdown template missing ```json ... ``` block.")\n        else:\n            with open(path, "r", encoding="utf8") as f:\n                tpl = json.load(f)\n        if not isinstance(tpl, dict):\n            raise TaskTemplateError("Task template must be a dict mapping mode->template.")\n        return tpl\n\n    def _validate_task(self, task: Dict) -> Dict:\n        """\n        Ensures task has all required fields and correct types/formats.\n        Throws TaskTemplateError if not.\n        """\n        for field in REQUIRED_FIELDS:\n            if field not in task or (field == "title" and not task["title"].strip()):\n                raise TaskTemplateError(f"Task missing required field: \'{field}\'")\n        if not isinstance(task["type"], str):\n            raise TaskTemplateError("Task type must be str.")\n        if "id" in task and not isinstance(task["id"], str):\n            task["id"] = str(task["id"])\n        # Optionally: check status value, etc.\n        return task\n\n    # For future agentic/LLM/human input: accept strings, call LLM API, etc.\n    # Extend here with agent hooks.\n\n# Standalone/test CLI example (not for production)\nif __name__ == "__main__":\n    # Example: generate 2 microtasks from default, print as JSON:\n    g = TaskGenerator()\n    tasks = g.generate_tasks(mode="micro", count=2, human_prompt="Example user-initiated task")\n    print(json.dumps(tasks, indent=2))', 'src/cadence/dev/backlog.py': '\n# src/cadence/dev/backlog.py\n\n"""\nCadence BacklogManager\n---------------------\nSingle Responsibility: CRUD on task backlog (no code/diffs). All file IO explicit, JSON-logged, and future agent-ready.\n"""\n\nimport os\nimport json\nimport uuid\nfrom typing import List, Dict, Optional\n\nclass BacklogEmptyError(Exception):\n    """Raised if attempting to pop or select from an empty backlog."""\n    pass\n\nclass TaskStructureError(Exception):\n    """Raised if a task dict doesn\'t conform to required structure."""\n    pass\n\nclass TaskNotFoundError(Exception):\n    """Raised if a requested task_id is not in the backlog."""\n    pass\n\nREQUIRED_FIELDS = ("id", "title", "type", "status", "created_at")\n\nclass BacklogManager:\n    """\n    Manages Cadence backlog: microtasks, stories, and epics.\n    - All tasks are plain dicts with mandatory fields.\n    - Underlying store is a JSON file [{...}, ...].\n    """\n\n    def __init__(self, backlog_path: str):\n        self.path = backlog_path\n        self._items: List[Dict] = []\n        self.load()\n\n    def list_items(self, status: str = "open") -> List[Dict]:\n        """\n        Return a list of tasks filtered by status.\n        status: "open", "in_progress", "done", or "all"\n        """\n        if status == "all":\n            return list(self._items)\n        return [item for item in self._items if item.get("status", "open") == status]\n\n    def add_item(self, task: Dict) -> None:\n        """\n        Add a new task to backlog. Enforce structure and unique id.\n        """\n        task = self._normalize_task(task)\n        if any(t["id"] == task["id"] for t in self._items):\n            raise TaskStructureError(f"Duplicate task id: {task[\'id\']}")\n        self._items.append(task)\n        self.save()\n\n    def remove_item(self, task_id: str) -> None:\n        """\n        Mark a task as archived (status = \'archived\').\n        """\n        idx = self._task_index(task_id)\n        self._items[idx]["status"] = "archived"\n        self.save()\n\n    def archive_completed(self) -> None:\n        """\n        Mark all tasks with status \'done\' as \'archived\'.\n        """\n        n = 0\n        for item in self._items:\n            if item.get("status") == "done":\n                item["status"] = "archived"\n                n += 1\n        if n:\n            self.save()\n\n    def save(self) -> None:\n        """Persist backlog state to self.path as JSON (UTF-8, indent)."""\n        tmp_path = self.path + ".tmp"\n        with open(tmp_path, "w", encoding="utf8") as f:\n            json.dump(self._items, f, indent=2)\n        os.replace(tmp_path, self.path)\n\n    def load(self) -> None:\n        """\n        Reload backlog state from file. If the file does not exist, starts empty.\n        """\n        if not os.path.exists(self.path):\n            self._items = []\n            return\n        with open(self.path, "r", encoding="utf8") as f:\n            data = json.load(f)\n            if not isinstance(data, list):\n                raise ValueError("Backlog JSON must be a list of tasks")\n            self._items = [self._normalize_task(t) for t in data]\n\n    def _normalize_task(self, task: Dict) -> Dict:\n        """\n        Ensure the dict has all required fields, fill missing, return new dict.\n        """\n        t = dict(task)  # copy\n        for field in REQUIRED_FIELDS:\n            if field not in t:\n                if field == "id":\n                    t["id"] = str(uuid.uuid4())\n                elif field == "created_at":\n                    import datetime\n                    t["created_at"] = datetime.datetime.utcnow().isoformat()\n                elif field == "status":\n                    t["status"] = "open"\n                elif field == "type":\n                    t["type"] = "micro"\n                else:\n                    raise TaskStructureError(f"Missing required field: {field}")\n        # Sanity check: no harmful keys\n        if not isinstance(t["id"], str):\n            t["id"] = str(t["id"])\n        return t\n\n    def _task_index(self, task_id: str) -> int:\n        """\n        Internal: find list index of task by id or raise.\n        """\n        for ix, t in enumerate(self._items):\n            if t["id"] == task_id:\n                return ix\n        raise TaskNotFoundError(f"No task found with id={task_id}")\n\n    def get_item(self, task_id: str) -> Dict:\n        """Retrieve a task by id."""\n        idx = self._task_index(task_id)\n        return self._items[idx]\n\n    def update_item(self, task_id: str, updates: Dict) -> None:\n        """\n        Update arbitrary fields of a task (e.g. assign, progress, edit).\n        """\n        idx = self._task_index(task_id)\n        self._items[idx].update(updates)\n        self.save()\n\n    def export(self) -> List[Dict]:\n        """\n        Return a (deep) copy of all backlog items.\n        """\n        import copy\n        return copy.deepcopy(self._items)\n\n    # Optional: friendly CLI/str output\n    def __str__(self) -> str:\n        from tabulate import tabulate\n        if not self._items:\n            return "(Backlog empty)"\n        rows = [\n            (t["id"][:8], t["title"], t["type"], t.get("status", "open"), t.get("created_at", ""))\n            for t in self._items if t.get("status") != "archived"\n        ]\n        headers = ["id", "title", "type", "status", "created"]\n        return tabulate(rows, headers, tablefmt="github")\n\n# For direct module test/dev, NOT in prod code.\nif __name__ == "__main__":\n    # Example usage\n    mgr = BacklogManager("dev_backlog.json")\n    print(mgr)\n    # To add: mgr.add_item({"title": "Test microtask", "type": "micro"})', 'src/cadence/agents/sidekick.py': '# src/cadence/agents/sidekick.py\n"""\nPersona agent that *delegates* to a ReasoningAgent but presents a\nhuman-centric mentor/advisor interface.\n"""\nfrom __future__ import annotations\n\nimport json\nfrom pathlib import Path\n\nfrom .profile import AgentProfile, REASONING_PROFILE\nfrom .reasoning import ReasoningAgent\n\n\n_SIDEKICK_PROMPT = """\nYou are an AI-enhanced co-developer and mentor. Your primary goal is to\nextract the most creative, high-leverage ideas from the human user and\ntransform them into actionable improvements for the Cadence platform.\nAvoid tactical implementation details unless asked; focus on vision,\narchitecture, and pragmatic next steps.\n"""\n\n\nclass Sidekick:\n    """\n    Thin wrapper: exposes `run_interaction` but delegates work to an\n    internal ReasoningAgent instance configured with a custom prompt.\n    """\n\n    def __init__(self):\n        profile = AgentProfile(\n            name="sidekick",\n            role="advisor",\n            model=REASONING_PROFILE.model,\n            context_limit=REASONING_PROFILE.context_limit,\n            review_policy=REASONING_PROFILE.review_policy,\n            default_system_prompt=REASONING_PROFILE.default_system_prompt,\n            extra=REASONING_PROFILE.extra.copy() if REASONING_PROFILE.extra else {},\n        )\n        self._agent = ReasoningAgent(profile=profile, system_prompt=_SIDEKICK_PROMPT)\n        self._inject_seed_context()\n\n    # ------------------------------------------------------------------ #\n    # Public façade\n    # ------------------------------------------------------------------ #\n    def run_interaction(self, user_input: str, **kwargs) -> str:\n        return self._agent.run_interaction(user_input, **kwargs)\n\n    async def async_run_interaction(self, user_input: str, **kwargs) -> str:\n        return await self._agent.async_run_interaction(user_input, **kwargs)\n\n    # ------------------------------------------------------------------ #\n    # Private helpers\n    # ------------------------------------------------------------------ #\n    def _inject_seed_context(self):\n        docs = self._agent.gather_codebase_context(\n            root=("docs",),\n            ext=(".md", ".mermaid", ".json"),\n        )\n\n        modules_path = Path("agent_context/module_contexts.json")\n        modules = {}\n        if modules_path.exists():\n            modules = json.loads(modules_path.read_text())\n\n        self._agent.append_message(\n            "user",\n            f"DOCS:\\n{docs}\\n---\\nMODULE_CONTEXTS:\\n{json.dumps(modules)[:10_000]}",\n        )', 'src/cadence/agents/base.py': '# src/cadence/agents/base.py\nfrom __future__ import annotations\n\nfrom typing import List, Dict, Any, Optional, Tuple\nfrom pathlib import Path\n\nfrom src.cadence.llm.client import LLMClient, get_default_client\nfrom src.cadence.context.provider import ContextProvider, SnapshotContextProvider\nfrom .profile import AgentProfile\n\n\nclass BaseAgent:\n    """\n    The one true superclass for *all* Cadence agents.\n\n    An agent = (profile) + (conversation state) + (LLM client) [+ optional helpers]\n\n    Subclasses SHOULD NOT hard-code models; they inherit that from the supplied\n    `AgentProfile`.  Core agents (Reasoning / Execution / Efficiency) simply\n    pass the canonical profile; personas may inject a custom one.\n    """\n\n    def __init__(\n        self,\n        profile: AgentProfile,\n        *,\n        llm_client: Optional[LLMClient] = None,\n        system_prompt: Optional[str] = None,\n        context_provider: Optional[ContextProvider] = None,\n    ):\n        self.profile = profile\n        self.llm_client = llm_client or get_default_client()\n        self.system_prompt = system_prompt or profile.default_system_prompt\n        self.context_provider = context_provider or SnapshotContextProvider()\n        self.messages: List[Dict[str, Any]] = []\n        self.reset_context()\n\n    # --------------------------------------------------------------------- #\n    # Conversation helpers\n    # --------------------------------------------------------------------- #\n    def reset_context(self, system_prompt: Optional[str] = None):\n        """Clear history and (re)set the system prompt."""\n        self.messages = []\n        sys_prompt = system_prompt or self.system_prompt\n        if sys_prompt:\n            self.append_message("system", sys_prompt)\n\n    def append_message(self, role: str, content: str):\n        self.messages.append({"role": role, "content": content})\n\n    # --------------------------------------------------------------------- #\n    # LLM calls\n    # --------------------------------------------------------------------- #\n    def run_interaction(self, user_input: str, **llm_kwargs) -> str:\n        self.append_message("user", user_input)\n        response = self.llm_client.call(\n            self.messages,\n            model=self.profile.model,\n            system_prompt=None,  # already injected\n            **llm_kwargs,\n        )\n        self.append_message("assistant", response)\n        return response\n\n    async def async_run_interaction(self, user_input: str, **llm_kwargs) -> str:\n        self.append_message("user", user_input)\n        response = await self.llm_client.acall(\n            self.messages,\n            model=self.profile.model,\n            system_prompt=None,\n            **llm_kwargs,\n        )\n        self.append_message("assistant", response)\n        return response\n\n    # --------------------------------------------------------------------- #\n    # Persistence\n    # --------------------------------------------------------------------- #\n    def save_history(self, path: str):\n        import json\n        Path(path).write_text(json.dumps(self.messages, indent=2, ensure_ascii=False))\n\n    def load_history(self, path: str):\n        import json\n        self.messages = json.loads(Path(path).read_text())\n\n    # --------------------------------------------------------------------- #\n    # Context helpers\n    # --------------------------------------------------------------------- #\n    def gather_codebase_context(\n        self,\n        root: Tuple[str, ...] = ("cadence", "docs"),\n        ext: Tuple[str, ...] = (".py", ".md", ".json", ".mermaid"),\n        **kwargs,\n    ) -> str:\n        """Return repo/docs snapshot via the injected ContextProvider."""\n        return self.context_provider.get_context(*(Path(r) for r in root), exts=ext, **kwargs)\n', 'src/cadence/agents/efficiency.py': '# src/cadence/agents/efficiency.py\nfrom __future__ import annotations\n\nfrom .base import BaseAgent\nfrom .profile import EFFICIENCY_PROFILE, AgentProfile\n\n\nclass EfficiencyAgent(BaseAgent):\n    """\n    Final class: fast, low-cost linting & summarisation.\n    """\n\n    def __init__(self, profile: AgentProfile = EFFICIENCY_PROFILE, **kwargs):\n        super().__init__(profile, **kwargs)', 'src/cadence/agents/reasoning.py': '# src/cadence/agents/reasoning.py\nfrom __future__ import annotations\n\nfrom .base import BaseAgent\nfrom .profile import REASONING_PROFILE, AgentProfile\n\n\nclass ReasoningAgent(BaseAgent):\n    """\n    Final class: provides deep, chain-of-thought reasoning and architectural review.\n    """\n\n    def __init__(self, profile: AgentProfile = REASONING_PROFILE, **kwargs):\n        super().__init__(profile, **kwargs)\n\n    # Automatically inject a fresh code snapshot on each reset\n    def reset_context(self, system_prompt: str | None = None):\n        super().reset_context(system_prompt)\n        docs = self.gather_codebase_context(\n            root=("docs",),\n            ext=(".md", ".mermaid", ".json"),\n        )\n        self.append_message("user", f"REFERENCE_DOCUMENTS:\\n{docs}\\n---\\nYou are cleared for deep reasoning.")', 'src/cadence/agents/profile.py': '# src/cadence/agents/profile.py\nfrom dataclasses import dataclass, field\nfrom typing import Dict, Any\n\n\n@dataclass(frozen=True, slots=True)\nclass AgentProfile:\n    """\n    Immutable definition of an agent’s operational contract.\n\n    Nothing here executes code; it is pure data that can be validated,\n    serialised, or inspected by the Meta-agent and CI tooling.\n    """\n    name: str\n    role: str\n    model: str\n    context_limit: int\n    review_policy: str = ""\n    default_system_prompt: str = ""\n    extra: Dict[str, Any] = field(default_factory=dict)\n\n\n# --------------------------------------------------------------------------- #\n# Canonical profiles – these are the ONLY ones that Core Agents will default to\n# --------------------------------------------------------------------------- #\nREASONING_PROFILE = AgentProfile(\n    name="reasoning",\n    role="plan-review",\n    model="o3-2025-04-16",\n    context_limit=200_000,\n    review_policy="Cannot commit code; must review Execution diff",\n)\n\nEXECUTION_PROFILE = AgentProfile(\n    name="execution",\n    role="implement",\n    model="gpt-4.1",\n    context_limit=1_000_000,\n    review_policy="Needs review by Reasoning or Efficiency",\n)\n\nEFFICIENCY_PROFILE = AgentProfile(\n    name="efficiency",\n    role="lint-summarise",\n    model="o4-mini",\n    context_limit=200_000,\n    review_policy="Reviews Execution unless diff is non-code",\n)\n\n# Convenience lookup\nBUILTIN_PROFILES = {\n    "reasoning": REASONING_PROFILE,\n    "execution": EXECUTION_PROFILE,\n    "efficiency": EFFICIENCY_PROFILE,\n}', 'src/cadence/agents/__init__.py': '\n', 'src/cadence/agents/registry.py': '# src/cadence/agents/registry.py\n"""\nSingle place to obtain a Core Agent or Profile.\n\nAvoids hard-coding classes throughout the codebase.\n"""\n\nfrom typing import Type\n\nfrom .reasoning import ReasoningAgent\nfrom .execution import ExecutionAgent\nfrom .efficiency import EfficiencyAgent\nfrom .profile import BUILTIN_PROFILES, AgentProfile\n\n_CORE_AGENTS: dict[str, Type] = {\n    "reasoning": ReasoningAgent,\n    "execution": ExecutionAgent,\n    "efficiency": EfficiencyAgent,\n}\n\n\ndef get_agent(agent_type: str, **kwargs):\n    """\n    Instantiate a Core Agent by `agent_type`.\n\n    Example:\n        agent = get_agent("execution")\n    """\n    if agent_type not in _CORE_AGENTS:\n        raise ValueError(f"Unknown agent_type \'{agent_type}\'. Valid: {list(_CORE_AGENTS)}")\n    return _CORE_AGENTS[agent_type](**kwargs)\n\n\ndef get_profile(profile_name: str) -> AgentProfile:\n    if profile_name not in BUILTIN_PROFILES:\n        raise ValueError(f"Unknown profile \'{profile_name}\'. Valid: {list(BUILTIN_PROFILES)}")\n    return BUILTIN_PROFILES[profile_name]', 'src/cadence/agents/execution.py': '# src/cadence/agents/execution.py\nfrom __future__ import annotations\n\nfrom .base import BaseAgent\nfrom .profile import EXECUTION_PROFILE, AgentProfile\n\n\nclass ExecutionAgent(BaseAgent):\n    """\n    Final class: generates or refactors significant portions of the codebase.\n    """\n\n    def __init__(self, profile: AgentProfile = EXECUTION_PROFILE, **kwargs):\n        super().__init__(profile, **kwargs)', 'src/cadence/llm/client.py': '# src/cadence/llm/client.py\nimport os\nimport logging\nimport asyncio\nfrom typing import List, Dict, Any, Optional, cast\nfrom openai import AsyncOpenAI, OpenAI\nfrom openai.types.chat import ChatCompletionMessageParam\nfrom dotenv import load_dotenv\n\n# One-time load\nload_dotenv()\n\n# Set up logger\nlogger = logging.getLogger("cadence.llm.client")\nif not logger.handlers:\n    handler = logging.StreamHandler()\n    handler.setFormatter(logging.Formatter("[%(asctime)s] %(levelname)s %(message)s"))\n    logger.addHandler(handler)\nlogger.setLevel(logging.INFO)\n\n# Global default model configs\n_DEFAULT_MODELS = {\n    "reasoning": "o3-2025-04-16",\n    "execution": "gpt-4.1",\n    "efficiency": "o4-mini"\n}\n\ndef get_env(key: str, required=True, default=None):\n    val = os.getenv(key)\n    if not val and required:\n        raise RuntimeError(f"Environment variable {key} not set.")\n    return val or default\n\n# Centralized sync/async LLM client\nclass LLMClient:\n    def __init__(\n        self,\n        api_key: Optional[str] = None,\n        api_base: Optional[str] = None,\n        api_version: Optional[str] = None,\n        default_model: Optional[str] = None,\n    ):\n        self.api_key = api_key or get_env(\'OPENAI_API_KEY\')\n        self.api_base = api_base or os.getenv(\'OPENAI_API_BASE\', None)\n        self.api_version = api_version or os.getenv(\'OPENAI_API_VERSION\', None)\n        self.default_model = default_model or _DEFAULT_MODELS["execution"]\n\n        # Sync and Async clients\n        self._async_client = AsyncOpenAI(api_key=self.api_key, base_url=self.api_base)\n        self._sync_client = OpenAI(api_key=self.api_key, base_url=self.api_base)\n\n    def _resolve_model(self, model: Optional[str], agent_type: Optional[str]):\n        if model:\n            return model\n        if agent_type and agent_type in _DEFAULT_MODELS:\n            return _DEFAULT_MODELS[agent_type]\n        return self.default_model\n\n    def call(\n        self,\n        messages: List[Dict[str, Any]],\n        model: Optional[str] = None,\n        agent_type: Optional[str] = None,\n        system_prompt: Optional[str] = None,\n        max_tokens: Optional[int] = None,\n        **kwargs\n    ) -> str:\n        used_model = self._resolve_model(model, agent_type)\n        msgs = messages.copy()\n        if system_prompt and not any(m.get("role") == "system" for m in msgs):\n            msgs.insert(0, {"role": "system", "content": system_prompt})\n\n        logger.info(f"LLM sync call: model={used_model}, msgs_len={len(msgs)}")\n        response = self._sync_client.chat.completions.create(  # type: ignore[arg-type]\n            model=used_model,\n            messages=cast(List[ChatCompletionMessageParam], msgs),\n            # max_tokens=max_tokens,\n            **kwargs\n        )\n        content = (response.choices[0].message.content or "").strip()\n        logger.debug(f"LLM response: {content[:120]}...")\n        return content\n\n    async def acall(\n        self,\n        messages: List[Dict[str, Any]],\n        model: Optional[str] = None,\n        agent_type: Optional[str] = None,\n        system_prompt: Optional[str] = None,\n        max_tokens: Optional[int] = None,\n        **kwargs\n    ) -> str:\n        used_model = self._resolve_model(model, agent_type)\n        msgs = messages.copy()\n        if system_prompt and not any(m.get("role") == "system" for m in msgs):\n            msgs.insert(0, {"role": "system", "content": system_prompt})\n\n        logger.info(f"LLM async call: model={used_model}, msgs_len={len(msgs)}")\n        response = await self._async_client.chat.completions.create(  # type: ignore[arg-type]\n            model=used_model,\n            messages=cast(List[ChatCompletionMessageParam], msgs),\n            max_tokens=max_tokens,\n            **kwargs\n        )\n        content = (response.choices[0].message.content or "").strip()\n        logger.debug(f"LLM response: {content[:120]}...")\n        return content\n\n# Provide a default client getter for agents\ndef get_default_client() -> LLMClient:\n    return _DEFAULT_CLIENT\n\n_DEFAULT_CLIENT = LLMClient()\n', 'src/cadence/llm/__init__.py': '\n', 'src/cadence/utils/add.py': 'def add(x: int, y: int) -> int:\n    """Intentionally wrong implementation for MVP red→green demo."""\n    return x + y\n', 'src/cadence/utils/mvp_loop.py': '# src/cadence/utils/mvp_loop.py\n\nimport pytest\nfrom src.cadence.dev.executor import TaskExecutor\nfrom src.cadence.dev.shell import ShellRunner\n\ndef manual_test():\n    result = pytest.main(["tests"])\n    if result != 0:\n        print("Tests failed.")\n        # Read before\n        before = open("cadence/utils/add.py").read()\n        print("Paste fixed implementation for utils/add.py below. End input with EOF (Ctrl+D):")\n        after = []\n        try:\n            while True:\n                after.append(input())\n        except EOFError:\n            pass\n        after = "\\n".join(after)\n        # build diff\n        task = {"diff": {"file": "cadence/utils/add.py", "before": before, "after": after}}\n        patch = TaskExecutor("cadence/utils").build_patch(task)\n        print("---Proposed Diff---")\n        print(patch)\n\ndef OOP_test():\n    executor = TaskExecutor(src_root=".")\n    shell = ShellRunner(repo_dir=".")\n\n    # Dynamically read and patch the file\n    with open("cadence/utils/add.py") as f:\n        before = f.read()\n    if "return x + y" not in before:\n        after = before.replace("return x - 1 + y", "return x + y")\n    else:\n        print("Already correct: no patch needed.")\n        return\n\n    task = {\n        "diff": {\n            "file": "cadence/utils/add.py",\n            "before": before,\n            "after": after\n        }\n    }\n\n    patch = executor.build_patch(task)\n    try:\n        shell.git_apply(patch)\n        # Run tests via ShellRunner\n        result = shell.run_pytest()\n        if result["success"]:\n            sha = shell.git_commit("Fix add(): correct return expression")\n            print(f"Patch applied and tests passed. Commit SHA: {sha}")\n        else:\n            print("Tests failed after patch:\\n", result["output"])\n    except Exception as e:\n        print("Patch failed:", e)\n\n\n\nif __name__ == "__main__":\n    OOP_test()', 'tests/test_add.py': 'from cadence.utils.add import add\n\ndef test_add():\n    assert add(2, 3) == 5', 'tests/test_failed_rollback.py': '"""\nRegression-test — Atomic rollback on downstream failure\n=======================================================\n\nPurpose\n-------\nVerify that *any* failure **after** a patch is applied but **before**\ncommit triggers an automatic rollback that restores a pristine working\ntree **and** writes the correct snapshots to TaskRecord.\n\nStrategy\n--------\n1.  Start with a clean repo where utils.add() is *correct* and all tests\n    pass.\n\n2.  Backlog contains a task whose patch **adds a brand-new failing test\n    file** – this guarantees pytest will fail *if* the patch is applied,\n    regardless of implementation details.\n\n3.  Run a full `DevOrchestrator` cycle (non-interactive).\n\n4.  Assert:\n        ─ orchestrator reports failure at the *test* stage;\n        ─ TaskRecord contains both `"failed_test"` **and**\n          `"failed_test_and_rollback"` snapshots;\n        ─ the failing test file is gone (working tree restored);\n        ─ original tests pass again and git status is clean.\n"""\n\nfrom __future__ import annotations\n\nimport json\nimport subprocess\nimport sys\nfrom pathlib import Path\nfrom typing import List\n\nimport pytest\n\n\n# --------------------------------------------------------------------------- #\n# Global stubs – applied automatically\n# --------------------------------------------------------------------------- #\n@pytest.fixture(autouse=True)\ndef _stub_external(monkeypatch):\n    """Stub out optional / external deps so the test is hermetic."""\n    # Fake OpenAI client (LLM not used by this path)\n    fake_openai = sys.modules["openai"] = type(sys)("openai")\n    fake_openai.AsyncOpenAI = lambda *a, **k: None  # type: ignore\n    fake_openai.OpenAI = lambda *a, **k: None       # type: ignore\n\n    # Fake tabulate (pretty-printer)\n    fake_tabulate = sys.modules["tabulate"] = type(sys)("tabulate")\n    fake_tabulate.tabulate = lambda *a, **k: ""\n\n    # Satisfy LLMClient env check\n    monkeypatch.setenv("OPENAI_API_KEY", "dummy")\n\n    # Ensure repository *parent* (containing “src/”) is importable\n    PROJ_ROOT = Path(__file__).resolve().parents[1]\n    if (PROJ_ROOT / "src").exists():\n        monkeypatch.syspath_prepend(str(PROJ_ROOT))\n\n    yield\n\n\n# --------------------------------------------------------------------------- #\n# Repo bootstrap helpers\n# --------------------------------------------------------------------------- #\nGOOD_IMPL = "def add(x, y):\\n    return x + y\\n"\nFAILING_TEST = (\n    "def test_intentional_failure():\\n"\n    "    assert False, \'This test is added by the patch and must fail\'\\n"\n)\n\n\ndef _init_repo(tmp_path: Path) -> Path:\n    """Create a minimal Cadence project inside a temporary git repo."""\n    repo = tmp_path\n\n    # --- source package ----------------------------------------------------\n    pkg_root = repo / "src" / "cadence" / "utils"\n    pkg_root.mkdir(parents=True, exist_ok=True)\n    (repo / "src" / "__init__.py").write_text("")\n    (repo / "src" / "cadence" / "__init__.py").write_text("")\n    (pkg_root / "__init__.py").write_text("")\n    (pkg_root / "add.py").write_text(GOOD_IMPL)\n\n    # --- baseline passing test --------------------------------------------\n    tests_dir = repo / "tests"\n    tests_dir.mkdir()\n    (tests_dir / "test_add.py").write_text(\n        "import sys, pathlib, os\\n"\n        "sys.path.insert(0, os.fspath((pathlib.Path(__file__).resolve().parents[2] / \'src\')))\\n"\n        "from cadence.utils.add import add\\n"\n        "\\n"\n        "def test_add():\\n"\n        "    assert add(2, 3) == 5\\n"\n    )\n\n    # --- git init ----------------------------------------------------------\n    subprocess.run(["git", "init"], cwd=repo, check=True, stdout=subprocess.DEVNULL)\n    subprocess.run(["git", "config", "user.email", "ci@example.com"], cwd=repo, check=True)\n    subprocess.run(["git", "config", "user.name", "CI"], cwd=repo, check=True)\n    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)\n    subprocess.run(\n        ["git", "commit", "-m", "initial good implementation"],\n        cwd=repo,\n        check=True,\n        stdout=subprocess.DEVNULL,\n    )\n    return repo\n\n\ndef _make_backlog(repo: Path, record_file: Path) -> Path:\n    """Write backlog.json with one task that *adds* a failing test."""\n    task = {\n        "id": "task-add-failing-test",\n        "title": "Add failing test to trigger rollback",\n        "type": "micro",\n        "status": "open",\n        "created_at": "2025-06-21T00:00:00Z",\n        "diff": {\n            # New file relative to repo root\n            "file": "tests/test_break.py",\n            "before": "",                 # new file → empty \'before\'\n            "after":  FAILING_TEST,\n        },\n    }\n    backlog_path = repo / "backlog.json"\n    backlog_path.write_text(json.dumps([task], indent=2))\n    record_file.write_text("[]")  # fresh record\n    return backlog_path\n\n\ndef _orch_cfg(repo: Path, backlog: Path, record: Path) -> dict:\n    """Return minimal DevOrchestrator config."""\n    return {\n        "backlog_path": str(backlog),\n        "template_file": None,\n        "src_root": str(repo),\n        "ruleset_file": None,\n        "repo_dir": str(repo),\n        "record_file": str(record),\n    }\n\n\n# --------------------------------------------------------------------------- #\n# The actual test\n# --------------------------------------------------------------------------- #\ndef test_atomic_rollback_on_failed_tests(tmp_path: Path):\n    """\n    Full DevOrchestrator run — must:\n        • fail at test phase,\n        • rollback applied patch,\n        • leave working tree clean.\n    """\n    repo = _init_repo(tmp_path)\n    record_file = repo / "dev_record.json"\n    backlog_file = _make_backlog(repo, record_file)\n\n    # Import *after* stubs are in place\n    from src.cadence.dev.orchestrator import DevOrchestrator\n\n    orch = DevOrchestrator(_orch_cfg(repo, backlog_file, record_file))\n    result = orch.run_task_cycle(select_id="task-add-failing-test", interactive=False)\n\n    # ---- orchestrator result ---------------------------------------------\n    assert result["success"] is False\n    assert result["stage"] == "test"\n\n    # ---- TaskRecord snapshots --------------------------------------------\n    history: List[dict] = json.loads(record_file.read_text())[0]["history"]\n    states = [snap["state"] for snap in history]\n    assert "failed_test" in states, "failure snapshot missing"\n    assert "failed_test_and_rollback" in states, "rollback snapshot missing"\n\n    # ---- Working tree validation -----------------------------------------\n    # 1. The intentionally failing test must be *gone*\n    assert not (repo / "tests" / "test_break.py").exists(), "rollback did not remove new file"\n\n    # 2. Original add() implementation still correct\n    sys.path.insert(0, str(repo / "src"))\n    from cadence.utils.add import add  # noqa: E402  (delayed import)\n\n    assert add(2, 3) == 5\n\n    # 3. Git working tree clean\n    status = subprocess.run(\n        ["git", "status", "--porcelain"],\n        cwd=repo,\n        stdout=subprocess.PIPE,\n        encoding="utf-8",\n        check=True,\n    ).stdout.strip()\n    assert status == "", f"working tree dirty after rollback:\\n{status}"', 'tests/test_state_recording.py': '# tests/test_state_recording.py\n"""\nIntegration test for TaskRecord integrity.\n\nRuns DevOrchestrator.run_task_cycle twice:\n\n1.  A green run where the patch fixes the bug and pytest passes.\n2.  A red run where the patch is a no-op so pytest fails.\n\nFor each run we assert that:\n    • mandatory state snapshots appear *in order* (allowing extra entries);\n    • `task.status` matches the state for *done* → *archived*;\n    • failure snapshots carry useful diagnostics.\n\nThe test is dependency-free: it stubs `openai` and `tabulate` before any\nCadence import so no network or extra wheels are required.\n"""\n\nfrom __future__ import annotations\n\nimport json\nimport subprocess\nimport sys\nfrom datetime import datetime, UTC\nfrom pathlib import Path\nfrom typing import List\n\nimport pytest\n\nPROJECT_ROOT = Path(__file__).resolve().parent.parent\n\n# --------------------------------------------------------------------------- #\n# Global stubs – applied automatically by the autouse fixture\n# --------------------------------------------------------------------------- #\n@pytest.fixture(autouse=True)\ndef _stub_external(monkeypatch):\n    """Stub out optional / external deps so the test runs anywhere."""\n    # Fake OpenAI client (LLM not used by DevOrchestrator path)\n    fake_openai = sys.modules["openai"] = type(sys)("openai")\n    fake_openai.AsyncOpenAI = lambda *a, **k: None  # type: ignore\n    fake_openai.OpenAI = lambda *a, **k: None       # type: ignore\n\n    # Fake tabulate to avoid CLI pretty-printer dependency\n    fake_tabulate = sys.modules["tabulate"] = type(sys)("tabulate")\n    fake_tabulate.tabulate = lambda *a, **k: ""\n\n    # Env var so LLMClient constructor is happy\n    monkeypatch.setenv("OPENAI_API_KEY", "dummy-key")\n\n    # --- critical fix: ensure real Cadence code is importable ---------------\n    # We need the directory that CONTAINS the top-level “src/” package.\n    if (PROJECT_ROOT / "src").exists():\n        monkeypatch.syspath_prepend(str(PROJECT_ROOT))\n    # ----------------------------------------------------------------------- #\n\n    yield\n\n\n# --------------------------------------------------------------------------- #\n# Repo bootstrap helpers\n# --------------------------------------------------------------------------- #\nBAD_IMPL = "def add(x, y):\\n    return x - 1 + y\\n"\nGOOD_IMPL = BAD_IMPL.replace("- 1 +", "+")\n\n\ndef _init_repo(tmp_path: Path) -> Path:\n    """Create a minimal Cadence project inside a temporary git repo."""\n    repo = tmp_path\n\n    # Source package\n    pkg_root = repo / "src" / "cadence" / "utils"\n    pkg_root.mkdir(parents=True, exist_ok=True)\n    # PEP-420 implicit namespace would work, but an explicit file removes\n    # any ambiguity on Py<3.10 or odd tooling.\n    (repo / "src" / "__init__.py").write_text("")\n    (repo / "src" / "cadence" / "__init__.py").write_text("")\n    (pkg_root / "__init__.py").write_text("")\n    (pkg_root / "add.py").write_text(BAD_IMPL)\n\n    # Unit test that will pass only if GOOD_IMPL is in place\n    tests_dir = repo / "tests"\n    tests_dir.mkdir()\n    (tests_dir / "test_add.py").write_text(\n        # Ensure repo/src is on import path _inside_ the pytest subprocess\n        "import sys, pathlib, os\\n"\n        "sys.path.insert(0, os.fspath((pathlib.Path(__file__).resolve().parents[2] / \'src\')))\\n"\n        "from cadence.utils.add import add\\n"\n        "\\n"\n        "def test_add():\\n"\n        "    assert add(2, 3) == 5\\n"\n    )\n\n    # Initial git commit so `git apply` has a base tree\n    subprocess.run(["git", "init"], cwd=repo, check=True, stdout=subprocess.DEVNULL)\n    subprocess.run(["git", "config", "user.email", "ci@example.com"], cwd=repo, check=True)\n    subprocess.run(["git", "config", "user.name", "CI"], cwd=repo, check=True)\n    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)\n    subprocess.run(["git", "commit", "-m", "init"], cwd=repo, check=True, stdout=subprocess.DEVNULL)\n\n    return repo\n\n\ndef _make_backlog(repo: Path, record_file: Path, *, fix_bug: bool) -> Path:\n    """Write backlog.json containing exactly one task and return the path."""\n    # For the “red” path we still need a *non-empty* diff so the run\n    # proceeds through patch-apply and into pytest (where it will fail).\n    # - Green run: after_code fixes the defect.\n    # - Red  run: after_code == before_code  → diff is empty → PatchBuildError.\n    after_code = GOOD_IMPL if fix_bug else BAD_IMPL\n    task = {\n        "id": "task-fix-add",\n        "title": "Fix utils.add bug",\n        "type": "micro",\n        "status": "open",\n        "created_at": datetime.now(UTC).isoformat(),\n        "diff": {\n            "file": "src/cadence/utils/add.py",\n            "before": BAD_IMPL,\n            "after":  after_code,\n        },\n    }\n    backlog = repo / "backlog.json"\n    backlog.write_text(json.dumps([task], indent=2))\n    record_file.write_text("[]")   # empty initial record\n    return backlog\n\n\ndef _orch_cfg(repo: Path, backlog: Path, record: Path) -> dict:\n    """Return the minimal DevOrchestrator config dict."""\n    return {\n        "backlog_path": str(backlog),\n        "template_file": None,\n        "src_root": str(repo),\n        "ruleset_file": None,\n        "repo_dir": str(repo),\n        "record_file": str(record),\n    }\n\n\n# --------------------------------------------------------------------------- #\n# Parametrised integration test\n# --------------------------------------------------------------------------- #\n@pytest.mark.parametrize("fix_bug", [True, False])\ndef test_task_record_snapshots(tmp_path: Path, fix_bug: bool):\n    """\n    Ensure TaskRecord snapshots are written after every mutator or failure.\n    """\n    repo = _init_repo(tmp_path)\n    record_file = repo / "dev_record.json"\n    backlog_file = _make_backlog(repo, record_file, fix_bug=fix_bug)\n\n    from src.cadence.dev.orchestrator import DevOrchestrator\n\n    orch = DevOrchestrator(_orch_cfg(repo, backlog_file, record_file))\n    result = orch.run_task_cycle(select_id="task-fix-add", interactive=False)\n\n    # ----------------- Inspect TaskRecord ----------------- #\n    record: List[dict] = json.loads(record_file.read_text())\n    assert len(record) == 1, "exactly one task record expected"\n    history = record[0]["history"]\n    states = [snap["state"] for snap in history]\n\n    common = [\n        "build_patch",\n        "patch_built",\n        "patch_reviewed",\n        "patch_applied",\n        "pytest_run",\n    ]\n    if fix_bug:\n        expected_seq = common + ["committed", "status_done", "archived"]\n\n        # Confirm green-path sequence\n        it = iter(states)\n        for label in expected_seq:\n            assert label in it, f"missing or out-of-order state \'{label}\'"\n    else:\n        # Red path: must terminate with some `failed_…` snapshot\n        assert not result["success"], "red run unexpectedly succeeded"\n        assert states[-1].startswith("failed_"), "last snapshot must be a failure state"\n        # And we still expect the initial \'build_patch\' snapshot\n        assert states[0] == "build_patch"\n\n    # Semantic checks on snapshot contents\n    if fix_bug:\n        done_ix, arch_ix = states.index("status_done"), states.index("archived")\n        assert history[done_ix]["task"]["status"] == "done"\n        assert history[arch_ix]["task"]["status"] == "archived"\n    else:\n        extra = history[-1]["extra"]\n        assert extra, "failure snapshot must include diagnostics"\n        assert "error" in extra or "pytest" in extra'}
=================
Backlog:
{'MVP_CRITICAL': {'EPICS': [{'id': 'E-MVP-1', 'title': 'Agentic Core Loop Proof', 'why': 'Demonstrate an LLM-driven red→green patch cycle with zero human steps.', 'stories': ['S-MVP-1.1', 'S-MVP-1.2', 'S-MVP-1.3', 'S-MVP-1.4']}, {'id': 'E-MVP-2', 'title': 'Risk Boundary & Circuit Breaker', 'why': 'Guarantee safety while agents learn (no shell, no disk writes).', 'stories': ['S-MVP-2.1']}, {'id': 'E-DEV-1', 'title': 'VS Code Sidekick Extension', 'why': 'First-class developer UX; chat, code-aware actions, live dashboards.', 'stories': ['S-DEV-1.0', 'S-DEV-1.1', 'S-DEV-1.2', 'S-DEV-1.3', 'S-DEV-1.4']}], 'STORIES': [{'id': 'S-MVP-1.1', 'epic': 'E-MVP-1', 'title': 'Seed Toy Defect & Failing Test', 'owner': 'HYBRID', 'tasks': ['T-MVP-1.1.1', 'T-MVP-1.1.2']}, {'id': 'S-MVP-1.2', 'epic': 'E-MVP-1', 'title': 'In-Process Orchestrator Skeleton', 'owner': 'HYBRID', 'tasks': ['T-MVP-1.2.1', 'T-MVP-1.2.2']}, {'id': 'S-MVP-1.3', 'epic': 'E-MVP-1', 'title': 'LLM Patch Generator v0', 'owner': 'AGENT', 'tasks': ['T-MVP-1.3.1']}, {'id': 'S-MVP-1.4', 'epic': 'E-MVP-1', 'title': 'Autonomous Retry Loop & Success Detect', 'owner': 'AGENT', 'tasks': ['T-MVP-1.4.1', 'T-MVP-1.4.2']}, {'id': 'S-MVP-2.1', 'epic': 'E-MVP-2', 'title': 'Circuit Breaker & Rollback Policy', 'owner': 'HYBRID', 'tasks': ['T-MVP-2.1.1', 'T-MVP-2.1.2']}, {'id': 'S-DEV-1.0', 'epic': 'E-DEV-1', 'title': 'Phase-0 Hello-World Chat Webview', 'owner': 'HYBRID', 'tasks': ['T-DEV-1.0.1', 'T-DEV-1.0.2', 'T-DEV-1.0.3']}, {'id': 'S-DEV-1.1', 'epic': 'E-DEV-1', 'title': 'Phase-1 CodeLens & Selection Commands', 'owner': 'HYBRID', 'tasks': ['T-DEV-1.1.1', 'T-DEV-1.1.2']}, {'id': 'S-DEV-1.2', 'epic': 'E-DEV-1', 'title': 'Phase-2 Backlog Tree View & Create-Task', 'owner': 'HYBRID', 'tasks': ['T-DEV-1.2.1', 'T-DEV-1.2.2']}, {'id': 'S-DEV-1.3', 'epic': 'E-DEV-1', 'title': 'Phase-3 Patch Generation & Review Decorations', 'owner': 'HYBRID', 'tasks': ['T-DEV-1.3.1', 'T-DEV-1.3.2']}, {'id': 'S-DEV-1.4', 'epic': 'E-DEV-1', 'title': 'Phase-4 Orchestrator Dashboard Webview', 'owner': 'HYBRID', 'tasks': ['T-DEV-1.4.1']}], 'TASKS': [{'id': 'T-MVP-1.1.1', 'story': 'S-MVP-1.1', 'desc': 'Create utils/add.py that incorrectly returns (x - 1 + y).', 'autonomous': False, 'deps': []}, {'id': 'T-MVP-1.1.2', 'story': 'S-MVP-1.1', 'desc': 'Add tests/test_add.py expecting add(2,3)==5 (should FAIL).', 'autonomous': False, 'deps': ['T-MVP-1.1.1']}, {'id': 'T-MVP-1.2.1', 'story': 'S-MVP-1.2', 'desc': 'Build mvp_loop.py that runs pytest, calls TaskAgent, applies diff in-memory.', 'autonomous': True, 'deps': []}, {'id': 'T-MVP-1.2.2', 'story': 'S-MVP-1.2', 'desc': 'Ensure mvp_loop has zero disk/subprocess side-effects; print JSON summary.', 'autonomous': True, 'deps': ['T-MVP-1.2.1']}, {'id': 'T-MVP-1.3.1', 'story': 'S-MVP-1.3', 'desc': 'Implement TaskAgent.prompt() few-shot + TaskExecutor.build_patch.', 'autonomous': True, 'deps': ['T-MVP-1.2.2']}, {'id': 'T-MVP-1.4.1', 'story': 'S-MVP-1.4', 'desc': 'Retry loop up to 3; send previous diff & error back to TaskAgent.', 'autonomous': True, 'deps': ['T-MVP-1.3.1']}, {'id': 'T-MVP-1.4.2', 'story': 'S-MVP-1.4', 'desc': 'Exit SUCCESS on first green; else exit(1) after third failure.', 'autonomous': True, 'deps': ['T-MVP-1.4.1']}, {'id': 'T-MVP-2.1.1', 'story': 'S-MVP-2.1', 'desc': 'Wrap pytest invocation in try/except; revert in-memory patch on crash.', 'autonomous': True, 'deps': ['T-MVP-1.2.2']}, {'id': 'T-MVP-2.1.2', 'story': 'S-MVP-2.1', 'desc': 'Abort loop & require human after 3 consecutive failed attempts.', 'autonomous': True, 'deps': ['T-MVP-2.1.1']}, {'id': 'T-DEV-1.0.1', 'story': 'S-DEV-1.0', 'desc': 'Scaffold VS Code extension with `yo code`. Register command: Sidekick:Open Chat.', 'autonomous': False, 'deps': []}, {'id': 'T-DEV-1.0.2', 'story': 'S-DEV-1.0', 'desc': 'Implement WebSocket JSON-RPC bridge `tools/ext_bridge.py` exposing /chat endpoint.', 'autonomous': True, 'deps': ['T-DEV-1.0.1']}, {'id': 'T-DEV-1.0.3', 'story': 'S-DEV-1.0', 'desc': 'Create Chat Webview that streams messages; status-bar ‘Sidekick • o3’.', 'autonomous': False, 'deps': ['T-DEV-1.0.2']}, {'id': 'T-DEV-1.1.1', 'story': 'S-DEV-1.1', 'desc': 'Add CodeLens ‘Ask Sidekick’ at top of .py files; send file text + cursor pos.', 'autonomous': False, 'deps': ['T-DEV-1.0.3', 'T-MVP-1.2.2']}, {'id': 'T-DEV-1.1.2', 'story': 'S-DEV-1.1', 'desc': 'Register selection context-menu commands: Explain / Refactor / Create Test.', 'autonomous': False, 'deps': ['T-DEV-1.1.1']}, {'id': 'T-DEV-1.2.1', 'story': 'S-DEV-1.2', 'desc': 'Implement Backlog TreeDataProvider reading via BacklogManager JSON.', 'autonomous': False, 'deps': ['T-DEV-1.0.3']}, {'id': 'T-DEV-1.2.2', 'story': 'S-DEV-1.2', 'desc': 'Command ‘Create micro-task from chat’ → append to backlog file.', 'autonomous': False, 'deps': ['T-DEV-1.2.1']}, {'id': 'T-DEV-1.3.1', 'story': 'S-DEV-1.3', 'desc': 'Generate-Patch command → invoke TaskExecutor.build_patch, open diff view.', 'autonomous': False, 'deps': ['T-MVP-1.3.1']}, {'id': 'T-DEV-1.3.2', 'story': 'S-DEV-1.3', 'desc': 'After TaskReviewer.review_patch decorate hunks with pass/fail gutter icons.', 'autonomous': False, 'deps': ['T-DEV-1.3.1']}, {'id': 'T-DEV-1.4.1', 'story': 'S-DEV-1.4', 'desc': 'Webview ‘Dev Dashboard’—show phase, task ID, commit SHA; ‘Run Full Cycle’ button.', 'autonomous': False, 'deps': ['T-MVP-1.4.2', 'T-DEV-1.3.2']}]}, 'POST_MVP': {'blocked_until': 'E-MVP-1:DONE', 'epics_placeholder': ['E-SCALE-1  Strict Typing & CI', 'E-SCALE-2  Immutable Logging', 'E-SCALE-3  ShellRunner Hardening', 'E-SCALE-4  Observability & Metrics', 'E-SCALE-5  Multi-file Diffs', 'E-SCALE-6  Secrets & Security']}}

=================================================================

I need your help with optimizing/automating the task creation. Here is my manual step (w/ agent):

## AGENT
### System Prompt Creation:
---
with open("agent_context/code.json", "r") as f:
    codebase = json.load(f)
with open("agent_context/docs.json", "r") as f:
    docs = json.load(f)
with open("agent_context/module_contexts.json", "r") as f:
    contexts = json.load(f)
with open("backlog/backlog.json", "r") as f:
    backlog = json.load(f)

system_prompt = f"""
You are an AI-enhanced co-developer and co-designer along with your human counterpart.  Your role is to optimize the project by mining the best, most creative ideas from him and leveraging each of them into a high-quality, high-powered, autonomous, multi-agent development platform. You balance ambitious ideas with practical progress.
You have access to the docs, a JSON data structure that includes context on each module, the codebase itself, and a backlog of task to-dos.
=================
Project Docs:
{docs}
=================
Module Contexts (agent-related):
{contexts}
=================
Actual Code (agent-related):
{codebase}
=================
Backlog:
{backlog}
"""
---
## PHASE 1: WORKFLOW IDENTIFICATOIN
1. 1. After seeding with the system prompt, I asked for a highly detailed, line-by-line description of the workflow. The instructions were to also flag and identify the weakest points.
EXAMPLE OUTPUT (1.1)
Below is an ultra-granular, source-driven trace of what actually happens to a single task, beginning with the raw backlog JSON file on disk and ending (when everything goes well) with an archived, completed task.  Every statement that is executed in the happy-path is accounted for.  Whenever the static code reveals an ambiguity, contradiction, or outright hole, it is called out in “GAP ⟶ …” blocks.

Lines are referenced with the exact file / function / line-number (≈) from the current repository you provided (src/…).  Indentation shows call-stack depth.

────────────────────────────────────────────────────────────────────────────
0.  THE BACKLOG FILE ON DISK
────────────────────────────────────────────────────────────────────────────
• Path expected by DevOrchestrator CONFIG = "dev_backlog.json"
(src/cadence/dev/orchestrator.py ≈L34; command-center uses same default)

GAP ⟶ Docs & user prompt mention “./backlog/backlog.json”.
Code never looks at that path.  Unless the caller’s CONFIG overrides
backlog_path, the file must be exactly dev_backlog.json.

Assume therefore dev_backlog.json exists and contains valid UTF-8 JSON.

────────────────────────────────────────────────────────────────────────────

1.  DevOrchestrator object is created ──────────────────────────────────────────────────────────────────────────── 
call DevOrchestrator.init(config) (orchestrator.py ≈L20)
1.1  BacklogManager instantiation
• BacklogManager.init(backlog_path)  (backlog.py ≈L31)

 1.1.1 BacklogManager.load()             (backlog.py ≈L58)
        ├─ opens file `dev_backlog.json`
        ├─ json.load → Python list[dict]  (ValueError if not list)
        ├─ for each task → _normalize_task(task)  (≈L73)
            • fills defaults, generates UUID if missing
            • DOES NOT deduplicate IDs at load time
        └─ self._items now holds in-memory backlog

 1.1.2 returns control to DevOrchestrator.__init__

1.2  TaskGenerator / TaskExecutor / TaskReviewer / ShellRunner / TaskRecord
are all constructed but do nothing IO-wise yet
(orchestrator.py ≈L23–28).
– TaskGenerator loads template only if template_file supplied.
– TaskRecord loads or creates dev_record.json (record.py ≈L26).

────────────────────────────────────────────────────────────────────────────
2.  One run of run_task_cycle()  (orchestrator.py ≈L47)
────────────────────────────────────────────────────────────────────────────
Assume caller invokes
orch.run_task_cycle(select_id=None, interactive=False)
to let the orchestrator pick the first open task automatically.

2.1  SELECT TASK                                         (≈L65-84)
• open_tasks = BacklogManager.list_items("open")    (backlog.py ≈L38)
• Task chosen = open_tasks[0]
• Prints shortlist (for CLI only)
• No state change yet

 GAP ⟶ No lock around backlog; concurrent writers could reorder tasks.
2.2  RECORD: “build_patch”                               (record.py ≈L19)
TaskRecord.save(task, state="build_patch")
• _find_or_create_record creates new entry keyed by task["id"]
• Snapshots the current task dict

2.3  BUILD PATCH                                         (executor.py ≈L19-56)
patch = TaskExecutor.build_patch(task)

 Detailed internal sequence:
    a) diff_info = task.get("diff")          ↘ if None → PatchBuildError
    b) Extract 'file', 'before', 'after'
    c) Ensure newline at EOF for valid diff
    d) difflib.unified_diff → List[str]
    e) join → `patch`
    f) if `not patch.strip()` → PatchBuildError
 Returns patch str (unapplied)

 GAP ⟶ If the task came from backlog with no "diff" key, the call throws
        and the entire cycle aborts; there is no earlier validation.
2.4  RECORD: “patch_built”                               (record.py again)
extra={"patch": patch}

2.5  REVIEW PATCH                                        (reviewer.py ≈L15-66)
review = TaskReviewer.review_patch(patch, context=task)

 Rule pipeline (default):
    • not_empty_rule
    • startswith_rule
    • contains_todo_rule
    • size_limit_rule
 Returns dict {"pass": bool, "comments": str}

 If review["pass"] is False:
    – RECORD already done
    – run_task_cycle returns {"success": False, "stage": "patch_review"}

 GAP ⟶ No facility to auto-refine or request changes; cycle just ends.
2.6  APPLY PATCH (only if review pass)                   (shell.py ≈L17-41)
ShellRunner.git_apply(patch)
a) writes patch to tmpfile
b) subprocess.run(["git", "apply", tmpfile])
c) if returncode != 0 → ShellCommandError

 RECORD: “patch_applied” (no extra payload)

 GAP ⟶ If git apply fails *after* review pass there is no catch block;
        exception escapes → run_task_cycle except → prints “[X] Cycle failed”
        but state “patch_apply_failed” is never saved.
2.7  RUN PYTEST                                          (shell.py ≈L43-75)
test_result = run_pytest()
a) default path = "./tests" (or repo root param)
b) subprocess.run(["pytest", "-q", path])
c) returns {"success": bool, "output": str}

 RECORD: “pytest_run”, extra={"pytest": test_result}

 If tests fail:
    – run_task_cycle returns {"success": False, "stage": "test"}

 GAP ⟶ No automatic reversion of applied patch on failure.  
        Working tree is now dirty.
2.8  COMMIT                                              (shell.py ≈L77-115)
sha = git_commit(commit_msg)
a) git add -A
b) git commit -m msg
c) git rev-parse HEAD → SHA str

 RECORD: “committed”, extra={"commit_sha": sha}

 GAP ⟶ If git commit finds “nothing to commit”, ShellCommandError is raised
        and run_task_cycle aborts.  The patch was applied & tests passed
        but the record never stores that failure reason.
2.9  BACKLOG STATUS UPDATE                               (backlog.py ≈L100)
BacklogManager.update_item(task_id, {"status": "done"})
BacklogManager.archive_completed()                  (≈L46-56)

 • archive_completed loops over in-memory items,
   mutating those with status "done" → "archived", then save()

 SAVE writes full backlog JSON via tmp-swap

 GAP ⟶ The `task` variable captured earlier is **not** mutated with the
        new status values; yet it is used again in the final record step.
        Therefore TaskRecord.history shows stale `"status": "open"`.
2.10 RECORD: “archived”  (record.py save again)
extra={} – but task snapshot is stale (see previous GAP)

  Cycle returns {"success": True, "commit": sha, "task_id": …}
────────────────────────────────────────────────────────────────────────────
3.  OFF-HAPPY-PATH & MISSING PIECES
────────────────────────────────────────────────────────────────────────────
Below are all discoverable holes, each traceable to concrete lines.

1. Backlog path mismatch (docs vs code) (see §0)
2. No generator→backlog link: TaskGenerator never invoked to repopulate an empty backlog. run_task_cycle just throws. (orchestrator.py ≈L59)
3. No concurrency protection on BacklogManager; race conditions possible in multi-thread / multi-process orchestration. (backlog.py)
4. Record states are partially silent on failure branches: • git_apply failure • git_commit “nothing to commit” • Unexpected exceptions in shell or executor
5. Patch rollback logic absent; failed tests leave working tree modified and inconsistent with commit history.
6. Task snapshot drift: final “archived” history entry does not reflect ‘status’: 'archived'. (orchestrator.py ≈L114)
7. Meta-agent, EfficiencyAgent second review, and strict phase table (docs/DEV_PROCESS.md) are declared but never invoked.
8. No top-level guard that prevents skipping phases; a caller could call ShellRunner.git_commit directly and bypass review.
9. TaskReviewer ruleset loading silently ignores unknown rule types instead of alerting; potential policy slip. (reviewer.py ≈L70-91)
10. TaskExecutor does not verify that diff ‘before’ truly matches file on disk; stale diff can apply hunks out-of-context.
────────────────────────────────────────────────────────────────────────────
4.  RECOMMENDED NEXT BRIDGES (minimal code touches)
────────────────────────────────────────────────────────────────────────────
• Add BacklogManager.lock (threading.RLock) around all mutators.
• On every failure branch in run_task_cycle, call
record.save(task, state="failed_<stage>", extra={...}).
• After git_apply but before tests, compute and save a reverse diff
so orchestrator can automatically rollback on failure.
• Mutate local task variable after backlog.update_item so archive
snapshot is truthful.
• Provide DevOrchestrator.ensure_backlog() that calls
TaskGenerator.generate_tasks when backlog empty.
• Wire second review phase (EfficiencyAgent) to honour docs/DEV_PROCESS.md.
• Expose a decorator or context manager to enforce phase sequencing
and prevent direct access to ShellRunner in user code.

That is the complete, code-verifiable journey from JSON backlog entry to archived task, with every identifiable gap surfaced.
END OF EXAMPLE OUTPUT (1.1)
1. 2. I seed a second agent with the same system prompt. I pass in the workflow description above and require:
- An extremely detailed list of all of the weak points, ranked by highest priority and highest leverage solutions. The instructions tell the agent to respond in a way that gives optimal instructions to LLM agents.
EXAMPLE OUTPUT (1.2)
Certainly. Here is an *expanded*, implementation-focused description of each fix, phrased for maximal clarity and precision for downstream agentic execution.

**1. Mandate Truthful, Up-to-Date Task State Recording After Every Mutator or Failure**

**Intent:**

Every time the system mutates a task (status, patch, review, commit, archive, failure, etc.), or encounters any failure/exception, the corresponding TaskRecord *must* be updated with a fresh snapshot of the current task state, including all relevant context and explicit failure reason.

**Instruction for Agent:**

- After *every* successful mutator (e.g., BacklogManager.update_item, patch built, patch applied, commit, archive), immediately call TaskRecord.save(task, state=<stage>, extra={...}) with the up-to-date in-memory task(after mutation).
- On *every* failure branch or caught exception (patch build, patch apply, test failure, commit failure, etc.), capture the error details and also call TaskRecord.save(task, state="failed_<stage>", extra={"error": <exception details>}).
- *Ensure*: No possible exit from run_task_cycle (success or failure) leaves the latest record missing, stale, or ambiguous.
- *Example*: After archiving, update the local task dict to reflect "status": "archived" before saving the final record.

**2. Implement Atomic Rollback (Reverse Diff) on Any Failed Patch/Test**

**Intent:**

If a patch is applied but tests fail (or another downstream error occurs), the working tree must be *automatically* reverted to its exact previous state, leaving no partial or dirty modifications.

**Instruction for Agent:**

- Immediately after a successful git apply, but before running tests, generate and store a **reverse diff** (patch that undoes the applied change). Use difflib or git diff on the modified file(s) to capture the pre-apply state.
- If tests fail (or any failure occurs post-apply but pre-commit), automatically apply the reverse diff using git applyor the same patching mechanism.
- Record this rollback action in TaskRecord with state "patch_rollback" (or "failed_test_and_rollback"), including any error output from the rollback attempt.
- If rollback itself fails, escalate as a critical error and capture full context for human intervention.

**3. Add Robust Locking to All Backlog/Record Mutators**

**Intent:**

Prevent race conditions and ensure data integrity in any environment where multiple agents/threads/processes might mutate backlog or record files concurrently.

**Instruction for Agent:**

- Add a threading.RLock (reentrant lock) as an attribute on both BacklogManager and TaskRecord.
- **Wrap all methods** that mutate state or perform disk writes (add, update, archive, save, etc.) in with self.lock:blocks.
- Ensure *all* external and internal accesses to mutating methods are guarded by this lock, and that the lock is always acquired at the highest entry point.
- Confirm that nested lock acquisition (e.g., when archive_completed calls save) does not deadlock (hence RLock, not Lock).
- Extend tests to validate concurrent agent runs do not corrupt or reorder backlog/record state.

**4. Wrap All Failure-Prone Shell Operations and Persist Explicit Failure States**

**Intent:**

Every shell command (git apply, git commit, pytest, etc.) should be fully wrapped with try/except. If any exception or nonzero exit occurs, the failure must be recorded *with explicit state and error details*—never just printed or thrown.

**Instruction for Agent:**

- Refactor all shell-interacting methods (in ShellRunner and elsewhere) to wrap calls in try/except blocks that:
    - Catch subprocess.CalledProcessError, ShellCommandError, and any other relevant exceptions.
    - On exception, call TaskRecord.save(task, state="failed_<stage>", extra={"error": <exception info>, "output": <stdout/stderr if available>}) before propagating, retrying, or escalating.
    - Ensure that *no shell failure* (including no-op commits) can orphan or silently drop task progress.
- Where appropriate, include the full shell output and command in the failure record for forensics.

**5. Enforce Diff Pre-Checks and Phase Ordering via Decorators or Context Managers**

**Intent:**

Guarantee that agentic workflow phases cannot be skipped, and that all diffs/patches applied to the codebase are validated *before* application. Prevent direct or accidental bypassing of review, test, or commit phases.

**Instruction for Agent:**

- Implement a **decorator** (e.g., @enforce_phase) or **context manager** that:
    - Wraps all public methods on ShellRunner and/or orchestrator that represent critical phase transitions (apply, test, commit, archive).
    - Checks at runtime that all required prior phases have been completed for the current task; aborts/raises if not.
    - For patch application, add a pre-check that verifies the "before" content of the diff exactly matches the current file on disk, failing with an explicit, user-facing error if not.
    - This decorator/context manager should be applied at all points where phase boundaries are crossed, and should be extensible for future phases or agents.
- Document all enforced invariants (e.g., “cannot commit unless review passed and patch was applied and tested”).

**Note:**

Each fix above is **atomic** and can be implemented, tested, and validated independently, but together they close the highest-risk gaps in the system as currently architected.

Downstream agents should submit all code changes as patch/diff files with precise commit messages referencing the relevant fix and the gap it addresses.
END OF EXAMPLE OUTPUT (1.2)
1.3 I seed another LLM agent with the same system prompt. I pass in the individual tasks one-at-a-time--separate agents for each task. I instruct the agents to be extremely thorough and extremely detailed in their responses. They are to respond with diffs directly or the full and complete code for any new modules or edited modules. (full module is easier for me (human); I suspect diff will be more straightforward for agents).
I apply the diff or update the modules.
1.4 I then seed another agent with system prompt, task instructoins, task solution. I ask it to design an optimal tests/test_file.py to run a pytest -q.
I iterate back and forth with this final agent until we are able to successfully get all passes for pytest -q.
---
After a successful pytest -q, I move on to the next task, considering the previous task complete.
==================

YOUR TASK:
Deeply review and reflect on my process. Is it in alignment with the rest of my project? Considering the pros and cons, construct an optimal workflow for identifying and generating task instructions so that they can become increasingly automated by the multi-agent system.""")

[2025-06-20 16:42:47,168] INFO LLM sync call: model=o3-2025-04-16, msgs_len=4


You ran the tests in their initial “red” state — perfect.  
Next step is to let Cadence turn that red test green for you.  
Pick ONE of the two options below:

────────────────────────────────────────
OPTION A  (use the full Cadence loop)
────────────────────────────────────────
1.  Create the backlog file that tells Cadence what to fix

    cat > dev_backlog.json <<'EOF'
    [
      {
        "id": "bug-fix-add-001",
        "title": "Fix utils.add.add() so that add(2,3)==5",
        "type": "micro",
        "status": "open",
        "created_at": "2025-06-21T00:00:00Z",
        "diff": {
          "file": "cadence/utils/add.py",
          "before": "def add(x: int, y: int) -> int:\\n    return x - 1 + y\\n",
          "after":  "def add(x: int, y: int) -> int:\\n    return x + y\\n"
        }
      }
    ]
    EOF

2.  (If you don’t already have it) save a quick config file:

    cat > dev_config.json <<'EOF'
    {
      "backlog_path": "dev_backlog.json",
      "template_file": "dev_templates.json",
      "src_root": "cadence",
      "ruleset_file": null,
      "repo_dir": ".",
      "record_file": "dev_record.json"
    }
    EOF

3.  Run the orchestrator once:

    python - <<'PY'
    import json, pprint
    from cadence.dev.orchestrator import DevOrchestrator
    orch = DevOrchestrator(json.load(open("dev_config.json")))
    summary = orch.run_task_cycle(interactive=False)
    pprint.pprint(summary)
    PY

If everything is wired correctly you will see the patch built, reviewed, applied, tests passing, and a git commit.  
Afterwards:

    pytest -q     # should now return OK

────────────────────────────────────────
OPTION B  (just run the quick demo helper)
────────────────────────────────────────

    python cadence/utils/mvp_loop.py

That script builds the same patch, applies it through ShellRunner, runs
pytest, and commits if green.

────────────────────────────────────────
If you hit an error
────────────────────────────────────────
• “Backlog empty” → did you create dev_backlog.json in the repo root?  
• “git apply failed: patch does not apply” → working tree must be clean (`git status` to check).  
• Still stuck? Paste the full stack-trace and I’ll help debug.

Once the orchestrator (or the helper script) succeeds, re-run your tests:

    pytest -q

You should get a clean green run (`.` or `2 passed` if you add more tests).

In [27]:
print(agent.messages[-1]['content'])

You ran the tests in their initial “red” state — perfect.  
Next step is to let Cadence turn that red test green for you.  
Pick ONE of the two options below:

────────────────────────────────────────
OPTION A  (use the full Cadence loop)
────────────────────────────────────────
1.  Create the backlog file that tells Cadence what to fix

    cat > dev_backlog.json <<'EOF'
    [
      {
        "id": "bug-fix-add-001",
        "title": "Fix utils.add.add() so that add(2,3)==5",
        "type": "micro",
        "status": "open",
        "created_at": "2025-06-21T00:00:00Z",
        "diff": {
          "file": "cadence/utils/add.py",
          "before": "def add(x: int, y: int) -> int:\\n    return x - 1 + y\\n",
          "after":  "def add(x: int, y: int) -> int:\\n    return x + y\\n"
        }
      }
    ]
    EOF

2.  (If you don’t already have it) save a quick config file:

    cat > dev_config.json <<'EOF'
    {
      "backlog_path": "dev_backlog.json",
      "template_file": "dev

In [28]:
chat("""I'm confused about where to place everything. My code is laid out like this:\n\n\
    cadence
    - src
    - - cadence
    - - - utils
    - - - - add.py
    - - - - mvp_loop.py
    - tests
    - - - test_add.py
    - docs
    - agent_context
    
    From the root, I ran "python -m pip install -e ."

    Where should I store these two files?

    dev_backlog = [
    {
        "id": "bug-fix-add-001",
        "title": "Fix utils.add.add() so that add(2,3)==5",
        "type": "micro",
        "status": "open",
        "created_at": "2025-06-21T00:00:00Z",
        "diff": {
            "file": "cadence/utils/add.py",
            "before": "def add(x: int, y: int) -> int:\\n    return x - 1 + y\\n",
            "after":  "def add(x: int, y: int) -> int:\\n    return x + y\\n"
        }
    }
]

dev_config = {
    "backlog_path": "dev_backlog.json",
    "template_file": "dev_templates.json",
    "src_root": "cadence",
    "ruleset_file": None,
    "repo_dir": ".",
    "record_file": "dev_record.json"
}

    """)

[2025-06-20 16:46:51,309] INFO LLM sync call: model=o3-2025-04-16, msgs_len=6


Think of those two Python literals as **data files** that Cadence will read at run-time.  
They belong in the *project-root*—the directory that already contains  src/ , tests/ , docs/ …

Your tree should end up like this (abridged):

```
cadence-repo-root/
├─ src/
│  └─ cadence/
│     └─ utils/
│        ├─ add.py
│        └─ mvp_loop.py
├─ tests/
│  └─ test_add.py
├─ docs/
├─ agent_context/
├─ dev_backlog.json        ← new
└─ dev_config.json         ← new
```

Below are copy-pastable commands that create the two files with the **correct
paths for a src-layout project**.

--------------------------------------------------------------------
1.  dev_backlog.json
--------------------------------------------------------------------
The patch will be applied to  src/cadence/utils/add.py , so keep *file*
set to "cadence/utils/add.py" and point  src_root  (next file) to “src”.

```bash
cat > dev_backlog.json <<'EOF'
[
  {
    "id": "bug-fix-add-001",
    "title": "Fix utils.add.add() so that add(2,3)==5",
    "type": "micro",
    "status": "open",
    "created_at": "2025-06-21T00:00:00Z",
    "diff": {
      "file": "cadence/utils/add.py",
      "before": "def add(x: int, y: int) -> int:\\n    return x - 1 + y\\n",
      "after":  "def add(x: int, y: int) -> int:\\n    return x + y\\n"
    }
  }
]
EOF
```

--------------------------------------------------------------------
2.  dev_config.json
--------------------------------------------------------------------
Note that  src_root  must be  "src"  (because your code lives in  src/cadence/ ).

```bash
cat > dev_config.json <<'EOF'
{
  "backlog_path": "dev_backlog.json",
  "template_file": "dev_templates.json",
  "src_root": "src",              // <<— important for src/ layout
  "ruleset_file": null,
  "repo_dir": ".",
  "record_file": "dev_record.json"
}
EOF
```

--------------------------------------------------------------------
3.  Run the orchestrator once
--------------------------------------------------------------------
```bash
python - <<'PY'
import json, pprint
from cadence.dev.orchestrator import DevOrchestrator

cfg = json.load(open("dev_config.json"))
orch = DevOrchestrator(cfg)
summary = orch.run_task_cycle(interactive=False)
pprint.pprint(summary)
PY
```

If everything is wired correctly you should see:

• Patch built → reviewed → applied  
• Pytest turns green  
• Git commit created  
• Backlog item archived  
• dev_record.json written

Finally verify:

```bash
pytest -q        # now passes
git log -1       # shows the Cadence commit
```

That’s it—both files simply live at the repo root.  
If you prefer a dedicated folder (e.g.  .cadence/ ), feel free—just adjust the paths in dev_config.json accordingly.

In [37]:
chat("""
This is the current ./backlog/backlog.json file:
{
  "MVP_CRITICAL": {
    "EPICS": [
      {
        "id": "E-MVP-1",
        "title": "Agentic Core Loop Proof",
        "why": "Demonstrate an LLM-driven red→green patch cycle with zero human steps.",
        "stories": ["S-MVP-1.1", "S-MVP-1.2", "S-MVP-1.3", "S-MVP-1.4"]
      },
      {
        "id": "E-MVP-2",
        "title": "Risk Boundary & Circuit Breaker",
        "why": "Guarantee safety while agents learn (no shell, no disk writes).",
        "stories": ["S-MVP-2.1"]
      },
      {
        "id": "E-DEV-1",
        "title": "VS Code Sidekick Extension",
        "why": "First-class developer UX; chat, code-aware actions, live dashboards.",
        "stories": [
          "S-DEV-1.0",
          "S-DEV-1.1",
          "S-DEV-1.2",
          "S-DEV-1.3",
          "S-DEV-1.4"
        ]
      }
    ],

    "STORIES": [
      { "id": "S-MVP-1.1", "epic": "E-MVP-1", "title": "Seed Toy Defect & Failing Test",
        "owner": "HYBRID", "tasks": ["T-MVP-1.1.1", "T-MVP-1.1.2"] },
      { "id": "S-MVP-1.2", "epic": "E-MVP-1", "title": "In-Process Orchestrator Skeleton",
        "owner": "HYBRID", "tasks": ["T-MVP-1.2.1", "T-MVP-1.2.2"] },
      { "id": "S-MVP-1.3", "epic": "E-MVP-1", "title": "LLM Patch Generator v0",
        "owner": "AGENT",  "tasks": ["T-MVP-1.3.1"] },
      { "id": "S-MVP-1.4", "epic": "E-MVP-1", "title": "Autonomous Retry Loop & Success Detect",
        "owner": "AGENT",  "tasks": ["T-MVP-1.4.1", "T-MVP-1.4.2"] },
      { "id": "S-MVP-2.1", "epic": "E-MVP-2", "title": "Circuit Breaker & Rollback Policy",
        "owner": "HYBRID", "tasks": ["T-MVP-2.1.1", "T-MVP-2.1.2"] },

      { "id": "S-DEV-1.0", "epic": "E-DEV-1", "title": "Phase-0 Hello-World Chat Webview",
        "owner": "HYBRID", "tasks": [
          "T-DEV-1.0.1", "T-DEV-1.0.2", "T-DEV-1.0.3"
        ]},
      { "id": "S-DEV-1.1", "epic": "E-DEV-1", "title": "Phase-1 CodeLens & Selection Commands",
        "owner": "HYBRID", "tasks": [
          "T-DEV-1.1.1", "T-DEV-1.1.2"
        ]},
      { "id": "S-DEV-1.2", "epic": "E-DEV-1", "title": "Phase-2 Backlog Tree View & Create-Task",
        "owner": "HYBRID", "tasks": [
          "T-DEV-1.2.1", "T-DEV-1.2.2"
        ]},
      { "id": "S-DEV-1.3", "epic": "E-DEV-1", "title": "Phase-3 Patch Generation & Review Decorations",
        "owner": "HYBRID", "tasks": [
          "T-DEV-1.3.1", "T-DEV-1.3.2"
        ]},
      { "id": "S-DEV-1.4", "epic": "E-DEV-1", "title": "Phase-4 Orchestrator Dashboard Webview",
        "owner": "HYBRID", "tasks": [
          "T-DEV-1.4.1"
        ]}
    ],

    "TASKS": [
      { "id": "T-MVP-1.1.1", "story": "S-MVP-1.1",
        "desc": "Create utils/add.py that incorrectly returns (x - 1 + y).",
        "autonomous": false, "deps": [] },
      { "id": "T-MVP-1.1.2", "story": "S-MVP-1.1",
        "desc": "Add tests/test_add.py expecting add(2,3)==5 (should FAIL).",
        "autonomous": false, "deps": ["T-MVP-1.1.1"] },

      { "id": "T-MVP-1.2.1", "story": "S-MVP-1.2",
        "desc": "Build mvp_loop.py that runs pytest, calls TaskAgent, applies diff in-memory.",
        "autonomous": true, "deps": [] },
      { "id": "T-MVP-1.2.2", "story": "S-MVP-1.2",
        "desc": "Ensure mvp_loop has zero disk/subprocess side-effects; print JSON summary.",
        "autonomous": true, "deps": ["T-MVP-1.2.1"] },

      { "id": "T-MVP-1.3.1", "story": "S-MVP-1.3",
        "desc": "Implement TaskAgent.prompt() few-shot + TaskExecutor.build_patch.",
        "autonomous": true, "deps": ["T-MVP-1.2.2"] },

      { "id": "T-MVP-1.4.1", "story": "S-MVP-1.4",
        "desc": "Retry loop up to 3; send previous diff & error back to TaskAgent.",
        "autonomous": true, "deps": ["T-MVP-1.3.1"] },
      { "id": "T-MVP-1.4.2", "story": "S-MVP-1.4",
        "desc": "Exit SUCCESS on first green; else exit(1) after third failure.",
        "autonomous": true, "deps": ["T-MVP-1.4.1"] },

      { "id": "T-MVP-2.1.1", "story": "S-MVP-2.1",
        "desc": "Wrap pytest invocation in try/except; revert in-memory patch on crash.",
        "autonomous": true, "deps": ["T-MVP-1.2.2"] },
      { "id": "T-MVP-2.1.2", "story": "S-MVP-2.1",
        "desc": "Abort loop & require human after 3 consecutive failed attempts.",
        "autonomous": true, "deps": ["T-MVP-2.1.1"] },

      { "id": "T-DEV-1.0.1", "story": "S-DEV-1.0",
        "desc": "Scaffold VS Code extension with `yo code`. Register command: Sidekick:Open Chat.",
        "autonomous": false, "deps": [] },
      { "id": "T-DEV-1.0.2", "story": "S-DEV-1.0",
        "desc": "Implement WebSocket JSON-RPC bridge `tools/ext_bridge.py` exposing /chat endpoint.",
        "autonomous": true, "deps": ["T-DEV-1.0.1"] },
      { "id": "T-DEV-1.0.3", "story": "S-DEV-1.0",
        "desc": "Create Chat Webview that streams messages; status-bar ‘Sidekick • o3’.",
        "autonomous": false, "deps": ["T-DEV-1.0.2"] },

      { "id": "T-DEV-1.1.1", "story": "S-DEV-1.1",
        "desc": "Add CodeLens ‘Ask Sidekick’ at top of .py files; send file text + cursor pos.",
        "autonomous": false, "deps": ["T-DEV-1.0.3", "T-MVP-1.2.2"] },
      { "id": "T-DEV-1.1.2", "story": "S-DEV-1.1",
        "desc": "Register selection context-menu commands: Explain / Refactor / Create Test.",
        "autonomous": false, "deps": ["T-DEV-1.1.1"] },

      { "id": "T-DEV-1.2.1", "story": "S-DEV-1.2",
        "desc": "Implement Backlog TreeDataProvider reading via BacklogManager JSON.",
        "autonomous": false, "deps": ["T-DEV-1.0.3"] },
      { "id": "T-DEV-1.2.2", "story": "S-DEV-1.2",
        "desc": "Command ‘Create micro-task from chat’ → append to backlog file.",
        "autonomous": false, "deps": ["T-DEV-1.2.1"] },

      { "id": "T-DEV-1.3.1", "story": "S-DEV-1.3",
        "desc": "Generate-Patch command → invoke TaskExecutor.build_patch, open diff view.",
        "autonomous": false, "deps": ["T-MVP-1.3.1"] },
      { "id": "T-DEV-1.3.2", "story": "S-DEV-1.3",
        "desc": "After TaskReviewer.review_patch decorate hunks with pass/fail gutter icons.",
        "autonomous": false, "deps": ["T-DEV-1.3.1"] },

      { "id": "T-DEV-1.4.1", "story": "S-DEV-1.4",
        "desc": "Webview ‘Dev Dashboard’—show phase, task ID, commit SHA; ‘Run Full Cycle’ button.",
        "autonomous": false, "deps": ["T-MVP-1.4.2", "T-DEV-1.3.2"] }
    ]
  },

  "POST_MVP": {
    "blocked_until": "E-MVP-1:DONE",
    "epics_placeholder": [
      "E-SCALE-1  Strict Typing & CI",
      "E-SCALE-2  Immutable Logging",
      "E-SCALE-3  ShellRunner Hardening",
      "E-SCALE-4  Observability & Metrics",
      "E-SCALE-5  Multi-file Diffs",
      "E-SCALE-6  Secrets & Security"
    ]
  }
}
""")

[2025-06-20 17:11:42,852] INFO LLM sync call: model=o3-2025-04-16, msgs_len=20


The file you just showed (`backlog/backlog.json`) is NOT the same file that
Cadence’s execution loop consumes.  

• `backlog/backlog.json`  ➜  a **product / planning backlog**  
  – hierarchical epics → stories → tasks  
  – human-readable descriptions, dependencies, owners, etc.  
  – no code-diffs, no status transitions, lives under `docs/` or
    `backlog/` for planning purposes.

• `dev_backlog.json` (or whatever you point  `backlog_path`  to in
  `dev_config.json`)  ➜  the **execution backlog**  
  – a *flat JSON list* of *micro-tasks* ready for the orchestrator  
  – must already include `id`, `title`, `status`, and (for code work)
    a `diff` skeleton.  
  – gets updated automatically (open → done → archived) by Cadence.

In other words, the big hierarchical backlog is the **source of truth for
*what* you want to do;** the flat file is a **queue of tasks that are
ready to run**.

────────────────────────────────────────
How to “start fresh” for the next task
────────────────────────────────────────
1.  Decide which planning-level task should be executed next  
    Example: `"T-MVP-1.2.1" — Build mvp_loop.py`.

2.  Translate it into an *execution* micro-task and append it to
    `dev_backlog.json`.  Three common ways:

    A.  Hand-craft a minimal entry (quick demo)  
    ```json
    {
      "id": "build-mvp-loop-001",
      "title": "Build in-process mvp_loop.py",
      "type": "micro",
      "status": "open",
      "created_at": "2025-06-22T00:00:00Z",
      "description": "Create a script that runs pytest, …",
      "diff": {                       // skeleton – the agent will fill it
        "file": "src/cadence/utils/mvp_loop.py",
        "before": "",
        "after":  ""
      }
    }
    ```

    B.  Use TaskGenerator from a Python shell  
    ```python
    from cadence.dev.generator import TaskGenerator
    from cadence.dev.backlog   import BacklogManager

    gen   = TaskGenerator()
    task  = gen.generate_tasks(
              mode="micro",
              human_prompt="Build in-process mvp_loop.py that runs pytest and applies patch"
            )[0]

    mgr = BacklogManager("dev_backlog.json")
    mgr.add_item(task)
    ```

    C.  Use the Streamlit UI (`cadence/dev/command_center.py`) –  
        choose “Backlog → Add task”.

3.  Confirm the execution backlog is clean

    ```python
    from cadence.dev.backlog import BacklogManager
    mgr = BacklogManager("dev_backlog.json")
    print("Open tasks:", [t["id"] for t in mgr.list_items()])
    ```

    It should show only the *new* task(s); the “bug-fix-add-001” item has
    `status:"archived"` and is ignored.

4.  Run the next cycle

    ```python
    from cadence.dev.orchestrator import DevOrchestrator, json
    orch = DevOrchestrator(json.load(open("dev_config.json")))
    orch.run_task_cycle(interactive=True)     # or interactive=False for CI
    ```

5.  Repeat: each successful cycle archives the task and you seed the next
    one.

────────────────────────────────────────
Automating the bridge (future backlog work)
────────────────────────────────────────
Eventually you won’t manually copy tasks from the planning backlog; a
small “Backlog Bridge” script/agent will:

1.  Read `backlog/backlog.json`.
2.  Find the next task whose `deps` are satisfied and `autonomous:true`.
3.  Emit a fully-formed micro-task into `dev_backlog.json`.

(This is roughly what “T-MVP-1.3.1  LLM Patch Generator v0” and
“T-MVP-1.4.x  Autonomous Retry Loop” are about.)

────────────────────────────────────────
Take-away rules
────────────────────────────────────────
• **Planning backlog** → high-level, hierarchical, for humans.  
• **Execution backlog** → flat list, consumed by Cadence.  
• After each successful run the orchestrator automatically moves its task
  to `"archived"`; you just add the next open item.

You’re now ready to enqueue the next micro-task and let Cadence work on
it.

In [None]:
chat("""
Here is your task:
Return a VERY detailed, line-by-line account of the workflow. Every single step that that can be traced and logically connected without
any "fuzzy" gaps. Let's start with the ./backlog/backlog.json file and explain the each step (extremely nuanced) all the way until 
it becomes a completed task. Highlight any gaps in the process. Don't assume. Validate everything in the code.
If there is a gap, then I want to know about it so that I can start to plan how to bridge it.
""")

[2025-06-20 16:27:15,186] INFO LLM sync call: model=o3-2025-04-16, msgs_len=4


The failure means Python can’t find a top-level module called cadence when the test-runner imports
    from cadence.utils.add import add

That almost always boils down to “the directory that *contains* the cadence
package isn’t on PYTHONPATH” (or the cadence package itself isn’t a real
package).  Here’s the canonical, trouble-free layout and two quick ways to make
sure the package is importable.

────────────────────────────────────────
1 · Preferred project layout
────────────────────────────────────────
cadence-repo-root/               ← you run  pytest  from here
│
├── src/                         ← **recommended**: keeps libs separate from misc files
│   └── cadence/
│       ├── __init__.py          ← even if empty
│       ├── utils/
│       │   └── add.py
│       └── … other sub-packages …
│
├── tests/
│   └── test_add.py
│
├── docs/  experiments/  etc.
└── pyproject.toml               ← minimal packaging metadata

Why “src/”?  It prevents the *repo root* (which might also be called
cadence) from shadowing or confusing Python’s import machinery.

If you don’t want the extra layer you can keep the current flat layout

    cadence-repo-root/cadence/__init__.py
    cadence-repo-root/tests/

but then you must guarantee that the **repo root directory itself** is on
`sys.path` whenever you run pytest.

────────────────────────────────────────
2 · Make the package importable
────────────────────────────────────────
A. Editable install (cleanest, works everywhere)

    # inside your virtual-env, at repo root
    python -m pip install -e .

Create a tiny `pyproject.toml` (or `setup.cfg/setup.py`) that points to your
package; example:

```toml
[project]
name = "cadence"
version = "0.0.0"
dependencies = []

[tool.setuptools]
packages = ["cadence"]
package-dir = {"" = "src"}      # omit this line if you don't use the src/ layout
```

After the editable install, `python -c "import cadence; print(cadence.__file__)"`
should print the path inside your repo.

B. Quick-and-dirty (dev only)

Run pytest through Python so the current directory is put on PYTHONPATH:

    python -m pytest -q      # instead of  pytest -q

or export the path once per shell:

    export PYTHONPATH=$PWD
    pytest -q

────────────────────────────────────────
3 · What *not* to do
────────────────────────────────────────
• Don’t put production code under `experiments/`; keep that folder for notebooks
  or throw-away prototypes.

• Don’t give both the repo root and the inner package the same name *without*
  a `src/` layer; it invites exactly this import shadowing headache.

────────────────────────────────────────
4 · Sanity checklist
────────────────────────────────────────
[ ]  `cadence/__init__.py` exists  
[ ]  You are running pytest from the repo root (not from inside `tests/`)  
[ ]  `python -m pip install -e .` (or PYTHONPATH trick) done  
[ ]  `python -c "import cadence"` succeeds

Once those boxes are ticked, `pytest -q` should discover the package and either
fail (red phase) or pass (green, after your Cadence cycle fixes the bug).

In [11]:
print(agent.messages[-1]['content'])

The failure means Python can’t find a top-level module called cadence when the test-runner imports
    from cadence.utils.add import add

That almost always boils down to “the directory that *contains* the cadence
package isn’t on PYTHONPATH” (or the cadence package itself isn’t a real
package).  Here’s the canonical, trouble-free layout and two quick ways to make
sure the package is importable.

────────────────────────────────────────
1 · Preferred project layout
────────────────────────────────────────
cadence-repo-root/               ← you run  pytest  from here
│
├── src/                         ← **recommended**: keeps libs separate from misc files
│   └── cadence/
│       ├── __init__.py          ← even if empty
│       ├── utils/
│       │   └── add.py
│       └── … other sub-packages …
│
├── tests/
│   └── test_add.py
│
├── docs/  experiments/  etc.
└── pyproject.toml               ← minimal packaging metadata

Why “src/”?  It prevents the *repo root* (which might also be call

In [12]:
chat("""I have it setup as 

cadence(root)
- cadence (code)
- - agents
- - context
- - dev
- - llm
- - utils
- - - add.py
- - - mvp_loop.py
- docs
- tests
- - test_add.py
...

Should I refactor?

""")

[2025-06-20 16:32:05,893] INFO LLM sync call: model=o3-2025-04-16, msgs_len=6


You don’t have a *code* problem, you have a **shadow-package** problem:

• Your repository folder is called  cadence/  
• Inside it you have another folder called  cadence/  (the real Python package)

When Python executes

    import cadence

it finds the **outer** directory first, notices there is *no* `__init__.py`
there, and therefore decides “this is not a package”.  Import fails before it
ever looks at the inner one.

Two clean fixes (pick one) ⬇︎

────────────────────────────────────────
OPTION A · Introduce a “src” layer      (most common)
────────────────────────────────────────
repo-root/                  ← rename outer folder to anything you like
│
├── src/
│   └── cadence/            ← (move your whole code package here)
│       ├── __init__.py
│       └── … agents, dev, utils, …
│
├── tests/
├── docs/
└── pyproject.toml          ← minimal build metadata

Commands:

```bash
# from current outer cadence directory
mkdir src
git mv cadence src/cadence         # move inner package into src/
touch src/cadence/__init__.py      # if it doesn’t already exist
```

Add packaging metadata so tools know where the code lives:

```toml
# pyproject.toml
[build-system]
requires = ["setuptools>=65.0"]
build-backend = "setuptools.build_meta"

[project]
name = "cadence"
version = "0.0.0"

[tool.setuptools]
package-dir = {"" = "src"}
packages = ["cadence"]
```

Then do an editable install:

```bash
python -m pip install -e .
```

Python/PyTest will now resolve  cadence.*  from  src/cadence  regardless of
where you run them from.

────────────────────────────────────────
OPTION B · Rename the outer directory   (quickest)
────────────────────────────────────────
Just rename the repository folder so it no longer collides with the package
name:

```
cadence/            ← the code package (keep)
cadence-project/    ← new repo root name
```

Example (from one level above the repo):

```bash
mv cadence cadence-project        # rename outer folder
cd cadence-project
```

Nothing else needs to change, because now the first “cadence” on `sys.path`
*is* the package (it has `__init__.py`).

────────────────────────────────────────
AFTER RESTRUCTURING — sanity checks
────────────────────────────────────────
1. From repo root run:

       python -c "import cadence, pathlib, sys; print(cadence, pathlib.Path(cadence.__file__).resolve())"

   This should print a path *inside* your inner code package, not the repo
   directory.

2. Run the tests again:

       pytest -q

   • They should *fail* with AssertionError (red phase) until you run the
     Cadence orchestrator.  
   • After the orchestration cycle they should pass (green).

────────────────────────────────────────
FAQ
────────────────────────────────────────
• “Do I have to keep `src/` forever?”  
  No, it’s just a widely-adopted pattern that prevents this class of bug.

• “Will this break my import paths?”  
  No, all runtime imports still use `import cadence...`.

• “What about VS Code / PyCharm?”  
  After `pip install -e .` the IDE will locate the package automatically.

Make the change once, commit it, and you’ll never see this import error again.

In [13]:
chat("What is the point of having a src? Why not just cadence-project > cadence?")

[2025-06-20 16:34:19,203] INFO LLM sync call: model=o3-2025-04-16, msgs_len=8


Short answer  
─────────────  
You can keep a flat layout (`cadence-project/cadence`) and everything will
work **as long as** the *repository folder* is not also called
“cadence”.  Renaming the repo folder to `cadence-project` (or anything that
isn’t the same as the package) eliminates the shadowing bug and you’re done.

So why do so many modern Python repos add a **src/** layer? The rest of this
answer explains the advantages so you can decide which trade-off you prefer.


Detailed answer – the “src layout” in four bullets  
──────────────────────────────────────────────────
1.  Prevents *accidental* imports of source that hasn’t been installed  
    • When you run `pytest`, Python starts with  
      `sys.path = ["repo-root", …]`.  
    • If your package lives directly in `repo-root/cadence`, the *tests
      import the code by path*, even if you forgot `pip install -e .`.  
    • With `src/cadence`, the tests fail immediately until you install the
      package ‑> catches packaging mistakes early.

2.  Avoids name collisions (“shadow packages”)  
    • The bug you just hit: outer `cadence/` (not a package) hides inner
      `cadence/` (real package).  The `src` layer makes those names different
      by construction.

3.  Makes wheels & sdists safer to build  
    • `setuptools` finds files via the `package-dir` mapping; nothing leaks
      into the wheel that isn’t explicitly listed.  
    • No risk that `tests/`, `.github/`, `docs/`, … end up inside the wheel
      because they sit next to `cadence`.

4.  Encourages a cleaner mental model  
    • “Everything under **src/** is importable application/library code.”  
    • “Anything next to it is tooling, config, docs, data, examples, tests.”  
      This separation scales well when repos grow large.

What if I keep it flat? Rules of thumb  
──────────────────────────────────────
• Rename the **repository folder** so it is *not* the same as the package:

```
cadence-project/          ← repo root
├── cadence/              ← actual package (has __init__.py)
└── tests/
```

• Add an editable install once per virtual-env:

```
python -m pip install -e .
```

• Ensure CI invokes `pytest` only *after* the editable install step.

If you follow those three rules, the flat layout is perfectly fine and a bit
simpler to navigate.

Cheat-sheet comparison  
──────────────────────

| Topic                         | Flat layout (`cadence/`)      | `src/` layout (`src/cadence/`)
|-------------------------------|--------------------------------|--------------------------------
| Quick to read in a small repo | ✔                              | ✔
| Accidental shadowing issues   | ⚠ (must rename repo)          | —
| Tests pass without install    | ✔ (could mask packaging bugs) | ✘ (forces install first)
| Wheel build isolation         | OK with care                  | More fool-proof
| Industry adoption             | Common                        | Increasingly common (e.g. Hypothesis, FastAPI)

Choose whichever you and your team prefer; just don’t keep two identically
named “cadence” directories at different levels—that’s what broke your
imports earlier.

# 🧭 Immediate Next Steps — 45-Minute Kick-off Checklist  
(aim: reach “red” state with a failing test that the autonomous loop can later turn green)

1. Workspace & Environment
   • Create / activate a fresh Python ≥3.10 virtual-env.  
   • `pip install -U openai python-dotenv pytest tabulate streamlit pandas`  
   • Ensure `OPENAI_API_KEY` is exported (or present in `.env`).

2. Baseline Commit
   • `git init` (if not already) → commit current code as “baseline-0”.

3. Seed the Toy Defect (Story S-MVP-1.1)
   a. `mkdir -p cadence/utils`  
   b. `cadence/utils/add.py`  
      ```python
      def add(x: int, y: int) -> int:
          """Intentionally wrong implementation for MVP red→green demo."""
          return x - 1 + y
      ```
   c. `tests/test_add.py`  
      ```python
      from cadence.utils.add import add

      def test_add():
          assert add(2, 3) == 5
      ```
   d. `pytest` should now FAIL. Commit as “seed-defect”.

4. Backlog File
   • `dev_backlog.json` → paste the single micro-task that will drive the loop:

   ```json
   [
     {
       "id": "bug-fix-add-001",
       "title": "Fix utils.add.add() so that add(2,3)==5",
       "type": "micro",
       "status": "open",
       "created_at": "2025-06-21T00:00:00Z",
       "diff": {
         "file": "cadence/utils/add.py",
         "before": "def add(x: int, y: int) -> int:\\n    return x - 1 + y\\n",
         "after":  "def add(x: int, y: int) -> int:\\n    return x + y\\n"
       }
     }
   ]
   ```

5. Minimal Config Stub
   • Create `dev_config.json` (or inline in code) matching:

   ```json
   {
     "backlog_path": "dev_backlog.json",
     "template_file": "dev_templates.json",
     "src_root": "cadence",
     "ruleset_file": null,
     "repo_dir": ".",
     "record_file": "dev_record.json"
   }
   ```

6. Smoke-Test Orchestrator
   ```bash
   python -m cadence.dev.orchestrator start
   ```
   Expected flow:
   • Executor builds patch (from backlog-diff).  
   • Reviewer passes (static rules).  
   • ShellRunner applies patch.  
   • Pytest turns GREEN.  
   • Commit & archive recorded to `dev_record.json`.

7. Verify Audit Artefacts
   • Open `dev_record.json` → confirm states: build_patch → patch_built → … → archived.  
   • `git log -1` shows commit labeled “[Cadence] bug-fix-add-001 …”.

8. Push to Remote / PR
   • `git remote add origin …` (if needed)  
   • `git push -u origin main`

9. Stretch (optional, if time remains)
   • Run `streamlit run cadence/dev/command_center.py` and walk through the same task via UI.  
   • Add a lint rule to `ruleset.json` (e.g., forbid “print(”) and point config at it to see reviewer fail.

––––––
Once the above is green you have an end-to-end proof that the agentic pipeline can fix a real bug. The next milestone (Story S-MVP-1.2) is to shift from hand-authored diff to LLM-generated diff via `ExecutionAgent`; we’ll tackle that after the smoke test is stable.'''))