Skip to content

feat(prd): recursive decomposition stress test (#421)#446

Merged
frankbria merged 3 commits into
mainfrom
feature/issue-421-prd-stress-test
Mar 16, 2026
Merged

feat(prd): recursive decomposition stress test (#421)#446
frankbria merged 3 commits into
mainfrom
feature/issue-421-prd-stress-test

Conversation

@frankbria
Copy link
Copy Markdown
Owner

@frankbria frankbria commented Mar 16, 2026

Summary

Implements #421: Recursive Decomposition as PRD Stress Test and Technical Spec Generator

  • New cf prd stress-test command that recursively decomposes PRD goals using tri-state classification (atomic/composite/ambiguous)
  • Surfaces requirements gaps as actionable ambiguity reports with questions and recommendations
  • Generates a technical specification from the decomposition tree
  • Interactive mode (--interactive) resolves ambiguities inline and creates a new PRD version
  • Configurable max recursion depth (--max-depth, default 3)
  • File output (--output) writes tech spec to disk

Files Changed

  • New: codeframe/core/prd_stress_test.py — Core module with data models, prompts, recursive engine, renderers
  • Modified: codeframe/cli/app.py — Added prd stress-test CLI command
  • New: tests/core/test_prd_stress_test.py — 23 tests covering models, functions, renderers, CLI

Acceptance Criteria

  • cf prd stress-test recursively decomposes PRD goals
  • Ambiguities surfaced with actionable questions
  • Tech spec generated from decomposition tree
  • --interactive mode allows inline ambiguity resolution
  • Resolved ambiguities update the PRD (via prd.create_new_version())
  • --output writes tech spec to file
  • Max depth prevents runaway recursion (default: 3)
  • Works with PRDs generated by cf prd generate

Test Plan

  • 23 unit + integration tests (TDD approach)
  • All 1882 existing core tests still pass
  • Ruff linting clean
  • No AI artifacts (deslop scan clean)

Implementation Notes

  • Follows PrdDiscoverySession patterns for LLM interaction and task_tree.py for recursive decomposition
  • Tri-state classification (Classification enum) extends beyond task_tree.py's binary classify
  • AnthropicProvider imported in CLI (not core) to keep core provider-agnostic

Closes #421

Summary by CodeRabbit

  • New Features
    • PRD stress-test tool: recursively decomposes goals, surfaces ambiguities, and generates a Markdown technical specification and ambiguity report; supports interactive ambiguity resolution and optional file export.
  • Bug Fixes
    • Removed duplicate CLI command registration that could show redundant command entries.
  • Tests
    • Added comprehensive tests for decomposition, ambiguity handling, rendering, CLI behavior, and file output.

#421)

Add `cf prd stress-test` command that recursively decomposes PRD goals
using tri-state classification (atomic/composite/ambiguous) to surface
requirements gaps and generate a technical specification.

- New core module: `codeframe/core/prd_stress_test.py`
- CLI command with --max-depth, --output, --interactive flags
- Interactive mode resolves ambiguities and creates new PRD version
- 23 tests covering models, core functions, renderers, and CLI
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 16, 2026

Walkthrough

Adds a PRD stress-test feature: new headless core module for recursive LLM-guided decomposition with ambiguity surfacing and spec rendering, a CLI command (declared twice in the diff) to run it (including interactive resolution), plus tests and a small node-count helper.

Changes

Cohort / File(s) Summary
CLI: stress-test command
codeframe/cli/app.py
Adds prd stress-test CLI command with options --workspace, --prd-id, --max-depth, --output, --interactive. Loads PRD, requires API key, constructs provider, calls stress_test_prd, displays markdown spec and ambiguity report, optionally writes spec to file, supports interactive ambiguity resolution that can update the PRD. The command is duplicated in the diff. Includes private _count_nodes(tree) helper.
Core decomposition logic
codeframe/core/prd_stress_test.py
New module implementing recursive PRD stress-testing: Classification enum, dataclasses DecompositionNode, Ambiguity, StressTestResult; system prompt constants; functions extract_goals, classify_and_decompose, recursive_decompose, render_tech_spec, render_ambiguity_report, resolve_ambiguities_into_prd, and top-level stress_test_prd. Produces decomposition tree, ambiguity list, markdown spec, and ambiguity report.
Tests
tests/core/test_prd_stress_test.py
Comprehensive test suite with fixtures and a mocked LLM provider covering goal extraction, classification branches (atomic/composite/ambiguous), recursive decomposition and depth control, rendering, end-to-end orchestration, and CLI integration (including file output and interactive flows).

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant App as CLI App
    participant Stress as stress_test_prd()
    participant LLM as LLM Provider
    participant Render as Renderer

    User->>App: cf prd stress-test [--interactive]
    App->>Stress: stress_test_prd(prd_content, provider, max_depth)
    Stress->>LLM: extract_goals(prd_content)
    LLM-->>Stress: [goals...]

    loop per goal
        Stress->>Stress: recursive_decompose(goal, depth=0)
        loop decompose until leaf or max_depth
            Stress->>LLM: classify_and_decompose(node, lineage, prd_context)
            alt atomic
                LLM-->>Stress: ATOMIC (leaf)
            else composite
                LLM-->>Stress: COMPOSITE + children
                Stress->>Stress: recurse into children
            else ambiguous
                LLM-->>Stress: AMBIGUOUS + questions
                Stress->>Stress: collect ambiguity
            end
        end
    end

    Stress->>Render: render_tech_spec(tree, ambiguities)
    Render-->>Stress: markdown spec
    Stress->>Render: render_ambiguity_report(ambiguities)
    Render-->>Stress: ambiguity report
    Stress-->>App: StressTestResult (spec + report + tree)

    App->>User: display spec + ambiguity report
    alt --interactive
        User->>App: provide answers
        App->>LLM: resolve_ambiguities_into_prd(answers)
        LLM-->>App: updated PRD content
        App->>App: emit PRD_UPDATED event
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hop through nodes and questions small,
I sniff out gaps and dance down each hall.
When ambiguity blooms I pause and ask,
A human replies — I carry the task.
Tech spec grows bright, the PRD wears a sash.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.37% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Out of Scope Changes check ❓ Inconclusive The raw summary documents a critical issue: the stress-test command is declared twice in app.py, creating duplicate public entity definitions that appear unintended. Verify whether the duplicated stress-test command definition in app.py is intentional or should be removed; if unintended, remove one copy to prevent conflicts or ambiguity.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(prd): recursive decomposition stress test' clearly and concisely summarizes the main change—adding a new PRD stress-testing feature with recursive decomposition.
Linked Issues check ✅ Passed The PR implementation fully satisfies all linked issue #421 requirements: tri-state classification, recursive decomposition, ambiguity surfacing, interactive workflow, configurable max-depth, and technical specification generation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/issue-421-prd-stress-test
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

test comment

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

Deleting test comment

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

PR #446 Review - feat(prd): recursive decomposition stress test

Overall this is a clean addition that fits the Think/Build/Prove/Ship pipeline well. The core module is properly headless, the tri-state classification extends task_tree.py nicely, and test coverage is solid. A few issues worth addressing.

Bug: Patch target is wrong in CLI tests

The CLI tests patch codeframe.adapters.llm.anthropic.AnthropicProvider, but the CLI does from codeframe.adapters.llm.anthropic import AnthropicProvider, which binds the name in codeframe.cli.app namespace. The patch will not intercept it there. The correct target is @patch("codeframe.cli.app.AnthropicProvider"). Affects test_stress_test_with_prd and test_stress_test_output_flag. These may pass today only due to missing API keys, but do not reliably validate mock substitution.

Missing type hints on provider parameters

All core functions accept an untyped provider arg. adapters/llm/base.py already defines a LLMProvider Protocol - using it makes the contract explicit and catches adapter mismatches earlier: from codeframe.adapters.llm.base import LLMProvider; def extract_goals(prd_content: str, provider: LLMProvider) -> list[str]: ...

Silent failure on goal extraction

If extract_goals returns [] (JSON parse error or empty LLM response), stress_test_prd returns an empty tree silently. The CLI prints Summary: 0 goals, 0 nodes, 0 ambiguities - looks like success but nothing was analyzed. A warning when goals is empty would prevent silent no-ops.

Tech spec silently truncated at 2000 chars in CLI output

console.print(Panel(result.tech_spec_markdown[:2000], ...)) silently cuts off the spec for any reasonably sized PRD. If --output is the recommended path, indicate when truncation occurs so users know to use it.

Minor: misleading comment in test_max_depth_forces_atomic

The comment says Should NOT have called the provider then immediately hedges it away. Either add the actual assertion (mock_provider.complete.assert_not_called()) or remove the comment - the test intent is clear from the assertion alone.

Minor: missing tests for resolve_ambiguities_into_prd and resolved answer rendering

resolve_ambiguities_into_prd has no unit test, and render_ambiguity_report with a resolved answer (the Resolved: branch) is also uncovered. The interactive PRD update flow depends on both.

No concern: AnthropicProvider imported in CLI not core - Correct pattern, keeps core provider-agnostic per architecture rules.

No concern: mutable ambiguities accumulator parameter - Fine as written, always called with an explicit [] and the accumulator intent is clear from all call sites.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

Please disregard this comment - posted in error.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

PR #446 Review — feat(prd): recursive decomposition stress test

Overall this is a clean addition that fits the Think/Build/Prove/Ship pipeline well. The core module is properly headless, the tri-state classification extends task_tree.py nicely, and test coverage is solid. A few issues worth addressing:


Bug: Patch target is wrong in CLI tests

The CLI tests patch codeframe.adapters.llm.anthropic.AnthropicProvider, but the CLI does from codeframe.adapters.llm.anthropic import AnthropicProvider, which binds the name in codeframe.cli.app's namespace. The patch won't intercept it there — the correct target is:

@patch(codeframe.cli.app.AnthropicProvider)

Affects test_stress_test_with_prd and test_stress_test_output_flag. These may pass today only due to missing API keys, but don't reliably validate mock substitution.


Missing type hints on provider parameters

All core functions accept an untyped provider arg. adapters/llm/base.py already defines a LLMProvider Protocol — using it makes the contract explicit and catches adapter mismatches earlier:

from codeframe.adapters.llm.base import LLMProvider

def extract_goals(prd_content: str, provider: LLMProvider) -> list[str]: ...

Silent failure on goal extraction

If extract_goals returns [] (JSON parse error or empty LLM response), stress_test_prd returns an empty tree silently. The CLI prints Summary: 0 goals, 0 nodes, 0 ambiguities — looks like success but nothing was analyzed. A warning when goals is empty would prevent silent no-ops.


Tech spec silently truncated at 2000 chars in CLI output

console.print(Panel(result.tech_spec_markdown[:2000], ...))

This silently cuts off the spec for any reasonably sized PRD. If --output is the recommended path, indicate when truncation occurs so users know to use it.


Minor: misleading comment in test_max_depth_forces_atomic

The comment says "Should NOT have called the provider" then immediately hedges it away. Either add the actual assertion (mock_provider.complete.assert_not_called()) or remove the comment — the test intent is clear from the assertion alone.


Minor: missing tests for resolve_ambiguities_into_prd and resolved answer rendering

resolve_ambiguities_into_prd has no unit test, and render_ambiguity_report with a resolved answer (the "Resolved:" branch) is also uncovered. The interactive PRD update flow depends on both.


No concern: AnthropicProvider imported in CLI not core ✓

Correct pattern — keeps core provider-agnostic per architecture rules.

No concern: mutable ambiguities accumulator parameter ✓

Fine as written — always called with an explicit [] and the accumulator intent is clear from all call sites.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/cli/app.py`:
- Around line 1605-1609: The CLI option max_depth currently allows zero or
negative values which breaks recursive_decompose's expected invariant; add
validation at the option layer to reject values < 1. Implement a small validator
(e.g., validate_max_depth) and wire it into the typer.Option for max_depth (or
use a click/typer range type) so that passing <= 0 raises a Typer/Click
BadParameter with a clear message; keep recursive_decompose unchanged but ensure
its contract (depth >= max_depth) is respected by the CLI.
- Around line 1706-1735: After successfully creating a new PRD version with
prd_module.create_new_version (new_record), reload the updated PRD (e.g., fetch
record by new_record.id) and re-run the same generation/analysis step that
produced result so that result.tech_spec_markdown, result.tree and
result.ambiguities reflect the resolved content before printing/writing the
panel and summary; also handle the falsy create_new_version case by logging a
warning instead of silently continuing when interactive is true.

In `@codeframe/core/prd_stress_test.py`:
- Around line 253-257: The ambiguity map currently uses a.source_node_title as
the key which can collide for multiple nodes with the same title; change amb_map
to use a stable per-node unique key (e.g., a.node_id, a.source_node_path, or
a.lineage) so each ambiguity gets a distinct number and update usages that pass
amb_map into _render_spec_node (and any other call sites around lines 272-274)
to look up by that unique key instead of source_node_title; ensure the Ambiguity
objects (ambiguities) expose the chosen unique identifier and that the map
creation: amb_map = {<unique_key>: i + 1 for i, a in enumerate(ambiguities)} is
consistent with the lookup inside _render_spec_node.
- Around line 326-340: The function returns raw model output (response.content)
with no validation; add a guard after the provider.complete call that trims
whitespace and validates the returned string (e.g., non-empty, above a
configurable minimum length, and not obviously truncated—reject outputs that end
mid-sentence or with an ellipsis/partial token). If validation fails, either
retry the provider.complete call up to N times or raise/return a clear error so
downstream interactive mode doesn't accept a broken PRD; reference the local
variables response and provider.complete and the function
resolve_ambiguities_into_prd (or wherever this return occurs) when adding the
checks and retry/error logic.
- Around line 119-125: extract_goals() currently swallows JSON parse/type errors
and returns an empty list (and classify_and_decompose() similarly returns ATOMIC
on parse failure); instead either raise a clear exception with context (e.g.,
ValueError/ParseError including the raw response.content and the underlying
exception) so the run fails loudly, or return a clearly labeled synthesized
ambiguity token (e.g., ["UNPARSEABLE_MODEL_OUTPUT"] or an Ambiguity object) so
downstream logic can treat it as ambiguous; update extract_goals() and
classify_and_decompose() to implement one of these behaviors and include the raw
response content in the error/ambiguity payload for debugging.

In `@tests/core/test_prd_stress_test.py`:
- Around line 393-406: The tests patch AnthropicProvider but fail in CI because
the CLI checks ANTHROPIC_API_KEY before instantiation; update
test_stress_test_with_prd (and the other similar test at 409-426) to set a dummy
ANTHROPIC_API_KEY in the environment before calling runner.invoke — e.g., use
monkeypatch.setenv("ANTHROPIC_API_KEY","test") or
os.environ["ANTHROPIC_API_KEY"]="test" at the start of the test so the CLI
proceeds to use the patched AnthropicProvider class and the assert on
result.exit_code remains valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 289b1941-6534-41a7-99e6-9bcdab32b407

📥 Commits

Reviewing files that changed from the base of the PR and between 1d3f2d5 and efb9026.

📒 Files selected for processing (3)
  • codeframe/cli/app.py
  • codeframe/core/prd_stress_test.py
  • tests/core/test_prd_stress_test.py

Comment thread codeframe/cli/app.py
Comment thread codeframe/cli/app.py
Comment on lines +119 to +125
try:
goals = json.loads(response.content)
if isinstance(goals, list):
return [str(g) for g in goals]
except (json.JSONDecodeError, TypeError):
pass
return []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't turn provider parse failures into "no issues found".

Invalid model output currently degrades to [] in extract_goals() and ATOMIC in classify_and_decompose(). That can make a broken run look like a clean PRD with zero ambiguities. Please surface a hard failure or synthesize an explicit ambiguity instead of silently succeeding.

Also applies to: 158-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 119 - 125, extract_goals()
currently swallows JSON parse/type errors and returns an empty list (and
classify_and_decompose() similarly returns ATOMIC on parse failure); instead
either raise a clear exception with context (e.g., ValueError/ParseError
including the raw response.content and the underlying exception) so the run
fails loudly, or return a clearly labeled synthesized ambiguity token (e.g.,
["UNPARSEABLE_MODEL_OUTPUT"] or an Ambiguity object) so downstream logic can
treat it as ambiguous; update extract_goals() and classify_and_decompose() to
implement one of these behaviors and include the raw response content in the
error/ambiguity payload for debugging.

Comment thread codeframe/core/prd_stress_test.py Outdated
Comment thread codeframe/core/prd_stress_test.py Outdated
Comment thread tests/core/test_prd_stress_test.py
@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

"""PRD stress test via recursive decomposition.

Recursively decomposes PRD goals using a tri-state classification
(atomic / composite / ambiguous) to surface requirements gaps and
generate a technical specification. This is a human-facing discovery
tool — not a task generator.

This module is headless — no FastAPI or HTTP dependencies.
"""

import json
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional

from codeframe.adapters.llm.base import Purpose

---------------------------------------------------------------------------

Data Models

---------------------------------------------------------------------------

class Classification(str, Enum):
ATOMIC = "atomic"
COMPOSITE = "composite"
AMBIGUOUS = "ambiguous"

@DataClass
class DecompositionNode:
id: str
title: str
description: str
classification: Classification
children: list["DecompositionNode"]
lineage: list[str]
depth: int
complexity_hint: Optional[str] = None

@DataClass
class Ambiguity:
id: str
label: str
source_node_title: str
questions: list[str]
recommendation: str
resolved_answer: Optional[str] = None

@DataClass
class StressTestResult:
prd_title: str
tree: list[DecompositionNode]
ambiguities: list[Ambiguity]
tech_spec_markdown: str
ambiguity_report: str

---------------------------------------------------------------------------

Prompt Constants

---------------------------------------------------------------------------

GOAL_EXTRACTION_SYSTEM = (
"You are a requirements analyst. Given a Product Requirements Document, "
"extract the high-level deliverable goals — the major features or systems "
"that need to be built. Return ONLY a JSON array of short goal strings. "
"Example: ["User Authentication", "Invoice Management", "PDF Export"]"
)

CLASSIFY_AND_DECOMPOSE_SYSTEM = """
You are a recursive requirements decomposer. Given a goal, its context (lineage
of ancestor goals), and the original PRD, classify the goal into exactly one of:

  • "atomic": Small enough to implement directly (1-2 days of work). The PRD
    provides enough detail.
  • "composite": Clearly needs breakdown into sub-goals. The PRD provides
    enough detail to know what the pieces are.
  • "ambiguous": You CANNOT confidently classify because the PRD is missing
    critical information. You must explain what's missing.

Return a JSON object with these fields:
{
"classification": "atomic" | "composite" | "ambiguous",
"children": [{"title": "...", "description": "..."}], // only if composite
"ambiguity_label": "SHORT LABEL", // only if ambiguous
"questions": ["question 1", "question 2"], // only if ambiguous
"recommendation": "what to add to the PRD", // only if ambiguous
"complexity_hint": "Low" | "Low-Medium" | "Medium" | "High" // always
}

Return ONLY valid JSON. No markdown wrapping."""

AMBIGUITY_RESOLUTION_SYSTEM = (
"You are a PRD editor. Given the original PRD content and a set of resolved "
"ambiguities (question + answer pairs), update the PRD to incorporate the "
"new information. Preserve the original structure and tone. Return the "
"complete updated PRD content."
)

---------------------------------------------------------------------------

Core Functions

---------------------------------------------------------------------------

def extract_goals(prd_content: str, provider) -> list[str]:
"""Extract high-level deliverable goals from a PRD."""
response = provider.complete(
messages=[{"role": "user", "content": prd_content}],
purpose=Purpose.PLANNING,
system=GOAL_EXTRACTION_SYSTEM,
max_tokens=1024,
temperature=0.0,
)
try:
goals = json.loads(response.content)
if isinstance(goals, list):
return [str(g) for g in goals]
except (json.JSONDecodeError, TypeError):
pass
return []

def classify_and_decompose(
title: str,
description: str,
lineage: list[str],
prd_content: str,
depth: int,
provider,
) -> tuple[Classification, list[dict], Optional[Ambiguity], str]:
"""Classify a goal node and optionally decompose or flag ambiguity."""
lineage_ctx = ""
if lineage:
lineage_ctx = "\n\nAncestor context:\n" + "\n".join(
f"- {desc}" for desc in lineage
)

user_msg = (
    f"Goal: {title}\n"
    f"Description: {description}\n"
    f"Depth: {depth}{lineage_ctx}\n\n"
    f"PRD:\n{prd_content}"
)

response = provider.complete(
    messages=[{"role": "user", "content": user_msg}],
    purpose=Purpose.PLANNING,
    system=CLASSIFY_AND_DECOMPOSE_SYSTEM,
    max_tokens=2048,
    temperature=0.0,
)

try:
    data = json.loads(response.content)
except (json.JSONDecodeError, TypeError):
    return Classification.ATOMIC, [], None, "Low"

raw_cls = data.get("classification", "atomic").lower()
try:
    cls = Classification(raw_cls)
except ValueError:
    cls = Classification.ATOMIC

complexity = data.get("complexity_hint", "Low")
children = data.get("children", []) if cls == Classification.COMPOSITE else []

ambiguity = None
if cls == Classification.AMBIGUOUS:
    ambiguity = Ambiguity(
        id=str(uuid.uuid4()),
        label=data.get("ambiguity_label", "UNSPECIFIED"),
        source_node_title=title,
        questions=data.get("questions", []),
        recommendation=data.get("recommendation", ""),
    )

return cls, children, ambiguity, complexity

def recursive_decompose(
title: str,
description: str,
lineage: list[str],
prd_content: str,
depth: int,
max_depth: int,
ambiguities: list[Ambiguity],
provider,
) -> DecompositionNode:
"""Recursively decompose a goal, collecting ambiguities along the way."""
# Force leaf at max depth
if depth >= max_depth:
return DecompositionNode(
id=str(uuid.uuid4()),
title=title,
description=description,
classification=Classification.ATOMIC,
children=[],
lineage=lineage,
depth=depth,
complexity_hint="Unknown",
)

cls, child_dicts, ambiguity, complexity = classify_and_decompose(
    title, description, lineage, prd_content, depth, provider,
)

if ambiguity:
    ambiguities.append(ambiguity)

children = []
if cls == Classification.COMPOSITE:
    child_lineage = lineage + [title]
    for child_dict in child_dicts:
        child_node = recursive_decompose(
            child_dict.get("title", "Untitled"),
            child_dict.get("description", ""),
            child_lineage,
            prd_content,
            depth + 1,
            max_depth,
            ambiguities,
            provider,
        )
        children.append(child_node)

return DecompositionNode(
    id=str(uuid.uuid4()),
    title=title,
    description=description,
    classification=cls,
    children=children,
    lineage=lineage,
    depth=depth,
    complexity_hint=complexity,
)

---------------------------------------------------------------------------

Renderers

---------------------------------------------------------------------------

def render_tech_spec(
tree: list[DecompositionNode], ambiguities: list[Ambiguity]
) -> str:
"""Render the decomposition tree as a markdown technical specification."""
amb_map = {a.source_node_title: i + 1 for i, a in enumerate(ambiguities)}
lines = ["# Technical Specification\n"]

for node in tree:
    _render_spec_node(node, lines, amb_map, header_level=2)

return "\n".join(lines)

def _render_spec_node(
node: DecompositionNode,
lines: list[str],
amb_map: dict[str, int],
header_level: int,
) -> None:
"""Recursively render a node into the tech spec."""
prefix = "#" * min(header_level, 6)
lines.append(f"{prefix} {node.title}")

if node.classification == Classification.AMBIGUOUS:
    amb_num = amb_map.get(node.title, "?")
    lines.append(f"**[NEEDS CLARIFICATION — see ambiguity #{amb_num}]**\n")
elif node.classification == Classification.ATOMIC:
    lines.append(f"- {node.description}")
    if node.complexity_hint:
        lines.append(f"- Estimated complexity: {node.complexity_hint}")
    lines.append("")
else:
    lines.append("")

for child in node.children:
    _render_spec_node(child, lines, amb_map, header_level + 1)

def render_ambiguity_report(ambiguities: list[Ambiguity]) -> str:
"""Render the ambiguity list as a human-readable report."""
if not ambiguities:
return "No ambiguities found — PRD is well-specified for decomposition."

lines = [
    f"PRD Stress Test — {len(ambiguities)} ambiguities found:\n",
]

for i, amb in enumerate(ambiguities, 1):
    lines.append(
        f"{i}. {amb.label} (from decomposing \"{amb.source_node_title}\")"
    )
    lines.append("   The PRD doesn't specify:")
    for q in amb.questions:
        lines.append(f"   - {q}")
    lines.append(f"   → Recommendation: {amb.recommendation}")
    if amb.resolved_answer:
        lines.append(f"   ✓ Resolved: {amb.resolved_answer}")
    lines.append("")

return "\n".join(lines)

def resolve_ambiguities_into_prd(
prd_content: str,
ambiguities: list[Ambiguity],
provider,
) -> str:
"""Use LLM to update PRD content with resolved ambiguity answers."""
resolved = [a for a in ambiguities if a.resolved_answer]
if not resolved:
return prd_content

resolution_text = "\n".join(
    f"- {a.label}: {', '.join(a.questions)} → Answer: {a.resolved_answer}"
    for a in resolved
)

response = provider.complete(
    messages=[{
        "role": "user",
        "content": (
            f"Original PRD:\n{prd_content}\n\n"
            f"Resolved ambiguities:\n{resolution_text}\n\n"
            "Update the PRD to incorporate these answers."
        ),
    }],
    purpose=Purpose.PLANNING,
    system=AMBIGUITY_RESOLUTION_SYSTEM,
    max_tokens=8192,
    temperature=0.0,
)
return response.content

---------------------------------------------------------------------------

Top-level Orchestrator

---------------------------------------------------------------------------

def stress_test_prd(
prd_content: str, provider, max_depth: int = 3
) -> StressTestResult:
"""Run the full PRD stress test: extract goals → recursive decompose → render."""
goals = extract_goals(prd_content, provider)

tree: list[DecompositionNode] = []
ambiguities: list[Ambiguity] = []

for goal in goals:
    node = recursive_decompose(
        title=goal,
        description=goal,
        lineage=[],
        prd_content=prd_content,
        depth=0,
        max_depth=max_depth,
        ambiguities=ambiguities,
        provider=provider,
    )
    tree.append(node)

tech_spec = render_tech_spec(tree, ambiguities)
amb_report = render_ambiguity_report(ambiguities)

# Extract title from PRD (first heading or first line)
prd_title = "Untitled"
for line in prd_content.splitlines():
    stripped = line.strip()
    if stripped.startswith("# "):
        prd_title = stripped[2:].strip()
        break
    if stripped:
        prd_title = stripped[:80]
        break

return StressTestResult(
    prd_title=prd_title,
    tree=tree,
    ambiguities=ambiguities,
    tech_spec_markdown=tech_spec,
    ambiguity_report=amb_report,
)

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

Please disregard - posted in error.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

PR #446 Review — feat(prd): recursive decomposition stress test

Overall this is a clean addition that fits the Think/Build/Prove/Ship pipeline well. The core module is properly headless, the tri-state classification extends task_tree.py nicely, and test coverage is solid. A few issues worth addressing:


Bug: Patch target is wrong in CLI tests

The CLI tests patch codeframe.adapters.llm.anthropic.AnthropicProvider, but the CLI does from codeframe.adapters.llm.anthropic import AnthropicProvider, which binds the name in codeframe.cli.app's namespace. The patch won't intercept it there — the correct target is:

@patch("codeframe.cli.app.AnthropicProvider")

Affects test_stress_test_with_prd and test_stress_test_output_flag. These may pass today only due to missing API keys, but don't reliably validate mock substitution.


Missing type hints on provider parameters

All core functions accept an untyped provider arg. adapters/llm/base.py already defines a LLMProvider Protocol — using it makes the contract explicit and catches adapter mismatches earlier:

from codeframe.adapters.llm.base import LLMProvider

def extract_goals(prd_content: str, provider: LLMProvider) -> list[str]: ...

Silent failure on goal extraction

If extract_goals returns [] (JSON parse error or empty LLM response), stress_test_prd returns an empty tree silently. The CLI prints Summary: 0 goals, 0 nodes, 0 ambiguities — looks like success but nothing was analyzed. A warning when goals is empty would prevent silent no-ops.


Tech spec silently truncated at 2000 chars in CLI output

console.print(Panel(result.tech_spec_markdown[:2000], ...))

This silently cuts off the spec for any reasonably sized PRD. If --output is the recommended path, indicate when truncation occurs so users know to use it.


Minor: misleading comment in test_max_depth_forces_atomic

The comment says "Should NOT have called the provider" then immediately hedges it away. Either add the actual assertion (mock_provider.complete.assert_not_called()) or remove the comment — the test intent is clear from the assertion alone.


Minor: missing tests for resolve_ambiguities_into_prd and resolved answer rendering

resolve_ambiguities_into_prd has no unit test, and render_ambiguity_report with a resolved answer (the "Resolved:" branch) is also uncovered. The interactive PRD update flow depends on both.


No concern: AnthropicProvider imported in CLI not core ✓

Correct pattern — keeps core provider-agnostic per architecture rules.

No concern: mutable ambiguities accumulator parameter ✓

Fine as written — always called with an explicit [] and the accumulator intent is clear from all call sites.

- Validate --max-depth range (1-10) at CLI option layer
- Re-run stress test after interactive PRD update for fresh results
- Log warnings on LLM parse failures instead of silent degradation
- Use ambiguity ID (not title) as unique key in tech spec cross-refs
- Validate rewritten PRD isn't truncated before accepting
- Set ANTHROPIC_API_KEY in CLI test env for proper mock isolation
@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

Code Review: PR 446 - cf prd stress-test. Overall this is a well-structured addition to the THINK phase. Full review posted separately due to formatting constraints.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

CRITICAL - Unbounded LLM fan-out: stress_test_prd can issue O(children^depth) API calls. With max_depth=3 and branching factor of 3 that is 40+ serial calls with no progress feedback and no concurrency cap. Recommend adding a max_nodes CLI flag, printing per-goal progress, and documenting expected call volume in help text.

IMPORTANT (1) - _count_nodes function is in app.py but operates on DecompositionNode - it belongs in prd_stress_test.py alongside the models it works with, and currently has no test coverage.

IMPORTANT (2) - Tech spec panel silently truncates at 2000 chars via Panel(result.tech_spec_markdown[:2000]) with no indication to the user. Add a visible truncation notice or use a pager.

IMPORTANT (3) - Interactive mode: multiple questions per ambiguity are comma-joined into a single prompt. Either prompt per question or clarify in help text that one consolidated answer per group is expected.

IMPORTANT (4) - resolve_ambiguities_into_prd uses len(updated) < len(prd_content) // 2 as a guard which will silently discard valid rewrites of short PRDs. The floor should be relative to resolved content, not a blanket 50 percent rule.

NICE TO HAVE: (a) detect empty tree from extract_goals failure and warn the user explicitly; (b) DecompositionNode.children and .lineage lack field(default_factory=list); (c) resolved_answer rendering branch and resolve_ambiguities_into_prd flow have no test coverage; (d) workspace vs tmp_path fixture usage in CLI tests needs a comment.

STRENGTHS: core module is headless with provider injected in CLI layer per CLAUDE.md conventions; tri-state classification model and prompts are well-designed and auditable; 23 tests with mocked LLM and CLI integration coverage are solid; re-running after interactive resolution is a good UX detail.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
codeframe/core/prd_stress_test.py (2)

123-130: ⚠️ Potential issue | 🟠 Major

Don’t treat unparseable model output as a successful run.

On parse failure, returning [] (Line 130) or defaulting to ATOMIC (Line 167) can hide provider failures and incorrectly report a PRD as well-specified.

💡 Suggested fix
@@
-    except (json.JSONDecodeError, TypeError) as exc:
-        logger.warning("Failed to parse goal extraction response: %s", exc)
-    return []
+    except (json.JSONDecodeError, TypeError) as exc:
+        logger.warning("Failed to parse goal extraction response: %s", exc)
+        raise ValueError("Unparseable goal extraction response") from exc
@@
-    except (json.JSONDecodeError, TypeError) as exc:
-        logger.warning("Failed to parse classification for '%s': %s", title, exc)
-        return Classification.ATOMIC, [], None, "Low"
+    except (json.JSONDecodeError, TypeError) as exc:
+        logger.warning("Failed to parse classification for '%s': %s", title, exc)
+        ambiguity = Ambiguity(
+            id=str(uuid.uuid4()),
+            label="UNPARSEABLE_MODEL_OUTPUT",
+            source_node_title=title,
+            questions=["Could not parse model decomposition output. Can you restate this requirement?"],
+            recommendation="Clarify this requirement with explicit decomposition details.",
+        )
+        return Classification.AMBIGUOUS, [], ambiguity, "Unknown"

Also applies to: 163-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 123 - 130, The current
goal-extraction code treats JSON parse failures as successful runs by returning
an empty list (variable goals) and the code later defaulting to ATOMIC; instead,
surface parse failures so the caller can mark the provider run as failed. Change
the handler for json.JSONDecodeError/TypeError (and the non-list branch) to
raise a specific exception or return None (and update call sites) rather than
returning [], and stop silently defaulting to ATOMIC; update the code paths that
inspect the goal extraction result to treat None/exception as a provider
failure. Ensure references to response.content, the goals variable, and any code
that currently defaults to ATOMIC are updated to handle the new failure signal.

347-348: ⚠️ Potential issue | 🟠 Major

Guard response.content before calling .strip().

If provider returns None content, Line 347 raises AttributeError and aborts interactive flow.

💡 Suggested fix
-    updated = response.content.strip()
+    updated = (response.content or "").strip()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 347 - 348, The code calls
response.content.strip() without guarding for None; update the logic in the
block that sets updated to first validate response.content (the variable
response from the provider) before calling .strip(). Replace the direct call
with a safe pattern (e.g., treat None as empty string or check isinstance/None
and early-handle) so that updated is computed from a non-None string and the
subsequent length check against prd_content is safe; adjust any downstream
handling to account for the empty/None case.
codeframe/cli/app.py (1)

1717-1729: ⚠️ Potential issue | 🟠 Major

Fail fast when interactive PRD version creation fails.

In interactive mode, Line 1728 only warns and continues with exit code 0. That can mislead users into thinking their resolved ambiguities were persisted.

💡 Suggested fix
-        else:
-            console.print("[yellow]Warning:[/yellow] Failed to create new PRD version.")
+        else:
+            console.print("[red]Error:[/red] Failed to create new PRD version.")
+            raise typer.Exit(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 1717 - 1729, The interactive flow
currently treats a failed PRD version creation (when new_record is falsy) as a
warning, which continues execution with exit code 0; change this to fail fast by
aborting with a non-zero exit: in the branch where new_record is false (the
block that now calls console.print("[yellow]Warning...")), raise or call a
non-zero exit (e.g., raise RuntimeError or sys.exit(1)) with a clear message so
the CLI returns failure to the caller; locate the logic around new_record,
console.print, emit_for_workspace and stress_test_prd to implement this behavior
so unresolved persistence failures in interactive mode do not proceed silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/prd_stress_test.py`:
- Around line 176-177: The code assumes children is a list of dicts when cls ==
Classification.COMPOSITE and then calls .get on each child in
recursive_decompose; to fix, validate and normalize the payload before
recursion: in the block that sets children (and in the recursive_decompose
function), check that data.get("children") is a list and replace non-list or
None with an empty list, then filter the list to only include dict items (e.g.,
[c for c in children if isinstance(c, dict)]); if any invalid items are detected
optionally log a warning or ignore them so subsequent calls to child.get(...)
are safe. Ensure you update the assignment and any recursion points that rely on
children (the children variable and the recursive_decompose function) to use the
validated/filtered list.
- Around line 114-404: Import the core event API from core.events and instrument
major state transitions: call the event emitter at start/end of stress_test_prd,
after extract_goals returns (in extract_goals), when classify_and_decompose
detects an ambiguity (in classify_and_decompose), when recursive_decompose hits
max depth and when it finishes building a node (in recursive_decompose), and
when resolve_ambiguities_into_prd produces an updated PRD (in
resolve_ambiguities_into_prd); include meaningful event names (e.g.,
"prd.stress_test.started"/".completed", "prd.goals.extracted",
"prd.ambiguity.detected", "prd.node.leaf_forced", "prd.node.decomposed",
"prd.prd.updated") and attach contextual payloads (prd_title or snippet,
goal/title, node id/title, ambiguity id/details, complexity_hint, and whether
rewrite was truncated) so auditors can correlate events — use the event API
exported by core/events.py (e.g., import emit_event or the module’s emitter) and
ensure calls are non-blocking and tolerate emitter failures (log and continue).

---

Duplicate comments:
In `@codeframe/cli/app.py`:
- Around line 1717-1729: The interactive flow currently treats a failed PRD
version creation (when new_record is falsy) as a warning, which continues
execution with exit code 0; change this to fail fast by aborting with a non-zero
exit: in the branch where new_record is false (the block that now calls
console.print("[yellow]Warning...")), raise or call a non-zero exit (e.g., raise
RuntimeError or sys.exit(1)) with a clear message so the CLI returns failure to
the caller; locate the logic around new_record, console.print,
emit_for_workspace and stress_test_prd to implement this behavior so unresolved
persistence failures in interactive mode do not proceed silently.

In `@codeframe/core/prd_stress_test.py`:
- Around line 123-130: The current goal-extraction code treats JSON parse
failures as successful runs by returning an empty list (variable goals) and the
code later defaulting to ATOMIC; instead, surface parse failures so the caller
can mark the provider run as failed. Change the handler for
json.JSONDecodeError/TypeError (and the non-list branch) to raise a specific
exception or return None (and update call sites) rather than returning [], and
stop silently defaulting to ATOMIC; update the code paths that inspect the goal
extraction result to treat None/exception as a provider failure. Ensure
references to response.content, the goals variable, and any code that currently
defaults to ATOMIC are updated to handle the new failure signal.
- Around line 347-348: The code calls response.content.strip() without guarding
for None; update the logic in the block that sets updated to first validate
response.content (the variable response from the provider) before calling
.strip(). Replace the direct call with a safe pattern (e.g., treat None as empty
string or check isinstance/None and early-handle) so that updated is computed
from a non-None string and the subsequent length check against prd_content is
safe; adjust any downstream handling to account for the empty/None case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ea99d2b3-ce26-4784-be1d-df1020ec85bd

📥 Commits

Reviewing files that changed from the base of the PR and between efb9026 and 6052191.

📒 Files selected for processing (3)
  • codeframe/cli/app.py
  • codeframe/core/prd_stress_test.py
  • tests/core/test_prd_stress_test.py

Comment on lines +114 to +404
def extract_goals(prd_content: str, provider) -> list[str]:
"""Extract high-level deliverable goals from a PRD."""
response = provider.complete(
messages=[{"role": "user", "content": prd_content}],
purpose=Purpose.PLANNING,
system=GOAL_EXTRACTION_SYSTEM,
max_tokens=1024,
temperature=0.0,
)
try:
goals = json.loads(response.content)
if isinstance(goals, list):
return [str(g) for g in goals]
logger.warning("Goal extraction returned non-list: %s", type(goals).__name__)
except (json.JSONDecodeError, TypeError) as exc:
logger.warning("Failed to parse goal extraction response: %s", exc)
return []


def classify_and_decompose(
title: str,
description: str,
lineage: list[str],
prd_content: str,
depth: int,
provider,
) -> tuple[Classification, list[dict], Optional[Ambiguity], str]:
"""Classify a goal node and optionally decompose or flag ambiguity."""
lineage_ctx = ""
if lineage:
lineage_ctx = "\n\nAncestor context:\n" + "\n".join(
f"- {desc}" for desc in lineage
)

user_msg = (
f"Goal: {title}\n"
f"Description: {description}\n"
f"Depth: {depth}{lineage_ctx}\n\n"
f"PRD:\n{prd_content}"
)

response = provider.complete(
messages=[{"role": "user", "content": user_msg}],
purpose=Purpose.PLANNING,
system=CLASSIFY_AND_DECOMPOSE_SYSTEM,
max_tokens=2048,
temperature=0.0,
)

try:
data = json.loads(response.content)
except (json.JSONDecodeError, TypeError) as exc:
logger.warning("Failed to parse classification for '%s': %s", title, exc)
return Classification.ATOMIC, [], None, "Low"

raw_cls = data.get("classification", "atomic").lower()
try:
cls = Classification(raw_cls)
except ValueError:
cls = Classification.ATOMIC

complexity = data.get("complexity_hint", "Low")
children = data.get("children", []) if cls == Classification.COMPOSITE else []

ambiguity = None
if cls == Classification.AMBIGUOUS:
ambiguity = Ambiguity(
id=str(uuid.uuid4()),
label=data.get("ambiguity_label", "UNSPECIFIED"),
source_node_title=title,
questions=data.get("questions", []),
recommendation=data.get("recommendation", ""),
)

return cls, children, ambiguity, complexity


def recursive_decompose(
title: str,
description: str,
lineage: list[str],
prd_content: str,
depth: int,
max_depth: int,
ambiguities: list[Ambiguity],
provider,
) -> DecompositionNode:
"""Recursively decompose a goal, collecting ambiguities along the way."""
# Force leaf at max depth
if depth >= max_depth:
return DecompositionNode(
id=str(uuid.uuid4()),
title=title,
description=description,
classification=Classification.ATOMIC,
children=[],
lineage=lineage,
depth=depth,
complexity_hint="Unknown",
)

cls, child_dicts, ambiguity, complexity = classify_and_decompose(
title, description, lineage, prd_content, depth, provider,
)

if ambiguity:
ambiguities.append(ambiguity)

children = []
if cls == Classification.COMPOSITE:
child_lineage = lineage + [title]
for child_dict in child_dicts:
child_node = recursive_decompose(
child_dict.get("title", "Untitled"),
child_dict.get("description", ""),
child_lineage,
prd_content,
depth + 1,
max_depth,
ambiguities,
provider,
)
children.append(child_node)

return DecompositionNode(
id=str(uuid.uuid4()),
title=title,
description=description,
classification=cls,
children=children,
lineage=lineage,
depth=depth,
complexity_hint=complexity,
ambiguity_id=ambiguity.id if ambiguity else None,
)


# ---------------------------------------------------------------------------
# Renderers
# ---------------------------------------------------------------------------


def render_tech_spec(
tree: list[DecompositionNode], ambiguities: list[Ambiguity]
) -> str:
"""Render the decomposition tree as a markdown technical specification."""
amb_map = {a.id: i + 1 for i, a in enumerate(ambiguities)}
lines = ["# Technical Specification\n"]

for node in tree:
_render_spec_node(node, lines, amb_map, header_level=2)

return "\n".join(lines)


def _render_spec_node(
node: DecompositionNode,
lines: list[str],
amb_map: dict[str, int],
header_level: int,
) -> None:
"""Recursively render a node into the tech spec."""
prefix = "#" * min(header_level, 6)
lines.append(f"{prefix} {node.title}")

if node.classification == Classification.AMBIGUOUS:
amb_num = amb_map.get(node.ambiguity_id, "?") if node.ambiguity_id else "?"
lines.append(f"**[NEEDS CLARIFICATION — see ambiguity #{amb_num}]**\n")
elif node.classification == Classification.ATOMIC:
lines.append(f"- {node.description}")
if node.complexity_hint:
lines.append(f"- Estimated complexity: {node.complexity_hint}")
lines.append("")
else:
lines.append("")

for child in node.children:
_render_spec_node(child, lines, amb_map, header_level + 1)


def render_ambiguity_report(ambiguities: list[Ambiguity]) -> str:
"""Render the ambiguity list as a human-readable report."""
if not ambiguities:
return "No ambiguities found — PRD is well-specified for decomposition."

lines = [
f"PRD Stress Test — {len(ambiguities)} ambiguities found:\n",
]

for i, amb in enumerate(ambiguities, 1):
lines.append(
f"{i}. {amb.label} (from decomposing \"{amb.source_node_title}\")"
)
lines.append(" The PRD doesn't specify:")
for q in amb.questions:
lines.append(f" - {q}")
lines.append(f" → Recommendation: {amb.recommendation}")
if amb.resolved_answer:
lines.append(f" ✓ Resolved: {amb.resolved_answer}")
lines.append("")

return "\n".join(lines)


def resolve_ambiguities_into_prd(
prd_content: str,
ambiguities: list[Ambiguity],
provider,
) -> str:
"""Use LLM to update PRD content with resolved ambiguity answers."""
resolved = [a for a in ambiguities if a.resolved_answer]
if not resolved:
return prd_content

resolution_text = "\n".join(
f"- {a.label}: {', '.join(a.questions)} → Answer: {a.resolved_answer}"
for a in resolved
)

response = provider.complete(
messages=[{
"role": "user",
"content": (
f"Original PRD:\n{prd_content}\n\n"
f"Resolved ambiguities:\n{resolution_text}\n\n"
"Update the PRD to incorporate these answers."
),
}],
purpose=Purpose.PLANNING,
system=AMBIGUITY_RESOLUTION_SYSTEM,
max_tokens=8192,
temperature=0.0,
)
updated = response.content.strip()
if not updated or len(updated) < len(prd_content) // 2:
logger.warning(
"PRD rewrite looks truncated (%d chars vs original %d), returning original",
len(updated), len(prd_content),
)
return prd_content
return updated


# ---------------------------------------------------------------------------
# Top-level Orchestrator
# ---------------------------------------------------------------------------


def stress_test_prd(
prd_content: str, provider, max_depth: int = 3
) -> StressTestResult:
"""Run the full PRD stress test: extract goals → recursive decompose → render."""
goals = extract_goals(prd_content, provider)

tree: list[DecompositionNode] = []
ambiguities: list[Ambiguity] = []

for goal in goals:
node = recursive_decompose(
title=goal,
description=goal,
lineage=[],
prd_content=prd_content,
depth=0,
max_depth=max_depth,
ambiguities=ambiguities,
provider=provider,
)
tree.append(node)

tech_spec = render_tech_spec(tree, ambiguities)
amb_report = render_ambiguity_report(ambiguities)

# Extract title from PRD (first heading or first line)
prd_title = "Untitled"
for line in prd_content.splitlines():
stripped = line.strip()
if stripped.startswith("# "):
prd_title = stripped[2:].strip()
break
if stripped:
prd_title = stripped[:80]
break

return StressTestResult(
prd_title=prd_title,
tree=tree,
ambiguities=ambiguities,
tech_spec_markdown=tech_spec,
ambiguity_report=amb_report,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add core event emission for stress-test state transitions.

This module performs stateful transitions (goal extraction, ambiguity detection/resolution, decomposition completion) but emits no core events, which weakens auditability/observability.

As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 114 - 404, Import the core
event API from core.events and instrument major state transitions: call the
event emitter at start/end of stress_test_prd, after extract_goals returns (in
extract_goals), when classify_and_decompose detects an ambiguity (in
classify_and_decompose), when recursive_decompose hits max depth and when it
finishes building a node (in recursive_decompose), and when
resolve_ambiguities_into_prd produces an updated PRD (in
resolve_ambiguities_into_prd); include meaningful event names (e.g.,
"prd.stress_test.started"/".completed", "prd.goals.extracted",
"prd.ambiguity.detected", "prd.node.leaf_forced", "prd.node.decomposed",
"prd.prd.updated") and attach contextual payloads (prd_title or snippet,
goal/title, node id/title, ambiguity id/details, complexity_hint, and whether
rewrite was truncated) so auditors can correlate events — use the event API
exported by core/events.py (e.g., import emit_event or the module’s emitter) and
ensure calls are non-blocking and tolerate emitter failures (log and continue).

Comment thread codeframe/core/prd_stress_test.py Outdated
Guard against malformed LLM output in classify_and_decompose by
filtering children to only dicts with title or description keys.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
codeframe/core/prd_stress_test.py (3)

123-130: ⚠️ Potential issue | 🟠 Major

Fail loud on unparseable model output instead of silently “succeeding.”

Line 123 and Line 165 currently degrade parse failures to empty goals / ATOMIC, which can produce false-clean stress-test results.

Suggested fix
 def extract_goals(prd_content: str, provider) -> list[str]:
@@
-    try:
-        goals = json.loads(response.content)
+    try:
+        goals = json.loads(response.content)
         if isinstance(goals, list):
             return [str(g) for g in goals]
-        logger.warning("Goal extraction returned non-list: %s", type(goals).__name__)
+        raise ValueError(f"Goal extraction returned {type(goals).__name__}, expected list")
     except (json.JSONDecodeError, TypeError) as exc:
-        logger.warning("Failed to parse goal extraction response: %s", exc)
-    return []
+        raise ValueError(f"Failed to parse goal extraction response: {exc}") from exc

@@
-    try:
-        data = json.loads(response.content)
-    except (json.JSONDecodeError, TypeError) as exc:
-        logger.warning("Failed to parse classification for '%s': %s", title, exc)
-        return Classification.ATOMIC, [], None, "Low"
+    try:
+        data = json.loads(response.content)
+    except (json.JSONDecodeError, TypeError) as exc:
+        raise ValueError(f"Failed to parse classification for '{title}': {exc}") from exc

Also applies to: 165-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 123 - 130, The code currently
swallows parse failures and non-list outputs by logging a warning and returning
an empty list (or falling back to "ATOMIC") which hides real failures; instead,
when json.loads(response.content) raises JSONDecodeError/TypeError or when the
decoded value is not a list, raise an explicit exception (e.g., ValueError or
RuntimeError) that includes the raw response.content and the parse error/type
info, so callers fail loudly; update both the block around the
json.loads(response.content)/logger.warning non-list check and the similar
fallback at lines 165-167 to raise with context rather than returning
empty/ATOMIC.

114-122: ⚠️ Potential issue | 🟠 Major

Add core event emission for stress-test state transitions.

This module performs multiple state transitions (start/completion, goal extraction outcomes, ambiguity detection, recursion cutoffs, PRD rewrite outcomes) but emits no core/events.py events.

As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".

Also applies to: 155-161, 196-253, 323-359, 367-409

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 114 - 122, This module
currently performs many state transitions but doesn't emit observability events;
import the core/events emitter (e.g. from core.events import emit_event) and add
well-named event emissions around each state change: emit a
"prd_stress_test.started" event at the beginning of the top-level stress-test
runner and "prd_stress_test.completed" on exit (include duration/status), emit
"prd_goals.extracted" or "prd_goals.failed" in extract_goals with metadata (prd
identifier, goals count or error), emit "prd_ambiguity.detected" where ambiguity
logic runs, emit "prd_recursion.cutoff" when recursion limits trigger, and emit
"prd_rewrite.result" for rewrite outcomes; include relevant context (prd id,
timestamps, error/message) and ensure these calls are added in the functions
handling those transitions (e.g., extract_goals and the
ambiguity/recursion/rewrite handlers referenced in the diff ranges).

352-353: ⚠️ Potential issue | 🟠 Major

Handle nullable provider content before .strip().

Line 352 assumes response.content is always a string; None will raise AttributeError.

Suggested fix
-    updated = response.content.strip()
+    updated = (response.content or "").strip()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 352 - 353, The code assumes
response.content is a non-null string before calling .strip(), which will raise
AttributeError if it's None; update the logic around response and updated (used
with prd_content) to first handle a nullable response.content (e.g., treat None
as empty string or explicitly check for None) before calling .strip(), then
apply the existing length check comparing updated to prd_content; ensure you
modify the block that sets updated and the subsequent if that references
prd_content to use the safe/normalized updated value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/prd_stress_test.py`:
- Around line 163-173: The code assumes parsed JSON `data` is a dict and that
`data.get("classification")` returns a string before calling `.lower()`, which
can fail if the model returns an array or non-string; update the block around
`data = json.loads(response.content)` / `raw_cls = data.get("classification",
"atomic").lower()` to first ensure `data` is a dict and that `classification` is
a str (e.g., check isinstance(data, dict) and isinstance(cls_val, str)); if
those checks fail, fall back to "atomic" (or the default) so
`Classification(raw_cls)` never receives invalid input, and keep the existing
error handling using `Classification.ATOMIC` when ValueError occurs.

---

Duplicate comments:
In `@codeframe/core/prd_stress_test.py`:
- Around line 123-130: The code currently swallows parse failures and non-list
outputs by logging a warning and returning an empty list (or falling back to
"ATOMIC") which hides real failures; instead, when json.loads(response.content)
raises JSONDecodeError/TypeError or when the decoded value is not a list, raise
an explicit exception (e.g., ValueError or RuntimeError) that includes the raw
response.content and the parse error/type info, so callers fail loudly; update
both the block around the json.loads(response.content)/logger.warning non-list
check and the similar fallback at lines 165-167 to raise with context rather
than returning empty/ATOMIC.
- Around line 114-122: This module currently performs many state transitions but
doesn't emit observability events; import the core/events emitter (e.g. from
core.events import emit_event) and add well-named event emissions around each
state change: emit a "prd_stress_test.started" event at the beginning of the
top-level stress-test runner and "prd_stress_test.completed" on exit (include
duration/status), emit "prd_goals.extracted" or "prd_goals.failed" in
extract_goals with metadata (prd identifier, goals count or error), emit
"prd_ambiguity.detected" where ambiguity logic runs, emit "prd_recursion.cutoff"
when recursion limits trigger, and emit "prd_rewrite.result" for rewrite
outcomes; include relevant context (prd id, timestamps, error/message) and
ensure these calls are added in the functions handling those transitions (e.g.,
extract_goals and the ambiguity/recursion/rewrite handlers referenced in the
diff ranges).
- Around line 352-353: The code assumes response.content is a non-null string
before calling .strip(), which will raise AttributeError if it's None; update
the logic around response and updated (used with prd_content) to first handle a
nullable response.content (e.g., treat None as empty string or explicitly check
for None) before calling .strip(), then apply the existing length check
comparing updated to prd_content; ensure you modify the block that sets updated
and the subsequent if that references prd_content to use the safe/normalized
updated value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 93dba181-568a-47ff-af81-a36691e60f6e

📥 Commits

Reviewing files that changed from the base of the PR and between 6052191 and f5636a1.

📒 Files selected for processing (1)
  • codeframe/core/prd_stress_test.py

Comment on lines +163 to +173
try:
data = json.loads(response.content)
except (json.JSONDecodeError, TypeError) as exc:
logger.warning("Failed to parse classification for '%s': %s", title, exc)
return Classification.ATOMIC, [], None, "Low"

raw_cls = data.get("classification", "atomic").lower()
try:
cls = Classification(raw_cls)
except ValueError:
cls = Classification.ATOMIC
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard JSON shape before using .get() / .lower().

If the model returns a JSON array or a non-string classification, this block can fail or behave unpredictably.

Suggested fix
-    try:
-        data = json.loads(response.content)
+    try:
+        data = json.loads(response.content)
+        if not isinstance(data, dict):
+            raise TypeError(f"Expected object, got {type(data).__name__}")
     except (json.JSONDecodeError, TypeError) as exc:
         logger.warning("Failed to parse classification for '%s': %s", title, exc)
         return Classification.ATOMIC, [], None, "Low"

-    raw_cls = data.get("classification", "atomic").lower()
+    raw_cls = data.get("classification", "atomic")
+    if not isinstance(raw_cls, str):
+        raw_cls = "atomic"
+    raw_cls = raw_cls.lower()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/prd_stress_test.py` around lines 163 - 173, The code assumes
parsed JSON `data` is a dict and that `data.get("classification")` returns a
string before calling `.lower()`, which can fail if the model returns an array
or non-string; update the block around `data = json.loads(response.content)` /
`raw_cls = data.get("classification", "atomic").lower()` to first ensure `data`
is a dict and that `classification` is a str (e.g., check isinstance(data, dict)
and isinstance(cls_val, str)); if those checks fail, fall back to "atomic" (or
the default) so `Classification(raw_cls)` never receives invalid input, and keep
the existing error handling using `Classification.ATOMIC` when ValueError
occurs.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

PR #446 Follow-up Review

Building on my prior reviews — the earlier issues (patch target, type hints, silent empty-goals, 2000-char truncation) are still present. Two additional bugs found in the latest diff that were not covered before:

Bug: prd_module.store() called with swapped content/title args in tests

In tests/core/test_prd_stress_test.py, both setUp calls pass arguments in the wrong order. The actual prd.store() signature is store(workspace, content, title=None, ...) but the tests call it as store(workspace, 'Test PRD', sample_prd, {}). This stores the literal string 'Test PRD' as the PRD body and sample_prd as the title. Tests pass today because they only validate rendered output, not the stored record fields — but any assertion on record.content will silently return the wrong value. Fix: swap to store(workspace, sample_prd, 'Test PRD', {}).

Bug: output.write_text() missing encoding parameter

output.write_text(result.tech_spec_markdown) at line 151 uses the platform default encoding. On Windows this is typically cp1252, not UTF-8. The tech spec contains Unicode characters (arrows, checkmarks). prd.py's export_to_file already passes encoding='utf-8' — this write_text call should match that convention.

Minor: child validation uses OR and can silently produce Untitled nodes

The child filter uses ("title" in c or "description" in c). A dict with only a description key passes through and gets title = child_dict.get('title', 'Untitled') downstream. Using AND instead of OR is more defensive and matches the LLM prompt contract which specifies both keys are required.

@frankbria frankbria merged commit 933ffcc into main Mar 16, 2026
13 checks passed
@frankbria frankbria deleted the feature/issue-421-prd-stress-test branch March 24, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 5] Recursive Decomposition as PRD Stress Test and Technical Spec Generator

1 participant