Skip to content

refactor: remove planning module and repair system#122

Merged
spinje merged 3 commits into
mainfrom
refactor/remove-planning-and-repair
Mar 16, 2026
Merged

refactor: remove planning module and repair system#122
spinje merged 3 commits into
mainfrom
refactor/remove-planning-and-repair

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Mar 16, 2026

Summary

Remove the gated planning module (~7,300 production lines, ~23,000 test lines) and repair system (~1,050 production lines). Replace PocketFlow-based discovery nodes with plain functions. Drop direct anthropic SDK dependency. Fix two pre-existing runtime bugs in error reporting exposed during the refactor.

Net impact: ~42,000 lines removed. 4,070 tests pass, 0 skipped from planning/repair.

Task

Task 92: Remove Planning Module and Repair System

Changes

Phase 1: Relocate active dependencies

  • parse_structured_responsecore/llm_utils.py
  • smart_filter.pyregistry/smart_filter.py
  • build_planning_context + build_nodes_contextregistry/context_builder.py
  • build_workflows_contextcore/workflow/context.py
  • Prompt loader → core/prompt_utils.py
  • WorkflowDiscoveryNodecore/workflow/discovery.py:discover_workflow() (plain function + WorkflowMatch dataclass)
  • ComponentBrowsingNoderegistry/discovery.py:discover_components() (plain function + ComponentSelection dataclass)
  • Removed install_anthropic_model monkey-patch, dropped anthropic>=0.75 direct dep

Phase 2: Delete and clean

  • Deleted src/pflow/planning/ (32 files), repair files, 68+ test files
  • Simplified workflow_execution.py from 642 → 83 lines (validate→execute→return)
  • Removed 654 lines from cli/main.py (13 planner functions, 7 CLI flags)
  • Removed planner metrics, is_planner threading, __modified_nodes__ tracking
  • Renamed: CriticalPlanningErrorCriticalDiscoveryError, build_planning_contextbuild_component_context
  • Updated all CLAUDE.md files, architecture docs, agent definitions, user-facing docs

Bug fixes (pre-existing, exposed by refactor)

  • API warnings (__warnings__) now surfaced as actionable error messages instead of generic "Workflow failed with action: error"
  • MCP "error": null no longer produces "None" string in error output
  • Nested MCP data.error (Slack/Discord style) now correctly unwrapped
  • Added 3 regression tests protecting these fixes

Explanation

The planning module was gated since Task 107 (markdown format migration) with 516 skipped tests and unreachable code paths. Rather than maintaining dormant code, we removed it cleanly. The few active features that depended on planning code (discovery commands, context builder, structured response parser) were relocated to their natural homes.

Discovery features (pflow workflow discover, pflow registry discover) were preserved as plain functions with typed dataclass returns — simpler, more testable, and with explicit contracts vs the PocketFlow node ceremony.

The anthropic SDK was only used by planning's custom Anthropic wrapper for prompt caching and thinking tokens. The llm-anthropic plugin natively supports schema= for structured output, which is all discovery needs.

Created Docs

  • .taskmaster/tasks/task_92/task-92.md — updated task spec reflecting what was actually implemented
  • .taskmaster/tasks/task_92/task-review.md — post-implementation review with integration points, patterns, and AI agent guidance
  • .taskmaster/tasks/task_92/implementation/progress-log.md — detailed implementation journal with decisions, corrections, and learnings

Testing

  • make test: 4,070 passed, 0 skipped, 0 failed
  • make check: all green (ruff, mypy, deptry)
  • Manual integration testing: 27 scenarios across execution, error handling, CLI flags, registry, workflow management, pipes, batch, traces, MCP server
  • Zero pflow.planning references remain in src/ or tests/

spinje added 3 commits March 16, 2026 12:00
Move actively-used code out of planning/ module to their natural homes:

- parse_structured_response → core/llm_utils.py
- smart_filter.py → registry/smart_filter.py
- build_planning_context + build_nodes_context → registry/context_builder.py
- build_workflows_context → core/workflow/context.py
- prompt loader → core/prompt_utils.py

Replace PocketFlow discovery nodes with plain functions:
- WorkflowDiscoveryNode → core/workflow/discovery.py:discover_workflow()
- ComponentBrowsingNode → registry/discovery.py:discover_components()

Remove install_anthropic_model monkey-patch and drop direct anthropic>=0.75
dependency (llm-anthropic provides it transitively).

Deprecation notices added to planning/ source files pending Phase 2 deletion.
Phase 2 of planning removal — deletes src/pflow/planning/ (~7,300 lines),
repair system (~1,050 lines), and all associated tests (~23,000 lines).

Code cleanup:
- Simplified workflow_execution.py from 642 to 83 lines (validate→execute→return)
- Removed 654 lines from cli/main.py (13 planner functions, 7 CLI flags)
- Removed planner metrics fields, is_planner threading, __modified_nodes__ tracking
- Removed _is_fixable_error(), repaired_workflow_ir, PlannerError, update_ir()
- Renamed CriticalPlanningError→CriticalDiscoveryError, build_planning_context→build_component_context

Bug fixes (pre-existing, exposed by refactor):
- API warnings (__warnings__) now surfaced as actionable error messages instead
  of generic "Workflow failed with action: error"
- MCP "error": null no longer produces "None" string in error output
- Nested MCP data.error (Slack/Discord style) now correctly unwrapped

Documentation: Updated all CLAUDE.md files, architecture docs, agent definitions,
user-facing docs, scripts. Removed natural language mode from CLI docs.

Tests: 4,070 pass, 0 skipped from planning/repair. Added 3 regression tests
for the error formatting bug fixes.
…iew]

- Tightened CLI test assertions to verify post-planner direct execution
- Added explicit validation short-circuit assertion in workflow_execution test
- Added degraded-success formatter coverage (2 tests)
- Updated 5 CLAUDE.md files: build_planning_context → build_component_context
- Removed stale DEP003 anthropic exclusion (no longer imported)
- Updated progress log
@claude
Copy link
Copy Markdown

claude Bot commented Mar 16, 2026

Code Review — PR #122: Remove Planning Module and Repair System

Overall Assessment: This is an excellent cleanup PR. Removing ~42,000 lines of dead code, simplifying workflow_execution.py from 642 to 83 lines, and preserving active features as plain functions with typed returns is exactly the right approach. The bug fixes for MCP null-error handling and API warning surfacing are well-motivated and well-tested.


Critical — must fix before merge

None.


Warnings — should be addressed

1. context_builder.py is 900 lines — largest new file in the PR

src/pflow/registry/context_builder.py was relocated from planning/context_blocks.py and appears to have been moved largely as-is. While this is fine for a relocation, it's worth noting this is a large module with many internal functions. Not a blocker for this PR (it was already this size), but worth flagging for future decomposition.

2. parse_structured_response silently falls back on Pydantic validation failure

In src/pflow/core/llm_utils.py:62-66:

except Exception as e:
    # If validation fails, log and return raw result
    logger.warning(f"Failed to validate result through {expected_type.__name__}: {e}")
    return result

This silently returns unvalidated data when Pydantic validation fails. The callers (discover_workflow, discover_components, smart_filter) all rely on specific field names being present. If the LLM returns malformed data that passes JSON parsing but fails Pydantic validation, the caller will get a dict missing expected keys and fail with a confusing KeyError downstream.

Consider either:

  • Raising a ValueError so the caller can handle it explicitly, or
  • At minimum, logging at error level rather than warning

3. anthropic dependency change could be clearer

pyproject.toml removes anthropic>=0.75 as a direct dependency but adds a DEP003 exception for it as a transitive dep. The comment says "used by llm-anthropic plugin at runtime." This is correct behavior, but if a future llm-anthropic version drops or changes the transitive dep, pflow would break silently at runtime. The current approach is fine — just worth documenting in a comment near the deptry exception that the transitive dep is load-bearing.


Suggestions — optional improvements

1. discover_components returns empty string on context-builder error

In src/pflow/registry/discovery.py:103-106:

if isinstance(component_context, dict) and "error" in component_context:
    logger.warning(...)
    component_context_str = ""

Returning an empty string when build_component_context fails means the MCP server will return empty content to the AI agent with no indication of what went wrong. Consider including the error message in the return or raising so the caller can decide.

2. Bug fix tests are high quality

The three regression tests in test_api_warning_system.py (TestErrorFormattingSurfacesWarnings) are excellent — they test the actual bug scenario end-to-end through WorkflowExecutorService._build_error_list rather than mocking internals. The test names and docstrings clearly explain what bug each test protects against. This is exactly the right testing pattern.

3. workflow_execution.py simplification is clean

The reduction from 478 lines with repair loops to 18 lines of validate-execute-return is a model cleanup. The resume_state parameter removal is correct since it was only used by the repair loop.

4. Consider: WorkflowMatch.workflow typing

WorkflowMatch.workflow is typed as Optional[dict[str, Any]] — consider whether a more specific type (e.g., a TypedDict or reference to the WorkflowManager's return type) would help callers. Low priority since this is internal API.


Summary

Strong PR. The approach of relocating active code to natural homes, converting PocketFlow nodes to plain functions, and removing the entire planning/repair subsystem is well-executed. The bug fixes are well-motivated with clear regression tests. The MCP server discovery service simplification from PocketFlow ceremony to direct function calls is particularly clean.

The only actionable item I'd recommend addressing before merge is the silent fallback in parse_structured_response (Warning #2).

@spinje spinje merged commit d8a70ed into main Mar 16, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant