feat: add fact-checking for implementation plan file claims (#68) (#68)#72
Merged
feat: add fact-checking for implementation plan file claims (#68) (#68)#72
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The implementation planner could hallucinate claims about file contents (e.g., "a
postbuildscript already exists") causing the code-writer to skip critical changes. This PR addresses the root cause by (1) mandating that the planner agent reads every source file before referencing it, and (2) adding afileExistenceCheckwarning toPlanningToImplementationGateso the pipeline surfaces missing file references at plan-validation time rather than silently continuing.Closes #68
Changes
src/agents/templates/implementation-planner.md: Added a mandatory rule requiring the agent to read every source file before making claims about its contents or structure; updated theTool Permissionsentry to mark file reads as required.src/core/phase-gate.ts: ExtendedPlanningToImplementationGate.validate()to importaccessfromnode:fs/promisesand check each file path listed across all plan tasks againstcontext.worktreePath, emitting a warning (not an error) for any path that does not exist on disk.tests/phase-gate.test.ts: Added four new test cases covering the new file-existence warning behaviour: all files present (pass), all files missing (warn), mixed present/missing (warn only for missing), and single missing file scenario.tests/implementation-planner-template.test.ts: Added six new test assertions verifying that the updated template contains aRulessection, the mandatoryMUST read every source fileinstruction, the "before making any claims" language, and thatTool Permissionsmarks the read as required.Implementation Details
The file-existence check uses a
try/catcharoundfs/promises.access()for each resolved path — consistent with how Node.js file presence is tested without stat overhead. Warnings are appended to the existingwarnings[]array, and the gate returns'warn'status only when there are no errors, preserving the existingfail/passcontract. No new dependencies were introduced.Testing
PlanningToImplementationGatefile-existence tests passnpx vitest run) passes with exit code 0Integration Verification
Notes
Cadre Process Challenges
makeContexthelper intests/phase-gate.test.tsonly accepted a single directory argument; existing tests constructed a context withworktreePathequal totempDir. The code-writer had to updatemakeContextcalls to accept a secondworktreeDirargument, which required updating existing passing tests, not just adding new ones — a mild friction point since the implementation plan did not explicitly call this out.tempDirasworktreePath, so the code-writer discovered the need to update callers only after reading the actual test file. More thorough scout output (enumerating existingmakeContextcall sites) would have prevented this surprise.makeContextcallers. Scout reports that enumerate specific function signatures and call sites in test files would eliminate this category of surprise.Closes #68