fix(brain): JSON-envelope verdict/decision transport for decide, orient & merge-judge (#2421, #2428, #2429, #2430, #2435, #2462, #2463) by rysweet · Pull Request #2486 · rysweet/Simard

rysweet · 2026-06-28T08:46:48Z

Summary

Recipe-produced fix for the recipe-brain VERDICT/DECISION parse-failure cluster (the #2419 family). The recipe completed the implementation but hollow-failed at the PR-creation step; landing the verified work here (rebased onto current main, only a trivial mkdocs nav conflict resolved).

Makes the merge-readiness-judge / decide / orient recipe brains request and robustly parse a STRUCTURED (JSON-envelope) verdict/decision instead of the recipe-runner TEXT-MODE banner. On a parse miss: schema-repair / re-prompt, then escalate via the existing confidence-gated ladder, and only fall back deterministically and LOUDLY after exhausting it. simard merge-pr now surfaces a real PASS/FAIL/BLOCKED verdict and fails closed when no verdict is produced (never silent SUCCESS). Verdict-parse success rate is instrumented (#2429).

Changes

src/ooda_brain/recipe_brain.rs, src/stewardship/recipe_merge_judge.rs, src/ooda_brain/{judgment_record,parse_failure,mod}.rs — JSON-envelope transport + schema-repair + escalation + fail-closed verdict.
prompt_assets/simard/recipes/{merge-readiness-judge,ooda-decide,ooda-orient}.yaml — structured-output prompts.
Tests: tests/recipe_brain_verdict_assets.rs, gadugi merge-judge-verdict + decide-orient-brain-parse.
Docs: verdict-parsing reference + diagnose how-tos.

Issues

Closes #2421, #2428, #2429, #2430, #2435, #2462, #2463

Test plan

Unit + content-pin tests green; pre-push (fmt + targeted release tests + clippy --all-targets --locked) passed locally.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…nt & merge-judge (#2421, #2428, #2429) The recipe-backed decide, orient, and merge-judge brains read recipe-runner-rs DEFAULT text-mode stdout — a summary banner ("Recipe: <name> ... SUCCESS ...") — instead of the agent's real output. Consequences: - merge-judge: banner has no ready/not_ready/unclear token, so every `simard merge-pr` aborted with "no verdict keyword found" (#2428/#2430/#2435/ #2462/#2463). - decide: banner first word `Recipe:` matched no action → silent `advance_goal` every cycle, ignoring the LLM (#2421). - orient: banner timing `(0.0s)` was scraped as a valid in-range decimal → urgency actively corrupted to 0.0 (#2421). Apply the shipped engineer-lifecycle pattern (#2419/#2432) to all three: - Invoke `recipe-runner-rs --output-format json` and extract the final step output from the envelope (extract_recipe_decision_output, now pub(crate) and reused by all phases) before parsing the real agent output. - Generalize the confidence-gated escalation ladder into a shared run_brain_ladder<D> backbone; run_escalation_ladder (lifecycle) is now a thin wrapper delegating to it — zero behavioral drift (existing ladder tests pass unchanged). decide/orient/merge ride the same ladder via phase-specific invoke/parse/default/note closures. - On a parse-miss, climb the ladder (schema-repair → high-effort re-prompt), then fall back LOUDLY: decide → AdvanceGoal, orient → deterministic floor, merge-judge → fail-closed Verdict::Unclear (never SUCCESS-without-verdict, never fail-open to Ready). Genuine spawn/exit/envelope-decode failures still surface as Err. - New parse-outcome variants parse_action_outcome / parse_orient_outcome / parse_merge_outcome distinguish a real decision from a default, so a defaulted row is never mistaken for a genuine advance_goal / floor / verdict. - merge-judge parse_merge_outcome tries the structured {"verdict":...} JSON (parse_judge_response, populating blockers) first, then the prose keyword scanner as a fallback. Instrument the class (#2429): shared brain_verdict_parsed_total{phase,outcome} counter emitted on BOTH the parsed and the defaulted branch (no-op under cfg!(test)), giving parse_success_rate{phase} a denominator across decide/orient/merge_judge. BrainPhase gains a MergeJudge variant (serde lowercase→snake_case keeps act/decide/orient byte-stable). Recipes: add the additive {{escalation_note}} placeholder (empty on the base attempt) to ooda-decide.yaml, ooda-orient.yaml, merge-readiness-judge.yaml, mirroring ooda-engineer-lifecycle.yaml. recipe-runner-rs renders a missing var as empty, so every other invocation path is unaffected. Tests: parse-outcome classification, escalation-note builders, shared-metric context, generic run_brain_ladder over an arbitrary decision type, merge fail-closed + JSON-envelope extraction + prose fallback (issue_2419_family_phase_tests, issue_2421_tests, issue_2428_tests, issue_2428_production_tests, tests/recipe_brain_verdict_assets.rs), plus two outside-in gadugi scenarios. Docs updated to the shipped design (durable reference/how-to, mkdocs --strict clean). Output contracts (DecideJudgment/OrientJudgment/JudgeOutcome, trait signatures, merge CLI verdict surface) preserved. Closes #2421 Closes #2428 Closes #2430 Closes #2435 Closes #2429 Closes #2462 Closes #2463 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ride seam The recipe_brain and recipe_merge_judge `resolve_recipe_path` helpers consult the hot-reload path `~/.simard/prompt_assets/simard/recipes/<recipe>` before the in-tree path, but read it through `dirs::home_dir()` with no override. Their constructor / path-resolution unit tests therefore trusted the ambient HOME: in any environment with a populated `~/.simard/prompt_assets` (e.g. a developer or agent box running the live daemon), 6 tests flipped red even though the production code is correct — - recipe_brain: resolve_recipe_path_returns_none_for_nonexistent_repo, resolve_recipe_path_finds_in_tree_recipe, and the three new_returns_none_when_{decide,orient,lifecycle}_recipe_missing constructor tests resolved the polluted hot-reload recipe instead of None / the in-tree file. - recipe_merge_judge: new_returns_none_when_recipe_missing likewise saw the ambient merge-readiness-judge.yaml and built a Some(judge). Adopt the `home_override: Option<&Path>` seam already shipped on the sibling `disk_health::resolve_recipe_path` and `brain_introspection::resolve_recipe_path`: - Thread `home_override` through both `resolve_recipe_path` helpers (`home_override.map(PathBuf::from).or_else(dirs::home_dir)`). - Keep the public `new()` constructors byte-stable; production still passes `None`. Add a private `new_with_home(.., home_override)` seam that `new` delegates to, so the 3 production call sites in daemon/brains.rs and the merge_judge dispatcher are untouched. - Point the affected unit tests at an isolated temp home so they no longer read `~/.simard`. The three *_uses_correct_recipe_filename tests (which only passed by luck because the hot-reload path also contained the filename substring) are made deterministic too. No production behavior change: the hot-reload-first → in-tree fallback order is identical when home_override is None. Full lib suite green (6223 passed) in a polluted environment that previously showed 6 failures; fmt/clippy/mkdocs --strict clean. Docs (recipe-brain-api, unified-recipe-brain) updated to the new signature and home_override convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…development-fix-the-recipe-brain-verdictdecis # Conflicts: # mkdocs.yml

github-actions · 2026-06-28T08:51:23Z

📊 Coverage Summary

Generated by cargo llvm-cov --workspace --summary-only (nightly, excluding test files)

Module	Lines	Covered	Coverage
Total	111556	93453	83.8%

_{Coverage data from CI run. Test files matching tests?/ are excluded from line counts.}

…ture (#2501, #2555) (#2565) #2501 and #2555 are duplicates: `simard merge-pr` aborted with `no verdict keyword (ready/not_ready/unclear) found in recipe output; raw="Recipe: merge-readiness-judge ... SUCCESS ..."`. The root cause — the agent step's stdout was not surfaced to the verdict extractor, only the recipe-runner text-mode SUCCESS banner — was already fixed by #2486 (`--output-format json` + `extract_recipe_decision_output`) and #2496/#2504 (strip the Copilot launcher preamble at the shared chokepoint). The residual reports came from a stale deployed binary (simard 0.22.0 vs repo 0.24.0); the diagnose howto already predicts that "pre-fix binary" symptom. Lock the real wire format so a regression cannot silently reintroduce the abort: - src/stewardship/recipe_merge_judge.rs: new `issue_2501_2555_real_envelope_tests` drives the PRODUCTION `extract_recipe_decision_output` on the exact `recipe-runner-rs 0.3.6 --output-format json --agent-binary copilot` envelope, where the launcher preamble (`ℹ … NODE_OPTIONS=… (saved preference)…`) is prepended to the verdict inside `step_results[].output`. Asserts: `ready` recovered (Parsed); `not_ready` keeps structured blockers; prose falls back to the keyword verdict with the preamble stripped from the rationale; preamble-only output fails closed to `unclear` (never a false `ready`); a text banner surfaces as a loud infra decode error. - tests/gadugi/merge-judge-verdict.sh: steps (d)/(e) reproduce the preamble wire format deterministically (no LLM) and run the new module end-to-end. - docs/howto/diagnose-merge-pr-verdict-parse-failures.md: document the preamble handling and the regression pin. No production code change — the capture fix already shipped; this pins the real 0.3.6 wire format against regression and dedupes the two reports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…erdict parse-miss (#2569) (#2600) The progress-evidence gate previously FAILED OPEN (Accept) whenever the LLM/recipe reviewer produced no recognizable accept/reject verdict — even on a successful, non-empty run — letting hallucinated "0%->100% with no verdict" bumps land as false "done" states. Adopt the merge judge's infra-vs-semantic split across both live tiers (RecipeProgressChecker primary + LlmReviewerProgressChecker direct-LLM fallback): - INFRA failure (transport error / spawn failure / non-zero exit / output that strips to empty) -> keep fail-OPEN, so goals aren't blocked on infra hiccups. - SEMANTIC parse-miss (successful, non-empty response with no verdict, or an unknown verdict string) -> fail-CLOSED (Reject). Reject only keeps the prior percent + logs a hallucination alert; it does not stall the goal. Also parse the structured {"verdict": ...} JSON first in the recipe tier (was a naive substring scan that wrongly Rejected an `accept` whose rationale mentioned "reject", and could Accept "unacceptable"), reusing the direct-LLM tier's tolerant parse_reviewer_response — matching the merge judge. The merge path reported in #2569 was already fail-closed on main (#2486/#2490/#2504, merged before #2569 was filed against released v0.22.0); this adds a regression test pinning the reporter's exact SUCCESS-with-no-verdict banners (30s AND 102s) -> Verdict::Unclear, and fixes the analogous fail-open in the progress path. Docs (progress-evidence-api, text-parsing-wire-formats §2a, progress-evidence-gating, text-based-brain-protocol) updated to the new policy and corrected to reflect that progress_reviewer.rs is the live direct-LLM fallback tier (not deleted). Adds a gadugi outside-in scenario plus parser/decision regression tests. Closes #2569 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Simard Engineer and others added 3 commits June 28, 2026 07:14

Merge remote-tracking branch 'origin/main' into feat/issue-2419-goal-…

0d62381

…development-fix-the-recipe-brain-verdictdecis # Conflicts: # mkdocs.yml

rysweet merged commit 247c193 into main Jun 28, 2026
15 checks passed

This was referenced Jul 4, 2026

fix(distill): recover facts+procedures from a pretty runner envelope behind the 1.0.66-2 launch banner (#2517) #2571

Merged

fix(goal_curation): progress-evidence gate fails closed on semantic verdict parse-miss (#2569) #2600

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(brain): JSON-envelope verdict/decision transport for decide, orient & merge-judge (#2421, #2428, #2429, #2430, #2435, #2462, #2463)#2486

fix(brain): JSON-envelope verdict/decision transport for decide, orient & merge-judge (#2421, #2428, #2429, #2430, #2435, #2462, #2463)#2486
rysweet merged 3 commits into
mainfrom
feat/issue-2419-goal-development-fix-the-recipe-brain-verdictdecis

rysweet commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rysweet commented Jun 28, 2026

Summary

Changes

Issues

Test plan

Uh oh!

github-actions Bot commented Jun 28, 2026

📊 Coverage Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant