Skip to content

fix(brain): JSON-envelope verdict/decision transport for decide, orient & merge-judge (#2421, #2428, #2429, #2430, #2435, #2462, #2463)#2486

Merged
rysweet merged 3 commits into
mainfrom
feat/issue-2419-goal-development-fix-the-recipe-brain-verdictdecis
Jun 28, 2026
Merged

fix(brain): JSON-envelope verdict/decision transport for decide, orient & merge-judge (#2421, #2428, #2429, #2430, #2435, #2462, #2463)#2486
rysweet merged 3 commits into
mainfrom
feat/issue-2419-goal-development-fix-the-recipe-brain-verdictdecis

Conversation

@rysweet

@rysweet rysweet commented Jun 28, 2026

Copy link
Copy Markdown
Owner

Summary

Recipe-produced fix for the recipe-brain VERDICT/DECISION parse-failure cluster (the #2419 family). The recipe completed the implementation but hollow-failed at the PR-creation step; landing the verified work here (rebased onto current main, only a trivial mkdocs nav conflict resolved).

Makes the merge-readiness-judge / decide / orient recipe brains request and robustly parse a STRUCTURED (JSON-envelope) verdict/decision instead of the recipe-runner TEXT-MODE banner. On a parse miss: schema-repair / re-prompt, then escalate via the existing confidence-gated ladder, and only fall back deterministically and LOUDLY after exhausting it. simard merge-pr now surfaces a real PASS/FAIL/BLOCKED verdict and fails closed when no verdict is produced (never silent SUCCESS). Verdict-parse success rate is instrumented (#2429).

Changes

  • src/ooda_brain/recipe_brain.rs, src/stewardship/recipe_merge_judge.rs, src/ooda_brain/{judgment_record,parse_failure,mod}.rs — JSON-envelope transport + schema-repair + escalation + fail-closed verdict.
  • prompt_assets/simard/recipes/{merge-readiness-judge,ooda-decide,ooda-orient}.yaml — structured-output prompts.
  • Tests: tests/recipe_brain_verdict_assets.rs, gadugi merge-judge-verdict + decide-orient-brain-parse.
  • Docs: verdict-parsing reference + diagnose how-tos.

Issues

Closes #2421, #2428, #2429, #2430, #2435, #2462, #2463

Test plan

  • Unit + content-pin tests green; pre-push (fmt + targeted release tests + clippy --all-targets --locked) passed locally.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Simard Engineer and others added 3 commits June 28, 2026 07:14
…nt & merge-judge (#2421, #2428, #2429)

The recipe-backed decide, orient, and merge-judge brains read recipe-runner-rs
DEFAULT text-mode stdout — a summary banner ("Recipe: <name> ... SUCCESS ...")
— instead of the agent's real output. Consequences:

- merge-judge: banner has no ready/not_ready/unclear token, so every
  `simard merge-pr` aborted with "no verdict keyword found" (#2428/#2430/#2435/
  #2462/#2463).
- decide: banner first word `Recipe:` matched no action → silent `advance_goal`
  every cycle, ignoring the LLM (#2421).
- orient: banner timing `(0.0s)` was scraped as a valid in-range decimal →
  urgency actively corrupted to 0.0 (#2421).

Apply the shipped engineer-lifecycle pattern (#2419/#2432) to all three:

- Invoke `recipe-runner-rs --output-format json` and extract the final step
  output from the envelope (extract_recipe_decision_output, now pub(crate) and
  reused by all phases) before parsing the real agent output.
- Generalize the confidence-gated escalation ladder into a shared
  run_brain_ladder<D> backbone; run_escalation_ladder (lifecycle) is now a thin
  wrapper delegating to it — zero behavioral drift (existing ladder tests pass
  unchanged). decide/orient/merge ride the same ladder via phase-specific
  invoke/parse/default/note closures.
- On a parse-miss, climb the ladder (schema-repair → high-effort re-prompt),
  then fall back LOUDLY: decide → AdvanceGoal, orient → deterministic floor,
  merge-judge → fail-closed Verdict::Unclear (never SUCCESS-without-verdict,
  never fail-open to Ready). Genuine spawn/exit/envelope-decode failures still
  surface as Err.
- New parse-outcome variants parse_action_outcome / parse_orient_outcome /
  parse_merge_outcome distinguish a real decision from a default, so a defaulted
  row is never mistaken for a genuine advance_goal / floor / verdict.
- merge-judge parse_merge_outcome tries the structured {"verdict":...} JSON
  (parse_judge_response, populating blockers) first, then the prose keyword
  scanner as a fallback.

Instrument the class (#2429): shared brain_verdict_parsed_total{phase,outcome}
counter emitted on BOTH the parsed and the defaulted branch (no-op under
cfg!(test)), giving parse_success_rate{phase} a denominator across
decide/orient/merge_judge. BrainPhase gains a MergeJudge variant (serde
lowercase→snake_case keeps act/decide/orient byte-stable).

Recipes: add the additive {{escalation_note}} placeholder (empty on the base
attempt) to ooda-decide.yaml, ooda-orient.yaml, merge-readiness-judge.yaml,
mirroring ooda-engineer-lifecycle.yaml. recipe-runner-rs renders a missing var
as empty, so every other invocation path is unaffected.

Tests: parse-outcome classification, escalation-note builders, shared-metric
context, generic run_brain_ladder over an arbitrary decision type, merge
fail-closed + JSON-envelope extraction + prose fallback (issue_2419_family_phase_tests,
issue_2421_tests, issue_2428_tests, issue_2428_production_tests,
tests/recipe_brain_verdict_assets.rs), plus two outside-in gadugi scenarios.

Docs updated to the shipped design (durable reference/how-to, mkdocs --strict
clean). Output contracts (DecideJudgment/OrientJudgment/JudgeOutcome, trait
signatures, merge CLI verdict surface) preserved.

Closes #2421
Closes #2428
Closes #2430
Closes #2435
Closes #2429
Closes #2462
Closes #2463

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ride seam

The recipe_brain and recipe_merge_judge `resolve_recipe_path` helpers consult
the hot-reload path `~/.simard/prompt_assets/simard/recipes/<recipe>` before the
in-tree path, but read it through `dirs::home_dir()` with no override. Their
constructor / path-resolution unit tests therefore trusted the ambient HOME: in
any environment with a populated `~/.simard/prompt_assets` (e.g. a developer or
agent box running the live daemon), 6 tests flipped red even though the
production code is correct —

- recipe_brain: resolve_recipe_path_returns_none_for_nonexistent_repo,
  resolve_recipe_path_finds_in_tree_recipe, and the three
  new_returns_none_when_{decide,orient,lifecycle}_recipe_missing constructor
  tests resolved the polluted hot-reload recipe instead of None / the in-tree
  file.
- recipe_merge_judge: new_returns_none_when_recipe_missing likewise saw the
  ambient merge-readiness-judge.yaml and built a Some(judge).

Adopt the `home_override: Option<&Path>` seam already shipped on the sibling
`disk_health::resolve_recipe_path` and `brain_introspection::resolve_recipe_path`:

- Thread `home_override` through both `resolve_recipe_path` helpers
  (`home_override.map(PathBuf::from).or_else(dirs::home_dir)`).
- Keep the public `new()` constructors byte-stable; production still passes
  `None`. Add a private `new_with_home(.., home_override)` seam that `new`
  delegates to, so the 3 production call sites in daemon/brains.rs and the
  merge_judge dispatcher are untouched.
- Point the affected unit tests at an isolated temp home so they no longer
  read `~/.simard`. The three *_uses_correct_recipe_filename tests (which only
  passed by luck because the hot-reload path also contained the filename
  substring) are made deterministic too.

No production behavior change: the hot-reload-first → in-tree fallback order is
identical when home_override is None. Full lib suite green (6223 passed) in a
polluted environment that previously showed 6 failures; fmt/clippy/mkdocs
--strict clean. Docs (recipe-brain-api, unified-recipe-brain) updated to the
new signature and home_override convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…development-fix-the-recipe-brain-verdictdecis

# Conflicts:
#	mkdocs.yml
@github-actions

Copy link
Copy Markdown

📊 Coverage Summary

Generated by cargo llvm-cov --workspace --summary-only (nightly, excluding test files)

Module Lines Covered Coverage
Total 111556 93453 83.8%

Coverage data from CI run. Test files matching tests?/ are excluded from line counts.

@rysweet rysweet merged commit 247c193 into main Jun 28, 2026
15 checks passed
rysweet added a commit that referenced this pull request Jul 4, 2026
…ture (#2501, #2555) (#2565)

#2501 and #2555 are duplicates: `simard merge-pr` aborted with
`no verdict keyword (ready/not_ready/unclear) found in recipe output;
raw="Recipe: merge-readiness-judge ... SUCCESS ..."`. The root cause —
the agent step's stdout was not surfaced to the verdict extractor, only the
recipe-runner text-mode SUCCESS banner — was already fixed by #2486
(`--output-format json` + `extract_recipe_decision_output`) and #2496/#2504
(strip the Copilot launcher preamble at the shared chokepoint). The residual
reports came from a stale deployed binary (simard 0.22.0 vs repo 0.24.0);
the diagnose howto already predicts that "pre-fix binary" symptom.

Lock the real wire format so a regression cannot silently reintroduce the abort:

- src/stewardship/recipe_merge_judge.rs: new `issue_2501_2555_real_envelope_tests`
  drives the PRODUCTION `extract_recipe_decision_output` on the exact
  `recipe-runner-rs 0.3.6 --output-format json --agent-binary copilot` envelope,
  where the launcher preamble (`ℹ … NODE_OPTIONS=… (saved preference)…`) is
  prepended to the verdict inside `step_results[].output`. Asserts: `ready`
  recovered (Parsed); `not_ready` keeps structured blockers; prose falls back to
  the keyword verdict with the preamble stripped from the rationale; preamble-only
  output fails closed to `unclear` (never a false `ready`); a text banner surfaces
  as a loud infra decode error.
- tests/gadugi/merge-judge-verdict.sh: steps (d)/(e) reproduce the preamble wire
  format deterministically (no LLM) and run the new module end-to-end.
- docs/howto/diagnose-merge-pr-verdict-parse-failures.md: document the preamble
  handling and the regression pin.

No production code change — the capture fix already shipped; this pins the real
0.3.6 wire format against regression and dedupes the two reports.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rysweet added a commit that referenced this pull request Jul 4, 2026
…erdict parse-miss (#2569) (#2600)

The progress-evidence gate previously FAILED OPEN (Accept) whenever the
LLM/recipe reviewer produced no recognizable accept/reject verdict — even on a
successful, non-empty run — letting hallucinated "0%->100% with no verdict"
bumps land as false "done" states.

Adopt the merge judge's infra-vs-semantic split across both live tiers
(RecipeProgressChecker primary + LlmReviewerProgressChecker direct-LLM fallback):
- INFRA failure (transport error / spawn failure / non-zero exit / output that
  strips to empty) -> keep fail-OPEN, so goals aren't blocked on infra hiccups.
- SEMANTIC parse-miss (successful, non-empty response with no verdict, or an
  unknown verdict string) -> fail-CLOSED (Reject). Reject only keeps the prior
  percent + logs a hallucination alert; it does not stall the goal.

Also parse the structured {"verdict": ...} JSON first in the recipe tier (was a
naive substring scan that wrongly Rejected an `accept` whose rationale mentioned
"reject", and could Accept "unacceptable"), reusing the direct-LLM tier's
tolerant parse_reviewer_response — matching the merge judge.

The merge path reported in #2569 was already fail-closed on main
(#2486/#2490/#2504, merged before #2569 was filed against released v0.22.0);
this adds a regression test pinning the reporter's exact SUCCESS-with-no-verdict
banners (30s AND 102s) -> Verdict::Unclear, and fixes the analogous fail-open in
the progress path.

Docs (progress-evidence-api, text-parsing-wire-formats §2a,
progress-evidence-gating, text-based-brain-protocol) updated to the new policy
and corrected to reflect that progress_reviewer.rs is the live direct-LLM
fallback tier (not deleted). Adds a gadugi outside-in scenario plus
parser/decision regression tests.

Closes #2569

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Brain: decide + orient RecipeBrain phases share the #2419 text-vs-json parse bug (orient actively corrupts urgency)

1 participant