
fix(baa-dev): route session analysis to session-analyst; ground architect reviews in tool evidence #255

Merged: sadlilas merged 1 commit into main from experiments/behavioral-anchor-amplifier-dev (Jun 18, 2026)

Conversation

@sadlilas

Collaborator

Two targeted, structural fixes to the behavioral-anchor-amplifier-dev experiment bundle, ahead of a verifying measurement sweep.

FIX 1 (session-analyst routing): register `foundation:session-analyst` as a delegable agent and add a "Delegate session analysis" principle to the system prompt. Closes a model-agnostic defect where session-analysis / `events.jsonl` tasks were not routed to the specialist (base BAA scored 2/2/2 vs amplifier-dev 10/9/9 across opus-4.7/opus-4.8/gpt-5.5).
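
A minimal sketch of what FIX 1 amounts to, for reviewers who want the shape of the change without opening the bundle. Everything except the agent id `foundation:session-analyst` is hypothetical: `Bundle`, `apply_fix_1`, and `route` are illustration-only names, not the Amplifier bundle schema, and the keyword check merely stands in for the model's own decision to delegate once the principle is in its system prompt.

```python
from dataclasses import dataclass, field

# Agent id taken from the PR text; every other name here is hypothetical.
SESSION_ANALYST = "foundation:session-analyst"


@dataclass
class Bundle:
    """Toy stand-in for an agent bundle (not the real Amplifier format)."""
    delegable_agents: set = field(default_factory=set)
    principles: list = field(default_factory=list)


def apply_fix_1(bundle: Bundle) -> Bundle:
    """Register the specialist and add the delegation principle."""
    bundle.delegable_agents.add(SESSION_ANALYST)
    bundle.principles.append("Delegate session analysis")
    return bundle


def route(task: str, bundle: Bundle) -> str:
    """Return the agent that should handle the task.

    The keyword match is a placeholder for the model's judgment; the point
    is that delegation is only possible once the agent is registered.
    """
    keywords = ("session analysis", "events.jsonl")
    looks_like_session_work = any(k in task.lower() for k in keywords)
    if looks_like_session_work and SESSION_ANALYST in bundle.delegable_agents:
        return SESSION_ANALYST
    return "self"
```

Before `apply_fix_1` runs, a task that mentions `events.jsonl` falls through to `"self"`, which mirrors the defect described above; afterwards it is routed to the specialist.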

FIX 2 (architect verification): grant the architect agent `tool-web` and turn its cite-evidence rule into an evidence gate, so PR/code reviews must fetch and read what they assert rather than confabulating (addresses the pr185 fabricated-review failure on opus-4.7).
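
The evidence gate in FIX 2 is a prompt-level rule, but its intent can be sketched as a check: a review claim only counts if the source it cites was actually fetched during the session. `Claim` and `evidence_gate` below are hypothetical names for illustration, and the URLs in the usage note are placeholders; nothing here is taken from the bundle itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One assertion made in a review, plus the source it claims to rest on."""
    text: str
    source_url: str


def evidence_gate(claims, fetched_urls):
    """Return the claims whose cited source was never fetched.

    An empty result means the gate passes; anything returned is an
    assertion the reviewer made without reading the underlying material.
    """
    fetched = set(fetched_urls)
    return [c for c in claims if c.source_url not in fetched]
```

For example, with two claims citing `https://example.com/a` and `https://example.com/b` and only the first URL fetched, `evidence_gate` returns the second claim, which the review would then have to drop or verify.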

Deliberately scoped: changes to kernel-vocab phrasing, anti-paralysis, and DTU-persistence were evaluated and excluded, either as not worth the cost or as not reproducible defects.

Impact to be confirmed by a measurement run after merge.

Note: This PR intentionally includes both the bundle-add commit (257c8f9) and this fix commit (e36cee1), so that it delivers the complete, fixed bundle into the experiments folder. After approval, a measurement sweep will validate both fixes.

…tect reviews in tool evidence

Two targeted, structural fixes to the behavioral-anchor-amplifier-dev experiment bundle, ahead of a verifying measurement sweep.

FIX 1 (session-analyst routing): register `foundation:session-analyst` as a delegable agent and add a "Delegate session analysis" principle to the system prompt. Closes a model-agnostic defect where session-analysis / events.jsonl tasks were not routed to the specialist (base BAA scored 2/2/2 vs amplifier-dev 10/9/9 across opus-4.7/opus-4.8/gpt-5.5).

FIX 2 (architect verification): grant the architect agent `tool-web` and turn its cite-evidence rule into an evidence gate, so PR/code reviews must fetch and read what they assert rather than confabulating (addresses the pr185 fabricated-review failure on opus-4.7).

Deliberately scoped: kernel-vocab phrasing, anti-paralysis, and DTU-persistence changes were evaluated and EXCLUDED as not worth the cost / not reproducible defects.

Impact to be confirmed by a measurement run after merge.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@sadlilas force-pushed the experiments/behavioral-anchor-amplifier-dev branch from e36cee1 to fdaf2f9 on June 18, 2026 17:53
@sadlilas merged commit c566b95 into main on Jun 18, 2026
4 checks passed
@sadlilas deleted the experiments/behavioral-anchor-amplifier-dev branch on June 18, 2026 17:56


1 participant