refactor(panel): rebuild apm-review-panel as advisory, scannable, and self-healing by danielmeppiel · Pull Request #1093 · microsoft/apm

danielmeppiel · 2026-05-01T08:52:17Z

refactor(panel): rebuild apm-review-panel as advisory, scannable, and self-healing

TL;DR

The apm-review-panel skill posted binary panel-rejected verdicts on PRs whose real signal was a handful of recommended improvements plus polish nits. This PR rebuilds the panel as an advisory regime: the CEO returns a ship_recommendation.stance (prose, never a label), panelists rank findings on a three-bucket severity scale that the orchestrator never reads as a gate, and the comment template top-loads the busy-maintainer view that PR #1084 demonstrated by hand. Adds two new conditional panelists (doc-writer, test-coverage-expert), wires test evidence as load-bearing in CEO arbitration, lands the python-architect class diagram in the top comment, and notifies the PR author + reviewers via @-ping so a fresh pass surfaces in inboxes without re-introducing labels. Closes #1091.

Important

The panel never blocks merge anymore. The maintainer ships. Re-applying the panel-review label re-runs the panel cleanly, and the orchestrator now sweeps stale panel-approved / panel-rejected labels left behind by the previous regime.

Why

The pre-refactor panel had five coupled defects that compounded into "loud, adversarial, easy to ignore":

Schema forced false rejections. panelist-return-schema.json had two slots: required[] (blocks merge) XOR nits[] (skip). A real-but-not-blocking finding had nowhere safe to live; defensively it landed in required[], the deterministic gate verdict = APPROVE iff sum(required)==0 flipped to REJECT, and the maintainer got noise where they needed signal.
Binary verdict + auto-applied labels. A multi-persona LLM ensemble is not a deterministic gate. Pretending it was created adversarial framing (Verdict: REJECT topline, Required before merge (N) next) and hid where the lenses agreed, where they disagreed, and what was worth doing now vs later.
Stale verdict labels. A previously-rejected PR re-triggered with a green run never refreshed correctly; panel-rejected survived. Maintainer-confirmed.
Comment shape didn't help a busy maintainer. Flat lists, no per-persona summary table, no architecture diagrams unless one panelist happened to supply them, CEO arbitration buried below the verdict block. PR fix(install): allow credential helpers in --update preflight for generic hosts #1084's hand-written comment showed the shape reviewers actually want: top-loaded synthesis, single summary table, diagrams in the body, curated follow-ups, recommendation prose.
Two unowned failure modes. Documentation drift (code changed, docs didn't) and silent test-coverage drift (CLI surface or auth or lockfile semantics changed without a regression-trap test) had no lens. The panel could surface "this is risky" but no persona asked the one question that protects users from quiet re-regression: "if this code drifts six months from now, will any test fail before a user does?"

APM's positioning is pragmatic as npm. npm doesn't have a bot that blocks publish on stylistic LLM disagreement. Neither should we. The panel is here to inform the maintainer, not to gate them.

What

Aspect	Before	After
Verdict	`APPROVE` / `REJECT` (binary, deterministic from `sum(required)`)	None. CEO returns `ship_recommendation.stance` prose: `ship_now` / `ship_with_followups` / `needs_discussion` / `needs_rework`.
Severity	`required[]` (blocks) + `nits[]` (skip)	`findings[]` with `severity: blocking \| recommended \| nit`. Severity is signal strength, never read as a gate.
Labels written	`panel-approved` XOR `panel-rejected` (auto), remove `panel-review`	No `add-labels` call. `remove-labels` sweeps `[panel-review, panel-approved, panel-rejected]` on every run -- defensive cleanup of legacy verdict labels, plus re-run idempotency.
Comment shape	Verdict line, Required, Nits, CEO arbitration, per-persona detail	`cc @author @reviewers` -> stance pill -> headline -> CEO synthesis -> per-persona summary table -> top-N curated follow-ups -> architecture diagrams (class + sequence) in the top comment -> recommendation prose. Full per-persona findings collapsed at the bottom.
Personas	5 mandatory + `auth-expert` conditional	5 mandatory + `auth-expert` (cond.) + `doc-writer` (cond. on docs/CHANGELOG/README/skill+agent bundles) + `test-coverage-expert` (mandatory whenever the diff touches `src/*/.py`; conditional otherwise).
Test evidence	Opinion-only findings	Each finding may carry an `evidence { outcome, test_file, test_name, assertion_excerpt, proves, principles }` block. CEO arbitration weights `passed`/`failed`/`missing` test outcomes above opinion, with `missing` on a `secure-by-default`/`governed-by-policy`/`portability-by-manifest` surface ranked at or near blocking-tier.
Notification	Verdict labels acted as inbox cue	Orchestrator resolves `notify_audience` = PR author + CODEOWNERS-resolved reviewers + requested teams (bots filtered, capped at 6, teams preferred over individuals when capped); template renders `cc @...` directly under the headline.
Architecture diagrams in comment	Buried in per-persona detail (often missing)	python-architect persona produces a `class_diagram` slot rendered FIRST under `### Architecture` in the top comment, followed by sequence diagram. Schema, template, and persona contract all updated to wire this end-to-end.
Companion workflow	`pr-panel-label-reset.yml` (deterministic verdict-label sweeper)	Deleted. No verdict labels exist to reset.
Local eval harness	One ad-hoc verdict harness, two pre-refactor content fixtures	`render_eval.py` + two scenario fixtures (`ship_now` shape + `needs_rework` shape) that render the template's actual rules and lint ASCII. Both fixtures exercise all 8 panelist slots, the `cc @...` line (rendered in 02, skipped in 01), the class diagram slot, and `evidence` blocks (passed in 01, missing in 02).

How

flowchart TD
    Trigger[panel-review label] --> Orch[orchestrator]
    Orch --> PA[python-architect]
    Orch --> CL[cli-logging-expert]
    Orch --> DX[devx-ux-expert]
    Orch --> SC[supply-chain-security]
    Orch --> GH[oss-growth-hacker]
    Orch --> AE[auth-expert / cond.]
    Orch --> DW[doc-writer / cond.]
    Orch --> TC[test-coverage / mand. on src.py]
    PA --> Sch[findings array<br/>severity: blocking / recommended / nit<br/>+ optional evidence block]
    CL --> Sch
    DX --> Sch
    SC --> Sch
    GH --> Sch
    AE --> Sch
    DW --> Sch
    TC --> Sch
    PA --> Diag[class + sequence diagrams<br/>rendered in top comment]
    Sch --> CEO[apm-ceo synthesizer<br/>ship_recommendation.stance<br/>test evidence load-bearing]
    Orch --> Audience[resolve notify_audience<br/>author + reviewers + teams, bots filtered]
    CEO --> Render[render recommendation-template.md]
    Diag --> Render
    Audience --> Render
    Render --> Comment[add-comment: 1 comment<br/>cc @author @reviewers]
    Render --> Sweep[remove-labels:<br/>panel-review, panel-approved, panel-rejected]
    classDef writer fill:#0a3d2c,stroke:#1f7a4d,color:#fff
    classDef synth fill:#1a2a4a,stroke:#3d5a8a,color:#fff
    class Comment,Sweep writer
    class CEO synth

Concretely, the changes touch four layers and both .github/ + .apm/ mirrors:

Skill contract. SKILL.md rewritten with advisory invariants up top (no verdict, no add-labels, severity is signal not gate, single-writer interlock sweeps all three labels). 8-persona roster with explicit conditional rules. Calibrated severity contract embedded in the panelist prompt template.
Schemas. panelist-return-schema.json collapses required/nits into a single findings[] array with a severity enum and an optional evidence block; adds a class_diagram schema slot. ceo-return-schema.json replaces verdict with ship_recommendation { stance, prose }, adds headline, arbitration, principle_alignment, recommended_followups[], and notify_audience[].
Personas. agents/test-coverage-expert.agent.md (new): scope inherits north star from DevX UX; enumerates critical user-promise surfaces (CLI / error wording / install / lockfile / auth / hooks / marketplace / cross-module integration); mandates view/grep probe before any "no test exists" claim; every finding MUST carry an evidence block. agents/python-architect.agent.md updated to require both class_diagram and sequence_diagram outputs.
Template + workflow. recommendation-template.md (new, replaces verdict-template.md) renders sections only when their source field is non-empty -- no placeholder noise. Class diagram rendered FIRST under ### Architecture. cc @... line rendered directly under headline. pr-review-panel.md workflow drops add-labels; remove-labels.allowed lists all three labels with max: 3. pr-panel-label-reset.yml deleted. Lock recompiled with gh aw compile.
Eval scaffolding. evals/render_eval.py is a specification-test renderer that applies the template's rules deterministically (without an LLM) so a maintainer can eyeball offline. Two fixtures: 01-ship-now-pr1084-shape.json (the common case) and 02-needs-rework-shape.json (real regressions: path-traversal + Windows-encoding + circular import + missing regression-trap test).

Proof

The eval scaffolding shipped in this PR is the executable contract. The skill ships no Python runtime change, so unit/integration tests in the conventional pytest sense do not apply -- the rendered fixtures, the workflow lock, and the diff -rq mirror checks ARE this PR's executable evidence.

Eval harness output:

$ python3 .github/skills/apm-review-panel/evals/render_eval.py
[OK] 01-ship-now-pr1084-shape.json -> 01-ship-now-pr1084-shape.rendered.md (137 lines)
[OK] 02-needs-rework-shape.json -> 02-needs-rework-shape.rendered.md (138 lines)

Workflow lock confirms verdict-label removal:

$ grep "add_labels\|remove_labels" .github/workflows/pr-review-panel.lock.yml
"remove_labels":{"allowed":["panel-review","panel-approved","panel-rejected"],"max":3}

$ gh aw compile .github/workflows/pr-review-panel.md
Compiled 1 workflow(s): 0 error(s), 0 warning(s)

.github/ and .apm/ mirrors verified identical:

$ diff -rq .github/skills/apm-review-panel .apm/skills/apm-review-panel
$ diff -q .github/agents/test-coverage-expert.agent.md .apm/agents/test-coverage-expert.agent.md

Scenario Evidence -- promises mapped to executable proof:

#	Scenario (user promise)	Principle(s)	Proof
1	Panel never blocks merge: a PR with real findings still ships at the maintainer's discretion	DevX, Governed by policy	`pr-review-panel.lock.yml` (`add_labels` absent); `02-needs-rework-shape.rendered.md` (no "Verdict"/"REJECT" anywhere)
2	Re-applying `panel-review` on a previously `panel-rejected` PR self-heals the stale label	Governed by policy, DevX	`pr-review-panel.lock.yml`: `remove_labels.allowed` enumerates all three labels with `max: 3` (sweep is unconditional)
3	A busy maintainer can read one panel comment in ~30 seconds	DevX	`01-ship-now-pr1084-shape.rendered.md` (137 lines); `02-needs-rework-shape.rendered.md` (138 lines); template top-loads cc-line + stance pill + headline + table
4	A fresh advisory pass surfaces in the PR author's and reviewers' inboxes without re-introducing labels	DevX	`02-needs-rework-shape.rendered.md` renders `cc @danielmeppiel @microsoft/apm-maintainers ...`; `01-ship-now-pr1084-shape.rendered.md` skips the line cleanly when `notify_audience` is absent
5	The python-architect class diagram lands in the top comment without expanding per-persona details	DevX	`02-needs-rework-shape.rendered.md` renders the class diagram FIRST under `### Architecture`, before the sequence diagram
6	Test-coverage-expert is mandatory on `src/*/.py` changes (its evidence is load-bearing for CEO arbitration)	Secure by default, OSS	`agents/test-coverage-expert.agent.md` Activation logic; `01-ship-now-pr1084-shape.rendered.md` shows TC active on a behavior-change PR
7	Test-coverage-expert never asserts "no test exists" without a tool-call probe	OSS, DevX	`agents/test-coverage-expert.agent.md` Review procedure step 5 mandates `view`/`grep`/`glob`; Anti-patterns section names "Asserting 'no test exists' without grepping" as disqualifying
8	Bug-fix PRs include the regression-trap test (the test that would have caught the bug had it existed)	Secure by default, Governed by policy	`assets/scenario-evidence-rubric.md` Required-mapping-shape; test-coverage-expert `missing-regression-trap` gap class is always at least `recommended`
9	"Secure by default" scenarios cannot be proven by mocking the security boundary they assert on	Secure by default	`assets/scenario-evidence-rubric.md` Anti-patterns refuses "Mocked boundary on a security scenario"; TC classifies as `mocked-boundary-on-security-scenario` -> `blocking`
10	Test evidence outweighs opinion in CEO arbitration	Governed by policy	`agents/apm-ceo.agent.md` "Treat test evidence as load-bearing" subsection; `02-needs-rework-shape.rendered.md` renders verbatim assertion excerpts under each blocking finding
11	Single source of truth: changing the rubric once updates both consumers	OSS, Portability	`diff -rq` confirms `.github/` and `.apm/` mirrors identical; SKILL.md + section-rubric.md + test-coverage-expert.agent.md all link to the same `scenario-evidence-rubric.md` path

End-to-end validation happens by labeling a real PR with panel-review after merge.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…val harness; legacy-label sweep The apm-review-panel skill posted binary REJECT verdicts on PRs whose real signal was a handful of recommended improvements + polish nits. Root cause: the panelist schema forced every finding into required[] (blocks merge) XOR nits[] (skip), with no middle slot. Careful reviewers defaulted to required defensively -> deterministic REJECT -> noise. This PR refactors the panel from a gate regime to an advisory regime: - New panelist schema: unified findings[] with severity enum (blocking | recommended | nit). Severity is signal strength, never read by the orchestrator to gate anything. - New CEO schema: returns ship_recommendation { stance, prose } where stance is one of ship_now | ship_with_followups | needs_discussion | needs_rework. Prose for the human reviewer, never auto-applied as a label or status check. - New comment template (assets/recommendation-template.md): top-loads CEO synthesis + per-persona summary table + top-N curated follow-ups + (optional) architecture diagrams + recommendation prose. Full per-persona findings collapsed at the bottom. Mirrors the manual comment shape on PR #1084. - New conditional persona: doc-writer. Activates on README, CHANGELOG, docs/, .apm/skills|agents/, .github/skills|agents|instructions/, and .github/workflows/*.md changes. Also detects DRIFT (code changed but docs didn't), not just doc edits. - Workflow safe-outputs: drop add-labels (no verdict to encode); remove-labels now sweeps panel-review (re-run idempotency) AND legacy panel-approved / panel-rejected (defensive cleanup of stale verdict labels left over from the pre-advisory regime). - Delete pr-panel-label-reset.yml: no verdict labels to reset. - Eval harness: render_eval.py + two fixtures (ship-now shape from PR #1084, needs-rework shape with blocking-severity findings). Specification test that confirms the rendered comment is short, scannable, and doesn't bury the lede. Closes #1091 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Refactors the apm-review-panel automation from a deterministic verdict/labeling system into an advisory review panel that posts a single scannable recommendation comment, never applies verdict labels, and defensively sweeps legacy labels to make re-runs idempotent.

Changes:

Update the gh-aw workflow to remove legacy verdict-label writes and expand label removal to sweep panel-review, panel-approved, and panel-rejected.
Redesign panelist/CEO JSON schemas and the skill contract around advisory severities (blocking | recommended | nit) and a CEO ship_recommendation.stance (no APPROVE/REJECT).
Replace verdict comment assets and eval artifacts with a recommendation template plus a deterministic render_eval.py harness and fixtures.

Show a summary per file

File	Description
.github/workflows/pr-review-panel.md	Removes verdict-label write surface; expands remove-label sweep and updates workflow prose to advisory contract.
.github/workflows/pr-review-panel.lock.yml	Recompiled lock reflecting removal of `add_labels` safe output and updated `remove_labels` constraints.
.github/workflows/pr-panel-label-reset.yml	Deletes legacy deterministic verdict-label reset workflow (now obsolete under advisory regime).
.github/skills/apm-review-panel/SKILL.md	Updates panel contract, roster (adds conditional doc-writer), and advisory rendering/sweep rules.
.github/skills/apm-review-panel/evals/trigger-evals.json	Updates routing/trigger eval phrasing from “verdict” to “recommendation”.
.github/skills/apm-review-panel/evals/run-verdict-harness.py	Removes verdict-era deterministic harness.
.github/skills/apm-review-panel/evals/results-trigger-2026-04-28.md	Removes old trigger eval write-up (verdict-era).
.github/skills/apm-review-panel/evals/results-label-reset-2026-04-28.md	Removes old label-reset workflow validation write-up.
.github/skills/apm-review-panel/evals/results-e2e-pr931-2026-04-28.md	Removes verdict-era e2e eval write-up.
.github/skills/apm-review-panel/evals/render_eval.py	Adds deterministic renderer for fixtures + ASCII lint to validate output shape offline.
.github/skills/apm-review-panel/evals/README.md	Updates eval documentation to focus on render eval + fixtures and the trigger eval set.
.github/skills/apm-review-panel/evals/fixtures/01-ship-now-pr1084-shape.json	Adds “ship_now” scenario fixture for the new advisory schema.
.github/skills/apm-review-panel/evals/fixtures/01-ship-now-pr1084-shape.rendered.md	Adds rendered markdown output for the “ship_now” fixture.
.github/skills/apm-review-panel/evals/fixtures/02-needs-rework-shape.json	Adds “needs_rework” scenario fixture demonstrating high-signal advisory findings.
.github/skills/apm-review-panel/evals/fixtures/02-needs-rework-shape.rendered.md	Adds rendered markdown output for the “needs_rework” fixture.
.github/skills/apm-review-panel/evals/content-eval-rejected-pr.md	Removes verdict-era content eval doc.
.github/skills/apm-review-panel/evals/content-eval-clean-pr.md	Removes verdict-era content eval doc.
.github/skills/apm-review-panel/assets/verdict-template.md	Removes verdict-era comment template asset.
.github/skills/apm-review-panel/assets/recommendation-template.md	Adds new top-loaded advisory recommendation template (with per-persona table + optional diagrams).
.github/skills/apm-review-panel/assets/panelist-return-schema.json	Replaces required/nits schema with `findings[]` + severity and adds doc-writer persona.
.github/skills/apm-review-panel/assets/ceo-return-schema.json	Replaces verdict-era CEO shape with advisory fields including `ship_recommendation` and curated follow-ups.
.apm/skills/apm-review-panel/SKILL.md	Mirrors the advisory contract changes into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/trigger-evals.json	Mirrors the trigger eval phrasing updates into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/run-verdict-harness.py	Mirrors removal of the verdict-era harness into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/results-trigger-2026-04-28.md	Mirrors removal of verdict-era eval write-up into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/results-label-reset-2026-04-28.md	Mirrors removal of verdict-era label-reset eval write-up into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/results-e2e-pr931-2026-04-28.md	Mirrors removal of verdict-era e2e eval write-up into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/render_eval.py	Mirrors addition of deterministic render harness into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/README.md	Mirrors updated eval documentation into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/fixtures/01-ship-now-pr1084-shape.json	Mirrors new fixture into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/fixtures/01-ship-now-pr1084-shape.rendered.md	Mirrors rendered output into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/fixtures/02-needs-rework-shape.json	Mirrors new fixture into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/fixtures/02-needs-rework-shape.rendered.md	Mirrors rendered output into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/content-eval-rejected-pr.md	Mirrors removal of verdict-era content eval doc into the `.apm/` bundle.
.apm/skills/apm-review-panel/evals/content-eval-clean-pr.md	Mirrors removal of verdict-era content eval doc into the `.apm/` bundle.
.apm/skills/apm-review-panel/assets/verdict-template.md	Mirrors removal of verdict-era template into the `.apm/` bundle.
.apm/skills/apm-review-panel/assets/recommendation-template.md	Mirrors addition of advisory template into the `.apm/` bundle.
.apm/skills/apm-review-panel/assets/panelist-return-schema.json	Mirrors advisory panelist schema into the `.apm/` bundle.
.apm/skills/apm-review-panel/assets/ceo-return-schema.json	Mirrors advisory CEO schema into the `.apm/` bundle.

Copilot's findings

Files reviewed: 39/39 changed files
Comments generated: 6

Copilot · 2026-05-01T08:57:42Z

    "active": {
      "type": "boolean",
-      "description": "Set to false ONLY for auth-expert when the conditional rule does not activate it. All other personas MUST set active=true. When false, required and nits MUST be empty arrays and inactive_reason MUST be a one-sentence explanation citing the touched files.",
-      "default": true
+      "description": "Set to false ONLY for conditional personas (auth-expert, doc-writer) when their fast-path file triggers and fallback self-check both miss. All mandatory personas MUST set active=true. When false, findings MUST be empty and inactive_reason MUST be a one-sentence explanation citing the touched files."
    },
    "inactive_reason": {


inactive_reason is documented as required when active=false, and findings is supposed to be empty in that case, but the schema doesn't actually enforce either constraint. As written, the S4 schema gate could accept an inactive panelist with missing inactive_reason (or with findings), which would render badly in the template. Consider adding an if/then (or allOf with dependentRequired) to require inactive_reason and findings: [] when active is false.

Copilot · 2026-05-01T08:57:42Z

+          "description": "Reserved for python-architect. Two mermaid blocks (string each) used by the comment template. If absent, the orchestrator renders a placeholder.",
+          "properties": {
+            "component": { "type": "string", "description": "Mermaid flowchart of the components/data flow touched by the PR." },
+            "sequence": { "type": "string", "description": "Mermaid sequenceDiagram of the user-visible behavior change." }


The schema/docs here say missing diagrams should render a placeholder, but assets/recommendation-template.md explicitly says to never render placeholders and to omit the Architecture section unless diagrams are supplied. Please align the schema wording with the template/rendering rules (and with SKILL.md) to avoid contradictory guidance for the orchestrator/panelists.

Suggested change

"description": "Reserved for python-architect. Two mermaid blocks (string each) used by the comment template. If absent, the orchestrator renders a placeholder.",

"properties": {

"component": { "type": "string", "description": "Mermaid flowchart of the components/data flow touched by the PR." },

"sequence": { "type": "string", "description": "Mermaid sequenceDiagram of the user-visible behavior change." }

"description": "Reserved for python-architect. Optional mermaid blocks (string each) used by the comment template. Never render placeholders; if diagrams are absent, omit the Architecture section.",

"properties": {

"component": { "type": "string", "description": "Optional mermaid flowchart of the components or data flow touched by the PR." },

"sequence": { "type": "string", "description": "Optional mermaid sequenceDiagram of the user-visible behavior change." }

Copilot · 2026-05-01T08:57:43Z

+     If absent, render the placeholder lines from the template (do NOT
+     invent diagrams).


This step says to render placeholder lines when diagrams are absent, but the new assets/recommendation-template.md says the opposite ("NEVER render a placeholder" / only render diagrams when supplied). Please make SKILL.md match the template: either require diagrams be present, or specify that the Architecture section is omitted when missing.

Suggested change

If absent, render the placeholder lines from the template (do NOT

invent diagrams).

Render the Architecture section ONLY when diagrams are supplied.

If absent, omit that section entirely and do NOT invent diagrams or

render placeholder lines.

Copilot · 2026-05-01T08:57:43Z

+{{#if python_architect.extras.diagrams.sequence }}
+```mermaid
+{{ python_architect.extras.diagrams.sequence }}
+```
+{{/if}}


As written, the ### Architecture heading is only included when the component diagram exists. If only python_architect.extras.diagrams.sequence is present, the template will emit a mermaid block without the section heading. Consider restructuring so the heading appears when either diagram exists (this also matches the behavior in evals/render_eval.py).

Copilot · 2026-05-01T08:57:43Z

+    for i, line in enumerate(text.splitlines(), 1):
+        for ch in line:
+            cp = ord(ch)
+            if ch == "\n" or ch == "\t":
+                continue


lint_ascii() skips tab characters (\t), but the repo encoding rule is printable ASCII U+0020–U+007E, so tabs (0x09) are non-printable and should be flagged. Dropping the special-case for \t (or explicitly reporting it) will make the eval harness properly catch accidental tabs in rendered output.

Copilot · 2026-05-01T08:57:43Z

+  "description": "Shape the apm-ceo synthesizer MUST return when invoked by the apm-review-panel skill. The CEO produces ARBITRATION and SHIP-RECOMMENDATION prose. The CEO does NOT pick a binary verdict; the panel is advisory. The orchestrator renders these fields into the comment template.",
  "type": "object",
-  "required": ["arbitration"],
+  "required": ["headline", "arbitration", "principle_alignment", "ship_recommendation"],


recommended_followups is described as part of the CEO output and the template expects it (e.g., .length checks). Since it's not in the schema's required list, a valid CEO return could omit it entirely, forcing the orchestrator to invent defaults. Consider making recommended_followups required (allowing an empty array) to keep the schema/template contract deterministic.

Suggested change

"required": ["headline", "arbitration", "principle_alignment", "ship_recommendation"],

"required": ["headline", "arbitration", "principle_alignment", "recommended_followups", "ship_recommendation"],

Pairs with the DevX UX lens to assess whether behavior-change PRs are protected by tests that catch regressions in user-facing promises. Conditional like auth-expert and doc-writer: orchestrator always spawns for schema-shape uniformity; persona self-reports active=false when off-condition (pure docs / pure refactor / pure asset bumps). Calibrated severity: blocking ONLY when a critical user-promise surface (auth, lockfile, install, marketplace, hooks, exit codes, error wording) changes AND no test exercises the new behavior AND the persona can name the specific test file path. Mandatory probe: must view/grep the test tree before emitting any 'no test exists' claim -- false positives here destroy maintainer trust. Boundaries documented vs python-architect (test code design), auth-expert (auth-specific assertions), and devx-ux-expert (which user promises exist). - New agent: .apm/agents/test-coverage-expert.agent.md (mirrored to .github/agents/) - SKILL.md: roster (5 mandatory + 3 conditional), topology, conditional intro count, new section, routing matrix, fan-out list, gotchas - panelist-return-schema.json: persona enum + active description - Eval fixtures 01 + 02: TC row added; needs-rework demonstrates the highest-value case (path-traversal regression where the regression- trap test is the test that prevents re-shipping) - render_eval.py: PERSONA_LABELS + canonical_order extended Verified: render_eval.py emits [OK] on both fixtures; .apm/ and .github/ mirrors verified identical via diff -rq. Refs #1091 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…orks Single shared asset (scenario-evidence-rubric.md) consumed by two primitives: pr-description-skill (as a required Validation subsection on every behavior-change PR) AND test-coverage-expert.agent.md (as the evaluation lens applied during panel review). The rubric answers ONE question from the maintainer's chair: for each user promise this PR touches, which test exercises the scenario, and which APM principle does that scenario serve? Not pure coverage -- SCENARIO coverage tied to the principles named in the README: Portability by manifest, Secure by default, Governed by policy, Multi-harness support, Vendor-neutral, DevX (pragmatic as npm), OSS. Composition (R3 EXTRACT + DUAL CONSUMER): - Asset lives once at .github/skills/pr-description-skill/assets/ scenario-evidence-rubric.md (mirrored to .apm/). - pr-description-skill: pr-body-template.md adds Scenario Evidence subsection; section-rubric.md adds 9b acceptance + refusal criteria and a final-pass checkbox; SKILL.md activation contract adds scenario-test-mapping input; required-body-structure row 9 updated. - test-coverage-expert: review procedure now READS the table FIRST, audits each row against the diff, then probes for unmapped behavior. New gap classes added: mocked-boundary-on-security-scenario (blocking; the rubric explicitly refuses tautological tests) and principle-mismapping (recommended). Skip clause for the table: docs-only / asset-bump-only / pure-refactor PRs (author states which case applies in trade-offs). Bug-fix PRs MUST include the regression-trap test row tagged with the issue it would have caught -- closes the loop with the test-coverage-expert's missing-regression-trap finding class. Verified: render_eval emits [OK] on both fixtures (no panel-side schema change). .apm/ and .github/ mirrors verified identical via diff -rq. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tests, when coded right, are irrefutable on a given commit. The CEO synthesizer should weight a passed/failed test outcome above any opinion-only finding. This commit wires that discipline into the panel contract. Schema: panelist-return-schema.json gains optional finding.evidence { outcome, test_file, test_name, assertion_excerpt, proves, principles }. test-coverage-expert REQUIRES it on every finding (its contract is stricter than other panelists -- the evidence IS its value-add). apm-ceo: new 'Treat test evidence as load-bearing' subsection. - outcome=passed: proves the user promise holds; cite test+assertion - outcome=failed: strongest signal short of a CVE; surface in headline - outcome=missing on secure-by-default / governed-by-policy / portability surface ranks at or near blocking-tier opinion findings (the absence of an automated guardrail IS a real defect on those surfaces) - outcome=manual: arbitrated as missing - outcome=unknown: discarded from arbitration weight - two-tier rule for follow-ups: failed-test rows always rank above opinion-only rows of the same severity; missing-test on critical promise rank above any opinion 'recommended' finding Render: render_eval.py + recommendation-template.md surface a 'Proof' line per finding showing test ref, outcome, what it proves, and the APM principles it covers, with the verbatim assertion excerpt below. Fixtures 01 + 02 demonstrate both sides: - 01: outcome=passed on the credential-helper env-var contract - 02: outcome=missing on path-traversal regression-trap (secure-by- default, blocking) + integration boundary (portability, recommended) Both fixtures still render [OK]. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@Reviewer

Two coupled fixes folded into PR #1093 via the genesis discipline. 1) Restore the maintainer-notification signal that the verdict-label sweep removed. The panel now resolves an @-ping audience from 'gh pr view --json author,reviewRequests' (PR author + CODEOWNERS-resolved requested reviewers + requested teams), filters bots, caps at 6 handles, and renders a 'cc @author @Reviewer ...' line right under the headline. This is the only mechanism by which a fresh advisory pass surfaces in inboxes; without it, a panel re-run on a previously-reviewed PR is silent. 2) Make test-coverage-expert mandatory on every PR that touches 'src/**/*.py'. The persona was conditional for symmetry with auth/doc-writer, but that symmetry broke when test evidence became load-bearing in CEO arbitration (commit e502fa2). A persona whose findings outweigh opinion cannot be silently skipped on a heuristic. Skip ONLY on documentation-only PRs (zero src/*.py files in the diff). Eval fixtures updated: fixture 02 now carries a non-empty notify_audience to prove the cc line renders; fixture 01 leaves it empty to prove the line is skipped (not rendered as placeholder). Both evals green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The .apm/ rendered fixture for 01 was stale from before commit e502fa2 (proof-evidence rendering). 'apm install' regenerates .github/ from .apm/, so the integration drift check failed because .github/ had the Proof line but .apm/ source did not. Verified clean via local 'apm install'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds extras.diagrams.class_diagram schema slot, renders it first under ### Architecture, asks the persona for it explicitly, and regenerates both rendered.md fixtures so the panel comment shows load-bearing visual evidence without requiring readers to expand the per-persona findings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 1, 2026 08:52

danielmeppiel requested a review from sergio-sisternes-epam as a code owner May 1, 2026 08:52

Copilot started reviewing on behalf of danielmeppiel May 1, 2026 08:53 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Copilot and others added 3 commits May 1, 2026 11:38

danielmeppiel force-pushed the refactor/panel-advisory-regime branch from 07ca716 to e502fa2 Compare May 1, 2026 10:03

Copilot and others added 3 commits May 1, 2026 12:28

danielmeppiel merged commit 6c99c5b into main May 1, 2026
16 checks passed

danielmeppiel deleted the refactor/panel-advisory-regime branch May 1, 2026 19:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(panel): rebuild apm-review-panel as advisory, scannable, and self-healing#1093

refactor(panel): rebuild apm-review-panel as advisory, scannable, and self-healing#1093
danielmeppiel merged 7 commits intomainfrom
refactor/panel-advisory-regime

danielmeppiel commented May 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		If absent, render the placeholder lines from the template (do NOT
		invent diagrams).

-     If absent, render the placeholder lines from the template (do NOT
-     invent diagrams).
+     Render the Architecture section ONLY when diagrams are supplied.
+     If absent, omit that section entirely and do NOT invent diagrams or
+     render placeholder lines.

	"required": ["headline", "arbitration", "principle_alignment", "ship_recommendation"],
	"required": ["headline", "arbitration", "principle_alignment", "recommended_followups", "ship_recommendation"],

Conversation

danielmeppiel commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

refactor(panel): rebuild apm-review-panel as advisory, scannable, and self-healing

TL;DR

Why

What

How

Proof

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

danielmeppiel commented May 1, 2026 •

edited

Loading