Skip to content

fix(install): persist --skill filter to apm.yml (#1395)#1442

Merged
danielmeppiel merged 2 commits into
mainfrom
fix/1395-skill-subset-persist
May 22, 2026
Merged

fix(install): persist --skill filter to apm.yml (#1395)#1442
danielmeppiel merged 2 commits into
mainfrom
fix/1395-skill-subset-persist

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Description

When installing a package with --skill to select a subset of skills, the filter was computed but never threaded into the apm.yml persistence layer. The _skill_subset variable was effectively dead code — DependencyReference.to_apm_yml_entry() already handles skill_subset correctly (emitting {"git": "...", "skills": [...]}), but dep_ref.skill_subset was never populated.

Fixes

Fixes #1395

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Changes

  1. _resolve_package_references — added skill_subset parameter; sets dep_ref.skill_subset and populates _apm_yml_entries via to_apm_yml_entry()
  2. _validate_and_add_packages_to_apm_yml — forwards skill_subset to resolve
  3. InstallContext — added skill_subset and skill_subset_from_cli fields to carry the filter through the pipeline
  4. Call sites — wired _skill_subset into both the validate and install paths

Testing

  • 2 new regression tests in test_install_skill_subset.py (with/without --skill)
  • Existing install tests pass (57 tests)
  • Lint clean

Copilot AI review requested due to automatic review settings May 21, 2026 20:35
@sergio-sisternes-epam sergio-sisternes-epam mentioned this pull request May 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a regression where apm install ... --skill ... computed a skill filter but failed to persist it into apm.yml, causing later integrations to deploy all skills in a bundle instead of the selected subset.

Changes:

  • Thread skill_subset through _validate_and_add_packages_to_apm_yml() into _resolve_package_references() and ensure structured apm.yml entries are written when a subset is active.
  • Extend InstallContext (and CLI->pipeline wiring) to carry skill_subset and whether it originated from the CLI.
  • Add unit regression tests covering persistence behavior with and without --skill.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/apm_cli/commands/install.py Wires skill_subset through validation/persistence and into the install pipeline context.
tests/unit/commands/test_install_skill_subset.py Adds regression tests asserting apm.yml uses dict form with skills: when --skill is provided.
tests/unit/commands/test_install_context.py Updates the InstallContext structural field contract to include the new fields.

Comment thread src/apm_cli/commands/install.py
Comment thread src/apm_cli/commands/install.py Outdated
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/1395-skill-subset-persist branch from fa40f28 to 76a3b97 Compare May 21, 2026 20:44
@danielmeppiel
Copy link
Copy Markdown
Collaborator

APM Review Panel: ship_with_followups

Community fix restores install --skill persistence to apm.yml, closing a v0.11.0 regression that broke manifest round-trip fidelity for skill-subset installs.

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

Panel signal is unusually convergent: all six active panelists recommend ship, zero blocking findings. The fix is surgical (29 src lines), correctly threads skill_subset through InstallContext and _validate_and_add_packages_to_apm_yml, and lands with 33 passing unit tests on the persistence boundary plus a live integration test behind @pytest.mark.live. The two strongest follow-up signals -- hermetic integration coverage (test-coverage-expert) and write-path input validation (supply-chain-security-expert) -- are defense-in-depth gaps, not exploit-grade blocks. The read-path already rejects traversal sequences via validate_path_segments, so the write-path gap is a fail-fast improvement, not a live vulnerability.

CLI-logging-expert and devx-ux-expert converge on the same user-facing concern: silent normalization (strip, dedupe, overwrite) without feedback. This is a polish gap, not a correctness gap -- the installed state and the persisted state are correct; the user simply is not told about deduplication or replacement. Worth tracking but not worth delaying a community regression fix.

No dissent between panelists. Python-architect's normalization-hoist and safety-branch observations are structural hygiene for a future refactor pass, not objections to the current fix.

Aligned with: Portable by manifest -- restores the contract that apm.yml faithfully records what was installed, including skill-subset filters. Pragmatic as npm -- overwrite semantics for repeated --skill match npm's replace-on-reinstall mental model. OSS community driven -- community contributor caught and fixed a regression with a well-structured PR; shipping fast validates the contributor funnel.

Growth signal. This is the archetype PR for community-funnel health: an external contributor (@sergio-sisternes-epam) identified a regression in a tier-1 promise (install fidelity), delivered a fix with tests, and the panel found zero blocking issues. Shipping promptly compounds the signal that community regression fixes land fast -- exactly the narrative that converts drive-by contributors into repeat participants. The install-idempotency angle is repostable: "your --skill filters now survive reinstalls."

Panel summary

Persona B R N Takeaway
Python Architect 0 2 1 Clean bug fix with correct dataclass extension; normalization should be hoisted out of the per-package loop to avoid redundant work.
CLI Logging Expert 0 1 1 Skill-subset normalization is entirely silent; users get no feedback when names are dropped or when apm.yml entry shape changes.
DevX UX Expert 0 2 1 Skill-subset persistence restores install-idempotency; replace-vs-merge semantics for repeated --skill should be documented.
Supply Chain Security Expert 0 1 0 Read path validates skill names; write path does not. No exploit on current code paths but defense-in-depth missing.
OSS Growth Hacker 0 1 1 Community-contributed regression fix is well-structured; add CHANGELOG entry crediting @sergio-sisternes-epam to compound contributor-funnel signal.
Test Coverage Expert 0 1 1 Unit coverage is thorough (33 tests pass on persistence helpers); live integration test exists, but no hermetic integration test exercises the install->persist round-trip without network.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Test Coverage Expert] Add a hermetic (non-@LiVe) integration test exercising install --skill -> apm.yml round-trip with a local bare git fixture. -- The PR's unit tests mock the persistence boundary they claim to prove; a hermetic integration test closes the regression-trap gap without requiring network or GITHUB_APM_PAT.
  2. [OSS Growth Hacker] Add CHANGELOG entry under [Unreleased] Fixed crediting @sergio-sisternes-epam for [BUG] #1395. -- Named credit in CHANGELOG converts merge into a visible contributor-funnel signal; cheap win that compounds repeat contributions.
  3. [DevX UX Expert] Document replace-vs-merge semantics in --skill help text so users know re-running with a different list overwrites. -- No prior art in package managers for subset selectors; without documentation, users expecting merge semantics will be surprised when the previous subset vanishes.
  4. [CLI Logging Expert] Emit a warning when normalization silently drops empty or duplicate skill names from --skill input. -- A typo'd skill name currently vanishes without trace; one diagnostic line would catch user errors before they persist bad state to apm.yml.
  5. [Supply Chain Security Expert] Add write-path validation (validate_skill_name or validate_path_segments) in the normalization loop before persisting to apm.yml. -- Defense-in-depth: reject malformed skill names at ingestion time rather than relying solely on the read-path parser to catch traversal sequences on next install.

Architecture

classDiagram
    direction LR
    class InstallContext {
        <<Dataclass>>
        +skill_subset tuple
        +skill_subset_from_cli bool
    }
    class DependencyReference {
        <<ValueObject>>
        +skill_subset list
        +to_apm_yml_entry()
    }
    class install {
        <<CLIEntryPoint>>
    }
    class _validate_and_add_packages_to_apm_yml
    class _resolve_package_references
    class _install_apm_packages
    install ..> _validate_and_add_packages_to_apm_yml : passes skill_subset
    install ..> InstallContext : constructs
    _validate_and_add_packages_to_apm_yml ..> _resolve_package_references : forwards
    _resolve_package_references ..> DependencyReference : mutates skill_subset
    _install_apm_packages ..> InstallContext : reads
    class InstallContext:::touched
    class _resolve_package_references:::touched
    class _validate_and_add_packages_to_apm_yml:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A["install() L1312: _skill_subset = tuple(skill_names)"] --> B["_validate_and_add_packages_to_apm_yml(skill_subset)"]
    B --> C["_resolve_package_references(skill_subset)"]
    C --> D["for package in packages"]
    D --> E["resolve_parsed_dependency_reference -> dep_ref"]
    E --> F{"skill_subset truthy?"}
    F -->|Yes| G["Normalize strip dedupe; dep_ref.skill_subset = normalized"]
    F -->|No| H["dep_ref.skill_subset unchanged"]
    G --> I{"marketplace or gitlab?"}
    H --> I
    I -->|Yes| J["_apm_yml_entries set via dependency_reference_to_yaml_entry"]
    I -->|No| K{"_validate_package_exists"}
    K -->|Pass| L{"skill_subset and canonical not in entries?"}
    L -->|Yes| M["Safety branch L489: _apm_yml_entries[canonical] = to_apm_yml_entry"]
    L -->|No| N["fallback plain canonical string"]
    J --> O["_merge_packages_into_yml writes apm.yml"]
    M --> O
    N --> O
    A --> P["InstallContext(skill_subset, skill_subset_from_cli)"]
    P --> Q["_install_apm_packages reads ctx"]
Loading

Recommendation

Merge this PR promptly -- it restores a tier-1 install-fidelity promise broken since v0.11.0, is well-tested at the unit boundary, and carries zero blocking findings from six active panelists. Track the hermetic integration test as the highest-signal follow-up (prevents future regression-trap drift on the persistence contract); CHANGELOG credit can land in the merge commit or a fast-follow.


Full per-persona findings

Python Architect

  • [recommended] Normalization block re-executes identically on every loop iteration at src/apm_cli/commands/install.py:446
    skill_subset is invariant across the for-package loop in _resolve_package_references. The strip/dedupe/filter logic runs N times producing the same _normalized list each time. O(packages * skills) wasted work and clutters the hot loop with input-sanitization concerns that belong at the call boundary.
    Suggested: Normalize once before the for-package loop, or normalize at L1317 when _skill_subset is first constructed from CLI args.
  • [recommended] Safety branch at L489 papers over missing unified _apm_yml_entries population at src/apm_cli/commands/install.py:489
    The normal (github, non-insecure) path never sets an entry, relying on _merge_packages_into_yml falling back to the plain canonical string. The new safety branch adds a fourth conditional site -- chain-of-special-cases anti-pattern. Future-state: unconditionally call _apm_yml_entries[canonical] = dep_ref.to_apm_yml_entry() after dep_ref is fully configured.
    Suggested: Track refactor: unconditionally populate _apm_yml_entries[canonical] = dep_ref.to_apm_yml_entry() once per resolved dep_ref.
  • [nit] Type annotation: prefer tuple[str, ...] over builtins.tuple[str, ...] at src/apm_cli/commands/install.py:229
    PEP 585 allows tuple[str, ...] directly. The codebase already uses str | None (PEP 604). builtins.tuple is unconventional.
    Suggested: skill_subset: tuple[str, ...] | None = None

CLI Logging Expert

  • [recommended] Silent drop of empty/duplicate skill names violates Name-the-thing and Include-the-fix rules at src/apm_cli/commands/install.py:446
    When the user passes --skill ' ,skill-a,skill-a', the normalization silently strips and dedupes. No DiagnosticCollector entry, no _rich_warning. A typo'd skill name vanishes without trace. Message-writing rules require the CLI to name the alteration and how to inspect it (--verbose).
    Suggested: After the normalization loop, compare len(skill_subset) vs len(_normalized). If fewer: emit [!] Dropped N empty/duplicate skill name(s); use --verbose to list. In --verbose, list the dropped values.
  • [nit] No verbose-only info line signals apm.yml entry promotion from string to dict form at src/apm_cli/commands/install.py
    Progressive disclosure: when apm.yml goes from 'org/repo' to {git, skills:[...]}, a one-line verbose info aids users diffing their manifest.

DevX UX Expert

  • [recommended] Replace-vs-merge semantics for repeated --skill invocations are undocumented at src/apm_cli/commands/install.py:1038
    Code does dep_ref.skill_subset = _normalized (overwrite, npm-like). But --skill resembles cargo --features which MERGES. No prior art in package managers for a subset selector; a user who runs install --skill foo then install --skill bar will be surprised that foo was dropped.
    Suggested: Amend --skill help text: "Re-running with a different --skill list REPLACES the previous subset. Use --skill * (or omit --skill) to reset to all skills."
  • [recommended] No CLI feedback when --skill overwrites an existing subset in apm.yml
    When apm.yml records skills:[foo] and the user runs install --skill bar, the subset is silently replaced. Mental model: install should acknowledge mutations. Emit Updated skill subset for org/repo: [foo] -> [bar].
    Suggested: One-line info message on subset overwrite. Implementation overlaps with cli-logging-expert silent-drop finding.
  • [nit] Help example uses --skill before package (positional order) at src/apm_cli/commands/install.py:1125
    npm/pip/cargo always put the package first in examples. Reordering to apm install org/bundle --skill my-skill reinforces natural reading order.

Supply Chain Security Expert

  • [recommended] Write-path normalization does not call validate_skill_name / validate_path_segments before persisting to apm.yml at src/apm_cli/commands/install.py:449
    The read-path parser at reference.py:657 calls validate_path_segments and rejects traversal sequences, so currently a malicious --skill ../../etc/passwd would be caught when re-reading the manifest. _promote_sub_skills uses skill names as set-membership filters against on-disk directory names, so path construction is safe. However defense-in-depth says: reject bad input at ingestion (write time), not only at consumption.
    Suggested: Inside the normalization loop, call validate_skill_name(s) (or at minimum validate_path_segments(s, context='--skill')) and raise a clear ValueError before persisting to apm.yml.

OSS Growth Hacker

  • [recommended] Missing CHANGELOG entry under [Unreleased] Fixed for [BUG] #1395 at CHANGELOG.md
    Named credit in CHANGELOG converts merge into a visible signal that community PRs ship fast; pattern from httpie/bun/uv compounds repeat contributions and external mentions.
    Suggested: Add under [Unreleased] Fixed: - Install: --skill subset filter is now persisted to apm.yml across reinstalls, fixing a regression since v0.11.0. (#1395, thanks @sergio-sisternes-epam)
  • [nit] Release-note angle: frame as install-idempotency promise restored
    Install round-trip idempotency is a tier-1 package-manager credibility promise. The release narrative can use "install fidelity restored: your --skill filters now survive reinstalls" as a repostable line.

Test Coverage Expert

  • [recommended] No hermetic integration test exercises install --skill -> apm.yml persistence without live network at tests/integration/test_skill_bundle_live.py:358
    Install pipeline surface floor is integration-with-fixtures. tests/integration/test_skill_bundle_live.py::test_skill_subset_persists_to_apm_yml does the full round-trip but is @pytest.mark.live (needs GITHUB_APM_PAT). tests/unit/test_skill_subset_persistence.py covers to_apm_yml_entry and set_skill_subset_for_entry at unit tier with real objects. The PR's new test_install_skill_subset.py mocks _validate_package_exists, resolve_parsed_dependency_reference, and to_apm_yml_entry itself -- it validates wiring but if real to_apm_yml_entry regressed, these mocked tests would still pass.
    Suggested: Add a non-@LiVe integration test in tests/integration/ that sets up a local bare git repo fixture, runs _validate_and_add_packages_to_apm_yml with skill_subset=['foo'], and asserts on-disk apm.yml contains {git: 'owner/repo', skills: ['foo']}.
    Proof (unknown at): tests/integration/test_skill_bundle_live.py::test_skill_subset_persists_to_apm_yml -- proves: apm install --skill persists the skills: field to apm.yml on disk after a real install [devx,portability-by-manifest]
    assert isinstance(entry, dict); assert 'skills' in entry; assert target_skill in entry['skills']
  • [nit] PR's new test_install_skill_subset.py mocks the persistence boundary it claims to test at tests/unit/commands/test_install_skill_subset.py
    5 tests in the new file mock dep_ref.to_apm_yml_entry via a lambda + mock the validation/resolve boundary. They prove orchestration wiring but not emitted YAML shape. Real persistence proof lives in test_skill_subset_persistence.py (33 unit tests passing on real DependencyReference). Tier gap is covered elsewhere; informational only.
    Proof (passed): tests/unit/test_skill_subset_persistence.py::TestDependencyReferenceSkillSubset::test_round_trip_parse_emit -- proves: DependencyReference round-trips skill_subset through to_apm_yml_entry and parse_from_dict correctly [devx,portability-by-manifest]
    ref2 = DependencyReference.parse_from_dict(emitted); assert ref2.skill_subset == ['cli', 'web']

Auth Expert -- inactive

PR touches only src/apm_cli/commands/install.py; no AuthResolver, token_manager, github_downloader, or remote-host classification code paths changed.

Doc Writer -- inactive

No README, CHANGELOG, MANIFESTO, docs/src/content/docs/, .apm/skills/, or instruction files changed in the diff.

Performance Expert -- inactive

PR touches src/apm_cli/commands/install.py only; no cache/, deps/, install/phases/, install/pipeline.py, install/resolve.py changes; no perf claim in PR body.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

danielmeppiel pushed a commit that referenced this pull request May 22, 2026
Folds 6 RECOMMENDED follow-ups from the apm-review-panel pass on PR #1442
without changing the core regression fix.

Test coverage (top-priority follow-up):
- Add tests/integration/test_install_skill_subset_hermetic.py: hermetic
  end-to-end coverage of the install --skill -> apm.yml round-trip using
  REAL DependencyReference (no network, no mocked to_apm_yml_entry).
  Closes the regression-trap gap the existing unit tests had (they mocked
  the persistence boundary they claimed to prove).

Python architecture:
- Extract _normalize_skill_subset into apm_cli/install/skill_subset.py
  so the per-package normalization loop runs ONCE per invocation (was
  O(packages * skills)) and install.py stays within its LOC budget.
- Use PEP 585 'tuple[str, ...]' on InstallContext.skill_subset.

Supply-chain security (defense-in-depth):
- Call validate_path_segments(name, context='--skill <name>') on the
  write path. The read-path parser already rejects traversal at
  parse_from_dict; this guarantees a poisoned skills: entry never
  reaches apm.yml even if a future regression bypasses the parser.

CLI logging UX:
- Emit a warning when --skill input has empty or duplicate names that
  were silently dropped; list them with --verbose.

DevX UX:
- Document replace-vs-merge semantics in --skill help text.
- Reorder docstring example to put the package first (npm/pip/cargo
  convention).

OSS growth:
- CHANGELOG entry under [Unreleased] Fixed crediting the original author.

Deferred (separable refactors, not folded):
- Unconditional _apm_yml_entries population (cross-cutting structural
  change to four conditional sites).
- Verbose info on entry shape promotion (nit, low-signal polish).
- Diff-and-announce on --skill overwrite (needs apm.yml read-and-compare).

Co-authored-by: Sergio Sisternes <sergio_sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel pushed a commit that referenced this pull request May 22, 2026
Folds 6 RECOMMENDED follow-ups from the apm-review-panel pass on PR #1442
without changing the core regression fix.

Test coverage (top-priority follow-up):
- Add tests/integration/test_install_skill_subset_hermetic.py: hermetic
  end-to-end coverage of the install --skill -> apm.yml round-trip using
  REAL DependencyReference (no network, no mocked to_apm_yml_entry).
  Closes the regression-trap gap the existing unit tests had (they mocked
  the persistence boundary they claimed to prove).

Python architecture:
- Extract _normalize_skill_subset into apm_cli/install/skill_subset.py
  so the per-package normalization loop runs ONCE per invocation (was
  O(packages * skills)) and install.py stays within its LOC budget.
- Use PEP 585 'tuple[str, ...]' on InstallContext.skill_subset.

Supply-chain security (defense-in-depth):
- Call validate_path_segments(name, context='--skill <name>') on the
  write path. The read-path parser already rejects traversal at
  parse_from_dict; this guarantees a poisoned skills: entry never
  reaches apm.yml even if a future regression bypasses the parser.

CLI logging UX:
- Emit a warning when --skill input has empty or duplicate names that
  were silently dropped; list them with --verbose.

DevX UX:
- Document replace-vs-merge semantics in --skill help text.
- Reorder docstring example to put the package first (npm/pip/cargo
  convention).

OSS growth:
- CHANGELOG entry under [Unreleased] Fixed crediting the original author.

Deferred (separable refactors, not folded):
- Unconditional _apm_yml_entries population (cross-cutting structural
  change to four conditional sites).
- Verbose info on entry shape promotion (nit, low-signal polish).
- Diff-and-announce on --skill overwrite (needs apm.yml read-and-compare).

Co-authored-by: Sergio Sisternes <sergio_sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel force-pushed the fix/1395-skill-subset-persist branch from 151c880 to b48f073 Compare May 22, 2026 07:46
@danielmeppiel
Copy link
Copy Markdown
Collaborator

Panel follow-ups folded (advisory shepherd)

Folded 8 of 11 RECOMMENDED follow-ups from the apm-review-panel into commit b48f0739 (rebased onto current main).

Folded

  • Hermetic integration test using real DependencyReference -- tests/integration/test_install_skill_subset_hermetic.py (5 cases)
  • CHANGELOG entry under [Unreleased] Fixed, crediting @sergio-sisternes-epam
  • Write-path defense-in-depth: validate_path_segments(name, context="--skill <name>") mirrors the read-path call
  • Warning on dropped empty/duplicate --skill names (_rich_warning + verbose _rich_info listing dropped entries)
  • Normalization hoisted out of the per-package loop in _resolve_package_references
  • PEP 585 annotation on InstallContext.skill_subset -> tuple[str, ...] | None
  • --skill Click help text now documents replace-vs-merge semantics
  • Docstring example reordered (package first)

Normalization extracted to src/apm_cli/install/skill_subset.py to honor the install.py <= 2010 LOC architecture invariant.

Deferred (separable, opened-for-follow-up rather than gated here)

  • _apm_yml_entries unconditional population at L489 -- cross-cutting structural refactor across 4 conditional sites
  • Verbose info on entry-shape promotion (string -> dict) -- low-signal polish
  • Diff-and-announce on --skill overwrite -- requires apm.yml read-and-compare before mutation

Evidence

  • Mutation-break gate: dropped the safety branch -> hermetic tests FAIL; dropped validate_path_segments -> traversal test FAILS. Both guards restored. RED confirmed.
  • Lint contract silent: uv run --extra dev ruff check src/ tests/ + ruff format --check src/ tests/
  • Targeted tests: 5 hermetic + commands suite + architecture invariant all pass
  • CI green on b48f0739: Lint, CI shards, CodeQL, Coverage Combine, PR Binary Smoke, gate, APM Self-Check, NOTICE Drift Check

This comment is advisory. Maintainer + author retain merge authority.

@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/1395-skill-subset-persist branch from cd9cf35 to abb0662 Compare May 22, 2026 08:09
The --skill CLI argument was computed but never passed to the
apm.yml persistence layer. Wire skill_subset through
_validate_and_add_packages_to_apm_yml and _resolve_package_references
so DependencyReference.to_apm_yml_entry() emits the dict form
with skills: key. Also pass skill_subset to _install_apm_dependencies
so the integration pipeline honours the filter.

Added skill_subset/skill_subset_from_cli fields to InstallContext so
the values flow through _install_apm_packages without needing a
separate parameter on every function boundary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/1395-skill-subset-persist branch from abb0662 to 7e3a000 Compare May 22, 2026 08:13
@danielmeppiel danielmeppiel merged commit 2aaa5cb into main May 22, 2026
10 checks passed
@danielmeppiel danielmeppiel deleted the fix/1395-skill-subset-persist branch May 22, 2026 12:42
danielmeppiel added a commit that referenced this pull request May 22, 2026
)

* feat(batch-bug-shepherd): operator visibility + fold-in invariant

Refactors the batch-bug-shepherd skill to address two genesis-validated
blockers and ship two missing capabilities discovered during a real
sweep-all run over 14 microsoft/apm bugs:

1. OPERATOR VISIBILITY (was: silent 30-minute fan-outs)
   - New asset assets/progress-diagram.md: mermaid template + 5-state
     palette (pending/active/done/blocked/skipped) + per-phase render
     rules + dispatch-table contract.
   - SKILL.md adds 'Operator visibility is a contract' invariant; each
     phase boundary re-renders the diagram with current-phase coloring
     and prints a subagent_id -> target dispatch table BEFORE fan-out.
   - Operator can follow long sagas at a glance instead of waiting in
     the dark for the next checkpoint.

2. FOLD-IN INVARIANT (was: panel recommendations silently dropped)
   - assets/verdict-schema.json: shepherd_return gains required
     recommended_followups[] channel; completion_return gains
     folded_followups[] + deferred_followups[]; extracted reusable
     followup_item definition.
   - assets/shepherd-prompt.md: fixed verdict mapping bug
     (ship_with_followups + 0 blocking -> ready-to-merge, not
     needs-author-changes); added recommended_followups extraction
     step with required source_persona + optional fold_hint tagging.
   - assets/completion-prompt.md: full rewrite. Adds
     RECOMMENDED_FOLLOWUPS input; encodes FOLD vs DEFER classifier
     (FOLD: touches diff / single helper / regression trap / hermetic
     test / inline comment; DEFER: cross-cutting refactor / new
     feature / broad doc / architectural addition); per-FOLD item
     consultation with source_persona + python-architect lens;
     DEFER items filed as gh issue create tracking issues (never
     silently dropped); mid-flight reclassify rule to avoid stalls.
   - SKILL.md adds 'Bias toward folding recommended items' invariant
     and rewrites Phase 4 spawn contract (9 steps) to thread the
     recommended_followups channel end-to-end.

Eval gate
  - +3 rubric anchors per content fixture
    (progress-diagram-header, mermaid-flowchart-rendered,
    dispatch-table-before-fanout) and +3 invariant anchors
    (recommended-followups-channel, fold-defer-classifier,
    tracking-issue-for-defer).
  - All 12 new anchors MATCH with_skill fixtures and MISS
    without_skill fixtures (clean value delta).
  - +3 no-fire trigger items for single-PR fold-in phrasing so the
    dispatcher will not misfire the batch outer-loop on single-PR
    fold work (e.g. 'fold the panel recommendations into PR #1234'
    remains apm-review-panel completion territory).

Validation
  - Schema validates via jsonschema Draft7; accepts new shapes,
    rejects shepherd_return missing recommended_followups[].
  - SKILL.md: 367 lines / 4483 tokens (caps: 500 / 5000).
  - Description: 965 / 1024 chars; mentions FOLD invariant.
  - 0 non-ASCII bytes across all modified files.
  - All 4 changed JSON files parse.

Real-task evidence (this skill iteration was driven by a live run)
  - 5 of 6 in-flight community PRs had their panel recommendations
    folded in-PR by completion subagents following the new contract,
    yielding 22 folded items and 8 deferred-to-tracking items across
    PRs #1387, #1396, #1441, #1443, #1444. The 6th (#1442) is in
    flight as this lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* batch-bug-shepherd: add Phase 5 mergeability gate

Adds a post-wave gate that re-probes mergeability for every PR the
saga marked ready-to-merge, dispatches one conflict-resolution
subagent per CONFLICTING PR, and partitions returns into four
post-gate statuses before the final report claims anything is
mergeable.

Mergeability is post-wave truth, not pre-wave assumption: a PR
that Phase 4 marked ready can stop being mergeable the moment the
maintainer lands another PR onto main. Without this gate the
report ships stale ready claims.

Design driven through the genesis skill end-to-end (steps 1-6
handoff packet, steps 7a-7b coder pass, step 8 validation):

- NEW Phase 5 (mergeability gate) between completion (Phase 4) and
  renamed final report (Phase 5 -> Phase 6).
- Sub-phases 5a probe (read-only, single-thread, gh pr view --json
  mergeStateStatus), 5b fan-out (one conflict-resolution subagent
  per CONFLICTING PR), 5c trust-but-verify re-probe + four-way
  partition (resolved / requires-author-action /
  requires-human-judgment / resolution-failed).
- NEW assets/conflict-resolution-prompt.md spawn body for 5b.
  Encodes rebase, faithful merge of both intents, mutation-break
  re-check, lint silent, --force-with-lease push, re-probe,
  resolution-confirmation comment.
- NEW references/mergeability-gate.md load-on-demand orchestrator
  step-by-step (load trigger: WHEN ENTERING PHASE 5). Keeps
  SKILL.md under 5000-token budget.
- Schema extends verdict-schema.json oneOf with
  conflict_resolution_return; --force-with-lease enforced via
  regex pattern guard on push_command field; bare --force
  rejected. Five rejection cases validated.
- Two-comment-per-PR cap as new architecture invariant: at most
  one completion-confirmation (Phase 4) + one
  resolution-confirmation (Phase 5b) per PR.
- Progress diagram extended with WAVE4 subgraph (P5a/P5b/P5c),
  skipped-state semantics, P5b dispatch table requirement.
- Final report extended with three new partition sections plus a
  RESOLUTION CONFIRMATION COMMENT block and mergeability-gate
  disciplines line.
- Evals: +3 content rubric anchors (mergeability-probe-cli,
  force-with-lease-on-push, post-wave-partition-columns) + 1
  optional anchor (two-comment-cap); +1 fire + 2 no-fire trigger
  items; fixture diff shows the gate firing on a sweep with 2
  conflicting PRs and the without-skill failure mode (stale ready
  claim).

SKILL.md: 388 lines / 4867 tokens (budget 500/5000). ASCII only.
CI lint pair silent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bbs): add Phase 1.5 strategic-alignment gate + PRINCIPLES.md

Adds a new wave between Phase 1 (triage) and Phase 2 (PR-in-flight
cross-reference) that checks every LEGIT bug against the project's
rejection contract before spending shepherd / fix / completion work
on it.

What changes:

- NEW PRINCIPLES.md at repo root: 7 numbered principles encoding
  the project's hard nos (P1 no invented frontmatter; P2 multi-harness
  with traction gating; P3 vendor neutral; P4 UX floor is not a
  trade) plus 3 supporting principles (P5 portability; P6 reliability
  over magic; P7 community over feature count). Bound to apm-ceo +
  bbs Phase 1.5 + apm-triage-panel + apm-review-panel as the
  rejection contract.

- NEW bbs Phase 1.5 strategic-alignment gate (WAVE 1.5):
  - one apm-ceo subagent per LEGIT row, in parallel
  - 4-state verdict: aligned | aligned-with-reservations |
    out-of-scope | wrong-direction
  - schema-validated returns; FAILS OPEN on infrastructure failure
    (malformed-x2 or non-citable principle) so legit bugs are
    never silently demoted under gate breakage
  - ABORTS only when apm-ceo.agent.md or PRINCIPLES.md itself is
    missing (operator-actionable error)
  - demoted rows flip to status triaged-deferred and SKIP Phase
    2/3/4/5; surface in Phase 6 under 'Recommend close as
    out-of-scope' partition
  - aligned-with-reservations rows stay in saga; downstream phases
    surface the reservations in review prose
  - deferred-PR strategic-rejection comment subagent (S7+S4+A9)
    posts a courtesy comment on any open PR whose underlying issue
    was demoted, using the would-be Phase-4 completion-comment
    slot (two-comments-per-PR cap preserved)

- Verdict schema extended with 5th oneOf member
  strategic_alignment_return (kind, issue, verdict, cited_principle,
  rationale, reservations).

- Ground-truth table grows two columns (strategic_verdict +
  strategic_rationale) and one status value (triaged-deferred).

- Progress diagram inserts P15 between P1 and P2; dispatch-table
  contract extends to Phase 1.5.

- Final-report template adds 'Recommend close as out-of-scope'
  partition and 'Aligned with reservations' surfacing section.

- 2 new fire trigger evals + 1 no-fire (PRINCIPLES.md authoring
  guard) + 1 new rubric anchor on the three-issues-mixed scenario.

Genesis design artifact lives in the session plan store; SKILL.md
body remains within 500-line / 5000-token budget (406 lines /
4943 tokens after trimming pre-existing verbose passages on
operator-visibility, mergeability, fold-in, composition, and
operating-contract sections to make room).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]

3 participants