Epic: Close the producer-produced drift-detection gap (per-asset lockfile + skill-as-package + apm audit --drift) #898

@danielmeppiel

Epic: Close the producer-produced drift-detection gap in APM

Problem

APM has a class of bugs we keep paying for one PR at a time. The .apm/ source tree is the producer for many produced artifacts: the .github/{agents,instructions,skills}/ mirror, vendored package files under .apm/packages/, plugin bundles emitted by apm pack --format plugin, and apm.lock.yaml itself. We have no deterministic way to detect when a produced artifact silently diverges from its producer.

Recent incidents:

This Epic unifies the in-flight foundation (#889) with the unsolved per-asset granularity and dependency-closure gaps (#684, #887, #896) into a single coherent design and a single Epic-PR.

Producer-produced map of APM today

flowchart LR
    subgraph SRC["Hand-authored sources"]
        APMSRC[".apm/ source tree"]
        APMYML["apm.yml manifest"]
        POLICY["apm-policy.yml"]
    end

    subgraph TODAY["Covered today (PR #762)"]
        LOCKHASH["[OK] apm.lock.yaml local_deployed_file_hashes"]
    end

    subgraph PARTIAL_889["Partial after PR #889"]
        MIRROR["[PARTIAL] .github/{agents,instructions,skills}/ mirror"]
        DEPLOYED["[PARTIAL] vendored package files under .apm/packages/"]
    end

    subgraph EPICGAP["Gap closed by this Epic"]
        BUNDLE["[GAP] skill bundle internals (e.g. .github/skills/apm-review-panel/assets/*)"]
        PLUGIN["[GAP] apm pack --format plugin intra-bundle file enumeration"]
        DEPCLOSURE["[GAP] dependency closure (verdict-template cross-refs)"]
    end

    APMSRC -- "apm install --target copilot" --> MIRROR
    APMYML -- "apm install (resolve deps)" --> DEPLOYED
    APMSRC -- "apm install (compute hashes)" --> LOCKHASH
    APMSRC -- "apm pack --format plugin" --> PLUGIN
    APMSRC -- "apm install (walk bundle)" --> BUNDLE
    POLICY -- "apm audit --ci" --> DEPCLOSURE

    classDef ok fill:#d4f4dd,stroke:#1a7f37,color:#000
    classDef partial fill:#fff4c2,stroke:#9a6700,color:#000
    classDef gap fill:#ffd6cc,stroke:#cf222e,color:#000
    class LOCKHASH ok
    class MIRROR,DEPLOYED partial
    class BUNDLE,PLUGIN,DEPCLOSURE gap

APM has four producer steps (install, pack, audit, manifest resolution) feeding the six produced artifacts mapped above. PR #762 covers lockfile hashing. PR #889 extends verification to mirror and vendored files at directory granularity. This Epic closes the remaining gap by enumerating bundle internals, plugin contents, and dependency closure.
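
To make "enumerating bundle internals" concrete, here is a minimal Python sketch of walking a bundle directory and recording one SHA-256 digest per file. The function names (`sha256_of`, `enumerate_bundle`) and the flat path-to-digest mapping are assumptions for illustration only; the actual lockfile schema is left to the design phase.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def enumerate_bundle(root: Path, bundle_dir: str) -> dict[str, str]:
    """Map every file under a bundle directory to its digest.

    Hypothetical helper: keys are POSIX-style paths relative to `root`,
    so the mapping is stable across platforms and comparable entry by entry.
    """
    base = root / bundle_dir
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }
```

Calling `enumerate_bundle(repo_root, ".github/skills/apm-review-panel")` would then yield one entry per file inside the bundle, including anything under `assets/`, rather than a single entry for the directory.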

Mandate

No partial fixes. This Epic ships the ideal solution. Pragmatism applies to scope sequencing within the Epic-PR, not to deferring core capability into future Epics. If a piece of the design exists in an open issue (#684, #887, #896), it is either implemented in the Epic-PR or its non-implementation is justified in writing in the Epic-PR description.

Design phase is mandatory. Before any implementation commits, the design must be reviewed by the apm-review-panel skill (seven personas including the new visual-communicator from PR #897). The CEO persona arbitrates conflicts and strategic calls. The verdict is posted as a comment on this Epic before the Epic-PR is opened.

Single Epic-PR. Together with the in-flight PR #889, this gives exactly two PRs on the Epic critical path: one active and one planned. Auxiliary PRs (e.g. PR #897 visual-communicator persona) are scoped narrowly and ship independently.

Drift-detection architecture after the Epic

flowchart TD
    START(["apm audit --drift (alias of audit --ci + content-integrity)"]) --> FORK{Two parallel checks}

    FORK --> H1[Hash integrity check]
    FORK --> P1[Policy closure check]

    subgraph HASHLANE["Lane A: per-file hash integrity"]
        H1 --> H2["[I/O] Read apm.lock.yaml local_deployed_file_hashes"]
        H2 --> H3["[FS] Walk every entry (files AND bundle internals)"]
        H3 --> H4["[EXEC] Recompute sha256 per file"]
        H4 --> H5{Hash matches lock entry?}
        H5 -- "no" --> H6["Emit drift: path, expected, actual"]
        H5 -- "yes" --> H7[Mark file verified]
        H3 --> H8{Lock entry has no file?}
        H8 -- "missing on disk" --> H9[Emit drift: deleted]
        H3 --> H10{File on disk has no entry?}
        H10 -- "untracked" --> H11[Emit drift: orphan]
    end

    subgraph POLICYLANE["Lane B: dependency closure"]
        P1 --> P2["[I/O] Read apm-policy.yml"]
        P2 --> P3["[I/O] Read apm.yml deps"]
        P3 --> P4["[FS] Scan declared packages for cross-references"]
        P4 --> P5{Every cross-ref declared as dep?}
        P5 -- "no" --> P6["Emit policy violation: missing dependency"]
        P5 -- "yes" --> P7[Mark closure verified]
    end

    H6 --> REPORT[Drift report exit code 1]
    H9 --> REPORT
    H11 --> REPORT
    P6 --> REPORT
    H7 --> CLEAN{All checks passed?}
    P7 --> CLEAN
    CLEAN -- "yes" --> OK[Clean exit code 0]
    CLEAN -- "no" --> REPORT

After the Epic, apm audit --drift runs two parallel lanes. Lane A walks every entry in the lockfile and recomputes per-file SHA-256 hashes, including for files inside skill bundles, and emits drift on mismatch, deletion, or orphan. Lane B reads the policy file and verifies that every cross-reference between declared packages is also declared as a dependency.
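
The Lane A logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the APM implementation: `find_drift`, `exit_code`, the `locked` mapping of relative paths to hex digests, and the `tracked_dirs` list are all hypothetical names, and the findings carry only a kind and a path rather than the full expected/actual pair shown in the diagram.

```python
import hashlib
from pathlib import Path


def find_drift(
    root: Path, locked: dict[str, str], tracked_dirs: list[str]
) -> list[tuple[str, str]]:
    """Compare locked digests against the files on disk.

    `locked` maps relative POSIX paths to hex SHA-256 digests (assumed shape).
    `tracked_dirs` lists the produced directories to scan for orphans.
    Returns (kind, path) findings: "modified", "deleted", or "orphan".
    """
    findings = []
    # Pass 1: every lock entry must exist on disk with a matching digest.
    for rel_path, expected in sorted(locked.items()):
        target = root / rel_path
        if not target.is_file():
            findings.append(("deleted", rel_path))
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            findings.append(("modified", rel_path))
    # Pass 2: every file on disk under a tracked directory must have an entry.
    for directory in tracked_dirs:
        for path in sorted((root / directory).rglob("*")):
            rel_path = path.relative_to(root).as_posix()
            if path.is_file() and rel_path not in locked:
                findings.append(("orphan", rel_path))
    return findings


def exit_code(findings: list[tuple[str, str]]) -> int:
    """Exit 1 on any drift and 0 when clean, as the Epic specifies."""
    return 1 if findings else 0
```

The two passes are deliberately separate: the first catches divergence of files the lockfile knows about, while the second catches produced files the lockfile never recorded, which is the failure mode the coverage meta-pattern targets.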

Issue and PR landscape

flowchart TD
    subgraph MERGED["Merged"]
        PR762["PR #762 per-file content hashes (MERGED)"]
    end

    subgraph INFLIGHT["In flight (1 active PR)"]
        PR889["PR #889 hash-verify deployed files + includes manifest field (branch: feat/audit-includes-887)"]
        I684["#684 verification engine"]
        I887["#887 audit blindness"]
    end

    subgraph PLANNED["Planned (1 future PR)"]
        EPIC["This Epic close producer-produced drift-detection gap (OPEN)"]
        I896["#896 per-asset granularity skill-as-package"]
        EPICPR["Epic-PR per-asset lockfile + skill-as-package + audit --drift alias (PLANNED)"]
    end

    PR762 -.-> PR889
    PR889 -- "closes" --> I684
    PR889 -- "closes" --> I887
    EPIC -- "subsumes" --> I896
    EPICPR -- "closes" --> EPIC
    EPICPR -.-> PR889

    classDef merged fill:#d4f4dd,stroke:#1a7f37,color:#000
    classDef active fill:#fff4c2,stroke:#9a6700,color:#000
    classDef planned fill:#dbe9ff,stroke:#0969da,color:#000
    class PR762 merged
    class PR889,I684,I887 active
    class EPIC,I896,EPICPR planned

PR #762 is merged. PR #889 is the only active PR in the critical path; it closes #684 and #887 and extends #762. This Epic subsumes #896 and will be closed by a single planned Epic-PR that extends PR #889 with per-asset granularity, skill-as-package, and the audit --drift alias.

Mandated workflow

stateDiagram-v2
    [*] --> EpicOpen
    EpicOpen --> DesignPhase : invoke apm-review-panel skill
    DesignPhase --> PanelFindings : seven personas raise findings
    PanelFindings --> CEOArbitration : findings disagree or strategic call
    PanelFindings --> DesignApproved : findings converge
    CEOArbitration --> DesignApproved : verdict published
    CEOArbitration --> DesignPhase : verdict requests redesign
    DesignApproved --> Implementation : open Epic-PR
    Implementation --> PRReview : request apm-review-panel on PR
    PRReview --> Revise : BLOCKER or HIGH findings
    Revise --> PRReview
    PRReview --> Dogfood : panel verdict approve
    Dogfood --> SelfHostMigration : run apm audit --drift on microsoft/apm
    SelfHostMigration --> Ship : drift report clean
    SelfHostMigration --> Revise : drift report dirty
    Ship --> [*]

Scope of the Epic-PR

The Epic-PR must deliver all of the following. Anything dropped requires a written justification in the PR body:

  1. Per-asset lockfile granularity. apm.lock.yaml enumerates every file inside skill and plugin bundles, not just bundle directories. Each entry carries a sha256. Replaces or extends the directory entries that PR #762 ("fix(install): harden stale-file cleanup with per-file content-hash provenance (#666 follow-up)") introduced and PR #889 ("feat(audit): close audit-blindness gap for local .apm/ content (#887)") verifies.
  2. apm audit --drift alias. Documented user-facing alias of apm audit --ci scoped to the content-integrity and dependency-closure checks. Exit code 1 on any drift, 0 on clean.
  3. Skills and agents as first-class packages (#896, "Skills and agents as first-class APM packages: intra-repo local-dep resolution + per-asset lockfile hashes"). Either (a) skills with their own apm.yml manifest get full transitive resolution including agent cross-refs, or (b) the design adopts a plugin-bundle model via apm pack --format plugin, or (c) hybrid. The design phase chooses; the Epic-PR implements the chosen model.
  4. Dependency-closure policy rule. New apm-policy.yml check that fails when a declared package contains a cross-reference to an asset that is not declared as a dependency. Catches the PR #882 ("harden(apm-review-panel): one-comment discipline + Hybrid E auth routing + apm-primitives-architect persona") class of incidents at audit time.
  5. Producer-produced coverage meta-pattern. A CI check (or apm audit --drift lane) that walks all known producer outputs and asserts every produced file is recorded in some lockfile entry. Detects future producer-produced gaps deterministically.
  6. Self-host migration. microsoft/apm itself runs apm audit --drift in its own CI workflow and the run is clean. Any current drift in this repo is fixed as part of the Epic-PR.
  7. Documentation. Update docs/src/content/docs/ Starlight pages for the new --drift alias, the new policy rule, the per-asset lockfile schema, and the skill-as-package model. Update packages/apm-guide/.apm/skills/apm-usage/ resource files per the documentation rule.
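
The dependency-closure rule in item 4 reduces to a set difference once cross-references have been extracted. The Python sketch below assumes that extraction has already happened and that both inputs are plain mappings. The function name `closure_violations`, the input shapes, and the package names in the usage note are hypothetical; how cross-references are actually detected inside assets is a design-phase decision and is not shown.

```python
def closure_violations(
    cross_refs: dict[str, set[str]],
    declared_deps: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return (package, missing_dependency) pairs.

    `cross_refs` maps each declared package to the packages whose assets
    it references; `declared_deps` maps each package to the dependencies
    it declares (both shapes are assumptions for this sketch).
    A reference from a package to itself is not a violation.
    """
    violations = []
    for package in sorted(cross_refs):
        allowed = declared_deps.get(package, set()) | {package}
        for referenced in sorted(cross_refs[package] - allowed):
            violations.append((package, referenced))
    return violations
```

For example, if a hypothetical `review-panel` package references assets from `verdict-template` without declaring it, `closure_violations({"review-panel": {"verdict-template"}}, {"review-panel": set()})` reports that pair, and the audit would fail until the dependency is declared.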

Out of scope

Sub-issues

Related

Definition of done
