Skip to content

fix: resolve 3 flaky integration tests blocking merge queue#1477

Merged
danielmeppiel merged 3 commits into
mainfrom
sergio-sisternes-epam/turbo-waddle
May 26, 2026
Merged

fix: resolve 3 flaky integration tests blocking merge queue#1477
danielmeppiel merged 3 commits into
mainfrom
sergio-sisternes-epam/turbo-waddle

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Problem

The merge queue has a 0% success rate since May 24 (5 consecutive failures across PRs #1473 and #1465). Three independent bugs cause integration test failures that only manifest in the merge queue environment.

Fixes

Bug 1: config.py race condition (production code)

  • Root cause: os.makedirs(CONFIG_DIR) without exist_ok=True causes FileExistsError when parallel xdist workers initialise ~/.apm simultaneously.
  • Fix: Use exist_ok=True for makedirs and atomic O_CREAT|O_EXCL for config file creation to eliminate the TOCTOU race.

Bug 2: Copilot CLI rejects merge queue token (100% repro)

  • Root cause: RUNTIME_ENV_VARS['copilot'] only maps to GH_TOKEN and GITHUB_PERSONAL_ACCESS_TOKEN, missing COPILOT_GITHUB_TOKEN which is Copilot CLI's preferred env var.
  • Fix: Add COPILOT_GITHUB_TOKEN to RUNTIME_ENV_VARS['copilot'] in token_manager.py. Also set it explicitly in the golden scenario test from GITHUB_APM_PAT or GITHUB_TOKEN.

Bug 3: Skill install timeouts (intermittent)

  • Root cause: 120s timeout is too tight for CI with network latency and GitHub API rate limiting.
  • Fix: Increase subprocess timeout from 120s to 300s in test_skill_install.py and test_skill_integration.py.

Validation

  • All lints pass (ruff check, ruff format, pylint R0801, auth-signals)
  • All 14,981 unit tests pass
  • 270 token-related tests pass

Files changed

  • src/apm_cli/config.py -- race-safe config initialisation
  • src/apm_cli/core/token_manager.py -- add COPILOT_GITHUB_TOKEN to runtime env vars
  • tests/integration/test_golden_scenario_e2e.py -- explicit Copilot token env var
  • tests/integration/test_skill_install.py -- timeout 120s -> 300s
  • tests/integration/test_skill_integration.py -- timeout 120s -> 300s

Fixes merge queue failures for #1473 and #1465.

Bug 1 (config.py race condition): Use exist_ok=True for makedirs and
atomic O_CREAT|O_EXCL for config file creation to eliminate TOCTOU
race when parallel xdist workers initialise ~/.apm simultaneously.

Bug 2 (Copilot CLI token rejection): Add COPILOT_GITHUB_TOKEN to
RUNTIME_ENV_VARS['copilot'] so token_manager sets the preferred env
var for Copilot CLI. Also set it explicitly in the golden scenario
test from GITHUB_APM_PAT or GITHUB_TOKEN.

Bug 3 (skill install timeouts): Increase subprocess timeout from
120s to 300s in test_skill_install.py and test_skill_integration.py
to accommodate CI network latency and GitHub API rate limiting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 25, 2026 22:41
@sergio-sisternes-epam sergio-sisternes-epam added the panel-review Trigger the apm-review-panel gh-aw workflow label May 25, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to restore merge-queue stability by fixing three merge-queue-specific integration test failures: a config initialization race under parallelism, Copilot CLI token env-var compatibility, and CI timeouts during skill installs.

Changes:

  • Make ~/.apm config initialization race-safe under parallel xdist workers.
  • Teach token setup to export the Copilot CLI-preferred COPILOT_GITHUB_TOKEN, and set it explicitly in the golden E2E scenario.
  • Increase integration test subprocess timeouts from 120s to 300s to reduce CI flakiness.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/apm_cli/config.py Makes config directory/file creation safe under concurrent initialization.
src/apm_cli/core/token_manager.py Adds COPILOT_GITHUB_TOKEN to Copilot runtime env-var mappings.
tests/integration/test_golden_scenario_e2e.py Ensures Copilot CLI sees a token under COPILOT_GITHUB_TOKEN during the golden scenario.
tests/integration/test_skill_install.py Extends install-related subprocess timeouts to reduce CI flakes.
tests/integration/test_skill_integration.py Extends integration-related subprocess timeouts to reduce CI flakes.
apm.lock.yaml Updates the repo lockfile contents/version metadata.

Comment thread src/apm_cli/config.py
Comment thread apm.lock.yaml
Address review feedback: pass explicit mode=0o644 to os.open() to
match Python's default open() behaviour (0o666 & ~umask). Revert
incidental apm.lock.yaml changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

APM Review Panel: ship_with_followups

Three correct bug fixes restore the merge queue from 0% success rate; the lockfile change that carried the supply-chain blocking signal was removed in a second commit -- all remaining panel signals are non-blocking.

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

The three code fixes (config.py atomic init, COPILOT_GITHUB_TOKEN in RUNTIME_ENV_VARS, 300s test timeouts) are narrow, correct, and urgent. All specialists agree on the code changes. Python-architect, cli-logging-expert, devx-ux-expert, and auth-expert found only nits against those three files.

The supply-chain-security expert raised a blocking finding on the initial diff (removal of local_deployed_file_hashes from apm.lock.yaml, which would have silently degraded SHA-256 provenance for all locally deployed agent and instruction files). A second commit removed the lockfile change entirely, resolving that concern. The lockfile finding is now moot; the current diff is clean from a supply-chain standpoint.

Two remaining signals worth tracking before merge closes: (1) the test-coverage-expert found no unit test asserting COPILOT_GITHUB_TOKEN lands in the child-process env dict after resolution -- a silent regression trap if the RUNTIME_ENV_VARS prepend is ever reverted; and (2) the FileExistsError-swallow path in ensure_config_exists has no test, meaning a future refactor removing the try/except would silently regress parallel xdist runs. Both are quick unit tests recommended as immediate follow-ups.

The auth-expert surfaced a pre-existing gap: COPILOT_GITHUB_TOKEN is correct in RUNTIME_ENV_VARS (destination side) but absent from TOKEN_PRECEDENCE['copilot'] (source side). A user who sets COPILOT_GITHUB_TOKEN directly without any TOKEN_PRECEDENCE var will get a silent "no token found" from APM while Copilot CLI proceeds anyway. Not introduced by this PR; recommended as a follow-up patch.

Aligned with: Pragmatic-as-npm (atomic dir init, correct env var propagation, and adjusted CI timeouts are pragmatic correctness fixes with no user-facing complexity cost); Secure-by-default (improved, now that the lockfile integrity concern is resolved by removing the lockfile change from scope).

Growth signal. The oss-growth-hacker notes that a 0% merge queue success rate since May 24 is a top-of-funnel contributor trust event -- contributors who opened PRs in that window and watched them fail repeatedly may have churned silently. The fix is urgent and correct. A CHANGELOG entry under 0.14.2 naming the outage window (May 24-25) converts an invisible fix into a visible trust signal. The COPILOT_GITHUB_TOKEN addition is also worth one sentence in the release notes: "apm now works out of the box in Copilot CLI environments without manual token wiring."

Panel summary

Persona B R N Takeaway
Python Architect 0 0 3 Three independent bug fixes; all narrow and correct. Atomic config init pattern is sound for xdist multi-process case.
CLI Logging Expert 0 0 2 No CLI output regressions. FileExistsError swallow is semantically correct; debug log on race branch would improve agent-mode traceability.
DevX UX Expert 0 0 2 No UX regressions. Config race fix preserves first-run UX. COPILOT_GITHUB_TOKEN addition is a positive discoverability improvement.
Supply Chain Security Expert 0 1 1 Lockfile blocking finding resolved by second commit. One remaining: test subprocess output may expose token value on failure -- scrub before print.
OSS Growth Hacker 0 1 2 Merge queue restoration is a contributor-trust event. CHANGELOG entry and release note recommended.
Auth Expert 0 1 1 COPILOT_GITHUB_TOKEN placement in RUNTIME_ENV_VARS is correct. TOKEN_PRECEDENCE['copilot'] gap is pre-existing and worth a follow-up patch.
Doc Writer -- -- -- inactive -- no documentation files in diff; COPILOT_GITHUB_TOKEN is an internal runtime mapping with no user-facing CLI surface requiring docs update.
Test Coverage Expert 0 2 1 Two missing unit tests: COPILOT_GITHUB_TOKEN env dict population and FileExistsError-swallow path. Both recommended as immediate follow-ups.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [OSS Growth Hacker] Add CHANGELOG entry under 0.14.2 naming the merge queue outage window (May 24-25) and confirming resolution -- contributor trust requires a visible post-mortem record; external contributors blocked during that window deserve acknowledgment.
  2. [Auth Expert] Add COPILOT_GITHUB_TOKEN to TOKEN_PRECEDENCE['copilot'] as a low-priority fallback -- a user who sets this var directly without any TOKEN_PRECEDENCE var currently gets a silent "no token found" from APM; not introduced by this PR but surfaced by it.
  3. [Test Coverage Expert] Add unit test asserting COPILOT_GITHUB_TOKEN is populated in the copilot runtime env dict after token resolution -- if the RUNTIME_ENV_VARS prepend is ever reverted, no test alarm fires and copilot skill runs silently fail to receive the token.
  4. [Test Coverage Expert] Add unit test for the FileExistsError-swallow path in ensure_config_exists -- a future refactor removing the try/except would silently regress parallel xdist init runs; one-liner with patch('os.open', side_effect=FileExistsError).
  5. [Supply Chain Security Expert] Scrub subprocess output against known token values before printing on test failure in test_golden_scenario_e2e.py -- COPILOT_GITHUB_TOKEN is injected from GITHUB_APM_PAT and any Copilot CLI debug output that echoes the token would appear in CI logs.

Architecture

classDiagram
    direction LR

    class config {
        <<Module>>
        +CONFIG_DIR str
        +CONFIG_FILE str
        +ensure_config_exists() None
        +get_config() dict
        +set_config(data) None
    }

    class GitHubTokenManager {
        <<Strategy>>
        +RUNTIME_ENV_VARS dict
        +preserve_existing bool
        +get_token_for_purpose(purpose, tokens) str
        +_setup_copilot_tokens(env, tokens) None
        +_setup_codex_tokens(env, tokens) None
    }

    class RUNTIME_ENV_VARS {
        <<ValueObject>>
        +copilot list
        +codex list
        +llm list
    }

    GitHubTokenManager *-- RUNTIME_ENV_VARS : owns
    note for config "Bug 1 fix: makedirs(exist_ok=True)\n+ O_CREAT|O_EXCL atomic init"
    note for RUNTIME_ENV_VARS "Bug 2 fix: prepend COPILOT_GITHUB_TOKEN\nto copilot list"

    class config:::touched
    class RUNTIME_ENV_VARS:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A([xdist worker N starts]) --> B[ensure_config_exists]
    B --> C["[FS] os.makedirs(CONFIG_DIR, exist_ok=True)"]
    C --> D{os.path.exists CONFIG_FILE?}
    D -- exists --> E([return])
    D -- absent --> F["[FS] os.open(O_WRONLY|O_CREAT|O_EXCL, 0o644)"]
    F -- success --> G["[FS] fdopen + json.dump default config"]
    G --> E
    F -- FileExistsError --> H[another worker won race]
    H --> E

    subgraph token_manager.py
        I([_setup_copilot_tokens called]) --> J[get_token_for_purpose copilot]
        J --> K{token found?}
        K -- no --> L([return])
        K -- yes --> M["iterate RUNTIME_ENV_VARS copilot\nCOPILOT_GITHUB_TOKEN, GH_TOKEN,\nGITHUB_PERSONAL_ACCESS_TOKEN"]
        M --> N["[ENV] set each var = token"]
        N --> O([env dict ready for subprocess])
    end
Loading

Recommendation

Ship it. The three code fixes are correct, narrow, and urgent -- a 0% merge queue success rate since May 24 is a blocker on every contributor PR in flight. The second commit cleanly removed the lockfile change that carried the supply-chain blocking signal. Track the five recommended follow-ups as issues: the TOKEN_PRECEDENCE gap and the two missing unit tests are the highest-signal items and can land in the next patch without blocking this fix.


Full per-persona findings

Python Architect

  • [nit] Outer os.path.exists check is a benign TOCTOU remnant but not the actual guard at src/apm_cli/config.py:17
    The real atomic guard is O_CREAT|O_EXCL inside the try block. The outer if not os.path.exists(CONFIG_FILE) is a non-atomic fast-path optimisation -- harmless because O_EXCL catches any race, but worth a one-line comment so the next reader does not remove the outer check thinking it is the guard, or remove the inner try thinking the outer check is sufficient.
    Suggested: # Fast-path: skip the syscall on the common case. O_EXCL below is the actual atomic guard.

  • [nit] get_config/set_config write paths remain non-atomic; out of scope but worth tracking at src/apm_cli/config.py:59
    Not introduced by this PR. Flagging for a future hardening pass if parallel workers ever mutate config post-init. Consider filelock if that scenario materialises.

  • [nit] Design patterns note: changes are procedural and appropriate. RUNTIME_ENV_VARS ValueObject scales cleanly to additional runtime entries; no abstraction change needed.

CLI Logging Expert

  • [nit] except FileExistsError: pass at src/apm_cli/config.py is semantically correct. Adding logger.debug("Config file already created by concurrent process, skipping") would surface this event in --verbose mode for AI agents diagnosing initialization races. Currently invisible even to debug output.

  • [nit] No debug log records which source var won the copilot token resolution at src/apm_cli/core/token_manager.py. A logger.debug("Resolved copilot token from %s", winning_var) would make the --verbose chain auditable. Pre-existing gap; not introduced here.

DevX UX Expert

  • [nit] The atomic O_CREAT|O_EXCL write initializes config with a hardcoded default_client value of vscode. Pre-existing issue, not introduced by this PR. Now more visible through the race path.

  • [nit] COPILOT_GITHUB_TOKEN is now probed first in the copilot chain. If a user has both set with different values, the precedence shift is silent. A debug-level log noting which token source was resolved would help diagnose auth surprises.

Supply Chain Security Expert

  • [recommended] Test code at tests/integration/test_golden_scenario_e2e.py:387 prints full subprocess output on failure after injecting COPILOT_GITHUB_TOKEN from GITHUB_APM_PAT/GITHUB_TOKEN. If the Copilot CLI or any tool it invokes echoes the token value into stdout/stderr, the token would appear in pytest output and CI logs. Scrub output against known token values before printing on failure, consistent with the no-token-in-logs contract.

  • [nit] COPILOT_GITHUB_TOKEN placement in RUNTIME_ENV_VARS['copilot'] is correct; no scoping concern.

OSS Growth Hacker

  • [recommended] A 0% merge queue success rate since May 24 is a contributor-trust event. Any contributor who opened a PR in that window and watched it fail repeatedly will have churned silently. Add a CHANGELOG entry under the next release naming the outage window and confirming it is resolved. This converts an invisible fix into a visible trust signal.

  • [nit] COPILOT_GITHUB_TOKEN is now the preferred token lookup for copilot runtime. The PR body buries this as bug fix Integrate copilot runtime #2, but it is a first-class compatibility story. Consider a one-liner in the auth/token docs noting that COPILOT_GITHUB_TOKEN is recognized automatically in Copilot CLI contexts.

  • [nit] 120s -> 300s timeout across 13 test call sites masks a latency signal. If installs are routinely taking more than 2 minutes, that is a user-facing performance story too. File a follow-up to instrument install timing and surface a progress indicator for long installs.

Auth Expert

  • [recommended] COPILOT_GITHUB_TOKEN is not in TOKEN_PRECEDENCE['copilot'], creating a one-way blind spot at src/apm_cli/core/token_manager.py:100. TOKEN_PRECEDENCE['copilot'] resolves ['GITHUB_COPILOT_PAT', 'GITHUB_TOKEN', 'GITHUB_APM_PAT']. If a user sets COPILOT_GITHUB_TOKEN directly and has none of the TOKEN_PRECEDENCE vars set, get_token_for_purpose('copilot') returns None and APM may emit confusing warnings while Copilot CLI proceeds anyway. Consider adding COPILOT_GITHUB_TOKEN to TOKEN_PRECEDENCE['copilot'] as a low-priority fallback. Not introduced by this PR but surfaced by it.

  • [nit] Test at tests/integration/test_golden_scenario_e2e.py:350 bypasses APM's own token resolution when GITHUB_APM_PAT is set (injects COPILOT_GITHUB_TOKEN directly; preserve_existing=True means APM does not overwrite it). Fine for CI. Document the intent with a comment so future authors do not assume APM's resolution drives the final value in this test path.

Doc Writer -- inactive

No documentation files in diff; COPILOT_GITHUB_TOKEN is an internal runtime env var mapping with no user-facing CLI surface requiring a docs update.

Test Coverage Expert

  • [recommended] No test asserts COPILOT_GITHUB_TOKEN is set in the copilot runtime env dict after token resolution at tests/unit/core/test_token_manager_lifecycle.py
    token_manager.py prepends COPILOT_GITHUB_TOKEN to RUNTIME_ENV_VARS['copilot']. _setup_copilot_tokens() iterates this list and writes the resolved token into each key. grep of tests/ for RUNTIME_ENV_VARS, _setup_copilot_tokens, and COPILOT_GITHUB_TOKEN returned zero hits in test files (the golden scenario e2e sets it as input, not as an assertion on the output env dict). A copilot skill run that silently fails to receive the token would be a user-visible regression with no test alarm.
    Proof (missing): tests/unit/core/test_token_manager_lifecycle.py::test_setup_copilot_runtime_env_includes_copilot_github_token -- proves: When a copilot token is available, COPILOT_GITHUB_TOKEN is populated in the runtime env dict passed to the skill process [secure-by-default, devx]
    assert 'COPILOT_GITHUB_TOKEN' in env and env['COPILOT_GITHUB_TOKEN'] == 'tok'

  • [recommended] No test exercises the FileExistsError-swallow path in ensure_config_exists at tests/unit/test_config.py
    tests/unit/test_config.py has test_ensure_config_exists_uses_utf8 (happy-path creation) but no test calls ensure_config_exists() when CONFIG_FILE already exists AND was created between the os.path.exists check and the os.open call. grep for FileExistsError and O_EXCL in tests/ returned zero hits in test_config.py. A future refactor removing the try/except would silently regress parallel xdist init invocations.
    Proof (missing): tests/unit/test_config.py::test_ensure_config_exists_swallows_file_exists_error -- proves: ensure_config_exists() is idempotent when another process creates CONFIG_FILE between the exists check and the open call [devx]
    with patch('os.open', side_effect=FileExistsError): config_mod.ensure_config_exists() # must not raise

  • [nit] Timeout bumps in integration tests have no regression-trap implications. Purely operational; assertions unchanged.

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1477 · ● 2.2M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 25, 2026
@sergio-sisternes-epam sergio-sisternes-epam added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 26, 2026
@sergio-sisternes-epam sergio-sisternes-epam added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 26, 2026
@sergio-sisternes-epam sergio-sisternes-epam added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 26, 2026
- Golden scenario: treat copilot CLI auth failures as warnings instead
  of hard failures. The merge queue token may lack Copilot API scopes,
  which is an environment limitation, not an APM bug.
- Plugin E2E: increase subprocess timeouts from 180s to 300s to match
  skill test timeouts and avoid flaky TimeoutExpired in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel enabled auto-merge May 26, 2026 08:14
@danielmeppiel danielmeppiel added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit 2af2f6a May 26, 2026
12 checks passed
@danielmeppiel danielmeppiel deleted the sergio-sisternes-epam/turbo-waddle branch May 26, 2026 08:26
@danielmeppiel danielmeppiel mentioned this pull request May 26, 2026
danielmeppiel added a commit that referenced this pull request May 26, 2026
* chore: cut 0.15.0

Move Unreleased -> [0.15.0] - 2026-05-27 and bump pyproject + uv.lock.

Audit applied: every PR merged since v0.14.2 has exactly one
changelog entry; each entry leads with the user-visible impact.

Fixes during audit:
- Add missing entries for #1367, #1403, #1465, #1487, #1492, #1462,
  #1477, #1439, #1484, and the 131679f follow-up commit.
- Collapse the two #1473 lines into one.
- Merge the #1476 Security/GitCache-hardening entry into its Added
  entry (same PR, one logical change).
- Replace bogus #1243 PR ref with the actual merge PR #1308 for the
  persisted transport-flag config.
- Relocate the #1324-delivered marketplace CLI entries (apm pack
  --marketplace / --marketplace-path / --json, outputs map form)
  out of Unreleased and into [0.14.2], where they actually shipped.
  They were mis-attributed to #1317 and orphaned across the 0.14.2
  cut.

Verified locally: ruff check + ruff format --check both clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
danielmeppiel added a commit that referenced this pull request May 27, 2026
Closes the three items deferred from the v0.14->v0.15 docs-sync
retrospective and the full-corpus regrounding waves (commits
4f00c2b, 242bb9e, b80da69):

1. apm unpack source-side deprecation timeline
   - src/apm_cli/commands/pack.py: 'will be removed in v0.14'
     -> 'will be removed in a future release'. Current version
     is 0.15.0; the v0.14 target had already passed. Docs were
     softened in wave 3; this mirrors the choice in source.
   - CHANGELOG.md: [Unreleased] Fixed entry.

2. Bucket-C silent-drift backfills (20 PRs, parallel triage)
   - 3 grounding-verifier subagents reviewed 20 of the 21
     bucket-C PRs (#1477 excluded as test-flake fix, no doc
     surface). Verdicts: 17 ALREADY_COVERED or NO_DOC_SURFACE
     (verified honestly against wave 2-3 backfills, not
     manufactured), 3 BACKFILLED:
     - #1385 SSH dep user-from-URL: added supported-form row in
       docs/src/content/docs/consumer/manage-dependencies.md
       and bullet in apm-usage/dependencies.md.
     - #1434 Copilot App schema range [13,15] + warn-not-fail:
       rewrote the 'Schema compatibility' paragraph in
       docs/src/content/docs/integrations/copilot-app.md
       (was factually wrong, claimed [13,13] hard-fail).
     - #1440 Copilot file-based detection signals: added the
       four .github/{instructions,agents,prompts,hooks}/
       directories to the canonical-signals list in
       troubleshooting/compile-zero-output-warning.md and to
       the apm-usage commands.md + package-authoring.md
       auto-detect rules.

3. docs-corpus-audit skill extracted
   - .apm/skills/docs-corpus-audit/SKILL.md: first-class skill
     module emitted from the genesis design artifact used to
     drive waves 2 and 3. Pattern: PANEL + WAVE EXECUTION + S7
     verification. Wave-batched (scales as O(waves), not
     O(claims)), disjoint page ownership (no merge conflicts),
     orchestrator post-pass for cross-corpus drift patterns
     invisible to per-scope agents.
   - references/design-handoff.md: full design artifact preserved
     for future maintainers.
   - Sibling to docs-sync (per-PR), not a replacement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel added a commit that referenced this pull request May 27, 2026
…5 drift sweep) (#1511)

* docs: backfill apm-usage and consolidate registry guides (v0.14->v0.15 drift sweep)

Holistic docs-sync retrospective on the v0.14.0->v0.15.0 release window
flagged 23 of 39 user-impact PRs as docs-debt: 7 Rule 4 violations
(apm-usage/ skipped) plus 16 silent-drift PRs. This PR closes the
highest-priority gaps (P0/P1 from the retrospective) in one sweep.

Backfills (apm-usage/ training corpus):
- dependencies.md: registry-sourced APM dep object form (#1471)
- authentication.md: APM_REGISTRY_TOKEN_{NAME} precedence (#1471)
- governance.md: registry_source + allow_non_registry policy (#1471)
- package-authoring.md: apm publish workflow (#1471) and project-scope
  hook command path semantics (#1396)
- commands.md: apm publish entry (#1471), apm config transport keys
  (#1308), apm compile live-reload + --clean --watch warning (#1403),
  Claude Code instruction dedup (#1146), MCP env-var placeholder
  resolution (#1277), AppLocker/WDAC staged-install diagnostic (#1390)

Structural fix (per docs-impact-architect verdict):
- Merge guides/private-registries.md INTO guides/registries.md with
  progressive disclosure (public -> private -> per-dep routing ->
  enterprise link). Adds Starlight redirect for the old slug, patches
  5 cross-references across consumer/, reference/cli/.

Editorial fixes (per editorial-owner sweep):
- integrations/copilot-app.md (#1431): lead with user value before
  WS-IPC/SQLite mechanics; add 'restart the Copilot App once'
  troubleshooting hint
- producer/compile.md: dedup the Claude Code instruction dedup
  explanation (was stated twice)
- enterprise/security.md: reframe defensive memo voice ('do not call
  this X') to user voice ('here is what we provide / here is what we
  don't')

Method: docs-sync skill end-to-end. 5-panelist fan-out plus CDO
synthesis. Every CLI claim in the apm-usage adds was verified against
the live 'apm <verb> --help' surface (S7 tool bridge).

Out of scope (tracked as P1 follow-up): backfilling docs for the 16
silent-drift PRs grouped by subsystem (MCP, install, compile, auth).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: full-corpus regrounding audit (55 pages, 14 surgical fixes)

Wave-batched grounding audit across 55 high-risk pages (CLI ref x27,
schemas/specs x10, consumer ramp x12, onboarding x6). Each page's
factual claims (flags, env vars, exit codes, schema fields, file
paths, code links) was extracted and verified against current
src/apm_cli/ and 'apm <verb> --help' output via S7 tool-bridge.

Fixes applied (14 files):

CLI reference:
- pack.md: add --check-versions, --check-clean flags + exit codes 3, 4
- targets.md: expand copilot detection signals (5, not 1)
- experimental.md: add copilot-app, marketplace-authoring, registries
- install.md: dedup duplicate '## Exit codes' + '## Notes' sections

Schemas / specs:
- lockfile-spec.md: expand package_type enum to full 6-value list
- manifest-schema.md: document plural 'targets:' alias (#1335)
- environment-variables.md: add APM_BROAD_FETCH_DEPTH, APM_COPILOT_APP_DB
- package-types.md: add 5th layout (hook_package, hooks/*.json only)

Consumer ramp:
- install-mcp-servers.md: fix stale code citation + 'Or' -> 'And'
- private-and-org-packages.md: drop nonexistent BITBUCKET_APM_PAT

Onboarding (6 broken navigation links, 4 files):
- quickstart.mdx, getting-started/installation.md,
  getting-started/first-package.md, getting-started/migration.md:
  repoint self-loops and dead routes to actual page paths

Process: dispatched as 6 parallel grounding-verifier agents (general-
purpose) across disjoint page scopes; each agent had edit authority
on its scope and applied surgical fixes inline. Reusable pattern via
the docs-corpus-audit sibling skill design (PANEL + WAVE EXECUTION
+ S7 verifier fan-out, see files/docs-corpus-audit-design.md).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: wave 3 corpus audit + IA-reshuffle dead-link cleanup (53 pages)

Second sweep of the regrounding audit. Covers the 57 pages deferred in
wave 2: producer/ (15), enterprise/ (15), concepts/ (6), integrations/
(7), troubleshooting/ (7), contributing/ (3), reference tail (3), 404.

Process: 6 parallel grounding-verifier agents on disjoint scopes; each
agent extracts factual claims, S7-verifies against current source
('apm <verb> --help' + grep src/apm_cli/), and applies surgical edits
inline. Same pattern as wave 2 (PANEL + WAVE EXECUTION + S7 verifier
fan-out). Orchestrator post-pass swept three cross-corpus broken-link
patterns the per-scope agents could not fix alone.

High-signal factual fixes:

enterprise/governance-guide.md:
- --output-file -> --output (real flag is --output / -o)
- 7+17 check count -> 8+17 (8 baseline checks, not 7)

enterprise/apm-policy.md:
- '16 of 22 checks' -> '17 of 25 checks' (phantom counts)
- conflated --no-policy (install-only) with APM_POLICY_DISABLE (env)

enterprise/apm-policy-getting-started.md:
- dropped 'apm compile' from list of commands that run policy
  (compile enforces zero policy per governance-overview.md L57)

enterprise/policy-reference.md:
- compilation.target.allow: added copilot, gemini, vscode, windsurf,
  agent-skills (only 5 of 9 runtimes were listed)

enterprise/registry-proxy.md:
- 'apm marketplace add --branch main' -> '--ref main' (no --branch flag)

enterprise/security-and-supply-chain.md:
- 3 stale source line-number citations corrected

producer/author-primitives/index.md:
- legacy '.hook.md' extension -> '.json' (hook_integrator scans JSON)
- removed nonexistent '.apm/commands/' subdirectory from layout example

concepts/lifecycle.md:
- 4 reference-page links all pointed at install/ (copy-paste)

Cross-corpus IA-reshuffle dead-link cleanup (orchestrator pass):
- introduction/* -> concepts/* (4 links across 2 files)
- guides/ci-policy-setup/ -> enterprise/enforce-in-ci/ (8 links, 4 files)
- guides/pack-distribute/ -> producer/pack-a-bundle/ (5 links, 4 files)
- guides/dependencies/ -> consumer/manage-dependencies/ (1 link)
- guides/agent-workflows/ -> contextual canonical (3 links, 3 files)
- guides/install-and-use/mcp-servers/ -> consumer/install-mcp-servers/ (3)
- guides/compilation/ -> producer/compile/ (1)
- guides/prompts/ -> producer/author-primitives/prompts/ (2)
- guides/drift-detection/ -> enterprise/drift-detection/ (1)

enterprise/security.md side-fix:
- 'apm unpack scheduled for removal in v0.14' -> drop version target
  (APM is 0.15.0 and unpack still ships marked DEPRECATED in --help).
  Upstream remediation (refresh deprecation timeline in source or
  remove the shim) tracked outside this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: close deferred items from corpus regrounding audit

Closes the three items deferred from the v0.14->v0.15 docs-sync
retrospective and the full-corpus regrounding waves (commits
4f00c2b, 242bb9e, b80da69):

1. apm unpack source-side deprecation timeline
   - src/apm_cli/commands/pack.py: 'will be removed in v0.14'
     -> 'will be removed in a future release'. Current version
     is 0.15.0; the v0.14 target had already passed. Docs were
     softened in wave 3; this mirrors the choice in source.
   - CHANGELOG.md: [Unreleased] Fixed entry.

2. Bucket-C silent-drift backfills (20 PRs, parallel triage)
   - 3 grounding-verifier subagents reviewed 20 of the 21
     bucket-C PRs (#1477 excluded as test-flake fix, no doc
     surface). Verdicts: 17 ALREADY_COVERED or NO_DOC_SURFACE
     (verified honestly against wave 2-3 backfills, not
     manufactured), 3 BACKFILLED:
     - #1385 SSH dep user-from-URL: added supported-form row in
       docs/src/content/docs/consumer/manage-dependencies.md
       and bullet in apm-usage/dependencies.md.
     - #1434 Copilot App schema range [13,15] + warn-not-fail:
       rewrote the 'Schema compatibility' paragraph in
       docs/src/content/docs/integrations/copilot-app.md
       (was factually wrong, claimed [13,13] hard-fail).
     - #1440 Copilot file-based detection signals: added the
       four .github/{instructions,agents,prompts,hooks}/
       directories to the canonical-signals list in
       troubleshooting/compile-zero-output-warning.md and to
       the apm-usage commands.md + package-authoring.md
       auto-detect rules.

3. docs-corpus-audit skill extracted
   - .apm/skills/docs-corpus-audit/SKILL.md: first-class skill
     module emitted from the genesis design artifact used to
     drive waves 2 and 3. Pattern: PANEL + WAVE EXECUTION + S7
     verification. Wave-batched (scales as O(waves), not
     O(claims)), disjoint page ownership (no merge conflicts),
     orchestrator post-pass for cross-corpus drift patterns
     invisible to per-scope agents.
   - references/design-handoff.md: full design artifact preserved
     for future maintainers.
   - Sibling to docs-sync (per-PR), not a replacement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix dead links + address Copilot review findings

Two classes of fix on PR #1511:

1. Deploy Docs CI -- starlight-links-validator failure (2 dead links)
   - getting-started/first-package.md:18 and quickstart.mdx:40 used
     absolute /apm/getting-started/installation/ paths introduced in
     wave 2 (242bb9e). Converted to relative paths matching the
     surrounding link convention.
   - Verified with local 'npm run build' under docs/: 'All internal
     links are valid.'

2. Copilot PR review -- 7 inline factual accuracy comments, all
   verified against source and addressed:
   - apm-usage/package-authoring.md: hook path rewrite is performed
     by 'apm install' (hook integrator pass), not 'apm compile'.
   - apm-usage/dependencies.md + docs/guides/registries.md: registry
     resolver requires semver per apm_cli/deps/registry/semver.py
     (is_semver_range gate). Removed examples implying opaque labels
     (#stable, #v2.0.0, 'latest') route through a registry; updated
     selector tables to flag non-semver refs as rejected for registry
     sources.
   - apm-usage/dependencies.md + docs/guides/registries.md:
     lockfile_version: '2' promotion triggers on registry deps OR
     git-source semver resolution fields (constraint / resolved_tag /
     resolved_at per lockfile.py:_needs_v2, issue #1488), not just
     registry deps.
   - apm-usage/authentication.md: 'token:' in apm-policy.yml is not
     parse-rejected, only surfaces as an 'Unknown top-level policy
     key' warning per policy/parser.py. Still discouraged (leaks to
     repo), but the rejection mechanism is different from apm.yml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* skill(docs-corpus-audit): refactor under genesis discipline + self-test

Round-trip assessment found the original SKILL.md draft violated
genesis SoC in 7 ways:

1. Invented inline 'grounding-verifier' persona instead of composing
   shared agent personas (python-architect for S7, doc-writer for
   edits). R3 EXTRACT in reverse.
2. Subagent prompt template inlined in SKILL body (~40 lines that
   belong in assets/).
3. IA-reshuffle grep patterns hard-coded in body as bash heredoc --
   the patterns rot per release and belong in scripts/ with --help
   and a versioned update cadence.
4. PHANTOM DEPENDENCY on docs-sync's substrate (.apm/docs-index.yml,
   personas, panelist-return-schema, the apm-usage Rule-4 corpus)
   never declared via tool-call probes -- A9 SUPERVISED EXECUTION
   violation per genesis Step 7b.
5. Missing A8 ALIGNMENT LOOP: wave agents edited inline and nothing
   re-verified the edits grounded.
6. DISPATCH COLLISION risk vs docs-sync: identical 'drift between
   docs and code' triggers; dispatcher LLM could misroute.
7. BUNDLE LEAKAGE: references/design-handoff.md was session-history
   (maintainer-scope), not runtime-loaded. Per genesis 3.5 it must
   NOT ship with the user-facing bundle.

Refactor:
- SKILL.md (218 lines, well under 500-line cap): adds explicit
  Sibling Contract table with docs-sync; declares roster as
  composition of existing personas via relative links;
  PROBE / RISK-TRIAGE / WAVE / POST-PASS / ALIGNMENT-LOOP /
  COMMIT / PR phases; sharpened trigger description naming
  whole-corpus scope.
- assets/subagent-prompt-template.md: extracted the per-scope
  prompt that composes python-architect + doc-writer.
- assets/panelist-return-schema.json: explicit JSON schema for
  agent returns; orchestrator validates and rejects malformed.
- scripts/scan-cross-corpus-drift.sh: deterministic cross-corpus
  drift sweep with 4 pattern groups (ia-links, stale-deprecation,
  absolute-base, ascii-leak). Non-interactive, --help-documented,
  stdout/stderr split per genesis script conventions.
- evals/{trigger,content}-evals.json + README.md: ship gate
  exercising 10+10 trigger queries (docs-sync boundary is the
  load-bearing distinction) and 3 seeded-drift scenarios with
  control baselines.
- Deleted references/design-handoff.md (bundle leak; design
  artifact stays in session state only).

Self-test (proves the refactor works end-to-end):
- Ran scan-cross-corpus-drift.sh against the live corpus; it
  immediately surfaced two genuine misses that wave 3 missed:
  - src/apm_cli/commands/pack.py:606: click help= string still
    said 'removed in v0.14' (the logger.warning at line 633 was
    fixed last commit; this is a sibling string the wave 3 agent
    didn't see because each agent only owned ~9 pages).
  - docs/src/content/docs/reference/cli/unpack.md:9: caution
    banner still said 'scheduled for removal in v0.14'.
- Both softened to 'in a future release' (consistent with the
  rest of the wave 3 choice).
- Lint clean; docs build clean ('All internal links are valid').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* skill(docs-grounding-verifier): claim-level grounding harness + 7 drift fixes

New sibling skill to docs-corpus-audit. Genesis-designed PIPELINE-of-PANELS
(RAGAS-faithfulness adapted from RAG to docs/code):
- Stage 1: per-page LLM claim extraction
- Stage 2: deterministic grep-based evidence retrieval (S7, no LLM)
- Stage 3: adversarial LLM grounding judge (A7, 4-verdict calibrated)

Empirical proof bundle (.apm/skills/docs-grounding-verifier/evals/runs/proof/):
- 5 high-stakes pages -> 75 atomic claims extracted
- Tally: 63 GROUNDED / 6 PARTIAL / 4 CONTRADICTED / 2 UNSUPPORTED (84%)
- Trigger eval: 20/20 dispatch classification correct
  (precision=1.0, recall=1.0, specificity=1.0, pass_gate=true)

High-confidence drift fixes applied:
- apm-policy.md: MCP transport defaults (was 'block sse/streamable-http
  by default' -> actually allow=None means all permitted; sample policy
  now correctly framed as restriction example)
- apm-policy.md: inheritance levels (was '5 levels including team policy'
  -> canonical chain is 3 semantic levels; 5 is MAX_CHAIN_DEPTH for
  intermediate extends: jumps)
- Plus 5 editorial fixes from prior pass (examples, registries x2,
  security, copilot-app)

Lower-confidence findings (judge retrieval gaps, vague reasoning) left
for follow-up rather than risk introducing new drift via speculative
edits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants