Skip to content

docs(tests): codify integration-test marker procedure as APM instructions primitive#1249

Merged
danielmeppiel merged 6 commits intomainfrom
feat/integration-tests-instructions-primitive
May 10, 2026
Merged

docs(tests): codify integration-test marker procedure as APM instructions primitive#1249
danielmeppiel merged 6 commits intomainfrom
feat/integration-tests-instructions-primitive

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

TL;DR

Follow-up to #1247: codifies the integration-test marker procedure as a versioned APM primitive that applies to humans (.github/instructions/) AND to the test-coverage-expert persona, then locks the rule in place with a hermetic regression-trap test that asserts pyproject.toml, tests/integration/conftest.py::_MARKER_CHECKS, the docs registry table, and the instructions rule all stay in sync. Stacked on #1247.

Note

Base branch: refactor/test-integration-script-retire-pr2 (PR #1247). Rebase to main once #1247 merges.

Problem (WHY)

PR #1247 retired the manual per-file enumeration in scripts/test-integration.sh and moved gating to declarative pytestmark = pytest.mark.requires_* markers. The mechanics are real and live, but the procedure for using them ("where do I drop a new integration test? which marker do I pick? what must I NOT do?") survived only in:

That makes the rule invisible to future contributors and to fresh agent sessions. Three concrete failure modes follow:

  1. A new integration test gets added under tests/integration/ with no marker → it runs unconditionally and breaks CI on machines that lack the runtime/token it needs.
  2. A contributor copies an older pattern and writes if not os.getenv("APM_E2E_TESTS"): pytest.skip(...) inside the test body — the test-coverage-expert reviewer might not flag it because the rule is not declared anywhere it loads.
  3. Someone adds a new requires_X marker to pyproject.toml, registers the predicate in conftest.py, but forgets to update the docs registry table → the marker becomes invisible to anyone who reads the docs as the source of truth.

The encoding rule that this repo enforces for source files is the right precedent: write the contract down once, in .apm/instructions/, and let it apply by applyTo: glob.

Approach (WHAT)

What Where
Procedure as a primitive New H2 in .apm/instructions/tests.instructions.md: "Integration tests: placement and markers" — drop file in tests/integration/, declare pytestmark, pick from registry; no script edit.
Marker quick-map 7-row table embedded in the rule: each row maps a need ("real network", "downloads runtimes", "runtime binary present", etc.) to the canonical marker name.
Persona cross-ref .apm/agents/test-coverage-expert.agent.md step 5 ("Probe the test tree") now links to the instructions file and explicitly instructs the persona to flag ungated os.getenv("APM_E2E_TESTS") self-skips.
Regression trap tests/integration/test_marker_registry_sync.py: 7 hermetic invariants over pyproject.toml, conftest.py, docs, and the rule.
CHANGELOG One line under ## [Unreleased]Changed.

Important

The .github/instructions/ and .github/agents/ mirror files are maintained manually, not regenerated by apm compile. Verified: deleting .github/instructions/tests.instructions.md and re-running apm compile --clean does not recreate it. Repository convention (commits 095424fd, 6c99c5b0) updates .apm/ and .github/ copies in lockstep within the same commit.

Implementation (HOW)

File Change
.apm/instructions/tests.instructions.md +65 lines: new H2 with procedure, quick-map table, anti-patterns block.
.github/instructions/tests.instructions.md 1:1 mirror of source (manual sync per convention).
.apm/agents/test-coverage-expert.agent.md Step 5 bullet now links to instructions file and adds the "flag ungated self-skips" instruction.
.github/agents/test-coverage-expert.agent.md 1:1 mirror.
.github/copilot-instructions.md Auto-refreshed by apm compile (Build ID + version bump).
tests/integration/test_marker_registry_sync.py New file: 7 hermetic invariant tests (no marker required — always runs).
CHANGELOG.md One line under [Unreleased] → Changed.

What the regression-trap test asserts

flowchart LR
    A[pyproject.toml<br/>markers list] -- "every requires_*" --> B[conftest.py<br/>_MARKER_CHECKS]
    A -- "every requires_* / live" --> C[docs registry table<br/>integration-testing.md]
    C -- "no phantom markers" --> A
    D[.apm/instructions/<br/>tests.instructions.md] -- "every cited marker" --> A
    E[tests/integration/<br/>test_*.py files] -- "no runtime os.getenv<br/>self-skips on gate env vars" --> F[lint guard]
    B -- "no orphan predicates" --> A
Loading

The 7 invariants (test names mirror the assertions verbatim):

  1. test_every_pyproject_gating_marker_has_conftest_predicate — a declared requires_X without a predicate is dead config.
  2. test_every_conftest_predicate_is_declared_in_pyproject — a predicate without a declaration emits PytestUnknownMarkWarning.
  3. test_every_gating_marker_is_documented_in_registryrequires_* and live MUST appear in the docs table.
  4. test_docs_registry_only_names_declared_markers — no phantom markers in the docs table.
  5. test_apm_rule_only_names_declared_markers — marker names cited in the rule body must really exist.
  6. test_integration_tests_use_pytestmark_not_runtime_self_skip — static lint over tests/integration/test_*.py forbidding os.getenv("APM_E2E_TESTS") + pytest.skip patterns.
  7. test_tomllib_available — sanity guard on the Python version assumption (stdlib tomllib requires 3.11+).

Trade-offs

  • Manual .github/ mirror sync, no automation. A pre-commit hook could enforce parity, but adding one is out of scope; the regression-trap test catches drift in marker names, and reviewers can spot .apm/.github/ divergence in the PR diff.
  • Static lint heuristic, not AST. Test Add support for GitHub EU #6 uses regex windows rather than AST analysis. A test that names a gate env var in a string literal AND uses pytest.skip somewhere within 400 chars could in principle false-positive. Current count of such patterns under tests/integration/: zero. The test itself is exempt via path comparison.
  • live and benchmark carved out of the docs-sync direction. integration, slow, and benchmark are taxonomy markers (not gates) and are intentionally NOT required to appear in the docs registry; the invariant only goes one direction for them (docs must not advertise phantom markers).

Validation

$ uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/
All checks passed!
734 files already formatted

$ uv run pytest tests/integration/test_marker_registry_sync.py tests/integration/test_drift_check.py -v
tests/integration/test_marker_registry_sync.py::test_every_pyproject_gating_marker_has_conftest_predicate PASSED
tests/integration/test_marker_registry_sync.py::test_every_conftest_predicate_is_declared_in_pyproject PASSED
tests/integration/test_marker_registry_sync.py::test_every_gating_marker_is_documented_in_registry PASSED
tests/integration/test_marker_registry_sync.py::test_docs_registry_only_names_declared_markers PASSED
tests/integration/test_marker_registry_sync.py::test_apm_rule_only_names_declared_markers PASSED
tests/integration/test_marker_registry_sync.py::test_integration_tests_use_pytestmark_not_runtime_self_skip PASSED
tests/integration/test_marker_registry_sync.py::test_tomllib_available PASSED
... (33 test_drift_check.py tests) ...
============================== 40 passed in 3.01s ==============================

How to test

  1. Check out this branch off PR refactor(tests): retire script enumeration; pytest discovers tests/integration/ #1247.
  2. Run the lint pair: uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ (both silent).
  3. Run the new test file: uv run pytest tests/integration/test_marker_registry_sync.py -v (all 7 pass).
  4. Verify the persona wiring: open .github/agents/test-coverage-expert.agent.md, step 5 ("Probe the test tree") links to ../instructions/tests.instructions.md and names the anti-pattern explicitly.
  5. Smoke-test the regression trap: in a scratch branch, rename one requires_* marker in pyproject.toml and re-run test Why do we need a GitHub token? #1 — it MUST fail with a clear message naming the missing marker.

Daniel Meppiel and others added 4 commits May 10, 2026 18:31
…tegration/

PR2 of #1166. Per acceptance criteria:

- scripts/test-integration.sh no longer enumerates individual pytest
  files; it invokes pytest tests/integration/ once and lets the
  marker registry (PR1, #1167) handle per-test gating. Script shrinks
  778 -> 402 lines.
- New test files dropped into tests/integration/ are picked up
  automatically; the only contract is to add the right requires_*
  marker (or hermetic = no marker).
- Adds requires_apm_binary / requires_github_token / requires_runtime_copilot
  pytestmarks to 11 files that needed them now that pytest discovers
  the full directory (test-coverage-expert audit).
- Adds 'live' to the addopts deselect list so test_skill_bundle_live.py
  preserves its current 'opt-in only' behaviour.
- Adds APM_RUN_INTEGRATION_TESTS=1 to ci-integration.yml so
  network-integration tests (transport selection) actually run in CI.
- CI integration-tests job timeout 20 -> 30 min to absorb the
  newly-discovered tests.
- Docs (integration-testing.md) reframe the script as the CI
  orchestrator, not legacy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- test_registry_client_integration.py: add requires_network_integration
  marker (was relying on runtime self-skip; live HTTP to api.mcp.github.com
  would fire on any dev pytest invocation). Sec panel finding.
- test_runtime_smoke.py: add requires_e2e_mode marker (was unmarked;
  downloads real codex/llm binaries). Test-coverage panel finding.
- docs/integration-testing.md: drop pre-fix/post-fix phrasing per
  docs-current-behaviour-only convention; document the live marker
  in the registry table. Doc-writer + devx-ux findings.
- scripts/test-integration.sh: export APM_RUN_INTEGRATION_TESTS for
  symmetry with APM_E2E_TESTS; update banner to reflect thin-orchestrator
  identity; drop vestigial 'Integration Testing Coverage!' comment.
  Py-arch + cli-log nits.
- ci-integration.yml: add inline comment on the 20->30 timeout bump.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ions primitive

Issue #1166 / follow-up to PR #1247. PR #1247 retired the manual
per-file enumeration in scripts/test-integration.sh and shifted gating
to declarative pytestmarkers, but the procedure ("where to drop a new
integration test, which marker to use, what NOT to do") lived only in
PR comments and the test-coverage-expert agent's tribal knowledge.

This PR extracts that procedure into a versioned APM primitive that
applies to both human contributors (via .github/instructions/) and the
test-coverage-expert persona (via a cross-reference at step 5 of its
review procedure), AND adds a hermetic regression-trap test that
guarantees the rule's pointers do not silently drift.

Changes:

- .apm/instructions/tests.instructions.md: new H2 "Integration tests:
  placement and markers" with procedure, the canonical marker quick-map
  (7 rows), and anti-patterns. Mirrored to .github/.
- .apm/agents/test-coverage-expert.agent.md: step 5 now links to the
  instructions file and explicitly instructs the persona to flag
  ungated live-network/runtime-binary calls. Mirrored to .github/.
- tests/integration/test_marker_registry_sync.py: 7 hermetic
  assertions verifying that pyproject.toml markers,
  tests/integration/conftest.py::_MARKER_CHECKS, the docs registry
  table, and the instructions rule all stay in sync; plus a static
  lint forbidding new os.getenv-based runtime self-skips on gate env
  vars in tests/integration/.
- CHANGELOG.md: one line under [Unreleased] Changed.
- .github/copilot-instructions.md: auto-refreshed by apm compile.

Lint (uv run --extra dev ruff check + format --check): silent.
Tests (uv run pytest tests/integration/test_marker_registry_sync.py
tests/integration/test_drift_check.py): 40 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Base automatically changed from refactor/test-integration-script-retire-pr2 to main May 10, 2026 18:32
danielmeppiel and others added 2 commits May 10, 2026 20:33
apm install rewrites cross-primitive links from .apm/ source paths to
the equivalent paths from the .github/ deployment root. The manual
mirror in the previous commit shipped the unrewritten path, which
caused APM Self-Check drift.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 10, 2026 18:36
@danielmeppiel danielmeppiel merged commit cc13377 into main May 10, 2026
12 checks passed
@danielmeppiel danielmeppiel deleted the feat/integration-tests-instructions-primitive branch May 10, 2026 18:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Codifies the integration-test marker placement procedure as an APM instructions primitive (mirrored into .github/), wires the guidance into the test-coverage-expert agent persona, and adds a hermetic “registry sync” integration test to prevent drift between pyproject.toml markers, tests/integration/conftest.py::_MARKER_CHECKS, and the docs marker registry.

Changes:

  • Add tests/integration/test_marker_registry_sync.py to assert marker registry invariants across pyproject.toml, docs, .apm/instructions/, and integration conftest wiring.
  • Extend .apm/instructions/tests.instructions.md and .github/instructions/tests.instructions.md with an explicit “Integration tests: placement and markers” procedure + quick-map + anti-patterns.
  • Update test-coverage-expert persona docs to point at the new instruction contract; add a changelog entry and refresh deployed-file hashes.
Show a summary per file
File Description
tests/integration/test_marker_registry_sync.py New regression-trap tests to keep marker declarations, docs, and conftest predicates in sync.
CHANGELOG.md Adds an Unreleased “Changed” entry describing the new instructions + regression-trap.
apm.lock.yaml Updates hashes for modified deployed .github/ instruction/agent files.
.github/instructions/tests.instructions.md Mirrors the new integration-test marker procedure for GitHub-integration consumers.
.apm/instructions/tests.instructions.md Source primitive documenting the integration-test marker contract.
.github/agents/test-coverage-expert.agent.md Updates persona step to reference the marker placement contract and flag ungated self-skips.
.apm/agents/test-coverage-expert.agent.md Source primitive mirroring the persona update.
.github/copilot-instructions.md Regenerated header/version metadata.

Copilot's findings

Comments suppressed due to low confidence (1)

tests/integration/test_marker_registry_sync.py:173

  • In test_apm_rule_only_names_declared_markers, the token_re/first cited = ... block and the follow-up comment about the regex capturing only the suffix are redundant/inaccurate (the pattern has no capturing group, so findall() returns the full token). This is easy to misread and looks like leftover experimentation; consider removing the first regex + assignment and keeping a single pattern/cited derivation.
    # Match identifiers OR the placeholder shape inside backticks / plain.
    token_re = re.compile(r"\brequires_(?:runtime_<name>|[a-z][a-z0-9_]*)")
    cited = set(token_re.findall(body))
    # Re-prefix because the regex captured only the suffix; rebuild full names.
    # (The regex above intentionally returns the full token thanks to \b
    # anchoring and the group inside; re-derive with findall on full pattern.)
    full_re = re.compile(r"\brequires_(?:runtime_<name>|[a-z][a-z0-9_]+)")
    cited = set(full_re.findall(body))
  • Files reviewed: 8/8 changed files
  • Comments generated: 2

import sys
from pathlib import Path

import tomllib
Comment thread CHANGELOG.md
- Integration tests now use marker-driven discovery: 21 `pytestmark = pytest.mark.skipif(...)` chains across `tests/integration/` are replaced with declarative `requires_*` markers, with precondition logic centralized in `tests/integration/conftest.py` and auto-skipping at collection time. PR1 of #1166. (#1167)
- Integration test apm-binary resolution now prefers the local build (`./dist/apm-<os>-<arch>/apm`) over a system-wide `apm` on `PATH`, so contributors validating the binary under test are not silently shadowed by a global install; the bearer-token marker (`requires_ado_bearer`) discards the captured JWT immediately and persists only the boolean outcome. (#1167)
- `scripts/test-integration.sh` is now a thin orchestrator: it builds/locates the apm binary, sets up runtimes and tokens, then invokes `pytest tests/integration/` exactly once. The 28 per-file pytest enumerations were removed; the marker registry handles per-test gating, and new test files dropped into `tests/integration/` are picked up automatically. PR2 of #1166. (#1247)
- Integration-test marker procedure codified as `.apm/instructions/tests.instructions.md` (wired into `test-coverage-expert` persona) and guarded by a regression-trap test that asserts `pyproject.toml`, `tests/integration/conftest.py::_MARKER_CHECKS`, the docs registry table, and the instructions rule stay in sync. (#1166)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants