ci(smoke): replace runtime-binary smoke with README-promise smoke#1251
ci(smoke): replace runtime-binary smoke with README-promise smoke#1251danielmeppiel merged 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR retargets the Tier-2 (merge-queue) smoke gate to exercise APM's core, README-aligned workflows using the built PyInstaller binary, replacing the previous runtime/execution-layer smoke target that could skip silently.
Changes:
- Add a new hermetic integration smoke suite (
test_core_smoke.py) that runsapm --version,init,install,compile,audit, andpolicy --helpagainst the resolved binary. - Update the Tier-2 smoke job in
ci-integration.ymlto download the build artifact, setAPM_BINARY_PATH+APM_E2E_TESTS=1, and run only the new core smoke suite. - Add a changelog entry describing the smoke target change.
Show a summary per file
| File | Description |
|---|---|
tests/integration/test_core_smoke.py |
Adds a fast, hermetic smoke suite intended to validate core CLI pipelines against the built binary. |
.github/workflows/ci-integration.yml |
Adjusts the Tier-2 smoke job to run the new core smoke tests using the downloaded binary artifact. |
CHANGELOG.md |
Records the Tier-2 smoke target change under [Unreleased]. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
| - Integration tests now use marker-driven discovery: 21 `pytestmark = pytest.mark.skipif(...)` chains across `tests/integration/` are replaced with declarative `requires_*` markers, with precondition logic centralized in `tests/integration/conftest.py` and auto-skipping at collection time. PR1 of #1166. (#1167) | ||
| - Integration test apm-binary resolution now prefers the local build (`./dist/apm-<os>-<arch>/apm`) over a system-wide `apm` on `PATH`, so contributors validating the binary under test are not silently shadowed by a global install; the bearer-token marker (`requires_ado_bearer`) discards the captured JWT immediately and persists only the boolean outcome. (#1167) | ||
| - `scripts/test-integration.sh` is now a thin orchestrator: it builds/locates the apm binary, sets up runtimes and tokens, then invokes `pytest tests/integration/` exactly once. The 28 per-file pytest enumerations were removed; the marker registry handles per-test gating, and new test files dropped into `tests/integration/` are picked up automatically. PR2 of #1166. (#1247) | ||
| - Tier-2 smoke job now runs `tests/integration/test_core_smoke.py` against the built apm binary, exercising `init` / `install` / `compile` / `audit` / `policy` to fail fast on the README's three promises (portable / secure / governed). Replaces the previous `test_runtime_smoke.py` smoke target, which exercised the experimental `apm run` execution layer; that file remains in the heavy integration job under `requires_runtime_*` markers. |
There was a problem hiding this comment.
Fixed in abc07ec — tightened to one line and added (#1251) trailer.
| assert produced, ( | ||
| "apm compile did not produce any of the expected output files: " | ||
| f"{[str(p) for p in candidates]}" | ||
| ) |
There was a problem hiding this comment.
Declining this change — empirically incorrect on a fresh apm init fixture. Verified just now:
$ apm init probe -y --target copilot && cd probe
$ # add one .apm/instructions/x.instructions.md
$ apm install && apm compile -t copilot
Pattern Source Coverage Placement Metrics
** x.instructions.md 1/1 ./AGENTS.md rel: 100%
Generated 1 AGENTS.md file
$ ls .github/copilot-instructions.md
ls: .github/copilot-instructions.md: No such file or directory
Target detection is signal-file-driven, not flag-driven (src/apm_cli/core/target_detection.py:166-167): apm compile -t copilot lists copilot as eligible, but the actual surface chosen depends on whether .github/copilot-instructions.md exists as a signal. On a fresh fixture (no signal file), compile correctly emits only AGENTS.md per the canonical fallback.
If I tightened the assertion to require both, the test would fail on its own fixture. The current OR check is the correct robust contract for a smoke test that does not preconfigure target signals.
The tier-2 smoke job in ci-integration.yml previously ran test_runtime_smoke.py, which exercises codex/llm runtime install scripts and preview_workflow -- the execution layer that backs the experimental 'apm run' command (literally marked '(experimental)' in its CLI help text). That surface is not in README.md's three promises and is not what users hit when they install APM as a package manager. Worse, the smoke job had silently no-op'd for some time: it never downloaded the built binary, never ran 'apm runtime setup', and never set APM_E2E_TESTS=1, so every test in test_runtime_smoke.py skipped at collection. Green ribbon, zero signal. This replaces the smoke target with test_core_smoke.py: a hermetic, network-free pre-flight against the built binary that exercises: - TestBinaryStartup -- 'apm --version' (PyInstaller bundle sane) - TestPortableByManifest -- init / install / compile (promise 1) - TestSecureByDefault -- audit pipeline reachable (promise 2) - TestGovernedByPolicy -- policy command surface reachable (promise 3) Six tests, ~10 seconds wall-clock. Maps 1:1 to README.md's stated core capabilities. The job now downloads the build artifact and sets APM_BINARY_PATH so the resolved binary is the one under test in the merge-queue tentative merge SHA, not a system install. test_runtime_smoke.py stays in the heavy integration job under the existing requires_runtime_* and requires_e2e_mode markers; the runtime-install coverage it provides is legitimate, just misplaced as a fail-fast smoke gate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9e2fb79 to
76af775
Compare
Replace ``apm policy --help`` with ``apm policy status``. ``--help`` only verified Click wiring; ``status`` runs the real discovery pipeline (git-remote probe, org resolution, cache lookup, rule evaluator, status renderer) and asserts the diagnostic table fields, giving the smoke job a real signal on the governance stack rather than help-text theater. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the one-concise-line + "(#PR)" format used by adjacent [Unreleased] entries, per Copilot review feedback on PR #1251. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TL;DR
Replace the tier-2 smoke job's test target. It used to run
test_runtime_smoke.py, which exercises codex/llm runtime install scripts and thepreview_workflowexecution layer behindapm run— a command literally marked(experimental)in its own CLI help text and absent from the README. Worse, the job had silently no-op'd: it never downloaded the binary, never ranapm runtime setup, never setAPM_E2E_TESTS=1, so every test in the file skipped at collection. New targettest_core_smoke.pyruns against the built binary in ~10s and exercises the README's three promises (portable / secure / governed) end to end.Problem (WHY)
The README defines APM's value prop as a package manager + compiler + audit/policy tool. The smoke job before this PR was testing none of that:
test_runtime_smoke.pycoverssetup-codex.sh/setup-llm.sh/RuntimeFactory.runtime_exists/preview_workflow. That is the execution layer that backsapm run, which the CLI itself flags(experimental)and the README never mentions. Even if it ran green, passing it would not raise confidence in any README-stated capability.download-artifactstep, noapm runtime setupstep, and noAPM_E2E_TESTS=1env var. After PR refactor(tests): retire script enumeration; pytest discovers tests/integration/ #1247 addedpytestmark = pytest.mark.requires_e2e_modeto the file, every test now skips at collection. Before that, internalpytest.skipifblocks self-skipped on missing runtimes. Either way: green ribbon, zero signal.microsoft/apm#770introduced thebuild → smoke → integration → release-validationchain so the merge queue fails fast before the 30-minute heavy suite runs. The chain still earns its place; we just need to point smoke at something that maps to the README.Approach (WHAT)
test_core_smoke.py(new)test_runtime_smoke.py?requires_runtime_*+requires_e2e_modemarkers.uv run apmor built binary?Implementation (HOW)
tests/integration/test_core_smoke.py(NEW)apm init -y --target copilot+ one local.apm/instructions/style.instructions.md. 60s subprocess timeout per command. Gated byrequires_e2e_mode+requires_apm_binary..github/workflows/ci-integration.ymldownload-artifacts the build,chmod +x's the binary, setsAPM_BINARY_PATH=${{ github.workspace }}/dist/apm-linux-x86_64/apm+APM_E2E_TESTS=1, runs onlytests/integration/test_core_smoke.py. The previousGITHUB_TOKEN/GITHUB_APM_PATenv vars are dropped — smoke is hermetic now.CHANGELOG.md[Unreleased] / Changed.Test taxonomy
Diagrams
Before/after smoke-tier flow — the chain shape is unchanged; only the smoke step's content changed:
flowchart LR subgraph BEFORE [Before this PR] B1[build] --> B2[smoke-test<br/>test_runtime_smoke.py<br/>SKIP all - no env vars] B2 --> B3[integration-tests<br/>30 min] B3 --> B4[release-validation] end subgraph AFTER [After this PR] A1[build] --> A2[smoke-test<br/>test_core_smoke.py<br/>~10s, 6 tests, hermetic] A2 --> A3[integration-tests<br/>30 min<br/>incl. test_runtime_smoke.py] A3 --> A4[release-validation] endTest-to-promise mapping — each smoke test exercises exactly one README capability dimension:
flowchart LR R1[README promise 1<br/>Portable by manifest] --> T1[apm init] R1 --> T2[apm install] R1 --> T3[apm compile -t copilot] R2[README promise 2<br/>Secure by default] --> T4[apm audit] R3[README promise 3<br/>Governed by policy] --> T5[apm policy --help] R0[Bundle integrity] --> T0[apm --version]Trade-offs
requires_runtime_*markers when runtimes are present, which is the correct gate.download-artifactstep (~5s) but means the smoke job tests the same bundle the integration job will, instead ofuv run apmfrom source — important because PyInstaller bundling is itself a source of regressions.test_core_smoke.pydoes not assert exact output paths forapm compile(accepts either.github/copilot-instructions.mdorAGENTS.md). Target routing has legitimate branches; smoke contract is "compile ran end-to-end and wrote something marked by APM", not "verify routing taxonomy" — that's the heavy suite's job.apm policytest only exercises--help. Real policy enforcement needs fixtures withapm-policy.yml+ denied sources, which belong in the heavy suite. Smoke catches the regression mode "policy module fails to import inside the bundle", which is enough for fail-fast.Benefits
uv sync.GH_MODELS_PAT, noGH_CLI_PAT, noADO_APM_PAT. Reduces the attack surface by one job.test_runtime_smoke.pykeeps its coverage. Moves to the heavy job under existing markers; nothing is deleted.Validation
CI Lint contract per
.apm/instructions/linting.instructions.md: both ruff commands silent. Tests run against a wrapper that delegates touv run apmbecause the locally-installed Homebrewapmis older than the source under test (lacksinit --target); CI will use the freshly-built binary downloaded from thebuildjob, which has full feature parity with source.How to test
uv run ./scripts/build-binary.sh.BIN="$(pwd)/dist/apm-$(uname -s | tr 'A-Z' 'a-z')-$(uname -m | sed 's/x86_64/x86_64/;s/arm64/arm64/')/apm".APM_E2E_TESTS=1 APM_BINARY_PATH="$BIN" uv run pytest tests/integration/test_core_smoke.py -v.To verify the workflow change end-to-end: enqueue this PR into the merge queue and inspect the
Smoke Test (Linux)job — it should download the artifact and report 6 passed tests instead of the previous all-skipped output.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com