Skip to content

ci(smoke): replace runtime-binary smoke with README-promise smoke#1251

Merged
danielmeppiel merged 3 commits intomainfrom
feat/smoke-test-core-promises
May 10, 2026
Merged

ci(smoke): replace runtime-binary smoke with README-promise smoke#1251
danielmeppiel merged 3 commits intomainfrom
feat/smoke-test-core-promises

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

TL;DR

Replace the tier-2 smoke job's test target. It used to run test_runtime_smoke.py, which exercises codex/llm runtime install scripts and the preview_workflow execution layer behind apm run — a command literally marked (experimental) in its own CLI help text and absent from the README. Worse, the job had silently no-op'd: it never downloaded the binary, never ran apm runtime setup, never set APM_E2E_TESTS=1, so every test in the file skipped at collection. New target test_core_smoke.py runs against the built binary in ~10s and exercises the README's three promises (portable / secure / governed) end to end.

Problem (WHY)

"Portable by manifest. ... Secure by default. ... Governed by policy." — README.md

The README defines APM's value prop as a package manager + compiler + audit/policy tool. The smoke job before this PR was testing none of that:

  • Wrong target. test_runtime_smoke.py covers setup-codex.sh / setup-llm.sh / RuntimeFactory.runtime_exists / preview_workflow. That is the execution layer that backs apm run, which the CLI itself flags (experimental) and the README never mentions. Even if it ran green, passing it would not raise confidence in any README-stated capability.
  • Silently green. The job had no download-artifact step, no apm runtime setup step, and no APM_E2E_TESTS=1 env var. After PR refactor(tests): retire script enumeration; pytest discovers tests/integration/ #1247 added pytestmark = pytest.mark.requires_e2e_mode to the file, every test now skips at collection. Before that, internal pytest.skipif blocks self-skipped on missing runtimes. Either way: green ribbon, zero signal.
  • Tier-2 design is sound; the target is wrong. microsoft/apm#770 introduced the build → smoke → integration → release-validation chain so the merge queue fails fast before the 30-minute heavy suite runs. The chain still earns its place; we just need to point smoke at something that maps to the README.

Approach (WHAT)

Decision Choice Why
Replace vs delete the smoke job Replace Tier-2 fail-fast design from #770 is sound; only the target is wrong.
New smoke target test_core_smoke.py (new) Maps 1:1 to README's three promises; hermetic; ~10s.
Keep test_runtime_smoke.py? Yes, in heavy suite The runtime-install coverage is legitimate, just misplaced as a smoke gate. Already gated by requires_runtime_* + requires_e2e_mode markers.
Run against source uv run apm or built binary? Built binary The merge queue ships binaries; smoke must test the bundle PyInstaller produced, not source imports.
Network calls in smoke? None Smoke tier must be hermetic so flakes here are real failures, not infra blips. Fixture project has zero remote deps.

Implementation (HOW)

File Change
tests/integration/test_core_smoke.py (NEW) Six tests, four classes (one per README promise + binary startup). Hermetic fixture: apm init -y --target copilot + one local .apm/instructions/style.instructions.md. 60s subprocess timeout per command. Gated by requires_e2e_mode + requires_apm_binary.
.github/workflows/ci-integration.yml Smoke job now download-artifacts the build, chmod +x's the binary, sets APM_BINARY_PATH=${{ github.workspace }}/dist/apm-linux-x86_64/apm + APM_E2E_TESTS=1, runs only tests/integration/test_core_smoke.py. The previous GITHUB_TOKEN / GITHUB_APM_PAT env vars are dropped — smoke is hermetic now.
CHANGELOG.md One line under [Unreleased] / Changed.

Test taxonomy

TestBinaryStartup         -- apm --version              (PyInstaller bundle sane)
TestPortableByManifest    -- apm init / install / compile  (README promise 1)
TestSecureByDefault       -- apm audit                  (README promise 2)
TestGovernedByPolicy      -- apm policy --help          (README promise 3)

Diagrams

Before/after smoke-tier flow — the chain shape is unchanged; only the smoke step's content changed:

flowchart LR
    subgraph BEFORE [Before this PR]
        B1[build] --> B2[smoke-test<br/>test_runtime_smoke.py<br/>SKIP all - no env vars]
        B2 --> B3[integration-tests<br/>30 min]
        B3 --> B4[release-validation]
    end
    subgraph AFTER [After this PR]
        A1[build] --> A2[smoke-test<br/>test_core_smoke.py<br/>~10s, 6 tests, hermetic]
        A2 --> A3[integration-tests<br/>30 min<br/>incl. test_runtime_smoke.py]
        A3 --> A4[release-validation]
    end
Loading

Test-to-promise mapping — each smoke test exercises exactly one README capability dimension:

flowchart LR
    R1[README promise 1<br/>Portable by manifest] --> T1[apm init]
    R1 --> T2[apm install]
    R1 --> T3[apm compile -t copilot]
    R2[README promise 2<br/>Secure by default] --> T4[apm audit]
    R3[README promise 3<br/>Governed by policy] --> T5[apm policy --help]
    R0[Bundle integrity] --> T0[apm --version]
Loading

Trade-offs

  • Smoke no longer covers runtime install scripts. Conscious choice: those scripts test third-party CLI installers (codex, llm) more than they test APM. The heavy integration job still runs them under requires_runtime_* markers when runtimes are present, which is the correct gate.
  • Smoke now needs the build artifact. Adds one download-artifact step (~5s) but means the smoke job tests the same bundle the integration job will, instead of uv run apm from source — important because PyInstaller bundling is itself a source of regressions.
  • test_core_smoke.py does not assert exact output paths for apm compile (accepts either .github/copilot-instructions.md or AGENTS.md). Target routing has legitimate branches; smoke contract is "compile ran end-to-end and wrote something marked by APM", not "verify routing taxonomy" — that's the heavy suite's job.
  • apm policy test only exercises --help. Real policy enforcement needs fixtures with apm-policy.yml + denied sources, which belong in the heavy suite. Smoke catches the regression mode "policy module fails to import inside the bundle", which is enough for fail-fast.

Benefits

  1. Smoke job stops being a no-op. Six real assertions against the built binary, instead of "all skipped".
  2. Failure correlation is direct. A red smoke job now points at one of the three README promises being broken, not at a misconfigured runtime installer.
  3. Total smoke wall-clock stays fast. ~10 seconds locally; estimated ~30s on CI including artifact download and uv sync.
  4. No new CI secrets needed. Smoke is hermetic — no GH_MODELS_PAT, no GH_CLI_PAT, no ADO_APM_PAT. Reduces the attack surface by one job.
  5. test_runtime_smoke.py keeps its coverage. Moves to the heavy job under existing markers; nothing is deleted.

Validation

$ uv run --extra dev ruff check src/ tests/
All checks passed!

$ uv run --extra dev ruff format --check src/ tests/
734 files already formatted

$ APM_E2E_TESTS=1 APM_BINARY_PATH=/tmp/apm-source-wrapper \
    uv run pytest tests/integration/test_core_smoke.py -v
tests/integration/test_core_smoke.py::TestBinaryStartup::test_apm_version_runs PASSED                       [ 16%]
tests/integration/test_core_smoke.py::TestPortableByManifest::test_init_scaffolds_manifest PASSED            [ 33%]
tests/integration/test_core_smoke.py::TestPortableByManifest::test_install_pipeline_runs PASSED              [ 50%]
tests/integration/test_core_smoke.py::TestPortableByManifest::test_compile_pipeline_produces_output PASSED   [ 66%]
tests/integration/test_core_smoke.py::TestSecureByDefault::test_audit_pipeline_runs PASSED                   [ 83%]
tests/integration/test_core_smoke.py::TestGovernedByPolicy::test_policy_command_surface_reachable PASSED     [100%]
============================== 6 passed in 6.95s ===============================

CI Lint contract per .apm/instructions/linting.instructions.md: both ruff commands silent. Tests run against a wrapper that delegates to uv run apm because the locally-installed Homebrew apm is older than the source under test (lacks init --target); CI will use the freshly-built binary downloaded from the build job, which has full feature parity with source.

How to test

  1. Check out this branch.
  2. Build the binary: uv run ./scripts/build-binary.sh.
  3. Resolve the binary path: BIN="$(pwd)/dist/apm-$(uname -s | tr 'A-Z' 'a-z')-$(uname -m | sed 's/x86_64/x86_64/;s/arm64/arm64/')/apm".
  4. Run the smoke suite: APM_E2E_TESTS=1 APM_BINARY_PATH="$BIN" uv run pytest tests/integration/test_core_smoke.py -v.
  5. Expected: 6 passed in well under 60s.

To verify the workflow change end-to-end: enqueue this PR into the merge queue and inspect the Smoke Test (Linux) job — it should download the artifact and report 6 passed tests instead of the previous all-skipped output.


Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Copilot AI review requested due to automatic review settings May 10, 2026 18:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR retargets the Tier-2 (merge-queue) smoke gate to exercise APM's core, README-aligned workflows using the built PyInstaller binary, replacing the previous runtime/execution-layer smoke target that could skip silently.

Changes:

  • Add a new hermetic integration smoke suite (test_core_smoke.py) that runs apm --version, init, install, compile, audit, and policy --help against the resolved binary.
  • Update the Tier-2 smoke job in ci-integration.yml to download the build artifact, set APM_BINARY_PATH + APM_E2E_TESTS=1, and run only the new core smoke suite.
  • Add a changelog entry describing the smoke target change.
Show a summary per file
File Description
tests/integration/test_core_smoke.py Adds a fast, hermetic smoke suite intended to validate core CLI pipelines against the built binary.
.github/workflows/ci-integration.yml Adjusts the Tier-2 smoke job to run the new core smoke tests using the downloaded binary artifact.
CHANGELOG.md Records the Tier-2 smoke target change under [Unreleased].

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread CHANGELOG.md Outdated
- Integration tests now use marker-driven discovery: 21 `pytestmark = pytest.mark.skipif(...)` chains across `tests/integration/` are replaced with declarative `requires_*` markers, with precondition logic centralized in `tests/integration/conftest.py` and auto-skipping at collection time. PR1 of #1166. (#1167)
- Integration test apm-binary resolution now prefers the local build (`./dist/apm-<os>-<arch>/apm`) over a system-wide `apm` on `PATH`, so contributors validating the binary under test are not silently shadowed by a global install; the bearer-token marker (`requires_ado_bearer`) discards the captured JWT immediately and persists only the boolean outcome. (#1167)
- `scripts/test-integration.sh` is now a thin orchestrator: it builds/locates the apm binary, sets up runtimes and tokens, then invokes `pytest tests/integration/` exactly once. The 28 per-file pytest enumerations were removed; the marker registry handles per-test gating, and new test files dropped into `tests/integration/` are picked up automatically. PR2 of #1166. (#1247)
- Tier-2 smoke job now runs `tests/integration/test_core_smoke.py` against the built apm binary, exercising `init` / `install` / `compile` / `audit` / `policy` to fail fast on the README's three promises (portable / secure / governed). Replaces the previous `test_runtime_smoke.py` smoke target, which exercised the experimental `apm run` execution layer; that file remains in the heavy integration job under `requires_runtime_*` markers.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in abc07ec — tightened to one line and added (#1251) trailer.

Comment on lines +194 to +197
assert produced, (
"apm compile did not produce any of the expected output files: "
f"{[str(p) for p in candidates]}"
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declining this change — empirically incorrect on a fresh apm init fixture. Verified just now:

$ apm init probe -y --target copilot && cd probe
$ # add one .apm/instructions/x.instructions.md
$ apm install && apm compile -t copilot
  Pattern  Source              Coverage  Placement      Metrics
  **       x.instructions.md   1/1       ./AGENTS.md    rel: 100%
Generated 1 AGENTS.md file
$ ls .github/copilot-instructions.md
ls: .github/copilot-instructions.md: No such file or directory

Target detection is signal-file-driven, not flag-driven (src/apm_cli/core/target_detection.py:166-167): apm compile -t copilot lists copilot as eligible, but the actual surface chosen depends on whether .github/copilot-instructions.md exists as a signal. On a fresh fixture (no signal file), compile correctly emits only AGENTS.md per the canonical fallback.

If I tightened the assertion to require both, the test would fail on its own fixture. The current OR check is the correct robust contract for a smoke test that does not preconfigure target signals.

The tier-2 smoke job in ci-integration.yml previously ran
test_runtime_smoke.py, which exercises codex/llm runtime install
scripts and preview_workflow -- the execution layer that backs the
experimental 'apm run' command (literally marked '(experimental)' in
its CLI help text). That surface is not in README.md's three promises
and is not what users hit when they install APM as a package manager.

Worse, the smoke job had silently no-op'd for some time: it never
downloaded the built binary, never ran 'apm runtime setup', and never
set APM_E2E_TESTS=1, so every test in test_runtime_smoke.py skipped
at collection. Green ribbon, zero signal.

This replaces the smoke target with test_core_smoke.py: a hermetic,
network-free pre-flight against the built binary that exercises:

- TestBinaryStartup     -- 'apm --version' (PyInstaller bundle sane)
- TestPortableByManifest -- init / install / compile (promise 1)
- TestSecureByDefault    -- audit pipeline reachable (promise 2)
- TestGovernedByPolicy   -- policy command surface reachable (promise 3)

Six tests, ~10 seconds wall-clock. Maps 1:1 to README.md's stated
core capabilities. The job now downloads the build artifact and
sets APM_BINARY_PATH so the resolved binary is the one under test
in the merge-queue tentative merge SHA, not a system install.

test_runtime_smoke.py stays in the heavy integration job under the
existing requires_runtime_* and requires_e2e_mode markers; the
runtime-install coverage it provides is legitimate, just misplaced
as a fail-fast smoke gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel force-pushed the feat/smoke-test-core-promises branch from 9e2fb79 to 76af775 Compare May 10, 2026 18:51
Daniel Meppiel and others added 2 commits May 10, 2026 21:00
Replace ``apm policy --help`` with ``apm policy status``. ``--help``
only verified Click wiring; ``status`` runs the real discovery
pipeline (git-remote probe, org resolution, cache lookup, rule
evaluator, status renderer) and asserts the diagnostic table fields,
giving the smoke job a real signal on the governance stack rather
than help-text theater.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the one-concise-line + "(#PR)" format used by adjacent
[Unreleased] entries, per Copilot review feedback on PR #1251.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 66ae577 into main May 10, 2026
23 checks passed
@danielmeppiel danielmeppiel deleted the feat/smoke-test-core-promises branch May 10, 2026 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants