test(coverage): enforce 75% unit gate and push integration to 60% by sergio-sisternes-epam · Pull Request #1414 · microsoft/apm

sergio-sisternes-epam · 2026-05-20T08:07:52Z

TL;DR

Enforce a 75% unit coverage gate in CI and push integration test coverage from 44% to 60% with 36 new test files (~2,760 tests). Unit coverage is no longer report-only: PRs that drop it below 75% now fail the Tests job. Integration coverage remains report-only, with a placeholder for the Phase 3 gate.

Note

Closes #1401. Phase 2 of the progressive coverage ratchet (#1398). Phase 1 shipped in #1404.

Problem (WHY)

Unit coverage had no enforcement gate: a PR could delete half the test suite and CI would still pass. The only signal was a $GITHUB_STEP_SUMMARY table nobody was required to read.
Integration test coverage sat at 44%, leaving large surfaces (adapters, marketplace resolver, policy checks, CLI commands, MCP registry, diagnostics) untested at the integration boundary.
[!] Without a gate, coverage drifts downward over time as new features land without proportional test additions — the ratchet pattern in #1398 exists precisely to prevent this.

Approach (WHAT)

| # | Fix |
|---|-----|
| 1 | Add `fail_under = 75` to `[tool.coverage.report]` in `pyproject.toml`; pytest-cov enforces the gate automatically when `--cov` is passed. |
| 2 | Add Phase 3 placeholder comment in `ci-integration.yml` recording the 44% baseline and 54% target; no active gate yet. |
| 3 | Add 36 integration test files covering adapters, CLI commands, compilation, deps, marketplace, policy, MCP, diagnostics, and the new copilot-app target modules. |

Implementation (HOW)

pyproject.toml — One line: fail_under = 75 under [tool.coverage.report]. pytest-cov reads this and exits non-zero when coverage drops below 75%. The current unit baseline is ~78%, giving 3 points of headroom.
.github/workflows/ci-integration.yml — Three comment lines recording the integration baseline (44%) and Phase 3 target (54%). The coverage step retains continue-on-error: true — no enforcement.
tests/integration/test_*_coverage.py (36 files) — Hermetic integration tests that exercise real code paths with minimal mocking (only external I/O: HTTP, subprocess, os.environ, Path.home). Organised in progressive waves covering CLI commands via CliRunner, pure-logic functions, adapter helpers, marketplace/policy/registry operations, dep resolution, MCP registry, diagnostics/validation, and the new copilot-app target modules from #1405 (feat(experimental): copilot-app target deploys scheduled prompts to App DB).
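The hermetic style described above (run real code, patch only the external I/O boundary) might look roughly like the sketch below. Everything in it is hypothetical: `resolve_config_path`, the `EXAMPLE_CONFIG_DIR` variable, and the `.example` directory are stand-ins for illustration and are not part of `apm_cli`. Only the standard library is used.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def resolve_config_path() -> Path:
    """Hypothetical function under test; not part of apm_cli."""
    override = os.environ.get("EXAMPLE_CONFIG_DIR")  # hypothetical variable
    base = Path(override) if override else Path.home() / ".example"
    return base / "config.json"


def test_uses_home_when_no_override() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp)
        # Patch only the external boundaries: the environment and Path.home.
        with mock.patch.dict(os.environ, {}, clear=True):
            with mock.patch.object(Path, "home", return_value=fake_home):
                expected = fake_home / ".example" / "config.json"
                assert resolve_config_path() == expected


def test_env_override_wins() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        env = {"EXAMPLE_CONFIG_DIR": tmp}
        with mock.patch.dict(os.environ, env, clear=True):
            assert resolve_config_path() == Path(tmp) / "config.json"
```

The function body runs unmodified in both tests, so its lines and branches count toward coverage; only the machine-specific inputs are replaced.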

Diagrams

Legend: How the coverage gate integrates into the existing CI pipeline — the unit job now has a hard gate while integration remains report-only.

flowchart LR
    subgraph Unit["Unit CI (ci.yml)"]
        UT[pytest --cov] --> CK{coverage >= 75%?}
        CK -->|yes| PASS[job passes]
        CK -->|no| FAIL[job fails]:::new
    end
    subgraph Integration["Integration CI (ci-integration.yml)"]
        S1[shard 1] --> FI[fan-in]
        S2[shard 2] --> FI
        S3[shard 3] --> FI
        S4[shard 4] --> FI
        FI --> SUM["summary (report-only)"]
    end
    classDef new stroke-dasharray: 5 5;
    class FAIL new;

Trade-offs

Gate at 75%, not 77%. The ratchet rule in #1398 ([FEATURE] Progressive test coverage gates (strangler-fig ratchet)) suggests actual - 3 when actual exceeds the gate by 5+ points (78 - 3 = 75 exactly). Kept 75% to give the community margin during refactors; the gate can be ratcheted up in a future phase.
Integration remains report-only. Enforcing an integration gate requires stable network-independent tests; the 16 test_skill_bundle_live failures prove we are not there yet. Phase 3 will activate the gate once those are addressed.
Wave-numbered test files. Test files retain wave numbering (test_wave3_*, test_wave7_*) from the iterative development process. Renaming would produce a large churn commit with no coverage benefit — a follow-up housekeeping PR is the right venue.
Branch coverage included. pyproject.toml has branch = true, so the 60% integration figure is combined line+branch coverage (64% statement-only, 50% branch-only). This is stricter but more meaningful.
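The combined line+branch figure can be reproduced from the counts reported in the Validation section. The sketch assumes the combined metric pools statements and branches into a single ratio, which is how coverage.py is understood to compute its total when `branch = true`; treat that as an assumption rather than a statement about the tool's internals.

```python
# Counts copied from the integration run in the Validation section.
covered_statements, total_statements = 23_918, 37_423
covered_branches, total_branches = 7_324, 14_800


def percent(covered: int, total: int) -> float:
    """Return covered/total as a percentage."""
    return 100.0 * covered / total


statement_pct = percent(covered_statements, total_statements)
branch_pct = percent(covered_branches, total_branches)
# Assumed combination rule: pool statements and branches into one ratio.
combined_pct = percent(
    covered_statements + covered_branches,
    total_statements + total_branches,
)

print(f"statements: {statement_pct:.1f}%")  # statements: 63.9%
print(f"branches:   {branch_pct:.1f}%")     # branches:   49.5%
print(f"combined:   {combined_pct:.1f}%")   # combined:   59.8%
```

Under that assumption the pooled ratio is 31,242 of 52,223 targets, which rounds to the 60% headline figure.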

Benefits

Unit regressions blocked — any PR dropping unit coverage below 75% fails CI before merge.
Integration coverage +16 percentage points — from 44% to 60%, covering ~31,000 of ~52,000 line+branch targets.
2,760+ new integration tests — exercising CLI commands, adapters, marketplace, policy, MCP, diagnostics, and copilot-app modules.
Phase 3 baseline recorded — the 44% baseline and 54% target are documented in ci-integration.yml for the next ratchet step.
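The ratchet arithmetic quoted in this PR can be sketched as follows. Both helpers are hypothetical, and #1398 remains the authoritative statement of the rule. The margin of 3, the trigger of 5, and the step of 10 come from figures quoted in this PR; the previous gate of 70 in the usage lines is a made-up value used only to exercise the function.

```python
def next_unit_gate(
    actual: float,
    current_gate: float,
    margin: float = 3,
    trigger: float = 5,
) -> float:
    """Raise the gate to actual - margin once actual clears it by trigger points."""
    if actual - current_gate >= trigger:
        return actual - margin
    return current_gate


def next_integration_target(baseline: float, step: float = 10) -> float:
    """Set the next target a fixed number of points above the baseline."""
    return baseline + step


# current_gate=70 is a hypothetical prior gate, not a value from the PR.
print(next_unit_gate(actual=78, current_gate=70))  # 75
print(next_integration_target(44))                 # 54
```

When the actual figure is within 5 points of the current gate, the sketch leaves the gate unchanged rather than tightening it.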

Validation

Lint (clean)

$ uv run --extra dev ruff check src/ tests/
All checks passed!
$ uv run --extra dev ruff format --check src/ tests/
830 files already formatted

Unit tests (8,815 passed, coverage ~78%)

$ uv run --extra dev pytest tests/unit tests/test_console.py -n auto --dist worksteal -q --tb=line
8815 passed, 1 skipped, 1 warning, 33 subtests passed in 8.74s

Integration tests (2,622 passed, coverage 60%)

$ uv run --extra dev pytest tests/integration/ -q --tb=line --cov=apm_cli --cov-report=json:coverage-integration.json --cov-config=pyproject.toml --override-ini="addopts="
2622 passed, 222 skipped, 2 xfailed in 99.14s
Coverage: 60% (23,918/37,423 stmts, 7,324/14,800 branches)

The 16 failures in test_skill_bundle_live.py come from pre-existing, network-dependent tests that pass in CI.

Scenario Evidence

| # | Scenario (user promise) | Principle(s) | Test(s) proving it | Type |
|---|-------------------------|--------------|--------------------|------|
| 1 | A PR that drops unit coverage below 75% fails the CI Tests job | Governed by policy | `pyproject.toml` `fail_under = 75` enforced by pytest-cov (verified locally by setting `fail_under = 99` and confirming non-zero exit) | config |
| 2 | Integration coverage is visible but does not block PRs | DevX | `.github/workflows/ci-integration.yml` line 274: `continue-on-error: true` | config |
| 3 | `apm install`, `apm audit`, `apm outdated`, `apm view` CLI commands work end-to-end on valid project fixtures | DevX, Portability by manifest | `tests/integration/test_commands_deep_coverage.py`, `tests/integration/test_wave5_e2e_coverage.py` | integration |
| 4 | Policy checks enforce allow/deny rules and emit correct diagnostics | Governed by policy, Secure by default | `tests/integration/test_policy_coverage.py`, `tests/integration/test_wave7_policy_registry_coverage.py` | integration |
| 5 | Copilot-app target namespacing, URI encoding, and DB resolution work correctly | Multi-harness support, Portability by manifest | `tests/integration/test_copilot_app_targets_coverage.py` | integration |

How to test

1. Pull the branch and run `uv run --extra dev pytest tests/unit tests/test_console.py -n auto --dist worksteal --cov`, then verify exit code 0 and coverage >= 75%.
2. Temporarily set `fail_under = 99` in `pyproject.toml` and re-run, then verify the exit code is non-zero (FAIL Required test coverage of 99.0% not reached).
3. Run `uv run --extra dev pytest tests/integration/ -q --cov=apm_cli --override-ini="addopts="`, then verify 2,600+ tests pass and coverage >= 60%.
4. Confirm CI passes on the PR (unit gate green, integration report-only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add fail_under = 75 to [tool.coverage.report] in pyproject.toml so pytest-cov exits non-zero when unit coverage drops below 75%. Integration coverage remains report-only; a placeholder comment marks where the gate will be added once the baseline is established.

Closes #1401
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Baseline measured locally at 44% (518 passed, 222 skipped). Phase 3 target set to 54% (baseline + 10 points). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add 57 integration tests covering:
- DependencyReference.parse() (27 tests): All shorthand, HTTPS, SSH, SCP, virtual packages, local paths, Azure DevOps, edge cases
- DownloadDelegate.resilient_get() (5 tests): HTTP resilience with 429/503 rate-limit retry logic, connection error retry, exhaustion
- compile command (3 tests): Minimal project, skills directory, agents directory
- outdated command helpers (14 tests): Tag detection, remote tip finding, version stripping, marketplace checks
- view command helpers (4 tests): Package path resolution, fallback scan, traversal attack rejection
- GitHub downloader utilities (3 tests): Repository cleanup, progress reporting

Coverage strategy: Execute all Python code without mocking it; mock only external I/O boundaries (HTTP requests, subprocess). All tests pass and are clean under ruff check/format.

Results:
- 57/57 tests passing ✓
- apm_cli.models.dependency.reference: 43% coverage
- Tested on real file I/O, CliRunner, request retry logic

Fixes #1401 Phase 2 commitment

- 40 comprehensive integration tests covering 8 APM CLI command modules
- Tests for: prune, run, config, experimental, update, runtime, policy, and _format_target_label
- 44% code coverage across 989 statements in target files (baseline)
- Test classes organized by command with helper functions for fixture setup
- All tests passing with proper Click CliRunner isolation
- Includes error handling and edge case validation tests

Add 34 integration test files covering:
- CLI commands (compile, deps, install, view, init, pack, mcp, audit, uninstall)
- Adapters (copilot, codex, vscode env helpers)
- Dependencies (download strategies, resolver, dep references)
- Marketplace (resolver, publisher, builder, yml_editor)
- Policy (discovery, checks, registry operations)
- Models (validation, detection evidence, lockfile)
- Utils (diagnostics, formatters, console)
- Integration patterns (mcp_integrator, hook_integrator, skill_integrator)

Coverage: 44.12% -> 60.09% (line+branch combined metric)
- 2,485 integration tests passing (up from 518)
- All 8,733 unit tests still passing
- Only pre-existing test_skill_bundle_live.py failures (require network)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ertion
- 137 new tests for copilot_app_db, targets, prompt_integrator, experimental
- Fix test_outdated_no_lockfile assertion to match updated message
- Covers new code from #1405 (copilot-app target)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Sergio Sisternes and others added 6 commits May 20, 2026 08:47

Record integration coverage baseline and 54% target

ef45b18


sergio-sisternes-epam requested a review from danielmeppiel as a code owner May 20, 2026 08:07

Copilot AI review requested due to automatic review settings May 20, 2026 08:07

sergio-sisternes-epam added the testing label (deprecated: use area/testing; kept for issue history and will be removed in milestone 0.10.0) May 20, 2026

Copilot AI reviewed May 20, 2026


danielmeppiel approved these changes May 20, 2026


danielmeppiel added this pull request to the merge queue May 20, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 20, 2026

sergio-sisternes-epam wants to merge 6 commits into main from coverage/1401-phase2
