Skip to content

fix(ci): scope-preserving PR auto-heal — never escalate to test_extend (#1403)#1440

Merged
gltanaka merged 5 commits into
mainfrom
fix/issue-1403-guard
Jun 6, 2026
Merged

fix(ci): scope-preserving PR auto-heal — never escalate to test_extend (#1403)#1440
gltanaka merged 5 commits into
mainfrom
fix/issue-1403-guard

Conversation

@Serhan-Asad
Copy link
Copy Markdown
Collaborator

Problem (#1403)

PR auto-heal was optimizing for "make the whole module synchronized" instead of "keep this PR narrowly scoped", re-bloating narrow fix PRs with unrelated test_extend output (e.g. #1390, flagged non-mergeable on 6/4).

For a Python module whose tests pass but coverage is below target, sync_determine_operation returns test_extend. In PR auto-heal, heal_module routes verify/generate/test/crash through pdd --force --strength 0.5 sync <module>, and that nested pdd sync re-derives the same coverage gap internally and appends generated tests (rewriting .pdd/meta command to test_extend). As the issue stresses, detection-only suppression is not enough — the guard must also exist at execution time.

Fix — a two-layer guard from one default-off signal

PDD_DISABLE_TEST_EXTEND, set only in PR auto-heal mode (is_pr_mode = not skip_ci):

Layer Where Behavior when flag set
Detection sync_determine_operation (test_extend_disabled()) coverage-gap branch returns the existing all_synced no-op for all languages. Called by both the parent detect_drift and the nested pdd sync, so one branch covers detection and execution re-derivation.
Execution backstop sync_orchestration test_extend branch mirrors the existing non-Python skip: logs test_extend_skipped, accepts current state, writes no test file.
Signal source ci_drift_heal.main sets os.environ flag only around the in-process detect_drift (restored in finally — no leak) and passes it explicitly in the pdd sync subprocess env.
Clean no-op ci_drift_heal.detect_drift treats all_synced as "no drift" (with nothing/synced) so the guarded module is a clean skip, not an "unknown operation" heal failure.

Push-to-main (--skip-ci) is unaffected — whole-module coverage growth still runs. The signal is default-off, so unset behavior is byte-for-byte unchanged. update / example / required auto-deps remain allowed in PR heal.

Prompts (ci_drift_heal, sync_determine_operation, sync_orchestration) updated to match (source of truth).

Tests (genuine RED → GREEN; verified failing on unpatched source)

  • sync_determine_operation: flag → all_synced; unset still returns test_extend (regression guard).
  • sync_orchestration: suppressed test_extend appends no tests, logs test_extend_skipped, saves no test_extend fingerprint; Python test_extend still runs when unset (baseline).
  • ci_drift_heal: PR mode sets the flag for in-process detection and the subprocess env, and restores os.environ (no leak); push-to-main with --diff-base keeps it unset (the discriminator is skip_ci, not diff_base).
  • e2e (test_ci_drift_heal_e2e.py): real python -m pdd.ci_drift_heal parent + PATH-level fake pdd child proves the env contract crosses the parent→child boundary and no test_extend churn is committed.

Local: test_ci_drift_heal + test_sync_determine_operation + e2e = 315 passed; test_sync_orchestration = 214 passed. Zero new ruff findings on the changed files.

Supersedes

This PR is the complete, tested version of both. Opened as draft per the issue's workaround so the (still-unpatched) auto-heal bot doesn't re-bloat this branch before merge.

Closes #1403.

🤖 Generated with Claude Code

Test User and others added 2 commits June 5, 2026 14:18
#1403)

PR auto-heal was re-bloating narrow fix PRs: for a Python module whose
tests pass but coverage is below target, `sync_determine_operation`
returns `test_extend`, and `heal_module` routes verify/generate/test/crash
through `pdd sync`, which re-derives the same coverage gap internally and
appends unrelated generated tests (rewriting `.pdd/meta` command to
`test_extend`). This made narrow PRs non-mergeable (e.g. #1390).

Add a single env-var signal, `PDD_DISABLE_TEST_EXTEND`, set only in PR
auto-heal mode (`not skip_ci`) and enforced at two layers:

- Detection (`sync_determine_operation.test_extend_disabled`): the
  coverage-gap branch returns the existing `all_synced` no-op for all
  languages when the flag is set. Because this function is called by both
  the in-process `detect_drift` and the nested `pdd sync`, one branch
  covers both the detection and execution paths the issue requires.
- Execution backstop (`sync_orchestration`): mirror the existing
  non-Python `test_extend` skip — log `test_extend_skipped`, accept the
  current state, and write no test file.
- `ci_drift_heal.main` sets the flag on `os.environ` only around the
  in-process `detect_drift` call (restored in `finally`, no leak) and
  passes it explicitly in the `pdd sync` subprocess env. Push-to-main
  (`--skip-ci`) is unaffected — whole-module coverage growth still runs.
- `detect_drift` now treats `all_synced` as "no drift" (alongside
  nothing/synced) so the guarded no-op is a clean skip, not an
  "unknown operation" heal failure.

Prompts updated to match (source of truth). Regression tests prove: PR
mode suppresses + propagates the flag, push-to-main keeps test_extend,
the orchestrator never appends tests / writes a `test_extend` fingerprint
when suppressed, and an e2e run proves the parent→child env contract.
The flag is default-off, so unset behavior is byte-for-byte unchanged.

Supersedes #1416 (prompts-only, code never regenerated) and #1432
(tests + partial code; missed the all_synced filter and env restore).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ame helper (#1403)

Codex review round 1 found a real regression: detect_drift skipped
'all_synced' BEFORE the git-based reclassification, so a PR that changed
code without its prompt (and had a low-coverage passing run_report) was
silently dropped instead of being promoted to 'update'. This also
regressed the pre-existing non-Python all_synced coverage-gap path under
--diff-base.

- detect_drift: move the 'all_synced' no-drift skip to AFTER git
  reclassification, so an all_synced module whose code changed without its
  prompt is still promoted to 'update'; only a still-terminal all_synced
  is dropped.
- Rename helper test_extend_disabled() -> is_test_extend_disabled() so
  pytest does not collect it as a test when imported into a test module,
  and so the name reads as a predicate.

New regression tests:
- detect_drift: all_synced + code-only change -> update (not dropped);
  terminal all_synced -> clean skip (never an unknown-operation failure).
- is_test_extend_disabled truthiness incl. falsey (0/false/off/'') and
  whitespace; unset -> False.
- main(): os.environ flag restored even when detect_drift raises, and a
  pre-existing value is restored exactly (not clobbered).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Serhan-Asad
Copy link
Copy Markdown
Collaborator Author

Review loop converged (claude implement ↔ codex review, 3 rounds)

  • Round 1 — codex found a real regression: detect_drift skipped all_synced before git reclassification, dropping code-only-change modules that should become update. Fixed (e6b34498c): the all_synced skip now runs after reclassification; added tests test_all_synced_with_code_only_change_reclassified_as_update + test_all_synced_terminal_is_skipped_not_unknown_operation. Also renamed test_extend_disabledis_test_extend_disabled (so pytest doesn't collect the helper) and added falsey-parser + env-restore-on-exception tests.
  • Round 2/3 — codex flagged that the coverage branch returns before file-state analysis, so an example-only change coexisting with a low-coverage run report is masked. Confirmed pre-existing (identical on main, which returns test_extend → fails the heal) and independent of this guard; the shipped non-Python all_synced branch has the same property. There is no scenario where main heals that example but this PR doesn't — the divergence is strictly fail/re-bloat → clean skip, and PDD_HEAL_SKIP_REVIEW_ONLY_EXAMPLE_DRIFT=1 already skips review-only example drift by design. Codex agreed it is not a blocker and merge-ready with a documented follow-up. Tracked in sync_determine_operation: coverage branch short-circuits before file-state drift detection #1442.

Kept as draft intentionally (issue #1403's own workaround) so the still-unpatched auto-heal bot can't re-bloat this branch before merge.

@gltanaka gltanaka merged commit 2c542f0 into main Jun 6, 2026
9 checks passed
@gltanaka gltanaka deleted the fix/issue-1403-guard branch June 6, 2026 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PR auto-heal should not escalate narrow fix PRs into unrelated test_extend coverage churn

2 participants