fix(ci): scope-preserving PR auto-heal — never escalate to test_extend (#1403) #1440
Merged
#1403)

PR auto-heal was re-bloating narrow fix PRs: for a Python module whose tests pass but coverage is below target, `sync_determine_operation` returns `test_extend`, and `heal_module` routes verify/generate/test/crash through `pdd sync`, which re-derives the same coverage gap internally and appends unrelated generated tests (rewriting `.pdd/meta` command to `test_extend`). This made narrow PRs non-mergeable (e.g. #1390).

Add a single env-var signal, `PDD_DISABLE_TEST_EXTEND`, set only in PR auto-heal mode (`not skip_ci`) and enforced at two layers:

- Detection (`sync_determine_operation.test_extend_disabled`): the coverage-gap branch returns the existing `all_synced` no-op for all languages when the flag is set. Because this function is called by both the in-process `detect_drift` and the nested `pdd sync`, one branch covers both the detection and execution paths the issue requires.
- Execution backstop (`sync_orchestration`): mirror the existing non-Python `test_extend` skip (log `test_extend_skipped`, accept the current state, and write no test file).
- `ci_drift_heal.main` sets the flag on `os.environ` only around the in-process `detect_drift` call (restored in `finally`, no leak) and passes it explicitly in the `pdd sync` subprocess env. Push-to-main (`--skip-ci`) is unaffected; whole-module coverage growth still runs.
- `detect_drift` now treats `all_synced` as "no drift" (alongside nothing/synced) so the guarded no-op is a clean skip, not an "unknown operation" heal failure.

Prompts updated to match (source of truth). Regression tests prove: PR mode suppresses + propagates the flag, push-to-main keeps test_extend, the orchestrator never appends tests / writes a `test_extend` fingerprint when suppressed, and an e2e run proves the parent→child env contract. The flag is default-off, so unset behavior is byte-for-byte unchanged.
Supersedes #1416 (prompts-only, code never regenerated) and #1432 (tests + partial code; missed the all_synced filter and env restore). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
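The set-and-restore behavior described for `ci_drift_heal.main` can be sketched roughly as below. This is an illustration only, not the code in this PR: the helper names `scoped_env_flag` and `child_env` and the flag value `"1"` are assumptions made for the example.

```python
import os
from contextlib import contextmanager

FLAG = "PDD_DISABLE_TEST_EXTEND"
_MISSING = object()  # sentinel: distinguishes "unset" from an empty string


@contextmanager
def scoped_env_flag(name, value="1"):
    """Set os.environ[name] inside the block, then restore the prior state.

    Hypothetical helper; the value "1" is an assumption.
    """
    previous = os.environ.get(name, _MISSING)
    os.environ[name] = value
    try:
        yield
    finally:
        # Runs even if the body raises, so the flag never leaks.
        if previous is _MISSING:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous  # restore the exact pre-existing value


def child_env(is_pr_mode):
    """Build an explicit env mapping for a nested subprocess (hypothetical)."""
    env = dict(os.environ)
    if is_pr_mode:
        env[FLAG] = "1"  # passed explicitly rather than inherited by accident
    return env
```

A caller would wrap only the in-process detection call in `with scoped_env_flag(FLAG):` and hand `child_env(is_pr_mode)` to `subprocess.run(..., env=...)`, so the parent process and the child receive the signal through separate, explicit paths.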
…ame helper (#1403)

Codex review round 1 found a real regression: detect_drift skipped 'all_synced' BEFORE the git-based reclassification, so a PR that changed code without its prompt (and had a low-coverage passing run_report) was silently dropped instead of being promoted to 'update'. This also regressed the pre-existing non-Python all_synced coverage-gap path under --diff-base.

- detect_drift: move the 'all_synced' no-drift skip to AFTER git reclassification, so an all_synced module whose code changed without its prompt is still promoted to 'update'; only a still-terminal all_synced is dropped.
- Rename helper test_extend_disabled() -> is_test_extend_disabled() so pytest does not collect it as a test when imported into a test module, and so the name reads as a predicate.

New regression tests:

- detect_drift: all_synced + code-only change -> update (not dropped); terminal all_synced -> clean skip (never an unknown-operation failure).
- is_test_extend_disabled truthiness incl. falsey (0/false/off/'') and whitespace; unset -> False.
- main(): os.environ flag restored even when detect_drift raises, and a pre-existing value is restored exactly (not clobbered).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
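A predicate with the truthiness rules listed above might look like the following minimal sketch. It is not the implementation from this PR: the optional `environ` parameter and the case-insensitive comparison are assumptions added for the example; only the falsey set (0/false/off/''), the whitespace handling, and unset -> False come from the commit message.

```python
import os

# Falsey spellings named in the commit message; anything else counts as "set".
_FALSEY = {"", "0", "false", "off"}


def is_test_extend_disabled(environ=None):
    """Return True when PDD_DISABLE_TEST_EXTEND holds a truthy value.

    `environ` is a hypothetical injection point for tests; it defaults
    to os.environ.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PDD_DISABLE_TEST_EXTEND")
    if raw is None:
        return False  # unset -> default-off
    # Strip whitespace; lowercasing is an assumption, not stated in the PR.
    return raw.strip().lower() not in _FALSEY
```

The `is_` prefix matters here for the reason given in the commit: a module-level function named `test_*` that is imported into a test module matches pytest's default collection pattern.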
Review loop converged (claude implement ↔ codex review, 3 rounds)
Kept as draft intentionally (issue #1403's own workaround) so the still-unpatched auto-heal bot can't re-bloat this branch before merge.
This was referenced Jun 5, 2026
Problem (#1403)
PR auto-heal was optimizing for "make the whole module synchronized" instead of "keep this PR narrowly scoped", re-bloating narrow fix PRs with unrelated `test_extend` output (e.g. #1390, flagged non-mergeable on 6/4).

For a Python module whose tests pass but coverage is below target, `sync_determine_operation` returns `test_extend`. In PR auto-heal, `heal_module` routes `verify`/`generate`/`test`/`crash` through `pdd --force --strength 0.5 sync <module>`, and that nested `pdd sync` re-derives the same coverage gap internally and appends generated tests (rewriting the `.pdd/meta` command to `test_extend`). As the issue stresses, detection-only suppression is not enough: the guard must also exist at execution time.

Fix: a two-layer guard from one default-off signal

`PDD_DISABLE_TEST_EXTEND`, set only in PR auto-heal mode (`is_pr_mode = not skip_ci`):

- `sync_determine_operation` (`test_extend_disabled()`): the coverage-gap branch returns the existing `all_synced` no-op for all languages. Called by both the parent `detect_drift` and the nested `pdd sync`, so one branch covers detection and execution re-derivation.
- `sync_orchestration` `test_extend` branch: logs `test_extend_skipped`, accepts current state, writes no test file.
- `ci_drift_heal.main`: sets the `os.environ` flag only around the in-process `detect_drift` (restored in `finally`, no leak) and passes it explicitly in the `pdd sync` subprocess `env`.
- `ci_drift_heal.detect_drift`: treats `all_synced` as "no drift" (with `nothing`/`synced`) so the guarded module is a clean skip, not an "unknown operation" heal failure.

Push-to-main (`--skip-ci`) is unaffected: whole-module coverage growth still runs. The signal is default-off, so unset behavior is byte-for-byte unchanged. `update` / `example` / required `auto-deps` remain allowed in PR heal.

Prompts (`ci_drift_heal`, `sync_determine_operation`, `sync_orchestration`) updated to match (source of truth).

Tests (genuine RED → GREEN; verified failing on unpatched source)

- `sync_determine_operation`: flag → `all_synced`; unset still returns `test_extend` (regression guard).
- `sync_orchestration`: suppressed `test_extend` appends no tests, logs `test_extend_skipped`, saves no `test_extend` fingerprint; Python `test_extend` still runs when unset (baseline).
- `ci_drift_heal`: PR mode sets the flag for in-process detection and the subprocess env, and restores `os.environ` (no leak); push-to-main with `--diff-base` keeps it unset (the discriminator is `skip_ci`, not `diff_base`).
- e2e (`test_ci_drift_heal_e2e.py`): real `python -m pdd.ci_drift_heal` parent + PATH-level fake `pdd` child proves the env contract crosses the parent→child boundary and no `test_extend` churn is committed.

Local: `test_ci_drift_heal` + `test_sync_determine_operation` + e2e = 315 passed; `test_sync_orchestration` = 214 passed. Zero new ruff findings on the changed files.

Supersedes

- #1416 (`pdd change`): edited prompt prose only; code was never regenerated, so the guard was non-functional.
- #1432 (`pdd fix`): failing tests + partial code; missed the `all_synced` `detect_drift` filter (guarded module became an "unknown operation" failure) and left `os.environ` mutated.

This PR is the complete, tested version of both. Opened as draft per the issue's workaround so the (still-unpatched) auto-heal bot doesn't re-bloat this branch before merge.
Closes #1403.
🤖 Generated with Claude Code