ci: only re-run gated matrices on ready-for-ci label transitions by igerber · Pull Request #509 · igerber/diff-diff

igerber · 2026-05-31T16:52:45Z

Summary

CI trigger refinement. The three gated matrices (rust-test.yml, notebooks.yml, docs-tests.yml) list labeled/unlabeled in their pull_request trigger and gate jobs on the ready-for-ci label. Their guards only checked the label was present, so churning any unrelated label (e.g. adding bug) on a PR that already carried ready-for-ci re-fired the entire matrix — including the multi-hour pure-Python fallback. The refined guard skips a labeled/unlabeled event unless the triggering label is ready-for-ci; code pushes and the initial ready-for-ci add are unaffected.
ci-gate.yml deliberately untouched. It is the cheap one-step required check and must keep re-evaluating on every label event to flip the gate red/green when ready-for-ci is added/removed.
Regression coverage. New TestCiWorkflowLabelEventGuard parses each guarded workflow and exact-matches the per-job if: guard, locks ci-gate.yml's presence-based contract (parsed trigger types + named-step if:), and asserts the matrices keep labeled/unlabeled in their trigger types. TestRustTestWorkflowPathFilter is extended so the guarded workflows stay under the suite's path filters (a workflow-only edit still runs the locks).
TODO cleanup. Removes the now-resolved tech-debt row (Add label-gated CI and documentation dependency map #269).

Methodology references (required if estimator / math changes)

Method name(s): N/A - no methodology changes (CI workflow + test-harness only; no diff_diff/, rust/src/, or docs/methodology/ files touched)
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: tests/test_openai_review.py (TestCiWorkflowLabelEventGuard; extended TestRustTestWorkflowPathFilter). Full file passes (243 tests). Edited workflows validated to parse and fold to the intended if: expression.
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

github-actions · 2026-05-31T16:58:11Z

Overall Assessment

Looks good. I did not find any unmitigated P0 or P1 issues.

Executive Summary

No methodology surfaces changed. The diff is confined to GitHub Actions workflows, TODO.md, and workflow-lock tests, so there is no estimator/math/SE/identification/default-behavior change to cross-check against the Methodology Registry.
The refined job guard has the right behavior for the important cases: ordinary PR updates and adding ready-for-ci still run the expensive workflows, while unrelated labeled/unlabeled churn no longer does. .github/workflows/rust-test.yml:L64-L69, .github/workflows/notebooks.yml:L29-L34, .github/workflows/docs-tests.yml:L44-L49
The pattern fix was applied consistently across all current ready-for-ci-gated jobs, and the new tests lock both the exact if: predicate and the required pull_request.types. tests/test_openai_review.py:L2107-L2320
ci-gate.yml correctly remains presence-based and label-event-sensitive, which is the right companion behavior for this change. .github/workflows/ci-gate.yml:L3-L22, tests/test_openai_review.py:L2279-L2304
I found only minor P3 follow-ups in the new regression tests and one stale comment reference.

Methodology

No findings. No methodology files or estimator code are touched.

Code Quality

No findings. The implementation is small, consistent across the affected jobs, and the guard logic matches the intended contract.

Performance

No findings. The main effect is positive: unrelated non-ready-for-ci label churn should stop re-fanning out into the expensive matrices. .github/workflows/rust-test.yml:L64-L69, .github/workflows/notebooks.yml:L29-L34, .github/workflows/docs-tests.yml:L44-L49

Maintainability

Severity: P3. Impact: TestCiWorkflowLabelEventGuard only checks the jobs enumerated in EXPECTED_JOBS, so a future new top-level job added to one of these workflows could miss the refined guard without failing this suite. Concrete fix: assert the discovered top-level job set matches an explicit allowlist, or assert every non-exempt job in each guarded workflow carries the expected if:. tests/test_openai_review.py:L2126-L2130, tests/test_openai_review.py:L2260-L2277

Tech Debt

No findings. Removing the resolved TODO entry looks justified given the consistent workflow changes and the added regression coverage.

Security

No findings. I did not see any secret-handling, permission, or injection regression in the changed workflows.

Documentation/Tests

Severity: P3. Impact: the new tests skip when a targeted workflow file is missing, so deleting or renaming one of the guarded workflows would weaken the contract test instead of failing it. Concrete fix: replace those pytest.skip(...) branches with hard assertions on file existence. tests/test_openai_review.py:L2260-L2265, tests/test_openai_review.py:L2285-L2287, tests/test_openai_review.py:L2312-L2315
Severity: P3. Impact: one breadcrumb comment now points at stale rust-test.yml line numbers, which makes future maintenance slightly harder. Concrete fix: update the comment to the current location or reference the python-fallback job by name instead of hard-coded line numbers. .github/workflows/docs-tests.yml:L67-L70

Validation note: I could not run tests/test_openai_review.py locally because pytest is not installed in this review environment; the assessment above is based on source inspection and manual guard evaluation.

github-actions · 2026-05-31T17:07:17Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 36c170b4f751b789d0d45de0128802843f502376

Overall Assessment

⚠️ Needs changes

The workflow-guard re-review items look resolved, but this PR introduces one unmitigated P1 on the PowerAnalysis methodology contract: it removes the registry/TODO/source-audit disclosures for a known analytical-power mismatch without reconciling the underlying implementation.

Executive Summary

The previous re-review P3s on the CI guard tests appear resolved: missing workflows now fail loudly, the guarded workflows were added to both rust-test.yml path filters, and the guarded-job allowlist is explicit in the new test coverage. tests/test_openai_review.py:L2077 tests/test_openai_review.py:L2281 .github/workflows/rust-test.yml:L15
No estimator implementation changed, but the PR does change the authoritative PowerAnalysis methodology surfaces: diff_diff/power.py:L1218, docs/methodology/REGISTRY.md:L3199, docs/references.rst:L241, and it removes the related TODO.md row plus the two paper-review audit files.
The current code still uses stats.norm.ppf and a Moulton-style panel variance in _get_critical_values() / _compute_variance(), but the PR removes the note that documented the mismatch and leaves the registry claiming different formulas. diff_diff/power.py:L1267 diff_diff/power.py:L1276 docs/methodology/REGISTRY.md:L3214
The new PowerAnalysis class docstring also introduces a self-canceling panel-variance factor that is wrong even relative to the current implementation. diff_diff/power.py:L1226 diff_diff/power.py:L1325
I did not run tests in this review environment; this assessment is from diff/source inspection.

Methodology

Severity: P1. Impact: PowerAnalysis remains implemented with normal critical values (stats.norm.ppf) and a panel variance sigma^2 * (1/n_treated + 1/n_control) * (1 + (T-1)*rho) / T, plus a sample-size formula derived from that implementation, but this PR removes the REGISTRY.md “under review” note, deletes the two source-audit docs, removes the TODO tracking row, and leaves the registry asserting a t-based MDE and an R^2/cluster-size-adjusted SE instead. That turns a previously documented deviation into an undocumented methodology mismatch in the project’s authoritative contract. Bloom’s paper derives MDE from normal sampling distributions and explicitly discusses the 1-R^2 and T(1-T) terms for the cross-sectional setup; Burlig’s panel-DD formula is t-based serial-correlation-robust (SCR), with separate pre/post counts and psi terms, and the paper points to pcpanel as the companion implementation. diff_diff/power.py:L1267 diff_diff/power.py:L1276 diff_diff/power.py:L1524 docs/methodology/REGISTRY.md:L3214 docs/references.rst:L246 (bpb-us-e2.wpmucdn.com)
Concrete fix: restore a REGISTRY.md **Note:** (and either the TODO row or equivalent source-audit note) until the implementation is actually reconciled, or rewrite the registry/docstring/reference text to match the implementation exactly: z-based critical values, no R^2 term in code, and an explicit note that the panel path is a Moulton/equicorrelated simplification rather than Burlig Eq. 2.
Severity: P1. Impact: the new class docstring’s panel formula multiplies by (1 + (T-1)*rho) / (1 + (T-1)*rho), which cancels to 1 and therefore contradicts _compute_variance()’s actual / T scaling. This is a newly introduced incorrect variance formula in an in-code methodology surface. diff_diff/power.py:L1226 diff_diff/power.py:L1325
Concrete fix: update the docstring to the implemented expression, or remove the explicit formula until the analytical contract is rewritten.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The previous workflow-test maintainability concerns appear addressed by the explicit guarded-workflow path coverage and the top-level job allowlist / hard existence assertions. tests/test_openai_review.py:L2082 tests/test_openai_review.py:L2126 tests/test_openai_review.py:L2284

Tech Debt

No separate finding beyond the P1 methodology issue above; the TODO.md cleanup is premature because it removes tracking for that still-unresolved contract mismatch.

Security

No findings.

Documentation/Tests

No separate findings beyond the P1 methodology/documentation issue above. The earlier workflow-test P3s are resolved in this revision. tests/test_openai_review.py:L2274 .github/workflows/rust-test.yml:L32

Path to Approval

Reinstate a docs/methodology/REGISTRY.md **Note:** that accurately describes the current PowerAnalysis implementation, including the z-based critical values and the non-Burlig panel approximation.
Fix the PowerAnalysis class docstring formula at diff_diff/power.py:L1226 so it matches _compute_variance().
Restore the removed methodology tracking/audit material for PowerAnalysis (TODO.md row and/or the two paper-review docs), unless this PR also changes the implementation enough to close the mismatch.
Add a focused methodology regression test that locks the documented analytical formulas against _get_critical_values(), _compute_variance(), and _compute_required_n() for one 2x2 case and one panel case.

The rust-test, notebooks, and docs-tests workflows are gated on the ready-for-ci label and list labeled/unlabeled in their pull_request trigger types. Their job guards only checked that ready-for-ci was PRESENT, so churning any unrelated label (e.g. adding `bug`) on a PR that already carried ready-for-ci re-fired the entire matrix — including the multi-hour pure-Python fallback. Refine the 7 job guards across the three matrices so a labeled/unlabeled event only (re)runs when the triggering label is ready-for-ci; code pushes and the initial ready-for-ci add are unaffected. ci-gate.yml (the cheap one-step required check) is intentionally left to re-evaluate on every label event so the gate still flips red/green on add/remove. Lock the behavior with TestCiWorkflowLabelEventGuard in tests/test_openai_review.py and drop the resolved TODO row. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T17:12:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e6bd4de81026dbe2e24ff677c372e16fe30abe17

Overall Assessment

✅ Looks good

All prior P1+ issues from the previous rerun that are relevant to this revision appear resolved or are no longer present in the diff. I did not find any new unmitigated P0/P1 issues in the current workflow/test-only changeset.

Executive Summary

The ready-for-ci label-transition guard is now applied consistently across all expensive matrix jobs in .github/workflows/docs-tests.yml:L42-L49, .github/workflows/notebooks.yml:L27-L34, and .github/workflows/rust-test.yml:L62-L69, .github/workflows/rust-test.yml:L101-L108, .github/workflows/rust-test.yml:L210-L218.
The new regression tests lock the exact per-job guard, the ci-gate.yml exception, and the required labeled/unlabeled trigger types in tests/test_openai_review.py:L2107-L2356.
rust-test.yml now includes docs-tests.yml, notebooks.yml, and ci-gate.yml in both push.paths and pull_request.paths, so workflow-only edits still execute the guard-contract tests that cover them. See .github/workflows/rust-test.yml:L16-L22, .github/workflows/rust-test.yml:L43-L49, and tests/test_openai_review.py:L2082-L2104.
The previous rerun’s PowerAnalysis methodology concern is not present in the current diff; this revision does not touch estimator/math/SE/methodology surfaces.
I could not execute pytest locally in this environment because pytest is not installed, so test validation here is by source inspection and guard truth-table reasoning.

Methodology

No findings. This diff is limited to CI workflows, TODO tracking, and workflow-review tests; no estimator, weighting, variance/SE, identification, or default-behavior code changed.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the resolved PR #269 TODO row is appropriate given that the underlying behavior is now fixed and pinned by explicit regression coverage.

Security

No findings.

Documentation/Tests

No findings. The added tests are appropriately targeted to the regression being fixed and cover both the guarded workflows and the ci-gate.yml non-refined exception in tests/test_openai_review.py:L2107-L2356.

igerber force-pushed the fix/ci-label-event-filter branch from 199fccc to 36c170b Compare May 31, 2026 17:02

igerber force-pushed the fix/ci-label-event-filter branch from 36c170b to e6bd4de Compare May 31, 2026 17:09

igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026

igerber merged commit 9e96c8f into main May 31, 2026
33 of 34 checks passed

igerber deleted the fix/ci-label-event-filter branch May 31, 2026 18:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: only re-run gated matrices on ready-for-ci label transitions#509

ci: only re-run gated matrices on ready-for-ci label transitions#509
igerber merged 1 commit into
mainfrom
fix/ci-label-event-filter

igerber commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 31, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant