Skip to content

ci: only re-run gated matrices on ready-for-ci label transitions#509

Merged
igerber merged 1 commit into
mainfrom
fix/ci-label-event-filter
May 31, 2026
Merged

ci: only re-run gated matrices on ready-for-ci label transitions#509
igerber merged 1 commit into
mainfrom
fix/ci-label-event-filter

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 31, 2026

Summary

  • CI trigger refinement. The three gated matrices (rust-test.yml, notebooks.yml, docs-tests.yml) list labeled/unlabeled in their pull_request trigger and gate jobs on the ready-for-ci label. Their guards only checked the label was present, so churning any unrelated label (e.g. adding bug) on a PR that already carried ready-for-ci re-fired the entire matrix — including the multi-hour pure-Python fallback. The refined guard skips a labeled/unlabeled event unless the triggering label is ready-for-ci; code pushes and the initial ready-for-ci add are unaffected.
  • ci-gate.yml deliberately untouched. It is the cheap one-step required check and must keep re-evaluating on every label event to flip the gate red/green when ready-for-ci is added/removed.
  • Regression coverage. New TestCiWorkflowLabelEventGuard parses each guarded workflow and exact-matches the per-job if: guard, locks ci-gate.yml's presence-based contract (parsed trigger types + named-step if:), and asserts the matrices keep labeled/unlabeled in their trigger types. TestRustTestWorkflowPathFilter is extended so the guarded workflows stay under the suite's path filters (a workflow-only edit still runs the locks).
  • TODO cleanup. Removes the now-resolved tech-debt row (Add label-gated CI and documentation dependency map #269).

Methodology references (required if estimator / math changes)

  • Method name(s): N/A - no methodology changes (CI workflow + test-harness only; no diff_diff/, rust/src/, or docs/methodology/ files touched)
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: tests/test_openai_review.py (TestCiWorkflowLabelEventGuard; extended TestRustTestWorkflowPathFilter). Full file passes (243 tests). Edited workflows validated to parse and fold to the intended if: expression.
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

Looks good. I did not find any unmitigated P0 or P1 issues.

Executive Summary

Methodology

No findings. No methodology files or estimator code are touched.

Code Quality

No findings. The implementation is small, consistent across the affected jobs, and the guard logic matches the intended contract.

Performance

No findings. The main effect is positive: unrelated non-ready-for-ci label churn should stop re-fanning out into the expensive matrices. .github/workflows/rust-test.yml:L64-L69, .github/workflows/notebooks.yml:L29-L34, .github/workflows/docs-tests.yml:L44-L49

Maintainability

  • Severity: P3. Impact: TestCiWorkflowLabelEventGuard only checks the jobs enumerated in EXPECTED_JOBS, so a future new top-level job added to one of these workflows could miss the refined guard without failing this suite. Concrete fix: assert the discovered top-level job set matches an explicit allowlist, or assert every non-exempt job in each guarded workflow carries the expected if:. tests/test_openai_review.py:L2126-L2130, tests/test_openai_review.py:L2260-L2277

Tech Debt

No findings. Removing the resolved TODO entry looks justified given the consistent workflow changes and the added regression coverage.

Security

No findings. I did not see any secret-handling, permission, or injection regression in the changed workflows.

Documentation/Tests

Validation note: I could not run tests/test_openai_review.py locally because pytest is not installed in this review environment; the assessment above is based on source inspection and manual guard evaluation.

@igerber igerber force-pushed the fix/ci-label-event-filter branch from 199fccc to 36c170b Compare May 31, 2026 17:02
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 36c170b4f751b789d0d45de0128802843f502376


Overall Assessment

⚠️ Needs changes

The workflow-guard re-review items look resolved, but this PR introduces one unmitigated P1 on the PowerAnalysis methodology contract: it removes the registry/TODO/source-audit disclosures for a known analytical-power mismatch without reconciling the underlying implementation.

Executive Summary

Methodology

  • Severity: P1. Impact: PowerAnalysis remains implemented with normal critical values (stats.norm.ppf) and a panel variance sigma^2 * (1/n_treated + 1/n_control) * (1 + (T-1)*rho) / T, plus a sample-size formula derived from that implementation, but this PR removes the REGISTRY.md “under review” note, deletes the two source-audit docs, removes the TODO tracking row, and leaves the registry asserting a t-based MDE and an R^2/cluster-size-adjusted SE instead. That turns a previously documented deviation into an undocumented methodology mismatch in the project’s authoritative contract. Bloom’s paper derives MDE from normal sampling distributions and explicitly discusses the 1-R^2 and T(1-T) terms for the cross-sectional setup; Burlig’s panel-DD formula is t-based serial-correlation-robust (SCR), with separate pre/post counts and psi terms, and the paper points to pcpanel as the companion implementation. diff_diff/power.py:L1267 diff_diff/power.py:L1276 diff_diff/power.py:L1524 docs/methodology/REGISTRY.md:L3214 docs/references.rst:L246 (bpb-us-e2.wpmucdn.com)
    Concrete fix: restore a REGISTRY.md **Note:** (and either the TODO row or equivalent source-audit note) until the implementation is actually reconciled, or rewrite the registry/docstring/reference text to match the implementation exactly: z-based critical values, no R^2 term in code, and an explicit note that the panel path is a Moulton/equicorrelated simplification rather than Burlig Eq. 2.

  • Severity: P1. Impact: the new class docstring’s panel formula multiplies by (1 + (T-1)*rho) / (1 + (T-1)*rho), which cancels to 1 and therefore contradicts _compute_variance()’s actual / T scaling. This is a newly introduced incorrect variance formula in an in-code methodology surface. diff_diff/power.py:L1226 diff_diff/power.py:L1325
    Concrete fix: update the docstring to the implemented expression, or remove the explicit formula until the analytical contract is rewritten.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The previous workflow-test maintainability concerns appear addressed by the explicit guarded-workflow path coverage and the top-level job allowlist / hard existence assertions. tests/test_openai_review.py:L2082 tests/test_openai_review.py:L2126 tests/test_openai_review.py:L2284

Tech Debt

No separate finding beyond the P1 methodology issue above; the TODO.md cleanup is premature because it removes tracking for that still-unresolved contract mismatch.

Security

No findings.

Documentation/Tests

No separate findings beyond the P1 methodology/documentation issue above. The earlier workflow-test P3s are resolved in this revision. tests/test_openai_review.py:L2274 .github/workflows/rust-test.yml:L32

Path to Approval

  1. Reinstate a docs/methodology/REGISTRY.md **Note:** that accurately describes the current PowerAnalysis implementation, including the z-based critical values and the non-Burlig panel approximation.
  2. Fix the PowerAnalysis class docstring formula at diff_diff/power.py:L1226 so it matches _compute_variance().
  3. Restore the removed methodology tracking/audit material for PowerAnalysis (TODO.md row and/or the two paper-review docs), unless this PR also changes the implementation enough to close the mismatch.
  4. Add a focused methodology regression test that locks the documented analytical formulas against _get_critical_values(), _compute_variance(), and _compute_required_n() for one 2x2 case and one panel case.

The rust-test, notebooks, and docs-tests workflows are gated on the
ready-for-ci label and list labeled/unlabeled in their pull_request
trigger types. Their job guards only checked that ready-for-ci was
PRESENT, so churning any unrelated label (e.g. adding `bug`) on a PR that
already carried ready-for-ci re-fired the entire matrix — including the
multi-hour pure-Python fallback.

Refine the 7 job guards across the three matrices so a labeled/unlabeled
event only (re)runs when the triggering label is ready-for-ci; code
pushes and the initial ready-for-ci add are unaffected. ci-gate.yml (the
cheap one-step required check) is intentionally left to re-evaluate on
every label event so the gate still flips red/green on add/remove.

Lock the behavior with TestCiWorkflowLabelEventGuard in
tests/test_openai_review.py and drop the resolved TODO row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the fix/ci-label-event-filter branch from 36c170b to e6bd4de Compare May 31, 2026 17:09
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e6bd4de81026dbe2e24ff677c372e16fe30abe17


Overall Assessment

✅ Looks good

All prior P1+ issues from the previous rerun that are relevant to this revision appear resolved or are no longer present in the diff. I did not find any new unmitigated P0/P1 issues in the current workflow/test-only changeset.

Executive Summary

Methodology

No findings. This diff is limited to CI workflows, TODO tracking, and workflow-review tests; no estimator, weighting, variance/SE, identification, or default-behavior code changed.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the resolved PR #269 TODO row is appropriate given that the underlying behavior is now fixed and pinned by explicit regression coverage.

Security

No findings.

Documentation/Tests

No findings. The added tests are appropriately targeted to the regression being fixed and cover both the guarded workflows and the ci-gate.yml non-refined exception in tests/test_openai_review.py:L2107-L2356.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026
@igerber igerber merged commit 9e96c8f into main May 31, 2026
33 of 34 checks passed
@igerber igerber deleted the fix/ci-label-event-filter branch May 31, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant