HAD Phase 3 follow-up: joint Stute pretest + event-study workflow #353
Conversation
…patch

Adds `stute_joint_pretest` (residuals-in core) plus `joint_pretrends_test` and `joint_homogeneity_test` data-in wrappers for the paper's step-2 (mean-independence pre-trends) and step-3 (linearity joint extension) nulls. Extends `did_had_pretest_workflow` with `aggregate="event_study"` multi-period dispatch that closes the "paper step 2 deferred" gap previously flagged on two-period reports.

Implementation highlights:

- Sum-of-CvMs aggregation (Delgado 1993; Escanciano 2006) with a shared Mammen wild-bootstrap multiplier per unit across horizons, preserving the unit-level dependence of the vector-valued empirical process (Delgado-Manteiga 2001; Hlavka-Huskova 2020).
- Per-horizon scale- and translation-invariant exact-linear short-circuit (a single degenerate horizon does not collapse the joint test).
- Reciprocal front-door guards on both wrappers: non-empty horizon list, base_period ordering, D=0 invariant (pre-trends), and D>0 existence (post-homogeneity).
- Backward-compatible `HADPretestReport` extension: new fields `pretrends_joint`, `homogeneity_joint`, and `aggregate` with defaults; `stute` and `yatchew` become Optional. `summary`, `to_dict`, `to_dataframe`, and `__repr__` branch on `aggregate` and preserve Phase 3 schemas bit-exactly on the `aggregate="overall"` path.
- Eq. (18) linear-trend detrending (paper Section 5.2, Pierce-Schott p=0.51) deferred to the Phase 4 replication harness, where the published value serves as a parity anchor; TODO row migrated accordingly.

46 new tests (115 total in tests/test_had_pretests.py) covering: K=1 parity with `stute_test`, shared-eta white-box, per-horizon short-circuit independence, the full reciprocal-validator matrix, event-study verdict priority, and serialization round-trips across aggregates. Includes regression tests asserting the "paper step 2 deferred" string is absent from any event-study verdict.

Closes TODO.md Phase 3 rows for joint Eq. 18 and multi-period workflow dispatch.
See REGISTRY.md HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and the no-joint-Yatchew acknowledgment (the paper does not derive one; multi-period Yatchew remains available per-horizon via yatchew_hr_test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
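The sum-of-CvMs aggregation with a shared Mammen multiplier described above can be sketched as follows. This is a minimal illustration of the technique under its cited references, not the package implementation; every name here (`cvm_stat`, `mammen_multipliers`, `joint_stute_pvalue`) is hypothetical.

```python
import numpy as np

def cvm_stat(doses, residuals):
    """Stute-type Cramer-von Mises statistic: mean of squared scaled
    partial sums of residuals ordered by dose."""
    order = np.argsort(doses)
    cusum = np.cumsum(residuals[order]) / np.sqrt(len(doses))
    return float(np.mean(cusum ** 2))

def mammen_multipliers(n, rng):
    """Two-point Mammen distribution: mean 0, variance 1."""
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
    a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
    return np.where(rng.random(n) < p, a, b)

def joint_stute_pvalue(doses, residual_arrays, n_boot=199, seed=0):
    """Sum CvM statistics over horizons; ONE multiplier vector per draw
    is shared across all horizons so cross-horizon dependence at the
    unit level is carried into the bootstrap null distribution."""
    rng = np.random.default_rng(seed)
    g = len(doses)
    s_joint = sum(cvm_stat(doses, r) for r in residual_arrays)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        eta = mammen_multipliers(g, rng)  # shared across horizons
        boot[b] = sum(cvm_stat(doses, eta * r) for r in residual_arrays)
    return s_joint, float(np.mean(boot >= s_joint))
```

The shared `eta` per draw is the load-bearing design choice: drawing fresh multipliers per horizon would treat the horizons as independent and distort the joint null.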
Overall Assessment: Needs changes. The highest unmitigated issues are P1s in the new joint Stute APIs.
- Performance: No findings.
- Maintainability: No additional findings beyond the dead-parameter/validator-bypass issue above.
- Tech Debt: No blockers. The Eq. 18 linear-trend detrending gap is explicitly tracked in …
- Security: No findings.
P1 - wrapper validator + first_treat_col wiring (had_pretests.py):
`joint_pretrends_test` and `joint_homogeneity_test` now route through `_validate_had_panel_event_study` when the panel has >= 3 periods, so direct wrapper calls inherit the Appendix B.2 last-cohort filter, the constant-post-dose invariant, and the staggered/no-first_treat_col raise contract. `first_treat_col` is actually wired through instead of accepted-but-ignored. Subset checks (base_period in the validated t_pre_list; pre_periods / post_periods subsets of the corresponding validated set) run after the validator, so callers get crisp errors on mistyped horizons rather than silent miscomputation.

P1 - constant-d degeneracy guard in `stute_joint_pretest`:
When `ptp(doses) <= 0` (all units share an identical dose), warn and return all-NaN inference instead of computing a mechanically zero CvM (mean-independence null - a bogus fail-to-reject) or attempting a singular `[1, d]` refit (linearity null - matrix-solve crash). Uses `np.ptp` rather than `np.var` because the variance of a constant yields ~1e-32 rounding noise that would slip past a `<= 0` comparison. Mirrors `stute_test`'s intent at single-horizon scale.

P2 - bit-exact overall-path serialization:
`HADPretestReport.__repr__`, `summary()`, and `to_dict()` now produce Phase 3-identical output when `aggregate="overall"` - no `aggregate` key in the dict, no header line in the summary, no new kwarg in the repr. The `aggregate` field remains on the dataclass internally and is surfaced in these serializations only on `aggregate="event_study"`. Restores the CHANGELOG's bit-exact compatibility claim.

P3 - regression tests + docs:
Four new tests cover the P1 edge cases: constant-d core path, direct-wrapper staggered panel (with and without `first_treat_col`), and wrapper-level constant-d propagation. REGISTRY.md and CHANGELOG.md document that step-2 closure requires >= 2 pre-periods (the base `F-1` plus at least one earlier placebo); on single-pre-period panels the workflow emits `pretrends_joint=None` with a skip note in the verdict and `all_pass=False`.

Existing tests updated for the new validator path: the pre-period D=0 and all-zero post-period checks now fire via the event-study validator's staggered-cohort or contiguous-dose guards before the wrapper's local reciprocal guards can run; regex matchers were widened to accept either error surface. `test_to_dict_overall_preserves_phase3_schema` now asserts the ABSENCE of the `aggregate` key on the overall path to match the restored bit-exact schema.

119 tests pass (115 + 4 new R1 regressions); black/ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

Verification note: I did not execute the test suite in this shell because the available Python environment is missing project dependencies such as …
P1 - ordered-categorical chronology: raw `t < base_period` /
`t > base_period` comparisons in `joint_pretrends_test`,
`joint_homogeneity_test`, and `did_had_pretest_workflow(aggregate=
"event_study")` silently misorder ordered-categorical time columns
whose lexical and chronological order disagree (e.g. categories
["q1", "q2", "q10"] sort lexically as "q1" < "q10" < "q2"). On
such panels the raw comparison could (a) silently drop valid
pre-period horizons via the raw `<` check, (b) emit a spurious
"joint pre-trends skipped" verdict from the workflow's `earlier_pre`
filter, or (c) raise on valid post-period inputs.
Fix: new private helper `_build_period_rank` returns a
{period_label: chronological_rank} map using the ordered-
categorical category order when applicable, natural sort on
numeric / datetime otherwise. Both wrappers compare period labels
via rank (`rank[t1] < rank[t2]`) instead of raw Python `<`/`>`.
The workflow's `earlier_pre` replaces the raw-< filter with
`list(t_pre_list[:-1])` - `t_pre_list` is already chronologically
sorted by the validator (via its `_sort_key`), so excluding the
last element yields the earlier pre-periods regardless of dtype.
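The rank-map idea can be sketched as follows, assuming the helper's behavior matches the description above. `build_period_rank` here is illustrative (the real helper is private), and the non-categorical fallback uses plain `sorted` rather than the full natural sort the commit describes.

```python
import pandas as pd

def build_period_rank(time_values):
    """Map each period label present in `time_values` to its
    chronological rank.

    Ordered-categorical columns use their declared category order, so
    "q2" ranks before "q10" even though it sorts after lexically.
    Numeric / datetime columns fall back to sorted order.
    """
    dtype = time_values.dtype
    if isinstance(dtype, pd.CategoricalDtype) and dtype.ordered:
        present = set(time_values)
        labels = [c for c in dtype.categories if c in present]
    else:
        labels = sorted(pd.unique(time_values))
    return {label: rank for rank, label in enumerate(labels)}
```

With such a map, wrappers compare `rank[t1] < rank[t2]` instead of raw `t1 < t2`, which is exactly what defuses the `"q1" < "q10" < "q2"` lexical trap.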
P3 - ordered-categorical regression tests: new
`TestOrderedCategoricalChronology` class (4 tests) with a fixture
using categories `["q1", "q2", "q10", "post"]`. Covers (a) direct
pretrends wrapper picks up both earlier placebos, (b) pretrends
wrapper rejects lexically-ordered-but-chrono-invalid input (e.g.
pre=["q10"], base="q2"), (c) homogeneity wrapper accepts valid
post-period input, (d) workflow event-study dispatch surfaces both
earlier placebos in `pretrends_joint.horizon_labels` without the
false skip note.
123 tests pass (119 + 4 new); black/ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

Verification note: I did not execute the test suite in this shell because the available Python environment is missing project dependencies such as …
P1 - row-level non-negative-dose guard in `_aggregate_for_joint_test`:
On a 2-period direct call to `joint_pretrends_test` or
`joint_homogeneity_test`, the n_periods < 3 path skips
`_validate_had_panel_event_study` (which requires >= 3 periods) and
falls through to `_aggregate_for_joint_test`. That helper collapsed
unit dose via `groupby(unit_col)[dose_col].max()`, which silently
recodes a negative post dose to 0 (`max(0, -d) = 0` for positive
pre-period d), allowing finite joint-Stute output on data that
violates the HAD support restriction `D_{g,t} >= 0` (paper Section 2).
Fix: add a row-level `dose_col >= 0` check in
`_aggregate_for_joint_test` BEFORE the groupby/max collapse.
Centralizes the guard so both data-in wrappers inherit it on the
n_periods < 3 fallback path. The multi-period path already enforces
the same invariant via `_validate_had_panel_event_study`, so the
contract is consistent across both wrapper dispatch modes.
P3 - regression test: new
`TestJointHomogeneityTest::test_two_period_negative_post_dose_raises`
constructs a 2-period panel with a single unit carrying a negative
post dose and asserts the wrapper raises `ValueError` with the
"negative dose value" substring rather than producing a finite
statistic via the groupby-max collapse.
124 tests pass (123 + 1 new R3 regression); black/ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

- Performance: No findings.
- Maintainability: No findings beyond the duplicate-warning issue above.
- Tech Debt: No untracked tech-debt blockers. The Eq. 18 detrending deferment is already tracked in …
- Security: No findings.
P1 - stringified-label collision guard in stute_joint_pretest:
The core indexed residuals_arrays / fitted_arrays by `str(k)` with
no uniqueness check on the stringified keys. Two distinct raw keys
whose str() forms collide (e.g. {1: ..., "1": ...} both stringify
to "1", or custom objects with identical __str__) would silently
overwrite one entry and then be double-counted in S_joint =
sum(S_k) because the surviving horizon's statistic gets summed
twice while n_horizons still reports K=2. That produces wrong
methodology output with no diagnostic.
Fix: compute the stringified labels once up front and reject any
collision explicitly with a ValueError listing which raw keys
collide to which stringified form. Centralizes the check before
any residual/fitted array is dropped. Replaces the ad-hoc post-hoc
re-keying with a reuse of the pre-computed collision-free list.
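The collision check can be sketched like this; the function name and exact message wording are illustrative.

```python
def stringify_horizon_labels(raw_keys):
    """str() each horizon key once, rejecting collisions up front.

    Without this, {1: ..., "1": ...} would silently overwrite one
    residual array under the shared label "1" and the survivor's CvM
    statistic would be double-counted in S_joint = sum(S_k) while
    n_horizons still reported K=2.
    """
    labels = [str(k) for k in raw_keys]
    seen = {}
    for raw, label in zip(raw_keys, labels):
        if label in seen:
            raise ValueError(
                f"horizon label collision after str(): {seen[label]!r} "
                f"and {raw!r} both map to {label!r}"
            )
        seen[label] = raw
    return labels
```

Computing the labels once and validating them before any array is re-keyed keeps the residual and fitted dicts aligned with `n_horizons` by construction.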
P3 - dedupe staggered-filter UserWarning:
`_validate_had_panel_event_study` already warns on the staggered
auto-filter path; both joint-pretest wrappers and the event-study
workflow were re-emitting the same information with a
wrapper-prefixed message. Each staggered call therefore surfaced
two warnings to the user. Removes the secondary emissions;
wrappers now consume `_filter_info` silently. Existing tests still
pass because the validator's own `"Staggered-timing panel
detected"` message satisfies the regex matchers.
P3 - collision regression test: new
`TestStuteJointPretest::test_stringified_key_collision_raises`
exercises (a) the int 1 + str "1" case and (b) a pair of custom
objects with identical __str__ but distinct hash; both must raise
`ValueError` with "collision after str" in the message.
125 tests pass (124 + 1 new R4 collision regression); black/ruff/
mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
P1 - stute_joint_pretest G<_MIN_G_STUTE warn+NaN contract:
The joint core raised `ValueError` on G < 10, while single-horizon
`stute_test` emits a `UserWarning` and returns a NaN result on the
same condition. Because the event-study workflow dispatches into
the joint core for both step-2 pre-trends and step-3 homogeneity,
a staggered panel whose last-cohort auto-filter leaves fewer than
10 units would now crash the workflow instead of surfacing an
inconclusive report - a regression versus Phase 3's two-period
behavior.
Fix: mirror the single-horizon contract. Emit `UserWarning`
("below the minimum ... Returning NaN result") and return a
`StuteJointResult` with `cvm_stat_joint=nan`, `p_value=nan`,
`reject=False`, and a full-NaN `per_horizon_stats` dict keyed by
the validated horizon labels (so the diagnostic surface is
consistent with the NaN-propagation branch). `n_bootstrap <
_MIN_N_BOOTSTRAP` and non-numeric `alpha` still raise; only the
small-G branch relaxes.
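The warn-and-NaN contract can be sketched as follows; the result shape mirrors the fields named above, but the helper itself (`check_min_units`) is illustrative.

```python
import math
import warnings

_MIN_G_STUTE = 10  # minimum units, mirroring the single-horizon test

def check_min_units(n_units, horizon_labels):
    """Return None when G is sufficient; otherwise warn and return a
    full-NaN result dict so a caller (e.g. an event-study workflow)
    surfaces an inconclusive report instead of crashing."""
    if n_units >= _MIN_G_STUTE:
        return None
    warnings.warn(
        f"G={n_units} is below the minimum of {_MIN_G_STUTE}. "
        "Returning NaN result.",
        UserWarning,
    )
    return {
        "cvm_stat_joint": math.nan,
        "p_value": math.nan,
        "reject": False,
        "per_horizon_stats": {k: math.nan for k in horizon_labels},
    }
```

Keying `per_horizon_stats` by the validated horizon labels keeps the diagnostic surface identical to the NaN-propagation branch, as the fix describes.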
Test updates:
- `test_small_G_raises` renamed to `test_small_G_warns_returns_nan`
and rewritten to assert the new contract.
- New `test_event_study_small_panel_after_filter_inconclusive_not_
crash` covers the workflow-level regression: a staggered fixture
with 40 early-cohort + 6 late-cohort units filters to G=6 after
the validator's last-cohort auto-filter; `did_had_pretest_
workflow(aggregate="event_study")` now completes without
exception, emits the "below the minimum" warning, and surfaces a
NaN joint-Stute report with `all_pass=False`.
P3 - module docstring refresh:
`had_pretests.py` top-level docstring still said Phase 3 shipped
steps 1 + 3 only, that step 2 was deferred, and that
`did_had_pretest_workflow` was a two-period-only entry point. That
drifted after the joint-pretest follow-up landed. Rewrote the
docstring to describe: (a) the three single-horizon tests, (b) the
three new joint helpers (`stute_joint_pretest`,
`joint_pretrends_test`, `joint_homogeneity_test`), (c) both
workflow dispatch modes (`aggregate="overall"` two-period and
`aggregate="event_study"` multi-period), and (d) the narrowed
deferment - only Eq. 18 linear-trend detrending remains, tracked
in TODO for Phase 4 alongside the Pierce-Schott replication.
126 tests pass (125 + 1 new R5 workflow regression; one existing test
converted from raise to warn); black/ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

Overall Assessment: ✅ Looks good
P3 - stute_joint_pretest docstring drift:
The Raises block still listed `G < _MIN_G_STUTE` as a ValueError condition, but R5 converted that branch to a UserWarning + full-NaN `StuteJointResult` return to match single-horizon `stute_test` and keep the event-study workflow from crashing on staggered-filtered small panels.

Fix: rewrote the Returns and Raises docstring blocks to describe the actual contract. Returns now enumerates the three NaN-result branches (small G, constant dose, any-NaN residuals / fitted) with their warning behavior. Raises is narrowed to the genuinely raising conditions: empty input, key mismatch, str-label collision, shape mismatch, negative doses, too few bootstrap replicates, invalid alpha. Explicitly notes that small G does NOT raise.

No code changes; docstring-only edit. 126 tests still pass; black/ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

Overall Assessment: ✅ Looks good
P2 - explicit ValueError on singular design_matrix:
`stute_joint_pretest` previously surfaced a raw
`np.linalg.LinAlgError` to direct callers when `design_matrix` was
rank-deficient (e.g. duplicate columns), breaking the front-door
validation style of the rest of the function. Wrap the
`np.linalg.solve(X.T @ X, X.T)` precompute in a try/except and
re-raise as `ValueError` with a message naming the likely cause
(linearly-dependent columns) and the shape.
Regression: new
`TestStuteJointPretest::test_singular_design_matrix_raises_valueerror`
constructs a (G, 2) design with two identical columns and asserts
the explicit `ValueError("rank-deficient")`.
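The wrap-and-re-raise pattern can be sketched as follows; `precompute_projector` is an illustrative name for the precompute step, not the package's internal.

```python
import numpy as np

def precompute_projector(design_matrix):
    """Solve (X'X) Z = X' for the least-squares projector Z, naming
    the likely cause and the offending shape when X'X is singular."""
    X = np.asarray(design_matrix, dtype=float)
    try:
        return np.linalg.solve(X.T @ X, X.T)
    except np.linalg.LinAlgError as err:
        raise ValueError(
            f"design_matrix of shape {X.shape} is rank-deficient "
            "(likely linearly-dependent columns); cannot invert X'X."
        ) from err
```

Chaining with `from err` keeps the original `LinAlgError` in the traceback while giving direct callers the same `ValueError` front door as the rest of the function's validation.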
P3 - Yatchew "step 4" -> "step 3 alternative" docstring drift:
Two docstrings (module header and `_compose_verdict_event_study`)
referred to the Yatchew-HR test as "step 4". Paper
Section 4.2-4.3 defines step 4 as the final admissibility decision
("use TWFE if none of the tests rejects"), not a separate
diagnostic; Yatchew is the alternative linearity test within step 3.
Updated both docstrings to describe Yatchew as the step-3
alternative (subsumed by joint Stute on the event-study path) and
clarified that paper step 4 has no separate code path.
P3 - `joint_homogeneity_test` post_periods docstring:
Text said `>= base_period` but the actual guard is strict `>
base_period` in chronological order. Tightened the Parameters
block to match.
127 tests pass (126 + 1 new R7 regression); black/ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)

Overall Assessment: ✅ Looks good
…ression
Closes the second P1 from the review. Python `_compute_observation_weights`
had an extra `valid_control_at_t = D[t, :] == 0` gate that zeroed ω_j for
units treated at the target period (other than the target unit itself).
Rust's `compute_weight_matrix` has no such gate — per the paper's Eq. 2/3
and `docs/methodology/REGISTRY.md` TROP section, `ω_j = exp(-λ_unit ×
dist(j, i))` is distance-based for all `j ≠ i` and the treated-cell
exclusion is the `(1 - W_{js})` factor applied inside `_estimate_model`
via the control mask, not an extra target-period unit-weight gate.
The empirical impact of removing the gate is zero on the ATT point
estimate: same-cohort donors' pre-treatment rows are exactly absorbed
by their own unit fixed effect `alpha_j` without propagating into
`mu`, `beta`, or other units' parameters — adding them to the fit
changes which rows are scored but not the solution the fit converges
to. Verified: the flipped bootstrap-seed parity test, the main-fit
parity test at `lambda_nn=inf` (`atol=1e-14`) and at `lambda_nn=0.1`
(`atol=1e-10`), and the new same-cohort regression test (below) all
pass before and after the gate removal. The change is structural
alignment with the paper and Rust, not a numerical behavior shift.
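Stripped of the removed gate, the retained weight rule reduces to a one-liner; this sketch assumes a precomputed distance vector and is not the repository's `_compute_observation_weights`.

```python
import numpy as np

def observation_weights(dist_to_target, lambda_unit):
    """omega_j = exp(-lambda_unit * dist(j, i)) for every donor j.

    No `D[t, :] == 0` gate here: per Eq. 2/3, treated-cell exclusion
    is handled downstream by the (1 - W_js) control mask inside the
    model fit, not by zeroing unit weights at the target period.
    """
    return np.exp(-lambda_unit * np.asarray(dist_to_target, dtype=float))
```

Note that `dist(i, i) = 0` gives the target unit weight 1 mechanically; which rows of the target actually enter the fit is again a matter for the control mask, not these weights.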
Test addition
-------------
`TestTROPRustEdgeCaseParity::test_local_method_same_cohort_donor_parity`
isolates the scenario the gate used to handle differently from Rust: a
fixture with three treated units sharing one cohort (all treated at
`t=5`) and three controls. Before the gate was removed, Python's and
Rust's same-target-period donors diverged in which rows contributed to
the fit; the tests prove the ATT point estimate was never affected
(pre-treatment rows absorbed by `alpha_j`), and now both backends also
agree structurally. Parametrized over the same regime split as the
main-fit parity test (`lambda_nn=inf` → `atol=1e-14`, `lambda_nn=0.1`
→ `atol=1e-10`).
Note on the other P1 in the review (HAD rollback claim): that finding
was a phantom caused by a stale branch base — PR #353 (HAD joint Stute
pretest) landed on `origin/main` between this branch's cut and the
review run, so the PR diff against current `origin/main` appeared to
"delete" the PR #353 additions. Resolved by rebasing onto the updated
`origin/main` before this push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Adds joint Stute helpers (`stute_joint_pretest` + `joint_pretrends_test` + `joint_homogeneity_test`) to close the paper step-2 gap that the Phase 3 `did_had_pretest_workflow` flagged with an "Assumption 7 pre-trends test NOT run" caveat.
- Extends `did_had_pretest_workflow` with `aggregate="event_study"` multi-period dispatch (QUG + joint pre-trends + joint homogeneity-linearity); `aggregate="overall"` preserves Phase 3 behavior bit-exactly.
- Eq. (18) linear-trend detrending (paper Section 5.2, Pierce-Schott `p=0.51`) DEFERRED to the Phase 4 replication harness, where the published value serves as a parity anchor.
- Paper: de Chaisemartin, Ciccia, D'Haultfœuille, Knau (2026, arXiv:2405.04465v6), Sections 4.2-4.3 + 5.2.
Test plan
- `pytest tests/test_had_pretests.py` (115 tests total — 69 existing Phase 3 + 46 new covering core, wrappers, workflow, and serialization).
- `black`, `ruff`, `mypy` on `diff_diff/had_pretests.py`, `diff_diff/__init__.py`, `tests/test_had_pretests.py`.
- `aggregate="overall"` path is bit-exact with Phase 3 (`TestMultiPeriodWorkflow::test_overall_aggregate_unchanged`).
- Event-study verdicts no longer contain the "paper step 2 deferred" string (`TestMultiPeriodWorkflow::test_no_paper_step_2_deferred_string_on_event_study`).
- Shared bootstrap multipliers across horizons verified white-box (`TestStuteJointPretest::test_shared_eta_across_horizons_white_box`).

Notes
- `HADPretestReport.stute` and `.yatchew` are now `Optional` because the event-study path emits `None` on those fields. `aggregate="overall"` always populates them, so Phase 3 consumers are unaffected. A handful of existing tests add explicit `is not None` narrowing assertions.
- The paper derives no joint Yatchew test, so multi-period Yatchew remains available by running `yatchew_hr_test` on each `(base, post)` pair manually. REGISTRY.md documents this asymmetry.

🤖 Generated with Claude Code