Surface row-count for axis-E silent coercion / drop sites #331
Overall Assessment: The estimator math / weighting / SE behavior touched here looks fine.
Three findings from the CI AI review on PR #331:

- P1 (newly identified): `ContinuousDiD.fit()` still silently rewrote `dose` to 0.0 for rows with `first_treat == 0` but nonzero dose, in the same preprocessing block the PR had otherwise cleaned up. Counting those rows and emitting a `UserWarning` with the row count before the zeroing closes the axis-E slice for this estimator. REGISTRY updated with a matching **Note:** and two regression tests added (silent on clean data, warns with correct row count on polluted data).
- P2 (warning wording): The axis-E excess-drop warning in `_compute_outcome_changes` hardcoded "check_parallel_trends", but the function is called from `check_parallel_trends_robust` and `equivalence_test_trends`, not from the simple `check_parallel_trends`. Add a `caller_label` kwarg and thread the correct name from each public entry point; regression tests cover both callers surfacing the warning with their own name.
- P2 (inf asymmetry): The inf-first-treat counter used `np.isinf`, which counts both +inf and -inf, but the actual recode on the next line (`.replace([np.inf, float("inf")], 0)`) only touches +inf. A -inf input would have triggered a warning about "treating as never-treated" even though no such recode happened. Switch to `np.isposinf` so the warning count matches the recode set; add a test confirming -inf does not fire the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
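The `np.isinf` vs `np.isposinf` asymmetry is easy to demonstrate in isolation. A minimal sketch (not the library's code; the `first_treat` values are made up) showing why the old counter over-reports relative to a +inf-only recode:

```python
import numpy as np
import pandas as pd

first_treat = pd.Series([np.inf, -np.inf, 5.0, 0.0])

n_isinf = int(np.isinf(first_treat).sum())      # counts +inf AND -inf
n_posinf = int(np.isposinf(first_treat).sum())  # counts only +inf

# The recode itself only touches +inf, exactly as in the PR:
recoded = first_treat.replace([np.inf, float("inf")], 0)
n_recoded = int((first_treat != recoded).sum())  # rows actually changed

assert n_posinf == n_recoded  # warning count matches the recode set
assert n_isinf != n_recoded   # the old counter would over-report
```

The -inf row compares equal to itself after `.replace`, so it never enters `n_recoded`, which is precisely why warning on it was misleading.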
/ai-review
🔁 AI review rerun (requested by @igerber): the prior re-review findings on …
/ai-review
🔁 AI review rerun (requested by @igerber)
/ai-review
🔁 AI review rerun (requested by @igerber)

Overall Assessment: ✅ Looks good

- Code Quality: No findings.
- Performance: No findings.
- Maintainability: No findings.
- Tech Debt: No findings. I did not identify a new deferrable item in the changed paths that needs …
- Security: No findings.
- Documentation/Tests: Static-review note: test execution was not possible here because …
Three long-standing axis-E (silent data sanitization) sites now emit a `UserWarning` with the affected row count before the coercion, so users can no longer be silently shifted between the treated and control sides of an estimator, or have rows vanish from a diagnostic pass, without a signal.

- WooldridgeDiD: NaN cohort values were filled to 0 (never-treated) both in `_filter_sample` and in `fit()`. Both now warn with the NaN row count before the fillna (finding #24).
- ContinuousDiD: `first_treat=inf` was replaced with 0 silently. `fit()` now counts inf rows and warns before the replace, before any downstream drop-zero-dose / negative-dose validation (finding #26).
- `_compute_outcome_changes` (the `check_parallel_trends` diff helper) dropped NaN first-differences without reporting the count. It now distinguishes the expected first-period-per-unit drops from excess drops caused by gaps / NaN outcomes and warns with the breakdown when excess drops are detected (finding #27).

Finding #25 (TROP D-matrix coercion) was verified during scoping to already be resolved in `trop_local.py:60-66` via `n_missing_structural` + returned `missing_mask`; no code change required.

REGISTRY updated under both WooldridgeDiD and ContinuousDiD to document the new warning contract. Covered by audit axis E (data sanitization). Findings #24, #26, #27 from docs/audits/silent-failures-findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
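The warn-before-coerce contract described above can be sketched as follows. This is illustrative, not the library's implementation; the function name `normalize_first_treat` is hypothetical, and only the `first_treat` column name comes from the PR:

```python
import warnings
import numpy as np
import pandas as pd

def normalize_first_treat(df: pd.DataFrame) -> pd.DataFrame:
    """Count affected rows and warn BEFORE the inf -> 0 recode,
    so the coercion is never silent (hypothetical sketch)."""
    n_inf = int(np.isposinf(df["first_treat"]).sum())
    if n_inf:
        warnings.warn(
            f"{n_inf} row(s) have first_treat=inf; "
            "treating as never-treated (first_treat=0).",
            UserWarning,
            stacklevel=2,
        )
    out = df.copy()
    out["first_treat"] = out["first_treat"].replace([np.inf], 0)
    return out

df = pd.DataFrame(
    {"unit": [1, 1, 2, 2], "first_treat": [np.inf, np.inf, 2.0, 2.0]}
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clean = normalize_first_treat(df)

user_msgs = [str(w.message) for w in caught
             if issubclass(w.category, UserWarning)]
assert any("2 row(s)" in m for m in user_msgs)  # row-level count
assert (clean["first_treat"] == [0, 0, 2, 2]).all()
```

The key design point is ordering: the count is taken from the untouched column, so the warning reports exactly the rows the subsequent `.replace` will change.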
- P2 (wooldridge): Extract shared `_warn_and_fill_nan_cohort(df, cohort, stacklevel)` helper used by both `_filter_sample` and `fit()`. Removes the copy-paste warning block that was flagged as a future drift risk.
- P2 (tests): Add `test_inf_first_treat_warning_counts_rows_not_units` on a 4-unit x 3-period panel. 2 units carry inf across all 3 periods (6 inf rows, 2 inf units): the warning must report 6, not 2, because `.replace(inf, 0)` is row-level.
- P3 (utils wording): The `_compute_outcome_changes` excess-drop warning said "gaps or NaN outcomes" but the code actually counts all NaN first-differences. Rephrased to "additional NaN first-differences (e.g. NaN outcomes or unit-period gaps upstream)" so the message doesn't over-claim what the helper can detect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI AI re-review flagged (P1) that the previous commit claimed "-inf will be rejected by downstream validators" in both the code comment and REGISTRY.md, but no such validator existed. After the `+inf -> 0` normalization, `first_treat < 0` units fell out of both the treated (g > 0) and never-treated (g == 0) masks, so the affected units were silently excluded from the estimator: exactly the axis-E silent failure the PR was closing.

- `ContinuousDiD.fit()` now validates `first_treat < 0` explicitly post-normalization and raises `ValueError` with the row count. -inf, -2, and any other negative value are all rejected.
- REGISTRY.md note rewritten to match the implemented behavior.
- Existing -inf test replaced with one that asserts `pytest.raises(ValueError)` matching the row-count message, plus a positive regression test confirming the +inf warning stays silent on panels with only valid 0/positive `first_treat` values.
- `tests/test_utils.py::test_silent_on_balanced_panel` tightened: the balanced-panel silence assertion now filters on any warning containing "dropped", so a regression that changed the warning label would no longer hide a genuine drop signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
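The post-normalization check can be sketched as below. The function name `validate_first_treat` is hypothetical and the error wording is illustrative; only the `first_treat < 0` condition and the row-count contract come from the commit:

```python
import numpy as np
import pandas as pd

def validate_first_treat(first_treat: pd.Series) -> None:
    """After the +inf -> 0 normalization, any negative value
    (including -inf) satisfies neither g > 0 nor g == 0, so
    reject it loudly instead of dropping it silently (sketch)."""
    n_negative = int((first_treat < 0).sum())
    if n_negative:
        raise ValueError(
            f"{n_negative} row(s) have negative first_treat; these "
            "units belong to neither the treated nor the "
            "never-treated pool."
        )

validate_first_treat(pd.Series([0.0, 3.0, 0.0]))  # valid panel: no error

msg = ""
try:
    validate_first_treat(pd.Series([-np.inf, -2.0, 1.0]))
except ValueError as e:
    msg = str(e)
assert "2 row(s)" in msg  # -inf and -2 both counted
```

Note the check is placed after normalization on purpose: +inf has already been recoded to 0 (with its own warning), so anything still negative is genuinely unclassifiable rather than a value awaiting recode.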
…erage
Two new P1 findings from CI AI re-review:
1. ContinuousDiD.fit() still accepted NaN `first_treat` values. NaN
survives preprocessing but satisfies neither the treated (g > 0)
nor never-treated (g == 0) mask, so affected units were silently
excluded from both pools — same silent-failure shape as the
already-rejected `first_treat < 0`. Reject NaN explicitly with
ValueError + row count.
2. `StaggeredTripleDifference.fit()` silently recoded
`first_treat=np.inf -> 0` via `.replace([np.inf, float("inf")], 0)`.
Its sibling `staggered.py:1508-1519` already surfaces this with a
UserWarning; this PR mirrors that contract so the estimator no
longer shifts units between treated and never-treated pools
without signaling. REGISTRY gets a matching **Note:** under the
StaggeredTripleDifference section.
Regression tests:
- test_nan_first_treat_raises_with_row_count (ContinuousDiD) on a
4-unit x 3-period panel with 3 NaN rows.
- test_inf_first_treat_works (StaggeredTripleDifference) upgraded
from silent-success to `pytest.warns(UserWarning, match=row-count)`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
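The mask-exclusion shape described in finding 1 is directly observable in pandas, because NaN compares false against everything. Illustrative values only:

```python
import numpy as np
import pandas as pd

g = pd.Series([np.nan, 0.0, 2.0])  # NaN, never-treated, treated

treated = g > 0         # NaN -> False
never_treated = g == 0  # NaN -> False

# The NaN unit falls through both masks and would vanish silently:
in_neither = ~(treated | never_treated)
assert int(in_neither.sum()) == 1
```

This is why the fix raises on NaN rather than warning: unlike inf, there is no sensible pool to recode a NaN cohort into at this stage.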
d103eec to 65d1bd8
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API; BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview): practitioner-ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation: kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion: cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review: docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; the perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains the v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `wooldridge.py`: NaN cohort values were silently filled to 0 (never-treated) in both `_filter_sample` and `fit()`. Extract a shared `_warn_and_fill_nan_cohort(df, cohort, stacklevel)` helper so both entry paths emit one `UserWarning` reporting the affected row count before the `.fillna(0)`.
- `continuous_did.py`: `first_treat=inf` was silently coerced to 0 via `.replace([inf, -inf], 0)`. `fit()` now counts inf rows and emits a `UserWarning` with the row count before the coercion.
- `_compute_outcome_changes` (`utils.py`): `.dropna(subset=["_outcome_change"])` silently removed the first period per unit. The helper now distinguishes the expected first-period-per-unit drops from excess drops and emits a `UserWarning` with the breakdown when excess NaN first-differences are present.

Finding #25 (TROP D-matrix coercion) was verified during scoping to already be resolved at `trop_local.py:60-66`: it warns with `n_missing_structural` and returns `missing_mask`. No code change required.

Methodology references (required if estimator / math changes)
- No estimator / math changes; the only diagnostic touched is the `check_parallel_trends` diagnostic helper.
- The coercions themselves (NaN cohort -> 0, inf first_treat -> 0, first-period dropna) are unchanged.
- The new warning contracts are documented in docs/methodology/REGISTRY.md.

Validation
- `tests/test_wooldridge.py::TestCohortNaNWarning`: `fit()` + `_filter_sample` both warn with the NaN-cohort row count; clean-cohort run is silent.
- `tests/test_continuous_did.py::TestEdgeCases::test_inf_first_treat_normalization`: extended to assert the warning with the expected row count.
- `tests/test_continuous_did.py::TestEdgeCases::test_no_inf_first_treat_no_warning`: silent path coverage.
- `tests/test_continuous_did.py::TestEdgeCases::test_inf_first_treat_warning_counts_rows_not_units`: on a 4-unit x 3-period panel with 2 inf units (6 inf rows), the warning reports 6, locking in the row-level contract.
- `tests/test_utils.py::TestComputeOutcomeChanges::test_silent_on_balanced_panel`: clean balanced panel must not warn.
- `tests/test_utils.py::TestComputeOutcomeChanges::test_warns_on_nan_outcomes_with_excess_drop_count`: excess NaN-outcome drops surface via warning.
- `tests/test_wooldridge.py` + `tests/test_continuous_did.py` + `tests/test_utils.py` = 258 tests green.

Security / privacy
Audit context: resolves findings #24, #26, #27 from the axis-E (silent data sanitization) slice of the in-flight silent-failures audit. Finding #25 (TROP D-matrix) verified as already-resolved during scoping. Closes axis E.
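The expected-vs-excess drop breakdown that `_compute_outcome_changes` now reports can be sketched as follows. This is a hedged illustration of the arithmetic, not the helper's actual implementation; only the `_outcome_change` column name comes from the PR description:

```python
import numpy as np
import pandas as pd

# First-differencing always yields one NaN per unit (its first period),
# so only drops BEYOND n_units signal gaps or NaN outcomes.
df = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2, 2],
    "period": [1, 2, 3, 1, 2, 3],
    "y":      [1.0, 2.0, np.nan, 1.0, 1.5, 2.0],  # one NaN outcome
})
df = df.sort_values(["unit", "period"])
df["_outcome_change"] = df.groupby("unit")["y"].diff()

n_nan = int(df["_outcome_change"].isna().sum())
n_expected = df["unit"].nunique()  # one first-period NaN per unit
n_excess = n_nan - n_expected      # anything beyond that is suspicious

assert n_nan == 3       # 2 first-period NaNs + 1 from the NaN outcome
assert n_excess == 1    # this is the count the new warning surfaces
```

On a clean balanced panel `n_excess` is 0, which is exactly the condition the `test_silent_on_balanced_panel` test locks in.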