
Surface row-count for axis-E silent coercion / drop sites#331

Merged
igerber merged 5 commits into main from fix/axis-e-row-drop-counters
Apr 19, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Bundle the remaining axis-E (silent data sanitization) cleanups from the in-flight audit into one PR.
  • WooldridgeDiD (wooldridge.py): NaN cohort values were silently filled to 0 (never-treated) in both _filter_sample and fit(). Extract a shared _warn_and_fill_nan_cohort(df, cohort, stacklevel) helper so both entry paths emit one UserWarning reporting the affected row count before the .fillna(0).
  • ContinuousDiD (continuous_did.py): first_treat=inf was silently coerced to 0 via .replace([inf, -inf], 0). fit() now counts inf rows and emits a UserWarning with the row count before the coercion.
  • _compute_outcome_changes (utils.py): .dropna(subset=["_outcome_change"]) silently removed the first period per unit. The helper now distinguishes the expected first-period-per-unit drops from excess drops and emits a UserWarning with the breakdown when excess NaN first-differences are present.

Finding #25 (TROP D-matrix coercion) was verified during scoping to already be resolved at trop_local.py:60-66 — it warns with n_missing_structural and returns missing_mask. No code change required.

Methodology references (required if estimator / math changes)

  • Method name(s): WooldridgeDiD (ETWFE), ContinuousDiD, check_parallel_trends diagnostic helper.
  • Paper / source link(s): No estimator math changed. All three sites emit warnings around coercions that were already in the code; the coercions themselves (NaN cohort -> 0, inf first_treat -> 0, first-period dropna) are unchanged.
  • Any intentional deviations from the source (and why): None. The recategorization contract (NaN / inf as never-treated) is now documented under both §WooldridgeDiD and §ContinuousDiD in docs/methodology/REGISTRY.md.

Validation

  • Tests added/updated:
    • tests/test_wooldridge.py::TestCohortNaNWarning — fit() + _filter_sample both warn with the NaN-cohort row count; clean-cohort run is silent.
    • tests/test_continuous_did.py::TestEdgeCases::test_inf_first_treat_normalization — extended to assert the warning with the expected row count.
    • tests/test_continuous_did.py::TestEdgeCases::test_no_inf_first_treat_no_warning — silent path coverage.
    • tests/test_continuous_did.py::TestEdgeCases::test_inf_first_treat_warning_counts_rows_not_units — on a 4-unit x 3-period panel with 2 inf units (6 inf rows), the warning reports 6, locking in the row-level contract.
    • tests/test_utils.py::TestComputeOutcomeChanges::test_silent_on_balanced_panel — clean balanced panel must not warn.
    • tests/test_utils.py::TestComputeOutcomeChanges::test_warns_on_nan_outcomes_with_excess_drop_count — excess NaN-outcome drops surface via warning.
  • Wider regression run: tests/test_wooldridge.py + tests/test_continuous_did.py + tests/test_utils.py = 258 tests green.
  • Backtest / simulation / notebook evidence: N/A — this is user-visibility surfacing around existing coercions, no numeric-contract change.
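The row-vs-unit distinction that test_inf_first_treat_warning_counts_rows_not_units locks in can be sketched like this — the panel shape matches the bullet above, but the column names are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-unit x 3-period panel; units 1 and 2 carry first_treat = inf.
panel = pd.DataFrame({
    "unit": np.repeat([1, 2, 3, 4], 3),
    "period": np.tile([0, 1, 2], 4),
    "first_treat": np.repeat([np.inf, np.inf, 1.0, 0.0], 3),
})

inf_mask = np.isposinf(panel["first_treat"])
n_inf_rows = int(inf_mask.sum())                       # row-level count: 6
n_inf_units = panel.loc[inf_mask, "unit"].nunique()    # unit-level count: 2

# .replace(inf, 0) operates row by row, so the warning must report 6, not 2.
assert (n_inf_rows, n_inf_units) == (6, 2)
```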

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Audit context: resolves findings #24, #26, #27 from the axis-E (silent data sanitization) slice of the in-flight silent-failures audit. Finding #25 (TROP D-matrix) verified as already-resolved during scoping. Closes axis E.

@github-actions

Overall Assessment

⚠️ Needs changes.

The estimator math / weighting / SE behavior touched here looks fine: the ContinuousDiD first_treat=inf -> 0 convention and Wooldridge NaN cohort -> 0 convention are now explicitly documented in the Methodology Registry, so those specific deviations are informational rather than defects. The blocker is that ContinuousDiD.fit() still has another silent coercion in the same preprocessing block that this axis-E cleanup leaves untouched. I did not re-run the test suite here because this environment is missing the project Python deps.

Executive Summary

  • The PR’s actual methodology-facing changes are documented in the registry, so I did not find a paper/registry mismatch for the new first_treat=inf and NaN cohort warnings.
  • P1 [Newly identified]: ContinuousDiD.fit() still silently rewrites dose to 0.0 for first_treat == 0 rows in the same preprocessing block, so the axis-E cleanup is incomplete.
  • P2: _compute_outcome_changes() now emits a user-facing warning that hardcodes check_parallel_trends, even though the public callers are check_parallel_trends_robust() and equivalence_test_trends().
  • P3: the new diagnostic-warning tests only exercise the private helper, not the public wrappers that now surface the warning.

Methodology

Code Quality

  • P2 _compute_outcome_changes() hardcodes "check_parallel_trends dropped ..." in the new warning at diff_diff/utils.py:937, but the public callers are diff_diff/utils.py:823 and diff_diff/utils.py:1020, not check_parallel_trends(). Impact: callers of the robust/equivalence diagnostics get a warning naming the wrong API, which makes the surfaced data issue harder to trace. Concrete fix: pass a caller label into _compute_outcome_changes() or make the message helper-generic instead of naming check_parallel_trends.
  • P2 The new ContinuousDiD warning counts np.isinf(...) rows at diff_diff/continuous_did.py:234, but the actual recode at diff_diff/continuous_did.py:244 only replaces positive infinity. Impact: -inf inputs would trigger a warning claiming recoding to never-treated even though no such recode occurs. Concrete fix: use np.isposinf(...) to match current behavior, or explicitly reject/recode -np.inf and test that path.
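The +inf/-inf asymmetry in that second P2 is easy to reproduce in isolation — a small sketch, not the estimator's actual code:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.inf, -np.inf, 3.0])

# np.isinf counts both signs; np.isposinf matches what .replace([np.inf], 0) recodes.
assert int(np.isinf(s).sum()) == 2
assert int(np.isposinf(s).sum()) == 1

# The recode itself only touches +inf; -inf survives untouched.
recoded = s.replace([np.inf], 0)
assert recoded.tolist() == [0.0, float("-inf"), 3.0]
```

Counting with `np.isinf` while recoding only `np.inf` is exactly the mismatch the finding describes: the warning would fire for a `-inf` row that is never actually recoded.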

Performance

  • No findings.

Maintainability

  • No additional findings beyond the partial axis-E cleanup and hardcoded warning text above.

Tech Debt

  • No existing TODO.md entry mitigates the blocking ContinuousDiD silent-dose coercion, so it remains unmitigated.

Security

  • No findings.

Documentation/Tests

  • P3 The new excess-drop warning is only asserted through the private helper at tests/test_utils.py:857. Impact: public API warning regressions can slip through, which is exactly why the hardcoded check_parallel_trends message is currently unprotected. Concrete fix: add warning assertions for check_parallel_trends_robust() and equivalence_test_trends() on a panel with excess NaN first differences.
  • P3 There is no regression test for the unresolved first_treat == 0 and dose != 0 path in ContinuousDiD. Impact: whichever behavior you choose for that contract violation is currently unguarded. Concrete fix: add a test that injects positive doses into never-treated rows and asserts the chosen warn/error contract.

Path to Approval

  1. In diff_diff/continuous_did.py:281, stop silently normalizing first_treat == 0, dose != 0 rows. Either raise ValueError, or emit a row-count UserWarning consistent with the new axis-E surfaces and add a matching **Note:** in docs/methodology/REGISTRY.md; add regression tests for both the violating path and the clean path.

igerber added a commit that referenced this pull request Apr 19, 2026
Three findings from the CI AI review on PR #331:

P1 (newly identified): `ContinuousDiD.fit()` still silently rewrote
`dose` to 0.0 for rows with `first_treat==0` but nonzero dose, in the
same preprocessing block the PR had otherwise cleaned up. Counting
those rows and emitting a `UserWarning` with the row count before
the zeroing closes the axis-E slice for this estimator. REGISTRY
updated with a matching **Note:** and two regression tests added
(silent on clean data, warns with correct row count on polluted
data).

P2 (warning wording): The axis-E excess-drop warning in
`_compute_outcome_changes` hardcoded "check_parallel_trends" but the
function is called from `check_parallel_trends_robust` and
`equivalence_test_trends`, not from the simple `check_parallel_trends`.
Add a `caller_label` kwarg and thread the correct name from each
public entry point; regression tests cover both callers surfacing
the warning with their own name.

P2 (inf asymmetry): The inf-first-treat counter used `np.isinf`,
which counts both +inf and -inf, but the actual recode on the next
line (`.replace([np.inf, float("inf")], 0)`) only touches +inf. A
-inf input would have triggered a warning about "treating as
never-treated" even though no such recode happened. Switch to
`np.isposinf` so the warning count matches the recode set; add a
test confirming -inf does not fire the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2d344b2b4800667c28b973542b3d81c2da6d3b45


Overall Assessment

⚠️ Needs changes.

The prior re-review findings on ContinuousDiD’s silent dose=0 coercion and the hardcoded diagnostic warning label are addressed. The remaining blocker is a new unmitigated ContinuousDiD contract gap: the PR now documents first_treat=-inf as rejected, but the estimator still does not validate or reject it and can silently exclude those units from both treated and control pools.

Executive Summary

  • The prior ContinuousDiD P1 is resolved: never-treated rows with nonzero dose now warn with a row count, and that deviation is documented in the registry.
  • The prior _compute_outcome_changes() warning-label issue is resolved: caller_label is threaded into both public wrappers, and wrapper-level warning tests were added.
  • P1 [Newly identified]: ContinuousDiD now says first_treat=-inf is rejected downstream, but no such validation exists; those units fall out of both treated and control masks silently.
  • The Wooldridge NaN cohort -> 0 warning path is now documented and covered in both fit() and _filter_sample().
  • Validation note: this was a static review only. I could not run the targeted tests here because pytest and pandas are unavailable in this environment.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not see an existing TODO.md entry that mitigates the ContinuousDiD -inf validation gap.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the new -inf regression only checks that the +inf recategorization warning does not fire; it does not assert the reject behavior that the new code comment and registry note now promise. That leaves the P1 contract unguarded. Concrete fix: replace the current try/except with an explicit pytest.raises(ValueError, match=...) (or whatever behavior is chosen) and keep a separate assertion that no +inf warning is emitted if needed. References: tests/test_continuous_did.py:L726-L754, docs/methodology/REGISTRY.md:L723-L723
  • Severity P3. Impact: test_silent_on_balanced_panel still filters only the old "check_parallel_trends dropped" string, so it would miss a regression if _compute_outcome_changes() began warning under its new generic label. Concrete fix: assert that no warnings are emitted in that helper call, or at least that no warning containing "dropped" appears. References: tests/test_utils.py:L845-L855

Path to Approval

  1. In ContinuousDiD.fit(), add an explicit post-normalization validation that rejects first_treat < 0 / -inf with a clear error message and affected row count.
  2. Update the ContinuousDiD note in REGISTRY.md so it matches the implemented behavior exactly.
  3. Replace the current -inf test with a regression that asserts the chosen reject behavior.

@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2ace61d7286d27b36d673569a147fb9dd48372be


Overall Assessment

⚠️ Needs changes.

Executive Summary

  • The prior re-review blockers are resolved: ContinuousDiD now rejects negative first_treat, and _compute_outcome_changes() now threads caller-specific warning labels through both public wrappers.
  • Severity P1 [Newly identified]: ContinuousDiD.fit() still leaves NaN first_treat values untouched. Those units then satisfy neither the treated (g > 0) nor never-treated (g == 0) masks, so the estimation sample can shrink silently.
  • Severity P1 [Newly identified]: repo-wide pattern check shows StaggeredTripleDifference.fit() still silently normalizes first_treat=np.inf to 0, so the silent-coercion warning pattern is still incomplete across the remaining first_treat normalization sites.
  • The coercions this PR intended to surface are now properly documented in REGISTRY.md, so ContinuousDiD +inf -> 0, ContinuousDiD zeroing nonzero dose on first_treat=0, and Wooldridge NaN cohort -> 0 are P3 informational only.
  • Static review only: I syntax-checked the touched Python files, but I could not run pytest here because the local Python environment lacks pandas and the sandbox is read-only.
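The mask gap in the first P1 is visible in three lines of pandas — NaN compares False against everything, so a NaN `first_treat` satisfies neither pool (column semantics assumed from the bullet above):

```python
import numpy as np
import pandas as pd

g = pd.Series([2.0, 0.0, np.nan])

treated = g > 0          # NaN > 0  -> False
never_treated = g == 0   # NaN == 0 -> False

# The NaN row falls out of both pools, silently shrinking the sample.
assert not bool((treated | never_treated).iloc[2])
assert int((treated | never_treated).sum()) == 2
```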

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Neither P1 is tracked under TODO.md:L51-L80, so neither is mitigated.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. In ContinuousDiD.fit(), add an explicit NaN first_treat validation before cohort construction and +inf normalization; include the affected row count in the error, or document-and-warn if coercion is intended instead.
  2. Add a regression in tests/test_continuous_did.py for a mixed valid/NaN first_treat panel that currently would silently shrink the sample.
  3. In StaggeredTripleDifference.fit(), surface the first_treat=np.inf -> 0 normalization with a UserWarning and convert the registry mention into a **Note:**-labeled entry if that input remains accepted.
  4. Update tests/test_staggered_triple_diff.py:L399-L406 to assert the chosen np.inf behavior instead of silent success only.

@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: d103eecaf3d493cfbaf8098e3ca7b461f4dde61d


Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 findings in the changed code.
  • The two prior re-review blockers appear resolved: ContinuousDiD.fit() now rejects NaN first_treat before treated/control masking, and StaggeredTripleDifference.fit() now surfaces first_treat=inf -> 0 recoding with a warning.
  • The touched coercions are now explicitly documented in REGISTRY.md, so they are P3-informational under the review rules rather than defects.
  • _compute_outcome_changes() now distinguishes expected first-period drops from excess NaN first-difference drops and threads caller-specific labels through both public wrappers.
  • WooldridgeDiD now centralizes the NaN cohort -> 0 warning across both fit() and _filter_sample().
  • Static review only: I could not execute tests in this sandbox because pytest, numpy, and pandas are not installed.

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not identify a new deferrable item in the changed paths that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

Static-review note: test execution was not possible here because pytest, numpy, and pandas are unavailable in the sandbox.

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) Apr 19, 2026
igerber and others added 5 commits April 19, 2026 09:52
Three long-standing axis-E (silent data sanitization) sites now emit a
UserWarning with the affected row count before the coercion, so users
can no longer be silently shifted between the treated and control
sides of an estimator or have rows vanish from a diagnostic pass
without a signal.

- WooldridgeDiD: NaN cohort values were filled to 0 (never-treated)
  both in `_filter_sample` and in `fit()`. Both now warn with the
  NaN row count before the fillna (finding #24).
- ContinuousDiD: `first_treat=inf` was replaced with 0 silently.
  `fit()` now counts inf rows and warns before the replace, before
  any downstream drop-zero-dose / negative-dose validation
  (finding #26).
- `_compute_outcome_changes` (the `check_parallel_trends` diff
  helper) dropped NaN first-differences without reporting the count.
  It now distinguishes the expected first-period-per-unit drops
  from excess drops caused by gaps / NaN outcomes and warns with
  the breakdown when excess drops are detected (finding #27).
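The expected-vs-excess breakdown described above can be sketched as follows — the panel, column names, and grouping are illustrative, not the helper's actual code:

```python
import numpy as np
import pandas as pd

# Hypothetical 2-unit x 3-period panel with one NaN outcome in unit 1.
panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2],
    "period": [0, 1, 2, 0, 1, 2],
    "outcome": [1.0, 2.0, np.nan, 1.0, 1.5, 2.5],
})
panel["_outcome_change"] = panel.groupby("unit")["outcome"].diff()

n_units = panel["unit"].nunique()
n_dropped = int(panel["_outcome_change"].isna().sum())
expected = n_units              # first-differencing always loses one row per unit
excess = n_dropped - expected   # anything beyond that signals NaN outcomes or gaps

assert (n_dropped, expected, excess) == (3, 2, 1)
```

Only a positive `excess` would trigger the warning; a clean balanced panel yields `excess == 0` and stays silent.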

Finding #25 (TROP D-matrix coercion) was verified during scoping to
already be resolved in `trop_local.py:60-66` via `n_missing_structural`
+ returned `missing_mask`; no code change required.

REGISTRY updated under both WooldridgeDiD and ContinuousDiD to
document the new warning contract.

Covered by audit axis E (data sanitization). Findings #24, #26, #27
from docs/audits/silent-failures-findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- P2 (wooldridge): Extract shared `_warn_and_fill_nan_cohort(df, cohort,
  stacklevel)` helper used by both `_filter_sample` and `fit()`. Removes
  the copy-paste warning block that was flagged as a future drift risk.

- P2 (tests): Add `test_inf_first_treat_warning_counts_rows_not_units`
  on a 4-unit x 3-period panel. 2 units carry inf across all 3 periods
  (6 inf rows, 2 inf units) — the warning must report 6, not 2, because
  `.replace(inf, 0)` is row-level.

- P3 (utils wording): The `_compute_outcome_changes` excess-drop warning
  said "gaps or NaN outcomes" but the code actually counts all NaN
  first-differences. Rephrased to "additional NaN first-differences
  (e.g. NaN outcomes or unit-period gaps upstream)" so the message
  doesn't over-claim what the helper can detect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the CI AI review on PR #331:

P1 (newly identified): `ContinuousDiD.fit()` still silently rewrote
`dose` to 0.0 for rows with `first_treat==0` but nonzero dose, in the
same preprocessing block the PR had otherwise cleaned up. Counting
those rows and emitting a `UserWarning` with the row count before
the zeroing closes the axis-E slice for this estimator. REGISTRY
updated with a matching **Note:** and two regression tests added
(silent on clean data, warns with correct row count on polluted
data).

P2 (warning wording): The axis-E excess-drop warning in
`_compute_outcome_changes` hardcoded "check_parallel_trends" but the
function is called from `check_parallel_trends_robust` and
`equivalence_test_trends`, not from the simple `check_parallel_trends`.
Add a `caller_label` kwarg and thread the correct name from each
public entry point; regression tests cover both callers surfacing
the warning with their own name.

P2 (inf asymmetry): The inf-first-treat counter used `np.isinf`,
which counts both +inf and -inf, but the actual recode on the next
line (`.replace([np.inf, float("inf")], 0)`) only touches +inf. A
-inf input would have triggered a warning about "treating as
never-treated" even though no such recode happened. Switch to
`np.isposinf` so the warning count matches the recode set; add a
test confirming -inf does not fire the warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI AI re-review flagged (P1) that the previous commit claimed "-inf
will be rejected by downstream validators" in both the code comment
and REGISTRY.md, but no such validator existed. After the `+inf -> 0`
normalization, `first_treat < 0` units fell out of both the treated
(g > 0) and never-treated (g == 0) masks, so the affected units were
silently excluded from the estimator — exactly the axis-E silent
failure the PR was closing.

- ContinuousDiD.fit() now validates `first_treat < 0` explicitly
  post-normalization and raises ValueError with the row count. -inf,
  -2, and any other negative value are all rejected.
- REGISTRY.md note rewritten to match the implemented behavior.
- Existing -inf test replaced with one that asserts
  `pytest.raises(ValueError)` matching the row-count message, plus
  a positive regression test confirming +inf warning stays silent
  on panels with only valid 0/positive `first_treat` values.
- tests/test_utils.py::test_silent_on_balanced_panel tightened: the
  balanced-panel silence assertion now filters on any warning
  containing "dropped", so a regression that changed the warning
  label would no longer hide a genuine drop signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erage

Two new P1 findings from CI AI re-review:

1. ContinuousDiD.fit() still accepted NaN `first_treat` values. NaN
   survives preprocessing but satisfies neither the treated (g > 0)
   nor never-treated (g == 0) mask, so affected units were silently
   excluded from both pools — same silent-failure shape as the
   already-rejected `first_treat < 0`. Reject NaN explicitly with
   ValueError + row count.

2. `StaggeredTripleDifference.fit()` silently recoded
   `first_treat=np.inf -> 0` via `.replace([np.inf, float("inf")], 0)`.
   Its sibling `staggered.py:1508-1519` already surfaces this with a
   UserWarning; this PR mirrors that contract so the estimator no
   longer shifts units between treated and never-treated pools
   without signaling. REGISTRY gets a matching **Note:** under the
   StaggeredTripleDifference section.

Regression tests:
- test_nan_first_treat_raises_with_row_count (ContinuousDiD) on a
  4-unit x 3-period panel with 3 NaN rows.
- test_inf_first_treat_works (StaggeredTripleDifference) upgraded
  from silent-success to `pytest.warns(UserWarning, match=row-count)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the fix/axis-e-row-drop-counters branch from d103eec to 65d1bd8 on April 19, 2026 13:53
@igerber igerber merged commit 56730af into main Apr 19, 2026
19 checks passed
@igerber igerber deleted the fix/axis-e-row-drop-counters branch April 19, 2026 15:01
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-
  ready output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>