
Phase 1a: Kernel infrastructure + HC2/Bell-McCaffrey variance#327

Merged
igerber merged 14 commits into main from did-no-untreated
Apr 19, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • First of seven phased PRs implementing HeterogeneousAdoptionDiD (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026, arXiv:2405.04465v6).
  • New module diff_diff/local_linear.py: Epanechnikov / triangular / uniform kernels on [0, 1] with closed-form moment constants verified to 1e-12 vs numerical integration; univariate local-linear regression at a boundary via kernel-weighted OLS.
  • diff_diff/linalg.py: new vcov_type enum (classical / hc1 / hc2 / hc2_bm) with return_dof kwarg on compute_robust_vcov. HC2 one-way uses leverage-corrected meat with weighted-hat convention; HC2+Bell-McCaffrey one-way computes Imbens-Kolesar (2016) per-coefficient Satterthwaite DOF. CR2 Bell-McCaffrey cluster-robust uses symmetric matrix square root via eigendecomposition with Moore-Penrose pseudoinverse for singleton clusters and absorbed cluster fixed effects. Weighted cluster CR2 raises NotImplementedError (Phase 2+).
  • DifferenceInDifferences (and by inheritance MultiPeriodDiD, TwoWayFixedEffects) grows a vcov_type parameter with robust=True/False as backward-compat aliases. set_params re-validates the robust/vcov_type pair via the new resolve_vcov_type() helper. DiDResults.summary() prints a human-readable Variance: line.
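
As a rough illustration of the kernel-moment check mentioned in the summary, the sketch below compares hand-derived moments with Gauss-Legendre quadrature. It assumes one-sided kernels normalized to integrate to 1 on [0, 1]; the module's actual normalization is not stated in this description, and `KERNELS`, `closed_form_moment`, and `numeric_moment` are hypothetical names, not the API of `diff_diff/local_linear.py`.

```python
import numpy as np

# Assumed one-sided kernels on [0, 1], each normalized to integrate to 1.
KERNELS = {
    "uniform": lambda u: np.ones_like(u),
    "triangular": lambda u: 2.0 * (1.0 - u),
    "epanechnikov": lambda u: 1.5 * (1.0 - u**2),
}


def closed_form_moment(name, j):
    """kappa_j = integral over [0, 1] of u**j * k(u) du, derived by hand."""
    if name == "uniform":
        return 1.0 / (j + 1)
    if name == "triangular":
        return 2.0 / ((j + 1) * (j + 2))
    if name == "epanechnikov":
        return 3.0 / ((j + 1) * (j + 3))
    raise ValueError(f"unknown kernel: {name!r}")


def numeric_moment(name, j, n_nodes=20):
    """Same moment via Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    u = 0.5 * (nodes + 1.0)
    return 0.5 * np.sum(weights * u**j * KERNELS[name](u))


# Largest disagreement across all three kernels and moments 0 through 4.
max_gap = max(
    abs(closed_form_moment(name, j) - numeric_moment(name, j))
    for name in KERNELS
    for j in range(5)
)
```

Because every integrand here is a low-degree polynomial, the quadrature is exact up to floating-point rounding, which is what makes a tight tolerance such as 1e-12 plausible for this kind of check.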

Methodology references (required if estimator / math changes)

  • Method name(s): HeterogeneousAdoptionDiD (Phase 1a infrastructure only — the estimator itself lands in Phase 2).
  • Paper / source link(s):
    • de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026). Difference-in-Differences Estimators When No Unit Remains Untreated. arXiv:2405.04465v6.
    • Imbens & Kolesar (2016). Robust Standard Errors in Small Samples: Some Practical Advice. Review of Economics and Statistics, 98(4), 701-712.
    • Pustejovsky & Tipton (2018). Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models. Journal of Business & Economic Statistics, 36(4), 672-683.
    • Calonico, Cattaneo & Farrell (2018) — kernel moment conventions.
  • Any intentional deviations from the source (and why):
    • Committed benchmarks/data/clubsandwich_cr2_golden.json has "source": "python_self_reference" as a regression anchor until benchmarks/R/generate_clubsandwich_golden.R is run against R clubSandwich. Documented with a Note: in docs/methodology/REGISTRY.md and tracked in TODO.md.
    • Weighted CR2 Bell-McCaffrey is not implemented in this PR (raises NotImplementedError); the paper's applications don't use weighted cluster-robust and this is Phase 2+ per plan. Tracked in TODO.md.

Validation

  • Tests added/updated:
    • tests/test_local_linear.py (57 tests): kernel closed-form moments, local_linear_fit parity vs manual WLS, NaN/Inf input validation, error paths.
    • tests/test_linalg_hc2_bm.py (31 tests): HC2 hand-formula parity, Bell-McCaffrey DOF Satterthwaite derivation, CR2 adjustment matrix edge cases (singleton, rank-deficient, identity), CR2 parity harness with clubSandwich golden JSON, regression anchors for the unchanged HC1/CR1 paths.
    • tests/test_estimators_vcov_type.py (22 tests): robust/vcov_type alias resolution, set_params conflict detection and re-derivation, MultiPeriodDiD / TwoWayFixedEffects inheritance, wild-bootstrap coexistence, summary variance-family label.
    • All existing tests/test_linalg.py (97) and tests/test_estimators.py (145) pass with no regressions.
  • Backtest / simulation / notebook evidence (if applicable): N/A — Phase 1a is infrastructure. Simulation coverage against Table 1 of the paper and Pierce-Schott Figure 2 replication are Phase 2 and Phase 4 criteria respectively.
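
The CR2 adjustment-matrix edge cases listed for tests/test_linalg_hc2_bm.py (singleton, rank-deficient, identity) can be pictured with the following sketch. It assumes unweighted OLS, for which the per-cluster adjustment is the symmetric inverse square root of I - H_gg; `sym_inv_sqrt` and `cr2_adjustment` are hypothetical helpers written for this illustration, not functions from `diff_diff/linalg.py`.

```python
import numpy as np


def sym_inv_sqrt(S, tol=1e-10):
    """Symmetric pseudo-inverse square root of a symmetric PSD matrix.

    Eigenvalues at or below `tol` are treated as zero, which gives the
    Moore-Penrose analogue of S^(-1/2) when S is singular.
    """
    S = 0.5 * (S + S.T)
    vals, vecs = np.linalg.eigh(S)
    inv_sqrt = np.zeros_like(vals)
    keep = vals > tol
    inv_sqrt[keep] = 1.0 / np.sqrt(vals[keep])
    return (vecs * inv_sqrt) @ vecs.T


def cr2_adjustment(X, cluster_ids):
    """Per-cluster adjustment matrices for unweighted OLS (assumed form)."""
    bread = np.linalg.inv(X.T @ X)
    out = {}
    for g in np.unique(cluster_ids):
        Xg = X[cluster_ids == g]
        H_gg = Xg @ bread @ Xg.T
        out[int(g)] = sym_inv_sqrt(np.eye(len(Xg)) - H_gg)
    return out


# Observation 0 is a singleton cluster with its own dummy column, so its
# leverage is exactly 1 and I - H_gg is the 1x1 zero matrix.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(8), rng.normal(size=8), np.r_[1.0, np.zeros(7)]])
cluster_ids = np.array([0, 1, 1, 1, 2, 2, 3, 3])
A = cr2_adjustment(X, cluster_ids)
```

In this toy design the absorbed singleton gets a zero adjustment matrix instead of a division by zero, while the other clusters satisfy A_g (I - H_gg) A_g = I up to rounding.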

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Pre-merge state: Local AI review (round 2) → ✅ looks good, two P1 blockers from round 1 resolved, all P2 items addressed or documented. Paper review file at docs/methodology/papers/dechaisemartin-2026-review.md is gitignored (per the existing papers/ pattern at .gitignore:91) so it is not in the PR; the Phase 0 REGISTRY stub in this PR supersedes it as the authoritative reference.

🤖 Generated with Claude Code

igerber and others added 3 commits April 18, 2026 20:35
First of seven phased PRs implementing the HeterogeneousAdoptionDiD estimator
from de Chaisemartin, Ciccia, D'Haultfoeuille & Knau (2026, arXiv:2405.04465v6).
Ships the foundational RDD and small-sample variance infrastructure that
Phases 1b, 1c, 2, 3 all compose.

- diff_diff/local_linear.py (new): Epanechnikov, triangular, and uniform
  kernels on [0, 1] with closed-form moment constants matching numerical
  integration to 1e-12; univariate local-linear regression at a boundary
  via kernel-weighted OLS through solve_ols.
- diff_diff/linalg.py: new vcov_type enum (classical, hc1, hc2, hc2_bm)
  with return_dof kwarg on compute_robust_vcov. HC2 one-way uses
  leverage-corrected meat with weighted-hat convention; HC2+Bell-McCaffrey
  one-way computes the Imbens-Kolesar (2016) Satterthwaite DOF per
  coefficient. CR2 Bell-McCaffrey cluster-robust uses symmetric matrix
  square root via eigendecomposition with Moore-Penrose pseudoinverse for
  singleton clusters and absorbed cluster fixed effects. Weighted cluster
  CR2 raises NotImplementedError (Phase 2+). Rust backend guards skip
  non-hc1 paths.
- diff_diff/estimators.py: vcov_type threaded through DifferenceInDifferences
  (MultiPeriodDiD and TwoWayFixedEffects inherit via the base class).
  robust=True aliases vcov_type="hc1"; robust=False aliases "classical".
  Conflict detection at __init__. LinearRegression stores per-coefficient
  Bell-McCaffrey DOF and consumes it in get_inference.
- diff_diff/results.py: DiDResults gains vcov_type and cluster_name fields;
  summary() prints a human-readable Variance family line.
- benchmarks: R clubSandwich parity script plus JSON anchor
  (python_self_reference until R is run) for CR2 BM parity tests.
- Tests: three new focused suites (test_local_linear.py,
  test_linalg_hc2_bm.py, test_estimators_vcov_type.py, 104 new tests total).
  All 145 existing estimator tests plus 97 existing linalg tests pass with
  no regressions.
- Docs: REGISTRY.md HeterogeneousAdoptionDiD section with Phase 1a
  requirements checklist; ROADMAP.md entry updated with status line;
  TODO.md deferrals for weighted CR2, standalone-estimator threading,
  bread_inv perf kwarg, Rust HC2 backend, scores-based DOF.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
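
For readers unfamiliar with the boundary regression named in the commit above, here is a minimal sketch of a local-linear fit via kernel-weighted OLS. It assumes the boundary sits at d = 0 with d >= 0 and a triangular kernel on [0, 1]; `local_linear_at_boundary` is a hypothetical stand-in and does not reproduce the signature of `local_linear_fit` or its use of `solve_ols`.

```python
import numpy as np


def local_linear_at_boundary(d, y, bandwidth, kernel=lambda u: 2.0 * (1.0 - u)):
    """Estimate the intercept and slope of E[y | d] at d = 0 from the right.

    Assumes d >= 0 and a kernel supported on [0, 1] (triangular by default).
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    if not (np.all(np.isfinite(d)) and np.all(np.isfinite(y))):
        raise ValueError("d and y must be finite")
    u = d / bandwidth
    inside = (u >= 0.0) & (u <= 1.0)
    w = kernel(u[inside])
    X = np.column_stack([np.ones(inside.sum()), d[inside]])
    # Weighted least squares through the sqrt-weight transform.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[inside] * sw, rcond=None)
    return beta[0], beta[1]


# Noise-free linear data: the fit should recover the line exactly.
d = np.linspace(0.0, 1.0, 50)
intercept, slope = local_linear_at_boundary(d, 2.0 + 3.0 * d, bandwidth=0.5)
```

The intercept is the quantity of interest at the boundary; the slope is a nuisance parameter that removes first-order bias.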
- diff_diff/linalg.py: fix compute_robust_vcov docstring to reflect that
  vcov_type="hc2_bm" supports both one-way and CR2 cluster-robust paths
  (the earlier "queued as a follow-up" language was stale). Extract
  resolve_vcov_type(robust, vcov_type) as the single source of truth for
  alias resolution and conflict detection; DifferenceInDifferences and
  LinearRegression both consume it.
- diff_diff/estimators.py: DifferenceInDifferences.set_params re-validates
  the robust/vcov_type pair via resolve_vcov_type after mutation so invalid
  combinations (e.g. robust=False + vcov_type="hc2") raise instead of leaving
  the estimator in an inconsistent state.
- diff_diff/local_linear.py: local_linear_fit now validates d/y/weights for
  NaN and Inf at the API boundary, returning targeted ValueErrors rather
  than relying on downstream solve_ols failures. Removed a stale inline
  comment about missing solve_ols overload stubs (the stubs now include
  weights/weight_type).
- docs/methodology/REGISTRY.md: reframe the CR2 golden-JSON checkbox so it
  accurately reflects that the committed JSON is a python_self_reference
  stability anchor until the R script is run; authoritative clubSandwich
  regeneration is tracked in TODO.md.
- Tests: set_params conflict tests (robust=False + vcov_type="hc2" raises;
  robust=True restores hc1; invalid vcov_type rejected) and
  local_linear_fit NaN/Inf validation tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
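
The alias and conflict rules described in the commit above might look roughly like the sketch below. Only the name `resolve_vcov_type(robust, vcov_type)` and the behaviors listed in the commit come from the source; the body, the `_sketch` suffix, the defaults, and the error messages are assumptions, including the choice that an explicit `vcov_type` wins when `robust=True`.

```python
VALID_VCOV_TYPES = ("classical", "hc1", "hc2", "hc2_bm")


def resolve_vcov_type_sketch(robust=True, vcov_type=None):
    """Return the effective variance family, or raise on a conflict."""
    if vcov_type is None:
        # Legacy alias: robust=True -> "hc1", robust=False -> "classical".
        return "hc1" if robust else "classical"
    if vcov_type not in VALID_VCOV_TYPES:
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    if not robust and vcov_type != "classical":
        # robust=False asks for classical SEs, which contradicts hc1/hc2/hc2_bm.
        raise ValueError(f"robust=False conflicts with vcov_type={vcov_type!r}")
    return vcov_type
```

Keeping the rules in one function means a constructor and `set_params` can share the same checks instead of re-implementing them.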
Class-level docstrings now fully describe the vcov_type enum (classical,
hc1, hc2, hc2_bm) on DifferenceInDifferences and MultiPeriodDiD, and
clarify that robust is a legacy alias. Renamed
test_set_params_rejects_conflict_on_robust_only to
test_set_params_robust_only_rederives_vcov_type so the name matches the
asserted behavior (robust-only mutation re-derives vcov_type from the
alias rather than raising).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⚠️ Needs changes. Highest unmitigated severity is P1: the new vcov_type API is not propagated end-to-end to all estimators the PR says now support it, so users can request one variance method and silently receive another.

Executive Summary

  • The low-level HC2 / Bell-McCaffrey implementation in linalg.py looks plausible, but the new vcov_type option is not actually wired through all inherited estimators.
  • MultiPeriodDiD accepts vcov_type, yet its analytical fit path still uses the legacy solve_ols() default and a single generic df, so classical, hc2, and hc2_bm requests do not produce the advertised inference.
  • TwoWayFixedEffects also exposes the new option via inheritance, but its fit path hardcodes robust inference and never forwards vcov_type.
  • summary() can mislabel wild-bootstrap output with an analytical variance family, and the new tests do not exercise the affected MultiPeriodDiD, TwoWayFixedEffects, or clustered bootstrap branches.

Methodology

The new low-level math is directionally aligned with the cited source material: the HAD paper uses local-linear boundary regression plus Calonico-style bias-corrected inference for the nonparametric branch, and for the small-G TWFE application it explicitly switches to HC2 with Bell-McCaffrey/Imbens-Kolesar corrections; Imbens and Kolesar also recommend BM intervals routinely in small and moderate samples. (arxiv.org)

  • Severity: P1 [Newly identified]. Impact: MultiPeriodDiD now claims inherited vcov_type support in the registry, but its analytical path still calls solve_ols() without vcov_type at diff_diff/estimators.py:L1296, only recomputes homoskedastic SEs when not self.robust at diff_diff/estimators.py:L1428, and then applies one shared df to every period effect and the post-period average at diff_diff/estimators.py:L1456 and diff_diff/estimators.py:L1500. Explicit vcov_type="classical" is therefore ignored unless the caller also sets robust=False, and hc2 / hc2_bm silently fall back to legacy HC1-style inference instead of the requested variance/DOF method. The completeness claim is at docs/methodology/REGISTRY.md:L2287. Concrete fix: route the analytical fit through LinearRegression(..., vcov_type=self.vcov_type) or extend solve_ols() to return BM DOF; key off self.vcov_type == "classical" rather than self.robust; and either compute the correct contrast-specific DOF for the averaged post-treatment effect or reject vcov_type="hc2_bm" for that summary until it is implemented.
  • Severity: P1 [Newly identified]. Impact: TwoWayFixedEffects inherits the new public parameter, but the fit path hardcodes robust=True and never forwards vcov_type to LinearRegression at diff_diff/twfe.py:L217 and diff_diff/twfe.py:L231. Analytical TWFE inference therefore stays on the legacy robust/unit-cluster path regardless of the requested variance family, and the results object created at diff_diff/twfe.py:L365 does not record the requested variance metadata either. Concrete fix: pass robust=self.robust and vcov_type=self.vcov_type through the TWFE analytical path, then explicitly reject or document classical / hc2 when TWFE’s default unit clustering remains in force.

Code Quality

  • Severity: P2. Impact: DiDResults.summary() always prints a Variance: label when vcov_type is set at diff_diff/results.py:L195, but the wild-bootstrap path overwrites the reported SE/p-value/CI with bootstrap inference at diff_diff/estimators.py:L463 while storing a default CR1 analytical vcov at diff_diff/estimators.py:L601. A bootstrap run can therefore report bootstrap inference while simultaneously labeling it as "CR2 Bell-McCaffrey" or "Classical". Concrete fix: suppress the variance-family line whenever inference_method != "analytical", or relabel it as auxiliary analytical metadata rather than the source of the displayed inference.

Performance

Maintainability

  • No separate findings beyond the propagation issues above.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new coverage only checks MultiPeriodDiD / TwoWayFixedEffects inheritance through get_params() at tests/test_estimators_vcov_type.py:L133 and tests/test_estimators_vcov_type.py:L138; it never fits either estimator with non-default vcov_type. The “wild bootstrap coexistence” test at tests/test_estimators_vcov_type.py:L224 also omits cluster=, so it never enters the bootstrap branch that mislabels Variance:. Concrete fix: add end-to-end fit tests for MultiPeriodDiD and TwoWayFixedEffects with vcov_type="classical" and vcov_type="hc2_bm", plus a real clustered bootstrap test that asserts the reported inference source and summary label stay consistent.

Path to Approval

  1. Make MultiPeriodDiD honor vcov_type end-to-end, including correct Bell-McCaffrey DOF handling for period-specific effects and either a correct contrast-specific DOF or an explicit rejection for the post-period average.
  2. Make TwoWayFixedEffects forward robust / vcov_type into LinearRegression, and add explicit validation for variance types that are incompatible with its default unit-clustering behavior.
  3. Add end-to-end tests that fit MultiPeriodDiD and TwoWayFixedEffects under non-default vcov_type settings and exercise the actual clustered wild-bootstrap branch.

CI review caught that Phase 1a wired vcov_type into DifferenceInDifferences
__init__/get_params but not into the overridden fit() paths on
MultiPeriodDiD and TwoWayFixedEffects, so `vcov_type="hc2_bm"` on either
silently produced HC1 inference. Summary output also mislabeled
wild-bootstrap inference with the analytical variance family.

- diff_diff/estimators.py MultiPeriodDiD.fit: pass vcov_type=self.vcov_type
  into the analytical solve_ols call; remove the `not self.robust`
  homoskedastic fallback (subsumed by compute_robust_vcov's classical
  branch). When vcov_type="hc2_bm" and no survey design, compute
  Bell-McCaffrey Satterthwaite DOF via _compute_bm_dof_from_contrasts for
  both per-coefficient period effects AND the post-period-average
  contrast; fall back to the shared analytical df otherwise. Store
  vcov_type and cluster_name on MultiPeriodDiDResults.
- diff_diff/twfe.py: forward self.robust and self.vcov_type into the two
  LinearRegression instantiations; store vcov_type and the TWFE auto-
  cluster label (or explicit self.cluster) on DiDResults.
- diff_diff/linalg.py: split _compute_bm_dof_oneway into a contrast-aware
  helper _compute_bm_dof_from_contrasts(X, bread, h_diag, contrasts) so
  MultiPeriodDiD can request BM DOF for the avg_att linear combination.
  The per-coefficient wrapper now delegates to the shared helper with
  contrasts=I_k.
- diff_diff/results.py DiDResults.summary and MultiPeriodDiDResults:
  gate the Variance family label on inference_method == "analytical" so
  wild-bootstrap output is no longer mislabeled; add vcov_type,
  cluster_name, inference_method, n_bootstrap, n_clusters fields to
  MultiPeriodDiDResults for symmetry with DiDResults and to drive the
  summary label.
- tests/test_estimators_vcov_type.py: add five end-to-end tests exercising
  the previously-untested paths - MultiPeriodDiD classical vs hc1 SE
  differ; MultiPeriodDiD hc2_bm CI is finite; TWFE hc1 vs hc2_bm SE differ
  (CR1 vs CR2); TWFE records the unit auto-cluster label in summary;
  wild-bootstrap with cluster suppresses the Variance line.

All 209 Phase 1a suites plus 145 estimator regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
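
To make the contrast-aware degrees of freedom in the commit above concrete, the sketch below computes a Satterthwaite DOF for arbitrary contrasts in the unweighted, unclustered HC2 case. It follows the usual (tr G)^2 / tr(G^2) form with G = M A M, where M is the residual maker and A is diagonal with entries a_i^2 / (1 - h_ii). The function `bm_dof_for_contrasts` is a hypothetical illustration and does not reproduce the signature or internals of `_compute_bm_dof_from_contrasts`.

```python
import numpy as np


def bm_dof_for_contrasts(X, contrasts):
    """Satterthwaite DOF for each contrast row under HC2 (unweighted OLS)."""
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    bread = np.linalg.inv(X.T @ X)
    H = X @ bread @ X.T
    M = np.eye(n) - H          # residual maker
    h = np.diag(H)             # leverages
    dof = []
    for c in np.atleast_2d(contrasts):
        a = X @ bread @ c      # a_i = x_i' (X'X)^{-1} c
        G = M @ np.diag(a**2 / (1.0 - h)) @ M
        dof.append(np.trace(G) ** 2 / np.trace(G @ G))
    return np.array(dof)


rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
per_coef = bm_dof_for_contrasts(X, np.eye(3))                        # contrasts = I_k
avg_last_two = bm_dof_for_contrasts(X, np.array([0.0, 0.5, 0.5]))    # averaged effect
```

Passing the identity matrix recovers per-coefficient DOF, and a single averaging vector gives the DOF for a linear combination such as an average of period effects.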
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: d907eca4b034c0aac4fb4ca97cc7c56f7d32c3e9


Overall Assessment

Blocker. Highest unmitigated severity is P0: unsupported cluster/vcov_type combinations still reach user-facing fit paths and can silently produce wrong or undocumented inference.

Executive Summary

  • The prior re-review issues around TwoWayFixedEffects propagation, unclustered MultiPeriodDiD propagation, and bootstrap summary() labeling look resolved in this diff (diff_diff/twfe.py:L217-L244, diff_diff/estimators.py:L1296-L1307, diff_diff/results.py:L195-L206).
  • P0: estimator-facing OLS paths bypass the public compute_robust_vcov() validation. Unsupported requests such as cluster + vcov_type="classical", cluster + vcov_type="hc2", and cluster + weights + vcov_type="hc2_bm" do not reliably raise and can silently return invalid inference.
  • P1: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") still uses one-way Bell-McCaffrey DOF for period-level and average-post inference instead of CR2 cluster DOF.
  • The self-reference clubSandwich golden JSON and the follow-up work on standalone estimators are properly documented in REGISTRY/TODO and are not blockers by themselves.
  • I did not run the suite in this container because the available Python environment is missing numpy; the review below is from source inspection.

Methodology

Code Quality

  • No separate unmitigated findings beyond the validation split already captured under Methodology.

Performance

  • No unmitigated findings. The main new costs are already tracked in TODO.md:L91-L93.

Maintainability

  • No separate unmitigated findings beyond the duplicated public-vs-internal vcov validation logic already noted above.

Tech Debt

  • Severity: P3 informational. Impact: the committed clubSandwich JSON is explicitly marked as a self-reference anchor, and the remaining standalone-estimator vcov_type threading is tracked as deferred work (docs/methodology/REGISTRY.md:L2288, TODO.md:L80-L82, benchmarks/data/clubsandwich_cr2_golden.json:L476-L478). That tracking is acceptable. Concrete fix: none for approval. The only caveat is that the weighted-CR2 deferral still needs to be enforced in code, which is why it remains a blocker under Methodology.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Make the vcov_type/cluster validation run inside solve_ols() / _compute_robust_vcov_numpy() so unsupported combinations raise everywhere, not just through compute_robust_vcov().
  2. For MultiPeriodDiD, either implement cluster-aware CR2 Bell-McCaffrey DOF for each interaction coefficient and the average-post contrast, or explicitly reject cluster + vcov_type="hc2_bm" until that DOF path exists.
  3. Add regression tests covering the invalid estimator paths and the clustered MultiPeriodDiD(hc2_bm) behavior.

…2_bm

CI re-review flagged two unmitigated issues on top of the Phase 1a diff.

P0 - validation bypass: the `vcov_type`/`cluster`/`weights` raise logic lived
only in the public `compute_robust_vcov()` wrapper. `solve_ols` and
`_solve_ols_numpy` called `_compute_robust_vcov_numpy` directly and reached
the dispatch table unvalidated, so `cluster + classical`, `cluster + hc2`,
and `cluster + weights + hc2_bm` silently produced one-way SEs or a hybrid
weighted-CR2 result instead of raising. Extract the checks into a shared
`_validate_vcov_args()` helper and call it from both entry points so the
raise is universal.

P1 - MultiPeriodDiD cluster + hc2_bm: when `cluster_ids` is set, vcov comes
from `_compute_cr2_bm` (CR2 cluster-robust) but the new per-period and
post-average DOF block was still using `_compute_bm_dof_from_contrasts`,
which builds the one-way residual-maker and ignores clusters. Pairing CR2
SEs with one-way BM DOF is a broken hybrid. A contrast-aware CR2 BM DOF
helper is real work (Pustejovsky-Tipton per-cluster adjustment matrices
applied to arbitrary contrast vectors) and not in Phase 1a scope. For now,
reject the combination in `MultiPeriodDiD.fit` with a clear error that
points to the workarounds: drop `cluster` for one-way HC2+BM, or drop
`vcov_type="hc2_bm"` for CR1 (Liang-Zeger) cluster-robust. Track the full
implementation as follow-up.

Tests: four new negative-path tests exercising the previously-bypassed
paths (solve_ols rejects cluster+classical, cluster+hc2,
cluster+weights+hc2_bm; LinearRegression rejects cluster+hc2), plus a
MultiPeriodDiD cluster+hc2_bm rejection test.

All 299 Phase 1a + estimator regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
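
The shared-validator idea from the P0 fix above can be sketched as follows. The combinations rejected here are the three named in the commit; the function names, argument order, and messages are hypothetical, and the two entry points are stubs that only report which path they would dispatch to.

```python
VALID_VCOV_TYPES = {"classical", "hc1", "hc2", "hc2_bm"}
ONE_WAY_ONLY = {"classical", "hc2"}


def validate_vcov_args(vcov_type, cluster_ids=None, weights=None):
    """Raise on unsupported combinations before any dispatch happens."""
    if vcov_type not in VALID_VCOV_TYPES:
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    if cluster_ids is not None and vcov_type in ONE_WAY_ONLY:
        raise ValueError(f"vcov_type={vcov_type!r} is one-way only; drop cluster_ids")
    if vcov_type == "hc2_bm" and cluster_ids is not None and weights is not None:
        raise NotImplementedError("weighted CR2 Bell-McCaffrey is deferred")


def internal_vcov(vcov_type, cluster_ids=None, weights=None):
    """Stub for the internal path that solvers call directly."""
    validate_vcov_args(vcov_type, cluster_ids, weights)
    return "dispatch:" + vcov_type + (":cluster" if cluster_ids is not None else "")


def public_vcov(vcov_type, cluster_ids=None, weights=None):
    """Stub for the public wrapper; it validates too, so both entries agree."""
    validate_vcov_args(vcov_type, cluster_ids, weights)
    return internal_vcov(vcov_type, cluster_ids, weights)
```

Calling the validator from both entry points is redundant on the public path but cheap, and it removes the possibility that a solver reaches the dispatch table unvalidated.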
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7ba6d5838e816be12693634ae6a0a730ced1b3a7


Overall Assessment

⚠️ Needs changes. The prior numerical blockers look resolved, but the highest remaining unmitigated issue is P1: MultiPeriodDiD now intentionally rejects clustered hc2_bm, while the public methodology/API docs still say that combination is supported and there is no matching REGISTRY.md Note/Deviation documenting the limitation.

Executive Summary

Methodology

  • Severity: P1. Impact: The affected method is the new HC2/Bell-McCaffrey variance surface as exposed through MultiPeriodDiD. The class docstring says cluster + vcov_type="hc2_bm" dispatches to CR2 Bell-McCaffrey, but fit() now intentionally raises because the clustered post-average contrast DOF is not implemented. Failing fast is the correct numerical choice, but under the review policy this is still an undocumented mismatch until it is called out in the Methodology Registry with a **Note:**/deviation entry. diff_diff/estimators.py:L856-L870 diff_diff/estimators.py:L1294-L1312 docs/methodology/REGISTRY.md:L2283-L2288
    Concrete fix: Either implement clustered contrast-specific BM DOF for the post-period average, or add an explicit **Note:** / deviation note in REGISTRY.md and the MultiPeriodDiD docstring stating that clustered hc2_bm is currently unsupported and intentionally raises. If the implementation is deferred, add a matching TODO.md row.

  • Severity: P2. Impact: Survey fits use survey vcov, not the analytical vcov_type, but both summary methods still print the analytical variance-family label whenever inference_method == "analytical". That can tell users they got “CR2 Bell-McCaffrey” or “HC1” when the reported SEs actually came from survey Taylor linearization or replicate weights. diff_diff/linalg.py:L2307-L2360 diff_diff/linalg.py:L2475-L2524 diff_diff/results.py:L182-L206 diff_diff/results.py:L501-L515
    Concrete fix: Gate the Variance: line on the actual inference source, not just inference_method; suppress it when survey_metadata is present or replace it with a survey-specific label.

Code Quality

No separate unmitigated findings beyond the methodology/reporting issues above.

Performance

No unmitigated findings. The main new costs (bread_inv reuse and the naive O(n^2 k) CR2 DOF loop) are already tracked as deferred work.

Maintainability

No separate unmitigated findings beyond the public-contract mismatch already noted.

Tech Debt

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new tests cover the explicit MultiPeriodDiD(cluster, vcov_type="hc2_bm") rejection and bootstrap-summary suppression, but they do not exercise survey-fit summaries. That leaves the new survey-label mismatch unguarded. tests/test_estimators_vcov_type.py:L254-L273 tests/test_estimators_vcov_type.py:L352-L382
    Concrete fix: Add end-to-end summary tests for Taylor-linearized and replicate-weight survey fits asserting that the analytical Variance: line is suppressed or replaced with a survey-specific label.

Path to Approval

  1. Document the unsupported MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") combination as an explicit **Note:** / deviation in docs/methodology/REGISTRY.md and in the MultiPeriodDiD docstring, or implement clustered contrast-specific BM DOF for the post-period average.
  2. Fix the new summary labeling so survey-based fits do not print analytical vcov_type labels, and add regression tests for survey summaries in both the Taylor-linearization and replicate-weight paths.

…ov label

Addresses CI AI review on PR #327 (head 7ba6d58):

- P1: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") now has a matching
  Note in docs/methodology/REGISTRY.md (both under MultiPeriodDiD standard-errors
  block and under the HeterogeneousAdoptionDiD Phase 1a requirements checklist)
  plus an explicit call-out in the MultiPeriodDiD docstring. Also clarifies
  the DifferenceInDifferences docstring that the limitation only applies to
  the multi-period subclass, since the scalar-coefficient base class handles
  cluster + CR2 Bell-McCaffrey correctly.

- P2: DiDResults.summary() and MultiPeriodDiDResults.summary() now suppress
  the analytical "Variance:" line when survey_metadata is present. Survey
  fits use Taylor linearization or replicate-weight variance, not the
  analytical HC/CR sandwich, so printing "HC1"/"CR2 Bell-McCaffrey" alongside
  survey-produced SEs was misleading. The survey design block already
  surfaces the actual inference source (weight type, strata/PSU counts,
  replicate method), so dropping the parallel label is the cleanest fix.

- P2-Tests: Four new tests in TestSummarySurveyLabeling pin the survey-fit
  suppression in both the Taylor-linearization path (SurveyDesign with
  PSU/strata) and the replicate-weight path (BRR), for both DiDResults and
  MultiPeriodDiDResults. A regression guard asserts the non-survey path
  still prints the analytical label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
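
The summary gating described in the commit above, combined with the earlier bootstrap gating, amounts to printing the analytical label only when the analytical sandwich actually produced the reported SEs. The sketch below is illustrative only: `ResultsSketch`, `variance_line`, and the label strings are hypothetical, and the real `DiDResults` has many more fields.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed display strings; the real labels may differ.
ASSUMED_LABELS = {
    "classical": "Classical",
    "hc1": "HC1",
    "hc2": "HC2",
    "hc2_bm": "HC2 Bell-McCaffrey",
}


@dataclass
class ResultsSketch:
    vcov_type: Optional[str] = None
    inference_method: str = "analytical"
    survey_metadata: Optional[dict] = None


def variance_line(res):
    """Return the 'Variance:' line, or None when it would be misleading."""
    if res.vcov_type is None:
        return None
    if res.inference_method != "analytical":
        return None  # e.g. wild bootstrap supplied the SEs
    if res.survey_metadata is not None:
        return None  # survey variance supplied the SEs
    return f"Variance: {ASSUMED_LABELS[res.vcov_type]}"
```

Returning `None` rather than an empty string lets the caller skip the line entirely instead of printing a blank row.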
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: dca1fce4a704f0c8c7bb91aeaeaa65591bcdcf0d


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previous re-review blocker around MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") being undocumented is resolved: the limitation is now called out in both the class docstring and REGISTRY.md.
  • The earlier survey-summary labeling issue is also resolved: survey-backed summaries now suppress the analytical Variance: line, and the new tests cover both Taylor-linearized and replicate-weight paths.
  • P1: TwoWayFixedEffects still auto-clusters at unit level, so the inherited one-way variance families (robust=False/vcov_type="classical" and vcov_type="hc2") are not actually usable on the default non-survey TWFE path even though the new registry/API surface says vcov_type is threaded through TWFE.
  • P1: the new weighted one-way hc2_bm DOF path is internally inconsistent with the module’s own WLS transform, so weighted LinearRegression(..., vcov_type="hc2_bm") / compute_robust_vcov(..., return_dof=True) can emit wrong p-values/CIs with no warning.
  • The documented deferrals for weighted clustered CR2 and the self-reference clubSandwich JSON are properly tracked, so they are informational only.

Methodology

  • Severity: P1. Impact: TwoWayFixedEffects now inherits the new vcov_type surface, and the registry says the inheritance chain includes TWFE, but fit() still forces unit-level clustering when cluster=None. That means the low-level validator immediately rejects the inherited one-way families (classical, hc2) on the default non-survey TWFE path, so robust=False / vcov_type="classical" and vcov_type="hc2" are exposed but not actually usable. The new tests only check that TWFE can be constructed with vcov_type="hc2" and only fit hc1/hc2_bm, so this interaction slipped through. Concrete fix: either make TwoWayFixedEffects drop auto-clustering when the user explicitly requests a one-way family, or reject those combinations explicitly in the TWFE layer and document that deviation in the TWFE docs and REGISTRY.md; add fit-level regression tests for robust=False, vcov_type="classical", and vcov_type="hc2". diff_diff/twfe.py:L141 diff_diff/twfe.py:L217 diff_diff/linalg.py:L961 docs/methodology/REGISTRY.md:L2297 tests/test_estimators_vcov_type.py:L139 tests/test_estimators_vcov_type.py:L305

  • Severity: P1. Impact: the new weighted one-way Bell-McCaffrey DOF path is internally inconsistent with the solver’s own WLS definition. solve_ols() defines weighted estimation on the transformed design X* = sqrt(w) X, y* = sqrt(w) y, but _compute_bm_dof_from_contrasts() still builds q from the unscaled X and builds H as X (X'WX)^{-1} X' W, not from the same transformed design. LinearRegression.fit() then feeds that DOF vector directly into inference whenever weights is not None, cluster_ids is None, and vcov_type="hc2_bm". So the new weighted hc2_bm surface can silently report wrong small-sample p-values/CIs. Concrete fix: either re-derive the BM DOF helper on the transformed WLS design and validate it with dedicated weighted parity tests, or temporarily raise NotImplementedError for weights is not None and vcov_type="hc2_bm" until that derivation is implemented. diff_diff/linalg.py:L583 diff_diff/linalg.py:L594 diff_diff/linalg.py:L1424 diff_diff/linalg.py:L1427 diff_diff/linalg.py:L2367 TODO.md:L81 tests/test_linalg_hc2_bm.py:L560

Code Quality

No separate unmitigated findings.

Performance

No separate unmitigated findings. The new CR2/BM performance debt is already tracked in TODO.md, so it is non-blocking.

Maintainability

  • Severity: P2. Impact: set_params() mutates attributes before it validates the robust/vcov_type pair, so a failing call can leave the estimator in exactly the inconsistent state the new docstring says it avoids. That will not usually produce silent wrong numbers, but it does make downstream failures harder to reason about if caller code catches the ValueError. Concrete fix: validate on local variables before mutating self, or roll back on exception; add a regression test that a rejected set_params() call leaves the estimator unchanged. diff_diff/estimators.py:L799 diff_diff/estimators.py:L816
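
One way to realize the "validate on local variables before mutating self" suggestion is sketched below. `EstimatorSketch` and `_resolve` are hypothetical stand-ins for `DifferenceInDifferences` and `resolve_vcov_type`, and the alias rules are assumed rather than copied from the library.

```python
def _resolve(robust, vcov_type):
    """Assumed alias rules: robust=True -> hc1, robust=False -> classical."""
    if vcov_type is None:
        return "hc1" if robust else "classical"
    if vcov_type not in ("classical", "hc1", "hc2", "hc2_bm"):
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    if not robust and vcov_type != "classical":
        raise ValueError(f"robust=False conflicts with vcov_type={vcov_type!r}")
    return vcov_type


class EstimatorSketch:
    _PARAMS = ("robust", "vcov_type", "alpha")

    def __init__(self, robust=True, vcov_type=None, alpha=0.05):
        self.robust = robust
        self.alpha = alpha
        self.vcov_type = _resolve(robust, vcov_type)

    def set_params(self, **params):
        unknown = set(params) - set(self._PARAMS)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        candidate = {name: getattr(self, name) for name in self._PARAMS}
        if "robust" in params and "vcov_type" not in params:
            candidate["vcov_type"] = None  # robust-only change re-derives the alias
        candidate.update(params)
        candidate["vcov_type"] = _resolve(candidate["robust"], candidate["vcov_type"])
        # Validation passed, so only now is self mutated.
        for name, value in candidate.items():
            setattr(self, name, value)
        return self
```

Because every check runs against `candidate`, a rejected call leaves the estimator exactly as it was, which is the property the suggested regression test would assert.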

Tech Debt

  • Severity: P3 informational. Impact: the remaining documented deferrals are properly tracked: weighted clustered CR2 Bell-McCaffrey still raises NotImplementedError, and the committed clubSandwich golden JSON is still a self-reference anchor pending an authoritative R regeneration. Those are explicitly noted in TODO.md / REGISTRY.md, so they are not blockers. Concrete fix: none for approval. TODO.md:L81 TODO.md:L82 docs/methodology/REGISTRY.md:L2296 docs/methodology/REGISTRY.md:L2299

Security

No findings.

Documentation/Tests

No separate unmitigated findings beyond the issues above. The prior documentation blocker for MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") is fixed, and the new survey-summary tests cover the earlier labeling problem. I could not execute the test suite in this sandbox because the Python environment is missing numpy, so this review is based on static inspection.

Path to Approval

  1. Make the TWFE vcov_type contract real: either allow classical / one-way hc2 by bypassing auto unit clustering when those families are requested, or reject them explicitly in TwoWayFixedEffects and document that limitation in the TWFE docs and REGISTRY.md. Add fit-level regression tests for robust=False, vcov_type="classical", and vcov_type="hc2" on TWFE.
  2. Fix weighted one-way hc2_bm inference: either derive the BM DOF on the same transformed WLS design used by solve_ols() and add weighted parity tests, or raise NotImplementedError for any weighted one-way hc2_bm fit until that derivation is in place.

…_params

Addresses CI AI review on PR #327 head dca1fce:

P1 — TWFE auto-cluster vs. one-way vcov families:
  TwoWayFixedEffects.fit() forced cluster_var = unit when self.cluster was
  None, which collided with the validator: vcov_type in {"classical", "hc2"}
  is one-way-only and raises ValueError when cluster_ids is non-None. The
  inheritance surface advertised those families as usable on TWFE but they
  were not. Fix: when cluster is None AND vcov_type is a one-way family,
  drop the auto-cluster. The explicit choice of a one-way family wins over
  the TWFE default. cluster_name in DiDResults is now None on that path so
  summary() labels the one-way family (not "CR1 cluster-robust at unit").
  REGISTRY.md can wait for the next doc pass if a follow-up edge case comes
  up; the TWFE docstring already documents the exception.

P1 — Weighted one-way hc2_bm silent math mismatch:
  _compute_bm_dof_from_contrasts builds its hat matrix from the unscaled
  design as X (X'WX)^{-1} X' W, but solve_ols solves weighted regression by
  transforming to X* = sqrt(w) X, y* = sqrt(w) y. The symmetric-idempotent
  residual maker M* = I - H* with H* = sqrt(W) X (X'WX)^{-1} X' sqrt(W) is
  the correct one for the Satterthwaite (trG)^2 / tr(G^2) ratio; the
  asymmetric X (X'WX)^{-1} X' W is neither transformed nor original-scale.
  Rather than ship silently-inconsistent small-sample p-values, extend the
  existing weighted-cluster CR2 deferral to cover weighted one-way as well:
  _validate_vcov_args now raises NotImplementedError for
  vcov_type="hc2_bm" + weights (with OR without cluster). Tracked in
  TODO.md under Methodology/Correctness (rederive on transformed design +
  add weighted parity tests).
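
The two hat matrices named above can be compared numerically. This is a minimal sketch, assuming NumPy and made-up data and weights; it does not reproduce `_compute_bm_dof_from_contrasts` or `solve_ols`, only the two algebraic objects the paragraph describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
w = rng.uniform(0.5, 2.0, size=n)                       # hypothetical positive weights
W = np.diag(w)
sqrt_W = np.diag(np.sqrt(w))

bread = np.linalg.inv(X.T @ W @ X)

# Hat matrix of the transformed regression X* = sqrt(W) X.
H_star = sqrt_W @ X @ bread @ X.T @ sqrt_W
# Projector built from the unscaled design: X (X'WX)^{-1} X' W.
H_oblique = X @ bread @ X.T @ W

star_symmetric = np.allclose(H_star, H_star.T)
star_idempotent = np.allclose(H_star @ H_star, H_star)
oblique_symmetric = np.allclose(H_oblique, H_oblique.T)
oblique_idempotent = np.allclose(H_oblique @ H_oblique, H_oblique)
same_diagonal = np.allclose(np.diag(H_star), np.diag(H_oblique))
```

Both matrices are idempotent and share the same diagonal, so the leverages agree. Only `H_star` is symmetric; the two differ in their off-diagonal entries, which is where a trace ratio such as (trG)^2 / tr(G^2) can pick up the mismatch.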

P2 — set_params atomic validation:
  Previously set_params applied all setattr mutations BEFORE re-validating
  the robust/vcov_type pair. A failing call left the estimator in the
  half-configured state the alias/conflict check is designed to prevent,
  defeating callers that catch ValueError and keep using the object. Fix:
  validate unknown-key rejection + resolve_vcov_type on locals first, then
  apply mutations atomically.
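
The validate-then-mutate order can be sketched as follows. The `Estimator` class and its `_check_pair` rule are hypothetical stand-ins for illustration, not the library's `set_params` or `resolve_vcov_type`.

```python
class Estimator:
    """Hypothetical stand-in for an estimator with a robust/vcov_type pair."""

    _VALID = {"classical", "hc1", "hc2", "hc2_bm"}

    def __init__(self, robust=True, vcov_type=None):
        self.robust = robust
        self.vcov_type = vcov_type

    @staticmethod
    def _check_pair(robust, vcov_type):
        # Illustrative rule only: robust=False conflicts with any explicit
        # vcov_type other than "classical".
        if vcov_type is not None and vcov_type not in Estimator._VALID:
            raise ValueError(f"unknown vcov_type: {vcov_type!r}")
        if robust is False and vcov_type not in (None, "classical"):
            raise ValueError("robust=False conflicts with vcov_type")

    def set_params(self, **params):
        # 1) Reject unknown keys before touching self.
        unknown = set(params) - set(vars(self))
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        # 2) Validate the merged configuration on a local copy.
        merged = {**vars(self), **params}
        self._check_pair(merged["robust"], merged["vcov_type"])
        # 3) Only now mutate the instance.
        for key, value in merged.items():
            setattr(self, key, value)
        return self
```

Because every check runs on `merged` rather than on `self`, a caller that catches the `ValueError` is left holding the same configuration it started with.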

Tests:
  - TestFitBehavior.test_twfe_honors_classical_without_autocluster +
    test_twfe_honors_robust_false_without_autocluster +
    test_twfe_honors_hc2_one_way: all three one-way entry points now
    succeed on TWFE (and cluster_name is None).
  - TestFitBehavior.test_twfe_explicit_cluster_still_clusters_under_hc2_bm:
    regression guard that explicit cluster= keeps the auto-bypass off.
  - TestHC2BMCluster.test_hc2_bm_weighted_one_way_not_implemented: locks
    the NotImplementedError at both public and internal entry points.
  - TestParamsRoundTrip.test_set_params_conflict_leaves_estimator_unchanged
    + test_set_params_unknown_key_leaves_estimator_unchanged: atomicity
    regression guards.

All 133 Phase 1a tests pass; 405 tests across estimators / survey / Phase
1a neighbours pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6836836778786a4a7c23621f0016e9adfe76ece9


Overall Assessment
⛔ Blocker

Executive Summary

  • The previous re-review blockers around weighted one-way hc2_bm, set_params() atomicity, survey summary labeling, and TWFE’s unusable classical/hc2 analytical paths were addressed at the API surface.
  • P0: the new HC2 / Bell-McCaffrey leverage corrections are computed on the residualized FE design, not the full fixed-effects projection. That makes the new vcov_type="hc2" / "hc2_bm" output methodologically wrong on TwoWayFixedEffects and on any absorb= fit that reaches the new leverage-based code.
  • P1: TwoWayFixedEffects(inference="wild_bootstrap", vcov_type="classical"|"hc2", cluster=None) now disables the estimator’s default unit-level clustering before the bootstrap branch, so a previously supported auto-clustered bootstrap fit now errors instead of bootstrapping.
  • The remaining weighted-hc2_bm deferrals and the self-reference clubSandwich golden JSON are properly documented in TODO.md / REGISTRY.md, so they are informational only.
  • I could not execute the test suite in this sandbox because the Python environment is missing numpy; this review is based on static inspection.

Methodology

Code Quality

  • No separate unmitigated findings beyond the methodology issues above.

Performance

  • No separate unmitigated findings.

Maintainability

  • No separate unmitigated findings.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests never validate HC2 / HC2+BM on absorbed-FE regressions against a methodology reference. The low-level linalg tests are plain-OLS only, existing TWFE reference tests are still HC1-only, and the new TWFE vcov_type tests only assert that SEs differ from HC1. That gap is why the absorbed-FE leverage bug above slipped through. Concrete fix: add parity tests for TwoWayFixedEffects(vcov_type="hc2") and vcov_type="hc2_bm" plus DifferenceInDifferences(absorb=...), using a full-dummy regression or R fixest / clubSandwich reference. Refs: tests/test_linalg_hc2_bm.py:58, tests/test_methodology_twfe.py:426, tests/test_estimators_vcov_type.py:350.

Path to Approval

  1. Block or correctly re-derive HC2 / HC2+Bell-McCaffrey on absorbed-FE fits. The safe short-term fix is to raise on TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"}) and on DifferenceInDifferences / MultiPeriodDiD with absorb= plus those vcov_type values.
  2. Restore TWFE’s no-cluster wild-bootstrap behavior for one-way vcov_type values, or reject that combination explicitly before entering the bootstrap path.
  3. Add absorbed-FE parity tests for the new HC2 / HC2+BM surfaces and regression tests for TwoWayFixedEffects(inference="wild_bootstrap", vcov_type="classical"|"hc2", cluster=None).

…luster

Addresses CI AI review on PR #327 head 6836836 (⛔ Blocker).

P0 — HC2/CR2-BM applied to demeaned design produces wrong hat matrix:
  TWFE unconditionally demeans via within-transformation, and both
  DifferenceInDifferences(absorb=...) and MultiPeriodDiD(absorb=...) do the
  same before solving OLS on the reduced design. The HC2 leverage
  correction `h_ii = x_i' (X'X)^{-1} x_i` and the CR2 Bell-McCaffrey block
  adjustment `A_g = (I - H_gg)^{-1/2}` both depend on the FULL FE hat
  matrix, not the residualized one. FWL preserves coefficients and
  residuals but not the hat matrix, so applying HC2/CR2-BM to the demeaned
  regressors silently mis-states small-sample SEs and Satterthwaite DOF.

  Short-term fix: raise NotImplementedError in three places —
    - TwoWayFixedEffects.fit() unconditionally for vcov_type in {hc2, hc2_bm}
    - DifferenceInDifferences.fit() with absorb= and vcov_type in {hc2, hc2_bm}
    - MultiPeriodDiD.fit() with absorb= and vcov_type in {hc2, hc2_bm}

  HC1 and CR1 are unaffected (no leverage term; meat uses only the
  residuals, which FWL preserves). Workarounds documented in the error
  message: use vcov_type='hc1' with absorb=/TWFE, or switch to
  fixed_effects= dummies for a full-dummy design where the hat matrix is
  computed on the full projection. Lifting the guard requires computing
  HC2/CR2-BM from the full absorbed projection and validating against a
  full-dummy OLS or fixest/clubSandwich reference. Tracked in TODO.md.
  REGISTRY.md gets a matching Note under the Phase 1a checklist.
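
The FWL point can be illustrated on a toy panel. This is a minimal sketch, assuming NumPy, simulated data, and a one-way (unit-only) fixed effect rather than the two-way case; it compares a full-dummy regression with the within-transformed one and is not the library's code path.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 4, 3
groups = np.repeat(np.arange(n_units), n_periods)  # balanced toy panel
x = rng.normal(size=groups.size)
y = 1.5 * x + groups + rng.normal(size=groups.size)

# Full-dummy design: one indicator per unit plus the regressor.
D = (groups[:, None] == np.arange(n_units)[None, :]).astype(float)
X_full = np.column_stack([D, x])

def demean(v):
    """Subtract unit means (the within transformation)."""
    means = np.array([v[groups == g].mean() for g in range(n_units)])
    return v - means[groups]

x_within = demean(x)[:, None]
y_within = demean(y)

def fit(X, y):
    """OLS coefficients, residuals, and leverages h_ii = x_i' (X'X)^{-1} x_i."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return beta, resid, leverage

beta_full, resid_full, h_full = fit(X_full, y)
beta_within, resid_within, h_within = fit(x_within, y_within)
```

The slope and the residuals match across the two fits, but in this balanced example each full-design leverage exceeds its within-design counterpart by 1/n_periods, the share contributed by the unit dummies. An HC2 correction of the form 1/(1 - h_ii) built from `h_within` would therefore be too small.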

P1 — TWFE wild_bootstrap + one-way family dropped the auto-cluster:
  The prior commit's one-way-family auto-cluster bypass in TWFE
  (classical/hc2, cluster=None → cluster_var=None) applied even when
  inference="wild_bootstrap". That silently dropped the unit cluster the
  bootstrap path needed to resample residuals. Fix: gate the bypass on
  inference=="analytical", so wild-bootstrap fits keep the unit
  auto-cluster. Since hc2/hc2_bm now raise earlier, only "classical"
  reaches the bypass branch; cleaned up accordingly.

Tests:
  - test_twfe_rejects_hc2_and_hc2_bm: both combinations raise with the
    expected message.
  - test_did_absorb_rejects_hc2_and_hc2_bm: absorb= + hc2/hc2_bm rejected.
  - test_did_fixed_effects_dummies_still_accept_hc2_and_hc2_bm: dummy
    expansion path is unaffected (regression guard).
  - test_multi_period_absorb_rejects_hc2_and_hc2_bm: MultiPeriodDiD
    absorb= + hc2/hc2_bm rejected.
  - test_twfe_wild_bootstrap_preserves_auto_cluster: classical +
    wild_bootstrap + cluster=None keeps the unit auto-cluster
    (n_clusters == n_units).

Removed/replaced: test_twfe_fit_honors_vcov_type (tested HC2+BM on TWFE),
test_twfe_honors_hc2_one_way, test_twfe_explicit_cluster_still_clusters_under_hc2_bm
— those paths now raise, so their replacements are the negative-path tests.

All 135 Phase 1a tests pass; 448 tests across estimators / survey / TWFE
methodology / Phase 1a neighbours pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e113549a3f1bd4f8367221f5b9a2f6a36660ac79


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review blockers appear resolved. HC2 / HC2+Bell-McCaffrey are now explicitly rejected on within-transformed FE fits in diff_diff/estimators.py:303, diff_diff/estimators.py:1219, and diff_diff/twfe.py:117, matching the new registry note at docs/methodology/REGISTRY.md:2296.
  • TWFE’s classical analytical path now drops only the auto-cluster, while the wild-bootstrap path preserves unit clustering for resampling in diff_diff/twfe.py:190 and diff_diff/twfe.py:420, with regression coverage in tests/test_estimators_vcov_type.py:428.
  • vcov_type is threaded consistently through the DifferenceInDifferences inheritance chain and into results summaries, and the new inference paths continue to use safe_inference() rather than inline t-stat logic. See diff_diff/estimators.py:139, diff_diff/linalg.py:1059, and diff_diff/results.py:195.
  • Remaining weighted-hc2_bm limitations and the self-reference clubSandwich golden are fail-fast or explicitly documented/tracked, so they are informational rather than blockers. See diff_diff/linalg.py:974, TODO.md:81, and benchmarks/data/clubsandwich_cr2_golden.json:477.
  • Static review only: I could not execute the suite in this sandbox because numpy and pytest are unavailable.

Methodology

  • No unmitigated P0/P1 findings. The absorbed-FE HC2/BM mismatch from the prior review is now blocked instead of silently miscomputed, and MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") now fails fast rather than mixing CR2 SEs with one-way BM DOF. See diff_diff/estimators.py:1350, docs/methodology/REGISTRY.md:166, and tests/test_estimators_vcov_type.py:300.
  • Severity: P3. Impact: the committed CR2 golden remains a self-reference regression anchor, so it stabilizes implementation changes but does not yet provide external clubSandwich parity. This is explicitly documented in the registry/TODO and is non-blocking. Concrete fix: none for approval; regenerate the JSON with the committed R script in the tracked follow-up. Refs: benchmarks/data/clubsandwich_cr2_golden.json:477, benchmarks/R/generate_clubsandwich_golden.R:1, docs/methodology/REGISTRY.md:2300, TODO.md:84.

Code Quality

  • No unmitigated findings.

Performance

  • No unmitigated findings.

Maintainability

  • No unmitigated findings.

Tech Debt

  • Severity: P3. Impact: weighted one-way hc2_bm, weighted clustered CR2/BM, and the remaining standalone-estimator vcov_type threading are still deferred, but they now fail fast where needed and are explicitly tracked. Concrete fix: none for approval; keep the queued follow-ups open. Refs: diff_diff/linalg.py:974, TODO.md:80, TODO.md:81, TODO.md:83.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the public helper docs lag the API change. solve_ols() and LinearRegression now accept vcov_type, but their parameter documentation does not describe that knob or its unsupported combinations, so the code/tests are ahead of the helper docs. Concrete fix: add vcov_type parameter docs and the new fail-fast limitation notes to those two docstrings. Refs: diff_diff/linalg.py:429, diff_diff/linalg.py:453, diff_diff/linalg.py:2225, diff_diff/linalg.py:2151.
  • No unmitigated test-coverage finding in the diff. The new tests cover the prior blockers and the new variance paths, including manual HC2/BM checks, absorb/TWFE guard rails, and summary labeling. See tests/test_linalg_hc2_bm.py:192, tests/test_linalg_hc2_bm.py:474, tests/test_estimators_vcov_type.py:350, tests/test_estimators_vcov_type.py:450, tests/test_estimators_vcov_type.py:546, and tests/test_estimators_vcov_type.py:643.

Addresses the P3 docs gap flagged by CI AI review on PR #327 head e113549:
both public helpers accept vcov_type but the parameter docs didn't list it
or its unsupported combinations.

- solve_ols: Parameters block now lists vcov_type with the four enum values
  and notes ``cluster_ids + {classical, hc2}`` and weighted hc2_bm raise.
- LinearRegression.__init__: same threading plus a note that the class
  stores ``self._bm_dof`` and threads it into get_inference.

No behavior change; purely docstring updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 90a93c9887f150b2d1e1106dd8a49b74d12a12dc


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Prior blockers from the earlier re-review look resolved: absorbed-FE hc2/hc2_bm paths now fail fast, and MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") is explicitly rejected in the same way the Methodology Registry now documents.
  • Severity P1: the new legacy alias logic hard-resolves robust=False to vcov_type="classical" before cluster handling, so clustered fits that previously worked now fail validation instead of producing clustered inference.
  • I did not find a new inline-inference or partial-NaN anti-pattern in the modified estimator/linalg paths; the changed analytical inference sites still route through safe_inference().
  • The committed CR2 golden is still a self-reference anchor, but that limitation is explicitly documented in REGISTRY.md and tracked in TODO.md, so it is informational only.
  • Static review only: I could not execute the new tests in this sandbox because numpy is unavailable.

Methodology

Affected methods in this PR: DifferenceInDifferences, MultiPeriodDiD, TwoWayFixedEffects, LinearRegression, and the shared HC2 / HC2+Bell-McCaffrey variance infrastructure.

  • Severity: P1. Impact: robust=False now maps to vcov_type="classical" unconditionally, and the new validator rejects any cluster/cluster_ids combination with classical. That means formerly valid clustered calls such as DifferenceInDifferences(robust=False, cluster=...), MultiPeriodDiD(robust=False, cluster=...), TwoWayFixedEffects(robust=False, cluster=...), and LinearRegression(robust=False, cluster_ids=...) now fail at fit time instead of preserving the old “cluster overrides robust” behavior. This is an undocumented default-behavior regression in the inference contract, not a documented registry deviation. Concrete fix: preserve backward compatibility when vcov_type is omitted by delaying alias resolution until cluster context is known, or by tracking whether vcov_type was explicit and remapping the implicit robust=False + cluster case back to CR1/hc1. Refs: diff_diff/linalg.py:L1021-L1071, diff_diff/linalg.py:L978-L988, diff_diff/linalg.py:L2170-L2175, diff_diff/linalg.py:L2374-L2387, diff_diff/estimators.py:L139-L157, diff_diff/estimators.py:L816-L846.
  • Severity: P3. Impact: the committed CR2 parity JSON is still a Python self-reference anchor rather than an authoritative clubSandwich export, so it only protects against internal drift for now. Because this is explicitly disclosed in the registry and tracked in TODO.md, it is informational and not a blocker. Concrete fix: none for approval; regenerate the golden from R in the tracked follow-up. Refs: docs/methodology/REGISTRY.md:L2296-L2300, TODO.md:L81-L84, benchmarks/data/clubsandwich_cr2_golden.json:L476-L479.

Code Quality

  • No unmitigated findings in the modified code paths.

Performance

  • No unmitigated findings. The new O(n^2 k) CR2/BM DOF cost is already tracked in TODO.md and is non-blocking for this PR.

Maintainability

  • No unmitigated findings.

Tech Debt

  • Severity: P3. Impact: weighted one-way BM, weighted clustered CR2/BM, and absorbed-FE HC2/BM remain deferred, but the PR now fails fast on those unsupported combinations and records them in TODO.md/REGISTRY.md, which is the right non-blocking treatment under the project rules. Concrete fix: none for approval; keep the tracked follow-ups open. Refs: TODO.md:L81-L84, docs/methodology/REGISTRY.md:L2296-L2300.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the new status text points readers to docs/methodology/papers/dechaisemartin-2026-review.md, but that file is not present in this checkout, so the new methodology breadcrumb is dead. Concrete fix: either commit that review artifact or replace the pointer with the in-tree registry section that is actually authoritative. Refs: ROADMAP.md:L103-L103, docs/methodology/REGISTRY.md:L2122-L2122.

Static-review note: I could not run pytest here because numpy is missing from the sandbox, so the assessment is based on diff inspection and surrounding source only.

Path to Approval

  1. Restore backward-compatible clustered behavior for the legacy alias path: when vcov_type is omitted, robust=False must not force classical once cluster/cluster_ids is present. The cleanest fix is to delay alias resolution until cluster context is known, or to remember whether vcov_type was explicit and remap the implicit robust=False + cluster case to hc1/CR1.
  2. Add regression tests for DifferenceInDifferences(robust=False, cluster=...), MultiPeriodDiD(robust=False, cluster=...), TwoWayFixedEffects(robust=False, cluster=...), and LinearRegression(robust=False, cluster_ids=...) so this legacy-cluster interaction is pinned end to end.

Addresses CI AI review on PR #327 head 90a93c9:

P1 — legacy alias broke clustered calls:
  The new `robust=False → vcov_type="classical"` alias was too eager.
  Clustered calls like `DifferenceInDifferences(robust=False, cluster="unit")`
  (and the TWFE/MultiPeriod/LinearRegression equivalents) used to produce
  CR1 cluster-robust SEs — the cluster structure silently overrode the
  non-robust flag. Phase 1a made them fail validation (classical is one-way
  only).

  Fix: track `_vcov_type_explicit` at __init__/set_params. At fit time, a
  new `_resolve_effective_vcov_type(effective_cluster_ids)` remaps implicit
  `"classical"` to `"hc1"` when a cluster structure is present, preserving
  CR1 behavior and emitting a UserWarning. Explicit `vcov_type="classical"`
  + cluster still raises (user made the choice deliberately).

  - DifferenceInDifferences.fit: remap at solve site; report remapped type
    on the result.
  - MultiPeriodDiD.fit: same pattern, both analytical and absorb paths.
  - TwoWayFixedEffects.fit: same pattern + the auto-cluster bypass now
    gates on `_vcov_type_explicit` so implicit classical keeps the unit
    auto-cluster (which feeds the remap). Wild-bootstrap behavior
    unchanged (already kept the auto-cluster).
  - LinearRegression.__init__: mirrors the remap for direct callers so
    the behavior is consistent across the library surface.

  All four LinearRegression call sites (DiD fit, MultiPeriod fit, TWFE
  two fit branches) drop the `robust=self.robust` forwarding when the
  remap could fire, since `robust=False + vcov_type="hc1"` would
  otherwise trip `resolve_vcov_type`'s conflict check. The resolved
  vcov_type becomes the single source of truth for the LR call.
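
The implicit-versus-explicit rule described in this fix can be sketched as a standalone function. The name `resolve_effective_vcov_type`, its signature, and the warning text are assumptions for illustration; the real helper is a method that takes `effective_cluster_ids`.

```python
import warnings


def resolve_effective_vcov_type(vcov_type, vcov_type_explicit, has_cluster):
    """Hypothetical helper mirroring the remap rule described above."""
    if vcov_type == "classical" and has_cluster:
        if vcov_type_explicit:
            # The user asked for classical on purpose: keep rejecting it.
            raise ValueError("classical SEs are one-way only")
        # Alias-derived classical (robust=False): preserve the old CR1 behavior.
        warnings.warn(
            "robust=False with a cluster structure: using CR1 (hc1) instead",
            UserWarning,
            stacklevel=2,
        )
        return "hc1"
    return vcov_type


# Usage: the alias-derived case is remapped and warns once.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    remapped = resolve_effective_vcov_type("classical", False, True)
```

Returning the effective family instead of writing it back onto the estimator keeps the configured value and the per-fit value distinct.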

P3 — dead pointer: force-add the paper review file.
  `docs/methodology/papers/dechaisemartin-2026-review.md` was gitignored
  by the `.gitignore:91` `papers/` pattern. ROADMAP.md:103 and
  REGISTRY.md:2122 referenced it, so the breadcrumb was dead. Force-added
  now, same treatment as the existing `rambachan-roth-2023-review.md`.

Tests: 7 new regression guards in TestFitBehavior covering
DifferenceInDifferences / MultiPeriodDiD / TwoWayFixedEffects /
LinearRegression `robust=False + cluster` round-trips, plus the explicit-
vs-implicit distinction (`test_explicit_classical_with_cluster_still_raises`
pins that deliberate classical + cluster still raises).

All 141 Phase 1a tests pass; 454 tests across estimators / survey / TWFE
methodology / Phase 1a neighbours pass (one flaky test-ordering failure
in `test_hc1_cluster_unchanged` passes standalone, unrelated to this fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 3c4a393732dc58df436b5f382ce76411df73fef1


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Prior methodology blockers around absorbed-FE HC2/Bell-McCaffrey support and MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") now look properly fail-fast and documented in the Methodology Registry/TODO.
  • One P1 remains: the legacy robust=False compatibility fix is incomplete on the public LinearRegression.fit(..., cluster_ids=...) override path. Constructor-time clustering is handled, but fit-time clustering still falls into the new classical + cluster_ids validator and errors.
  • The estimator-layer remap added to DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects is the right pattern; it just was not carried through to the direct LinearRegression.fit() entry point.
  • The self-referential CR2 golden JSON is still only a regression anchor, but that limitation is explicitly documented in REGISTRY.md/TODO.md, so it is informational rather than blocking.
  • Static review only: I could not run pytest here because pytest and import-time dependencies like pandas are unavailable in the sandbox.

Methodology

Affected methods: DifferenceInDifferences, MultiPeriodDiD, TwoWayFixedEffects, LinearRegression, and the shared HC2 / HC2+Bell-McCaffrey variance infrastructure.

  • Severity: P1. Impact: the PR’s backward-compatibility repair for the robust → vcov_type alias is still incomplete. The new estimator helpers correctly remap implicit robust=False + cluster to CR1/hc1, but the public LinearRegression API only performs that remap when cluster_ids is supplied to __init__. If a caller instead uses the documented fit-time override, LinearRegression(robust=False).fit(..., cluster_ids=...), self.vcov_type remains "classical" and solve_ols() hits the new "classical SEs are one-way only" validation path. That leaves a public clustered inference entry point regressed even though the PR positions robust=True/False as a backward-compat alias. Refs: diff_diff/linalg.py:2170, diff_diff/linalg.py:2252, diff_diff/linalg.py:2291, diff_diff/linalg.py:978. Concrete fix: give LinearRegression the same implicit-vs-explicit vcov_type tracking the estimator classes now use, and resolve the effective vcov_type after effective_cluster_ids is known inside fit(), not only in __init__.
  • Severity: P3. Impact: the methodology-side deviations I checked are now explicitly documented and therefore non-blocking: absorbed-FE HC2 / HC2+Bell-McCaffrey rejection, MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") rejection pending contrast-aware CR2 DOF, weighted BM NotImplementedError, and the still-self-referential CR2 golden file. Refs: docs/methodology/REGISTRY.md:166, docs/methodology/REGISTRY.md:2297, docs/methodology/REGISTRY.md:2300, TODO.md:81, TODO.md:84, benchmarks/data/clubsandwich_cr2_golden.json:477. Concrete fix: none for approval.

Code Quality

  • No additional unmitigated findings beyond the incomplete LinearRegression.fit(..., cluster_ids=...) propagation above.

Performance

  • No unmitigated findings. The O(n^2 k) CR2 Bell-McCaffrey DOF cost is already tracked in TODO.md:95. Concrete fix: none for approval.

Maintainability

  • No additional unmitigated findings.

Tech Debt

  • Severity: P3. Impact: the PR correctly tracks deferred follow-ups for weighted BM, absorbed-FE HC2/BM, the remaining standalone estimators that still do not expose vcov_type, and regeneration of the authoritative clubSandwich golden. Refs: TODO.md:80, TODO.md:81, TODO.md:83, TODO.md:84. Concrete fix: none for approval.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the new test file covers constructor-time cluster remapping, but it does not exercise the remaining broken public entry point, LinearRegression(robust=False).fit(..., cluster_ids=...), so the incomplete propagation escaped review. Refs: tests/test_estimators_vcov_type.py:1, diff_diff/linalg.py:2291. Concrete fix: add one direct regression test for fit-time cluster_ids override, not just constructor-time cluster_ids.

Path to Approval

  1. Move the implicit robust=False remap for LinearRegression from constructor-time only to fit-time resolution based on effective_cluster_ids, so LinearRegression(robust=False).fit(..., cluster_ids=...) preserves CR1/hc1 just like the estimator classes now do.
  2. Add a regression test that instantiates LinearRegression(robust=False) without constructor-time clusters, passes cluster_ids= to fit(), and asserts finite clustered inference instead of a "classical SEs are one-way only" failure.

Addresses CI AI review P1 on PR #327 head 3c4a393: the previous commit
only remapped `robust=False + cluster_ids=...` → `"hc1"` at
`LinearRegression.__init__`, so the documented
`LinearRegression(robust=False).fit(..., cluster_ids=...)` override path
still fell into `classical + cluster_ids` validation and errored.

Fix: track `_vcov_type_explicit` at __init__; relocate the remap to
`fit()`, where we already compute `effective_cluster_ids` (the union of
constructor-time and fit-time cluster context). Both entry points now
preserve CR1 behavior identically. Users who want non-robust SEs can
still pass `vcov_type="classical"` explicitly (and no cluster).

Tests: add `test_linear_regression_robust_false_fit_time_cluster_preserves_cr1`
for the fit-time override path. Existing constructor-time test retained.

All 143 Phase 1a tests pass; 313 tests in estimators / survey /
methodology TWFE regression pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 254660df7a413d9adae8460588f443507ab9390a


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previous blocker around LinearRegression.fit(..., cluster_ids=...) looks addressed on the direct first-fit path.
  • The methodology-side deviations I checked are documented in the Registry/TODO and are non-blocking: absorbed-FE HC2/Bell-McCaffrey rejection, MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") rejection, weighted Bell-McCaffrey NotImplementedError, and the self-reference CR2 golden file.
  • One new P1 remains: the PR does not preserve the distinction between the user-configured variance family and the fit-time effective variance family.
  • That shows up in two ways: get_params() serializes alias-derived "classical" as if it were explicit, and LinearRegression.fit() remaps by mutating instance state. Both can select the wrong SE family after clone/reuse.
  • The added tests cover first-fit remap behavior, but not clone/re-fit state preservation.

Methodology

  • Severity: P1. Impact: the new backward-compatibility logic for robust/vcov_type is not stable outside the first fit of the original object. get_params() in diff_diff/estimators.py:L776 returns the resolved self.vcov_type, so an estimator created with implicit robust=False state will clone as explicit vcov_type="classical" and lose the clustered remap on the next fit. Separately, LinearRegression.fit() in diff_diff/linalg.py:L2291 applies the legacy clustered remap by changing self.vcov_type on the instance, so a later unclustered fit can silently use HC1 instead of classical SEs. That is a wrong-variance-family bug, not just a cosmetic state mismatch. Concrete fix: keep raw user input and fit-time effective vcov_type separate; have get_params() return the raw constructor value (None when alias-derived); never mutate self.vcov_type inside fit().
  • Severity: P3. Impact: the remaining methodology limitations I checked are explicitly documented and tracked: one-way/cluster weighted Bell-McCaffrey not implemented, HC2/CR2-BM rejected on absorbed-FE fits, MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") rejected pending contrast-aware CR2 DOF, and the committed CR2 JSON still being a self-reference anchor. See docs/methodology/REGISTRY.md:L164, docs/methodology/REGISTRY.md:L2293, and TODO.md:L80. Concrete fix: none for approval.

Code Quality

  • No additional unmitigated findings beyond the vcov-state bug above.

Performance

  • No unmitigated findings. The expensive CR2 BM DOF loop and repeated bread solves are already tracked in TODO.md:L93. Concrete fix: none for approval.

Maintainability

  • No additional unmitigated findings beyond the configured-vs-effective vcov_type state bug above.

Tech Debt

  • Severity: P3. Impact: deferred follow-ups are properly tracked for standalone-estimator vcov_type threading, weighted Bell-McCaffrey/CR2 support, absorbed-FE HC2/BM parity work, and regeneration of the authoritative clubSandwich golden file in TODO.md:L80. Concrete fix: none for approval.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the new tests cover the direct fit-time cluster override, but they do not pin the two state-preservation paths that now matter: get_params()/clone round-trips and repeated LinearRegression.fit() calls that alternate clustered and unclustered usage. That gap is why the P1 above can slip through despite the added suite. See tests/test_estimators_vcov_type.py:L88 and tests/test_estimators_vcov_type.py:L194. Concrete fix: add one clone round-trip test for DifferenceInDifferences(robust=False, cluster="unit"), and one repeat-fit test for LinearRegression(robust=False) that clusters once and then refits without clusters.

Path to Approval

  1. Preserve configured vs effective vcov_type state separately. Do not mutate self.vcov_type during fit(); compute a local _fit_vcov_type and, if needed, expose the actually-used family on a separate fitted/results attribute.
  2. Make get_params() round-trip the raw constructor input for vcov_type (None when alias-derived) so cloned estimators keep the same backward-compat remap behavior as the original object.
  3. Add regression tests for sklearn-style clone/get_params() round-trips and for repeated LinearRegression.fit() calls that switch between clustered and unclustered fits.
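
The configured-vs-effective split requested above can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchRegression` and `sketch_resolve` are hypothetical names, and the remap rule (implicit `robust=False` plus a cluster becomes `hc1`/CR1, explicit `"classical"` is left alone) is an assumption drawn from the review text.

```python
from typing import Optional, Sequence

_VALID = ("classical", "hc1", "hc2", "hc2_bm")


def sketch_resolve(robust: bool, vcov_type: Optional[str]) -> str:
    """Hypothetical resolver: an explicit value wins, else `robust` picks the alias."""
    if vcov_type is None:
        return "hc1" if robust else "classical"
    if vcov_type not in _VALID:
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    return vcov_type


class SketchRegression:
    """Hypothetical estimator that keeps configured and fit-time state apart."""

    def __init__(self, robust: bool = True, vcov_type: Optional[str] = None):
        self.robust = robust
        self._vcov_type_arg = vcov_type  # raw user input, may be None
        self.vcov_type = sketch_resolve(robust, vcov_type)  # configured, resolved

    def get_params(self) -> dict:
        # Return the raw argument so a clone keeps the implicit/explicit distinction.
        return {"robust": self.robust, "vcov_type": self._vcov_type_arg}

    def fit(self, cluster_ids: Optional[Sequence[int]] = None) -> "SketchRegression":
        fit_vcov_type = self.vcov_type  # local; never written back to self.vcov_type
        implicit_classical = self._vcov_type_arg is None and not self.robust
        if implicit_classical and cluster_ids is not None:
            fit_vcov_type = "hc1"  # assumed legacy remap: clustered fit uses CR1
        self._fit_vcov_type_ = fit_vcov_type  # fitted attribute only
        return self


est = SketchRegression(robust=False)
first = est.fit(cluster_ids=[0, 0, 1, 1])._fit_vcov_type_   # remapped for this fit
second = est.fit()._fit_vcov_type_                          # no stale remap
clone = SketchRegression(**est.get_params())
cloned = clone.fit(cluster_ids=[0, 0, 1, 1])._fit_vcov_type_
explicit = SketchRegression(robust=False, vcov_type="classical")
kept = explicit.fit(cluster_ids=[0, 0, 1, 1])._fit_vcov_type_
```

Because `fit()` only writes the trailing-underscore attribute, the second unclustered fit falls back to the configured family, and the clone built from `get_params()` behaves like the original.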

igerber and others added 2 commits April 19, 2026 08:47
Addresses the pattern underlying repeated CI review P1s on PR #327:
`fit()` was mutating configuration state (`self.vcov_type`, `self.weights`,
`self.weight_type`) to apply per-fit remaps (legacy alias, survey
canonicalization), which silently contaminated subsequent fits and broke
sklearn-style clone round-trips. This commit establishes a single
invariant across the whole inference surface:

  fit() is idempotent on configuration. It computes all effective
  fit-time values as locals, stores them on fitted attributes (`_` suffix),
  and never mutates the user-configured state.

LinearRegression changes:
  - `__init__` stores raw constructor `vcov_type` on `self._vcov_type_arg`
    alongside the resolved `self.vcov_type` and the existing
    `_vcov_type_explicit` flag.
  - `fit()` resolves `_fit_vcov_type`, `_fit_weights`, `_fit_weight_type`
    as locals at the top, based on:
      * effective cluster context (constructor OR fit-time override)
      * survey design canonicalization
      * legacy robust=False + cluster -> CR1 remap
    The configured fields on `self` are never written during fit.
  - The effective fit-time values are stored on fitted attributes
    `self._fit_vcov_type_`, `self._fit_weights_`, `self._fit_weight_type_`
    for downstream helpers (compute_deff). `compute_deff` now reads from
    those attrs (fallback to configured state for backward compat).
  - All ~15 read sites inside `fit()` switched from `self.X` to the
    corresponding `_fit_X` local.

DifferenceInDifferences (and inherited classes) changes:
  - `__init__` stores `self._vcov_type_arg` (raw, possibly None).
  - `get_params()` returns the raw arg so sklearn clones preserve the
    implicit-vs-explicit distinction (and therefore the backward-compat
    remap).
  - `set_params()` updates `_vcov_type_arg` and `_vcov_type_explicit`
    consistently: explicit `vcov_type=X` sets both; `robust=` alone
    clears to None / False.
  - The existing `_resolve_effective_vcov_type(effective_cluster_ids)`
    already returned a local; confirmed no site mutates self post-init.
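
A rough sketch of the `set_params()` bookkeeping described in the list above, under stated assumptions: `SketchDiD` and `_resolve` are hypothetical stand-ins, and the cross-validation of conflicting `robust`/`vcov_type` pairs that the real helper performs is omitted here.

```python
from typing import Optional


def _resolve(robust: bool, vcov_type: Optional[str]) -> str:
    # Hypothetical resolver; conflict validation is intentionally left out.
    if vcov_type is not None:
        return vcov_type
    return "hc1" if robust else "classical"


class SketchDiD:
    """Hypothetical estimator showing the raw-arg / explicit-flag bookkeeping."""

    def __init__(self, robust: bool = True, vcov_type: Optional[str] = None):
        self.robust = robust
        self._vcov_type_arg = vcov_type
        self._vcov_type_explicit = vcov_type is not None
        self.vcov_type = _resolve(robust, vcov_type)

    def get_params(self) -> dict:
        return {"robust": self.robust, "vcov_type": self._vcov_type_arg}

    def set_params(self, **params) -> "SketchDiD":
        unknown = set(params) - {"robust", "vcov_type"}
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        if "robust" in params:
            self.robust = params["robust"]
            if "vcov_type" not in params:
                self._vcov_type_arg = None  # robust= alone clears the raw arg
        if "vcov_type" in params:
            self._vcov_type_arg = params["vcov_type"]  # explicit value sets the raw arg
        # Both derived fields are recomputed together so they cannot drift apart.
        self._vcov_type_explicit = self._vcov_type_arg is not None
        self.vcov_type = _resolve(self.robust, self._vcov_type_arg)
        return self


est = SketchDiD(vcov_type="hc2")
est.set_params(robust=False)
after_alias = (est._vcov_type_arg, est._vcov_type_explicit, est.vcov_type)
est.set_params(vcov_type="hc2_bm")
after_explicit = (est._vcov_type_arg, est._vcov_type_explicit, est.vcov_type)
```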

Tests:
  - `test_get_params_round_trip_preserves_implicit_classical`: clone
    round-trip of `DifferenceInDifferences(robust=False, cluster="unit")`.
    Both orig and clone remap to CR1 at fit time (pinning that get_params
    returns None for alias path).
  - `test_get_params_round_trip_preserves_explicit_vcov_type`: round-trip
    for explicitly-set vcov_type.
  - `test_linear_regression_repeat_fit_clustered_then_unclustered`:
    repeat-fit idempotence — first fit with cluster remaps to hc1, second
    fit without cluster uses classical (not stale hc1 from prior fit).
  - Existing LinearRegression tests updated to assert `_fit_vcov_type_`
    (the fitted attr) is the remapped value, and `self.vcov_type`
    (configured) stays unchanged.
  - Survey test updated to assert `_fit_weights_` (fitted) is populated
    while `self.weights` (configured) stays at user's None.
  - `test_get_params_default_vcov_type` updated: default construction
    returns None for raw vcov_type, resolved is hc1.

Why this sets up Phase 1b+:
  Future additions (bandwidth selector, HeterogeneousAdoptionDiD class,
  vcov_type threading on the 8 standalone estimators, weighted BM DOF
  rework) all hit the same configured-vs-effective shape. The single
  invariant above is the place to hang them: each new remap becomes a
  local variable in fit(), never a write to self.

All 145 Phase 1a tests pass; 459 tests across estimators / survey /
methodology / Phase 1a neighbours pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 697571c4e1fb1012e4d42d486b7df7eb23b02f9d


Overall Assessment

✅ Looks good

Executive Summary

  • The prior P1 around configured vs effective vcov_type state appears resolved. The estimator layer now preserves the raw constructor value for clone round-trips, computes the fit-time effective family locally, and avoids mutating configured state during fit. See diff_diff/estimators.py:L157-L165, diff_diff/estimators.py:L417-L433, diff_diff/estimators.py:L793-L883, and diff_diff/linalg.py:L2343-L2480.
  • The re-review regression tests now directly cover the previously problematic clone/re-fit paths and the implicit-vs-explicit "classical" distinction. See tests/test_estimators_vcov_type.py:L202-L380.
  • The remaining methodology deviations I checked are explicitly documented and tracked, so they are non-blocking under the review policy: absorbed-FE HC2/Bell-McCaffrey rejection, MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") rejection, weighted Bell-McCaffrey NotImplementedError, and the self-reference CR2 golden file. See docs/methodology/REGISTRY.md:L2296-L2300 and TODO.md:L80-L84.
  • I did not find a new unmitigated P0/P1 in the changed methodology, inference, or parameter-threading code.
  • I could not execute the test suite here because pytest is not installed in this environment; this assessment is based on static review of the diff and the added tests.

Methodology

Code Quality

  • No findings.

Performance

  • Severity: P3. Impact: The expensive parts of the new HC2/CR2-Bell-McCaffrey implementation remain tracked, not unmitigated regressions: repeated bread inversion, lack of Rust HC2 support, and the naive O(n^2 k) CR2 DOF loop. Concrete fix: none for approval. See TODO.md:L93-L95.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: Remaining rollout gaps are explicitly tracked in TODO.md, especially threading vcov_type through the standalone estimators outside the DiD inheritance chain and regenerating the authoritative R CR2 golden file. Concrete fix: none for approval. See TODO.md:L80-L84.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: The added tests cover the re-review-specific regression cases that mattered here: clone round-trips, repeat-fit state preservation, explicit-vs-implicit "classical" handling, and summary labeling. Concrete fix: none. See tests/test_estimators_vcov_type.py:L202-L380 and tests/test_linalg_hc2_bm.py:L419-L570.
  • Residual risk: I could not run the suite locally because pytest is unavailable in this environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
CI on PR #327 failed on `test_hc1_cluster_unchanged` across macOS py3.11
and Linux-arm py3.11/3.13. Root cause: the test asserted
`assert_array_equal` on two `compute_robust_vcov` call paths that reach
the same math but accumulate sub-machine-epsilon ordering differences
(5e-18 on macOS, 1.2e-17 on Linux arm) — likely BLAS reduction
ordering depending on which validator branch runs first. Both failures
showed `Max absolute difference among violations: ~1e-17`, well below
float64 machine epsilon (~2e-16).

Fix: switch both tests in `TestHC1Unchanged` to
`np.testing.assert_allclose(..., atol=1e-14, rtol=1e-14)`. The tolerance
is about 3 orders of magnitude above the observed ~1e-17 differences
(and roughly 45x float64 machine epsilon), so the test still catches any
real regression in HC1/CR1 semantics while tolerating NumPy BLAS
reduction-order non-determinism across platforms.

Applies to:
  - TestHC1Unchanged.test_default_path_unchanged (one-way HC1)
  - TestHC1Unchanged.test_hc1_cluster_unchanged (CR1 cluster-robust)

Both tests pass locally in the combined suite (previously flaky on
cross-test ordering, which is the same symptom as the CI failure).
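
The failure mode can be reproduced in miniature with the standard library alone. The sketch below is an illustration, not the test from the PR: it uses `math.isclose` as a stand-in for `np.testing.assert_allclose` (the two combine absolute and relative tolerance slightly differently), and the three-term sum is a hypothetical example of reordered floating-point accumulation.

```python
import math
import sys

# Same three addends, two accumulation orders (floating-point addition
# is not associative, so the results can differ in the last bit).
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

exactly_equal = left_to_right == right_to_left   # bitwise comparison
gap = abs(left_to_right - right_to_left)         # size of the ordering artefact
eps = sys.float_info.epsilon                     # float64 machine epsilon

# Tolerance-based comparison with the same 1e-14 values used in the fix.
close_enough = math.isclose(
    left_to_right, right_to_left, rel_tol=1e-14, abs_tol=1e-14
)

# A genuine change of 1e-10 is still rejected at this tolerance.
real_regression_caught = not math.isclose(
    1.0, 1.0 + 1e-10, rel_tol=1e-14, abs_tol=1e-14
)
```

Exact equality fails on the reordered sum while the tolerance check passes, which mirrors why `assert_array_equal` was flaky across platforms and `assert_allclose` is not.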

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit d9aaf86 into main Apr 19, 2026
18 of 19 checks passed
@igerber igerber deleted the did-no-untreated branch April 19, 2026 14:42
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-
  ready output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
