Phase 1a: Kernel infrastructure + HC2/Bell-McCaffrey variance#327
Phase 1a: Kernel infrastructure + HC2/Bell-McCaffrey variance#327
Conversation
First of seven phased PRs implementing the HeterogeneousAdoptionDiD estimator from de Chaisemartin, Ciccia, D'Haultfoeuille & Knau (2026, arXiv:2405.04465v6). Ships the foundational RDD and small-sample variance infrastructure that Phases 1b, 1c, 2, 3 all compose. - diff_diff/local_linear.py (new): Epanechnikov, triangular, and uniform kernels on [0, 1] with closed-form moment constants matching numerical integration to 1e-12; univariate local-linear regression at a boundary via kernel-weighted OLS through solve_ols. - diff_diff/linalg.py: new vcov_type enum (classical, hc1, hc2, hc2_bm) with return_dof kwarg on compute_robust_vcov. HC2 one-way uses leverage-corrected meat with weighted-hat convention; HC2+Bell-McCaffrey one-way computes the Imbens-Kolesar (2016) Satterthwaite DOF per coefficient. CR2 Bell-McCaffrey cluster-robust uses symmetric matrix square root via eigendecomposition with Moore-Penrose pseudoinverse for singleton clusters and absorbed cluster fixed effects. Weighted cluster CR2 raises NotImplementedError (Phase 2+). Rust backend guards skip non-hc1 paths. - diff_diff/estimators.py: vcov_type threaded through DifferenceInDifferences (MultiPeriodDiD and TwoWayFixedEffects inherit via the base class). robust=True aliases vcov_type="hc1"; robust=False aliases "classical". Conflict detection at __init__. LinearRegression stores per-coefficient Bell-McCaffrey DOF and consumes it in get_inference. - diff_diff/results.py: DiDResults gains vcov_type and cluster_name fields; summary() prints a human-readable Variance family line. - benchmarks: R clubSandwich parity script plus JSON anchor (python_self_reference until R is run) for CR2 BM parity tests. - Tests: three new focused suites (test_local_linear.py, test_linalg_hc2_bm.py, test_estimators_vcov_type.py, 104 new tests total). All 145 existing estimator tests plus 97 existing linalg tests pass with no regressions. - Docs: REGISTRY.md HeterogeneousAdoptionDiD section with Phase 1a requirements checklist; ROADMAP.md entry updated with status line; TODO.md deferrals for weighted CR2, standalone-estimator threading, bread_inv perf kwarg, Rust HC2 backend, scores-based DOF. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- diff_diff/linalg.py: fix compute_robust_vcov docstring to reflect that vcov_type="hc2_bm" supports both one-way and CR2 cluster-robust paths (the earlier "queued as a follow-up" language was stale). Extract resolve_vcov_type(robust, vcov_type) as the single source of truth for alias resolution and conflict detection; DifferenceInDifferences and LinearRegression both consume it. - diff_diff/estimators.py: DifferenceInDifferences.set_params re-validates the robust/vcov_type pair via resolve_vcov_type after mutation so invalid combinations (e.g. robust=False + vcov_type="hc2") raise instead of leaving the estimator in an inconsistent state. - diff_diff/local_linear.py: local_linear_fit now validates d/y/weights for NaN and Inf at the API boundary, returning targeted ValueErrors rather than relying on downstream solve_ols failures. Removed a stale inline comment about missing solve_ols overload stubs (the stubs now include weights/weight_type). - docs/methodology/REGISTRY.md: reframe the CR2 golden-JSON checkbox so it accurately reflects that the committed JSON is a python_self_reference stability anchor until the R script is run; authoritative clubSandwich regeneration is tracked in TODO.md. - Tests: set_params conflict tests (robust=False + vcov_type="hc2" raises; robust=True restores hc1; invalid vcov_type rejected) and local_linear_fit NaN/Inf validation tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Class-level docstrings now fully describe the vcov_type enum (classical, hc1, hc2, hc2_bm) on DifferenceInDifferences and MultiPeriodDiD, and clarify that robust is a legacy alias. Renamed test_set_params_rejects_conflict_on_robust_only to test_set_params_robust_only_rederives_vcov_type so the name matches the asserted behavior (robust-only mutation re-derives vcov_type from the alias rather than raising). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment
Executive Summary
Methodology The new low-level math is directionally aligned with the cited source material: the HAD paper uses local-linear boundary regression plus Calonico-style bias-corrected inference for the nonparametric branch, and for the small-
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
CI review caught that Phase 1a wired vcov_type into DifferenceInDifferences __init__/get_params but not into the overridden fit() paths on MultiPeriodDiD and TwoWayFixedEffects, so `vcov_type="hc2_bm"` on either silently produced HC1 inference. Summary output also mislabeled wild-bootstrap inference with the analytical variance family. - diff_diff/estimators.py MultiPeriodDiD.fit: pass vcov_type=self.vcov_type into the analytical solve_ols call; remove the `not self.robust` homoskedastic fallback (subsumed by compute_robust_vcov's classical branch). When vcov_type="hc2_bm" and no survey design, compute Bell-McCaffrey Satterthwaite DOF via _compute_bm_dof_from_contrasts for both per-coefficient period effects AND the post-period-average contrast; fall back to the shared analytical df otherwise. Store vcov_type and cluster_name on MultiPeriodDiDResults. - diff_diff/twfe.py: forward self.robust and self.vcov_type into the two LinearRegression instantiations; store vcov_type and the TWFE auto- cluster label (or explicit self.cluster) on DiDResults. - diff_diff/linalg.py: split _compute_bm_dof_oneway into a contrast-aware helper _compute_bm_dof_from_contrasts(X, bread, h_diag, contrasts) so MultiPeriodDiD can request BM DOF for the avg_att linear combination. The per-coefficient wrapper now delegates to the shared helper with contrasts=I_k. - diff_diff/results.py DiDResults.summary and MultiPeriodDiDResults: gate the Variance family label on inference_method == "analytical" so wild-bootstrap output is no longer mislabeled; add vcov_type, cluster_name, inference_method, n_bootstrap, n_clusters fields to MultiPeriodDiDResults for symmetry with DiDResults and to drive the summary label. - tests/test_estimators_vcov_type.py: add five end-to-end tests exercising the previously-untested paths - MultiPeriodDiD classical vs hc1 SE differ; MultiPeriodDiD hc2_bm CI is finite; TWFE hc1 vs hc2_bm SE differ (CR1 vs CR2); TWFE records the unit auto-cluster label in summary; wild-bootstrap with cluster suppresses the Variance line. All 209 Phase 1a suites plus 145 estimator regression tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker. Highest unmitigated severity is P0: unsupported Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…2_bm CI re-review flagged two unmitigated issues on top of the Phase 1a diff. P0 - validation bypass: the `vcov_type`/`cluster`/`weights` raise logic lived only in the public `compute_robust_vcov()` wrapper. `solve_ols` and `_solve_ols_numpy` called `_compute_robust_vcov_numpy` directly and reached the dispatch table unvalidated, so `cluster + classical`, `cluster + hc2`, and `cluster + weights + hc2_bm` silently produced one-way SEs or a hybrid weighted-CR2 result instead of raising. Extract the checks into a shared `_validate_vcov_args()` helper and call it from both entry points so the raise is universal. P1 - MultiPeriodDiD cluster + hc2_bm: when `cluster_ids` is set, vcov comes from `_compute_cr2_bm` (CR2 cluster-robust) but the new per-period and post-average DOF block was still using `_compute_bm_dof_from_contrasts`, which builds the one-way residual-maker and ignores clusters. Pairing CR2 SEs with one-way BM DOF is a broken hybrid. A contrast-aware CR2 BM DOF helper is real work (Pustejovsky-Tipton per-cluster adjustment matrices applied to arbitrary contrast vectors) and not in Phase 1a scope. For now, reject the combination in `MultiPeriodDiD.fit` with a clear error that points to the workarounds: drop `cluster` for one-way HC2+BM, or drop `vcov_type="hc2_bm"` for CR1 (Liang-Zeger) cluster-robust. Track the full implementation as follow-up. Tests: four new negative-path tests exercising the previously-bypassed paths (solve_ols rejects cluster+classical, cluster+hc2, cluster+weights+hc2_bm; LinearRegression rejects cluster+hc2), plus a MultiPeriodDiD cluster+hc2_bm rejection test. All 299 Phase 1a + estimator regression tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Executive Summary
Methodology
Code Quality No separate unmitigated findings beyond the methodology/reporting issues above. Performance No unmitigated findings. The main new costs ( Maintainability No separate unmitigated findings beyond the public-contract mismatch already noted. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
|
…ov label Addresses CI AI review on PR #327 (head 7ba6d58): - P1: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") now has a matching Note in docs/methodology/REGISTRY.md (both under MultiPeriodDiD standard-errors block and under the HeterogeneousAdoptionDiD Phase 1a requirements checklist) plus an explicit call-out in the MultiPeriodDiD docstring. Also clarifies the DifferenceInDifferences docstring that the limitation only applies to the multi-period subclass, since the scalar-coefficient base class handles cluster + CR2 Bell-McCaffrey correctly. - P2: DiDResults.summary() and MultiPeriodDiDResults.summary() now suppress the analytical "Variance:" line when survey_metadata is present. Survey fits use Taylor linearization or replicate-weight variance, not the analytical HC/CR sandwich, so printing "HC1"/"CR2 Bell-McCaffrey" alongside survey-produced SEs was misleading. The survey design block already surfaces the actual inference source (weight type, strata/PSU counts, replicate method), so dropping the parallel label is the cleanest fix. - P2-Tests: Four new tests in TestSummarySurveyLabeling pin the survey-fit suppression in both the Taylor-linearization path (SurveyDesign with PSU/strata) and the replicate-weight path (BRR), for both DiDResults and MultiPeriodDiDResults. A regression guard asserts the non-survey path still prints the analytical label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No separate unmitigated findings. Performance No separate unmitigated findings. The new CR2/BM performance debt is already tracked in Maintainability
Tech Debt
Security No findings. Documentation/Tests No separate unmitigated findings beyond the issues above. The prior documentation blocker for Path to Approval
|
…_params Addresses CI AI review on PR #327 head dca1fce: P1 — TWFE auto-cluster vs. one-way vcov families: TwoWayFixedEffects.fit() forced cluster_var = unit when self.cluster was None, which collided with the validator: vcov_type in {"classical", "hc2"} is one-way-only and raises ValueError when cluster_ids is non-None. The inheritance surface advertised those families as usable on TWFE but they were not. Fix: when cluster is None AND vcov_type is a one-way family, drop the auto-cluster. The explicit choice of a one-way family wins over the TWFE default. cluster_name in DiDResults is now None on that path so summary() labels the one-way family (not "CR1 cluster-robust at unit"). Docstring, REGISTRY would be the next doc pass if we ever hit a follow-up edge, but the TWFE docstring already documents the exception. P1 — Weighted one-way hc2_bm silent math mismatch: _compute_bm_dof_from_contrasts builds its hat matrix from the unscaled design as X (X'WX)^{-1} X' W, but solve_ols solves weighted regression by transforming to X* = sqrt(w) X, y* = sqrt(w) y. The symmetric-idempotent residual maker M* = I - H* with H* = sqrt(W) X (X'WX)^{-1} X' sqrt(W) is the correct one for the Satterthwaite (trG)^2 / tr(G^2) ratio; the asymmetric X (X'WX)^{-1} X' W is neither transformed nor original-scale. Rather than ship silently-inconsistent small-sample p-values, extend the existing weighted-cluster CR2 deferral to cover weighted one-way as well: _validate_vcov_args now raises NotImplementedError for vcov_type="hc2_bm" + weights (with OR without cluster). Tracked in TODO.md under Methodology/Correctness (rederive on transformed design + add weighted parity tests). P2 — set_params atomic validation: Previously set_params applied all setattr mutations BEFORE re-validating the robust/vcov_type pair. A failing call left the estimator in the half-configured state the alias/conflict check is designed to prevent, defeating callers that catch ValueError and keep using the object. Fix: validate unknown-key rejection + resolve_vcov_type on locals first, then apply mutations atomically. Tests: - TestFitBehavior.test_twfe_honors_classical_without_autocluster + test_twfe_honors_robust_false_without_autocluster + test_twfe_honors_hc2_one_way: all three one-way entry points now succeed on TWFE (and cluster_name is None). - TestFitBehavior.test_twfe_explicit_cluster_still_clusters_under_hc2_bm: regression guard that explicit cluster= keeps the auto-bypass off. - TestHC2BMCluster.test_hc2_bm_weighted_one_way_not_implemented: locks the NotImplementedError at both public and internal entry points. - TestParamsRoundTrip.test_set_params_conflict_leaves_estimator_unchanged + test_set_params_unknown_key_leaves_estimator_unchanged: atomicity regression guards. All 133 Phase 1a tests pass; 405 tests across estimators / survey / Phase 1a neighbours pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…luster Addresses CI AI review on PR #327 head 6836836 (⛔ Blocker). P0 — HC2/CR2-BM applied to demeaned design produces wrong hat matrix: TWFE unconditionally demeans via within-transformation, and both DifferenceInDifferences(absorb=...) and MultiPeriodDiD(absorb=...) do the same before solving OLS on the reduced design. The HC2 leverage correction `h_ii = x_i' (X'X)^{-1} x_i` and the CR2 Bell-McCaffrey block adjustment `A_g = (I - H_gg)^{-1/2}` both depend on the FULL FE hat matrix, not the residualized one. FWL preserves coefficients and residuals but not the hat matrix, so applying HC2/CR2-BM to the demeaned regressors silently mis-states small-sample SEs and Satterthwaite DOF. Short-term fix: raise NotImplementedError in three places — - TwoWayFixedEffects.fit() unconditionally for vcov_type in {hc2, hc2_bm} - DifferenceInDifferences.fit() with absorb= and vcov_type in {hc2, hc2_bm} - MultiPeriodDiD.fit() with absorb= and vcov_type in {hc2, hc2_bm} HC1 and CR1 are unaffected (no leverage term; meat uses only the residuals, which FWL preserves). Workarounds documented in the error message: use vcov_type='hc1' with absorb=/TWFE, or switch to fixed_effects= dummies for a full-dummy design where the hat matrix is computed on the full projection. Lifting the guard requires computing HC2/CR2-BM from the full absorbed projection and validating against a full-dummy OLS or fixest/clubSandwich reference. Tracked in TODO.md. REGISTRY.md gets a matching Note under the Phase 1a checklist. P1 — TWFE wild_bootstrap + one-way family dropped the auto-cluster: The prior commit's one-way-family auto-cluster bypass in TWFE (classical/hc2, cluster=None → cluster_var=None) applied even when inference="wild_bootstrap". That silently dropped the unit cluster the bootstrap path needed to resample residuals. Fix: gate the bypass on inference=="analytical", so wild-bootstrap fits keep the unit auto-cluster. Since hc2/hc2_bm now raise earlier, only "classical" reaches the bypass branch; cleaned up accordingly. Tests: - test_twfe_rejects_hc2_and_hc2_bm: both combinations raise with the expected message. - test_did_absorb_rejects_hc2_and_hc2_bm: absorb= + hc2/hc2_bm rejected. - test_did_fixed_effects_dummies_still_accept_hc2_and_hc2_bm: dummy expansion path is unaffected (regression guard). - test_multi_period_absorb_rejects_hc2_and_hc2_bm: MultiPeriodDiD absorb= + hc2/hc2_bm rejected. - test_twfe_wild_bootstrap_preserves_auto_cluster: classical + wild_bootstrap + cluster=None keeps the unit auto-cluster (n_clusters == n_units). Removed/replaced: test_twfe_fit_honors_vcov_type (tested HC2+BM on TWFE), test_twfe_honors_hc2_one_way, test_twfe_explicit_cluster_still_clusters_under_hc2_bm — those paths now raise, so their replacements are the negative-path tests. All 135 Phase 1a tests pass; 448 tests across estimators / survey / TWFE methodology / Phase 1a neighbours pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Addresses the P3 docs gap flagged by CI AI review on PR #327 head e113549: both public helpers accept vcov_type but the parameter docs didn't list it or its unsupported combinations. - solve_ols: Parameters block now lists vcov_type with the four enum values and notes ``cluster_ids + {classical, hc2}`` and weighted hc2_bm raise. - LinearRegression.__init__: same threading plus a note that the class stores ``self._bm_dof`` and threads it into get_inference. No behavior change; purely docstring updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology Affected methods in this PR:
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Static-review note: I could not run Path to Approval
|
Addresses CI AI review on PR #327 head 90a93c9: P1 — legacy alias broke clustered calls: The new `robust=False → vcov_type="classical"` alias was too eager. Clustered calls like `DifferenceInDifferences(robust=False, cluster="unit")` (and the TWFE/MultiPeriod/LinearRegression equivalents) used to produce CR1 cluster-robust SEs — the cluster structure silently overrode the non-robust flag. Phase 1a made them fail validation (classical is one-way only). Fix: track `_vcov_type_explicit` at __init__/set_params. At fit time, a new `_resolve_effective_vcov_type(effective_cluster_ids)` remaps implicit `"classical"` to `"hc1"` when a cluster structure is present, preserving CR1 behavior and emitting a UserWarning. Explicit `vcov_type="classical"` + cluster still raises (user made the choice deliberately). - DifferenceInDifferences.fit: remap at solve site; report remapped type on the result. - MultiPeriodDiD.fit: same pattern, both analytical and absorb paths. - TwoWayFixedEffects.fit: same pattern + the auto-cluster bypass now gates on `_vcov_type_explicit` so implicit classical keeps the unit auto-cluster (which feeds the remap). Wild-bootstrap behavior unchanged (already kept the auto-cluster). - LinearRegression.__init__: mirrors the remap for direct callers so the behavior is consistent across the library surface. All four LinearRegression call sites (DiD fit, MultiPeriod fit, TWFE two fit branches) drop the `robust=self.robust` forwarding when the remap could fire, since `robust=False + vcov_type="hc1"` would otherwise trip `resolve_vcov_type`'s conflict check. The resolved vcov_type becomes the single source of truth for the LR call. P3 — dead pointer: force-add the paper review file. `docs/methodology/papers/dechaisemartin-2026-review.md` was gitignored by the `.gitignore:91` `papers/` pattern. ROADMAP.md:103 and REGISTRY.md:2122 referenced it, so the breadcrumb was dead. Force-added now, same treatment as the existing `rambachan-roth-2023-review.md`. Tests: 7 new regression guards in TestFitBehavior covering DifferenceInDifferences / MultiPeriodDiD / TwoWayFixedEffects / LinearRegression `robust=False + cluster` round-trips, plus the explicit- vs-implicit distinction (`test_explicit_classical_with_cluster_still_raises` pins that deliberate classical + cluster still raises). All 141 Phase 1a tests pass; 454 tests across estimators / survey / TWFE methodology / Phase 1a neighbours pass (one flaky test-ordering failure in `test_hc1_cluster_unchanged` passes standalone, unrelated to this fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology Affected methods:
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Addresses CI AI review P1 on PR #327 head 3c4a393: the previous commit only remapped `robust=False + cluster_ids=...` → `"hc1"` at `LinearRegression.__init__`, so the documented `LinearRegression(robust=False).fit(..., cluster_ids=...)` override path still fell into `classical + cluster_ids` validation and errored. Fix: track `_vcov_type_explicit` at __init__; relocate the remap to `fit()`, where we already compute `effective_cluster_ids` (the union of constructor-time and fit-time cluster context). Both entry points now preserve CR1 behavior identically. Users who want non-robust SEs can still pass `vcov_type="classical"` explicitly (and no cluster). Tests: add `test_linear_regression_robust_false_fit_time_cluster_preserves_cr1` for the fit-time override path. Existing constructor-time test retained. All 143 Phase 1a tests pass; 313 tests in estimators / survey / methodology TWFE regression pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Addresses the pattern underlying repeated CI review P1s on PR #327: `fit()` was mutating configuration state (`self.vcov_type`, `self.weights`, `self.weight_type`) to apply per-fit remaps (legacy alias, survey canonicalization), which silently contaminated subsequent fits and broke sklearn-style clone round-trips. This commit establishes a single invariant across the whole inference surface: fit() is idempotent on configuration. It computes all effective fit-time values as locals, stores them on fitted attributes (`_` suffix), and never mutates the user-configured state. LinearRegression changes: - `__init__` stores raw constructor `vcov_type` on `self._vcov_type_arg` alongside the resolved `self.vcov_type` and the existing `_vcov_type_explicit` flag. - `fit()` resolves `_fit_vcov_type`, `_fit_weights`, `_fit_weight_type` as locals at the top, based on: * effective cluster context (constructor OR fit-time override) * survey design canonicalization * legacy robust=False + cluster -> CR1 remap The configured fields on `self` are never written during fit. - The effective fit-time values are stored on fitted attributes `self._fit_vcov_type_`, `self._fit_weights_`, `self._fit_weight_type_` for downstream helpers (compute_deff). `compute_deff` now reads from those attrs (fallback to configured state for backward compat). - All ~15 read sites inside `fit()` switched from `self.X` to the corresponding `_fit_X` local. DifferenceInDifferences (and inherited classes) changes: - `__init__` stores `self._vcov_type_arg` (raw, possibly None). - `get_params()` returns the raw arg so sklearn clones preserve the implicit-vs-explicit distinction (and therefore the backward-compat remap). - `set_params()` updates `_vcov_type_arg` and `_vcov_type_explicit` consistently: explicit `vcov_type=X` sets both; `robust=` alone clears to None / False. - The existing `_resolve_effective_vcov_type(effective_cluster_ids)` already returned a local; confirmed no site mutates self post-init. Tests: - `test_get_params_round_trip_preserves_implicit_classical`: clone round-trip of `DifferenceInDifferences(robust=False, cluster="unit")`. Both orig and clone remap to CR1 at fit time (pinning that get_params returns None for alias path). - `test_get_params_round_trip_preserves_explicit_vcov_type`: round-trip for explicitly-set vcov_type. - `test_linear_regression_repeat_fit_clustered_then_unclustered`: repeat-fit idempotence — first fit with cluster remaps to hc1, second fit without cluster uses classical (not stale hc1 from prior fit). - Existing LinearRegression tests updated to assert `_fit_vcov_type_` (the fitted attr) is the remapped value, and `self.vcov_type` (configured) stays unchanged. - Survey test updated to assert `_fit_weights_` (fitted) is populated while `self.weights` (configured) stays at user's None. - `test_get_params_default_vcov_type` updated: default construction returns None for raw vcov_type, resolved is hc1. Why this sets up Phase 1b+: Future additions (bandwidth selector, HeterogeneousAdoptionDiD class, vcov_type threading on the 8 standalone estimators, weighted BM DOF rework) all hit the same configured-vs-effective shape. The single invariant above is the place to hang them: each new remap becomes a local variable in fit(), never a write to self. All 145 Phase 1a tests pass; 459 tests across estimators / survey / methodology / Phase 1a neighbours pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # diff_diff/results.py
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
CI on PR #327 failed on `test_hc1_cluster_unchanged` across macOS py3.11 and Linux-arm py3.11/3.13. Root cause: the test asserted `assert_array_equal` on two `compute_robust_vcov` call paths that reach the same math but accumulate sub-machine-epsilon ordering differences (5e-18 on macOS, 1.2e-17 on Linux arm) — likely BLAS reduction ordering depending on which validator branch runs first. Both failures showed `Max absolute difference among violations: ~1e-17`, well below float64 machine epsilon (~2e-16). Fix: switch both tests in `TestHC1Unchanged` to `np.testing.assert_allclose(..., atol=1e-14, rtol=1e-14)`. The tolerance is 3 orders of magnitude tighter than machine epsilon so the test still catches any real regression in HC1/CR1 semantics while tolerating Numpy BLAS reduction-order non-determinism across platforms. Applies to: - TestHC1Unchanged.test_default_path_unchanged (one-way HC1) - TestHC1Unchanged.test_hc1_cluster_unchanged (CR1 cluster-robust) Both tests pass locally in the combined suite (previously flaky on cross-test ordering, which is the same symptom as the CI failure). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
HeterogeneousAdoptionDiD(de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026, arXiv:2405.04465v6).diff_diff/local_linear.py: Epanechnikov / triangular / uniform kernels on[0, 1]with closed-form moment constants verified to 1e-12 vs numerical integration; univariate local-linear regression at a boundary via kernel-weighted OLS.diff_diff/linalg.py: newvcov_typeenum (classical/hc1/hc2/hc2_bm) withreturn_dofkwarg oncompute_robust_vcov. HC2 one-way uses leverage-corrected meat with weighted-hat convention; HC2+Bell-McCaffrey one-way computes Imbens-Kolesar (2016) per-coefficient Satterthwaite DOF. CR2 Bell-McCaffrey cluster-robust uses symmetric matrix square root via eigendecomposition with Moore-Penrose pseudoinverse for singleton clusters and absorbed cluster fixed effects. Weighted cluster CR2 raisesNotImplementedError(Phase 2+).DifferenceInDifferences(and by inheritanceMultiPeriodDiD,TwoWayFixedEffects) grows avcov_typeparameter withrobust=True/Falseas backward-compat aliases.set_paramsre-validates therobust/vcov_typepair via the newresolve_vcov_type()helper.DiDResults.summary()prints a human-readableVariance:line.Methodology references (required if estimator / math changes)
benchmarks/data/clubsandwich_cr2_golden.jsonhas"source": "python_self_reference"as a regression anchor untilbenchmarks/R/generate_clubsandwich_golden.Ris run against RclubSandwich. Documented with a Note: indocs/methodology/REGISTRY.mdand tracked inTODO.md.NotImplementedError); the paper's applications don't use weighted cluster-robust and this is Phase 2+ per plan. Tracked inTODO.md.Validation
tests/test_local_linear.py(57 tests): kernel closed-form moments,local_linear_fitparity vs manual WLS, NaN/Inf input validation, error paths.tests/test_linalg_hc2_bm.py(31 tests): HC2 hand-formula parity, Bell-McCaffrey DOF Satterthwaite derivation, CR2 adjustment matrix edge cases (singleton, rank-deficient, identity), CR2 parity harness withclubSandwichgolden JSON, regression anchors for the unchanged HC1/CR1 paths.tests/test_estimators_vcov_type.py(22 tests):robust⇔vcov_typealias resolution,set_paramsconflict detection and re-derivation,MultiPeriodDiD/TwoWayFixedEffectsinheritance, wild-bootstrap coexistence, summary variance-family label.tests/test_linalg.py(97) andtests/test_estimators.py(145) pass with no regressions.Security / privacy
Pre-merge state: Local AI review (round 2) → ✅ looks good, two P1 blockers from round 1 resolved, all P2 items addressed or documented. Paper review file at
docs/methodology/papers/dechaisemartin-2026-review.mdis gitignored (per the existingpapers/pattern at.gitignore:91) so it is not in the PR; the Phase 0 REGISTRY stub in this PR supersedes it as the authoritative reference.🤖 Generated with Claude Code