Phase 2a: HeterogeneousAdoptionDiD class (single-period, 3 design paths) #346
Ships the user-facing HAD estimator with three design-dispatch paths:
- continuous_at_zero (Design 1'): bias-corrected local-linear at boundary=0,
rescaled to beta-scale via Equation 8 divisor D_bar = (1/G) * sum(D_{g,2}).
- continuous_near_d_lower (Design 1): same path with regressor shift
D' = D - d_lower and boundary = float(d.min()).
- mass_point (Design 1 Section 3.2.4): sample-average 2SLS with instrument
1{D_{g,2} > d_lower}. Point estimate is the Wald-IV ratio; SE is the
structural-residual 2SLS sandwich [Z'X]^-1 Omega [Z'X]^-T for classical,
hc1, and CR1 (cluster-robust). hc2/hc2_bm raise NotImplementedError
pending a 2SLS-specific leverage derivation.
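The mass-point point estimate can be illustrated as a plain Wald-IV ratio on a toy sample (a sketch with assumed names and an assumed DGP; the real `_fit_mass_point_2sls` additionally builds the structural-residual sandwich SE):

```python
import numpy as np

def wald_iv_att(dy: np.ndarray, d: np.ndarray, d_lower: float) -> float:
    """Wald-IV ratio with instrument Z = 1{D > d_lower} (illustrative sketch).

    att = (E[dY | Z=1] - E[dY | Z=0]) / (E[D | Z=1] - E[D | Z=0])
    """
    z = d > d_lower
    num = dy[z].mean() - dy[~z].mean()
    den = d[z].mean() - d[~z].mean()
    return num / den

rng = np.random.default_rng(0)
# ~30% of units sit exactly at the lower-support mass point d = 1.0:
d = np.where(rng.random(1000) < 0.3, 1.0, 1.0 + rng.random(1000))
dy = 0.5 * d + rng.normal(0, 0.01, 1000)   # true dose-response slope 0.5
att = wald_iv_att(dy, d, d_lower=1.0)
```

On this DGP the ratio recovers the true slope up to sampling noise.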
design="auto" resolves via the REGISTRY auto-detect rule (d.min() < 0.01 *
median(|d|) -> continuous_at_zero; modal fraction > 2% -> mass_point; else
continuous_near_d_lower), with d.min() == 0 as an unconditional tie-break.
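A minimal sketch of that auto-detect rule (function name and structure are assumptions mirroring the description above; the real `_detect_design` may differ in details):

```python
import numpy as np

def detect_design(d: np.ndarray) -> str:
    """Sketch of the REGISTRY auto-detect rule described above."""
    d_min = float(d.min())
    if d_min == 0.0:                       # unconditional tie-break
        return "continuous_at_zero"
    if d_min < 0.01 * np.median(np.abs(d)):
        return "continuous_at_zero"
    modal_fraction = np.mean(d == d_min)   # share of units at the support min
    if modal_fraction > 0.02:
        return "mass_point"
    return "continuous_near_d_lower"
```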
Panel validator enforces balanced two-period panel, D_{g,1} = 0 for all
units, no NaN in key columns. >2 periods raises pointing to Phase 2b.
aggregate="event_study" (Appendix B.2 multi-period) raises
NotImplementedError; survey= and weights= also raise (queued for follow-up
survey-integration PR).
All inference fields (att, se, t_stat, p_value, conf_int) routed through
safe_inference() for NaN-safe CI gating. Raw design kept on self so
get_params()/clone() round-trip preserves "auto" even after fit.
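The NaN-safe gating can be pictured as follows (a sketch with an assumed signature, built only on the stdlib; the library's `safe_inference` may differ):

```python
import math
from statistics import NormalDist

def safe_inference(att: float, se: float, alpha: float = 0.05):
    """Return (t_stat, p_value, conf_int), all-NaN when att/se are degenerate."""
    if not (math.isfinite(att) and math.isfinite(se)) or se <= 0:
        nan = float("nan")
        return nan, nan, (nan, nan)        # downstream fields go NaN together
    t = att / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t, p, (att - z * se, att + z * se)
```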
105 new tests cover the 12 plan commit criteria: three design paths finite,
design="auto" correctness + edge cases, beta-scale rescaling at atol=1e-14,
mass-point Wald-IV at atol=1e-14, mass-point sandwich SE parity at
atol=1e-12 (classical/hc1/CR1), NotImplementedError contracts, panel
contract violations, NaN propagation, sklearn clone round-trip,
get_params signature enumeration.
REGISTRY.md ticks four Phase 2a checkboxes and adds a Note block
documenting the structural-residual 2SLS sandwich choice. TODO.md queues
Phase 2b (multi-period), survey integration, weights, hc2/hc2_bm
leverage derivation, pre-test diagnostics (Phase 3), Pierce-Schott
replication (Phase 4), and tutorial integration (Phase 5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paper Section 3.2.4 defines the Design 1 mass-point estimator with
instrument Z = 1{D_{g,2} > d_lower} at d_lower = float(d.min()) (the
lower-support mass point). Phase 2a previously accepted user-supplied
d_lower overrides on the mass-point path without validation, which would
silently redefine the instrument/control split and identify a different
(LATE-like) estimand outside Phase 2a's documented scope.
Changes:
- diff_diff/had.py: fit() now raises ValueError on the mass-point path
when an explicit d_lower differs from float(d.min()) beyond float
tolerance. d_lower=None (auto-resolve) and d_lower == d.min() (within
tolerance) remain supported. Continuous paths are unaffected - they
already reject off-support d_lower via Phase 1c's _validate_had_inputs
(negative-dose check after the regressor shift).
- tests/test_had.py: new TestMassPointDLowerContract class adds 6
rejection/acceptance tests for the contract. The prior
test_force_mass_point_on_continuous_data test is renamed and clarified
to document that d_lower=0.0 matches d.min()==0.0 in that DGP. The
stale all-above-d_lower NaN-propagation test is removed (the input is
now unreachable from the public API; the helper-level NaN guard is
still tested directly via test_helper_returns_nan_on_empty_z_zero).
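The within-tolerance equality contract can be sketched as a small guard (helper name and tolerance values are assumptions, not the library's actual choices):

```python
import math

def check_d_lower(d_lower, d_min: float):
    """Accept None or d_lower == d.min() within float tolerance (sketch)."""
    if d_lower is None:
        return d_min                      # auto-resolve to the support minimum
    if not math.isclose(d_lower, d_min, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError(
            f"mass_point requires d_lower == float(d.min()) (= {d_min}); "
            f"got {d_lower}."
        )
    return d_min                          # use the exact support minimum
```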
Smaller fixes:
- _validate_had_panel: first_treat_col comment no longer overstates the
cross-validation (Phase 2a does value-domain only, not dose
consistency).
- _fit_continuous dispatch: "Warn once" comment revised to "Warn"
(per-fit warnings are not suppressed across calls in Phase 2a).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename test_d_lower_contract_is_mass_point_only ->
test_mass_point_equality_guard_does_not_fire_on_continuous and rewrite
the docstring so it accurately describes what's exercised. Previously
it claimed "arbitrary d_lower" but only tests d_lower=d.min(); the
renamed test now narrates that the mass-point-specific ValueError does
not fire on the continuous path (the continuous path has its own
upstream negative-dose guard after the regressor shift).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment: ⛔ Blocker
Performance: no findings. Security: no findings. Maintainability: no
independent finding beyond the methodology/contract issues above.
|
Address CI AI review on PR #346.
**P0 (Methodology — estimator formula was wrong):** The continuous
paths computed `att = tau_bc / D_bar` instead of the paper's
`att = (mean(ΔY) - tau_bc) / den`. On a known DGP with β = 0.3 the old
code returned ~0 (since tau_bc ≈ lim_{d↓d̲} E[ΔY|D≤d] ≈ 0); the fix
recovers the true β at n=4000.
- Design 1' (continuous_at_zero), paper Equation 7 / Theorem 3:
  β = (E[ΔY] - lim_{d↓0} E[ΔY | D_2 ≤ d]) / E[D_2]
  Implementation: `att = (dy_arr.mean() - tau_bc) / d_arr.mean()`,
  `se = se_robust / |d_arr.mean()|`.
- Design 1 (continuous_near_d_lower), paper Theorem 4 / `WAS_{d_lower}`
  under Assumption 6:
  β = (E[ΔY] - lim_{d↓d_lower} E[ΔY | D_2 ≤ d]) / E[D_2 - d_lower]
  Implementation: regressor shift `D - d_lower`, evaluate the
  local-linear fit at boundary=0 on the shifted scale;
  `att = (dy_arr.mean() - tau_bc) / (d_arr - d_lower).mean()`.
CI endpoints reverse under the subtraction `(ΔȲ - tau_bc)`;
safe_inference computes `att ± z · se` from scratch so reversal is
automatic.
**P1 (Validator — incomplete contract on continuous paths):**
- `_validate_had_panel` now rejects negative post-period doses
  (`D_{g,2} < 0`) front-door on the ORIGINAL unshifted scale so errors
  reference the user's dose column, not shifted values.
- `continuous_near_d_lower` now enforces `d_lower == float(d.min())`
  within float tolerance, mirroring the mass-point guard. Off-support
  `d_lower` would otherwise pass through Phase 1c's 5% plausibility
  heuristic and silently identify a different estimand.
**P1 (Identification — Assumption 5/6 not surfaced):** Design 1 paths
(continuous_near_d_lower + mass_point) now emit a UserWarning stating
that `WAS_{d_lower}` identification requires Assumption 6 (or
Assumption 5 for sign identification only) beyond parallel trends, and
that neither is testable via pre-trends. continuous_at_zero (Design 1',
Assumption 3 only) does not emit the warning.
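The P0 algebra is easy to check on a toy DGP (a sketch, not the library's code; here `tau_bc` is approximated by a near-boundary conditional mean standing in for the bias-corrected local-linear boundary fit):

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta = 4000, 0.3
d = rng.uniform(0.0, 2.0, n)                    # dose, support infimum near 0
dy = beta * d + rng.normal(0, 0.05, n)          # first difference, true β = 0.3

# Stand-in for the boundary limit lim_{d↓0} E[dy | D <= d]:
tau_bc = dy[d < 0.05].mean()                    # ≈ 0 on this DGP

att_old = tau_bc / d.mean()                     # buggy formula: returns ~0
att_new = (dy.mean() - tau_bc) / d.mean()       # corrected formula: recovers β
```

As the commit describes, the old divisor-only formula is near zero while the corrected mean-difference formula recovers the true β.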
**P2 (vcov_type docstring mismatch):** Corrected the constructor
docstring to reflect actual behavior: `vcov_type=None` falls back to
the `robust` flag (`robust=True -> hc1`, `robust=False -> classical`,
the default); explicit `vcov_type` takes precedence over `robust`.
**Tests rewritten + new regressions:**
- `TestBetaScaleRescaling` now pins the corrected formula at atol=1e-14
  (both designs), pins the CI endpoint reversal, and adds two "recover
  true β" asymptotic sanity tests.
- New `TestDesign1DLowerContract` covers mass-point AND
  continuous_near_d_lower d_lower equality enforcement.
- New `TestPostPeriodDoseContract` covers negative-dose rejection.
- New `TestAssumptionFiveSixWarning` verifies the Design 1 warning is
  emitted on continuous_near_d_lower + mass_point and NOT on
  continuous_at_zero.
**Docs:** REGISTRY.md HAD Phase 2a section updated with the corrected
estimator formula, the d_lower contract on both Design 1 paths, the
Assumption 5/6 warning note, and a CI-endpoint-reversal note. had.py
module and result-class docstrings updated in kind.
Targeted regression: 118 HAD tests + 497 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
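The documented fallback order can be expressed in a few lines (a sketch with an assumed helper name; the actual resolution lives inside the estimator):

```python
from typing import Optional

def resolve_vcov(vcov_type: Optional[str], robust: bool = False) -> str:
    """Explicit vcov_type wins; otherwise fall back to the robust flag."""
    if vcov_type is not None:
        return vcov_type                 # takes precedence over robust=
    return "hc1" if robust else "classical"
```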
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
**P1 #1 (Methodology): continuous_near_d_lower on mass-point samples**
When a user explicitly forced design="continuous_near_d_lower" on a
sample that actually satisfies the >2% modal-fraction mass-point
criterion, the downstream regressor shift (D - d_lower) would move the
support minimum to zero on the shifted scale. Phase 1c's mass-point
rejection guard only fires when d.min() > 0 (_validate_had_inputs), so
the silent coercion ran the nonparametric local-linear estimator on a
sample the paper (Section 3.2.4) requires to use the 2SLS branch,
producing the wrong estimand.
Fix: `HeterogeneousAdoptionDiD.fit()` now runs the modal-fraction check
on the ORIGINAL (unshifted) d_arr when the user explicitly selects
design="continuous_near_d_lower". If the fraction at d.min() exceeds
2%, the fit raises ValueError pointing to design="mass_point" or
design="auto". design="auto" is unaffected (_detect_design already
correctly resolves such samples to mass_point).
**P1 #2 (Code Quality): first_treat_col validator not dtype-agnostic**
The previous validator called `.astype(np.float64)` and `int(v)` on
grouped first_treat values, which crashed on otherwise-supported
string-labelled two-period panels (period in {"A","B"}, first_treat in
{0, "B"}). Rewrote using `pd.isna()` for missingness and raw-value
set-membership against `{0, t_post}` with no numeric coercion.
**P2 (Maintainability): cluster-applied mass-point stored wrong vcov_type**
When cluster was supplied, `_fit_mass_point_2sls` unconditionally
switches to the CR1 cluster-robust sandwich, but the result object
stored the REQUESTED family ("hc1" or "classical") as `vcov_type`.
`summary()` rendered correctly via the cluster_name branch, but
`to_dict()` and downstream programmatic consumers saw the stale
requested label. Fixed: when cluster is supplied, `vcov_type` is stored
as `"cr1"` regardless of the requested family.
Renamed the local variable from `vcov_effective` to `vcov_requested` to
separate the input from the effective family. Updated the
`HeterogeneousAdoptionDiDResults.summary()` branch so the cluster
rendering still works with the new stored value.
**Tests added (+8 regression):**
- TestValidateHadPanel.test_first_treat_col_with_string_periods
- TestValidateHadPanel.test_first_treat_col_dtype_agnostic_rejects_invalid_string
- TestContinuousPathRejectsMassPoint (2 tests)
- TestMassPointClusterLabel (4 tests: cr1 stored when clustered, base
  family when unclustered, classical+cluster collapses to cr1, to_dict
  shows effective family)
Targeted regression: 126 HAD tests + 505 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
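The dtype-agnostic check described in P1 #2 can be sketched as follows (helper name assumed; the real logic lives in `_validate_had_panel`):

```python
import pandas as pd

def validate_first_treat(values, t_post) -> None:
    """Dtype-agnostic first_treat check: no NaN, values in {0, t_post}.

    Uses pd.isna() and raw-value set-membership, with no numeric
    coercion, so string-labelled periods like "B" pass through.
    """
    for v in values:
        if pd.isna(v):
            raise ValueError("first_treat contains missing values")
        if v not in {0, t_post}:
            raise ValueError(f"invalid first_treat value: {v!r}")
```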
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
…abel
**P1 (Methodology): _validate_had_panel inferred pre/post by lexicographic sort**
Previously the validator sorted the two period labels alphabetically
and assigned `t_pre=periods[0]`, `t_post=periods[1]`. On supported
string-labelled panels like `("pre", "post")` the alphabetic order is
["post", "pre"], so the code flipped pre and post and then raised on
the treated-period D>0 check for a valid design. Same bug for
`("before", "after")` and any non-alphabetic-chronological label pair.
Fix: identify `t_pre` as the unique period where dose == 0 for ALL
units (HAD paper Section 2 no-unit-untreated convention); `t_post` is
the other period. This is a DGP-consistent invariant, not a string
ordering. If neither period has all-zero dose, raise with the
contract message and per-period nonzero-count diagnostics. If both
periods have all-zero dose, raise (no treatment variation to estimate).
The existing pre-period D=0 check is now tautological and has been
removed since the inference itself enforces the invariant. Behavior
on valid numeric panels (e.g., 2020/2021) is unchanged.
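The dose-invariant pre/post identification can be sketched as (column names are assumptions for the illustration):

```python
import pandas as pd

def infer_pre_post(df: pd.DataFrame, period_col: str = "period",
                   dose_col: str = "dose"):
    """Identify t_pre as the unique period with all-zero dose (sketch)."""
    periods = df[period_col].unique()
    if len(periods) != 2:
        raise ValueError("two-period panel required")
    all_zero = [p for p in periods
                if (df.loc[df[period_col] == p, dose_col] == 0).all()]
    if len(all_zero) == 0:
        raise ValueError("no all-zero-dose period: invalid HAD panel")
    if len(all_zero) == 2:
        raise ValueError("both periods have zero dose: nothing to estimate")
    t_pre = all_zero[0]
    t_post = periods[0] if periods[1] == t_pre else periods[1]
    return t_pre, t_post
```

Unlike a lexicographic sort, this returns ("pre", "post") in the correct order even though "post" < "pre" alphabetically.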
**P2 (Code Quality): summary() hardcoded 'WAS' row label**
`HeterogeneousAdoptionDiDResults.summary()` printed "WAS" as the
parameter label regardless of the resolved design. For Design 1
paths (continuous_near_d_lower, mass_point) the stored
`target_parameter` is "WAS_d_lower" per paper Sections 3.2.2-3.2.4,
so the user-facing output misrepresented the estimand.
Fix: render `self.target_parameter` in the summary row. Now Design 1'
prints "WAS", Design 1 prints "WAS_d_lower", matching the stored
result metadata.
**Tests (+7 regression):**
- TestValidateHadPanel.test_semantic_pre_post_labels_not_lexicographic
- TestValidateHadPanel.test_semantic_pre_post_with_first_treat_col
- TestValidateHadPanel.test_semantic_pre_post_fit_end_to_end
- TestValidateHadPanel.test_before_after_labels
- TestValidateHadPanel.test_no_all_zero_period_raises
- TestValidateHadPanel.test_both_all_zero_periods_raises
- TestResultMethods.test_summary_uses_target_parameter_for_row_label
Targeted regression: 133 HAD tests + 512 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
**P1 (Methodology): Design 1 paths must reject d_lower = 0**
Paper Section 3.2 partitions HAD by regime: `d_lower = 0` is Design 1'
(`continuous_at_zero`); `d_lower > 0` is Design 1
(`continuous_near_d_lower` or `mass_point`). The auto-detect rule
already respects this
partition, but explicit overrides previously allowed
`design="mass_point", d_lower=0` and `design="continuous_near_d_lower"`
on a `d.min()==0` sample to run silently, returning a
paper-incompatible estimand (2SLS with degenerate single-unit mass for
mass_point; Design 1' algebra relabeled as `WAS_d_lower` with a
spurious Assumption 5/6 warning for continuous_near_d_lower).
Fix: add a fit-time guard that raises `ValueError` when
`resolved_design in ("mass_point", "continuous_near_d_lower")` and the
resolved `d_lower_val` is within float tolerance of zero (same
tolerance family as `_detect_design`'s d.min()==0 tie-break). The
error message points users to `continuous_at_zero` or `auto` for
samples with support infimum at zero.
**Docstring + test updates:**
- Rewrote the `design` parameter docstring to document the
regime-partition contract precisely: each explicit override is
now described with its d_lower precondition and mass-point
compatibility.
- Rewrote the `d_lower` parameter docstring to note the
Design-1-requires-positive contract.
- Inverted the prior
  `test_force_mass_point_on_continuous_data_at_support_infimum` test
  (which incorrectly codified the unsupported behavior) into three
  rejection regressions:
`test_force_mass_point_on_d_lower_zero_sample_raises`,
`test_force_continuous_near_d_lower_on_d_lower_zero_sample_raises`,
`test_force_mass_point_d_lower_none_on_zero_sample_raises`.
Targeted regression: 135 HAD tests + 514 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
**P1 (Methodology): reciprocal regime check for design="mass_point"**
Round 2 added a guard that rejects `design="continuous_near_d_lower"`
on samples with modal-min fraction > 2% (mass-point samples). The
reciprocal direction was left unguarded: `design="mass_point"` on a
continuous-near-d_lower sample (modal fraction <= 2%) would silently
run 2SLS, but the instrument Z = 1{D > d.min()} degenerates because
the "mass" side has almost no observations. The resulting point
estimate identifies the exact-d.min() cell rather than the paper's
boundary-limit estimand (Section 3.2.4 2SLS is only the documented
Design 1 estimator when the modal mass exists).
Fix: extend the existing pre-shift regime check so it fires
symmetrically on both explicit overrides. When
`resolved_design == "mass_point"` and the modal fraction at d.min()
does NOT exceed 2%, raise ValueError pointing users to
`continuous_near_d_lower` or `auto`. Same 2% threshold used by
`_detect_design()` and Phase 1c's `_validate_had_inputs()` so the
three dispatch paths share one regime rule.
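The symmetric regime rule can be sketched as one shared helper (names are assumptions; the 2% threshold is the shared rule described above):

```python
import numpy as np

def modal_min_fraction(d: np.ndarray) -> float:
    """Fraction of units whose post-period dose sits exactly at d.min()."""
    return float(np.mean(d == d.min()))

def check_regime(design: str, d: np.ndarray, threshold: float = 0.02) -> None:
    """Pre-shift regime guard on the ORIGINAL dose scale (sketch)."""
    frac = modal_min_fraction(d)
    if design == "continuous_near_d_lower" and frac > threshold:
        raise ValueError(
            "mass-point sample: use design='mass_point' or design='auto'")
    if design == "mass_point" and frac <= threshold:
        raise ValueError(
            "no modal mass at d.min(): use design='continuous_near_d_lower' "
            "or design='auto'")
```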
**Docs:** Updated the `design` parameter docstring to document the
reciprocal mass-point contract.
**Tests (+4 regression):**
- TestMassPointPathRejectsContinuousSample.test_mass_point_on_continuous_near_sample_raises
- TestMassPointPathRejectsContinuousSample.test_mass_point_on_true_mass_point_sample_runs
- TestMassPointPathRejectsContinuousSample.test_mass_point_modal_at_threshold_runs (2.5%)
- TestMassPointPathRejectsContinuousSample.test_mass_point_modal_exactly_two_percent_raises
Targeted regression: 139 HAD tests + 518 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
No findings in Code Quality, Performance, Maintainability, or Security.
|
**P1 (Methodology): Phase 2a panel-only not documented as deviation**
Paper Section 2 defines HAD on panel OR repeated cross-section data,
but Phase 2a ships a panel-only implementation. The existing
balanced-panel validator rejects RCS inputs (disjoint unit IDs between
periods) with a generic "unit(s) do not appear in both periods" error,
but the scope restriction was not documented in REGISTRY.md as a
`**Note:**` deviation from the paper's documented surface.
Fix:
- REGISTRY.md: new `**Note (Phase 2a panel-only):**` block under the
  HAD Phase 2a entry, explaining the restriction and pointing at the
  follow-up RCS PR.
- `docs/methodology/papers/dechaisemartin-2026-review.md`: mirrored
  implementation note on the panel-or-RCS scope line.
- `HeterogeneousAdoptionDiD.fit()` docstring: new preamble paragraph
  stating the panel-only restriction and why (unit-level first
  differences required).
- `TODO.md`: new row queuing RCS identification-path work.
- `tests/test_had.py`: two regression tests
  (`test_repeated_cross_section_raises` and
  `test_repeated_cross_section_fit_raises`) that construct RCS-shaped
  input and assert the validator + fit() raise with the expected error.
**P3 (Docs/Tests): theorem/equation references were off**
Module docstring, result-class `att` docstring, and `_fit_continuous()`
docstring cited the paper's theorems/equations inconsistently with the
registry's source map:
- Design 1' identification: said "Theorem 3" -> should be Theorem 1
  with Equation 3; Equation 7 is the sample estimator.
- Design 1 continuous-near-d_lower: said "Proposition 3 / Theorem 4" ->
  should be Theorem 3 / Equation 11 (Theorem 4 is the QUG null test,
  not this estimand).
Fix: updated all three docstring sites to cite Theorem 1 / Equation 3
(Design 1' identification) + Equation 7 (sample estimator), and
Theorem 3 / Equation 11 (Design 1 continuous-near-d_lower /
WAS_d_lower under Assumption 6).
Targeted regression: 141 HAD tests + 520 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
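The balanced-panel rejection that catches RCS-shaped input can be sketched as (column names assumed; a sketch of the documented behavior, not the validator itself):

```python
import pandas as pd

def require_balanced_panel(df: pd.DataFrame, unit_col: str = "unit",
                           period_col: str = "period") -> None:
    """Reject RCS-shaped input: every unit must appear in both periods."""
    counts = df.groupby(unit_col)[period_col].nunique()
    bad = counts[counts < 2]
    if len(bad):
        raise ValueError(
            f"{len(bad)} unit(s) do not appear in both periods; Phase 2a "
            "supports balanced two-period panels only."
        )
```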
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
No findings in Performance, Maintainability, or Security; no additional
Documentation/Tests findings.
|
…egistry refs
**P1 (Code Quality): cluster= must truly be ignored on continuous paths**
`HeterogeneousAdoptionDiD.fit()` previously passed `self.cluster` into
`_aggregate_first_difference()` before the design was resolved. The
aggregator validates the cluster column eagerly (missing column,
within-unit variance, NaN ID), so a valid continuous fit could abort
just because a shared config supplied an irrelevant `cluster=`. This
contradicted the documented "ignored with a warning on continuous
paths" contract.
Fix: defer cluster extraction until after design resolution. The
first aggregation call now passes `cluster_col=None` unconditionally;
a second aggregation pass with `cluster_col=cluster_arg` runs only
when `resolved_design == "mass_point"`, which is the only path that
consumes the extracted cluster array. Continuous paths emit the
existing `UserWarning` and proceed to fit without touching the
cluster column at all.
**P3 (Methodology): registry checklist theorem references were stale**
Round 6 fixed the theorem citations in `had.py` and the paper review
doc but missed the Phase 2a checklist line in `REGISTRY.md`, which
still said "Equation 7 / Theorem 3" for Design 1' identification and
"Theorem 4, WAS_{d̲} under Assumption 6" for the continuous-near-d_lower
path. Updated the checklist line to match: Theorem 1 / Equation 3
(identification) + Equation 7 (sample estimator) for Design 1'; Theorem
3 / Equation 11 for WAS_{d̲}.
**Tests (+4 regression):**
- test_missing_cluster_column_on_continuous_only_warns: continuous_at_zero
+ cluster='does_not_exist' -> warn + fit succeeds.
- test_nan_cluster_on_continuous_only_warns: NaN cluster IDs on continuous
path -> warn + fit succeeds.
- test_within_unit_varying_cluster_on_continuous_only_warns: within-unit-
varying cluster IDs on continuous -> warn + fit succeeds.
- test_auto_design_ignores_irrelevant_cluster_on_continuous: design='auto'
resolving to continuous_at_zero also ignores cluster gracefully.
Targeted regression: 145 HAD tests + 524 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
…idation
**P1 (Methodology): snap tolerance-accepted d_lower to float(d.min())**
The `d_lower == float(d.min())` within-tolerance contract accepted
user-supplied values that differed from the support infimum by float
rounding noise (1e-15). Downstream algebra is not tolerance-aware:
- `mass_point` + `d_lower = d.min() - ε`: `Z = d > d_lower` puts every
  mass-point unit (at exactly d.min()) into Z=1 because they are
  strictly > d_lower. The Z=0 control group empties and the Wald-IV
  ratio becomes undefined.
- `continuous_near_d_lower` + `d_lower = d.min() + ε`: the regressor
  shift `d - d_lower` produces a minimum of `-ε` on the shifted scale.
  Phase 1c's `_validate_had_inputs` then raises on the negative-dose
  guard, aborting an otherwise acceptable fit.
Fix: after the tolerance check passes on Design 1 paths, snap
`d_lower_val = float(d_arr.min())` before the mass-point instrument
construction, cohort counts, and the continuous-path regressor shift.
This preserves the advertised tolerance contract while keeping
downstream algebra exact.
**P2 (Code Quality): row-level NaN/domain validation**
The previous `first_treat_col` and `cluster=` validators collapsed each
unit to a single value via `groupby().first()` before checking for NaN
or out-of-domain values. pandas's `.first()` silently skips NaN, so a
unit with rows `[valid, NaN]` would pass a unit-level check even though
the raw NaN on the bad row should be rejected.
Fix: for both `first_treat_col` (in `_validate_had_panel`) and
`cluster=` (in `_aggregate_first_difference`), validate raw per-row
values BEFORE the per-unit collapse:
- Row-level NaN check on the raw column.
- Row-level domain check on raw values (for first_treat_col).
- Per-unit-constancy check using `nunique(dropna=False)` so a
  `[value, NaN]` within a unit registers as 2 distinct values.
**Tests (+5 regression):**
- test_mass_point_d_lower_below_min_within_tolerance_snaps: asserts
  d_lower = d.min() - 1e-15 collapses ULP-identically to the exact case
  across att, se, n_mass_point, n_above_d_lower.
- test_continuous_near_d_lower_above_within_tolerance_snaps: asserts
  d_lower = d.min() + 1e-15 collapses to the exact case (no
  negative-shift rejection).
- test_first_treat_col_mixed_row_nan_raises: unit with [valid, NaN]
  rows raises.
- test_first_treat_col_mixed_row_invalid_value_raises: unit with
  [valid, 999] rows raises with the 999 in the error message.
- test_mixed_row_nan_cluster_raises_on_mass_point: mass-point with
  cluster column holding [valid, NaN] per unit raises.
Targeted regression: 150 HAD tests + 529 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
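The `.first()` NaN-skip pitfall is easy to demonstrate directly in pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"unit": [1, 1, 2, 2],
                   "cluster": ["a", np.nan, "b", "b"]})

# Unit-level collapse silently hides the NaN on unit 1's second row:
collapsed = df.groupby("unit")["cluster"].first()
hides_nan = collapsed.notna().all()       # .first() skipped the NaN

# Row-level check plus dropna=False catches it:
row_nan = df["cluster"].isna().any()
per_unit = df.groupby("unit")["cluster"].nunique(dropna=False)  # unit 1 -> 2
```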
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment: ✅ Looks good
|
Review 9 is overall ✅ Looks good; this PR addresses the remaining P2
and two P3s to close the loop cleanly.
**P2 (Code Quality): `robust=True` silently ignored on continuous paths**
The continuous dispatch block warns on `vcov_type` and `cluster` when
they are supplied on a path that ignores them, but `robust=True`
slipped through the same check. Since `robust=True` on the mass-point
path is the backward-compat alias for `vcov_type="hc1"`, a user passing
`robust=True` on `continuous_at_zero` / `continuous_near_d_lower` would
reasonably expect it to do something; in reality the continuous paths
always use the CCT-2014 robust SE from Phase 1c and the flag has no
effect.
Fix: extend the ignore-warning block to emit a `UserWarning` when
`robust=True` and `resolved_design` is continuous.
**P3 (Maintainability): _validate_had_panel return docstring stale**
The Returns section said ``(t_pre, t_post)`` is returned in "min then
max" order, but the implementation now uses the HAD dose invariant (the
all-zero-dose period is `t_pre` regardless of label order). Updated the
docstring to describe the identified semantic order and the
arbitrary-dtype-label support (int, str, datetime).
**Tests (+2 regression):**
- test_robust_true_ignored_on_continuous_warns: `robust=True` on the
  continuous path emits a warning containing "robust".
- test_robust_false_silent_on_continuous: `robust=False` (the default)
  on the continuous path does NOT emit the `robust=True is ignored`
  warning, keeping the default construction silent.
Targeted regression: 152 HAD tests + 531 total across Phase 1 and
adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
**P1 (Code Quality): sklearn parameter contract**
The class docstring advertises `sklearn.base.clone(est)` compatibility,
but the actual methods did not match sklearn's
`BaseEstimator.get_params(deep=True)` / `set_params(**params)` surface:
- `get_params()` did not accept a `deep` keyword. sklearn's `clone`
calls `get_params(deep=False)`, so any integration would have
failed with a TypeError on the missing kwarg.
- `set_params()` validated keys with `hasattr(self, key)`. That would
silently accept non-constructor attribute names like `fit`, and a
typo or malicious kwargs dict could overwrite estimator methods.
Fix:
- `get_params(self, deep: bool = True)` matches sklearn's signature.
`deep` is accepted for compat; this estimator has no nested
sub-estimators, so `deep=True` and `deep=False` return the same
dict. `del deep` documents the no-op explicitly and silences
unused-arg linters.
- `set_params(**params)` now restricts to keys from `get_params()`.
Non-constructor attribute names (including method names like `fit`
and dunder/private attrs) raise `ValueError`.
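A minimal sketch of the corrected parameter surface on a toy class (illustrative, not the estimator itself):

```python
class TinyEstimator:
    """Toy sketch of the sklearn-compatible get_params/set_params contract."""

    def __init__(self, design: str = "auto", alpha: float = 0.05):
        self.design = design
        self.alpha = alpha

    def get_params(self, deep: bool = True) -> dict:
        del deep                   # accepted for sklearn compat; no nesting
        return {"design": self.design, "alpha": self.alpha}

    def set_params(self, **params):
        valid = self.get_params()
        for key in params:
            if key not in valid:   # rejects method names and private attrs
                raise ValueError(f"invalid parameter {key!r}")
        for key, value in params.items():
            setattr(self, key, value)
        return self
```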
**Tests (+4 regression):**
- test_set_params_rejects_method_names: `set_params(fit=...)` raises
and `est.fit` stays callable.
- test_set_params_rejects_private_attrs: `_internal=42` raises.
- test_get_params_accepts_deep_keyword: `deep=True`, `deep=False`, and
no-arg all return the same dict.
- test_sklearn_clone_round_trip_if_available: `sklearn.base.clone`
round-trips the estimator; gated on `pytest.importorskip("sklearn")`
so it skips cleanly when sklearn is not in the test matrix.
Targeted regression: 155 HAD tests (+ 1 skipped) + 534 total (+1
skipped) across Phase 1 and adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment: ✅ Looks good
No findings in Methodology, Performance, Maintainability, Tech Debt, or Security.
|
…mple
Round 11 is ✅ Looks good overall; this PR closes the remaining P2 and
P3 for cleanliness.
**P2 (Code Quality): set_params must be atomic**
The previous implementation applied `setattr()` inside the validation
loop and called `_validate_constructor_args()` only afterward. A
multi-key call like `set_params(alpha=0.1, design="made_up")` would
mutate `self.alpha` before raising on the invalid design, leaving
the estimator half-updated. Callers that catch `ValueError` and
retry would see the wrong state.
Fix: dry-run validation via `type(self)(**merged)` before any
attribute write. If the constructor raises on the combined state,
`self` is not mutated. Mutations are applied only after validation
passes, so rejection leaves the estimator fully rollback-safe.
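The dry-run pattern can be sketched on a toy class (constructor checks and parameter names are illustrative):

```python
class AtomicParams:
    """Toy sketch of rollback-safe set_params via a constructor dry-run."""

    def __init__(self, alpha: float = 0.05, design: str = "auto"):
        if not 0 < alpha < 1:
            raise ValueError(f"alpha must be in (0, 1), got {alpha}")
        if design not in {"auto", "mass_point"}:
            raise ValueError(f"unknown design {design!r}")
        self.alpha = alpha
        self.design = design

    def get_params(self, deep: bool = True) -> dict:
        del deep
        return {"alpha": self.alpha, "design": self.design}

    def set_params(self, **params):
        merged = {**self.get_params(), **params}
        type(self)(**merged)       # dry-run: raises BEFORE any mutation
        for key, value in params.items():
            setattr(self, key, value)
        return self
```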
**P3 (Documentation/Tests): docstring example was non-runnable**
The class docstring example constructed data via
`generate_continuous_did_data()` which defaults to `n_periods=4`,
while `HeterogeneousAdoptionDiD.fit()` requires exactly two periods
in Phase 2a. A user copying the example verbatim would hit the
two-period rejection.
Fix: rewrote the example to build a two-period HAD panel by hand
with an explicit `D_{g,1} = 0` pre-period column, a unit-level
`dose_post`, and `delta_y` so the shape matches the estimator's
contract exactly. Kept `doctest: +SKIP` markers on the RNG and fit
lines.
**Tests (+3 rollback regressions):**
- test_set_params_rollback_on_failure: multi-key call with invalid
design keeps estimator unchanged.
- test_set_params_rollback_on_invalid_key: unknown kwarg keeps
estimator unchanged.
- test_set_params_rollback_on_invalid_alpha: out-of-range alpha
keeps estimator unchanged.
Targeted regression: 158 HAD tests (+1 sklearn-gated skip) + 537
total (+1 skipped) across Phase 1 and adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
…inuous_at_zero
**P1 (Methodology): continuous_at_zero silently ignored nonzero d_lower**
`HeterogeneousAdoptionDiD.fit()` unconditionally set `d_lower_val = 0.0`
on the `continuous_at_zero` path, so calls like
`HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.5)`
silently fit Design 1' at zero. This contradicted paper Section 3.2's
Design 1' regime contract (`d_lower = 0`) and the class docstring's
promise that mismatched overrides raise. It also created a
parameter-contract mismatch: `get_params()` reported the user's nonzero
`d_lower` while the fitted result reported `d_lower=0.0`.
Fix: after `resolved_design == "continuous_at_zero"` is known, check
whether the user supplied an explicit `d_lower`. If the absolute value
exceeds the float-tolerance band (same family as the Design 1 d_lower
guards below), raise `ValueError` pointing at `continuous_near_d_lower`
/ `mass_point` / `auto`. `d_lower=None` (auto-resolve) and
`d_lower=0.0` (redundant but benign) continue to succeed.
**Tests (+4 regression):**
- test_continuous_at_zero_with_nonzero_d_lower_raises: d_lower=0.5
  raises with "d_lower == 0" / "Design 1'" pointer.
- test_continuous_at_zero_with_small_d_lower_raises: d_lower=0.01 also
  raises (not just large overrides).
- test_continuous_at_zero_with_zero_d_lower_succeeds: d_lower=0.0
  exactly is accepted (redundant-but-valid case).
- test_auto_on_zero_sample_ignores_user_d_lower: design='auto'
  resolving to continuous_at_zero must ALSO reject explicit nonzero
  d_lower, not silently drop it.
Targeted regression: 162 HAD tests (+1 sklearn-gated skip) + 541 total
(+1 skipped) across Phase 1 and adjacent surfaces, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review |
🔁 AI review rerun (requested by @igerber). Collapsed review summary. Code Quality: no additional findings. Performance: no findings. Maintainability: no additional findings. Security: no findings.
… docstring

**P1 (Methodology): d_lower=NaN/inf bypassed comparison guards**

The Design 1 regime guards (`d_lower > 0`, `d_lower == d.min()` within tolerance, `d_lower <= tol` for continuous_at_zero) all use comparison operators that return False for NaN. A user passing `d_lower=np.nan` therefore slipped through every check and ended up at `0.0` (the continuous_at_zero branch) or `d.min()` (the snap). On a zero-support sample, that could silently re-enter the paper-incompatible continuous_near_d_lower / mass_point path the earlier review rounds had fixed.

Fix: reject non-finite d_lower front-door in `_validate_constructor_args()`. This makes `__init__`, `set_params()`, and the atomic dry-run validation all share the same contract: `d_lower` must be None or a finite scalar. NaN and +/-inf raise `ValueError`. Atomic `set_params()` confirms that a failing `d_lower=NaN` call leaves the estimator unchanged.

**P3 (Documentation/Tests): safe_inference coverage overstated**

The result-class docstring and REGISTRY said "all 5 inference fields are routed through safe_inference". In reality, `safe_inference` only gates the downstream triple (t_stat, p_value, conf_int); `att` and `se` are raw estimator outputs. The fit paths themselves return `(nan, nan)` on degenerate configurations, which combined with the safe_inference gate produces the advertised "all five fields NaN together" behavior, but the docstring wording was misleading. Fix: rewrote the Results docstring preamble and the REGISTRY.md Phase 2a NaN-propagation line to describe the two-layer NaN flow (fit-path NaN on att/se + safe_inference on the downstream triple) precisely.

**Tests (+4 regression):**
- test_d_lower_nan_raises: __init__ rejects NaN.
- test_d_lower_posinf_raises / test_d_lower_neginf_raises: +/-inf rejected.
- test_d_lower_nan_via_set_params_raises: set_params(d_lower=NaN) raises AND leaves the estimator unchanged (atomic rollback).

Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
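Why the comparison guards missed NaN, and the front-door fix, in miniature. The helper name is hypothetical; the library's actual check lives in `_validate_constructor_args()`:

```python
import math

# Every ordering comparison against NaN is False, so guards written as
# `d_lower > 0` or `d_lower <= tol` simply never fire for NaN input:
nan = float("nan")
assert not (nan > 0) and not (nan <= 1e-12)

def validate_d_lower(d_lower):
    """Front-door contract (sketch): d_lower must be None or a finite scalar."""
    if d_lower is None:
        return None
    d_lower = float(d_lower)
    if not math.isfinite(d_lower):  # rejects NaN, +inf, and -inf in one test
        raise ValueError(f"d_lower must be a finite scalar or None; got {d_lower!r}")
    return d_lower
```

Because the check happens before any branch dispatch, `__init__` and `set_params()` can share it and fail before mutating estimator state.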
/ai-review |
🔁 AI review rerun (requested by @igerber). Collapsed review summary. Overall Assessment: Looks good.
Round 14 is ✅ Looks good; this closes the final residual P3 for wording accuracy.

**P3 (Docs): NaN-contract wording overstated**

The prior wording said users can expect all five result fields to be finite or all NaN together. That is true on most degenerate paths (constant-y, no-dose-variation) but NOT on the single-cluster CR1 configuration of the mass-point path: `_fit_mass_point_2sls` returns `(att=beta_hat, se=nan)` there. `att` is finite because the Wald-IV ratio is well defined, while CR1 requires G >= 2 clusters so `se` is NaN. `safe_inference` then gates the downstream triple correctly, but the doc claim that `att` and `se` are coupled with the triple was inaccurate.

Fix: rewrote the result-class docstring preamble and the REGISTRY Phase 2a NaN-propagation line to describe the invariant precisely:
- GUARANTEED NaN coupling is on the downstream triple (`t_stat`, `p_value`, `conf_int`) via `safe_inference`.
- `att` and `se` are RAW estimator outputs from the chosen fit path and are NOT gated by safe_inference.
- On most degenerate paths, fit paths return `(nan, nan)` so all five move together, but on the single-cluster CR1 edge the fit path returns `(att=beta_hat, se=nan)` and only the downstream triple becomes NaN via the gate.

No code behavior changes; the `assert_nan_inference` fixture already only checks the downstream triple, matching the restated contract.

Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
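The restated two-layer contract can be illustrated with a toy gate. `gate_inference` is a hypothetical stand-in for `safe_inference`, using a normal approximation for the p-value and CI; it is a sketch of the invariant, not the library function:

```python
import math

def gate_inference(att, se):
    """Hypothetical stand-in for safe_inference: att and se pass through
    upstream as raw fit-path outputs; this gate only produces the downstream
    triple (t_stat, p_value, conf_int), all NaN unless both att and se are
    finite with se > 0."""
    nan = float("nan")
    if not (math.isfinite(att) and math.isfinite(se)) or se <= 0:
        return nan, nan, (nan, nan)
    t = att / se
    p = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided normal p-value
    z = 1.959963984540054                    # N(0,1) 97.5% quantile
    return t, p, (att - z * se, att + z * se)

# Single-cluster CR1 edge described above: att finite, se NaN.
# att stays usable; only the downstream triple becomes NaN.
t, p, ci = gate_inference(1.23, float("nan"))
assert math.isnan(t) and math.isnan(p) and math.isnan(ci[0])
```

The point of the sketch: nothing here touches `att` or `se` themselves, which is exactly why the old "all five fields together" wording overstated the guarantee.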
/ai-review |
🔁 AI review rerun (requested by @igerber). Collapsed review summary. Overall Assessment: ✅ Looks good.
Summary
Ships `HeterogeneousAdoptionDiD` + `HeterogeneousAdoptionDiDResults` as the user-facing estimator for de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026). Three design-dispatch paths: Design 1' (continuous_at_zero, boundary=0), Design 1 continuous-near-d_lower (regressor shift `D - d_lower`, boundary=`d.min()`), and Design 1 mass-point (Wald-IV / 2SLS with instrument `1{D > d_lower}`). Both continuous paths call `bias_corrected_local_linear` and apply Equation 8's β-scale rescaling `tau_bc / D_bar`, where `D_bar = (1/G) * sum(D_{g,2})`. The mass-point path uses the structural-residual 2SLS sandwich `[Z'X]^-1 Ω [Z'X]^-T` for classical/hc1 and cluster-robust CR1; HC2 and HC2-BM raise `NotImplementedError` pending a 2SLS-specific leverage derivation. `design="auto"` resolves via the REGISTRY line-2320 rule: `d.min() == 0` or `d.min()/median(|d|) < 0.01` → continuous_at_zero; modal fraction at `d.min()` > 2% → mass_point; else continuous_near_d_lower. The mass-point path enforces `d_lower == float(d.min())` within float tolerance (paper Section 3.2.4 contract); mismatched overrides raise with a clear pointer to the unsupported estimand. The panel validator enforces a balanced two-period panel, `D_{g,1} = 0` for all units, and first_treat_col value-domain validation; >2 periods raises with a Phase 2b pointer. `aggregate="event_study"`, `survey=`, and `weights=` all raise `NotImplementedError` pointing at follow-up PRs. All inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) are routed through `safe_inference` for NaN-safe CI gating. The raw `design` is preserved on `self` so the `get_params()`/`sklearn.clone()` round-trip reproduces `"auto"` even after fit.

Methodology references (required if estimator / math changes)
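The `design="auto"` resolution rule summarized above can be sketched as a standalone function. The name `resolve_design` and the exact branch ordering are illustrative, not the library code; the thresholds are the stated REGISTRY values:

```python
import numpy as np

def resolve_design(d):
    """Sketch of the auto-detect rule (hypothetical function name)."""
    d = np.asarray(d, dtype=float)
    d_min = float(d.min())
    if d_min == 0.0:
        # unconditional tie-break: a literal zero support point is Design 1'
        return "continuous_at_zero"
    if d_min < 0.01 * float(np.median(np.abs(d))):
        return "continuous_at_zero"
    if float(np.mean(d == d_min)) > 0.02:
        # more than 2% of units sit exactly at the lower support point
        return "mass_point"
    return "continuous_near_d_lower"
```

For example, a dose vector with 10% of its mass exactly at `d.min() = 0.5` resolves to mass_point, while the same support with negligible mass at the minimum resolves to continuous_near_d_lower.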
The mass-point sandwich is built from structural 2SLS residuals (`u = ΔY - α̂ - β̂·D`, not the reduced-form Wald-IV-scaled OLS shortcut) to match canonical 2SLS implementations (e.g., `AER::ivreg`, `ivreg2`). Documented in `docs/methodology/REGISTRY.md` under the HAD Phase 2a Note block. HC2 and HC2-BM require a 2SLS-specific leverage derivation and are queued for follow-up (tracked in `TODO.md`).

Validation
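In the spirit of the mass-point parity tests, here is a hand-check that the just-identified 2SLS estimate with instrument `1{D > d_lower}` equals the Wald-IV ratio, followed by the classical structural-residual sandwich. The simulated data and all variable names are illustrative; this is a sketch of the stated formulas, not the package implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
G = 400
d_lower = 0.5
# Dose with a mass point at d_lower and a continuous part above it.
at_mass = rng.random(G) < 0.3
d = np.where(at_mass, d_lower, rng.uniform(0.6, 1.0, G))
dy = 0.2 + 1.5 * d + rng.normal(0.0, 0.1, G)      # first-differenced outcome

z = (d > d_lower).astype(float)                    # instrument 1{D > d_lower}
X = np.column_stack([np.ones(G), d])
Z = np.column_stack([np.ones(G), z])

# Just-identified 2SLS: theta = (Z'X)^{-1} Z'y.
ZtX = Z.T @ X
theta = np.linalg.solve(ZtX, Z.T @ dy)
beta_2sls = theta[1]

# The same slope as a Wald-IV ratio of group means.
beta_wald = (dy[z == 1].mean() - dy[z == 0].mean()) / (
    d[z == 1].mean() - d[z == 0].mean()
)
assert np.isclose(beta_2sls, beta_wald, atol=1e-10)

# Classical structural-residual sandwich [Z'X]^{-1} Omega [Z'X]^{-T},
# with Omega = sigma^2 * Z'Z built from STRUCTURAL residuals u = dy - X theta.
u = dy - X @ theta
sigma2 = (u @ u) / (G - 2)
omega = sigma2 * (Z.T @ Z)
ZtX_inv = np.linalg.inv(ZtX)
V = ZtX_inv @ omega @ ZtX_inv.T
se_beta = float(np.sqrt(V[1, 1]))
```

Swapping `u` for reduced-form residuals is exactly the shortcut the text warns against; the structural choice is what reproduces `ivreg`-style standard errors.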
`tests/test_had.py` (NEW, 110 tests across 16 test classes) covers the 12 plan commit criteria: smoke on all 3 designs, auto-detect correctness + 2 edge cases, β-scale rescaling at atol=1e-14, mass-point Wald-IV at atol=1e-14, mass-point sandwich SE parity at atol=1e-12 (classical/hc1/CR1), NotImplementedError contracts for hc2/hc2_bm/event_study/survey/weights, panel-contract violations, NaN propagation, sklearn clone round-trip, get_params signature enumeration, and the mass-point d_lower equality contract. Continuous-path results are cross-checked against `bias_corrected_local_linear` output.

Security / privacy
Generated with Claude Code