From f875f558905c14554a816f6d61b1fe7598681d73 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:14:36 -0400 Subject: [PATCH 01/17] Phase 2a: HeterogeneousAdoptionDiD class (single-period) Ships the user-facing HAD estimator with three design-dispatch paths: - continuous_at_zero (Design 1'): bias-corrected local-linear at boundary=0, rescaled to beta-scale via Equation 8 divisor D_bar = (1/G) * sum(D_{g,2}). - continuous_near_d_lower (Design 1): same path with regressor shift D' = D - d_lower and boundary = float(d.min()). - mass_point (Design 1 Section 3.2.4): sample-average 2SLS with instrument 1{D_{g,2} > d_lower}. Point estimate is the Wald-IV ratio; SE is the structural-residual 2SLS sandwich [Z'X]^-1 Omega [Z'X]^-T for classical, hc1, and CR1 (cluster-robust). hc2/hc2_bm raise NotImplementedError pending a 2SLS-specific leverage derivation. design="auto" resolves via the REGISTRY auto-detect rule (d.min() < 0.01 * median(|d|) -> continuous_at_zero; modal fraction > 2% -> mass_point; else continuous_near_d_lower), with d.min() == 0 as an unconditional tie-break. Panel validator enforces balanced two-period panel, D_{g,1} = 0 for all units, no NaN in key columns. >2 periods raises pointing to Phase 2b. aggregate="event_study" (Appendix B.2 multi-period) raises NotImplementedError; survey= and weights= also raise (queued for follow-up survey-integration PR). All inference fields (att, se, t_stat, p_value, conf_int) routed through safe_inference() for NaN-safe CI gating. Raw design kept on self so get_params()/clone() round-trip preserves "auto" even after fit. 105 new tests cover the 12 plan commit criteria: three design paths finite, design="auto" correctness + edge cases, beta-scale rescaling at atol=1e-14, mass-point Wald-IV at atol=1e-14, mass-point sandwich SE parity at atol=1e-12 (classical/hc1/CR1), NotImplementedError contracts, panel contract violations, NaN propagation, sklearn clone round-trip, get_params signature enumeration. 
REGISTRY.md ticks four Phase 2a checkboxes and adds a Note block documenting the structural-residual 2SLS sandwich choice. TODO.md queues Phase 2b (multi-period), survey integration, weights, hc2/hc2_bm leverage derivation, pre-test diagnostics (Phase 3), Pierce-Schott replication (Phase 4), and tutorial integration (Phase 5). Co-Authored-By: Claude Opus 4.7 (1M context) --- TODO.md | 9 + diff_diff/__init__.py | 9 + diff_diff/had.py | 1161 ++++++++++++++++++++++++++++++++ docs/methodology/REGISTRY.md | 12 +- tests/test_had.py | 1216 ++++++++++++++++++++++++++++++++++ 5 files changed, 2402 insertions(+), 5 deletions(-) create mode 100644 diff_diff/had.py create mode 100644 tests/test_had.py diff --git a/TODO.md b/TODO.md index 244451e5..bd8be7a5 100644 --- a/TODO.md +++ b/TODO.md @@ -91,6 +91,15 @@ Deferred items from PR reviews that were not addressed before merge. | `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium | | `bias_corrected_local_linear`: support multi-eval grid (`neval > 1`) with cross-covariance (`covgrid=TRUE` branch of `lprobust.R:253-378`). Not needed for HAD but useful for multi-dose diagnostics. | `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Low | | Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low | +| `HeterogeneousAdoptionDiD` Phase 2b: multi-period event-study extension (Appendix B.2). 
`aggregate="event_study"` raises `NotImplementedError` in Phase 2a; Phase 2b will aggregate per-cohort first-differences into an event-study result with `.event_study_effects` / `.event_study_se` result fields. | `diff_diff/had.py`, `tests/test_had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD`: survey-design integration (`survey=SurveyDesign(...)`). Currently raises `NotImplementedError`. Requires Taylor-linearization of the β-scale rescaling and replicate-weight-compatible 2SLS variance on the mass-point path. | `diff_diff/had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD`: `weights=` support. Deferred jointly with survey integration. nprobust's `lprobust` has no weight argument so the nonparametric continuous path needs a derivation; the 2SLS mass-point path needs weighted-sandwich parity. | `diff_diff/had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` Phase 3: `qug_test()`, `stute_test()`, `yatchew_hr_test()` pre-test diagnostics (paper Section 3.3). Composite helper `did_had_pretest_workflow()`. Not part of Phase 2a scope. | `diff_diff/had.py`, new module | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. 
| `benchmarks/`, `tests/` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low | #### Performance diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index be3a1472..5be7e97c 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -55,6 +55,10 @@ triangular_kernel, uniform_kernel, ) +from diff_diff.had import ( + HeterogeneousAdoptionDiD, + HeterogeneousAdoptionDiDResults, +) from diff_diff.estimators import ( DifferenceInDifferences, MultiPeriodDiD, @@ -252,6 +256,7 @@ EDiD = EfficientDiD ETWFE = WooldridgeDiD DCDH = ChaisemartinDHaultfoeuille +HAD = HeterogeneousAdoptionDiD __version__ = "3.2.0" __all__ = [ @@ -431,6 +436,10 @@ # Bias-corrected local-linear (Phase 1c for HeterogeneousAdoptionDiD) "BiasCorrectedFit", "bias_corrected_local_linear", + # HeterogeneousAdoptionDiD (Phase 2a) + "HeterogeneousAdoptionDiD", + "HeterogeneousAdoptionDiDResults", + "HAD", # Datasets "load_card_krueger", "load_castle_doctrine", diff --git a/diff_diff/had.py b/diff_diff/had.py new file mode 100644 index 00000000..2398ad29 --- /dev/null +++ b/diff_diff/had.py @@ -0,0 +1,1161 @@ +""" +Heterogeneous Adoption Difference-in-Differences (HAD) estimator (Phase 2a). + +Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) +Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: + +1. 
Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density + continuous at zero. Evaluates the bias-corrected local-linear fit at + ``boundary = 0`` and rescales to the beta-scale via Equation 8's divisor + ``D_bar = (1/G) * sum(D_{g,2})``. Invokes Assumption 3. + +2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): + ``d_lower > 0``, continuous boundary density. Evaluates the bias-corrected + local-linear fit at ``boundary = float(d.min())`` after the regressor + shift ``D' = D - d_lower``. Same divisor, same CI path. + Invokes Assumption 5 or 6. + +3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction at + ``d.min()`` exceeds 2%. Uses a sample-average 2SLS estimator with + instrument ``Z_g = 1{D_{g,2} > d_lower}``. Point estimate reduces to the + Wald-IV ratio + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard error + via the structural-residual 2SLS sandwich (see ``_fit_mass_point_2sls``). + +Phase 2a ships the single-period path only; the multi-period event-study +extension (paper Appendix B.2) is queued for Phase 2b. + +Infrastructure reused from Phase 1: +- ``diff_diff.local_linear.bias_corrected_local_linear`` (Phase 1c) +- ``diff_diff.local_linear.BiasCorrectedFit``, ``BandwidthResult`` +- ``diff_diff.utils.safe_inference`` (NaN-safe CI gating) +- ``diff_diff.prep.balance_panel`` (panel validation) + +References +---------- +- de Chaisemartin, C., Ciccia, D., D'Haultfoeuille, X., & Knau, F. (2026). + Difference-in-Differences Estimators When No Unit Remains Untreated. + arXiv:2405.04465v6. +- Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust + nonparametric confidence intervals for regression-discontinuity designs. + Econometrica, 82(6), 2295-2326. 
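A minimal numeric sketch of the beta-scale rescaling described above (illustration only: ``tau_bc`` and ``se_mu`` are made-up stand-ins for the mu-scale bias-corrected statistic and its robust SE, not output of ``bias_corrected_local_linear``):

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, size=500)  # hypothetical post-period doses D_{g,2}
tau_bc = 0.8  # made-up mu-scale bias-corrected local-linear statistic
se_mu = 0.1   # made-up mu-scale robust SE

# Equation 8 divisor: D_bar = (1/G) * sum(D_{g,2}).
D_bar = d.mean()

# Beta-scale point estimate and SE both divide by the same D_bar,
# so the CI endpoints rescale by 1 / D_bar as well.
att = tau_bc / D_bar
se = se_mu / D_bar
```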
+""" + +from __future__ import annotations + +import warnings +from dataclasses import dataclass +from typing import Any, Dict, Optional, Tuple + +import numpy as np +import pandas as pd + +from diff_diff.local_linear import ( + BandwidthResult, + BiasCorrectedFit, + bias_corrected_local_linear, +) +from diff_diff.utils import safe_inference + +__all__ = [ + "HeterogeneousAdoptionDiD", + "HeterogeneousAdoptionDiDResults", +] + + +# ============================================================================= +# Module-level constants +# ============================================================================= + +# REGISTRY line 2320 auto-detect boundary: d.min() < 0.01 * median(|d|) resolves +# to Design 1' (continuous_at_zero). Distinct from Phase 1c's stricter +# _DESIGN_1_PRIME_RATIO = 0.05 in ``local_linear.py`` which is a plausibility +# guard on the nonparametric fit, not the auto-detect rule. +_CONTINUOUS_AT_ZERO_THRESHOLD = 0.01 + +# REGISTRY line 2320 mass-point rule: modal fraction at d.min() > 0.02 resolves +# to Design 1 mass-point. Mirrored (not imported) from +# ``diff_diff.local_linear._validate_had_inputs`` where the same constant +# enforces the upstream mass-point rejection on the nonparametric path. +_MASS_POINT_THRESHOLD = 0.02 + +_VALID_DESIGNS = ( + "auto", + "continuous_at_zero", + "continuous_near_d_lower", + "mass_point", +) +_VALID_AGGREGATES = ("overall", "event_study") +# Mass-point 2SLS supports: classical, hc1 (HC0 is an unscaled variant not +# publicly exposed; hc2 and hc2_bm raise NotImplementedError pending a +# 2SLS-specific leverage derivation and R parity anchor). +_MASS_POINT_VCOV_SUPPORTED = ("classical", "hc1") +_MASS_POINT_VCOV_UNSUPPORTED = ("hc2", "hc2_bm") + +# Target-parameter label per design. Design 1' targets the WAS (Assumption 3); +# Design 1 targets WAS_{d_lower} (Assumption 5 or 6), which also applies to +# the mass-point path (paper Section 3.2.4). 
+_TARGET_PARAMETER = { + "continuous_at_zero": "WAS", + "continuous_near_d_lower": "WAS_d_lower", + "mass_point": "WAS_d_lower", +} + + +# ============================================================================= +# Results dataclass +# ============================================================================= + + +@dataclass +class HeterogeneousAdoptionDiDResults: + """Estimator output for :class:`HeterogeneousAdoptionDiD`. + + All inference fields (``att``, ``se``, ``t_stat``, ``p_value``, + ``conf_int``) are routed through :func:`diff_diff.utils.safe_inference` + at fit time; when SE is non-finite, zero, or negative, every inference + field is NaN. + + Attributes + ---------- + att : float + Point estimate of the WAS parameter on the beta-scale. For the + continuous designs: ``tau_bc / D_bar`` where ``tau_bc`` is the + bias-corrected local-linear statistic on the mu-scale and + ``D_bar = (1/G) * sum(D_{g,2})``. For the mass-point design: + the 2SLS coefficient directly. + se : float + Standard error on the beta-scale. For continuous designs, the + CCT-2014 robust SE divided by ``D_bar``. For mass-point, the 2SLS + structural-residual sandwich SE. + t_stat, p_value, conf_int : inference fields + Routed through ``safe_inference``; NaN when SE is non-finite. + alpha : float + CI level used at fit time (0.05 for a 95% CI). + design : str + Resolved design mode: ``"continuous_at_zero"``, + ``"continuous_near_d_lower"``, or ``"mass_point"``. ``"auto"`` is + resolved to one of the three concrete modes before storing. + target_parameter : str + Estimand label: ``"WAS"`` for Design 1', ``"WAS_d_lower"`` for the + other two. Pins the estimand semantically even when two designs + share the same divisor. + d_lower : float + Support infimum ``d_lower``. ``0.0`` for Design 1'; + ``float(d.min())`` for the other two. + dose_mean : float + ``D_bar = (1/G) * sum(D_{g,2})``. 
+ n_obs : int + Number of units contributing to the estimator (post panel + aggregation to unit-level first differences). + n_treated : int + Number of units with ``D_{g,2} > d_lower``. + n_control : int + Number of units at or below ``d_lower`` (the "not-treated" subset). + n_mass_point : int or None + Mass-point path only: number of units with ``D_{g,2} == d_lower``. + ``None`` on continuous paths. + n_above_d_lower : int or None + Mass-point path only: number of units with ``D_{g,2} > d_lower``. + ``None`` on continuous paths. + inference_method : str + ``"analytical_nonparametric"`` (continuous designs) or + ``"analytical_2sls"`` (mass-point). + vcov_type : str or None + Variance-covariance family used. ``None`` on continuous paths + (they use the CCT-2014 robust SE from Phase 1c, not the library's + ``vcov_type`` enum). Mass-point: one of ``"classical"`` or + ``"hc1"`` (CR1 when ``cluster`` is supplied). + cluster_name : str or None + Column name of the cluster variable on the mass-point path when + cluster-robust SE is requested. ``None`` otherwise. + survey_metadata : object or None + Always ``None`` in Phase 2a. Field shape kept for future-compat + with a planned survey integration PR. + bandwidth_diagnostics : BandwidthResult or None + Full Phase 1b MSE-DPI selector output on the continuous paths + (when bandwidths were auto-selected). ``None`` on the mass-point + path (parametric, no bandwidth). + bias_corrected_fit : BiasCorrectedFit or None + Full Phase 1c bias-corrected local-linear fit on the continuous + paths. ``None`` on the mass-point path. 
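The NaN-gating contract can be illustrated with a generic normal-approximation sketch (hypothetical helper written for this note; the actual ``safe_inference`` signature lives in ``diff_diff.utils`` and may differ):

```python
import math

def gated_inference(att: float, se: float):
    # Sketch of the NaN-safe contract: when SE is non-finite, zero, or
    # negative, every inference field comes back NaN instead of a
    # misleading number. (Hypothetical helper, not diff_diff's API.)
    if not math.isfinite(se) or se <= 0:
        nan = float("nan")
        return nan, nan, (nan, nan)
    z = 1.959963984540054  # normal critical value for a 95% CI
    t_stat = att / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided normal p
    return t_stat, p_value, (att - z * se, att + z * se)
```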
+ """ + + # Point estimate + inference (safe_inference-gated) + att: float + se: float + t_stat: float + p_value: float + conf_int: Tuple[float, float] + alpha: float + + # Design metadata + design: str + target_parameter: str + d_lower: float + dose_mean: float + + # Sample counts + n_obs: int + n_treated: int + n_control: int + n_mass_point: Optional[int] + n_above_d_lower: Optional[int] + + # Inference metadata + inference_method: str + vcov_type: Optional[str] + cluster_name: Optional[str] + survey_metadata: Optional[Any] + + # Nonparametric-only diagnostics + bandwidth_diagnostics: Optional[BandwidthResult] + bias_corrected_fit: Optional[BiasCorrectedFit] + + def __repr__(self) -> str: + return ( + f"HeterogeneousAdoptionDiDResults(" + f"att={self.att:.4f}, se={self.se:.4f}, " + f"design={self.design!r}, n_obs={self.n_obs})" + ) + + def summary(self) -> str: + """Formatted summary table.""" + width = 72 + conf_level = int((1 - self.alpha) * 100) + lines = [ + "=" * width, + "HeterogeneousAdoptionDiD Estimation Results".center(width), + "=" * width, + "", + f"{'Design:':<30} {self.design:>20}", + f"{'Target parameter:':<30} {self.target_parameter:>20}", + f"{'d_lower:':<30} {self.d_lower:>20.6g}", + f"{'D_bar (dose mean):':<30} {self.dose_mean:>20.6g}", + f"{'Observations (units):':<30} {self.n_obs:>20}", + f"{'Above d_lower:':<30} {self.n_treated:>20}", + f"{'At/below d_lower:':<30} {self.n_control:>20}", + ] + if self.n_mass_point is not None: + lines.append(f"{'At d_lower (mass point):':<30} {self.n_mass_point:>20}") + if self.n_above_d_lower is not None: + lines.append(f"{'Strictly above d_lower:':<30} {self.n_above_d_lower:>20}") + lines.append(f"{'Inference method:':<30} {self.inference_method:>20}") + if self.vcov_type is not None: + label = self.vcov_type + if self.cluster_name is not None: + label = f"CR1 at {self.cluster_name}" + lines.append(f"{'Variance:':<30} {label:>20}") + if self.bandwidth_diagnostics is not None: + bw = 
self.bandwidth_diagnostics + lines.append(f"{'Bandwidth h (MSE-DPI):':<30} {bw.h_mse:>20.6g}") + lines.append(f"{'Bandwidth b (bias):':<30} {bw.b_mse:>20.6g}") + if self.bias_corrected_fit is not None: + bc = self.bias_corrected_fit + lines.append(f"{'Bandwidth h used:':<30} {bc.h:>20.6g}") + lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}") + lines.extend( + [ + "", + "-" * width, + ( + f"{'Parameter':<15} {'Estimate':>12} {'Std. Err.':>12} " + f"{'t-stat':>10} {'P>|t|':>10}" + ), + "-" * width, + ( + f"{'WAS':<15} {self.att:>12.4f} {self.se:>12.4f} " + f"{self.t_stat:>10.3f} {self.p_value:>10.4f}" + ), + "-" * width, + "", + ( + f"{conf_level}% Confidence Interval: " + f"[{self.conf_int[0]:.4f}, {self.conf_int[1]:.4f}]" + ), + "=" * width, + ] + ) + return "\n".join(lines) + + def print_summary(self) -> None: + """Print the summary to stdout.""" + print(self.summary()) + + def to_dict(self) -> Dict[str, Any]: + """Return results as a dict of scalars.""" + return { + "att": self.att, + "se": self.se, + "t_stat": self.t_stat, + "p_value": self.p_value, + "conf_int_lower": self.conf_int[0], + "conf_int_upper": self.conf_int[1], + "alpha": self.alpha, + "design": self.design, + "target_parameter": self.target_parameter, + "d_lower": self.d_lower, + "dose_mean": self.dose_mean, + "n_obs": self.n_obs, + "n_treated": self.n_treated, + "n_control": self.n_control, + "n_mass_point": self.n_mass_point, + "n_above_d_lower": self.n_above_d_lower, + "inference_method": self.inference_method, + "vcov_type": self.vcov_type, + "cluster_name": self.cluster_name, + } + + def to_dataframe(self) -> pd.DataFrame: + """Return a one-row DataFrame of the result dict.""" + return pd.DataFrame([self.to_dict()]) + + +# ============================================================================= +# Panel validation and aggregation +# ============================================================================= + + +def _validate_had_panel( + data: pd.DataFrame, + 
outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + first_treat_col: Optional[str], +) -> Tuple[int, int]: + """Validate a HAD panel and return ``(t_pre, t_post)``. + + Enforces the Phase 2a panel contract: + + - All required columns present. + - Exactly two distinct time periods. Staggered timing (``>2`` periods) + with ``first_treat_col=None`` raises; with ``first_treat_col`` it + also raises (multi-period reduction is Phase 2b). + - Balanced panel (all units observed at both periods). + - ``D_{g, t_pre} == 0`` for all units (HAD no-unit-untreated pre-period). + - No NaN in outcome, dose, or unit columns. + + Parameters + ---------- + data : pd.DataFrame + outcome_col, dose_col, time_col, unit_col : str + first_treat_col : str or None + Optional column for cross-validation. Supplied column must contain + ``0`` for never-treated and the post-period value for treated units. + + Returns + ------- + tuple[int, int] + ``(t_pre, t_post)`` - the two period identifiers, min then max. + + Raises + ------ + ValueError + """ + required = [outcome_col, dose_col, time_col, unit_col] + if first_treat_col is not None: + required.append(first_treat_col) + missing = [c for c in required if c not in data.columns] + if missing: + raise ValueError(f"Missing column(s) in data: {missing}. Required: {required}.") + + periods = np.sort(np.asarray(data[time_col].unique())) + if len(periods) < 2: + raise ValueError( + f"HAD requires a two-period panel; got {len(periods)} distinct " + f"period(s) in column {time_col!r}." + ) + if len(periods) > 2: + if first_treat_col is None: + raise ValueError( + f"HAD Phase 2a requires exactly two time periods " + f"(got {len(periods)} in {time_col!r}) when " + f"first_treat_col=None. Multi-period / staggered adoption " + f"support is queued for Phase 2b (Appendix B.2 event-study)." + ) + raise ValueError( + f"HAD Phase 2a requires exactly two time periods " + f"(got {len(periods)} in {time_col!r}). 
Staggered adoption " + f"reduction (first_treat_col supplied with >2 periods) is " + f"queued for Phase 2b (Appendix B.2 event-study)." + ) + + t_pre = int(periods[0]) if np.issubdtype(periods.dtype, np.integer) else periods[0] + t_post = int(periods[1]) if np.issubdtype(periods.dtype, np.integer) else periods[1] + + # Balanced-panel check: every unit appears exactly once per period. + counts = data.groupby([unit_col, time_col]).size() + if (counts != 1).any(): + n_bad = int((counts != 1).sum()) + raise ValueError( + f"Unbalanced panel: {n_bad} unit-period cells have != 1 " + f"observation. HAD requires a balanced two-period panel " + f"(each unit observed exactly once at each period)." + ) + unit_counts = data.groupby(unit_col)[time_col].nunique() + incomplete = unit_counts[unit_counts != 2] + if len(incomplete) > 0: + raise ValueError( + f"Unbalanced panel: {len(incomplete)} unit(s) do not appear " + f"in both periods. HAD requires a balanced two-period panel." + ) + + # NaN checks on key columns. + for col in [outcome_col, dose_col, unit_col]: + if bool(data[col].isna().any()): + n_nan = int(data[col].isna().sum()) + raise ValueError( + f"{n_nan} NaN value(s) found in column {col!r}. HAD " + f"does not silently drop NaN rows; drop or impute before " + f"calling fit()." + ) + + # Pre-period no-unit-untreated check. + pre_mask = data[time_col] == t_pre + pre_doses = np.asarray(data.loc[pre_mask, dose_col], dtype=np.float64) + nonzero_pre = pre_doses != 0 + if nonzero_pre.any(): + n_bad = int(nonzero_pre.sum()) + raise ValueError( + f"HAD requires D_{{g,1}} = 0 for all units (pre-period " + f"untreated). {n_bad} unit(s) have nonzero dose at " + f"t_pre={t_pre}. Drop these units or verify the dose column." + ) + + # Optional cross-validation via first_treat_col: if supplied, the + # first_treat == 0 units must have D_{g, t_post} == 0, and first_treat + # == t_post units may have any D_{g, t_post} >= 0. 
We do NOT require + # strict consistency (the paper treats D_{g,2} as the primary signal), + # but a quick sanity check catches obvious column mix-ups. + if first_treat_col is not None: + ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() + if np.any(np.isnan(ft.astype(np.float64, copy=False))): + raise ValueError( + f"first_treat_col={first_treat_col!r} contains NaN. " + f"Use 0 for never-treated units and t_post for treated." + ) + valid_values = {0, t_post} + # Compare without int() truncation so a malformed non-integer value + # (e.g., 2.5 with t_post=2) is flagged instead of slipping through. + bad = [v for v in np.unique(ft) if v not in valid_values] + if bad: + raise ValueError( + f"first_treat_col={first_treat_col!r} contains value(s) " + f"{bad} outside the allowed set {{0, {t_post}}} for a " + f"two-period HAD panel. Staggered timing with multiple " + f"cohorts is Phase 2b." + ) + + return t_pre, t_post + + +def _aggregate_first_difference( + data: pd.DataFrame, + outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + t_pre: Any, + t_post: Any, + cluster_col: Optional[str], +) -> Tuple[np.ndarray, np.ndarray, Optional[np.ndarray], np.ndarray]: + """Reduce a balanced two-period panel to unit-level first differences. + + Returns + ------- + d_arr : np.ndarray, shape (G,) + Post-period dose ``D_{g,2}`` per unit, ordered by sorted unit id. + dy_arr : np.ndarray, shape (G,) + First-difference outcome ``Y_{g,2} - Y_{g,1}`` per unit. + cluster_arr : np.ndarray or None, shape (G,) + Cluster IDs per unit (must be unit-constant); ``None`` when + ``cluster_col`` is ``None``. + unit_ids : np.ndarray, shape (G,) + Sorted unit identifiers (useful for debugging / test inspection). 
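The reduction can be sketched on a toy panel (illustration only; the column names here are assumed to match the estimator defaults):

```python
import pandas as pd

# Toy balanced two-period panel: three units, periods {1, 2}, D_{g,1} = 0.
panel = pd.DataFrame({
    "unit":    [1, 1, 2, 2, 3, 3],
    "time":    [1, 2, 1, 2, 1, 2],
    "dose":    [0.0, 0.5, 0.0, 1.0, 0.0, 0.0],
    "outcome": [1.0, 2.0, 1.5, 4.0, 0.5, 0.6],
})

pre = panel[panel["time"] == 1].set_index("unit").sort_index()
post = panel[panel["time"] == 2].set_index("unit").sort_index()

d_arr = post["dose"].to_numpy()                         # D_{g,2} per unit
dy_arr = (post["outcome"] - pre["outcome"]).to_numpy()  # Y_{g,2} - Y_{g,1}
```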
+ """ + df = data.sort_values([unit_col, time_col]).reset_index(drop=True) + pre = df[df[time_col] == t_pre].set_index(unit_col).sort_index() + post = df[df[time_col] == t_post].set_index(unit_col).sort_index() + + if bool(pre.index.equals(post.index)) is False: + # Should not reach here after _validate_had_panel balanced-panel + # check, but belt-and-suspenders: + raise ValueError( + "Internal error: pre and post period unit indices do not " + "match after balanced-panel validation. Please report this." + ) + + unit_ids = np.asarray(pre.index) + d_arr = post[dose_col].to_numpy(dtype=np.float64) + dy_arr = post[outcome_col].to_numpy(dtype=np.float64) - pre[outcome_col].to_numpy( + dtype=np.float64 + ) + + cluster_arr: Optional[np.ndarray] = None + if cluster_col is not None: + if cluster_col not in data.columns: + raise ValueError(f"cluster column {cluster_col!r} not found in data.") + # Cluster must be unit-constant; take the first cluster id per unit. + cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique() + if (cluster_per_unit > 1).any(): + n_bad = int((cluster_per_unit > 1).sum()) + raise ValueError( + f"cluster column {cluster_col!r} is not constant within " + f"unit for {n_bad} unit(s). Cluster must be unit-level " + f"(e.g., cluster_col=unit_col, or a coarser grouping " + f"where each unit belongs to a single cluster)." + ) + cluster_arr = df.groupby(unit_col)[cluster_col].first().sort_index().to_numpy() + # NaN cluster ids are a silent-failure vector; reject front-door. + if pd.isna(cluster_arr).any(): + n_nan = int(pd.isna(cluster_arr).sum()) + raise ValueError( + f"{n_nan} unit(s) have NaN cluster id in " + f"column {cluster_col!r}. Silent row dropping is " + f"disabled; drop or impute cluster ids before calling fit()." 
+ ) + + return d_arr, dy_arr, cluster_arr, unit_ids + + +# ============================================================================= +# Design auto-detection +# ============================================================================= + + +def _detect_design(d_arr: np.ndarray) -> str: + """Resolve ``design="auto"`` to a concrete mode string. + + Implements the REGISTRY line-2320 rule: + + 1. If ``d.min() == 0`` exactly, resolve unconditionally to + ``continuous_at_zero``. (The modal-min check only runs when + ``d.min() > 0`` since otherwise the "at d_lower" subset is always + the "zero-dose" subset and the continuous-at-zero path is the + paper-endorsed handling.) + 2. Otherwise, if ``d.min() < 0.01 * median(|d|)``, resolve to + ``continuous_at_zero`` (small-share-of-treated samples that + effectively satisfy Design 1'). + 3. Else, if the modal fraction at ``d.min()`` exceeds 2%, resolve to + ``mass_point``. + 4. Else, resolve to ``continuous_near_d_lower``. + + The rule is strict first-match: when both the ``<0.01 * median`` + threshold and the modal-fraction rule fire simultaneously (e.g., 3% + of units at ``D = 0`` + 97% ``Uniform(0.5, 1)``), the resolution is + ``continuous_at_zero``. This matches the paper-endorsed Design 1' + handling of small-share-of-treated samples. + + Parameters + ---------- + d_arr : np.ndarray, shape (G,) + Unit-level post-period doses. + + Returns + ------- + str + One of ``"continuous_at_zero"``, ``"continuous_near_d_lower"``, + ``"mass_point"``. + """ + d_min = float(d_arr.min()) + + # Tie-break: d.min() == 0 always resolves to Design 1'. + # Use an absolute tolerance that scales with the range of the data. + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(d_min) <= 1e-12 * scale: + return "continuous_at_zero" + + median_abs = float(np.median(np.abs(d_arr))) + # REGISTRY rule: d.min() < threshold * median(|d|) -> continuous_at_zero. + # Guard against median_abs == 0 (all doses at zero; handled above). 
+ if median_abs > 0 and d_min < _CONTINUOUS_AT_ZERO_THRESHOLD * median_abs: + return "continuous_at_zero" + + # Modal-fraction rule for mass-point detection. + eps = 1e-12 * max(1.0, abs(d_min)) + at_d_min = np.abs(d_arr - d_min) <= eps + modal_fraction = float(np.mean(at_d_min)) + if modal_fraction > _MASS_POINT_THRESHOLD: + return "mass_point" + + return "continuous_near_d_lower" + + +# ============================================================================= +# Mass-point 2SLS +# ============================================================================= + + +def _fit_mass_point_2sls( + d: np.ndarray, + dy: np.ndarray, + d_lower: float, + cluster: Optional[np.ndarray], + vcov_type: str, +) -> Tuple[float, float]: + """Wald-IV point estimate and structural-residual 2SLS sandwich SE. + + The just-identified binary instrument ``Z_g = 1{D_{g,2} > d_lower}`` + gives a 2SLS estimator that collapses to the Wald-IV sample-average + ratio + ``beta_hat = (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. + + The STANDARD ERROR is computed via the 2SLS sandwich + ``V = [Z'X]^{-1} * Omega * [Z'X]^{-T}`` where ``Omega`` is built from + the STRUCTURAL residuals + ``u = dy - alpha_hat - beta_hat * d`` + (NOT the reduced-form residuals). This is the canonical 2SLS + inference path and matches what ``AER::ivreg`` / ``ivreg2`` would + produce. The "OLS-on-indicator + scale by dose-gap" shortcut gives + reduced-form residuals, which diverge from structural residuals in + finite samples and substantively under clustering. + + Supported ``vcov_type``: + + - ``"classical"``: constant variance ``sigma_hat^2 = sum(u^2) / (n-2)``. + - ``"hc1"``: heteroskedasticity-robust with small-sample DOF scaling + ``n / (n - 2)``. With ``cluster`` supplied, switches to CR1 + (Liang-Zeger) cluster-robust. + + ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError`` (2SLS-specific + leverage derivation pending; queued for follow-up). 
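The collapse of just-identified 2SLS to the Wald ratio can be checked numerically on synthetic data (all numbers made up; the design matrices mirror the ``X``/``Zd`` construction used on this path):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d_lower = 0.2
# Synthetic doses: a mass point at d_lower plus a continuum above it.
at_mass = rng.random(n) < 0.3
d = np.where(at_mass, d_lower, rng.uniform(0.3, 1.0, n))
dy = 1.0 + 2.0 * d + rng.normal(0.0, 0.1, n)  # true slope = 2

Z = (d > d_lower).astype(float)
# Wald ratio of subgroup mean differences ...
wald = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (
    d[Z == 1].mean() - d[Z == 0].mean()
)

# ... equals the slope from just-identified 2SLS, beta = (Z'X)^{-1} Z'y.
X = np.column_stack([np.ones(n), d])
Zd = np.column_stack([np.ones(n), Z])
beta = np.linalg.solve(Zd.T @ X, Zd.T @ dy)
```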
+ + Parameters + ---------- + d : np.ndarray, shape (n,) + Post-period doses ``D_{g,2}``. + dy : np.ndarray, shape (n,) + First-difference outcome ``Delta Y_g``. + d_lower : float + Support infimum / evaluation point. + cluster : np.ndarray or None, shape (n,) + Cluster ids per unit (``None`` for no clustering). + vcov_type : str + One of ``"classical"``, ``"hc1"``. + + Returns + ------- + tuple[float, float] + ``(beta_hat, se_beta)``. NaN for SE when the dose-gap vanishes + (``Dbar_{Z=1} == Dbar_{Z=0}``) or the sandwich is singular. + """ + d = np.asarray(d, dtype=np.float64) + dy = np.asarray(dy, dtype=np.float64) + n = d.shape[0] + + Z = (d > d_lower).astype(np.float64) + n_above = int(Z.sum()) + n_at_or_below = n - n_above + + # Degeneracy checks: if either Z-subset is empty, Wald-IV is undefined. + if n_above == 0 or n_at_or_below == 0: + return float("nan"), float("nan") + + dose_gap = d[Z == 1].mean() - d[Z == 0].mean() + if abs(dose_gap) < 1e-12 * max(1.0, abs(float(d.mean()))): + # No dose variation around d_lower -> beta undefined. + return float("nan"), float("nan") + + dy_gap = dy[Z == 1].mean() - dy[Z == 0].mean() + beta_hat = float(dy_gap / dose_gap) + alpha_hat = float(dy.mean() - beta_hat * d.mean()) + + # STRUCTURAL residuals (plan-review CRITICAL #1): u = y - alpha - beta*x. + # The Wald-IV/OLS-on-indicator shortcut would use reduced-form residuals + # u_rf = dy - (alpha_rf + gamma * Z), which differ in finite samples. + u = dy - alpha_hat - beta_hat * d + + # Design matrices: X = [1, d] (endogenous), Z_d = [1, Z] (instrument). + X = np.column_stack([np.ones(n, dtype=np.float64), d]) + Zd = np.column_stack([np.ones(n, dtype=np.float64), Z]) + + ZtX = Zd.T @ X # (2, 2) + try: + ZtX_inv = np.linalg.inv(ZtX) + except np.linalg.LinAlgError: + # Z'X singular (e.g., no variation in Z, already handled above). 
+ return beta_hat, float("nan") + + vcov_type = vcov_type.lower() + if vcov_type in _MASS_POINT_VCOV_UNSUPPORTED: + raise NotImplementedError( + f"vcov_type={vcov_type!r} is not supported on the " + f"HeterogeneousAdoptionDiD mass-point path in Phase 2a. " + f"HC2 / HC2-BM require a 2SLS-specific leverage derivation " + f"`x_i' (Z'X)^{{-1}}(...)(X'Z)^{{-1}} x_i` that differs from " + f"the OLS leverage `x_i' (X'X)^{{-1}} x_i`. Derivation + R " + f"parity anchor are queued for the follow-up PR. Use " + f"vcov_type='hc1' or 'classical' for now." + ) + + if cluster is not None: + # CR1 (Liang-Zeger) cluster-robust sandwich. + # Use pd.unique to match R's first-appearance order (stable for + # cross-runtime reproducibility). + clusters_unique = pd.unique(cluster) + Omega = np.zeros((2, 2), dtype=np.float64) + for c in clusters_unique: + idx = cluster == c + # score per cluster: s_c = Zd[idx]' @ u[idx] + s = Zd[idx].T @ u[idx] + Omega += np.outer(s, s) + G = len(clusters_unique) + k = 2 + if G < 2: + # Cluster-robust SE undefined with a single cluster. + return beta_hat, float("nan") + Omega *= (G / (G - 1)) * ((n - 1) / (n - k)) + elif vcov_type == "classical": + dof = n - 2 + if dof <= 0: + return beta_hat, float("nan") + sigma2 = float((u * u).sum()) / dof + Omega = sigma2 * (Zd.T @ Zd) + elif vcov_type == "hc1": + dof = n - 2 + if dof <= 0: + return beta_hat, float("nan") + Omega = (n / dof) * (Zd.T @ ((u * u)[:, None] * Zd)) + else: + raise ValueError( + f"Unsupported vcov_type={vcov_type!r} on the HAD mass-point " + f"path. Supported: {_MASS_POINT_VCOV_SUPPORTED} (plus " + f"cluster-robust CR1 via cluster=)." 
+ ) + + V = ZtX_inv @ Omega @ ZtX_inv.T + var_beta = float(V[1, 1]) + if not np.isfinite(var_beta) or var_beta < 0: + return beta_hat, float("nan") + se_beta = float(np.sqrt(var_beta)) + return beta_hat, se_beta + + + # ============================================================================= + # Main estimator class + # ============================================================================= + + + class HeterogeneousAdoptionDiD: + """Heterogeneous Adoption Difference-in-Differences estimator. + + Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) + Weighted-Average-Slope (WAS) estimator with three design-dispatch + paths: Design 1' (continuous-at-zero), Design 1 continuous-near- + d_lower, and Design 1 mass-point (2SLS sample-average per paper + Section 3.2.4). Phase 2a ships the single-period path only; the + multi-period event-study extension (Appendix B.2) is queued for + Phase 2b. + + Parameters + ---------- + design : {"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"} + Design-dispatch strategy. Defaults to ``"auto"``, which resolves + via the REGISTRY auto-detect rule on the fitted dose data + (see :func:`_detect_design`). Explicit overrides run the chosen + path without auto-reject (e.g., forcing Design 1' on mass-point + data runs the nonparametric fit even though the paper would + counsel the 2SLS path). + d_lower : float or None + Support infimum ``d_lower``. ``None`` means use ``0.0`` on the + Design 1' path and ``float(d.min())`` on the other two paths. + kernel : {"epanechnikov", "triangular", "uniform"} + Forwarded to :func:`bias_corrected_local_linear` on the continuous + paths. Ignored on the mass-point path. + alpha : float + CI level (0.05 for 95% CI). + vcov_type : {"classical", "hc1"} or None + Applies to the mass-point path only. ``None`` defers to ``robust`` + (``"hc1"`` when ``robust=True``, else ``"classical"``). + ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``.
Ignored + on the continuous paths (which use the CCT-2014 robust SE from + Phase 1c); passing a non-default ``vcov_type`` on a continuous + path emits a one-time ``UserWarning``. + robust : bool + Backward-compat alias: ``True`` maps to ``"hc1"``, ``False`` maps + to ``"classical"``. Only relevant when ``vcov_type is None`` on + the mass-point path. + cluster : str or None + Column name for cluster-robust SE on the mass-point path (CR1). + Ignored with a ``UserWarning`` on the continuous paths in Phase + 2a (nonparametric cluster support has existed since Phase 1c but is + exposed separately via ``bias_corrected_local_linear``; the + estimator-level knob is queued for a follow-up PR). + + Notes + ----- + **Diagnostics coverage.** ``HeterogeneousAdoptionDiDResults.bandwidth_diagnostics`` + and ``.bias_corrected_fit`` are populated only on the continuous + paths; both are ``None`` on the mass-point path (which is parametric + and has no bandwidth). Conversely, ``.n_mass_point`` and + ``.n_above_d_lower`` are populated only on the mass-point path. + + **Clone idempotence.** ``self.design`` stores the RAW user input + (e.g., ``"auto"``); the resolved mode is stored on the result object + at fit time. This mirrors Phase 1a's ``_vcov_type_arg`` pattern and + keeps ``get_params()`` / ``sklearn.clone()`` round-trips exact. + + Examples + -------- + >>> import pandas as pd + >>> from diff_diff import HeterogeneousAdoptionDiD, generate_continuous_did_data + >>> data = generate_continuous_did_data(n_units=500, seed=42) # doctest: +SKIP + >>> est = HeterogeneousAdoptionDiD(design="auto") # doctest: +SKIP + >>> result = est.fit(data, outcome_col="outcome", dose_col="dose", + ...
time_col="period", unit_col="unit") # doctest: +SKIP + >>> result.design # doctest: +SKIP + 'continuous_at_zero' + """ + + def __init__( + self, + design: str = "auto", + d_lower: Optional[float] = None, + kernel: str = "epanechnikov", + alpha: float = 0.05, + vcov_type: Optional[str] = None, + robust: bool = False, + cluster: Optional[str] = None, + ) -> None: + self.design = design + self.d_lower = d_lower + self.kernel = kernel + self.alpha = alpha + self.vcov_type = vcov_type + self.robust = robust + self.cluster = cluster + self._validate_constructor_args() + + def _validate_constructor_args(self) -> None: + if self.design not in _VALID_DESIGNS: + raise ValueError( + f"Invalid design={self.design!r}. Must be one of " f"{_VALID_DESIGNS}." + ) + if not (0.0 < float(self.alpha) < 1.0): + raise ValueError(f"alpha must be in (0, 1); got {self.alpha!r}.") + if self.vcov_type is not None: + if self.vcov_type.lower() in _MASS_POINT_VCOV_UNSUPPORTED: + # Don't raise here — the raise happens at fit() time on + # the mass-point path so users who set vcov_type=hc2 for + # a continuous fit (which ignores the knob) don't get + # blocked. But we also don't silently accept it at + # construction if it's outside our documented enum. + pass + elif ( + self.vcov_type.lower() + not in _MASS_POINT_VCOV_SUPPORTED + _MASS_POINT_VCOV_UNSUPPORTED + ): + raise ValueError( + f"Invalid vcov_type={self.vcov_type!r}. Must be one " + f"of {_MASS_POINT_VCOV_SUPPORTED + _MASS_POINT_VCOV_UNSUPPORTED}, " + f"or None." + ) + + def get_params(self) -> Dict[str, Any]: + """Return the raw constructor parameters (sklearn-compatible). + + Preserves the user's original inputs - in particular, ``design`` + returns ``"auto"`` when the user set it to ``"auto"`` (even after + fit), so ``sklearn.base.clone(est)`` round-trips exactly. 
+ """ + return { + "design": self.design, + "d_lower": self.d_lower, + "kernel": self.kernel, + "alpha": self.alpha, + "vcov_type": self.vcov_type, + "robust": self.robust, + "cluster": self.cluster, + } + + def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": + """Set estimator parameters and return self (sklearn-compatible).""" + for key, value in params.items(): + if not hasattr(self, key): + raise ValueError( + f"Invalid parameter: {key}. Valid parameters: " + f"{list(self.get_params().keys())}." + ) + setattr(self, key, value) + self._validate_constructor_args() + return self + + # ------------------------------------------------------------------ + # Main fit entry point + # ------------------------------------------------------------------ + + def fit( + self, + data: pd.DataFrame, + outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + first_treat_col: Optional[str] = None, + aggregate: str = "overall", + survey: Any = None, + weights: Optional[np.ndarray] = None, + ) -> HeterogeneousAdoptionDiDResults: + """Fit the HAD estimator on a two-period panel. + + Parameters + ---------- + data : pd.DataFrame + outcome_col, dose_col, time_col, unit_col : str + Column names. + first_treat_col : str or None + Optional first-treatment column for cross-validation. + Required when the panel has more than two periods (raises + for >2 periods in Phase 2a; Phase 2b handles staggered). + aggregate : {"overall"} + ``"event_study"`` raises ``NotImplementedError`` (Phase 2b). + survey : SurveyDesign or None + Reserved for a follow-up survey-integration PR. Must be + ``None`` in Phase 2a. + weights : np.ndarray or None + Reserved for a follow-up PR. Must be ``None`` in Phase 2a. + + Returns + ------- + HeterogeneousAdoptionDiDResults + """ + # ---- Phase 2a scaffolding rejections (before any work) ---- + if aggregate not in _VALID_AGGREGATES: + raise ValueError( + f"Invalid aggregate={aggregate!r}. Must be one of " f"{_VALID_AGGREGATES}." 
+ ) + if aggregate == "event_study": + raise NotImplementedError( + "aggregate='event_study' (multi-period event study per " + "paper Appendix B.2) is queued for Phase 2b. Phase 2a " + "supports aggregate='overall' (single-period WAS) only." + ) + if survey is not None: + raise NotImplementedError( + "survey= support on HeterogeneousAdoptionDiD " + "is queued for a follow-up PR after Phase 2a. Pass " + "survey=None for now." + ) + if weights is not None: + raise NotImplementedError( + "weights= support on HeterogeneousAdoptionDiD is " + "queued for a follow-up PR (paired with survey " + "integration). Pass weights=None for now." + ) + + # ---- Resolve effective fit-time state (local vars only, per + # feedback_fit_does_not_mutate_config; do not mutate self.*) ---- + design_arg = self.design + d_lower_arg = self.d_lower + vcov_type_arg = self.vcov_type + robust_arg = self.robust + cluster_arg = self.cluster + + # ---- Validate panel contract ---- + t_pre, t_post = _validate_had_panel( + data, outcome_col, dose_col, time_col, unit_col, first_treat_col + ) + + # ---- Aggregate to unit-level first differences ---- + d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference( + data, + outcome_col, + dose_col, + time_col, + unit_col, + t_pre, + t_post, + cluster_arg, + ) + + n_obs = int(d_arr.shape[0]) + if n_obs < 3: + raise ValueError( + f"HAD requires at least 3 units for inference; got " + f"n_obs={n_obs} after aggregation." 
+ ) + + # ---- Resolve design ---- + if design_arg == "auto": + resolved_design = _detect_design(d_arr) + else: + resolved_design = design_arg + + # ---- Resolve d_lower ---- + if resolved_design == "continuous_at_zero": + d_lower_val = 0.0 + elif d_lower_arg is None: + d_lower_val = float(d_arr.min()) + else: + d_lower_val = float(d_lower_arg) + + # ---- Compute cohort counts ---- + if resolved_design == "mass_point": + eps = 1e-12 * max(1.0, abs(d_lower_val)) + at_d_min_mask = np.abs(d_arr - d_lower_val) <= eps + above_mask = d_arr > d_lower_val + n_mass_point: Optional[int] = int(at_d_min_mask.sum()) + n_above_d_lower: Optional[int] = int(above_mask.sum()) + n_treated = n_above_d_lower + n_control = int(at_d_min_mask.sum()) + else: + n_mass_point = None + n_above_d_lower = None + above_mask = d_arr > d_lower_val + n_treated = int(above_mask.sum()) + n_control = n_obs - n_treated + + dose_mean = float(d_arr.mean()) + + # ---- Dispatch ---- + if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): + # Warn once if the user set a mass-point-only knob that's + # ignored on the continuous path. + if vcov_type_arg is not None: + warnings.warn( + f"vcov_type={vcov_type_arg!r} is ignored on the " + f"'{resolved_design}' path (the continuous designs " + f"use the CCT-2014 robust SE from Phase 1c). " + f"vcov_type applies only to design='mass_point'.", + UserWarning, + stacklevel=2, + ) + if cluster_arg is not None: + warnings.warn( + f"cluster={cluster_arg!r} is ignored on the " + f"'{resolved_design}' path in Phase 2a. 
Cluster-" + f"robust SE on the nonparametric path is exposed " + f"via diff_diff.bias_corrected_local_linear directly " + f"but not yet threaded through the estimator-level " + f"knob.", + UserWarning, + stacklevel=2, + ) + att, se, bc_fit, bw_diag = self._fit_continuous( + d_arr, + dy_arr, + resolved_design, + d_lower_val, + ) + inference_method = "analytical_nonparametric" + vcov_label: Optional[str] = None + cluster_label: Optional[str] = None + elif resolved_design == "mass_point": + if vcov_type_arg is None: + # Backward-compat: robust=True -> hc1, robust=False -> classical. + vcov_effective = "hc1" if robust_arg else "classical" + else: + vcov_effective = vcov_type_arg.lower() + # Explicit cluster + hc1 becomes CR1 inside _fit_mass_point_2sls. + att, se = _fit_mass_point_2sls( + d_arr, + dy_arr, + d_lower_val, + cluster_arr, + vcov_effective, + ) + bc_fit = None + bw_diag = None + inference_method = "analytical_2sls" + vcov_label = vcov_effective + cluster_label = cluster_arg if cluster_arg is not None else None + else: + raise ValueError(f"Internal error: unhandled design={resolved_design!r}.") + + # ---- Route all inference fields through safe_inference ---- + t_stat, p_value, conf_int = safe_inference(att, se, alpha=float(self.alpha)) + + return HeterogeneousAdoptionDiDResults( + att=float(att), + se=float(se), + t_stat=float(t_stat), + p_value=float(p_value), + conf_int=(float(conf_int[0]), float(conf_int[1])), + alpha=float(self.alpha), + design=resolved_design, + target_parameter=_TARGET_PARAMETER[resolved_design], + d_lower=d_lower_val, + dose_mean=dose_mean, + n_obs=n_obs, + n_treated=n_treated, + n_control=n_control, + n_mass_point=n_mass_point, + n_above_d_lower=n_above_d_lower, + inference_method=inference_method, + vcov_type=vcov_label, + cluster_name=cluster_label, + survey_metadata=None, # Phase 2a: survey integration deferred. 
+ bandwidth_diagnostics=bw_diag, + bias_corrected_fit=bc_fit, + ) + + # ------------------------------------------------------------------ + # Continuous-design dispatch (Design 1' + Design 1 continuous-near-d_lower) + # ------------------------------------------------------------------ + + def _fit_continuous( + self, + d_arr: np.ndarray, + dy_arr: np.ndarray, + resolved_design: str, + d_lower_val: float, + ) -> Tuple[float, float, Optional[BiasCorrectedFit], Optional[BandwidthResult]]: + """Fit Phase 1c ``bias_corrected_local_linear`` and rescale to beta-scale. + + Both continuous designs share the same rescaling path - they + differ only in the regressor shift and boundary: + + - ``continuous_at_zero``: regressor ``d_arr``, boundary ``0``. + - ``continuous_near_d_lower``: regressor ``d_arr - d_lower``, + boundary ``0``. + + The mu-scale output ``tau_bc`` from + :func:`bias_corrected_local_linear` is divided by the + beta-scale divisor ``D_bar = (1/G) * sum(D_{g,2})`` (Equation 8 + in the paper). The same divisor applies to the bias-corrected CI + endpoints. The divisor ``D_bar`` uses the ORIGINAL + (unshifted) dose mean, matching the paper's convention. + """ + if resolved_design == "continuous_at_zero": + d_reg = d_arr + boundary = 0.0 + elif resolved_design == "continuous_near_d_lower": + d_reg = d_arr - d_lower_val + boundary = 0.0 + else: + raise ValueError( + f"_fit_continuous called with non-continuous " f"design={resolved_design!r}" + ) + + # Phase 1b/1c's bandwidth selector can hit degenerate ratios + # (divide-by-zero in the bias estimator) when the outcome is + # exactly constant or exactly linear in d (zero residuals). + # The upstream fix lives in ``_nprobust_port.lpbwselect_mse_dpi`` + # and is deliberately out of Phase 2a scope (per plan "No changes + # to local_linear.py / _nprobust_port.py").
Catch the symptomatic + # exceptions here and surface a NaN result via ``safe_inference`` + # so users get a well-shaped ``HeterogeneousAdoptionDiDResults`` + # rather than a traceback. + try: + bc_fit = bias_corrected_local_linear( + d=d_reg, + y=dy_arr, + boundary=boundary, + kernel=self.kernel, + alpha=float(self.alpha), + # No cluster / vce / weights threading in Phase 2a (see + # UserWarning in fit()). + ) + except (ZeroDivisionError, FloatingPointError, np.linalg.LinAlgError): + return float("nan"), float("nan"), None, None + + dose_mean = float(d_arr.mean()) + # Phase 1b / Phase 1c pre-checks enforce dose_mean > 0 for + # Design 1-family samples, but defense-in-depth: if dose_mean + # rounds to zero, fall through to NaN via safe_inference in fit(). + if abs(dose_mean) < 1e-12: + att = float("nan") + se = float("nan") + else: + att = float(bc_fit.estimate_bias_corrected) / dose_mean + se = float(bc_fit.se_robust) / dose_mean + + return att, se, bc_fit, bc_fit.bandwidth_diagnostics diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index dcc1e209..d9e1aba3 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,11 +2316,13 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. 
Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. 
Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. 
The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. -- [ ] Phase 2: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1', Design 1 mass-point, and Design 1 continuous-near-`d̲`. -- [ ] Phase 2: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_lower). -- [ ] Phase 2: Panel validator verifies `D_{g,1} = 0` for all units; error on staggered timing without last-cohort subgroup. -- [ ] Phase 2: Multi-period event-study extension (Appendix B.2). -- [ ] Phase 2: NaN-propagation tests across all ~15 result fields via `assert_nan_inference`. +- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and apply the β-scale rescaling `(1/G) Σ D_{g,2}` from Equation 8 of de Chaisemartin et al. (2026); mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d_lower}`. +- [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower).
Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. +- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units; rejects `>2` time periods (staggered reduction queued for Phase 2b); rejects unbalanced panels and NaN in outcome/dose/unit columns. +- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. + - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. + - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. +- [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. 
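The structural-residual sandwich documented in the mass-point SE note above is compact enough to sketch directly. The following is an illustrative NumPy sketch, not the shipped `diff_diff.had` code: the DGP constants and the names `d`, `dy`, `d_lower` are hypothetical stand-ins for `D_{g,2}`, `ΔY_g`, and the support infimum, and only the HC1 meat is shown.

```python
import numpy as np

# Hypothetical mass-point DGP: 30% of units at d_lower, rest Uniform(d_lower, 1).
rng = np.random.default_rng(0)
d_lower, n = 0.5, 400
d = np.concatenate([np.full(120, d_lower), rng.uniform(d_lower, 1.0, n - 120)])
dy = 0.3 * d + 0.1 * rng.standard_normal(n)

# Wald-IV point estimate with instrument 1{D_{g,2} > d_lower}.
Z = (d > d_lower).astype(float)
beta = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (d[Z == 1].mean() - d[Z == 0].mean())
alpha = dy.mean() - beta * d.mean()

# STRUCTURAL residuals u = ΔY - α̂ - β̂·D, not reduced-form residuals.
u = dy - alpha - beta * d

X = np.column_stack([np.ones(n), d])    # endogenous design [1, D]
Zd = np.column_stack([np.ones(n), Z])   # instrument design [1, Z]
ZtX_inv = np.linalg.inv(Zd.T @ X)

# HC1 meat and the [Z'X]^{-1} Ω [Z'X]^{-T} sandwich.
Omega = (n / (n - 2)) * (Zd.T @ ((u * u)[:, None] * Zd))
V = ZtX_inv @ Omega @ ZtX_inv.T
se_hc1 = float(np.sqrt(V[1, 1]))
```

Because the instrument is binary and the model just-identified, the Wald ratio coincides with the closed-form 2SLS slope `solve(Z'X, Z'ΔY)[1]`; feeding the structural residuals (rather than the residuals from an OLS of `ΔY` on the indicator) into `Ω` is what the "structural-residual" qualifier refers to.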
- [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). - [ ] Phase 3: `stute_test()` Cramér-von Mises with Mammen wild bootstrap. - [ ] Phase 3: `yatchew_hr_test()` heteroskedasticity-robust linearity test. diff --git a/tests/test_had.py b/tests/test_had.py new file mode 100644 index 00000000..dedcc1ac --- /dev/null +++ b/tests/test_had.py @@ -0,0 +1,1216 @@ +"""Tests for :class:`diff_diff.had.HeterogeneousAdoptionDiD` (Phase 2a). + +Covers the 12 plan commit criteria: + +1. All three design paths produce a finite result on synthetic DGPs. +2. ``design="auto"`` resolves correctly on each DGP + two edge cases. +3. Beta-scale rescaling: ``att == tau_bc / D_bar`` and CI endpoints == + ``(ci_low/D_bar, ci_high/D_bar)`` at ``atol=1e-14``. +4. Mass-point Wald-IV point estimate matches manual formula at + ``atol=1e-14``. +5. Mass-point 2SLS SE parity against hand-coded sandwich at + ``atol=1e-12`` for HC1, classical, and CR1 (cluster-robust). +6. Mass-point + ``vcov_type in {hc2, hc2_bm}`` raises + ``NotImplementedError``. +7. Panel-contract violations raise targeted ``ValueError``s. +8. NaN propagation: constant-y and mass-point degenerate inputs produce + all-NaN inference. +9. sklearn clone round-trip preserves raw ``design="auto"``; fit is + idempotent. +10. Scaffolding (``aggregate="event_study"``, ``survey``, ``weights``) + raises ``NotImplementedError`` with phase pointers. +11. ``get_params()`` keys match ``__init__`` signature. +12. REGISTRY ticks tested indirectly via parity with the paper rules. 
+""" + +from __future__ import annotations + +import inspect +import warnings + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.had import ( + HeterogeneousAdoptionDiD, + HeterogeneousAdoptionDiDResults, + _aggregate_first_difference, + _detect_design, + _fit_mass_point_2sls, + _validate_had_panel, +) +from diff_diff.local_linear import bias_corrected_local_linear +from tests.conftest import assert_nan_inference + +# ============================================================================= +# DGP helpers +# ============================================================================= + + +def _make_panel(d_post, delta_y, periods=(1, 2), extra_cols=None): + """Build a balanced two-period panel with ``D_{g,1} = 0``. + + Parameters + ---------- + d_post : np.ndarray, shape (G,) + Unit-level post-period dose ``D_{g,2}``. + delta_y : np.ndarray, shape (G,) + Unit-level first-difference outcome ``Y_{g,2} - Y_{g,1}``. + periods : tuple + (t_pre, t_post). + extra_cols : dict or None + Additional unit-constant columns (e.g., cluster variable). + """ + G = len(d_post) + t_pre, t_post = periods + units = np.arange(G) + df = pd.DataFrame( + { + "unit": np.repeat(units, 2), + "period": np.tile([t_pre, t_post], G), + "dose": np.column_stack([np.zeros(G), d_post]).ravel(), + # Set period-1 outcome to 0; period-2 outcome = delta_y so that + # Y_{g,2} - Y_{g,1} == delta_y exactly. 
+ "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(), + } + ) + if extra_cols: + for col, vals in extra_cols.items(): + df[col] = np.repeat(vals, 2) + return df + + +def _dgp_continuous_at_zero(G, seed): + """Design 1' DGP: uniform dose on [0, 1] with exact zero in the sample.""" + rng = np.random.default_rng(seed) + d = rng.uniform(0.0, 1.0, G) + d[0] = 0.0 # guarantee continuous_at_zero auto-detection + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + return d, dy + + +def _dgp_continuous_near_d_lower(G, seed): + """Design 1 continuous-near-d_lower DGP: Beta(2,2) shifted to [0.1, 1].""" + rng = np.random.default_rng(seed) + u = rng.beta(2, 2, G) + d = 0.1 + 0.9 * u + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + return d, dy + + +def _dgp_mass_point(G, seed, d_lower=0.5, mass_frac=0.3, beta=0.3): + """Mass-point DGP: ``mass_frac`` at d_lower, rest Uniform(d_lower, 1).""" + rng = np.random.default_rng(seed) + mass_n = int(mass_frac * G) + d = np.concatenate([np.full(mass_n, d_lower), rng.uniform(d_lower, 1.0, G - mass_n)]) + dy = beta * d + 0.1 * rng.standard_normal(G) + return d, dy + + +# ============================================================================= +# Criterion 1: Smoke tests - all 3 design paths produce finite output +# ============================================================================= + + +class TestSmokeAllDesigns: + def test_continuous_at_zero_finite(self): + d, dy = _dgp_continuous_at_zero(500, seed=42) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def test_continuous_near_d_lower_finite(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=42) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def 
test_mass_point_finite(self): + d, dy = _dgp_mass_point(500, seed=42) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def test_result_is_dataclass(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert isinstance(r, HeterogeneousAdoptionDiDResults) + + def test_continuous_populates_bandwidth_diagnostics(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.bandwidth_diagnostics is not None + assert r.bias_corrected_fit is not None + + def test_mass_point_nulls_bandwidth_diagnostics(self): + d, dy = _dgp_mass_point(400, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.bandwidth_diagnostics is None + assert r.bias_corrected_fit is None + assert r.n_mass_point is not None + assert r.n_above_d_lower is not None + + def test_continuous_nulls_mass_point_counts(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.n_mass_point is None + assert r.n_above_d_lower is None + + +# ============================================================================= +# Criterion 2: design="auto" detection rule +# ============================================================================= + + +class TestDesignAutoDetect: + def test_detect_design_1_prime_exact_zero(self): + d, _ = _dgp_continuous_at_zero(500, seed=0) + assert _detect_design(d) == "continuous_at_zero" + + def test_detect_design_continuous_near_d_lower(self): + d, _ = _dgp_continuous_near_d_lower(500, seed=0) + assert _detect_design(d) == "continuous_near_d_lower" + + def 
test_detect_mass_point(self): + d, _ = _dgp_mass_point(500, seed=0) + assert _detect_design(d) == "mass_point" + + def test_edge_small_mass_at_zero_resolves_continuous_at_zero(self): + """Plan criterion 2 edge-case (a): 3% at D=0 + 97% Uniform(0.5, 1).""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = int(0.03 * G) + d = np.concatenate([np.zeros(mass_n), rng.uniform(0.5, 1.0, G - mass_n)]) + assert _detect_design(d) == "continuous_at_zero" + + def test_edge_shifted_beta_not_small_enough_for_design_1_prime(self): + """Plan criterion 2 edge-case (b): d.min/median ~ 0.03 > 0.01 threshold.""" + rng = np.random.default_rng(0) + u = rng.beta(2, 2, 1000) + d = 0.03 + u + assert _detect_design(d) == "continuous_near_d_lower" + + def test_design_auto_dispatches_correctly_at_fit(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="auto").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.design == "continuous_at_zero" + + def test_design_auto_mass_point_at_fit(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="auto").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.design == "mass_point" + + def test_auto_does_not_mutate_self_design(self): + """Plan decision #14: self.design preserves raw 'auto' after fit.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + est = HeterogeneousAdoptionDiD(design="auto") + _ = est.fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert est.design == "auto" + assert est.get_params()["design"] == "auto" + + +# ============================================================================= +# Criterion 3: Beta-scale rescaling parity +# ============================================================================= + + +class TestBetaScaleRescaling: + def test_att_equals_tau_bc_over_dbar(self): + """result.att == bc_fit.tau_bc / D_bar at atol=1e-14.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + 
panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.estimate_bias_corrected) / d_bar + assert abs(r.att - expected) < 1e-14 + + def test_se_equals_se_rb_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.se_robust) / d_bar + assert abs(r.se - expected) < 1e-14 + + def test_ci_lower_equals_ci_low_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.ci_low) / d_bar + assert abs(r.conf_int[0] - expected) < 1e-14 + + def test_ci_upper_equals_ci_high_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.ci_high) / d_bar + assert abs(r.conf_int[1] - expected) < 1e-14 + + def test_rescaling_shifted_regressor_continuous_near_d_lower(self): + """continuous_near_d_lower: regressor shift `D - d_lower`, divisor = mean(D).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + d_lower_val = float(d.min()) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + d_reg = d - 
d_lower_val + bc = bias_corrected_local_linear(d=d_reg, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.estimate_bias_corrected) / d_bar + assert abs(r.att - expected) < 1e-14 + + def test_dose_mean_stored_on_result(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + assert abs(r.dose_mean - float(d.mean())) < 1e-14 + + +# ============================================================================= +# Criterion 4: Mass-point Wald-IV point estimate parity +# ============================================================================= + + +class TestMassPointWaldIV: + def test_wald_iv_point_estimate(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + Z = (d > 0.5).astype(float) + expected = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (d[Z == 1].mean() - d[Z == 0].mean()) + assert abs(r.att - expected) < 1e-14 + + def test_wald_iv_equals_2sls(self): + """Sanity: Wald-IV is exactly 2SLS for binary instrument.""" + d, dy = _dgp_mass_point(500, seed=7) + Z = (d > 0.5).astype(float).reshape(-1, 1) + # 2SLS via Z'X invert: beta = [(Z'X)^-1 Z'y][1] + X = np.column_stack([np.ones_like(d), d]) + Zd = np.column_stack([np.ones_like(d), Z.ravel()]) + beta_2sls = np.linalg.inv(Zd.T @ X) @ (Zd.T @ dy) + beta_wald = (dy[Z.ravel() == 1].mean() - dy[Z.ravel() == 0].mean()) / ( + d[Z.ravel() == 1].mean() - d[Z.ravel() == 0].mean() + ) + assert abs(float(beta_2sls[1]) - beta_wald) < 1e-12 + + def test_mass_point_n_counts_populated(self): + d, dy = _dgp_mass_point(500, seed=0, d_lower=0.5, mass_frac=0.3) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.n_mass_point == int(0.3 * 500) + assert r.n_above_d_lower == 500 - 
int(0.3 * 500) + assert r.n_treated == r.n_above_d_lower + assert r.n_control == r.n_mass_point + + +# ============================================================================= +# Criterion 5: Mass-point 2SLS SE sandwich parity +# ============================================================================= + + +def _manual_2sls_sandwich_se(d, dy, d_lower, vcov_type, cluster=None): + """Hand-coded textbook 2SLS sandwich using structural residuals. + + Returns se_beta for the coefficient on d. Mirrors the helper in had.py + but computed from scratch to serve as the parity reference. + """ + n = len(d) + Z = (d > d_lower).astype(np.float64) + dose_gap = d[Z == 1].mean() - d[Z == 0].mean() + dy_gap = dy[Z == 1].mean() - dy[Z == 0].mean() + beta = dy_gap / dose_gap + alpha_hat = dy.mean() - beta * d.mean() + u = dy - alpha_hat - beta * d # STRUCTURAL residuals + X = np.column_stack([np.ones(n), d]) + Zd = np.column_stack([np.ones(n), Z]) + ZtX_inv = np.linalg.inv(Zd.T @ X) + + if cluster is not None: + Omega = np.zeros((2, 2)) + clusters = pd.unique(cluster) + G = len(clusters) + for c in clusters: + idx = cluster == c + s = Zd[idx].T @ u[idx] + Omega += np.outer(s, s) + Omega *= (G / (G - 1)) * ((n - 1) / (n - 2)) + elif vcov_type == "classical": + sigma2 = (u * u).sum() / (n - 2) + Omega = sigma2 * (Zd.T @ Zd) + elif vcov_type == "hc1": + Omega = (n / (n - 2)) * (Zd.T @ ((u * u)[:, None] * Zd)) + else: + raise ValueError(f"unknown vcov_type={vcov_type}") + + V = ZtX_inv @ Omega @ ZtX_inv.T + return float(np.sqrt(V[1, 1])) + + +class TestMassPointSEParity: + def test_classical_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "classical") + assert abs(r.se - expected) < 1e-12 + + def test_hc1_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = 
_make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "hc1") + assert abs(r.se - expected) < 1e-12 + + def test_cr1_cluster_robust_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + cluster_ids = np.tile(np.arange(50), 10) # 50 clusters of 10 units + panel = _make_panel(d, dy, extra_cols={"state": cluster_ids}) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "hc1", cluster=cluster_ids) + assert abs(r.se - expected) < 1e-12 + + def test_robust_alias_maps_to_hc1(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r_robust = HeterogeneousAdoptionDiD(design="mass_point", robust=True).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_hc1 = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_robust.se == r_hc1.se + + def test_robust_false_maps_to_classical(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r_robust = HeterogeneousAdoptionDiD(design="mass_point", robust=False).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_classical = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_robust.se == r_classical.se + + def test_vcov_type_explicit_overrides_robust(self): + """When vcov_type is explicit, robust is ignored.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical", robust=True).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.vcov_type == "classical" + + +# ============================================================================= +# Criterion 6: hc2 / hc2_bm raise 
NotImplementedError +# ============================================================================= + + +class TestMassPointUnsupportedVcov: + def test_hc2_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2") + with pytest.raises(NotImplementedError, match="HC2"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_hc2_bm_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2_bm") + with pytest.raises(NotImplementedError, match="HC2"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_hc2_pointer_references_followup_pr(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2") + with pytest.raises(NotImplementedError, match="follow-up"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_vcov_type_ignored_on_continuous(self): + """hc2 passed with continuous design emits warning, does not raise.""" + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", vcov_type="hc2") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("ignored" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + +# ============================================================================= +# Criterion 7: Panel-contract violations +# ============================================================================= + + +class TestPanelContract: + def test_missing_outcome_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, 
"missing", "dose", "period", "unit") + + def test_missing_dose_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "missing", "period", "unit") + + def test_missing_time_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "dose", "missing", "unit") + + def test_missing_unit_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "dose", "period", "missing") + + def test_nonzero_pre_period_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[panel["period"] == 1, "dose"] = 0.5 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"D_\{g,1\}|pre-period"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_unbalanced_panel_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy).iloc[:-1] # drop one row + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"[Uu]nbalanced|[Bb]alanced"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_three_periods_without_first_treat_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel2 = _make_panel(d, dy) + panel3 = pd.concat([panel2, panel2.assign(period=3)]) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"two time periods|Phase 2b"): + est.fit(panel3, "outcome", "dose", "period", "unit") + + def test_three_periods_with_first_treat_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel2 = _make_panel(d, dy) + panel3 = pd.concat([panel2, panel2.assign(period=3)]) 
+ panel3["ft"] = 2 # arbitrary first_treat + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"two time periods|Phase 2b"): + est.fit( + panel3, + "outcome", + "dose", + "period", + "unit", + first_treat_col="ft", + ) + + def test_single_period_raises(self): + d, _ = _dgp_continuous_at_zero(200, seed=0) + panel = pd.DataFrame( + { + "unit": np.arange(200), + "period": 2, + "dose": d, + "outcome": np.zeros(200), + } + ) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="two-period"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_outcome_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[0, "outcome"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[3, "dose"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_duplicate_unit_period_raises(self): + """Two observations of the same unit-period cell.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel = pd.concat([panel, panel.iloc[[0]]]) # duplicate first row + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"[Uu]nbalanced|observation"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_first_treat_col_invalid_cohort_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Set first_treat values to {0, 5}, where 5 is not t_post. 
+ ft_unit = np.where(np.arange(200) % 2 == 0, 0, 5) + panel["ft"] = np.repeat(ft_unit, 2) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"first_treat_col"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + first_treat_col="ft", + ) + + +# ============================================================================= +# Criterion 8: NaN propagation +# ============================================================================= + + +class TestNaNPropagation: + def test_constant_y_produces_nan_inference(self): + """Constant outcome -> zero residuals -> NaN via safe_inference.""" + d, _ = _dgp_continuous_at_zero(500, seed=0) + dy_zero = np.zeros_like(d) + panel = _make_panel(d, dy_zero) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + # All inference fields NaN when SE is non-finite. + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_mass_point_all_above_d_lower_nan(self): + """Degenerate mass-point: no units at d_lower -> NaN.""" + rng = np.random.default_rng(0) + G = 500 + d = rng.uniform(0.6, 1.0, G) # all above 0.5 + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.att) + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_mass_point_all_at_d_lower_nan(self): + """Degenerate mass-point: all units at d_lower -> NaN.""" + rng = np.random.default_rng(0) + G = 500 + d = np.full(G, 0.5) # all at 0.5 + dy = 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + # Avoid triggering pre-period D=0 check by starting at 0.5 at t2. 
+ r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.att) + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_helper_returns_nan_on_empty_z_one(self): + """_fit_mass_point_2sls returns NaN when no units above d_lower.""" + d = np.full(50, 0.5) + dy = np.random.default_rng(0).standard_normal(50) + beta, se = _fit_mass_point_2sls(d, dy, 0.5, None, "hc1") + assert np.isnan(beta) + assert np.isnan(se) + + def test_helper_returns_nan_on_empty_z_zero(self): + """_fit_mass_point_2sls returns NaN when no units at d_lower.""" + d = np.full(50, 0.6) # all strictly above d_lower=0.5 + dy = np.random.default_rng(0).standard_normal(50) + beta, se = _fit_mass_point_2sls(d, dy, 0.5, None, "hc1") + assert np.isnan(beta) + assert np.isnan(se) + + def test_single_cluster_cr1_returns_nan(self): + """CR1 with only 1 cluster is undefined -> NaN.""" + rng = np.random.default_rng(0) + G = 100 + d = np.concatenate([np.full(30, 0.5), rng.uniform(0.5, 1.0, G - 30)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy, extra_cols={"state": np.zeros(G, dtype=int)}) # single cluster + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.se) + + +# ============================================================================= +# Criterion 9: sklearn clone round-trip + fit idempotence +# ============================================================================= + + +class TestSklearnCompat: + def test_get_params_returns_all_constructor_args(self): + est = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", + d_lower=0.3, + kernel="triangular", + alpha=0.1, + vcov_type="hc1", + robust=True, + cluster="state", + ) + params = est.get_params() + assert params == { + "design": "continuous_near_d_lower", + 
"d_lower": 0.3, + "kernel": "triangular", + "alpha": 0.1, + "vcov_type": "hc1", + "robust": True, + "cluster": "state", + } + + def test_clone_round_trip(self): + est = HeterogeneousAdoptionDiD(design="auto", alpha=0.1, kernel="triangular") + est2 = HeterogeneousAdoptionDiD(**est.get_params()) + assert est.get_params() == est2.get_params() + + def test_fit_idempotent_same_att(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + r1 = est.fit(panel, "outcome", "dose", "period", "unit") + r2 = est.fit(panel, "outcome", "dose", "period", "unit") + assert r1.att == r2.att + assert r1.se == r2.se + assert r1.conf_int == r2.conf_int + + def test_set_params_updates_and_returns_self(self): + est = HeterogeneousAdoptionDiD() + ret = est.set_params(alpha=0.1, design="continuous_at_zero") + assert ret is est + assert est.alpha == 0.1 + assert est.design == "continuous_at_zero" + + def test_set_params_invalid_key_raises(self): + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(not_a_param=True) + + def test_set_params_invalid_design_raises(self): + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="design"): + est.set_params(design="made_up") + + +# ============================================================================= +# Criterion 10: Scaffolding raises +# ============================================================================= + + +class TestScaffoldingRejections: + def test_aggregate_event_study_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="2b"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + aggregate="event_study", + ) + + def test_aggregate_invalid_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = 
HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid aggregate"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + aggregate="garbage", + ) + + def test_survey_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="survey"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + survey="anything", + ) + + def test_weights_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="weights"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + weights=np.ones(200), + ) + + +# ============================================================================= +# Criterion 11: get_params signature enumeration +# ============================================================================= + + +class TestGetParamsContract: + def test_get_params_matches_init_signature(self): + sig_params = set(inspect.signature(HeterogeneousAdoptionDiD.__init__).parameters.keys()) - { + "self" + } + gp_params = set(HeterogeneousAdoptionDiD().get_params().keys()) + assert sig_params == gp_params + + def test_set_params_covers_all_init_params(self): + """Every __init__ param must be settable via set_params.""" + est = HeterogeneousAdoptionDiD() + params = est.get_params() + # Round-trip via set_params + new_est = HeterogeneousAdoptionDiD() + new_est.set_params(**params) + assert new_est.get_params() == params + + +# ============================================================================= +# Result class methods +# ============================================================================= + + +class TestResultMethods: + def _result(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + panel = _make_panel(d, dy) + return HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", 
"unit") + + def test_summary_returns_string(self): + r = self._result() + s = r.summary() + assert isinstance(s, str) + assert "HeterogeneousAdoptionDiD" in s + assert "WAS" in s + assert "Confidence Interval" in s + + def test_print_summary_executes(self, capsys): + r = self._result() + r.print_summary() + captured = capsys.readouterr() + assert "HeterogeneousAdoptionDiD" in captured.out + + def test_to_dict_populated(self): + r = self._result() + d = r.to_dict() + assert "att" in d + assert "se" in d + assert "design" in d + assert "target_parameter" in d + assert "d_lower" in d + assert "dose_mean" in d + assert "n_obs" in d + assert d["design"] == "continuous_at_zero" + + def test_to_dataframe_populated(self): + r = self._result() + df = r.to_dataframe() + assert isinstance(df, pd.DataFrame) + assert len(df) == 1 + assert "att" in df.columns + + def test_repr_concise(self): + r = self._result() + s = repr(r) + assert "HeterogeneousAdoptionDiDResults" in s + assert "att=" in s + assert "design=" in s + + def test_mass_point_summary_includes_mass_count(self): + d, dy = _dgp_mass_point(400, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + s = r.summary() + assert "mass point" in s.lower() or "At d_lower" in s + + +# ============================================================================= +# Design metadata +# ============================================================================= + + +class TestDesignMetadata: + def test_target_parameter_design_1_prime(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS" + + def test_target_parameter_design_1(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), 
"outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS_d_lower" + + def test_target_parameter_mass_point(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS_d_lower" + + def test_d_lower_zero_for_design_1_prime(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.d_lower == 0.0 + + def test_d_lower_from_data_for_continuous_near(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert abs(r.d_lower - float(d.min())) < 1e-14 + + def test_d_lower_explicit_override(self): + """d_lower override must satisfy d.min() >= d_lower (else negative shifted doses).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + # d.min() is around 0.1 + epsilon for this DGP; override within that. 
+ d_lower_user = float(d.min()) # explicit but equal to default + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=d_lower_user).fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert abs(r.d_lower - d_lower_user) < 1e-14 + + def test_inference_method_continuous(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.inference_method == "analytical_nonparametric" + + def test_inference_method_mass_point(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.inference_method == "analytical_2sls" + + def test_survey_metadata_always_none_in_phase_2a(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.survey_metadata is None + + def test_alpha_stored_on_result(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(alpha=0.1).fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.alpha == 0.1 + + +# ============================================================================= +# Constructor validation +# ============================================================================= + + +class TestConstructorValidation: + def test_invalid_design_raises(self): + with pytest.raises(ValueError, match="Invalid design"): + HeterogeneousAdoptionDiD(design="random_garbage") + + def test_alpha_zero_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=0.0) + + def test_alpha_one_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=1.0) + + def test_alpha_negative_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=-0.05) + + def 
test_invalid_vcov_type_raises(self): + with pytest.raises(ValueError, match="vcov_type"): + HeterogeneousAdoptionDiD(vcov_type="garbage") + + def test_vcov_type_none_accepted(self): + est = HeterogeneousAdoptionDiD(vcov_type=None) + assert est.vcov_type is None + + def test_d_lower_none_accepted(self): + est = HeterogeneousAdoptionDiD(d_lower=None) + assert est.d_lower is None + + def test_d_lower_float_accepted(self): + est = HeterogeneousAdoptionDiD(d_lower=0.3) + assert est.d_lower == 0.3 + + +# ============================================================================= +# Explicit design override (don't auto-reject) +# ============================================================================= + + +class TestExplicitDesignOverrides: + def test_force_continuous_at_zero_on_mass_point_data(self): + """Forcing Design 1' on mass-point data should run (may produce wide CIs).""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + # Phase 1c's _validate_had_inputs would reject this (mass point), + # so this will raise NotImplementedError from underneath, NOT from had.py. + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.raises(NotImplementedError, match="mass-point"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_mass_point_on_continuous_data(self): + """Forcing mass-point on continuous data runs the 2SLS on the minimum cluster.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + # d.min() == 0 exactly; mass-point with d_lower=0 will treat the + # zero-dose unit (just 1 unit) as the mass cluster and the rest as + # "above". This exercises the degenerate mass=1 case. 
+ est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.design == "mass_point" + + +# ============================================================================= +# Cluster handling (unit-level aggregation) +# ============================================================================= + + +class TestClusterHandling: + def test_cluster_not_constant_within_unit_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + panel = _make_panel(d, dy) + # Make cluster vary within unit + panel["state"] = np.arange(len(panel)) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match=r"constant within unit"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_missing_cluster_column_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="missing") + with pytest.raises(ValueError, match="cluster"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_cluster_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + # Unit-level cluster ids: 50 clusters, 2 units each, with NaN on unit 0. 
+ cluster_unit = np.repeat(np.arange(50).astype(float), 2) # length 100 + cluster_unit[0] = np.nan + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_cluster_warns_on_continuous_path(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + cluster_unit = np.repeat(np.arange(100), 2) # length 200 unit-level + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_cluster_name_populated_mass_point(self): + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) # 50 clusters, 4 units each + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.cluster_name == "state" + + def test_cluster_name_none_without_cluster(self): + d, dy = _dgp_mass_point(200, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.cluster_name is None + + +# ============================================================================= +# First-difference aggregation helper +# ============================================================================= + + +class TestFirstDifferenceAggregation: + def test_aggregate_returns_sorted_unit_order(self): + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy) + # Shuffle rows to test sort-invariance + panel_shuffled = panel.sample(frac=1, 
random_state=42).reset_index(drop=True) + d_arr, dy_arr, _, unit_ids = _aggregate_first_difference( + panel_shuffled, "outcome", "dose", "period", "unit", 1, 2, None + ) + # unit_ids sorted + assert np.all(np.diff(unit_ids) >= 0) + # Each dose matches the input dose for its unit + for i, uid in enumerate(unit_ids): + assert abs(d_arr[i] - d[uid]) < 1e-14 + assert abs(dy_arr[i] - dy[uid]) < 1e-14 + + def test_aggregate_cluster_array_correct(self): + d, dy = _dgp_mass_point(100, seed=0) + cluster_unit = np.repeat(np.arange(25), 4) # 25 clusters, 4 units each + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + _, _, cluster_arr, unit_ids = _aggregate_first_difference( + panel, + "outcome", + "dose", + "period", + "unit", + 1, + 2, + "state", + ) + assert cluster_arr is not None + assert len(cluster_arr) == 100 + # Cluster_arr[i] should equal cluster_unit[unit_ids[i]] + for i, uid in enumerate(unit_ids): + assert cluster_arr[i] == cluster_unit[uid] + + def test_aggregate_no_cluster_returns_none(self): + d, dy = _dgp_continuous_at_zero(50, seed=0) + panel = _make_panel(d, dy) + _, _, cluster_arr, _ = _aggregate_first_difference( + panel, "outcome", "dose", "period", "unit", 1, 2, None + ) + assert cluster_arr is None + + +# ============================================================================= +# Auto-detect mass-point vs continuous-near at boundary +# ============================================================================= + + +class TestAutoDetectEdges: + def test_exactly_two_percent_modal_is_not_mass_point(self): + """Threshold is strict >, not >=. 
2% exactly should stay continuous.""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 20 # exactly 2% + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + # d.min() == 0.5, not 0, and modal fraction == 2% (not > 2%) + assert _detect_design(d) == "continuous_near_d_lower" + + def test_slightly_over_two_percent_is_mass_point(self): + rng = np.random.default_rng(0) + G = 1000 + mass_n = 25 # 2.5% + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + assert _detect_design(d) == "mass_point" + + def test_all_at_zero_resolves_continuous_at_zero(self): + """Degenerate but well-defined: all zeros -> continuous_at_zero.""" + d = np.zeros(100) + assert _detect_design(d) == "continuous_at_zero" + + +# ============================================================================= +# Panel validator direct tests +# ============================================================================= + + +class TestValidateHadPanel: + def test_returns_period_pair(self): + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=(2020, 2021)) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == 2020 + assert t_post == 2021 + + def test_accepts_string_periods(self): + """String periods should still sort and validate (no rejection).""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # Should not raise - strings sort fine + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "A" + assert t_post == "B" From 77b34663e9b3318f984420c0b49d1975dc7f9703 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:19:26 -0400 Subject: [PATCH 02/17] Enforce d_lower == d.min() contract on HAD mass-point path Paper Section 3.2.4 defines the Design 1 mass-point estimator with instrument Z = 1{D_{g,2} > d_lower} at d_lower = float(d.min()) (the lower-support
mass point). Phase 2a previously accepted user-supplied d_lower overrides on the mass-point path without validation, which would silently redefine the instrument/control split and identify a different (LATE-like) estimand outside Phase 2a's documented scope. Changes: - diff_diff/had.py: fit() now raises ValueError on the mass-point path when an explicit d_lower differs from float(d.min()) beyond float tolerance. d_lower=None (auto-resolve) and d_lower == d.min() (within tolerance) remain supported. Continuous paths are unaffected - they already reject off-support d_lower via Phase 1c's _validate_had_inputs (negative-dose check after the regressor shift). - tests/test_had.py: new TestMassPointDLowerContract class adds 6 rejection/acceptance tests for the contract. The prior test_force_mass_point_on_continuous_data test is renamed and clarified to document that d_lower=0.0 matches d.min()==0.0 in that DGP. The stale all-above-d_lower NaN-propagation test is removed (the input is now unreachable from the public API; the helper-level NaN guard is still tested via test_helper_returns_nan_on_empty_z_zero). Smaller fixes: - _validate_had_panel: first_treat_col comment no longer overstates the cross-validation (Phase 2a does value-domain only, not dose consistency). - _fit_continuous dispatch: "Warn once" comment revised to "Warn" (per-fit warnings are not suppressed across calls in Phase 2a). Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 41 +++++++++++++++++---- tests/test_had.py | 92 ++++++++++++++++++++++++++++++++++------------- 2 files changed, 101 insertions(+), 32 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 2398ad29..778036d9 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -423,11 +423,13 @@ def _validate_had_panel( f"t_pre={t_pre}. Drop these units or verify the dose column." 
) - # Optional cross-validation via first_treat_col: if supplied, the - # first_treat == 0 units must have D_{g, t_post} == 0, and first_treat - # == t_post units may have any D_{g, t_post} >= 0. We do NOT require - # strict consistency (the paper treats D_{g,2} as the primary signal), - # but a quick sanity check catches obvious column mix-ups. + # Optional value-domain validation via first_treat_col: if supplied, + # every unit's first_treat value must be in {0, t_post} (0 = never + # treated, t_post = treated in the second period). This is a value- + # domain check that catches typos and staggered-timing mix-ups; it + # does NOT cross-validate first_treat against post-period dose + # (D_{g, t_post} remains the primary signal). Extended cross-checks + # are queued for a follow-up PR. if first_treat_col is not None: ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() if np.any(np.isnan(ft.astype(np.float64, copy=False))): @@ -987,6 +989,30 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Mass-point contract: d_lower must equal the support infimum ---- + # Paper Section 3.2.4 defines the mass-point estimator with instrument + # Z = 1{D_{g,2} > d_lower} where d_lower is the lower-support mass + # point. A user-supplied d_lower that differs from float(d_arr.min()) + # redefines the instrument / control split and identifies a different + # estimand. Phase 2a supports only the paper's lower-support mass-point + # estimator, so we reject mismatched overrides front-door rather than + # silently running an unsupported variant. + if resolved_design == "mass_point" and d_lower_arg is not None: + d_min = float(d_arr.min()) + tol = 1e-12 * max(1.0, abs(d_min)) + if abs(d_lower_val - d_min) > tol: + raise ValueError( + f"design='mass_point' requires d_lower to equal the " + f"support infimum float(d.min())={d_min!r}; got " + f"d_lower={d_lower_val!r}. 
The paper's Design 1 mass-" + f"point estimator (Section 3.2.4) identifies at the " + f"lower-support mass point, not at an arbitrary " + f"threshold. Pass d_lower=None to auto-resolve, or " + f"d_lower=float(d.min()) explicitly. Non-support-" + f"infimum thresholds identify a different (LATE-like) " + f"estimand that is out of Phase 2a scope." + ) + # ---- Compute cohort counts ---- if resolved_design == "mass_point": eps = 1e-12 * max(1.0, abs(d_lower_val)) @@ -1007,8 +1033,9 @@ def fit( # ---- Dispatch ---- if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): - # Warn once if the user set a mass-point-only knob that's - # ignored on the continuous path. + # Warn when the user set a mass-point-only knob that's ignored + # on the continuous path. (Emitted per fit call; this is not + # suppressed after the first call.) if vcov_type_arg is not None: warnings.warn( f"vcov_type={vcov_type_arg!r} is ignored on the " diff --git a/tests/test_had.py b/tests/test_had.py index dedcc1ac..e1f6a7db 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -635,26 +635,6 @@ def test_constant_y_produces_nan_inference(self): } ) - def test_mass_point_all_above_d_lower_nan(self): - """Degenerate mass-point: no units at d_lower -> NaN.""" - rng = np.random.default_rng(0) - G = 500 - d = rng.uniform(0.6, 1.0, G) # all above 0.5 - dy = 0.3 * d + 0.1 * rng.standard_normal(G) - panel = _make_panel(d, dy) - r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( - panel, "outcome", "dose", "period", "unit" - ) - assert np.isnan(r.att) - assert_nan_inference( - { - "se": r.se, - "t_stat": r.t_stat, - "p_value": r.p_value, - "conf_int": r.conf_int, - } - ) - def test_mass_point_all_at_d_lower_nan(self): """Degenerate mass-point: all units at d_lower -> NaN.""" rng = np.random.default_rng(0) @@ -1043,16 +1023,78 @@ def test_force_continuous_at_zero_on_mass_point_data(self): with pytest.raises(NotImplementedError, match="mass-point"): est.fit(panel, 
"outcome", "dose", "period", "unit") - def test_force_mass_point_on_continuous_data(self): - """Forcing mass-point on continuous data runs the 2SLS on the minimum cluster.""" + def test_force_mass_point_on_continuous_data_at_support_infimum(self): + """Forcing mass-point on continuous data with d_lower=d.min() runs. + + d.min()==0 exactly in this DGP, so d_lower=0.0 is the paper-consistent + support-infimum threshold. The resulting mass=1 case exercises the + degenerate-mass boundary (only unit 0 is "at d_lower"; rest are above). + """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) - # d.min() == 0 exactly; mass-point with d_lower=0 will treat the - # zero-dose unit (just 1 unit) as the mass cluster and the rest as - # "above". This exercises the degenerate mass=1 case. + # d.min() == 0 exactly (d[0]=0 by construction); d_lower must match. est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) r = est.fit(panel, "outcome", "dose", "period", "unit") assert r.design == "mass_point" + assert r.d_lower == 0.0 + + +# ============================================================================= +# Mass-point d_lower contract enforcement +# ============================================================================= + + +class TestMassPointDLowerContract: + """Paper Section 3.2.4: mass-point d_lower must equal float(d.min()).""" + + def test_d_lower_above_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.6) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_d_lower_below_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.3) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") 
+ + def test_d_lower_matches_support_infimum_succeeds(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.d_lower == 0.5 + assert np.isfinite(r.att) + + def test_d_lower_none_auto_resolves_to_min(self): + """With d_lower=None, the contract is satisfied automatically.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert abs(r.d_lower - float(d.min())) < 1e-14 + + def test_d_lower_within_tolerance_succeeds(self): + """Float-tolerance overrides of d.min() are accepted.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + # Pass d_lower that differs from d.min() by float-rounding noise. + d_lower_user = float(d.min()) + 1e-15 + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_user) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert np.isfinite(r.att) + + def test_d_lower_contract_is_mass_point_only(self): + """continuous_near_d_lower accepts arbitrary d_lower (within other guards).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + # Setting d_lower=d.min() on continuous should just work. 
+ est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert np.isfinite(r.att) # ============================================================================= From 687a61573ffc91bdf4d3bbdcab237eb50c452409 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:20:41 -0400 Subject: [PATCH 03/17] Clarify continuous-path mass-point-guard test docstring Rename test_d_lower_contract_is_mass_point_only -> test_mass_point_equality_guard_does_not_fire_on_continuous and rewrite the docstring so it accurately describes what's exercised. Previously claimed "arbitrary d_lower" but only tests d_lower=d.min(); the renamed test now narrates that the mass-point-specific ValueError does not fire on the continuous path (the continuous path has its own upstream negative-dose guard after the regressor shift). Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_had.py | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tests/test_had.py b/tests/test_had.py index e1f6a7db..9afe4457 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1087,11 +1087,16 @@ def test_d_lower_within_tolerance_succeeds(self): r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) - def test_d_lower_contract_is_mass_point_only(self): - """continuous_near_d_lower accepts arbitrary d_lower (within other guards).""" + def test_mass_point_equality_guard_does_not_fire_on_continuous(self): + """The mass-point-specific d_lower equality guard is not enforced on + continuous_near_d_lower. (The continuous path has its own upstream + Phase 1c guards against off-support d_lower via the negative-dose + check after the regressor shift, but the mass-point-specific + ValueError does not fire here.) + """ d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) - # Setting d_lower=d.min() on continuous should just work. 
+ # d_lower == d.min() is always a valid continuous configuration. est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) From 8c38f072c0e6d2a8fb31715aa51cff9db5fa1cec Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 18:44:36 -0400 Subject: [PATCH 04/17] Fix P0 HAD continuous estimator formula + P1 validator + docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address CI AI review on PR #346. **P0 (Methodology — estimator formula was wrong):** The continuous paths computed `att = tau_bc / D_bar` instead of the paper's `att = (mean(ΔY) - tau_bc) / den`. On a known DGP with β = 0.3 the old code returned ~0 (since tau_bc ≈ lim_{d↓d̲} E[ΔY|D≤d] ≈ 0); the fix recovers the true β at n=4000. - Design 1' (continuous_at_zero), paper Equation 7 / Theorem 3: β = (E[ΔY] - lim_{d↓0} E[ΔY | D_2 ≤ d]) / E[D_2] Implementation: `att = (dy_arr.mean() - tau_bc) / d_arr.mean()`, `se = se_robust / |d_arr.mean()|`. - Design 1 (continuous_near_d_lower), paper Theorem 4 / `WAS_{d_lower}` under Assumption 6: β = (E[ΔY] - lim_{d↓d_lower} E[ΔY | D_2 ≤ d]) / E[D_2 - d_lower] Implementation: regressor shift `D - d_lower`, evaluate local-linear fit at boundary=0 on the shifted scale; `att = (dy_arr.mean() - tau_bc) / (d_arr - d_lower).mean()`. CI endpoints reverse under the subtraction `(ΔȲ - tau_bc)`; safe_inference computes `att ± z · se` from scratch so reversal is automatic. **P1 (Validator — incomplete contract on continuous paths):** - `_validate_had_panel` now rejects negative post-period doses (`D_{g,2} < 0`) front-door on the ORIGINAL unshifted scale so errors reference the user's dose column, not shifted values. - `continuous_near_d_lower` now enforces `d_lower == float(d.min())` within float tolerance, mirroring the mass-point guard. 
Off-support `d_lower` would otherwise pass through Phase 1c's 5% plausibility heuristic and silently identify a different estimand. **P1 (Identification — Assumption 5/6 not surfaced):** Design 1 paths (continuous_near_d_lower + mass_point) now emit a UserWarning stating that `WAS_{d_lower}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. continuous_at_zero (Design 1', Assumption 3 only) does not emit the warning. **P2 (vcov_type docstring mismatch):** Corrected the constructor docstring to reflect actual behavior: `vcov_type=None` falls back to the `robust` flag (`robust=True -> hc1`, `robust=False -> classical`, the default); explicit `vcov_type` takes precedence over `robust`. **Tests rewritten + new regressions:** - `TestBetaScaleRescaling` now pins the corrected formula at atol=1e-14 (both designs), pins the CI endpoint reversal, and adds two "recover true β" asymptotic sanity tests. - New `TestDesign1DLowerContract` covers mass-point AND continuous_near_d_lower d_lower equality enforcement. - New `TestPostPeriodDoseContract` covers negative-dose rejection. - New `TestAssumptionFiveSixWarning` verifies the Design 1 warning is emitted on continuous_near_d_lower + mass_point and NOT on continuous_at_zero. **Docs:** REGISTRY.md HAD Phase 2a section updated with the corrected estimator formula, the d_lower contract on both Design 1 paths, the Assumption 5/6 warning note, and a CI-endpoint-reversal note. had.py module and result-class docstrings updated in kind. Targeted regression: 118 HAD tests + 497 total across Phase 1 and adjacent surfaces, all green. 
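The P0 formula fix above can be sanity-checked with a minimal standalone sketch (illustrative only, not the library code: a plain windowed OLS intercept stands in for Phase 1c's `bias_corrected_local_linear`, and all variable names here are hypothetical):

```python
import numpy as np

# Synthetic Design 1'-style DGP: true beta = 0.3, and the boundary limit
# lim_{d v 0} E[dY | D <= d] = 0, so the old formula tau_bc / mean(d)
# returns ~0 while the corrected (mean(dy) - tau_bc) / mean(d) recovers beta.
rng = np.random.default_rng(0)
G = 4000
d = rng.uniform(0.0, 1.0, G)
dy = 0.3 * d + 0.05 * rng.standard_normal(G)

# Crude stand-in for the bias-corrected local-linear boundary estimate:
# OLS intercept on a narrow window near d = 0.
window = d < 0.1
X = np.column_stack([np.ones(window.sum()), d[window]])
tau_bc = np.linalg.lstsq(X, dy[window], rcond=None)[0][0]

att_old = tau_bc / d.mean()                # old (wrong) formula: ~0
att_new = (dy.mean() - tau_bc) / d.mean()  # corrected formula: ~0.3
```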
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 236 ++++++++++++++++++++++++--------- docs/methodology/REGISTRY.md | 6 +- tests/test_had.py | 247 +++++++++++++++++++++++++++-------- 3 files changed, 371 insertions(+), 118 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 778036d9..01c98400 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -2,25 +2,39 @@ Heterogeneous Adoption Difference-in-Differences (HAD) estimator (Phase 2a). Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) -Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: +Weighted-Average-Slope (WAS) estimator with three design-dispatch paths. +All three paths produce a beta-scale point estimate of the form +``(mean(Delta Y) - [boundary limit]) / [expected dose gap]`` (Design 1 +family) or the Wald-IV ratio (mass-point), then route inference through +:func:`diff_diff.utils.safe_inference`. 1. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density - continuous at zero. Evaluates the bias-corrected local-linear fit at - ``boundary = 0`` and rescales to the beta-scale via Equation 8's divisor - ``D_bar = (1/G) * sum(D_{g,2})``. Invokes Assumption 3. + continuous at zero, Assumption 3. Equation 7 / Theorem 3: + + beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] + + The bias-corrected local-linear fit at ``boundary = 0`` estimates + the boundary limit; ``E[D_2]`` and ``E[Delta Y]`` are plugin sample + means. 2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): - ``d_lower > 0``, continuous boundary density. Evaluates the bias-corrected - local-linear fit at ``boundary = float(d.min())`` after the regressor - shift ``D' = D - d_lower``. Same divisor, same CI path. - Invokes Assumption 5 or 6. - -3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction at - ``d.min()`` exceeds 2%. Uses a sample-average 2SLS estimator with - instrument ``Z_g = 1{D_{g,2} > d_lower}``. 
Point estimate reduces to the - Wald-IV ratio - ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard error - via the structural-residual 2SLS sandwich (see ``_fit_mass_point_2sls``). + ``d_lower > 0``, continuous boundary density, Assumption 5 or 6. + Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + + beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) + / E[D_2 - d_lower] + + The local-linear fit is anchored at ``d_lower`` via the regressor + shift ``D' = D - d_lower``, evaluated at ``boundary = 0`` on the + shifted scale. + +3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction + at ``d.min()`` exceeds 2%. Sample-average 2SLS with instrument + ``Z_g = 1{D_{g,2} > d_lower}`` (paper Section 3.2.4). Point estimate + equals the Wald-IV ratio + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard + error via the structural-residual 2SLS sandwich + (see ``_fit_mass_point_2sls``). Phase 2a ships the single-period path only; the multi-period event-study extension (paper Appendix B.2) is queued for Phase 2b. @@ -119,15 +133,28 @@ class HeterogeneousAdoptionDiDResults: Attributes ---------- att : float - Point estimate of the WAS parameter on the beta-scale. For the - continuous designs: ``tau_bc / D_bar`` where ``tau_bc`` is the - bias-corrected local-linear statistic on the mu-scale and - ``D_bar = (1/G) * sum(D_{g,2})``. For the mass-point design: - the 2SLS coefficient directly. + Point estimate of the WAS parameter on the beta-scale. + + - Design 1' (paper Equation 7 / Theorem 3): + ``att = (mean(ΔY) - tau_bc) / D_bar`` + where ``tau_bc`` is the bias-corrected local-linear estimate + of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and + ``D_bar = (1/G) * sum(D_{g,2})``. 
+ - Design 1 continuous-near-d_lower (paper Theorem 4, + ``WAS_{d_lower}`` under Assumption 6): + ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)`` + where ``tau_bc`` is the bias-corrected local-linear estimate + of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``. + - Mass-point (paper Section 3.2.4): the Wald-IV / 2SLS + coefficient directly - + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. se : float Standard error on the beta-scale. For continuous designs, the - CCT-2014 robust SE divided by ``D_bar``. For mass-point, the 2SLS - structural-residual sandwich SE. + CCT-2014 robust SE from Phase 1c divided by ``|den|`` (the + absolute denominator used in ``att``); the higher-order + variance from ``mean(ΔY)`` is dominated by the nonparametric + boundary estimate in large samples and is not included. For + mass-point, the 2SLS structural-residual sandwich SE. t_stat, p_value, conf_int : inference fields Routed through ``safe_inference``; NaN when SE is non-finite. alpha : float @@ -423,6 +450,26 @@ def _validate_had_panel( f"t_pre={t_pre}. Drop these units or verify the dose column." ) + # Post-period nonnegative-dose check on the ORIGINAL (unshifted) dose + # scale. Front-door rejection per paper Assumption (dose definition + # Section 2) which treats D_{g,2} as nonnegative. Without this + # check, negative original doses would only surface after the + # regressor shift in ``_fit_continuous`` via Phase 1c's + # ``_validate_had_inputs``, which references the shifted values + # and would confuse users about which column is malformed. + post_mask = data[time_col] == t_post + post_doses = np.asarray(data.loc[post_mask, dose_col], dtype=np.float64) + neg_post = post_doses < 0 + if neg_post.any(): + n_neg = int(neg_post.sum()) + min_neg = float(post_doses[neg_post].min()) + raise ValueError( + f"HAD requires D_{{g,2}} >= 0 for all units (paper Section " + f"2 dose definition). 
{n_neg} unit(s) have negative post-" + f"period dose at t_post={t_post} (min={min_neg!r}). Drop " + f"these units or verify the dose column." + ) + # Optional value-domain validation via first_treat_col: if supplied, # every unit's first_treat value must be in {0, t_post} (0 = never # treated, t_post = treated in the second period). This is a value- @@ -766,15 +813,21 @@ class HeterogeneousAdoptionDiD: alpha : float CI level (0.05 for 95% CI). vcov_type : {"classical", "hc1"} or None - Mass-point-path only. ``None`` defaults to ``"hc1"``. - ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``. Ignored - on the continuous paths (which use the CCT-2014 robust SE from + Mass-point-path only. When ``None``, the effective family falls + back to the ``robust`` flag: ``robust=True`` -> ``"hc1"``, + ``robust=False`` -> ``"classical"`` (the default construction). + Explicit ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError`` + pending a 2SLS-specific leverage derivation. Ignored on the + continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default ``vcov_type`` on a continuous - path emits a one-time ``UserWarning``. + path emits a ``UserWarning`` per fit call. robust : bool - Backward-compat alias: ``True`` maps to ``"hc1"``, ``False`` maps - to ``"classical"``. Only relevant when ``vcov_type is None`` on - the mass-point path. + Backward-compat alias used only when ``vcov_type is None``: + ``True`` -> ``"hc1"``, ``False`` -> ``"classical"``. Explicit + ``vcov_type`` takes precedence (e.g., + ``vcov_type="classical", robust=True`` runs classical). Only + the mass-point path consumes these; continuous paths ignore + both with a warning. cluster : str or None Column name for cluster-robust SE on the mass-point path (CR1). 
Ignored with a ``UserWarning`` on the continuous paths in Phase @@ -989,28 +1042,35 @@ def fit( else: d_lower_val = float(d_lower_arg) - # ---- Mass-point contract: d_lower must equal the support infimum ---- - # Paper Section 3.2.4 defines the mass-point estimator with instrument - # Z = 1{D_{g,2} > d_lower} where d_lower is the lower-support mass - # point. A user-supplied d_lower that differs from float(d_arr.min()) - # redefines the instrument / control split and identifies a different - # estimand. Phase 2a supports only the paper's lower-support mass-point - # estimator, so we reject mismatched overrides front-door rather than - # silently running an unsupported variant. - if resolved_design == "mass_point" and d_lower_arg is not None: + # ---- d_lower contract for Design 1 paths ---- + # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at + # d_lower = support infimum of D_{g,2}. For the mass-point path, + # the instrument Z = 1{D_{g,2} > d_lower} requires d_lower to be + # the lower-support mass point. For the continuous-near-d_lower + # path, evaluating the local-linear fit after the regressor + # shift (D - d_lower) at boundary=0 only makes sense when the + # shift anchors to the realized sample minimum; otherwise the + # boundary evaluation is off-support (no observations near zero + # on the shifted scale) and Phase 1c's 5% plausibility heuristic + # may fail to catch the mismatch. We enforce d_lower == d.min() + # within float tolerance on both Design 1 paths; mismatched + # overrides raise with a clear pointer to the unsupported + # estimand. + if resolved_design in ("mass_point", "continuous_near_d_lower") and d_lower_arg is not None: d_min = float(d_arr.min()) tol = 1e-12 * max(1.0, abs(d_min)) if abs(d_lower_val - d_min) > tol: raise ValueError( - f"design='mass_point' requires d_lower to equal the " - f"support infimum float(d.min())={d_min!r}; got " - f"d_lower={d_lower_val!r}. 
The paper's Design 1 mass-" - f"point estimator (Section 3.2.4) identifies at the " - f"lower-support mass point, not at an arbitrary " + f"design={resolved_design!r} requires d_lower to equal " + f"the support infimum float(d.min())={d_min!r}; got " + f"d_lower={d_lower_val!r}. The paper's Design 1 " + f"estimators (Sections 3.2.2-3.2.4) identify at the " + f"lower-support boundary, not at an arbitrary " f"threshold. Pass d_lower=None to auto-resolve, or " f"d_lower=float(d.min()) explicitly. Non-support-" - f"infimum thresholds identify a different (LATE-like) " - f"estimand that is out of Phase 2a scope." + f"infimum thresholds identify a different (LATE-like " + f"for mass_point, off-support for continuous_near_" + f"d_lower) estimand that is out of Phase 2a scope." ) # ---- Compute cohort counts ---- @@ -1031,6 +1091,26 @@ def fit( dose_mean = float(d_arr.mean()) + # ---- Assumption 5/6 warning on Design 1 paths ---- + # Paper Sections 3.2.2-3.2.4: when d_lower > 0 (Design 1 family), + # point identification of WAS_{d_lower} requires Assumption 6 in + # addition to parallel trends (Assumption 1-3); Assumption 5 gives + # only sign identification. These extra assumptions are NOT + # testable via pre-trends. Surface this to the user front-door so + # results are not silently interpreted as full point identification. + if resolved_design in ("continuous_near_d_lower", "mass_point"): + warnings.warn( + f"design={resolved_design!r} (Design 1, d_lower > 0) requires " + f"Assumption 6 from de Chaisemartin et al. (2026) for point " + f"identification of WAS_{{d_lower}}, or Assumption 5 for " + f"sign identification only. Neither is testable via " + f"pre-trends. 
Confirm the extra assumption is defensible " + f"for your setting before interpreting the returned " + f"point estimate as the WAS.", + UserWarning, + stacklevel=2, + ) + # ---- Dispatch ---- if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): # Warn when the user set a mass-point-only knob that's ignored @@ -1125,28 +1205,53 @@ def _fit_continuous( resolved_design: str, d_lower_val: float, ) -> Tuple[float, float, Optional[BiasCorrectedFit], Optional[BandwidthResult]]: - """Fit Phase 1c ``bias_corrected_local_linear`` and rescale to beta-scale. + """Fit Phase 1c ``bias_corrected_local_linear`` and form the WAS estimate. + + Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau + (2026) continuous-design estimators: + + - Design 1' (``continuous_at_zero``), paper Equation 7 / + Theorem 3: + + beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] + + Regressor passed to the local-linear boundary fit is + ``d_arr``; the boundary is ``0``. - Both continuous designs share the same rescaling path - they - differ only in the regressor shift and boundary: + - Design 1 (``continuous_near_d_lower``), paper Proposition 3 / + Theorem 4 (``WAS_{d_lower}`` under Assumption 6): - - ``continuous_at_zero``: regressor ``d_arr``, boundary ``0``. - - ``continuous_near_d_lower``: regressor ``d_arr - d_lower``, - boundary ``0``. + beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) + / E[D_2 - d_lower] - The mu-scale output ``tau_bc`` from - :func:`bias_corrected_local_linear` is divided by the - beta-scale divisor ``D_bar = (1/G) * sum(D_{g,2})`` (Equation 8 - in the paper). The same divisor applies to the classical CI - endpoints. The result of ``D_bar`` uses the ORIGINAL - (unshifted) dose mean, matching paper convention. + Regressor passed to the local-linear fit is + ``d_arr - d_lower`` so the boundary evaluation point is ``0`` + on the shifted scale; the numerator and denominator are + computed on the original scale. 
+ + The bias-corrected boundary estimate ``tau_bc`` from + :func:`bias_corrected_local_linear` corresponds to the limit + ``lim_{d v d_lower} E[Delta Y | D_2 <= d]``. The beta-scale + estimator and standard error are then: + + att = (mean(dy) - tau_bc) / den + se = se_robust / |den| + + where ``den`` is the expectation in the denominator (``D_bar`` + for Design 1', ``mean(D_2 - d_lower)`` for Design 1). The + confidence interval is ``att +/- z * se`` computed in + :func:`diff_diff.utils.safe_inference`; the endpoints reverse + relative to the boundary-limit CI because the numerator + transformation is ``ΔȲ - tau_bc``. """ if resolved_design == "continuous_at_zero": d_reg = d_arr boundary = 0.0 + den = float(d_arr.mean()) elif resolved_design == "continuous_near_d_lower": d_reg = d_arr - d_lower_val boundary = 0.0 + den = float((d_arr - d_lower_val).mean()) else: raise ValueError( f"_fit_continuous called with non-continuous " f"design={resolved_design!r}" @@ -1174,15 +1279,18 @@ def _fit_continuous( except (ZeroDivisionError, FloatingPointError, np.linalg.LinAlgError): return float("nan"), float("nan"), None, None - dose_mean = float(d_arr.mean()) - # Phase 1b / Phase 1c pre-checks enforce dose_mean > 0 for - # Design 1-family samples, but defense-in-depth: if dose_mean - # rounds to zero, fall through to NaN via safe_inference in fit(). - if abs(dose_mean) < 1e-12: + # Guard against degenerate denominators: if all units are at + # d_lower (continuous_near_d_lower) or if D_bar rounds to zero + # (continuous_at_zero with a vanishing sample mean), the beta- + # scale estimator is undefined. Fall through to NaN via + # safe_inference in fit(). 
+ if abs(den) < 1e-12: att = float("nan") se = float("nan") else: - att = float(bc_fit.estimate_bias_corrected) / dose_mean - se = float(bc_fit.se_robust) / dose_mean + dy_mean = float(dy_arr.mean()) + tau_bc = float(bc_fit.estimate_bias_corrected) + att = (dy_mean - tau_bc) / den + se = float(bc_fit.se_robust) / abs(den) return att, se, bc_fit, bc_fit.bandwidth_diagnostics diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index d9e1aba3..9944477c 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,11 +2316,13 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). 
The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. 
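The β-scale rescaling that Phase 2's `HeterogeneousAdoptionDiD.fit()` applies on top of the Phase 1c boundary fit is a monotone-decreasing transform of the boundary limit, which is why the CI endpoints reverse. A minimal standalone NumPy sketch (the helper name `beta_scale_rescale` is hypothetical, and the scalar inputs stand in for a Phase 1c `BiasCorrectedFit`; this is not the shipped `_fit_continuous`):

```python
import numpy as np


def beta_scale_rescale(dy, tau_bc, se_robust, ci_low, ci_high, den):
    """Rescale a boundary-limit fit (tau_bc scale) to the beta scale.

    att = (mean(dy) - tau_bc) / den and se = se_robust / |den|.
    For den > 0 the map x -> (mean(dy) - x) / den is decreasing, so the
    beta-scale CI lower bound comes from the boundary CI UPPER bound
    and vice versa.
    """
    dy_mean = float(np.mean(dy))
    att = (dy_mean - tau_bc) / den
    se = se_robust / abs(den)
    ci = ((dy_mean - ci_high) / den, (dy_mean - ci_low) / den)  # assumes den > 0
    return att, se, ci
```

With `dy` averaging 0.2, `tau_bc = 0.05`, boundary CI `(0.01, 0.09)`, and `den = 0.5`, this yields `att = 0.3` with beta-scale CI `(0.22, 0.38)`.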
Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. 
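The Equation 8 bias-corrected CI described above, `tau.bc ± z_{1-α/2} * se.rb` with the critical value from `scipy.stats.norm.ppf`, can be sketched in isolation (hypothetical helper name; the shipped path routes through `safe_inference` rather than a bare function like this):

```python
from scipy.stats import norm


def bias_corrected_ci(tau_bc, se_rb, alpha=0.05):
    # Robust bias-corrected CI: tau.bc +/- z_{1-alpha/2} * se.rb, with the
    # critical value computed via scipy.stats.norm.ppf (matching the Python
    # wrapper) rather than a hard-coded copy of R's qnorm value.
    z = norm.ppf(1.0 - alpha / 2.0)
    return tau_bc - z * se_rb, tau_bc + z * se_rb
```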
-- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and apply the β-scale rescaling `(1/G) Σ D_{g,2}` from Equation 8 of de Chaisemartin et al. (2026); mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d_lower}`. +- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. -- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units; rejects `>2` time periods (staggered reduction queued for Phase 2b); rejects unbalanced panels and NaN in outcome/dose/unit columns. 
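The strict first-match auto-detection rule above can be sketched as a standalone function (hypothetical name `detect_design`; the shipped `diff_diff.had._detect_design` may differ in details such as the exact mass-point tolerance):

```python
import numpy as np


def detect_design(d, zero_ratio=0.01, mass_threshold=0.02):
    """Strict first-match design detection, per the registry rule:

    1. d.min() == 0 exactly (unconditional tie-break), or
       d.min() < zero_ratio * median(d)  ->  "continuous_at_zero"
    2. modal fraction at d.min() > mass_threshold  ->  "mass_point"
    3. otherwise  ->  "continuous_near_d_lower"
    """
    d = np.asarray(d, dtype=float)
    d_min = float(d.min())
    if d_min == 0.0 or d_min < zero_ratio * float(np.median(d)):
        return "continuous_at_zero"
    # Modal-min check only runs when d.min() > 0 (ensured above).
    eps = 1e-12 * max(1.0, abs(d_min))
    modal_fraction = float(np.mean(np.abs(d - d_min) <= eps))
    if modal_fraction > mass_threshold:
        return "mass_point"
    return "continuous_near_d_lower"
```

The edge case from the registry entry falls out of rule 1: a sample with any exact zeros resolves to `continuous_at_zero` before the modal-fraction check can fire.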
+- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. - [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. + - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. 
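The structural-residual 2SLS sandwich from the mass-point SE note above can be illustrated with a self-contained sketch (hypothetical helper name and layout — regressors `X = [1, D]`, instruments `Z = [1, 1{D > d_lower}]`; the shipped `_fit_mass_point_2sls` may organize this differently):

```python
import numpy as np


def mass_point_2sls(d, dy, d_lower, vcov="hc1"):
    """Just-identified 2SLS with instrument z = 1{D > d_lower}.

    The slope equals the Wald-IV ratio; the SE comes from the
    structural-residual sandwich [Z'X]^{-1} Omega [Z'X]^{-T}.
    """
    d = np.asarray(d, dtype=float)
    dy = np.asarray(dy, dtype=float)
    n = d.size
    z = (d > d_lower).astype(float)
    Z = np.column_stack([np.ones(n), z])   # instruments
    X = np.column_stack([np.ones(n), d])   # regressors
    zx_inv = np.linalg.inv(Z.T @ X)
    alpha, beta = zx_inv @ (Z.T @ dy)
    u = dy - alpha - beta * d              # structural, NOT reduced-form, residuals
    if vcov == "classical":
        omega = (u @ u / (n - 2)) * (Z.T @ Z)
    elif vcov == "hc1":
        zu = Z * u[:, None]
        omega = (n / (n - 2)) * (zu.T @ zu)  # n/(n-k) * Z' diag(u^2) Z
    else:
        # hc2 / hc2_bm need a 2SLS-specific leverage derivation; see note.
        raise NotImplementedError(vcov)
    V = zx_inv @ omega @ zx_inv.T          # [Z'X]^-1 Omega [Z'X]^-T
    return float(beta), float(np.sqrt(V[1, 1]))
```

With a binary instrument and an intercept, the 2SLS slope reduces algebraically to the Wald ratio `(mean(ΔY|z=1) - mean(ΔY|z=0)) / (mean(D|z=1) - mean(D|z=0))`, which is what the atol=1e-14 Wald-IV parity tests exploit.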
+ - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. - [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. - [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). diff --git a/tests/test_had.py b/tests/test_had.py index 9afe4457..b8e0152f 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -4,8 +4,13 @@ 1. All three design paths produce a finite result on synthetic DGPs. 2. ``design="auto"`` resolves correctly on each DGP + two edge cases. -3. Beta-scale rescaling: ``att == tau_bc / D_bar`` and CI endpoints == - ``(ci_low/D_bar, ci_high/D_bar)`` at ``atol=1e-14``. +3. Beta-scale WAS estimator at atol=1e-14: + - Design 1' / continuous_at_zero: + ``att = (mean(ΔY) - tau_bc) / mean(D)`` + - Design 1 / continuous_near_d_lower: + ``att = (mean(ΔY) - tau_bc) / mean(D - d_lower)`` + - CI endpoints reverse under subtraction: + ``CI_lower(att) = (mean(ΔY) - CI_upper_boundary) / den`` 4. Mass-point Wald-IV point estimate matches manual formula at ``atol=1e-14``. 5. 
Mass-point 2SLS SE parity against hand-coded sandwich at @@ -231,8 +236,19 @@ def test_auto_does_not_mutate_self_design(self): class TestBetaScaleRescaling: - def test_att_equals_tau_bc_over_dbar(self): - """result.att == bc_fit.tau_bc / D_bar at atol=1e-14.""" + """Plan commit criterion #3 + review P0: the continuous estimator is + + att = (mean(ΔY) - tau_bc) / den + + with ``den = mean(D)`` for Design 1' and ``den = mean(D - d_lower)`` + for Design 1 continuous-near-d_lower. SE is ``se_robust / |den|``. + CI endpoints are computed via ``att +/- z * se`` (endpoints reverse + relative to the boundary-limit CI because the numerator is + ``ΔȲ - tau_bc``). + """ + + def test_att_design_1_prime(self): + """att = (mean(ΔY) - tau_bc) / D_bar for Design 1' at atol=1e-14.""" d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( @@ -240,32 +256,27 @@ def test_att_equals_tau_bc_over_dbar(self): ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) d_bar = float(d.mean()) - expected = float(bc.estimate_bias_corrected) / d_bar + dy_mean = float(dy.mean()) + expected = (dy_mean - float(bc.estimate_bias_corrected)) / d_bar assert abs(r.att - expected) < 1e-14 - def test_se_equals_se_rb_over_dbar(self): + def test_se_design_1_prime(self): + """se = se_robust / |D_bar| for Design 1' at atol=1e-14.""" d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( panel, "outcome", "dose", "period", "unit" ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.se_robust) / d_bar + expected = float(bc.se_robust) / abs(float(d.mean())) assert abs(r.se - expected) < 1e-14 - def test_ci_lower_equals_ci_low_over_dbar(self): - d, dy = _dgp_continuous_at_zero(500, seed=0) - panel = _make_panel(d, dy) - r = 
HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( - panel, "outcome", "dose", "period", "unit" - ) - bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.ci_low) / d_bar - assert abs(r.conf_int[0] - expected) < 1e-14 + def test_ci_endpoints_reverse_under_subtraction(self): + """Because att = (ΔȲ - tau_bc)/D_bar, CI endpoints reverse: - def test_ci_upper_equals_ci_high_over_dbar(self): + CI_lower(att) = (ΔȲ - CI_upper_boundary) / D_bar + CI_upper(att) = (ΔȲ - CI_lower_boundary) / D_bar + """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( @@ -273,23 +284,61 @@ def test_ci_upper_equals_ci_high_over_dbar(self): ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) d_bar = float(d.mean()) - expected = float(bc.ci_high) / d_bar - assert abs(r.conf_int[1] - expected) < 1e-14 - - def test_rescaling_shifted_regressor_continuous_near_d_lower(self): - """continuous_near_d_lower: regressor shift `D - d_lower`, divisor = mean(D).""" + dy_mean = float(dy.mean()) + # CI bounds on the att scale, computed by endpoint reversal from + # the boundary-limit CI. 
+ expected_lower = (dy_mean - float(bc.ci_high)) / d_bar + expected_upper = (dy_mean - float(bc.ci_low)) / d_bar + assert abs(r.conf_int[0] - expected_lower) < 1e-14 + assert abs(r.conf_int[1] - expected_upper) < 1e-14 + + def test_att_design_1_continuous_near_d_lower(self): + """att = (mean(ΔY) - tau_bc) / mean(D - d_lower) for Design 1 at atol=1e-14.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) d_lower_val = float(d.min()) - r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( - panel, "outcome", "dose", "period", "unit" - ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) d_reg = d - d_lower_val bc = bias_corrected_local_linear(d=d_reg, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.estimate_bias_corrected) / d_bar + den = float((d - d_lower_val).mean()) + dy_mean = float(dy.mean()) + expected = (dy_mean - float(bc.estimate_bias_corrected)) / den assert abs(r.att - expected) < 1e-14 + def test_att_recovers_true_beta_design_1_prime(self): + """Sanity: on a known DGP with beta=0.3, att should be close to 0.3.""" + rng = np.random.default_rng(0) + G = 2000 + d = rng.uniform(0, 1, G) + d[0] = 0.0 + dy = 0.3 * d + 0.05 * rng.standard_normal(G) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + # Asymptotic: expect att close to 0.3 at G=2000, n=4000 observations. 
+ assert abs(r.att - 0.3) < 0.1 + + def test_att_recovers_true_beta_continuous_near_d_lower(self): + """Sanity: Design 1 DGP with beta_d_lower=0.3 recovers beta at scale.""" + rng = np.random.default_rng(0) + G = 2000 + u = rng.beta(2, 2, G) + d = 0.1 + 0.9 * u # d_lower ~ 0.1 + # True WAS_{d_lower} = 0.3 since dy = 0.3 * (d - d_lower) + noise + dy = 0.3 * (d - 0.1) + 0.05 * rng.standard_normal(G) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert abs(r.att - 0.3) < 0.1 + def test_dose_mean_stored_on_result(self): d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) @@ -1040,68 +1089,162 @@ def test_force_mass_point_on_continuous_data_at_support_infimum(self): # ============================================================================= -# Mass-point d_lower contract enforcement +# Design 1 d_lower contract enforcement (mass-point + continuous_near_d_lower) # ============================================================================= -class TestMassPointDLowerContract: - """Paper Section 3.2.4: mass-point d_lower must equal float(d.min()).""" +class TestDesign1DLowerContract: + """Paper Sections 3.2.2-3.2.4: Design 1 estimators identify at the support + infimum. Both mass_point and continuous_near_d_lower require + ``d_lower == float(d.min())`` within float tolerance; mismatched + overrides raise. 
+ """ - def test_d_lower_above_min_raises(self): + def test_mass_point_d_lower_above_min_raises(self): d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.6) with pytest.raises(ValueError, match="support infimum"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_d_lower_below_min_raises(self): - d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + def test_mass_point_d_lower_below_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.3) with pytest.raises(ValueError, match="support infimum"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_d_lower_matches_support_infimum_succeeds(self): - d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + def test_mass_point_d_lower_matches_succeeds(self): + d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5) r = est.fit(panel, "outcome", "dose", "period", "unit") assert r.d_lower == 0.5 assert np.isfinite(r.att) - def test_d_lower_none_auto_resolves_to_min(self): - """With d_lower=None, the contract is satisfied automatically.""" + def test_mass_point_d_lower_none_auto_resolves_to_min(self): d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) r = est.fit(panel, "outcome", "dose", "period", "unit") assert abs(r.d_lower - float(d.min())) < 1e-14 - def test_d_lower_within_tolerance_succeeds(self): - """Float-tolerance overrides of d.min() are accepted.""" + def test_mass_point_d_lower_within_tolerance_succeeds(self): d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) - # Pass d_lower that differs from d.min() by float-rounding noise. 
d_lower_user = float(d.min()) + 1e-15 est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_user) r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) - def test_mass_point_equality_guard_does_not_fire_on_continuous(self): - """The mass-point-specific d_lower equality guard is not enforced on - continuous_near_d_lower. (The continuous path has its own upstream - Phase 1c guards against off-support d_lower via the negative-dose - check after the regressor shift, but the mass-point-specific - ValueError does not fire here.) - """ + def test_continuous_near_d_lower_above_min_raises(self): + """Review P1: continuous_near_d_lower must also enforce support infimum.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) - # d_lower == d.min() is always a valid continuous configuration. - est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) - r = est.fit(panel, "outcome", "dose", "period", "unit") + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=0.3) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_d_lower_below_min_raises(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + # d.min() for this Beta DGP is > 0.1 but setting d_lower=0.05 is below min. 
+ est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=0.05) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_d_lower_matches_succeeds(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) + r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) +# ============================================================================= +# Post-period dose non-negative contract (review P1) +# ============================================================================= + + +class TestPostPeriodDoseContract: + """Paper Section 2 dose definition: D_{g,2} >= 0. _validate_had_panel + rejects negative post-period dose front-door on the ORIGINAL scale + (before the regressor shift) so the error references the user's + dose column, not the Phase 1c shifted values. + """ + + def test_negative_post_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Inject a negative post-period dose on one unit. + post_mask = panel["period"] == 2 + idx = panel[post_mask].index[0] + panel.loc[idx, "dose"] = -0.1 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"D_\{g,2\}|negative post"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_zero_post_dose_accepted(self): + """D_{g,2} == 0 is the Design 1' no-treated-group case, always allowed.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + # Ensure d[0] == 0 exactly (no-treated unit) is accepted. 
+ assert d[0] == 0.0 + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + +# ============================================================================= +# Design 1 Assumption 5/6 identification warning (review P1) +# ============================================================================= + + +class TestAssumptionFiveSixWarning: + """Paper Sections 3.2.2-3.2.4: Design 1 fits require Assumption 5 (sign + identification) or Assumption 6 (point identification of WAS_{d_lower}) + beyond parallel trends. These extras are not pre-trend testable. A + UserWarning surfaces the identification burden on Design 1 fits. + """ + + def test_continuous_near_d_lower_emits_assumption_warning(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption" in str(warn.message)] + assert len(assumption_warnings) >= 1 + + def test_mass_point_emits_assumption_warning(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption" in str(warn.message)] + assert len(assumption_warnings) >= 1 + + def test_continuous_at_zero_does_not_emit_assumption_warning(self): + """Design 1' (d_lower=0) is identified under Assumption 3 only; no warning.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + 
HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption 6" in str(warn.message)] + assert len(assumption_warnings) == 0 + + # ============================================================================= # Cluster handling (unit-level aggregation) # ============================================================================= From c320ddeb6bb6a8da191667a034a108734e66d8d5 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 18:58:08 -0400 Subject: [PATCH 05/17] Address PR #346 CI review round 2: 2 P1s + 1 P2 **P1 #1 (Methodology): continuous_near_d_lower on mass-point samples** When a user explicitly forced design="continuous_near_d_lower" on a sample that actually satisfies the >2% modal-fraction mass-point criterion, the downstream regressor shift (D - d_lower) would move the support minimum to zero on the shifted scale. Phase 1c's mass-point rejection guard only fires when d.min() > 0 (_validate_had_inputs), so the silent coercion ran the nonparametric local-linear estimator on a sample the paper (Section 3.2.4) requires to use the 2SLS branch, producing the wrong estimand. Fix: `HeterogeneousAdoptionDiD.fit()` now runs the modal-fraction check on the ORIGINAL (unshifted) d_arr when the user explicitly selects design="continuous_near_d_lower". If the fraction at d.min() exceeds 2%, the fit raises ValueError pointing to design="mass_point" or design="auto". design="auto" is unaffected (_detect_design already correctly resolves such samples to mass_point). **P1 #2 (Code Quality): first_treat_col validator not dtype-agnostic** The previous validator called `.astype(np.float64)` and `int(v)` on grouped first_treat values, which crashed on otherwise-supported string-labelled two-period panels (period in {"A","B"}, first_treat in {0, "B"}). 
Rewrote using `pd.isna()` for missingness and raw-value set-membership against `{0, t_post}` with no numeric coercion. **P2 (Maintainability): cluster-applied mass-point stored wrong vcov_type** When cluster was supplied, `_fit_mass_point_2sls` unconditionally switches to the CR1 cluster-robust sandwich, but the result object stored the REQUESTED family ("hc1" or "classical") as `vcov_type`. `summary()` rendered correctly via the cluster_name branch, but `to_dict()` and downstream programmatic consumers saw the stale requested label. Fixed: when cluster is supplied, `vcov_type` is stored as `"cr1"` regardless of the requested family. Renamed the local variable from `vcov_effective` to `vcov_requested` to separate the input from the effective family. Updated the `HeterogeneousAdoptionDiDResults.summary()` branch so the cluster rendering still works with the new stored value. **Tests added (+8 regression):** - TestValidateHadPanel.test_first_treat_col_with_string_periods - TestValidateHadPanel.test_first_treat_col_dtype_agnostic_rejects_invalid_string - TestContinuousPathRejectsMassPoint (2 tests) - TestMassPointClusterLabel (4 tests: cr1 stored when clustered, base family when unclustered, classical+cluster collapses to cr1, to_dict shows effective family) Targeted regression: 126 HAD tests + 505 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 74 +++++++++++++++++++++++------ tests/test_had.py | 115 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 174 insertions(+), 15 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 01c98400..e931c9d2 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -189,10 +189,15 @@ class HeterogeneousAdoptionDiDResults: ``"analytical_nonparametric"`` (continuous designs) or ``"analytical_2sls"`` (mass-point). vcov_type : str or None - Variance-covariance family used. 
``None`` on continuous paths - (they use the CCT-2014 robust SE from Phase 1c, not the library's - ``vcov_type`` enum). Mass-point: one of ``"classical"`` or - ``"hc1"`` (CR1 when ``cluster`` is supplied). + Effective variance-covariance family used. ``None`` on continuous + paths (they use the CCT-2014 robust SE from Phase 1c, not the + library's ``vcov_type`` enum). Mass-point: ``"classical"`` or + ``"hc1"`` when ``cluster`` is not supplied, and ``"cr1"`` + whenever ``cluster`` is supplied (cluster-robust CR1 is computed + regardless of the requested ``vcov_type`` because + classical/hc1 + cluster collapses to the same CR1 sandwich). + Downstream consumers reading ``result.to_dict()`` can inspect + this field directly to determine the effective SE family. cluster_name : str or None Column name of the cluster variable on the mass-point path when cluster-robust SE is requested. ``None`` otherwise. @@ -269,9 +274,12 @@ def summary(self) -> str: lines.append(f"{'Strictly above d_lower:':<30} {self.n_above_d_lower:>20}") lines.append(f"{'Inference method:':<30} {self.inference_method:>20}") if self.vcov_type is not None: - label = self.vcov_type if self.cluster_name is not None: + # Cluster-robust (CR1): the stored vcov_type is already "cr1", + # but render with the cluster column for clarity. label = f"CR1 at {self.cluster_name}" + else: + label = self.vcov_type lines.append(f"{'Variance:':<30} {label:>20}") if self.bandwidth_diagnostics is not None: bw = self.bandwidth_diagnostics @@ -476,20 +484,24 @@ def _validate_had_panel( # domain check that catches typos and staggered-timing mix-ups; it # does NOT cross-validate first_treat against post-period dose # (D_{g, t_post} remains the primary signal). Extended cross-checks - # are queued for a follow-up PR. + # are queued for a follow-up PR. 
The check is DTYPE-AGNOSTIC: it uses + # pd.isna() for missingness and raw-value membership against + # {0, t_post} so that string-labelled periods (e.g., ("A", "B")) with + # first_treat in {0, "B"} are supported. if first_treat_col is not None: - ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() - if np.any(np.isnan(ft.astype(np.float64, copy=False))): + ft_series = data.groupby(unit_col)[first_treat_col].first() + if bool(ft_series.isna().any()): raise ValueError( f"first_treat_col={first_treat_col!r} contains NaN. " f"Use 0 for never-treated units and t_post for treated." ) valid_values = {0, t_post} - bad = [v for v in np.unique(ft) if int(v) not in valid_values] + observed = set(ft_series.unique().tolist()) + bad = sorted(observed - valid_values, key=lambda x: str(x)) if bad: raise ValueError( f"first_treat_col={first_treat_col!r} contains value(s) " - f"{bad} outside the allowed set {{0, {t_post}}} for a " + f"{bad} outside the allowed set {{0, {t_post!r}}} for a " f"two-period HAD panel. Staggered timing with multiple " f"cohorts is Phase 2b." ) @@ -1042,6 +1054,34 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Original-scale mass-point check before the regressor shift ---- + # When a user explicitly forces design="continuous_near_d_lower" + # on a sample that is actually a mass-point sample (modal fraction + # at d.min() > 2% per paper Section 3.2.4), the downstream code + # would shift D - d_lower, making the support minimum zero on the + # shifted scale and suppressing the Phase 1c mass-point rejection + # guard (_validate_had_inputs only checks modal fraction when + # d.min() > 0). That silent coercion produces wrong CIs for a + # mass-point sample by running the nonparametric estimator where + # the paper's 2SLS branch is required. Front-door reject. 
+ if resolved_design == "continuous_near_d_lower": + d_min_orig = float(d_arr.min()) + if d_min_orig > 0: + eps_mp = 1e-12 * max(1.0, abs(d_min_orig)) + at_d_min_mask_orig = np.abs(d_arr - d_min_orig) <= eps_mp + modal_fraction_orig = float(np.mean(at_d_min_mask_orig)) + if modal_fraction_orig > _MASS_POINT_THRESHOLD: + raise ValueError( + f"design='continuous_near_d_lower' cannot be used on a " + f"mass-point sample (modal fraction {modal_fraction_orig:.4f} " + f"at d.min()={d_min_orig!r} exceeds the " + f"{_MASS_POINT_THRESHOLD:.2f} threshold from paper Section " + f"3.2.4). Use design='mass_point' (Wald-IV / 2SLS) or " + f"design='auto' which will auto-detect. Forcing the " + f"continuous path on a mass-point sample would produce " + f"the wrong estimand." + ) + # ---- d_lower contract for Design 1 paths ---- # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at # d_lower = support infimum of D_{g,2}. For the mass-point path, @@ -1148,21 +1188,25 @@ def fit( elif resolved_design == "mass_point": if vcov_type_arg is None: # Backward-compat: robust=True -> hc1, robust=False -> classical. - vcov_effective = "hc1" if robust_arg else "classical" + vcov_requested = "hc1" if robust_arg else "classical" else: - vcov_effective = vcov_type_arg.lower() - # Explicit cluster + hc1 becomes CR1 inside _fit_mass_point_2sls. + vcov_requested = vcov_type_arg.lower() att, se = _fit_mass_point_2sls( d_arr, dy_arr, d_lower_val, cluster_arr, - vcov_effective, + vcov_requested, ) bc_fit = None bw_diag = None inference_method = "analytical_2sls" - vcov_label = vcov_effective + # Store the EFFECTIVE variance family so downstream consumers + # (to_dict, to_dataframe, summary) see the actual SE type that + # was computed. When cluster is supplied, _fit_mass_point_2sls + # unconditionally computes CR1 regardless of vcov_requested + # (e.g. classical+cluster -> CR1), so we surface that here. 
+ vcov_label = "cr1" if cluster_arg is not None else vcov_requested cluster_label = cluster_arg if cluster_arg is not None else None else: raise ValueError(f"Internal error: unhandled design={resolved_design!r}.") diff --git a/tests/test_had.py b/tests/test_had.py index b8e0152f..19abe7b7 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1404,3 +1404,118 @@ def test_rejects_string_periods_gracefully(self): t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) assert t_pre == "A" assert t_post == "B" + + def test_first_treat_col_with_string_periods(self): + """Review P1: first_treat_col validator must be dtype-agnostic. + + With string periods ("A", "B") and first_treat_col values in + {0, "B"}, the validator must not attempt numeric coercion. + """ + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # 50 units never-treated (first_treat=0), 50 treated (first_treat="B") + ft_unit = np.array([0 if i % 2 == 0 else "B" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + assert t_pre == "A" + assert t_post == "B" + + def test_first_treat_col_dtype_agnostic_rejects_invalid_string(self): + """Mix string periods + invalid first_treat_col string -> ValueError.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # Invalid: "Z" is neither 0 nor "B" + ft_unit = np.array([0 if i % 2 == 0 else "Z" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + with pytest.raises(ValueError, match="first_treat_col"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + + +# ============================================================================= +# Review P1: continuous_near_d_lower on a true mass-point sample rejects +# ============================================================================= + 
+ +class TestContinuousPathRejectsMassPoint: + """Explicit override to continuous_near_d_lower on a mass-point sample + must raise before the regressor shift, otherwise the Phase 1c + mass-point guard (which fires only on d.min() > 0) is bypassed. + """ + + def test_continuous_near_on_mass_point_sample_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower") + with pytest.raises(ValueError, match=r"mass-point sample|mass_point"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_on_continuous_sample_runs(self): + """Sanity: the pre-shift check does NOT reject valid continuous samples.""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + +# ============================================================================= +# Review P2: cluster-applied mass-point stores vcov_type="cr1" +# ============================================================================= + + +class TestMassPointClusterLabel: + def test_cluster_stores_cr1(self): + """to_dict() / downstream consumers see 'cr1' not 'hc1' when clustered.""" + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.vcov_type == "cr1" + assert r.cluster_name == "state" + + def test_no_cluster_stores_base_family(self): + """Unclustered mass-point keeps 'hc1' or 'classical' label.""" + d, dy = _dgp_mass_point(200, seed=0) + panel = 
_make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_hc1 = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + r_cl = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_hc1.vcov_type == "hc1" + assert r_cl.vcov_type == "classical" + + def test_cluster_with_classical_collapses_to_cr1(self): + """classical + cluster is CR1 in practice; label reflects that.""" + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD( + design="mass_point", vcov_type="classical", cluster="state" + ).fit(panel, "outcome", "dose", "period", "unit") + assert r.vcov_type == "cr1" + + def test_to_dict_shows_effective_family(self): + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + result_dict = r.to_dict() + assert result_dict["vcov_type"] == "cr1" + assert result_dict["cluster_name"] == "state" From ddc09e4341f528862de4a62505032d314fa8deea Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:11:40 -0400 Subject: [PATCH 06/17] Address PR #346 CI review round 3: P1 period inference + P2 summary label **P1 (Methodology): _validate_had_panel inferred pre/post by lexicographic sort** Previously the validator sorted the two period labels alphabetically and assigned `t_pre=periods[0]`, `t_post=periods[1]`. 
On supported string-labelled panels like `("pre", "post")` the alphabetic order is ["post", "pre"], so the code flipped pre and post and then raised on the treated-period D>0 check for a valid design. Same bug for `("before", "after")` and any non-alphabetic-chronological label pair. Fix: identify `t_pre` as the unique period where dose == 0 for ALL units (HAD paper Section 2 no-unit-untreated convention); `t_post` is the other period. This is a DGP-consistent invariant, not a string ordering. If neither period has all-zero dose, raise with the contract message and per-period nonzero-count diagnostics. If both periods have all-zero dose, raise (no treatment variation to estimate). The existing pre-period D=0 check is now tautological and has been removed since the inference itself enforces the invariant. Behavior on valid numeric panels (e.g., 2020/2021) is unchanged. **P2 (Code Quality): summary() hardcoded 'WAS' row label** `HeterogeneousAdoptionDiDResults.summary()` printed "WAS" as the parameter label regardless of the resolved design. For Design 1 paths (continuous_near_d_lower, mass_point) the stored `target_parameter` is "WAS_d_lower" per paper Sections 3.2.2-3.2.4, so the user-facing output misrepresented the estimand. Fix: render `self.target_parameter` in the summary row. Now Design 1' prints "WAS", Design 1 prints "WAS_d_lower", matching the stored result metadata. **Tests (+7 regression):** - TestValidateHadPanel.test_semantic_pre_post_labels_not_lexicographic - TestValidateHadPanel.test_semantic_pre_post_with_first_treat_col - TestValidateHadPanel.test_semantic_pre_post_fit_end_to_end - TestValidateHadPanel.test_before_after_labels - TestValidateHadPanel.test_no_all_zero_period_raises - TestValidateHadPanel.test_both_all_zero_periods_raises - TestResultMethods.test_summary_uses_target_parameter_for_row_label Targeted regression: 133 HAD tests + 512 total across Phase 1 and adjacent surfaces, all green. 
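As a quick illustration of the dose-based rule above, here is a standalone sketch (`infer_pre_post` is a hypothetical helper name for this message only; the shipped logic lives inside `_validate_had_panel`):

```python
import numpy as np
import pandas as pd

def infer_pre_post(data: pd.DataFrame, time_col: str, dose_col: str):
    """Pick t_pre as the unique period whose dose is zero for ALL units.

    Sketch of the invariant-based rule: D_{g, t_pre} = 0 for every unit,
    so pre/post is recovered from the data, not from label sort order.
    """
    periods = list(data[time_col].unique())
    assert len(periods) == 2, "two-period panel expected"
    all_zero = [
        p for p in periods
        if not np.any(np.asarray(data.loc[data[time_col] == p, dose_col], dtype=float) != 0)
    ]
    if len(all_zero) == 0:
        raise ValueError("neither period has all-zero dose (D_{g,1}=0 violated)")
    if len(all_zero) == 2:
        raise ValueError("both periods have all-zero dose (no treatment variation)")
    t_pre = all_zero[0]
    t_post = next(p for p in periods if p != t_pre)
    return t_pre, t_post

# ("pre", "post") sorts alphabetically as ["post", "pre"], but the
# dose-based rule still recovers the right chronology:
df = pd.DataFrame(
    {"unit": [1, 1, 2, 2], "period": ["pre", "post"] * 2, "dose": [0.0, 0.4, 0.0, 0.7]}
)
print(infer_pre_post(df, "period", "dose"))  # -> ('pre', 'post')
```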
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 52 +++++++++++++++--------- tests/test_had.py | 100 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 18 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index e931c9d2..913ce2e4 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -289,6 +289,7 @@ def summary(self) -> str: bc = self.bias_corrected_fit lines.append(f"{'Bandwidth h used:':<30} {bc.h:>20.6g}") lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}") + param_label = self.target_parameter lines.extend( [ "", @@ -299,7 +300,7 @@ def summary(self) -> str: ), "-" * width, ( - f"{'WAS':<15} {self.att:>12.4f} {self.se:>12.4f} " + f"{param_label:<15} {self.att:>12.4f} {self.se:>12.4f} " f"{self.t_stat:>10.3f} {self.p_value:>10.4f}" ), "-" * width, @@ -395,30 +396,27 @@ def _validate_had_panel( if missing: raise ValueError(f"Missing column(s) in data: {missing}. Required: {required}.") - periods = np.sort(np.asarray(data[time_col].unique())) - if len(periods) < 2: + periods_list = list(data[time_col].unique()) + if len(periods_list) < 2: raise ValueError( - f"HAD requires a two-period panel; got {len(periods)} distinct " + f"HAD requires a two-period panel; got {len(periods_list)} distinct " f"period(s) in column {time_col!r}." ) - if len(periods) > 2: + if len(periods_list) > 2: if first_treat_col is None: raise ValueError( f"HAD Phase 2a requires exactly two time periods " - f"(got {len(periods)} in {time_col!r}) when " + f"(got {len(periods_list)} in {time_col!r}) when " f"first_treat_col=None. Multi-period / staggered adoption " f"support is queued for Phase 2b (Appendix B.2 event-study)." ) raise ValueError( f"HAD Phase 2a requires exactly two time periods " - f"(got {len(periods)} in {time_col!r}). Staggered adoption " + f"(got {len(periods_list)} in {time_col!r}). 
Staggered adoption " f"reduction (first_treat_col supplied with >2 periods) is " f"queued for Phase 2b (Appendix B.2 event-study)." ) - t_pre = int(periods[0]) if np.issubdtype(periods.dtype, np.integer) else periods[0] - t_post = int(periods[1]) if np.issubdtype(periods.dtype, np.integer) else periods[1] - # Balanced-panel check: every unit appears exactly once per period. counts = data.groupby([unit_col, time_col]).size() if (counts != 1).any(): @@ -446,17 +444,35 @@ def _validate_had_panel( f"calling fit()." ) - # Pre-period no-unit-untreated check. - pre_mask = data[time_col] == t_pre - pre_doses = np.asarray(data.loc[pre_mask, dose_col], dtype=np.float64) - nonzero_pre = pre_doses != 0 - if nonzero_pre.any(): - n_bad = int(nonzero_pre.sum()) + # Identify t_pre and t_post by the HAD invariant rather than by + # lexicographic sort on the time labels: D_{g, t_pre} = 0 for all + # units (paper Section 2 no-unit-untreated pre-period convention). + # Sorting labels alphabetically reverses valid chronologies like + # ("pre", "post") where ordering is semantic, not alphabetic. + per_period_nonzero: Dict[Any, int] = {} + for p in periods_list: + p_doses = np.asarray(data.loc[data[time_col] == p, dose_col], dtype=np.float64) + per_period_nonzero[p] = int((p_doses != 0).sum()) + all_zero_periods = [p for p, nz in per_period_nonzero.items() if nz == 0] + if len(all_zero_periods) == 0: + # Neither period has all-zero dose: HAD pre-period contract violated. + stats_str = ", ".join(f"{p!r}: {nz} nonzero" for p, nz in per_period_nonzero.items()) raise ValueError( f"HAD requires D_{{g,1}} = 0 for all units (pre-period " - f"untreated). {n_bad} unit(s) have nonzero dose at " - f"t_pre={t_pre}. Drop these units or verify the dose column." + f"untreated). Neither period in column {time_col!r} has " + f"all-zero dose ({stats_str}). 
Exactly one period must be " + f"the pre-treatment period with D_{{g,1}} = 0 for every unit; " + f"drop rows with nonzero pre-period dose or verify the dose " + f"column." + ) + if len(all_zero_periods) == 2: + raise ValueError( + f"HAD requires variation in D_{{g,2}} for estimation. Both " + f"periods in column {time_col!r} have all-zero dose, so " + f"there is no treatment assignment to estimate." ) + t_pre = all_zero_periods[0] + t_post = [p for p in periods_list if p != t_pre][0] # Post-period nonnegative-dose check on the ORIGINAL (unshifted) dose # scale. Front-door rejection per paper Assumption (dose definition diff --git a/tests/test_had.py b/tests/test_had.py index 19abe7b7..9b1e1668 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -899,6 +899,42 @@ def test_summary_returns_string(self): assert "WAS" in s assert "Confidence Interval" in s + def test_summary_uses_target_parameter_for_row_label(self): + """Review P2: the estimate row must render target_parameter (WAS or + WAS_d_lower), not hardcoded 'WAS'. 
+ """ + # Design 1' -> target_parameter = "WAS" + d, dy = _dgp_continuous_at_zero(400, seed=0) + panel = _make_panel(d, dy) + r_d1p = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + s_d1p = r_d1p.summary() + assert r_d1p.target_parameter == "WAS" + assert "WAS" in s_d1p + + # Design 1 continuous-near-d_lower -> target_parameter = "WAS_d_lower" + d, dy = _dgp_continuous_near_d_lower(400, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_d1 = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_d1.target_parameter == "WAS_d_lower" + assert "WAS_d_lower" in r_d1.summary() + + # Design 1 mass-point -> target_parameter = "WAS_d_lower" + d, dy = _dgp_mass_point(400, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_mp = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_mp.target_parameter == "WAS_d_lower" + assert "WAS_d_lower" in r_mp.summary() + def test_print_summary_executes(self, capsys): r = self._result() r.print_summary() @@ -1430,6 +1466,70 @@ def test_first_treat_col_dtype_agnostic_rejects_invalid_string(self): with pytest.raises(ValueError, match="first_treat_col"): _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + def test_semantic_pre_post_labels_not_lexicographic(self): + """Review P1 round 3: pre/post inference must be dose-based. + + ("pre", "post") sorts alphabetically to ["post", "pre"], which + previously flipped the pre/post labels and raised on a valid + panel. The validator now infers pre from the all-zero-dose + period. 
+ """ + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "pre" + assert t_post == "post" + + def test_semantic_pre_post_with_first_treat_col(self): + """Combined: string periods + first_treat_col in {0, 'post'}.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + ft_unit = np.array([0 if i % 2 == 0 else "post" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + assert t_pre == "pre" + assert t_post == "post" + + def test_semantic_pre_post_fit_end_to_end(self): + """End-to-end: fit() runs on ("pre","post")-labelled panel.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_before_after_labels(self): + """("before","after") is also reversed alphabetically; must not fail.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("before", "after")) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "before" + assert t_post == "after" + + def test_no_all_zero_period_raises(self): + """If neither period has all-zero dose, HAD's D_{g,1}=0 contract fails.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy) + # Inject nonzero dose into the pre period so neither period is all-zero. 
+ panel.loc[panel["period"] == 1, "dose"] = 0.5 + with pytest.raises(ValueError, match=r"D_\{g,1\}|pre-treatment"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + + def test_both_all_zero_periods_raises(self): + """If both periods have all-zero dose, no treatment to estimate.""" + G = 100 + panel = pd.DataFrame( + { + "unit": np.repeat(np.arange(G), 2), + "period": np.tile([1, 2], G), + "dose": np.zeros(2 * G), + "outcome": np.random.default_rng(0).standard_normal(2 * G), + } + ) + with pytest.raises(ValueError, match="variation"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + # ============================================================================= # Review P1: continuous_near_d_lower on a true mass-point sample rejects From e622d60482fa06544f2ca0a51c3e06584c4e0b83 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:23:38 -0400 Subject: [PATCH 07/17] Address PR #346 CI review round 4: P1 d_lower > 0 on Design 1 paths **P1 (Methodology): Design 1 paths must reject d_lower = 0** Paper Section 3.2 partitions HAD by regime: `d_lower = 0` is Design 1' (`continuous_at_zero`); `d_lower > 0` is Design 1 (`continuous_near_d_lower` or `mass_point`). The auto-detect rule already respects this partition, but explicit overrides previously allowed `design="mass_point", d_lower=0` and `design="continuous_near_d_lower"` on a `d.min()==0` sample to run silently, returning a paper-incompatible estimand (2SLS with degenerate single-unit mass for mass_point; Design 1' algebra relabeled as `WAS_d_lower` with a spurious Assumption 5/6 warning for continuous_near_d_lower). Fix: add a fit-time guard that raises `ValueError` when `resolved_design in ("mass_point", "continuous_near_d_lower")` and the resolved `d_lower_val` is within float tolerance of zero (same tolerance family as `_detect_design`'s d.min()==0 tie-break).
The error message points users to `continuous_at_zero` or `auto` for samples with support infimum at zero. **Docstring + test updates:** - Rewrote the `design` parameter docstring to document the regime-partition contract precisely: each explicit override is now described with its d_lower precondition and mass-point compatibility. - Rewrote the `d_lower` parameter docstring to note the Design-1-requires-positive contract. - Inverted the prior `test_force_mass_point_on_continuous_data_at_support_infimum` test (which incorrectly codified the unsupported behavior) into three rejection regressions: `test_force_mass_point_on_d_lower_zero_sample_raises`, `test_force_continuous_near_d_lower_on_d_lower_zero_sample_raises`, `test_force_mass_point_d_lower_none_on_zero_sample_raises`. Targeted regression: 135 HAD tests + 514 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 54 +++++++++++++++++++++++++++++++++++++++++++---- tests/test_had.py | 34 +++++++++++++++++++++-------- 2 files changed, 75 insertions(+), 13 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 913ce2e4..383792e6 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -828,13 +828,36 @@ class HeterogeneousAdoptionDiD: design : {"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"} Design-dispatch strategy. Defaults to ``"auto"`` which resolves via the REGISTRY auto-detect rule on the fitted dose data - (see :func:`_detect_design`). Explicit overrides run the chosen - path without auto-reject (e.g., forcing Design 1' on mass-point - data runs the nonparametric fit even though the paper would - counsel the 2SLS path). + (see :func:`_detect_design`). + + Explicit overrides are checked against the paper's + regime-partition contract (Section 3.2) at fit time: + + - ``"continuous_at_zero"`` (Design 1'): paper requires the + support infimum ``d_lower = 0``.
Phase 1c's + ``_validate_had_inputs`` rejects mass-point samples passed + to this path. + - ``"continuous_near_d_lower"`` (Design 1, continuous density + near ``d_lower``): requires ``d_lower > 0`` and a + non-mass-point sample (modal fraction at ``d.min()`` must be + <= 2%). ``d_lower`` must equal ``float(d.min())`` within + float tolerance; non-support-infimum thresholds are off- + support and raise. + - ``"mass_point"`` (Design 1 mass-point): requires + ``d_lower > 0`` and ``d_lower == float(d.min())`` within + float tolerance. Forcing this design on a ``d_lower = 0`` + sample raises (use ``"continuous_at_zero"`` or ``"auto"``). + + Mismatched overrides raise ``ValueError`` pointing at the + correct design rather than silently identifying a different + estimand. d_lower : float or None Support infimum ``d_lower``. ``None`` means use ``0.0`` on the Design 1' path and ``float(d.min())`` on the other two paths. + On Design 1 paths (``continuous_near_d_lower`` and + ``mass_point``), an explicit ``d_lower`` must equal + ``float(d.min())`` within float tolerance AND must be strictly + positive; zero-valued or mismatched thresholds raise. kernel : {"epanechnikov", "triangular", "uniform"} Forwarded to :func:`bias_corrected_local_linear` on the continuous paths. Ignored on the mass-point path. @@ -1070,6 +1093,29 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Regime partition: d_lower > 0 for Design 1 paths ---- + # Paper Section 3.2 partitions HAD into d_lower = 0 (Design 1', + # continuous_at_zero) and d_lower > 0 (Design 1, continuous_near + # _d_lower or mass_point). The auto-detect rule already enforces + # this partition; explicit overrides must respect it too, otherwise + # `design="mass_point", d_lower=0` returns a finite but + # paper-incompatible 2SLS result and `design="continuous_near_d_lower"` + # with d_lower=0 reduces to Design 1' algebra while mislabeling the + # estimand as `WAS_d_lower` and emitting the wrong Assumption 5/6 + # warning. 
Use the same float-tolerance family as _detect_design's + # d.min()==0 tie-break. + if resolved_design in ("mass_point", "continuous_near_d_lower"): + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(d_lower_val) <= 1e-12 * scale: + raise ValueError( + f"design={resolved_design!r} requires d_lower > 0 (paper " + f"Section 3.2 reserves the d_lower=0 regime for Design 1' " + f"/ `continuous_at_zero`). Got d_lower={d_lower_val!r}. " + f"Use design='continuous_at_zero' (explicit) or " + f"design='auto' (auto-detect) for samples with support " + f"infimum at zero." + ) + # ---- Original-scale mass-point check before the regressor shift ---- # When a user explicitly forces design="continuous_near_d_lower" # on a sample that is actually a mass-point sample (modal fraction diff --git a/tests/test_had.py b/tests/test_had.py index 9b1e1668..424364b5 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1108,20 +1108,36 @@ def test_force_continuous_at_zero_on_mass_point_data(self): with pytest.raises(NotImplementedError, match="mass-point"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_force_mass_point_on_continuous_data_at_support_infimum(self): - """Forcing mass-point on continuous data with d_lower=d.min() runs. + def test_force_mass_point_on_d_lower_zero_sample_raises(self): + """Review P1 round 4: Design 1 paths require d_lower > 0. - d.min()==0 exactly in this DGP, so d_lower=0.0 is the paper-consistent - support-infimum threshold. The resulting mass=1 case exercises the - degenerate-mass boundary (only unit 0 is "at d_lower"; rest are above). + Paper Section 3.2 reserves the d_lower=0 regime for Design 1' + (continuous_at_zero). Forcing `mass_point` on a sample with + d.min()==0 must raise, pointing the user to continuous_at_zero + or auto. """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) - # d.min() == 0 exactly (d[0]=0 by construction); d_lower must match. 
est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) - r = est.fit(panel, "outcome", "dose", "period", "unit") - assert r.design == "mass_point" - assert r.d_lower == 0.0 + with pytest.raises(ValueError, match=r"d_lower > 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_continuous_near_d_lower_on_d_lower_zero_sample_raises(self): + """Parallel: continuous_near_d_lower must also reject d_lower=0.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower") + # d_lower auto-resolves to float(d.min()) == 0.0 on this DGP. + with pytest.raises(ValueError, match=r"d_lower > 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_mass_point_d_lower_none_on_zero_sample_raises(self): + """d_lower=None on a d.min()==0 sample resolves to 0; must still raise.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) + with pytest.raises(ValueError, match=r"d_lower > 0"): + est.fit(panel, "outcome", "dose", "period", "unit") # ============================================================================= From 2884c1d81adf576edc6f14397b8cb97880a28cbe Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:41:32 -0400 Subject: [PATCH 08/17] Address PR #346 CI review round 5: P1 reciprocal mass_point regime guard **P1 (Methodology): reciprocal regime check for design="mass_point"** Round 2 added a guard that rejects `design="continuous_near_d_lower"` on samples with modal-min fraction > 2% (mass-point samples). The reciprocal direction was left unguarded: `design="mass_point"` on a continuous-near-d_lower sample (modal fraction <= 2%) would silently run 2SLS, but the instrument Z = 1{D > d.min()} degenerates because the "mass" side has almost no observations. 
The resulting point estimate identifies the exact-d.min() cell rather than the paper's boundary-limit estimand (the Section 3.2.4 2SLS is the documented Design 1 estimator only when the modal mass exists). Fix: extend the existing pre-shift regime check so it fires symmetrically on both explicit overrides. When `resolved_design == "mass_point"` and the modal fraction at d.min() does NOT exceed 2%, raise ValueError pointing users to `continuous_near_d_lower` or `auto`. Same 2% threshold used by `_detect_design()` and Phase 1c's `_validate_had_inputs()` so the three dispatch paths share one regime rule. **Docs:** Updated the `design` parameter docstring to document the reciprocal mass-point contract. **Tests (+4 regression):** - TestMassPointPathRejectsContinuousSample.test_mass_point_on_continuous_near_sample_raises - TestMassPointPathRejectsContinuousSample.test_mass_point_on_true_mass_point_sample_runs - TestMassPointPathRejectsContinuousSample.test_mass_point_modal_at_threshold_runs (2.5%) - TestMassPointPathRejectsContinuousSample.test_mass_point_modal_exactly_two_percent_raises Targeted regression: 139 HAD tests + 518 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 51 +++++++++++++++++++++++++++++++-------------- tests/test_had.py | 53 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+), 15 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 383792e6..6798b064 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -844,9 +844,13 @@ class HeterogeneousAdoptionDiD: float tolerance; non-support-infimum thresholds are off- support and raise. - ``"mass_point"`` (Design 1 mass-point): requires - ``d_lower > 0`` and ``d_lower == float(d.min())`` within - float tolerance. Forcing this design on a ``d_lower = 0`` - sample raises (use ``"continuous_at_zero"`` or ``"auto"``).
+ ``d_lower > 0`` AND a mass-point sample (modal fraction at + ``d.min()`` must be > 2%). ``d_lower`` must equal + ``float(d.min())`` within float tolerance. Forcing this + design on a ``d_lower = 0`` sample or on a continuous + (non-mass-point) sample raises; in either case 2SLS + identifies a different estimand than the paper's Design 1 + mass-point WAS. Mismatched overrides raise ``ValueError`` pointing at the correct design rather than silently identifying a different @@ -1116,23 +1120,27 @@ def fit( f"infimum at zero." ) - # ---- Original-scale mass-point check before the regressor shift ---- - # When a user explicitly forces design="continuous_near_d_lower" - # on a sample that is actually a mass-point sample (modal fraction - # at d.min() > 2% per paper Section 3.2.4), the downstream code - # would shift D - d_lower, making the support minimum zero on the - # shifted scale and suppressing the Phase 1c mass-point rejection - # guard (_validate_had_inputs only checks modal fraction when - # d.min() > 0). That silent coercion produces wrong CIs for a - # mass-point sample by running the nonparametric estimator where - # the paper's 2SLS branch is required. Front-door reject. - if resolved_design == "continuous_near_d_lower": + # ---- Original-scale modal-fraction / regime check ---- + # Paper Section 3.2 splits Design 1 into two REGIME-SPECIFIC + # estimators by modal-min fraction: + # - continuous_near_d_lower: modal fraction at d.min() <= 2% + # (local-linear boundary-limit estimator). + # - mass_point: modal fraction at d.min() > 2% + # (Wald-IV / 2SLS identified by the instrument 1{D_2 > d_lower}). + # The auto-detect rule already enforces this; explicit overrides + # must too, otherwise the wrong estimand is returned silently. + # Both guards are symmetric around the 2% threshold used in + # _detect_design() and Phase 1c's _validate_had_inputs(). 
+ if resolved_design in ("continuous_near_d_lower", "mass_point"): d_min_orig = float(d_arr.min()) if d_min_orig > 0: eps_mp = 1e-12 * max(1.0, abs(d_min_orig)) at_d_min_mask_orig = np.abs(d_arr - d_min_orig) <= eps_mp modal_fraction_orig = float(np.mean(at_d_min_mask_orig)) - if modal_fraction_orig > _MASS_POINT_THRESHOLD: + if ( + resolved_design == "continuous_near_d_lower" + and modal_fraction_orig > _MASS_POINT_THRESHOLD + ): raise ValueError( f"design='continuous_near_d_lower' cannot be used on a " f"mass-point sample (modal fraction {modal_fraction_orig:.4f} " @@ -1143,6 +1151,19 @@ def fit( f"continuous path on a mass-point sample would produce " f"the wrong estimand." ) + if resolved_design == "mass_point" and modal_fraction_orig <= _MASS_POINT_THRESHOLD: + raise ValueError( + f"design='mass_point' requires a modal mass at d.min() " + f"exceeding the {_MASS_POINT_THRESHOLD:.2f} threshold " + f"(paper Section 3.2.4). Got modal fraction " + f"{modal_fraction_orig:.4f} at d.min()={d_min_orig!r}. " + f"For continuous-near-d_lower samples use " + f"design='continuous_near_d_lower' (local-linear " + f"boundary-limit estimator) or design='auto' which " + f"will auto-detect. Forcing 2SLS on a continuous " + f"sample identifies the exact-d.min() cell rather " + f"than the paper's boundary-limit estimand." + ) # ---- d_lower contract for Design 1 paths ---- # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at diff --git a/tests/test_had.py b/tests/test_had.py index 424364b5..91b8d2e5 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1577,6 +1577,59 @@ def test_continuous_near_on_continuous_sample_runs(self): assert np.isfinite(r.att) +class TestMassPointPathRejectsContinuousSample: + """Review P1 round 5: reciprocal guard. Forcing design="mass_point" on a + continuous-near-d_lower sample (modal fraction at d.min() <= 2%) must + raise, otherwise 2SLS identifies the exact-d.min() cell rather than + the paper's boundary-limit estimand. 
+ """ + + def test_mass_point_on_continuous_near_sample_raises(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point") + with pytest.raises(ValueError, match=r"modal mass|2SLS.*continuous"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_mass_point_on_true_mass_point_sample_runs(self): + """Sanity: the reciprocal guard does NOT reject valid mass-point samples.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_mass_point_modal_at_threshold_runs(self): + """Just above the 2% threshold (2.5% modal mass), mass_point runs (strict > 0.02).""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 25 # 2.5% > threshold + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_mass_point_modal_exactly_two_percent_raises(self): + """At exactly 2% (not strictly greater), mass_point must raise.""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 20 # exactly 2% (not > 2%) + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point") + with pytest.raises(ValueError, match=r"modal mass"): + est.fit(panel, "outcome", "dose", "period", "unit") + + # ============================================================================= # Review P2: cluster-applied mass-point stores
vcov_type="cr1" # ============================================================================= From afa0f6c68586a1e15a09d1dfd19badc0402e52d5 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:54:29 -0400 Subject: [PATCH 09/17] Address PR #346 CI review round 6: P1 panel-only note + P3 theorem refs **P1 (Methodology): Phase 2a panel-only not documented as deviation** Paper Section 2 defines HAD on panel OR repeated cross-section data, but Phase 2a ships a panel-only implementation. The existing balanced-panel validator rejects RCS inputs (disjoint unit IDs between periods) with a generic "unit(s) do not appear in both periods" error, but the scope restriction was not documented in REGISTRY.md as a `**Note:**` deviation from the paper's documented surface. Fix: - REGISTRY.md: new `**Note (Phase 2a panel-only):**` block under the HAD Phase 2a entry, explaining the restriction and pointing at the follow-up RCS PR. - `docs/methodology/papers/dechaisemartin-2026-review.md`: mirrored implementation note on the panel-or-RCS scope line. - `HeterogeneousAdoptionDiD.fit()` docstring: new preamble paragraph stating the panel-only restriction and why (unit-level first differences required). - `TODO.md`: new row queuing RCS identification-path work. - `tests/test_had.py`: two regression tests (`test_repeated_cross_section_raises` and `test_repeated_cross_section_fit_raises`) that construct RCS-shaped input and assert the validator + fit() raise with the expected error. **P3 (Docs/Tests): theorem/equation references were off** Module docstring, result-class `att` docstring, and `_fit_continuous()` docstring cited the paper's theorems/equations inconsistently with the registry's source map: - Design 1' identification: said "Theorem 3" -> should be Theorem 1 with Equation 3; Equation 7 is the sample estimator. - Design 1 continuous-near-d_lower: said "Proposition 3 / Theorem 4" -> should be Theorem 3 / Equation 11 (Theorem 4 is the QUG null test, not this estimand). 
Fix: updated all three docstring sites to cite Theorem 1 / Equation 3 (Design 1' identification) + Equation 7 (sample estimator), and Theorem 3 / Equation 11 (Design 1 continuous-near-d_lower / WAS_d_lower under Assumption 6). Targeted regression: 141 HAD tests + 520 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- TODO.md | 1 + diff_diff/had.py | 33 ++++++++---- docs/methodology/REGISTRY.md | 1 + .../papers/dechaisemartin-2026-review.md | 2 +- tests/test_had.py | 52 +++++++++++++++++++ 5 files changed, 79 insertions(+), 10 deletions(-) diff --git a/TODO.md b/TODO.md index bd8be7a5..cd033be3 100644 --- a/TODO.md +++ b/TODO.md @@ -100,6 +100,7 @@ Deferred items from PR reviews that were not addressed before merge. | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. 
A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium | #### Performance diff --git a/diff_diff/had.py b/diff_diff/had.py index 6798b064..32eaf674 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -9,7 +9,8 @@ :func:`diff_diff.utils.safe_inference`. 1. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density - continuous at zero, Assumption 3. Equation 7 / Theorem 3: + continuous at zero, Assumption 3. Theorem 1 / Equation 3 + (identification); Equation 7 (sample estimator): beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] @@ -19,7 +20,8 @@ 2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): ``d_lower > 0``, continuous boundary density, Assumption 5 or 6. - Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + Theorem 3 / Equation 11 (``WAS_{d_lower}`` under Assumption 6; + Theorem 4 is the QUG null test, not this estimand): beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) / E[D_2 - d_lower] @@ -135,13 +137,14 @@ class HeterogeneousAdoptionDiDResults: att : float Point estimate of the WAS parameter on the beta-scale. - - Design 1' (paper Equation 7 / Theorem 3): + - Design 1' (paper Theorem 1 / Equation 3 identification; + Equation 7 sample estimator): ``att = (mean(ΔY) - tau_bc) / D_bar`` where ``tau_bc`` is the bias-corrected local-linear estimate of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and ``D_bar = (1/G) * sum(D_{g,2})``. 
- - Design 1 continuous-near-d_lower (paper Theorem 4, - ``WAS_{d_lower}`` under Assumption 6): + - Design 1 continuous-near-d_lower (paper Theorem 3 / + Equation 11, ``WAS_{d_lower}`` under Assumption 6): ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)`` where ``tau_bc`` is the bias-corrected local-linear estimate of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``. @@ -1006,6 +1009,17 @@ def fit( ) -> HeterogeneousAdoptionDiDResults: """Fit the HAD estimator on a two-period panel. + Phase 2a is **panel-only**: the paper (Section 2) defines HAD on + panel or repeated-cross-section data, but this implementation + requires a balanced two-period panel with a unit identifier so + that unit-level first differences ``ΔY_g = Y_{g,2} - Y_{g,1}`` + can be formed. Repeated-cross-section inputs (disjoint unit IDs + between periods) are rejected by the balanced-panel validator. + Repeated-cross-section support is queued for a follow-up PR + (tracked in ``TODO.md``); it requires a separate identification + path based on pre/post cell means rather than unit-level + differences. + Parameters ---------- data : pd.DataFrame @@ -1337,16 +1351,17 @@ def _fit_continuous( Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) continuous-design estimators: - - Design 1' (``continuous_at_zero``), paper Equation 7 / - Theorem 3: + - Design 1' (``continuous_at_zero``), paper Theorem 1 / + Equation 3 (identification); Equation 7 (sample estimator): beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] Regressor passed to the local-linear boundary fit is ``d_arr``; the boundary is ``0``. 
- - Design 1 (``continuous_near_d_lower``), paper Proposition 3 / - Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + - Design 1 (``continuous_near_d_lower``), paper Theorem 3 / + Equation 11 (``WAS_{d_lower}`` under Assumption 6; note + Theorem 4 is the QUG null test, not this estimand): beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) / E[D_2 - d_lower] diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 9944477c..e04bdb8c 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2324,6 +2324,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. 
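The endpoint reversal in the **Note (CI endpoints)** entry above is pure monotonicity: for `den > 0`, the map `x -> (mean(ΔY) - x) / den` is decreasing, so the τ̂-scale upper endpoint becomes the β̂-scale lower endpoint. A minimal sketch with synthetic numbers (illustration only, not library output):

```python
# Synthetic stand-ins (illustration, not library output): mean first
# difference, a tau-scale CI for the boundary limit, and the divisor.
mean_dy = 0.40
tau_ci_low, tau_ci_high = 0.05, 0.15  # CI for tau_bc
den = 0.50                            # e.g. mean(D_2) on Design 1'

# beta = (mean_dy - tau) / den is decreasing in tau for den > 0, so the
# tau-scale UPPER endpoint maps to the beta-scale LOWER endpoint:
beta_ci_low = (mean_dy - tau_ci_high) / den   # = 0.5
beta_ci_high = (mean_dy - tau_ci_low) / den   # = 0.7
assert beta_ci_low < beta_ci_high
```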
+ - **Note (Phase 2a panel-only):** The paper (Section 2) defines HAD on *panel or repeated cross-section* data, but Phase 2a ships a panel-only implementation: `HeterogeneousAdoptionDiD.fit()` requires a balanced two-period panel with a unit identifier so that unit-level first differences ΔY_g = Y_{g,2} - Y_{g,1} can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the existing balanced-panel validator with the "unit(s) do not appear in both periods" error. Repeated-cross-section support is queued for a follow-up PR (tracked in `TODO.md`); it will need a separate identification path based on pre/post cell means rather than unit-level differences. - [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. - [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). - [ ] Phase 3: `stute_test()` Cramér-von Mises with Mammen wild bootstrap. diff --git a/docs/methodology/papers/dechaisemartin-2026-review.md b/docs/methodology/papers/dechaisemartin-2026-review.md index 2e031091..8017b244 100644 --- a/docs/methodology/papers/dechaisemartin-2026-review.md +++ b/docs/methodology/papers/dechaisemartin-2026-review.md @@ -20,7 +20,7 @@ **Key implementation requirements:** *Assumption checks / warnings:* -- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one). +- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one). **Phase 2a implementation note:** `diff_diff.HeterogeneousAdoptionDiD` ships panel-only; repeated-cross-section inputs are rejected by the balanced-panel validator. RCS support is queued for a follow-up PR. - Treatment dose `D_{g,2} >= 0`. 
For Design 1' (the QUG case) the support infimum `d̲ := inf Supp(D_{g,2})` must equal 0; for Design 1 (no QUG) `d̲ > 0` and Assumption 5 or 6 must be invoked. - Assumption 1 (i.i.d. sample): `(Y_{g,1}, Y_{g,2}, D_{g,1}, D_{g,2})_{g=1,...,G}` i.i.d. - Assumption 2 (parallel trends for the least-treated): `lim_{d ↓ d̲} E[ΔY(0) | D_2 ≤ d] = E[ΔY(0)]`. Testable with pre-trends when a pre-treatment period `t=0` exists. Reduces to standard parallel trends when treatment is binary. diff --git a/tests/test_had.py b/tests/test_had.py index 91b8d2e5..54ab3fa3 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1546,6 +1546,58 @@ def test_both_all_zero_periods_raises(self): with pytest.raises(ValueError, match="variation"): _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + def test_repeated_cross_section_raises(self): + """Review P1 round 6: Phase 2a is panel-only. An RCS input (disjoint + unit IDs across periods) must be rejected by the balanced-panel + validator with the "unit(s) do not appear in both periods" error. 
+ """ + rng = np.random.default_rng(0) + G = 100 + pre = pd.DataFrame( + { + "unit": np.arange(G), + "period": 1, + "dose": np.zeros(G), + "outcome": rng.standard_normal(G), + } + ) + post = pd.DataFrame( + { + "unit": np.arange(G, 2 * G), + "period": 2, + "dose": rng.uniform(0, 1, G), + "outcome": rng.standard_normal(G), + } + ) + rcs = pd.concat([pre, post], ignore_index=True) + with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"): + _validate_had_panel(rcs, "outcome", "dose", "period", "unit", None) + + def test_repeated_cross_section_fit_raises(self): + """End-to-end: fit() on an RCS panel raises ValueError.""" + rng = np.random.default_rng(0) + G = 100 + pre = pd.DataFrame( + { + "unit": np.arange(G), + "period": 1, + "dose": np.zeros(G), + "outcome": rng.standard_normal(G), + } + ) + post = pd.DataFrame( + { + "unit": np.arange(G, 2 * G), + "period": 2, + "dose": rng.uniform(0, 1, G), + "outcome": rng.standard_normal(G), + } + ) + rcs = pd.concat([pre, post], ignore_index=True) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"): + est.fit(rcs, "outcome", "dose", "period", "unit") + # ============================================================================= # Review P1: continuous_near_d_lower on a true mass-point sample rejects From 792997dd2c40b5d844aae52b0420d9212333a73d Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 20:32:29 -0400 Subject: [PATCH 10/17] Address PR #346 CI review round 7: P1 defer cluster validation + P3 registry refs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit **P1 (Code Quality): cluster= must truly be ignored on continuous paths** `HeterogeneousAdoptionDiD.fit()` previously passed `self.cluster` into `_aggregate_first_difference()` before the design was resolved. 
The aggregator validates the cluster column eagerly (missing column, within-unit variance, NaN ID), so a valid continuous fit could abort just because a shared config supplied an irrelevant `cluster=`. This contradicted the documented "ignored with a warning on continuous paths" contract. Fix: defer cluster extraction until after design resolution. The first aggregation call now passes `cluster_col=None` unconditionally; a second aggregation pass with `cluster_col=cluster_arg` runs only when `resolved_design == "mass_point"`, which is the only path that consumes the extracted cluster array. Continuous paths emit the existing `UserWarning` and proceed to fit without touching the cluster column at all. **P3 (Methodology): registry checklist theorem references were stale** Round 6 fixed the theorem citations in `had.py` and the paper review doc but missed the Phase 2a checklist line in `REGISTRY.md`, which still said "Equation 7 / Theorem 3" for Design 1' identification and "Theorem 4, WAS_{d̲} under Assumption 6" for the continuous-near-d_lower path. Updated the checklist line to match: Theorem 1 / Equation 3 (identification) + Equation 7 (sample estimator) for Design 1'; Theorem 3 / Equation 11 for WAS_{d̲}. **Tests (+4 regression):** - test_missing_cluster_column_on_continuous_only_warns: continuous_at_zero + cluster='does_not_exist' -> warn + fit succeeds. - test_nan_cluster_on_continuous_only_warns: NaN cluster IDs on continuous path -> warn + fit succeeds. - test_within_unit_varying_cluster_on_continuous_only_warns: within-unit- varying cluster IDs on continuous -> warn + fit succeeds. - test_auto_design_ignores_irrelevant_cluster_on_continuous: design='auto' resolving to continuous_at_zero also ignores cluster gracefully. Targeted regression: 145 HAD tests + 524 total across Phase 1 and adjacent surfaces, all green. 
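The deferral pattern described above can be sketched as follows. `aggregate` and `resolve_design` are simplified hypothetical stand-ins for `_aggregate_first_difference` and `_detect_design` (not the library's real signatures); the point is the two-pass structure, with cluster extraction gated on the resolved design:

```python
import warnings

import numpy as np

# Hypothetical stand-in for _detect_design: d.min() == 0 resolves
# continuous_at_zero unconditionally; modal-min check runs otherwise.
def resolve_design(d):
    if d.min() == 0:
        return "continuous_at_zero"
    return "mass_point" if np.mean(d == d.min()) > 0.02 else "continuous_near_d_lower"

# Hypothetical stand-in for _aggregate_first_difference: validates the
# cluster column eagerly -- exactly the behavior being deferred.
def aggregate(data, cluster_col=None):
    if cluster_col is not None and cluster_col not in data:
        raise ValueError(f"cluster column {cluster_col!r} not found")
    cl = np.asarray(data[cluster_col]) if cluster_col is not None else None
    return np.asarray(data["d"]), np.asarray(data["dy"]), cl

def fit_sketch(data, design="auto", cluster=None):
    # First pass: cluster_col=None unconditionally, so a malformed or
    # irrelevant cluster= cannot abort a valid continuous fit.
    d, dy, _ = aggregate(data, cluster_col=None)
    resolved = resolve_design(d) if design == "auto" else design
    cluster_arr = None
    if resolved == "mass_point" and cluster is not None:
        # Second pass runs only on the sole path that consumes clusters.
        _, _, cluster_arr = aggregate(data, cluster_col=cluster)
    elif cluster is not None:
        warnings.warn("cluster= is ignored on continuous paths", UserWarning)
    return resolved, cluster_arr
```

On a continuous sample, `fit_sketch(data, cluster="does_not_exist")` warns and still resolves, mirroring `test_auto_design_ignores_irrelevant_cluster_on_continuous`; on a mass-point sample the cluster column is validated and extracted.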
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 30 ++++++++++++++++++--- docs/methodology/REGISTRY.md | 2 +- tests/test_had.py | 52 ++++++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 4 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 32eaf674..396781f6 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -1078,8 +1078,13 @@ def fit( data, outcome_col, dose_col, time_col, unit_col, first_treat_col ) - # ---- Aggregate to unit-level first differences ---- - d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference( + # ---- Aggregate to unit-level first differences (no cluster yet) ---- + # Defer cluster validation/extraction until after the design is + # resolved: the continuous paths ignore cluster= with a warning, + # so a malformed or irrelevant cluster column must not abort a + # valid continuous fit. Cluster extraction is re-run below only + # when resolved_design == "mass_point". + d_arr, dy_arr, _, _ = _aggregate_first_difference( data, outcome_col, dose_col, @@ -1087,7 +1092,7 @@ def fit( unit_col, t_pre, t_post, - cluster_arg, + None, ) n_obs = int(d_arr.shape[0]) @@ -1103,6 +1108,25 @@ def fit( else: resolved_design = design_arg + # ---- Extract cluster IDs (mass-point path only) ---- + # Continuous paths ignore cluster= with a warning emitted later in + # the dispatch block; the cluster column is not read for them. On + # the mass-point path we now re-run the aggregation with + # cluster_col so validation (missing column / NaN / within-unit + # variance) fires only when cluster is actually going to be used. 
+ cluster_arr: Optional[np.ndarray] = None + if resolved_design == "mass_point" and cluster_arg is not None: + _, _, cluster_arr, _ = _aggregate_first_difference( + data, + outcome_col, + dose_col, + time_col, + unit_col, + t_pre, + t_post, + cluster_arg, + ) + # ---- Resolve d_lower ---- if resolved_design == "continuous_at_zero": d_lower_val = 0.0 diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index e04bdb8c..0c425a6b 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,7 +2316,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). 
The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. 
Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. 
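The Phase 1c → Phase 2a composition described in the entries above (Equation 8 rescaling of the boundary limit to the β-scale) amounts to the following. A minimal sketch on synthetic data, assuming the boundary limit is known: `tau_bc` here is a hard-coded stand-in for `bias_corrected_local_linear`'s estimate, not library output.

```python
import numpy as np

# Synthetic two-period first differences (illustration only): doses with
# d_lower = 0 and a linear dose effect of 0.3 on Delta Y.
rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, 1000)                   # D_{g,2}
dy = 0.3 * d + 0.05 * rng.standard_normal(1000)   # Delta Y_g

# tau_bc stands in for the bias-corrected local-linear estimate of the
# boundary limit lim_{d -> 0} E[dY | D_2 <= d]; its true value here is 0.
tau_bc = 0.0

# Equation 8 rescaling to the beta scale: divide by D_bar = mean(D_{g,2}).
d_bar = d.mean()
beta_hat = (dy.mean() - tau_bc) / d_bar
```

With this DGP, `beta_hat` recovers the 0.3 dose slope up to sampling noise in the two means.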
-- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
+- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
 - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples.
 - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand.
 - [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs.
diff --git a/tests/test_had.py b/tests/test_had.py
index 54ab3fa3..aff5abda 100644
--- a/tests/test_had.py
+++ b/tests/test_had.py
@@ -1356,6 +1356,58 @@ def test_cluster_name_none_without_cluster(self):
         )
         assert r.cluster_name is None
+
+    def test_missing_cluster_column_on_continuous_only_warns(self):
+        """Review P1 round 7: irrelevant cluster on continuous path must not
+        abort the fit. The cluster column doesn't even need to exist.
+ """ + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="does_not_exist") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + assert r.cluster_name is None + + def test_nan_cluster_on_continuous_only_warns(self): + """NaN cluster IDs on continuous path must not abort the fit.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + cluster_unit = np.repeat(np.arange(100).astype(float), 2) + cluster_unit[0] = np.nan + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_within_unit_varying_cluster_on_continuous_only_warns(self): + """Within-unit-varying cluster IDs on continuous path must not abort.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Varies within unit (distinct value per row) + panel["state"] = np.arange(len(panel)) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_auto_design_ignores_irrelevant_cluster_on_continuous(self): + """design='auto' resolving to a continuous path must also ignore cluster.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="auto", 
cluster="does_not_exist") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert r.design == "continuous_at_zero" + assert np.isfinite(r.att) + # ============================================================================= # First-difference aggregation helper From 1eda6148f38b7391b4ca7c481e8829d4651e4a27 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 20:51:03 -0400 Subject: [PATCH 11/17] Address PR #346 CI review round 8: P1 snap d_lower + P2 row-level validation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit **P1 (Methodology): snap tolerance-accepted d_lower to float(d.min())** The `d_lower == float(d.min())` within-tolerance contract accepted user-supplied values that differed from the support infimum by float rounding noise (1e-15). Downstream algebra is not tolerance-aware: - `mass_point` + `d_lower = d.min() - ε`: `Z = d > d_lower` puts every mass-point unit (at exactly d.min()) into Z=1 because they are strictly > d_lower. The Z=0 control group empties and the Wald-IV ratio becomes undefined. - `continuous_near_d_lower` + `d_lower = d.min() + ε`: the regressor shift `d - d_lower` produces a minimum of `-ε` on the shifted scale. Phase 1c's `_validate_had_inputs` then raises on the negative-dose guard, aborting an otherwise acceptable fit. Fix: after the tolerance check passes on Design 1 paths, snap `d_lower_val = float(d_arr.min())` before the mass-point instrument construction, cohort counts, and the continuous-path regressor shift. This preserves the advertised tolerance contract while keeping downstream algebra exact. **P2 (Code Quality): row-level NaN/domain validation** The previous `first_treat_col` and `cluster=` validators collapsed each unit to a single value via `groupby().first()` before checking for NaN or out-of-domain values. 
pandas's `.first()` silently skips NaN, so a unit with rows `[valid, NaN]` would pass a unit-level check even though the raw NaN on the bad row should be rejected. Fix: for both `first_treat_col` (in `_validate_had_panel`) and `cluster=` (in `_aggregate_first_difference`), validate raw per-row values BEFORE the per-unit collapse: - Row-level NaN check on the raw column. - Row-level domain check on raw values (for first_treat_col). - Per-unit-constancy check using `nunique(dropna=False)` so a `[value, NaN]` within a unit registers as 2 distinct values. **Tests (+5 regression):** - test_mass_point_d_lower_below_min_within_tolerance_snaps: asserts d_lower = d.min() - 1e-15 collapses ULP-identically to the exact case across att, se, n_mass_point, n_above_d_lower. - test_continuous_near_d_lower_above_within_tolerance_snaps: asserts d_lower = d.min() + 1e-15 collapses to the exact case (no negative-shift rejection). - test_first_treat_col_mixed_row_nan_raises: unit with [valid, NaN] rows raises. - test_first_treat_col_mixed_row_invalid_value_raises: unit with [valid, 999] rows raises with the 999 in the error message. - test_mixed_row_nan_cluster_raises_on_mass_point: mass-point with cluster column holding [valid, NaN] per unit raises. Targeted regression: 150 HAD tests + 529 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 70 +++++++++++++++++++++++++++++--------- tests/test_had.py | 86 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+), 16 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 396781f6..4a8edb28 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -508,15 +508,25 @@ def _validate_had_panel( # {0, t_post} so that string-labelled periods (e.g., ("A", "B")) with # first_treat in {0, "B"} are supported. 
if first_treat_col is not None: - ft_series = data.groupby(unit_col)[first_treat_col].first() - if bool(ft_series.isna().any()): + # Row-level NaN check: `groupby().first()` skips NaNs silently, so a + # unit with rows [valid, NaN] would pass a collapsed check. Validate + # raw per-row values first, then verify per-unit constancy with + # `nunique(dropna=False)` so within-unit NaN variation is caught. + ft_raw = data[first_treat_col] + if bool(ft_raw.isna().any()): + n_nan = int(ft_raw.isna().sum()) raise ValueError( - f"first_treat_col={first_treat_col!r} contains NaN. " - f"Use 0 for never-treated units and t_post for treated." + f"first_treat_col={first_treat_col!r} contains " + f"{n_nan} NaN value(s) at the row level. Use 0 for " + f"never-treated units and t_post for treated, and drop " + f"or impute any NaN rows before calling fit()." ) + # Row-level domain check: every row (not just the collapsed first()) + # must be in {0, t_post}. Catches mixed-row malformed inputs where + # a unit has [valid, invalid]. valid_values = {0, t_post} - observed = set(ft_series.unique().tolist()) - bad = sorted(observed - valid_values, key=lambda x: str(x)) + observed_raw = set(ft_raw.unique().tolist()) + bad = sorted(observed_raw - valid_values, key=lambda x: str(x)) if bad: raise ValueError( f"first_treat_col={first_treat_col!r} contains value(s) " @@ -524,6 +534,18 @@ def _validate_had_panel( f"two-period HAD panel. Staggered timing with multiple " f"cohorts is Phase 2b." ) + # Within-unit consistency: every unit must have a single + # first_treat value across its rows. Uses dropna=False so a unit + # with [value, NaN] counts as 2 unique values (caught above by + # the NaN check anyway, but this is belt-and-suspenders). 
+ ft_per_unit_nunique = data.groupby(unit_col)[first_treat_col].nunique(dropna=False) + if (ft_per_unit_nunique > 1).any(): + n_bad = int((ft_per_unit_nunique > 1).sum()) + raise ValueError( + f"first_treat_col={first_treat_col!r} is not constant " + f"within unit for {n_bad} unit(s). Each unit must have " + f"a single first_treat value across both observed periods." + ) return t_pre, t_post @@ -574,8 +596,22 @@ def _aggregate_first_difference( if cluster_col is not None: if cluster_col not in data.columns: raise ValueError(f"cluster column {cluster_col!r} not found in data.") - # Cluster must be unit-constant; take the first cluster id per unit. - cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique() + # Row-level NaN check: `groupby().first()` skips NaNs silently, so a + # unit with rows [valid, NaN] would pass a collapsed check. Validate + # raw per-row values first so any NaN in the cluster column is + # rejected up-front, not masked by the per-unit collapse. + cluster_raw = data[cluster_col] + if bool(cluster_raw.isna().any()): + n_nan = int(cluster_raw.isna().sum()) + raise ValueError( + f"cluster column {cluster_col!r} contains {n_nan} NaN " + f"value(s) at the row level. Silent row dropping is " + f"disabled; drop or impute cluster ids before calling fit()." + ) + # Cluster must be unit-constant. `nunique(dropna=False)` counts NaN + # as a distinct value so rows like [value, NaN] register as 2 + # uniques (also caught by the row-level NaN check above). + cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique(dropna=False) if (cluster_per_unit > 1).any(): n_bad = int((cluster_per_unit > 1).sum()) raise ValueError( @@ -585,14 +621,6 @@ def _aggregate_first_difference( f"where each unit belongs to a single cluster)." ) cluster_arr = df.groupby(unit_col)[cluster_col].first().sort_index().to_numpy() - # NaN cluster ids are a silent-failure vector; reject front-door. 
- if pd.isna(cluster_arr).any(): - n_nan = int(pd.isna(cluster_arr).sum()) - raise ValueError( - f"{n_nan} unit(s) have NaN cluster id in " - f"column {cluster_col!r}. Silent row dropping is " - f"disabled; drop or impute cluster ids before calling fit()." - ) return d_arr, dy_arr, cluster_arr, unit_ids @@ -1233,6 +1261,16 @@ def fit( f"for mass_point, off-support for continuous_near_" f"d_lower) estimand that is out of Phase 2a scope." ) + # Snap tolerance-accepted overrides back to the exact support + # infimum. Float-rounding drift matters downstream: on the + # mass-point path, `Z = d > d_lower` with d_lower = d.min() - ε + # puts the mass-point units into Z=1 (control group empties); + # on the continuous-near-d_lower path, d_lower = d.min() + ε + # makes `d - d_lower` negative and trips Phase 1c's + # _validate_had_inputs negative-dose guard. Snapping preserves + # the "within tolerance" contract while keeping downstream + # algebra exact. + d_lower_val = d_min # ---- Compute cohort counts ---- if resolved_design == "mass_point": diff --git a/tests/test_had.py b/tests/test_had.py index aff5abda..d4ca7d1a 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -659,6 +659,36 @@ def test_first_treat_col_invalid_cohort_raises(self): first_treat_col="ft", ) + def test_first_treat_col_mixed_row_nan_raises(self): + """Review P2 round 8: per-unit rows like [valid, NaN] must be rejected. + + `groupby().first()` silently skips NaNs; a unit with [0, NaN] + collapses to first_treat=0 and a unit-level NaN check would + pass. Row-level validation must catch the NaN on the bad row. + """ + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Unit-level first_treat all zero (never-treated); inject a NaN on + # exactly the second row of unit 0 (t_post row). 
+ panel["ft"] = 0.0 + unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "ft"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit", first_treat_col="ft") + + def test_first_treat_col_mixed_row_invalid_value_raises(self): + """Per-unit rows like [valid, invalid_value] must be rejected.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel["ft"] = 0.0 + # Inject an out-of-domain value on unit 0's post-period row. + unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "ft"] = 999.0 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"first_treat_col.*999"): + est.fit(panel, "outcome", "dose", "period", "unit", first_treat_col="ft") + # ============================================================================= # Criterion 8: NaN propagation @@ -1189,6 +1219,47 @@ def test_mass_point_d_lower_within_tolerance_succeeds(self): r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) + def test_mass_point_d_lower_below_min_within_tolerance_snaps(self): + """Review P1 round 8: tolerance-accepted d_lower = d.min() - ε must + be SNAPPED to d.min() so the instrument Z = d > d_lower matches + the exact-minimum case; otherwise mass-point units would fall + into Z=1 and empty the control group. + """ + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + d_lower_below = float(d.min()) - 1e-15 + r_below = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_below).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_exact = HeterogeneousAdoptionDiD(design="mass_point", d_lower=float(d.min())).fit( + panel, "outcome", "dose", "period", "unit" + ) + # Behavior must be identical within ULP (the snap collapses them). 
+ assert r_below.att == r_exact.att + assert r_below.se == r_exact.se + assert r_below.n_mass_point == r_exact.n_mass_point + assert r_below.n_above_d_lower == r_exact.n_above_d_lower + + def test_continuous_near_d_lower_above_within_tolerance_snaps(self): + """Review P1 round 8: tolerance-accepted d_lower = d.min() + ε on + continuous_near_d_lower must be SNAPPED so the regressor shift + `d - d_lower` does not produce negative doses and trip Phase 1c's + _validate_had_inputs negative-dose guard. + """ + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + d_lower_above = float(d.min()) + 1e-15 + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_above = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", d_lower=d_lower_above + ).fit(panel, "outcome", "dose", "period", "unit") + r_exact = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", d_lower=float(d.min()) + ).fit(panel, "outcome", "dose", "period", "unit") + assert r_above.att == r_exact.att + assert r_above.se == r_exact.se + def test_continuous_near_d_lower_above_min_raises(self): """Review P1: continuous_near_d_lower must also enforce support infimum.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) @@ -1329,6 +1400,21 @@ def test_nan_cluster_raises(self): with pytest.raises(ValueError, match="NaN"): est.fit(panel, "outcome", "dose", "period", "unit") + def test_mixed_row_nan_cluster_raises_on_mass_point(self): + """Review P2 round 8: a unit with rows [valid, NaN] on mass-point + must be rejected by row-level validation, not masked by + `groupby().first()`. + """ + d, dy = _dgp_mass_point(100, seed=0) + cluster_unit = np.repeat(np.arange(50).astype(float), 2) # all valid + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + # Inject NaN only on the second row (t_post) of unit 0. 
+ unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "state"] = np.nan + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + def test_cluster_warns_on_continuous_path(self): d, dy = _dgp_continuous_at_zero(200, seed=0) cluster_unit = np.repeat(np.arange(100), 2) # length 200 unit-level From a3553ed287cf8695404aa90368c0839f4c9e6622 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 05:53:43 -0400 Subject: [PATCH 12/17] Address PR #346 CI review round 9: P2 robust warning + P3 docstring MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review 9 is overall ✅ Looks good; this PR addresses the remaining P2 and two P3s to close the loop cleanly. **P2 (Code Quality): `robust=True` silently ignored on continuous paths** The continuous dispatch block warns on `vcov_type` and `cluster` when they are supplied on a path that ignores them, but `robust=True` slipped through the same check. Since `robust=True` on the mass-point path is the backward-compat alias for `vcov_type="hc1"`, a user passing `robust=True` on `continuous_at_zero` / `continuous_near_d_lower` would reasonably expect it to do something; in reality the continuous paths always use the CCT-2014 robust SE from Phase 1c and the flag has no effect. Fix: extend the ignore-warning block to emit a `UserWarning` when `robust=True` and `resolved_design` is continuous. **P3 (Maintainability): _validate_had_panel return docstring stale** The Returns section said ``(t_pre, t_post)`` is returned in "min then max" order, but the implementation now uses the HAD dose invariant (the all-zero-dose period is `t_pre` regardless of label order). Updated the docstring to describe the identified semantic order and the arbitrary-dtype-label support (int, str, datetime). 
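The dose-invariant period identification that the updated docstring describes can be sketched as follows. `identify_periods` here is a hypothetical stand-in (not the library's actual `_validate_had_panel`), shown only to illustrate why lexicographic sort is the wrong rule once period labels have arbitrary dtype:

```python
import pandas as pd

def identify_periods(data, time_col, dose_col):
    """Pick (t_pre, t_post) by the HAD dose invariant, not by sort order."""
    periods = list(data[time_col].unique())
    assert len(periods) == 2, "Phase 2a requires exactly two periods"
    # t_pre is the period whose dose is zero for every unit.
    zero_dose = [p for p in periods
                 if (data.loc[data[time_col] == p, dose_col] == 0).all()]
    t_pre = zero_dose[0]
    t_post = periods[0] if periods[1] == t_pre else periods[1]
    return t_pre, t_post

# String labels where the POST period sorts lexicographically FIRST:
df = pd.DataFrame({
    "period": ["post", "pre"] * 3,
    "dose":   [0.5, 0.0, 0.7, 0.0, 0.2, 0.0],
})
t_pre, t_post = identify_periods(df, "period", "dose")
assert (t_pre, t_post) == ("pre", "post")  # sorted() would flip them
```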
**Tests (+2 regression):** - test_robust_true_ignored_on_continuous_warns: `robust=True` on continuous path emits a warning containing "robust". - test_robust_false_silent_on_continuous: `robust=False` (the default) on continuous path does NOT emit the `robust=True is ignored` warning, keeping the default construction silent. Targeted regression: 152 HAD tests + 531 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 18 ++++++++++++++++-- tests/test_had.py | 31 +++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 4a8edb28..97c8a799 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -385,8 +385,12 @@ def _validate_had_panel( Returns ------- - tuple[int, int] - ``(t_pre, t_post)`` - the two period identifiers, min then max. + tuple[Any, Any] + ``(t_pre, t_post)`` - the two period identifiers identified by + the HAD dose invariant (``t_pre`` is the period with dose == 0 + for all units; ``t_post`` is the other period). Supports + arbitrary-dtype period labels (int, str, datetime, etc.) rather + than relying on ordinal / lexicographic sort. 
Raises ------ @@ -1324,6 +1328,16 @@ def fit( UserWarning, stacklevel=2, ) + if robust_arg: + warnings.warn( + f"robust=True is ignored on the '{resolved_design}' " + f"path (the continuous designs use the CCT-2014 " + f"robust SE from Phase 1c unconditionally; the " + f"robust flag is a mass-point-only backward-compat " + f"alias for vcov_type).", + UserWarning, + stacklevel=2, + ) if cluster_arg is not None: warnings.warn( f"cluster={cluster_arg!r} is ignored on the " diff --git a/tests/test_had.py b/tests/test_had.py index d4ca7d1a..449a1256 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -529,6 +529,37 @@ def test_vcov_type_ignored_on_continuous(self): assert any("ignored" in str(warn.message).lower() for warn in w) assert np.isfinite(r.att) + def test_robust_true_ignored_on_continuous_warns(self): + """Review P2 round 9: robust=True on continuous path must warn. + + The continuous designs use the CCT-2014 robust SE unconditionally; + robust= is a mass-point-only backward-compat alias for vcov_type. + Passing robust=True on a continuous path has no effect on the + computed SE, so the user must get a warning that the flag was + ignored. 
+ """ + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", robust=True) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + robust_warnings = [warn for warn in w if "robust" in str(warn.message).lower()] + assert len(robust_warnings) >= 1 + assert np.isfinite(r.att) + + def test_robust_false_silent_on_continuous(self): + """robust=False (the default) on continuous path emits no robust-warn.""" + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", robust=False) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + robust_warnings = [warn for warn in w if "robust=True is ignored" in str(warn.message)] + assert len(robust_warnings) == 0 + assert np.isfinite(r.att) + # ============================================================================= # Criterion 7: Panel-contract violations From 98b82017c58f0b32bbcb29aad4a9b47e4d49c4f2 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:35:21 -0400 Subject: [PATCH 13/17] Address PR #346 CI review round 10: P1 sklearn-compatible get/set_params **P1 (Code Quality): sklearn parameter contract** The class docstring advertises `sklearn.base.clone(est)` compatibility, but the actual methods did not match sklearn's `BaseEstimator.get_params(deep=True)` / `set_params(**params)` surface: - `get_params()` did not accept a `deep` keyword. sklearn's `clone` calls `get_params(deep=False)`, so any integration would have failed with a TypeError on the missing kwarg. - `set_params()` validated keys with `hasattr(self, key)`. That would silently accept non-constructor attribute names like `fit`, and a typo or malicious kwargs dict could overwrite estimator methods. 
Fix: - `get_params(self, deep: bool = True)` matches sklearn's signature. `deep` is accepted for compat; this estimator has no nested sub-estimators, so `deep=True` and `deep=False` return the same dict. `del deep` documents the no-op explicitly and silences unused-arg linters. - `set_params(**params)` now restricts to keys from `get_params()`. Non-constructor attribute names (including method names like `fit` and dunder/private attrs) raise `ValueError`. **Tests (+4 regression):** - test_set_params_rejects_method_names: `set_params(fit=...)` raises and `est.fit` stays callable. - test_set_params_rejects_private_attrs: `_internal=42` raises. - test_get_params_accepts_deep_keyword: `deep=True`, `deep=False`, and no-arg all return the same dict. - test_sklearn_clone_round_trip_if_available: `sklearn.base.clone` round-trips the estimator; gated on `pytest.importorskip("sklearn")` so it skips cleanly when sklearn is not in the test matrix. Targeted regression: 155 HAD tests (+ 1 skipped) + 534 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 32 ++++++++++++++++++++++++-------- tests/test_had.py | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 8 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 97c8a799..865b7b93 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -994,13 +994,23 @@ def _validate_constructor_args(self) -> None: f"or None." ) - def get_params(self) -> Dict[str, Any]: + def get_params(self, deep: bool = True) -> Dict[str, Any]: """Return the raw constructor parameters (sklearn-compatible). - Preserves the user's original inputs - in particular, ``design`` - returns ``"auto"`` when the user set it to ``"auto"`` (even after - fit), so ``sklearn.base.clone(est)`` round-trips exactly. + Matches the :meth:`sklearn.base.BaseEstimator.get_params` + signature. 
Preserves the user's original inputs - in particular, + ``design`` returns ``"auto"`` when the user set it to ``"auto"`` + (even after fit), so ``sklearn.base.clone(est)`` round-trips + exactly. + + Parameters + ---------- + deep : bool, default=True + Accepted for sklearn-contract compatibility. This estimator + has no nested sub-estimator parameters, so ``deep=False`` + and ``deep=True`` return the same dict. """ + del deep # accepted for compat; this estimator has no nested params return { "design": self.design, "d_lower": self.d_lower, @@ -1012,12 +1022,18 @@ def get_params(self) -> Dict[str, Any]: } def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": - """Set estimator parameters and return self (sklearn-compatible).""" + """Set estimator parameters and return self (sklearn-compatible). + + Only keys returned by :meth:`get_params` are accepted. Passing + any other attribute name (including method names like ``fit``) + raises ``ValueError`` so the estimator cannot be silently + corrupted by a mistyped or attacker-supplied key. + """ + valid_keys = set(self.get_params().keys()) for key, value in params.items(): - if not hasattr(self, key): + if key not in valid_keys: raise ValueError( - f"Invalid parameter: {key}. Valid parameters: " - f"{list(self.get_params().keys())}." + f"Invalid parameter: {key!r}. Valid parameters: " f"{sorted(valid_keys)}." ) setattr(self, key, value) self._validate_constructor_args() diff --git a/tests/test_had.py b/tests/test_had.py index 449a1256..3c71ca41 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -849,6 +849,46 @@ def test_set_params_invalid_key_raises(self): with pytest.raises(ValueError, match="Invalid parameter"): est.set_params(not_a_param=True) + def test_set_params_rejects_method_names(self): + """Review P1 round 10: set_params must restrict to constructor keys, + not any hasattr-able name. Method names like 'fit' must raise, + else they would silently overwrite the method. 
+ """ + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(fit="not_a_method") + # sanity: fit is still callable on the class + assert callable(est.fit) + + def test_set_params_rejects_private_attrs(self): + """Internal-looking attribute names must also raise.""" + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(_internal=42) + + def test_get_params_accepts_deep_keyword(self): + """Review P1 round 10: get_params must match sklearn's signature. + + sklearn.base.BaseEstimator.get_params(deep=True). This estimator + has no nested sub-estimators, so deep=True and deep=False return + the same dict, but the keyword must be accepted. + """ + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", alpha=0.1) + params_default = est.get_params() + params_deep_true = est.get_params(deep=True) + params_deep_false = est.get_params(deep=False) + assert params_default == params_deep_true == params_deep_false + + def test_sklearn_clone_round_trip_if_available(self): + """If sklearn is installed, sklearn.base.clone round-trips the estimator.""" + sklearn_base = pytest.importorskip("sklearn.base") + est = HeterogeneousAdoptionDiD(design="auto", alpha=0.1, kernel="triangular") + cloned = sklearn_base.clone(est) + assert cloned.get_params() == est.get_params() + assert cloned is not est + # clone produces a fresh instance of the same class. 
+ assert type(cloned) is type(est) + def test_set_params_invalid_design_raises(self): est = HeterogeneousAdoptionDiD() with pytest.raises(ValueError, match="design"): From c67505156b94fb557a69441349a43ee3d9b21c6b Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:48:27 -0400 Subject: [PATCH 14/17] Address PR #346 CI review round 11: P2 atomic set_params + P3 doc example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 11 is ✅ Looks good overall; this PR closes the remaining P2 and P3 for cleanliness. **P2 (Code Quality): set_params must be atomic** The previous implementation applied `setattr()` inside the validation loop and called `_validate_constructor_args()` only afterward. A multi-key call like `set_params(alpha=0.1, design="made_up")` would mutate `self.alpha` before raising on the invalid design, leaving the estimator half-updated. Callers that catch `ValueError` and retry would see the wrong state. Fix: dry-run validation via `type(self)(**merged)` before any attribute write. If the constructor raises on the combined state, `self` is not mutated. Mutations are applied only after validation passes, so rejection leaves the estimator fully rollback-safe. **P3 (Documentation/Tests): docstring example was non-runnable** The class docstring example constructed data via `generate_continuous_did_data()` which defaults to `n_periods=4`, while `HeterogeneousAdoptionDiD.fit()` requires exactly two periods in Phase 2a. A user copying the example verbatim would hit the two-period rejection. Fix: rewrote the example to build a two-period HAD panel by hand with an explicit `D_{g,1} = 0` pre-period column, a unit-level `dose_post`, and `delta_y` so the shape matches the estimator's contract exactly. Kept `doctest: +SKIP` markers on the RNG and fit lines. **Tests (+3 rollback regressions):** - test_set_params_rollback_on_failure: multi-key call with invalid design keeps estimator unchanged. 
- test_set_params_rollback_on_invalid_key: unknown kwarg keeps estimator unchanged. - test_set_params_rollback_on_invalid_alpha: out-of-range alpha keeps estimator unchanged. Targeted regression: 158 HAD tests (+1 sklearn-gated skip) + 537 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 46 +++++++++++++++++++++++++++++++++++++--------- tests/test_had.py | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 9 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 865b7b93..40cebe1d 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -940,12 +940,28 @@ class HeterogeneousAdoptionDiD: Examples -------- + Construct a two-period HAD panel by hand. Phase 2a requires exactly + two periods with ``D_{g,1} = 0`` for every unit. + + >>> import numpy as np >>> import pandas as pd - >>> from diff_diff import HeterogeneousAdoptionDiD, generate_continuous_did_data - >>> data = generate_continuous_did_data(n_units=500, seed=42) # doctest: +SKIP + >>> from diff_diff import HeterogeneousAdoptionDiD + >>> rng = np.random.default_rng(42) # doctest: +SKIP + >>> G = 500 # doctest: +SKIP + >>> dose_post = rng.uniform(0.0, 1.0, G) # doctest: +SKIP + >>> dose_post[0] = 0.0 # at least one zero-dose unit for Design 1' + >>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G) # doctest: +SKIP + >>> data = pd.DataFrame({ # doctest: +SKIP + ... "unit": np.repeat(np.arange(G), 2), + ... "period": np.tile([1, 2], G), + ... "dose": np.column_stack([np.zeros(G), dose_post]).ravel(), + ... "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(), + ... }) >>> est = HeterogeneousAdoptionDiD(design="auto") # doctest: +SKIP - >>> result = est.fit(data, outcome_col="outcome", dose_col="dose", - ... time_col="period", unit_col="unit") # doctest: +SKIP + >>> result = est.fit( # doctest: +SKIP + ... data, outcome_col="outcome", dose_col="dose", + ... 
time_col="period", unit_col="unit", + ... ) >>> result.design # doctest: +SKIP 'continuous_at_zero' """ @@ -1028,15 +1044,27 @@ def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": any other attribute name (including method names like ``fit``) raises ``ValueError`` so the estimator cannot be silently corrupted by a mistyped or attacker-supplied key. + + Mutation is ATOMIC: validation runs on a proposed merged + parameter dict before any attribute is overwritten. A failing + call (invalid key, or an otherwise valid key whose value + violates the constructor constraints) leaves ``self`` unchanged + and safe to reuse. """ valid_keys = set(self.get_params().keys()) + invalid = [k for k in params if k not in valid_keys] + if invalid: + raise ValueError( + f"Invalid parameter: {invalid[0]!r}. Valid parameters: " f"{sorted(valid_keys)}." + ) + # Dry-run validation by constructing a fresh instance with the + # merged state. If the constructor raises, self is not mutated. + merged = self.get_params() + merged.update(params) + type(self)(**merged) # raises ValueError on invalid combination + # All checks passed; apply atomically. for key, value in params.items(): - if key not in valid_keys: - raise ValueError( - f"Invalid parameter: {key!r}. Valid parameters: " f"{sorted(valid_keys)}." - ) setattr(self, key, value) - self._validate_constructor_args() return self # ------------------------------------------------------------------ diff --git a/tests/test_had.py b/tests/test_had.py index 3c71ca41..343c2381 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -894,6 +894,38 @@ def test_set_params_invalid_design_raises(self): with pytest.raises(ValueError, match="design"): est.set_params(design="made_up") + def test_set_params_rollback_on_failure(self): + """Review P2 round 11: set_params must be ATOMIC. 
+ + A failing call (valid key but value violates constructor + constraints) must leave the estimator unchanged so the caller + can catch the ValueError and reuse the object. + """ + est = HeterogeneousAdoptionDiD(alpha=0.05, design="continuous_at_zero") + baseline = est.get_params() + # Multi-key call where alpha is valid but design is invalid. + # The old (non-atomic) code would have set alpha before raising + # on design, leaving the estimator half-mutated. + with pytest.raises(ValueError): + est.set_params(alpha=0.1, design="garbage_design") + assert est.get_params() == baseline + + def test_set_params_rollback_on_invalid_key(self): + """Rejecting an unknown key must leave self unchanged.""" + est = HeterogeneousAdoptionDiD(alpha=0.05) + baseline = est.get_params() + with pytest.raises(ValueError): + est.set_params(alpha=0.1, not_a_param=True) + assert est.get_params() == baseline + + def test_set_params_rollback_on_invalid_alpha(self): + """alpha outside (0, 1) must leave self unchanged.""" + est = HeterogeneousAdoptionDiD(alpha=0.05, design="continuous_at_zero") + baseline = est.get_params() + with pytest.raises(ValueError): + est.set_params(alpha=1.5, kernel="triangular") + assert est.get_params() == baseline + # ============================================================================= # Criterion 10: Scaffolding raises From 07ab30f0b736d3a5fea6b1df97273412984e4ba5 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:58:58 -0400 Subject: [PATCH 15/17] Address PR #346 CI review round 12: P1 reject nonzero d_lower on continuous_at_zero **P1 (Methodology): continuous_at_zero silently ignored nonzero d_lower** `HeterogeneousAdoptionDiD.fit()` unconditionally set `d_lower_val = 0.0` on the `continuous_at_zero` path, so calls like `HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.5)` silently fit Design 1' at zero. 
This contradicted paper Section 3.2's Design 1' regime contract (`d_lower = 0`) and the class docstring's promise that mismatched overrides raise. It also created a parameter- contract mismatch: `get_params()` reported the user's nonzero `d_lower` while the fitted result reported `d_lower=0.0`. Fix: after `resolved_design == "continuous_at_zero"` is known, check whether the user supplied an explicit `d_lower`. If the absolute value exceeds the float-tolerance band (same family as the Design 1 d_lower guards below), raise `ValueError` pointing at `continuous_near_d_lower` / `mass_point` / `auto`. `d_lower=None` (auto-resolve) and `d_lower=0.0` (redundant but benign) continue to succeed. **Tests (+4 regression):** - test_continuous_at_zero_with_nonzero_d_lower_raises: d_lower=0.5 raises with "d_lower == 0" / "Design 1'" pointer. - test_continuous_at_zero_with_small_d_lower_raises: d_lower=0.01 also raises (not just large overrides). - test_continuous_at_zero_with_zero_d_lower_succeeds: d_lower=0.0 exactly is accepted (redundant-but-valid case). - test_auto_on_zero_sample_ignores_user_d_lower: design='auto' resolving to continuous_at_zero must ALSO reject explicit nonzero d_lower, not silently drop it. Targeted regression: 162 HAD tests (+1 sklearn-gated skip) + 541 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 18 ++++++++++++++++++ tests/test_had.py | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/diff_diff/had.py b/diff_diff/had.py index 40cebe1d..e0ea227c 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -1205,6 +1205,24 @@ def fit( # ---- Resolve d_lower ---- if resolved_design == "continuous_at_zero": + # Design 1' regime (paper Section 3.2) is defined at d_lower = 0. + # Reject explicit nonzero d_lower overrides front-door rather + # than silently coerce to zero. 
Tolerance family matches the + # Design 1 d_lower guards below. + if d_lower_arg is not None: + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(float(d_lower_arg)) > 1e-12 * scale: + raise ValueError( + f"design='continuous_at_zero' (Design 1') requires " + f"d_lower == 0 within float tolerance (paper Section " + f"3.2 Design 1' regime). Got d_lower=" + f"{float(d_lower_arg)!r}. For d_lower > 0 use " + f"design='continuous_near_d_lower' (local-linear " + f"boundary-limit estimator) or design='mass_point' " + f"(2SLS) as appropriate, or design='auto' which " + f"auto-detects the correct path from the dose " + f"distribution." + ) d_lower_val = 0.0 elif d_lower_arg is None: d_lower_val = float(d_arr.min()) diff --git a/tests/test_had.py b/tests/test_had.py index 343c2381..2aaddf36 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1272,6 +1272,46 @@ def test_force_mass_point_d_lower_none_on_zero_sample_raises(self): with pytest.raises(ValueError, match=r"d_lower > 0"): est.fit(panel, "outcome", "dose", "period", "unit") + def test_continuous_at_zero_with_nonzero_d_lower_raises(self): + """Review P1 round 12: continuous_at_zero must reject nonzero d_lower. + + Paper Section 3.2 Design 1' is defined at d_lower = 0; silently + coercing a user-supplied d_lower=0.5 to zero would contradict + the documented regime contract. 
+ """ + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.5) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_at_zero_with_small_d_lower_raises(self): + """Even a small nonzero d_lower should raise.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.01) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_at_zero_with_zero_d_lower_succeeds(self): + """d_lower=0.0 exactly is fine (redundant but allowed).""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.0) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.d_lower == 0.0 + assert np.isfinite(r.att) + + def test_auto_on_zero_sample_ignores_user_d_lower(self): + """design='auto' resolving to continuous_at_zero must ALSO reject + an explicit nonzero d_lower, not silently drop it. 
+ """ + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="auto", d_lower=0.5) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + # ============================================================================= # Design 1 d_lower contract enforcement (mass-point + continuous_near_d_lower) From 2f6e1d0b3677ad0e5dd936818497bcd79358e24b Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 20:21:07 -0400 Subject: [PATCH 16/17] Address PR #346 CI review round 13: P1 reject non-finite d_lower + P3 docstring **P1 (Methodology): d_lower=NaN/inf bypassed comparison guards** The Design 1 regime guards (`d_lower > 0`, `d_lower == d.min()` within tolerance, `d_lower <= tol` for continuous_at_zero) all use comparison operators that return False for NaN. A user passing `d_lower=np.nan` therefore slipped through every check and ended up at `0.0` (continuous_at_zero branch) or `d.min()` (the snap). On a zero-support sample, that could silently re-enter the paper- incompatible continuous_near_d_lower / mass_point path the earlier review rounds had fixed. Fix: reject non-finite d_lower front-door in `_validate_constructor_args()`. This makes `__init__`, `set_params()`, and the atomic dry-run validation all share the same contract: `d_lower` must be None or a finite scalar. NaN and +/-inf raise `ValueError`. Atomic `set_params()` confirms that a failing `d_lower=NaN` call leaves the estimator unchanged. **P3 (Documentation/Tests): safe_inference coverage overstated** The result-class docstring and REGISTRY said "all 5 inference fields are routed through safe_inference". In reality, `safe_inference` only gates the downstream triple (t_stat, p_value, conf_int) - `att` and `se` are raw estimator outputs. 
The fit paths themselves return `(nan, nan)` on degenerate configurations, which combined with the safe_inference gate produces the advertised "all five fields NaN together" behavior, but the docstring wording was misleading. Fix: rewrote the Results docstring preamble and the REGISTRY.md Phase 2a NaN-propagation line to describe the two-layer NaN flow (fit-path NaN on att/se + safe_inference on the downstream triple) precisely. **Tests (+4 regression):** - test_d_lower_nan_raises: __init__ rejects NaN. - test_d_lower_posinf_raises / test_d_lower_neginf_raises: +/-inf rejected. - test_d_lower_nan_via_set_params_raises: set_params(d_lower=NaN) raises AND leaves the estimator unchanged (atomic rollback). Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 30 ++++++++++++++++++++++++++---- docs/methodology/REGISTRY.md | 2 +- tests/test_had.py | 22 ++++++++++++++++++++++ 3 files changed, 49 insertions(+), 5 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index e0ea227c..ff9e40a3 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -127,10 +127,17 @@ class HeterogeneousAdoptionDiDResults: """Estimator output for :class:`HeterogeneousAdoptionDiD`. - All inference fields (``att``, ``se``, ``t_stat``, ``p_value``, - ``conf_int``) are routed through :func:`diff_diff.utils.safe_inference` - at fit time; when SE is non-finite, zero, or negative, every inference - field is NaN. + NaN-safe inference: the three downstream fields ``t_stat``, + ``p_value``, and ``conf_int`` are routed through + :func:`diff_diff.utils.safe_inference`, which returns NaN on all + three whenever ``se`` is non-finite, zero, or negative. 
``att`` and + ``se`` themselves are raw estimator outputs - when the estimator's + fit path detects a degenerate configuration (constant outcome, + no-variation-above-``d_lower``, divide-by-zero), it returns + ``(att=nan, se=nan)`` directly, which the safe-inference gate then + propagates into the remaining three fields. Net effect: users can + safely check ``np.isfinite(result.se)`` and expect all five fields + to be finite or all NaN together on the current fit paths. Attributes ---------- @@ -992,6 +999,21 @@ def _validate_constructor_args(self) -> None: ) if not (0.0 < float(self.alpha) < 1.0): raise ValueError(f"alpha must be in (0, 1); got {self.alpha!r}.") + # d_lower must be None or a finite scalar. NaN / +/-inf would + # bypass every downstream comparison-based guard (`>`, `<=`, + # tolerance-abs-diff) because those return False for NaN, so we + # front-door reject non-finite values here. This fires under + # __init__ and set_params (which dry-runs through the + # constructor), keeping the contract uniform across all entry + # points. + if self.d_lower is not None: + d_lower_float = float(self.d_lower) + if not np.isfinite(d_lower_float): + raise ValueError( + f"d_lower must be None or a finite scalar; got " + f"d_lower={self.d_lower!r}. NaN and +/-inf are not " + f"valid support-infimum values." + ) if self.vcov_type is not None: if self.vcov_type.lower() in _MASS_POINT_VCOV_UNSUPPORTED: # Don't raise here — the raise happens at fit() time on diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 0c425a6b..fe2768af 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2319,7 +2319,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. 
Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. -- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. 
+- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) covering constant-y and degenerate mass-point inputs. The inner `safe_inference()` gate covers the downstream triple (`t_stat`, `p_value`, `conf_int`); `att` and `se` are set to `NaN` by the fit paths themselves when they detect a degenerate configuration (constant outcome, no-variation-above-`d_lower`, divide-by-zero), so the five fields move to `NaN` together and the `assert_nan_inference` fixture holds. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. 
The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. diff --git a/tests/test_had.py b/tests/test_had.py index 2aaddf36..99d4b5cb 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1224,6 +1224,28 @@ def test_d_lower_float_accepted(self): est = HeterogeneousAdoptionDiD(d_lower=0.3) assert est.d_lower == 0.3 + def test_d_lower_nan_raises(self): + """Review P1 round 13: d_lower=NaN must be rejected in __init__.""" + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("nan")) + + def test_d_lower_posinf_raises(self): + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("inf")) + + def test_d_lower_neginf_raises(self): + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("-inf")) + + def test_d_lower_nan_via_set_params_raises(self): + """d_lower=NaN through set_params must also raise (atomic rollback).""" + est = HeterogeneousAdoptionDiD(d_lower=0.3) + baseline = est.get_params() + with pytest.raises(ValueError, match=r"d_lower.*finite"): + est.set_params(d_lower=float("nan")) + # Atomic rollback: d_lower unchanged after failure. + assert est.get_params() == baseline + # ============================================================================= # Explicit design override (don't auto-reject) From 22d7f65f1ea126702ea97fbef2d03f1f8e5658db Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 20:33:34 -0400 Subject: [PATCH 17/17] Address PR #346 CI review round 14: P3 tighten NaN-contract wording MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 14 is ✅ Looks good; this closes the final residual P3 for wording accuracy. 
**P3 (Docs): NaN-contract wording overstated** The prior wording said users can expect all five result fields to be finite or all NaN together. That is true on most degenerate paths (constant-y, no-dose-variation) but NOT on the single-cluster CR1 configuration of the mass-point path: `_fit_mass_point_2sls` returns `(att=beta_hat, se=nan)` there - `att` is finite because the Wald-IV ratio is well defined, while CR1 requires G >= 2 clusters so `se` is NaN. `safe_inference` then gates the downstream triple correctly, but the doc claim that `att` and `se` are coupled with the triple was inaccurate. Fix: rewrote the result-class docstring preamble and the REGISTRY Phase 2a NaN-propagation line to describe the invariant precisely: - GUARANTEED NaN coupling is on the downstream triple (`t_stat`, `p_value`, `conf_int`) via `safe_inference`. - `att` and `se` are RAW estimator outputs from the chosen fit path and are NOT gated by safe_inference. - On most degenerate paths, fit paths return `(nan, nan)` so all five move together, but on the single-cluster CR1 edge the fit path returns `(att=beta_hat, se=nan)` and only the downstream triple becomes NaN via the gate. No code behavior changes; the `assert_nan_inference` fixture already only checks the downstream triple, matching the restated contract. Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 27 ++++++++++++++++++++------- docs/methodology/REGISTRY.md | 2 +- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index ff9e40a3..d9ba428c 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -131,13 +131,26 @@ class HeterogeneousAdoptionDiDResults: ``p_value``, and ``conf_int`` are routed through :func:`diff_diff.utils.safe_inference`, which returns NaN on all three whenever ``se`` is non-finite, zero, or negative. 
``att`` and - ``se`` themselves are raw estimator outputs - when the estimator's - fit path detects a degenerate configuration (constant outcome, - no-variation-above-``d_lower``, divide-by-zero), it returns - ``(att=nan, se=nan)`` directly, which the safe-inference gate then - propagates into the remaining three fields. Net effect: users can - safely check ``np.isfinite(result.se)`` and expect all five fields - to be finite or all NaN together on the current fit paths. + ``se`` themselves are RAW estimator outputs from the chosen fit + path and are NOT gated by ``safe_inference``: + + - On the degenerate fit configurations (constant outcome on the + continuous paths, all-units-at-d_lower / no-dose-variation on the + mass-point path), the fit path explicitly returns + ``(att=nan, se=nan)``, which combined with the safe-inference + gate yields all five fields NaN together. + - On the degenerate CR1 cluster configuration (mass-point path + with a single cluster), ``_fit_mass_point_2sls`` returns + ``(att=beta_hat, se=nan)`` - ``att`` stays finite because the + Wald-IV ratio is well defined, but the cluster-robust SE is + not, so ``se`` is NaN and the downstream triple becomes NaN + via the safe-inference gate. + + So the guaranteed NaN coupling is on the downstream triple + (``t_stat``, ``p_value``, ``conf_int``), not on ``att``. The + ``assert_nan_inference`` fixture in ``tests/conftest.py`` checks + the downstream triple against the gate contract and does not + assume ``att`` is NaN. 
Attributes ---------- diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index fe2768af..5055e2ec 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2319,7 +2319,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. 
Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. -- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) covering constant-y and degenerate mass-point inputs. The inner `safe_inference()` gate covers the downstream triple (`t_stat`, `p_value`, `conf_int`); `att` and `se` are set to `NaN` by the fit paths themselves when they detect a degenerate configuration (constant outcome, no-variation-above-`d_lower`, divide-by-zero), so the five fields move to `NaN` together and the `assert_nan_inference` fixture holds. +- [x] Phase 2a: NaN-propagation tests covering constant-y, degenerate-mass-point, and single-cluster-CR1 inputs. The guaranteed NaN coupling is on the DOWNSTREAM triple (`t_stat`, `p_value`, `conf_int`) via the `safe_inference()` gate, which returns NaN on all three whenever `se` is non-finite, zero, or negative. `att` and `se` themselves are raw estimator outputs: on constant-y / no-dose-variation / divide-by-zero the fit paths return `(att=nan, se=nan)` so all five fields move to NaN together; on the degenerate single-cluster CR1 configuration on the mass-point path, `_fit_mass_point_2sls` returns `(att=beta_hat, se=nan)` - `att` is finite (Wald-IV is well defined) while `se` is NaN, so the downstream triple is NaN while `att` remains the raw 2SLS coefficient. The `assert_nan_inference` fixture in `tests/conftest.py` checks the downstream triple against this contract without requiring `att` to be NaN. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). 
Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate.
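For reviewers who want to sanity-check the mass-point SE note above, here is a minimal numpy sketch of the structural-residual 2SLS sandwich it describes. This is an illustration only, not the library's `_fit_mass_point_2sls`: the function name, signature, and toy data below are hypothetical, and it covers just the `classical`/`hc1` branches (with `hc2`-style requests raising `NotImplementedError`, mirroring the note's pending-leverage-derivation stance).

```python
import numpy as np

def mass_point_2sls_sketch(dy, d, d_lower, vcov="hc1"):
    """Illustrative structural-residual 2SLS sandwich (NOT library code).

    Structural equation: dY = alpha + beta * D + u, instrumented by
    z = 1{D > d_lower}. Just-identified, so beta_hat equals the Wald-IV ratio.
    """
    n = len(dy)
    X = np.column_stack([np.ones(n), d])                            # structural regressors [1, D]
    Z = np.column_stack([np.ones(n), (d > d_lower).astype(float)])  # instruments [1, 1{D > d_lower}]
    ZX_inv = np.linalg.inv(Z.T @ X)                                 # (Z'X)^{-1}
    alpha_hat, beta_hat = ZX_inv @ (Z.T @ dy)
    u = dy - alpha_hat - beta_hat * d                               # STRUCTURAL residuals, not reduced-form
    if vcov == "classical":
        omega = (u @ u / (n - 2)) * (Z.T @ Z)                       # homoskedastic meat
    elif vcov == "hc1":
        omega = (n / (n - 2)) * (Z * (u ** 2)[:, None]).T @ Z       # HC1 meat on the instruments
    else:
        # hc2 / hc2_bm: OLS leverage x_i'(X'X)^{-1}x_i is wrong for 2SLS
        raise NotImplementedError(vcov)
    V = ZX_inv @ omega @ ZX_inv.T                                   # [Z'X]^{-1} Omega [Z'X]^{-T}
    return beta_hat, float(np.sqrt(V[1, 1]))
```

Because the design is just-identified, the point estimate reduces to the Wald ratio (difference in mean `ΔY` above vs. at `d_lower`, over the difference in mean `D`), so the sketch can be checked against that ratio directly; the key design choice the note documents is that `Ω` is built from `u = ΔY - α̂ - β̂·D`, not from residuals of an OLS of `ΔY` on the indicator.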