From f875f558905c14554a816f6d61b1fe7598681d73 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:14:36 -0400 Subject: [PATCH 01/17] Phase 2a: HeterogeneousAdoptionDiD class (single-period) Ships the user-facing HAD estimator with three design-dispatch paths: - continuous_at_zero (Design 1'): bias-corrected local-linear at boundary=0, rescaled to beta-scale via Equation 8 divisor D_bar = (1/G) * sum(D_{g,2}). - continuous_near_d_lower (Design 1): same path with regressor shift D' = D - d_lower and boundary = float(d.min()). - mass_point (Design 1 Section 3.2.4): sample-average 2SLS with instrument 1{D_{g,2} > d_lower}. Point estimate is the Wald-IV ratio; SE is the structural-residual 2SLS sandwich [Z'X]^-1 Omega [Z'X]^-T for classical, hc1, and CR1 (cluster-robust). hc2/hc2_bm raise NotImplementedError pending a 2SLS-specific leverage derivation. design="auto" resolves via the REGISTRY auto-detect rule (d.min() < 0.01 * median(|d|) -> continuous_at_zero; modal fraction > 2% -> mass_point; else continuous_near_d_lower), with d.min() == 0 as an unconditional tie-break. Panel validator enforces balanced two-period panel, D_{g,1} = 0 for all units, no NaN in key columns. >2 periods raises pointing to Phase 2b. aggregate="event_study" (Appendix B.2 multi-period) raises NotImplementedError; survey= and weights= also raise (queued for follow-up survey-integration PR). All inference fields (att, se, t_stat, p_value, conf_int) routed through safe_inference() for NaN-safe CI gating. Raw design kept on self so get_params()/clone() round-trip preserves "auto" even after fit. 105 new tests cover the 12 plan commit criteria: three design paths finite, design="auto" correctness + edge cases, beta-scale rescaling at atol=1e-14, mass-point Wald-IV at atol=1e-14, mass-point sandwich SE parity at atol=1e-12 (classical/hc1/CR1), NotImplementedError contracts, panel contract violations, NaN propagation, sklearn clone round-trip, get_params signature enumeration. 
REGISTRY.md ticks four Phase 2a checkboxes and adds a Note block documenting the structural-residual 2SLS sandwich choice. TODO.md queues Phase 2b (multi-period), survey integration, weights, hc2/hc2_bm leverage derivation, pre-test diagnostics (Phase 3), Pierce-Schott replication (Phase 4), and tutorial integration (Phase 5). Co-Authored-By: Claude Opus 4.7 (1M context) --- TODO.md | 9 + diff_diff/__init__.py | 9 + diff_diff/had.py | 1161 ++++++++++++++++++++++++++++++++ docs/methodology/REGISTRY.md | 12 +- tests/test_had.py | 1216 ++++++++++++++++++++++++++++++++++ 5 files changed, 2402 insertions(+), 5 deletions(-) create mode 100644 diff_diff/had.py create mode 100644 tests/test_had.py diff --git a/TODO.md b/TODO.md index 244451e5..bd8be7a5 100644 --- a/TODO.md +++ b/TODO.md @@ -91,6 +91,15 @@ Deferred items from PR reviews that were not addressed before merge. | `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium | | `bias_corrected_local_linear`: support multi-eval grid (`neval > 1`) with cross-covariance (`covgrid=TRUE` branch of `lprobust.R:253-378`). Not needed for HAD but useful for multi-dose diagnostics. | `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Low | | Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low | +| `HeterogeneousAdoptionDiD` Phase 2b: multi-period event-study extension (Appendix B.2). 
`aggregate="event_study"` raises `NotImplementedError` in Phase 2a; Phase 2b will aggregate per-cohort first-differences into an event-study result with `.event_study_effects` / `.event_study_se` result fields. | `diff_diff/had.py`, `tests/test_had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD`: survey-design integration (`survey=SurveyDesign(...)`). Currently raises `NotImplementedError`. Requires Taylor-linearization of the β-scale rescaling and replicate-weight-compatible 2SLS variance on the mass-point path. | `diff_diff/had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD`: `weights=` support. Deferred jointly with survey integration. nprobust's `lprobust` has no weight argument so the nonparametric continuous path needs a derivation; the 2SLS mass-point path needs weighted-sandwich parity. | `diff_diff/had.py` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` Phase 3: `qug_test()`, `stute_test()`, `yatchew_hr_test()` pre-test diagnostics (paper Section 3.3). Composite helper `did_had_pretest_workflow()`. Not part of Phase 2a scope. | `diff_diff/had.py`, new module | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. 
| `benchmarks/`, `tests/` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low | #### Performance diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index be3a1472..5be7e97c 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -55,6 +55,10 @@ triangular_kernel, uniform_kernel, ) +from diff_diff.had import ( + HeterogeneousAdoptionDiD, + HeterogeneousAdoptionDiDResults, +) from diff_diff.estimators import ( DifferenceInDifferences, MultiPeriodDiD, @@ -252,6 +256,7 @@ EDiD = EfficientDiD ETWFE = WooldridgeDiD DCDH = ChaisemartinDHaultfoeuille +HAD = HeterogeneousAdoptionDiD __version__ = "3.2.0" __all__ = [ @@ -431,6 +436,10 @@ # Bias-corrected local-linear (Phase 1c for HeterogeneousAdoptionDiD) "BiasCorrectedFit", "bias_corrected_local_linear", + # HeterogeneousAdoptionDiD (Phase 2a) + "HeterogeneousAdoptionDiD", + "HeterogeneousAdoptionDiDResults", + "HAD", # Datasets "load_card_krueger", "load_castle_doctrine", diff --git a/diff_diff/had.py b/diff_diff/had.py new file mode 100644 index 00000000..2398ad29 --- /dev/null +++ b/diff_diff/had.py @@ -0,0 +1,1161 @@ +""" +Heterogeneous Adoption Difference-in-Differences (HAD) estimator (Phase 2a). + +Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) +Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: + +1. 
Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density + continuous at zero. Evaluates the bias-corrected local-linear fit at + ``boundary = 0`` and rescales to the beta-scale via Equation 8's divisor + ``D_bar = (1/G) * sum(D_{g,2})``. Invokes Assumption 3. + +2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): + ``d_lower > 0``, continuous boundary density. Evaluates the bias-corrected + local-linear fit at ``boundary = float(d.min())`` after the regressor + shift ``D' = D - d_lower``. Same divisor, same CI path. + Invokes Assumption 5 or 6. + +3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction at + ``d.min()`` exceeds 2%. Uses a sample-average 2SLS estimator with + instrument ``Z_g = 1{D_{g,2} > d_lower}``. Point estimate reduces to the + Wald-IV ratio + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard error + via the structural-residual 2SLS sandwich (see ``_fit_mass_point_2sls``). + +Phase 2a ships the single-period path only; the multi-period event-study +extension (paper Appendix B.2) is queued for Phase 2b. + +Infrastructure reused from Phase 1: +- ``diff_diff.local_linear.bias_corrected_local_linear`` (Phase 1c) +- ``diff_diff.local_linear.BiasCorrectedFit``, ``BandwidthResult`` +- ``diff_diff.utils.safe_inference`` (NaN-safe CI gating) +- ``diff_diff.prep.balance_panel`` (panel validation) + +References +---------- +- de Chaisemartin, C., Ciccia, D., D'Haultfoeuille, X., & Knau, F. (2026). + Difference-in-Differences Estimators When No Unit Remains Untreated. + arXiv:2405.04465v6. +- Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust + nonparametric confidence intervals for regression-discontinuity designs. + Econometrica, 82(6), 2295-2326. 
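A minimal numeric sketch of the beta-scale rescaling described above (illustration only: ``tau_bc`` and ``se_mu`` are made-up stand-ins for the mu-scale bias-corrected statistic and its robust SE, not output of ``bias_corrected_local_linear``):

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, size=500)  # hypothetical post-period doses D_{g,2}
tau_bc = 0.8  # made-up mu-scale bias-corrected local-linear statistic
se_mu = 0.1   # made-up mu-scale robust SE

# Equation 8 divisor: D_bar = (1/G) * sum(D_{g,2}).
D_bar = d.mean()

# Beta-scale point estimate and SE both divide by the same D_bar,
# so the CI endpoints rescale by 1 / D_bar as well.
att = tau_bc / D_bar
se = se_mu / D_bar
```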
+""" + +from __future__ import annotations + +import warnings +from dataclasses import dataclass +from typing import Any, Dict, Optional, Tuple + +import numpy as np +import pandas as pd + +from diff_diff.local_linear import ( + BandwidthResult, + BiasCorrectedFit, + bias_corrected_local_linear, +) +from diff_diff.utils import safe_inference + +__all__ = [ + "HeterogeneousAdoptionDiD", + "HeterogeneousAdoptionDiDResults", +] + + +# ============================================================================= +# Module-level constants +# ============================================================================= + +# REGISTRY line 2320 auto-detect boundary: d.min() < 0.01 * median(|d|) resolves +# to Design 1' (continuous_at_zero). Distinct from Phase 1c's stricter +# _DESIGN_1_PRIME_RATIO = 0.05 in ``local_linear.py`` which is a plausibility +# guard on the nonparametric fit, not the auto-detect rule. +_CONTINUOUS_AT_ZERO_THRESHOLD = 0.01 + +# REGISTRY line 2320 mass-point rule: modal fraction at d.min() > 0.02 resolves +# to Design 1 mass-point. Mirrored (not imported) from +# ``diff_diff.local_linear._validate_had_inputs`` where the same constant +# enforces the upstream mass-point rejection on the nonparametric path. +_MASS_POINT_THRESHOLD = 0.02 + +_VALID_DESIGNS = ( + "auto", + "continuous_at_zero", + "continuous_near_d_lower", + "mass_point", +) +_VALID_AGGREGATES = ("overall", "event_study") +# Mass-point 2SLS supports: classical, hc1 (HC0 is an unscaled variant not +# publicly exposed; hc2 and hc2_bm raise NotImplementedError pending a +# 2SLS-specific leverage derivation and R parity anchor). +_MASS_POINT_VCOV_SUPPORTED = ("classical", "hc1") +_MASS_POINT_VCOV_UNSUPPORTED = ("hc2", "hc2_bm") + +# Target-parameter label per design. Design 1' targets the WAS (Assumption 3); +# Design 1 targets WAS_{d_lower} (Assumption 5 or 6), which also applies to +# the mass-point path (paper Section 3.2.4). 
+_TARGET_PARAMETER = { + "continuous_at_zero": "WAS", + "continuous_near_d_lower": "WAS_d_lower", + "mass_point": "WAS_d_lower", +} + + +# ============================================================================= +# Results dataclass +# ============================================================================= + + +@dataclass +class HeterogeneousAdoptionDiDResults: + """Estimator output for :class:`HeterogeneousAdoptionDiD`. + + All inference fields (``att``, ``se``, ``t_stat``, ``p_value``, + ``conf_int``) are routed through :func:`diff_diff.utils.safe_inference` + at fit time; when SE is non-finite, zero, or negative, every inference + field is NaN. + + Attributes + ---------- + att : float + Point estimate of the WAS parameter on the beta-scale. For the + continuous designs: ``tau_bc / D_bar`` where ``tau_bc`` is the + bias-corrected local-linear statistic on the mu-scale and + ``D_bar = (1/G) * sum(D_{g,2})``. For the mass-point design: + the 2SLS coefficient directly. + se : float + Standard error on the beta-scale. For continuous designs, the + CCT-2014 robust SE divided by ``D_bar``. For mass-point, the 2SLS + structural-residual sandwich SE. + t_stat, p_value, conf_int : inference fields + Routed through ``safe_inference``; NaN when SE is non-finite. + alpha : float + CI level used at fit time (0.05 for a 95% CI). + design : str + Resolved design mode: ``"continuous_at_zero"``, + ``"continuous_near_d_lower"``, or ``"mass_point"``. ``"auto"`` is + resolved to one of the three concrete modes before storing. + target_parameter : str + Estimand label: ``"WAS"`` for Design 1', ``"WAS_d_lower"`` for the + other two. Pins the estimand semantically even when two designs + share the same divisor. + d_lower : float + Support infimum ``d_lower``. ``0.0`` for Design 1'; + ``float(d.min())`` for the other two. + dose_mean : float + ``D_bar = (1/G) * sum(D_{g,2})``. 
+ n_obs : int + Number of units contributing to the estimator (post panel + aggregation to unit-level first differences). + n_treated : int + Number of units with ``D_{g,2} > d_lower``. + n_control : int + Number of units at or below ``d_lower`` (the "not-treated" subset). + n_mass_point : int or None + Mass-point path only: number of units with ``D_{g,2} == d_lower``. + ``None`` on continuous paths. + n_above_d_lower : int or None + Mass-point path only: number of units with ``D_{g,2} > d_lower``. + ``None`` on continuous paths. + inference_method : str + ``"analytical_nonparametric"`` (continuous designs) or + ``"analytical_2sls"`` (mass-point). + vcov_type : str or None + Variance-covariance family used. ``None`` on continuous paths + (they use the CCT-2014 robust SE from Phase 1c, not the library's + ``vcov_type`` enum). Mass-point: one of ``"classical"`` or + ``"hc1"`` (CR1 when ``cluster`` is supplied). + cluster_name : str or None + Column name of the cluster variable on the mass-point path when + cluster-robust SE is requested. ``None`` otherwise. + survey_metadata : object or None + Always ``None`` in Phase 2a. Field shape kept for future-compat + with a planned survey integration PR. + bandwidth_diagnostics : BandwidthResult or None + Full Phase 1b MSE-DPI selector output on the continuous paths + (when bandwidths were auto-selected). ``None`` on the mass-point + path (parametric, no bandwidth). + bias_corrected_fit : BiasCorrectedFit or None + Full Phase 1c bias-corrected local-linear fit on the continuous + paths. ``None`` on the mass-point path. 
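The NaN-gating contract can be illustrated with a generic normal-approximation sketch (hypothetical helper written for this note; the actual ``safe_inference`` signature lives in ``diff_diff.utils`` and may differ):

```python
import math

def gated_inference(att: float, se: float):
    # Sketch of the NaN-safe contract: when SE is non-finite, zero, or
    # negative, every inference field comes back NaN instead of a
    # misleading number. (Hypothetical helper, not diff_diff's API.)
    if not math.isfinite(se) or se <= 0:
        nan = float("nan")
        return nan, nan, (nan, nan)
    z = 1.959963984540054  # normal critical value for a 95% CI
    t_stat = att / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided normal p
    return t_stat, p_value, (att - z * se, att + z * se)
```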
+ """ + + # Point estimate + inference (safe_inference-gated) + att: float + se: float + t_stat: float + p_value: float + conf_int: Tuple[float, float] + alpha: float + + # Design metadata + design: str + target_parameter: str + d_lower: float + dose_mean: float + + # Sample counts + n_obs: int + n_treated: int + n_control: int + n_mass_point: Optional[int] + n_above_d_lower: Optional[int] + + # Inference metadata + inference_method: str + vcov_type: Optional[str] + cluster_name: Optional[str] + survey_metadata: Optional[Any] + + # Nonparametric-only diagnostics + bandwidth_diagnostics: Optional[BandwidthResult] + bias_corrected_fit: Optional[BiasCorrectedFit] + + def __repr__(self) -> str: + return ( + f"HeterogeneousAdoptionDiDResults(" + f"att={self.att:.4f}, se={self.se:.4f}, " + f"design={self.design!r}, n_obs={self.n_obs})" + ) + + def summary(self) -> str: + """Formatted summary table.""" + width = 72 + conf_level = int((1 - self.alpha) * 100) + lines = [ + "=" * width, + "HeterogeneousAdoptionDiD Estimation Results".center(width), + "=" * width, + "", + f"{'Design:':<30} {self.design:>20}", + f"{'Target parameter:':<30} {self.target_parameter:>20}", + f"{'d_lower:':<30} {self.d_lower:>20.6g}", + f"{'D_bar (dose mean):':<30} {self.dose_mean:>20.6g}", + f"{'Observations (units):':<30} {self.n_obs:>20}", + f"{'Above d_lower:':<30} {self.n_treated:>20}", + f"{'At/below d_lower:':<30} {self.n_control:>20}", + ] + if self.n_mass_point is not None: + lines.append(f"{'At d_lower (mass point):':<30} {self.n_mass_point:>20}") + if self.n_above_d_lower is not None: + lines.append(f"{'Strictly above d_lower:':<30} {self.n_above_d_lower:>20}") + lines.append(f"{'Inference method:':<30} {self.inference_method:>20}") + if self.vcov_type is not None: + label = self.vcov_type + if self.cluster_name is not None: + label = f"CR1 at {self.cluster_name}" + lines.append(f"{'Variance:':<30} {label:>20}") + if self.bandwidth_diagnostics is not None: + bw = 
self.bandwidth_diagnostics + lines.append(f"{'Bandwidth h (MSE-DPI):':<30} {bw.h_mse:>20.6g}") + lines.append(f"{'Bandwidth b (bias):':<30} {bw.b_mse:>20.6g}") + if self.bias_corrected_fit is not None: + bc = self.bias_corrected_fit + lines.append(f"{'Bandwidth h used:':<30} {bc.h:>20.6g}") + lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}") + lines.extend( + [ + "", + "-" * width, + ( + f"{'Parameter':<15} {'Estimate':>12} {'Std. Err.':>12} " + f"{'t-stat':>10} {'P>|t|':>10}" + ), + "-" * width, + ( + f"{'WAS':<15} {self.att:>12.4f} {self.se:>12.4f} " + f"{self.t_stat:>10.3f} {self.p_value:>10.4f}" + ), + "-" * width, + "", + ( + f"{conf_level}% Confidence Interval: " + f"[{self.conf_int[0]:.4f}, {self.conf_int[1]:.4f}]" + ), + "=" * width, + ] + ) + return "\n".join(lines) + + def print_summary(self) -> None: + """Print the summary to stdout.""" + print(self.summary()) + + def to_dict(self) -> Dict[str, Any]: + """Return results as a dict of scalars.""" + return { + "att": self.att, + "se": self.se, + "t_stat": self.t_stat, + "p_value": self.p_value, + "conf_int_lower": self.conf_int[0], + "conf_int_upper": self.conf_int[1], + "alpha": self.alpha, + "design": self.design, + "target_parameter": self.target_parameter, + "d_lower": self.d_lower, + "dose_mean": self.dose_mean, + "n_obs": self.n_obs, + "n_treated": self.n_treated, + "n_control": self.n_control, + "n_mass_point": self.n_mass_point, + "n_above_d_lower": self.n_above_d_lower, + "inference_method": self.inference_method, + "vcov_type": self.vcov_type, + "cluster_name": self.cluster_name, + } + + def to_dataframe(self) -> pd.DataFrame: + """Return a one-row DataFrame of the result dict.""" + return pd.DataFrame([self.to_dict()]) + + +# ============================================================================= +# Panel validation and aggregation +# ============================================================================= + + +def _validate_had_panel( + data: pd.DataFrame, + 
outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + first_treat_col: Optional[str], +) -> Tuple[int, int]: + """Validate a HAD panel and return ``(t_pre, t_post)``. + + Enforces the Phase 2a panel contract: + + - All required columns present. + - Exactly two distinct time periods. Staggered timing (``>2`` periods) + with ``first_treat_col=None`` raises; with ``first_treat_col`` it + also raises (multi-period reduction is Phase 2b). + - Balanced panel (all units observed at both periods). + - ``D_{g, t_pre} == 0`` for all units (HAD no-unit-untreated pre-period). + - No NaN in outcome, dose, or unit columns. + + Parameters + ---------- + data : pd.DataFrame + outcome_col, dose_col, time_col, unit_col : str + first_treat_col : str or None + Optional column for cross-validation. Supplied column must contain + ``0`` for never-treated and the post-period value for treated units. + + Returns + ------- + tuple[int, int] + ``(t_pre, t_post)`` - the two period identifiers, min then max. + + Raises + ------ + ValueError + """ + required = [outcome_col, dose_col, time_col, unit_col] + if first_treat_col is not None: + required.append(first_treat_col) + missing = [c for c in required if c not in data.columns] + if missing: + raise ValueError(f"Missing column(s) in data: {missing}. Required: {required}.") + + periods = np.sort(np.asarray(data[time_col].unique())) + if len(periods) < 2: + raise ValueError( + f"HAD requires a two-period panel; got {len(periods)} distinct " + f"period(s) in column {time_col!r}." + ) + if len(periods) > 2: + if first_treat_col is None: + raise ValueError( + f"HAD Phase 2a requires exactly two time periods " + f"(got {len(periods)} in {time_col!r}) when " + f"first_treat_col=None. Multi-period / staggered adoption " + f"support is queued for Phase 2b (Appendix B.2 event-study)." + ) + raise ValueError( + f"HAD Phase 2a requires exactly two time periods " + f"(got {len(periods)} in {time_col!r}). 
Staggered adoption " + f"reduction (first_treat_col supplied with >2 periods) is " + f"queued for Phase 2b (Appendix B.2 event-study)." + ) + + t_pre = int(periods[0]) if np.issubdtype(periods.dtype, np.integer) else periods[0] + t_post = int(periods[1]) if np.issubdtype(periods.dtype, np.integer) else periods[1] + + # Balanced-panel check: every unit appears exactly once per period. + counts = data.groupby([unit_col, time_col]).size() + if (counts != 1).any(): + n_bad = int((counts != 1).sum()) + raise ValueError( + f"Unbalanced panel: {n_bad} unit-period cells have != 1 " + f"observation. HAD requires a balanced two-period panel " + f"(each unit observed exactly once at each period)." + ) + unit_counts = data.groupby(unit_col)[time_col].nunique() + incomplete = unit_counts[unit_counts != 2] + if len(incomplete) > 0: + raise ValueError( + f"Unbalanced panel: {len(incomplete)} unit(s) do not appear " + f"in both periods. HAD requires a balanced two-period panel." + ) + + # NaN checks on key columns. + for col in [outcome_col, dose_col, unit_col]: + if bool(data[col].isna().any()): + n_nan = int(data[col].isna().sum()) + raise ValueError( + f"{n_nan} NaN value(s) found in column {col!r}. HAD " + f"does not silently drop NaN rows; drop or impute before " + f"calling fit()." + ) + + # Pre-period no-unit-untreated check. + pre_mask = data[time_col] == t_pre + pre_doses = np.asarray(data.loc[pre_mask, dose_col], dtype=np.float64) + nonzero_pre = pre_doses != 0 + if nonzero_pre.any(): + n_bad = int(nonzero_pre.sum()) + raise ValueError( + f"HAD requires D_{{g,1}} = 0 for all units (pre-period " + f"untreated). {n_bad} unit(s) have nonzero dose at " + f"t_pre={t_pre}. Drop these units or verify the dose column." + ) + + # Optional cross-validation via first_treat_col: if supplied, the + # first_treat == 0 units must have D_{g, t_post} == 0, and first_treat + # == t_post units may have any D_{g, t_post} >= 0. 
We do NOT require + # strict consistency (the paper treats D_{g,2} as the primary signal), + # but a quick sanity check catches obvious column mix-ups. + if first_treat_col is not None: + ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() + if np.any(np.isnan(ft.astype(np.float64, copy=False))): + raise ValueError( + f"first_treat_col={first_treat_col!r} contains NaN. " + f"Use 0 for never-treated units and t_post for treated." + ) + valid_values = {0, t_post} + # Compare without int() truncation so a malformed non-integer value + # (e.g., 2.5 with t_post=2) is flagged instead of slipping through. + bad = [v for v in np.unique(ft) if v not in valid_values] + if bad: + raise ValueError( + f"first_treat_col={first_treat_col!r} contains value(s) " + f"{bad} outside the allowed set {{0, {t_post}}} for a " + f"two-period HAD panel. Staggered timing with multiple " + f"cohorts is Phase 2b." + ) + + return t_pre, t_post + + +def _aggregate_first_difference( + data: pd.DataFrame, + outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + t_pre: Any, + t_post: Any, + cluster_col: Optional[str], +) -> Tuple[np.ndarray, np.ndarray, Optional[np.ndarray], np.ndarray]: + """Reduce a balanced two-period panel to unit-level first differences. + + Returns + ------- + d_arr : np.ndarray, shape (G,) + Post-period dose ``D_{g,2}`` per unit, ordered by sorted unit id. + dy_arr : np.ndarray, shape (G,) + First-difference outcome ``Y_{g,2} - Y_{g,1}`` per unit. + cluster_arr : np.ndarray or None, shape (G,) + Cluster IDs per unit (must be unit-constant); ``None`` when + ``cluster_col`` is ``None``. + unit_ids : np.ndarray, shape (G,) + Sorted unit identifiers (useful for debugging / test inspection). 
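The reduction can be sketched on a toy panel (illustration only; the column names here are assumed to match the estimator defaults):

```python
import pandas as pd

# Toy balanced two-period panel: three units, periods {1, 2}, D_{g,1} = 0.
panel = pd.DataFrame({
    "unit":    [1, 1, 2, 2, 3, 3],
    "time":    [1, 2, 1, 2, 1, 2],
    "dose":    [0.0, 0.5, 0.0, 1.0, 0.0, 0.0],
    "outcome": [1.0, 2.0, 1.5, 4.0, 0.5, 0.6],
})

pre = panel[panel["time"] == 1].set_index("unit").sort_index()
post = panel[panel["time"] == 2].set_index("unit").sort_index()

d_arr = post["dose"].to_numpy()                         # D_{g,2} per unit
dy_arr = (post["outcome"] - pre["outcome"]).to_numpy()  # Y_{g,2} - Y_{g,1}
```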
+ """ + df = data.sort_values([unit_col, time_col]).reset_index(drop=True) + pre = df[df[time_col] == t_pre].set_index(unit_col).sort_index() + post = df[df[time_col] == t_post].set_index(unit_col).sort_index() + + if bool(pre.index.equals(post.index)) is False: + # Should not reach here after _validate_had_panel balanced-panel + # check, but belt-and-suspenders: + raise ValueError( + "Internal error: pre and post period unit indices do not " + "match after balanced-panel validation. Please report this." + ) + + unit_ids = np.asarray(pre.index) + d_arr = post[dose_col].to_numpy(dtype=np.float64) + dy_arr = post[outcome_col].to_numpy(dtype=np.float64) - pre[outcome_col].to_numpy( + dtype=np.float64 + ) + + cluster_arr: Optional[np.ndarray] = None + if cluster_col is not None: + if cluster_col not in data.columns: + raise ValueError(f"cluster column {cluster_col!r} not found in data.") + # Cluster must be unit-constant; take the first cluster id per unit. + cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique() + if (cluster_per_unit > 1).any(): + n_bad = int((cluster_per_unit > 1).sum()) + raise ValueError( + f"cluster column {cluster_col!r} is not constant within " + f"unit for {n_bad} unit(s). Cluster must be unit-level " + f"(e.g., cluster_col=unit_col, or a coarser grouping " + f"where each unit belongs to a single cluster)." + ) + cluster_arr = df.groupby(unit_col)[cluster_col].first().sort_index().to_numpy() + # NaN cluster ids are a silent-failure vector; reject front-door. + if pd.isna(cluster_arr).any(): + n_nan = int(pd.isna(cluster_arr).sum()) + raise ValueError( + f"{n_nan} unit(s) have NaN cluster id in " + f"column {cluster_col!r}. Silent row dropping is " + f"disabled; drop or impute cluster ids before calling fit()." 
+ ) + + return d_arr, dy_arr, cluster_arr, unit_ids + + +# ============================================================================= +# Design auto-detection +# ============================================================================= + + +def _detect_design(d_arr: np.ndarray) -> str: + """Resolve ``design="auto"`` to a concrete mode string. + + Implements the REGISTRY line-2320 rule: + + 1. If ``d.min() == 0`` exactly, resolve unconditionally to + ``continuous_at_zero``. (The modal-min check only runs when + ``d.min() > 0`` since otherwise the "at d_lower" subset is always + the "zero-dose" subset and the continuous-at-zero path is the + paper-endorsed handling.) + 2. Otherwise, if ``d.min() < 0.01 * median(|d|)``, resolve to + ``continuous_at_zero`` (small-share-of-treated samples that + effectively satisfy Design 1'). + 3. Else, if the modal fraction at ``d.min()`` exceeds 2%, resolve to + ``mass_point``. + 4. Else, resolve to ``continuous_near_d_lower``. + + The rule is strict first-match: when both the ``<0.01 * median`` + threshold and the modal-fraction rule fire simultaneously (e.g., 3% + of units at ``D = 0`` + 97% ``Uniform(0.5, 1)``), the resolution is + ``continuous_at_zero``. This matches the paper-endorsed Design 1' + handling of small-share-of-treated samples. + + Parameters + ---------- + d_arr : np.ndarray, shape (G,) + Unit-level post-period doses. + + Returns + ------- + str + One of ``"continuous_at_zero"``, ``"continuous_near_d_lower"``, + ``"mass_point"``. + """ + d_min = float(d_arr.min()) + + # Tie-break: d.min() == 0 always resolves to Design 1'. + # Use an absolute tolerance that scales with the range of the data. + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(d_min) <= 1e-12 * scale: + return "continuous_at_zero" + + median_abs = float(np.median(np.abs(d_arr))) + # REGISTRY rule: d.min() < threshold * median(|d|) -> continuous_at_zero. + # Guard against median_abs == 0 (all doses at zero; handled above). 
+ if median_abs > 0 and d_min < _CONTINUOUS_AT_ZERO_THRESHOLD * median_abs: + return "continuous_at_zero" + + # Modal-fraction rule for mass-point detection. + eps = 1e-12 * max(1.0, abs(d_min)) + at_d_min = np.abs(d_arr - d_min) <= eps + modal_fraction = float(np.mean(at_d_min)) + if modal_fraction > _MASS_POINT_THRESHOLD: + return "mass_point" + + return "continuous_near_d_lower" + + +# ============================================================================= +# Mass-point 2SLS +# ============================================================================= + + +def _fit_mass_point_2sls( + d: np.ndarray, + dy: np.ndarray, + d_lower: float, + cluster: Optional[np.ndarray], + vcov_type: str, +) -> Tuple[float, float]: + """Wald-IV point estimate and structural-residual 2SLS sandwich SE. + + The just-identified binary instrument ``Z_g = 1{D_{g,2} > d_lower}`` + gives a 2SLS estimator that collapses to the Wald-IV sample-average + ratio + ``beta_hat = (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. + + The STANDARD ERROR is computed via the 2SLS sandwich + ``V = [Z'X]^{-1} * Omega * [Z'X]^{-T}`` where ``Omega`` is built from + the STRUCTURAL residuals + ``u = dy - alpha_hat - beta_hat * d`` + (NOT the reduced-form residuals). This is the canonical 2SLS + inference path and matches what ``AER::ivreg`` / ``ivreg2`` would + produce. The "OLS-on-indicator + scale by dose-gap" shortcut gives + reduced-form residuals, which diverge from structural residuals in + finite samples and substantively under clustering. + + Supported ``vcov_type``: + + - ``"classical"``: constant variance ``sigma_hat^2 = sum(u^2) / (n-2)``. + - ``"hc1"``: heteroskedasticity-robust with small-sample DOF scaling + ``n / (n - 2)``. With ``cluster`` supplied, switches to CR1 + (Liang-Zeger) cluster-robust. + + ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError`` (2SLS-specific + leverage derivation pending; queued for follow-up). 
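The collapse of just-identified 2SLS to the Wald ratio can be checked numerically on synthetic data (all numbers made up; the design matrices mirror the ``X``/``Zd`` construction used on this path):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d_lower = 0.2
# Synthetic doses: a mass point at d_lower plus a continuum above it.
at_mass = rng.random(n) < 0.3
d = np.where(at_mass, d_lower, rng.uniform(0.3, 1.0, n))
dy = 1.0 + 2.0 * d + rng.normal(0.0, 0.1, n)  # true slope = 2

Z = (d > d_lower).astype(float)
# Wald ratio of subgroup mean differences ...
wald = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (
    d[Z == 1].mean() - d[Z == 0].mean()
)

# ... equals the slope from just-identified 2SLS, beta = (Z'X)^{-1} Z'y.
X = np.column_stack([np.ones(n), d])
Zd = np.column_stack([np.ones(n), Z])
beta = np.linalg.solve(Zd.T @ X, Zd.T @ dy)
```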
+ + Parameters + ---------- + d : np.ndarray, shape (n,) + Post-period doses ``D_{g,2}``. + dy : np.ndarray, shape (n,) + First-difference outcome ``Delta Y_g``. + d_lower : float + Support infimum / evaluation point. + cluster : np.ndarray or None, shape (n,) + Cluster ids per unit (``None`` for no clustering). + vcov_type : str + One of ``"classical"``, ``"hc1"``. + + Returns + ------- + tuple[float, float] + ``(beta_hat, se_beta)``. NaN for SE when the dose-gap vanishes + (``Dbar_{Z=1} == Dbar_{Z=0}``) or the sandwich is singular. + """ + d = np.asarray(d, dtype=np.float64) + dy = np.asarray(dy, dtype=np.float64) + n = d.shape[0] + + Z = (d > d_lower).astype(np.float64) + n_above = int(Z.sum()) + n_at_or_below = n - n_above + + # Degeneracy checks: if either Z-subset is empty, Wald-IV is undefined. + if n_above == 0 or n_at_or_below == 0: + return float("nan"), float("nan") + + dose_gap = d[Z == 1].mean() - d[Z == 0].mean() + if abs(dose_gap) < 1e-12 * max(1.0, abs(float(d.mean()))): + # No dose variation around d_lower -> beta undefined. + return float("nan"), float("nan") + + dy_gap = dy[Z == 1].mean() - dy[Z == 0].mean() + beta_hat = float(dy_gap / dose_gap) + alpha_hat = float(dy.mean() - beta_hat * d.mean()) + + # STRUCTURAL residuals (plan-review CRITICAL #1): u = y - alpha - beta*x. + # The Wald-IV/OLS-on-indicator shortcut would use reduced-form residuals + # u_rf = dy - (alpha_rf + gamma * Z), which differ in finite samples. + u = dy - alpha_hat - beta_hat * d + + # Design matrices: X = [1, d] (endogenous), Z_d = [1, Z] (instrument). + X = np.column_stack([np.ones(n, dtype=np.float64), d]) + Zd = np.column_stack([np.ones(n, dtype=np.float64), Z]) + + ZtX = Zd.T @ X # (2, 2) + try: + ZtX_inv = np.linalg.inv(ZtX) + except np.linalg.LinAlgError: + # Z'X singular (e.g., no variation in Z, already handled above). 
+ return beta_hat, float("nan") + + vcov_type = vcov_type.lower() + if vcov_type in _MASS_POINT_VCOV_UNSUPPORTED: + raise NotImplementedError( + f"vcov_type={vcov_type!r} is not supported on the " + f"HeterogeneousAdoptionDiD mass-point path in Phase 2a. " + f"HC2 / HC2-BM require a 2SLS-specific leverage derivation " + f"`x_i' (Z'X)^{{-1}}(...)(X'Z)^{{-1}} x_i` that differs from " + f"the OLS leverage `x_i' (X'X)^{{-1}} x_i`. Derivation + R " + f"parity anchor are queued for the follow-up PR. Use " + f"vcov_type='hc1' or 'classical' for now." + ) + + if cluster is not None: + # CR1 (Liang-Zeger) cluster-robust sandwich. + # Use pd.unique to match R's first-appearance order (stable for + # cross-runtime reproducibility). + clusters_unique = pd.unique(cluster) + Omega = np.zeros((2, 2), dtype=np.float64) + for c in clusters_unique: + idx = cluster == c + # score per cluster: s_c = Zd[idx]' @ u[idx] + s = Zd[idx].T @ u[idx] + Omega += np.outer(s, s) + G = len(clusters_unique) + k = 2 + if G < 2: + # Cluster-robust SE undefined with a single cluster. + return beta_hat, float("nan") + Omega *= (G / (G - 1)) * ((n - 1) / (n - k)) + elif vcov_type == "classical": + dof = n - 2 + if dof <= 0: + return beta_hat, float("nan") + sigma2 = float((u * u).sum()) / dof + Omega = sigma2 * (Zd.T @ Zd) + elif vcov_type == "hc1": + dof = n - 2 + if dof <= 0: + return beta_hat, float("nan") + Omega = (n / dof) * (Zd.T @ ((u * u)[:, None] * Zd)) + else: + raise ValueError( + f"Unsupported vcov_type={vcov_type!r} on the HAD mass-point " + f"path. Supported: {_MASS_POINT_VCOV_SUPPORTED} (plus " + f"cluster-robust CR1 via cluster=)." 
+ ) + + V = ZtX_inv @ Omega @ ZtX_inv.T + var_beta = float(V[1, 1]) + if not np.isfinite(var_beta) or var_beta < 0: + return beta_hat, float("nan") + se_beta = float(np.sqrt(var_beta)) + return beta_hat, se_beta + + + # ============================================================================= + # Main estimator class + # ============================================================================= + + + class HeterogeneousAdoptionDiD: + """Heterogeneous Adoption Difference-in-Differences estimator. + + Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) + Weighted-Average-Slope (WAS) estimator with three design-dispatch + paths: Design 1' (continuous-at-zero), Design 1 continuous-near- + d_lower, and Design 1 mass-point (2SLS sample-average per paper + Section 3.2.4). Phase 2a ships the single-period path only; the + multi-period event-study extension (Appendix B.2) is queued for + Phase 2b. + + Parameters + ---------- + design : {"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"} + Design-dispatch strategy. Defaults to ``"auto"``, which resolves + via the REGISTRY auto-detect rule on the fitted dose data + (see :func:`_detect_design`). Explicit overrides run the chosen + path without auto-reject (e.g., forcing Design 1' on mass-point + data runs the nonparametric fit even though the paper would + counsel the 2SLS path). + d_lower : float or None + Support infimum ``d_lower``. ``None`` means use ``0.0`` on the + Design 1' path and ``float(d.min())`` on the other two paths. + kernel : {"epanechnikov", "triangular", "uniform"} + Forwarded to :func:`bias_corrected_local_linear` on the continuous + paths. Ignored on the mass-point path. + alpha : float + CI level (0.05 for 95% CI). + vcov_type : {"classical", "hc1"} or None + Applies to the mass-point path only. ``None`` defers to ``robust`` + (``"hc1"`` when ``robust=True``, else ``"classical"``). + ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``.
Ignored + on the continuous paths (which use the CCT-2014 robust SE from + Phase 1c); passing a non-default ``vcov_type`` on a continuous + path emits a one-time ``UserWarning``. + robust : bool + Backward-compat alias: ``True`` maps to ``"hc1"``, ``False`` maps + to ``"classical"``. Only relevant when ``vcov_type is None`` on + the mass-point path. + cluster : str or None + Column name for cluster-robust SE on the mass-point path (CR1). + Ignored with a ``UserWarning`` on the continuous paths in Phase + 2a (nonparametric cluster support has existed since Phase 1c but is + exposed separately via ``bias_corrected_local_linear``; the + estimator-level knob is queued for a follow-up PR). + + Notes + ----- + **Diagnostics coverage.** ``HeterogeneousAdoptionDiDResults.bandwidth_diagnostics`` + and ``.bias_corrected_fit`` are populated only on the continuous + paths; both are ``None`` on the mass-point path (which is parametric + and has no bandwidth). Conversely, ``.n_mass_point`` and + ``.n_above_d_lower`` are populated only on the mass-point path. + + **Clone idempotence.** ``self.design`` stores the RAW user input + (e.g., ``"auto"``); the resolved mode is stored on the result object + at fit time. This mirrors Phase 1a's ``_vcov_type_arg`` pattern and + keeps ``get_params()`` / ``sklearn.clone()`` round-trips exact. + + Examples + -------- + >>> import pandas as pd + >>> from diff_diff import HeterogeneousAdoptionDiD, generate_continuous_did_data + >>> data = generate_continuous_did_data(n_units=500, seed=42) # doctest: +SKIP + >>> est = HeterogeneousAdoptionDiD(design="auto") # doctest: +SKIP + >>> result = est.fit(data, outcome_col="outcome", dose_col="dose", + ...
time_col="period", unit_col="unit") # doctest: +SKIP + >>> result.design # doctest: +SKIP + 'continuous_at_zero' + """ + + def __init__( + self, + design: str = "auto", + d_lower: Optional[float] = None, + kernel: str = "epanechnikov", + alpha: float = 0.05, + vcov_type: Optional[str] = None, + robust: bool = False, + cluster: Optional[str] = None, + ) -> None: + self.design = design + self.d_lower = d_lower + self.kernel = kernel + self.alpha = alpha + self.vcov_type = vcov_type + self.robust = robust + self.cluster = cluster + self._validate_constructor_args() + + def _validate_constructor_args(self) -> None: + if self.design not in _VALID_DESIGNS: + raise ValueError( + f"Invalid design={self.design!r}. Must be one of " f"{_VALID_DESIGNS}." + ) + if not (0.0 < float(self.alpha) < 1.0): + raise ValueError(f"alpha must be in (0, 1); got {self.alpha!r}.") + if self.vcov_type is not None: + if self.vcov_type.lower() in _MASS_POINT_VCOV_UNSUPPORTED: + # Don't raise here — the raise happens at fit() time on + # the mass-point path so users who set vcov_type=hc2 for + # a continuous fit (which ignores the knob) don't get + # blocked. But we also don't silently accept it at + # construction if it's outside our documented enum. + pass + elif ( + self.vcov_type.lower() + not in _MASS_POINT_VCOV_SUPPORTED + _MASS_POINT_VCOV_UNSUPPORTED + ): + raise ValueError( + f"Invalid vcov_type={self.vcov_type!r}. Must be one " + f"of {_MASS_POINT_VCOV_SUPPORTED + _MASS_POINT_VCOV_UNSUPPORTED}, " + f"or None." + ) + + def get_params(self) -> Dict[str, Any]: + """Return the raw constructor parameters (sklearn-compatible). + + Preserves the user's original inputs - in particular, ``design`` + returns ``"auto"`` when the user set it to ``"auto"`` (even after + fit), so ``sklearn.base.clone(est)`` round-trips exactly. 
+ """ + return { + "design": self.design, + "d_lower": self.d_lower, + "kernel": self.kernel, + "alpha": self.alpha, + "vcov_type": self.vcov_type, + "robust": self.robust, + "cluster": self.cluster, + } + + def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": + """Set estimator parameters and return self (sklearn-compatible).""" + for key, value in params.items(): + if not hasattr(self, key): + raise ValueError( + f"Invalid parameter: {key}. Valid parameters: " + f"{list(self.get_params().keys())}." + ) + setattr(self, key, value) + self._validate_constructor_args() + return self + + # ------------------------------------------------------------------ + # Main fit entry point + # ------------------------------------------------------------------ + + def fit( + self, + data: pd.DataFrame, + outcome_col: str, + dose_col: str, + time_col: str, + unit_col: str, + first_treat_col: Optional[str] = None, + aggregate: str = "overall", + survey: Any = None, + weights: Optional[np.ndarray] = None, + ) -> HeterogeneousAdoptionDiDResults: + """Fit the HAD estimator on a two-period panel. + + Parameters + ---------- + data : pd.DataFrame + outcome_col, dose_col, time_col, unit_col : str + Column names. + first_treat_col : str or None + Optional first-treatment column for cross-validation. + Required when the panel has more than two periods (raises + for >2 periods in Phase 2a; Phase 2b handles staggered). + aggregate : {"overall"} + ``"event_study"`` raises ``NotImplementedError`` (Phase 2b). + survey : SurveyDesign or None + Reserved for a follow-up survey-integration PR. Must be + ``None`` in Phase 2a. + weights : np.ndarray or None + Reserved for a follow-up PR. Must be ``None`` in Phase 2a. + + Returns + ------- + HeterogeneousAdoptionDiDResults + """ + # ---- Phase 2a scaffolding rejections (before any work) ---- + if aggregate not in _VALID_AGGREGATES: + raise ValueError( + f"Invalid aggregate={aggregate!r}. Must be one of " f"{_VALID_AGGREGATES}." 
+ ) + if aggregate == "event_study": + raise NotImplementedError( + "aggregate='event_study' (multi-period event study per " + "paper Appendix B.2) is queued for Phase 2b. Phase 2a " + "supports aggregate='overall' (single-period WAS) only." + ) + if survey is not None: + raise NotImplementedError( + "survey= support on HeterogeneousAdoptionDiD " + "is queued for a follow-up PR after Phase 2a. Pass " + "survey=None for now." + ) + if weights is not None: + raise NotImplementedError( + "weights= support on HeterogeneousAdoptionDiD is " + "queued for a follow-up PR (paired with survey " + "integration). Pass weights=None for now." + ) + + # ---- Resolve effective fit-time state (local vars only, per + # feedback_fit_does_not_mutate_config; do not mutate self.*) ---- + design_arg = self.design + d_lower_arg = self.d_lower + vcov_type_arg = self.vcov_type + robust_arg = self.robust + cluster_arg = self.cluster + + # ---- Validate panel contract ---- + t_pre, t_post = _validate_had_panel( + data, outcome_col, dose_col, time_col, unit_col, first_treat_col + ) + + # ---- Aggregate to unit-level first differences ---- + d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference( + data, + outcome_col, + dose_col, + time_col, + unit_col, + t_pre, + t_post, + cluster_arg, + ) + + n_obs = int(d_arr.shape[0]) + if n_obs < 3: + raise ValueError( + f"HAD requires at least 3 units for inference; got " + f"n_obs={n_obs} after aggregation." 
+ ) + + # ---- Resolve design ---- + if design_arg == "auto": + resolved_design = _detect_design(d_arr) + else: + resolved_design = design_arg + + # ---- Resolve d_lower ---- + if resolved_design == "continuous_at_zero": + d_lower_val = 0.0 + elif d_lower_arg is None: + d_lower_val = float(d_arr.min()) + else: + d_lower_val = float(d_lower_arg) + + # ---- Compute cohort counts ---- + if resolved_design == "mass_point": + eps = 1e-12 * max(1.0, abs(d_lower_val)) + at_d_min_mask = np.abs(d_arr - d_lower_val) <= eps + above_mask = d_arr > d_lower_val + n_mass_point: Optional[int] = int(at_d_min_mask.sum()) + n_above_d_lower: Optional[int] = int(above_mask.sum()) + n_treated = n_above_d_lower + n_control = int(at_d_min_mask.sum()) + else: + n_mass_point = None + n_above_d_lower = None + above_mask = d_arr > d_lower_val + n_treated = int(above_mask.sum()) + n_control = n_obs - n_treated + + dose_mean = float(d_arr.mean()) + + # ---- Dispatch ---- + if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): + # Warn once if the user set a mass-point-only knob that's + # ignored on the continuous path. + if vcov_type_arg is not None: + warnings.warn( + f"vcov_type={vcov_type_arg!r} is ignored on the " + f"'{resolved_design}' path (the continuous designs " + f"use the CCT-2014 robust SE from Phase 1c). " + f"vcov_type applies only to design='mass_point'.", + UserWarning, + stacklevel=2, + ) + if cluster_arg is not None: + warnings.warn( + f"cluster={cluster_arg!r} is ignored on the " + f"'{resolved_design}' path in Phase 2a. 
Cluster-" + f"robust SE on the nonparametric path is exposed " + f"via diff_diff.bias_corrected_local_linear directly " + f"but not yet threaded through the estimator-level " + f"knob.", + UserWarning, + stacklevel=2, + ) + att, se, bc_fit, bw_diag = self._fit_continuous( + d_arr, + dy_arr, + resolved_design, + d_lower_val, + ) + inference_method = "analytical_nonparametric" + vcov_label: Optional[str] = None + cluster_label: Optional[str] = None + elif resolved_design == "mass_point": + if vcov_type_arg is None: + # Backward-compat: robust=True -> hc1, robust=False -> classical. + vcov_effective = "hc1" if robust_arg else "classical" + else: + vcov_effective = vcov_type_arg.lower() + # Explicit cluster + hc1 becomes CR1 inside _fit_mass_point_2sls. + att, se = _fit_mass_point_2sls( + d_arr, + dy_arr, + d_lower_val, + cluster_arr, + vcov_effective, + ) + bc_fit = None + bw_diag = None + inference_method = "analytical_2sls" + vcov_label = vcov_effective + cluster_label = cluster_arg if cluster_arg is not None else None + else: + raise ValueError(f"Internal error: unhandled design={resolved_design!r}.") + + # ---- Route all inference fields through safe_inference ---- + t_stat, p_value, conf_int = safe_inference(att, se, alpha=float(self.alpha)) + + return HeterogeneousAdoptionDiDResults( + att=float(att), + se=float(se), + t_stat=float(t_stat), + p_value=float(p_value), + conf_int=(float(conf_int[0]), float(conf_int[1])), + alpha=float(self.alpha), + design=resolved_design, + target_parameter=_TARGET_PARAMETER[resolved_design], + d_lower=d_lower_val, + dose_mean=dose_mean, + n_obs=n_obs, + n_treated=n_treated, + n_control=n_control, + n_mass_point=n_mass_point, + n_above_d_lower=n_above_d_lower, + inference_method=inference_method, + vcov_type=vcov_label, + cluster_name=cluster_label, + survey_metadata=None, # Phase 2a: survey integration deferred. 
+ bandwidth_diagnostics=bw_diag, + bias_corrected_fit=bc_fit, + ) + + # ------------------------------------------------------------------ + # Continuous-design dispatch (Design 1' + Design 1 continuous-near-d_lower) + # ------------------------------------------------------------------ + + def _fit_continuous( + self, + d_arr: np.ndarray, + dy_arr: np.ndarray, + resolved_design: str, + d_lower_val: float, + ) -> Tuple[float, float, Optional[BiasCorrectedFit], Optional[BandwidthResult]]: + """Fit Phase 1c ``bias_corrected_local_linear`` and rescale to beta-scale. + + Both continuous designs share the same rescaling path - they + differ only in the regressor shift and boundary: + + - ``continuous_at_zero``: regressor ``d_arr``, boundary ``0``. + - ``continuous_near_d_lower``: regressor ``d_arr - d_lower``, + boundary ``0``. + + The mu-scale output ``tau_bc`` from + :func:`bias_corrected_local_linear` is divided by the + beta-scale divisor ``D_bar = (1/G) * sum(D_{g,2})`` (Equation 8 + in the paper). The same divisor applies to the bias-corrected CI + endpoints. The divisor ``D_bar`` uses the ORIGINAL + (unshifted) dose mean, matching the paper's convention. + """ + if resolved_design == "continuous_at_zero": + d_reg = d_arr + boundary = 0.0 + elif resolved_design == "continuous_near_d_lower": + d_reg = d_arr - d_lower_val + boundary = 0.0 + else: + raise ValueError( + f"_fit_continuous called with non-continuous " f"design={resolved_design!r}" + ) + + # Phase 1b/1c's bandwidth selector can hit degenerate ratios + # (divide-by-zero in the bias estimator) when the outcome is + # exactly constant or exactly linear in d (zero residuals). + # The upstream fix lives in ``_nprobust_port.lpbwselect_mse_dpi`` + # and is deliberately out of Phase 2a scope (per plan "No changes + # to local_linear.py / _nprobust_port.py").
Catch the symptomatic + # exceptions here and surface a NaN result via ``safe_inference`` + # so users get a well-shaped ``HeterogeneousAdoptionDiDResults`` + # rather than a traceback. + try: + bc_fit = bias_corrected_local_linear( + d=d_reg, + y=dy_arr, + boundary=boundary, + kernel=self.kernel, + alpha=float(self.alpha), + # No cluster / vce / weights threading in Phase 2a (see + # UserWarning in fit()). + ) + except (ZeroDivisionError, FloatingPointError, np.linalg.LinAlgError): + return float("nan"), float("nan"), None, None + + dose_mean = float(d_arr.mean()) + # Phase 1b / Phase 1c pre-checks enforce dose_mean > 0 for + # Design 1-family samples, but defense-in-depth: if dose_mean + # rounds to zero, fall through to NaN via safe_inference in fit(). + if abs(dose_mean) < 1e-12: + att = float("nan") + se = float("nan") + else: + att = float(bc_fit.estimate_bias_corrected) / dose_mean + se = float(bc_fit.se_robust) / dose_mean + + return att, se, bc_fit, bc_fit.bandwidth_diagnostics diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index dcc1e209..d9e1aba3 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,11 +2316,13 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. 
Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. 
Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. 
The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. -- [ ] Phase 2: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1', Design 1 mass-point, and Design 1 continuous-near-`d̲`. -- [ ] Phase 2: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_lower). -- [ ] Phase 2: Panel validator verifies `D_{g,1} = 0` for all units; error on staggered timing without last-cohort subgroup. -- [ ] Phase 2: Multi-period event-study extension (Appendix B.2). -- [ ] Phase 2: NaN-propagation tests across all ~15 result fields via `assert_nan_inference`. +- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and apply the β-scale rescaling `(1/G) Σ D_{g,2}` from Equation 8 of de Chaisemartin et al. (2026); mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d_lower}`. +- [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower).
Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. +- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units; rejects `>2` time periods (staggered reduction queued for Phase 2b); rejects unbalanced panels and NaN in outcome/dose/unit columns. +- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. + - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. + - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. +- [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. 
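The structural-residual sandwich documented in the mass-point SE note above is compact enough to sketch directly. The following is an illustrative NumPy sketch, not the shipped `diff_diff.had` code: the DGP constants and the names `d`, `dy`, `d_lower` are hypothetical stand-ins for `D_{g,2}`, `ΔY_g`, and the support infimum, and only the HC1 meat is shown.

```python
import numpy as np

# Hypothetical mass-point DGP: 30% of units at d_lower, rest Uniform(d_lower, 1).
rng = np.random.default_rng(0)
d_lower, n = 0.5, 400
d = np.concatenate([np.full(120, d_lower), rng.uniform(d_lower, 1.0, n - 120)])
dy = 0.3 * d + 0.1 * rng.standard_normal(n)

# Wald-IV point estimate with instrument 1{D_{g,2} > d_lower}.
Z = (d > d_lower).astype(float)
beta = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (d[Z == 1].mean() - d[Z == 0].mean())
alpha = dy.mean() - beta * d.mean()

# STRUCTURAL residuals u = ΔY - α̂ - β̂·D, not reduced-form residuals.
u = dy - alpha - beta * d

X = np.column_stack([np.ones(n), d])    # endogenous design [1, D]
Zd = np.column_stack([np.ones(n), Z])   # instrument design [1, Z]
ZtX_inv = np.linalg.inv(Zd.T @ X)

# HC1 meat and the [Z'X]^{-1} Ω [Z'X]^{-T} sandwich.
Omega = (n / (n - 2)) * (Zd.T @ ((u * u)[:, None] * Zd))
V = ZtX_inv @ Omega @ ZtX_inv.T
se_hc1 = float(np.sqrt(V[1, 1]))
```

Because the instrument is binary and the model just-identified, the Wald ratio coincides with the closed-form 2SLS slope `solve(Z'X, Z'ΔY)[1]`; feeding the structural residuals (rather than the residuals from an OLS of `ΔY` on the indicator) into `Ω` is what the "structural-residual" qualifier refers to.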
- [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). - [ ] Phase 3: `stute_test()` Cramér-von Mises with Mammen wild bootstrap. - [ ] Phase 3: `yatchew_hr_test()` heteroskedasticity-robust linearity test. diff --git a/tests/test_had.py b/tests/test_had.py new file mode 100644 index 00000000..dedcc1ac --- /dev/null +++ b/tests/test_had.py @@ -0,0 +1,1216 @@ +"""Tests for :class:`diff_diff.had.HeterogeneousAdoptionDiD` (Phase 2a). + +Covers the 12 plan commit criteria: + +1. All three design paths produce a finite result on synthetic DGPs. +2. ``design="auto"`` resolves correctly on each DGP + two edge cases. +3. Beta-scale rescaling: ``att == tau_bc / D_bar`` and CI endpoints == + ``(ci_low/D_bar, ci_high/D_bar)`` at ``atol=1e-14``. +4. Mass-point Wald-IV point estimate matches manual formula at + ``atol=1e-14``. +5. Mass-point 2SLS SE parity against hand-coded sandwich at + ``atol=1e-12`` for HC1, classical, and CR1 (cluster-robust). +6. Mass-point + ``vcov_type in {hc2, hc2_bm}`` raises + ``NotImplementedError``. +7. Panel-contract violations raise targeted ``ValueError``s. +8. NaN propagation: constant-y and mass-point degenerate inputs produce + all-NaN inference. +9. sklearn clone round-trip preserves raw ``design="auto"``; fit is + idempotent. +10. Scaffolding (``aggregate="event_study"``, ``survey``, ``weights``) + raises ``NotImplementedError`` with phase pointers. +11. ``get_params()`` keys match ``__init__`` signature. +12. REGISTRY ticks tested indirectly via parity with the paper rules. 
+""" + +from __future__ import annotations + +import inspect +import warnings + +import numpy as np +import pandas as pd +import pytest + +from diff_diff.had import ( + HeterogeneousAdoptionDiD, + HeterogeneousAdoptionDiDResults, + _aggregate_first_difference, + _detect_design, + _fit_mass_point_2sls, + _validate_had_panel, +) +from diff_diff.local_linear import bias_corrected_local_linear +from tests.conftest import assert_nan_inference + +# ============================================================================= +# DGP helpers +# ============================================================================= + + +def _make_panel(d_post, delta_y, periods=(1, 2), extra_cols=None): + """Build a balanced two-period panel with ``D_{g,1} = 0``. + + Parameters + ---------- + d_post : np.ndarray, shape (G,) + Unit-level post-period dose ``D_{g,2}``. + delta_y : np.ndarray, shape (G,) + Unit-level first-difference outcome ``Y_{g,2} - Y_{g,1}``. + periods : tuple + (t_pre, t_post). + extra_cols : dict or None + Additional unit-constant columns (e.g., cluster variable). + """ + G = len(d_post) + t_pre, t_post = periods + units = np.arange(G) + df = pd.DataFrame( + { + "unit": np.repeat(units, 2), + "period": np.tile([t_pre, t_post], G), + "dose": np.column_stack([np.zeros(G), d_post]).ravel(), + # Set period-1 outcome to 0; period-2 outcome = delta_y so that + # Y_{g,2} - Y_{g,1} == delta_y exactly. 
+ "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(), + } + ) + if extra_cols: + for col, vals in extra_cols.items(): + df[col] = np.repeat(vals, 2) + return df + + +def _dgp_continuous_at_zero(G, seed): + """Design 1' DGP: uniform dose on [0, 1] with exact zero in the sample.""" + rng = np.random.default_rng(seed) + d = rng.uniform(0.0, 1.0, G) + d[0] = 0.0 # guarantee continuous_at_zero auto-detection + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + return d, dy + + +def _dgp_continuous_near_d_lower(G, seed): + """Design 1 continuous-near-d_lower DGP: Beta(2,2) shifted to [0.1, 1].""" + rng = np.random.default_rng(seed) + u = rng.beta(2, 2, G) + d = 0.1 + 0.9 * u + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + return d, dy + + +def _dgp_mass_point(G, seed, d_lower=0.5, mass_frac=0.3, beta=0.3): + """Mass-point DGP: ``mass_frac`` at d_lower, rest Uniform(d_lower, 1).""" + rng = np.random.default_rng(seed) + mass_n = int(mass_frac * G) + d = np.concatenate([np.full(mass_n, d_lower), rng.uniform(d_lower, 1.0, G - mass_n)]) + dy = beta * d + 0.1 * rng.standard_normal(G) + return d, dy + + +# ============================================================================= +# Criterion 1: Smoke tests - all 3 design paths produce finite output +# ============================================================================= + + +class TestSmokeAllDesigns: + def test_continuous_at_zero_finite(self): + d, dy = _dgp_continuous_at_zero(500, seed=42) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def test_continuous_near_d_lower_finite(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=42) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def 
test_mass_point_finite(self): + d, dy = _dgp_mass_point(500, seed=42) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + assert r.se > 0 + + def test_result_is_dataclass(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert isinstance(r, HeterogeneousAdoptionDiDResults) + + def test_continuous_populates_bandwidth_diagnostics(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.bandwidth_diagnostics is not None + assert r.bias_corrected_fit is not None + + def test_mass_point_nulls_bandwidth_diagnostics(self): + d, dy = _dgp_mass_point(400, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.bandwidth_diagnostics is None + assert r.bias_corrected_fit is None + assert r.n_mass_point is not None + assert r.n_above_d_lower is not None + + def test_continuous_nulls_mass_point_counts(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.n_mass_point is None + assert r.n_above_d_lower is None + + +# ============================================================================= +# Criterion 2: design="auto" detection rule +# ============================================================================= + + +class TestDesignAutoDetect: + def test_detect_design_1_prime_exact_zero(self): + d, _ = _dgp_continuous_at_zero(500, seed=0) + assert _detect_design(d) == "continuous_at_zero" + + def test_detect_design_continuous_near_d_lower(self): + d, _ = _dgp_continuous_near_d_lower(500, seed=0) + assert _detect_design(d) == "continuous_near_d_lower" + + def 
test_detect_mass_point(self): + d, _ = _dgp_mass_point(500, seed=0) + assert _detect_design(d) == "mass_point" + + def test_edge_small_mass_at_zero_resolves_continuous_at_zero(self): + """Plan criterion 2 edge-case (a): 3% at D=0 + 97% Uniform(0.5, 1).""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = int(0.03 * G) + d = np.concatenate([np.zeros(mass_n), rng.uniform(0.5, 1.0, G - mass_n)]) + assert _detect_design(d) == "continuous_at_zero" + + def test_edge_shifted_beta_not_small_enough_for_design_1_prime(self): + """Plan criterion 2 edge-case (b): d.min/median ~ 0.03 > 0.01 threshold.""" + rng = np.random.default_rng(0) + u = rng.beta(2, 2, 1000) + d = 0.03 + u + assert _detect_design(d) == "continuous_near_d_lower" + + def test_design_auto_dispatches_correctly_at_fit(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="auto").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.design == "continuous_at_zero" + + def test_design_auto_mass_point_at_fit(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="auto").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.design == "mass_point" + + def test_auto_does_not_mutate_self_design(self): + """Plan decision #14: self.design preserves raw 'auto' after fit.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + est = HeterogeneousAdoptionDiD(design="auto") + _ = est.fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert est.design == "auto" + assert est.get_params()["design"] == "auto" + + +# ============================================================================= +# Criterion 3: Beta-scale rescaling parity +# ============================================================================= + + +class TestBetaScaleRescaling: + def test_att_equals_tau_bc_over_dbar(self): + """result.att == bc_fit.tau_bc / D_bar at atol=1e-14.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + 
panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.estimate_bias_corrected) / d_bar + assert abs(r.att - expected) < 1e-14 + + def test_se_equals_se_rb_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.se_robust) / d_bar + assert abs(r.se - expected) < 1e-14 + + def test_ci_lower_equals_ci_low_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.ci_low) / d_bar + assert abs(r.conf_int[0] - expected) < 1e-14 + + def test_ci_upper_equals_ci_high_over_dbar(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.ci_high) / d_bar + assert abs(r.conf_int[1] - expected) < 1e-14 + + def test_rescaling_shifted_regressor_continuous_near_d_lower(self): + """continuous_near_d_lower: regressor shift `D - d_lower`, divisor = mean(D).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + d_lower_val = float(d.min()) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + d_reg = d - 
d_lower_val + bc = bias_corrected_local_linear(d=d_reg, y=dy, boundary=0.0, alpha=0.05) + d_bar = float(d.mean()) + expected = float(bc.estimate_bias_corrected) / d_bar + assert abs(r.att - expected) < 1e-14 + + def test_dose_mean_stored_on_result(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", "unit") + assert abs(r.dose_mean - float(d.mean())) < 1e-14 + + +# ============================================================================= +# Criterion 4: Mass-point Wald-IV point estimate parity +# ============================================================================= + + +class TestMassPointWaldIV: + def test_wald_iv_point_estimate(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + Z = (d > 0.5).astype(float) + expected = (dy[Z == 1].mean() - dy[Z == 0].mean()) / (d[Z == 1].mean() - d[Z == 0].mean()) + assert abs(r.att - expected) < 1e-14 + + def test_wald_iv_equals_2sls(self): + """Sanity: Wald-IV is exactly 2SLS for binary instrument.""" + d, dy = _dgp_mass_point(500, seed=7) + Z = (d > 0.5).astype(float).reshape(-1, 1) + # 2SLS via Z'X invert: beta = [(Z'X)^-1 Z'y][1] + X = np.column_stack([np.ones_like(d), d]) + Zd = np.column_stack([np.ones_like(d), Z.ravel()]) + beta_2sls = np.linalg.inv(Zd.T @ X) @ (Zd.T @ dy) + beta_wald = (dy[Z.ravel() == 1].mean() - dy[Z.ravel() == 0].mean()) / ( + d[Z.ravel() == 1].mean() - d[Z.ravel() == 0].mean() + ) + assert abs(float(beta_2sls[1]) - beta_wald) < 1e-12 + + def test_mass_point_n_counts_populated(self): + d, dy = _dgp_mass_point(500, seed=0, d_lower=0.5, mass_frac=0.3) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.n_mass_point == int(0.3 * 500) + assert r.n_above_d_lower == 500 - 
int(0.3 * 500) + assert r.n_treated == r.n_above_d_lower + assert r.n_control == r.n_mass_point + + +# ============================================================================= +# Criterion 5: Mass-point 2SLS SE sandwich parity +# ============================================================================= + + +def _manual_2sls_sandwich_se(d, dy, d_lower, vcov_type, cluster=None): + """Hand-coded textbook 2SLS sandwich using structural residuals. + + Returns se_beta for the coefficient on d. Mirrors the helper in had.py + but computed from scratch to serve as the parity reference. + """ + n = len(d) + Z = (d > d_lower).astype(np.float64) + dose_gap = d[Z == 1].mean() - d[Z == 0].mean() + dy_gap = dy[Z == 1].mean() - dy[Z == 0].mean() + beta = dy_gap / dose_gap + alpha_hat = dy.mean() - beta * d.mean() + u = dy - alpha_hat - beta * d # STRUCTURAL residuals + X = np.column_stack([np.ones(n), d]) + Zd = np.column_stack([np.ones(n), Z]) + ZtX_inv = np.linalg.inv(Zd.T @ X) + + if cluster is not None: + Omega = np.zeros((2, 2)) + clusters = pd.unique(cluster) + G = len(clusters) + for c in clusters: + idx = cluster == c + s = Zd[idx].T @ u[idx] + Omega += np.outer(s, s) + Omega *= (G / (G - 1)) * ((n - 1) / (n - 2)) + elif vcov_type == "classical": + sigma2 = (u * u).sum() / (n - 2) + Omega = sigma2 * (Zd.T @ Zd) + elif vcov_type == "hc1": + Omega = (n / (n - 2)) * (Zd.T @ ((u * u)[:, None] * Zd)) + else: + raise ValueError(f"unknown vcov_type={vcov_type}") + + V = ZtX_inv @ Omega @ ZtX_inv.T + return float(np.sqrt(V[1, 1])) + + +class TestMassPointSEParity: + def test_classical_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "classical") + assert abs(r.se - expected) < 1e-12 + + def test_hc1_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = 
_make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "hc1") + assert abs(r.se - expected) < 1e-12 + + def test_cr1_cluster_robust_parity(self): + d, dy = _dgp_mass_point(500, seed=0) + cluster_ids = np.tile(np.arange(50), 10) # 50 clusters of 10 units + panel = _make_panel(d, dy, extra_cols={"state": cluster_ids}) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + expected = _manual_2sls_sandwich_se(d, dy, 0.5, "hc1", cluster=cluster_ids) + assert abs(r.se - expected) < 1e-12 + + def test_robust_alias_maps_to_hc1(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r_robust = HeterogeneousAdoptionDiD(design="mass_point", robust=True).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_hc1 = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_robust.se == r_hc1.se + + def test_robust_false_maps_to_classical(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r_robust = HeterogeneousAdoptionDiD(design="mass_point", robust=False).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_classical = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_robust.se == r_classical.se + + def test_vcov_type_explicit_overrides_robust(self): + """When vcov_type is explicit, robust is ignored.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical", robust=True).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.vcov_type == "classical" + + +# ============================================================================= +# Criterion 6: hc2 / hc2_bm raise 
NotImplementedError +# ============================================================================= + + +class TestMassPointUnsupportedVcov: + def test_hc2_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2") + with pytest.raises(NotImplementedError, match="HC2"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_hc2_bm_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2_bm") + with pytest.raises(NotImplementedError, match="HC2"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_hc2_pointer_references_followup_pr(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc2") + with pytest.raises(NotImplementedError, match="follow-up"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_vcov_type_ignored_on_continuous(self): + """hc2 passed with continuous design emits warning, does not raise.""" + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", vcov_type="hc2") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("ignored" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + +# ============================================================================= +# Criterion 7: Panel-contract violations +# ============================================================================= + + +class TestPanelContract: + def test_missing_outcome_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, 
"missing", "dose", "period", "unit") + + def test_missing_dose_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "missing", "period", "unit") + + def test_missing_time_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "dose", "missing", "unit") + + def test_missing_unit_col_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="column"): + est.fit(panel, "outcome", "dose", "period", "missing") + + def test_nonzero_pre_period_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[panel["period"] == 1, "dose"] = 0.5 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"D_\{g,1\}|pre-period"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_unbalanced_panel_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy).iloc[:-1] # drop one row + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"[Uu]nbalanced|[Bb]alanced"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_three_periods_without_first_treat_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel2 = _make_panel(d, dy) + panel3 = pd.concat([panel2, panel2.assign(period=3)]) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"two time periods|Phase 2b"): + est.fit(panel3, "outcome", "dose", "period", "unit") + + def test_three_periods_with_first_treat_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel2 = _make_panel(d, dy) + panel3 = pd.concat([panel2, panel2.assign(period=3)]) 
+ panel3["ft"] = 2 # arbitrary first_treat + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"two time periods|Phase 2b"): + est.fit( + panel3, + "outcome", + "dose", + "period", + "unit", + first_treat_col="ft", + ) + + def test_single_period_raises(self): + d, _ = _dgp_continuous_at_zero(200, seed=0) + panel = pd.DataFrame( + { + "unit": np.arange(200), + "period": 2, + "dose": d, + "outcome": np.zeros(200), + } + ) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="two-period"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_outcome_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[0, "outcome"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel.loc[3, "dose"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_duplicate_unit_period_raises(self): + """Two observations of the same unit-period cell.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel = pd.concat([panel, panel.iloc[[0]]]) # duplicate first row + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"[Uu]nbalanced|observation"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_first_treat_col_invalid_cohort_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Set first_treat values to {0, 5}, where 5 is not t_post. 
+ ft_unit = np.where(np.arange(200) % 2 == 0, 0, 5) + panel["ft"] = np.repeat(ft_unit, 2) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"first_treat_col"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + first_treat_col="ft", + ) + + +# ============================================================================= +# Criterion 8: NaN propagation +# ============================================================================= + + +class TestNaNPropagation: + def test_constant_y_produces_nan_inference(self): + """Constant outcome -> zero residuals -> NaN via safe_inference.""" + d, _ = _dgp_continuous_at_zero(500, seed=0) + dy_zero = np.zeros_like(d) + panel = _make_panel(d, dy_zero) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + # All inference fields NaN when SE is non-finite. + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_mass_point_all_above_d_lower_nan(self): + """Degenerate mass-point: no units at d_lower -> NaN.""" + rng = np.random.default_rng(0) + G = 500 + d = rng.uniform(0.6, 1.0, G) # all above 0.5 + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.att) + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_mass_point_all_at_d_lower_nan(self): + """Degenerate mass-point: all units at d_lower -> NaN.""" + rng = np.random.default_rng(0) + G = 500 + d = np.full(G, 0.5) # all at 0.5 + dy = 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + # Avoid triggering pre-period D=0 check by starting at 0.5 at t2. 
+ r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.att) + assert_nan_inference( + { + "se": r.se, + "t_stat": r.t_stat, + "p_value": r.p_value, + "conf_int": r.conf_int, + } + ) + + def test_helper_returns_nan_on_empty_z_one(self): + """_fit_mass_point_2sls returns NaN when no units above d_lower.""" + d = np.full(50, 0.5) + dy = np.random.default_rng(0).standard_normal(50) + beta, se = _fit_mass_point_2sls(d, dy, 0.5, None, "hc1") + assert np.isnan(beta) + assert np.isnan(se) + + def test_helper_returns_nan_on_empty_z_zero(self): + """_fit_mass_point_2sls returns NaN when no units at d_lower.""" + d = np.full(50, 0.6) # all strictly above d_lower=0.5 + dy = np.random.default_rng(0).standard_normal(50) + beta, se = _fit_mass_point_2sls(d, dy, 0.5, None, "hc1") + assert np.isnan(beta) + assert np.isnan(se) + + def test_single_cluster_cr1_returns_nan(self): + """CR1 with only 1 cluster is undefined -> NaN.""" + rng = np.random.default_rng(0) + G = 100 + d = np.concatenate([np.full(30, 0.5), rng.uniform(0.5, 1.0, G - 30)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy, extra_cols={"state": np.zeros(G, dtype=int)}) # single cluster + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isnan(r.se) + + +# ============================================================================= +# Criterion 9: sklearn clone round-trip + fit idempotence +# ============================================================================= + + +class TestSklearnCompat: + def test_get_params_returns_all_constructor_args(self): + est = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", + d_lower=0.3, + kernel="triangular", + alpha=0.1, + vcov_type="hc1", + robust=True, + cluster="state", + ) + params = est.get_params() + assert params == { + "design": "continuous_near_d_lower", + 
"d_lower": 0.3, + "kernel": "triangular", + "alpha": 0.1, + "vcov_type": "hc1", + "robust": True, + "cluster": "state", + } + + def test_clone_round_trip(self): + est = HeterogeneousAdoptionDiD(design="auto", alpha=0.1, kernel="triangular") + est2 = HeterogeneousAdoptionDiD(**est.get_params()) + assert est.get_params() == est2.get_params() + + def test_fit_idempotent_same_att(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + r1 = est.fit(panel, "outcome", "dose", "period", "unit") + r2 = est.fit(panel, "outcome", "dose", "period", "unit") + assert r1.att == r2.att + assert r1.se == r2.se + assert r1.conf_int == r2.conf_int + + def test_set_params_updates_and_returns_self(self): + est = HeterogeneousAdoptionDiD() + ret = est.set_params(alpha=0.1, design="continuous_at_zero") + assert ret is est + assert est.alpha == 0.1 + assert est.design == "continuous_at_zero" + + def test_set_params_invalid_key_raises(self): + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(not_a_param=True) + + def test_set_params_invalid_design_raises(self): + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="design"): + est.set_params(design="made_up") + + +# ============================================================================= +# Criterion 10: Scaffolding raises +# ============================================================================= + + +class TestScaffoldingRejections: + def test_aggregate_event_study_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="2b"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + aggregate="event_study", + ) + + def test_aggregate_invalid_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = 
HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid aggregate"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + aggregate="garbage", + ) + + def test_survey_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="survey"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + survey="anything", + ) + + def test_weights_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD() + with pytest.raises(NotImplementedError, match="weights"): + est.fit( + panel, + "outcome", + "dose", + "period", + "unit", + weights=np.ones(200), + ) + + +# ============================================================================= +# Criterion 11: get_params signature enumeration +# ============================================================================= + + +class TestGetParamsContract: + def test_get_params_matches_init_signature(self): + sig_params = set(inspect.signature(HeterogeneousAdoptionDiD.__init__).parameters.keys()) - { + "self" + } + gp_params = set(HeterogeneousAdoptionDiD().get_params().keys()) + assert sig_params == gp_params + + def test_set_params_covers_all_init_params(self): + """Every __init__ param must be settable via set_params.""" + est = HeterogeneousAdoptionDiD() + params = est.get_params() + # Round-trip via set_params + new_est = HeterogeneousAdoptionDiD() + new_est.set_params(**params) + assert new_est.get_params() == params + + +# ============================================================================= +# Result class methods +# ============================================================================= + + +class TestResultMethods: + def _result(self): + d, dy = _dgp_continuous_at_zero(400, seed=0) + panel = _make_panel(d, dy) + return HeterogeneousAdoptionDiD().fit(panel, "outcome", "dose", "period", 
"unit") + + def test_summary_returns_string(self): + r = self._result() + s = r.summary() + assert isinstance(s, str) + assert "HeterogeneousAdoptionDiD" in s + assert "WAS" in s + assert "Confidence Interval" in s + + def test_print_summary_executes(self, capsys): + r = self._result() + r.print_summary() + captured = capsys.readouterr() + assert "HeterogeneousAdoptionDiD" in captured.out + + def test_to_dict_populated(self): + r = self._result() + d = r.to_dict() + assert "att" in d + assert "se" in d + assert "design" in d + assert "target_parameter" in d + assert "d_lower" in d + assert "dose_mean" in d + assert "n_obs" in d + assert d["design"] == "continuous_at_zero" + + def test_to_dataframe_populated(self): + r = self._result() + df = r.to_dataframe() + assert isinstance(df, pd.DataFrame) + assert len(df) == 1 + assert "att" in df.columns + + def test_repr_concise(self): + r = self._result() + s = repr(r) + assert "HeterogeneousAdoptionDiDResults" in s + assert "att=" in s + assert "design=" in s + + def test_mass_point_summary_includes_mass_count(self): + d, dy = _dgp_mass_point(400, seed=0) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + s = r.summary() + assert "mass point" in s.lower() or "At d_lower" in s + + +# ============================================================================= +# Design metadata +# ============================================================================= + + +class TestDesignMetadata: + def test_target_parameter_design_1_prime(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS" + + def test_target_parameter_design_1(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), 
"outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS_d_lower" + + def test_target_parameter_mass_point(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.target_parameter == "WAS_d_lower" + + def test_d_lower_zero_for_design_1_prime(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.d_lower == 0.0 + + def test_d_lower_from_data_for_continuous_near(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert abs(r.d_lower - float(d.min())) < 1e-14 + + def test_d_lower_explicit_override(self): + """d_lower override must satisfy d.min() >= d_lower (else negative shifted doses).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + # d.min() is around 0.1 + epsilon for this DGP; override within that. 
+ d_lower_user = float(d.min()) # explicit but equal to default + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=d_lower_user).fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert abs(r.d_lower - d_lower_user) < 1e-14 + + def test_inference_method_continuous(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.inference_method == "analytical_nonparametric" + + def test_inference_method_mass_point(self): + d, dy = _dgp_mass_point(500, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.inference_method == "analytical_2sls" + + def test_survey_metadata_always_none_in_phase_2a(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD().fit(_make_panel(d, dy), "outcome", "dose", "period", "unit") + assert r.survey_metadata is None + + def test_alpha_stored_on_result(self): + d, dy = _dgp_continuous_at_zero(500, seed=0) + r = HeterogeneousAdoptionDiD(alpha=0.1).fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.alpha == 0.1 + + +# ============================================================================= +# Constructor validation +# ============================================================================= + + +class TestConstructorValidation: + def test_invalid_design_raises(self): + with pytest.raises(ValueError, match="Invalid design"): + HeterogeneousAdoptionDiD(design="random_garbage") + + def test_alpha_zero_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=0.0) + + def test_alpha_one_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=1.0) + + def test_alpha_negative_raises(self): + with pytest.raises(ValueError, match="alpha"): + HeterogeneousAdoptionDiD(alpha=-0.05) + + def 
test_invalid_vcov_type_raises(self): + with pytest.raises(ValueError, match="vcov_type"): + HeterogeneousAdoptionDiD(vcov_type="garbage") + + def test_vcov_type_none_accepted(self): + est = HeterogeneousAdoptionDiD(vcov_type=None) + assert est.vcov_type is None + + def test_d_lower_none_accepted(self): + est = HeterogeneousAdoptionDiD(d_lower=None) + assert est.d_lower is None + + def test_d_lower_float_accepted(self): + est = HeterogeneousAdoptionDiD(d_lower=0.3) + assert est.d_lower == 0.3 + + +# ============================================================================= +# Explicit design override (don't auto-reject) +# ============================================================================= + + +class TestExplicitDesignOverrides: + def test_force_continuous_at_zero_on_mass_point_data(self): + """Forcing Design 1' on mass-point data should run (may produce wide CIs).""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + # Phase 1c's _validate_had_inputs would reject this (mass point), + # so this will raise NotImplementedError from underneath, NOT from had.py. + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.raises(NotImplementedError, match="mass-point"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_mass_point_on_continuous_data(self): + """Forcing mass-point on continuous data runs the 2SLS on the minimum cluster.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + # d.min() == 0 exactly; mass-point with d_lower=0 will treat the + # zero-dose unit (just 1 unit) as the mass cluster and the rest as + # "above". This exercises the degenerate mass=1 case. 
+ est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.design == "mass_point" + + +# ============================================================================= +# Cluster handling (unit-level aggregation) +# ============================================================================= + + +class TestClusterHandling: + def test_cluster_not_constant_within_unit_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + panel = _make_panel(d, dy) + # Make cluster vary within unit + panel["state"] = np.arange(len(panel)) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match=r"constant within unit"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_missing_cluster_column_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="missing") + with pytest.raises(ValueError, match="cluster"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_nan_cluster_raises(self): + d, dy = _dgp_mass_point(100, seed=0) + # Unit-level cluster ids: 50 clusters, 2 units each, with NaN on unit 0. 
+ cluster_unit = np.repeat(np.arange(50).astype(float), 2) # length 100 + cluster_unit[0] = np.nan + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_cluster_warns_on_continuous_path(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + cluster_unit = np.repeat(np.arange(100), 2) # length 200 unit-level + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_cluster_name_populated_mass_point(self): + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) # 50 clusters, 4 units each + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.cluster_name == "state" + + def test_cluster_name_none_without_cluster(self): + d, dy = _dgp_mass_point(200, seed=0) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + _make_panel(d, dy), "outcome", "dose", "period", "unit" + ) + assert r.cluster_name is None + + +# ============================================================================= +# First-difference aggregation helper +# ============================================================================= + + +class TestFirstDifferenceAggregation: + def test_aggregate_returns_sorted_unit_order(self): + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy) + # Shuffle rows to test sort-invariance + panel_shuffled = panel.sample(frac=1, 
random_state=42).reset_index(drop=True) + d_arr, dy_arr, _, unit_ids = _aggregate_first_difference( + panel_shuffled, "outcome", "dose", "period", "unit", 1, 2, None + ) + # unit_ids sorted + assert np.all(np.diff(unit_ids) >= 0) + # Each dose matches the input dose for its unit + for i, uid in enumerate(unit_ids): + assert abs(d_arr[i] - d[uid]) < 1e-14 + assert abs(dy_arr[i] - dy[uid]) < 1e-14 + + def test_aggregate_cluster_array_correct(self): + d, dy = _dgp_mass_point(100, seed=0) + cluster_unit = np.repeat(np.arange(25), 4) # 25 clusters, 4 units each + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + _, _, cluster_arr, unit_ids = _aggregate_first_difference( + panel, + "outcome", + "dose", + "period", + "unit", + 1, + 2, + "state", + ) + assert cluster_arr is not None + assert len(cluster_arr) == 100 + # Cluster_arr[i] should equal cluster_unit[unit_ids[i]] + for i, uid in enumerate(unit_ids): + assert cluster_arr[i] == cluster_unit[uid] + + def test_aggregate_no_cluster_returns_none(self): + d, dy = _dgp_continuous_at_zero(50, seed=0) + panel = _make_panel(d, dy) + _, _, cluster_arr, _ = _aggregate_first_difference( + panel, "outcome", "dose", "period", "unit", 1, 2, None + ) + assert cluster_arr is None + + +# ============================================================================= +# Auto-detect mass-point vs continuous-near at boundary +# ============================================================================= + + +class TestAutoDetectEdges: + def test_exactly_two_percent_modal_is_not_mass_point(self): + """Threshold is strict >, not >=. 
2% exactly should stay continuous.""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 20 # exactly 2% + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + # d.min() == 0.5, not 0, and modal fraction == 2% (not > 2%) + assert _detect_design(d) == "continuous_near_d_lower" + + def test_slightly_over_two_percent_is_mass_point(self): + rng = np.random.default_rng(0) + G = 1000 + mass_n = 25 # 2.5% + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + assert _detect_design(d) == "mass_point" + + def test_all_at_zero_resolves_continuous_at_zero(self): + """Degenerate but well-defined: all zeros -> continuous_at_zero.""" + d = np.zeros(100) + assert _detect_design(d) == "continuous_at_zero" + + +# ============================================================================= +# Panel validator direct tests +# ============================================================================= + + +class TestValidateHadPanel: + def test_returns_period_pair(self): + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=(2020, 2021)) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == 2020 + assert t_post == 2021 + + def test_accepts_string_periods(self): + """String periods should still sort and validate (no rejection).""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # Should not raise - strings sort fine + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "A" + assert t_post == "B" From 77b34663e9b3318f984420c0b49d1975dc7f9703 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:19:26 -0400 Subject: [PATCH 02/17] Enforce d_lower == d.min() contract on HAD mass-point path Paper Section 3.2.4 defines the Design 1 mass-point estimator with instrument Z = 1{D_{g,2} > d_lower} at d_lower = float(d.min()) (the lower-support
mass point). Phase 2a previously accepted user-supplied d_lower overrides on the mass-point path without validation, which would silently redefine the instrument/control split and identify a different (LATE-like) estimand outside Phase 2a's documented scope. Changes: - diff_diff/had.py: fit() now raises ValueError on the mass-point path when an explicit d_lower differs from float(d.min()) beyond float tolerance. d_lower=None (auto-resolve) and d_lower == d.min() (within tolerance) remain supported. Continuous paths are unaffected - they already reject off-support d_lower via Phase 1c's _validate_had_inputs (negative-dose check after the regressor shift). - tests/test_had.py: new TestMassPointDLowerContract class adds 6 rejection/acceptance tests for the contract. The prior test_force_mass_point_on_continuous_data test is renamed and clarified to document that d_lower=0.0 matches d.min()==0.0 in that DGP. The stale all-above-d_lower NaN-propagation test is removed (the input is now unreachable from the public API; the helper-level NaN guard is still tested via test_helper_returns_nan_on_empty_z_zero). Smaller fixes: - _validate_had_panel: first_treat_col comment no longer overstates the cross-validation (Phase 2a does value-domain only, not dose consistency). - _fit_continuous dispatch: "Warn once" comment revised to "Warn" (per-fit warnings are not suppressed across calls in Phase 2a). Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 41 +++++++++++++++++---- tests/test_had.py | 92 ++++++++++++++++++++++++++++++++++------------- 2 files changed, 101 insertions(+), 32 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 2398ad29..778036d9 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -423,11 +423,13 @@ def _validate_had_panel( f"t_pre={t_pre}. Drop these units or verify the dose column." 
) - # Optional cross-validation via first_treat_col: if supplied, the - # first_treat == 0 units must have D_{g, t_post} == 0, and first_treat - # == t_post units may have any D_{g, t_post} >= 0. We do NOT require - # strict consistency (the paper treats D_{g,2} as the primary signal), - # but a quick sanity check catches obvious column mix-ups. + # Optional value-domain validation via first_treat_col: if supplied, + # every unit's first_treat value must be in {0, t_post} (0 = never + # treated, t_post = treated in the second period). This is a value- + # domain check that catches typos and staggered-timing mix-ups; it + # does NOT cross-validate first_treat against post-period dose + # (D_{g, t_post} remains the primary signal). Extended cross-checks + # are queued for a follow-up PR. if first_treat_col is not None: ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() if np.any(np.isnan(ft.astype(np.float64, copy=False))): @@ -987,6 +989,30 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Mass-point contract: d_lower must equal the support infimum ---- + # Paper Section 3.2.4 defines the mass-point estimator with instrument + # Z = 1{D_{g,2} > d_lower} where d_lower is the lower-support mass + # point. A user-supplied d_lower that differs from float(d_arr.min()) + # redefines the instrument / control split and identifies a different + # estimand. Phase 2a supports only the paper's lower-support mass-point + # estimator, so we reject mismatched overrides front-door rather than + # silently running an unsupported variant. + if resolved_design == "mass_point" and d_lower_arg is not None: + d_min = float(d_arr.min()) + tol = 1e-12 * max(1.0, abs(d_min)) + if abs(d_lower_val - d_min) > tol: + raise ValueError( + f"design='mass_point' requires d_lower to equal the " + f"support infimum float(d.min())={d_min!r}; got " + f"d_lower={d_lower_val!r}. 
The paper's Design 1 mass-" + f"point estimator (Section 3.2.4) identifies at the " + f"lower-support mass point, not at an arbitrary " + f"threshold. Pass d_lower=None to auto-resolve, or " + f"d_lower=float(d.min()) explicitly. Non-support-" + f"infimum thresholds identify a different (LATE-like) " + f"estimand that is out of Phase 2a scope." + ) + # ---- Compute cohort counts ---- if resolved_design == "mass_point": eps = 1e-12 * max(1.0, abs(d_lower_val)) @@ -1007,8 +1033,9 @@ def fit( # ---- Dispatch ---- if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): - # Warn once if the user set a mass-point-only knob that's - # ignored on the continuous path. + # Warn when the user set a mass-point-only knob that's ignored + # on the continuous path. (Emitted per fit call; this is not + # suppressed after the first call.) if vcov_type_arg is not None: warnings.warn( f"vcov_type={vcov_type_arg!r} is ignored on the " diff --git a/tests/test_had.py b/tests/test_had.py index dedcc1ac..e1f6a7db 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -635,26 +635,6 @@ def test_constant_y_produces_nan_inference(self): } ) - def test_mass_point_all_above_d_lower_nan(self): - """Degenerate mass-point: no units at d_lower -> NaN.""" - rng = np.random.default_rng(0) - G = 500 - d = rng.uniform(0.6, 1.0, G) # all above 0.5 - dy = 0.3 * d + 0.1 * rng.standard_normal(G) - panel = _make_panel(d, dy) - r = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5).fit( - panel, "outcome", "dose", "period", "unit" - ) - assert np.isnan(r.att) - assert_nan_inference( - { - "se": r.se, - "t_stat": r.t_stat, - "p_value": r.p_value, - "conf_int": r.conf_int, - } - ) - def test_mass_point_all_at_d_lower_nan(self): """Degenerate mass-point: all units at d_lower -> NaN.""" rng = np.random.default_rng(0) @@ -1043,16 +1023,78 @@ def test_force_continuous_at_zero_on_mass_point_data(self): with pytest.raises(NotImplementedError, match="mass-point"): est.fit(panel, 
"outcome", "dose", "period", "unit") - def test_force_mass_point_on_continuous_data(self): - """Forcing mass-point on continuous data runs the 2SLS on the minimum cluster.""" + def test_force_mass_point_on_continuous_data_at_support_infimum(self): + """Forcing mass-point on continuous data with d_lower=d.min() runs. + + d.min()==0 exactly in this DGP, so d_lower=0.0 is the paper-consistent + support-infimum threshold. The resulting mass=1 case exercises the + degenerate-mass boundary (only unit 0 is "at d_lower"; rest are above). + """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) - # d.min() == 0 exactly; mass-point with d_lower=0 will treat the - # zero-dose unit (just 1 unit) as the mass cluster and the rest as - # "above". This exercises the degenerate mass=1 case. + # d.min() == 0 exactly (d[0]=0 by construction); d_lower must match. est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) r = est.fit(panel, "outcome", "dose", "period", "unit") assert r.design == "mass_point" + assert r.d_lower == 0.0 + + +# ============================================================================= +# Mass-point d_lower contract enforcement +# ============================================================================= + + +class TestMassPointDLowerContract: + """Paper Section 3.2.4: mass-point d_lower must equal float(d.min()).""" + + def test_d_lower_above_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.6) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_d_lower_below_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.3) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") 
+ + def test_d_lower_matches_support_infimum_succeeds(self): + d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.d_lower == 0.5 + assert np.isfinite(r.att) + + def test_d_lower_none_auto_resolves_to_min(self): + """With d_lower=None, the contract is satisfied automatically.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert abs(r.d_lower - float(d.min())) < 1e-14 + + def test_d_lower_within_tolerance_succeeds(self): + """Float-tolerance overrides of d.min() are accepted.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + # Pass d_lower that differs from d.min() by float-rounding noise. + d_lower_user = float(d.min()) + 1e-15 + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_user) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert np.isfinite(r.att) + + def test_d_lower_contract_is_mass_point_only(self): + """continuous_near_d_lower accepts arbitrary d_lower (within other guards).""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + # Setting d_lower=d.min() on continuous should just work. 
+ est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert np.isfinite(r.att) # ============================================================================= From 687a61573ffc91bdf4d3bbdcab237eb50c452409 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 17:20:41 -0400 Subject: [PATCH 03/17] Clarify continuous-path mass-point-guard test docstring Rename test_d_lower_contract_is_mass_point_only -> test_mass_point_equality_guard_does_not_fire_on_continuous and rewrite the docstring so it accurately describes what's exercised. Previously claimed "arbitrary d_lower" but only tests d_lower=d.min(); the renamed test now narrates that the mass-point-specific ValueError does not fire on the continuous path (the continuous path has its own upstream negative-dose guard after the regressor shift). Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_had.py | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tests/test_had.py b/tests/test_had.py index e1f6a7db..9afe4457 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1087,11 +1087,16 @@ def test_d_lower_within_tolerance_succeeds(self): r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) - def test_d_lower_contract_is_mass_point_only(self): - """continuous_near_d_lower accepts arbitrary d_lower (within other guards).""" + def test_mass_point_equality_guard_does_not_fire_on_continuous(self): + """The mass-point-specific d_lower equality guard is not enforced on + continuous_near_d_lower. (The continuous path has its own upstream + Phase 1c guards against off-support d_lower via the negative-dose + check after the regressor shift, but the mass-point-specific + ValueError does not fire here.) + """ d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) - # Setting d_lower=d.min() on continuous should just work. 
+ # d_lower == d.min() is always a valid continuous configuration. est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) From 8c38f072c0e6d2a8fb31715aa51cff9db5fa1cec Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 18:44:36 -0400 Subject: [PATCH 04/17] Fix P0 HAD continuous estimator formula + P1 validator + docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address CI AI review on PR #346. **P0 (Methodology — estimator formula was wrong):** The continuous paths computed `att = tau_bc / D_bar` instead of the paper's `att = (mean(ΔY) - tau_bc) / den`. On a known DGP with β = 0.3 the old code returned ~0 (since tau_bc ≈ lim_{d↓d̲} E[ΔY|D≤d] ≈ 0); the fix recovers the true β at n=4000. - Design 1' (continuous_at_zero), paper Equation 7 / Theorem 3: β = (E[ΔY] - lim_{d↓0} E[ΔY | D_2 ≤ d]) / E[D_2] Implementation: `att = (dy_arr.mean() - tau_bc) / d_arr.mean()`, `se = se_robust / |d_arr.mean()|`. - Design 1 (continuous_near_d_lower), paper Theorem 4 / `WAS_{d_lower}` under Assumption 6: β = (E[ΔY] - lim_{d↓d_lower} E[ΔY | D_2 ≤ d]) / E[D_2 - d_lower] Implementation: regressor shift `D - d_lower`, evaluate local-linear fit at boundary=0 on the shifted scale; `att = (dy_arr.mean() - tau_bc) / (d_arr - d_lower).mean()`. CI endpoints reverse under the subtraction `(ΔȲ - tau_bc)`; safe_inference computes `att ± z · se` from scratch so reversal is automatic. **P1 (Validator — incomplete contract on continuous paths):** - `_validate_had_panel` now rejects negative post-period doses (`D_{g,2} < 0`) front-door on the ORIGINAL unshifted scale so errors reference the user's dose column, not shifted values. - `continuous_near_d_lower` now enforces `d_lower == float(d.min())` within float tolerance, mirroring the mass-point guard. 
Off-support `d_lower` would otherwise pass through Phase 1c's 5% plausibility heuristic and silently identify a different estimand. **P1 (Identification — Assumption 5/6 not surfaced):** Design 1 paths (continuous_near_d_lower + mass_point) now emit a UserWarning stating that `WAS_{d_lower}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. continuous_at_zero (Design 1', Assumption 3 only) does not emit the warning. **P2 (vcov_type docstring mismatch):** Corrected the constructor docstring to reflect actual behavior: `vcov_type=None` falls back to the `robust` flag (`robust=True -> hc1`, `robust=False -> classical`, the default); explicit `vcov_type` takes precedence over `robust`. **Tests rewritten + new regressions:** - `TestBetaScaleRescaling` now pins the corrected formula at atol=1e-14 (both designs), pins the CI endpoint reversal, and adds two "recover true β" asymptotic sanity tests. - New `TestDesign1DLowerContract` covers mass-point AND continuous_near_d_lower d_lower equality enforcement. - New `TestPostPeriodDoseContract` covers negative-dose rejection. - New `TestAssumptionFiveSixWarning` verifies the Design 1 warning is emitted on continuous_near_d_lower + mass_point and NOT on continuous_at_zero. **Docs:** REGISTRY.md HAD Phase 2a section updated with the corrected estimator formula, the d_lower contract on both Design 1 paths, the Assumption 5/6 warning note, and a CI-endpoint-reversal note. had.py module and result-class docstrings updated in kind. Targeted regression: 118 HAD tests + 497 total across Phase 1 and adjacent surfaces, all green. 
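The P0 formula fix above can be sanity-checked with a minimal standalone sketch (illustrative only, not the library code: a plain windowed OLS intercept stands in for Phase 1c's `bias_corrected_local_linear`, and all variable names here are hypothetical):

```python
import numpy as np

# Synthetic Design 1'-style DGP: true beta = 0.3, and the boundary limit
# lim_{d v 0} E[dY | D <= d] = 0, so the old formula tau_bc / mean(d)
# returns ~0 while the corrected (mean(dy) - tau_bc) / mean(d) recovers beta.
rng = np.random.default_rng(0)
G = 4000
d = rng.uniform(0.0, 1.0, G)
dy = 0.3 * d + 0.05 * rng.standard_normal(G)

# Crude stand-in for the bias-corrected local-linear boundary estimate:
# OLS intercept on a narrow window near d = 0.
window = d < 0.1
X = np.column_stack([np.ones(window.sum()), d[window]])
tau_bc = np.linalg.lstsq(X, dy[window], rcond=None)[0][0]

att_old = tau_bc / d.mean()                # old (wrong) formula: ~0
att_new = (dy.mean() - tau_bc) / d.mean()  # corrected formula: ~0.3
```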
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 236 ++++++++++++++++++++++++--------- docs/methodology/REGISTRY.md | 6 +- tests/test_had.py | 247 +++++++++++++++++++++++++++-------- 3 files changed, 371 insertions(+), 118 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 778036d9..01c98400 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -2,25 +2,39 @@ Heterogeneous Adoption Difference-in-Differences (HAD) estimator (Phase 2a). Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) -Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: +Weighted-Average-Slope (WAS) estimator with three design-dispatch paths. +All three paths produce a beta-scale point estimate of the form +``(mean(Delta Y) - [boundary limit]) / [expected dose gap]`` (Design 1 +family) or the Wald-IV ratio (mass-point), then route inference through +:func:`diff_diff.utils.safe_inference`. 1. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density - continuous at zero. Evaluates the bias-corrected local-linear fit at - ``boundary = 0`` and rescales to the beta-scale via Equation 8's divisor - ``D_bar = (1/G) * sum(D_{g,2})``. Invokes Assumption 3. + continuous at zero, Assumption 3. Equation 7 / Theorem 3: + + beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] + + The bias-corrected local-linear fit at ``boundary = 0`` estimates + the boundary limit; ``E[D_2]`` and ``E[Delta Y]`` are plugin sample + means. 2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): - ``d_lower > 0``, continuous boundary density. Evaluates the bias-corrected - local-linear fit at ``boundary = float(d.min())`` after the regressor - shift ``D' = D - d_lower``. Same divisor, same CI path. - Invokes Assumption 5 or 6. - -3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction at - ``d.min()`` exceeds 2%. Uses a sample-average 2SLS estimator with - instrument ``Z_g = 1{D_{g,2} > d_lower}``. 
Point estimate reduces to the - Wald-IV ratio - ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard error - via the structural-residual 2SLS sandwich (see ``_fit_mass_point_2sls``). + ``d_lower > 0``, continuous boundary density, Assumption 5 or 6. + Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + + beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) + / E[D_2 - d_lower] + + The local-linear fit is anchored at ``d_lower`` via the regressor + shift ``D' = D - d_lower``, evaluated at ``boundary = 0`` on the + shifted scale. + +3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction + at ``d.min()`` exceeds 2%. Sample-average 2SLS with instrument + ``Z_g = 1{D_{g,2} > d_lower}`` (paper Section 3.2.4). Point estimate + equals the Wald-IV ratio + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard + error via the structural-residual 2SLS sandwich + (see ``_fit_mass_point_2sls``). Phase 2a ships the single-period path only; the multi-period event-study extension (paper Appendix B.2) is queued for Phase 2b. @@ -119,15 +133,28 @@ class HeterogeneousAdoptionDiDResults: Attributes ---------- att : float - Point estimate of the WAS parameter on the beta-scale. For the - continuous designs: ``tau_bc / D_bar`` where ``tau_bc`` is the - bias-corrected local-linear statistic on the mu-scale and - ``D_bar = (1/G) * sum(D_{g,2})``. For the mass-point design: - the 2SLS coefficient directly. + Point estimate of the WAS parameter on the beta-scale. + + - Design 1' (paper Equation 7 / Theorem 3): + ``att = (mean(ΔY) - tau_bc) / D_bar`` + where ``tau_bc`` is the bias-corrected local-linear estimate + of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and + ``D_bar = (1/G) * sum(D_{g,2})``. 
+ - Design 1 continuous-near-d_lower (paper Theorem 4, + ``WAS_{d_lower}`` under Assumption 6): + ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)`` + where ``tau_bc`` is the bias-corrected local-linear estimate + of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``. + - Mass-point (paper Section 3.2.4): the Wald-IV / 2SLS + coefficient directly - + ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. se : float Standard error on the beta-scale. For continuous designs, the - CCT-2014 robust SE divided by ``D_bar``. For mass-point, the 2SLS - structural-residual sandwich SE. + CCT-2014 robust SE from Phase 1c divided by ``|den|`` (the + absolute denominator used in ``att``); the higher-order + variance from ``mean(ΔY)`` is dominated by the nonparametric + boundary estimate in large samples and is not included. For + mass-point, the 2SLS structural-residual sandwich SE. t_stat, p_value, conf_int : inference fields Routed through ``safe_inference``; NaN when SE is non-finite. alpha : float @@ -423,6 +450,26 @@ def _validate_had_panel( f"t_pre={t_pre}. Drop these units or verify the dose column." ) + # Post-period nonnegative-dose check on the ORIGINAL (unshifted) dose + # scale. Front-door rejection per paper Assumption (dose definition + # Section 2) which treats D_{g,2} as nonnegative. Without this + # check, negative original doses would only surface after the + # regressor shift in ``_fit_continuous`` via Phase 1c's + # ``_validate_had_inputs``, which references the shifted values + # and would confuse users about which column is malformed. + post_mask = data[time_col] == t_post + post_doses = np.asarray(data.loc[post_mask, dose_col], dtype=np.float64) + neg_post = post_doses < 0 + if neg_post.any(): + n_neg = int(neg_post.sum()) + min_neg = float(post_doses[neg_post].min()) + raise ValueError( + f"HAD requires D_{{g,2}} >= 0 for all units (paper Section " + f"2 dose definition). 
{n_neg} unit(s) have negative post-" + f"period dose at t_post={t_post} (min={min_neg!r}). Drop " + f"these units or verify the dose column." + ) + # Optional value-domain validation via first_treat_col: if supplied, # every unit's first_treat value must be in {0, t_post} (0 = never # treated, t_post = treated in the second period). This is a value- @@ -766,15 +813,21 @@ class HeterogeneousAdoptionDiD: alpha : float CI level (0.05 for 95% CI). vcov_type : {"classical", "hc1"} or None - Mass-point-path only. ``None`` defaults to ``"hc1"``. - ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``. Ignored - on the continuous paths (which use the CCT-2014 robust SE from + Mass-point-path only. When ``None``, the effective family falls + back to the ``robust`` flag: ``robust=True`` -> ``"hc1"``, + ``robust=False`` -> ``"classical"`` (the default construction). + Explicit ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError`` + pending a 2SLS-specific leverage derivation. Ignored on the + continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default ``vcov_type`` on a continuous - path emits a one-time ``UserWarning``. + path emits a ``UserWarning`` per fit call. robust : bool - Backward-compat alias: ``True`` maps to ``"hc1"``, ``False`` maps - to ``"classical"``. Only relevant when ``vcov_type is None`` on - the mass-point path. + Backward-compat alias used only when ``vcov_type is None``: + ``True`` -> ``"hc1"``, ``False`` -> ``"classical"``. Explicit + ``vcov_type`` takes precedence (e.g., + ``vcov_type="classical", robust=True`` runs classical). Only + the mass-point path consumes these; continuous paths ignore + both with a warning. cluster : str or None Column name for cluster-robust SE on the mass-point path (CR1). 
Ignored with a ``UserWarning`` on the continuous paths in Phase @@ -989,28 +1042,35 @@ def fit( else: d_lower_val = float(d_lower_arg) - # ---- Mass-point contract: d_lower must equal the support infimum ---- - # Paper Section 3.2.4 defines the mass-point estimator with instrument - # Z = 1{D_{g,2} > d_lower} where d_lower is the lower-support mass - # point. A user-supplied d_lower that differs from float(d_arr.min()) - # redefines the instrument / control split and identifies a different - # estimand. Phase 2a supports only the paper's lower-support mass-point - # estimator, so we reject mismatched overrides front-door rather than - # silently running an unsupported variant. - if resolved_design == "mass_point" and d_lower_arg is not None: + # ---- d_lower contract for Design 1 paths ---- + # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at + # d_lower = support infimum of D_{g,2}. For the mass-point path, + # the instrument Z = 1{D_{g,2} > d_lower} requires d_lower to be + # the lower-support mass point. For the continuous-near-d_lower + # path, evaluating the local-linear fit after the regressor + # shift (D - d_lower) at boundary=0 only makes sense when the + # shift anchors to the realized sample minimum; otherwise the + # boundary evaluation is off-support (no observations near zero + # on the shifted scale) and Phase 1c's 5% plausibility heuristic + # may fail to catch the mismatch. We enforce d_lower == d.min() + # within float tolerance on both Design 1 paths; mismatched + # overrides raise with a clear pointer to the unsupported + # estimand. + if resolved_design in ("mass_point", "continuous_near_d_lower") and d_lower_arg is not None: d_min = float(d_arr.min()) tol = 1e-12 * max(1.0, abs(d_min)) if abs(d_lower_val - d_min) > tol: raise ValueError( - f"design='mass_point' requires d_lower to equal the " - f"support infimum float(d.min())={d_min!r}; got " - f"d_lower={d_lower_val!r}. 
The paper's Design 1 mass-" - f"point estimator (Section 3.2.4) identifies at the " - f"lower-support mass point, not at an arbitrary " + f"design={resolved_design!r} requires d_lower to equal " + f"the support infimum float(d.min())={d_min!r}; got " + f"d_lower={d_lower_val!r}. The paper's Design 1 " + f"estimators (Sections 3.2.2-3.2.4) identify at the " + f"lower-support boundary, not at an arbitrary " f"threshold. Pass d_lower=None to auto-resolve, or " f"d_lower=float(d.min()) explicitly. Non-support-" - f"infimum thresholds identify a different (LATE-like) " - f"estimand that is out of Phase 2a scope." + f"infimum thresholds identify a different (LATE-like " + f"for mass_point, off-support for continuous_near_" + f"d_lower) estimand that is out of Phase 2a scope." ) # ---- Compute cohort counts ---- @@ -1031,6 +1091,26 @@ def fit( dose_mean = float(d_arr.mean()) + # ---- Assumption 5/6 warning on Design 1 paths ---- + # Paper Sections 3.2.2-3.2.4: when d_lower > 0 (Design 1 family), + # point identification of WAS_{d_lower} requires Assumption 6 in + # addition to parallel trends (Assumption 1-3); Assumption 5 gives + # only sign identification. These extra assumptions are NOT + # testable via pre-trends. Surface this to the user front-door so + # results are not silently interpreted as full point identification. + if resolved_design in ("continuous_near_d_lower", "mass_point"): + warnings.warn( + f"design={resolved_design!r} (Design 1, d_lower > 0) requires " + f"Assumption 6 from de Chaisemartin et al. (2026) for point " + f"identification of WAS_{{d_lower}}, or Assumption 5 for " + f"sign identification only. Neither is testable via " + f"pre-trends. 
Confirm the extra assumption is defensible " + f"for your setting before interpreting the returned " + f"point estimate as the WAS.", + UserWarning, + stacklevel=2, + ) + # ---- Dispatch ---- if resolved_design in ("continuous_at_zero", "continuous_near_d_lower"): # Warn when the user set a mass-point-only knob that's ignored @@ -1125,28 +1205,53 @@ def _fit_continuous( resolved_design: str, d_lower_val: float, ) -> Tuple[float, float, Optional[BiasCorrectedFit], Optional[BandwidthResult]]: - """Fit Phase 1c ``bias_corrected_local_linear`` and rescale to beta-scale. + """Fit Phase 1c ``bias_corrected_local_linear`` and form the WAS estimate. + + Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau + (2026) continuous-design estimators: + + - Design 1' (``continuous_at_zero``), paper Equation 7 / + Theorem 3: + + beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] + + Regressor passed to the local-linear boundary fit is + ``d_arr``; the boundary is ``0``. - Both continuous designs share the same rescaling path - they - differ only in the regressor shift and boundary: + - Design 1 (``continuous_near_d_lower``), paper Proposition 3 / + Theorem 4 (``WAS_{d_lower}`` under Assumption 6): - - ``continuous_at_zero``: regressor ``d_arr``, boundary ``0``. - - ``continuous_near_d_lower``: regressor ``d_arr - d_lower``, - boundary ``0``. + beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) + / E[D_2 - d_lower] - The mu-scale output ``tau_bc`` from - :func:`bias_corrected_local_linear` is divided by the - beta-scale divisor ``D_bar = (1/G) * sum(D_{g,2})`` (Equation 8 - in the paper). The same divisor applies to the classical CI - endpoints. The result of ``D_bar`` uses the ORIGINAL - (unshifted) dose mean, matching paper convention. + Regressor passed to the local-linear fit is + ``d_arr - d_lower`` so the boundary evaluation point is ``0`` + on the shifted scale; the numerator and denominator are + computed on the original scale. 
+ + The bias-corrected boundary estimate ``tau_bc`` from + :func:`bias_corrected_local_linear` corresponds to the limit + ``lim_{d v d_lower} E[Delta Y | D_2 <= d]``. The beta-scale + estimator and standard error are then: + + att = (mean(dy) - tau_bc) / den + se = se_robust / |den| + + where ``den`` is the expectation in the denominator (``D_bar`` + for Design 1', ``mean(D_2 - d_lower)`` for Design 1). The + confidence interval is ``att +/- z * se`` computed in + :func:`diff_diff.utils.safe_inference`; the endpoints reverse + relative to the boundary-limit CI because the numerator + transformation is ``ΔȲ - tau_bc``. """ if resolved_design == "continuous_at_zero": d_reg = d_arr boundary = 0.0 + den = float(d_arr.mean()) elif resolved_design == "continuous_near_d_lower": d_reg = d_arr - d_lower_val boundary = 0.0 + den = float((d_arr - d_lower_val).mean()) else: raise ValueError( f"_fit_continuous called with non-continuous " f"design={resolved_design!r}" @@ -1174,15 +1279,18 @@ def _fit_continuous( except (ZeroDivisionError, FloatingPointError, np.linalg.LinAlgError): return float("nan"), float("nan"), None, None - dose_mean = float(d_arr.mean()) - # Phase 1b / Phase 1c pre-checks enforce dose_mean > 0 for - # Design 1-family samples, but defense-in-depth: if dose_mean - # rounds to zero, fall through to NaN via safe_inference in fit(). - if abs(dose_mean) < 1e-12: + # Guard against degenerate denominators: if all units are at + # d_lower (continuous_near_d_lower) or if D_bar rounds to zero + # (continuous_at_zero with a vanishing sample mean), the beta- + # scale estimator is undefined. Fall through to NaN via + # safe_inference in fit(). 
+ if abs(den) < 1e-12: att = float("nan") se = float("nan") else: - att = float(bc_fit.estimate_bias_corrected) / dose_mean - se = float(bc_fit.se_robust) / dose_mean + dy_mean = float(dy_arr.mean()) + tau_bc = float(bc_fit.estimate_bias_corrected) + att = (dy_mean - tau_bc) / den + se = float(bc_fit.se_robust) / abs(den) return att, se, bc_fit, bc_fit.bandwidth_diagnostics diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index d9e1aba3..9944477c 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,11 +2316,13 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). 
The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. 
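The β-scale rescaling that Phase 2's `HeterogeneousAdoptionDiD.fit()` applies on top of the Phase 1c boundary fit is a monotone-decreasing transform of the boundary limit, which is why the CI endpoints reverse. A minimal standalone NumPy sketch (the helper name `beta_scale_rescale` is hypothetical, and the scalar inputs stand in for a Phase 1c `BiasCorrectedFit`; this is not the shipped `_fit_continuous`):

```python
import numpy as np


def beta_scale_rescale(dy, tau_bc, se_robust, ci_low, ci_high, den):
    """Rescale a boundary-limit fit (tau_bc scale) to the beta scale.

    att = (mean(dy) - tau_bc) / den and se = se_robust / |den|.
    For den > 0 the map x -> (mean(dy) - x) / den is decreasing, so the
    beta-scale CI lower bound comes from the boundary CI UPPER bound
    and vice versa.
    """
    dy_mean = float(np.mean(dy))
    att = (dy_mean - tau_bc) / den
    se = se_robust / abs(den)
    ci = ((dy_mean - ci_high) / den, (dy_mean - ci_low) / den)  # assumes den > 0
    return att, se, ci
```

With `dy` averaging 0.2, `tau_bc = 0.05`, boundary CI `(0.01, 0.09)`, and `den = 0.5`, this yields `att = 0.3` with beta-scale CI `(0.22, 0.38)`.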
Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. 
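The Equation 8 bias-corrected CI described above, `tau.bc ± z_{1-α/2} * se.rb` with the critical value from `scipy.stats.norm.ppf`, can be sketched in isolation (hypothetical helper name; the shipped path routes through `safe_inference` rather than a bare function like this):

```python
from scipy.stats import norm


def bias_corrected_ci(tau_bc, se_rb, alpha=0.05):
    # Robust bias-corrected CI: tau.bc +/- z_{1-alpha/2} * se.rb, with the
    # critical value computed via scipy.stats.norm.ppf (matching the Python
    # wrapper) rather than a hard-coded copy of R's qnorm value.
    z = norm.ppf(1.0 - alpha / 2.0)
    return tau_bc - z * se_rb, tau_bc + z * se_rb
```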
-- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and apply the β-scale rescaling `(1/G) Σ D_{g,2}` from Equation 8 of de Chaisemartin et al. (2026); mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d_lower}`. +- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. -- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units; rejects `>2` time periods (staggered reduction queued for Phase 2b); rejects unbalanced panels and NaN in outcome/dose/unit columns. 
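The strict first-match auto-detection rule above can be sketched as a standalone function (hypothetical name `detect_design`; the shipped `diff_diff.had._detect_design` may differ in details such as the exact mass-point tolerance):

```python
import numpy as np


def detect_design(d, zero_ratio=0.01, mass_threshold=0.02):
    """Strict first-match design detection, per the registry rule:

    1. d.min() == 0 exactly (unconditional tie-break), or
       d.min() < zero_ratio * median(d)  ->  "continuous_at_zero"
    2. modal fraction at d.min() > mass_threshold  ->  "mass_point"
    3. otherwise  ->  "continuous_near_d_lower"
    """
    d = np.asarray(d, dtype=float)
    d_min = float(d.min())
    if d_min == 0.0 or d_min < zero_ratio * float(np.median(d)):
        return "continuous_at_zero"
    # Modal-min check only runs when d.min() > 0 (ensured above).
    eps = 1e-12 * max(1.0, abs(d_min))
    modal_fraction = float(np.mean(np.abs(d - d_min) <= eps))
    if modal_fraction > mass_threshold:
        return "mass_point"
    return "continuous_near_d_lower"
```

The edge case from the registry entry falls out of rule 1: a sample with any exact zeros resolves to `continuous_at_zero` before the modal-fraction check can fire.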
+- [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. - [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. + - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. 
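The structural-residual 2SLS sandwich from the mass-point SE note above can be illustrated with a self-contained sketch (hypothetical helper name and layout — regressors `X = [1, D]`, instruments `Z = [1, 1{D > d_lower}]`; the shipped `_fit_mass_point_2sls` may organize this differently):

```python
import numpy as np


def mass_point_2sls(d, dy, d_lower, vcov="hc1"):
    """Just-identified 2SLS with instrument z = 1{D > d_lower}.

    The slope equals the Wald-IV ratio; the SE comes from the
    structural-residual sandwich [Z'X]^{-1} Omega [Z'X]^{-T}.
    """
    d = np.asarray(d, dtype=float)
    dy = np.asarray(dy, dtype=float)
    n = d.size
    z = (d > d_lower).astype(float)
    Z = np.column_stack([np.ones(n), z])   # instruments
    X = np.column_stack([np.ones(n), d])   # regressors
    zx_inv = np.linalg.inv(Z.T @ X)
    alpha, beta = zx_inv @ (Z.T @ dy)
    u = dy - alpha - beta * d              # structural, NOT reduced-form, residuals
    if vcov == "classical":
        omega = (u @ u / (n - 2)) * (Z.T @ Z)
    elif vcov == "hc1":
        zu = Z * u[:, None]
        omega = (n / (n - 2)) * (zu.T @ zu)  # n/(n-k) * Z' diag(u^2) Z
    else:
        # hc2 / hc2_bm need a 2SLS-specific leverage derivation; see note.
        raise NotImplementedError(vcov)
    V = zx_inv @ omega @ zx_inv.T          # [Z'X]^-1 Omega [Z'X]^-T
    return float(beta), float(np.sqrt(V[1, 1]))
```

With a binary instrument and an intercept, the 2SLS slope reduces algebraically to the Wald ratio `(mean(ΔY|z=1) - mean(ΔY|z=0)) / (mean(D|z=1) - mean(D|z=0))`, which is what the atol=1e-14 Wald-IV parity tests exploit.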
+ - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. - [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. - [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). diff --git a/tests/test_had.py b/tests/test_had.py index 9afe4457..b8e0152f 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -4,8 +4,13 @@ 1. All three design paths produce a finite result on synthetic DGPs. 2. ``design="auto"`` resolves correctly on each DGP + two edge cases. -3. Beta-scale rescaling: ``att == tau_bc / D_bar`` and CI endpoints == - ``(ci_low/D_bar, ci_high/D_bar)`` at ``atol=1e-14``. +3. Beta-scale WAS estimator at atol=1e-14: + - Design 1' / continuous_at_zero: + ``att = (mean(ΔY) - tau_bc) / mean(D)`` + - Design 1 / continuous_near_d_lower: + ``att = (mean(ΔY) - tau_bc) / mean(D - d_lower)`` + - CI endpoints reverse under subtraction: + ``CI_lower(att) = (mean(ΔY) - CI_upper_boundary) / den`` 4. Mass-point Wald-IV point estimate matches manual formula at ``atol=1e-14``. 5. 
Mass-point 2SLS SE parity against hand-coded sandwich at @@ -231,8 +236,19 @@ def test_auto_does_not_mutate_self_design(self): class TestBetaScaleRescaling: - def test_att_equals_tau_bc_over_dbar(self): - """result.att == bc_fit.tau_bc / D_bar at atol=1e-14.""" + """Plan commit criterion #3 + review P0: the continuous estimator is + + att = (mean(ΔY) - tau_bc) / den + + with ``den = mean(D)`` for Design 1' and ``den = mean(D - d_lower)`` + for Design 1 continuous-near-d_lower. SE is ``se_robust / |den|``. + CI endpoints are computed via ``att +/- z * se`` (endpoints reverse + relative to the boundary-limit CI because the numerator is + ``ΔȲ - tau_bc``). + """ + + def test_att_design_1_prime(self): + """att = (mean(ΔY) - tau_bc) / D_bar for Design 1' at atol=1e-14.""" d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( @@ -240,32 +256,27 @@ def test_att_equals_tau_bc_over_dbar(self): ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) d_bar = float(d.mean()) - expected = float(bc.estimate_bias_corrected) / d_bar + dy_mean = float(dy.mean()) + expected = (dy_mean - float(bc.estimate_bias_corrected)) / d_bar assert abs(r.att - expected) < 1e-14 - def test_se_equals_se_rb_over_dbar(self): + def test_se_design_1_prime(self): + """se = se_robust / |D_bar| for Design 1' at atol=1e-14.""" d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( panel, "outcome", "dose", "period", "unit" ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.se_robust) / d_bar + expected = float(bc.se_robust) / abs(float(d.mean())) assert abs(r.se - expected) < 1e-14 - def test_ci_lower_equals_ci_low_over_dbar(self): - d, dy = _dgp_continuous_at_zero(500, seed=0) - panel = _make_panel(d, dy) - r = 
HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( - panel, "outcome", "dose", "period", "unit" - ) - bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.ci_low) / d_bar - assert abs(r.conf_int[0] - expected) < 1e-14 + def test_ci_endpoints_reverse_under_subtraction(self): + """Because att = (ΔȲ - tau_bc)/D_bar, CI endpoints reverse: - def test_ci_upper_equals_ci_high_over_dbar(self): + CI_lower(att) = (ΔȲ - CI_upper_boundary) / D_bar + CI_upper(att) = (ΔȲ - CI_lower_boundary) / D_bar + """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( @@ -273,23 +284,61 @@ def test_ci_upper_equals_ci_high_over_dbar(self): ) bc = bias_corrected_local_linear(d=d, y=dy, boundary=0.0, alpha=0.05) d_bar = float(d.mean()) - expected = float(bc.ci_high) / d_bar - assert abs(r.conf_int[1] - expected) < 1e-14 - - def test_rescaling_shifted_regressor_continuous_near_d_lower(self): - """continuous_near_d_lower: regressor shift `D - d_lower`, divisor = mean(D).""" + dy_mean = float(dy.mean()) + # CI bounds on the att scale, computed by endpoint reversal from + # the boundary-limit CI. 
+ expected_lower = (dy_mean - float(bc.ci_high)) / d_bar + expected_upper = (dy_mean - float(bc.ci_low)) / d_bar + assert abs(r.conf_int[0] - expected_lower) < 1e-14 + assert abs(r.conf_int[1] - expected_upper) < 1e-14 + + def test_att_design_1_continuous_near_d_lower(self): + """att = (mean(ΔY) - tau_bc) / mean(D - d_lower) for Design 1 at atol=1e-14.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) d_lower_val = float(d.min()) - r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( - panel, "outcome", "dose", "period", "unit" - ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) d_reg = d - d_lower_val bc = bias_corrected_local_linear(d=d_reg, y=dy, boundary=0.0, alpha=0.05) - d_bar = float(d.mean()) - expected = float(bc.estimate_bias_corrected) / d_bar + den = float((d - d_lower_val).mean()) + dy_mean = float(dy.mean()) + expected = (dy_mean - float(bc.estimate_bias_corrected)) / den assert abs(r.att - expected) < 1e-14 + def test_att_recovers_true_beta_design_1_prime(self): + """Sanity: on a known DGP with beta=0.3, att should be close to 0.3.""" + rng = np.random.default_rng(0) + G = 2000 + d = rng.uniform(0, 1, G) + d[0] = 0.0 + dy = 0.3 * d + 0.05 * rng.standard_normal(G) + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + # Asymptotic: expect att close to 0.3 at G=2000, n=4000 observations. 
+ assert abs(r.att - 0.3) < 0.1 + + def test_att_recovers_true_beta_continuous_near_d_lower(self): + """Sanity: Design 1 DGP with beta_d_lower=0.3 recovers beta at scale.""" + rng = np.random.default_rng(0) + G = 2000 + u = rng.beta(2, 2, G) + d = 0.1 + 0.9 * u # d_lower ~ 0.1 + # True WAS_{d_lower} = 0.3 since dy = 0.3 * (d - d_lower) + noise + dy = 0.3 * (d - 0.1) + 0.05 * rng.standard_normal(G) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert abs(r.att - 0.3) < 0.1 + def test_dose_mean_stored_on_result(self): d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) @@ -1040,68 +1089,162 @@ def test_force_mass_point_on_continuous_data_at_support_infimum(self): # ============================================================================= -# Mass-point d_lower contract enforcement +# Design 1 d_lower contract enforcement (mass-point + continuous_near_d_lower) # ============================================================================= -class TestMassPointDLowerContract: - """Paper Section 3.2.4: mass-point d_lower must equal float(d.min()).""" +class TestDesign1DLowerContract: + """Paper Sections 3.2.2-3.2.4: Design 1 estimators identify at the support + infimum. Both mass_point and continuous_near_d_lower require + ``d_lower == float(d.min())`` within float tolerance; mismatched + overrides raise. 
+ """ - def test_d_lower_above_min_raises(self): + def test_mass_point_d_lower_above_min_raises(self): d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.6) with pytest.raises(ValueError, match="support infimum"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_d_lower_below_min_raises(self): - d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + def test_mass_point_d_lower_below_min_raises(self): + d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.3) with pytest.raises(ValueError, match="support infimum"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_d_lower_matches_support_infimum_succeeds(self): - d, dy = _dgp_mass_point(500, seed=0) # d.min() == 0.5 + def test_mass_point_d_lower_matches_succeeds(self): + d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.5) r = est.fit(panel, "outcome", "dose", "period", "unit") assert r.d_lower == 0.5 assert np.isfinite(r.att) - def test_d_lower_none_auto_resolves_to_min(self): - """With d_lower=None, the contract is satisfied automatically.""" + def test_mass_point_d_lower_none_auto_resolves_to_min(self): d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) r = est.fit(panel, "outcome", "dose", "period", "unit") assert abs(r.d_lower - float(d.min())) < 1e-14 - def test_d_lower_within_tolerance_succeeds(self): - """Float-tolerance overrides of d.min() are accepted.""" + def test_mass_point_d_lower_within_tolerance_succeeds(self): d, dy = _dgp_mass_point(500, seed=0) panel = _make_panel(d, dy) - # Pass d_lower that differs from d.min() by float-rounding noise. 
d_lower_user = float(d.min()) + 1e-15 est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_user) r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) - def test_mass_point_equality_guard_does_not_fire_on_continuous(self): - """The mass-point-specific d_lower equality guard is not enforced on - continuous_near_d_lower. (The continuous path has its own upstream - Phase 1c guards against off-support d_lower via the negative-dose - check after the regressor shift, but the mass-point-specific - ValueError does not fire here.) - """ + def test_continuous_near_d_lower_above_min_raises(self): + """Review P1: continuous_near_d_lower must also enforce support infimum.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) panel = _make_panel(d, dy) - # d_lower == d.min() is always a valid continuous configuration. - est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) - r = est.fit(panel, "outcome", "dose", "period", "unit") + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=0.3) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_d_lower_below_min_raises(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + # d.min() for this Beta DGP is > 0.1 but setting d_lower=0.05 is below min. 
+ est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=0.05) + with pytest.raises(ValueError, match="support infimum"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_d_lower_matches_succeeds(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower", d_lower=float(d.min())) + r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) +# ============================================================================= +# Post-period dose non-negative contract (review P1) +# ============================================================================= + + +class TestPostPeriodDoseContract: + """Paper Section 2 dose definition: D_{g,2} >= 0. _validate_had_panel + rejects negative post-period dose front-door on the ORIGINAL scale + (before the regressor shift) so the error references the user's + dose column, not the Phase 1c shifted values. + """ + + def test_negative_post_dose_raises(self): + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Inject a negative post-period dose on one unit. + post_mask = panel["period"] == 2 + idx = panel[post_mask].index[0] + panel.loc[idx, "dose"] = -0.1 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"D_\{g,2\}|negative post"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_zero_post_dose_accepted(self): + """D_{g,2} == 0 is the Design 1' no-treated-group case, always allowed.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + # Ensure d[0] == 0 exactly (no-treated unit) is accepted. 
+ assert d[0] == 0.0 + panel = _make_panel(d, dy) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + +# ============================================================================= +# Design 1 Assumption 5/6 identification warning (review P1) +# ============================================================================= + + +class TestAssumptionFiveSixWarning: + """Paper Sections 3.2.2-3.2.4: Design 1 fits require Assumption 5 (sign + identification) or Assumption 6 (point identification of WAS_{d_lower}) + beyond parallel trends. These extras are not pre-trend testable. A + UserWarning surfaces the identification burden on Design 1 fits. + """ + + def test_continuous_near_d_lower_emits_assumption_warning(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption" in str(warn.message)] + assert len(assumption_warnings) >= 1 + + def test_mass_point_emits_assumption_warning(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption" in str(warn.message)] + assert len(assumption_warnings) >= 1 + + def test_continuous_at_zero_does_not_emit_assumption_warning(self): + """Design 1' (d_lower=0) is identified under Assumption 3 only; no warning.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + 
HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assumption_warnings = [warn for warn in w if "Assumption 6" in str(warn.message)] + assert len(assumption_warnings) == 0 + + # ============================================================================= # Cluster handling (unit-level aggregation) # ============================================================================= From c320ddeb6bb6a8da191667a034a108734e66d8d5 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 18:58:08 -0400 Subject: [PATCH 05/17] Address PR #346 CI review round 2: 2 P1s + 1 P2 **P1 #1 (Methodology): continuous_near_d_lower on mass-point samples** When a user explicitly forced design="continuous_near_d_lower" on a sample that actually satisfies the >2% modal-fraction mass-point criterion, the downstream regressor shift (D - d_lower) would move the support minimum to zero on the shifted scale. Phase 1c's mass-point rejection guard only fires when d.min() > 0 (_validate_had_inputs), so the silent coercion ran the nonparametric local-linear estimator on a sample the paper (Section 3.2.4) requires to use the 2SLS branch, producing the wrong estimand. Fix: `HeterogeneousAdoptionDiD.fit()` now runs the modal-fraction check on the ORIGINAL (unshifted) d_arr when the user explicitly selects design="continuous_near_d_lower". If the fraction at d.min() exceeds 2%, the fit raises ValueError pointing to design="mass_point" or design="auto". design="auto" is unaffected (_detect_design already correctly resolves such samples to mass_point). **P1 #2 (Code Quality): first_treat_col validator not dtype-agnostic** The previous validator called `.astype(np.float64)` and `int(v)` on grouped first_treat values, which crashed on otherwise-supported string-labelled two-period panels (period in {"A","B"}, first_treat in {0, "B"}). 
Rewrote using `pd.isna()` for missingness and raw-value set-membership against `{0, t_post}` with no numeric coercion. **P2 (Maintainability): cluster-applied mass-point stored wrong vcov_type** When cluster was supplied, `_fit_mass_point_2sls` unconditionally switches to the CR1 cluster-robust sandwich, but the result object stored the REQUESTED family ("hc1" or "classical") as `vcov_type`. `summary()` rendered correctly via the cluster_name branch, but `to_dict()` and downstream programmatic consumers saw the stale requested label. Fixed: when cluster is supplied, `vcov_type` is stored as `"cr1"` regardless of the requested family. Renamed the local variable from `vcov_effective` to `vcov_requested` to separate the input from the effective family. Updated the `HeterogeneousAdoptionDiDResults.summary()` branch so the cluster rendering still works with the new stored value. **Tests added (+8 regression):** - TestValidateHadPanel.test_first_treat_col_with_string_periods - TestValidateHadPanel.test_first_treat_col_dtype_agnostic_rejects_invalid_string - TestContinuousPathRejectsMassPoint (2 tests) - TestMassPointClusterLabel (4 tests: cr1 stored when clustered, base family when unclustered, classical+cluster collapses to cr1, to_dict shows effective family) Targeted regression: 126 HAD tests + 505 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 74 +++++++++++++++++++++++------ tests/test_had.py | 115 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 174 insertions(+), 15 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 01c98400..e931c9d2 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -189,10 +189,15 @@ class HeterogeneousAdoptionDiDResults: ``"analytical_nonparametric"`` (continuous designs) or ``"analytical_2sls"`` (mass-point). vcov_type : str or None - Variance-covariance family used. 
``None`` on continuous paths - (they use the CCT-2014 robust SE from Phase 1c, not the library's - ``vcov_type`` enum). Mass-point: one of ``"classical"`` or - ``"hc1"`` (CR1 when ``cluster`` is supplied). + Effective variance-covariance family used. ``None`` on continuous + paths (they use the CCT-2014 robust SE from Phase 1c, not the + library's ``vcov_type`` enum). Mass-point: ``"classical"`` or + ``"hc1"`` when ``cluster`` is not supplied, and ``"cr1"`` + whenever ``cluster`` is supplied (cluster-robust CR1 is computed + regardless of the requested ``vcov_type`` because + classical/hc1 + cluster collapses to the same CR1 sandwich). + Downstream consumers reading ``result.to_dict()`` can inspect + this field directly to determine the effective SE family. cluster_name : str or None Column name of the cluster variable on the mass-point path when cluster-robust SE is requested. ``None`` otherwise. @@ -269,9 +274,12 @@ def summary(self) -> str: lines.append(f"{'Strictly above d_lower:':<30} {self.n_above_d_lower:>20}") lines.append(f"{'Inference method:':<30} {self.inference_method:>20}") if self.vcov_type is not None: - label = self.vcov_type if self.cluster_name is not None: + # Cluster-robust (CR1): the stored vcov_type is already "cr1", + # but render with the cluster column for clarity. label = f"CR1 at {self.cluster_name}" + else: + label = self.vcov_type lines.append(f"{'Variance:':<30} {label:>20}") if self.bandwidth_diagnostics is not None: bw = self.bandwidth_diagnostics @@ -476,20 +484,24 @@ def _validate_had_panel( # domain check that catches typos and staggered-timing mix-ups; it # does NOT cross-validate first_treat against post-period dose # (D_{g, t_post} remains the primary signal). Extended cross-checks - # are queued for a follow-up PR. + # are queued for a follow-up PR. 
The check is DTYPE-AGNOSTIC: it uses + # pd.isna() for missingness and raw-value membership against + # {0, t_post} so that string-labelled periods (e.g., ("A", "B")) with + # first_treat in {0, "B"} are supported. if first_treat_col is not None: - ft = data.groupby(unit_col)[first_treat_col].first().to_numpy() - if np.any(np.isnan(ft.astype(np.float64, copy=False))): + ft_series = data.groupby(unit_col)[first_treat_col].first() + if bool(ft_series.isna().any()): raise ValueError( f"first_treat_col={first_treat_col!r} contains NaN. " f"Use 0 for never-treated units and t_post for treated." ) valid_values = {0, t_post} - bad = [v for v in np.unique(ft) if int(v) not in valid_values] + observed = set(ft_series.unique().tolist()) + bad = sorted(observed - valid_values, key=lambda x: str(x)) if bad: raise ValueError( f"first_treat_col={first_treat_col!r} contains value(s) " - f"{bad} outside the allowed set {{0, {t_post}}} for a " + f"{bad} outside the allowed set {{0, {t_post!r}}} for a " f"two-period HAD panel. Staggered timing with multiple " f"cohorts is Phase 2b." ) @@ -1042,6 +1054,34 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Original-scale mass-point check before the regressor shift ---- + # When a user explicitly forces design="continuous_near_d_lower" + # on a sample that is actually a mass-point sample (modal fraction + # at d.min() > 2% per paper Section 3.2.4), the downstream code + # would shift D - d_lower, making the support minimum zero on the + # shifted scale and suppressing the Phase 1c mass-point rejection + # guard (_validate_had_inputs only checks modal fraction when + # d.min() > 0). That silent coercion produces wrong CIs for a + # mass-point sample by running the nonparametric estimator where + # the paper's 2SLS branch is required. Front-door reject. 
+ if resolved_design == "continuous_near_d_lower": + d_min_orig = float(d_arr.min()) + if d_min_orig > 0: + eps_mp = 1e-12 * max(1.0, abs(d_min_orig)) + at_d_min_mask_orig = np.abs(d_arr - d_min_orig) <= eps_mp + modal_fraction_orig = float(np.mean(at_d_min_mask_orig)) + if modal_fraction_orig > _MASS_POINT_THRESHOLD: + raise ValueError( + f"design='continuous_near_d_lower' cannot be used on a " + f"mass-point sample (modal fraction {modal_fraction_orig:.4f} " + f"at d.min()={d_min_orig!r} exceeds the " + f"{_MASS_POINT_THRESHOLD:.2f} threshold from paper Section " + f"3.2.4). Use design='mass_point' (Wald-IV / 2SLS) or " + f"design='auto' which will auto-detect. Forcing the " + f"continuous path on a mass-point sample would produce " + f"the wrong estimand." + ) + # ---- d_lower contract for Design 1 paths ---- # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at # d_lower = support infimum of D_{g,2}. For the mass-point path, @@ -1148,21 +1188,25 @@ def fit( elif resolved_design == "mass_point": if vcov_type_arg is None: # Backward-compat: robust=True -> hc1, robust=False -> classical. - vcov_effective = "hc1" if robust_arg else "classical" + vcov_requested = "hc1" if robust_arg else "classical" else: - vcov_effective = vcov_type_arg.lower() - # Explicit cluster + hc1 becomes CR1 inside _fit_mass_point_2sls. + vcov_requested = vcov_type_arg.lower() att, se = _fit_mass_point_2sls( d_arr, dy_arr, d_lower_val, cluster_arr, - vcov_effective, + vcov_requested, ) bc_fit = None bw_diag = None inference_method = "analytical_2sls" - vcov_label = vcov_effective + # Store the EFFECTIVE variance family so downstream consumers + # (to_dict, to_dataframe, summary) see the actual SE type that + # was computed. When cluster is supplied, _fit_mass_point_2sls + # unconditionally computes CR1 regardless of vcov_requested + # (e.g. classical+cluster -> CR1), so we surface that here. 
+ vcov_label = "cr1" if cluster_arg is not None else vcov_requested cluster_label = cluster_arg if cluster_arg is not None else None else: raise ValueError(f"Internal error: unhandled design={resolved_design!r}.") diff --git a/tests/test_had.py b/tests/test_had.py index b8e0152f..19abe7b7 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1404,3 +1404,118 @@ def test_rejects_string_periods_gracefully(self): t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) assert t_pre == "A" assert t_post == "B" + + def test_first_treat_col_with_string_periods(self): + """Review P1: first_treat_col validator must be dtype-agnostic. + + With string periods ("A", "B") and first_treat_col values in + {0, "B"}, the validator must not attempt numeric coercion. + """ + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # 50 units never-treated (first_treat=0), 50 treated (first_treat="B") + ft_unit = np.array([0 if i % 2 == 0 else "B" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + assert t_pre == "A" + assert t_post == "B" + + def test_first_treat_col_dtype_agnostic_rejects_invalid_string(self): + """Mix string periods + invalid first_treat_col string -> ValueError.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("A", "B")) + # Invalid: "Z" is neither 0 nor "B" + ft_unit = np.array([0 if i % 2 == 0 else "Z" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + with pytest.raises(ValueError, match="first_treat_col"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + + +# ============================================================================= +# Review P1: continuous_near_d_lower on a true mass-point sample rejects +# ============================================================================= + 
+ +class TestContinuousPathRejectsMassPoint: + """Explicit override to continuous_near_d_lower on a mass-point sample + must raise before the regressor shift, otherwise the Phase 1c + mass-point guard (which fires only on d.min() > 0) is bypassed. + """ + + def test_continuous_near_on_mass_point_sample_raises(self): + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower") + with pytest.raises(ValueError, match=r"mass-point sample|mass_point"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_near_on_continuous_sample_runs(self): + """Sanity: the pre-shift check does NOT reject valid continuous samples.""" + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + +# ============================================================================= +# Review P2: cluster-applied mass-point stores vcov_type="cr1" +# ============================================================================= + + +class TestMassPointClusterLabel: + def test_cluster_stores_cr1(self): + """to_dict() / downstream consumers see 'cr1' not 'hc1' when clustered.""" + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r.vcov_type == "cr1" + assert r.cluster_name == "state" + + def test_no_cluster_stores_base_family(self): + """Unclustered mass-point keeps 'hc1' or 'classical' label.""" + d, dy = _dgp_mass_point(200, seed=0) + panel = 
_make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_hc1 = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1").fit( + panel, "outcome", "dose", "period", "unit" + ) + r_cl = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="classical").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_hc1.vcov_type == "hc1" + assert r_cl.vcov_type == "classical" + + def test_cluster_with_classical_collapses_to_cr1(self): + """classical + cluster is CR1 in practice; label reflects that.""" + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD( + design="mass_point", vcov_type="classical", cluster="state" + ).fit(panel, "outcome", "dose", "period", "unit") + assert r.vcov_type == "cr1" + + def test_to_dict_shows_effective_family(self): + d, dy = _dgp_mass_point(200, seed=0) + cluster_unit = np.repeat(np.arange(50), 4) + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point", cluster="state").fit( + panel, "outcome", "dose", "period", "unit" + ) + result_dict = r.to_dict() + assert result_dict["vcov_type"] == "cr1" + assert result_dict["cluster_name"] == "state" From ddc09e4341f528862de4a62505032d314fa8deea Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:11:40 -0400 Subject: [PATCH 06/17] Address PR #346 CI review round 3: P1 period inference + P2 summary label **P1 (Methodology): _validate_had_panel inferred pre/post by lexicographic sort** Previously the validator sorted the two period labels alphabetically and assigned `t_pre=periods[0]`, `t_post=periods[1]`. 
On supported string-labelled panels like `("pre", "post")` the alphabetic order is ["post", "pre"], so the code flipped pre and post and then raised on the treated-period D>0 check for a valid design. Same bug for `("before", "after")` and any non-alphabetic-chronological label pair. Fix: identify `t_pre` as the unique period where dose == 0 for ALL units (HAD paper Section 2 no-unit-untreated convention); `t_post` is the other period. This is a DGP-consistent invariant, not a string ordering. If neither period has all-zero dose, raise with the contract message and per-period nonzero-count diagnostics. If both periods have all-zero dose, raise (no treatment variation to estimate). The existing pre-period D=0 check is now tautological and has been removed since the inference itself enforces the invariant. Behavior on valid numeric panels (e.g., 2020/2021) is unchanged. **P2 (Code Quality): summary() hardcoded 'WAS' row label** `HeterogeneousAdoptionDiDResults.summary()` printed "WAS" as the parameter label regardless of the resolved design. For Design 1 paths (continuous_near_d_lower, mass_point) the stored `target_parameter` is "WAS_d_lower" per paper Sections 3.2.2-3.2.4, so the user-facing output misrepresented the estimand. Fix: render `self.target_parameter` in the summary row. Now Design 1' prints "WAS", Design 1 prints "WAS_d_lower", matching the stored result metadata. **Tests (+7 regression):** - TestValidateHadPanel.test_semantic_pre_post_labels_not_lexicographic - TestValidateHadPanel.test_semantic_pre_post_with_first_treat_col - TestValidateHadPanel.test_semantic_pre_post_fit_end_to_end - TestValidateHadPanel.test_before_after_labels - TestValidateHadPanel.test_no_all_zero_period_raises - TestValidateHadPanel.test_both_all_zero_periods_raises - TestResultMethods.test_summary_uses_target_parameter_for_row_label Targeted regression: 133 HAD tests + 512 total across Phase 1 and adjacent surfaces, all green. 
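As a quick illustration of the dose-based rule above, here is a standalone sketch (`infer_pre_post` is a hypothetical helper name for this message only; the shipped logic lives inside `_validate_had_panel`):

```python
import numpy as np
import pandas as pd

def infer_pre_post(data: pd.DataFrame, time_col: str, dose_col: str):
    """Pick t_pre as the unique period whose dose is zero for ALL units.

    Sketch of the invariant-based rule: D_{g, t_pre} = 0 for every unit,
    so pre/post is recovered from the data, not from label sort order.
    """
    periods = list(data[time_col].unique())
    assert len(periods) == 2, "two-period panel expected"
    all_zero = [
        p for p in periods
        if not np.any(np.asarray(data.loc[data[time_col] == p, dose_col], dtype=float) != 0)
    ]
    if len(all_zero) == 0:
        raise ValueError("neither period has all-zero dose (D_{g,1}=0 violated)")
    if len(all_zero) == 2:
        raise ValueError("both periods have all-zero dose (no treatment variation)")
    t_pre = all_zero[0]
    t_post = next(p for p in periods if p != t_pre)
    return t_pre, t_post

# ("pre", "post") sorts alphabetically as ["post", "pre"], but the
# dose-based rule still recovers the right chronology:
df = pd.DataFrame(
    {"unit": [1, 1, 2, 2], "period": ["pre", "post"] * 2, "dose": [0.0, 0.4, 0.0, 0.7]}
)
print(infer_pre_post(df, "period", "dose"))  # -> ('pre', 'post')
```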
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 52 +++++++++++++++--------- tests/test_had.py | 100 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 18 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index e931c9d2..913ce2e4 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -289,6 +289,7 @@ def summary(self) -> str: bc = self.bias_corrected_fit lines.append(f"{'Bandwidth h used:':<30} {bc.h:>20.6g}") lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}") + param_label = self.target_parameter lines.extend( [ "", @@ -299,7 +300,7 @@ def summary(self) -> str: ), "-" * width, ( - f"{'WAS':<15} {self.att:>12.4f} {self.se:>12.4f} " + f"{param_label:<15} {self.att:>12.4f} {self.se:>12.4f} " f"{self.t_stat:>10.3f} {self.p_value:>10.4f}" ), "-" * width, @@ -395,30 +396,27 @@ def _validate_had_panel( if missing: raise ValueError(f"Missing column(s) in data: {missing}. Required: {required}.") - periods = np.sort(np.asarray(data[time_col].unique())) - if len(periods) < 2: + periods_list = list(data[time_col].unique()) + if len(periods_list) < 2: raise ValueError( - f"HAD requires a two-period panel; got {len(periods)} distinct " + f"HAD requires a two-period panel; got {len(periods_list)} distinct " f"period(s) in column {time_col!r}." ) - if len(periods) > 2: + if len(periods_list) > 2: if first_treat_col is None: raise ValueError( f"HAD Phase 2a requires exactly two time periods " - f"(got {len(periods)} in {time_col!r}) when " + f"(got {len(periods_list)} in {time_col!r}) when " f"first_treat_col=None. Multi-period / staggered adoption " f"support is queued for Phase 2b (Appendix B.2 event-study)." ) raise ValueError( f"HAD Phase 2a requires exactly two time periods " - f"(got {len(periods)} in {time_col!r}). Staggered adoption " + f"(got {len(periods_list)} in {time_col!r}). 
Staggered adoption " f"reduction (first_treat_col supplied with >2 periods) is " f"queued for Phase 2b (Appendix B.2 event-study)." ) - t_pre = int(periods[0]) if np.issubdtype(periods.dtype, np.integer) else periods[0] - t_post = int(periods[1]) if np.issubdtype(periods.dtype, np.integer) else periods[1] - # Balanced-panel check: every unit appears exactly once per period. counts = data.groupby([unit_col, time_col]).size() if (counts != 1).any(): @@ -446,17 +444,35 @@ def _validate_had_panel( f"calling fit()." ) - # Pre-period no-unit-untreated check. - pre_mask = data[time_col] == t_pre - pre_doses = np.asarray(data.loc[pre_mask, dose_col], dtype=np.float64) - nonzero_pre = pre_doses != 0 - if nonzero_pre.any(): - n_bad = int(nonzero_pre.sum()) + # Identify t_pre and t_post by the HAD invariant rather than by + # lexicographic sort on the time labels: D_{g, t_pre} = 0 for all + # units (paper Section 2 no-unit-untreated pre-period convention). + # Sorting labels alphabetically reverses valid chronologies like + # ("pre", "post") where ordering is semantic, not alphabetic. + per_period_nonzero: Dict[Any, int] = {} + for p in periods_list: + p_doses = np.asarray(data.loc[data[time_col] == p, dose_col], dtype=np.float64) + per_period_nonzero[p] = int((p_doses != 0).sum()) + all_zero_periods = [p for p, nz in per_period_nonzero.items() if nz == 0] + if len(all_zero_periods) == 0: + # Neither period has all-zero dose: HAD pre-period contract violated. + stats_str = ", ".join(f"{p!r}: {nz} nonzero" for p, nz in per_period_nonzero.items()) raise ValueError( f"HAD requires D_{{g,1}} = 0 for all units (pre-period " - f"untreated). {n_bad} unit(s) have nonzero dose at " - f"t_pre={t_pre}. Drop these units or verify the dose column." + f"untreated). Neither period in column {time_col!r} has " + f"all-zero dose ({stats_str}). 
Exactly one period must be " + f"the pre-treatment period with D_{{g,1}} = 0 for every unit; " + f"drop rows with nonzero pre-period dose or verify the dose " + f"column." + ) + if len(all_zero_periods) == 2: + raise ValueError( + f"HAD requires variation in D_{{g,2}} for estimation. Both " + f"periods in column {time_col!r} have all-zero dose, so " + f"there is no treatment assignment to estimate." ) + t_pre = all_zero_periods[0] + t_post = [p for p in periods_list if p != t_pre][0] # Post-period nonnegative-dose check on the ORIGINAL (unshifted) dose # scale. Front-door rejection per paper Assumption (dose definition diff --git a/tests/test_had.py b/tests/test_had.py index 19abe7b7..9b1e1668 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -899,6 +899,42 @@ def test_summary_returns_string(self): assert "WAS" in s assert "Confidence Interval" in s + def test_summary_uses_target_parameter_for_row_label(self): + """Review P2: the estimate row must render target_parameter (WAS or + WAS_d_lower), not hardcoded 'WAS'. 
+ """ + # Design 1' -> target_parameter = "WAS" + d, dy = _dgp_continuous_at_zero(400, seed=0) + panel = _make_panel(d, dy) + r_d1p = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + s_d1p = r_d1p.summary() + assert r_d1p.target_parameter == "WAS" + assert "WAS" in s_d1p + + # Design 1 continuous-near-d_lower -> target_parameter = "WAS_d_lower" + d, dy = _dgp_continuous_near_d_lower(400, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_d1 = HeterogeneousAdoptionDiD(design="continuous_near_d_lower").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_d1.target_parameter == "WAS_d_lower" + assert "WAS_d_lower" in r_d1.summary() + + # Design 1 mass-point -> target_parameter = "WAS_d_lower" + d, dy = _dgp_mass_point(400, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_mp = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert r_mp.target_parameter == "WAS_d_lower" + assert "WAS_d_lower" in r_mp.summary() + def test_print_summary_executes(self, capsys): r = self._result() r.print_summary() @@ -1430,6 +1466,70 @@ def test_first_treat_col_dtype_agnostic_rejects_invalid_string(self): with pytest.raises(ValueError, match="first_treat_col"): _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + def test_semantic_pre_post_labels_not_lexicographic(self): + """Review P1 round 3: pre/post inference must be dose-based. + + ("pre", "post") sorts alphabetically to ["post", "pre"], which + previously flipped the pre/post labels and raised on a valid + panel. The validator now infers pre from the all-zero-dose + period. 
+ """ + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "pre" + assert t_post == "post" + + def test_semantic_pre_post_with_first_treat_col(self): + """Combined: string periods + first_treat_col in {0, 'post'}.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + ft_unit = np.array([0 if i % 2 == 0 else "post" for i in range(100)], dtype=object) + panel["ft"] = np.repeat(ft_unit, 2) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", "ft") + assert t_pre == "pre" + assert t_post == "post" + + def test_semantic_pre_post_fit_end_to_end(self): + """End-to-end: fit() runs on ("pre","post")-labelled panel.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy, periods=("pre", "post")) + r = HeterogeneousAdoptionDiD(design="continuous_at_zero").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_before_after_labels(self): + """("before","after") is also reversed alphabetically; must not fail.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy, periods=("before", "after")) + t_pre, t_post = _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + assert t_pre == "before" + assert t_post == "after" + + def test_no_all_zero_period_raises(self): + """If neither period has all-zero dose, HAD's D_{g,1}=0 contract fails.""" + d, dy = _dgp_continuous_at_zero(100, seed=0) + panel = _make_panel(d, dy) + # Inject nonzero dose into the pre period so neither period is all-zero. 
+ panel.loc[panel["period"] == 1, "dose"] = 0.5 + with pytest.raises(ValueError, match=r"D_\{g,1\}|pre-treatment"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + + def test_both_all_zero_periods_raises(self): + """If both periods have all-zero dose, no treatment to estimate.""" + G = 100 + panel = pd.DataFrame( + { + "unit": np.repeat(np.arange(G), 2), + "period": np.tile([1, 2], G), + "dose": np.zeros(2 * G), + "outcome": np.random.default_rng(0).standard_normal(2 * G), + } + ) + with pytest.raises(ValueError, match="variation"): + _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + # ============================================================================= # Review P1: continuous_near_d_lower on a true mass-point sample rejects From e622d60482fa06544f2ca0a51c3e06584c4e0b83 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:23:38 -0400 Subject: [PATCH 07/17] Address PR #346 CI review round 4: P1 d_lower > 0 on Design 1 paths **P1 (Methodology): Design 1 paths must reject d_lower = 0** Paper Section 3.2 partitions HAD by regime: `d_lower = 0` is Design 1' (`continuous_at_zero`); `d_lower > 0` is Design 1 (`continuous_near_d_lower` or `mass_point`). The auto-detect rule already respects this partition, but explicit overrides previously allowed `design="mass_point", d_lower=0` and `design="continuous_near_d_lower"` on a `d.min()==0` sample to run silently, returning a paper-incompatible estimand (2SLS with degenerate single-unit mass for mass_point; Design 1' algebra relabeled as `WAS_d_lower` with a spurious Assumption 5/6 warning for continuous_near_d_lower). Fix: add a fit-time guard that raises `ValueError` when `resolved_design in ("mass_point", "continuous_near_d_lower")` and the resolved `d_lower_val` is within float tolerance of zero (same tolerance family as `_detect_design`'s d.min()==0 tie-break).
The error message points users to `continuous_at_zero` or `auto` for samples with support infimum at zero. **Docstring + test updates:** - Rewrote the `design` parameter docstring to document the regime-partition contract precisely: each explicit override is now described with its d_lower precondition and mass-point compatibility. - Rewrote the `d_lower` parameter docstring to note the Design-1-requires-positive contract. - Inverted the prior `test_force_mass_point_on_continuous_data_at_support_infimum` test (which incorrectly codified the unsupported behavior) into three rejection regressions: `test_force_mass_point_on_d_lower_zero_sample_raises`, `test_force_continuous_near_d_lower_on_d_lower_zero_sample_raises`, `test_force_mass_point_d_lower_none_on_zero_sample_raises`. Targeted regression: 135 HAD tests + 514 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 54 +++++++++++++++++++++++++++++++++++++++++++---- tests/test_had.py | 34 +++++++++++++++++++++-------- 2 files changed, 75 insertions(+), 13 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 913ce2e4..383792e6 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -828,13 +828,36 @@ class HeterogeneousAdoptionDiD: design : {"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"} Design-dispatch strategy. Defaults to ``"auto"`` which resolves via the REGISTRY auto-detect rule on the fitted dose data - (see :func:`_detect_design`). Explicit overrides run the chosen - path without auto-reject (e.g., forcing Design 1' on mass-point - data runs the nonparametric fit even though the paper would - counsel the 2SLS path). + (see :func:`_detect_design`). + + Explicit overrides are checked against the paper's + regime-partition contract (Section 3.2) at fit time: + + - ``"continuous_at_zero"`` (Design 1'): paper requires the + support infimum ``d_lower = 0``.
Phase 1c's + ``_validate_had_inputs`` rejects mass-point samples passed + to this path. + - ``"continuous_near_d_lower"`` (Design 1, continuous density + near ``d_lower``): requires ``d_lower > 0`` and a + non-mass-point sample (modal fraction at ``d.min()`` must be + <= 2%). ``d_lower`` must equal ``float(d.min())`` within + float tolerance; non-support-infimum thresholds are off- + support and raise. + - ``"mass_point"`` (Design 1 mass-point): requires + ``d_lower > 0`` and ``d_lower == float(d.min())`` within + float tolerance. Forcing this design on a ``d_lower = 0`` + sample raises (use ``"continuous_at_zero"`` or ``"auto"``). + + Mismatched overrides raise ``ValueError`` pointing at the + correct design rather than silently identifying a different + estimand. d_lower : float or None Support infimum ``d_lower``. ``None`` means use ``0.0`` on the Design 1' path and ``float(d.min())`` on the other two paths. + On Design 1 paths (``continuous_near_d_lower`` and + ``mass_point``), an explicit ``d_lower`` must equal + ``float(d.min())`` within float tolerance AND must be strictly + positive; zero-valued or mismatched thresholds raise. kernel : {"epanechnikov", "triangular", "uniform"} Forwarded to :func:`bias_corrected_local_linear` on the continuous paths. Ignored on the mass-point path. @@ -1070,6 +1093,29 @@ def fit( else: d_lower_val = float(d_lower_arg) + # ---- Regime partition: d_lower > 0 for Design 1 paths ---- + # Paper Section 3.2 partitions HAD into d_lower = 0 (Design 1', + # continuous_at_zero) and d_lower > 0 (Design 1, continuous_near + # _d_lower or mass_point). The auto-detect rule already enforces + # this partition; explicit overrides must respect it too, otherwise + # `design="mass_point", d_lower=0` returns a finite but + # paper-incompatible 2SLS result and `design="continuous_near_d_lower"` + # with d_lower=0 reduces to Design 1' algebra while mislabeling the + # estimand as `WAS_d_lower` and emitting the wrong Assumption 5/6 + # warning. 
Use the same float-tolerance family as _detect_design's + # d.min()==0 tie-break. + if resolved_design in ("mass_point", "continuous_near_d_lower"): + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(d_lower_val) <= 1e-12 * scale: + raise ValueError( + f"design={resolved_design!r} requires d_lower > 0 (paper " + f"Section 3.2 reserves the d_lower=0 regime for Design 1' " + f"/ `continuous_at_zero`). Got d_lower={d_lower_val!r}. " + f"Use design='continuous_at_zero' (explicit) or " + f"design='auto' (auto-detect) for samples with support " + f"infimum at zero." + ) + # ---- Original-scale mass-point check before the regressor shift ---- # When a user explicitly forces design="continuous_near_d_lower" # on a sample that is actually a mass-point sample (modal fraction diff --git a/tests/test_had.py b/tests/test_had.py index 9b1e1668..424364b5 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1108,20 +1108,36 @@ def test_force_continuous_at_zero_on_mass_point_data(self): with pytest.raises(NotImplementedError, match="mass-point"): est.fit(panel, "outcome", "dose", "period", "unit") - def test_force_mass_point_on_continuous_data_at_support_infimum(self): - """Forcing mass-point on continuous data with d_lower=d.min() runs. + def test_force_mass_point_on_d_lower_zero_sample_raises(self): + """Review P1 round 4: Design 1 paths require d_lower > 0. - d.min()==0 exactly in this DGP, so d_lower=0.0 is the paper-consistent - support-infimum threshold. The resulting mass=1 case exercises the - degenerate-mass boundary (only unit 0 is "at d_lower"; rest are above). + Paper Section 3.2 reserves the d_lower=0 regime for Design 1' + (continuous_at_zero). Forcing `mass_point` on a sample with + d.min()==0 must raise, pointing the user to continuous_at_zero + or auto. """ d, dy = _dgp_continuous_at_zero(500, seed=0) panel = _make_panel(d, dy) - # d.min() == 0 exactly (d[0]=0 by construction); d_lower must match. 
est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=0.0) - r = est.fit(panel, "outcome", "dose", "period", "unit") - assert r.design == "mass_point" - assert r.d_lower == 0.0 + with pytest.raises(ValueError, match=r"d_lower > 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_continuous_near_d_lower_on_d_lower_zero_sample_raises(self): + """Parallel: continuous_near_d_lower must also reject d_lower=0.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_near_d_lower") + # d_lower auto-resolves to float(d.min()) == 0.0 on this DGP. + with pytest.raises(ValueError, match=r"d_lower > 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_force_mass_point_d_lower_none_on_zero_sample_raises(self): + """d_lower=None on a d.min()==0 sample resolves to 0; must still raise.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point", d_lower=None) + with pytest.raises(ValueError, match=r"d_lower > 0"): + est.fit(panel, "outcome", "dose", "period", "unit") # ============================================================================= From 2884c1d81adf576edc6f14397b8cb97880a28cbe Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:41:32 -0400 Subject: [PATCH 08/17] Address PR #346 CI review round 5: P1 reciprocal mass_point regime guard **P1 (Methodology): reciprocal regime check for design="mass_point"** Round 2 added a guard that rejects `design="continuous_near_d_lower"` on samples with modal-min fraction > 2% (mass-point samples). The reciprocal direction was left unguarded: `design="mass_point"` on a continuous-near-d_lower sample (modal fraction <= 2%) would silently run 2SLS, but the instrument Z = 1{D > d.min()} degenerates because the "mass" side has almost no observations. 
The resulting point estimate identifies the exact-d.min() cell rather than the paper's boundary-limit estimand (the Section 3.2.4 2SLS is the documented Design 1 estimator only when the modal mass exists). Fix: extend the existing pre-shift regime check so it fires symmetrically on both explicit overrides. When `resolved_design == "mass_point"` and the modal fraction at d.min() does NOT exceed 2%, raise ValueError pointing users to `continuous_near_d_lower` or `auto`. Same 2% threshold used by `_detect_design()` and Phase 1c's `_validate_had_inputs()` so the three dispatch paths share one regime rule. **Docs:** Updated the `design` parameter docstring to document the reciprocal mass-point contract. **Tests (+4 regression):** - TestMassPointPathRejectsContinuousSample.test_mass_point_on_continuous_near_sample_raises - TestMassPointPathRejectsContinuousSample.test_mass_point_on_true_mass_point_sample_runs - TestMassPointPathRejectsContinuousSample.test_mass_point_modal_at_threshold_runs (2.5%) - TestMassPointPathRejectsContinuousSample.test_mass_point_modal_exactly_two_percent_raises Targeted regression: 139 HAD tests + 518 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 51 +++++++++++++++++++++++++++++++-------------- tests/test_had.py | 53 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+), 15 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 383792e6..6798b064 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -844,9 +844,13 @@ class HeterogeneousAdoptionDiD: float tolerance; non-support-infimum thresholds are off- support and raise. - ``"mass_point"`` (Design 1 mass-point): requires - ``d_lower > 0`` and ``d_lower == float(d.min())`` within - float tolerance. Forcing this design on a ``d_lower = 0`` - sample raises (use ``"continuous_at_zero"`` or ``"auto"``).
+ ``d_lower > 0`` AND a mass-point sample (modal fraction at + ``d.min()`` must be > 2%). ``d_lower`` must equal + ``float(d.min())`` within float tolerance. Forcing this + design on a ``d_lower = 0`` sample or on a continuous + (non-mass-point) sample raises; in either case 2SLS + identifies a different estimand than the paper's Design 1 + mass-point WAS. Mismatched overrides raise ``ValueError`` pointing at the correct design rather than silently identifying a different @@ -1116,23 +1120,27 @@ def fit( f"infimum at zero." ) - # ---- Original-scale mass-point check before the regressor shift ---- - # When a user explicitly forces design="continuous_near_d_lower" - # on a sample that is actually a mass-point sample (modal fraction - # at d.min() > 2% per paper Section 3.2.4), the downstream code - # would shift D - d_lower, making the support minimum zero on the - # shifted scale and suppressing the Phase 1c mass-point rejection - # guard (_validate_had_inputs only checks modal fraction when - # d.min() > 0). That silent coercion produces wrong CIs for a - # mass-point sample by running the nonparametric estimator where - # the paper's 2SLS branch is required. Front-door reject. - if resolved_design == "continuous_near_d_lower": + # ---- Original-scale modal-fraction / regime check ---- + # Paper Section 3.2 splits Design 1 into two REGIME-SPECIFIC + # estimators by modal-min fraction: + # - continuous_near_d_lower: modal fraction at d.min() <= 2% + # (local-linear boundary-limit estimator). + # - mass_point: modal fraction at d.min() > 2% + # (Wald-IV / 2SLS identified by the instrument 1{D_2 > d_lower}). + # The auto-detect rule already enforces this; explicit overrides + # must too, otherwise the wrong estimand is returned silently. + # Both guards are symmetric around the 2% threshold used in + # _detect_design() and Phase 1c's _validate_had_inputs(). 
+ if resolved_design in ("continuous_near_d_lower", "mass_point"): d_min_orig = float(d_arr.min()) if d_min_orig > 0: eps_mp = 1e-12 * max(1.0, abs(d_min_orig)) at_d_min_mask_orig = np.abs(d_arr - d_min_orig) <= eps_mp modal_fraction_orig = float(np.mean(at_d_min_mask_orig)) - if modal_fraction_orig > _MASS_POINT_THRESHOLD: + if ( + resolved_design == "continuous_near_d_lower" + and modal_fraction_orig > _MASS_POINT_THRESHOLD + ): raise ValueError( f"design='continuous_near_d_lower' cannot be used on a " f"mass-point sample (modal fraction {modal_fraction_orig:.4f} " @@ -1143,6 +1151,19 @@ def fit( f"continuous path on a mass-point sample would produce " f"the wrong estimand." ) + if resolved_design == "mass_point" and modal_fraction_orig <= _MASS_POINT_THRESHOLD: + raise ValueError( + f"design='mass_point' requires a modal mass at d.min() " + f"exceeding the {_MASS_POINT_THRESHOLD:.2f} threshold " + f"(paper Section 3.2.4). Got modal fraction " + f"{modal_fraction_orig:.4f} at d.min()={d_min_orig!r}. " + f"For continuous-near-d_lower samples use " + f"design='continuous_near_d_lower' (local-linear " + f"boundary-limit estimator) or design='auto' which " + f"will auto-detect. Forcing 2SLS on a continuous " + f"sample identifies the exact-d.min() cell rather " + f"than the paper's boundary-limit estimand." + ) # ---- d_lower contract for Design 1 paths ---- # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at diff --git a/tests/test_had.py b/tests/test_had.py index 424364b5..91b8d2e5 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1577,6 +1577,59 @@ def test_continuous_near_on_continuous_sample_runs(self): assert np.isfinite(r.att) +class TestMassPointPathRejectsContinuousSample: + """Review P1 round 5: reciprocal guard. Forcing design="mass_point" on a + continuous-near-d_lower sample (modal fraction at d.min() <= 2%) must + raise, otherwise 2SLS identifies the exact-d.min() cell rather than + the paper's boundary-limit estimand. 
+ """ + + def test_mass_point_on_continuous_near_sample_raises(self): + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point") + with pytest.raises(ValueError, match=r"modal mass|2SLS.*continuous"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_mass_point_on_true_mass_point_sample_runs(self): + """Sanity: the reciprocal guard does NOT reject valid mass-point samples.""" + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_mass_point_modal_at_threshold_runs(self): + """Just above the 2% threshold (2.5% modal mass), mass_point runs (strict > 0.02).""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 25 # 2.5% > threshold + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r = HeterogeneousAdoptionDiD(design="mass_point").fit( + panel, "outcome", "dose", "period", "unit" + ) + assert np.isfinite(r.att) + + def test_mass_point_modal_exactly_two_percent_raises(self): + """At exactly 2% (not strictly greater), mass_point must raise.""" + rng = np.random.default_rng(0) + G = 1000 + mass_n = 20 # exactly 2% (not > 2%) + d = np.concatenate([np.full(mass_n, 0.5), rng.uniform(0.5001, 1.0, G - mass_n)]) + dy = 0.3 * d + 0.1 * rng.standard_normal(G) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="mass_point") + with pytest.raises(ValueError, match=r"modal mass"): + est.fit(panel, "outcome", "dose", "period", "unit") + + # ============================================================================= # Review P2: cluster-applied mass-point stores
vcov_type="cr1" # ============================================================================= From afa0f6c68586a1e15a09d1dfd19badc0402e52d5 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 19:54:29 -0400 Subject: [PATCH 09/17] Address PR #346 CI review round 6: P1 panel-only note + P3 theorem refs **P1 (Methodology): Phase 2a panel-only not documented as deviation** Paper Section 2 defines HAD on panel OR repeated cross-section data, but Phase 2a ships a panel-only implementation. The existing balanced-panel validator rejects RCS inputs (disjoint unit IDs between periods) with a generic "unit(s) do not appear in both periods" error, but the scope restriction was not documented in REGISTRY.md as a `**Note:**` deviation from the paper's documented surface. Fix: - REGISTRY.md: new `**Note (Phase 2a panel-only):**` block under the HAD Phase 2a entry, explaining the restriction and pointing at the follow-up RCS PR. - `docs/methodology/papers/dechaisemartin-2026-review.md`: mirrored implementation note on the panel-or-RCS scope line. - `HeterogeneousAdoptionDiD.fit()` docstring: new preamble paragraph stating the panel-only restriction and why (unit-level first differences required). - `TODO.md`: new row queuing RCS identification-path work. - `tests/test_had.py`: two regression tests (`test_repeated_cross_section_raises` and `test_repeated_cross_section_fit_raises`) that construct RCS-shaped input and assert the validator + fit() raise with the expected error. **P3 (Docs/Tests): theorem/equation references were off** Module docstring, result-class `att` docstring, and `_fit_continuous()` docstring cited the paper's theorems/equations inconsistently with the registry's source map: - Design 1' identification: said "Theorem 3" -> should be Theorem 1 with Equation 3; Equation 7 is the sample estimator. - Design 1 continuous-near-d_lower: said "Proposition 3 / Theorem 4" -> should be Theorem 3 / Equation 11 (Theorem 4 is the QUG null test, not this estimand). 
Fix: updated all three docstring sites to cite Theorem 1 / Equation 3 (Design 1' identification) + Equation 7 (sample estimator), and Theorem 3 / Equation 11 (Design 1 continuous-near-d_lower / WAS_d_lower under Assumption 6). Targeted regression: 141 HAD tests + 520 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- TODO.md | 1 + diff_diff/had.py | 33 ++++++++---- docs/methodology/REGISTRY.md | 1 + .../papers/dechaisemartin-2026-review.md | 2 +- tests/test_had.py | 52 +++++++++++++++++++ 5 files changed, 79 insertions(+), 10 deletions(-) diff --git a/TODO.md b/TODO.md index bd8be7a5..cd033be3 100644 --- a/TODO.md +++ b/TODO.md @@ -100,6 +100,7 @@ Deferred items from PR reviews that were not addressed before merge. | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low | +| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. 
A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium | #### Performance diff --git a/diff_diff/had.py b/diff_diff/had.py index 6798b064..32eaf674 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -9,7 +9,8 @@ :func:`diff_diff.utils.safe_inference`. 1. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density - continuous at zero, Assumption 3. Equation 7 / Theorem 3: + continuous at zero, Assumption 3. Theorem 1 / Equation 3 + (identification); Equation 7 (sample estimator): beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] @@ -19,7 +20,8 @@ 2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``): ``d_lower > 0``, continuous boundary density, Assumption 5 or 6. - Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + Theorem 3 / Equation 11 (``WAS_{d_lower}`` under Assumption 6; + Theorem 4 is the QUG null test, not this estimand): beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) / E[D_2 - d_lower] @@ -135,13 +137,14 @@ class HeterogeneousAdoptionDiDResults: att : float Point estimate of the WAS parameter on the beta-scale. - - Design 1' (paper Equation 7 / Theorem 3): + - Design 1' (paper Theorem 1 / Equation 3 identification; + Equation 7 sample estimator): ``att = (mean(ΔY) - tau_bc) / D_bar`` where ``tau_bc`` is the bias-corrected local-linear estimate of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and ``D_bar = (1/G) * sum(D_{g,2})``. 
- - Design 1 continuous-near-d_lower (paper Theorem 4, - ``WAS_{d_lower}`` under Assumption 6): + - Design 1 continuous-near-d_lower (paper Theorem 3 / + Equation 11, ``WAS_{d_lower}`` under Assumption 6): ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)`` where ``tau_bc`` is the bias-corrected local-linear estimate of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``. @@ -1006,6 +1009,17 @@ def fit( ) -> HeterogeneousAdoptionDiDResults: """Fit the HAD estimator on a two-period panel. + Phase 2a is **panel-only**: the paper (Section 2) defines HAD on + panel or repeated-cross-section data, but this implementation + requires a balanced two-period panel with a unit identifier so + that unit-level first differences ``ΔY_g = Y_{g,2} - Y_{g,1}`` + can be formed. Repeated-cross-section inputs (disjoint unit IDs + between periods) are rejected by the balanced-panel validator. + Repeated-cross-section support is queued for a follow-up PR + (tracked in ``TODO.md``); it requires a separate identification + path based on pre/post cell means rather than unit-level + differences. + Parameters ---------- data : pd.DataFrame @@ -1337,16 +1351,17 @@ def _fit_continuous( Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026) continuous-design estimators: - - Design 1' (``continuous_at_zero``), paper Equation 7 / - Theorem 3: + - Design 1' (``continuous_at_zero``), paper Theorem 1 / + Equation 3 (identification); Equation 7 (sample estimator): beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2] Regressor passed to the local-linear boundary fit is ``d_arr``; the boundary is ``0``. 
- - Design 1 (``continuous_near_d_lower``), paper Proposition 3 / - Theorem 4 (``WAS_{d_lower}`` under Assumption 6): + - Design 1 (``continuous_near_d_lower``), paper Theorem 3 / + Equation 11 (``WAS_{d_lower}`` under Assumption 6; note + Theorem 4 is the QUG null test, not this estimand): beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d]) / E[D_2 - d_lower] diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 9944477c..e04bdb8c 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2324,6 +2324,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. 
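The endpoint reversal in the **Note (CI endpoints)** entry above is pure monotonicity: for `den > 0`, the map `x -> (mean(ΔY) - x) / den` is decreasing, so the τ̂-scale upper endpoint becomes the β̂-scale lower endpoint. A minimal sketch with synthetic numbers (illustration only, not library output):

```python
# Synthetic stand-ins (illustration, not library output): mean first
# difference, a tau-scale CI for the boundary limit, and the divisor.
mean_dy = 0.40
tau_ci_low, tau_ci_high = 0.05, 0.15  # CI for tau_bc
den = 0.50                            # e.g. mean(D_2) on Design 1'

# beta = (mean_dy - tau) / den is decreasing in tau for den > 0, so the
# tau-scale UPPER endpoint maps to the beta-scale LOWER endpoint:
beta_ci_low = (mean_dy - tau_ci_high) / den   # = 0.5
beta_ci_high = (mean_dy - tau_ci_low) / den   # = 0.7
assert beta_ci_low < beta_ci_high
```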
+ - **Note (Phase 2a panel-only):** The paper (Section 2) defines HAD on *panel or repeated cross-section* data, but Phase 2a ships a panel-only implementation: `HeterogeneousAdoptionDiD.fit()` requires a balanced two-period panel with a unit identifier so that unit-level first differences ΔY_g = Y_{g,2} - Y_{g,1} can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the existing balanced-panel validator with the "unit(s) do not appear in both periods" error. Repeated-cross-section support is queued for a follow-up PR (tracked in `TODO.md`); it will need a separate identification path based on pre/post cell means rather than unit-level differences. - [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result. - [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`). - [ ] Phase 3: `stute_test()` Cramér-von Mises with Mammen wild bootstrap. diff --git a/docs/methodology/papers/dechaisemartin-2026-review.md b/docs/methodology/papers/dechaisemartin-2026-review.md index 2e031091..8017b244 100644 --- a/docs/methodology/papers/dechaisemartin-2026-review.md +++ b/docs/methodology/papers/dechaisemartin-2026-review.md @@ -20,7 +20,7 @@ **Key implementation requirements:** *Assumption checks / warnings:* -- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one). +- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one). **Phase 2a implementation note:** `diff_diff.HeterogeneousAdoptionDiD` ships panel-only; repeated-cross-section inputs are rejected by the balanced-panel validator. RCS support is queued for a follow-up PR. - Treatment dose `D_{g,2} >= 0`. 
For Design 1' (the QUG case) the support infimum `d̲ := inf Supp(D_{g,2})` must equal 0; for Design 1 (no QUG) `d̲ > 0` and Assumption 5 or 6 must be invoked. - Assumption 1 (i.i.d. sample): `(Y_{g,1}, Y_{g,2}, D_{g,1}, D_{g,2})_{g=1,...,G}` i.i.d. - Assumption 2 (parallel trends for the least-treated): `lim_{d ↓ d̲} E[ΔY(0) | D_2 ≤ d] = E[ΔY(0)]`. Testable with pre-trends when a pre-treatment period `t=0` exists. Reduces to standard parallel trends when treatment is binary. diff --git a/tests/test_had.py b/tests/test_had.py index 91b8d2e5..54ab3fa3 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1546,6 +1546,58 @@ def test_both_all_zero_periods_raises(self): with pytest.raises(ValueError, match="variation"): _validate_had_panel(panel, "outcome", "dose", "period", "unit", None) + def test_repeated_cross_section_raises(self): + """Review P1 round 6: Phase 2a is panel-only. An RCS input (disjoint + unit IDs across periods) must be rejected by the balanced-panel + validator with the "unit(s) do not appear in both periods" error. 
+ """ + rng = np.random.default_rng(0) + G = 100 + pre = pd.DataFrame( + { + "unit": np.arange(G), + "period": 1, + "dose": np.zeros(G), + "outcome": rng.standard_normal(G), + } + ) + post = pd.DataFrame( + { + "unit": np.arange(G, 2 * G), + "period": 2, + "dose": rng.uniform(0, 1, G), + "outcome": rng.standard_normal(G), + } + ) + rcs = pd.concat([pre, post], ignore_index=True) + with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"): + _validate_had_panel(rcs, "outcome", "dose", "period", "unit", None) + + def test_repeated_cross_section_fit_raises(self): + """End-to-end: fit() on an RCS panel raises ValueError.""" + rng = np.random.default_rng(0) + G = 100 + pre = pd.DataFrame( + { + "unit": np.arange(G), + "period": 1, + "dose": np.zeros(G), + "outcome": rng.standard_normal(G), + } + ) + post = pd.DataFrame( + { + "unit": np.arange(G, 2 * G), + "period": 2, + "dose": rng.uniform(0, 1, G), + "outcome": rng.standard_normal(G), + } + ) + rcs = pd.concat([pre, post], ignore_index=True) + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"): + est.fit(rcs, "outcome", "dose", "period", "unit") + # ============================================================================= # Review P1: continuous_near_d_lower on a true mass-point sample rejects From 792997dd2c40b5d844aae52b0420d9212333a73d Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 20:32:29 -0400 Subject: [PATCH 10/17] Address PR #346 CI review round 7: P1 defer cluster validation + P3 registry refs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit **P1 (Code Quality): cluster= must truly be ignored on continuous paths** `HeterogeneousAdoptionDiD.fit()` previously passed `self.cluster` into `_aggregate_first_difference()` before the design was resolved. 
The aggregator validates the cluster column eagerly (missing column, within-unit variance, NaN ID), so a valid continuous fit could abort just because a shared config supplied an irrelevant `cluster=`. This contradicted the documented "ignored with a warning on continuous paths" contract. Fix: defer cluster extraction until after design resolution. The first aggregation call now passes `cluster_col=None` unconditionally; a second aggregation pass with `cluster_col=cluster_arg` runs only when `resolved_design == "mass_point"`, which is the only path that consumes the extracted cluster array. Continuous paths emit the existing `UserWarning` and proceed to fit without touching the cluster column at all. **P3 (Methodology): registry checklist theorem references were stale** Round 6 fixed the theorem citations in `had.py` and the paper review doc but missed the Phase 2a checklist line in `REGISTRY.md`, which still said "Equation 7 / Theorem 3" for Design 1' identification and "Theorem 4, WAS_{d̲} under Assumption 6" for the continuous-near-d_lower path. Updated the checklist line to match: Theorem 1 / Equation 3 (identification) + Equation 7 (sample estimator) for Design 1'; Theorem 3 / Equation 11 for WAS_{d̲}. **Tests (+4 regression):** - test_missing_cluster_column_on_continuous_only_warns: continuous_at_zero + cluster='does_not_exist' -> warn + fit succeeds. - test_nan_cluster_on_continuous_only_warns: NaN cluster IDs on continuous path -> warn + fit succeeds. - test_within_unit_varying_cluster_on_continuous_only_warns: within-unit- varying cluster IDs on continuous -> warn + fit succeeds. - test_auto_design_ignores_irrelevant_cluster_on_continuous: design='auto' resolving to continuous_at_zero also ignores cluster gracefully. Targeted regression: 145 HAD tests + 524 total across Phase 1 and adjacent surfaces, all green. 
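The deferral pattern described above can be sketched as follows. `aggregate` and `resolve_design` are simplified hypothetical stand-ins for `_aggregate_first_difference` and `_detect_design` (not the library's real signatures); the point is the two-pass structure, with cluster extraction gated on the resolved design:

```python
import warnings

import numpy as np

# Hypothetical stand-in for _detect_design: d.min() == 0 resolves
# continuous_at_zero unconditionally; modal-min check runs otherwise.
def resolve_design(d):
    if d.min() == 0:
        return "continuous_at_zero"
    return "mass_point" if np.mean(d == d.min()) > 0.02 else "continuous_near_d_lower"

# Hypothetical stand-in for _aggregate_first_difference: validates the
# cluster column eagerly -- exactly the behavior being deferred.
def aggregate(data, cluster_col=None):
    if cluster_col is not None and cluster_col not in data:
        raise ValueError(f"cluster column {cluster_col!r} not found")
    cl = np.asarray(data[cluster_col]) if cluster_col is not None else None
    return np.asarray(data["d"]), np.asarray(data["dy"]), cl

def fit_sketch(data, design="auto", cluster=None):
    # First pass: cluster_col=None unconditionally, so a malformed or
    # irrelevant cluster= cannot abort a valid continuous fit.
    d, dy, _ = aggregate(data, cluster_col=None)
    resolved = resolve_design(d) if design == "auto" else design
    cluster_arr = None
    if resolved == "mass_point" and cluster is not None:
        # Second pass runs only on the sole path that consumes clusters.
        _, _, cluster_arr = aggregate(data, cluster_col=cluster)
    elif cluster is not None:
        warnings.warn("cluster= is ignored on continuous paths", UserWarning)
    return resolved, cluster_arr
```

On a continuous sample, `fit_sketch(data, cluster="does_not_exist")` warns and still resolves, mirroring `test_auto_design_ignores_irrelevant_cluster_on_continuous`; on a mass-point sample the cluster column is validated and extracted.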
Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 30 ++++++++++++++++++--- docs/methodology/REGISTRY.md | 2 +- tests/test_had.py | 52 ++++++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 4 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 32eaf674..396781f6 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -1078,8 +1078,13 @@ def fit( data, outcome_col, dose_col, time_col, unit_col, first_treat_col ) - # ---- Aggregate to unit-level first differences ---- - d_arr, dy_arr, cluster_arr, _ = _aggregate_first_difference( + # ---- Aggregate to unit-level first differences (no cluster yet) ---- + # Defer cluster validation/extraction until after the design is + # resolved: the continuous paths ignore cluster= with a warning, + # so a malformed or irrelevant cluster column must not abort a + # valid continuous fit. Cluster extraction is re-run below only + # when resolved_design == "mass_point". + d_arr, dy_arr, _, _ = _aggregate_first_difference( data, outcome_col, dose_col, @@ -1087,7 +1092,7 @@ def fit( unit_col, t_pre, t_post, - cluster_arg, + None, ) n_obs = int(d_arr.shape[0]) @@ -1103,6 +1108,25 @@ def fit( else: resolved_design = design_arg + # ---- Extract cluster IDs (mass-point path only) ---- + # Continuous paths ignore cluster= with a warning emitted later in + # the dispatch block; the cluster column is not read for them. On + # the mass-point path we now re-run the aggregation with + # cluster_col so validation (missing column / NaN / within-unit + # variance) fires only when cluster is actually going to be used. 
+ cluster_arr: Optional[np.ndarray] = None + if resolved_design == "mass_point" and cluster_arg is not None: + _, _, cluster_arr, _ = _aggregate_first_difference( + data, + outcome_col, + dose_col, + time_col, + unit_col, + t_pre, + t_post, + cluster_arg, + ) + # ---- Resolve d_lower ---- if resolved_design == "continuous_at_zero": d_lower_val = 0.0 diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index e04bdb8c..0c425a6b 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2316,7 +2316,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). 
The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. 
Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. 
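The Phase 1c → Phase 2a composition described in the entries above (Equation 8 rescaling of the boundary limit to the β-scale) amounts to the following. A minimal sketch on synthetic data, assuming the boundary limit is known: `tau_bc` here is a hard-coded stand-in for `bias_corrected_local_linear`'s estimate, not library output.

```python
import numpy as np

# Synthetic two-period first differences (illustration only): doses with
# d_lower = 0 and a linear dose effect of 0.3 on Delta Y.
rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, 1000)                   # D_{g,2}
dy = 0.3 * d + 0.05 * rng.standard_normal(1000)   # Delta Y_g

# tau_bc stands in for the bias-corrected local-linear estimate of the
# boundary limit lim_{d -> 0} E[dY | D_2 <= d]; its true value here is 0.
tau_bc = 0.0

# Equation 8 rescaling to the beta scale: divide by D_bar = mean(D_{g,2}).
d_bar = d.mean()
beta_hat = (dy.mean() - tau_bc) / d_bar
```

With this DGP, `beta_hat` recovers the 0.3 dose slope up to sampling noise in the two means.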
-- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Equation 7 / Theorem 3) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 4, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
+- [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4).
 - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples.
 - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand.
 - [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs.
diff --git a/tests/test_had.py b/tests/test_had.py
index 54ab3fa3..aff5abda 100644
--- a/tests/test_had.py
+++ b/tests/test_had.py
@@ -1356,6 +1356,58 @@ def test_cluster_name_none_without_cluster(self):
         )
         assert r.cluster_name is None
+
+    def test_missing_cluster_column_on_continuous_only_warns(self):
+        """Review P1 round 7: irrelevant cluster on continuous path must not
+        abort the fit. The cluster column doesn't even need to exist.
+ """ + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="does_not_exist") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + assert r.cluster_name is None + + def test_nan_cluster_on_continuous_only_warns(self): + """NaN cluster IDs on continuous path must not abort the fit.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + cluster_unit = np.repeat(np.arange(100).astype(float), 2) + cluster_unit[0] = np.nan + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_within_unit_varying_cluster_on_continuous_only_warns(self): + """Within-unit-varying cluster IDs on continuous path must not abort.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Varies within unit (distinct value per row) + panel["state"] = np.arange(len(panel)) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", cluster="state") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert np.isfinite(r.att) + + def test_auto_design_ignores_irrelevant_cluster_on_continuous(self): + """design='auto' resolving to a continuous path must also ignore cluster.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="auto", 
cluster="does_not_exist") + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert any("cluster" in str(warn.message).lower() for warn in w) + assert r.design == "continuous_at_zero" + assert np.isfinite(r.att) + # ============================================================================= # First-difference aggregation helper From 1eda6148f38b7391b4ca7c481e8829d4651e4a27 Mon Sep 17 00:00:00 2001 From: igerber Date: Mon, 20 Apr 2026 20:51:03 -0400 Subject: [PATCH 11/17] Address PR #346 CI review round 8: P1 snap d_lower + P2 row-level validation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit **P1 (Methodology): snap tolerance-accepted d_lower to float(d.min())** The `d_lower == float(d.min())` within-tolerance contract accepted user-supplied values that differed from the support infimum by float rounding noise (1e-15). Downstream algebra is not tolerance-aware: - `mass_point` + `d_lower = d.min() - ε`: `Z = d > d_lower` puts every mass-point unit (at exactly d.min()) into Z=1 because they are strictly > d_lower. The Z=0 control group empties and the Wald-IV ratio becomes undefined. - `continuous_near_d_lower` + `d_lower = d.min() + ε`: the regressor shift `d - d_lower` produces a minimum of `-ε` on the shifted scale. Phase 1c's `_validate_had_inputs` then raises on the negative-dose guard, aborting an otherwise acceptable fit. Fix: after the tolerance check passes on Design 1 paths, snap `d_lower_val = float(d_arr.min())` before the mass-point instrument construction, cohort counts, and the continuous-path regressor shift. This preserves the advertised tolerance contract while keeping downstream algebra exact. **P2 (Code Quality): row-level NaN/domain validation** The previous `first_treat_col` and `cluster=` validators collapsed each unit to a single value via `groupby().first()` before checking for NaN or out-of-domain values. 
pandas's `.first()` silently skips NaN, so a unit with rows `[valid, NaN]` would pass a unit-level check even though the raw NaN on the bad row should be rejected. Fix: for both `first_treat_col` (in `_validate_had_panel`) and `cluster=` (in `_aggregate_first_difference`), validate raw per-row values BEFORE the per-unit collapse: - Row-level NaN check on the raw column. - Row-level domain check on raw values (for first_treat_col). - Per-unit-constancy check using `nunique(dropna=False)` so a `[value, NaN]` within a unit registers as 2 distinct values. **Tests (+5 regression):** - test_mass_point_d_lower_below_min_within_tolerance_snaps: asserts d_lower = d.min() - 1e-15 collapses ULP-identically to the exact case across att, se, n_mass_point, n_above_d_lower. - test_continuous_near_d_lower_above_within_tolerance_snaps: asserts d_lower = d.min() + 1e-15 collapses to the exact case (no negative-shift rejection). - test_first_treat_col_mixed_row_nan_raises: unit with [valid, NaN] rows raises. - test_first_treat_col_mixed_row_invalid_value_raises: unit with [valid, 999] rows raises with the 999 in the error message. - test_mixed_row_nan_cluster_raises_on_mass_point: mass-point with cluster column holding [valid, NaN] per unit raises. Targeted regression: 150 HAD tests + 529 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 70 +++++++++++++++++++++++++++++--------- tests/test_had.py | 86 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+), 16 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 396781f6..4a8edb28 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -508,15 +508,25 @@ def _validate_had_panel( # {0, t_post} so that string-labelled periods (e.g., ("A", "B")) with # first_treat in {0, "B"} are supported. 
if first_treat_col is not None: - ft_series = data.groupby(unit_col)[first_treat_col].first() - if bool(ft_series.isna().any()): + # Row-level NaN check: `groupby().first()` skips NaNs silently, so a + # unit with rows [valid, NaN] would pass a collapsed check. Validate + # raw per-row values first, then verify per-unit constancy with + # `nunique(dropna=False)` so within-unit NaN variation is caught. + ft_raw = data[first_treat_col] + if bool(ft_raw.isna().any()): + n_nan = int(ft_raw.isna().sum()) raise ValueError( - f"first_treat_col={first_treat_col!r} contains NaN. " - f"Use 0 for never-treated units and t_post for treated." + f"first_treat_col={first_treat_col!r} contains " + f"{n_nan} NaN value(s) at the row level. Use 0 for " + f"never-treated units and t_post for treated, and drop " + f"or impute any NaN rows before calling fit()." ) + # Row-level domain check: every row (not just the collapsed first()) + # must be in {0, t_post}. Catches mixed-row malformed inputs where + # a unit has [valid, invalid]. valid_values = {0, t_post} - observed = set(ft_series.unique().tolist()) - bad = sorted(observed - valid_values, key=lambda x: str(x)) + observed_raw = set(ft_raw.unique().tolist()) + bad = sorted(observed_raw - valid_values, key=lambda x: str(x)) if bad: raise ValueError( f"first_treat_col={first_treat_col!r} contains value(s) " @@ -524,6 +534,18 @@ def _validate_had_panel( f"two-period HAD panel. Staggered timing with multiple " f"cohorts is Phase 2b." ) + # Within-unit consistency: every unit must have a single + # first_treat value across its rows. Uses dropna=False so a unit + # with [value, NaN] counts as 2 unique values (caught above by + # the NaN check anyway, but this is belt-and-suspenders). 
+ ft_per_unit_nunique = data.groupby(unit_col)[first_treat_col].nunique(dropna=False) + if (ft_per_unit_nunique > 1).any(): + n_bad = int((ft_per_unit_nunique > 1).sum()) + raise ValueError( + f"first_treat_col={first_treat_col!r} is not constant " + f"within unit for {n_bad} unit(s). Each unit must have " + f"a single first_treat value across both observed periods." + ) return t_pre, t_post @@ -574,8 +596,22 @@ def _aggregate_first_difference( if cluster_col is not None: if cluster_col not in data.columns: raise ValueError(f"cluster column {cluster_col!r} not found in data.") - # Cluster must be unit-constant; take the first cluster id per unit. - cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique() + # Row-level NaN check: `groupby().first()` skips NaNs silently, so a + # unit with rows [valid, NaN] would pass a collapsed check. Validate + # raw per-row values first so any NaN in the cluster column is + # rejected up-front, not masked by the per-unit collapse. + cluster_raw = data[cluster_col] + if bool(cluster_raw.isna().any()): + n_nan = int(cluster_raw.isna().sum()) + raise ValueError( + f"cluster column {cluster_col!r} contains {n_nan} NaN " + f"value(s) at the row level. Silent row dropping is " + f"disabled; drop or impute cluster ids before calling fit()." + ) + # Cluster must be unit-constant. `nunique(dropna=False)` counts NaN + # as a distinct value so rows like [value, NaN] register as 2 + # uniques (also caught by the row-level NaN check above). + cluster_per_unit = df.groupby(unit_col)[cluster_col].nunique(dropna=False) if (cluster_per_unit > 1).any(): n_bad = int((cluster_per_unit > 1).sum()) raise ValueError( @@ -585,14 +621,6 @@ def _aggregate_first_difference( f"where each unit belongs to a single cluster)." ) cluster_arr = df.groupby(unit_col)[cluster_col].first().sort_index().to_numpy() - # NaN cluster ids are a silent-failure vector; reject front-door. 
- if pd.isna(cluster_arr).any(): - n_nan = int(pd.isna(cluster_arr).sum()) - raise ValueError( - f"{n_nan} unit(s) have NaN cluster id in " - f"column {cluster_col!r}. Silent row dropping is " - f"disabled; drop or impute cluster ids before calling fit()." - ) return d_arr, dy_arr, cluster_arr, unit_ids @@ -1233,6 +1261,16 @@ def fit( f"for mass_point, off-support for continuous_near_" f"d_lower) estimand that is out of Phase 2a scope." ) + # Snap tolerance-accepted overrides back to the exact support + # infimum. Float-rounding drift matters downstream: on the + # mass-point path, `Z = d > d_lower` with d_lower = d.min() - ε + # puts the mass-point units into Z=1 (control group empties); + # on the continuous-near-d_lower path, d_lower = d.min() + ε + # makes `d - d_lower` negative and trips Phase 1c's + # _validate_had_inputs negative-dose guard. Snapping preserves + # the "within tolerance" contract while keeping downstream + # algebra exact. + d_lower_val = d_min # ---- Compute cohort counts ---- if resolved_design == "mass_point": diff --git a/tests/test_had.py b/tests/test_had.py index aff5abda..d4ca7d1a 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -659,6 +659,36 @@ def test_first_treat_col_invalid_cohort_raises(self): first_treat_col="ft", ) + def test_first_treat_col_mixed_row_nan_raises(self): + """Review P2 round 8: per-unit rows like [valid, NaN] must be rejected. + + `groupby().first()` silently skips NaNs; a unit with [0, NaN] + collapses to first_treat=0 and a unit-level NaN check would + pass. Row-level validation must catch the NaN on the bad row. + """ + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + # Unit-level first_treat all zero (never-treated); inject a NaN on + # exactly the second row of unit 0 (t_post row). 
+ panel["ft"] = 0.0 + unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "ft"] = np.nan + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit", first_treat_col="ft") + + def test_first_treat_col_mixed_row_invalid_value_raises(self): + """Per-unit rows like [valid, invalid_value] must be rejected.""" + d, dy = _dgp_continuous_at_zero(200, seed=0) + panel = _make_panel(d, dy) + panel["ft"] = 0.0 + # Inject an out-of-domain value on unit 0's post-period row. + unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "ft"] = 999.0 + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match=r"first_treat_col.*999"): + est.fit(panel, "outcome", "dose", "period", "unit", first_treat_col="ft") + # ============================================================================= # Criterion 8: NaN propagation @@ -1189,6 +1219,47 @@ def test_mass_point_d_lower_within_tolerance_succeeds(self): r = est.fit(panel, "outcome", "dose", "period", "unit") assert np.isfinite(r.att) + def test_mass_point_d_lower_below_min_within_tolerance_snaps(self): + """Review P1 round 8: tolerance-accepted d_lower = d.min() - ε must + be SNAPPED to d.min() so the instrument Z = d > d_lower matches + the exact-minimum case; otherwise mass-point units would fall + into Z=1 and empty the control group. + """ + d, dy = _dgp_mass_point(500, seed=0) + panel = _make_panel(d, dy) + d_lower_below = float(d.min()) - 1e-15 + r_below = HeterogeneousAdoptionDiD(design="mass_point", d_lower=d_lower_below).fit( + panel, "outcome", "dose", "period", "unit" + ) + r_exact = HeterogeneousAdoptionDiD(design="mass_point", d_lower=float(d.min())).fit( + panel, "outcome", "dose", "period", "unit" + ) + # Behavior must be identical within ULP (the snap collapses them). 
+ assert r_below.att == r_exact.att + assert r_below.se == r_exact.se + assert r_below.n_mass_point == r_exact.n_mass_point + assert r_below.n_above_d_lower == r_exact.n_above_d_lower + + def test_continuous_near_d_lower_above_within_tolerance_snaps(self): + """Review P1 round 8: tolerance-accepted d_lower = d.min() + ε on + continuous_near_d_lower must be SNAPPED so the regressor shift + `d - d_lower` does not produce negative doses and trip Phase 1c's + _validate_had_inputs negative-dose guard. + """ + d, dy = _dgp_continuous_near_d_lower(500, seed=0) + panel = _make_panel(d, dy) + d_lower_above = float(d.min()) + 1e-15 + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_above = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", d_lower=d_lower_above + ).fit(panel, "outcome", "dose", "period", "unit") + r_exact = HeterogeneousAdoptionDiD( + design="continuous_near_d_lower", d_lower=float(d.min()) + ).fit(panel, "outcome", "dose", "period", "unit") + assert r_above.att == r_exact.att + assert r_above.se == r_exact.se + def test_continuous_near_d_lower_above_min_raises(self): """Review P1: continuous_near_d_lower must also enforce support infimum.""" d, dy = _dgp_continuous_near_d_lower(500, seed=0) @@ -1329,6 +1400,21 @@ def test_nan_cluster_raises(self): with pytest.raises(ValueError, match="NaN"): est.fit(panel, "outcome", "dose", "period", "unit") + def test_mixed_row_nan_cluster_raises_on_mass_point(self): + """Review P2 round 8: a unit with rows [valid, NaN] on mass-point + must be rejected by row-level validation, not masked by + `groupby().first()`. + """ + d, dy = _dgp_mass_point(100, seed=0) + cluster_unit = np.repeat(np.arange(50).astype(float), 2) # all valid + panel = _make_panel(d, dy, extra_cols={"state": cluster_unit}) + # Inject NaN only on the second row (t_post) of unit 0. 
+ unit0_post_idx = panel[(panel["unit"] == 0) & (panel["period"] == 2)].index[0] + panel.loc[unit0_post_idx, "state"] = np.nan + est = HeterogeneousAdoptionDiD(design="mass_point", cluster="state") + with pytest.raises(ValueError, match="NaN"): + est.fit(panel, "outcome", "dose", "period", "unit") + def test_cluster_warns_on_continuous_path(self): d, dy = _dgp_continuous_at_zero(200, seed=0) cluster_unit = np.repeat(np.arange(100), 2) # length 200 unit-level From a3553ed287cf8695404aa90368c0839f4c9e6622 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 05:53:43 -0400 Subject: [PATCH 12/17] Address PR #346 CI review round 9: P2 robust warning + P3 docstring MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review 9 is overall ✅ Looks good; this PR addresses the remaining P2 and two P3s to close the loop cleanly. **P2 (Code Quality): `robust=True` silently ignored on continuous paths** The continuous dispatch block warns on `vcov_type` and `cluster` when they are supplied on a path that ignores them, but `robust=True` slipped through the same check. Since `robust=True` on the mass-point path is the backward-compat alias for `vcov_type="hc1"`, a user passing `robust=True` on `continuous_at_zero` / `continuous_near_d_lower` would reasonably expect it to do something; in reality the continuous paths always use the CCT-2014 robust SE from Phase 1c and the flag has no effect. Fix: extend the ignore-warning block to emit a `UserWarning` when `robust=True` and `resolved_design` is continuous. **P3 (Maintainability): _validate_had_panel return docstring stale** The Returns section said ``(t_pre, t_post)`` is returned in "min then max" order, but the implementation now uses the HAD dose invariant (the all-zero-dose period is `t_pre` regardless of label order). Updated the docstring to describe the identified semantic order and the arbitrary-dtype-label support (int, str, datetime). 
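The dose-invariant period identification that the updated docstring describes can be sketched as follows. `identify_periods` here is a hypothetical stand-in (not the library's actual `_validate_had_panel`), shown only to illustrate why lexicographic sort is the wrong rule once period labels have arbitrary dtype:

```python
import pandas as pd

def identify_periods(data, time_col, dose_col):
    """Pick (t_pre, t_post) by the HAD dose invariant, not by sort order."""
    periods = list(data[time_col].unique())
    assert len(periods) == 2, "Phase 2a requires exactly two periods"
    # t_pre is the period whose dose is zero for every unit.
    zero_dose = [p for p in periods
                 if (data.loc[data[time_col] == p, dose_col] == 0).all()]
    t_pre = zero_dose[0]
    t_post = periods[0] if periods[1] == t_pre else periods[1]
    return t_pre, t_post

# String labels where the POST period sorts lexicographically FIRST:
df = pd.DataFrame({
    "period": ["post", "pre"] * 3,
    "dose":   [0.5, 0.0, 0.7, 0.0, 0.2, 0.0],
})
t_pre, t_post = identify_periods(df, "period", "dose")
assert (t_pre, t_post) == ("pre", "post")  # sorted() would flip them
```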
**Tests (+2 regression):** - test_robust_true_ignored_on_continuous_warns: `robust=True` on continuous path emits a warning containing "robust". - test_robust_false_silent_on_continuous: `robust=False` (the default) on continuous path does NOT emit the `robust=True is ignored` warning, keeping the default construction silent. Targeted regression: 152 HAD tests + 531 total across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 18 ++++++++++++++++-- tests/test_had.py | 31 +++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 4a8edb28..97c8a799 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -385,8 +385,12 @@ def _validate_had_panel( Returns ------- - tuple[int, int] - ``(t_pre, t_post)`` - the two period identifiers, min then max. + tuple[Any, Any] + ``(t_pre, t_post)`` - the two period identifiers identified by + the HAD dose invariant (``t_pre`` is the period with dose == 0 + for all units; ``t_post`` is the other period). Supports + arbitrary-dtype period labels (int, str, datetime, etc.) rather + than relying on ordinal / lexicographic sort. 
Raises ------ @@ -1324,6 +1328,16 @@ def fit( UserWarning, stacklevel=2, ) + if robust_arg: + warnings.warn( + f"robust=True is ignored on the '{resolved_design}' " + f"path (the continuous designs use the CCT-2014 " + f"robust SE from Phase 1c unconditionally; the " + f"robust flag is a mass-point-only backward-compat " + f"alias for vcov_type).", + UserWarning, + stacklevel=2, + ) if cluster_arg is not None: warnings.warn( f"cluster={cluster_arg!r} is ignored on the " diff --git a/tests/test_had.py b/tests/test_had.py index d4ca7d1a..449a1256 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -529,6 +529,37 @@ def test_vcov_type_ignored_on_continuous(self): assert any("ignored" in str(warn.message).lower() for warn in w) assert np.isfinite(r.att) + def test_robust_true_ignored_on_continuous_warns(self): + """Review P2 round 9: robust=True on continuous path must warn. + + The continuous designs use the CCT-2014 robust SE unconditionally; + robust= is a mass-point-only backward-compat alias for vcov_type. + Passing robust=True on a continuous path has no effect on the + computed SE, so the user must get a warning that the flag was + ignored. 
+ """ + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", robust=True) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + robust_warnings = [warn for warn in w if "robust" in str(warn.message).lower()] + assert len(robust_warnings) >= 1 + assert np.isfinite(r.att) + + def test_robust_false_silent_on_continuous(self): + """robust=False (the default) on continuous path emits no robust-warn.""" + d, dy = _dgp_continuous_at_zero(300, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", robust=False) + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + r = est.fit(panel, "outcome", "dose", "period", "unit") + robust_warnings = [warn for warn in w if "robust=True is ignored" in str(warn.message)] + assert len(robust_warnings) == 0 + assert np.isfinite(r.att) + # ============================================================================= # Criterion 7: Panel-contract violations From 98b82017c58f0b32bbcb29aad4a9b47e4d49c4f2 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:35:21 -0400 Subject: [PATCH 13/17] Address PR #346 CI review round 10: P1 sklearn-compatible get/set_params **P1 (Code Quality): sklearn parameter contract** The class docstring advertises `sklearn.base.clone(est)` compatibility, but the actual methods did not match sklearn's `BaseEstimator.get_params(deep=True)` / `set_params(**params)` surface: - `get_params()` did not accept a `deep` keyword. sklearn's `clone` calls `get_params(deep=False)`, so any integration would have failed with a TypeError on the missing kwarg. - `set_params()` validated keys with `hasattr(self, key)`. That would silently accept non-constructor attribute names like `fit`, and a typo or malicious kwargs dict could overwrite estimator methods. 
Fix: - `get_params(self, deep: bool = True)` matches sklearn's signature. `deep` is accepted for compat; this estimator has no nested sub-estimators, so `deep=True` and `deep=False` return the same dict. `del deep` documents the no-op explicitly and silences unused-arg linters. - `set_params(**params)` now restricts to keys from `get_params()`. Non-constructor attribute names (including method names like `fit` and dunder/private attrs) raise `ValueError`. **Tests (+4 regression):** - test_set_params_rejects_method_names: `set_params(fit=...)` raises and `est.fit` stays callable. - test_set_params_rejects_private_attrs: `_internal=42` raises. - test_get_params_accepts_deep_keyword: `deep=True`, `deep=False`, and no-arg all return the same dict. - test_sklearn_clone_round_trip_if_available: `sklearn.base.clone` round-trips the estimator; gated on `pytest.importorskip("sklearn")` so it skips cleanly when sklearn is not in the test matrix. Targeted regression: 155 HAD tests (+ 1 skipped) + 534 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 32 ++++++++++++++++++++++++-------- tests/test_had.py | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 8 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 97c8a799..865b7b93 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -994,13 +994,23 @@ def _validate_constructor_args(self) -> None: f"or None." ) - def get_params(self) -> Dict[str, Any]: + def get_params(self, deep: bool = True) -> Dict[str, Any]: """Return the raw constructor parameters (sklearn-compatible). - Preserves the user's original inputs - in particular, ``design`` - returns ``"auto"`` when the user set it to ``"auto"`` (even after - fit), so ``sklearn.base.clone(est)`` round-trips exactly. + Matches the :meth:`sklearn.base.BaseEstimator.get_params` + signature. 
Preserves the user's original inputs - in particular, + ``design`` returns ``"auto"`` when the user set it to ``"auto"`` + (even after fit), so ``sklearn.base.clone(est)`` round-trips + exactly. + + Parameters + ---------- + deep : bool, default=True + Accepted for sklearn-contract compatibility. This estimator + has no nested sub-estimator parameters, so ``deep=False`` + and ``deep=True`` return the same dict. """ + del deep # accepted for compat; this estimator has no nested params return { "design": self.design, "d_lower": self.d_lower, @@ -1012,12 +1022,18 @@ def get_params(self) -> Dict[str, Any]: } def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": - """Set estimator parameters and return self (sklearn-compatible).""" + """Set estimator parameters and return self (sklearn-compatible). + + Only keys returned by :meth:`get_params` are accepted. Passing + any other attribute name (including method names like ``fit``) + raises ``ValueError`` so the estimator cannot be silently + corrupted by a mistyped or attacker-supplied key. + """ + valid_keys = set(self.get_params().keys()) for key, value in params.items(): - if not hasattr(self, key): + if key not in valid_keys: raise ValueError( - f"Invalid parameter: {key}. Valid parameters: " - f"{list(self.get_params().keys())}." + f"Invalid parameter: {key!r}. Valid parameters: " f"{sorted(valid_keys)}." ) setattr(self, key, value) self._validate_constructor_args() diff --git a/tests/test_had.py b/tests/test_had.py index 449a1256..3c71ca41 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -849,6 +849,46 @@ def test_set_params_invalid_key_raises(self): with pytest.raises(ValueError, match="Invalid parameter"): est.set_params(not_a_param=True) + def test_set_params_rejects_method_names(self): + """Review P1 round 10: set_params must restrict to constructor keys, + not any hasattr-able name. Method names like 'fit' must raise, + else they would silently overwrite the method. 
+ """ + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(fit="not_a_method") + # sanity: fit is still callable on the class + assert callable(est.fit) + + def test_set_params_rejects_private_attrs(self): + """Internal-looking attribute names must also raise.""" + est = HeterogeneousAdoptionDiD() + with pytest.raises(ValueError, match="Invalid parameter"): + est.set_params(_internal=42) + + def test_get_params_accepts_deep_keyword(self): + """Review P1 round 10: get_params must match sklearn's signature. + + sklearn.base.BaseEstimator.get_params(deep=True). This estimator + has no nested sub-estimators, so deep=True and deep=False return + the same dict, but the keyword must be accepted. + """ + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", alpha=0.1) + params_default = est.get_params() + params_deep_true = est.get_params(deep=True) + params_deep_false = est.get_params(deep=False) + assert params_default == params_deep_true == params_deep_false + + def test_sklearn_clone_round_trip_if_available(self): + """If sklearn is installed, sklearn.base.clone round-trips the estimator.""" + sklearn_base = pytest.importorskip("sklearn.base") + est = HeterogeneousAdoptionDiD(design="auto", alpha=0.1, kernel="triangular") + cloned = sklearn_base.clone(est) + assert cloned.get_params() == est.get_params() + assert cloned is not est + # clone produces a fresh instance of the same class. 
+ assert type(cloned) is type(est) + def test_set_params_invalid_design_raises(self): est = HeterogeneousAdoptionDiD() with pytest.raises(ValueError, match="design"): From c67505156b94fb557a69441349a43ee3d9b21c6b Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:48:27 -0400 Subject: [PATCH 14/17] Address PR #346 CI review round 11: P2 atomic set_params + P3 doc example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 11 is ✅ Looks good overall; this PR closes the remaining P2 and P3 for cleanliness. **P2 (Code Quality): set_params must be atomic** The previous implementation applied `setattr()` inside the validation loop and called `_validate_constructor_args()` only afterward. A multi-key call like `set_params(alpha=0.1, design="made_up")` would mutate `self.alpha` before raising on the invalid design, leaving the estimator half-updated. Callers that catch `ValueError` and retry would see the wrong state. Fix: dry-run validation via `type(self)(**merged)` before any attribute write. If the constructor raises on the combined state, `self` is not mutated. Mutations are applied only after validation passes, so rejection leaves the estimator fully rollback-safe. **P3 (Documentation/Tests): docstring example was non-runnable** The class docstring example constructed data via `generate_continuous_did_data()` which defaults to `n_periods=4`, while `HeterogeneousAdoptionDiD.fit()` requires exactly two periods in Phase 2a. A user copying the example verbatim would hit the two-period rejection. Fix: rewrote the example to build a two-period HAD panel by hand with an explicit `D_{g,1} = 0` pre-period column, a unit-level `dose_post`, and `delta_y` so the shape matches the estimator's contract exactly. Kept `doctest: +SKIP` markers on the RNG and fit lines. **Tests (+3 rollback regressions):** - test_set_params_rollback_on_failure: multi-key call with invalid design keeps estimator unchanged. 
- test_set_params_rollback_on_invalid_key: unknown kwarg keeps estimator unchanged. - test_set_params_rollback_on_invalid_alpha: out-of-range alpha keeps estimator unchanged. Targeted regression: 158 HAD tests (+1 sklearn-gated skip) + 537 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 46 +++++++++++++++++++++++++++++++++++++--------- tests/test_had.py | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 9 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index 865b7b93..40cebe1d 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -940,12 +940,28 @@ class HeterogeneousAdoptionDiD: Examples -------- + Construct a two-period HAD panel by hand. Phase 2a requires exactly + two periods with ``D_{g,1} = 0`` for every unit. + + >>> import numpy as np >>> import pandas as pd - >>> from diff_diff import HeterogeneousAdoptionDiD, generate_continuous_did_data - >>> data = generate_continuous_did_data(n_units=500, seed=42) # doctest: +SKIP + >>> from diff_diff import HeterogeneousAdoptionDiD + >>> rng = np.random.default_rng(42) # doctest: +SKIP + >>> G = 500 # doctest: +SKIP + >>> dose_post = rng.uniform(0.0, 1.0, G) # doctest: +SKIP + >>> dose_post[0] = 0.0 # at least one zero-dose unit for Design 1' + >>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G) # doctest: +SKIP + >>> data = pd.DataFrame({ # doctest: +SKIP + ... "unit": np.repeat(np.arange(G), 2), + ... "period": np.tile([1, 2], G), + ... "dose": np.column_stack([np.zeros(G), dose_post]).ravel(), + ... "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(), + ... }) >>> est = HeterogeneousAdoptionDiD(design="auto") # doctest: +SKIP - >>> result = est.fit(data, outcome_col="outcome", dose_col="dose", - ... time_col="period", unit_col="unit") # doctest: +SKIP + >>> result = est.fit( # doctest: +SKIP + ... data, outcome_col="outcome", dose_col="dose", + ... 
time_col="period", unit_col="unit", + ... ) >>> result.design # doctest: +SKIP 'continuous_at_zero' """ @@ -1028,15 +1044,27 @@ def set_params(self, **params: Any) -> "HeterogeneousAdoptionDiD": any other attribute name (including method names like ``fit``) raises ``ValueError`` so the estimator cannot be silently corrupted by a mistyped or attacker-supplied key. + + Mutation is ATOMIC: validation runs on a proposed merged + parameter dict before any attribute is overwritten. A failing + call (invalid key, or an otherwise valid key whose value + violates the constructor constraints) leaves ``self`` unchanged + and safe to reuse. """ valid_keys = set(self.get_params().keys()) + invalid = [k for k in params if k not in valid_keys] + if invalid: + raise ValueError( + f"Invalid parameter: {invalid[0]!r}. Valid parameters: " f"{sorted(valid_keys)}." + ) + # Dry-run validation by constructing a fresh instance with the + # merged state. If the constructor raises, self is not mutated. + merged = self.get_params() + merged.update(params) + type(self)(**merged) # raises ValueError on invalid combination + # All checks passed; apply atomically. for key, value in params.items(): - if key not in valid_keys: - raise ValueError( - f"Invalid parameter: {key!r}. Valid parameters: " f"{sorted(valid_keys)}." - ) setattr(self, key, value) - self._validate_constructor_args() return self # ------------------------------------------------------------------ diff --git a/tests/test_had.py b/tests/test_had.py index 3c71ca41..343c2381 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -894,6 +894,38 @@ def test_set_params_invalid_design_raises(self): with pytest.raises(ValueError, match="design"): est.set_params(design="made_up") + def test_set_params_rollback_on_failure(self): + """Review P2 round 11: set_params must be ATOMIC. 
+ + A failing call (valid key but value violates constructor + constraints) must leave the estimator unchanged so the caller + can catch the ValueError and reuse the object. + """ + est = HeterogeneousAdoptionDiD(alpha=0.05, design="continuous_at_zero") + baseline = est.get_params() + # Multi-key call where alpha is valid but design is invalid. + # The old (non-atomic) code would have set alpha before raising + # on design, leaving the estimator half-mutated. + with pytest.raises(ValueError): + est.set_params(alpha=0.1, design="garbage_design") + assert est.get_params() == baseline + + def test_set_params_rollback_on_invalid_key(self): + """Rejecting an unknown key must leave self unchanged.""" + est = HeterogeneousAdoptionDiD(alpha=0.05) + baseline = est.get_params() + with pytest.raises(ValueError): + est.set_params(alpha=0.1, not_a_param=True) + assert est.get_params() == baseline + + def test_set_params_rollback_on_invalid_alpha(self): + """alpha outside (0, 1) must leave self unchanged.""" + est = HeterogeneousAdoptionDiD(alpha=0.05, design="continuous_at_zero") + baseline = est.get_params() + with pytest.raises(ValueError): + est.set_params(alpha=1.5, kernel="triangular") + assert est.get_params() == baseline + # ============================================================================= # Criterion 10: Scaffolding raises From 07ab30f0b736d3a5fea6b1df97273412984e4ba5 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 19:58:58 -0400 Subject: [PATCH 15/17] Address PR #346 CI review round 12: P1 reject nonzero d_lower on continuous_at_zero **P1 (Methodology): continuous_at_zero silently ignored nonzero d_lower** `HeterogeneousAdoptionDiD.fit()` unconditionally set `d_lower_val = 0.0` on the `continuous_at_zero` path, so calls like `HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.5)` silently fit Design 1' at zero. 
This contradicted paper Section 3.2's Design 1' regime contract (`d_lower = 0`) and the class docstring's promise that mismatched overrides raise. It also created a parameter- contract mismatch: `get_params()` reported the user's nonzero `d_lower` while the fitted result reported `d_lower=0.0`. Fix: after `resolved_design == "continuous_at_zero"` is known, check whether the user supplied an explicit `d_lower`. If the absolute value exceeds the float-tolerance band (same family as the Design 1 d_lower guards below), raise `ValueError` pointing at `continuous_near_d_lower` / `mass_point` / `auto`. `d_lower=None` (auto-resolve) and `d_lower=0.0` (redundant but benign) continue to succeed. **Tests (+4 regression):** - test_continuous_at_zero_with_nonzero_d_lower_raises: d_lower=0.5 raises with "d_lower == 0" / "Design 1'" pointer. - test_continuous_at_zero_with_small_d_lower_raises: d_lower=0.01 also raises (not just large overrides). - test_continuous_at_zero_with_zero_d_lower_succeeds: d_lower=0.0 exactly is accepted (redundant-but-valid case). - test_auto_on_zero_sample_ignores_user_d_lower: design='auto' resolving to continuous_at_zero must ALSO reject explicit nonzero d_lower, not silently drop it. Targeted regression: 162 HAD tests (+1 sklearn-gated skip) + 541 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 18 ++++++++++++++++++ tests/test_had.py | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/diff_diff/had.py b/diff_diff/had.py index 40cebe1d..e0ea227c 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -1205,6 +1205,24 @@ def fit( # ---- Resolve d_lower ---- if resolved_design == "continuous_at_zero": + # Design 1' regime (paper Section 3.2) is defined at d_lower = 0. + # Reject explicit nonzero d_lower overrides front-door rather + # than silently coerce to zero. 
Tolerance family matches the + # Design 1 d_lower guards below. + if d_lower_arg is not None: + scale = max(1.0, float(np.max(np.abs(d_arr)))) + if abs(float(d_lower_arg)) > 1e-12 * scale: + raise ValueError( + f"design='continuous_at_zero' (Design 1') requires " + f"d_lower == 0 within float tolerance (paper Section " + f"3.2 Design 1' regime). Got d_lower=" + f"{float(d_lower_arg)!r}. For d_lower > 0 use " + f"design='continuous_near_d_lower' (local-linear " + f"boundary-limit estimator) or design='mass_point' " + f"(2SLS) as appropriate, or design='auto' which " + f"auto-detects the correct path from the dose " + f"distribution." + ) d_lower_val = 0.0 elif d_lower_arg is None: d_lower_val = float(d_arr.min()) diff --git a/tests/test_had.py b/tests/test_had.py index 343c2381..2aaddf36 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1272,6 +1272,46 @@ def test_force_mass_point_d_lower_none_on_zero_sample_raises(self): with pytest.raises(ValueError, match=r"d_lower > 0"): est.fit(panel, "outcome", "dose", "period", "unit") + def test_continuous_at_zero_with_nonzero_d_lower_raises(self): + """Review P1 round 12: continuous_at_zero must reject nonzero d_lower. + + Paper Section 3.2 Design 1' is defined at d_lower = 0; silently + coercing a user-supplied d_lower=0.5 to zero would contradict + the documented regime contract. 
+ """ + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.5) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_at_zero_with_small_d_lower_raises(self): + """Even a small nonzero d_lower should raise.""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.01) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + + def test_continuous_at_zero_with_zero_d_lower_succeeds(self): + """d_lower=0.0 exactly is fine (redundant but allowed).""" + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", d_lower=0.0) + r = est.fit(panel, "outcome", "dose", "period", "unit") + assert r.d_lower == 0.0 + assert np.isfinite(r.att) + + def test_auto_on_zero_sample_ignores_user_d_lower(self): + """design='auto' resolving to continuous_at_zero must ALSO reject + an explicit nonzero d_lower, not silently drop it. 
+ """ + d, dy = _dgp_continuous_at_zero(500, seed=0) + panel = _make_panel(d, dy) + est = HeterogeneousAdoptionDiD(design="auto", d_lower=0.5) + with pytest.raises(ValueError, match=r"d_lower == 0|Design 1'"): + est.fit(panel, "outcome", "dose", "period", "unit") + # ============================================================================= # Design 1 d_lower contract enforcement (mass-point + continuous_near_d_lower) From 2f6e1d0b3677ad0e5dd936818497bcd79358e24b Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 20:21:07 -0400 Subject: [PATCH 16/17] Address PR #346 CI review round 13: P1 reject non-finite d_lower + P3 docstring **P1 (Methodology): d_lower=NaN/inf bypassed comparison guards** The Design 1 regime guards (`d_lower > 0`, `d_lower == d.min()` within tolerance, `d_lower <= tol` for continuous_at_zero) all use comparison operators that return False for NaN. A user passing `d_lower=np.nan` therefore slipped through every check and ended up at `0.0` (continuous_at_zero branch) or `d.min()` (the snap). On a zero-support sample, that could silently re-enter the paper- incompatible continuous_near_d_lower / mass_point path the earlier review rounds had fixed. Fix: reject non-finite d_lower front-door in `_validate_constructor_args()`. This makes `__init__`, `set_params()`, and the atomic dry-run validation all share the same contract: `d_lower` must be None or a finite scalar. NaN and +/-inf raise `ValueError`. Atomic `set_params()` confirms that a failing `d_lower=NaN` call leaves the estimator unchanged. **P3 (Documentation/Tests): safe_inference coverage overstated** The result-class docstring and REGISTRY said "all 5 inference fields are routed through safe_inference". In reality, `safe_inference` only gates the downstream triple (t_stat, p_value, conf_int) - `att` and `se` are raw estimator outputs. 
The fit paths themselves return `(nan, nan)` on degenerate configurations, which combined with the safe_inference gate produces the advertised "all five fields NaN together" behavior, but the docstring wording was misleading. Fix: rewrote the Results docstring preamble and the REGISTRY.md Phase 2a NaN-propagation line to describe the two-layer NaN flow (fit-path NaN on att/se + safe_inference on the downstream triple) precisely. **Tests (+4 regression):** - test_d_lower_nan_raises: __init__ rejects NaN. - test_d_lower_posinf_raises / test_d_lower_neginf_raises: +/-inf rejected. - test_d_lower_nan_via_set_params_raises: set_params(d_lower=NaN) raises AND leaves the estimator unchanged (atomic rollback). Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 30 ++++++++++++++++++++++++++---- docs/methodology/REGISTRY.md | 2 +- tests/test_had.py | 22 ++++++++++++++++++++++ 3 files changed, 49 insertions(+), 5 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index e0ea227c..ff9e40a3 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -127,10 +127,17 @@ class HeterogeneousAdoptionDiDResults: """Estimator output for :class:`HeterogeneousAdoptionDiD`. - All inference fields (``att``, ``se``, ``t_stat``, ``p_value``, - ``conf_int``) are routed through :func:`diff_diff.utils.safe_inference` - at fit time; when SE is non-finite, zero, or negative, every inference - field is NaN. + NaN-safe inference: the three downstream fields ``t_stat``, + ``p_value``, and ``conf_int`` are routed through + :func:`diff_diff.utils.safe_inference`, which returns NaN on all + three whenever ``se`` is non-finite, zero, or negative. 
``att`` and + ``se`` themselves are raw estimator outputs - when the estimator's + fit path detects a degenerate configuration (constant outcome, + no-variation-above-``d_lower``, divide-by-zero), it returns + ``(att=nan, se=nan)`` directly, which the safe-inference gate then + propagates into the remaining three fields. Net effect: users can + safely check ``np.isfinite(result.se)`` and expect all five fields + to be finite or all NaN together on the current fit paths. Attributes ---------- @@ -992,6 +999,21 @@ def _validate_constructor_args(self) -> None: ) if not (0.0 < float(self.alpha) < 1.0): raise ValueError(f"alpha must be in (0, 1); got {self.alpha!r}.") + # d_lower must be None or a finite scalar. NaN / +/-inf would + # bypass every downstream comparison-based guard (`>`, `<=`, + # tolerance-abs-diff) because those return False for NaN, so we + # front-door reject non-finite values here. This fires under + # __init__ and set_params (which dry-runs through the + # constructor), keeping the contract uniform across all entry + # points. + if self.d_lower is not None: + d_lower_float = float(self.d_lower) + if not np.isfinite(d_lower_float): + raise ValueError( + f"d_lower must be None or a finite scalar; got " + f"d_lower={self.d_lower!r}. NaN and +/-inf are not " + f"valid support-infimum values." + ) if self.vcov_type is not None: if self.vcov_type.lower() in _MASS_POINT_VCOV_UNSUPPORTED: # Don't raise here — the raise happens at fit() time on diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 0c425a6b..fe2768af 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2319,7 +2319,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. 
Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. -- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) via `safe_inference` and `assert_nan_inference` fixture, covering constant-y and degenerate mass-point inputs. 
+- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) covering constant-y and degenerate mass-point inputs. The inner `safe_inference()` gate covers the downstream triple (`t_stat`, `p_value`, `conf_int`); `att` and `se` are set to `NaN` by the fit paths themselves when they detect a degenerate configuration (constant outcome, no-variation-above-`d_lower`, divide-by-zero), so the five fields move to `NaN` together and the `assert_nan_inference` fixture holds. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. 
The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. diff --git a/tests/test_had.py b/tests/test_had.py index 2aaddf36..99d4b5cb 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -1224,6 +1224,28 @@ def test_d_lower_float_accepted(self): est = HeterogeneousAdoptionDiD(d_lower=0.3) assert est.d_lower == 0.3 + def test_d_lower_nan_raises(self): + """Review P1 round 13: d_lower=NaN must be rejected in __init__.""" + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("nan")) + + def test_d_lower_posinf_raises(self): + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("inf")) + + def test_d_lower_neginf_raises(self): + with pytest.raises(ValueError, match=r"d_lower.*finite"): + HeterogeneousAdoptionDiD(d_lower=float("-inf")) + + def test_d_lower_nan_via_set_params_raises(self): + """d_lower=NaN through set_params must also raise (atomic rollback).""" + est = HeterogeneousAdoptionDiD(d_lower=0.3) + baseline = est.get_params() + with pytest.raises(ValueError, match=r"d_lower.*finite"): + est.set_params(d_lower=float("nan")) + # Atomic rollback: d_lower unchanged after failure. + assert est.get_params() == baseline + # ============================================================================= # Explicit design override (don't auto-reject) From 22d7f65f1ea126702ea97fbef2d03f1f8e5658db Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 21 Apr 2026 20:33:34 -0400 Subject: [PATCH 17/17] Address PR #346 CI review round 14: P3 tighten NaN-contract wording MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 14 is ✅ Looks good; this closes the final residual P3 for wording accuracy. 
**P3 (Docs): NaN-contract wording overstated** The prior wording said users can expect all five result fields to be finite or all NaN together. That is true on most degenerate paths (constant-y, no-dose-variation) but NOT on the single-cluster CR1 configuration of the mass-point path: `_fit_mass_point_2sls` returns `(att=beta_hat, se=nan)` there - `att` is finite because the Wald-IV ratio is well defined, while CR1 requires G >= 2 clusters so `se` is NaN. `safe_inference` then gates the downstream triple correctly, but the doc claim that `att` and `se` are coupled with the triple was inaccurate. Fix: rewrote the result-class docstring preamble and the REGISTRY Phase 2a NaN-propagation line to describe the invariant precisely: - GUARANTEED NaN coupling is on the downstream triple (`t_stat`, `p_value`, `conf_int`) via `safe_inference`. - `att` and `se` are RAW estimator outputs from the chosen fit path and are NOT gated by safe_inference. - On most degenerate paths, fit paths return `(nan, nan)` so all five move together, but on the single-cluster CR1 edge the fit path returns `(att=beta_hat, se=nan)` and only the downstream triple becomes NaN via the gate. No code behavior changes; the `assert_nan_inference` fixture already only checks the downstream triple, matching the restated contract. Targeted regression: 166 HAD tests (+1 sklearn-gated skip) + 545 total (+1 skipped) across Phase 1 and adjacent surfaces, all green. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/had.py | 27 ++++++++++++++++++++------- docs/methodology/REGISTRY.md | 2 +- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/diff_diff/had.py b/diff_diff/had.py index ff9e40a3..d9ba428c 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -131,13 +131,26 @@ class HeterogeneousAdoptionDiDResults: ``p_value``, and ``conf_int`` are routed through :func:`diff_diff.utils.safe_inference`, which returns NaN on all three whenever ``se`` is non-finite, zero, or negative. 
``att`` and - ``se`` themselves are raw estimator outputs - when the estimator's - fit path detects a degenerate configuration (constant outcome, - no-variation-above-``d_lower``, divide-by-zero), it returns - ``(att=nan, se=nan)`` directly, which the safe-inference gate then - propagates into the remaining three fields. Net effect: users can - safely check ``np.isfinite(result.se)`` and expect all five fields - to be finite or all NaN together on the current fit paths. + ``se`` themselves are RAW estimator outputs from the chosen fit + path and are NOT gated by ``safe_inference``: + + - On the degenerate fit configurations (constant outcome on the + continuous paths, all-units-at-d_lower / no-dose-variation on the + mass-point path), the fit path explicitly returns + ``(att=nan, se=nan)``, which combined with the safe-inference + gate yields all five fields NaN together. + - On the degenerate CR1 cluster configuration (mass-point path + with a single cluster), ``_fit_mass_point_2sls`` returns + ``(att=beta_hat, se=nan)`` - ``att`` stays finite because the + Wald-IV ratio is well defined, but the cluster-robust SE is + not, so ``se`` is NaN and the downstream triple becomes NaN + via the safe-inference gate. + + So the guaranteed NaN coupling is on the downstream triple + (``t_stat``, ``p_value``, ``conf_int``), not on ``att``. The + ``assert_nan_inference`` fixture in ``tests/conftest.py`` checks + the downstream triple against the gate contract and does not + assume ``att`` is NaN. 
Attributes ---------- diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index fe2768af..5055e2ec 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2319,7 +2319,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step - [x] Phase 2a: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1' (`continuous_at_zero`), Design 1 continuous-near-`d̲` (`continuous_near_d_lower`), and Design 1 mass-point. Continuous paths compose Phase 1c's `bias_corrected_local_linear` and form the beta-scale WAS estimate `β̂ = (mean(ΔY) - τ̂_bc) / den` where `τ̂_bc` is the bias-corrected local-linear estimate of the boundary limit `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` and `den = E[D_2]` for Design 1' (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator) or `den = E[D_2 - d̲]` for Design 1 (paper Theorem 3 / Equation 11, `WAS_{d̲}` under Assumption 6). Mass-point path uses a sample-average 2SLS estimator with instrument `1{D_{g,2} > d̲}` (paper Section 3.2.4). - [x] Phase 2a: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_d_lower). Implemented as strict first-match in `diff_diff.had._detect_design`; when `d.min() == 0` exactly, resolves `continuous_at_zero` unconditionally (modal-min check runs only when `d.min() > 0`). Edge case covered: 3% at `D=0` + 97% `Uniform(0.5, 1)` resolves to `continuous_at_zero`, matching the paper-endorsed Design 1' handling of small-share-of-treated samples. - [x] Phase 2a: Panel validator (`diff_diff.had._validate_had_panel`) verifies `D_{g,1} = 0` for all units, rejects negative post-period doses (`D_{g,2} < 0`) front-door on the original (unshifted) scale, rejects `>2` time periods (staggered reduction queued for Phase 2b), and rejects unbalanced panels and NaN in outcome/dose/unit columns. 
Both Design 1 paths (`continuous_near_d_lower` and `mass_point`) additionally require `d_lower == float(d.min())` within float tolerance; mismatched overrides raise with a pointer to the unsupported (LATE-like / off-support) estimand. -- [x] Phase 2a: NaN-propagation tests across all 5 inference fields (`att`, `se`, `t_stat`, `p_value`, `conf_int`) covering constant-y and degenerate mass-point inputs. The inner `safe_inference()` gate covers the downstream triple (`t_stat`, `p_value`, `conf_int`); `att` and `se` are set to `NaN` by the fit paths themselves when they detect a degenerate configuration (constant outcome, no-variation-above-`d_lower`, divide-by-zero), so the five fields move to `NaN` together and the `assert_nan_inference` fixture holds. +- [x] Phase 2a: NaN-propagation tests covering constant-y, degenerate-mass-point, and single-cluster-CR1 inputs. The guaranteed NaN coupling is on the DOWNSTREAM triple (`t_stat`, `p_value`, `conf_int`) via the `safe_inference()` gate, which returns NaN on all three whenever `se` is non-finite, zero, or negative. `att` and `se` themselves are raw estimator outputs: on constant-y / no-dose-variation / divide-by-zero the fit paths return `(att=nan, se=nan)` so all five fields move to NaN together; on the degenerate single-cluster CR1 configuration on the mass-point path, `_fit_mass_point_2sls` returns `(att=beta_hat, se=nan)` - `att` is finite (Wald-IV is well defined) while `se` is NaN, so the downstream triple is NaN while `att` remains the raw 2SLS coefficient. The `assert_nan_inference` fixture in `tests/conftest.py` checks the downstream triple against this contract without requiring `att` to be NaN. - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). 
Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate.
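For reviewers who want to sanity-check the mass-point SE note above, here is a minimal numpy sketch of the structural-residual 2SLS sandwich it describes. This is an illustration only, not the library's `_fit_mass_point_2sls`: the function name, signature, and toy data below are hypothetical, and it covers just the `classical`/`hc1` branches (with `hc2`-style requests raising `NotImplementedError`, mirroring the note's pending-leverage-derivation stance).

```python
import numpy as np

def mass_point_2sls_sketch(dy, d, d_lower, vcov="hc1"):
    """Illustrative structural-residual 2SLS sandwich (NOT library code).

    Structural equation: dY = alpha + beta * D + u, instrumented by
    z = 1{D > d_lower}. Just-identified, so beta_hat equals the Wald-IV ratio.
    """
    n = len(dy)
    X = np.column_stack([np.ones(n), d])                            # structural regressors [1, D]
    Z = np.column_stack([np.ones(n), (d > d_lower).astype(float)])  # instruments [1, 1{D > d_lower}]
    ZX_inv = np.linalg.inv(Z.T @ X)                                 # (Z'X)^{-1}
    alpha_hat, beta_hat = ZX_inv @ (Z.T @ dy)
    u = dy - alpha_hat - beta_hat * d                               # STRUCTURAL residuals, not reduced-form
    if vcov == "classical":
        omega = (u @ u / (n - 2)) * (Z.T @ Z)                       # homoskedastic meat
    elif vcov == "hc1":
        omega = (n / (n - 2)) * (Z * (u ** 2)[:, None]).T @ Z       # HC1 meat on the instruments
    else:
        # hc2 / hc2_bm: OLS leverage x_i'(X'X)^{-1}x_i is wrong for 2SLS
        raise NotImplementedError(vcov)
    V = ZX_inv @ omega @ ZX_inv.T                                   # [Z'X]^{-1} Omega [Z'X]^{-T}
    return beta_hat, float(np.sqrt(V[1, 1]))
```

Because the design is just-identified, the point estimate reduces to the Wald ratio (difference in mean `ΔY` above vs. at `d_lower`, over the difference in mean `D`), so the sketch can be checked against that ratio directly; the key design choice the note documents is that `Ω` is built from `u = ΔY - α̂ - β̂·D`, not from residuals of an OLS of `ΔY` on the indicator.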