diff --git a/CHANGELOG.md b/CHANGELOG.md index 5ee0ce38..15a8ddeb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Changed +- **HAD survey-design API consolidated to single `survey_design=` kwarg** across all 8 HAD surfaces: `HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`. Matches the rest of the library (`ContinuousDiD`, `EfficientDiD`, `ChaisemartinDHaultfoeuille` already used `survey_design=`). On data-in surfaces (HAD.fit, workflow, joint data-in wrappers) `survey_design=` accepts a `SurveyDesign` instance (column references resolved against `data` at fit time, same convention as the rest of the library). On the three array-in linearity helpers (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`) `survey_design=` accepts a pre-resolved `ResolvedSurveyDesign`; passing a `SurveyDesign` raises `TypeError` with migration guidance to `make_pweight_design(arr)` (pweight-only) or pre-resolution. `qug_test` is the 8th surface and accepts the same kwarg signature for consistency, but **all** non-`None` values raise `NotImplementedError` per the Phase 4.5 C0 permanent deferral (no migration path; the qug-specific mutex error reflects this). New public helper `make_pweight_design(weights: np.ndarray) -> ResolvedSurveyDesign` exported from the `diff_diff` top level for the pweight-only convenience on the three array-in linearity helpers (formerly the private `survey._make_trivial_resolved`, kept as a permanent private alias); validates 1-D input at the front door. Three-way mutex (`survey_design + survey + weights`) extends the prior 2-way (`survey + weights`) — at most one may be non-None per call. Patch-level addition (additive new kwarg + permanent alias for the helper; no breaking changes this release). 
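The three-way mutex and alias routing described in the entry above can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the library's implementation: `resolve_survey_kwargs` and its message strings are hypothetical names invented for this sketch, and the legacy `weights` value is returned as-is rather than wrapped through `make_pweight_design`.

```python
import warnings
from typing import Any, Optional, Sequence


def resolve_survey_kwargs(
    survey_design: Any = None,
    survey: Any = None,
    weights: Optional[Sequence[float]] = None,
) -> Any:
    """Hypothetical helper: collapse the three kwargs into one canonical value."""
    # Three-way mutex: at most one of the three may be non-None per call.
    n_set = sum(x is not None for x in (survey_design, survey, weights))
    if n_set > 1:
        raise ValueError(
            "pass at most one of survey_design=, survey=, weights="
        )

    # Deprecated aliases still work, but emit a DeprecationWarning that
    # points at the caller's frame (stacklevel=2).
    if survey is not None:
        warnings.warn(
            "survey= is deprecated; use survey_design=",  # hypothetical text
            DeprecationWarning,
            stacklevel=2,
        )
        return survey
    if weights is not None:
        warnings.warn(
            "weights= is deprecated; use survey_design=",  # hypothetical text
            DeprecationWarning,
            stacklevel=2,
        )
        return weights

    # Canonical path: may be None (unweighted) or a design object.
    return survey_design
```

Counting non-`None` arguments, rather than chaining pairwise checks, keeps the guard to a single condition if another alias is ever added.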
+ +### Deprecated +- **`HeterogeneousAdoptionDiD.fit(survey=, weights=)`, `did_had_pretest_workflow(survey=, weights=)`, and the 6 HAD pretest helpers' `survey=` / `weights=` kwargs are deprecated** in favor of the canonical `survey_design=`. Emits `DeprecationWarning` with migration guidance; the deprecated kwargs continue to route through the unchanged legacy back-end paths so numerical results are identical to pre-PR (bit-exact regression locked by parity tests in `tests/test_had_dual_knob_deprecation.py`). Both `survey=` and `weights=` will be removed in the next minor release. **Carve-out for `qug_test`**: the deprecation is kwarg-name-consolidation only; `qug_test` permanently rejects all non-`None` `survey_design` / `survey` / `weights` values (Phase 4.5 C0 deferral) and `make_pweight_design(arr)` is NOT a valid migration target — the deprecation warning text on `qug_test` is qug-specific and points users to `did_had_pretest_workflow(..., survey_design=...)` for survey-aware HAD pretesting (which skips the QUG step under survey). + ### Added - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. 
Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. 
Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology. - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). 
**SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. diff --git a/TODO.md b/TODO.md index 0c420b92..8aeff2a2 100644 --- a/TODO.md +++ b/TODO.md @@ -99,6 +99,7 @@ Deferred items from PR reviews that were not addressed before merge. | `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low | | `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. 
| `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low | | `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium | +| `HeterogeneousAdoptionDiD` survey-design API consolidation, **next minor bump**: drop the deprecated `survey=` and `weights=` kwargs on all 8 HAD surfaces (`HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); only `survey_design=` remains. Also fold the legacy back-end `weights=` paths (e.g. `_aggregate_unit_weights` ad-hoc routing) into the unified `_resolve_survey_for_fit`-driven path. The `_make_trivial_resolved` underscore alias on `survey.py` stays (one-line, harmless). DeprecationWarning ships in this PR; the removal PR is ~50 LoC of cleanup. | `diff_diff/had.py`, `diff_diff/had_pretests.py` | next minor bump | Medium | | `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` Eq 18 linear-trend detrending (Pierce-Schott style): the joint-Stute infrastructure shipped in the Phase 3 follow-up supports pre-trends (mean-indep) and post-homogeneity (linearity) nulls. 
The Pierce-Schott application (paper Section 5.2) uses a LINEAR-TREND detrending of pre-period outcomes before the joint CvM — `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})` — reaching p=0.51 on US-China tariff data. Extends `joint_pretrends_test` with a detrending mode or a separate Eq 18-specific helper. Deferred to Phase 4 replication harness (where the published p=0.51 serves as the parity anchor). | `diff_diff/had_pretests.py::joint_pretrends_test` | Phase 4 | Medium | | `HeterogeneousAdoptionDiD` Phase 3 Stute performance: Appendix D vectorized matrix form replaces the per-iteration OLS refit with a single precomputed `M = I - X(X'X)^{-1}X'` applied to `eps * eta`. Functionally identical, ~2x faster. Shipped literal-refit form in Phase 3 to match paper text and keep reviewer surface small. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low | diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index ccf425f3..e95ec008 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -151,6 +151,7 @@ SurveyDesign, SurveyMetadata, compute_deff_diagnostics, + make_pweight_design, ) from diff_diff.staggered import ( CallawaySantAnna, @@ -445,6 +446,7 @@ "SurveyMetadata", "DEFFDiagnostics", "compute_deff_diagnostics", + "make_pweight_design", # Rust backend "HAS_RUST_BACKEND", # Linear algebra helpers diff --git a/diff_diff/had.py b/diff_diff/had.py index 84ac8963..b7a77f05 100644 --- a/diff_diff/had.py +++ b/diff_diff/had.py @@ -76,7 +76,13 @@ BiasCorrectedFit, bias_corrected_local_linear, ) -from diff_diff.survey import SurveyMetadata, compute_survey_metadata +from diff_diff.survey import ( + HAD_DEPRECATION_MSG_SURVEY_KWARG, + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_HAD_FIT, + HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN, + SurveyMetadata, + compute_survey_metadata, +) from diff_diff.utils import safe_inference __all__ = [ @@ -695,10 +701,13 @@ class HeterogeneousAdoptionDiDEventStudyResults: # fits stay unchanged; all None on 
unweighted fits). variance_formula: Optional[str] = None """Per-horizon variance family label (applied uniformly across all - horizons in the fit). One of ``"pweight"`` / ``"pweight_2sls"`` - (weights= shortcut; continuous / mass-point), ``"survey_binder_tsl"`` - / ``"survey_binder_tsl_2sls"`` (survey= path), or ``None`` on - unweighted fits. Mirrors the static-path ``variance_formula`` field.""" + horizons in the fit). One of ``"pweight"`` / ``"pweight_2sls"`` (when + a per-row weight array was supplied, including via the deprecated + ``weights=`` alias; continuous / mass-point), ``"survey_binder_tsl"`` + / ``"survey_binder_tsl_2sls"`` (when a SurveyDesign was supplied via + ``survey_design=`` or the deprecated ``survey=`` alias), or ``None`` + on unweighted fits. Mirrors the static-path ``variance_formula`` + field.""" effective_dose_mean: Optional[float] = None """Weighted denominator used by the β̂-scale rescaling. For continuous designs: weighted ``sum(w · d)/sum(w)`` (continuous_at_zero) or @@ -2783,9 +2792,15 @@ def fit( unit_col: str, first_treat_col: Optional[str] = None, aggregate: str = "overall", + # PR #376 R4 P1: preserve pre-PR positional-or-keyword status of + # `survey`, `weights`, `cband` for back-compat with positional + # callers. `survey_design=` is the only new addition and is + # keyword-only. survey: Any = None, weights: Optional[np.ndarray] = None, cband: bool = True, + *, + survey_design: Any = None, ) -> HeterogeneousAdoptionDiDResults: """Fit the HAD estimator. @@ -2835,66 +2850,123 @@ def fit( CIs per horizon; joint cross-horizon covariance is deferred to a follow-up PR. Staggered-timing panels are auto-filtered to the last-treatment cohort with a ``UserWarning``. - survey : SurveyDesign or None + survey_design : SurveyDesign or None, keyword-only Survey design (sampling weights + optional strata / PSU / FPC) - for design-based inference on the two continuous-dose paths - (``continuous_at_zero``, ``continuous_near_d_lower``). 
Passes - through :func:`compute_survey_if_variance` (Binder 1983 TSL) - for the SE; weights propagate pointwise into the lprobust - kernel composition. Only ``weight_type="pweight"`` is - supported in Phase 4.5 A — ``aweight`` / ``fweight`` raise - ``NotImplementedError``. Survey design columns (strata / PSU / - FPC) must be constant within unit (sampling-unit-level - assignment); within-unit variance raises ``ValueError``. - Replicate-weight designs raise ``NotImplementedError`` - (Phase 4.5 C). Phase 4.5 B support matrix: survey / weights - are now accepted on ALL design × aggregate combinations - (continuous × {overall, event-study}, mass-point × {overall, - event-study}); HAD pretests (``qug_test``, ``stute_test``, - ``yatchew_hr_test``, joint variants, - ``did_had_pretest_workflow``) still don't accept - survey/weights — deferred to Phase 4.5 C / C0. + for design-based inference. Supported on ALL design × aggregate + combinations after Phase 4.5 B: continuous paths + (``continuous_at_zero``, ``continuous_near_d_lower``) on both + ``aggregate="overall"`` and ``aggregate="event_study"``, AND + the ``mass_point`` design on both aggregates. Continuous paths + compose the SE via :func:`compute_survey_if_variance` (Binder + 1983 TSL); weights propagate pointwise into the lprobust + kernel. Mass-point composes the per-unit 2SLS IF on the + HC1-scale and Binder-TSL-aggregates that — requires + ``vcov_type='hc1'`` (the classical default raises + ``NotImplementedError`` on the survey path). Event-study fits + with ``cband=True`` add a multiplier-bootstrap simultaneous + confidence band. Only ``weight_type="pweight"`` is supported + (``aweight`` / ``fweight`` raise ``NotImplementedError``). + Survey design columns (strata / PSU / FPC) must be constant + within unit (sampling-unit-level assignment); within-unit + variance raises ``ValueError``. Replicate-weight designs raise + ``NotImplementedError``. 
Mutually exclusive with the deprecated + ``survey=`` and ``weights=`` aliases. See + ``docs/methodology/REGISTRY.md`` § HeterogeneousAdoptionDiD — + "Note (HAD survey-design API consolidation)" for the full + dispatch matrix. + survey : SurveyDesign or None + DEPRECATED alias of ``survey_design=``. Remains positional-or- + keyword for one minor cycle to preserve pre-PR call shapes; + will be removed in the next minor release. Prefer + ``survey_design=``. weights : np.ndarray or None - Per-row sampling weights as a lightweight shortcut equivalent - to ``survey=SurveyDesign(weights=)``. Produces the same - ATT; the SE uses the analytical weighted HC1 sandwich - (continuous: CCT-2014 weighted-robust; mass-point: pweight - 2SLS sandwich) rather than Binder-TSL. Must be constant - within each unit; row-order aligned with ``data`` (index - labels are resolved to positional offsets via - ``data.index.get_indexer``, so custom non-RangeIndex inputs - work as long as ``data.index`` is unique). Mutually - exclusive with ``survey=`` — passing both raises - ``ValueError``. + DEPRECATED alias for the per-row pweight shortcut. Remains + positional-or-keyword for one minor cycle. Prefer adding the + weights as a column on ``data`` and passing + ``survey_design=SurveyDesign(weights='col_name')`` instead. + Will be removed in the next minor release. Currently + preserved as the analytical-HC1-sandwich shortcut (continuous: + CCT-2014 weighted-robust; mass-point: pweight 2SLS sandwich) + with the per-row → per-unit aggregation invariant intact. + Mutually exclusive with ``survey_design=`` and ``survey=``. cband : bool, default True Phase 4.5 B: controls the multiplier-bootstrap simultaneous confidence band on the weighted event-study path. When - ``True`` (default) and ``aggregate="event_study"`` AND - ``weights=`` or ``survey=`` is supplied, the fit populates - ``cband_low`` / ``cband_high`` / ``cband_crit_value`` / - ``cband_method`` / ``cband_n_bootstrap`` on the result. 
When - ``False`` those fields stay ``None``. No effect on - ``aggregate="overall"`` or on unweighted event-study. - ``n_bootstrap`` and ``seed`` (constructor params) control - replicate count and RNG; defaults are 999 / ``None``. + ``True`` (default) and ``aggregate="event_study"`` AND any of + ``survey_design=`` / ``survey=`` / ``weights=`` is supplied, + the fit populates ``cband_low`` / ``cband_high`` / + ``cband_crit_value`` / ``cband_method`` / ``cband_n_bootstrap`` + on the result. When ``False`` those fields stay ``None``. No + effect on ``aggregate="overall"`` or on unweighted event- + study. ``n_bootstrap`` and ``seed`` (constructor params) + control replicate count and RNG; defaults are 999 / ``None``. Returns ------- HeterogeneousAdoptionDiDResults """ - # ---- aggregate / survey / weights validation ---- + # ---- aggregate / survey_design / survey / weights validation ---- if aggregate not in _VALID_AGGREGATES: raise ValueError( f"Invalid aggregate={aggregate!r}. Must be one of " f"{_VALID_AGGREGATES}." ) - if survey is not None and weights is not None: - raise ValueError( - "Pass survey= OR weights=, not both. " - "For SurveyDesign-composed inference (PSU, strata, FPC, " - "replicate weights), use survey=. For a simple pweight-only " - "shortcut, use weights=; it is internally equivalent to " - "survey=SurveyDesign(weights=w)." + # Three-way mutex on survey_design / survey / weights (data-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to + # survey_design=. The internal back-end paths (legacy weights= and + # survey= routing below) are unchanged; only the entry signature + # wraps them. 
The bit-exact back-compat invariant is preserved + # because we only rebind names, not values: on the canonical path the + # legacy `survey` variable is re-derived from `survey_design`, and a + # legacy `weights` array is passed through to the back end untouched. + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_HAD_FIT, + DeprecationWarning, + stacklevel=2, ) + # weights= shortcut preserved as-is on the back end (the + # downstream `if weights is not None:` branch consumes the + # raw array directly via _aggregate_unit_weights). Don't + # rebind survey_design here: the array is not a + # SurveyDesign and survey_design= cannot accept arrays. + else: + # Canonical path: survey_design= may be None or a SurveyDesign + # instance. Map back to the internal `survey` variable name + # so downstream code (legacy `if survey is not None:` branch) + # consumes the input transparently. + survey = survey_design + + # Type guard on the data-in surface (PR #376 R8 P1): HAD.fit() + # accepts a SurveyDesign that gets resolved against `data` at fit + # time; a pre-resolved ResolvedSurveyDesign (or its + # make_pweight_design factory output) goes to the array-in pretest + # helpers, NOT to fit(). Reject explicitly with migration guidance + # rather than letting `survey.resolve(data)` raise AttributeError or + # letting `survey.weights` (a numpy array on Resolved) be misread as + # a column name. Mirrors the array-in helpers' isinstance-SurveyDesign + # rejection in stute_test/yatchew_hr_test/stute_joint_pretest. + if survey is not None and not hasattr(survey, "resolve"): + raise TypeError( + "HeterogeneousAdoptionDiD.fit: `survey_design=` accepts a " + "SurveyDesign instance (column-referencing, gets " + "`.resolve(data)`'d at fit time) on the data-in estimator " + "surface. Got " + f"{type(survey).__name__} (no `.resolve()` method). 
" + "If you have a pre-resolved ResolvedSurveyDesign or used " + "`make_pweight_design(arr)`, that pattern is for the " + "array-in pretest helpers (`stute_test`, `yatchew_hr_test`, " + "`stute_joint_pretest`). On HAD.fit, add the weights as a " + "column on `data` and pass " + "`survey_design=SurveyDesign(weights='col_name', ...)`." + ) + # Dispatch the event-study path to a dedicated method so the # single-period path stays unchanged (Phase 2a contract preserved). # Note: event_study returns HeterogeneousAdoptionDiDEventStudyResults diff --git a/diff_diff/had_pretests.py b/diff_diff/had_pretests.py index 0f72e3cf..1b0e3bcf 100644 --- a/diff_diff/had_pretests.py +++ b/diff_diff/had_pretests.py @@ -75,7 +75,15 @@ _validate_had_panel, _validate_had_panel_event_study, ) -from diff_diff.survey import _make_trivial_resolved +from diff_diff.survey import ( + HAD_DEPRECATION_MSG_SURVEY_KWARG, + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_ARRAY_IN, + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN, + HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN, + HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN, + SurveyDesign, + make_pweight_design, +) from diff_diff.utils import _generate_mammen_weights __all__ = [ @@ -100,6 +108,7 @@ _MIN_N_BOOTSTRAP = 99 _STUTE_LARGE_G_THRESHOLD = 100_000 + # Scale-invariant tolerance for detecting a numerically exact linear OLS fit. # The ratio SSR / TSS = sum(eps^2) / sum((dy - dybar)^2) equals 1 - R^2 # and is BOTH TRANSLATION-INVARIANT (centering absorbs additive shifts) @@ -1201,6 +1210,7 @@ def qug_test( d: np.ndarray, alpha: float = 0.05, *, + survey_design: Any = None, survey: Any = None, weights: Optional[np.ndarray] = None, ) -> QUGTestResults: @@ -1223,12 +1233,26 @@ def qug_test( Post-period dose vector. Must be 1D numeric and contain no NaN. alpha : float, default 0.05 One-sided significance level. Must satisfy ``0 < alpha < 1``. 
- survey : SurveyDesign or None, keyword-only, default None + survey_design : ResolvedSurveyDesign or None, keyword-only, default None Permanently rejected with ``NotImplementedError`` (Phase 4.5 C0 - decision gate). See *Notes -- Survey/weighted data*. + decision gate). Surface-symmetric kwarg with the rest of the HAD + family — accepted in the signature so all 8 HAD entry points + share the canonical kwarg name, but ``qug_test`` has no + survey-aware migration target. See *Notes -- Survey/weighted + data*. + survey : SurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Surface-symmetric only; + any non-``None`` value still raises ``NotImplementedError`` — + the deprecation is about kwarg-name consolidation, NOT a + migration path (there is no survey-aware QUG). Will be removed + in the next minor release. weights : np.ndarray or None, keyword-only, default None - Permanently rejected with ``NotImplementedError`` (Phase 4.5 C0 - decision gate). See *Notes -- Survey/weighted data*. + DEPRECATED alias of ``survey_design=`` for the per-row pweight + shortcut on the rest of the HAD array-in family. On + ``qug_test``, surface-symmetric only; any non-``None`` value + still raises ``NotImplementedError`` — there is no migration + path (``make_pweight_design(arr)`` is NOT a valid QUG migration + target). Will be removed in the next minor release. Returns ------- @@ -1240,11 +1264,11 @@ def qug_test( ------ ValueError If ``d`` is not 1D numeric or contains NaN, or if ``alpha`` is - not in ``(0, 1)``, or if ``survey`` and ``weights`` are both - non-None (mutex). + not in ``(0, 1)``, or if more than one of + ``survey_design``/``survey``/``weights`` is non-None (mutex). NotImplementedError - If ``survey`` or ``weights`` is non-None. See - *Notes -- Survey/weighted data*. + If any of ``survey_design``, ``survey``, ``weights`` is non-None. + See *Notes -- Survey/weighted data*. 
Notes ----- @@ -1277,25 +1301,67 @@ def qug_test( if not (0.0 < alpha < 1.0): raise ValueError(f"alpha must satisfy 0 < alpha < 1, got {alpha}.") - # Mutex on survey/weights, mirroring HeterogeneousAdoptionDiD.fit() - # at had.py:2890 so users get a consistent error across the HAD - # surface area. - if survey is not None and weights is not None: + # Three-way mutex on survey_design / survey / weights. qug_test rejects + # ALL non-None survey-aware inputs (Phase 4.5 C0 permanent deferral, see + # NotImplementedError below), so the mutex message here is qug-specific + # and does NOT point users to `make_pweight_design(arr)` (which the + # array-in mutex on `stute_test`/`yatchew_hr_test`/`stute_joint_pretest` + # does suggest as the migration target). PR #376 R2 P3 fix. + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: raise ValueError( - "Pass survey= OR weights=, not both. " - "qug_test does not yet accept either kwarg (Phase 4.5 C0 " - "decision gate); see the NotImplementedError below for the " - "methodology rationale." + "qug_test: pass at most one of `survey_design=`, `survey=`, or " + "`weights=`. All three are permanently rejected on qug_test " + "(Phase 4.5 C0 deferral) — there is no migration path; see the " + "NotImplementedError raised below for the methodology rationale." ) + # Soft deprecation: route legacy survey=/weights= aliases through + # survey_design= for the gated NotImplementedError below. PR #376 R10 + # P3: qug_test-specific deprecation messages — the shared + # HAD_DEPRECATION_MSG_*_KWARG_ARRAY_IN strings tell users to migrate to + # `survey_design=` / `make_pweight_design(...)`, but qug_test + # permanently rejects ALL survey-aware kwargs (Phase 4.5 C0 deferral). 
+ # Use qug-specific warning text that says the aliases are deprecated + # but survey-aware QUG remains unsupported, and points users to + # unweighted `qug_test()` or `did_had_pretest_workflow(..., + # survey_design=...)` for the survey-aware linearity family. + if survey is not None: + warnings.warn( + "`survey=` is deprecated on qug_test (will be removed in the " + "next minor release). Note that qug_test does NOT support " + "survey-aware inputs at all (Phase 4.5 C0 permanent deferral; " + "see the NotImplementedError below). For survey-aware HAD " + "pretesting, use `did_had_pretest_workflow(..., " + "survey_design=...)` (the workflow skips the QUG step under " + "survey/weights and runs the linearity family).", + DeprecationWarning, + stacklevel=2, + ) + survey_design = survey + elif weights is not None: + warnings.warn( + "`weights=` is deprecated on qug_test (will be removed in the " + "next minor release). Note that qug_test does NOT support " + "weighted/survey inputs at all (Phase 4.5 C0 permanent deferral; " + "see the NotImplementedError below). For survey-aware HAD " + "pretesting, use `did_had_pretest_workflow(..., " + "survey_design=...)` (the workflow skips the QUG step under " + "survey/weights and runs the linearity family).", + DeprecationWarning, + stacklevel=2, + ) + # Rebind the raw value as a non-None sentinel only. Wrapping it via + # make_pweight_design(...) could raise on malformed input (e.g. + # non-1-D arrays) before the NotImplementedError gate below, which + # would break the documented "any non-None value raises + # NotImplementedError" contract. + survey_design = weights + # Phase 4.5 C0 decision gate: QUG-under-survey is permanently deferred. # Extreme-order-statistic functionals are not smooth in the empirical # CDF, so standard survey machinery (Binder TSL linearization, Rao-Wu # rescaled bootstrap) does not provide a calibrated test. See # REGISTRY.md § "QUG Null Test" for the full methodology note. 
- if survey is not None or weights is not None: + if survey_design is not None: raise NotImplementedError( - "qug_test does not support survey= / weights= kwargs.\n" + "qug_test does not support survey_design= / survey= / " + "weights= kwargs.\n" "\n" "QUG (de Chaisemartin et al. 2026, Theorem 4) tests " "H_0: d_lower = 0 via the ratio of the two smallest order " @@ -1311,7 +1377,7 @@ def qug_test( "boundary tests; no off-the-shelf survey-aware QUG exists.\n" "\n" "For survey-aware HAD pretesting, use the joint Stute family " - "via did_had_pretest_workflow(..., survey=..., " + "via did_had_pretest_workflow(..., survey_design=..., " "aggregate=...) -- shipped in Phase 4.5 C. The workflow " "skips the QUG step under survey/weights with a UserWarning " "and runs the linearity family with a PSU-level Mammen " @@ -1418,8 +1484,9 @@ def stute_test( n_bootstrap: int = 999, seed: Optional[int] = None, *, - weights: Optional[np.ndarray] = None, + survey_design: Any = None, survey: Any = None, + weights: Optional[np.ndarray] = None, ) -> StuteTestResults: """Run the Stute Cramer-von Mises linearity test (paper Appendix D). @@ -1446,19 +1513,22 @@ def stute_test( seed : int or None, default None Seed for ``np.random.default_rng``. Pass an integer for reproducible results. - weights : np.ndarray or None, keyword-only, default None - Per-unit positive weights for the pweight shortcut. Mutually - exclusive with ``survey``. When supplied, the bootstrap is routed - through a synthetic trivial ``ResolvedSurveyDesign`` (no - strata/PSU/FPC) so that the same survey-aware kernel handles both - entry points. See *Notes -- Survey/weighted data*. - survey : ResolvedSurveyDesign or None, keyword-only, default None - Already-resolved survey design (per-unit). Triggers the survey- - aware Stute calibration: PSU-level Mammen multipliers via + survey_design : ResolvedSurveyDesign or None, keyword-only, default None + Already-resolved survey design (per-unit). 
Array-in helpers + accept ``ResolvedSurveyDesign`` ONLY; passing a ``SurveyDesign`` + raises ``TypeError`` with migration guidance. For the pweight-only + shortcut, use ``survey_design=make_pweight_design(arr)``. Triggers + the survey-aware Stute calibration: PSU-level Mammen multipliers + via :func:`diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch`, broadcast to per-unit residual perturbation, with weighted CvM - recompute. Replicate-weight designs raise ``NotImplementedError`` - (deferred to a parallel follow-up after Phase 4.5 C). + recompute. Replicate-weight designs raise ``NotImplementedError``. + survey : ResolvedSurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release. + weights : np.ndarray or None, keyword-only, default None + DEPRECATED alias of ``survey_design=make_pweight_design(arr)``. + Will be removed in the next minor release. Returns ------- @@ -1470,8 +1540,16 @@ def stute_test( If ``d`` / ``dy`` are not 1D numeric, contain NaN, have unequal lengths, if any ``d`` value is negative (paper Section 2 HAD support restriction), if ``alpha`` is outside ``(0, 1)``, or if - ``n_bootstrap < 99``. Also raised if BOTH ``weights`` and - ``survey`` are supplied (mutex). + ``n_bootstrap < 99``. Also raised if more than one of + ``survey_design``, ``survey``, ``weights`` is supplied (3-way + mutex; ``survey=`` and ``weights=`` are deprecated aliases of + ``survey_design=``). + TypeError + If ``survey_design=SurveyDesign(...)`` (or the deprecated + ``survey=SurveyDesign(...)`` alias) is passed; array-in helpers + accept ``ResolvedSurveyDesign`` only. Use + ``survey_design=make_pweight_design(arr)`` for pweight-only or + pre-resolve via ``SurveyDesign(...).resolve(data)``. NotImplementedError If ``survey.replicate_weights is not None``. 
Replicate-weight pretests are a parallel follow-up after Phase 4.5 C; the @@ -1524,14 +1602,52 @@ def stute_test( f"Got n_bootstrap={n_bootstrap}." ) - # Phase 4.5 C: survey/weights mutex + replicate-weight rejection. - # Mirrors the C0 pattern from qug_test and HeterogeneousAdoptionDiD.fit(). - if survey is not None and weights is not None: - raise ValueError( - "stute_test: pass survey= OR weights=, " - "not both. survey= triggers full PSU-aware bootstrap; weights= is " - "the pweight shortcut routed through a synthetic trivial design." + # Three-way mutex on survey_design / survey / weights (array-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design= + # FIRST so the type guard below covers `survey=SurveyDesign(...)` too + # (PR #376 R1 P1: alias must behave identically to the canonical kwarg). + # The bit-exact normalization-order invariant requires passing UNNORMALIZED + # weights to make_pweight_design; the unified path's mean=1 step (~line + # 1669) fires downstream EXACTLY ONCE. + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_ARRAY_IN, + DeprecationWarning, + stacklevel=2, ) + survey_design = make_pweight_design(np.asarray(weights, dtype=np.float64)) + + # Type guard: array-in helpers reject SurveyDesign (cannot resolve column + # names without `data`). Runs AFTER alias rebinding so it covers both + # `survey_design=SurveyDesign(...)` and the deprecated + # `survey=SurveyDesign(...)` form identically. 
+ if survey_design is not None and isinstance(survey_design, SurveyDesign): + raise TypeError( + "stute_test: `survey_design=` accepts a pre-resolved " + "ResolvedSurveyDesign only (array-in helpers have no `data` to " + "resolve column names against). For pweight-only, use " + "`survey_design=make_pweight_design(arr)`. For full PSU/strata/" + "FPC, pre-resolve via `SurveyDesign(...).resolve(data)` and pass " + "the result." + ) + + # Internal alias rebind: downstream code uses `survey` and `weights` as + # internal variable names (Phase 4.5 C convention). After the deprecation + # block, fold the canonical survey_design back into the legacy variable + # names so the unchanged downstream logic consumes the input transparently. + survey = survey_design + weights = None # weights= alias has been folded into survey_design + + # Replicate-weight rejection: the per-replicate weight-ratio rescaling for + # the OLS-on-residuals refit step is not covered by the multiplier-bootstrap + # composition. Parallel follow-up after Phase 4.5 C. if survey is not None and getattr(survey, "replicate_weights", None) is not None: raise NotImplementedError( "stute_test: replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) " @@ -1721,7 +1837,7 @@ def stute_test( # (broadcast to per-obs perturbation), weighted OLS refit, weighted # CvM recompute. Routes via synthetic trivial ResolvedSurveyDesign # for the weights= shortcut to share the same kernel. - resolved_for_boot = survey if survey is not None else _make_trivial_resolved(w_arr) + resolved_for_boot = survey if survey is not None else make_pweight_design(w_arr) # R10 P1: reject stratified designs explicitly until a derived # Stute-specific correction lands. 
The HAD sup-t bootstrap # (had.py:2120+) applies a within-stratum demean + @@ -1833,8 +1949,9 @@ def yatchew_hr_test( dy: np.ndarray, alpha: float = 0.05, *, - weights: Optional[np.ndarray] = None, + survey_design: Any = None, survey: Any = None, + weights: Optional[np.ndarray] = None, ) -> YatchewTestResults: """Run the Yatchew heteroskedasticity-robust linearity test. @@ -1858,17 +1975,23 @@ def yatchew_hr_test( Dose and first-difference outcome vectors. alpha : float, default 0.05 One-sided significance level. - weights : np.ndarray or None, keyword-only, default None - Per-unit STRICTLY POSITIVE weights for the pweight shortcut. - Mutually exclusive with ``survey``. See *Notes -- Survey/weighted data*. - survey : ResolvedSurveyDesign or None, keyword-only, default None - Already-resolved survey design (per-unit). When supplied, the OLS + survey_design : ResolvedSurveyDesign or None, keyword-only, default None + Already-resolved survey design (per-unit). Array-in helpers accept + ``ResolvedSurveyDesign`` ONLY; passing a ``SurveyDesign`` raises + ``TypeError``. For pweight-only, use + ``survey_design=make_pweight_design(arr)``. When supplied, the OLS baseline becomes weighted OLS and all three variance components become their pweight-sandwich analogs. PSU clustering is NOT propagated through the variance-ratio statistic (would require deriving a survey-aware variance-of-variance estimator; out of scope per Phase 4.5 C). Replicate-weight designs raise ``NotImplementedError``. + survey : ResolvedSurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release. + weights : np.ndarray or None, keyword-only, default None + DEPRECATED alias of ``survey_design=make_pweight_design(arr)``. + Will be removed in the next minor release. 
Returns ------- @@ -1880,8 +2003,16 @@ def yatchew_hr_test( If ``d`` / ``dy`` are not 1D numeric, contain NaN, have unequal lengths, if any ``d`` value is negative (paper Section 2 HAD support restriction), or if ``alpha`` is outside ``(0, 1)``. - Also raised if BOTH ``weights`` and ``survey`` supplied (mutex), - or if any weight is non-positive. + Also raised if more than one of ``survey_design``, ``survey``, + ``weights`` is supplied (3-way mutex; ``survey=`` and + ``weights=`` are deprecated aliases of ``survey_design=``), or + if any weight is non-positive. + TypeError + If ``survey_design=SurveyDesign(...)`` (or the deprecated + ``survey=SurveyDesign(...)`` alias) is passed; array-in helpers + accept ``ResolvedSurveyDesign`` only. Use + ``survey_design=make_pweight_design(arr)`` for pweight-only or + pre-resolve via ``SurveyDesign(...).resolve(data)``. NotImplementedError If ``survey.replicate_weights is not None`` (deferred follow-up). @@ -1956,11 +2087,42 @@ def yatchew_hr_test( if not (0.0 < alpha < 1.0): raise ValueError(f"alpha must satisfy 0 < alpha < 1, got {alpha}.") - # Phase 4.5 C: survey/weights mutex + replicate-weight rejection. - if survey is not None and weights is not None: - raise ValueError( - "yatchew_hr_test: pass survey= OR " "weights=, not both." + # Three-way mutex on survey_design / survey / weights (array-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design= + # FIRST so the type guard below covers `survey=SurveyDesign(...)` too + # (PR #376 R1 P1: alias must behave identically to the canonical kwarg). 
+ if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_ARRAY_IN, + DeprecationWarning, + stacklevel=2, + ) + survey_design = make_pweight_design(np.asarray(weights, dtype=np.float64)) + + # Type guard: array-in helpers reject SurveyDesign. Runs AFTER alias + # rebinding so it covers both `survey_design=SurveyDesign(...)` and the + # deprecated `survey=SurveyDesign(...)` form identically. + if survey_design is not None and isinstance(survey_design, SurveyDesign): + raise TypeError( + "yatchew_hr_test: `survey_design=` accepts a pre-resolved " + "ResolvedSurveyDesign only (array-in helpers have no `data` to " + "resolve column names against). For pweight-only, use " + "`survey_design=make_pweight_design(arr)`. For full PSU/strata/" + "FPC, pre-resolve via `SurveyDesign(...).resolve(data)`." ) + + # Internal alias rebind for back-compat with downstream code. + survey = survey_design + weights = None + + # Replicate-weight rejection. if survey is not None and getattr(survey, "replicate_weights", None) is not None: raise NotImplementedError( "yatchew_hr_test: replicate-weight survey designs (BRR/Fay/JK1/JKn/" @@ -2516,8 +2678,9 @@ def stute_joint_pretest( n_bootstrap: int = 999, seed: Optional[int] = None, null_form: str = "custom", - weights: Optional[np.ndarray] = None, + survey_design: Any = None, survey: Any = None, + weights: Optional[np.ndarray] = None, ) -> StuteJointResult: """Joint Cramer-von Mises pretest across multiple horizons. @@ -2563,21 +2726,25 @@ def stute_joint_pretest( (``"mean_independence"`` | ``"linearity"`` | ``"custom"``). The wrappers :func:`joint_pretrends_test` and :func:`joint_homogeneity_test` set this automatically. - weights : np.ndarray or None, keyword-only, default None - Per-unit positive weights (Phase 4.5 C). 
When supplied, the - per-horizon CvM uses :func:`_cvm_statistic_weighted` and the - bootstrap routes through a synthetic trivial - ``ResolvedSurveyDesign``. Mutually exclusive with ``survey``. + survey_design : ResolvedSurveyDesign or None, keyword-only, default None + Already-resolved per-unit survey design (Phase 4.5 C). Array-in + helpers accept ``ResolvedSurveyDesign`` ONLY; passing a + ``SurveyDesign`` raises ``TypeError``. For pweight-only, use + ``survey_design=make_pweight_design(arr)``. When supplied, the + bootstrap is a PSU-level Mammen multiplier bootstrap with the + multiplier matrix shared across horizons within each replicate + (preserves both vector-valued empirical-process unit-level + dependence + PSU clustering). Replicate-weight designs raise + ``NotImplementedError``; non-pweight weight types are rejected. + Variance-unidentified designs (``df_survey <= 0``) return NaN + with a ``UserWarning`` instead of calibrating against an + all-zero multiplier matrix. survey : ResolvedSurveyDesign or None, keyword-only, default None - Already-resolved per-unit survey design (Phase 4.5 C). When - supplied, the bootstrap is a PSU-level Mammen multiplier - bootstrap with the multiplier matrix shared across horizons - within each replicate (preserves both vector-valued empirical- - process unit-level dependence + PSU clustering). Replicate- - weight designs raise ``NotImplementedError``; non-pweight - weight types are rejected. Variance-unidentified designs - (``df_survey <= 0``) return NaN with a ``UserWarning`` instead - of calibrating against an all-zero multiplier matrix. + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release. + weights : np.ndarray or None, keyword-only, default None + DEPRECATED alias of ``survey_design=make_pweight_design(arr)``. + Will be removed in the next minor release. 
Returns ------- @@ -2602,13 +2769,42 @@ def stute_joint_pretest( negative values, ``n_bootstrap < _MIN_N_BOOTSTRAP``, or invalid ``alpha``. ``G < _MIN_G_STUTE`` does NOT raise; see Returns. """ - # Phase 4.5 C: survey/weights mutex + replicate-weight rejection - # (mirrors stute_test, yatchew_hr_test, did_had_pretest_workflow). - if survey is not None and weights is not None: - raise ValueError( - "stute_joint_pretest: pass survey= OR " - "weights=, not both." + # Three-way mutex on survey_design / survey / weights (array-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design= + # FIRST so the type guard below covers `survey=SurveyDesign(...)` too + # (PR #376 R1 P1: alias must behave identically to the canonical kwarg). + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_ARRAY_IN, + DeprecationWarning, + stacklevel=2, + ) + survey_design = make_pweight_design(np.asarray(weights, dtype=np.float64)) + + # Type guard: array-in helpers reject SurveyDesign. Runs AFTER alias + # rebinding so it covers both `survey_design=SurveyDesign(...)` and the + # deprecated `survey=SurveyDesign(...)` form identically. + if survey_design is not None and isinstance(survey_design, SurveyDesign): + raise TypeError( + "stute_joint_pretest: `survey_design=` accepts a pre-resolved " + "ResolvedSurveyDesign only (array-in helpers have no `data` to " + "resolve column names against). For pweight-only, use " + "`survey_design=make_pweight_design(arr)`. For full PSU/strata/" + "FPC, pre-resolve via `SurveyDesign(...).resolve(data)`." ) + + # Internal alias rebind for back-compat with downstream code. 
+ survey = survey_design + weights = None + + # Replicate-weight rejection. if survey is not None and getattr(survey, "replicate_weights", None) is not None: raise NotImplementedError( "stute_joint_pretest: replicate-weight survey designs (BRR/Fay/JK1/" @@ -2935,7 +3131,7 @@ def stute_joint_pretest( # broadcasts the SAME multipliers, preserving both the # vector-valued empirical-process unit-level dependence (paper # convention) AND PSU clustering (Krieger-Pfeffermann 1997). - resolved_for_boot = survey if survey is not None else _make_trivial_resolved(w_arr) + resolved_for_boot = survey if survey is not None else make_pweight_design(w_arr) # R10 P1: reject stratified designs explicitly until a derived # Stute-specific correction lands (mirrors stute_test # single-horizon). @@ -3100,9 +3296,22 @@ def _resolve_pretest_unit_weights( return weights_unit, None # survey is not None if not hasattr(survey, "resolve"): + # PR #376 R9 P3: error message names the canonical kwarg + # `survey_design=` (with the deprecated `survey=` alias mentioned + # for back-compat), and points pre-resolved-design users to the + # array-in pretest helpers where ResolvedSurveyDesign / + # make_pweight_design(arr) belong. raise TypeError( - f"{caller_name}: survey= must be a SurveyDesign instance " - f"(with .resolve()); got {type(survey).__name__}." + f"{caller_name}: `survey_design=` (or the deprecated `survey=` " + f"alias) accepts a SurveyDesign instance (column-referencing, " + f"gets `.resolve(data)`'d at fit time) on data-in surfaces; " + f"got {type(survey).__name__} (no `.resolve()` method). " + "If you have a pre-resolved ResolvedSurveyDesign or used " + "`make_pweight_design(arr)`, that pattern is for the array-in " + "pretest helpers (`stute_test`, `yatchew_hr_test`, " + "`stute_joint_pretest`). On data-in surfaces, add the weights " + "as a column on `data` and pass " + "`survey_design=SurveyDesign(weights='col_name', ...)`." 
) resolved_full = survey.resolve(data) if getattr(resolved_full, "replicate_weights", None) is not None: @@ -3151,8 +3360,9 @@ def joint_pretrends_test( alpha: float = 0.05, n_bootstrap: int = 999, seed: Optional[int] = None, - weights: Optional[np.ndarray] = None, + survey_design: Any = None, survey: Any = None, + weights: Optional[np.ndarray] = None, ) -> StuteJointResult: """Joint Stute pre-trends test (paper Section 4.2 step 2). @@ -3189,23 +3399,46 @@ def joint_pretrends_test( handling follows the HAD contract (staggered auto-filter warns and proceeds on last cohort; solo cohort proceeds). alpha, n_bootstrap, seed : as in :func:`stute_test`. - weights : np.ndarray or None, keyword-only, default None - Per-row positive weights (Phase 4.5 C). Aggregated to per-unit - via :func:`diff_diff.had._aggregate_unit_weights` (constant- - within-unit invariant enforced). On staggered panels the - wrapper subsets ``weights`` to the surviving cohort BEFORE - aggregation. Mutually exclusive with ``survey``. - survey : SurveyDesign or None, keyword-only, default None + survey_design : SurveyDesign or None, keyword-only, default None Survey design (Phase 4.5 C). Resolved on the filtered panel; replicate-weight designs raise ``NotImplementedError``; ``weight_type`` must be ``"pweight"``. Forwarded to :func:`stute_joint_pretest` as a per-unit - ``ResolvedSurveyDesign``. + ``ResolvedSurveyDesign``. Mutually exclusive with the deprecated + ``survey=`` and ``weights=`` aliases. + survey : SurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release. + weights : np.ndarray or None, keyword-only, default None + DEPRECATED alias for the per-row pweight shortcut. Prefer + ``survey_design=SurveyDesign(weights='col_name')`` against your + dataframe instead. Will be removed in the next minor release. Returns ------- StuteJointResult with ``null_form = "mean_independence"``. 
""" + # Three-way mutex on survey_design / survey / weights (data-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design=. + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN, + DeprecationWarning, + stacklevel=2, + ) + # weights= shortcut preserved as-is on the back end. + + # Internal alias rebind: downstream code uses `survey` and `weights`. + if survey_design is not None and survey is None: + survey = survey_design + if len(pre_periods) == 0: raise ValueError( "pre_periods must be non-empty. Workflow dispatch handles " @@ -3380,6 +3613,17 @@ def joint_pretrends_test( design_matrix = np.ones((G, 1), dtype=np.float64) + # Internal forwarding: pass survey_design= directly to stute_joint_pretest + # to avoid emitting the deprecation warning on every internal call. The + # canonical kwarg is the same on both ends; the warning fires ONCE at the + # user-facing front door (this wrapper) when the user passed a deprecated + # alias. 
+ if resolved_unit is not None: + joint_survey_design = resolved_unit + elif weights_unit is not None: + joint_survey_design = make_pweight_design(weights_unit) + else: + joint_survey_design = None return stute_joint_pretest( residuals_by_horizon=residuals_by_horizon, fitted_by_horizon=fitted_by_horizon, @@ -3389,8 +3633,7 @@ def joint_pretrends_test( n_bootstrap=n_bootstrap, seed=seed, null_form="mean_independence", - weights=weights_unit if resolved_unit is None else None, - survey=resolved_unit, + survey_design=joint_survey_design, ) @@ -3407,8 +3650,9 @@ def joint_homogeneity_test( alpha: float = 0.05, n_bootstrap: int = 999, seed: Optional[int] = None, - weights: Optional[np.ndarray] = None, + survey_design: Any = None, survey: Any = None, + weights: Optional[np.ndarray] = None, ) -> StuteJointResult: """Joint Stute homogeneity-linearity test (paper Section 4.3 joint). @@ -3440,19 +3684,43 @@ def joint_homogeneity_test( first_treat_col : str or None Forwarded to the underlying panel validator. alpha, n_bootstrap, seed : as in :func:`stute_test`. - weights : np.ndarray or None, keyword-only, default None - Per-row positive weights (Phase 4.5 C). See - :func:`joint_pretrends_test` for the contract; semantics are - identical (per-unit aggregation, staggered subsetting, - replicate-weight rejection). - survey : SurveyDesign or None, keyword-only, default None + survey_design : SurveyDesign or None, keyword-only, default None Survey design (Phase 4.5 C). Same contract as - :func:`joint_pretrends_test`. + :func:`joint_pretrends_test`. Mutually exclusive with the + deprecated ``survey=`` and ``weights=`` aliases. + survey : SurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release. + weights : np.ndarray or None, keyword-only, default None + DEPRECATED alias for the per-row pweight shortcut. Prefer + ``survey_design=SurveyDesign(weights='col_name')`` against your + dataframe instead. 
Will be removed in the next minor release. Returns ------- StuteJointResult with ``null_form = "linearity"``. """ + # Three-way mutex on survey_design / survey / weights (data-in pattern). + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design=. + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN, + DeprecationWarning, + stacklevel=2, + ) + # weights= shortcut preserved as-is on the back end. + + # Internal alias rebind: downstream code uses `survey` and `weights`. + if survey_design is not None and survey is None: + survey = survey_design + if len(post_periods) == 0: raise ValueError( "post_periods must be non-empty. Workflow dispatch handles " @@ -3613,6 +3881,13 @@ def joint_homogeneity_test( design_matrix = np.column_stack([np.ones(G, dtype=np.float64), d_arr.astype(np.float64)]) + # Internal forwarding via canonical kwarg (avoids deprecation warning). 
+ if resolved_unit is not None: + joint_survey_design = resolved_unit + elif weights_unit is not None: + joint_survey_design = make_pweight_design(weights_unit) + else: + joint_survey_design = None return stute_joint_pretest( residuals_by_horizon=residuals_by_horizon, fitted_by_horizon=fitted_by_horizon, @@ -3622,8 +3897,7 @@ def joint_homogeneity_test( n_bootstrap=n_bootstrap, seed=seed, null_form="linearity", - weights=weights_unit if resolved_unit is None else None, - survey=resolved_unit, + survey_design=joint_survey_design, ) @@ -3750,6 +4024,7 @@ def did_had_pretest_workflow( seed: Optional[int] = None, *, aggregate: str = "overall", + survey_design: Any = None, survey: Any = None, weights: Optional[np.ndarray] = None, ) -> HADPretestReport: @@ -3803,17 +4078,24 @@ def did_had_pretest_workflow( deterministic. aggregate : str, keyword-only, default ``"overall"`` Dispatch mode. Invalid values raise ``ValueError``. - survey : SurveyDesign or None, keyword-only, default None + survey_design : SurveyDesign or None, keyword-only, default None Survey design for design-based pretest inference. Linearity-family pretests use PSU-level Mammen multiplier bootstrap (Stute family) and weighted OLS + weighted variance components (Yatchew). The QUG step is skipped under survey with a ``UserWarning`` (permanent deferral per Phase 4.5 C0). Replicate-weight designs raise - ``NotImplementedError``. Mutually exclusive with ``weights``. + ``NotImplementedError``. Mutually exclusive with the deprecated + ``survey=`` and ``weights=`` aliases. + survey : SurveyDesign or None, keyword-only, default None + DEPRECATED alias of ``survey_design=``. Will be removed in the + next minor release; prefer ``survey_design=``. weights : np.ndarray or None, keyword-only, default None - Per-row positive weights for the pweight shortcut. Mutually - exclusive with ``survey``. Routed through a synthetic trivial - ``ResolvedSurveyDesign`` so the same kernel handles both paths. 
+ DEPRECATED alias for the per-row pweight shortcut. Prefer adding + the weights as a column on ``data`` and passing + ``survey_design=SurveyDesign(weights='col_name')`` instead. Will + be removed in the next minor release. Currently routed through a + synthetic trivial ``ResolvedSurveyDesign`` so the same kernel + handles both paths. Returns ------- @@ -3830,9 +4112,11 @@ def did_had_pretest_workflow( Raises ------ ValueError - On invalid ``aggregate``, ``survey`` and ``weights`` both - non-None, or any downstream front-door failure (panel balance, - dtype, dose invariant). + On invalid ``aggregate``; if more than one of ``survey_design``, + ``survey``, ``weights`` is supplied (3-way mutex; ``survey=`` and + ``weights=`` are deprecated aliases of ``survey_design=``); or + any downstream front-door failure (panel balance, dtype, dose + invariant). NotImplementedError If ``survey.replicate_weights is not None`` (replicate-weight pretests deferred to a parallel follow-up after Phase 4.5 C). @@ -3885,19 +4169,43 @@ def did_had_pretest_workflow( f"aggregate must be one of {list(_VALID_AGGREGATES)!r}; " f"got {aggregate!r}." ) - # Phase 4.5 C: survey/weights mutex + presence detection. R6 P1 fix: - # do NOT call _resolve_pretest_unit_weights on the FULL panel here -- - # under aggregate='event_study' the panel may be staggered and the - # cohort filter at _validate_multi_period_panel can drop units. If - # those dropped units have zero/invalid weights, eager full-panel - # resolution would abort an otherwise-valid event-study run. Defer - # resolution to the per-aggregate branches: overall path resolves on - # the original data (no filtering); event-study path lets the joint - # wrappers handle resolution on data_filtered. - if survey is not None and weights is not None: - raise ValueError( - "did_had_pretest_workflow: pass survey= OR " "weights=, not both." + # Three-way mutex on survey_design / survey / weights (data-in pattern). 
+ # R6 P1 fix: do NOT call _resolve_pretest_unit_weights on the FULL panel + # here -- under aggregate='event_study' the panel may be staggered and the + # cohort filter at _validate_multi_period_panel can drop units. If those + # dropped units have zero/invalid weights, eager full-panel resolution + # would abort an otherwise-valid event-study run. Defer resolution to the + # per-aggregate branches: overall path resolves on the original data (no + # filtering); event-study path lets the joint wrappers handle resolution + # on data_filtered. + n_set = sum(x is not None for x in (survey_design, survey, weights)) + if n_set > 1: + raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN) + + # Soft deprecation: route legacy survey=/weights= aliases to survey_design=. + # The internal back-end paths (_resolve_pretest_unit_weights + per-aggregate + # dispatch) consume `survey` and `weights` as internal variable names, so + # rebind both for back-compat with the unchanged downstream logic. The + # bit-exact regression invariant is preserved because we only rebind names, + # not values. + if survey is not None: + warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2) + survey_design = survey + elif weights is not None: + warnings.warn( + HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN, + DeprecationWarning, + stacklevel=2, ) + # weights= shortcut preserved as-is on the back end. Don't rebind + # survey_design -- the array is not a SurveyDesign. + + # Internal alias rebind: downstream code uses `survey` (when set, a + # SurveyDesign or pre-resolved). Map the canonical input back so the + # unchanged downstream `if survey is not None:` branches consume it. + if survey_design is not None and survey is None: + survey = survey_design + use_survey_path = (survey is not None) or (weights is not None) if use_survey_path: @@ -3999,41 +4307,54 @@ def did_had_pretest_workflow( # whose lexical and chronological order disagree (e.g. 
"q10" < # "q2" lexically but > chronologically). earlier_pre = list(t_pre_list[:-1]) - if len(earlier_pre) >= 1: - pretrends_joint = joint_pretrends_test( + # PR #376 R2 P3: when `weights=joint_weights` is forwarded to the joint + # wrappers (the only joint-internal entry that takes a numpy array), + # the wrapper would re-emit a DeprecationWarning. Suppress those + # nested warnings — the user-facing warning has already fired at the + # workflow's front door above. survey_design=joint_survey is a + # SurveyDesign (column-referencing) on the survey path and goes + # through canonically; only the weights= forwarding path needs the + # suppression. The joint wrappers also can't accept a pre-resolved + # ResolvedSurveyDesign (their `_resolve_pretest_unit_weights` requires + # a SurveyDesign with .resolve()), so converting weights= to + # survey_design= via make_pweight_design isn't an option here. + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + if len(earlier_pre) >= 1: + pretrends_joint = joint_pretrends_test( + data_filtered, + outcome_col=outcome_col, + dose_col=dose_col, + time_col=time_col, + unit_col=unit_col, + pre_periods=earlier_pre, + base_period=base_period, + first_treat_col=first_treat_col, + alpha=alpha, + n_bootstrap=n_bootstrap, + seed=seed, + survey_design=joint_survey, + weights=joint_weights, + ) + else: + pretrends_joint = None + + # Step 3: joint homogeneity-linearity on post-periods. + homogeneity_joint = joint_homogeneity_test( data_filtered, outcome_col=outcome_col, dose_col=dose_col, time_col=time_col, unit_col=unit_col, - pre_periods=earlier_pre, + post_periods=list(t_post_list), base_period=base_period, first_treat_col=first_treat_col, alpha=alpha, n_bootstrap=n_bootstrap, seed=seed, + survey_design=joint_survey, weights=joint_weights, - survey=joint_survey, ) - else: - pretrends_joint = None - - # Step 3: joint homogeneity-linearity on post-periods. 
- homogeneity_joint = joint_homogeneity_test( - data_filtered, - outcome_col=outcome_col, - dose_col=dose_col, - time_col=time_col, - unit_col=unit_col, - post_periods=list(t_post_list), - base_period=base_period, - first_treat_col=first_treat_col, - alpha=alpha, - n_bootstrap=n_bootstrap, - seed=seed, - weights=joint_weights, - survey=joint_survey, - ) # Event-study `all_pass`. On the unweighted path, every implemented # step must be conclusive AND none reject (Phase 3 convention). On @@ -4109,21 +4430,28 @@ def did_had_pretest_workflow( # already aggregated to per-unit (weights_unit / resolved_unit); the # _aggregate_first_difference call above also collapses to per-unit # (one row per unit), so weights_unit and resolved_unit are aligned. + # Internal forwarding uses the canonical survey_design= kwarg to skip + # deprecation warnings; the user-facing warning has already fired at the + # workflow's front door. + if resolved_unit is not None: + per_test_survey_design = resolved_unit + elif weights_unit is not None: + per_test_survey_design = make_pweight_design(weights_unit) + else: + per_test_survey_design = None stute_res = stute_test( d_arr, dy_arr, alpha=alpha, n_bootstrap=n_bootstrap, seed=seed, - weights=weights_unit if resolved_unit is None else None, - survey=resolved_unit, + survey_design=per_test_survey_design, ) yatchew_res = yatchew_hr_test( d_arr, dy_arr, alpha=alpha, - weights=weights_unit if resolved_unit is None else None, - survey=resolved_unit, + survey_design=per_test_survey_design, ) # `all_pass` must be conclusive under the paper's four-step workflow diff --git a/diff_diff/survey.py b/diff_diff/survey.py index 155bce05..085c151b 100644 --- a/diff_diff/survey.py +++ b/diff_diff/survey.py @@ -678,22 +678,41 @@ def needs_survey_vcov(self) -> bool: return True # Any resolved survey design uses the survey vcov path -def _make_trivial_resolved(weights: np.ndarray) -> "ResolvedSurveyDesign": - """Construct a trivial pweight-only ResolvedSurveyDesign 
(no strata/PSU/FPC). - - Used by survey-aware code paths invoked via a bare per-row ``weights`` - array (the pweight shortcut). Routing through this synthetic resolved - design lets the same bootstrap / variance kernel handle both the - ``weights=`` shortcut and the full ``survey=SurveyDesign(...)`` path - uniformly. Mirrors the PR #363 synthetic-trivial-resolved pattern that - fixed sup-t under the ``weights=`` shortcut on - ``HeterogeneousAdoptionDiD.fit()``. +def make_pweight_design(weights: np.ndarray) -> "ResolvedSurveyDesign": + """Construct a pweight-only ResolvedSurveyDesign from a raw weight array. + + Use this on the array-in HAD pretest helpers (``stute_test``, + ``yatchew_hr_test``, ``stute_joint_pretest``) when the caller has only + a per-observation weight array and no PSU/strata/FPC structure:: + + from diff_diff import stute_test, make_pweight_design + result = stute_test(d, dy, survey_design=make_pweight_design(w)) + + For the data-in HAD surfaces (``HeterogeneousAdoptionDiD.fit``, + ``did_had_pretest_workflow``, ``joint_pretrends_test``, + ``joint_homogeneity_test``), prefer adding the weights as a column on + your dataframe and passing ``SurveyDesign(weights="col_name")`` instead; + those surfaces resolve column references against ``data`` at fit time + (the standard library convention used by ContinuousDiD, EfficientDiD, + and ChaisemartinDHaultfoeuille). + + Internal note: this constructs a synthetic ``ResolvedSurveyDesign`` with + each observation as its own PSU and no strata/FPC, so PSU-level + multiplier-bootstrap kernels reduce bit-exactly to per-observation + Mammen draws while sharing the survey-aware code path with full PSU / + strata / FPC designs (mirrors the PR #363 synthetic-trivial-resolved + pattern). Parameters ---------- weights : np.ndarray, shape (n_obs,) - Per-observation positive weights. Caller is responsible for any - non-negativity / per-unit-constancy validation. + Per-observation positive weights. 
Must be 1-D (shape ``(n_obs,)``); + scalars, 0-D arrays, and column-vector inputs (shape ``(n, 1)``) + raise ``ValueError`` at the front door. Caller is responsible for + any non-negativity / per-unit-constancy validation. Typical usage + is positional (``make_pweight_design(arr)``); the parameter name + ``weights`` collides linguistically with the deprecated + ``weights=`` kwarg on HAD surfaces, so prefer positional form. Returns ------- @@ -702,8 +721,23 @@ def _make_trivial_resolved(weights: np.ndarray) -> "ResolvedSurveyDesign": ``n_strata=0``, ``n_psu=n_obs`` (each observation is its own PSU under the trivial design), ``lonely_psu="remove"``, ``replicate_weights=None``. + + Raises + ------ + ValueError + If ``weights`` is not 1-D (PR #376 R3 P1: catches scalar / 0-D / + column-vector inputs with a clear front-door message instead of + bubbling a low-level numpy or dataclass exception). """ w = np.asarray(weights, dtype=np.float64) + if w.ndim != 1: + raise ValueError( + f"make_pweight_design: weights must be 1-dimensional (1-D, shape " + f"(n_obs,)), got shape {w.shape}. Common mistakes: scalar / 0-D " + f"input (`make_pweight_design(1.0)`); column-vector " + f"(`make_pweight_design(df[['w']].to_numpy())` produces (n, 1) " + f"-- use `df['w'].to_numpy()` for (n,)); 2-D matrix input." + ) n_obs = int(w.shape[0]) return ResolvedSurveyDesign( weights=w, @@ -717,6 +751,70 @@ def _make_trivial_resolved(weights: np.ndarray) -> "ResolvedSurveyDesign": ) +_make_trivial_resolved = make_pweight_design + + +# Three-way mutex error messages for `survey_design=` / `survey=` / `weights=` +# kwargs across the 8 HAD surfaces (HeterogeneousAdoptionDiD.fit + +# did_had_pretest_workflow + 5 pretest helpers + qug_test). 
The migration +# target text differs between data-in surfaces (which can resolve +# ``SurveyDesign(weights="col_name")`` against ``data``) and array-in +# surfaces (which take pre-resolved ``ResolvedSurveyDesign`` and use +# ``make_pweight_design(arr)`` for the pweight-only convenience). Defined +# here to avoid circular imports between had.py and had_pretests.py. +HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN = ( + "Pass at most one of `survey_design=`, `survey=`, or `weights=`. " + "`survey=` and `weights=` are deprecated aliases of `survey_design=` " + "and will be removed in the next minor release. Prefer " + "`survey_design=SurveyDesign(weights='col_name', ...)`." +) +HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN = ( + "Pass at most one of `survey_design=`, `survey=`, or `weights=`. " + "`survey=` and `weights=` are deprecated aliases of `survey_design=` " + "and will be removed in the next minor release. Prefer " + "`survey_design=make_pweight_design(arr)` for pweight-only or " + "`survey_design=` for full " + "PSU/strata/FPC." +) +HAD_DEPRECATION_MSG_SURVEY_KWARG = ( + "`survey=` is deprecated; use `survey_design=` instead " + "(same accepted types). Will be removed in the next minor release." +) +HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN = ( + "`weights=np.ndarray` is deprecated; add the weights as a column on " + "`data` and pass `survey_design=SurveyDesign(weights='col_name')` " + "instead. Will be removed in the next minor release." +) +# PR #376 R11 P3: HAD.fit-specific weights= deprecation message — the +# generic data-in suggestion above (use `survey_design=SurveyDesign(...)`) +# is the long-term API target, but on `HeterogeneousAdoptionDiD.fit` the +# two paths currently produce different SE families: the deprecated +# `weights=np.ndarray` shortcut yields `variance_formula="pweight"` / +# `"pweight_2sls"` (CCT-2014 weighted-robust / 2SLS pweight-sandwich) +# while `survey_design=SurveyDesign(...)` yields `"survey_binder_tsl"` / +# `"survey_binder_tsl_2sls"`. 
The next-minor cleanup (TODO row 102) will +# unify the two; until then, document the SE-family caveat explicitly so +# users know what changes when they migrate. +HAD_DEPRECATION_MSG_WEIGHTS_KWARG_HAD_FIT = ( + "`weights=np.ndarray` is deprecated on HeterogeneousAdoptionDiD.fit; " + "the long-term API is to add the weights as a column on `data` and " + "pass `survey_design=SurveyDesign(weights='col_name')`. Will be " + "removed in the next minor release. NOTE: in the current release the " + "two paths produce different SE families on this surface — the " + "`weights=` shortcut keeps the analytical CCT-2014 / 2SLS pweight-" + "sandwich (`variance_formula='pweight'` or `'pweight_2sls'`), while " + "`survey_design=SurveyDesign(...)` composes Binder-TSL " + "(`'survey_binder_tsl'` or `'survey_binder_tsl_2sls'`). The " + "long-term unification is tracked for the next minor release." +) +HAD_DEPRECATION_MSG_WEIGHTS_KWARG_ARRAY_IN = ( + "`weights=np.ndarray` is deprecated on array-in pretest helpers; use " + "`survey_design=make_pweight_design(weights)` instead " + "(import `make_pweight_design` from `diff_diff`). Will be removed in " + "the next minor release." +) + + @dataclass class SurveyMetadata: """ diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 77c62195..f5d1c596 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2347,7 +2347,8 @@ Under `survey=SurveyDesign(weights, strata, psu, fpc)`, the variance composes vi - **Note:** Monte Carlo oracle consistency — `tests/test_had_mc.py` validates that the weighted estimator recovers the oracle τ under informative sampling, with coverage near nominal and visible bias reduction vs unweighted. Slow-gated; 4 tests. - **Note:** Auto-bandwidth selection (Phase 1b MSE-DPI via `lpbwselect_mse_dpi`) remains UNWEIGHTED in this phase; users who want a weight-aware bandwidth should pass `h`/`b` explicitly. 
The auto path with uniform weights reduces to the existing unweighted bandwidth selector, so the uniform-weights bit-parity chain is preserved. - **Note:** Replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the HAD continuous path raise `NotImplementedError` in this PR; Rao-Wu-style rescaled bootstrap is deferred to Phase 4.5 C (survey-under-pretests). -- **Note:** `HeterogeneousAdoptionDiD.fit()` dispatch matrix after Phase 4.5 B — survey / weights are supported on ALL design × aggregate combinations (continuous × {overall, event-study}, mass-point × {overall, event-study}). Pretests (`qug_test`, `stute_test`, `yatchew_hr_test`, joint Stute variants, `did_had_pretest_workflow`) still do NOT accept `survey=` / `weights=` — deferred to Phase 4.5 C / C0 per reciprocal-guard discipline. +- **Note:** `HeterogeneousAdoptionDiD.fit()` dispatch matrix after Phase 4.5 B + 4.5 C — survey/weights are supported on ALL design × aggregate combinations (continuous × {overall, event-study}, mass-point × {overall, event-study}). The HAD pretests (`qug_test`, `stute_test`, `yatchew_hr_test`, joint Stute variants, `did_had_pretest_workflow`) ship survey support in Phase 4.5 C (PR #370) — `qug_test` permanently rejects (Phase 4.5 C0 deferral; see "QUG Null Test" §); the linearity family supports pweight + PSU + FPC via PSU-level Mammen multipliers (Stute) + closed-form weighted variance components (Yatchew); replicate-weight and stratified designs raise `NotImplementedError` (parallel follow-ups). The canonical kwarg on all 8 HAD surfaces is `survey_design=` (see "Note (HAD survey-design API consolidation)" below); `survey=` / `weights=` remain accepted as deprecated aliases for one minor cycle. 
+- **Note (HAD survey-design API consolidation):** All 8 HAD surfaces — `HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` — accept the canonical kwarg `survey_design=` (matching `ContinuousDiD`, `EfficientDiD`, `ChaisemartinDHaultfoeuille`). The pre-existing dual `survey=` and `weights=` kwargs become deprecated aliases (`DeprecationWarning`); both will be removed in the next minor release. Internal back-end behavior is UNCHANGED (the legacy paths for `weights=np.ndarray` and `survey=SurveyDesign(...)` still execute the same code; only the entry signature wraps them). Mutex semantics extend from 2-way (`survey + weights`) to 3-way (`survey_design + survey + weights`) — at most one may be non-None per call. Distinct mutex error messages by surface group: data-in surfaces (HAD.fit + workflow + joint data-in wrappers) point users to `survey_design=SurveyDesign(weights='col_name', ...)`; the three array-in linearity helpers (`stute_test` / `yatchew_hr_test` / `stute_joint_pretest`) point to `survey_design=make_pweight_design(arr)` (for pweight-only) or `survey_design=` (for full PSU/strata/FPC). The 8th surface — `qug_test` — has its own qug-specific mutex message that does NOT advertise `make_pweight_design(arr)` as a migration target; QUG-under-survey is permanently rejected (Phase 4.5 C0 deferral, see "QUG Null Test" §) regardless of which kwarg variant the caller uses, so the migration path doesn't apply. Array-in helpers reject `survey_design=SurveyDesign(...)` with `TypeError` since they have no `data` to resolve column names against. 
The `make_pweight_design(weights: np.ndarray) -> ResolvedSurveyDesign` factory is exported from the `diff_diff` top level (formerly `survey._make_trivial_resolved`, kept as a permanent private alias for back-compat); `weights` must be 1-D (scalar / 0-D / column-vector inputs raise `ValueError` at the front door). *Weighted 2SLS (Phase 4.5 B):* `_fit_mass_point_2sls(..., weights=, return_influence=)` extends the Wald-IV / 2SLS sandwich with pweight semantics: - **Weighted bread**: `Z'WX = Z'·diag(w)·X` (`w¹`, matches `estimatr::iv_robust(..., weights=)` weighted-bread convention). @@ -2431,8 +2432,8 @@ Tuning-parameter-free test of `H_0: d̲ = 0` versus `H_1: d̲ > 0`. Shipped in ` 3. **The literature on EVT under unequal-probability sampling is sparse.** Quintos et al. (2001) and Beirlant et al. cover tail-INDEX estimation under unequal sample sizes. There is no off-the-shelf method for "test the support endpoint under complex sampling" in the standard survey-statistics toolkit. Adapting Hill / Pickands / DEdH estimators to the boundary problem would be novel research, not engineering. The de Chaisemartin et al. (2026) paper itself does not discuss survey extensions of QUG. The survey-compatible alternative for HAD pretesting is **joint Stute** (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF for which Krieger-Pfeffermann (1997) + a survey-aware multiplier bootstrap give a calibrated test. Phase 4.5 C (PR #370) ships survey support for the linearity family — the **PSU-level Mammen multiplier bootstrap** for `stute_test` and the joint variants (NOT Rao-Wu rescaling — multiplier bootstrap is a different mechanism), and **closed-form weighted OLS + pweight-sandwich variance components** for `yatchew_hr_test`. See the dedicated Note (Phase 4.5 C) below for the full algorithm. 
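A minimal sketch of the PSU-level multiplier idea referenced above, assuming the standard two-point Mammen distribution (mean 0, variance 1). It is not the library's `generate_survey_multiplier_weights_batch` kernel: the helper names `mammen_draw`, `psu_multipliers`, and `perturb` are hypothetical, and the sketch uses only the standard library. It shows one replicate: one multiplier per PSU, broadcast to observations as `eta_obs[g] = eta_psu[psu(g)]`, then applied to unweighted residuals as `dy_b = fitted + eps * eta_obs`.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
# Two-point Mammen distribution (assumed standard form): mean 0, variance 1.
MAMMEN_LOW = -(SQRT5 - 1.0) / 2.0
MAMMEN_HIGH = (SQRT5 + 1.0) / 2.0
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)  # probability of drawing MAMMEN_LOW


def mammen_draw(rng):
    """Draw a single Mammen multiplier (hypothetical helper)."""
    return MAMMEN_LOW if rng.random() < P_LOW else MAMMEN_HIGH


def psu_multipliers(psu_ids, rng):
    """Draw one multiplier per distinct PSU and broadcast it to every
    observation in that PSU (hypothetical helper)."""
    eta_psu = {}
    for p in psu_ids:
        if p not in eta_psu:
            eta_psu[p] = mammen_draw(rng)
    return [eta_psu[p] for p in psu_ids]


def perturb(fitted, eps, eta_obs):
    """Wild-bootstrap perturbation: multipliers attach to the raw residuals."""
    return [f + e * m for f, e, m in zip(fitted, eps, eta_obs)]


rng = random.Random(0)
psu_ids = ["a", "a", "b", "b", "b", "c"]  # illustrative PSU labels
fitted = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]   # illustrative fitted values
eps = [0.1, -0.1, 0.2, -0.2, 0.05, -0.05]  # illustrative residuals

eta_obs = psu_multipliers(psu_ids, rng)
dy_b = perturb(fitted, eps, eta_obs)
```

When every observation is its own PSU, as in the pweight-only design, the same sketch reduces to independent per-observation draws. The weighted OLS refit and weighted CvM recompute that follow each replicate are omitted here.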
**Research direction (out of scope for diff-diff):** the bridge IS sketchable by combining (a) endpoint-estimation EVT under iid (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006); (b) survey-aware functional CLT for the empirical process (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017); and (c) tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" `λ_eff = Σ_h W_h · f_h(0+)`. Under a "no boundary clumping" assumption (`P(D_{(1)}, D_{(2)}` in same PSU `| both ≤ δ) → 0`), the `Exp(1)/Exp(1)` limit law's pivotality is preserved and only the calibration needs a survey-aware bootstrap (subsampling within strata per Politis-Romano-Wolf, or Bertail et al.'s design-aware bootstrap). This is publishable methodology research — one paper, ~6-12 months for a methods PhD student. If the bridge gets built and published externally, this gate can be revisited. -- **Note (Phase 4.5 C):** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` accept `weights=` and `survey=ResolvedSurveyDesign` kwargs (or `survey=SurveyDesign` for the data-in entries). Mechanism varies by test: - - **Stute family** (`stute_test`, `stute_joint_pretest`, joint wrappers) uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap). Each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix; multipliers broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`. The bootstrap residual perturbation is `dy_b = fitted + eps * eta_obs` (paper Appendix D wild-bootstrap form — multipliers attach to UNWEIGHTED residuals; the weighting flows through the OLS refit + the weighted CvM, NOT through the perturbation step). 
Followed by weighted OLS refit (`_fit_weighted_ols_intercept_slope`) and weighted CvM recompute via `_cvm_statistic_weighted`. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence (Delgado 1993; Escanciano 2006) AND PSU clustering (Krieger-Pfeffermann 1997). PSU-shared multipliers are conservative under no-within-PSU outcome correlation (over-clustering gives conservative size in finite samples), asymptotically correct under the standard survey assumption that PSU is the ultimate sampling unit AND outcomes correlate within PSU. The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (constructed via `survey._make_trivial_resolved`) so the kernel is shared across both entry paths. NOT "Rao-Wu rescaled bootstrap" — different mechanism (the Rao-Wu kernel rescales per-unit weights via stratified PSU resampling, while this kernel applies multipliers without resampling). +- **Note (Phase 4.5 C):** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` accept `survey_design=` (canonical) plus the deprecated aliases `survey=` and `weights=` (DeprecationWarning, removal next minor — see "Note (HAD survey-design API consolidation)" below). On data-in surfaces (`did_had_pretest_workflow`, `joint_pretrends_test`, `joint_homogeneity_test`), `survey_design=` accepts a `SurveyDesign` (resolved against `data` at fit time). On array-in surfaces (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`), `survey_design=` accepts a pre-resolved `ResolvedSurveyDesign`; for the pweight-only convenience, construct via `survey_design=make_pweight_design(arr)` (`make_pweight_design` exported from the `diff_diff` top level). 
Mechanism varies by test: + - **Stute family** (`stute_test`, `stute_joint_pretest`, joint wrappers) uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap). Each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix; multipliers broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`. The bootstrap residual perturbation is `dy_b = fitted + eps * eta_obs` (paper Appendix D wild-bootstrap form — multipliers attach to UNWEIGHTED residuals; the weighting flows through the OLS refit + the weighted CvM, NOT through the perturbation step). Followed by weighted OLS refit (`_fit_weighted_ols_intercept_slope`) and weighted CvM recompute via `_cvm_statistic_weighted`. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence (Delgado 1993; Escanciano 2006) AND PSU clustering (Krieger-Pfeffermann 1997). PSU-shared multipliers are conservative under no-within-PSU outcome correlation (over-clustering gives conservative size in finite samples), asymptotically correct under the standard survey assumption that PSU is the ultimate sampling unit AND outcomes correlate within PSU. The pweight-only entry (`survey_design=make_pweight_design(arr)`, or the deprecated `weights=arr` alias) routes through a synthetic trivial `ResolvedSurveyDesign` (constructed via `make_pweight_design`, the public alias for the formerly private `survey._make_trivial_resolved`) so the kernel is shared across both entry paths. NOT "Rao-Wu rescaled bootstrap" — different mechanism (the Rao-Wu kernel rescales per-unit weights via stratified PSU resampling, while this kernel applies multipliers without resampling). - **Yatchew** (`yatchew_hr_test`) uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap). 
All three components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` in `TestYatchewHRTestSurvey::test_weighted_reduces_to_unweighted_at_uniform_weights`): - `sigma2_lin = sum(w * eps^2) / sum(w)` (weighted OLS residual variance). - `sigma2_diff = sum(w_avg * (dy_g - dy_{g-1})^2) / (2 * sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g + w_{g-1})/2`. Divisor uses `sum(w)` (=G at `w=1`), NOT `sum(w_avg)`, to match the existing `(1/(2G))` unweighted formula at `had_pretests.py:1635`. diff --git a/tests/test_had.py b/tests/test_had.py index b499cf3c..a7b5d79a 100644 --- a/tests/test_had.py +++ b/tests/test_had.py @@ -972,13 +972,16 @@ def test_aggregate_invalid_raises(self): ) def test_survey_bad_type_raises(self): - """survey= must be a SurveyDesign-like object with a .weights - attribute; a bare string (or any object lacking .weights) raises - TypeError front-door.""" + """survey= must be a SurveyDesign-like object with a `.resolve()` + method; a bare string (or any object lacking `.resolve()`) raises + TypeError front-door. 
Updated PR #376 R8 P1: the data-in type + guard now runs at the canonical entry and rejects on the + `hasattr(survey, "resolve")` check (which catches both bare + strings and ResolvedSurveyDesign / make_pweight_design output).""" d, dy = _dgp_continuous_at_zero(200, seed=0) panel = _make_panel(d, dy) est = HeterogeneousAdoptionDiD() - with pytest.raises(TypeError, match="SurveyDesign-like"): + with pytest.raises(TypeError, match="SurveyDesign"): est.fit( panel, "outcome", @@ -3389,7 +3392,7 @@ def test_survey_and_weights_mutex(self): panel_with_w = panel.assign(w=row_w) sd = SurveyDesign(weights="w") est = HeterogeneousAdoptionDiD(design="continuous_at_zero") - with pytest.raises(ValueError, match="OR weights"): + with pytest.raises(ValueError, match="at most one of"): est.fit( panel_with_w, "outcome", diff --git a/tests/test_had_dual_knob_deprecation.py b/tests/test_had_dual_knob_deprecation.py new file mode 100644 index 00000000..d66e65fb --- /dev/null +++ b/tests/test_had_dual_knob_deprecation.py @@ -0,0 +1,1312 @@ +"""Tests for HAD survey_design= consolidation + soft deprecation cycle. + +Covers all 8 HAD surfaces (HAD.fit + did_had_pretest_workflow + 4 array-in +pretests + 2 data-in joint wrappers) per the consolidation plan +(`whimsical-brewing-liskov.md`). Each surface gets: + +1. survey_design= positive smoke (new kwarg accepted, finite output). +2. weights= deprecation warning (DeprecationWarning emitted; back-compat + numerics preserved). +3. survey= deprecation warning (DeprecationWarning emitted; back-compat + numerics preserved). +4. Numerical parity legacy ≡ new at atol=0 (skipped on qug_test, which + raises NotImplementedError on all paths). +5. Three-way mutex ValueError (any 2-of-3 combo). + +Plus surface-spanning tests: +- make_pweight_design importable from diff_diff top-level. +- make_pweight_design ≡ _make_trivial_resolved (private alias). +- Array-in helpers reject SurveyDesign (TypeError). 
+- Bit-exact normalization-order invariant (scale-invariance). +- qug_test surface symmetry (signature consistent with siblings). +""" + +import warnings + +import numpy as np +import pandas as pd +import pytest + +from diff_diff import ( + HeterogeneousAdoptionDiD, + SurveyDesign, + did_had_pretest_workflow, + joint_homogeneity_test, + joint_pretrends_test, + make_pweight_design, + qug_test, + stute_joint_pretest, + stute_test, + yatchew_hr_test, +) +from diff_diff.survey import ResolvedSurveyDesign + +# ============================================================================= +# Fixtures +# ============================================================================= + + +@pytest.fixture +def array_in_data(): + """Simple (d, dy) arrays for the 3 numeric array-in helpers.""" + rng = np.random.default_rng(0) + G = 30 + d = rng.uniform(0, 1, size=G) + dy = 0.5 + 1.5 * d + rng.normal(0, 0.3, size=G) + return d, dy + + +@pytest.fixture +def array_in_doses(): + """Just doses for qug_test (single-array).""" + return np.array([0.1, 0.3, 0.5, 0.7, 0.9]) + + +@pytest.fixture +def two_period_panel(): + """Two-period panel for HAD.fit + did_had_pretest_workflow on + aggregate='overall'. G=200 units, T=2 periods, dose constant within unit, + Beta(0.5, 1) draws so d.min() approaches 0 (boundary at 0 satisfied for + Design 1' continuous_at_zero).""" + rng = np.random.default_rng(1) + G = 200 + # Beta(0.5, 1) puts mass near 0; d.min() will be very small relative to + # median, satisfying the Design 1' boundary heuristic. 
+ d = rng.beta(0.5, 1.0, size=G) + rows = [] + for g in range(G): + for t in (0, 1): + y = 0.0 if t == 0 else d[g] * 1.2 + rng.normal(0, 0.1) + rows.append({"unit": g, "time": t, "y": y, "d": (0.0 if t == 0 else d[g])}) + df = pd.DataFrame(rows) + df["w"] = 1.0 # uniform weight column for SurveyDesign(weights="w") + return df + + +@pytest.fixture +def event_study_panel(): + """Multi-period panel for joint_pretrends/joint_homogeneity workflows.""" + rng = np.random.default_rng(2) + G = 30 + rows = [] + F = 2 + for g in range(G): + d_g = rng.uniform(0.0, 1.0) + for t in range(4): + d_t = 0.0 if t < F else d_g + y = (0.0 if t < F else d_t * 1.5) + rng.normal(0, 0.15) + rows.append({"unit": g, "time": t, "y": y, "d": d_t}) + df = pd.DataFrame(rows) + df["w"] = 1.0 + return df + + +# ============================================================================= +# 1. Surface-spanning tests +# ============================================================================= + + +class TestPublicHelpers: + def test_make_pweight_design_export(self): + """make_pweight_design is importable from the diff_diff top level.""" + from diff_diff import make_pweight_design as mpd + + assert mpd is make_pweight_design + + def test_make_pweight_design_returns_resolved(self): + w = np.array([1.0, 2.0, 3.0, 4.0]) + resolved = make_pweight_design(w) + assert isinstance(resolved, ResolvedSurveyDesign) + assert resolved.weight_type == "pweight" + assert resolved.strata is None + assert resolved.psu is None + assert resolved.fpc is None + assert resolved.replicate_weights is None + assert resolved.n_strata == 0 + assert resolved.n_psu == 4 + assert np.array_equal(resolved.weights, w.astype(np.float64)) + + def test_make_pweight_design_eq_underscore_alias(self): + """Permanent private alias _make_trivial_resolved IS make_pweight_design.""" + from diff_diff.survey import _make_trivial_resolved + + assert _make_trivial_resolved is make_pweight_design + + def 
test_make_pweight_design_rejects_scalar(self): + """PR #376 R3 P1: scalar / 0-D inputs raise a clear front-door + ValueError instead of bubbling a low-level numpy or dataclass + exception (was: `1.0` would fail at `int(w.shape[0])` with + `IndexError: tuple index out of range`).""" + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + make_pweight_design(1.0) + + def test_make_pweight_design_rejects_zero_d_array(self): + """PR #376 R3 P1: `np.array(1.0)` (0-D ndarray) raises ValueError.""" + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + make_pweight_design(np.array(1.0)) + + def test_make_pweight_design_rejects_column_vector(self): + """PR #376 R3 P1: `(n, 1)` column vectors raise ValueError pointing + users to `df['w'].to_numpy()` instead of `df[['w']].to_numpy()`.""" + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + make_pweight_design(np.ones((5, 1))) + + def test_array_in_helpers_legacy_weights_scalar_raises_value_error(self, array_in_data): + """PR #376 R3 P1: deprecated `weights=scalar` on array-in helpers + also raises ValueError (the shim routes through make_pweight_design, + which catches scalars at its front door).""" + d, dy = array_in_data + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + stute_test(d, dy, weights=1.0, n_bootstrap=199, seed=0) + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + yatchew_hr_test(d, dy, weights=1.0) + + def test_qug_test_legacy_weights_scalar_raises_value_error(self, array_in_doses): + """PR #376 R3 P1: deprecated `weights=scalar` on qug_test also raises + ValueError (the shim routes through make_pweight_design before the + NotImplementedError gate).""" + with pytest.raises(ValueError, match="weights must be 1-dimensional"): + with 
warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + qug_test(array_in_doses, weights=1.0) + + +class TestArrayInTypeGuard: + """Array-in helpers reject SurveyDesign (cannot resolve column names). + + Both the canonical `survey_design=SurveyDesign(...)` form AND the + deprecated `survey=SurveyDesign(...)` alias trigger the same TypeError + (PR #376 R1 P1: alias must behave identically to the canonical kwarg). + """ + + def test_stute_test_rejects_SurveyDesign(self, array_in_data): + d, dy = array_in_data + with pytest.raises(TypeError, match="make_pweight_design"): + stute_test(d, dy, survey_design=SurveyDesign(weights="w"), n_bootstrap=199, seed=0) + + def test_stute_test_rejects_SurveyDesign_via_legacy_alias(self, array_in_data): + """PR #376 R1 P1: `survey=SurveyDesign(...)` (deprecated alias) must + trigger the same TypeError as `survey_design=SurveyDesign(...)`.""" + d, dy = array_in_data + with pytest.raises(TypeError, match="make_pweight_design"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + stute_test(d, dy, survey=SurveyDesign(weights="w"), n_bootstrap=199, seed=0) + + def test_yatchew_hr_test_rejects_SurveyDesign(self, array_in_data): + d, dy = array_in_data + with pytest.raises(TypeError, match="make_pweight_design"): + yatchew_hr_test(d, dy, survey_design=SurveyDesign(weights="w")) + + def test_yatchew_hr_test_rejects_SurveyDesign_via_legacy_alias(self, array_in_data): + """PR #376 R1 P1: alias parity with canonical kwarg.""" + d, dy = array_in_data + with pytest.raises(TypeError, match="make_pweight_design"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + yatchew_hr_test(d, dy, survey=SurveyDesign(weights="w")) + + def test_stute_joint_pretest_rejects_SurveyDesign(self): + rng = np.random.default_rng(3) + G = 30 + d = rng.uniform(0, 1, size=G) + residuals = {0: rng.normal(0, 0.1, G)} + fitted = {0: np.zeros(G)} + X = 
np.column_stack([np.ones(G), d]) + with pytest.raises(TypeError, match="make_pweight_design"): + stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + + def test_stute_joint_pretest_rejects_SurveyDesign_via_legacy_alias(self): + """PR #376 R1 P1: alias parity with canonical kwarg.""" + rng = np.random.default_rng(3) + G = 30 + d = rng.uniform(0, 1, size=G) + residuals = {0: rng.normal(0, 0.1, G)} + fitted = {0: np.zeros(G)} + X = np.column_stack([np.ones(G), d]) + with pytest.raises(TypeError, match="make_pweight_design"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + + +class TestScaleInvariance: + """Bit-exact normalization-order invariant (Stability invariant #7). + + The legacy weights= deprecation shim binds + `survey_design = make_pweight_design(weights_unnormalized)` and lets + the unified survey_design= path apply the mean=1 normalization step + EXACTLY ONCE downstream. If the shim pre-normalized AND the unified + path also normalized, the test statistic would scale differently + under multiplicative weight rescaling. 
+ """ + + def test_stute_weights_alias_scale_invariant(self, array_in_data): + d, dy = array_in_data + w = np.random.default_rng(4).uniform(0.5, 1.5, size=30) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r1 = stute_test(d, dy, weights=w, n_bootstrap=199, seed=0) + r2 = stute_test(d, dy, weights=w * 100.0, n_bootstrap=199, seed=0) + # Use atol/rtol=1e-14 (per `feedback_assert_allclose_numerical_parity`): + # the mean=1 normalization step `w * G/sum(w)` produces results that + # agree to ~16 significant figures but not bit-exactly across + # multiplicative rescaling (FP rounding in the renormalization step). + np.testing.assert_allclose(r1.cvm_stat, r2.cvm_stat, atol=1e-14, rtol=1e-14) + + def test_yatchew_weights_alias_scale_invariant(self, array_in_data): + d, dy = array_in_data + w = np.random.default_rng(5).uniform(0.5, 1.5, size=30) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r1 = yatchew_hr_test(d, dy, weights=w) + r2 = yatchew_hr_test(d, dy, weights=w * 100.0) + np.testing.assert_allclose(r1.t_stat_hr, r2.t_stat_hr, atol=1e-14, rtol=1e-14) + + +# ============================================================================= +# 2. 
Per-surface deprecation + parity tests +# ============================================================================= + + +class TestQUGTestDeprecation: + """qug_test (array-in, gated): all paths raise NotImplementedError; + consolidation tests focus on the deprecation/mutex cascade.""" + + def test_survey_design_kwarg_raises_notimpl(self, array_in_doses): + with pytest.raises(NotImplementedError, match="QUG"): + qug_test(array_in_doses, survey_design=make_pweight_design(np.ones(5))) + + def test_weights_emits_deprecation_warning(self, array_in_doses): + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + with pytest.raises(NotImplementedError): + qug_test(array_in_doses, weights=np.ones(5)) + + def test_survey_emits_deprecation_warning(self, array_in_doses): + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + with pytest.raises(NotImplementedError): + qug_test(array_in_doses, survey=SurveyDesign(weights="w")) + + def test_three_way_mutex_design_plus_survey(self, array_in_doses): + with pytest.raises(ValueError, match="at most one of"): + qug_test( + array_in_doses, + survey_design=make_pweight_design(np.ones(5)), + survey=SurveyDesign(weights="w"), + ) + + def test_three_way_mutex_design_plus_weights(self, array_in_doses): + with pytest.raises(ValueError, match="at most one of"): + qug_test( + array_in_doses, + survey_design=make_pweight_design(np.ones(5)), + weights=np.ones(5), + ) + + def test_three_way_mutex_all_three(self, array_in_doses): + with pytest.raises(ValueError, match="at most one of"): + qug_test( + array_in_doses, + survey_design=make_pweight_design(np.ones(5)), + survey=SurveyDesign(weights="w"), + weights=np.ones(5), + ) + + +class TestStuteTestDeprecation: + def test_survey_design_kwarg_smoke(self, array_in_data): + d, dy = array_in_data + w = np.ones(30) + r = stute_test(d, dy, survey_design=make_pweight_design(w), n_bootstrap=199, seed=0) + assert np.isfinite(r.cvm_stat) + assert 0.0 <= r.p_value <= 1.0 
+ + def test_weights_emits_deprecation_warning(self, array_in_data): + d, dy = array_in_data + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + stute_test(d, dy, weights=np.ones(30), n_bootstrap=199, seed=0) + + def test_survey_emits_deprecation_warning(self, array_in_data): + d, dy = array_in_data + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + stute_test( + d, + dy, + survey=make_pweight_design(np.ones(30)), + n_bootstrap=199, + seed=0, + ) + + def test_numerical_parity_weights_legacy_eq_new(self, array_in_data): + d, dy = array_in_data + w = np.random.default_rng(7).uniform(0.5, 1.5, size=30) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = stute_test(d, dy, weights=w, n_bootstrap=199, seed=0) + r_new = stute_test(d, dy, survey_design=make_pweight_design(w), n_bootstrap=199, seed=0) + assert r_legacy.cvm_stat == r_new.cvm_stat + assert r_legacy.p_value == r_new.p_value + + def test_numerical_parity_survey_legacy_eq_new(self, array_in_data): + d, dy = array_in_data + w = np.random.default_rng(8).uniform(0.5, 1.5, size=30) + resolved = make_pweight_design(w) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = stute_test(d, dy, survey=resolved, n_bootstrap=199, seed=0) + r_new = stute_test(d, dy, survey_design=resolved, n_bootstrap=199, seed=0) + assert r_legacy.cvm_stat == r_new.cvm_stat + assert r_legacy.p_value == r_new.p_value + + def test_three_way_mutex_design_plus_survey(self, array_in_data): + d, dy = array_in_data + w = np.ones(30) + with pytest.raises(ValueError, match="at most one of"): + stute_test( + d, + dy, + survey_design=make_pweight_design(w), + survey=make_pweight_design(w), + n_bootstrap=199, + seed=0, + ) + + def test_three_way_mutex_all_three(self, array_in_data): + d, dy = array_in_data + w = np.ones(30) + with pytest.raises(ValueError, match="at most one of"): + stute_test( + d, + dy, + 
survey_design=make_pweight_design(w), + survey=make_pweight_design(w), + weights=w, + n_bootstrap=199, + seed=0, + ) + + +class TestYatchewHRTestDeprecation: + def test_survey_design_kwarg_smoke(self, array_in_data): + d, dy = array_in_data + w = np.ones(30) + r = yatchew_hr_test(d, dy, survey_design=make_pweight_design(w)) + assert np.isfinite(r.t_stat_hr) + + def test_weights_emits_deprecation_warning(self, array_in_data): + d, dy = array_in_data + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + yatchew_hr_test(d, dy, weights=np.ones(30)) + + def test_survey_emits_deprecation_warning(self, array_in_data): + d, dy = array_in_data + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + yatchew_hr_test(d, dy, survey=make_pweight_design(np.ones(30))) + + def test_numerical_parity_weights_legacy_eq_new(self, array_in_data): + d, dy = array_in_data + w = np.random.default_rng(9).uniform(0.5, 1.5, size=30) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = yatchew_hr_test(d, dy, weights=w) + r_new = yatchew_hr_test(d, dy, survey_design=make_pweight_design(w)) + assert r_legacy.t_stat_hr == r_new.t_stat_hr + assert r_legacy.p_value == r_new.p_value + + def test_three_way_mutex_design_plus_weights(self, array_in_data): + d, dy = array_in_data + with pytest.raises(ValueError, match="at most one of"): + yatchew_hr_test( + d, + dy, + survey_design=make_pweight_design(np.ones(30)), + weights=np.ones(30), + ) + + +class TestStuteJointPretestDeprecation: + def _setup(self): + rng = np.random.default_rng(10) + G = 30 + d = rng.uniform(0, 1, size=G) + residuals = {0: rng.normal(0, 0.1, G), 1: rng.normal(0, 0.1, G)} + fitted = {0: np.zeros(G), 1: np.zeros(G)} + X = np.column_stack([np.ones(G), d]) + return d, residuals, fitted, X + + def test_survey_design_kwarg_smoke(self): + d, residuals, fitted, X = self._setup() + w = np.ones(30) + r = stute_joint_pretest( + residuals_by_horizon=residuals, 
+ fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey_design=make_pweight_design(w), + n_bootstrap=199, + seed=0, + ) + assert np.isfinite(r.cvm_stat_joint) + + def test_weights_emits_deprecation_warning(self): + d, residuals, fitted, X = self._setup() + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + weights=np.ones(30), + n_bootstrap=199, + seed=0, + ) + + def test_survey_emits_deprecation_warning(self): + d, residuals, fitted, X = self._setup() + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey=make_pweight_design(np.ones(30)), + n_bootstrap=199, + seed=0, + ) + + def test_numerical_parity_weights_legacy_eq_new(self): + d, residuals, fitted, X = self._setup() + w = np.random.default_rng(11).uniform(0.5, 1.5, size=30) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + weights=w, + n_bootstrap=199, + seed=0, + ) + r_new = stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey_design=make_pweight_design(w), + n_bootstrap=199, + seed=0, + ) + assert r_legacy.cvm_stat_joint == r_new.cvm_stat_joint + assert r_legacy.p_value == r_new.p_value + + def test_three_way_mutex_all_three(self): + d, residuals, fitted, X = self._setup() + w = np.ones(30) + with pytest.raises(ValueError, match="at most one of"): + stute_joint_pretest( + residuals_by_horizon=residuals, + fitted_by_horizon=fitted, + doses=d, + design_matrix=X, + survey_design=make_pweight_design(w), + survey=make_pweight_design(w), + weights=w, + n_bootstrap=199, + seed=0, + ) + + +class 
TestJointPretrendsTestDeprecation: + def test_survey_design_kwarg_smoke(self, event_study_panel): + df = event_study_panel + r = joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert np.isfinite(r.cvm_stat_joint) + + def test_weights_emits_deprecation_warning(self, event_study_panel): + df = event_study_panel + n = len(df) + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + + def test_survey_emits_deprecation_warning(self, event_study_panel): + df = event_study_panel + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + survey=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + + def test_three_way_mutex_design_plus_survey(self, event_study_panel): + df = event_study_panel + n = len(df) + with pytest.raises(ValueError, match="at most one of"): + joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + survey_design=SurveyDesign(weights="w"), + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + + def test_legacy_alias_parity_survey(self, event_study_panel): + """PR #376 R9 P3: deprecated `survey=SurveyDesign(...)` ≡ canonical + `survey_design=SurveyDesign(...)` on joint_pretrends_test (locks + rebinding parity).""" + df = event_study_panel + sd = SurveyDesign(weights="w") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + survey=sd, + n_bootstrap=199, + seed=0, + ) + r_new = joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + 
pre_periods=[0], + base_period=1, + survey_design=sd, + n_bootstrap=199, + seed=0, + ) + assert r_legacy.cvm_stat_joint == r_new.cvm_stat_joint + assert r_legacy.p_value == r_new.p_value + + def test_legacy_alias_parity_weights(self, event_study_panel): + """PR #376 R10 P3: deprecated `weights=np.ones(n)` ≡ canonical + `survey_design=SurveyDesign(weights="w")` (uniform 1.0 column) on + joint_pretrends_test.""" + df = event_study_panel + n = len(df) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + r_new = joint_pretrends_test( + df, + "y", + "d", + "time", + "unit", + pre_periods=[0], + base_period=1, + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert r_legacy.cvm_stat_joint == r_new.cvm_stat_joint + assert r_legacy.p_value == r_new.p_value + + +class TestJointHomogeneityTestDeprecation: + def test_survey_design_kwarg_smoke(self, event_study_panel): + df = event_study_panel + r = joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert np.isfinite(r.cvm_stat_joint) + + def test_weights_emits_deprecation_warning(self, event_study_panel): + df = event_study_panel + n = len(df) + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + + def test_survey_emits_deprecation_warning(self, event_study_panel): + df = event_study_panel + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + 
survey=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + + def test_legacy_alias_parity_survey(self, event_study_panel): + """PR #376 R9 P3: deprecated `survey=SurveyDesign(...)` ≡ canonical + `survey_design=SurveyDesign(...)` on joint_homogeneity_test.""" + df = event_study_panel + sd = SurveyDesign(weights="w") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + survey=sd, + n_bootstrap=199, + seed=0, + ) + r_new = joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + survey_design=sd, + n_bootstrap=199, + seed=0, + ) + assert r_legacy.cvm_stat_joint == r_new.cvm_stat_joint + assert r_legacy.p_value == r_new.p_value + + def test_legacy_alias_parity_weights(self, event_study_panel): + """PR #376 R10 P3: deprecated `weights=np.ones(n)` ≡ canonical + `survey_design=SurveyDesign(weights="w")` on + joint_homogeneity_test.""" + df = event_study_panel + n = len(df) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + r_new = joint_homogeneity_test( + df, + "y", + "d", + "time", + "unit", + post_periods=[2, 3], + base_period=1, + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert r_legacy.cvm_stat_joint == r_new.cvm_stat_joint + assert r_legacy.p_value == r_new.p_value + + +class TestHADFitDeprecation: + def test_survey_design_kwarg_smoke(self, two_period_panel): + df = two_period_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + r = est.fit(df, "y", "d", "time", "unit", survey_design=SurveyDesign(weights="w")) + assert np.isfinite(r.att) + + def test_weights_emits_deprecation_warning(self, 
two_period_panel): + df = two_period_panel + n = len(df) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + est.fit(df, "y", "d", "time", "unit", weights=np.ones(n)) + + def test_survey_emits_deprecation_warning(self, two_period_panel): + df = two_period_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + est.fit(df, "y", "d", "time", "unit", survey=SurveyDesign(weights="w")) + + def test_three_way_mutex_design_plus_weights(self, two_period_panel): + df = two_period_panel + n = len(df) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.raises(ValueError, match="at most one of"): + est.fit( + df, + "y", + "d", + "time", + "unit", + survey_design=SurveyDesign(weights="w"), + weights=np.ones(n), + ) + + def test_fit_rejects_pre_resolved_design_overall(self, two_period_panel): + """PR #376 R8 P1: HAD.fit() data-in surface must reject a + pre-resolved ResolvedSurveyDesign with TypeError pointing users to + `SurveyDesign(weights='col_name', ...)`. Mirrors the array-in + helpers' rejection of SurveyDesign — the data-in/array-in surface + split is symmetric.""" + df = two_period_panel + n = len(df) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + # survey_design=ResolvedSurveyDesign should raise TypeError. 
+ with pytest.raises(TypeError, match=r"`survey_design=` accepts a SurveyDesign"): + est.fit( + df, + "y", + "d", + "time", + "unit", + survey_design=make_pweight_design(np.ones(n // 2)), + ) + + def test_fit_rejects_pre_resolved_design_event_study(self, event_study_continuous_panel): + """PR #376 R8 P1: same TypeError on aggregate='event_study'.""" + df = event_study_continuous_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", n_bootstrap=99, seed=0) + with pytest.raises(TypeError, match=r"`survey_design=` accepts a SurveyDesign"): + est.fit( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey_design=make_pweight_design(np.ones(200)), + ) + + def test_fit_rejects_pre_resolved_design_via_legacy_alias_overall(self, two_period_panel): + """PR #376 R8 P1: deprecated `survey=ResolvedSurveyDesign` (alias) + also raises TypeError after the alias rebinding.""" + df = two_period_panel + n = len(df) + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + with pytest.raises(TypeError, match=r"`survey_design=` accepts a SurveyDesign"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + est.fit( + df, + "y", + "d", + "time", + "unit", + survey=make_pweight_design(np.ones(n // 2)), + ) + + def test_fit_rejects_pre_resolved_design_via_legacy_alias_event_study( + self, event_study_continuous_panel + ): + """PR #376 R8 P1: deprecated `survey=ResolvedSurveyDesign` (alias) + on event-study path also raises TypeError.""" + df = event_study_continuous_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", n_bootstrap=99, seed=0) + with pytest.raises(TypeError, match=r"`survey_design=` accepts a SurveyDesign"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + est.fit( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey=make_pweight_design(np.ones(200)), + ) + + def test_legacy_positional_call_back_compat(self, 
two_period_panel): + """PR #376 R4 P1: pre-PR positional call shape for `survey`, + `weights`, `cband` MUST still work (the consolidation is additive, + not breaking). Tests the full positional sequence: + `fit(data, outcome, dose, time, unit, first_treat, aggregate, + survey, weights, cband)`.""" + df = two_period_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero") + # Pre-PR positional order: ..., first_treat_col, aggregate, survey, + # weights, cband. None of these should be flagged as keyword-only. + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r = est.fit( + df, + "y", + "d", + "time", + "unit", + None, # first_treat_col + "overall", # aggregate + SurveyDesign(weights="w"), # survey (positional) + None, # weights (positional) + True, # cband (positional) + ) + assert np.isfinite(r.att) + + +class TestDidHadPretestWorkflowDeprecation: + def test_survey_design_kwarg_smoke(self, two_period_panel): + df = two_period_panel + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) # QUG-skip warning + report = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert report.qug is None # skipped under survey path + assert report.stute is not None + + def test_weights_emits_deprecation_warning(self, two_period_panel): + df = two_period_panel + n = len(df) + with pytest.warns(DeprecationWarning, match="weights=.*deprecated"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + # We still need to allow the DeprecationWarning to propagate + # to the outer pytest.warns; only filter UserWarning. 
+ warnings.simplefilter("always", DeprecationWarning) + did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + + def test_survey_emits_deprecation_warning(self, two_period_panel): + df = two_period_panel + with pytest.warns(DeprecationWarning, match="survey=.*deprecated"): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + warnings.simplefilter("always", DeprecationWarning) + did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + + def test_three_way_mutex_all_three(self, two_period_panel): + df = two_period_panel + n = len(df) + with pytest.raises(ValueError, match="at most one of"): + did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey_design=SurveyDesign(weights="w"), + survey=SurveyDesign(weights="w"), + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + + def test_legacy_alias_parity_survey_overall(self, two_period_panel): + """PR #376 R9 P3: deprecated `survey=SurveyDesign(...)` ≡ canonical + `survey_design=SurveyDesign(...)` on + did_had_pretest_workflow(aggregate='overall'). 
Locks rebinding + parity on the workflow's overall-path data-in surface.""" + df = two_period_panel + sd = SurveyDesign(weights="w") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) # QUG-skip warning + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey=sd, + n_bootstrap=199, + seed=0, + ) + r_new = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey_design=sd, + n_bootstrap=199, + seed=0, + ) + assert r_legacy.stute.cvm_stat == r_new.stute.cvm_stat + assert r_legacy.stute.p_value == r_new.stute.p_value + assert r_legacy.yatchew.t_stat_hr == r_new.yatchew.t_stat_hr + + def test_legacy_alias_parity_weights_overall(self, two_period_panel): + """PR #376 R10 P3: deprecated `weights=np.ones(n)` ≡ canonical + `survey_design=SurveyDesign(weights="w")` on + did_had_pretest_workflow(aggregate='overall'). Closes the data-in + rebinding-parity gap on the weights= shortcut path.""" + df = two_period_panel + n = len(df) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + r_new = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert r_legacy.stute.cvm_stat == r_new.stute.cvm_stat + assert r_legacy.stute.p_value == r_new.stute.p_value + assert r_legacy.yatchew.t_stat_hr == r_new.yatchew.t_stat_hr + + +# ============================================================================= +# 3. 
PR #376 R2 P1: extended dispatch-matrix coverage on the new front door +# ============================================================================= +# +# Reviewer flagged that the canonical `survey_design=` kwarg was added across +# all HAD design × aggregate combinations but only directly tested on the +# two-period continuous_at_zero / overall path. These tests cover the +# weighted mass_point overall path, the weighted continuous event-study +# path, and the workflow event-study path, each with both a +# `survey_design=` smoke test and a legacy-alias parity check. + + +@pytest.fixture +def mass_point_panel(): + """Two-period panel with a mass point at d_lower=0.05 and continuous doses above it. + + G=200 units; the modal fraction at d_lower (0.06) exceeds 0.02, which + triggers the mass-point heuristic in HAD's auto-detection. Used to + exercise `design="mass_point"` survey_design= forwarding through the + weighted 2SLS sandwich. + """ + rng = np.random.default_rng(13) + G = 200 + n_modal = int(0.06 * G) # 12 units at d_lower + d_modal = np.full(n_modal, 0.05) + d_continuous = rng.uniform(0.06, 1.0, size=G - n_modal) + d = np.concatenate([d_modal, d_continuous]) + rng.shuffle(d) + rows = [] + for g in range(G): + for t in (0, 1): + y = 0.0 if t == 0 else d[g] * 1.2 + rng.normal(0, 0.1) + rows.append({"unit": g, "time": t, "y": y, "d": (0.0 if t == 0 else d[g])}) + df = pd.DataFrame(rows) + df["w"] = 1.0 + return df + + +@pytest.fixture +def event_study_continuous_panel(): + """Multi-period continuous_at_zero panel for HAD.fit aggregate='event_study'. 
+ + G=200 units, T=3 periods (t=0 pre, t=1 base, t=2 post), Beta(0.5, 1) + doses so d.min() approaches 0 (Design 1' boundary heuristic satisfied), + F=2 (treatment starts at t=2).""" + rng = np.random.default_rng(14) + G = 200 + d = rng.beta(0.5, 1.0, size=G) + rows = [] + F = 2 + for g in range(G): + for t in range(3): + d_t = 0.0 if t < F else d[g] + y = (0.0 if t < F else d_t * 1.2) + rng.normal(0, 0.1) + rows.append({"unit": g, "time": t, "y": y, "d": d_t}) + df = pd.DataFrame(rows) + df["w"] = 1.0 + return df + + +class TestHADFitMassPointSurveyDesign: + """PR #376 R2 P1: cover `design='mass_point'` + survey_design= path. + + Mass-point + survey requires vcov_type='hc1' (not the classical default) + per the documented Phase 4.5 B deviation: the survey path composes + Binder-TSL on the HC1-scale IF. + """ + + def test_survey_design_kwarg_smoke(self, mass_point_panel): + df = mass_point_panel + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) # mass-point methodology warning + r = est.fit(df, "y", "d", "time", "unit", survey_design=SurveyDesign(weights="w")) + assert np.isfinite(r.att) + assert np.isfinite(r.se) + + def test_legacy_alias_parity_weights(self, mass_point_panel): + """weights=arr (deprecated) and survey_design=SurveyDesign(weights='w') + produce an identical point estimate on the mass_point overall path. 
SE differs + by variance family (weights= → HC1 sandwich; survey_design= → + Binder-TSL on HC1-scale IF), so we assert att-only parity.""" + df = mass_point_panel + n = len(df) + est = HeterogeneousAdoptionDiD(design="mass_point", vcov_type="hc1") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + warnings.simplefilter("ignore", UserWarning) + r_legacy = est.fit(df, "y", "d", "time", "unit", weights=np.ones(n)) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_new = est.fit(df, "y", "d", "time", "unit", survey_design=SurveyDesign(weights="w")) + np.testing.assert_allclose(r_legacy.att, r_new.att, atol=1e-10, rtol=1e-10) + + +class TestHADFitEventStudySurveyDesign: + """PR #376 R2 P1: cover aggregate='event_study' + cband=True + survey_design=.""" + + def test_survey_design_kwarg_smoke(self, event_study_continuous_panel): + df = event_study_continuous_panel + est = HeterogeneousAdoptionDiD(design="continuous_at_zero", n_bootstrap=99, seed=0) + r = est.fit( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey_design=SurveyDesign(weights="w"), + cband=True, + ) + # Event-study returns HeterogeneousAdoptionDiDEventStudyResults + assert r.att.shape[0] >= 1 + assert np.all(np.isfinite(r.att)) + assert r.cband_low is not None + assert r.cband_high is not None + + def test_legacy_alias_parity_survey(self, event_study_continuous_panel): + """survey=SurveyDesign(...) (deprecated) ≡ survey_design=SurveyDesign(...) 
+ on event-study path.""" + df = event_study_continuous_panel + sd = SurveyDesign(weights="w") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = HeterogeneousAdoptionDiD( + design="continuous_at_zero", n_bootstrap=99, seed=0 + ).fit(df, "y", "d", "time", "unit", aggregate="event_study", survey=sd, cband=True) + r_new = HeterogeneousAdoptionDiD(design="continuous_at_zero", n_bootstrap=99, seed=0).fit( + df, "y", "d", "time", "unit", aggregate="event_study", survey_design=sd, cband=True + ) + np.testing.assert_array_equal(r_legacy.att, r_new.att) + np.testing.assert_array_equal(r_legacy.se, r_new.se) + + +class TestDidHadPretestWorkflowEventStudySurveyDesign: + """PR #376 R2 P1: cover did_had_pretest_workflow(aggregate='event_study', + survey_design=...).""" + + def test_survey_design_kwarg_smoke(self, event_study_panel): + df = event_study_panel + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) # QUG-skip + staggered + report = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert report.qug is None # skipped under survey path + assert report.homogeneity_joint is not None + + def test_legacy_alias_parity_survey(self, event_study_panel): + """survey=SurveyDesign(...) (deprecated) ≡ survey_design=SurveyDesign(...) 
+ on workflow event-study path.""" + df = event_study_panel + sd = SurveyDesign(weights="w") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + warnings.simplefilter("ignore", DeprecationWarning) + r_legacy = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey=sd, + n_bootstrap=199, + seed=0, + ) + r_new = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey_design=sd, + n_bootstrap=199, + seed=0, + ) + # Joint Stute on the event-study path is bootstrap-driven; both calls + # use the same seed=0 + same survey design → identical bootstrap + # multiplier draws → identical p-values + statistics. + assert r_legacy.homogeneity_joint.cvm_stat_joint == r_new.homogeneity_joint.cvm_stat_joint + assert r_legacy.homogeneity_joint.p_value == r_new.homogeneity_joint.p_value + + def test_legacy_alias_parity_weights(self, event_study_panel): + """weights=arr (deprecated) ≡ survey_design=SurveyDesign(weights='w') + with uniform 1.0 weights on the workflow event-study path. Locks the + nested-DeprecationWarning suppression: the user-facing warning fires + ONCE at the workflow front door, no extra warnings from the joint + wrappers when survey/weights are forwarded internally.""" + df = event_study_panel + n = len(df) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + with warnings.catch_warnings(record=True) as w_record: + warnings.simplefilter("always", DeprecationWarning) + r_legacy = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + weights=np.ones(n), + n_bootstrap=199, + seed=0, + ) + # PR #376 R2 P3 fix: workflow event-study weights= path emits + # exactly ONE DeprecationWarning (not three — joint wrappers' + # nested warnings are suppressed since the user-facing one + # already fired at the workflow's front door). 
+ n_dep_warnings = sum(1 for w in w_record if issubclass(w.category, DeprecationWarning)) + assert n_dep_warnings == 1, ( + f"expected 1 DeprecationWarning at workflow front door, got {n_dep_warnings}" + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + r_new = did_had_pretest_workflow( + df, + "y", + "d", + "time", + "unit", + aggregate="event_study", + survey_design=SurveyDesign(weights="w"), + n_bootstrap=199, + seed=0, + ) + assert r_legacy.homogeneity_joint.cvm_stat_joint == r_new.homogeneity_joint.cvm_stat_joint + assert r_legacy.homogeneity_joint.p_value == r_new.homogeneity_joint.p_value diff --git a/tests/test_had_pretests.py b/tests/test_had_pretests.py index 4b97e2c2..b122e339 100644 --- a/tests/test_had_pretests.py +++ b/tests/test_had_pretests.py @@ -205,7 +205,7 @@ def test_mutex_both_set_raises_value_error(self): from diff_diff import SurveyDesign d = np.array([0.1, 0.5, 0.9]) - with pytest.raises(ValueError, match="OR weights=.*not both"): + with pytest.raises(ValueError, match="at most one of"): qug_test(d, survey=SurveyDesign(weights="w"), weights=np.ones(3)) def test_methodology_pointer_in_message(self): @@ -2881,7 +2881,7 @@ def test_workflow_mutex_both_raises(self): from diff_diff import SurveyDesign df = self._make_minimal_overall_panel(with_weight_col=True) - with pytest.raises(ValueError, match="OR weights=.*not both"): + with pytest.raises(ValueError, match="at most one of"): did_had_pretest_workflow( df, "y", @@ -3059,23 +3059,23 @@ def test_weights_smoke(self): def test_survey_smoke(self): """survey= via trivial ResolvedSurveyDesign produces a finite result.""" - from diff_diff.survey import _make_trivial_resolved + from diff_diff.survey import make_pweight_design d, dy = self._setup() w = np.random.default_rng(7).uniform(0.5, 2.0, size=30) - resolved = _make_trivial_resolved(w) + resolved = make_pweight_design(w) r = stute_test(d, dy, survey=resolved, n_bootstrap=199, seed=0) assert 
np.isfinite(r.cvm_stat) assert 0.0 <= r.p_value <= 1.0 def test_mutex_both_raises(self): """survey + weights mutex (mirrors workflow + qug_test pattern).""" - from diff_diff.survey import _make_trivial_resolved + from diff_diff.survey import make_pweight_design d, dy = self._setup() w = np.ones(30) - with pytest.raises(ValueError, match="OR weights=.*not both"): - stute_test(d, dy, weights=w, survey=_make_trivial_resolved(w), n_bootstrap=199, seed=0) + with pytest.raises(ValueError, match="at most one of"): + stute_test(d, dy, weights=w, survey=make_pweight_design(w), n_bootstrap=199, seed=0) def test_replicate_weights_raises(self): """Phase 4.5 C MEDIUM #4: replicate-weight survey designs raise @@ -3167,20 +3167,20 @@ def test_weights_smoke(self): assert 0.0 <= r.p_value <= 1.0 def test_survey_smoke(self): - from diff_diff.survey import _make_trivial_resolved + from diff_diff.survey import make_pweight_design d, dy = self._setup() w = np.random.default_rng(7).uniform(0.5, 2.0, size=30) - r = yatchew_hr_test(d, dy, survey=_make_trivial_resolved(w)) + r = yatchew_hr_test(d, dy, survey=make_pweight_design(w)) assert np.isfinite(r.t_stat_hr) def test_mutex_both_raises(self): - from diff_diff.survey import _make_trivial_resolved + from diff_diff.survey import make_pweight_design d, dy = self._setup() w = np.ones(30) - with pytest.raises(ValueError, match="not both"): - yatchew_hr_test(d, dy, weights=w, survey=_make_trivial_resolved(w)) + with pytest.raises(ValueError, match="at most one of"): + yatchew_hr_test(d, dy, weights=w, survey=make_pweight_design(w)) def test_zero_weight_rejected(self): """Per Reviewer Question #4: strictly-positive weights required @@ -3302,7 +3302,7 @@ def test_joint_pretrends_mutex_both_raises(self): df = self._make_event_study_panel() df["w"] = 1.0 - with pytest.raises(ValueError, match="not both"): + with pytest.raises(ValueError, match="at most one of"): joint_pretrends_test( df, "y",