diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0c3f2e0c..44eabb6a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **Inference-field aliases on staggered result classes** for adapter / external-consumer compatibility. Read-only `@property` aliases expose the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names (matching `DiDResults` / `TROPResults` / `SyntheticDiDResults` / `HeterogeneousAdoptionDiDResults`) on every result class that previously only carried prefixed canonical fields: `CallawaySantAnnaResults`, `StackedDiDResults`, `EfficientDiDResults`, `ChaisemartinDHaultfoeuilleResults`, `StaggeredTripleDiffResults`, `WooldridgeDiDResults`, `SunAbrahamResults`, `ImputationDiDResults`, `TwoStageDiDResults` (mapping to `overall_*`); `ContinuousDiDResults` (mapping to `overall_att_*`, ATT-side as the headline, ACRT-side accessible unchanged via `overall_acrt_*`); `MultiPeriodDiDResults` (mapping to `avg_*`). `ContinuousDiDResults` additionally exposes `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases for naming consistency with the rest of the staggered family. Aliases are pure read-throughs over the canonical fields — no recomputation, no behavior change — so the `safe_inference()` joint-NaN contract (per CLAUDE.md "Inference computation") is inherited automatically (NaN canonical → NaN alias, locked at `tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan`). The native `overall_*` / `overall_att_*` / `avg_*` fields remain canonical for documentation and computation.
Motivated by the `balance.interop.diff_diff.as_balance_diagnostic()` adapter (`facebookresearch/balance` PR #465) which calls `getattr(res, "se", None)` / `getattr(res, "conf_int", None)` without a fallback chain — pre-alias, every staggered result class returned `None` on those keys, silently dropping `se` and `conf_int` from the adapter's diagnostic dict. 23 alias-mechanic + balance-adapter regression tests at `tests/test_result_aliases.py`. Patch-level (additive on stable surfaces).
 - **`ChaisemartinDHaultfoeuille.by_path` + non-binary integer treatment** — `by_path=k` now accepts integer-coded discrete treatment (D in Z, e.g. ordinal `{0, 1, 2}`); path tuples become integer-state tuples like `(0, 2, 2, 2)`. The previous `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1870` is replaced by a `ValueError` for continuous D (e.g. `D=1.5`) at fit-time per the no-silent-failures contract — the existing `int(round(float(v)))` cast in `_enumerate_treatment_paths` is now defensive (no-op for integer-coded D). Validated against R `did_multiplegt_dyn(..., by_path)` for D in `{0, 1, 2}` via the new `multi_path_reversible_by_path_non_binary` golden-value scenario (78 switchers, 3 paths, single-baseline custom DGP, F_g >= 4): per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol observed; SE_RTOL=0.15 envelope). **Deviation from R for D >= 10:** R's `did_multiplegt_by_path` derives the per-path baseline via `path_index$baseline_XX <- substr(path_index$path, 1, 1)`, which captures only the first character of the comma-separated path string (e.g. for `path = "12,12,..."` it captures `"1"` instead of `"12"`); this mis-allocates R's per-path control-pool subset for D >= 10.
Python's tuple-key matching is correct in this regime — the per-path point estimates we compute are correct; R's per-path subset for the same path is buggy. The shipped parity scenario stays in `D in {0, 1, 2}` to avoid the R bug. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary`; cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary`.
 - **New `paths_of_interest` kwarg on `ChaisemartinDHaultfoeuille`** for user-specified treatment-path subsets, alternative to `by_path=k`'s top-k automatic ranking. Mutually exclusive with `by_path`; setting both raises `ValueError` at `__init__` and `set_params` time. Each path tuple must be a list/tuple of `int` of length `L_max + 1` (uniformity validated at `__init__`; length match against `L_max + 1` validated at fit-time); `bool` and `np.bool_` are explicitly rejected, `np.integer` accepted and canonicalized to Python `int` for tuple-key consistency. Duplicates emit a `UserWarning` and are deduplicated; paths not observed in the panel emit a `UserWarning` and are omitted from `path_effects`. Paths appear in `results.path_effects` in the user-specified order, modulo deduplication and unobserved-path filtering. Composes with non-binary D and all downstream `by_path` surfaces (bootstrap, per-path placebos, per-path joint sup-t bands, `controls`, `trends_linear`, `trends_nonparam`) — mechanical filter on observed paths via the same `_enumerate_treatment_paths` call site, no methodology change. **Python-only API extension; no R equivalent** — R's `did_multiplegt_dyn(..., by_path=k)` only accepts a positive int (top-k) or `-1` (all paths). The `by_path` precondition gate at `chaisemartin_dhaultfoeuille.py:1118` (drop_larger_lower / L_max / `heterogeneity` / `design2` / `honest_did` / `survey_design` mutex) and the 11 `self.by_path is not None` activation branches in `fit()` were rerouted to fire under either selector.
Validation + behavior + cross-feature regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest`.
 - **HAD `practitioner_next_steps()` handler + `llms-full.txt` reference section** (Phase 5). Adds `_handle_had` and `_handle_had_event_study` to `diff_diff/practitioner.py::_HANDLERS`, routing both `HeterogeneousAdoptionDiDResults` (single-period) and `HeterogeneousAdoptionDiDEventStudyResults` (event-study) through HAD-specific Baker et al. (2025) step guidance: `did_had_pretest_workflow` (step 3 — paper Section 4.2 step-2 closure on the event-study path), an estimand-difference routing nudge to `ContinuousDiD` (step 4 — fires when the user wants per-dose ATT(d) / ACRT(d) curves rather than HAD's WAS estimand and has never-treated controls; framed around estimand difference, NOT around the existence of untreated units, since HAD remains valid with a small never-treated share per REGISTRY § HeterogeneousAdoptionDiD edge cases and explicitly retains never-treated units on the staggered event-study path per paper Appendix B.2 / `had.py:1325`), `results.bandwidth_diagnostics` inspection on continuous designs and simultaneous (sup-t) `cband_*` reading on weighted event-study fits (step 6), per-horizon WAS event-study disaggregation (step 7), and the explicit design-auto-detection / last-cohort-only-WAS framing (step 8). Symmetric pair: `_handle_continuous` gains a Step-4 nudge to `HeterogeneousAdoptionDiD` for ContinuousDiD users on no-untreated panels (this direction is correct because ContinuousDiD's identification requires never-treated controls). Extends `_check_nan_att` with an ndarray branch via lazy `numpy` import for HAD's per-horizon `att` array; uses `np.all(np.isnan(arr))` semantics so partial-NaN arrays (legitimate event-study output under degenerate horizon-specific designs) do not over-fire the warning. Scalar path is bit-exact preserved across all 12 untouched handlers.
Adds full HAD section + `HeterogeneousAdoptionDiDResults` / `HeterogeneousAdoptionDiDEventStudyResults` blocks + `## HAD Pretests` index covering all 7 pretest entry points + Choosing-an-Estimator row to `diff_diff/guides/llms-full.txt` (the bundled-in-wheel agent reference); the documented constructor + `fit()` signatures match the real `HeterogeneousAdoptionDiD.__init__` / `.fit` API exactly (verified by `inspect.signature`-based regression tests). Tightens the existing `Continuous treatment intensity` Choosing row to surface ATT(d) vs WAS as the estimand differentiator. `docs/doc-deps.yaml` updated to remove the `llms-full.txt` deferral note on `had.py` and add `llms-full.txt` entries to `had.py`, `had_pretests.py`, and `practitioner.py` blocks. Patch-level (additive on stable surfaces). 26 new tests (16 in `tests/test_practitioner.py::TestHADDispatch` + 9 in `tests/test_guides.py::TestLLMsFullHADCoverage` + 1 fixture-minimality regression locking the "handlers are STRING-ONLY at runtime" stability invariant). Closes the Phase 5 "agent surfaces" gap; T21 pretest tutorial and T22 weighted/survey tutorial remain queued as separate notebook PRs. 
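Reviewer note on the `by_path` entry's D >= 10 deviation: the minimal sketch below illustrates why taking the first character of a comma-separated path string diverges from tuple-key matching once a treatment state has two digits. The helper names are hypothetical and exist only for this illustration; they are not part of `diff_diff` or of the R package, and the string slice merely mimics what `substr(path, 1, 1)` returns.

```python
from typing import Tuple


def baseline_from_path_string(path_str: str) -> str:
    """Hypothetical helper: mimic substr(path, 1, 1) by keeping one character."""
    return path_str[:1]


def baseline_from_path_tuple(path: Tuple[int, ...]) -> int:
    """Hypothetical helper: read the baseline state as the first tuple element."""
    return path[0]


# A path whose baseline treatment state has two digits (D >= 10).
path = (12, 12, 12, 12)
path_str = ",".join(str(state) for state in path)  # "12,12,12,12"

truncated = baseline_from_path_string(path_str)  # "1": second digit is lost
exact = baseline_from_path_tuple(path)           # 12: full integer state

# For single-digit states the two approaches agree, which is why the
# shipped parity scenario can stay within D in {0, 1, 2}.
small_path = (0, 2, 2, 2)
small_str = ",".join(str(state) for state in small_path)
agree_for_small = int(baseline_from_path_string(small_str)) == baseline_from_path_tuple(small_path)
```

The point of the sketch is only the data-representation difference: an integer tuple key carries the whole state, while a one-character prefix of the serialized path does not.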
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index ab5327a0..9d907c83 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -695,6 +695,27 @@ def _estimand_label(self) -> str:
             return f"{did_part}{suffix}_{sub_part}" if sub_part else f"{did_part}{suffix}"
         return base
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/continuous_did_results.py b/diff_diff/continuous_did_results.py
index f3fed115..623937b6 100644
--- a/diff_diff/continuous_did_results.py
+++ b/diff_diff/continuous_did_results.py
@@ -143,6 +143,45 @@ class ContinuousDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    # ATT-side is the headline contract; ACRT remains accessible via overall_acrt_*.
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_att_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_att_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_att_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_att_t_stat
+
+    # `overall_*` aliases for naming consistency with the rest of the staggered family.
+    @property
+    def overall_se(self) -> float:
+        return self.overall_att_se
+
+    @property
+    def overall_conf_int(self) -> Tuple[float, float]:
+        return self.overall_att_conf_int
+
+    @property
+    def overall_p_value(self) -> float:
+        return self.overall_att_p_value
+
+    @property
+    def overall_t_stat(self) -> float:
+        return self.overall_att_t_stat
+
     def __repr__(self) -> str:
         sig_att = _get_significance_stars(self.overall_att_p_value)
         sig_acrt = _get_significance_stars(self.overall_acrt_p_value)
diff --git a/diff_diff/efficient_did_results.py b/diff_diff/efficient_did_results.py
index 2e6e0463..6a393e7f 100644
--- a/diff_diff/efficient_did_results.py
+++ b/diff_diff/efficient_did_results.py
@@ -167,6 +167,27 @@ class EfficientDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         sig = _get_significance_stars(self.overall_p_value)
         path = "DR" if self.estimation_path == "dr" else "nocov"
diff --git a/diff_diff/imputation_results.py b/diff_diff/imputation_results.py
index 95260b9e..03689c1f 100644
--- a/diff_diff/imputation_results.py
+++ b/diff_diff/imputation_results.py
@@ -143,6 +143,27 @@ class ImputationDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None, repr=False)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/results.py b/diff_diff/results.py
index 47c1cf8c..185feff4 100644
--- a/diff_diff/results.py
+++ b/diff_diff/results.py
@@ -447,6 +447,27 @@ class MultiPeriodDiDResults:
     vcov_type: Optional[str] = field(default=None)
     cluster_name: Optional[str] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.avg_att
+
+    @property
+    def se(self) -> float:
+        return self.avg_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.avg_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.avg_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.avg_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.avg_p_value)
@@ -1180,7 +1201,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
                 "back to fit-time unit IDs is not well-defined. See "
                 "``result.placebo_effects`` for the raw PSU-level replicate "
                 "array and ``docs/methodology/REGISTRY.md`` §SyntheticDiD "
-                "\"Note (survey + jackknife composition)\" for the "
+                '"Note (survey + jackknife composition)" for the '
                 "aggregation formula."
             )
         if self._loo_unit_ids is None or self._loo_roles is None or self.placebo_effects is None:
@@ -1386,9 +1407,7 @@ def in_time_placebo(
                     lambda_fake,
                 )
                 synthetic_pre_fake_n = Y_pre_c_n @ omega_eff_fake
-                pre_fit_n = float(
-                    np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2))
-                )
+                pre_fit_n = float(np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2)))
                 # ATT is scale-equivariant and shift-invariant in Y; RMSE is
                 # scale-equivariant. Rescale back to original-Y units.
                 row["att"] = float(att_fake_n * Y_scale)
@@ -1482,12 +1501,8 @@ def sensitivity_to_zeta_omega(
         Y_post_treated_n = (snap.Y_post_treated - Y_shift) / Y_scale
 
         if snap.w_treated is not None:
-            y_pre_t_mean_n = np.average(
-                Y_pre_treated_n, axis=1, weights=snap.w_treated
-            )
-            y_post_t_mean_n = np.average(
-                Y_post_treated_n, axis=1, weights=snap.w_treated
-            )
+            y_pre_t_mean_n = np.average(Y_pre_treated_n, axis=1, weights=snap.w_treated)
+            y_post_t_mean_n = np.average(Y_post_treated_n, axis=1, weights=snap.w_treated)
         else:
             y_pre_t_mean_n = np.mean(Y_pre_treated_n, axis=1)
             y_post_t_mean_n = np.mean(Y_post_treated_n, axis=1)
diff --git a/diff_diff/stacked_did_results.py b/diff_diff/stacked_did_results.py
index 7145d86c..6f1a6f02 100644
--- a/diff_diff/stacked_did_results.py
+++ b/diff_diff/stacked_did_results.py
@@ -97,6 +97,27 @@ class StackedDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py
index c1f4174b..d9d14250 100644
--- a/diff_diff/staggered_results.py
+++ b/diff_diff/staggered_results.py
@@ -138,6 +138,27 @@ class CallawaySantAnnaResults:
     epv_threshold: float = 10
     pscore_fallback: str = "error"
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/staggered_triple_diff_results.py b/diff_diff/staggered_triple_diff_results.py
index 6096e3dd..29859169 100644
--- a/diff_diff/staggered_triple_diff_results.py
+++ b/diff_diff/staggered_triple_diff_results.py
@@ -95,6 +95,27 @@ class StaggeredTripleDiffResults:
     epv_threshold: float = 10
     pscore_fallback: str = "error"
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py
index 6ad113f8..3b999957 100644
--- a/diff_diff/sun_abraham.py
+++ b/diff_diff/sun_abraham.py
@@ -92,6 +92,27 @@ class SunAbrahamResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/two_stage_results.py b/diff_diff/two_stage_results.py
index 96a013bb..4563907b 100644
--- a/diff_diff/two_stage_results.py
+++ b/diff_diff/two_stage_results.py
@@ -141,6 +141,27 @@ class TwoStageDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None, repr=False)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/wooldridge_results.py b/diff_diff/wooldridge_results.py
index 57f9eed1..1425f61e 100644
--- a/diff_diff/wooldridge_results.py
+++ b/diff_diff/wooldridge_results.py
@@ -96,7 +96,9 @@ def _agg_se(w_vec: np.ndarray) -> float:
             return float(np.sqrt(max(w_vec @ vcov @ w_vec, 0.0)))
 
         def _build_effect(att: float, se: float) -> Dict[str, Any]:
-            t_stat, p_value, conf_int = safe_inference(att, se, alpha=self.alpha, df=self._df_survey)
+            t_stat, p_value, conf_int = safe_inference(
+                att, se, alpha=self.alpha, df=self._df_survey
+            )
             return {
                 "att": att,
                 "se": se,
@@ -186,6 +188,7 @@ def summary(self, aggregation: str = "simple") -> str:
 
         if self.survey_metadata is not None:
             from diff_diff.results import _format_survey_block
+
             lines.extend(_format_survey_block(self.survey_metadata, 70))
             lines.append("-" * 70)
 
@@ -337,6 +340,27 @@ def plot_event_study(self, **kwargs) -> None:
         se = {k: v["se"] for k, v in (self.event_study_effects or {}).items()}
         plot_event_study(effects=effects, se=se, alpha=self.alpha, **kwargs)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         n_gt = len(self.group_time_effects)
         att_str = f"{self.overall_att:.4f}" if not np.isnan(self.overall_att) else "NaN"
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 2b3a29d6..91c33f70 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2,6 +2,8 @@
 
 This document provides the academic foundations and key implementation requirements for each estimator in diff-diff. It serves as a reference for contributors and users who want to understand the theoretical basis of the methods.
 
+**Result-class field naming.** Headline scalar inference fields appear under one of four native naming patterns: flat `att` / `se` / `conf_int` / `p_value` / `t_stat` (`DiDResults`, `SyntheticDiDResults`, `TROPResults`, `HeterogeneousAdoptionDiDResults`); `overall_*` (`CallawaySantAnnaResults` and the rest of the staggered family); `overall_att_*` (`ContinuousDiDResults`, where `att` and `acrt` are parallel response curves); and `avg_*` (`MultiPeriodDiDResults`). Every scalar treatment-effect result class covered by this naming contract additionally exposes the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names as read-only `@property` aliases for adapter / external-consumer compatibility (see PR for v3.3.3, motivated by `balance.interop.diff_diff`); `ContinuousDiDResults` further exposes `overall_*` aliases pointing at the ATT side. The native field is canonical for documentation, semantics, and computation — aliases are pure read-throughs and inherit the `safe_inference()` joint-NaN consistency contract automatically. Because aliases are `@property` descriptors (not dataclass fields), they do NOT appear in `dataclasses.fields()` or `dataclasses.asdict()` output, and assignment to an alias raises `AttributeError`; serializers and field-walkers continue to see only the canonical field set.
+
 ## Table of Contents
 
 1. [Core DiD Estimators](#core-did-estimators)
diff --git a/tests/test_practitioner.py b/tests/test_practitioner.py
index 0db8d02e..862ecf78 100644
--- a/tests/test_practitioner.py
+++ b/tests/test_practitioner.py
@@ -126,7 +126,10 @@ def mock_efficient_results():
 def mock_continuous_results():
     r = ContinuousDiDResults.__new__(ContinuousDiDResults)
     r.overall_att = 0.4
-    r.overall_se = 0.1
+    # Canonical SE field on ContinuousDiDResults is overall_att_se (the ATT side).
+    # `overall_se` is a read-only property alias since the v3.3.3 inference-field
+    # alias surface; assigning to it raises AttributeError.
+    r.overall_att_se = 0.1
     return r
 
 
diff --git a/tests/test_result_aliases.py b/tests/test_result_aliases.py
new file mode 100644
index 00000000..d799ef77
--- /dev/null
+++ b/tests/test_result_aliases.py
@@ -0,0 +1,349 @@
+"""Inference-field aliases on result classes (balance / external-adapter compatibility).
+
+Each in-scope result class exposes flat aliases (``att`` / ``se`` / ``conf_int`` /
+``p_value`` / ``t_stat``) that map to the canonical native fields (``overall_*``,
+``overall_att_*``, or ``avg_*``). This file pins the alias-canonical contract.
+
+Motivating bug: ``balance.interop.diff_diff.as_balance_diagnostic`` reads
+``getattr(res, "se", None)`` and ``getattr(res, "conf_int", None)`` without
+fallbacks to ``overall_se`` / ``overall_conf_int``. Pre-alias, every Pattern
+B / C / D result class returned ``None`` on those keys, so balance's tutorial
+shipped with ``se=NaN`` / ``conf_int=NaN`` in the methods-appendix table.
+"""
+
+from __future__ import annotations
+
+import math
+from dataclasses import fields
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from diff_diff import (
+    CallawaySantAnna,
+    generate_staggered_data,
+)
+from diff_diff.chaisemartin_dhaultfoeuille_results import (
+    ChaisemartinDHaultfoeuilleResults,
+)
+from diff_diff.continuous_did_results import ContinuousDiDResults
+from diff_diff.efficient_did_results import EfficientDiDResults
+from diff_diff.imputation_results import ImputationDiDResults
+from diff_diff.results import MultiPeriodDiDResults
+from diff_diff.stacked_did_results import StackedDiDResults
+from diff_diff.staggered_results import CallawaySantAnnaResults
+from diff_diff.staggered_triple_diff_results import StaggeredTripleDiffResults
+from diff_diff.sun_abraham import SunAbrahamResults
+from diff_diff.two_stage_results import TwoStageDiDResults
+from diff_diff.wooldridge_results import WooldridgeDiDResults
+
+# ============================================================================
+# Helpers
+# ============================================================================
+
+
+def _alias_equal(a, b) -> bool:
+    """``==`` that treats NaN==NaN as True so aliases inherit NaN consistency."""
+    if isinstance(a, tuple) and isinstance(b, tuple):
+        return len(a) == len(b) and all(_alias_equal(x, y) for x, y in zip(a, b))
+    if isinstance(a, float) and isinstance(b, float):
+        if math.isnan(a) and math.isnan(b):
+            return True
+    return a == b
+
+
+def _required_init_kwargs(cls, overrides):
+    """Return a kwargs dict for constructing a dataclass with sentinel defaults
+    for every required field, then merging in ``overrides``.
+
+    Lets us build a minimal result instance for alias-mechanic tests without
+    having to enumerate every estimator-specific field. Sentinel values for
+    untouched fields are deliberately uninteresting (empty containers, zeros)
+    -- they are not exercised by these tests."""
+    kwargs = {}
+    for f in fields(cls):
+        if f.name in overrides:
+            continue
+        # Skip fields with defaults; we only need to fill required positionals.
+        if f.default is not f.default_factory and f.default is not getattr(
+            __import__("dataclasses"), "MISSING", None
+        ):
+            # Field has a default value; let the dataclass apply it.
+            continue
+        # Required field — supply a type-compatible sentinel.
+        ann = str(f.type)
+        if "float" in ann:
+            kwargs[f.name] = 0.0
+        elif "int" in ann:
+            kwargs[f.name] = 0
+        elif "Tuple" in ann or "tuple" in ann:
+            kwargs[f.name] = (0.0, 0.0)
+        elif "List" in ann or "list" in ann:
+            kwargs[f.name] = []
+        elif "Dict" in ann or "dict" in ann:
+            kwargs[f.name] = {}
+        elif "DataFrame" in ann:
+            kwargs[f.name] = pd.DataFrame()
+        elif "ndarray" in ann or "np.ndarray" in ann:
+            kwargs[f.name] = np.array([])
+        else:
+            kwargs[f.name] = None
+    kwargs.update(overrides)
+    return kwargs
+
+
+def _assert_pattern_b_aliases(res, *, att, se, t_stat, p_value, conf_int):
+    """Pattern B: 5 flat aliases mapping to the overall_* canonical fields."""
+    assert _alias_equal(res.att, att), f"att alias != overall_att ({res.att} vs {att})"
+    assert _alias_equal(res.se, se)
+    assert _alias_equal(res.conf_int, conf_int)
+    assert _alias_equal(res.p_value, p_value)
+    assert _alias_equal(res.t_stat, t_stat)
+
+
+# Sentinel inference values exercised across direct-construction tests.
+_ATT = 1.5
+_SE = 0.3
+_T = 5.0
+_P = 0.001
+_CI = (1.0, 2.0)
+
+
+# ============================================================================
+# Pattern B (9 classes) — direct-construction alias mechanics
+# ============================================================================
+
+
+@pytest.mark.parametrize(
+    "cls",
+    [
+        CallawaySantAnnaResults,
+        StackedDiDResults,
+        EfficientDiDResults,
+        ChaisemartinDHaultfoeuilleResults,
+        StaggeredTripleDiffResults,
+        WooldridgeDiDResults,
+        SunAbrahamResults,
+        ImputationDiDResults,
+        TwoStageDiDResults,
+    ],
+    ids=lambda c: c.__name__,
+)
+def test_pattern_b_aliases_match_overall(cls):
+    """Each Pattern B class's flat aliases equal the canonical overall_* fields."""
+    overrides = {
+        "overall_att": _ATT,
+        "overall_se": _SE,
+        "overall_t_stat": _T,
+        "overall_p_value": _P,
+        "overall_conf_int": _CI,
+    }
+    res = cls(**_required_init_kwargs(cls, overrides))
+    _assert_pattern_b_aliases(res, att=_ATT, se=_SE, t_stat=_T, p_value=_P, conf_int=_CI)
+
+
+@pytest.mark.parametrize(
+    "cls",
+    [
+        CallawaySantAnnaResults,
+        StackedDiDResults,
+        EfficientDiDResults,
+        ChaisemartinDHaultfoeuilleResults,
+        StaggeredTripleDiffResults,
+        WooldridgeDiDResults,
+        SunAbrahamResults,
+        ImputationDiDResults,
+        TwoStageDiDResults,
+    ],
+    ids=lambda c: c.__name__,
+)
+def test_pattern_b_aliases_propagate_nan(cls):
+    """When canonical overall_* fields are NaN (degenerate fit), aliases are NaN.
+
+    Pins the safe_inference() joint-NaN contract (per CLAUDE.md: ALL inference
+    fields are computed together and stay NaN-consistent). Aliases are pure
+    read-throughs, so the contract holds without re-computation.
+    """
+    overrides = {
+        "overall_att": np.nan,
+        "overall_se": np.nan,
+        "overall_t_stat": np.nan,
+        "overall_p_value": np.nan,
+        "overall_conf_int": (np.nan, np.nan),
+    }
+    res = cls(**_required_init_kwargs(cls, overrides))
+    assert math.isnan(res.att)
+    assert math.isnan(res.se)
+    assert math.isnan(res.t_stat)
+    assert math.isnan(res.p_value)
+    assert math.isnan(res.conf_int[0])
+    assert math.isnan(res.conf_int[1])
+
+
+# ============================================================================
+# Pattern C — ContinuousDiDResults: flat AND overall_* aliases
+# ============================================================================
+
+
+def _continuous_did_overrides(att=_ATT, se=_SE, t=_T, p=_P, ci=_CI):
+    return {
+        "overall_att": att,
+        "overall_att_se": se,
+        "overall_att_t_stat": t,
+        "overall_att_p_value": p,
+        "overall_att_conf_int": ci,
+        "overall_acrt": 0.0,
+        "overall_acrt_se": 0.0,
+        "overall_acrt_t_stat": 0.0,
+        "overall_acrt_p_value": 1.0,
+        "overall_acrt_conf_int": (0.0, 0.0),
+    }
+
+
+def test_continuous_did_flat_aliases():
+    """ContinuousDiD flat aliases map to the ATT-side overall_att_* fields."""
+    res = ContinuousDiDResults(
+        **_required_init_kwargs(ContinuousDiDResults, _continuous_did_overrides())
+    )
+    assert res.att == _ATT
+    assert res.se == _SE
+    assert res.conf_int == _CI
+    assert res.p_value == _P
+    assert res.t_stat == _T
+
+
+def test_continuous_did_overall_aliases():
+    """ContinuousDiD overall_* aliases also map to the ATT-side fields
+    (consistency with Pattern B family naming)."""
+    res = ContinuousDiDResults(
+        **_required_init_kwargs(ContinuousDiDResults, _continuous_did_overrides())
+    )
+    assert res.overall_se == _SE
+    assert res.overall_conf_int == _CI
+    assert res.overall_p_value == _P
+    assert res.overall_t_stat == _T
+
+
+def test_continuous_did_double_alias_resolves_same_value():
+    """``res.se`` and ``res.overall_se`` MUST point at the same value."""
+    res = ContinuousDiDResults(
+        **_required_init_kwargs(ContinuousDiDResults, _continuous_did_overrides())
+    )
+    assert res.se == res.overall_se
+    assert res.conf_int == res.overall_conf_int
+    assert res.p_value == res.overall_p_value
+    assert res.t_stat == res.overall_t_stat
+
+
+# ============================================================================
+# Pattern D — MultiPeriodDiDResults: avg_* -> flat aliases
+# ============================================================================
+
+
+def test_multi_period_did_aliases():
+    """MultiPeriodDiD flat aliases map to the avg_* canonical fields."""
+    overrides = {
+        "avg_att": _ATT,
+        "avg_se": _SE,
+        "avg_t_stat": _T,
+        "avg_p_value": _P,
+        "avg_conf_int": _CI,
+    }
+    res = MultiPeriodDiDResults(**_required_init_kwargs(MultiPeriodDiDResults, overrides))
+    assert res.att == _ATT
+    assert res.se == _SE
+    assert res.conf_int == _CI
+    assert res.p_value == _P
+    assert res.t_stat == _T
+
+
+# ============================================================================
+# Read-only semantics
+# ============================================================================
+
+
+@pytest.mark.parametrize(
+    ("cls", "ovr"),
+    [
+        (
+            CallawaySantAnnaResults,
+            {
+                "overall_att": _ATT,
+                "overall_se": _SE,
+                "overall_t_stat": _T,
+                "overall_p_value": _P,
+                "overall_conf_int": _CI,
+            },
+        ),
+        (ContinuousDiDResults, _continuous_did_overrides()),
+        (
+            MultiPeriodDiDResults,
+            {
+                "avg_att": _ATT,
+                "avg_se": _SE,
+                "avg_t_stat": _T,
+                "avg_p_value": _P,
+                "avg_conf_int": _CI,
+            },
+        ),
+    ],
+    ids=lambda v: v.__name__ if hasattr(v, "__name__") else "ovr",
+)
+def test_aliases_are_read_only(cls, ovr):
+    """Assigning to an alias must raise AttributeError (no setter installed).
+
+    Regression: a downstream test in tests/test_practitioner.py used
+    `r.overall_se = X` on a `ContinuousDiDResults.__new__()` mock — pre-alias
+    that silently created a junk attribute; post-alias the property correctly
+    rejects the assignment. Locking read-only here means future contributors
+    who write similar fixtures fail loudly via this test rather than via a
+    surprise `AttributeError: can't set attribute` deep in another suite.
+    """
+    res = cls(**_required_init_kwargs(cls, ovr))
+    for name in ("att", "se", "conf_int", "p_value", "t_stat"):
+        with pytest.raises(AttributeError):
+            setattr(res, name, object())
+
+
+# ============================================================================
+# Cross-cutting regression — balance.interop.diff_diff adapter pattern
+# ============================================================================
+
+
+def test_balance_adapter_pattern_returns_populated_se():
+    """Mimic balance.interop.diff_diff.as_balance_diagnostic: real CS fit then
+    flat ``getattr(res, "se", None)`` / ``getattr(res, "conf_int", None)``.
+
+    Pre-alias: returned ``None`` on every Pattern B/C/D result class. This is
+    the test that would have caught the original bug if balance had exercised
+    a real fit instead of a stub class with attributes literally named ``se``.
+    """
+    df = generate_staggered_data(
+        n_units=30,
+        n_periods=5,
+        cohort_periods=[3],
+        never_treated_frac=0.5,
+        seed=42,
+    )
+    res = CallawaySantAnna(estimation_method="reg").fit(
+        df,
+        outcome="outcome",
+        time="period",
+        unit="unit",
+        first_treat="first_treat",
+    )
+    se = getattr(res, "se", None)
+    conf_int = getattr(res, "conf_int", None)
+    p_value = getattr(res, "p_value", None)
+    assert se is not None and np.isfinite(
+        se
+    ), f"balance adapter would see se={se!r}; pre-alias bug returned None"
+    assert (
+        conf_int is not None and len(conf_int) == 2 and all(np.isfinite(x) for x in conf_int)
+    ), f"balance adapter would see conf_int={conf_int!r}; pre-alias bug returned None"
+    assert p_value is not None and np.isfinite(p_value)
+    # Aliases must equal the canonical overall_* fields.
+    assert se == res.overall_se
+    assert conf_int == res.overall_conf_int
+    assert p_value == res.overall_p_value
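
Reviewer note: the alias mechanics this patch relies on can be reproduced without `diff_diff` installed. The sketch below is a minimal illustration under stated assumptions: `ToyStaggeredResults` is a hypothetical stand-in, not one of the library's result classes, and it carries only three of the canonical fields. It shows the three properties the REGISTRY note describes: an adapter-style `getattr` lookup resolves the alias, the alias is absent from `dataclasses.fields()` / `dataclasses.asdict()`, and assignment to the alias is rejected.

```python
from dataclasses import asdict, dataclass, fields
from typing import Tuple


@dataclass
class ToyStaggeredResults:
    """Hypothetical stand-in for a staggered result class (not a diff_diff class)."""

    overall_att: float
    overall_se: float
    overall_conf_int: Tuple[float, float]

    # Read-only aliases: plain properties without annotations,
    # so the dataclass machinery does not treat them as fields.
    @property
    def att(self) -> float:
        return self.overall_att

    @property
    def se(self) -> float:
        return self.overall_se

    @property
    def conf_int(self) -> Tuple[float, float]:
        return self.overall_conf_int


res = ToyStaggeredResults(overall_att=1.5, overall_se=0.3, overall_conf_int=(1.0, 2.0))

# Adapter-style lookup with no fallback chain finds the alias.
adapter_se = getattr(res, "se", None)
adapter_ci = getattr(res, "conf_int", None)

# Serializers and field-walkers still see only the canonical field set.
field_names = {f.name for f in fields(res)}
serialized = asdict(res)

# No setter is defined, so assigning to an alias raises AttributeError.
try:
    res.se = 0.5
except AttributeError:
    assignment_rejected = True
else:
    assignment_rejected = False
```

Using plain properties rather than extra dataclass fields keeps a single source of truth: there is no second stored value that could drift from the canonical one, which is why NaN canonical values surface unchanged through the aliases.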