Merged
5 changes: 3 additions & 2 deletions CHANGELOG.md

Large diffs are not rendered by default.

51 changes: 27 additions & 24 deletions METHODOLOGY_REVIEW.md

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion TODO.md
@@ -74,7 +74,6 @@ Deferred items from PR reviews that were not addressed before merge.

| Issue | Location | PR | Priority |
|-------|----------|----|----------|
| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
56 changes: 44 additions & 12 deletions benchmarks/R/generate_bacon_golden.R
@@ -7,9 +7,21 @@
#
# The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
# the default ``weights="exact"`` is expected to match the values in this JSON
# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
# path is a library-only optimization and is NOT covered by this parity harness.
# at atol=1e-6 along a three-tier contract:
# (1) aggregate TWFE coefficient + weights-sum on all 3 fixtures;
# (2) direct per-component (treated, control, type) parity on the 2
# non-remap fixtures AND on the 6 timing-vs-timing rows of
# `always_treated_remapped`;
# (3) cohort-level fold-back parity for the U bucket on
# `always_treated_remapped` — Python's paper-footnote-11 remap folds
# R's separate `Later vs Always Treated` + `Treated vs Untreated`
# rows into a single `treated_vs_never` cell per cohort, so the
# aggregate is invariant per Theorem 1 but the per-component
# breakdown differs by convention. See REGISTRY notes:
# `**Note (R parity convention divergence on always-treated)**` and
# `**Deviation (first-period boundary extension on always-treated remap)**`.
# The ``weights="approximate"`` path is a library-only optimization and is
# NOT covered by this parity harness.
#
# Three fixtures:
# 1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
@@ -18,8 +30,8 @@
# 2. two_groups_no_never_treated — 2 timing groups only; tests the
# timing-only decomposition where the s_{kU} terms drop.
# 3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
# (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
# U matches R bacondecomp's native behavior.
# (first_treat = 1). Validates the convention-divergent U-bucket
# fold-back on Python's warn+remap of always-treated units into U.
#
# Run:
# cd benchmarks/R && Rscript generate_bacon_golden.R
@@ -193,11 +205,21 @@ df2 <- build_panel(
fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")

cat("Building fixture 3: always_treated_remapped...\n")
# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
# in every observable period) + 30 never-treated. R's bacondecomp natively
# groups the first_treat=1 cohort with U (since they are treated throughout
# every observable period and never serve as a within-window control), which
# matches what diff-diff's warn+remap does in Python.
# 3 timing-cohorts (3, 4, 5) + 5 always-treated units (first_treat = 1, i.e.,
# treated in every observable period) + 25 never-treated. R's bacondecomp
# keeps the first_treat=1 cohort as a *separate* timing cohort (not in U) and
# emits a `Later vs Always Treated` comparison row for each later cohort
# alongside the standard `Treated vs Untreated` row. Python's paper-footnote-11
# convention remaps these units into the U bucket and folds R's two columns
# of components into a single `treated_vs_never` cell per treated cohort.
# The aggregate (TWFE coefficient + weights-sum) is invariant per Theorem 1,
# but the per-component breakdown differs by convention — see REGISTRY
# `**Note (R parity convention divergence on always-treated)**` and
# `**Deviation (first-period boundary extension on always-treated remap)**`.
# `tests/test_methodology_bacon.py::TestBaconParityR` carves out the U-bucket
# rows for direct per-component parity (keeping the 6 timing-vs-timing rows
# under direct parity) and asserts the U-bucket fold-back separately via
# `test_always_treated_remapped_fold_back_matches_r` at atol=1e-6.
df3 <- build_panel(
n_units_per_cohort = 25L,
n_periods = 6L,
@@ -220,8 +242,18 @@ out <- list(
r_version = R.version.string,
description = paste(
"Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
"BaconDecomposition. Parity target: atol=1e-6 on per-component",
"(treated, control, type) tuples plus the TWFE coefficient."
"BaconDecomposition. Parity target at atol=1e-6:",
"(1) aggregate TWFE coefficient + weights-sum across all 3 fixtures;",
"(2) direct per-component (treated, control, type) parity on the 2",
"non-remap fixtures AND on the 6 timing-vs-timing rows of",
"always_treated_remapped;",
"(3) cohort-level fold-back parity for the U bucket on",
"always_treated_remapped (Python's paper-footnote-11 remap folds",
"R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows",
"into a single treated_vs_never cell per cohort; aggregate is",
"invariant per Theorem 1, breakdown differs by convention).",
"See REGISTRY Note (R parity convention divergence on always-treated)",
"+ Deviation (first-period boundary extension)."
)
),
uniform_3groups_with_never_treated = fixture_1,
211 changes: 211 additions & 0 deletions benchmarks/data/r_bacondecomp_golden.json

Large diffs are not rendered by default.
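
The three-tier contract described in the generator script can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout (`treated`, `control`, `type`, `estimate`, `weight` keys) and the helper names `compare_components` and `aggregate_matches` are hypothetical and are not taken from the committed JSON or from `TestBaconParityR`. Only the `atol=1e-6` tolerance and the (treated, control, type) key come from the diff.

```python
import math

ATOL = 1e-6  # parity tolerance quoted in the generator script header


def component_key(row):
    # Hypothetical row layout: one dict per 2x2 comparison.
    return (row["treated"], row["control"], row["type"])


def compare_components(python_rows, r_rows, atol=ATOL):
    """Return a list of mismatch descriptions; an empty list means parity."""
    py = {component_key(r): r for r in python_rows}
    rr = {component_key(r): r for r in r_rows}
    problems = []
    for key in sorted(set(py) | set(rr), key=str):
        if key not in py or key not in rr:
            problems.append(f"{key}: present on one side only")
            continue
        for field in ("estimate", "weight"):
            a, b = py[key][field], rr[key][field]
            if not math.isclose(a, b, rel_tol=0.0, abs_tol=atol):
                problems.append(f"{key}: {field} differs ({a} vs {b})")
    return problems


def aggregate_matches(rows, twfe_coef, atol=ATOL):
    """Check that weights sum to 1 and the weighted sum equals the TWFE coefficient."""
    total_weight = sum(r["weight"] for r in rows)
    weighted_sum = sum(r["weight"] * r["estimate"] for r in rows)
    return (
        math.isclose(total_weight, 1.0, rel_tol=0.0, abs_tol=atol)
        and math.isclose(weighted_sum, twfe_coef, rel_tol=0.0, abs_tol=atol)
    )
```

Keying on the full (treated, control, type) tuple means a row that exists on only one side is reported rather than silently skipped, which is the situation the U-bucket carve-out on `always_treated_remapped` has to handle explicitly.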

23 changes: 19 additions & 4 deletions diff_diff/bacon.py
@@ -475,7 +475,15 @@ def fit(
excluding the never-treated sentinels ``0`` and ``np.inf``)
are automatically remapped to the ``U`` (untreated) bucket
per Goodman-Bacon (2021) footnote 11, with a
``UserWarning``. Detection uses ordered-time logic on the
``UserWarning``. **Library boundary extension:** the paper
uses the strict inequality ``t_i < 1`` (units treated
*before* the first observable period); the library uses the
**inclusive** ``first_treat <= min(time)`` rule, additionally
folding units treated *at* the first observable period
(``first_treat == min(time)``) into ``U`` because such units
have no untreated cell in-panel. See REGISTRY's
``**Deviation (first-period boundary extension on
always-treated remap)**`` block for the full contract. Detection uses ordered-time logic on the
**time axis** so panels whose ``time`` column contains
negative or zero-crossing labels (e.g. event-time
``time ∈ [-2,..,3]``) are handled correctly; the ``0``
@@ -1302,9 +1310,16 @@ def bacon_decompose(
>>> from diff_diff import bacon_decompose
>>>
>>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights
>>> # (weights="exact"); intended to match R bacondecomp::bacon() at
>>> # atol=1e-6 (R parity goldens pending — see TODO.md "R parity
>>> # goldens generation" for the deferred validation step).
>>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on
>>> # the aggregate (TWFE coefficient + weights-sum) across all panels,
>>> # and on the per-component breakdown when there are no
>>> # always-treated / first-period-treated cohorts (i.e. all
>>> # non-sentinel first_treat values are strictly greater than
>>> # min(time)). For panels with always-treated units, the
>>> # per-component breakdown diverges by convention (Python remaps
>>> # to U per paper footnote 11; R emits `Later vs Always Treated`);
>>> # see REGISTRY note on R parity convention divergence. Validated
>>> # via tests/test_methodology_bacon.py::TestBaconParityR.
>>> results = bacon_decompose(
... data=panel_df,
... outcome='earnings',
12 changes: 7 additions & 5 deletions docs/methodology/REGISTRY.md
@@ -2616,7 +2616,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in

*Assumption checks / warnings:*
- Requires variation in treatment timing (staggered adoption)
- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; paper footnote 11) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract
- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; per paper footnote 11 with a library-convention extension on the first-period boundary case, see `**Deviation (first-period boundary extension)**` below) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract
- Unbalanced panels are accepted with a `UserWarning`; the paper's Appendix A proof assumes balanced panels
- Falls back to timing-only comparisons when no never-treated units are present (no untreated group → `s_{kU}` terms drop, weights rescale to sum to 1; **VWCT and ΔATT can still bias the result** — see paper Eqs. 14-15)
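
A minimal sketch of the inclusive remap rule stated in the list above, assuming scalar inputs. The function name `classify_first_treat` and its return labels are hypothetical and do not reflect how `diff_diff/bacon.py` is actually structured. Only the rule itself (`first_treat <= min(time)`, with `0` and `np.inf` reserved as never-treated sentinels) comes from the text.

```python
import math

# Reserved never-treated markers on first_treat (not on time), per the text.
NEVER_TREATED_SENTINELS = (0, math.inf)


def classify_first_treat(first_treat, min_time):
    """Classify one unit's first_treat against the first observable period.

    Hypothetical helper: the sentinel check runs first, then the inclusive
    boundary test, so a cohort treated at min(time) is folded into U.
    """
    if first_treat in NEVER_TREATED_SENTINELS:
        return "never_treated"
    if first_treat <= min_time:
        return "always_treated_remapped"
    return "timing_group"
```

On an event-time panel with `time` in `[-2, ..., 3]`, this labels `first_treat=-1` a timing group and `first_treat=-3` a remapped cohort, matching the examples in the always-treated remap note. The boundary case `first_treat=-2` is also remapped, which is the library extension beyond the paper's strict inequality.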

@@ -2668,7 +2668,7 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_
- Always-treated units: see `**Note (always-treated remap)**` below

**Reference implementation(s):**
- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens pending follow-up R install (see TODO.md).
- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens committed at `benchmarks/data/r_bacondecomp_golden.json` (generated against `bacondecomp` 0.1.1 + R 4.5.2). Parity validated at `atol=1e-6` via `tests/test_methodology_bacon.py::TestBaconParityR` (4 tests: TWFE coefficient + weights-sum match across 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket convention divergence on `always_treated_remapped` is pinned by a dedicated fold-back test).
- Stata: `bacondecomp` (SSC). Authors: Goodman-Bacon, Goldring, Nichols (2019).

**Requirements checklist:**
@@ -2678,11 +2678,13 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_
- [x] Visualization shows weight vs. estimate by comparison type
- [x] Always-treated remap to U per Goodman-Bacon (2021) footnote 11 (PR-B audit)
- [x] Hand-calculable Theorem 1 verification: `tests/test_methodology_bacon.py::TestBaconHandCalculation` (7 tests, atol=1e-10)
- [ ] R `bacondecomp::bacon()` parity at atol=1e-6 (R generator script committed; JSON goldens pending follow-up R install — `tests/test_methodology_bacon.py::TestBaconParityR` skips when missing)
- [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket fold-back is asserted by a dedicated `test_always_treated_remapped_fold_back_matches_r` — see `**Note (R parity convention divergence)**` below)
- [x] Survey design support (Phase 3): weighted cell means, weighted within-transform, weighted group shares
- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default with the **intent** to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated by hand-calculation (atol=1e-10) and TWFE-vs-weighted-sum identity (atol=1e-10) but the direct R bit-by-bit parity at atol=1e-6 is still pending the R `bacondecomp` install — see Test Coverage checklist above. The approximate path is retained for backward compatibility; numerical output may differ from R.
- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`.
- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated at `atol=1e-6` (see `**Note (R parity convention divergence)**` below for the one structural convention difference). Hand-calculation + TWFE-vs-weighted-sum identity hold at `atol=1e-10`. The approximate path is retained for backward compatibility; numerical output may differ from R.
- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11 (with a library boundary extension on `first_treat == min(time)`; see `**Deviation (first-period boundary extension)**` below). Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`.
- **Note (Bacon survey diagnostic):** Bacon decomposition with survey weights is diagnostic; exact-sum guarantee holds at machine precision under `weights="exact"` **on balanced panels**. `weights="exact"` requires within-unit-constant survey columns (approximate path accepts time-varying weights).
- **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) AND on the 6 timing-vs-timing rows of `always_treated_remapped` — the carve-out is narrowed to U-bucket rows only (R's `Later vs Always Treated` rows canonicalize to `treated_vs_never` and are dropped alongside the matching Python rows). The R→Python U-bucket fold-back is pinned separately by `test_always_treated_remapped_fold_back_matches_r`, which aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per treated cohort and asserts the combined weight + weight-averaged estimate match Python's single `treated_vs_never` cell at `atol=1e-6`. Aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) is locked across all 3 fixtures.
- **Deviation (first-period boundary extension on always-treated remap):** Paper footnote 11 (Goodman-Bacon 2021) uses the strict inequality `t_i < 1` (units treated *before* the first observable period) for the always-treated bucket. The library applies the **inclusive** `first_treat <= min(time)` rule, which additionally folds units treated *at* the first observable period (`first_treat == min(time)`) into `U`. This is a library boundary convention, not a paper-faithful rule: such units have no untreated cell in the observed panel and so cannot contribute to any 2x2 DD as a treated cohort, so the library folds them into the `U` bucket, mirroring the always-treated handling, rather than dropping them silently. R `bacondecomp::bacon()` does not apply this boundary fold-back: it keeps `first_treat == min(time)` cohorts in their own bucket and emits `Later vs Always Treated` comparisons (see the **Note (R parity convention divergence on always-treated)** above for how the parity tests handle the resulting structural breakdown difference; the aggregate Theorem 1 identity remains invariant). When no cohort has `first_treat == min(time)` (no first-period-treated cohorts), the library rule reduces to the paper's strict rule and the two conventions coincide.
- **Deviation (unbalanced-panel library extension):** Unbalanced panels are accepted with a `UserWarning` ("Unbalanced panel detected. Bacon decomposition assumes balanced panels. Results may be inaccurate."). Goodman-Bacon (2021) Appendix A's proof assumes a balanced panel; under unbalance, the Theorem 1 identity holds only approximately. The decomposition still returns finite, well-defined outputs but `weights="exact"` does NOT achieve the machine-precision algebraic identity that the balanced-panel claims above describe.
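
The U-bucket fold-back described in the R parity convention note can be sketched as below. This is a sketch under assumptions, not the body of `test_always_treated_remapped_fold_back_matches_r`: the helper name `fold_back`, the row layout, and the numbers in the usage example are all hypothetical. The technique is the one the note names: per treated cohort, sum the weights of R's `Later vs Always Treated` and `Treated vs Untreated` rows and take the weight-averaged estimate.

```python
def fold_back(r_u_bucket_rows):
    """Collapse R's split U-bucket rows into one cell per treated cohort.

    Expects only the rows that Python would place in the U bucket
    (hypothetical layout: dicts with 'treated', 'estimate', 'weight').
    """
    totals = {}
    for row in r_u_bucket_rows:
        w_sum, we_sum = totals.get(row["treated"], (0.0, 0.0))
        totals[row["treated"]] = (
            w_sum + row["weight"],
            we_sum + row["weight"] * row["estimate"],
        )
    # Combined weight, plus the weight-averaged estimate for each cohort.
    return {
        cohort: {"weight": w_sum, "estimate": we_sum / w_sum}
        for cohort, (w_sum, we_sum) in totals.items()
    }


# Illustrative numbers only, not values from the committed goldens.
rows = [
    {"treated": 3, "type": "Later vs Always Treated", "estimate": 1.0, "weight": 0.125},
    {"treated": 3, "type": "Treated vs Untreated", "estimate": 2.0, "weight": 0.375},
]
folded = fold_back(rows)
```

Because the folded cell's `weight * estimate` equals the sum of `weight * estimate` over the rows it replaces, the contribution to the weighted sum is unchanged. That is why the aggregate stays invariant while the per-component breakdown differs.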

---