From 7beef1a1ee5fe94926ee01b935dfe0b8dba646b3 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 16 May 2026 13:50:42 -0400
Subject: [PATCH 01/13] BaconDecomposition R parity goldens
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the PR #454 deferred R parity follow-up (TODO.md row removed).
Generated `benchmarks/data/r_bacondecomp_golden.json` from the committed `benchmarks/R/generate_bacon_golden.R` script against `bacondecomp 0.1.1` on R 4.5.2. Three DGP fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`.

Parity results at atol=1e-6 via `tests/test_methodology_bacon.py::TestBaconParityR`:
- TWFE coefficient: ✅ matches across all 3 fixtures
- Weights-sum: ✅ matches across all 3 fixtures
- Per-component: ✅ on the 2 non-remap fixtures; **structural convention divergence** on `always_treated_remapped` (skipped per-component, kept aggregate). R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort. The aggregate is invariant per Theorem 1 — the U bucket's weight is re-allocated across nested 2x2 cells but the total weight on {cohort_k vs U} is identical. Only the per-component breakdown differs structurally between conventions.

Tracker promotions:
- METHODOLOGY_REVIEW.md: BaconDecomposition status row → **Complete** (was `**Complete** (R parity pending)`); removed from In Progress prose mention; removed from Priority Order substantive-review list; Test Coverage count refreshed (24 → 33); R Comparison Results block rewritten as **Validated**.
- docs/methodology/REGISTRY.md: Reference Implementations bullet + Verified Components checklist + Note (weight modes) updated; new Note (R parity convention divergence on always-treated) documents the convention.
- TODO.md: BaconDecomposition R parity goldens row removed.
- CHANGELOG.md: new `[Unreleased]` Added bullet for the close-out; PR-B Changed entry tightened ("intended to match" → "matching ... at atol=1e-6").
- diff_diff/bacon.py: `bacon_decompose` docstring example wording tightened from "intended to match" to "matches" with TestBaconParityR pointer.

Tests: 33/33 pass in test_methodology_bacon.py (no skips; was 30+3 skipped); 32 pass in test_bacon.py; 101 pass across the broader bacon/decompose surface (was 98+3 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                              |   3 +-
 METHODOLOGY_REVIEW.md                     |  40 ++--
 TODO.md                                   |   1 -
 benchmarks/data/r_bacondecomp_golden.json | 211 ++++++++++++++++++++++
 diff_diff/bacon.py                        |   5 +-
 docs/methodology/REGISTRY.md              |   7 +-
 tests/test_methodology_bacon.py           |  15 ++
 7 files changed, 255 insertions(+), 27 deletions(-)
 create mode 100644 benchmarks/data/r_bacondecomp_golden.json

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 223abd12..a4c923ef 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,13 +8,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 ### Added
+- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2.
`tests/test_methodology_bacon.py::TestBaconParityR` now active (3 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures, with a **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort — the aggregate is invariant per Theorem 1 but the per-component breakdown differs structurally). Per-component assertion is skipped on the remap fixture with explicit documentation in the test class and a new `**Note (R parity convention divergence on always-treated)**` in `docs/methodology/REGISTRY.md`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`.
 - **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend.
Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md. 
- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete (R parity goldens pending)**). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. 
The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (~24 tests across 6 classes: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` skips with a pointer when goldens missing; `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); JSON goldens deferred until `bacondecomp` R package is installed (parity tests skip cleanly with an explicit pointer). (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). 
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). 
Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). 
DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
 - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
 ### Changed
-- **BaconDecomposition: default `weights` flipped from `"approximate"` to `"exact"` (PR-B methodology audit).** The new default uses Goodman-Bacon (2021) Theorem 1's exact Eqs.
7-9 + 10e-g weights, **intended to match** R `bacondecomp::bacon()` (direct R parity at `atol=1e-6` pending the R `bacondecomp` install; validated via hand-calculation + TWFE-vs-weighted-sum identity at `atol=1e-10` — see TODO.md). The `weights="approximate"` path remains available as an opt-in fast diagnostic for speed-sensitive loops; its numerical output may differ from R. Three entry points were flipped: `BaconDecomposition(weights="exact")` (`bacon.py:397`), `bacon_decompose(weights="exact")` (`bacon.py:1064`), `TwoWayFixedEffects.decompose(weights="exact")` (`twfe.py:684`). **Behavior change for users not passing explicit `weights=`**: the decomposition weights are now paper-faithful by default. Users who depended on the previous `"approximate"` numerics for diagnostic plots or comparison-type weight shares can preserve the old behavior by passing `weights="approximate"` explicitly. **Survey-design behavior change**: `weights="exact"` (now the default) routes through `_validate_unit_constant_survey`, which rejects survey designs whose weights / strata / PSU / FPC columns vary within a unit across periods (the exact-mode path collapses to per-unit aggregation via `groupby().first()`). The previous `weights="approximate"` default tolerated time-varying within-unit survey weights via observation-level weighted means. Users whose survey-weighted Bacon calls used time-varying within-unit weights must now either (a) collapse their weights to be unit-constant or (b) pass explicit `weights="approximate"` to retain the legacy obs-level path. The production diagnostic surface (`diff_diff/diagnostic_report.py:1740`) was updated to pass explicit `weights="exact"`. Existing test assertions in `tests/test_bacon.py` continue to pass with the new default; the `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the Theorem 1 algebraic-identity contract. 
+- **BaconDecomposition: default `weights` flipped from `"approximate"` to `"exact"` (PR-B methodology audit).** The new default uses Goodman-Bacon (2021) Theorem 1's exact Eqs. 7-9 + 10e-g weights, matching R `bacondecomp::bacon()` at `atol=1e-6` (validated via `tests/test_methodology_bacon.py::TestBaconParityR`; see the new Added entry above for the convention divergence on always-treated cohorts). Hand-calculation + TWFE-vs-weighted-sum identity also hold at `atol=1e-10`. The `weights="approximate"` path remains available as an opt-in fast diagnostic for speed-sensitive loops; its numerical output may differ from R. Three entry points were flipped: `BaconDecomposition(weights="exact")` (`bacon.py:397`), `bacon_decompose(weights="exact")` (`bacon.py:1064`), `TwoWayFixedEffects.decompose(weights="exact")` (`twfe.py:684`). **Behavior change for users not passing explicit `weights=`**: the decomposition weights are now paper-faithful by default. Users who depended on the previous `"approximate"` numerics for diagnostic plots or comparison-type weight shares can preserve the old behavior by passing `weights="approximate"` explicitly. **Survey-design behavior change**: `weights="exact"` (now the default) routes through `_validate_unit_constant_survey`, which rejects survey designs whose weights / strata / PSU / FPC columns vary within a unit across periods (the exact-mode path collapses to per-unit aggregation via `groupby().first()`). The previous `weights="approximate"` default tolerated time-varying within-unit survey weights via observation-level weighted means. Users whose survey-weighted Bacon calls used time-varying within-unit weights must now either (a) collapse their weights to be unit-constant or (b) pass explicit `weights="approximate"` to retain the legacy obs-level path. The production diagnostic surface (`diff_diff/diagnostic_report.py:1740`) was updated to pass explicit `weights="exact"`. 
Existing test assertions in `tests/test_bacon.py` continue to pass with the new default; the `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the Theorem 1 algebraic-identity contract.
 - **`ChaisemartinDHaultfoeuille.predict_het` inference: t-distribution df threading (closes TODO pilot-412).** `_compute_heterogeneity_test` now passes `df = n_obs - rank(design)` to `safe_inference` on the non-survey OLS path, matching R `did_multiplegt_dyn(predict_het=...)`'s t-distribution inference (`DIDmultiplegtDYN:::did_multiplegt_main` `t_stat <- qt(0.975, df.residual(model))` site). Pre-PR Python used `df=None` (normal Z critical), producing 0.1-2% rtol gaps on `p_value` and `conf_int` vs R. Parity tolerance tightened on the existing forward-horizon scenarios (`multi_path_reversible_predict_het`, `multi_path_reversible_by_path_predict_het`) from "unpinned" to `INFERENCE_RTOL=1e-4` on `p_value` and `conf_int`; `beta` / `se` / `t_stat` continue at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5`. **Post-drop rank (post-2026-05-16 wrap-up):** the df denominator uses the post-drop numerical rank via `_detect_rank_deficiency`, which `solve_ols` already calls internally. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path; for near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower and the post-PR `df` is larger, matching R's `lm()` convention. The Z-vs-t REGISTRY deviation note is replaced with an "R parity (post-2026-05-15 df threading)" positive-claim note.
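
The always-treated warn+remap rule quoted in the CHANGELOG hunk above (units with `first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`, are folded into the `U` bucket) can be illustrated with a minimal standalone sketch. This is not the code in `diff_diff/bacon.py`: the function name `remap_always_treated`, the dict-based inputs, and the choice of `math.inf` as the internal `U` label are assumptions made for illustration only.

```python
import math

# Sentinels that mark never-treated units, as described in the CHANGELOG
# entry (0 and np.inf; math.inf compares equal to np.inf).
NEVER_TREATED_SENTINELS = (0, math.inf)


def remap_always_treated(first_treat_by_unit, observed_times):
    """Hypothetical helper: fold always-treated units into the U bucket.

    A unit counts as always-treated when its first_treat is at or before
    the first observed period and is not a never-treated sentinel. The
    input mapping is left untouched; a new mapping is returned together
    with the number of remapped units.
    """
    first_period = min(observed_times)
    internal = {}
    n_remapped = 0
    for unit, first_treat in first_treat_by_unit.items():
        is_never_treated = first_treat in NEVER_TREATED_SENTINELS
        if not is_never_treated and first_treat <= first_period:
            internal[unit] = math.inf  # assumed internal label for U
            n_remapped += 1
        else:
            internal[unit] = first_treat
    return internal, n_remapped


# Event-time style labels: the time axis crosses zero, so the comparison
# is made against min(time) rather than against a fixed value.
cohorts = {"a": -2, "b": 1, "c": 0, "d": math.inf}
internal, n_remapped = remap_always_treated(cohorts, [-2, -1, 0, 1, 2])
print(internal, n_remapped)
```

In this sketch unit `a` is remapped because its onset coincides with the first observed period, while unit `c` is read as never-treated because `0` is a sentinel on `first_treat`; that mirrors the caveat in the entry that a real cohort with `first_treat == 0` should be re-labelled before fitting.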
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index e5dc409a..03464983 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -24,7 +24,7 @@ A **Complete** entry has a documented review pass against the primary academic s
 The catalog grew incrementally over several quarters, so formats vary across the existing Complete entries; the consistent invariant is that someone walked through the implementation against the academic source and captured the result here. New reviews going forward should aim for the fuller structure (Verified Components + Corrections Made + Deviations + dedicated methodology test file) used by the more recent entries.
-**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., DCDH has a methodology file, R parity, and a companion-paper review for the 2026 universal-rollout extension; HAD has its primary-source paper review and R parity but no dedicated methodology file; ContinuousDiD has the methodology file but no paper review); others have only the REGISTRY entry and unit tests (e.g., BaconDecomposition, PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete.
+**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., DCDH has a methodology file, R parity, and a companion-paper review for the 2026 universal-rollout extension; HAD has its primary-source paper review and R parity but no dedicated methodology file; ContinuousDiD has the methodology file but no paper review); others have only the REGISTRY entry and unit tests (e.g., PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete.
 **Not Started** entries have neither a tracker walk-through nor an REGISTRY.md section. This tracker no longer carries any Not Started rows; new estimators are expected to enter as In Progress when their REGISTRY entry lands.
@@ -78,7 +78,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
 | Tool | Module | R Reference | Status | Last Review |
 |------|--------|-------------|--------|-------------|
-| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** (R parity pending) | 2026-05-16 |
+| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
 | HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
 | PreTrendsPower | `pretrends.py` | `pretrends` package | **In Progress** | — |
 | PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
@@ -909,7 +909,7 @@ and covariate-adjusted specifications.)
 | Module | `bacon.py` |
 | Primary Reference | Goodman-Bacon (2021), *Difference-in-differences with variation in treatment timing*, J. Econometrics 225(2), 254-277 |
 | R Reference | `bacondecomp::bacon()` |
-| Status | **Complete** (R parity goldens pending) |
+| Status | **Complete** |
 | Last Review | 2026-05-16 |
 **Verified Components:**
@@ -926,14 +926,17 @@ and covariate-adjusted specifications.)
 - [x] No untreated group: `s_{kU}` terms drop, weights renormalize, sum-to-1 still holds
 - [x] Single timing group with U: only `treated_vs_never` comparisons
 - [x] Survey design composes cleanly with exact mode and warn+remap
-- [ ] R `bacondecomp::bacon()` parity at `atol=1e-6` — R generator script committed; JSON goldens pending follow-up R install (see TODO.md)
+- [x] R `bacondecomp::bacon()` parity at `atol=1e-6` — 3 fixtures (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`); TWFE coefficient + weights-sum match across all 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` (Python's footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition — aggregate is invariant, breakdown is structurally different). See `benchmarks/data/r_bacondecomp_golden.json` + `TestBaconParityR`.
 **Test Coverage:**
-- 24 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (21 active + 3 R-parity tests that skip on missing goldens)
+- 33 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (all active; R parity activates once goldens are committed)
 - 32 existing tests in `tests/test_bacon.py` (basic decomposition, weight properties, weights-parameter API, TWFE integration, visualization, balanced-panel warnings, edge cases)
 **R Comparison Results:**
-- **Pending**: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time. R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`. JSON goldens land at `benchmarks/data/r_bacondecomp_golden.json` once generated. `tests/test_methodology_bacon.py::TestBaconParityR` skips with a pointer until then. Tracked in TODO.md follow-up row.
+- **Validated** at `atol=1e-6` against `bacondecomp::bacon()` (version 0.1.1, R 4.5.2). Goldens at `benchmarks/data/r_bacondecomp_golden.json`; generator at `benchmarks/R/generate_bacon_golden.R`. Three DGP fixtures:
+  - `uniform_3groups_with_never_treated`: 9 components covering all three comparison types — full per-component parity (estimate + weight at `atol=1e-6`).
+  - `two_groups_no_never_treated`: 2 components, timing-only decomposition — full per-component parity.
+  - `always_treated_remapped`: TWFE coefficient + weights-sum match at `atol=1e-6`; per-component breakdown diverges by convention (Python's paper-footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition). The aggregate is invariant to the re-bucketing per Theorem 1; only the breakdown differs. Per-component assertion skipped for this fixture with explicit documentation in `TestBaconParityR.test_component_estimates_match_r`.
 **Corrections Made:**
 1. **Theorem 1 exact-weights rewrite** (`bacon.py:_recompute_exact_weights`, lines ~740-880). The previous "exact" mode implementation did not actually compute Eqs. 7-9 / 10e-g — it was missing the `(1 - n_kU)` factor in the within-subsample treatment variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper. The post-hoc sum-to-1 normalization masked the relative-weight error but produced a decomposition error of ~0.3% (0.007 absolute) against TWFE on a 3-cohort + never-treated DGP. **Rewrote** the function to compute the exact numerators of Eqs. 10e/f/g (with proper Eqs. 7-9 variances) and let the post-hoc normalization handle the `V̂^D` denominator (Theorem 1 identity guarantees `V̂^D = Σ numerators`). Now matches TWFE at `atol=1e-10`. The existing `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the contract.
@@ -1203,22 +1206,21 @@ Promotion priority for the **In Progress** entries, ordered by what's blocked on
 **Substantive-review-blocked (no methodology test file, no paper review, no R parity):**
-1. **BaconDecomposition** — chosen for next substantive review during the 2026-05-15 tracker refresh session. Smaller scope than estimator reviews; R reference (`bacondecomp::bacon()`) available; methodology is well-understood (Goodman-Bacon 2021); REGISTRY checklist provides a ready-made target.
-2. **PreTrendsPower** — small surface, established R package (`pretrends`), Roth (2022) is short.
-3. **PowerAnalysis** — larger surface (MDE / power / sample size / simulation paths); REGISTRY already lists Bloom (1995) and Burlig et al. (2020) as primary sources; least urgent if the library's power-analysis utilities are not heavily used.
-4. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
-5. **EfficientDiD** — no paper review on file; substantial implementation work (`tests/test_efficient_did.py` + validation tests) needs paper-vs-code audit against Chen, Sant'Anna & Xie (2025).
-6. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
+1. **PreTrendsPower** — small surface, established R package (`pretrends`), Roth (2022) is short.
+2. **PowerAnalysis** — larger surface (MDE / power / sample size / simulation paths); REGISTRY already lists Bloom (1995) and Burlig et al. (2020) as primary sources; least urgent if the library's power-analysis utilities are not heavily used.
+3. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
+4. **EfficientDiD** — no paper review on file; substantial implementation work (`tests/test_efficient_did.py` + validation tests) needs paper-vs-code audit against Chen, Sant'Anna & Xie (2025).
+5. **ImputationDiD / TwoStageDiD** — natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
 **Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
-7. **HeterogeneousAdoptionDiD (HAD)** — largest current surface, Phase 4.5 just shipped; shares the de Chaisemartin (2026) paper review with DCDH; needs a dedicated Verified Components block.
-8. **ChaisemartinDHaultfoeuille (DCDH)** — methodology test file + 24 R parity tests + 347 unit tests + a companion-paper review for the 2026 universal-rollout extension. Primary-source reviews for the 2020 AER and 2022/2024 NBER WP 29873 papers are still outstanding alongside the Verified Components walk-through.
-9. **WooldridgeDiD (ETWFE)** — companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file.
-10. **ContinuousDiD** — 15 methodology tests already in place; mostly a consolidation pass with a documented boundary-knots deviation from R `contdid` v0.1.0.
-11. **TROP** — paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available).
-12.
**StaggeredTripleDifference** — shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored — tracked in TODO.md, PR #245). -13. **ConleySpatialHAC** — paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker. +6. **HeterogeneousAdoptionDiD (HAD)** — largest current surface, Phase 4.5 just shipped; shares the de Chaisemartin (2026) paper review with DCDH; needs a dedicated Verified Components block. +7. **ChaisemartinDHaultfoeuille (DCDH)** — methodology test file + 24 R parity tests + 347 unit tests + a companion-paper review for the 2026 universal-rollout extension. Primary-source reviews for the 2020 AER and 2022/2024 NBER WP 29873 papers are still outstanding alongside the Verified Components walk-through. +8. **WooldridgeDiD (ETWFE)** — companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file. +9. **ContinuousDiD** — 15 methodology tests already in place; mostly a consolidation pass with a documented boundary-knots deviation from R `contdid` v0.1.0. +10. **TROP** — paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available). +11. **StaggeredTripleDifference** — shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored — tracked in TODO.md, PR #245). +12. **ConleySpatialHAC** — paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker. 14. **Survey Data Support** — cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first. 
--- diff --git a/TODO.md b/TODO.md index 45d4f658..7f3be5b5 100644 --- a/TODO.md +++ b/TODO.md @@ -74,7 +74,6 @@ Deferred items from PR reviews that were not addressed before merge. | Issue | Location | PR | Priority | |-------|----------|----|----------| -| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium | | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low | | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. 
| `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium | | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low | diff --git a/benchmarks/data/r_bacondecomp_golden.json b/benchmarks/data/r_bacondecomp_golden.json new file mode 100644 index 00000000..a62aed76 --- /dev/null +++ b/benchmarks/data/r_bacondecomp_golden.json @@ -0,0 +1,211 @@ +{ + "meta": { + "generated_at": "2026-05-16", + "bacondecomp_version": "0.1.1", + "r_version": "R version 4.5.2 (2025-10-31)", + "description": "Goodman-Bacon (2021) decomposition parity goldens for diff-diff BaconDecomposition. Parity target: atol=1e-6 on per-component (treated, control, type) tuples plus the TWFE coefficient." + }, + "uniform_3groups_with_never_treated": { + "panel": { + "unit": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 
47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 74, 74, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 77, 77, 77, 77, 77, 77, 78, 78, 78, 78, 78, 78, 79, 79, 79, 79, 79, 79, 80, 80, 80, 80, 80, 80, 81, 81, 81, 81, 81, 81, 82, 82, 82, 82, 82, 82, 83, 83, 83, 83, 83, 83, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 86, 86, 86, 86, 86, 86, 87, 87, 87, 87, 87, 87, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 111, 111, 111, 111, 111, 111, 112, 112, 112, 112, 112, 112, 113, 113, 113, 113, 113, 113, 114, 114, 114, 114, 114, 114, 115, 115, 115, 115, 115, 115, 116, 116, 116, 116, 116, 116, 117, 117, 117, 117, 117, 117, 118, 118, 118, 118, 118, 118, 119, 119, 119, 119, 119, 119, 120, 120, 120, 120, 120, 120], + "time": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 
5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 
5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6], + "y": [-0.882875613990965, -0.838281824164503, -1.36130971445594, -0.518865693454306, 0.0652911857851309, -0.437656652156114, 0.827953299717109, 1.15524480179167, 2.23690692400525, 0.882758792767102, 1.21335652379662, 1.8273389896164, -1.32183127217608, -1.95420339638421, -0.574097702010705, -1.85945353347996, 0.0419481646537776, 0.193681992262114, 1.27407830689964, 0.43841915710601, 0.274031412414014, 0.6596718643499, 0.222777158525966, 1.13749036184933, 1.05660151987436, 0.677608745540335, 1.15619018232993, 0.786502720960187, 1.00190547512333, 0.997807190392226, 2.13851774642774, 2.67441410085319, 2.27124848762197, 3.11407100139254, 2.6466392187382, 1.53643251580159, 1.56906662495654, 2.50401457428466, 1.40233636752449, 1.76184238579044, 1.75663794858663, 2.03461418619057, -0.877511228331894, -0.818914026711104, -0.389027667877559, 0.562629890323839, -0.11581084950458, -0.264752141548579, 1.93334251491279, 1.10856751712398, 2.35980924649751, 2.01758305170962, 2.69085802868249, 2.91440431415069, -0.155751124389262, 0.362517669569057, -0.155087359518525, -0.0656234756055651, 0.675348468172473, -0.324448101858779, 1.61060881532971, 0.783227209973701, 1.40895860751861, 1.72940250705382, 1.81876717936403, 1.21601500302768, -1.58311350728232, -1.49654411485087, -1.39169450733946, -0.329834250569954, -0.988672129721418, -0.733360982073699, 3.68171109535955, 3.23744382195278, 2.71758661613402, 3.24823072282054, 2.99326450693251, 4.44019613660957, -3.10184060180242, -2.74675555833844, -2.71565556502017, -3.22527692027495, -2.22188375624987, -2.72888384411352, 0.231595667136899, 0.173891627347787, -0.223966027797061, 0.0751945275228338, 0.125282104406178, 0.0481413704837373, 0.481609575800105, -1.27051077037403, 0.21254814569515, 0.0348802497408186, 0.760837665466918, 0.566475537873721, -1.42673694298896, -1.53945780734098, -1.17210570694194, -0.661383296052477, -0.567090657519675, 
-0.964882715488163, 0.156658948980139, 0.714566671030883, 0.159917173682481, 0.313634262003158, 1.22679048519023, 0.758786680821952, -1.2403896361558, -1.69405035400501, -0.95060896844149, -0.83435584122944, -1.48377078392882, -0.446414122598918, -3.70769300219641, -4.13406257298295, -3.60779081278716, -3.96791842359077, -3.07228143059612, -3.60366226609738, 0.0759947862004344, -0.40154326066543, -1.07749949490836, 0.197895049739391, 0.145016027811362, -0.0573752361529405, 0.789115518808427, 1.62890892154448, 1.99099825394714, 1.41259351062021, 1.79753258550922, 1.84048484222499, -0.0261614887802711, -0.508600899539451, -0.827403320294027, -0.65162163626018, -0.0735057499889841, 0.478331645111525, -3.34583584743578, -2.80142157458451, -2.76857045776118, -3.21468635444972, -1.648618448966, -2.61531603780937, 2.68253926601656, 2.0860174573226, 1.89015177592122, 1.87784660325903, 2.14113708936321, 1.53424702504294, -2.33804381301755, -2.63179357658927, -2.97277947343823, -2.22077949372795, -2.89542415799071, -2.12669654736306, 1.14278026020447, 0.609592349717885, 1.19657349560852, 0.550140731140165, 0.847236663976349, 0.459982183927489, 0.0322175108750593, 0.413845203505471, 0.609605011280172, -0.574270680059976, 0.121061352331047, 0.784429593115472, 0.391521064529944, 0.978259988949226, 1.05305630327014, 2.04070488324981, 1.30687676892992, 1.72798668779821, 1.35874150146093, 1.50384371875893, 1.129862982347, 1.76893550882124, 1.34063843065317, 1.47914540406932, 1.3069459288973, 1.86997506594278, 4.68206573526689, 4.29362859460035, 4.26805810256307, 3.90433565006662, 0.945415282880098, 0.128371281707646, 3.38914339634369, 2.74130715311165, 3.1778924654115, 2.90713968695497, 1.26379320864094, 2.47761762134228, 5.09781757766199, 3.18260506574544, 4.44132111808615, 5.67115000225268, -4.21931062691189, -3.45512170797833, -1.6504972790523, -2.00121000962884, -2.17103619643593, -1.52320940513144, 1.9752926807936, 2.25158193089756, 4.72671573293718, 4.78866890504702, 
4.07794452004984, 5.22124480708619, -1.4609598492077, -1.46744736468998, 1.51513579246374, 0.717012626747329, 0.381272700117816, 1.98396792714436, 0.705442520766795, 1.03565992640432, 2.80671220916083, 2.20665387904642, 2.81516789399324, 2.28955370628527, 1.49373831889151, 2.31162491619564, 4.62548830943641, 4.11817585818728, 4.77076035937404, 4.1027483410691, -3.4932314682279, -2.22719317871591, -1.20784575573494, -0.812555680733482, -1.39928988975616, -0.212104393477067, 1.09943826131446, 0.0847742816435899, 3.81062551639338, 3.89461020444591, 3.78319591264822, 2.79216912604385, 0.848702633566097, 0.943773564017824, 3.93941375051072, 2.82760182282977, 3.29737641030993, 3.94033382602685, 2.61503085176352, 2.61238148281161, 3.19993941024359, 2.32782269462443, 3.47400669897101, 4.14680928154631, -4.62953781374565, -4.26799923114265, -1.78525894302464, -1.89424665787181, -1.90315886276349, -2.8586882539816, -1.15852114308207, -0.475855531646184, 1.4940781373722, 1.15765007121092, 1.52434753850352, 1.94125226945619, -0.817395813645323, -1.73967116222185, 0.621834605049932, -0.185841266903942, 0.686128216776779, 0.0143512366974435, 1.05900768097678, 1.66504148151973, 2.97258644153994, 3.33875963365518, 3.87577575325268, 2.91258413656329, 0.901036441771058, 0.385497699413046, 3.07890879174789, 3.34609882944908, 3.47760136260383, 2.69190944666605, -2.23808251977675, -0.992333928697828, 0.958785852816451, 0.575303874903005, 0.402243527158563, 1.20027415834717, 0.310314994001621, -0.760039265158166, 1.45054635630563, 1.68967310865103, 1.59751809499103, 2.6274452162074, -1.97675483490511, -2.79977485717598, -0.355515571570213, -1.03705371915696, -1.38409785146521, -0.818499006780233, -2.67596160221722, -2.51816221526258, 0.272538401704229, 0.905757379853634, -0.127067896031481, 0.281166111099629, -0.504258672381971, 0.27206039214442, 1.40484585204665, 1.26833637047306, 1.42301104724949, 2.41280023236809, 1.43591576156914, 1.48391259586644, 3.83224595977141, 
3.05948982485073, 4.39896515928184, 3.39247517893319, -2.64548013418666, -2.48637326243677, -0.234567550419444, -0.541956740526225, -0.46980572754695, -0.12694547988944, 1.48468392579638, 2.1296537978331, 4.60022153189798, 3.67519757329243, 3.40332969917988, 4.22142620764385, -2.38311890610365, -2.09936422735791, -0.666504854788653, 0.42909643529552, -0.410168292995684, 0.438991791672617, 0.59372576342607, -0.419705515870354, 2.39585918590482, 2.79268453466693, 2.26724185227742, 2.72986211111338, 2.36219511989271, 2.79588903777183, 3.83643784154816, 4.08780823724617, 4.60920575973008, 4.16516110837405, 1.43370505508971, 2.800052881723, 5.03319474462739, 3.94100345588787, 5.41865546509765, 5.99659157334942, -0.839408001180464, -0.78919015447961, 1.84168440035462, 1.31585068918206, 1.15842215752506, 1.50800625329013, 0.0542233055483727, 0.188415581270782, 0.656329373731973, 2.40424696952348, 1.79146713272388, 2.33444817565689, -3.2574383810873, -2.6944584684231, -3.02819879504441, 0.199165536853827, -0.361790798967406, -0.625993645806446, -1.68081592800462, -1.51188071250076, -1.45808424441517, 0.556388002641502, 1.14974678545784, 0.88024802013412, 0.454254513872838, -0.447655631964528, -0.11126404861999, 2.29337922641278, 2.57664345339313, 2.537931804876, 0.932121621282547, 1.17507227420643, 1.41191744805949, 2.97417436026266, 3.51404288557592, 3.32625223215894, 1.43651946472843, 1.07601453152401, 1.0657357545432, 3.26626860278199, 3.32778615337511, 3.71066693221579, -1.75158021017158, -1.42483329835576, -1.40235601747714, 0.961363805924711, 0.805470116307042, 1.39318704716788, 0.383140455106542, 1.11216904904654, 1.39044537167921, 2.54061860072904, 3.28568130700203, 3.26884982899216, 1.09742932043066, -0.66396932872465, 0.737737064907706, 2.02054719955828, 3.07270479611202, 2.03349133881208, -0.143536442155575, -0.384820155594582, 0.271636640940408, 2.85951834843567, 1.86023784051796, 2.73014927188644, 3.47999662796641, 2.33147593060655, 3.43182288958677, 
4.84980256667708, 6.20824455238199, 6.50984918266665, 3.48690209892231, 2.97405189046083, 3.85768188094636, 5.22196168919116, 5.22992538044254, 5.92060862726857, 2.72147929843506, 2.4915662974631, 2.61247117475447, 4.56054529835513, 5.65909476327652, 4.17222646654915, 0.0622198275645948, -0.209745221491589, -0.913324949464624, 2.52676094011251, 3.23930974748868, 2.47139195270385, -3.08142264057587, -3.51112382546271, -3.42596835244659, -1.76563737314478, -0.518334525359362, -0.473955823238203, -2.3176455454376, -0.997861457328365, -1.63192971869128, 0.2769914651431, 0.693128458876668, 0.170195225497571, 0.843184916928341, 0.958753862158414, 0.491857983072687, 3.2276558963763, 2.83022525892743, 2.48349351323252, -2.56828601051827, -2.20512173194476, -2.16861705891084, -0.447313033116545, -0.170394138601854, 0.541799576832239, 0.72795302683248, -0.0578807328742229, 0.518867090500676, 3.02289020263085, 1.94070410834265, 2.95680077651837, -0.38931299012049, 0.155797084526237, -0.321457108238977, 2.38681419705189, 3.24246356305853, 3.08732943322359, 3.60963881315635, 3.36797092207722, 3.8066219705417, 5.68180135069468, 5.87632490881545, 5.05079943034103, 2.40727051792583, 3.03115577955935, 2.65092351199068, 5.43288198255192, 4.72128988432576, 4.98076053687819, -0.840325134210429, -1.17908290475567, -1.36102943876645, 1.91131103951056, 1.07580927223353, 1.15564133287763, -0.270197836320825, -1.37645058195396, -0.645856154649467, 1.46869749409491, 2.17300772206927, 1.10617092874135, -3.33232453617296, -3.50191928669354, -3.82760292701965, -1.52573446084432, -1.39531069805778, -1.27405667582808, 0.934970761632414, 1.0672560772276, 1.20358384981041, 3.03046990218667, 3.66825599682463, 4.10888667797448, -0.533847685693145, 0.279498508319087, 0.228736124994634, 2.4014210047643, 2.40336974644232, 3.13558342243796, 1.9053908263102, 2.78027943370483, 3.28875860664032, 4.90733527331646, 5.08082245167471, 4.83496753520262, 1.24583218822376, 1.68034338147419, 1.41885910134251, 
3.32103458436696, 3.18170472274989, 4.25149225012811, 3.38104491732531, 2.43971251933402, 3.31070458547148, 4.80519810022032, 5.69349705467654, 5.93152235306625, 2.32428562628828, 3.31455743231116, 2.96500905320082, 3.22893880443073, 5.78769661440836, 5.04467190604451, -0.123977246652757, 0.469271619573559, 0.657201825440559, 0.467487489107349, 2.45226832732408, 2.44698192059861, 0.172139136508923, 0.176382463252888, -0.944953772287288, -0.263612302453646, 1.83078698933096, 1.84395858802902, -1.49808008904802, -0.998699146812157, -0.884224373955126, -1.51439618029052, 1.44463602764242, 1.31424075947309, 0.816987599878402, 0.638238418109542, 0.923496833171376, 0.199166508224715, 3.32668532268075, 3.59627072830653, -1.51676354106673, -1.29735442508706, -1.08376287399312, -1.44586921620836, 0.480891691338172, 0.42662573058347, -1.08441164264814, -0.599352869480088, -0.516971907554432, -0.111968362166448, 1.62407778720678, 1.78254364271258, 2.71120415010414, 2.55856230964013, 3.16043913807001, 3.34469991121011, 6.01248008460651, 6.07881097939713, 0.727258519000654, 1.3216201992247, 0.886759596754614, 2.00818352249909, 3.81215185892009, 3.91646116188721, -1.94455433521188, -1.31052289093377, -1.35489312527137, -0.770528636606329, 0.686383989380026, 1.10370290845123, 0.416701986931862, 0.156577588508873, 0.704103556259327, 0.686429015182783, 3.44844469229869, 2.86989484251354, -1.27470966351611, -1.57223310131134, -2.26745917195558, -0.910320124998161, 2.14867805617919, 1.37155378673876, 3.74342909891145, 4.67667837685985, 5.5857318046442, 3.94624683494284, 7.73959925884654, 6.16238762263087, 2.36786793696856, 2.29699383777259, 3.14741061423481, 2.49003223587898, 4.90917530409732, 5.0650227678588, 1.67460551571547, 1.87594202314857, 1.5180967081392, 1.59632719568358, 3.5488415681684, 3.51214703347389, -0.591404446190342, -0.200192367797994, 0.649427652763707, -0.0765785923977668, 1.89940455406615, 1.49550710484899, 0.882339005302216, 0.251548560521127, 0.238729061785001, 
-0.0154513378709349, 2.07009508096724, 3.48091047805387, -4.61325189241432, -3.66950632143823, -3.68378174671288, -3.71395045850602, -1.92803265019212, -1.30121587554216, -1.57312273979794, -1.41884538855003, -0.413589414225648, -0.306742691637398, 1.42606927711682, 1.01921099366587, 2.98867486062654, 3.31419431533928, 4.255179091509, 4.2945433066671, 5.47658721222074, 5.87347176036055, -0.705280834955502, -2.49901047955946, -1.51087838748008, -1.52118092358062, 2.29925805857893, 0.751018806233379, 0.275088827409041, 0.130823761695346, 0.0735031712284534, -0.454068227045901, 2.09578719348501, 2.29985940494056, -0.247466004679817, -0.207859454186393, -0.105152055584457, -0.055403081326854, 2.06468499448063, 1.63756462438894, 3.55670492742049, 3.62008398306648, 2.77461684685384, 3.75523443118408, 5.85771463046804, 5.99119195752515, 1.51683966628034, 2.19207530904854, 2.19661293986958, 1.9103888838218, 5.11383454173085, 4.61633250839093, -1.69790371368498, -1.62706996803222, -0.493108812261359, -1.52913940665942, 1.03450752244658, 1.03598074165264, -1.28728198831359, -0.501774959399643, -0.68321279436814, -0.895409258065961, 1.75412602092276, 1.47073015722474, -1.4289446256644, -1.49245856496431, -1.25738532983044, -1.04813452598691, 0.472273707362973, 1.11622677624294, -1.39144933980519, -0.73793499485741, -0.516923728728691, -1.17940639364446, 1.03355663210109, 1.80707357557477, 1.98374740433959, 1.25687486310691, 1.31076336295895, 3.53067196619197, 4.23122235220778, 4.83577742391216], + "first_treat": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], + "treated": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1] + }, + "r_twfe_coef": 2.06179221809775, + "r_components": [ + { + "treated_group": 3, + "control_group": 99999, + "estimate": 
1.97145496676111, + "weight": 0.186046511627907, + "type": "Treated vs Untreated" + }, + { + "treated_group": 4, + "control_group": 99999, + "estimate": 2.05562238949968, + "weight": 0.209302325581395, + "type": "Treated vs Untreated" + }, + { + "treated_group": 5, + "control_group": 99999, + "estimate": 2.05590097544404, + "weight": 0.186046511627907, + "type": "Treated vs Untreated" + }, + { + "treated_group": 4, + "control_group": 3, + "estimate": 2.39769192908238, + "weight": 0.0697674418604651, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 5, + "control_group": 3, + "estimate": 2.14862126249753, + "weight": 0.0930232558139535, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 3, + "control_group": 4, + "estimate": 2.19421395808733, + "weight": 0.0465116279069768, + "type": "Earlier vs Later Treated" + }, + { + "treated_group": 5, + "control_group": 4, + "estimate": 2.02836121101131, + "weight": 0.0465116279069768, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 3, + "control_group": 5, + "estimate": 1.91120795110639, + "weight": 0.0930232558139535, + "type": "Earlier vs Later Treated" + }, + { + "treated_group": 4, + "control_group": 5, + "estimate": 2.02002445173495, + "weight": 0.0697674418604651, + "type": "Earlier vs Later Treated" + } + ], + "r_weights_sum": 1, + "n_components": 9 + }, + "two_groups_no_never_treated": { + "panel": { + "unit": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 
27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60], + "time": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6], + "y": [-1.65334161244149, -1.60412900476854, 0.0783386231117954, -0.772314923635521, 
0.274023506430032, 0.22246018283309, -0.666885174026258, -1.16979652285518, 1.20378864071128, 1.97228681324151, 1.73770187100263, 1.37653987326944, -0.558510791208835, -0.518101203763071, 1.57746191813088, 1.4308121261141, 1.28316530405821, 0.74799457268858, -2.59690752230753, -1.54153591515967, 0.562542240715999, 1.18577902217017, 0.661616300243638, 0.82420097090256, -1.24139862158887, 0.473581455163078, 2.66950088750765, 2.17513887067371, 2.4986898214573, 2.04229727766234, -3.25362749822115, -2.36080358735649, 0.272762440596601, -0.962216212417164, -0.25578556120635, -0.531986463493883, -0.94584897806869, -1.12671266672754, 0.643675765600725, 1.45672909498594, 0.456205970078765, 1.02955735775918, -4.22908423074939, -3.68734810861614, -1.73434967453674, -2.11556803872226, -0.144673963928415, -1.33149031716945, 1.09304995402969, 0.575231565903894, 2.48360711450747, 1.85715053629538, 2.88278327683411, 3.86433522733794, 1.7977068574965, 1.04249595045148, 3.00491681641617, 3.75529990836506, 2.74274165622619, 2.69762964046289, -0.064440023245669, 0.302375274456684, 1.60285629277718, 1.45857452931691, 1.94800172017604, 2.7388663683465, 2.37144541786921, 2.86839974604435, 5.26522517491389, 5.206126041981, 4.24379023071679, 4.69397336950135, -1.76445972004299, -1.46912847314695, 0.922719528622635, 1.4176109763527, 1.08287230839594, 0.958398080165469, -0.16133874554423, 0.647417346037235, 2.73611574364671, 1.53273766950456, 2.77216584110771, 2.34817905346076, 4.21279348956094, 3.99003863499075, 5.52994276849131, 5.73915111172915, 5.78301199653744, 6.2170154054198, -0.706057596610728, 0.151343262055481, 2.16164237001965, 2.87719587765995, 1.99786009552514, 2.73046599343128, 3.86579388804585, 4.90930530548858, 6.75238331757625, 5.26559397653048, 7.08304393369183, 6.69892069239665, -0.690220390430469, -0.0630272414618631, 1.38364895501819, 1.67810931903802, 0.462896847895013, 1.57135360606274, 1.67449660981385, 1.80688012317392, 4.048743324901, 4.53883875412561, 
3.72844064654758, 4.05589147315177, 3.6177539044391, 2.95202857103717, 5.13264512637559, 4.59075861592534, 6.48465507593144, 5.29955978895993, 1.14620798977394, 0.305259356543438, 3.13731122180347, 3.56020112277228, 3.03731968704777, 2.30739460266961, 2.23260909802331, 1.65712760108429, 4.3562766600797, 4.00076695228704, 4.48692231335215, 5.16641220603784, 1.58311806974867, 1.85000364683151, 4.36071183289635, 4.64558032200934, 4.38927964449825, 4.93349632279424, 1.78977657509393, 0.821694504171488, 4.05467368640514, 4.03407984467165, 4.65306864348387, 4.77084824381416, 1.896758497632, 1.4752412630847, 2.83590702755962, 2.89958694361131, 2.55840568324924, 4.12777049257646, 0.654338120163397, 0.825123589719661, 3.21902913095015, 2.48171522209278, 3.31786011074428, 3.85442276768593, 1.16797668237427, 1.20056641778031, 3.58216078586533, 4.36306473382341, 3.40551237982074, 2.7240623892048, 0.401128469508189, 1.09676359856053, 3.19612042755752, 2.63190883082612, 2.90775961127737, 3.56800156761722, -1.36261215885348, -2.37171012432135, 0.794402491536731, 0.526049882172984, 1.43566829892076, 1.37220928962811, 6.29600229766809, 5.8687634155831, 8.57336248254667, 7.33578957621711, 7.88709461170831, 7.18735112027684, 2.93953234505574, 3.23375029156243, 4.28545740492705, 3.98574814042522, 6.77857891082619, 6.46084188525108, 1.79737180058066, 0.997472150309917, 2.17847875843997, 1.86757076489859, 3.9001717926656, 5.14758305974514, -1.06306751499736, -0.93947070149677, -0.742353663648004, -0.188068109082652, 1.51643166845591, 1.33463047570647, -2.75862672524571, -2.3477192840211, -2.45568805539315, -2.86145728642806, -0.790362807572213, -1.03480960572233, -0.841038734220087, -1.16883602347538, -0.0446365265374861, 0.140190128498668, 1.39029751934112, 2.40084190725819, 0.880078040096762, 0.435469731223619, 1.38365399592111, 0.508986648515742, 4.80520377500686, 4.1271670721091, -2.09345487212772, -1.44383293088878, -1.3616439780546, -1.17422122162225, 0.695131273104552, 
0.803877199504942, 0.731377452871822, 0.234859348556701, 1.16383260617664, 1.46548051378498, 2.6003078624839, 4.9500939304319, -3.42877665331015, -3.37274483239771, -4.3433546057583, -2.96829898868231, -0.509947530291241, -1.22868147461703, -1.35525280610502, -1.45793936872338, -0.732710713070547, -0.77171009309898, 0.807570627372012, 1.63258933895546, 3.00924593816779, 4.98214431683514, 3.59604209201975, 3.95817835740553, 6.98574073009034, 6.13896932646514, 0.65368972405975, 0.899895152989556, 1.46087417805793, 1.82209828264842, 3.70936420615589, 4.83805397962539, -0.526408098744512, 0.339125830275825, 0.569616855484992, 0.707656302651002, 1.70108981248599, 2.09350201452037, 0.344815351858776, 0.789854461483963, 0.717429859254209, 0.723688581257461, 2.8319674026493, 3.58288404460419, 2.29035534216575, 1.84439141376876, 1.9279565577608, 1.5355131817334, 4.18686550143238, 4.55229426550197, 0.66188625420854, 1.34910649576515, 1.15896612157619, 1.22694210715269, 3.07805456609839, 2.40743654777557, 3.79202296477861, 3.25031571377402, 4.30618612339065, 3.87163289169966, 6.10199465663498, 7.07894433750404, -2.37288339067224, -2.33641261216811, -1.34876281952591, -2.17034401869586, 0.416710159413304, 0.56110036542591, 0.926173876710715, 0.885374526818769, 0.415286802862945, 0.657288564488188, 3.14242053575745, 3.44070083759352, -1.32634743824261, -1.28172621285504, -1.02348466394833, -0.787375567748062, 1.4302342690682, 0.891544278643336, -1.42374653949961, -2.0590597025028, -1.18577485373201, -1.80240320400341, 0.72870515552382, 1.51401767702276, 3.66973839888614, 3.0575197038738, 1.94232303904388, 3.14159644361839, 5.11695840545068, 5.91878739064483, -0.165282927545181, 0.908621748682154, 0.321465480514389, 0.817600754536799, 2.77728187063265, 3.45265642072419, 3.60138971939133, 4.97815471850869, 4.65534899744145, 5.59136958538106, 6.83434856897689, 7.73099373545403, 1.12070002521612, 0.502542734910878, 1.36839331688263, 0.606668792371833, 3.08119725810638, 
3.28960837864244, 1.45736423246947, 1.20414458487125, 1.28039424035109, 1.55243596246831, 4.74065915032053, 3.69370183688679, 0.198885303485778, -0.249020712182083, 0.335116716158033, 0.509601904455367, 2.25261292302389, 2.1094371021006, 0.441870348757742, 1.04213589911621, 1.58712120051009, 0.531452799729461, 2.04147789408322, 2.41060299701302, 0.741608242071467, 1.38024967698741, 0.538265618784003, 2.12034279290256, 3.11545457708014, 3.70807015241446, 0.00401592511677135, 0.1110691495907, -1.23583080655565, 0.25846641026895, 2.113314334835, 3.24843292408548], + "first_treat": [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], + "treated": [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 
1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1] + }, + "r_twfe_coef": 2.02325582093393, + "r_components": [ + { + "treated_group": 5, + "control_group": 3, + "estimate": 2.17245897290607, + "weight": 0.5, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 3, + "control_group": 5, + "estimate": 1.87405266896179, + "weight": 0.5, + "type": "Earlier vs Later Treated" + } + ], + "r_weights_sum": 1, + "n_components": 2 + }, + "always_treated_remapped": { + "panel": { + "unit": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 
36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 74, 74, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 77, 77, 77, 77, 77, 77, 78, 78, 78, 78, 78, 78, 79, 79, 79, 79, 79, 79, 80, 80, 80, 80, 80, 80, 81, 81, 81, 81, 81, 81, 82, 82, 82, 82, 82, 82, 83, 83, 83, 83, 83, 83, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 86, 86, 86, 86, 86, 86, 87, 87, 87, 87, 87, 87, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105], + "time": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 
4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6], + "y": [0.108658939159187, 2.20044644113459, 1.8288016518028, 1.37969368855956, 0.783654033627799, 1.28801589165298, 3.0230613712795, 3.32721096130123, 2.95929074457037, 2.65323750522216, 2.80847247043931, 3.01807177631839, 2.83235841346865, 2.74326040141446, 2.39007994300886, 3.9050906512111, 3.42982261995041, 3.12440941125882, 2.55636848643868, 2.03356922333211, 2.98266736002573, 
3.17418885424699, 3.132308049084, 2.44389427802983, 0.937540733462694, 0.834406076533928, 0.933053827900388, 1.66309761010392, 0.770460732370049, 2.04349498763986, 2.40221134312672, 2.79644619752757, 2.56670725351825, 3.82268592107329, 3.51099711239315, 3.30335403645735, -1.51808679173248, -2.02590631981412, -2.97855813930954, -2.37532840605077, -2.66085329843128, -1.55420605635551, -1.31077136143854, -1.28193709489092, -1.63965305659266, -1.51407038467617, -0.839000195343381, -1.24891841768968, -0.575053917558418, 0.464877761897099, -0.00589676181407994, -0.236942960520375, 0.577412801077372, -0.661877913351698, 0.881606817477891, 1.38612600830785, 1.40482289515322, 0.991443126878854, 2.52692925270666, 1.72356085227418, 2.04419433234872, 2.63975962066242, 2.42623637667335, 2.68762118324333, 2.42396044103214, 2.43998868143972, 1.55393375860246, 1.88416787332511, 2.22137344773333, 2.58361000641179, 2.62957387969325, 2.48746007556524, 2.72474357002633, 1.65783474914444, 1.98477800603323, 1.84642985828242, 2.11878336638705, 2.83639577330802, 0.889209162023236, 1.50410982557748, 3.02505723497771, 2.31498774461377, 2.43124379278771, 2.76158630323601, -2.32812313187377, -1.63999149462659, -0.646831193498869, -1.94070145753434, -0.678265539126574, -0.861510698829438, -2.9779693609824, -2.32390167497033, -2.09159318741106, -2.08406618955274, -1.98670042195977, -2.13275034383589, 1.71563189043761, 1.93493501453395, 1.90863309288685, 2.23762027408387, 1.79955685130626, 1.78953664912187, -0.216505620149865, -1.49985043463215, -0.990021606749069, -0.722434089124685, -0.767252961278251, -0.299609416410954, -1.878225100632, -0.822937309901328, -0.590927708474151, -0.294830470362979, -0.319809339756538, 0.0924228508370181, 1.9824194052673, 1.77919584352958, 1.21760541291042, 1.38600908754606, 1.1581055154916, 1.82276815411856, 1.4747051054685, 1.58968714713433, 1.09080190225367, 2.37149280291418, 1.8334466659354, 1.65492769964735, 1.48634488421636, 0.45539276044813, 
0.37128027692852, 1.38565097405776, 0.502528312173103, 1.44223672135988, -1.46185563694282, -1.81115047409909, -1.75274873515681, -2.35610979823551, -1.55955362086759, -1.37760098609229, -1.84879731017557, -0.698468354494254, -0.648125283767432, -1.81942638921485, -1.28536131593841, -1.32044266639728, 1.88341219665158, 2.00091393344412, 2.9877643373038, 0.715360848311587, 2.12286546061704, 1.45541552549261, 0.81838119302354, 1.03318867649513, 0.818158642485597, 0.331652781907278, 0.592379660835577, 0.961641371489572, 0.0381916536814685, 1.01491146627108, 0.529658687863022, -0.97768704705906, 0.105388922100255, -0.0594860731658837, -1.62700619729903, -2.76556599720536, -2.59067422625311, -2.13438144079045, -2.63928461614816, -1.66857062438221, 4.68643811518314, 4.68586559397301, 5.3801316512931, 5.49881260768329, 4.62430504331264, 4.26982326369538, 0.835390877713949, 2.58696403775165, 1.79411743265409, 1.86450690174085, 1.84522378906082, 1.93849299549722, -0.339360173198288, -0.492582575394121, 0.634683286063833, 1.04825189051361, 1.21452959516938, 2.01391552571576, -2.01732364920708, -0.429717290429931, 1.55787989530193, 0.54047660875855, 0.559119529038604, 0.988023783167722, 1.02197682407176, 0.7848058940032, 3.08623987927892, 1.57529778602893, 2.26793042103777, 3.48775593851638, 1.64551557626138, 1.48803564225979, 3.70568752607688, 3.06553907580203, 4.00984398670461, 3.15866888937458, -0.368185245717841, 0.0410866484383133, 1.76956647787536, 1.28963412327478, 1.49810061295054, 2.24111785214332, -2.26528795698373, -2.88994595659362, -0.550255243982855, -1.03812216780821, 0.659417124963708, -0.577949956953824, 1.01248696323672, 2.03447490076943, 3.87116066633538, 4.97396445536848, 4.99118671808126, 4.7196058678783, 0.650410366694842, 1.42983109779584, 2.73281114841328, 4.10106139373444, 3.2200046547881, 3.5388657765663, -2.07562760085341, -1.58894657437309, 0.149336759709986, 0.0535501268688941, 0.464364330914522, 1.28782714574608, -0.238697611857122, 
0.0376017023793478, 2.40684457872262, 1.86998224602118, 2.35815118911528, 2.43818345745987, 2.7592173900222, 2.43447922086326, 5.74766400600187, 5.83388354637983, 6.62721976395922, 6.43586169566815, 1.33882391931936, 0.904355984417837, 3.81901535480429, 4.0816146886014, 4.41509718815856, 4.66111796009921, 1.96692742811021, 2.15019524805104, 3.92162172668411, 3.76248616672679, 4.43761125298315, 3.60056166583044, -0.315131506190604, -0.461810861209235, 1.62486306529095, 2.19510866490282, 2.18485247072851, 1.19497920345398, -0.377794693298662, -0.461410984628611, 0.795919710536723, 1.96757129099176, 0.977226087872269, 2.02769627553828, -2.1887069604874, -2.37133083679394, -0.217147355202376, -0.106458249943801, 0.393503819839253, 1.30819068779128, 2.0143161016789, 1.89389755469315, 4.56539432553839, 3.40964526206658, 4.63655035942163, 4.13928138563545, -3.28072108963223, -1.84641797097784, -1.17067179921838, -0.128433885190352, -1.14079907551393, -0.921755948680453, 3.91615471000423, 4.09782931384891, 6.00594696137307, 5.80281753491042, 6.37090018324922, 6.35566277449349, -2.30632532576768, -2.53593270058921, -0.512236755457526, -1.10282559085595, -0.793915784439132, -1.06369185860391, -0.574075327455248, -0.788571471989979, 0.944105329886802, 1.47714526799332, 1.66067099042974, 1.4815136709, 0.819151442353647, 0.113895152214113, 3.43203877432259, 3.2850243977245, 2.77714468240935, 3.59186949635862, -5.31857869142866, -4.49235580860934, -2.64792564159042, -2.69070381535615, -2.37002291496632, -2.88839230286951, -0.307457448757027, -0.298700224271227, 2.01274717871688, 2.3834414900354, 2.28090940372418, 2.41269118715018, -1.76037511149447, -1.67181932057412, 0.925101431220421, 1.83854846801144, 1.12750267523608, 1.72714878656768, -3.25110250690211, -3.87146648472896, -2.92282965067598, -1.24304766402708, -0.784797045987987, -0.805486117578832, -1.51413788132851, -1.31703276037947, -1.11745211019586, 2.43344621107509, 1.41994396191812, 1.41842936265389, 
-2.28605220648515, -1.92301335632868, -1.71319202775317, 1.22126233457458, 1.20698715041738, 0.589738671644679, 3.50526261268637, 3.64472112352863, 2.38698516264696, 5.35075678355936, 5.59692401723183, 6.07526383752121, 2.4638090368259, 3.51947878982916, 2.70340417188164, 5.14540323092111, 5.24756513479625, 5.38043464047782, -0.654476426042555, -0.664150262455739, -0.0625005411655422, 2.62305149262239, 1.76006483962769, 1.96359948868578, -3.91258034323285, -2.7250810479861, -3.10439244527058, -1.69744986223291, -1.02897672620732, -1.12200748874665, 0.0563141847204181, 0.265712184816272, 0.00229315544190356, 2.64132322727652, 2.2447967598807, 1.89562290116646, 0.534308551863677, 0.871438814446053, 1.63593593504707, 3.94099992615828, 3.86008115482791, 4.7252335311953, -2.30982901039024, -2.07426312569511, -0.760822489102454, -0.329337801095775, 1.04005290919026, 0.352336236261556, -0.971443358344016, -0.169653452160458, -0.366105290000073, 0.948456514256866, 2.0441516901633, 1.17824446407079, 3.13275986403188, 2.49908391051499, 2.81673753577355, 4.96105331778696, 6.08587570362468, 5.35922405688456, 1.3427546705349, 0.945298236487927, 0.32062320281464, 2.8391258868033, 3.8394670425801, 4.07339897251476, 0.471826046519307, 0.189434672029105, 1.20598800183972, 2.48205306937404, 3.57507717534798, 2.96831645775753, 2.18230025619376, 1.74297484346807, 1.84808354126117, 3.70120865050772, 4.10998796813014, 4.8703182756054, 0.862015395133013, 0.461506946380119, 1.08958257584856, 2.64063529143381, 3.13089657137759, 3.8921819836783, 1.61865786920022, 2.21369112525703, 1.68950563598656, 4.64912008634256, 4.6246903081326, 4.7727893385402, 6.2991361961654, 5.31587137581753, 5.53810097931811, 7.87050725755336, 7.71990800444773, 7.92242598326646, 0.000202802888246145, 1.48880400638883, -0.165265927058115, 2.32886898747194, 2.46565029322908, 2.54860503035108, 1.9904793567156, 2.54064173791874, 2.21032769715243, 4.78478457505072, 4.43796357246012, 5.1367300588122, -1.92978168283057, 
-0.688920096868469, -0.948917232451244, 1.68274285911124, 1.18205455331794, 0.717721667205075, -2.84934338233387, -3.13090861368319, -1.98773462951809, -0.0560231612615263, -0.303212732525186, -0.0801190174935725, 1.1318159457957, 0.114060621621842, 1.17330217409745, 3.47264777272166, 3.60282058671528, 3.47730245217732, 2.27242569451148, 2.52453339822716, 3.15275384485553, 4.88992964756691, 4.70567618845892, 5.09369479910792, 1.8385518662352, 2.35734388709752, 2.42538224629847, 4.94320573633645, 5.00379606783087, 6.06895157538634, -1.20883034577664, -1.11587164362918, -1.53780496495671, -0.864844346922926, 0.811744720671417, 1.48286782001324, 0.976513724472541, 2.76061254320184, 2.02765872221072, 2.08735259732614, 3.60754306418452, 4.21594955936887, 2.39428951967253, 2.59980249409554, 2.01707432439884, 2.54959221183576, 4.85295018016941, 4.56619329259476, 1.80333678635028, 0.924767476359233, 2.31858497841515, 2.21624086170821, 4.39757985104944, 5.1207265518189, 2.17977824220508, 3.48233456431438, 2.36804398203052, 3.38708357962159, 5.33109196015434, 5.29580303309136, -5.14051341166269, -4.4935288861429, -4.93777022461467, -4.43117149375318, -2.31381047007467, -2.50216618354922, -0.678335367590773, -0.346117115516304, 0.306014655002086, 1.56344911609566, 3.03176640203121, 2.9639609448779, 0.266507774958225, -0.0730321291139868, 0.629397610429072, 0.59256005254182, 2.2474562758478, 3.19814530271131, -0.167135389215384, 0.841285921560358, 0.172307947357268, 1.12093054494582, 2.67633601229707, 2.16807830305374, 3.66824874433049, 4.00702658167322, 4.57935203892353, 3.90737617292466, 5.51236487656229, 7.13673264825921, 0.315583117224713, 0.212922209774131, 0.559081964964464, -0.511635594616975, 2.22645644194324, 2.73586210400605, 0.397224124194428, 0.313884016754581, 1.3976009643301, 0.682797649946404, 2.91897907184268, 3.5979257311841, -1.23195943055697, -2.02240912615645, -1.14184603764168, -1.14840963982586, -0.309522412016135, 1.59526903759681, -2.04149530842743, 
-1.94632515290058, -1.08852387534326, -1.10991740795875, 1.6425963871392, 1.03558655506671, 1.60035649688446, 0.946477790489681, 1.47880300344376, 2.15516354036151, 4.6486319805148, 3.82434528805234, 2.70003610949074, 2.74562784132794, 2.75943591477256, 3.06161357539874, 4.81984525855201, 6.0140902995753, -2.21339364947307, -2.26130788927523, -1.87677409328004, -2.15338969950951, -1.65308800146823, 0.766584807139541, 1.01761514516435, 2.30669579984962, 0.449096912625319, 0.906149685604866, 3.21800260428817, 3.15057828254646, -1.23249204711816, -0.588171667898893, 0.362995425608448, -0.00467225358763007, 2.43044799998142, 2.08377947265025, 0.74359847163065, 0.685649093115789, 0.992761888217612, 1.24096464991103, 3.80535865298666, 3.81351573294764, -2.7399717813477, -3.36837903907433, -2.11010886582366, -2.47214375041244, -0.789701206583573, -0.0501645314593588, -1.33459151650553, -1.17816510693688, -1.11038583200541, -1.44007648834056, 1.79564078666548, 1.09412215446346, 1.90835139402718, 2.13300344620801, 1.79635128773634, 2.99258773559818, 4.74229520560892, 4.79013440384626, -0.648428813958483, -1.63215654254437, -1.74628399305461, -1.72908271721215, 0.103497254095757, 1.48152572528025, 0.760568276315964, 1.2799049167174, 2.03493683232877, 0.981468467744489, 3.02491012802842, 2.836545903481], + "first_treat": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], + "treated": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 
0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1] + }, + "r_twfe_coef": 2.11285036643278, + "r_components": [ + { + "treated_group": 3, + "control_group": 1, + "estimate": 2.01709964101752, + "weight": 0.0333333333333333, + "type": "Later vs Always Treated" + }, + { + "treated_group": 4, + "control_group": 1, + "estimate": 2.22194209617149, + "weight": 0.0375, + "type": "Later vs Always Treated" + }, + { + "treated_group": 5, + "control_group": 1, + "estimate": 2.31705723566306, + "weight": 0.0333333333333333, + "type": "Later vs Always Treated" + }, + { + "treated_group": 3, + "control_group": 99999, + "estimate": 2.10503098972966, + "weight": 0.166666666666667, + "type": "Treated vs Untreated" + }, + { + "treated_group": 4, + "control_group": 99999, + "estimate": 2.33132491093401, + "weight": 0.1875, + "type": "Treated vs Untreated" + }, + { + "treated_group": 5, + "control_group": 99999, + "estimate": 
2.15671926488078, + "weight": 0.166666666666667, + "type": "Treated vs Untreated" + }, + { + "treated_group": 4, + "control_group": 3, + "estimate": 2.13990086651395, + "weight": 0.0625, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 5, + "control_group": 3, + "estimate": 1.93656842970099, + "weight": 0.0833333333333333, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 3, + "control_group": 4, + "estimate": 1.93565794904421, + "weight": 0.0416666666666667, + "type": "Earlier vs Later Treated" + }, + { + "treated_group": 5, + "control_group": 4, + "estimate": 1.94448887187764, + "weight": 0.0416666666666667, + "type": "Later vs Earlier Treated" + }, + { + "treated_group": 3, + "control_group": 5, + "estimate": 1.83272035191163, + "weight": 0.0833333333333333, + "type": "Earlier vs Later Treated" + }, + { + "treated_group": 4, + "control_group": 5, + "estimate": 2.04986440328345, + "weight": 0.0625, + "type": "Earlier vs Later Treated" + } + ], + "r_weights_sum": 1, + "n_components": 12 + } +} diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py index b0c690f1..b786fb15 100644 --- a/diff_diff/bacon.py +++ b/diff_diff/bacon.py @@ -1302,9 +1302,8 @@ def bacon_decompose( >>> from diff_diff import bacon_decompose >>> >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights - >>> # (weights="exact"); intended to match R bacondecomp::bacon() at - >>> # atol=1e-6 (R parity goldens pending — see TODO.md "R parity - >>> # goldens generation" for the deferred validation step). + >>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 + >>> # (validated via tests/test_methodology_bacon.py::TestBaconParityR). >>> results = bacon_decompose( ... data=panel_df, ... 
outcome='earnings', diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 6d64c37d..e9578bd4 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2668,7 +2668,7 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_ - Always-treated units: see `**Note (always-treated remap)**` below **Reference implementation(s):** -- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens pending follow-up R install (see TODO.md). +- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens committed at `benchmarks/data/r_bacondecomp_golden.json` (generated against `bacondecomp` 0.1.1 + R 4.5.2). Parity validated at `atol=1e-6` via `tests/test_methodology_bacon.py::TestBaconParityR` (TWFE coefficient + weights-sum match across 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped`). - Stata: `bacondecomp` (SSC). Authors: Goodman-Bacon, Goldring, Nichols (2019). **Requirements checklist:** @@ -2678,11 +2678,12 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_ - [x] Visualization shows weight vs. 
estimate by comparison type - [x] Always-treated remap to U per Goodman-Bacon (2021) footnote 11 (PR-B audit) - [x] Hand-calculable Theorem 1 verification: `tests/test_methodology_bacon.py::TestBaconHandCalculation` (7 tests, atol=1e-10) -- [ ] R `bacondecomp::bacon()` parity at atol=1e-6 (R generator script committed; JSON goldens pending follow-up R install — `tests/test_methodology_bacon.py::TestBaconParityR` skips when missing) +- [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` — see `**Note (R parity convention divergence)**` below) - [x] Survey design support (Phase 3): weighted cell means, weighted within-transform, weighted group shares -- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default with the **intent** to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated by hand-calculation (atol=1e-10) and TWFE-vs-weighted-sum identity (atol=1e-10) but the direct R bit-by-bit parity at atol=1e-6 is still pending the R `bacondecomp` install — see Test Coverage checklist above. The approximate path is retained for backward compatibility; numerical output may differ from R. +- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). 
The PR-A paper review (#451) and PR-B audit established `"exact"` as the default to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated at `atol=1e-6` (see `**Note (R parity convention divergence)**` below for the one structural convention difference). Hand-calculation + TWFE-vs-weighted-sum identity hold at `atol=1e-10`. The approximate path is retained for backward compatibility; numerical output may differ from R. - **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`. - **Note (Bacon survey diagnostic):** Bacon decomposition with survey weights is diagnostic; exact-sum guarantee holds at machine precision under `weights="exact"` **on balanced panels**. `weights="exact"` requires within-unit-constant survey columns (approximate path accepts time-varying weights). 
+- **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) and skips the `always_treated_remapped` fixture for this assertion while keeping the aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) locked across all 3 fixtures. - **Deviation (unbalanced-panel library extension):** Unbalanced panels are accepted with a `UserWarning` ("Unbalanced panel detected. Bacon decomposition assumes balanced panels. Results may be inaccurate."). Goodman-Bacon (2021) Appendix A's proof assumes a balanced panel; under unbalance, the Theorem 1 identity holds only approximately. The decomposition still returns finite, well-defined outputs but `weights="exact"` does NOT achieve the machine-precision algebraic identity that the balanced-panel claims above describe. 
--- diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py index f4e43745..1c73331c 100644 --- a/tests/test_methodology_bacon.py +++ b/tests/test_methodology_bacon.py @@ -397,6 +397,21 @@ def _classify_r_type(c: dict, fixture_name: str) -> str: for fixture_name, fix in golden.items(): if fixture_name == "meta": continue + # ``always_treated_remapped``: R keeps ``first_treat=1`` as a + # separate cohort and emits ``Later vs Always Treated`` (and + # ``Treated vs Untreated``) comparisons against it. Python's + # paper-footnote-11 convention remaps those units to U, + # folding R's two columns of components into single + # ``treated_vs_never`` cells per treated cohort. The aggregate + # (TWFE coefficient + weights-sum) is invariant to this + # re-bucketing and is locked by ``test_twfe_coef_matches_r`` + # and ``test_weights_sum_matches_r`` above, but the + # per-component set differs **structurally** under the two + # conventions. Skip this fixture's per-component assertion + # while keeping the aggregate parity. See REGISTRY note on + # always-treated remap for the convention rationale. + if fixture_name == "always_treated_remapped": + continue panel = pd.DataFrame(fix["panel"]) results = bacon_decompose( panel, From e73a6c40aec68ebb2ff515908e535f39a395e719 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 14:01:58 -0400 Subject: [PATCH 02/13] PR #457 R1 polish: qualify docstring parity claim + refresh skip message R1 verdict was Looks good with 2 P3 informational items. Both addressed: 1. P3 (Documentation/Tests): `bacon_decompose()` docstring example said "matches R bacondecomp::bacon() at atol=1e-6" without mentioning the documented always-treated convention exception. 
Qualified the example to spell out the aggregate-vs-per-component split: aggregate parity holds for all panels at atol=1e-6, per-component parity holds when first_treat is bounded below by min(time) (no always-treated), and the divergence on always-treated panels is by convention (Python remap-to-U vs R's `Later vs Always Treated`). Cross-references the REGISTRY note for the full contract. 2. P3 (Documentation/Tests): `TestBaconParityR`'s skip message still said the goldens were "deferred until R is provisioned (see TODO.md)" but the TODO row was removed in this PR. Updated to describe the intended skip case (partial-checkout / packaging scenarios where the committed JSON is unavailable) and dropped the TODO reference. Tests unchanged: 33/33 pass in test_methodology_bacon.py. --- diff_diff/bacon.py | 11 +++++++++-- tests/test_methodology_bacon.py | 7 ++++--- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py index b786fb15..115a2234 100644 --- a/diff_diff/bacon.py +++ b/diff_diff/bacon.py @@ -1302,8 +1302,15 @@ def bacon_decompose( >>> from diff_diff import bacon_decompose >>> >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights - >>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 - >>> # (validated via tests/test_methodology_bacon.py::TestBaconParityR). + >>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on + >>> # the aggregate (TWFE coefficient + weights-sum) across all panels, + >>> # and on the per-component breakdown when first_treat is bounded + >>> # below by min(time) (no always-treated). For panels with + >>> # always-treated units, the per-component breakdown diverges by + >>> # convention (Python remaps to U per paper footnote 11; R emits + >>> # `Later vs Always Treated`); see REGISTRY note on R parity + >>> # convention divergence. Validated via + >>> # tests/test_methodology_bacon.py::TestBaconParityR. >>> results = bacon_decompose( ... 
data=panel_df, ... outcome='earnings', diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py index 1c73331c..cc9762ec 100644 --- a/tests/test_methodology_bacon.py +++ b/tests/test_methodology_bacon.py @@ -301,11 +301,12 @@ def test_eq_9_later_vs_earlier_variance(self) -> None: def _load_r_golden() -> dict: if not _R_GOLDEN_PATH.exists(): pytest.skip( - f"R parity goldens missing at {_R_GOLDEN_PATH}. To generate, " + f"R parity goldens missing at {_R_GOLDEN_PATH}. To regenerate, " "install R + `install.packages('bacondecomp')` + " "`install.packages('jsonlite')` then `cd benchmarks/R && " - "Rscript generate_bacon_golden.R`. The R goldens are deferred " - "until R is provisioned (see TODO.md)." + "Rscript generate_bacon_golden.R`. The goldens are committed " + "to the repo by default; this skip path covers partial-checkout " + "or packaging scenarios where the JSON file is unavailable." ) return json.loads(_R_GOLDEN_PATH.read_text()) From 86facdd92c138022c81e1d399340ff70ef7a9d77 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 14:41:23 -0400 Subject: [PATCH 03/13] PR #457 R2 polish: refresh stale Test Coverage activation note R2 verdict was Looks good with 1 P3 informational item. METHODOLOGY_REVIEW.md Test Coverage line read "all active; R parity activates once goldens are committed" - stale after this PR commits the goldens and activates the 3 R-parity tests. Updated to reflect the post-PR state: all 33 tests active including R-parity (with pointer to the committed JSON). --- METHODOLOGY_REVIEW.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index 03464983..a944e31b 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -929,7 +929,7 @@ and covariate-adjusted specifications.) 
- [x] R `bacondecomp::bacon()` parity at `atol=1e-6` — 3 fixtures (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`); TWFE coefficient + weights-sum match across all 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` (Python's footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition — aggregate is invariant, breakdown is structurally different). See `benchmarks/data/r_bacondecomp_golden.json` + `TestBaconParityR`. **Test Coverage:** -- 33 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (all active; R parity activates once goldens are committed) +- 33 methodology tests in `tests/test_methodology_bacon.py` across 6 classes — all active, including the 3 R-parity tests (goldens committed at `benchmarks/data/r_bacondecomp_golden.json`) - 32 existing tests in `tests/test_bacon.py` (basic decomposition, weight properties, weights-parameter API, TWFE integration, visualization, balanced-panel warnings, edge cases) **R Comparison Results:** From 780d50287c1f7287786c6719eabb8edeb4da3740 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 14:51:17 -0400 Subject: [PATCH 04/13] =?UTF-8?q?PR=20#457=20R3=20polish:=20assert=20R?= =?UTF-8?q?=E2=86=92Python=20U-bucket=20fold-back=20on=20always-treated?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit R3 verdict was Looks good with 1 P3 informational item. The per-component parity test skips the `always_treated_remapped` fixture (R/Python decompose the U bucket differently by convention), and the REGISTRY note documents that aggregating R's `Later vs Always Treated` + `Treated vs Untreated` rows by treated cohort should match Python's single `treated_vs_never` component for that cohort. 
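The fold-back claim can be checked by hand from the committed goldens. A minimal sketch, assuming the twelve `r_components` rows of the `always_treated_remapped` fixture as they appear in `benchmarks/data/r_bacondecomp_golden.json`; the names `R_COMPONENTS` and `fold_u_bucket` are hypothetical and are not part of `diff_diff` or the test suite. It covers the R side only (it does not call `bacon_decompose`) and uses a simplified type filter.

```python
from collections import defaultdict

# R-side rows for `always_treated_remapped`, copied from the golden JSON:
# (treated_group, control_group, estimate, weight, type)
R_COMPONENTS = [
    (3, 1, 2.01709964101752, 0.0333333333333333, "Later vs Always Treated"),
    (4, 1, 2.22194209617149, 0.0375, "Later vs Always Treated"),
    (5, 1, 2.31705723566306, 0.0333333333333333, "Later vs Always Treated"),
    (3, 99999, 2.10503098972966, 0.166666666666667, "Treated vs Untreated"),
    (4, 99999, 2.33132491093401, 0.1875, "Treated vs Untreated"),
    (5, 99999, 2.15671926488078, 0.166666666666667, "Treated vs Untreated"),
    (4, 3, 2.13990086651395, 0.0625, "Later vs Earlier Treated"),
    (5, 3, 1.93656842970099, 0.0833333333333333, "Later vs Earlier Treated"),
    (3, 4, 1.93565794904421, 0.0416666666666667, "Earlier vs Later Treated"),
    (5, 4, 1.94448887187764, 0.0416666666666667, "Later vs Earlier Treated"),
    (3, 5, 1.83272035191163, 0.0833333333333333, "Earlier vs Later Treated"),
    (4, 5, 2.04986440328345, 0.0625, "Earlier vs Later Treated"),
]
R_TWFE_COEF = 2.11285036643278  # `r_twfe_coef` from the same fixture


def is_u_bucket(ctype):
    """True for rows that Python's footnote-11 remap folds into the U bucket."""
    return "Untreated" in ctype or "Always Treated" in ctype


def fold_u_bucket(components):
    """Fold R's U-bucket rows per treated cohort -> {cohort: (weight, estimate)}."""
    sums = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum_w, sum_w_times_e]
    for treated, _control, estimate, weight, ctype in components:
        if is_u_bucket(ctype):
            sums[treated][0] += weight
            sums[treated][1] += weight * estimate
    return {k: (w, we / w) for k, (w, we) in sums.items()}


folded = fold_u_bucket(R_COMPONENTS)
timing = [c for c in R_COMPONENTS if not is_u_bucket(c[4])]

# Re-bucketing must not change the aggregate: the weights still sum to 1 and
# the weighted sum of estimates still reproduces the TWFE coefficient.
total_weight = sum(w for w, _ in folded.values()) + sum(c[3] for c in timing)
weighted_sum = sum(w * e for w, e in folded.values()) + sum(c[3] * c[2] for c in timing)

for cohort, (w, e) in sorted(folded.items()):
    print(f"cohort {cohort}: folded weight={w:.6f}, folded estimate={e:.6f}")
print(f"timing rows={len(timing)}, total weight={total_weight:.6f}, "
      f"weighted sum={weighted_sum:.6f}, R TWFE={R_TWFE_COEF:.6f}")
```

The folded `(weight, estimate)` pair per cohort is the quantity a Python `treated_vs_never` component would be compared against; the six remaining rows are the timing-vs-timing comparisons, which the fold leaves untouched.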
The reviewer flagged that the documented structural claim was not directly asserted in tests — a cohort-level regression in the fold-back could slip through under overall TWFE parity. Per memory `feedback_test_coverage_gap_treat_as_actionable`, the "test exists but doesn't directly exercise the documented surface" P3 is actionable. Added `test_always_treated_remapped_fold_back_matches_r` to `TestBaconParityR`: for each treated cohort in the remap fixture, aggregate R's `Later vs Always Treated` + `Treated vs Untreated` rows by combined weight and weight-averaged estimate, then assert both match Python's `treated_vs_never` component for that cohort at atol=1e-6. Currently passes — confirms the documented structural fold-back is exact at numerical precision. Tests: 34/34 pass in test_methodology_bacon.py (was 33; +1 new regression). --- tests/test_methodology_bacon.py | 68 +++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py index cc9762ec..bae949ce 100644 --- a/tests/test_methodology_bacon.py +++ b/tests/test_methodology_bacon.py @@ -465,6 +465,74 @@ def _classify_r_type(c: dict, fixture_name: str) -> str: f"{fixture_name} {k}: weight Python={py_weights[k]} " f"vs R={r_weights[k]}" ) + def test_always_treated_remapped_fold_back_matches_r(self, golden) -> None: + """Pin the documented R→Python fold-back for the always-treated U bucket. + + The per-component test above skips ``always_treated_remapped`` because + R and Python decompose the U bucket differently — but the documented + REGISTRY claim is that **aggregating** R's `Later vs Always Treated` + + `Treated vs Untreated` rows by treated cohort matches Python's + single `treated_vs_never` cell for that cohort. Assert that fold-back + directly so a cohort-level regression can't slip through under + overall TWFE parity. 
+
+        For each treated cohort k:
+          - R: combined weight w_R = w(k vs always-treated) + w(k vs untreated)
+            and weight-weighted estimate e_R = Σ w_i * e_i / w_R
+          - Python: single treated_vs_never component (w_Py, e_Py)
+          - Assert |w_Py - w_R| < 1e-6 AND |e_Py - e_R| < 1e-6.
+        """
+        if "always_treated_remapped" not in golden:
+            pytest.skip("always_treated_remapped fixture not in goldens")
+        fix = golden["always_treated_remapped"]
+        panel = pd.DataFrame(fix["panel"])
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", category=UserWarning)
+            results = bacon_decompose(
+                panel,
+                outcome="y",
+                unit="unit",
+                time="time",
+                first_treat="first_treat",
+                weights="exact",
+            )
+        # Build Python's treated_vs_never lookup: cohort -> (weight, estimate)
+        py_tvn = {
+            float(c.treated_group): (c.weight, c.estimate)
+            for c in results.comparisons
+            if c.comparison_type == "treated_vs_never"
+        }
+        # Aggregate R's two U-bucket types per treated cohort.
+        # R uses ctrl=99999 for untreated and ctrl=1 (the always-treated cohort)
+        # for the `Later vs Always Treated` rows.
+        r_agg: dict = {}
+        for c in fix["r_components"]:
+            ctype = c.get("type", "")
+            if "Untreated" in ctype or ("Always Treated" in ctype and "Later" in ctype):
+                k = float(c["treated_group"])
+                w = float(c["weight"])
+                e = float(c["estimate"])
+                if k not in r_agg:
+                    r_agg[k] = [0.0, 0.0]  # [sum_w, sum_w_e]
+                r_agg[k][0] += w
+                r_agg[k][1] += w * e
+        # Cohorts must match
+        assert set(py_tvn.keys()) == set(r_agg.keys()), (
+            f"always_treated_remapped: treated_vs_never cohorts differ.
" + f"Python: {sorted(py_tvn)}, R-aggregated: {sorted(r_agg)}" + ) + for k, (py_w, py_e) in py_tvn.items(): + r_w, r_we = r_agg[k] + r_e = r_we / r_w + assert abs(py_w - r_w) < 1e-6, ( + f"always_treated_remapped cohort={k}: combined weight " + f"Python={py_w:.10f} vs R-aggregated={r_w:.10f}" + ) + assert abs(py_e - r_e) < 1e-6, ( + f"always_treated_remapped cohort={k}: weight-averaged estimate " + f"Python={py_e:.10f} vs R-aggregated={r_e:.10f}" + ) + # --------------------------------------------------------------------------- # 3. Always-treated warn+remap From 8225ba06567f823ee0624cfcec1cd93b08cf79cd Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 14:58:03 -0400 Subject: [PATCH 05/13] PR #457 R4 polish: narrow always-treated carve-out to U-bucket only R4 verdict was Looks good with 1 P3 informational item: the per-component parity test skipped the ENTIRE always_treated_remapped fixture, leaving the 6 timing-vs-timing rows (Earlier/Later vs Earlier/Later Treated between cohorts 3/4/5) without direct per-component parity assertions. Per memory feedback_test_coverage_gap_treat_as_actionable, this is the "test exists but doesn't directly exercise the surface" pattern and should be actionable. Narrowed the carve-out: instead of skipping the whole fixture, drop only the treated_vs_never keys from both Python and R sides (the actual U-bucket convention divergence), and keep direct atol=1e-6 parity assertions on the 6 timing-vs-timing keys. Also refined _classify_r_type to canonicalize R's "Later vs Always Treated" type string to treated_vs_never (Python folds those rows into the U bucket per paper footnote 11, so they belong to the U comparison set semantically even though R numbers them by the always-treated cohort), keeping the narrow carve-out simple. Tests: 34/34 pass in test_methodology_bacon.py (+6 directly asserted timing-vs-timing comparisons in the remap fixture vs prior coverage). 
---
 tests/test_methodology_bacon.py | 67 ++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py
index bae949ce..73f5d7b8 100644
--- a/tests/test_methodology_bacon.py
+++ b/tests/test_methodology_bacon.py
@@ -372,13 +372,21 @@ def _canonical_control(ctype: str, group):
 
         def _classify_r_type(c: dict, fixture_name: str) -> str:
             # R bacondecomp's `type` strings vary across versions
-            # ("Treated vs Untreated", "Earlier vs Later Treated", ...).
-            # Fall back to inferring from the control_group: U sentinel
-            # (0, np.inf, or "never"-containing string) -> treated_vs_never;
-            # otherwise treated_group < control_group is earlier-vs-later.
+            # ("Treated vs Untreated", "Earlier vs Later Treated",
+            # "Later vs Always Treated", ...). Fall back to inferring from
+            # the control_group: U sentinel (0, np.inf, or "never"-containing
+            # string) -> treated_vs_never; otherwise treated_group <
+            # control_group is earlier-vs-later. Note: ``Later vs Always
+            # Treated`` is canonicalized to ``treated_vs_never`` here because
+            # Python's paper-footnote-11 convention folds always-treated
+            # units into the U bucket — semantically these R rows belong
+            # to the U comparison set even though R numbers them by the
+            # always-treated cohort (typically first_treat=1).
             t = c.get("type") or ""
             if "never" in t.lower() or "untreated" in t.lower():
                 return "treated_vs_never"
+            if "always" in t.lower():
+                return "treated_vs_never"
             ctrl = c["control_group"]
             if isinstance(ctrl, str) and "never" in ctrl.lower():
                 return "treated_vs_never"
@@ -398,30 +406,17 @@ def _classify_r_type(c: dict, fixture_name: str) -> str:
         for fixture_name, fix in golden.items():
             if fixture_name == "meta":
                 continue
-            # ``always_treated_remapped``: R keeps ``first_treat=1`` as a
-            # separate cohort and emits ``Later vs Always Treated`` (and
-            # ``Treated vs Untreated``) comparisons against it. Python's
-            # paper-footnote-11 convention remaps those units to U,
-            # folding R's two columns of components into single
-            # ``treated_vs_never`` cells per treated cohort. The aggregate
-            # (TWFE coefficient + weights-sum) is invariant to this
-            # re-bucketing and is locked by ``test_twfe_coef_matches_r``
-            # and ``test_weights_sum_matches_r`` above, but the
-            # per-component set differs **structurally** under the two
-            # conventions. Skip this fixture's per-component assertion
-            # while keeping the aggregate parity. See REGISTRY note on
-            # always-treated remap for the convention rationale.
-            if fixture_name == "always_treated_remapped":
-                continue
             panel = pd.DataFrame(fix["panel"])
-            results = bacon_decompose(
-                panel,
-                outcome="y",
-                unit="unit",
-                time="time",
-                first_treat="first_treat",
-                weights="exact",
-            )
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", category=UserWarning)
+                results = bacon_decompose(
+                    panel,
+                    outcome="y",
+                    unit="unit",
+                    time="time",
+                    first_treat="first_treat",
+                    weights="exact",
+                )
             py_estimates = {}
             py_weights = {}
             for c in results.comparisons:
@@ -443,6 +438,24 @@ def _classify_r_type(c: dict, fixture_name: str) -> str:
                 )
                 r_estimates[key] = c["estimate"]
                 r_weights[key] = c["weight"]
+            # ``always_treated_remapped`` carves out only the U-bucket rows,
+            # which R and Python decompose under different conventions
+            # (R: separate ``Later vs Always Treated`` + ``Treated vs
+            # Untreated``; Python: single ``treated_vs_never`` per cohort
+            # via paper-footnote-11 remap). The aggregated fold-back is
+            # asserted in ``test_always_treated_remapped_fold_back_matches_r``.
+            # The 6 timing-vs-timing rows in that fixture are NOT affected
+            # by the convention split and must satisfy direct per-component
+            # parity at atol=1e-6 — narrow the carve-out to U-bucket keys
+            # only so regressions in timing-vs-timing decomposition are
+            # caught directly, not just through aggregate parity.
+            if fixture_name == "always_treated_remapped":
+                # Drop only treated_vs_never keys from both sides; keep
+                # earlier_vs_later + later_vs_earlier for direct parity.
+                py_estimates = {k: v for k, v in py_estimates.items() if k[0] != "treated_vs_never"}
+                py_weights = {k: v for k, v in py_weights.items() if k[0] != "treated_vs_never"}
+                r_estimates = {k: v for k, v in r_estimates.items() if k[0] != "treated_vs_never"}
+                r_weights = {k: v for k, v in r_weights.items() if k[0] != "treated_vs_never"}
             # Full-set equality: no Python component missing from R, no R
             # component missing from Python. A dropped β̂_{kU} term or an
             # extra spurious comparison would fail here.

From 8f504bdf0586d8b721de0da7a9a225aca3e33d6f Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 16 May 2026 15:07:33 -0400
Subject: [PATCH 06/13] PR #457 R5 polish: refresh prose to match narrowed carve-out

R5 verdict was Looks good with 1 P3 informational item: docs prose out
of sync with the actual parity harness after R4's carve-out narrowing.
CHANGELOG, REGISTRY, METHODOLOGY_REVIEW, and the fold-back test's own
docstring still said TestBaconParityR has 3 tests and that
always_treated_remapped is skipped for per-component parity, while the
code now adds a fourth fold-back test and only carves out
treated_vs_never rows while keeping direct parity on the 6
timing-vs-timing rows of that fixture.

Refreshed 6 surfaces:
- METHODOLOGY_REVIEW.md Verified Components checklist + Test Coverage
  count (33 -> 34) + R Comparison Results subsection for the remap
  fixture.
- docs/methodology/REGISTRY.md Reference Implementations bullet,
  Requirements checklist, and Note (R parity convention divergence)
  text to reflect the narrowed carve-out and the fold-back test.
- CHANGELOG.md PR-457 Added entry (4 tests, narrowed carve-out
  description, fold-back test mention).
- tests/test_methodology_bacon.py::test_always_treated_remapped_fold_back_matches_r docstring (no longer says the per-component test "skips" the fixture; says it carves out only the U-bucket rows). Tests: 34/34 pass. --- CHANGELOG.md | 2 +- METHODOLOGY_REVIEW.md | 6 +++--- docs/methodology/REGISTRY.md | 6 +++--- tests/test_methodology_bacon.py | 12 +++++++----- 4 files changed, 14 insertions(+), 12 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a4c923ef..30bff40c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (3 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures, with a **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort — the aggregate is invariant per Theorem 1 but the per-component breakdown differs structurally). Per-component assertion is skipped on the remap fixture with explicit documentation in the test class and a new `**Note (R parity convention divergence on always-treated)**` in `docs/methodology/REGISTRY.md`. 
METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. +- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` in `docs/methodology/REGISTRY.md`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. - **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. 
Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. 
Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md. - **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete (R parity goldens pending)**). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. 
Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (~24 tests across 6 classes: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` skips with a pointer when goldens missing; `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); JSON goldens deferred until `bacondecomp` R package is installed (parity tests skip cleanly with an explicit pointer). (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. 
**Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). - **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. 
**Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. 
Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). 
**Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index a944e31b..c6be5be6 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -926,17 +926,17 @@ and covariate-adjusted specifications.) 
- [x] No untreated group: `s_{kU}` terms drop, weights renormalize, sum-to-1 still holds - [x] Single timing group with U: only `treated_vs_never` comparisons - [x] Survey design composes cleanly with exact mode and warn+remap -- [x] R `bacondecomp::bacon()` parity at `atol=1e-6` — 3 fixtures (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`); TWFE coefficient + weights-sum match across all 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` (Python's footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition — aggregate is invariant, breakdown is structurally different). See `benchmarks/data/r_bacondecomp_golden.json` + `TestBaconParityR`. +- [x] R `bacondecomp::bacon()` parity at `atol=1e-6` — 3 fixtures (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`); TWFE coefficient + weights-sum match across all 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); R→Python U-bucket fold-back asserted by a dedicated `test_always_treated_remapped_fold_back_matches_r` test that aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per cohort and compares to Python's single `treated_vs_never` cell at `atol=1e-6`. See `benchmarks/data/r_bacondecomp_golden.json` + `TestBaconParityR`. 
**Test Coverage:** -- 33 methodology tests in `tests/test_methodology_bacon.py` across 6 classes — all active, including the 3 R-parity tests (goldens committed at `benchmarks/data/r_bacondecomp_golden.json`) +- 34 methodology tests in `tests/test_methodology_bacon.py` across 6 classes — all active, including the 4 R-parity tests (3 aggregate/per-component + 1 always-treated fold-back; goldens committed at `benchmarks/data/r_bacondecomp_golden.json`) - 32 existing tests in `tests/test_bacon.py` (basic decomposition, weight properties, weights-parameter API, TWFE integration, visualization, balanced-panel warnings, edge cases) **R Comparison Results:** - **Validated** at `atol=1e-6` against `bacondecomp::bacon()` (version 0.1.1, R 4.5.2). Goldens at `benchmarks/data/r_bacondecomp_golden.json`; generator at `benchmarks/R/generate_bacon_golden.R`. Three DGP fixtures: - `uniform_3groups_with_never_treated`: 9 components covering all three comparison types — full per-component parity (estimate + weight at `atol=1e-6`). - `two_groups_no_never_treated`: 2 components, timing-only decomposition — full per-component parity. - - `always_treated_remapped`: TWFE coefficient + weights-sum match at `atol=1e-6`; per-component breakdown diverges by convention (Python's paper-footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition). The aggregate is invariant to the re-bucketing per Theorem 1; only the breakdown differs. Per-component assertion skipped for this fixture with explicit documentation in `TestBaconParityR.test_component_estimates_match_r`. + - `always_treated_remapped`: TWFE coefficient + weights-sum match at `atol=1e-6`; the 6 timing-vs-timing rows (between cohorts 3/4/5) also satisfy direct per-component parity at `atol=1e-6` (carve-out narrowed to U-bucket rows only). 
The U-bucket breakdown diverges by convention (Python's paper-footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition); the aggregate is invariant to the re-bucketing per Theorem 1, and the R→Python fold-back is pinned by `test_always_treated_remapped_fold_back_matches_r` which aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per cohort and compares to Python's single `treated_vs_never` cell. **Corrections Made:** 1. **Theorem 1 exact-weights rewrite** (`bacon.py:_recompute_exact_weights`, lines ~740-880). The previous "exact" mode implementation did not actually compute Eqs. 7-9 / 10e-g — it was missing the `(1 - n_kU)` factor in the within-subsample treatment variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper. The post-hoc sum-to-1 normalization masked the relative-weight error but produced a decomposition error of ~0.3% (0.007 absolute) against TWFE on a 3-cohort + never-treated DGP. **Rewrote** the function to compute the exact numerators of Eqs. 10e/f/g (with proper Eqs. 7-9 variances) and let the post-hoc normalization handle the `V̂^D` denominator (Theorem 1 identity guarantees `V̂^D = Σ numerators`). Now matches TWFE at `atol=1e-10`. The existing `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the contract. diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index e9578bd4..3f22afb1 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2668,7 +2668,7 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_ - Always-treated units: see `**Note (always-treated remap)**` below **Reference implementation(s):** -- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens committed at `benchmarks/data/r_bacondecomp_golden.json` (generated against `bacondecomp` 0.1.1 + R 4.5.2). 
Parity validated at `atol=1e-6` via `tests/test_methodology_bacon.py::TestBaconParityR` (TWFE coefficient + weights-sum match across 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped`). +- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens committed at `benchmarks/data/r_bacondecomp_golden.json` (generated against `bacondecomp` 0.1.1 + R 4.5.2). Parity validated at `atol=1e-6` via `tests/test_methodology_bacon.py::TestBaconParityR` (4 tests: TWFE coefficient + weights-sum match across 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket convention divergence on `always_treated_remapped` is pinned by a dedicated fold-back test). - Stata: `bacondecomp` (SSC). Authors: Goodman-Bacon, Goldring, Nichols (2019). **Requirements checklist:** @@ -2678,12 +2678,12 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_ - [x] Visualization shows weight vs. 
estimate by comparison type - [x] Always-treated remap to U per Goodman-Bacon (2021) footnote 11 (PR-B audit) - [x] Hand-calculable Theorem 1 verification: `tests/test_methodology_bacon.py::TestBaconHandCalculation` (7 tests, atol=1e-10) -- [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` — see `**Note (R parity convention divergence)**` below) +- [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket fold-back is asserted by a dedicated `test_always_treated_remapped_fold_back_matches_r` — see `**Note (R parity convention divergence)**` below) - [x] Survey design support (Phase 3): weighted cell means, weighted within-transform, weighted group shares - **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated at `atol=1e-6` (see `**Note (R parity convention divergence)**` below for the one structural convention difference). Hand-calculation + TWFE-vs-weighted-sum identity hold at `atol=1e-10`. The approximate path is retained for backward compatibility; numerical output may differ from R. 
- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`. - **Note (Bacon survey diagnostic):** Bacon decomposition with survey weights is diagnostic; exact-sum guarantee holds at machine precision under `weights="exact"` **on balanced panels**. `weights="exact"` requires within-unit-constant survey columns (approximate path accepts time-varying weights). -- **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. 
The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) and skips the `always_treated_remapped` fixture for this assertion while keeping the aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) locked across all 3 fixtures. +- **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. 
The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) AND on the 6 timing-vs-timing rows of `always_treated_remapped` — the carve-out is narrowed to U-bucket rows only (R's `Later vs Always Treated` rows canonicalize to `treated_vs_never` and are dropped alongside the matching Python rows). The R→Python U-bucket fold-back is pinned separately by `test_always_treated_remapped_fold_back_matches_r`, which aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per treated cohort and asserts the combined weight + weight-averaged estimate match Python's single `treated_vs_never` cell at `atol=1e-6`. Aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) is locked across all 3 fixtures. - **Deviation (unbalanced-panel library extension):** Unbalanced panels are accepted with a `UserWarning` ("Unbalanced panel detected. Bacon decomposition assumes balanced panels. Results may be inaccurate."). Goodman-Bacon (2021) Appendix A's proof assumes a balanced panel; under unbalance, the Theorem 1 identity holds only approximately. The decomposition still returns finite, well-defined outputs but `weights="exact"` does NOT achieve the machine-precision algebraic identity that the balanced-panel claims above describe. --- diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py index 73f5d7b8..c57d5da3 100644 --- a/tests/test_methodology_bacon.py +++ b/tests/test_methodology_bacon.py @@ -481,11 +481,13 @@ def _classify_r_type(c: dict, fixture_name: str) -> str: def test_always_treated_remapped_fold_back_matches_r(self, golden) -> None: """Pin the documented R→Python fold-back for the always-treated U bucket. 
-        The per-component test above skips ``always_treated_remapped`` because
-        R and Python decompose the U bucket differently — but the documented
-        REGISTRY claim is that **aggregating** R's `Later vs Always Treated`
-        + `Treated vs Untreated` rows by treated cohort matches Python's
-        single `treated_vs_never` cell for that cohort. Assert that fold-back
+        The per-component test above carves out **only the U-bucket rows**
+        from ``always_treated_remapped`` (the 6 timing-vs-timing rows are
+        still asserted directly at atol=1e-6); R and Python decompose the
+        U bucket differently — but the documented REGISTRY claim is that
+        **aggregating** R's `Later vs Always Treated` + `Treated vs
+        Untreated` rows by treated cohort matches Python's single
+        `treated_vs_never` cell for that cohort. Assert that fold-back
         directly so a cohort-level regression can't slip through under
         overall TWFE parity.

From 1bd8b0efa3017bb93408f3d7ff0ebc62120e35fd Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 16 May 2026 15:13:00 -0400
Subject: [PATCH 07/13] PR #457 R6 polish: harmonize PR-B CHANGELOG bullet with R-parity close-out

R6 verdict was Looks good with 1 P3 informational item: the older PR-B audit bullet at CHANGELOG.md:13 (added in PR #454) still described the pre-goldens deferral state ("JSON goldens deferred", "TestBaconParityR skips with a pointer when goldens missing", "status flipped to **Complete (R parity goldens pending)**"). That contradicts the new PR-457 bullet at CHANGELOG.md:11 (committed goldens + 4 active parity tests) within the same [Unreleased] section, so the release notes read as internally inconsistent.

Updated 3 strings in the PR-B bullet to reflect the within-release close-out:
- Status flip wording: now says the (R parity pending) caveat was closed by the parity-goldens bullet above in this same release.
- TestBaconParityR description: 4 tests, all active post-release; skips only in partial-checkout scenarios.
- (4) outcome: parity goldens deferral was closed within this release. --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 30bff40c..930a110b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` in `docs/methodology/REGISTRY.md`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. 
- **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). 
Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md. -- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete (R parity goldens pending)**). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. 
(2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (~24 tests across 6 classes: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` skips with a pointer when goldens missing; `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). 
(4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); JSON goldens deferred until `bacondecomp` R package is installed (parity tests skip cleanly with an explicit pointer). (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). +- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 
10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (~24 tests across 6 classes: `TestBaconHandCalculation` hand-checks Eqs. 
7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). - **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). 
Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. 
**Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). 
DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. 
**Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. 
From 2a2c8813ad53c7189fbb99fb311dff6558f7e135 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 15:22:21 -0400 Subject: [PATCH 08/13] PR #457 R7: label first-period boundary as explicit Deviation (P1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit R7 surfaced a real P1: the REGISTRY presented the always-treated remap condition `first_treat <= min(time)` as "per paper footnote 11", but the paper's strict rule is `t_i < 1` (units treated *before* the first observable period). The inclusive `<= min(time)` rule additionally folds `first_treat == min(time)` cohorts into U — that's a library boundary convention, not a paper-faithful rule. The test class docstring already called this out, but the authoritative REGISTRY contract did not, which read as an undocumented methodology deviation on PRs that promote BaconDecomposition to **Complete**. Resolution: added a new explicit `**Deviation (first-period boundary extension on always-treated remap)**` block to REGISTRY's Bacon section that: - Names the paper's strict `t_i < 1` rule - States the library's inclusive `<= min(time)` rule - Explains the rationale (`first_treat == min(time)` cohorts have no untreated cell in-panel) - Notes R does NOT apply this fold (it keeps such cohorts in their own bucket and emits `Later vs Always Treated`) - Notes the rules coincide when `min(time) > 1` Mirrored in: - REGISTRY Assumption checks bullet (line 2619): now points at the new Deviation block - REGISTRY `**Note (always-treated remap)**` (line 2684): qualifies the "per paper footnote 11" claim - METHODOLOGY_REVIEW.md Deviations block: re-titled to include paper deviations, added the boundary entry as item 1 - `bacon_decompose()` docstring (`bacon.py:467-487`): explicit boundary-extension paragraph with REGISTRY pointer - CHANGELOG PR-457 Added entry: explicit boundary-deviation callout Also fixes R7 P3: CHANGELOG PR-B test count "~24 tests" updated to acknowledge the post-release 
34-test count after R-parity-goldens expansion. Tests: 34/34 pass. --- CHANGELOG.md | 4 ++-- METHODOLOGY_REVIEW.md | 9 +++++---- diff_diff/bacon.py | 10 +++++++++- docs/methodology/REGISTRY.md | 5 +++-- 4 files changed, 19 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 930a110b..fdc888d7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,9 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. 
New `**Note (R parity convention divergence on always-treated)**` in `docs/methodology/REGISTRY.md`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. +- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` and `**Deviation (first-period boundary extension on always-treated remap)**` in `docs/methodology/REGISTRY.md`. 
**First-period boundary deviation:** the paper uses strict `t_i < 1` for the always-treated bucket; the library uses the inclusive `first_treat <= min(time)` rule and folds `first_treat == min(time)` cohorts into `U`. R does NOT apply this fold (it keeps such cohorts as their own bucket). When `min(time) > 1` the rules coincide. Explicitly labeled in REGISTRY's Deviations block and mirrored in `METHODOLOGY_REVIEW.md` and `bacon.py`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. - **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. 
**Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md. -- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 
7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (~24 tests across 6 classes: `TestBaconHandCalculation` hand-checks Eqs. 
7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). +- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. 
**Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. 
(3) New methodology test file `tests/test_methodology_bacon.py` (34 tests across 6 classes post-release; the audit added ~24 tests and the R-parity-goldens bullet above expanded coverage: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). 
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). 
Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). 
DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. 
**Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index c6be5be6..35697061 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -943,10 +943,11 @@ and covariate-adjusted specifications.) 2. 
**Default `weights` flipped from `"approximate"` to `"exact"`** at three entry points: `BaconDecomposition.__init__()` (`bacon.py:397`), `bacon_decompose()` convenience function (`bacon.py:1064`), `TwoWayFixedEffects.decompose()` (`twfe.py:684`). The paper-faithful Theorem 1 weights are now the default; the simplified approximate path remains opt-in via explicit `weights="approximate"`. `diff_diff/diagnostic_report.py:1740` (production diagnostic surface) was updated to pass explicit `weights="exact"`. 3. **Always-treated warn+remap via internal column** (`bacon.py:fit()`, lines ~487-525). Paper footnote 11 puts units with `t_i < 1` in `U`, but `bacon.py` previously only mapped `first_treat ∈ {0, np.inf}` into U. Added detection using ordered-time logic on the **time axis** (`first_treat <= min(time)` while excluding the never-treated sentinels `0` and `np.inf`) with `UserWarning` and automatic remap via an internal column (`__bacon_first_treat_internal__`), preserving the user's `first_treat` column unchanged. Detection handles event-time-encoded panels (`time ∈ [-2,..,3]`) correctly; the `0` sentinel restriction applies only to `first_treat`. Count exposed via new `BaconDecompositionResults.n_always_treated_remapped` field. -**Deviations from R's `bacondecomp::bacon()`:** -1. **Unbalanced panel acceptance** (library extension): R errors on unbalanced panels; Python emits a `UserWarning` and decomposes. The paper's Appendix A proof assumes balanced panels — decomposition on unbalanced panels is approximate to Theorem 1. -2. **Approximate weight mode** (Python-only optimization): `weights="approximate"` is a library-only fast path with simplified variance computation, not present in R. Users who want Python-R numerical parity should pass `weights="exact"` (the new default). -3. **NaN for invalid inference fields not applicable**: the decomposition is deterministic; there are no SE/p-value fields on the comparison output. 
The `decomposition_error` field is a finite float (zero in well-conditioned cases). +**Deviations from R's `bacondecomp::bacon()` and from the paper:** +1. **First-period boundary extension on always-treated remap** (library convention, deviation from paper footnote 11 strict rule and from R): Goodman-Bacon (2021) footnote 11 uses strict `t_i < 1` for the always-treated bucket (units treated *before* the first observable period). The library applies the **inclusive** `first_treat <= min(time)` rule, additionally folding units treated *at* the first observable period (`first_treat == min(time)`) into `U`. Rationale: such units have no untreated cell in-panel and cannot contribute as a treated cohort, so folding them into U mirrors the always-treated handling rather than dropping them silently. R `bacondecomp::bacon()` does NOT apply this boundary fold-back — it keeps `first_treat == min(time)` cohorts in their own bucket and emits `Later vs Always Treated` comparisons. When `min(time) > 1` (no first-period-treated cohorts) the library rule reduces to the paper's strict rule. Documented in REGISTRY `**Deviation (first-period boundary extension on always-treated remap)**`. +2. **Unbalanced panel acceptance** (library extension): R errors on unbalanced panels; Python emits a `UserWarning` and decomposes. The paper's Appendix A proof assumes balanced panels — decomposition on unbalanced panels is approximate to Theorem 1. +3. **Approximate weight mode** (Python-only optimization): `weights="approximate"` is a library-only fast path with simplified variance computation, not present in R. Users who want Python-R numerical parity should pass `weights="exact"` (the new default). +4. **NaN for invalid inference fields not applicable**: the decomposition is deterministic; there are no SE/p-value fields on the comparison output. The `decomposition_error` field is a finite float (zero in well-conditioned cases). 
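The inclusive always-treated rule in deviation 1 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `bacon.py`: the helper name `remap_always_treated`, the `unit` / `time` / `first_treat` column names, and the `first_treat_internal` output column are all hypothetical, and the sketch only mirrors the documented contract (ordered-time comparison against `min(time)`, sentinels `0` and `np.inf` excluded, the caller's `first_treat` column left unchanged).

```python
import numpy as np
import pandas as pd


def remap_always_treated(df, unit="unit", time="time", first_treat="first_treat"):
    """Illustrative only: flag cohorts with first_treat <= min(time) and move them to U."""
    t_min = df[time].min()  # first observable period on the time axis
    ft = df[first_treat].astype(float)
    # 0 and np.inf are the reserved never-treated sentinels on first_treat.
    is_sentinel = (ft == 0) | (ft == np.inf)
    always_treated = ~is_sentinel & (ft <= t_min)  # inclusive boundary
    out = df.copy()
    # Write the remap to a separate column so the caller's first_treat is untouched.
    out["first_treat_internal"] = ft.where(~always_treated, np.inf)
    n_remapped = out.loc[always_treated, unit].nunique()
    return out, n_remapped


# Event-time panel with time in [-2, 3]; cohort labels are made up for illustration.
cohorts = {"a": -1, "b": -3, "c": -2, "d": 0, "e": np.inf}
panel = pd.DataFrame(
    [{"unit": u, "time": t, "first_treat": g} for u, g in cohorts.items() for t in range(-2, 4)]
)
remapped, n_remapped = remap_always_treated(panel)
```

In this toy panel, units `b` (`first_treat=-3`) and `c` (`first_treat=-2`, equal to `min(time)`) are folded into `U`, unit `a` (`first_treat=-1`) stays a valid timing group, and `d` / `e` stay never-treated through the sentinels. Under the paper's strict rule only `b` would be folded.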
--- diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py index 115a2234..965b5d04 100644 --- a/diff_diff/bacon.py +++ b/diff_diff/bacon.py @@ -475,7 +475,15 @@ def fit( excluding the never-treated sentinels ``0`` and ``np.inf``) are automatically remapped to the ``U`` (untreated) bucket per Goodman-Bacon (2021) footnote 11, with a - ``UserWarning``. Detection uses ordered-time logic on the + ``UserWarning``. **Library boundary extension:** the paper + uses the strict inequality ``t_i < 1`` (units treated + *before* the first observable period); the library uses the + **inclusive** ``first_treat <= min(time)`` rule, additionally + folding units treated *at* the first observable period + (``first_treat == min(time)``) into ``U`` because such units + have no untreated cell in-panel. See REGISTRY's + ``**Deviation (first-period boundary extension on + always-treated remap)**`` block for the full contract. Detection uses ordered-time logic on the **time axis** so panels whose ``time`` column contains negative or zero-crossing labels (e.g. 
event-time ``time ∈ [-2,..,3]``) are handled correctly; the ``0`` diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 3f22afb1..3890f715 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2616,7 +2616,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in *Assumption checks / warnings:* - Requires variation in treatment timing (staggered adoption) -- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; paper footnote 11) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract +- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; per paper footnote 11 with a library-convention extension on the first-period boundary case, see `**Deviation (first-period boundary extension)**` below) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract - Unbalanced panels are accepted with a `UserWarning`; the paper's Appendix A proof assumes balanced panels - Falls back to timing-only comparisons when no never-treated units are present (no untreated group → `s_{kU}` terms drop, weights rescale to sum to 1; **VWCT and ΔATT can still bias the result** — see paper Eqs. 
14-15) @@ -2681,9 +2681,10 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_ - [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket fold-back is asserted by a dedicated `test_always_treated_remapped_fold_back_matches_r` — see `**Note (R parity convention divergence)**` below) - [x] Survey design support (Phase 3): weighted cell means, weighted within-transform, weighted group shares - **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated at `atol=1e-6` (see `**Note (R parity convention divergence)**` below for the one structural convention difference). Hand-calculation + TWFE-vs-weighted-sum identity hold at `atol=1e-10`. The approximate path is retained for backward compatibility; numerical output may differ from R. -- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. 
The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`. +- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11 (with a library boundary extension on `first_treat == min(time)`; see `**Deviation (first-period boundary extension)**` below). Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`. - **Note (Bacon survey diagnostic):** Bacon decomposition with survey weights is diagnostic; exact-sum guarantee holds at machine precision under `weights="exact"` **on balanced panels**. 
`weights="exact"` requires within-unit-constant survey columns (approximate path accepts time-varying weights). - **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) AND on the 6 timing-vs-timing rows of `always_treated_remapped` — the carve-out is narrowed to U-bucket rows only (R's `Later vs Always Treated` rows canonicalize to `treated_vs_never` and are dropped alongside the matching Python rows). The R→Python U-bucket fold-back is pinned separately by `test_always_treated_remapped_fold_back_matches_r`, which aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per treated cohort and asserts the combined weight + weight-averaged estimate match Python's single `treated_vs_never` cell at `atol=1e-6`. Aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) is locked across all 3 fixtures. 
+- **Deviation (first-period boundary extension on always-treated remap):** Paper footnote 11 (Goodman-Bacon 2021) uses the strict inequality `t_i < 1` (units treated *before* the first observable period) for the always-treated bucket. The library applies the **inclusive** `first_treat <= min(time)` rule, which additionally folds units treated *at* the first observable period (`first_treat == min(time)`) into `U`. This is a library boundary convention, not a paper-faithful rule: such units have no untreated cell in the observed panel and so cannot contribute to any 2x2 DD as a treated cohort, so folding them into the U bucket mirrors the always-treated handling rather than dropping them silently. R `bacondecomp::bacon()` does not apply this boundary fold-back — it keeps `first_treat == min(time)` cohorts in their own bucket and emits `Later vs Always Treated` comparisons (see the **Note (R parity convention divergence on always-treated)** above for how the parity tests handle the resulting structural breakdown difference; aggregate Theorem 1 identity remains invariant). When `min(time)` is strictly greater than 1 (no first-period-treated cohorts), the library rule reduces to the paper's strict rule and the two conventions coincide. - **Deviation (unbalanced-panel library extension):** Unbalanced panels are accepted with a `UserWarning` ("Unbalanced panel detected. Bacon decomposition assumes balanced panels. Results may be inaccurate."). Goodman-Bacon (2021) Appendix A's proof assumes a balanced panel; under unbalance, the Theorem 1 identity holds only approximately. The decomposition still returns finite, well-defined outputs but `weights="exact"` does NOT achieve the machine-precision algebraic identity that the balanced-panel claims above describe. 
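The cohort-level fold-back described in the R parity note can be sketched as below. This is a hedged illustration rather than the test's actual code: `fold_back_u_bucket` is a hypothetical helper, the row keys (`type`, `treated_group`, `weight`, `estimate`) are assumed to follow the golden JSON component layout, and the example rows are invented numbers, not values from `r_bacondecomp_golden.json`.

```python
from collections import defaultdict


def fold_back_u_bucket(r_components):
    """Illustrative only: combine R's U-bucket rows into one cell per treated cohort."""
    totals = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum of w, sum of w * estimate]
    for row in r_components:
        label = (row.get("type") or "").lower()
        # Keep `Treated vs Untreated` and `Later vs Always Treated` style rows only.
        if "untreated" in label or "never" in label or "always" in label:
            cohort = float(row["treated_group"])
            weight = float(row["weight"])
            totals[cohort][0] += weight
            totals[cohort][1] += weight * float(row["estimate"])
    # Combined weight and weight-averaged estimate per treated cohort.
    return {k: (w, we / w) for k, (w, we) in totals.items() if w > 0}


# Invented rows for a single treated cohort (first_treat = 3).
example_rows = [
    {"type": "Treated vs Untreated", "treated_group": 3, "weight": 0.25, "estimate": 2.0},
    {"type": "Later vs Always Treated", "treated_group": 3, "weight": 0.25, "estimate": 4.0},
    {"type": "Earlier vs Later Treated", "treated_group": 3, "weight": 0.20, "estimate": 1.0},
]
folded = fold_back_u_bucket(example_rows)
```

The timing-vs-timing row is ignored, and the two U-bucket rows collapse into a single cell whose weight is the sum of the R weights and whose estimate is their weight-averaged estimate. That combined cell is the quantity the note compares against Python's single `treated_vs_never` cell.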
--- From 9210aeb9e040fc10eac10b7f914daa760a404b3f Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 15:27:33 -0400 Subject: [PATCH 09/13] PR #457 R8 polish: renumber Priority Order after Bacon removal R8 verdict was Looks good with 1 P3 informational item: the METHODOLOGY_REVIEW.md Priority Order list jumped from item 12 to item 14 after PR #457 removed Bacon (the prior item 1). Renumbered Survey Data Support from 14 to 13 to close the gap. --- METHODOLOGY_REVIEW.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index 35697061..548b181f 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -1222,7 +1222,7 @@ Promotion priority for the **In Progress** entries, ordered by what's blocked on 10. **TROP** — paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available). 11. **StaggeredTripleDifference** — shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored — tracked in TODO.md, PR #245). 12. **ConleySpatialHAC** — paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker. -14. **Survey Data Support** — cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first. +13. **Survey Data Support** — cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first. 
--- From e592a5b4154791b8356d3756734961bb253a406d Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 15:32:47 -0400 Subject: [PATCH 10/13] PR #457 R9 polish: fix inverted-boundary wording in docstring example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit R9 verdict was Looks good with 1 P3 informational item: the bacon_decompose docstring example said per-component parity holds "when first_treat is bounded below by min(time)"; that's the inverse of the correct condition. Under the library's `<= min(time)` remap rule, always-treated panels are exactly the ones WITH at least one non-sentinel `first_treat <= min(time)`, so per-component parity holds when all non-sentinel cohorts have `first_treat > min(time)` (i.e. every non-sentinel `first_treat` lies strictly above min(time), or equivalently "no first-period-treated cohorts"). Rephrased. Tests: 34/34 pass; no behavior change (docstring-only fix). --- diff_diff/bacon.py | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py index 965b5d04..4276cfab 100644 --- a/diff_diff/bacon.py +++ b/diff_diff/bacon.py @@ -1312,13 +1312,14 @@ def bacon_decompose( >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights >>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on >>> # the aggregate (TWFE coefficient + weights-sum) across all panels, - >>> # and on the per-component breakdown when first_treat is bounded - >>> # below by min(time) (no always-treated). For panels with - >>> # always-treated units, the per-component breakdown diverges by - >>> # convention (Python remaps to U per paper footnote 11; R emits - >>> # `Later vs Always Treated`); see REGISTRY note on R parity - >>> # convention divergence. Validated via - >>> # tests/test_methodology_bacon.py::TestBaconParityR. + >>> # and on the per-component breakdown when there are no + >>> # always-treated / first-period-treated cohorts (i.e.
all + >>> # non-sentinel first_treat values are strictly greater than + >>> # min(time)). For panels with always-treated units, the + >>> # per-component breakdown diverges by convention (Python remaps + >>> # to U per paper footnote 11; R emits `Later vs Always Treated`); + >>> # see REGISTRY note on R parity convention divergence. Validated + >>> # via tests/test_methodology_bacon.py::TestBaconParityR. >>> results = bacon_decompose( ... data=panel_df, ... outcome='earnings', From a86498ee744c56dcd932484f4a9028f657c362d7 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 16:01:08 -0400 Subject: [PATCH 11/13] PR #457 R10 polish: case-insensitive R-row selector + refresh golden meta MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fresh R10 verdict was Looks good with 2 P3 informational items: 1. P3 (Maintainability): the always-treated fold-back test selected R rows via case-sensitive literal substrings ("Untreated", "Always Treated", "Later"), while the neighboring _classify_r_type classifier uses case-insensitive semantic matching. Made the selector consistent — case-insensitive matching on "untreated" / "never" / "always" tokens, so the fold-back survives bacondecomp label variation across versions. 2. P3 (Documentation/Tests): committed golden JSON's meta.description still advertised full per-component (treated, control, type) tuple parity as the contract, but PR #457 intentionally replaces that for the always_treated_remapped U-bucket rows with aggregate + fold-back parity. Updated meta.description to describe the actual three-tier contract (aggregate / direct per-component on non-remap + 6 timing-vs-timing rows / cohort fold-back for U bucket) with a pointer to the REGISTRY Notes that document the convention divergence. Tests: 34/34 still pass. 
--- benchmarks/data/r_bacondecomp_golden.json | 2 +- tests/test_methodology_bacon.py | 11 ++++++++--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/benchmarks/data/r_bacondecomp_golden.json b/benchmarks/data/r_bacondecomp_golden.json index a62aed76..235417b3 100644 --- a/benchmarks/data/r_bacondecomp_golden.json +++ b/benchmarks/data/r_bacondecomp_golden.json @@ -3,7 +3,7 @@ "generated_at": "2026-05-16", "bacondecomp_version": "0.1.1", "r_version": "R version 4.5.2 (2025-10-31)", - "description": "Goodman-Bacon (2021) decomposition parity goldens for diff-diff BaconDecomposition. Parity target: atol=1e-6 on per-component (treated, control, type) tuples plus the TWFE coefficient." + "description": "Goodman-Bacon (2021) decomposition parity goldens for diff-diff BaconDecomposition. Parity target at atol=1e-6: (1) aggregate TWFE coefficient + weights-sum across all 3 fixtures; (2) direct per-component (treated, control, type) parity on the 2 non-remap fixtures AND on the 6 timing-vs-timing rows of always_treated_remapped; (3) cohort-level fold-back parity for the U bucket on always_treated_remapped (Python's paper-footnote-11 remap folds R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows into a single treated_vs_never cell per cohort; aggregate is invariant per Theorem 1, breakdown differs by convention). See REGISTRY Note (R parity convention divergence on always-treated) + Deviation (first-period boundary extension)." }, "uniform_3groups_with_never_treated": { "panel": { diff --git a/tests/test_methodology_bacon.py b/tests/test_methodology_bacon.py index c57d5da3..1a2622e5 100644 --- a/tests/test_methodology_bacon.py +++ b/tests/test_methodology_bacon.py @@ -519,11 +519,16 @@ def test_always_treated_remapped_fold_back_matches_r(self, golden) -> None: } # Aggregate R's two U-bucket types per treated cohort. # R uses ctrl=99999 for untreated and ctrl=1 (the always-treated cohort) - # for the `Later vs Always Treated` rows. 
+ # for the `Later vs Always Treated` rows. Match on case-insensitive + # semantic tokens so the selector survives `bacondecomp` label + # variation across versions (same convention as the neighboring + # ``_classify_r_type`` helper used by the per-component test). r_agg: dict = {} for c in fix["r_components"]: - ctype = c.get("type", "") - if "Untreated" in ctype or ("Always Treated" in ctype and "Later" in ctype): + tlow = (c.get("type") or "").lower() + is_untreated = "untreated" in tlow or "never" in tlow + is_always_treated_compare = "always" in tlow + if is_untreated or is_always_treated_compare: k = float(c["treated_group"]) w = float(c["weight"]) e = float(c["estimate"]) From a202dca3777edf892e0dae7a8202608dba854bf2 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 16 May 2026 16:06:59 -0400 Subject: [PATCH 12/13] PR #457 R11 polish: harmonize R script header + meta.description template R11 verdict was Looks good with 1 P3 informational item: I had updated the committed JSON's meta.description in R10 to describe the narrowed contract, but the R generator script at benchmarks/R/generate_bacon_golden.R still had the old "atol=1e-6 on per-component (treated, control, type) tuples plus TWFE coefficient" description in BOTH (a) its header docstring (lines 8-22) AND (b) its meta.description value template (lines 218-225). Re-running the script would have overwritten my committed JSON polish with the old contradictory description. Updated both surfaces to the three-tier contract: (1) aggregate TWFE + weights-sum on all 3 fixtures; (2) direct per-component parity on the 2 non-remap fixtures + 6 timing-vs-timing rows of always_treated_remapped; (3) cohort fold-back parity for the U bucket on always_treated_remapped. Pointers to REGISTRY Note (R parity convention divergence on always-treated) + Deviation (first-period boundary extension).
Re-ran the R script; JSON written matches the committed text and tests
remain green (4/4 in TestBaconParityR, 34/34 across the file). Script is
now idempotent on its own committed output.
---
 benchmarks/R/generate_bacon_golden.R | 36 ++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/benchmarks/R/generate_bacon_golden.R b/benchmarks/R/generate_bacon_golden.R
index 040f2552..8ee3ef18 100644
--- a/benchmarks/R/generate_bacon_golden.R
+++ b/benchmarks/R/generate_bacon_golden.R
@@ -7,9 +7,21 @@
 #
 # The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
 # the default ``weights="exact"`` is expected to match the values in this JSON
-# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
-# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
-# path is a library-only optimization and is NOT covered by this parity harness.
+# at atol=1e-6 along a three-tier contract:
+# (1) aggregate TWFE coefficient + weights-sum on all 3 fixtures;
+# (2) direct per-component (treated, control, type) parity on the 2
+# non-remap fixtures AND on the 6 timing-vs-timing rows of
+# `always_treated_remapped`;
+# (3) cohort-level fold-back parity for the U bucket on
+# `always_treated_remapped` — Python's paper-footnote-11 remap folds
+# R's separate `Later vs Always Treated` + `Treated vs Untreated`
+# rows into a single `treated_vs_never` cell per cohort, so the
+# aggregate is invariant per Theorem 1 but the per-component
+# breakdown differs by convention. See REGISTRY notes:
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# The ``weights="approximate"`` path is a library-only optimization and is
+# NOT covered by this parity harness.
 #
 # Three fixtures:
 # 1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
@@ -18,8 +30,8 @@
 # 2. two_groups_no_never_treated — 2 timing groups only; tests the
 # timing-only decomposition where the s_{kU} terms drop.
 # 3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
-# (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
-# U matches R bacondecomp's native behavior.
+# (first_treat = 1). Validates the convention-divergent U-bucket
+# fold-back on Python's warn+remap of always-treated units into U.
 #
 # Run:
 # cd benchmarks/R && Rscript generate_bacon_golden.R
@@ -220,8 +232,18 @@ out <- list(
     r_version = R.version.string,
     description = paste(
       "Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
-      "BaconDecomposition. Parity target: atol=1e-6 on per-component",
-      "(treated, control, type) tuples plus the TWFE coefficient."
+      "BaconDecomposition. Parity target at atol=1e-6:",
+      "(1) aggregate TWFE coefficient + weights-sum across all 3 fixtures;",
+      "(2) direct per-component (treated, control, type) parity on the 2",
+      "non-remap fixtures AND on the 6 timing-vs-timing rows of",
+      "always_treated_remapped;",
+      "(3) cohort-level fold-back parity for the U bucket on",
+      "always_treated_remapped (Python's paper-footnote-11 remap folds",
+      "R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows",
+      "into a single treated_vs_never cell per cohort; aggregate is",
+      "invariant per Theorem 1, breakdown differs by convention).",
+      "See REGISTRY Note (R parity convention divergence on always-treated)",
+      "+ Deviation (first-period boundary extension)."
     )
   ),
   uniform_3groups_with_never_treated = fixture_1,

From 86c0389bfd5800985cd7beeaaa164297562f9878 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 16 May 2026 16:11:45 -0400
Subject: [PATCH 13/13] PR #457 R12 polish: correct fixture-3 inline comment in R generator
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R12 verdict was Looks good with 1 P3 informational item: the fixture-3
inline comment in benchmarks/R/generate_bacon_golden.R still described
the old contract — said R "natively groups first_treat=1 with U" (wrong;
R keeps them as a distinct cohort and emits `Later vs Always Treated`)
and said "30 never-treated" (wrong; the script builds 25 never-treated).
The header docstring + meta.description template were updated in R11,
but this inline block-comment slipped.

Rewrote the inline comment to match: (a) the actual fixture construction
(5 always-treated, 25 never-treated, 3 timing cohorts at times 3/4/5);
(b) the correct R behavior (separate cohort, separate `Later vs Always
Treated` rows); (c) pointers to REGISTRY note + deviation block; (d) what
the parity tests carve out vs fold-back.
---
 benchmarks/R/generate_bacon_golden.R | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/benchmarks/R/generate_bacon_golden.R b/benchmarks/R/generate_bacon_golden.R
index 8ee3ef18..39701b4d 100644
--- a/benchmarks/R/generate_bacon_golden.R
+++ b/benchmarks/R/generate_bacon_golden.R
@@ -205,11 +205,21 @@ df2 <- build_panel(
 fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")
 
 cat("Building fixture 3: always_treated_remapped...\n")
-# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
-# in every observable period) + 30 never-treated. R's bacondecomp natively
-# groups the first_treat=1 cohort with U (since they are treated throughout
-# every observable period and never serve as a within-window control), which
-# matches what diff-diff's warn+remap does in Python.
+# 3 timing-cohorts (3, 4, 5) + 5 always-treated units (first_treat = 1, i.e.,
+# treated in every observable period) + 25 never-treated. R's bacondecomp
+# keeps the first_treat=1 cohort as a *separate* timing cohort (not in U) and
+# emits a `Later vs Always Treated` comparison row for each later cohort
+# alongside the standard `Treated vs Untreated` row. Python's paper-footnote-11
+# convention remaps these units into the U bucket and folds R's two columns
+# of components into a single `treated_vs_never` cell per treated cohort.
+# The aggregate (TWFE coefficient + weights-sum) is invariant per Theorem 1,
+# but the per-component breakdown differs by convention — see REGISTRY
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# `tests/test_methodology_bacon.py::TestBaconParityR` carves out the U-bucket
+# rows for direct per-component parity (keeping the 6 timing-vs-timing rows
+# under direct parity) and asserts the U-bucket fold-back separately via
+# `test_always_treated_remapped_fold_back_matches_r` at atol=1e-6.
 df3 <- build_panel(
   n_units_per_cohort = 25L,
   n_periods = 6L,