From a0b15acbd0edb71cea0e4099d0bf03700e9e6da3 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 21 May 2026 12:47:39 -0400 Subject: [PATCH 1/2] PR-B: DCDH methodology-review-tracker promotion MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR #476) and HAD (PR #473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 removed and items #7-#11 renumbered to #6-#10. No source code changes, no new tests, no new docstrings — documentation consolidation only. Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 3 ++ METHODOLOGY_REVIEW.md | 80 +++++++++++++++++++++++++----------- docs/methodology/REGISTRY.md | 12 ++++++ 3 files changed, 71 insertions(+), 24 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 08f66118..8f86f07f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476). 
REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights. R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and three primary-source paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478, 2026 Knau et al. universal-rollout companion). 
The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10. + ## [3.4.1] - 2026-05-21 ### Added diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md index a1bec076..f22feb1d 100644 --- a/METHODOLOGY_REVIEW.md +++ b/METHODOLOGY_REVIEW.md @@ -24,7 +24,7 @@ A **Complete** entry has a documented review pass against the primary academic s The catalog grew incrementally over several quarters, so formats vary across the existing Complete entries; the consistent invariant is that someone walked through the implementation against the academic source and captured the result here. New reviews going forward should aim for the fuller structure (Verified Components + Corrections Made + Deviations + dedicated methodology test file) used by the more recent entries. -**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., DCDH has a methodology file, R parity, and a companion-paper review for the 2026 universal-rollout extension); others have only the REGISTRY entry and unit tests (e.g., PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete. 
+**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., WooldridgeDiD has a companion-paper review for Wooldridge (2023) plus unit tests but no primary-source review for Wooldridge (2025) and no dedicated methodology test file yet); others have only the REGISTRY entry and unit tests (e.g., PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete. **Not Started** entries have neither a tracker walk-through nor an REGISTRY.md section. This tracker no longer carries any Not Started rows; new estimators are expected to enter as In Progress when their REGISTRY entry lands. @@ -57,7 +57,7 @@ The catalog grew incrementally over several quarters, so formats vary across the | Estimator | Module | R / Stata Reference | Status | Last Review | |-----------|--------|---------------------|--------|-------------| | ContinuousDiD | `continuous_did.py` | `contdid` v0.1.0 | **Complete** | 2026-05-20 | -| ChaisemartinDHaultfoeuille (DCDH) | `chaisemartin_dhaultfoeuille.py` | `DIDmultiplegtDYN` | **In Progress** | — | +| ChaisemartinDHaultfoeuille (DCDH) | `chaisemartin_dhaultfoeuille.py` | `DIDmultiplegtDYN` | **Complete** | 2026-05-21 | | HeterogeneousAdoptionDiD (HAD) | `had.py`, `had_pretests.py` | `chaisemartin::did_had` (`Credible-Answers/did_had` v2.0.0); `nprobust` for bandwidth | **Complete** | 2026-05-20 | | TROP | `trop.py`, `trop_local.py`, `trop_global.py` | (forthcoming; paper-author reference implementation) | **In Progress** | — | @@ -691,25 +691,58 @@ These three are feature deferrals (paper-supported extensions that the library h | Field | Value | |-------|-------| | Module | 
`chaisemartin_dhaultfoeuille.py`, `chaisemartin_dhaultfoeuille_bootstrap.py`, `chaisemartin_dhaultfoeuille_results.py` | -| Primary References | (a) de Chaisemartin & D'Haultfœuille (2020), *Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects*, AER 110(9), 2964-2996. (b) de Chaisemartin & D'Haultfœuille (2022, revised 2024), *Difference-in-Differences Estimators of Intertemporal Treatment Effects*, NBER WP 29873 — Web Appendix Section 3.7.3 for cohort-recentered plug-in variance. (c) de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for the universal-rollout case. | +| Primary References | (a) de Chaisemartin & D'Haultfœuille (2020), *Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects*, AER 110(9), 2964-2996. (b) de Chaisemartin & D'Haultfœuille (2022, revised July 2023), *Difference-in-Differences Estimators of Intertemporal Treatment Effects*, NBER WP 29873 — Web Appendix Section 3.7.3 for cohort-recentered plug-in variance. (c) de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for the universal-rollout case. | | R Reference | `DIDmultiplegtDYN` | -| Status | **In Progress** | -| Last Review | — | +| Status | **Complete** | +| Last Review | 2026-05-21 | -**Documentation in place:** -- REGISTRY.md section: `## ChaisemartinDHaultfoeuille` (DID_M, DID_+, DID_-, single-lag placebo, TWFE-weights diagnostic, multiplier bootstrap, DID^X / DID^{fd} / state-set-specific trends / heterogeneity testing / Design-2 / by_path / HonestDiD integration, survey design + replicate weights + HM wild bootstrap) -- **Companion-paper review on file**: `docs/methodology/papers/dechaisemartin-2026-review.md` covers the 2026 universal-rollout extension (Knau et al.), which is the primary source for HAD rather than for DCDH. The 2020 AER and 2022/2024 NBER WP 29873 papers that define DCDH's core DID_M / DID_+ / DID_- and dynamic estimators do **not** yet have dedicated review files on disk. 
-- `tests/test_methodology_chaisemartin_dhaultfoeuille.py`: 12 tests across 4 classes (worked example, cohort recentering, TWFE diagnostic, large-N recovery) -- `tests/test_chaisemartin_dhaultfoeuille_parity.py`: 24 R parity tests against `DIDmultiplegtDYN` -- Implementation: 347 unit tests in `tests/test_chaisemartin_dhaultfoeuille.py` -- Survey-specific: `tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, plus three dCDH cell-period coverage suites +**Verified Components:** +- [x] **AER 2020 Theorem 3** — per-period `DID_{+,t}` / `DID_{-,t}` plus aggregate `DID_M`, `DID_+`, `DID_-` — `tests/test_methodology_chaisemartin_dhaultfoeuille.py::TestMethodologyWorkedExample` (hand-calculable 4-group example: `DID_M = 2.5`, `DID_+ = 2.0`, `DID_- = 3.0` exact); paper review at `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2020-review.md`. +- [x] **AER 2020 single-lag placebo `DID_M^pl`** — same Theorem 3 logic applied to `Y_{g,t-1} - Y_{g,t-2}` on 3-period cells — covered by the worked example class and `tests/test_chaisemartin_dhaultfoeuille.py::TestA11Handling` for the placebo Assumption 11 zero-retention path. +- [x] **AER 2020 Theorem 1 TWFE-weights diagnostic** — negative-weight detection on binary treatment via `twfe_diagnostic=True` and standalone `twowayfeweights()` — `tests/test_methodology_chaisemartin_dhaultfoeuille.py::TestTWFEDiagnostic` + `tests/test_chaisemartin_dhaultfoeuille.py::TestTwowayFeweightsHelper`; binary-only contract documented (non-binary inputs trigger `UserWarning` from `fit()` and `ValueError` from the standalone helper). +- [x] **NBER WP 29873 dynamic event study `DID_l`** (Equation 3 / 5 of the dynamic paper) — `tests/test_methodology_chaisemartin_dhaultfoeuille.py::TestCohortRecenteringCritical` + `TestLargeNRecovery`; paper review at `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2022-review.md`. The `TestLargeNRecovery` class verifies that the multi-horizon estimator recovers the true ATT at large G. 
+- [x] **NBER WP 29873 dynamic placebos `DID^{pl}_l`, normalized `DID^n_l`, cost-benefit `delta`** (Lemma 4) — `tests/test_chaisemartin_dhaultfoeuille.py::TestMultiHorizonPlacebos`, `TestNormalizedEffects`, `TestCostBenefitDelta`, `TestSupTBands` (simultaneous confidence bands). +- [x] **NBER WP 29873 Web Appendix Section 3.7.3 cohort-recentered plug-in variance** — locked by `tests/test_methodology_chaisemartin_dhaultfoeuille.py::TestCohortRecenteringCritical::test_cohort_recentering_not_grand_mean` (constructs a designed DGP where cohort recentering and grand-mean recentering produce materially different SE and asserts they diverge — guards against silent regression to a single-mean centering). +- [x] **R `DIDmultiplegtDYN` parity at documented tolerance bands** — `tests/test_chaisemartin_dhaultfoeuille_parity.py` (26 tests). Tolerance class constants: `POINT_RTOL = 1e-4` (pure-direction point estimates), `MIXED_POINT_RTOL = 0.025` (2.5% on mixed-direction panels), `PURE_DIRECTION_SE_RTOL = 0.05` (5% on pure-direction SE after the Round 2 full-IF fix), `SE_RTOL = 0.10` (10% on multi-horizon SE), and `se_rtol=0.15` on the `joiners_only_long_multi_horizon` L_max=5 scenario where cell-count-weighting compounds across horizons. Deviations from R are documented in the consolidated REGISTRY Deviations block (D2 period-vs-cohort + D4 SE-normalization explain the residual SE gap). +- [x] **Phase 3 covariate adjustment (`DID^X`)** — Web Appendix Section 1.2 residualization-style adjustment with first-stage OLS on first-differenced covariates with time FEs, restricted to not-yet-treated observations — `tests/test_chaisemartin_dhaultfoeuille.py::TestCovariateAdjustment`. +- [x] **Phase 3 group-specific linear trends (`DID^{fd}`)** — Web Appendix Section 1.3 / Lemma 6, Z_mat first-differencing — `tests/test_chaisemartin_dhaultfoeuille.py::TestLinearTrends`. 
+- [x] **Phase 3 state-set-specific trends** — Web Appendix Section 1.4 control-pool restriction — `tests/test_chaisemartin_dhaultfoeuille.py::TestStateSetTrends`. +- [x] **Phase 3 heterogeneity testing** — Web Appendix Section 1.5 / Lemma 7 saturated-OLS test for treatment-effect heterogeneity along an interaction variable — `tests/test_chaisemartin_dhaultfoeuille.py::TestHeterogeneityTesting`. +- [x] **Design-2 switch-in/switch-out descriptive wrapper** — Web Appendix Section 1.6 — `tests/test_chaisemartin_dhaultfoeuille.py::TestDesign2`. +- [x] **`by_path` per-path event-study disaggregation** — `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `TestByPathBehavior` / `TestByPathEdgeCases` / `TestByPathBootstrap` / `TestByPathPlacebo` / `TestByPathSupTBands` / `TestByPathControls` / `TestByPathTrendsLinear` (these 8 classes cover API gates, the point-estimate path, edge cases, bootstrap composition, placebo composition, sup-t bands, covariates, and linear trends). +- [x] **HonestDiD (Rambachan-Roth 2023) integration** on placebo + event study surface — `tests/test_chaisemartin_dhaultfoeuille.py::TestHonestDiDIntegration`. +- [x] **Non-binary (ordinal or continuous) treatment** — paper Section 2 of the dynamic companion defines treatment as a general `D_{g,t}`; binary `{0, 1}` is a special case — `tests/test_chaisemartin_dhaultfoeuille.py::TestNonBinaryTreatment`. 
+- [x] **Survey design support: Taylor-series linearization + replicate weights + Hall-Mammen wild PSU bootstrap** — `tests/test_survey_dcdh.py` (Binder TSL on the main ATT, DID^X, heterogeneity, TWFE diagnostic, and HonestDiD surfaces), `tests/test_survey_dcdh_replicate_psu.py` (Rao-Wu rescaled replicate weights for BRR/Fay/JK1/JKn/SDR), and three cell-period coverage suites (`tests/test_dcdh_cell_period_coverage.py`, `tests/test_dcdh_bootstrap_cell_period_coverage.py`, `tests/test_dcdh_heterogeneity_cell_period_coverage.py`) — the cell-period allocator's per-cell IF expansion is what enables within-group-varying PSU. +- [x] **Three primary-source paper reviews on file**: `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2020-review.md` (2020 AER), `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2022-review.md` (2022/2023 NBER WP 29873), and `docs/methodology/papers/dechaisemartin-2026-review.md` (Knau et al. 2026 universal-rollout companion — primary source for `HeterogeneousAdoptionDiD`, not DCDH). -**Outstanding for promotion:** -- **Primary-source paper reviews**: write `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2020-review.md` covering the 2020 AER and a companion review covering 2022/2024 NBER WP 29873 (intertemporal treatment effects). The existing 2026 review covers the universal-rollout extension only. 
-- Formal Verified Components block here matching REGISTRY's exhaustive Implementation Checklist -- Consolidated Deviations summary (currently scattered across REGISTRY Notes): equal-cell weighting vs R cell-size weighting, terminal-missingness retention, A11 zero-retention convention, `<50%` switcher warning at far horizons -- Documented R parity tolerance bands at `l=1` (existing parity fixture in `test_chaisemartin_dhaultfoeuille_parity.py`) -- "Corrections Made" listing for the Round 2 full-IF fix (never-switching groups now participate in variance via stable-control roles) +**Test Coverage:** +- 12 methodology tests in `tests/test_methodology_chaisemartin_dhaultfoeuille.py` (4 classes: `TestMethodologyWorkedExample`, `TestCohortRecenteringCritical`, `TestTWFEDiagnostic`, `TestLargeNRecovery`). +- 26 R-parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` against `DIDmultiplegtDYN`. +- 352 unit tests in `tests/test_chaisemartin_dhaultfoeuille.py` covering Phase 1 + Phase 2 + Phase 3 + survey-design + by-path + HonestDiD surfaces (37 test classes). +- Survey-specific: `tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, plus three dCDH cell-period coverage suites (`test_dcdh_cell_period_coverage.py`, `test_dcdh_bootstrap_cell_period_coverage.py`, `test_dcdh_heterogeneity_cell_period_coverage.py`). +- Three primary-source paper reviews on disk (see Verified Components above). + +**Corrections Made:** +1. **Round 2 full-IF fix** (pre-promotion): never-switching groups now participate in the variance via stable-control roles under the full `Lambda^G_{g,l=1}` influence function. The `n_groups_dropped_never_switching` results field is retained for backwards compatibility but no longer represents an actual exclusion. After this fix, SE parity vs R on pure-direction scenarios narrowed from ~18% to ~3% (documented in REGISTRY `## ChaisemartinDHaultfoeuille` § "Note (deviation from R DIDmultiplegtDYN):" on period-vs-cohort control sets). 
+2. **PR-A consolidation (PR #478, 2026-05-21):** REGISTRY `## ChaisemartinDHaultfoeuille` reframed to clarify that the library's equal-cell weighting is a documented deviation from BOTH the AER 2020 Equation 3 (`N_{d,d',t} = sum_g N_{g,t}` observation sums) AND R `DIDmultiplegtDYN` (cell-size weighting); the prior framing called the Python contract "paper-literal", which was incorrect against the main-text formula. The period-vs-cohort Note was tightened to use the AER 2020's "transition-state notation" language. `docs/references.rst:199`, `docs/methodology/REGISTRY.md:488`, code docstrings, and the new paper review file headers all align on the `(2022, revised July 2023)` revision string for NBER WP 29873. +3. **PR-B tracker-promotion consolidation (this PR):** formal `### Deviations from the paper / from R / library extensions` block added to REGISTRY `## ChaisemartinDHaultfoeuille` consolidating 7 documented deviations into a single AI-review-recognized labeled surface per CLAUDE.md "Documenting Deviations" labels. The original scattered `**Note (deviation from R...):**` entries remain in place — the new Deviations block is an additional canonical surface for AI-review consumption. + +**Deviations from the paper / from R / library extensions:** + +(Cross-references the consolidated REGISTRY Deviations block — see `docs/methodology/REGISTRY.md` `## ChaisemartinDHaultfoeuille` § Deviations for the same 7 entries with full mechanical detail. Listed here in summary form.) + +1. **Equal-cell weighting (deviation from BOTH paper Equation 3 AND R `DIDmultiplegtDYN`).** Library: each `(g,t)` cell contributes once regardless of within-cell observation count. AER 2020 Equation 3 defines `N_{d,d',t} = sum_g N_{g,t}` (observation-sum weighting); R weights by cell size. Agreement is exact on one-observation-per-cell inputs (the parity test generator). Phase 2 estimands (`DID_l`, `DID^{pl}_l`, `DID^n_l`, delta cost-benefit) inherit the same contract. 
Locked by `tests/test_chaisemartin_dhaultfoeuille.py::test_cell_count_weighting_unbalanced_input` (in `TestDropLargerLower`). +2. **Period-based vs cohort-based stable controls (deviation from R `DIDmultiplegtDYN`).** Python: `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}` (matches AER 2020 Theorem 3 transition-state notation `N_{0,0,t}` and `N_{1,1,t}` literally). R: cohort-based control sets additionally require `D_{g,1}` to match the side. Agreement exact on pure-direction panels; ~1% point-estimate divergence on mixed-direction panels where joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 full-IF fix, SE parity gap on pure-direction scenarios narrowed from ~18% to ~3%. +3. **Balanced-baseline panel required + terminal missingness retained (deviation from R `DIDmultiplegtDYN`)** — one composite deviation with four enforcement paths: (a) groups missing the first global period raise `ValueError`; (b) groups with interior period gaps are dropped with `UserWarning`; (c) groups with terminal missingness (observed at baseline but missing one or more later periods) are **retained** and contribute from their observed periods only; (d) cell-period allocator paths (Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT, cell-level wild PSU bootstrap) emit a targeted `ValueError` when cohort recentering would leak nonzero centered IF mass onto cells with no positive-weight observations. R accepts unbalanced panels with documented missing-treatment-before-first-switch handling. The four paths share a single underlying contract — "the panel must be balanced at baseline; terminal missingness is the only allowed unbalance; downstream variance machinery refuses to silently leak IF mass past the cell-period boundary". +4. 
**SE normalization `N_l` vs R `G` (~4% smaller analytical SE).** Python implements the dynamic paper's Section 3.7.3 plug-in formula verbatim: `SE = sigma-hat / sqrt(N_l)` where `N_l` is the number of eligible switcher groups at horizon `l`. R normalizes the influence function by `G` (total number of groups including never-switchers and stable controls). Both converge to the same asymptotic variance as `G → ∞`. In finite samples Python's tighter SE remains conservative (paper formula is already an upper bound on the true variance via Jensen's inequality under Assumption 8). Gap is deterministic on identical data and ~3.5-5.1% across horizons and scenarios. +5. **Singleton-cohort degeneracy → NaN with warning (deviation from R `DIDmultiplegtDYN`).** When every variance-eligible group forms its own `(D_{g,1}, F_g, S_g)` cohort, cohort recentering collapses the centered IF vector to all zeros and the estimator returns `overall_se = NaN` with `UserWarning`. R returns a non-zero SE on degenerate small-panel cases via small-sample sandwich machinery that Python does not implement. Both responses are valid for a degenerate case; Python's `NaN` + warning is the safer default. Bootstrap inherits the same degeneracy. +6. **`<50%` switcher warning at far horizons (library extension).** When fewer than 50% of the `l=1` switchers contribute at a far horizon `l`, `fit()` emits a `UserWarning`. The dynamic paper (NBER WP 29873) recommends not reporting such horizons (Favara-Imbs application, footnote 14). Library convention is to warn but compute; not present in R. +7. **DID^X covariate first-stage equal-cell weights (deviation from R `DIDmultiplegtDYN`).** Phase 3 covariate adjustment (`controls=[...]`) residualizes outcomes via per-baseline first-stage OLS on first-differenced covariates with time FEs. Python uses equal cell weights consistent with the Phase 1 cell-count convention (Deviation #1); R weights by `N_{gt}`. On one-observation-per-cell panels results are identical. 
When baseline-specific first stages fail (`n_obs = 0` or `n_obs < n_params`), both Python and R drop the affected strata. + +**Outstanding Concerns:** +- **User-supplied `cluster=`** — currently `cluster=None` is the only supported value; passing any non-`None` value raises `NotImplementedError` at construction time (and the same gate fires from `set_params`). Reserved for a future phase. Auto-cluster at the group level is the default; `survey_design.psu` is the auto-cluster surface for hierarchical sampling, and PSU-within-group-constant regimes (including the default `psu=group` auto-inject and strictly-coarser PSU with within-group constancy) route through the legacy group-level allocator with bit-identical SE. +- **2024 NBER WP revision re-review** — the PDF on disk is the "March 2022, revised July 2023" version; the 2024 revision was not re-reviewed for this PR. `docs/references.rst:199`, `docs/methodology/REGISTRY.md:488`, and the paper review file headers all align on the July 2023 revision string. If the 2024 revision introduces methodological changes, a re-review may be queued as a TODO follow-up post-promotion. +- **Methodology-test-file coverage of the universal-rollout extension (Knau et al. 2026)** is out of scope for DCDH. The 2026 paper's primary contribution is the `HeterogeneousAdoptionDiD` (HAD) estimator (already promoted in PR #473) and the universal-rollout case where no unit remains untreated. DCDH covers the reversible / mixed-direction designs from the 2020 AER and the dynamic event study from the 2022/2023 NBER WP 29873; the 2026 paper's universal-rollout contributions are documented in the companion review at `docs/methodology/papers/dechaisemartin-2026-review.md` without a dedicated DCDH methodology test file. 
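The equal-cell weighting deviation listed in this section (deviation 1) can be illustrated with a toy calculation. This is a minimal sketch, not the library's code: the helper names `equal_cell_average` and `size_weighted_average` and the numbers are hypothetical, and the sketch only contrasts "each `(g,t)` cell counts once" with weighting by the within-cell observation count `N_{g,t}`.

```python
from statistics import fmean


def equal_cell_average(cell_means):
    """Hypothetical helper: every (g, t) cell contributes once,
    regardless of how many observations it contains."""
    return fmean(cell_means)


def size_weighted_average(cell_means, cell_sizes):
    """Hypothetical helper: every cell is weighted by its
    observation count N_{g,t} (cell-size weighting)."""
    total = sum(cell_sizes)
    return sum(m * n for m, n in zip(cell_means, cell_sizes)) / total


# Toy per-cell outcome changes (illustrative numbers, not real data).
cell_means = [1.0, 3.0]

# Unbalanced cells: the two conventions disagree.
unbalanced_sizes = [1, 3]
equal = equal_cell_average(cell_means)
weighted_unbalanced = size_weighted_average(cell_means, unbalanced_sizes)

# One observation per cell: the two conventions coincide.
balanced_sizes = [1, 1]
weighted_balanced = size_weighted_average(cell_means, balanced_sizes)

print(equal, weighted_unbalanced, weighted_balanced)
```

With one observation per cell the two averages coincide, which matches the statement above that agreement is exact on one-observation-per-cell inputs; they diverge only once cell sizes differ.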
--- @@ -1286,12 +1319,11 @@ Promotion priority for the **In Progress** entries, ordered by what's blocked on **Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):** -6. **ChaisemartinDHaultfoeuille (DCDH)** — methodology test file + 24 R parity tests + 347 unit tests + a companion-paper review for the 2026 universal-rollout extension. Primary-source reviews for the 2020 AER and 2022/2024 NBER WP 29873 papers are still outstanding alongside the Verified Components walk-through. -7. **WooldridgeDiD (ETWFE)** — companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file. -8. **TROP** — paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available). -9. **StaggeredTripleDifference** — shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored — tracked in TODO.md, PR #245). -10. **ConleySpatialHAC** — paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker. -11. **Survey Data Support** — cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first. +6. **WooldridgeDiD (ETWFE)** — companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file. +7. **TROP** — paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available). +8. 
**StaggeredTripleDifference** — shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored — tracked in TODO.md, PR #245). +9. **ConleySpatialHAC** — paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker. +10. **Survey Data Support** — cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first. --- diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index ba0ba112..d7a86f82 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -688,6 +688,18 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and - **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. 
This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. **Strata that vary across cells of a group require either an explicit `psu=` or the original `SurveyDesign(..., nest=True)` flag** — under `nest=True` the resolver combines `(stratum, psu)` into globally-unique labels, so the auto-injected `psu=` is re-labeled per stratum and the cell allocator proceeds. 
Only the `nest=False` + varying-strata + omitted-psu combination is rejected up front with a targeted `ValueError` at `fit()` time (the synthesized PSU column would reuse group labels across strata and trip the cross-stratum PSU uniqueness check in `SurveyDesign.resolve()`). Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Post-period attribution extends to heterogeneity (Binder TSL branch only):** the heterogeneity WLS coefficient IF `ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g` is attributed in full to the single post-period cell `(g, out_idx)` at each horizon (same single-cell convention as DID_l), then expanded as `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, and fed through `compute_survey_if_variance`. Under PSU=group the PSU-level aggregate telescopes to `ψ_g`, so Binder variance is byte-identical relative to the pre-cell-period release; under within-group-varying PSU mass lands in the post-period PSU. 
**Replicate-weight branch keeps the legacy group-level allocator** `ψ_i = ψ_g * (w_i / W_g)` because `compute_replicate_if_variance` computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping: redistributing mass onto the post-period cell would silently change the replicate SE whenever a replicate column's ratios vary within a group (the library accepts arbitrary per-row replicate matrices, not just PSU-aligned ones). The legacy allocator preserves byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
 - **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Cell-level wild PSU bootstrap under within-group-varying PSU:** when the PSU varies across the cells of a group, the bootstrap switches to a cell-level allocator: each `(g, t)` cell draws its multiplier from `w[psu(cell)]` via the per-cell PSU map `psu_codes_per_cell` (shape `(n_eligible_groups, n_periods)`, -1 sentinel for zero-weight cells).
The bootstrap statistic becomes `theta_r = sum_c w[psu(c)] * u_centered_pp[c] / divisor` using the cohort-recentered per-cell IF `U_centered_per_period`. Under PSU-within-group-constant regimes (including PSU=group and strictly-coarser PSU with within-group constancy), the per-cell sum telescopes to the group-level form via the row-sum identity `sum_{c in g} U_centered_per_period[g, t] == U_centered[g]` (enforced by `_cohort_recenter_per_period`). A dispatcher in `_compute_dcdh_bootstrap` detects within-group-constancy and routes those regimes through the legacy group-level bootstrap path so their SE is **bit-identical** to the pre-cell-level release (guarded primarily by `test_bootstrap_se_matches_pre_pr4_baseline` and by the existing `test_auto_inject_bit_identical_to_group_level`). Under within-group-varying PSU, a group contributing cells to PSUs `p1, p2, ...` receives independent multiplier draws per PSU — the correct Hall-Mammen wild PSU clustering at cell granularity. **Multi-horizon bootstraps** draw a single shared `(n_bootstrap, n_psu)` PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution across horizons. **Library extension** — R `DIDmultiplegtDYN` does not support survey designs, so "deviation from R" does not apply. **Scope note (terminal missingness + any cell-period-allocator path):** see the balanced-baseline Note above for the full carve-out. In brief: when a terminally-missing group is in a cohort whose other groups still contribute at the missing period, `_cohort_recenter_per_period` leaks non-zero centered IF mass onto cells with no positive-weight observations. The targeted `ValueError` fires from every survey variance path that uses the cell-period allocator: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT (which always uses the cell allocator), and the cell-level wild PSU bootstrap. 
Pre-process the panel to remove terminal missingness, or (for Binder TSL only) use an explicit `psu=` so the analytical path routes through the legacy group-level allocator. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design.
+### Deviations from the paper / from R / library extensions
+
+*Notes #1, #2, #3, #4, #5, and #7 codify deviations from R `DIDmultiplegtDYN` (and from the paper's Equation 3 in the case of #1). Note #6 codifies a library extension with no R correspondence. The original scattered `**Note:**` and `**Note (deviation from R...):**` entries throughout the section above remain in place — this Deviations block is the canonical AI-review surface per CLAUDE.md "Documenting Deviations (AI Review Compatibility)" labels. Cross-references back to the existing Notes use semantic anchors (Phase / section names) rather than line numbers because the DCDH section is liable to shift as new contracts land; test-file references retain line numbers / class names because test files are more stable.*
+
+1. **Deviation from R / Deviation from the paper (Equation 3):** Equal-cell weighting — each `(g,t)` cell contributes equally regardless of within-cell observation count. AER 2020 Equation 3 prescribes `N_{d,d',t} = sum_g N_{g,t}` (observation sums); R `DIDmultiplegtDYN` weights by cell size. Phase 2 estimands (`DID_l`, `DID^{pl}_l`, `DID^n_l`, delta cost-benefit) inherit the same contract. Locked in `tests/test_chaisemartin_dhaultfoeuille.py::TestDropLargerLower::test_cell_count_weighting_unbalanced_input`. Cross-references the Phase 1 Theorem 3 equation block above (where `N_{a,b,t}` is documented as the count of `(g, t)` cells in each transition state) and `METHODOLOGY_REVIEW.md` § DCDH Deviations #1.
+2. **Deviation from R:** Period-based stable-control sets (`stable_0(t)` = any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`) — R uses cohort-based control sets that additionally require baseline `D_{g,1}` to match the side. Pure-direction panels agree exactly; ~1% point-estimate divergence on mixed-direction panels where joiners' post-switch cells could serve as leavers' controls. SE parity gap on pure-direction scenarios narrowed from ~18% to ~3% after the Round 2 full-IF fix. Cross-references the existing `**Note (deviation from R DIDmultiplegtDYN):**` in the period-vs-cohort discussion above and `METHODOLOGY_REVIEW.md` § DCDH Deviations #2.
+3. **Deviation from R:** Balanced-baseline panel required + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError` — one composite deviation with four enforcement paths. Step 5b validation in `fit()` enforces the contract via `ValueError` (missing baseline) / `UserWarning` (interior gaps) / silent retention (terminal missingness). R accepts unbalanced panels. The cell-period allocator paths (Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT, cell-level wild PSU bootstrap) have a targeted `ValueError` when cohort recentering would leak nonzero centered IF mass onto cells with no positive-weight observations. The four enforcement paths share a single underlying contract — "the panel must be balanced at baseline; terminal missingness is the only allowed unbalance; downstream variance machinery refuses to silently leak IF mass past the cell-period boundary". Cross-references the existing `**Note (deviation from R DIDmultiplegtDYN):**` in the ragged-panel discussion above (which itself details the three affected cell-period-allocator sub-paths) and `METHODOLOGY_REVIEW.md` § DCDH Deviations #3.
+4. **Deviation from R:** SE normalization — Python uses paper Section 3.7.3 verbatim `SE = sigma-hat / sqrt(N_l)`; R normalizes by `G` (total groups). Analytical SE is ~4% smaller than R on identical data (deterministic; ~3.5-5.1% across horizons and scenarios). Both converge to the same asymptotic variance as `G → ∞`. Cross-references the `**Note (deviation from R DIDmultiplegtDYN - SE normalization):**` in the SE / variance discussion above and `METHODOLOGY_REVIEW.md` § DCDH Deviations #4.
+5. **Deviation from R:** Singleton-cohort degeneracy → `NaN` with `UserWarning`. R returns a non-zero SE via small-sample sandwich machinery that Python does not implement. Bootstrap inherits the same degeneracy. Cross-references the singleton-cohort `**Note:**` in the SE / variance discussion above and `METHODOLOGY_REVIEW.md` § DCDH Deviations #5.
+6. **Library extension (no R correspondence):** `<50%` switcher warning at far horizons. Library convention is to warn but compute; the dynamic paper (NBER WP 29873) recommends not reporting such horizons (Favara-Imbs application, footnote 14). Cross-references the `**Note (Phase 2 \`<50%\` switcher warning):**` in the Phase 2 discussion above and `METHODOLOGY_REVIEW.md` § DCDH Deviations #6.
+7. **Deviation from R:** Phase 3 `DID^X` covariate adjustment uses equal cell weights in the first-stage OLS (consistent with the Phase 1 cell-count convention, deviation #1). R weights by `N_{gt}`. On one-observation-per-cell panels results are identical. When baseline-specific first stages fail (`n_obs = 0` or `n_obs < n_params`), both Python and R drop the affected strata. Cross-references the `**Note (Phase 3 DID^X covariate adjustment):**` in the Phase 3 discussion above and `METHODOLOGY_REVIEW.md` § DCDH Deviations #7.
+
 ---

 ## ContinuousDiD
From 426e36c61e17464dc328782b1f20f5da3f3321a9 Mon Sep 17 00:00:00 2001
From: igerber
Date: Thu, 21 May 2026 13:09:26 -0400
Subject: [PATCH 2/2] Narrow DCDH primary-source scope to 2020 AER + 2022/2023 NBER WP

Aligns METHODOLOGY_REVIEW.md DCDH Primary References cell and "paper reviews on file" claims with the REGISTRY ## ChaisemartinDHaultfoeuille "Primary sources" header (2020 AER + 2022/2023 NBER WP only). The Knau et al. 2026 universal-rollout paper is HAD's primary source and is referenced from DCDH as adjacent context, not as DCDH primary-source coverage.

Co-Authored-By: Claude Opus 4.7
---
 CHANGELOG.md | 2 +-
 METHODOLOGY_REVIEW.md | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8f86f07f..ea0b1ca5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

 ### Added
-- **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476).
REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights. R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and three primary-source paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478, 2026 Knau et al. universal-rollout companion). 
The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10.
+- **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476). REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights.
R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and two primary-source DCDH paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478; the `dechaisemartin-2026-review.md` on disk is HAD's primary source, not DCDH's, and is referenced as adjacent context only). The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10. 

 ## [3.4.1] - 2026-05-21
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index f22feb1d..acfc5b06 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -691,7 +691,7 @@ These three are feature deferrals (paper-supported extensions that the library h
 | Field | Value |
 |-------|-------|
 | Module | `chaisemartin_dhaultfoeuille.py`, `chaisemartin_dhaultfoeuille_bootstrap.py`, `chaisemartin_dhaultfoeuille_results.py` |
-| Primary References | (a) de Chaisemartin & D'Haultfœuille (2020), *Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects*, AER 110(9), 2964-2996. (b) de Chaisemartin & D'Haultfœuille (2022, revised July 2023), *Difference-in-Differences Estimators of Intertemporal Treatment Effects*, NBER WP 29873 — Web Appendix Section 3.7.3 for cohort-recentered plug-in variance. (c) de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for the universal-rollout case. |
+| Primary References | (a) de Chaisemartin & D'Haultfœuille (2020), *Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects*, AER 110(9), 2964-2996. (b) de Chaisemartin & D'Haultfœuille (2022, revised July 2023), *Difference-in-Differences Estimators of Intertemporal Treatment Effects*, NBER WP 29873 — Web Appendix Section 3.7.3 for cohort-recentered plug-in variance. (Matches `docs/methodology/REGISTRY.md` `## ChaisemartinDHaultfoeuille` § "Primary sources". The Knau et al. 2026 universal-rollout paper, while authored by overlapping authors, is the primary source for `HeterogeneousAdoptionDiD` and is treated as adjacent context for DCDH, not a primary reference — see "Outstanding Concerns" below.) |
 | R Reference | `DIDmultiplegtDYN` |
 | Status | **Complete** |
 | Last Review | 2026-05-21 |
@@ -713,14 +713,14 @@ These three are feature deferrals (paper-supported extensions that the library h
 - [x] **HonestDiD (Rambachan-Roth 2023) integration** on placebo + event study surface — `tests/test_chaisemartin_dhaultfoeuille.py::TestHonestDiDIntegration`.
 - [x] **Non-binary (ordinal or continuous) treatment** — paper Section 2 of the dynamic companion defines treatment as a general `D_{g,t}`; binary `{0, 1}` is a special case — `tests/test_chaisemartin_dhaultfoeuille.py::TestNonBinaryTreatment`.
 - [x] **Survey design support: Taylor-series linearization + replicate weights + Hall-Mammen wild PSU bootstrap** — `tests/test_survey_dcdh.py` (Binder TSL on the main ATT, DID^X, heterogeneity, TWFE diagnostic, and HonestDiD surfaces), `tests/test_survey_dcdh_replicate_psu.py` (Rao-Wu rescaled replicate weights for BRR/Fay/JK1/JKn/SDR), and three cell-period coverage suites (`tests/test_dcdh_cell_period_coverage.py`, `tests/test_dcdh_bootstrap_cell_period_coverage.py`, `tests/test_dcdh_heterogeneity_cell_period_coverage.py`) — the cell-period allocator's per-cell IF expansion is what enables within-group-varying PSU.
-- [x] **Three primary-source paper reviews on file**: `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2020-review.md` (2020 AER), `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2022-review.md` (2022/2023 NBER WP 29873), and `docs/methodology/papers/dechaisemartin-2026-review.md` (Knau et al. 2026 universal-rollout companion — primary source for `HeterogeneousAdoptionDiD`, not DCDH).
+- [x] **Two primary-source DCDH paper reviews on file**: `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2020-review.md` (2020 AER) and `docs/methodology/papers/dechaisemartin-dhaultfoeuille-2022-review.md` (2022/2023 NBER WP 29873). The adjacent `docs/methodology/papers/dechaisemartin-2026-review.md` is on disk as the primary source for `HeterogeneousAdoptionDiD` (HAD's universal-rollout case) and is referenced from DCDH as context only; it is not DCDH primary-source coverage.

 **Test Coverage:**
 - 12 methodology tests in `tests/test_methodology_chaisemartin_dhaultfoeuille.py` (4 classes: `TestMethodologyWorkedExample`, `TestCohortRecenteringCritical`, `TestTWFEDiagnostic`, `TestLargeNRecovery`).
 - 26 R-parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` against `DIDmultiplegtDYN`.
 - 352 unit tests in `tests/test_chaisemartin_dhaultfoeuille.py` covering Phase 1 + Phase 2 + Phase 3 + survey-design + by-path + HonestDiD surfaces (37 test classes).
 - Survey-specific: `tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, plus three dCDH cell-period coverage suites (`test_dcdh_cell_period_coverage.py`, `test_dcdh_bootstrap_cell_period_coverage.py`, `test_dcdh_heterogeneity_cell_period_coverage.py`).
-- Three primary-source paper reviews on disk (see Verified Components above).
+- Two primary-source DCDH paper reviews on disk: 2020 AER and 2022/2023 NBER WP 29873 (see Verified Components above). The adjacent `docs/methodology/papers/dechaisemartin-2026-review.md` is on disk but is HAD's primary source, not DCDH's — it does not count toward DCDH primary-source coverage.

 **Corrections Made:**
 1. **Round 2 full-IF fix** (pre-promotion): never-switching groups now participate in the variance via stable-control roles under the full `Lambda^G_{g,l=1}` influence function. The `n_groups_dropped_never_switching` results field is retained for backwards compatibility but no longer represents an actual exclusion. After this fix, SE parity vs R on pure-direction scenarios narrowed from ~18% to ~3% (documented in REGISTRY `## ChaisemartinDHaultfoeuille` § "Note (deviation from R DIDmultiplegtDYN):" on period-vs-cohort control sets).
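
Reviewer note (illustrative, not part of the patch): the Round 2 correction above reports the pure-direction SE gap against R narrowing from ~18% to ~3%, and the changelog entry quotes `PURE_DIRECTION_SE_RTOL = 0.05` for that comparison. The sketch below shows why the first gap would fail that band and the second would pass. It assumes the parity suite compares values by relative difference against the R reference; the helper `relative_gap` and the three SE values are hypothetical, chosen only to reproduce the quoted percentages.

```python
import math

# Tolerance constant as quoted in the changelog entry.
PURE_DIRECTION_SE_RTOL = 0.05


def relative_gap(py_value: float, r_value: float) -> float:
    """Relative difference of a Python estimate from the R reference (hypothetical helper)."""
    return abs(py_value - r_value) / abs(r_value)


# Hypothetical standard errors, not taken from any real fit.
r_se = 1.00
py_se_before_fix = 0.82  # ~18% gap
py_se_after_fix = 0.97   # ~3% gap

gap_before = relative_gap(py_se_before_fix, r_se)
gap_after = relative_gap(py_se_after_fix, r_se)

# A gap passes when it sits inside the documented tolerance band.
passes_before = gap_before <= PURE_DIRECTION_SE_RTOL
passes_after = gap_after <= PURE_DIRECTION_SE_RTOL

print(f"before fix: gap={gap_before:.2%}, within band: {passes_before}")
print(f"after fix:  gap={gap_after:.2%}, within band: {passes_after}")
```

The actual tests may express the same check through a library assertion rather than a hand-written ratio; the arithmetic is the point here, not the call shape.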