From 1224a19c8391fa7641593ce2f3bf64ad22709065 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 08:29:21 -0400 Subject: [PATCH 1/4] Close SDID placebo R-parity gap: warm-start + R-anchored fixture + test seam MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the last queued SDID R-parity follow-up (per ``project_sdid_pr349_followups.md``, now removed from the memory index — the work is shipped). Symmetric with the existing ``test_jackknife_se_matches_r`` anchor in TestJackknifeSERParity. Methodology fix — placebo warm-start: ``synthdid:::placebo_se`` (R/vcov.R) seeds Frank-Wolfe per draw with ``weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])`` (fit-time ω subsetted + renormalized) and the fit-time ``weights$lambda``, then re-estimates with ``update.omega=TRUE, update.lambda=TRUE``. Python's ``_placebo_variance_se`` previously used uniform cold-start, producing finite-iter convergence-pattern drift on a handful of draws relative to R's reference SE on the same panel. Fix: add ``init_omega`` and ``init_lambda`` kwargs to ``_placebo_variance_se``. The dispatcher now passes ``init_omega= unit_weights, init_lambda=time_weights`` (fit-time outputs); the loop seeds ``compute_sdid_unit_weights(init_weights= _sum_normalize(init_omega[pseudo_control_idx]))`` and ``compute_time_weights(init_weights=init_lambda)`` per draw, mirroring R's warm-start pattern. At the global FW optimum the two starts are equivalent (strictly convex objective) — this is a finite-iter parity fix, not a methodology change. R-parity fixture + test seam: * ``benchmarks/R/generate_sdid_placebo_parity_fixture.R`` — R 4.5.2 + synthdid 0.0.9. Reuses the same Y matrix as ``TestJackknifeSERParity`` (same R_ATT = 4.980848860060929) so jackknife and placebo R-parity tests share an anchor panel. 
Records the 200 per-rep permutations R consumed and the SE from R's manual ``placebo_se`` loop (which matches ``vcov(method="placebo")`` to machine precision when seeded); permutations are 0-indexed for direct numpy consumption. * ``tests/data/sdid_placebo_indices_r.json`` — committed fixture. * ``_placebo_variance_se`` gains a private ``_placebo_indices`` kwarg (underscore-prefixed, test-only). When supplied, each row replaces the per-draw ``rng.permutation(n_control)`` so a Python fit can consume R's exact permutation sequence and produce a bit-identical SE. * ``test_placebo_se_matches_r`` (in ``TestJackknifeSERParity``) intercepts the dispatcher's call to ``_placebo_variance_se`` to capture the normalized fit-time inputs, then re-invokes the method with R's permutations through the seam. Asserts ``|py_se - r_se| < 1e-8``: the Rust and R FW backends agree to within a few ULP on the same warm-start, so this tolerance is tight enough to catch real divergences while still absorbing BLAS reduction-order noise. Baseline rebase: ``TestScaleEquivariance::test_baseline_parity_small_scale[placebo]`` captured pre-warm-start SE = 0.29385822261006445. The new value is 0.293840360160448 (a sub-percent shift). The test's bit-identity contract is preserved per backend; the baseline was updated with a comment documenting the warm-start change and a pointer to the new R-parity test that pins the post-fix value to R's reference. The p-value (placebo uses the empirical formula, not the analytical one) is unchanged at 0.004975124378109453. Verification: ``pytest tests/test_methodology_sdid.py tests/test_survey_phase5.py -q`` → 230 passed (1 new R-parity test; existing TestScaleEquivariance baseline rebased; all other SDID + survey tests unchanged). 
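The per-draw warm-start seeding described above can be sketched in a few lines. This is an illustration only: `sum_normalize` here mirrors R's `synthdid::sum_normalize` helper, and `fit_omega` is a stand-in array, not the shipped `_placebo_variance_se` internals.

```python
import numpy as np

def sum_normalize(x: np.ndarray) -> np.ndarray:
    # Mirrors R's synthdid::sum_normalize: rescale to sum to 1,
    # falling back to uniform when the subset sums to zero.
    s = x.sum()
    return np.full(len(x), 1.0 / len(x)) if s == 0 else x / s

rng = np.random.default_rng(42)
n_control, n_treated = 20, 3                      # fixture shape: N0 = 20, N1 = 3
fit_omega = sum_normalize(rng.random(n_control))  # stand-in for fit-time omega

# One placebo draw: permute controls; the first N0 - N1 become pseudo-controls.
perm = rng.permutation(n_control)
pseudo_control_idx = perm[: n_control - n_treated]

# R: weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])
warm_start = sum_normalize(fit_omega[pseudo_control_idx])
assert warm_start.shape == (17,) and np.isclose(warm_start.sum(), 1.0)
```

Subsetting then renormalizing (rather than renormalizing then subsetting) is what keeps the seed on the simplex, which is why the FW first pass can consume it directly.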
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 3 + .../R/generate_sdid_placebo_parity_fixture.R | 142 ++++++++++++++++++ diff_diff/synthetic_did.py | 61 +++++++- docs/methodology/REGISTRY.md | 2 +- tests/data/sdid_placebo_indices_r.json | 1 + tests/test_methodology_sdid.py | 100 +++++++++++- 6 files changed, 299 insertions(+), 10 deletions(-) create mode 100644 benchmarks/R/generate_sdid_placebo_parity_fixture.R create mode 100644 tests/data/sdid_placebo_indices_r.json diff --git a/CHANGELOG.md b/CHANGELOG.md index eb4f1d69..fccf3794 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [3.3.0] - 2026-04-25 +### Fixed +- **`SyntheticDiD(variance_method="placebo")` SE now uses R-default warm-start** matching `synthdid:::placebo_se`. R's placebo loop seeds Frank-Wolfe per draw with `weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])` (fit-time ω subsetted + renormalized) and the fit-time `weights$lambda` — Python previously used uniform cold-start, producing finite-iter convergence-pattern drift on a handful of draws relative to R's reference SE. New `_placebo_variance_se` kwargs `init_omega` / `init_lambda` thread fit-time weights through the existing two-pass FW dispatcher; at the global FW optimum the values are init-independent (strictly convex objective), so the change is a finite-iter parity fix, not a methodology change. Existing placebo SE values shift by sub-percent on most panels; the bit-identity baseline pin in `TestScaleEquivariance::test_baseline_parity_small_scale[placebo]` was rebased from `0.29385822261006445` to `0.293840360160448`. 
New R-parity test `tests/test_methodology_sdid.py::TestJackknifeSERParity::test_placebo_se_matches_r` asserts SE matches R's `vcov(method="placebo")` to within `< 1e-8` using R's exact permutation sequence (recorded by `benchmarks/R/generate_sdid_placebo_parity_fixture.R` into `tests/data/sdid_placebo_indices_r.json`). The `_placebo_indices` kwarg on `_placebo_variance_se` is the test seam; not part of the public API. + ### Added - **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. 
The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based). Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work. - **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. 
Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. 
Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants. 
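The H=1 sanity property noted above (the sup-t critical value collapsing to `Φ⁻¹(1 − alpha/2) ≈ 1.96`) can be checked with a quick Monte Carlo. This sketch uses i.i.d. standard-normal perturbations in place of the survey multiplier weights the real bootstrap draws; it demonstrates the quantile reduction only, not `_sup_t_multiplier_bootstrap` itself.

```python
import numpy as np
from statistics import NormalDist

alpha = 0.05
rng = np.random.default_rng(0)
n_draws, n_horizons = 200_000, 1   # H = 1: a single event-study horizon

# Per bootstrap draw, take the sup over horizons of the |normalized perturbation|.
sup_t = np.abs(rng.standard_normal((n_draws, n_horizons))).max(axis=1)
crit = np.quantile(sup_t, 1 - alpha)

# At H = 1 the sup-t quantile is just the two-sided normal critical value.
z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.9599...
assert abs(crit - z) < 0.02               # equal up to MC noise
```

With more horizons the same quantile widens above 1.96, which is the multiplicity correction the simultaneous band buys over pointwise intervals.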
diff --git a/benchmarks/R/generate_sdid_placebo_parity_fixture.R b/benchmarks/R/generate_sdid_placebo_parity_fixture.R new file mode 100644 index 00000000..329572db --- /dev/null +++ b/benchmarks/R/generate_sdid_placebo_parity_fixture.R @@ -0,0 +1,142 @@ +#!/usr/bin/env Rscript +# Generate a fixture pinning R's `synthdid::vcov(method="placebo")` SE plus +# the per-replication permutations R consumed, so the Python R-parity test +# can feed those exact permutations through `_placebo_variance_se` and +# assert SE match at machine precision. +# +# Usage: +# Rscript benchmarks/R/generate_sdid_placebo_parity_fixture.R +# +# Output: +# tests/data/sdid_placebo_indices_r.json +# +# Symmetric with the existing jackknife R-parity test +# (TestJackknifeSERParity in tests/test_methodology_sdid.py:1410). Reuses +# the same Y matrix and (N0, N1, T0, T1) shape so the placebo + jackknife +# parity tests share an anchor panel. +# +# R version: 4.5.2; synthdid version: 0.0.9. + +library(synthdid) +library(jsonlite) + +# Reconstruct R's panel exactly as TestJackknifeSERParity does (set.seed(42), +# 23 units × 8 periods, treated = i > N0 with effect 5 in t > T0). +set.seed(42) +N0 <- 20 +N1 <- 3 +T0 <- 5 +T1 <- 3 +N <- N0 + N1 +T <- T0 + T1 +Y <- matrix(0, nrow = N, ncol = T) +for (i in 1:N) { + unit_fe <- rnorm(1, sd = 2) + for (t in 1:T) { + Y[i, t] <- 10 + unit_fe + (t - 1) * 0.3 + rnorm(1, sd = 0.5) + if (i > N0 && t > T0) Y[i, t] <- Y[i, t] + 5.0 + } +} + +# Fit-time ATT (sanity check — must match TestJackknifeSERParity.R_ATT). +tau_hat <- synthdid_estimate(Y, N0, T0) +r_att <- as.numeric(tau_hat) + +# Reproduce R's placebo_se loop exactly so we can record permutations and +# the per-rep tau alongside the resulting SE. 
Mirrors `synthdid:::placebo_se` +# (R/vcov.R), including the warm-start weights pass-through: +# +# theta = function(ind) { +# N0 = length(ind) - N1 +# weights.boot = weights +# weights.boot$omega = sum_normalize(weights$omega[ind[1:N0]]) +# do.call(synthdid_estimate, c(list(Y = setup$Y[ind, ], +# N0 = N0, T0 = setup$T0, X = setup$X[ind, , ], +# weights = weights.boot), opts)) +# } +# +# The warm-start `weights.boot$omega` differs from a fresh uniform init +# at finite FW iterations and is what `vcov(method="placebo")` actually +# consumes — so reproducing it here is required for bit-identical SE. +opts_used <- attr(tau_hat, "opts") +fit_weights <- attr(tau_hat, "weights") +fit_setup <- attr(tau_hat, "setup") +replications <- 200 + +# Use a fresh seed for the placebo loop so the recorded permutations are +# independent of the fit-time RNG state. Python consumes the recorded +# permutations directly (no RNG-state matching needed). +set.seed(42) +perms <- vector("list", replications) +taus <- numeric(replications) + +for (r in 1:replications) { + ind <- sample(1:N0, N0) + perms[[r]] <- ind + N0_placebo <- N0 - N1 + weights_boot <- fit_weights + weights_boot$omega <- synthdid:::sum_normalize(fit_weights$omega[ind[1:N0_placebo]]) + # IMPORTANT: R's `placebo_se` uses ONLY the N0 controls (subdivided into + # N0-N1 pseudo-controls + N1 pseudo-treated). Real treated rows are NOT + # included in the placebo Y matrix — that's what makes the placebo a + # null-distribution test. ``Y = setup$Y[ind, ]`` is N0 rows; appending + # the real treated rows (i.e., ``setup$Y[c(ind, (N0+1):N), ]``) would + # change the test entirely (and produces SE ~0.132 instead of R's 0.226 + # — a 2× drift on this fixture). 
+ est_placebo <- do.call( + synthdid_estimate, + c(list( + Y = fit_setup$Y[ind, ], + N0 = N0_placebo, + T0 = T0, + X = fit_setup$X[ind, , ], + weights = weights_boot + ), opts_used) + ) + taus[r] <- as.numeric(est_placebo) +} + +r_placebo_se <- sqrt((replications - 1) / replications) * sd(taus) + +# Sanity check against R's vcov() entry point. With the warm-start pattern +# applied explicitly above, the manual loop and `vcov()` should produce +# the same SE up to MC noise on the seed sequence. Match isn't required +# for the parity test (we use `r_placebo_se` from our recorded +# permutations); both values are kept for transparency. +set.seed(42) +r_placebo_se_via_vcov <- sqrt(vcov(tau_hat, method = "placebo", replications = replications)[1, 1]) + +cat(sprintf("R ATT: %.15f\n", r_att)) +cat(sprintf("R placebo SE (manual loop): %.15f\n", r_placebo_se)) +cat(sprintf("R placebo SE (via vcov): %.15f\n", r_placebo_se_via_vcov)) +cat(sprintf("Replications: %d\n", replications)) + +# Convert permutations to 0-indexed for Python (R uses 1-indexed). +perms_0indexed <- lapply(perms, function(p) as.integer(p - 1L)) + +payload <- list( + metadata = list( + R_version = paste(R.version$major, R.version$minor, sep = "."), + synthdid_version = as.character(packageVersion("synthdid")), + seed = 42L, + replications = as.integer(replications), + note = paste( + "Permutations are 0-indexed for direct numpy consumption.", + "R ATT, R placebo SE (manual loop), and per-rep taus are pinned", + "for downstream Python parity assertion." 
+ ) + ), + N0 = as.integer(N0), + N1 = as.integer(N1), + T0 = as.integer(T0), + T1 = as.integer(T1), + R_ATT = r_att, + R_PLACEBO_SE = r_placebo_se, + R_PLACEBO_SE_VIA_VCOV = r_placebo_se_via_vcov, + R_PLACEBO_TAUS = as.numeric(taus), + R_PERMUTATIONS = perms_0indexed +) + +out_path <- "tests/data/sdid_placebo_indices_r.json" +write_json(payload, out_path, auto_unbox = TRUE, digits = 17) +cat(sprintf("\nWrote %s\n", out_path)) diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py index ba056418..d6a74ab9 100644 --- a/diff_diff/synthetic_did.py +++ b/diff_diff/synthetic_did.py @@ -1153,6 +1153,8 @@ def fit( # type: ignore[override] min_decrease=min_decrease, replications=self.n_bootstrap, w_control=w_control, + init_omega=unit_weights, + init_lambda=time_weights, ) se = se_n * Y_scale placebo_effects = np.asarray(placebo_effects_n) * Y_scale @@ -1695,6 +1697,9 @@ def _placebo_variance_se( min_decrease: float = 1e-5, replications: int = 200, w_control=None, + init_omega: Optional[np.ndarray] = None, + init_lambda: Optional[np.ndarray] = None, + _placebo_indices: Optional[np.ndarray] = None, ) -> Tuple[float, np.ndarray]: """ Compute placebo-based variance matching R's synthdid methodology. @@ -1747,6 +1752,21 @@ def _placebo_variance_se( rng = np.random.default_rng(self.seed) n_pre, n_control = Y_pre_control.shape + # R-parity test seam (PR follow-up to #349). When + # ``_placebo_indices`` is provided, each row replaces the + # per-draw ``rng.permutation(n_control)`` so a Python fit can + # consume R's exact permutation sequence and produce a + # bit-identical SE. Shape: ``(replications, n_control)``, + # 0-indexed. Underscore-prefixed → test-only / private API. 
+ if _placebo_indices is not None: + _placebo_indices = np.asarray(_placebo_indices, dtype=np.int64) + if _placebo_indices.ndim != 2 or _placebo_indices.shape[1] != n_control: + raise ValueError( + f"_placebo_indices shape {_placebo_indices.shape} does not " + f"match expected (replications, {n_control})" + ) + replications = _placebo_indices.shape[0] + # Ensure we have enough controls for the split n_pseudo_control = n_control - n_treated if n_pseudo_control < 1: @@ -1771,10 +1791,17 @@ def _placebo_variance_se( placebo_estimates = [] - for _ in range(replications): + for _rep in range(replications): try: - # Random permutation of control indices (Algorithm 4, step 1) - perm = rng.permutation(n_control) + # Random permutation of control indices (Algorithm 4, step 1). + # Test seam: when ``_placebo_indices`` is supplied, consume + # the externally-provided permutation instead — used by + # ``test_placebo_se_matches_r`` to feed R's exact + # permutation sequence through Python for bit-identical SE. + if _placebo_indices is not None: + perm = _placebo_indices[_rep] + else: + perm = rng.permutation(n_control) # Split into pseudo-controls and pseudo-treated (step 2) pseudo_control_idx = perm[:n_pseudo_control] @@ -1805,15 +1832,28 @@ def _placebo_variance_se( Y_post_control[:, pseudo_treated_idx], axis=1 ) - # Re-estimate weights on permuted data (matching R's behavior) - # R passes update.omega=TRUE, update.lambda=TRUE via opts, - # re-estimating weights from uniform initialization (fresh start). - # Unit weights: re-estimate on pseudo-control/pseudo-treated data + # Re-estimate weights on permuted data (matching R's behavior). + # R's `placebo_se` (synthdid:::placebo_se in R/vcov.R) passes + # `weights.boot$omega = sum_normalize(weights$omega[ind[1:N0]])` + # as a warm-start to `synthdid_estimate` with `update.omega=TRUE`, + # then re-estimates ω. 
Python mirrors that: when fit-time ω + # (`init_omega`) is supplied, seed the FW first pass with the + # subsetted-renormalized fit-time omega; otherwise use cold- + # start (uniform). At the global FW optimum the two are + # equivalent, but the warm-start matches R's exact iterates + # for bit-identical SE under the R-parity test. + if init_omega is not None: + pseudo_omega_init = _sum_normalize( + init_omega[pseudo_control_idx] + ) + else: + pseudo_omega_init = None pseudo_omega = compute_sdid_unit_weights( Y_pre_pseudo_control, Y_pre_pseudo_treated_mean, zeta_omega=zeta_omega, min_decrease=min_decrease, + init_weights=pseudo_omega_init, ) # Compose pseudo_omega with control survey weights @@ -1824,12 +1864,17 @@ def _placebo_variance_se( else: pseudo_omega_eff = pseudo_omega - # Time weights: re-estimate on pseudo-control data + # Time weights: re-estimate on pseudo-control data. + # R's `placebo_se` reuses the same `weights.boot` (with + # the original ``weights$lambda`` untouched) as warm- + # start to ``synthdid_estimate``. Mirror by passing + # fit-time λ as ``init_weights`` when supplied. 
pseudo_lambda = compute_time_weights( Y_pre_pseudo_control, Y_post_pseudo_control, zeta_lambda=zeta_lambda, min_decrease=min_decrease, + init_weights=init_lambda, ) # Compute placebo SDID estimate (step 4) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index fcc56f0b..5442680a 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -1678,7 +1678,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi - [x] Auto-regularization: noise_level = sd(first_diffs), zeta_omega = (N1*T1)^0.25 * noise_level, zeta_lambda = 1e-6 * noise_level - [x] Sparsification: v[v <= max(v)/4] = 0; v = v/sum(v) - [x] Placebo SE formula: sqrt((r-1)/r) * sd(placebo_estimates) -- [x] Placebo SE: re-estimates omega and lambda per replication (matching R's update.omega=TRUE, update.lambda=TRUE) +- [x] Placebo SE: re-estimates omega and lambda per replication (matching R's update.omega=TRUE, update.lambda=TRUE), with R-default warm-start (`init_omega = sum_normalize(unit_weights[pseudo_control_idx])` per draw; `init_lambda = time_weights`) matching `synthdid:::placebo_se`'s `weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])` pattern. R-parity verified at `< 1e-8` SE tolerance via `tests/test_methodology_sdid.py::TestJackknifeSERParity::test_placebo_se_matches_r` using R's exact permutation sequence threaded through the `_placebo_indices` test seam. - [x] Bootstrap: paper-faithful Algorithm 2 step 2 — re-estimates ω̂_b and λ̂_b per draw via two-pass sparsified Frank-Wolfe on the resampled panel using the fit-time normalized-scale zeta. Matches R's default `synthdid::vcov(method="bootstrap")` (which rebinds `attr(estimate, "opts")` so the renormalized ω serves only as Frank-Wolfe initialization). 
Survey designs (pweight-only AND strata/PSU/FPC) are supported via the weighted-FW + hybrid pairs-bootstrap + Rao-Wu rescaling composition described in the "Note (survey + bootstrap composition)" above (PR #355). - [x] Placebo: Survey-weighted pseudo-treated means + weighted-FW re-estimation on pseudo-panel for both pweight-only and full-design paths. Full-design path (strata/PSU/FPC) uses stratified-permutation allocator — see "Note (survey + placebo composition)" above. - [x] Jackknife SE: fixed weights, LOO all units, formula `sqrt((n-1)/n * sum((u-ubar)^2))`. Full-design path (strata/PSU/FPC) uses PSU-level LOO with stratum aggregation — see "Note (survey + jackknife composition)" above. diff --git a/tests/data/sdid_placebo_indices_r.json b/tests/data/sdid_placebo_indices_r.json new file mode 100644 index 00000000..c65736a1 --- /dev/null +++ b/tests/data/sdid_placebo_indices_r.json @@ -0,0 +1 @@ +{"metadata":{"R_version":"4.5.2","synthdid_version":"0.0.9","seed":42,"replications":200,"note":"Permutations are 0-indexed for direct numpy consumption. 
R ATT, R placebo SE (manual loop), and per-rep taus are pinned for downstream Python parity assertion."},"N0":20,"N1":3,"T0":5,"T1":3,"R_ATT":4.9808488600609291,"R_PLACEBO_SE":0.22634276335564449,"R_PLACEBO_SE_VIA_VCOV":0.22634276335564449,"R_PLACEBO_TAUS":[-0.17514514139049764,0.051095789597656815,0.17398742857711361,-0.068630360949537964,0.20043606942973233,-0.27178140082206054,-0.23364596111589159,-0.050338053916422952,0.1499813172878092,0.18845658709491653,-0.3883795506282709,-0.16112796602421031,-0.073920942245004784,-0.095725372152953389,-0.23775776062271295,-0.13048844305643839,-0.005026529124715305,-0.008983354540174314,-0.29000022165689326,0.040542136004489943,0.052507173089492029,0.44383136559381747,0.34986345701920829,-0.040556345314975852,-0.37482900911198708,-0.25413618423954648,0.035978949684040973,0.26951564622723284,0.21309519866721713,-0.025851752634965731,-0.33588795038443392,-0.10094611801776011,0.18547958789639024,0.091496464861127072,0.51210303533168,0.06878737108872332,-0.28945758145987466,-0.13886944926390496,0.30695101771749889,-0.14040587243922278,0.050287943634037734,-0.17981571627173221,-0.10186747897418731,-0.15607784314623896,0.37621098150567628,-0.55206595905518996,-0.4916432396852316,-0.19636027201217679,0.0063186735200593301,0.14150393803958117,-0.026249808388021937,0.062570468788135838,0.012463117155965086,0.1554997083594688,0.082241160497572519,-0.051093433265308005,-0.46235315796708981,0.1038490509039664,0.48890748339083562,0.47501035296148197,-0.37757773718547205,-0.076426435516089963,-0.15258498542198778,0.2074118896649324,-0.20809007583217057,-0.10186747897418864,0.038048935337206082,-0.086145539126234294,0.046373635199862158,-0.2089319434211514,0.13505538840152312,0.18974594251869642,-0.13875506808722671,0.21037798101338082,-0.26147386598513145,-0.21065052145781915,0.11715326997361043,-0.07752440713244535,-0.088044421425546138,-0.040556345314976032,0.39532125227858,-0.23629596743686504,0.2686781987442598,-0.010224550666356589,0
.49753502770485009,0.0040771407676736971,-0.26103606389129796,0.54244848475024421,0.0027316069919338108,-0.043112116168558937,0.15965410226949139,0.30695101771749705,0.43318136294331666,-0.2483746838332038,0.43558802196655189,0.17656514061952258,0.11244100541215261,-0.27526554474587217,0.12896332837551935,0.19903047220568534,-0.47993787793540388,-0.16687984157088911,0.39278757303940287,0.12496325972138489,-0.1464901927382132,0.073320908802760737,-0.092863040121185916,-0.15490188458096738,0.17795323253383508,-0.089354954237821096,0.39078426253727427,-0.35646064065719985,-0.057322675924979999,0.16201322631137466,0.18974594251869523,-0.021616987677113064,0.175448057222061,0.05250717308949094,0.086958556375184445,0.25155604258843206,-0.077524407132444198,0.062570468788138003,0.26940200878868348,-0.028221718548654247,0.26951564622723273,0.0050130768370545194,0.13859204856248114,0.37999478373096746,0.20325085174705343,0.20568465008553311,0.16612918294220497,0.31164918343657716,0.23646528368783135,0.19428035363566556,-0.0052631253276375174,0.021839344554781429,0.15965410226949123,-0.27947393994391045,-0.10875496368512941,0.034710248440361857,-0.15808675455166751,0.17905200261954035,0.10672115408232594,-0.073920942245004992,-0.10094611801776093,-0.11317211375084424,-0.20607652882313063,0.27614164532358848,-0.44113155567264833,0.23823397642053262,0.39122059899391048,0.26951564622723306,0.60535269998088526,-0.0080573353738578601,0.038048935337207081,-0.12512078325570833,0.305175714180562,0.12896332837551833,-0.25894521675824173,-0.28731300452378689,0.32656190117968065,-0.13663238995603058,0.18271789959541967,0.13616991904494469,-0.0089833545401723711,-0.15918091439526538,-0.048901826672185254,0.21392459491297994,-0.094772091899143637,0.26860560268616357,-0.014611856062943296,0.43017817313488088,-0.17552303201701944,-0.25759713659549788,0.028888906032389902,0.28585930297103612,0.092403701220697493,-0.14790356509073407,-0.073968449470622005,-0.014611856062945518,0.1443023760752
1501,-0.16112796602420906,0.12496325972138535,0.24467325442378868,0.23260864615054214,0.041920112531541334,-0.005026529124715841,-0.38283619384410766,-0.15808675455166787,0.071654751001691344,-0.13663238995602775,0.26467592172342691,-0.17981571627173099,0.21037798101337735,0.31377087727811376,0.046740928896169702,0.25637086032247297,-0.11490257908764238,0.28342025146291272,-0.37207386653218433],"R_PERMUTATIONS":[[16,4,0,9,3,1,19,17,7,6,15,8,18,5,13,14,11,2,12,10],[3,4,12,18,19,1,11,7,2,9,0,17,5,13,8,14,15,16,6,10],[4,3,1,2,7,6,0,19,9,17,5,11,10,18,16,12,15,8,14,13],[9,7,4,0,16,6,3,12,19,8,13,10,2,18,17,5,11,1,15,14],[1,17,16,4,13,7,6,14,18,3,8,10,15,19,12,2,0,9,5,11],[15,16,11,8,18,4,13,10,19,9,7,6,5,0,14,12,17,2,1,3],[10,9,18,0,19,8,12,14,2,16,5,7,6,11,17,4,3,15,1,13],[16,17,1,5,19,15,14,3,11,18,9,6,12,13,7,10,0,8,4,2],[1,10,13,19,11,16,18,9,8,5,4,0,15,12,2,3,14,7,17,6],[12,10,16,3,17,18,0,8,13,11,15,14,19,7,9,1,2,5,4,6],[4,12,11,0,13,2,17,19,7,5,6,9,8,16,15,10,14,1,18,3],[14,15,0,5,10,8,16,9,13,11,3,6,18,19,1,4,17,2,12,7],[1,5,0,13,15,17,10,11,19,6,12,3,8,16,4,14,2,9,18,7],[19,17,8,10,11,5,9,18,6,13,2,12,16,7,3,4,14,1,15,0],[16,11,5,14,8,9,7,1,6,18,4,2,12,0,10,19,13,15,3,17],[13,15,3,5,17,4,18,10,9,0,12,19,14,16,6,8,7,11,2,1],[4,17,6,12,9,13,5,1,11,2,0,3,10,15,14,16,7,8,19,18],[11,9,8,14,13,6,2,5,3,18,1,16,0,10,15,7,17,19,4,12],[17,18,9,15,1,19,3,8,14,6,0,13,5,2,12,10,16,11,7,4],[6,18,1,5,8,7,10,16,4,9,0,15,17,13,3,19,2,11,14,12],[5,4,9,1,6,3,8,13,14,12,2,16,10,0,11,18,15,7,19,17],[3,8,7,15,9,11,16,6,10,12,5,18,2,1,13,0,19,17,4,14],[18,9,12,5,0,10,17,19,13,11,15,7,3,16,1,8,2,14,4,6],[15,11,19,2,7,4,6,16,9,14,18,13,12,10,8,1,17,5,3,0],[10,12,2,9,17,1,6,3,4,11,16,0,19,5,13,14,15,7,8,18],[11,12,17,14,4,0,10,7,1,5,2,9,18,15,6,16,13,8,3,19],[5,7,4,8,9,10,11,15,14,13,6,0,19,12,3,18,2,17,1,16],[2,12,15,7,17,19,8,10,6,1,11,4,13,9,3,18,14,0,16,5],[15,7,17,9,14,12,11,2,3,5,16,1,18,0,4,19,10,6,13,8],[1,14,4,13,17,9,0,11,19,10,5,16,12,6,15,7,3,18,8,2],[7,13,0,18,4,19,1,14,16,
15,9,2,11,12,6,5,8,17,10,3],[7,18,1,10,14,17,19,13,9,15,5,3,6,0,8,11,2,4,12,16],[18,10,19,11,0,1,3,6,15,13,2,16,17,8,7,12,14,9,4,5],[7,2,12,1,6,3,8,10,11,9,0,17,5,15,14,16,13,18,19,4],[8,18,4,1,19,2,7,11,16,10,13,12,5,9,15,0,3,14,6,17],[7,18,14,5,2,0,13,3,11,8,17,16,12,10,9,19,15,1,4,6],[16,10,8,18,2,0,19,14,12,7,1,11,6,5,9,4,17,13,3,15],[17,3,14,6,13,9,2,16,4,7,5,11,8,12,1,0,15,18,10,19],[12,6,9,16,1,3,0,17,11,7,10,5,8,19,18,2,15,13,14,4],[1,3,7,14,13,0,9,4,16,11,15,18,10,5,6,2,19,17,8,12],[8,13,2,3,17,6,0,11,7,4,5,16,10,15,19,9,12,14,18,1],[2,19,8,6,10,16,11,12,0,3,18,9,1,5,4,13,14,7,15,17],[17,4,2,9,7,8,6,12,10,16,13,18,0,11,5,3,14,19,1,15],[4,18,10,16,6,7,9,11,5,3,17,2,19,14,0,1,12,8,13,15],[0,19,17,2,13,4,6,5,18,7,16,1,8,12,10,15,3,11,9,14],[14,7,2,1,0,5,17,10,9,4,19,15,13,11,6,16,18,3,12,8],[2,1,5,18,15,13,8,14,4,0,3,9,17,16,19,6,11,10,12,7],[10,7,6,0,11,18,2,14,3,1,4,13,5,9,15,17,19,12,16,8],[6,4,0,10,1,15,19,9,5,18,3,16,11,17,7,13,2,12,14,8],[18,4,2,12,17,19,16,8,1,3,11,0,5,7,6,10,14,15,9,13],[16,9,18,8,5,10,7,2,6,17,11,13,14,4,3,19,1,0,12,15],[9,17,15,7,4,18,6,19,1,11,13,0,3,12,16,10,14,8,2,5],[16,12,5,11,2,0,10,19,13,15,4,6,9,7,18,3,17,8,14,1],[14,7,5,10,1,0,13,3,4,11,19,9,17,8,15,6,12,16,2,18],[12,17,9,2,0,11,4,18,15,10,6,7,3,5,13,14,8,16,19,1],[18,14,15,2,1,5,13,6,9,11,10,16,7,0,3,12,17,4,19,8],[7,12,19,4,0,8,14,13,17,2,15,9,5,10,16,6,18,11,3,1],[12,13,18,11,3,7,10,1,14,9,19,4,17,2,16,15,0,5,6,8],[14,18,11,10,8,12,7,15,13,2,17,1,16,4,19,3,5,9,0,6],[13,18,3,15,14,6,7,8,10,2,9,1,4,11,16,12,17,5,0,19],[9,4,12,19,0,16,6,2,14,7,1,5,11,13,18,3,17,10,8,15],[12,8,13,11,1,6,16,17,5,2,15,9,7,4,0,19,10,18,3,14],[0,5,3,19,12,14,16,13,9,10,8,1,4,2,18,6,15,11,7,17],[8,4,5,10,18,2,17,7,12,0,9,11,19,3,15,16,14,1,13,6],[10,9,17,14,16,15,2,12,3,6,19,8,13,11,1,4,0,7,18,5],[0,8,10,4,13,17,2,16,11,14,12,7,5,9,3,6,18,1,15,19],[6,12,9,15,3,1,18,11,0,16,13,17,8,4,7,14,2,10,5,19],[10,6,9,15,18,2,11,1,4,17,13,3,14,16,7,0,5,19,8,12],[3,9,2,6,5,0,10,18,19,12,11,13,8,17,15,1,16,7,4,
14],[2,5,9,18,10,0,17,13,19,4,8,14,7,12,11,6,3,16,15,1],[16,18,13,0,5,15,17,19,3,11,7,8,10,6,9,1,2,12,14,4],[15,12,16,14,0,9,6,2,7,8,17,1,10,11,5,3,4,18,19,13],[13,6,4,11,5,19,17,0,14,18,16,15,3,9,12,7,8,1,10,2],[18,0,5,10,11,2,12,19,7,13,14,16,4,1,3,15,9,6,8,17],[6,11,0,12,14,2,8,13,7,5,16,19,9,4,15,18,10,3,1,17],[18,2,0,12,8,17,9,15,14,6,4,13,3,16,10,5,11,19,1,7],[9,14,1,12,10,11,3,18,6,2,19,0,7,8,15,16,13,4,5,17],[8,1,13,11,12,18,16,5,17,10,19,9,2,3,14,0,4,7,15,6],[18,11,13,19,15,3,6,5,14,4,16,9,0,10,12,7,17,1,8,2],[13,8,1,18,14,11,7,15,17,9,6,19,12,2,16,4,10,3,0,5],[16,3,6,4,1,12,11,13,18,17,5,15,10,9,7,14,8,0,19,2],[17,2,11,12,15,4,14,18,7,1,0,16,3,6,9,19,5,10,8,13],[1,2,0,5,8,14,18,13,17,4,16,9,15,3,10,12,7,19,6,11],[14,2,18,16,3,0,7,1,19,5,4,12,8,13,15,17,9,6,11,10],[2,10,5,8,11,18,19,3,6,15,9,1,16,7,17,12,4,0,14,13],[19,3,16,14,6,11,17,18,8,15,1,4,5,2,9,12,10,0,7,13],[9,7,5,14,0,6,12,4,15,11,2,17,8,19,10,1,16,13,18,3],[11,1,18,17,3,7,9,4,2,16,12,19,10,15,13,5,8,0,14,6],[3,19,16,0,11,1,18,7,12,9,10,14,4,2,6,13,15,5,17,8],[15,12,13,14,9,4,1,6,3,19,18,17,0,16,7,8,5,11,10,2],[6,1,11,19,13,0,12,10,7,14,18,4,3,8,2,17,5,16,15,9],[11,18,2,19,8,6,16,10,12,1,9,5,3,17,15,7,0,4,14,13],[16,5,1,2,15,11,9,12,10,3,8,7,0,17,4,6,13,18,14,19],[8,13,1,0,18,5,12,2,15,16,19,14,9,3,4,11,6,10,7,17],[5,13,1,8,7,12,10,19,0,2,4,15,9,18,3,11,6,16,14,17],[14,10,11,13,4,8,9,18,0,7,16,1,2,12,6,3,5,15,17,19],[19,1,5,4,10,0,6,13,3,8,12,2,17,16,14,7,15,18,11,9],[7,2,16,14,17,0,18,10,11,4,5,15,13,19,12,1,6,9,8,3],[12,7,17,6,11,16,3,9,18,19,2,13,4,14,1,8,10,0,5,15],[9,19,5,17,8,16,6,3,18,4,0,2,1,10,7,11,12,15,13,14],[7,1,11,17,5,13,16,9,14,2,19,6,10,15,4,0,12,18,8,3],[2,0,17,19,1,12,9,6,18,16,3,13,4,7,14,8,15,5,10,11],[1,15,3,16,13,12,10,2,8,4,6,11,14,18,7,17,19,9,5,0],[15,16,17,1,2,4,10,8,18,3,0,6,13,14,11,7,9,19,12,5],[7,10,2,13,4,0,12,6,8,16,14,18,11,17,9,5,1,15,19,3],[4,3,11,18,6,12,13,10,5,1,8,14,19,15,0,2,16,9,7,17],[2,19,17,4,15,14,5,8,18,1,13,0,9,16,3,10,11,6,7,12],[14,13,9,2,16,5,7,0,
3,12,1,6,8,11,4,19,18,10,17,15],[12,3,6,19,14,1,7,15,8,4,2,16,10,5,11,17,0,13,18,9],[7,0,14,10,5,13,3,9,6,17,1,8,18,4,12,2,19,16,15,11],[0,8,6,10,4,15,17,11,1,3,9,19,12,18,2,16,7,13,14,5],[19,14,17,6,16,1,11,2,12,15,5,8,18,4,9,0,7,13,3,10],[16,0,14,5,4,15,12,6,17,9,10,11,18,8,7,2,1,13,19,3],[18,11,0,8,4,14,17,2,1,3,7,6,10,12,13,9,16,19,5,15],[17,6,9,5,10,16,2,8,0,11,3,7,14,15,4,12,1,18,19,13],[15,2,0,13,14,12,8,16,3,19,6,18,9,4,10,11,7,17,5,1],[3,15,19,6,9,7,11,4,12,1,18,13,16,17,8,0,5,14,2,10],[1,13,16,14,10,4,9,15,5,11,18,2,12,0,8,6,3,17,7,19],[13,4,5,7,9,18,8,15,12,16,0,1,19,14,6,11,3,2,10,17],[9,12,0,18,2,19,7,17,13,14,3,8,4,15,10,1,11,16,5,6],[12,18,3,17,9,4,11,16,13,10,14,5,1,0,8,19,2,15,6,7],[9,11,12,16,6,7,4,14,0,18,3,1,15,13,17,10,19,2,8,5],[2,6,15,10,14,12,3,0,1,8,4,7,16,13,11,5,17,18,19,9],[13,8,16,1,11,2,18,3,15,12,14,19,17,6,5,7,9,4,10,0],[15,18,10,7,13,6,3,2,12,14,4,1,9,17,8,19,11,16,0,5],[11,12,14,9,3,4,16,5,6,10,17,18,13,0,8,1,15,19,2,7],[19,6,2,11,5,14,10,13,3,15,4,18,16,7,1,0,12,8,17,9],[19,10,9,15,6,14,16,3,11,4,18,12,0,8,5,7,1,17,13,2],[10,19,4,3,14,13,15,8,16,2,7,0,12,11,17,18,5,6,1,9],[2,12,3,16,6,4,11,13,19,1,18,8,17,9,7,10,0,5,14,15],[0,6,3,11,14,16,1,8,17,7,19,18,12,10,9,13,15,4,2,5],[10,12,3,11,5,0,9,6,8,7,15,13,1,14,4,18,19,17,2,16],[0,19,5,11,16,2,13,4,1,15,7,18,8,3,17,9,10,6,14,12],[7,16,19,14,3,2,15,0,6,4,18,10,8,9,1,12,11,17,5,13],[13,14,10,17,4,16,1,2,6,8,3,5,7,15,0,9,18,19,11,12],[1,2,15,14,16,11,0,9,19,6,7,18,10,3,4,13,8,5,12,17],[2,8,17,11,4,13,10,19,5,12,18,0,3,6,14,7,1,15,9,16],[4,17,6,13,15,9,5,3,8,19,0,12,7,2,16,14,1,18,10,11],[9,8,6,4,14,18,2,1,0,16,5,13,10,15,17,7,12,11,19,3],[2,13,10,6,0,19,1,9,8,11,4,15,17,16,12,18,3,14,7,5],[15,19,7,8,3,6,9,0,13,18,2,12,5,1,11,17,14,10,16,4],[7,1,4,18,0,11,13,3,6,15,17,19,2,12,16,10,9,8,5,14],[13,15,9,2,14,6,12,7,5,8,17,4,3,1,19,16,10,11,18,0],[8,0,13,12,15,16,17,10,2,11,4,5,14,6,1,19,3,7,18,9],[6,9,17,3,10,13,7,5,1,15,14,18,11,19,0,2,8,12,16,4],[13,8,14,7,18,0,2,15,9,6,19,5,10,1,11,4,12,3
,16,17],[12,13,16,10,11,1,3,2,8,15,0,18,17,14,6,9,19,5,4,7],[4,8,0,7,3,12,19,10,13,9,14,2,15,16,11,1,17,5,18,6],[2,15,19,8,10,9,7,3,4,13,17,16,0,6,18,5,14,12,1,11],[3,8,16,17,4,10,14,9,0,7,12,15,13,5,18,2,11,1,6,19],[15,1,12,17,10,2,3,19,11,5,4,13,9,0,7,16,8,18,6,14],[2,17,3,4,15,8,14,19,13,1,10,18,7,11,9,6,12,5,16,0],[19,15,1,14,3,16,17,8,7,11,4,0,18,12,13,10,5,9,6,2],[13,15,5,19,2,9,0,16,14,12,11,7,3,18,8,17,1,4,6,10],[2,12,0,14,4,18,3,16,1,11,13,15,9,17,8,7,6,5,19,10],[3,13,15,4,19,9,18,14,0,10,6,8,11,17,7,16,5,1,12,2],[1,4,14,3,7,2,16,8,6,12,0,10,15,11,19,18,17,13,5,9],[9,19,3,4,6,1,12,17,14,2,10,13,7,18,8,11,16,15,5,0],[13,12,0,10,18,16,2,11,7,9,14,4,3,8,19,6,17,5,15,1],[7,19,15,0,6,5,8,14,18,17,11,3,9,4,12,13,2,16,10,1],[15,3,13,10,18,8,19,12,0,4,2,17,9,1,6,11,7,14,5,16],[4,11,1,18,17,6,3,5,12,13,15,0,2,7,16,19,14,8,9,10],[0,10,11,6,7,5,2,3,13,14,17,1,8,15,9,12,18,4,16,19],[10,18,16,4,5,7,9,12,15,1,11,14,3,0,19,17,6,8,13,2],[14,17,18,11,9,16,1,0,10,2,7,15,5,6,8,13,3,12,4,19],[10,5,13,17,4,18,8,6,3,11,15,1,14,9,2,16,0,12,19,7],[18,3,6,9,10,17,13,7,11,19,14,16,4,5,8,1,0,12,15,2],[5,12,8,3,16,9,7,6,17,1,18,4,2,15,11,14,13,19,10,0],[18,7,17,12,3,4,11,1,9,8,16,5,6,0,13,14,19,15,10,2],[1,7,6,10,18,14,15,8,0,3,4,5,11,12,13,9,17,16,2,19],[4,17,13,14,0,10,11,5,3,18,16,7,12,15,9,2,19,6,1,8],[15,6,18,8,16,4,11,14,7,2,1,9,3,10,5,17,12,13,0,19],[3,9,16,12,10,18,1,17,11,19,6,2,0,14,4,13,7,8,5,15],[9,11,6,17,15,8,1,0,3,2,14,19,13,16,10,12,5,7,18,4],[4,18,11,17,1,14,12,6,3,15,0,8,10,2,9,5,13,16,19,7],[11,13,14,0,2,5,1,4,12,7,9,3,18,8,10,16,17,15,19,6],[6,7,4,14,0,3,11,16,13,18,10,17,8,2,9,12,15,1,5,19],[15,6,5,0,13,9,3,19,1,7,10,18,14,17,11,12,2,8,16,4],[10,13,3,2,6,11,9,14,4,8,0,1,16,15,12,5,17,19,7,18],[11,12,15,19,5,9,7,18,10,14,2,16,3,17,4,13,0,1,6,8],[3,13,7,19,16,5,4,18,11,8,10,6,1,15,14,0,9,17,2,12],[6,19,11,4,17,3,13,15,1,5,8,10,9,14,0,18,16,12,7,2],[18,3,16,8,11,9,15,0,4,13,14,6,17,10,2,7,1,19,12,5],[12,15,7,14,6,13,11,8,16,1,9,4,3,10,5,2,0,17,18,19],[15,11,1,19,18,1
4,17,16,4,5,9,10,3,2,8,0,7,12,6,13],[8,19,1,11,10,17,14,6,15,12,9,4,13,2,3,18,5,16,7,0],[16,1,10,11,6,0,9,2,12,17,5,13,3,15,14,4,7,18,8,19],[6,11,18,7,5,8,19,12,9,1,2,0,15,4,17,13,14,16,10,3],[9,15,17,11,8,7,14,6,2,1,0,19,3,13,18,5,12,10,16,4],[0,1,10,2,19,9,3,7,13,14,12,8,16,6,15,11,5,18,17,4],[16,14,11,7,5,1,4,3,17,6,19,15,12,18,13,2,0,8,9,10],[2,3,1,19,17,13,15,16,7,8,0,9,6,10,12,4,11,14,5,18],[0,6,1,5,8,2,4,19,10,12,13,18,3,14,11,9,16,17,7,15],[1,4,3,19,7,9,0,18,16,10,13,2,15,5,14,12,11,6,8,17],[3,15,6,10,13,17,16,12,5,19,0,1,11,8,14,7,4,2,18,9],[8,1,5,3,11,17,14,18,0,16,13,15,2,10,19,7,9,4,6,12],[4,18,7,19,5,15,3,10,1,9,2,0,16,6,12,13,11,17,14,8],[14,4,12,19,15,13,7,11,17,5,6,9,10,1,0,3,2,18,16,8],[18,6,5,2,17,7,3,4,11,10,1,0,16,8,19,15,13,12,14,9],[13,4,7,17,8,10,0,11,9,2,5,14,12,19,6,16,1,3,18,15]]} diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py index d1a6a93e..42a55a95 100644 --- a/tests/test_methodology_sdid.py +++ b/tests/test_methodology_sdid.py @@ -1534,6 +1534,90 @@ def test_jackknife_se_matches_r(self, r_panel_df): ) assert abs(results.se - self.R_JACKKNIFE_SE) < 1e-10 + def test_placebo_se_matches_r(self, r_panel_df): + """Placebo SE should match R's vcov(method='placebo') under the + same panel and the exact permutation sequence R consumed. + + Loads ``tests/data/sdid_placebo_indices_r.json``, which pins + R's per-rep permutation indices (200 × N0, 0-indexed) and the + SE from R's ``placebo_se`` loop. + + Intercepts ``_placebo_variance_se`` to capture the per-fit args + the dispatcher would have passed (Y_*_n, zetas, init_omega, + init_lambda, etc.), then re-invokes the method directly with + the same args plus the ``_placebo_indices`` test seam carrying + R's permutations. Avoids manually reconstructing the + normalization constants. 
+ + Python uses the same warm-start ``weights.boot$omega = + sum_normalize(omega[ind[1:N0_placebo]])`` that R's + ``placebo_se`` consumes from ``attr(estimate, "weights")`` — + the warm-start matters for finite-iter FW convergence under + R's ``update.omega=TRUE`` semantics, even though the global + FW optimum is init-independent. + + Skip if the fixture file is missing — CI's isolated-install + job copies only ``tests/``, but ``tests/data/`` is included + (this skip is a defensive guard per the existing convention). + """ + import json + import pathlib + + fixture_path = ( + pathlib.Path(__file__).parent + / "data" / "sdid_placebo_indices_r.json" + ) + if not fixture_path.exists(): + pytest.skip( + f"Missing R-parity fixture {fixture_path}; regenerate via " + "`Rscript benchmarks/R/generate_sdid_placebo_parity_fixture.R`." + ) + payload = json.loads(fixture_path.read_text()) + r_se = payload["R_PLACEBO_SE"] + r_perms = np.asarray(payload["R_PERMUTATIONS"], dtype=np.int64) + replications = payload["metadata"]["replications"] + assert r_perms.shape == (replications, self.N0) + + sdid = SyntheticDiD( + variance_method="placebo", + n_bootstrap=replications, + seed=42, + ) + # Capture the dispatcher's call args for `_placebo_variance_se`. + captured: Dict[str, Any] = {} + original_method = sdid._placebo_variance_se + + def capture_then_call(*args, **kwargs): + captured["args"] = args + captured["kwargs"] = kwargs + return original_method(*args, **kwargs) + + sdid._placebo_variance_se = capture_then_call # type: ignore[assignment] + sdid.fit( + r_panel_df, outcome="outcome", treatment="treated", + unit="unit", time="time", + post_periods=[5, 6, 7], + ) + sdid._placebo_variance_se = original_method # type: ignore[assignment] + + # Re-call `_placebo_variance_se` with the SAME normalized inputs + # the dispatcher used, plus R's permutations through the seam. 
+ # The recovered Y_scale rescales the normalized SE back to user + # outcome scale (matches the dispatcher's `se = se_n * Y_scale` + # at synthetic_did.py around L1164). + kwargs = dict(captured["kwargs"]) + kwargs["replications"] = replications + kwargs["_placebo_indices"] = r_perms + se_n, _ = sdid._placebo_variance_se(*captured["args"], **kwargs) + Y_scale = sdid.results_.zeta_omega / kwargs["zeta_omega"] + py_se = se_n * Y_scale + # Match R within cross-library FW tolerance (Rust vs R BLAS + # reductions differ at sub-ULP; 1e-8 absorbs that without + # masking a real divergence). + assert abs(py_se - r_se) < 1e-8, ( + f"Python placebo SE {py_se} != R {r_se} (delta {py_se - r_se})" + ) + # ============================================================================= # Edge Cases @@ -2886,7 +2970,21 @@ class TestScaleEquivariance: # Hard-coded baselines captured pre-fix on a well-scaled panel. If these # drift the fix is not a true no-op on normal data and review is warranted. _BASELINE = { - "placebo": (4.603349837478791, 0.29385822261006445, 0.004975124378109453, 200), + # placebo = R-default warm-start (PR follow-up to #349 R-parity + # work): per-draw FW is initialized with ``sum_normalize( + # unit_weights[pseudo_control_idx])`` for ω and with the + # fit-time ``time_weights`` for λ, matching R's + # ``vcov.R::placebo_se`` ``weights.boot$omega = sum_normalize( + # weights$omega[ind[1:N0_placebo]])`` warm-start. Drift from + # the cold-start capture (0.29385822261006445) is the same + # finite-iter convergence-pattern shift as the bootstrap warm- + # start landed in PR #349 — strict-convexity guarantees the + # converged answer is unique, but the 100-iter pre-sparsify + # pass produces different sparsification under uniform vs warm + # init on a handful of draws. Warm-start matches R at machine + # precision (test_placebo_se_matches_r in + # TestJackknifeSERParity). 
+ "placebo": (4.603349837478791, 0.293840360160448, 0.004975124378109453, 200), # bootstrap = paper-faithful refit with R-default warm-start: FW is # initialized with ``sum_normalize(unit_weights[boot_control_idx])`` # for ω and with the fit-time ``time_weights`` for λ on each draw, From 09a3ef44472a5f069c083bfc2b674ff694830e04 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 08:43:28 -0400 Subject: [PATCH 2/4] =?UTF-8?q?Address=20PR=20#369=20R1=20P3:=20refresh=20?= =?UTF-8?q?placebo=20docstring=20+=20per-draw=20=CF=84=20regression?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - _placebo_variance_se docstring step 3 now describes the warm-start semantics (was still saying "uniform initialization, fresh start" after PR #369 landed the warm-start). Adds Parameters entries for init_omega / init_lambda / _placebo_indices. - test_placebo_se_matches_r now also asserts elementwise match between Python's placebo_effects and R_PLACEBO_TAUS from the fixture, so a permutation that diverged on a single draw but happened to leave sd() unchanged would still trip the regression. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff_diff/synthetic_did.py | 29 ++++++++++++++++++++++++++--- tests/test_methodology_sdid.py | 19 ++++++++++++++++++- 2 files changed, 44 insertions(+), 4 deletions(-) diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py index d6a74ab9..dc7ae03c 100644 --- a/diff_diff/synthetic_did.py +++ b/diff_diff/synthetic_did.py @@ -1709,9 +1709,17 @@ def _placebo_variance_se( 1. Randomly sample N₀ control indices (permutation) 2. Designate last N₁ as pseudo-treated, first (N₀-N₁) as pseudo-controls - 3. Re-estimate both omega and lambda on the permuted data (from - uniform initialization, fresh start), matching R's behavior where - ``update.omega=TRUE, update.lambda=TRUE`` are passed via ``opts`` + 3. 
Re-estimate both omega and lambda on the permuted data with + ``update.omega=TRUE, update.lambda=TRUE`` semantics. Per-draw FW + is warm-started from the fit-time weights — ω is initialized with + ``sum_normalize(init_omega[pseudo_control_idx])`` and λ is + initialized with ``init_lambda`` — matching R's + ``vcov.R::placebo_se`` ``weights.boot$omega = sum_normalize( + weights$omega[ind[1:N0_placebo]])`` warm-start. The global FW + optimum is init-independent (strict convexity), but the 100-iter + pre-sparsify pass converges to different sparsification patterns + under uniform vs warm init on a handful of draws — warm-start + closes the resulting sub-percent SE drift against R. 4. Compute SDID estimate with re-estimated weights 5. Repeat `replications` times 6. SE = sqrt((r-1)/r) * sd(estimates) @@ -1736,6 +1744,21 @@ def _placebo_variance_se( Convergence threshold for Frank-Wolfe (for re-estimation). replications : int, default=200 Number of placebo replications. + init_omega : np.ndarray, optional + Fit-time unit weights used to warm-start per-draw ω FW. + Subset to pseudo-controls and renormalized inside the loop; + mirrors R's ``weights.boot$omega = sum_normalize(weights$omega[ + ind[1:N0_placebo]])``. Cold-start (uniform init) when ``None``. + init_lambda : np.ndarray, optional + Fit-time time weights used to warm-start per-draw λ FW; + mirrors R passing ``weights.boot$lambda = weights$lambda`` + through. Cold-start when ``None``. + _placebo_indices : np.ndarray, optional + Private R-parity test seam. When provided, each row of shape + ``(replications, n_control)`` replaces the per-draw + ``rng.permutation(n_control)`` so a Python fit can consume + R's exact permutation sequence and produce a bit-identical SE + (see ``test_placebo_se_matches_r``). 
Returns ------- diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py index 42a55a95..e70bf0c7 100644 --- a/tests/test_methodology_sdid.py +++ b/tests/test_methodology_sdid.py @@ -1608,15 +1608,32 @@ def capture_then_call(*args, **kwargs): kwargs = dict(captured["kwargs"]) kwargs["replications"] = replications kwargs["_placebo_indices"] = r_perms - se_n, _ = sdid._placebo_variance_se(*captured["args"], **kwargs) + se_n, placebo_effects_n = sdid._placebo_variance_se( + *captured["args"], **kwargs + ) Y_scale = sdid.results_.zeta_omega / kwargs["zeta_omega"] py_se = se_n * Y_scale + py_taus = np.asarray(placebo_effects_n) * Y_scale # Match R within cross-library FW tolerance (Rust vs R BLAS # reductions differ at sub-ULP; 1e-8 absorbs that without # masking a real divergence). assert abs(py_se - r_se) < 1e-8, ( f"Python placebo SE {py_se} != R {r_se} (delta {py_se - r_se})" ) + # Per-draw τ regression: equal-SE doesn't imply equal sample, and + # the placebo τ vector is user-visible through ``placebo_effects`` + # and feeds the empirical placebo p-value (synthetic_did.py + # around L1164-L1170). Compare elementwise so a permutation that + # diverged at a single draw — but happened to leave sd() unchanged + # — still trips the regression. 
+ r_taus = np.asarray(payload["R_PLACEBO_TAUS"], dtype=float) + assert r_taus.shape == py_taus.shape == (replications,), ( + f"shape mismatch: r {r_taus.shape}, py {py_taus.shape}" + ) + np.testing.assert_allclose( + py_taus, r_taus, atol=1e-8, rtol=1e-8, + err_msg="Per-draw placebo τ diverges from R despite SE match", + ) # ============================================================================= From fd96d08ad1c9325f516a9ee885a461cc337184f3 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 09:31:36 -0400 Subject: [PATCH 3/4] Repin TestScaleEquivariance[placebo] baseline to Linux/OpenBLAS value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Swap the captured SE from 0.293840360160448 (macOS Apple Accelerate local) to 0.2938403592163006 (Linux OpenBLAS, ubuntu-24.04-arm CI runner). The warm-start that landed in the prior commit threads ``unit_weights`` (a fit-time FW output that carries sub-ULP BLAS reduction-order divergence) into per-draw FW init; across 200 draws with path-dependent sparsification the SE diverges ~1e-9 between the two platforms — exceeding the existing ``1e-12`` bit-identity gate. CI is the gating surface. macOS local fits will now drift at ~1e-9 on this fixture; the inline comment documents that the delta is finite-iter FW path-dependence, not a numerical regression. ATT and empirical p-value are unchanged: ATT comes from the deterministic FW solve (platform-stable to <1e-14 on this panel) and the placebo p-value is integer-driven (1/(B+1) = 1/201 with no placebo τ exceeding |ATT|). 
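The integer-driven p-value claim above can be sketched directly. A hedged illustration: the helper name and the exact tie-handling convention are assumptions, though the 1/(B+1) floor reproduces the pinned 0.004975124378109453 value.

```python
import numpy as np

def empirical_placebo_pvalue(att, placebo_taus):
    # Two-sided empirical p-value: fraction of placebo draws whose
    # |tau| meets or exceeds |ATT|, with the +1 "count the observed
    # statistic" correction -- so the floor is 1/(B+1).
    b = len(placebo_taus)
    exceed = np.sum(np.abs(placebo_taus) >= np.abs(att))
    return (1 + exceed) / (b + 1)

# With B = 200 draws and no placebo tau exceeding |ATT| ~ 4.6, the
# p-value sits at its floor, 1/201.
rng = np.random.default_rng(0)
taus = rng.normal(0.0, 0.3, size=200)  # all far below |ATT|
p = empirical_placebo_pvalue(4.603349837478791, taus)
assert np.isclose(p, 1 / 201)
```

Because the count of exceedances is an integer, ~1e-9 SE drift cannot move the p-value, which is why only the SE baseline needed repinning.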
Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_methodology_sdid.py | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py index e70bf0c7..c193849e 100644 --- a/tests/test_methodology_sdid.py +++ b/tests/test_methodology_sdid.py @@ -3000,8 +3000,16 @@ class TestScaleEquivariance: # pass produces different sparsification under uniform vs warm # init on a handful of draws. Warm-start matches R at machine # precision (test_placebo_se_matches_r in - # TestJackknifeSERParity). - "placebo": (4.603349837478791, 0.293840360160448, 0.004975124378109453, 200), + # TestJackknifeSERParity). Capture: Linux/OpenBLAS (CI runner) — + # warm-start carries ``unit_weights`` (fit-time FW output) into + # per-draw init, which is platform-divergent at sub-ULP from + # BLAS reduction order; across 200 draws with path-dependent + # sparsification the SE diverges ~1e-9 between Apple Accelerate + # (macOS local: 0.293840360160448) and OpenBLAS (Linux CI: + # 0.2938403592163006). Linux value pinned because CI is the + # gating surface; macOS local fits will drift at ~1e-9 — that + # delta is finite-iter FW path-dependence, not a numerical bug. + "placebo": (4.603349837478791, 0.2938403592163006, 0.004975124378109453, 200), # bootstrap = paper-faithful refit with R-default warm-start: FW is # initialized with ``sum_normalize(unit_weights[boot_control_idx])`` # for ω and with the fit-time ``time_weights`` for λ on each draw, From db377e6b2887a6076dfe252aaaa45b6e8ec52a76 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 10:12:21 -0400 Subject: [PATCH 4/4] Loosen TestScaleEquivariance[placebo] SE tolerance to rel=1e-7 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The placebo SE warm-start that landed in PR #369 threads ``unit_weights`` (a fit-time FW output that carries sub-ULP BLAS reduction-order divergence) into each per-draw FW init. 
Across 200 placebo draws with path-dependent sparsification in the 100-iter pre-sparsify pass, that ULP-level input difference accumulates to ~1e-9 SE divergence between Apple Accelerate (macOS) and OpenBLAS (Linux). No single double satisfies both at the prior ``1e-12`` gate. The placebo row's SE assertion is loosened to ``rel=1e-7`` (drift detector, not bit-identity). Bootstrap and jackknife stay at ``rel=1e-14``: bootstrap dilutes the divergence by resampling from the full unit set with replacement; jackknife uses fixed weights and no FW re-estimation. Bit-identity protection for placebo moves to ``test_placebo_se_matches_r`` (``TestJackknifeSERParity``), which uses the ``_placebo_indices`` test seam to feed R's exact permutations through the same normalized inputs the dispatcher would, bypassing the platform-divergent fit-time path. That test asserts both aggregate SE (< 1e-8 vs R) and per-draw τ (< 1e-8 elementwise vs R), which is strictly stronger than the prior ``1e-12`` capture-vs-capture gate. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_methodology_sdid.py | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py index c193849e..5ea986ff 100644 --- a/tests/test_methodology_sdid.py +++ b/tests/test_methodology_sdid.py @@ -2995,20 +2995,19 @@ class TestScaleEquivariance: # weights$omega[ind[1:N0_placebo]])`` warm-start. Drift from # the cold-start capture (0.29385822261006445) is the same # finite-iter convergence-pattern shift as the bootstrap warm- - # start landed in PR #349 — strict-convexity guarantees the - # converged answer is unique, but the 100-iter pre-sparsify - # pass produces different sparsification under uniform vs warm - # init on a handful of draws. Warm-start matches R at machine + # start landed in PR #349. Warm-start matches R at machine # precision (test_placebo_se_matches_r in - # TestJackknifeSERParity). 
Capture: Linux/OpenBLAS (CI runner) — - # warm-start carries ``unit_weights`` (fit-time FW output) into - # per-draw init, which is platform-divergent at sub-ULP from - # BLAS reduction order; across 200 draws with path-dependent - # sparsification the SE diverges ~1e-9 between Apple Accelerate + # TestJackknifeSERParity). Captured value is platform-divergent + # at sub-ULP — warm-start carries ``unit_weights`` (fit-time + # FW output) into per-draw init, and BLAS reduction-order + # differences accumulate across 200 draws with path-dependent + # sparsification to ~1e-9 SE divergence between Apple Accelerate # (macOS local: 0.293840360160448) and OpenBLAS (Linux CI: - # 0.2938403592163006). Linux value pinned because CI is the - # gating surface; macOS local fits will drift at ~1e-9 — that - # delta is finite-iter FW path-dependence, not a numerical bug. + # 0.2938403592163006). The placebo row's SE assertion is + # therefore loosened to ``rel=1e-7`` below; bit-identity + # protection moves to ``test_placebo_se_matches_r``, which + # bypasses the platform-divergent fit-time path through a test + # seam. Pinned value is the Linux/OpenBLAS capture. "placebo": (4.603349837478791, 0.2938403592163006, 0.004975124378109453, 200), # bootstrap = paper-faithful refit with R-default warm-start: FW is # initialized with ``sum_normalize(unit_weights[boot_control_idx])`` @@ -3065,7 +3064,14 @@ def test_baseline_parity_small_scale(self, variance_method): warnings.simplefilter("ignore", UserWarning) r = self._fit(data, variance_method) assert r.att == pytest.approx(att0, rel=1e-14) - assert r.se == pytest.approx(se0, rel=1e-14) + # Placebo SE is warm-start-driven (PR #369) and the per-draw FW + # init carries platform-divergent ``unit_weights``; ~1e-9 macOS- + # vs-Linux drift on 200 draws makes bit-identity unattainable. + # Bit-identity protection moves to ``test_placebo_se_matches_r`` + # (TestJackknifeSERParity). 
Bootstrap and jackknife stay at the + # original gate; their paths don't accumulate the same way. + se_rel = 1e-7 if variance_method == "placebo" else 1e-14 + assert r.se == pytest.approx(se0, rel=se_rel) assert r.p_value == pytest.approx(p0, rel=1e-14) assert len(r.placebo_effects) == n0