
Fix SyntheticDiD catastrophic cancellation at extreme Y scale #312

Merged
igerber merged 1 commit into main from sdid-scale-fix on Apr 18, 2026

Conversation

@igerber (Owner) commented Apr 18, 2026

Summary

  • Internally centers and scales Y once in SyntheticDiD.fit() using mean/std of Y_pre_control; threads normalized Y and normalized zetas through weight optimization, the estimator, and all three variance procedures; then rescales τ/SE/CI/effect vectors back to original outcome units.
  • Mathematically a no-op (τ is location-invariant and scale-equivariant; Frank-Wolfe argmin is invariant under (Y, ζ) → (Y/s, ζ/s)), but prevents ~6-digit precision loss in the SDID double-difference when outcomes span millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of true effect size. Matches pre-existing results on normally-scaled data to float epsilon (rel ≲ 1e-14).
  • zeta_omega, zeta_lambda, and noise_level on the results object are reported on the user's original outcome scale, preserving the behavior of in_time_placebo() and sensitivity_to_zeta_omega() which operate on the stored original-scale fit snapshot.
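The normalize-then-rescale contract described above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`normalize_fit`, `rescale_results`), not the library's actual `fit()` internals:

```python
import numpy as np

def normalize_fit(Y, Y_pre_control, zeta_omega, zeta_lambda):
    """Center/scale Y once using the pre-period control block
    (hypothetical helper mirroring the contract above)."""
    shift = Y_pre_control.mean()
    scale = Y_pre_control.std()
    if scale == 0.0:          # degenerate flat panel: skip scaling
        scale = 1.0
    Y_norm = (Y - shift) / scale
    # zetas live on the outcome scale, so they divide by the same factor
    return Y_norm, zeta_omega / scale, zeta_lambda / scale, shift, scale

def rescale_results(att_norm, se_norm, scale):
    """Map tau and its SE back to original outcome units; the shift
    cancels in the double-difference, so only the scale matters."""
    return att_norm * scale, se_norm * scale
```

CIs and effect vectors rescale by the same factor; only `scale`, never `shift`, enters the back-transformation of differences.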

Methodology references (required if estimator / math changes)

  • Method: Synthetic Difference-in-Differences (Arkhangelsky et al., 2021) — Algorithms 3 (jackknife) and 4 (placebo), plus the bootstrap path matching synthdid::vcov.
  • Intentional deviation: internal Y normalization is not performed by R's synthdid; the R package has the same extreme-scale vulnerability (synth-inference/synthdid#71, "Sensitivity to the scaling of variables"). Documented in docs/methodology/REGISTRY.md under the SyntheticDiD edge-case list with a **Note:** label.
  • Scope limitation: in_time_placebo() and sensitivity_to_zeta_omega() continue to run on original-scale snapshot and zetas (pre-fix behavior preserved). Documented in the same REGISTRY note as a tracked follow-up.

Validation

  • Tests added in tests/test_methodology_sdid.py::TestScaleEquivariance:
    • test_baseline_parity_small_scale[placebo|bootstrap|jackknife] — hard-coded literal baselines asserted at rel=1e-8 to catch any drift from the no-op guarantee.
    • test_scale_equivariance[placebo|bootstrap|jackknife] — refits on a*Y + b for (a, b) ∈ {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)} and asserts len(placebo_effects) parity, att/a, se/|a|, p_value, and noise_level/zeta_omega scaling.
    • test_detects_true_effect_at_extreme_scale[...] — generates a panel with Y ~ 1e9 and known ATT = 5e6 (0.5% of baseline, below pre-fix 6-digit precision floor); asserts detectable z-stat and correct p-value.
  • All 116 pre-existing SDID methodology tests pass unchanged — no tolerance relaxation.
  • 156 SDID-related tests across methodology, survey phases 5/6, SE accuracy, and power all green; 203 additional estimator/alias/visualization tests also green.
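The scale-equivariance property the tests above assert can be seen on a toy fixed-weight double-difference. This is a sketch, not the library's estimator: `sdid_double_difference` is a hypothetical stand-in with simplex weights held fixed rather than re-solved:

```python
import numpy as np

def sdid_double_difference(Y, omega, lam, n_pre, n_treat):
    """Toy SDID point estimate with *fixed* simplex weights: the
    post-minus-pre change for treated units minus the omega-weighted
    post-minus-pre change for controls."""
    control, treated = Y[:-n_treat], Y[-n_treat:]
    pre_c = control[:, :n_pre] @ lam          # lambda-weighted pre period
    post_c = control[:, n_pre:].mean(axis=1)  # simple post-period average
    pre_t = treated[:, :n_pre] @ lam
    post_t = treated[:, n_pre:].mean(axis=1)
    return (post_t.mean() - pre_t.mean()) - omega @ (post_c - pre_c)
```

Because both weight vectors sum to one, the intercept b cancels and tau scales by a under Y -> a*Y + b, which is exactly the (a, b) grid asserted in test_scale_equivariance.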

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Normalize Y once in fit() using mean/std of Y_pre_control, thread
normalized Y (and zetas) through weight optimization, the estimator,
and variance procedures, then rescale tau/SE/CI/effect vectors back
to original units.

Mathematically a no-op (tau is location-invariant and scale-equivariant;
FW weights are invariant under (Y, zeta) -> (Y/s, zeta/s)), but prevents
~6-digit precision loss in the SDID double-difference when outcomes span
millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of
true effect size. Matches pre-existing behavior on normally-scaled data
to float epsilon.
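The precision floor referenced here is plain float64 behavior; a minimal illustration of the digits lost when a small quantity round-trips through a ~1e9 offset, as happens inside the double-difference when outcomes live at billions scale:

```python
import numpy as np

x = 0.123456789012345
lost = (x + 1e9) - 1e9    # round-trip through a billions-scale offset

assert np.spacing(1e9) > 1e-7   # float64 resolves only ~1.2e-7 steps near 1e9
assert lost != x                # so the low digits of x are gone...
assert abs(lost - x) < 1e-6     # ...leaving roughly 7 significant digits
```

Centering first keeps intermediate values near zero, where the full 15-16 digits of float64 are available.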

Zetas and noise_level are reported to users on the original outcome
scale, preserving pre-fix behavior of in_time_placebo() and
sensitivity_to_zeta_omega() which operate on the stored original-scale
snapshot.

REGISTRY.md adds a Note documenting the internal normalization. New
TestScaleEquivariance class covers baseline parity, scale equivariance
across (a, b) in {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)},
and detection of a known effect at Y~1e9 base scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • Affected methodology: SyntheticDiD.fit() plus all three variance paths (bootstrap, jackknife, placebo).
  • The internal Y normalization is explicitly documented as a deviation in docs/methodology/REGISTRY.md#L1518, so it is informational rather than a defect.
  • I did not find an unmitigated mismatch with the SDID method mapping: fixed-weight jackknife is preserved, placebo still re-estimates omega/lambda, and user-facing ATT/SE/CI/effect vectors are rescaled back to original units.
  • The unchanged original-scale behavior of in_time_placebo() and sensitivity_to_zeta_omega() is also documented, so it is not a blocker.
  • I could not execute the tests locally because this environment does not have numpy or pytest; the review below is based on static inspection.

Methodology

Code Quality

  • No findings. The diff still routes inference through safe_inference at diff_diff/synthetic_did.py#L624, so it does not introduce the inline-inference or partial-NaN anti-patterns.

Performance

  • No findings. The temporary normalized matrices created at diff_diff/synthetic_did.py#L439 increase peak memory during fit(), but I do not see a material algorithmic regression in scope.

Maintainability

Tech Debt

  • No findings. I did not identify new untracked deferred work in the changed code.

Security

  • No findings. The diff adds no new security surface, and I did not see secrets or PII in the changed files.

Documentation/Tests

  • P3 Informational: The added tests in tests/test_methodology_sdid.py#L2148 cover baseline parity, scale equivariance, and extreme-scale detectability across placebo/bootstrap/jackknife, which is the right test surface for this change. Concrete fix: none.
  • Reviewer note: test execution was not possible here because numpy and pytest are unavailable in the current environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 18, 2026
@igerber igerber merged commit f5b8c8a into main Apr 18, 2026
40 of 42 checks passed
@igerber igerber deleted the sdid-scale-fix branch April 18, 2026 14:48
igerber added a commit that referenced this pull request Apr 18, 2026
Absorbs #312 (sdid scale fix), #313 (roadmap refresh + dCDH docstring
rewrite), and #314 (within_transform convergence warnings) from main.

Conflicts resolved in:
- diff_diff/chaisemartin_dhaultfoeuille.py: took main's comprehensive
  Phase 1-3 feature list in the class docstring but merged in the
  PR #311 group-vs-PSU bootstrap-clustering framing and the
  replicate-weight survey-support line. Kept the PR #311
  'user-specified cluster= not supported + automatic PSU-level under
  survey_design' wording for the cluster= parameter docstring (strictly
  more accurate than main's 'always clusters at the group level' text).
- diff_diff/guides/llms-full.txt: kept main's more detailed placebo SE
  contract paragraph (which already distinguishes single-period NaN
  from multi-horizon analytical/bootstrap) and appended the sup-t /
  shared-weights / cross-horizon coverage details from the PR #311
  update. Kept the PR #311 survey_design signature comment that
  mentions TSL + replicate + PSU bootstrap.

Full regression across touched areas: 336 + 324 passing.
igerber added a commit that referenced this pull request Apr 18, 2026
… remove deprecated SyntheticDiD params

Package four merged PRs (#312 SDID catastrophic cancellation at extreme Y scale,
#313 roadmap refresh, #314 FE imputation non-convergence signaling, #315 Frank-Wolfe
SC weight solver non-convergence signaling) as 3.1.2. Also remove the
SyntheticDiD(lambda_reg=...) and SyntheticDiD(zeta=...) kwargs, which have been
deprecated with DeprecationWarning since v2.3.1 (2026-02-10) in favor of
zeta_omega / zeta_lambda; their warning messages announced removal in v3.1. Passing
the old kwargs now raises TypeError at __init__ and ValueError: Unknown parameter
at set_params. Internal ridge-regression helpers that accept a lambda_reg parameter
(compute_synthetic_weights, rank_control_units, Rust FFI bindings) are unaffected.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
and diff_diff/guides/llms-full.txt. CHANGELOG populated with Fixed / Changed /
Removed sections and comparison-link footer. TODO.md's "Deprecated Code" entry
removed now that the task is done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 18, 2026
Per CI review feedback (#316): removing public kwargs under a patch version
violates Semantic Versioning, which CHANGELOG.md explicitly claims to adhere to.
Restore lambda_reg and zeta handling in SyntheticDiD.__init__ and set_params
as warning-only, and bump the removal target in the DeprecationWarning text
from "v3.1" to "v4.0.0". The 3.1.2 release now carries only the four fix/doc
PRs (#312 SDID scale, #313 roadmap, #314 FE imputation convergence, #315
Frank-Wolfe convergence) with no breaking changes.

- diff_diff/synthetic_did.py: restore deprecated kwargs + warnings (v4.0.0 text)
- tests/test_methodology_sdid.py: restore TestDeprecatedParams class + set_params deprecation test
- tests/test_estimators.py: restore test_deprecated_params
- CHANGELOG.md: drop Removed section; add Changed entry documenting the v3.1 -> v4.0.0 bump in the removal target
- TODO.md: restore Deprecated Code section with v4.0.0 removal target and SemVer rationale

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
PR #312 centered and scaled Y once in SyntheticDiD.fit() to avoid
catastrophic cancellation in the SDID double-difference at extreme Y.
The fix did not reach the two public diagnostic methods on
SyntheticDiDResults: in_time_placebo() and sensitivity_to_zeta_omega()
both re-run Frank-Wolfe on the original-scale fit snapshot with
original-scale zetas, so at Y ~ 1e9 they reproduce the same silent
failure the main path was fixed for.

Capture Y_shift and Y_scale on the fit snapshot, apply the same
(Y - shift) / scale normalization inside the two diagnostic methods,
pass zeta / Y_scale and min_decrease derived from the normalized
noise level to the FW weight solvers, and rescale att / pre_fit_rmse
back to original-Y units before reporting. Unit-weight diagnostics
(max_unit_weight, effective_n) are scale-invariant and reported
without rescaling.
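The pattern described above might be sketched like this. It is a hypothetical wrapper: `run_fw` stands in for the Frank-Wolfe weight solver, and the real diagnostics also derive min_decrease from the normalized noise level:

```python
import numpy as np

def in_time_placebo_scaled(Y, shift, scale, zeta, run_fw):
    """Re-run the weight solver on the normalized panel using the
    stored fit-snapshot (shift, scale), then map ATT and RMSE back
    to original outcome units (both are scale-equivariant)."""
    Y_norm = (Y - shift) / scale
    att_norm, rmse_norm = run_fw(Y_norm, zeta / scale)
    return att_norm * scale, rmse_norm * scale
```

Unit-weight diagnostics such as max_unit_weight and effective_n depend only on the weights, so they need no rescaling.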

Regression coverage:
- TestDiagnosticScaleParity: att / pre_fit_rmse must scale by |a|
  across (Y -> a*Y + b) for both diagnostic methods; a no-effect
  DGP at Y ~ 1e9 must produce finite placebo atts landing within
  5 * noise_level.
- TestHeterogeneousAndRampingScale: cross-unit heterogeneous scale
  (units spanning 1e6 to 1e9) and cross-period ramping (trend
  growing 4 orders of magnitude across periods) must leave the fit
  and both diagnostic surfaces finite. These are the heterogeneity
  pathways the original TestScaleEquivariance (affine-only) suite
  did not cover.
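One way such a heterogeneous/ramping panel might be generated (an assumed sketch; the suite's actual DGP may differ):

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 8, 12
# per-unit base levels spanning 1e6 .. 1e9 (cross-unit heterogeneous scale)
levels = np.logspace(6, 9, n_units)[:, None]
# a common trend growing ~4 orders of magnitude across periods (ramping)
ramp = np.logspace(0, 4, n_periods)[None, :]
Y = levels + ramp + rng.normal(size=(n_units, n_periods))

assert np.isfinite(Y).all()
unit_means = Y.mean(axis=1)
assert unit_means.max() / unit_means.min() > 1e2   # genuinely mixed scales
```

A single global (shift, scale) cannot flatten this panel, so finiteness of the fit and both diagnostic surfaces is the meaningful assertion here, not exact equivariance.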

Covered by audit axis A (numerical precision / scale fragility).
Findings D-4 and D-4b from docs/audits/silent-failures-findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
Extend PR #312 Y-normalization contract into SDID diagnostic methods