
Fix SyntheticDiD catastrophic cancellation at extreme Y scale #312

Merged
igerber merged 1 commit into main from sdid-scale-fix on Apr 18, 2026

Conversation

@igerber (Owner) commented Apr 18, 2026

Summary

  • Internally centers and scales Y once in SyntheticDiD.fit() using mean/std of Y_pre_control; threads normalized Y and normalized zetas through weight optimization, the estimator, and all three variance procedures; then rescales τ/SE/CI/effect vectors back to original outcome units.
  • Mathematically a no-op (τ is location-invariant and scale-equivariant; Frank-Wolfe argmin is invariant under (Y, ζ) → (Y/s, ζ/s)), but prevents ~6-digit precision loss in the SDID double-difference when outcomes span millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of true effect size. Matches pre-existing results on normally-scaled data to float epsilon (rel ≲ 1e-14).
  • zeta_omega, zeta_lambda, and noise_level on the results object are reported on the user's original outcome scale, preserving the behavior of in_time_placebo() and sensitivity_to_zeta_omega() which operate on the stored original-scale fit snapshot.
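The normalize-then-rescale contract described above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`normalize_fit`, `rescale_results`), not the library's actual `fit()` internals:

```python
import numpy as np

def normalize_fit(Y, Y_pre_control, zeta_omega, zeta_lambda):
    """Center/scale Y once using the pre-period control block
    (hypothetical helper mirroring the contract above)."""
    shift = Y_pre_control.mean()
    scale = Y_pre_control.std()
    if scale == 0.0:          # degenerate flat panel: skip scaling
        scale = 1.0
    Y_norm = (Y - shift) / scale
    # zetas live on the outcome scale, so they divide by the same factor
    return Y_norm, zeta_omega / scale, zeta_lambda / scale, shift, scale

def rescale_results(att_norm, se_norm, scale):
    """Map tau and its SE back to original outcome units; the shift
    cancels in the double-difference, so only the scale matters."""
    return att_norm * scale, se_norm * scale
```

CIs and effect vectors rescale by the same factor; only `scale`, never `shift`, enters the back-transformation of differences.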

Methodology references (required if estimator / math changes)

  • Method: Synthetic Difference-in-Differences (Arkhangelsky et al., 2021) — Algorithms 3 (jackknife) and 4 (placebo), plus the bootstrap path matching synthdid::vcov.
  • Intentional deviation: internal Y normalization is not performed by R's synthdid; the R package has the same extreme-scale vulnerability (synth-inference/synthdid#71, "Sensitivity to the scaling of variables"). Documented in docs/methodology/REGISTRY.md under the SyntheticDiD edge-case list with a **Note:** label.
  • Scope limitation: in_time_placebo() and sensitivity_to_zeta_omega() continue to run on original-scale snapshot and zetas (pre-fix behavior preserved). Documented in the same REGISTRY note as a tracked follow-up.

Validation

  • Tests added in tests/test_methodology_sdid.py::TestScaleEquivariance:
    • test_baseline_parity_small_scale[placebo|bootstrap|jackknife] — hard-coded literal baselines asserted at rel=1e-8 to catch any drift from the no-op guarantee.
    • test_scale_equivariance[placebo|bootstrap|jackknife] — refits on a*Y + b for (a, b) ∈ {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)} and asserts len(placebo_effects) parity, att/a, se/|a|, p_value, and noise_level/zeta_omega scaling.
    • test_detects_true_effect_at_extreme_scale[...] — generates a panel with Y ~ 1e9 and known ATT = 5e6 (0.5% of baseline, below pre-fix 6-digit precision floor); asserts detectable z-stat and correct p-value.
  • All 116 pre-existing SDID methodology tests pass unchanged — no tolerance relaxation.
  • 156 SDID-related tests across methodology, survey phases 5/6, SE accuracy, and power all green; 203 additional estimator/alias/visualization tests also green.
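The scale-equivariance property the tests above assert can be seen on a toy fixed-weight double-difference. This is a sketch, not the library's estimator: `sdid_double_difference` is a hypothetical stand-in with simplex weights held fixed rather than re-solved:

```python
import numpy as np

def sdid_double_difference(Y, omega, lam, n_pre, n_treat):
    """Toy SDID point estimate with *fixed* simplex weights: the
    post-minus-pre change for treated units minus the omega-weighted
    post-minus-pre change for controls."""
    control, treated = Y[:-n_treat], Y[-n_treat:]
    pre_c = control[:, :n_pre] @ lam          # lambda-weighted pre period
    post_c = control[:, n_pre:].mean(axis=1)  # simple post-period average
    pre_t = treated[:, :n_pre] @ lam
    post_t = treated[:, n_pre:].mean(axis=1)
    return (post_t.mean() - pre_t.mean()) - omega @ (post_c - pre_c)
```

Because both weight vectors sum to one, the intercept b cancels and tau scales by a under Y -> a*Y + b, which is exactly the (a, b) grid asserted in test_scale_equivariance.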

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Normalize Y once in fit() using mean/std of Y_pre_control, thread
normalized Y (and zetas) through weight optimization, the estimator,
and variance procedures, then rescale tau/SE/CI/effect vectors back
to original units.

Mathematically a no-op (tau is location-invariant and scale-equivariant;
FW weights are invariant under (Y, zeta) -> (Y/s, zeta/s)), but prevents
~6-digit precision loss in the SDID double-difference when outcomes span
millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of
true effect size. Matches pre-existing behavior on normally-scaled data
to float epsilon.
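The precision floor referenced here is plain float64 behavior; a minimal illustration of the digits lost when a small quantity round-trips through a ~1e9 offset, as happens inside the double-difference when outcomes live at billions scale:

```python
import numpy as np

x = 0.123456789012345
lost = (x + 1e9) - 1e9    # round-trip through a billions-scale offset

assert np.spacing(1e9) > 1e-7   # float64 resolves only ~1.2e-7 steps near 1e9
assert lost != x                # so the low digits of x are gone...
assert abs(lost - x) < 1e-6     # ...leaving roughly 7 significant digits
```

Centering first keeps intermediate values near zero, where the full 15-16 digits of float64 are available.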

Zetas and noise_level are reported to users on the original outcome
scale, preserving pre-fix behavior of in_time_placebo() and
sensitivity_to_zeta_omega() which operate on the stored original-scale
snapshot.

REGISTRY.md adds a Note documenting the internal normalization. New
TestScaleEquivariance class covers baseline parity, scale equivariance
across (a, b) in {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)},
and detection of a known effect at Y~1e9 base scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • Affected methodology: SyntheticDiD.fit() plus all three variance paths (bootstrap, jackknife, placebo).
  • The internal Y normalization is explicitly documented as a deviation in docs/methodology/REGISTRY.md#L1518, so it is informational rather than a defect.
  • I did not find an unmitigated mismatch with the SDID method mapping: fixed-weight jackknife is preserved, placebo still re-estimates omega/lambda, and user-facing ATT/SE/CI/effect vectors are rescaled back to original units.
  • The unchanged original-scale behavior of in_time_placebo() and sensitivity_to_zeta_omega() is also documented, so it is not a blocker.
  • I could not execute the tests locally because this environment does not have numpy or pytest; the review below is based on static inspection.

Methodology

Code Quality

  • No findings. The diff still routes inference through safe_inference at diff_diff/synthetic_did.py#L624, so it does not introduce the inline-inference or partial-NaN anti-patterns.

Performance

  • No findings. The temporary normalized matrices created at diff_diff/synthetic_did.py#L439 increase peak memory during fit(), but I do not see a material algorithmic regression in scope.

Maintainability

Tech Debt

  • No findings. I did not identify new untracked deferred work in the changed code.

Security

  • No findings. The diff adds no new security surface, and I did not see secrets or PII in the changed files.

Documentation/Tests

  • P3 Informational: The added tests in tests/test_methodology_sdid.py#L2148 cover baseline parity, scale equivariance, and extreme-scale detectability across placebo/bootstrap/jackknife, which is the right test surface for this change. Concrete fix: none.
  • Reviewer note: test execution was not possible here because numpy and pytest are unavailable in the current environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 18, 2026
@igerber igerber merged commit f5b8c8a into main Apr 18, 2026
40 of 42 checks passed
@igerber igerber deleted the sdid-scale-fix branch April 18, 2026 14:48
igerber added a commit that referenced this pull request Apr 18, 2026
Absorbs #312 (sdid scale fix), #313 (roadmap refresh + dCDH docstring
rewrite), and #314 (within_transform convergence warnings) from main.

Conflicts resolved in:
- diff_diff/chaisemartin_dhaultfoeuille.py: took main's comprehensive
  Phase 1-3 feature list in the class docstring but merged in the
  PR #311 group-vs-PSU bootstrap-clustering framing and the
  replicate-weight survey-support line. Kept the PR #311
  'user-specified cluster= not supported + automatic PSU-level under
  survey_design' wording for the cluster= parameter docstring (strictly
  more accurate than main's 'always clusters at the group level' text).
- diff_diff/guides/llms-full.txt: kept main's more detailed placebo SE
  contract paragraph (which already distinguishes single-period NaN
  from multi-horizon analytical/bootstrap) and appended the sup-t /
  shared-weights / cross-horizon coverage details from the PR #311
  update. Kept the PR #311 survey_design signature comment that
  mentions TSL + replicate + PSU bootstrap.

Full regression across touched areas: 336 + 324 passing.
igerber added a commit that referenced this pull request Apr 18, 2026
… remove deprecated SyntheticDiD params

Package four merged PRs (#312 SDID catastrophic cancellation at extreme Y scale,
#313 roadmap refresh, #314 FE imputation non-convergence signaling, #315 Frank-Wolfe
SC weight solver non-convergence signaling) as 3.1.2. Also remove the
SyntheticDiD(lambda_reg=...) and SyntheticDiD(zeta=...) kwargs, which have been
deprecated with DeprecationWarning since v2.3.1 (2026-02-10) in favor of
zeta_omega / zeta_lambda; their warning messages announced removal in v3.1. Passing
the old kwargs now raises TypeError at __init__ and ValueError: Unknown parameter
at set_params. Internal ridge-regression helpers that accept a lambda_reg parameter
(compute_synthetic_weights, rank_control_units, Rust FFI bindings) are unaffected.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
and diff_diff/guides/llms-full.txt. CHANGELOG populated with Fixed / Changed /
Removed sections and comparison-link footer. TODO.md's "Deprecated Code" entry
removed now that the task is done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 18, 2026
Per CI review feedback (#316): removing public kwargs under a patch version
violates Semantic Versioning, which CHANGELOG.md explicitly claims to adhere to.
Restore lambda_reg and zeta handling in SyntheticDiD.__init__ and set_params
as warning-only, and bump the removal target in the DeprecationWarning text
from "v3.1" to "v4.0.0". The 3.1.2 release now carries only the four fix/doc
PRs (#312 SDID scale, #313 roadmap, #314 FE imputation convergence, #315
Frank-Wolfe convergence) with no breaking changes.

- diff_diff/synthetic_did.py: restore deprecated kwargs + warnings (v4.0.0 text)
- tests/test_methodology_sdid.py: restore TestDeprecatedParams class + set_params deprecation test
- tests/test_estimators.py: restore test_deprecated_params
- CHANGELOG.md: drop Removed section; add Changed entry documenting the v3.1 -> v4.0.0 bump in the removal target
- TODO.md: restore Deprecated Code section with v4.0.0 removal target and SemVer rationale

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
PR #312 centered and scaled Y once in SyntheticDiD.fit() to avoid
catastrophic cancellation in the SDID double-difference at extreme Y.
The fix did not reach the two public diagnostic methods on
SyntheticDiDResults: in_time_placebo() and sensitivity_to_zeta_omega()
both re-run Frank-Wolfe on the original-scale fit snapshot with
original-scale zetas, so at Y ~ 1e9 they reproduce the same silent
failure the main path was fixed for.

Capture Y_shift and Y_scale on the fit snapshot, apply the same
(Y - shift) / scale normalization inside the two diagnostic methods,
pass zeta / Y_scale and min_decrease derived from the normalized
noise level to the FW weight solvers, and rescale att / pre_fit_rmse
back to original-Y units before reporting. Unit-weight diagnostics
(max_unit_weight, effective_n) are scale-invariant and reported
without rescaling.
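The pattern described above might be sketched like this. It is a hypothetical wrapper: `run_fw` stands in for the Frank-Wolfe weight solver, and the real diagnostics also derive min_decrease from the normalized noise level:

```python
import numpy as np

def in_time_placebo_scaled(Y, shift, scale, zeta, run_fw):
    """Re-run the weight solver on the normalized panel using the
    stored fit-snapshot (shift, scale), then map ATT and RMSE back
    to original outcome units (both are scale-equivariant)."""
    Y_norm = (Y - shift) / scale
    att_norm, rmse_norm = run_fw(Y_norm, zeta / scale)
    return att_norm * scale, rmse_norm * scale
```

Unit-weight diagnostics such as max_unit_weight and effective_n depend only on the weights, so they need no rescaling.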

Regression coverage:
- TestDiagnosticScaleParity: att / pre_fit_rmse must scale by |a|
  across (Y -> a*Y + b) for both diagnostic methods; a no-effect
  DGP at Y ~ 1e9 must produce finite placebo atts landing within
  5 * noise_level.
- TestHeterogeneousAndRampingScale: cross-unit heterogeneous scale
  (units spanning 1e6 to 1e9) and cross-period ramping (trend
  growing 4 orders of magnitude across periods) must leave the fit
  and both diagnostic surfaces finite. These are the heterogeneity
  pathways the original TestScaleEquivariance (affine-only) suite
  did not cover.
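One way such a heterogeneous/ramping panel might be generated (an assumed sketch; the suite's actual DGP may differ):

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 8, 12
# per-unit base levels spanning 1e6 .. 1e9 (cross-unit heterogeneous scale)
levels = np.logspace(6, 9, n_units)[:, None]
# a common trend growing ~4 orders of magnitude across periods (ramping)
ramp = np.logspace(0, 4, n_periods)[None, :]
Y = levels + ramp + rng.normal(size=(n_units, n_periods))

assert np.isfinite(Y).all()
unit_means = Y.mean(axis=1)
assert unit_means.max() / unit_means.min() > 1e2   # genuinely mixed scales
```

A single global (shift, scale) cannot flatten this panel, so finiteness of the fit and both diagnostic surfaces is the meaningful assertion here, not exact equivariance.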

Covered by audit axis A (numerical precision / scale fragility).
Findings D-4 and D-4b from docs/audits/silent-failures-findings.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
Extend PR #312 Y-normalization contract into SDID diagnostic methods