Fix SyntheticDiD catastrophic cancellation at extreme Y scale#312
Merged
Fix SyntheticDiD catastrophic cancellation at extreme Y scale#312
Conversation
Normalize Y once in fit() using mean/std of Y_pre_control, thread
normalized Y (and zetas) through weight optimization, the estimator,
and variance procedures, then rescale tau/SE/CI/effect vectors back
to original units.
Mathematically a no-op - tau is location-invariant and scale-equivariant,
FW weights are invariant under (Y, zeta) -> (Y/s, zeta/s) - but prevents
~6-digit precision loss in the SDID double-difference when outcomes span
millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of
true effect size. Matches pre-existing behavior on normally-scaled data
to float epsilon.
Zetas and noise_level are reported to users on the original outcome
scale, preserving pre-fix behavior of in_time_placebo() and
sensitivity_to_zeta_omega() which operate on the stored original-scale
snapshot.
REGISTRY.md adds a Note documenting the internal normalization. New
TestScaleEquivariance class covers baseline parity, scale equivariance
across (a, b) in {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)},
and detection of a known effect at Y~1e9 base scale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
Absorbs #312 (sdid scale fix), #313 (roadmap refresh + dCDH docstring rewrite), and #314 (within_transform convergence warnings) from main. Conflicts resolved in: - diff_diff/chaisemartin_dhaultfoeuille.py: took main's comprehensive Phase 1-3 feature list in the class docstring but merged in the PR #311 group-vs-PSU bootstrap-clustering framing and the replicate-weight survey-support line. Kept the PR #311 'user-specified cluster= not supported + automatic PSU-level under survey_design' wording for the cluster= parameter docstring (strictly more accurate than main's 'always clusters at the group level' text). - diff_diff/guides/llms-full.txt: kept main's more detailed placebo SE contract paragraph (which already distinguishes single-period NaN from multi-horizon analytical/bootstrap) and appended the sup-t / shared-weights / cross-horizon coverage details from the PR #311 update. Kept the PR #311 survey_design signature comment that mentions TSL + replicate + PSU bootstrap. Full regression across touched areas: 336 + 324 passing.
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
… remove deprecated SyntheticDiD params Package four merged PRs (#312 SDID catastrophic cancellation at extreme Y scale, #313 roadmap refresh, #314 FE imputation non-convergence signaling, #315 Frank-Wolfe SC weight solver non-convergence signaling) as 3.1.2. Also remove the SyntheticDiD(lambda_reg=...) and SyntheticDiD(zeta=...) kwargs, which have been deprecated with DeprecationWarning since v2.3.1 (2026-02-10) in favor of zeta_omega / zeta_lambda; their warning messages announced removal in v3.1. Passing the old kwargs now raises TypeError at __init__ and ValueError: Unknown parameter at set_params. Internal ridge-regression helpers that accept a lambda_reg parameter (compute_synthetic_weights, rank_control_units, Rust FFI bindings) are unaffected. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, and diff_diff/guides/llms-full.txt. CHANGELOG populated with Fixed / Changed / Removed sections and comparison-link footer. TODO.md's "Deprecated Code" entry removed now that the task is done. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
Per CI review feedback (#316): removing public kwargs under a patch version violates Semantic Versioning, which CHANGELOG.md explicitly claims to adhere to. Restore lambda_reg and zeta handling in SyntheticDiD.__init__ and set_params as warning-only, and bump the removal target in the DeprecationWarning text from "v3.1" to "v4.0.0". The 3.1.2 release now carries only the four fix/doc PRs (#312 SDID scale, #313 roadmap, #314 FE imputation convergence, #315 Frank-Wolfe convergence) with no breaking changes. - diff_diff/synthetic_did.py: restore deprecated kwargs + warnings (v4.0.0 text) - tests/test_methodology_sdid.py: restore TestDeprecatedParams class + set_params deprecation test - tests/test_estimators.py: restore test_deprecated_params - CHANGELOG.md: drop Removed section; add Changed entry documenting the v3.1 -> v4.0.0 bump in the removal target - TODO.md: restore Deprecated Code section with v4.0.0 removal target and SemVer rationale Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
PR #312 centered and scaled Y once in SyntheticDiD.fit() to avoid catastrophic cancellation in the SDID double-difference at extreme Y. The fix did not reach the two public diagnostic methods on SyntheticDiDResults: in_time_placebo() and sensitivity_to_zeta_omega() both re-run Frank-Wolfe on the original-scale fit snapshot with original-scale zetas, so at Y ~ 1e9 they reproduce the same silent failure the main path was fixed for. Capture Y_shift and Y_scale on the fit snapshot, apply the same (Y - shift) / scale normalization inside the two diagnostic methods, pass zeta / Y_scale and min_decrease derived from the normalized noise level to the FW weight solvers, and rescale att / pre_fit_rmse back to original-Y units before reporting. Unit-weight diagnostics (max_unit_weight, effective_n) are scale-invariant and reported without rescaling. Regression coverage: - TestDiagnosticScaleParity: att / pre_fit_rmse must scale by |a| across (Y -> a*Y + b) for both diagnostic methods; a no-effect DGP at Y ~ 1e9 must produce finite placebo atts landing within 5 * noise_level. - TestHeterogeneousAndRampingScale: cross-unit heterogeneous scale (units spanning 1e6 to 1e9) and cross-period ramping (trend growing 4 orders of magnitude across periods) must leave the fit and both diagnostic surfaces finite. These are the heterogeneity pathways the original TestScaleEquivariance (affine-only) suite did not cover. Covered by audit axis A (numerical precision / scale fragility). Findings D-4 and D-4b from docs/audits/silent-failures-findings.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
Extend PR #312 Y-normalization contract into SDID diagnostic methods
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SyntheticDiD.fit()usingmean/stdofY_pre_control; threads normalized Y and normalized zetas through weight optimization, the estimator, and all three variance procedures; then rescales τ/SE/CI/effect vectors back to original outcome units.(Y, ζ) → (Y/s, ζ/s)), but prevents ~6-digit precision loss in the SDID double-difference when outcomes span millions-to-billions. Pre-fix, p-values clustered near 0.5 regardless of true effect size. Matches pre-existing results on normally-scaled data to float epsilon (rel ≲ 1e-14).zeta_omega,zeta_lambda, andnoise_levelon the results object are reported on the user's original outcome scale, preserving the behavior ofin_time_placebo()andsensitivity_to_zeta_omega()which operate on the stored original-scale fit snapshot.Methodology references (required if estimator / math changes)
synthdid::vcov.synthdid; the R package has the same extreme-scale vulnerability (Sensitivity to the scaling of variables synth-inference/synthdid#71). Documented indocs/methodology/REGISTRY.mdunder the SyntheticDiD edge-case list with a**Note:**label.in_time_placebo()andsensitivity_to_zeta_omega()continue to run on original-scale snapshot and zetas (pre-fix behavior preserved). Documented in the same REGISTRY note as a tracked follow-up.Validation
tests/test_methodology_sdid.py::TestScaleEquivariance:test_baseline_parity_small_scale[placebo|bootstrap|jackknife]— hard-coded literal baselines asserted atrel=1e-8to catch any drift from the no-op guarantee.test_scale_equivariance[placebo|bootstrap|jackknife]— refits ona*Y + bfor(a, b) ∈ {(1e-6,0), (1,0), (1e6,1e9), (1e9,-1e6), (1,1e9)}and assertslen(placebo_effects)parity,att/a,se/|a|,p_value, andnoise_level/zeta_omegascaling.test_detects_true_effect_at_extreme_scale[...]— generates a panel withY ~ 1e9and known ATT = 5e6 (0.5% of baseline, below pre-fix 6-digit precision floor); asserts detectable z-stat and correct p-value.Security / privacy