Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355 #361
Merged
PR #355 (SDID survey-bootstrap restoration, merged 2026-04-24) was authored on branch sdid-bootstrap-survey using PR #352 as an internal placeholder; the placeholder was never updated at merge. The actual PR #352 is an unrelated PR (HAD Phase 3: pre-test diagnostics), so the release-notes section headers (`### Added (PR #352)` / `### Changed (PR #352)`) currently mislead readers into conflating two unrelated feature sets.

This commit fixes the three CHANGELOG line-hits only. PR-number placeholders elsewhere in the codebase (code comments, REGISTRY methodology notes, survey-roadmap.md, tutorial 16, TODO, llms-full.txt) are accepted cosmetic drift; git blame is authoritative for authorship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
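As an illustration of the kind of rename described above, here is a minimal sketch of a placeholder rewrite over changelog text. The function name `rename_pr_placeholder` and the sample text are hypothetical and are not part of the repository; the actual fix was a manual edit of three lines.

```python
import re
from typing import Tuple


def rename_pr_placeholder(text: str, old: int, new: int) -> Tuple[str, int]:
    """Replace 'PR #<old>' with 'PR #<new>' and report how many hits changed.

    The negative lookahead keeps longer numbers (e.g. PR #3521) untouched.
    """
    pattern = re.compile(rf"PR #{old}(?!\d)")
    return pattern.subn(f"PR #{new}", text)


# Hypothetical changelog excerpt, for illustration only.
sample = (
    "### Added (PR #352)\n"
    "- survey bootstrap\n"
    "### Changed (PR #352)\n"
    "- unrelated note mentioning PR #3521\n"
)
updated, n_hits = rename_pr_placeholder(sample, old=352, new=355)
```

Returning the hit count makes it easy to confirm that only the expected number of lines changed before committing.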
Overall assessment: ✅ Looks good
igerber added a commit that referenced this pull request on Apr 24, 2026
…ck get_loo_effects_df on survey jackknife
P0 (Methodology — survey jackknife silently skipping undefined LOO):
The Rust & Rao (1996) stratified jackknife formula `SE² =
Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` requires every
PSU-LOO `τ̂_{(h,j)}` to be defined. The previous implementation
silently skipped PSUs whose deletion removed all treated units (or
zeroed control ω_eff mass, or raised in the estimator) while still
applying the full `(n_h-1)/n_h` factor, under-scaling variance on
designs where treated units pack into a single PSU.
Fix: `_jackknife_se_survey` now tracks any undefined replicate in a
contributing stratum (n_h ≥ 2) and short-circuits to `SE=NaN` with a
targeted `UserWarning` naming the stratum / PSU / reason (deletion
removes all treated, kept ω_eff zero, kept treated survey mass zero,
estimator raised, estimator returned non-finite). Partial LOOs are
still returned in `placebo_effects` for debugging; users needing a
variance estimator that accommodates PSU-deletion infeasibility
should use `variance_method="bootstrap"`. Silent stratum-level skip
for `n_h < 2` is preserved (canonical lonely-PSU handling matching
R `survey::svyjkn`).
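
The aggregation and short-circuit described above can be sketched as follows. This is a minimal standalone illustration, not the body of `_jackknife_se_survey`: the function name `stratified_jackknife_se`, its arguments, and the use of `None` to mark an undefined replicate are all assumptions made for the example.

```python
import math
import warnings
from typing import Mapping, Optional, Sequence


def stratified_jackknife_se(
    loo_by_stratum: Mapping[int, Sequence[Optional[float]]],
    fpc_by_stratum: Optional[Mapping[int, float]] = None,
) -> float:
    """Hypothetical helper: stratified jackknife SE over PSU-level LOOs.

    loo_by_stratum maps stratum h to one LOO estimate per PSU; None marks
    an undefined replicate. fpc_by_stratum maps h to f_h (assumed 0 if absent).
    """
    fpc_by_stratum = fpc_by_stratum or {}
    variance = 0.0
    for h, taus in loo_by_stratum.items():
        n_h = len(taus)
        if n_h < 2:
            # Lonely PSU: the stratum is skipped silently.
            continue
        for j, tau in enumerate(taus):
            if tau is None or not math.isfinite(tau):
                # Any undefined replicate in a contributing stratum makes
                # the (n_h - 1) / n_h factor invalid, so return NaN.
                warnings.warn(
                    f"Undefined LOO replicate in stratum {h}, PSU index {j}; "
                    "returning SE=NaN.",
                    UserWarning,
                )
                return float("nan")
        tau_bar = sum(taus) / n_h
        sum_sq = sum((t - tau_bar) ** 2 for t in taus)
        f_h = fpc_by_stratum.get(h, 0.0)
        variance += (1.0 - f_h) * (n_h - 1) / n_h * sum_sq
    return math.sqrt(variance)
```

With `{0: [1.0, 3.0], 1: [2.0, 2.0, 5.0]}` and no finite-population correction, the two strata contribute 1.0 and 4.0, so the sketch returns `sqrt(5)`; replacing any entry with `None` returns NaN with a warning.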
New regression `test_jackknife_full_design_undefined_replicate_returns_nan`
exercises the fix on the original `sdid_survey_data_full_design`
fixture (treated all in stratum 0 PSU 0 → LOO PSU 0 removes all
treated) and asserts both the `UserWarning` match and `np.isnan(se)`.
The existing jackknife tests that asserted finite SE now use a new
`sdid_survey_data_jk_well_formed` fixture where treated units are
spread across two PSUs within stratum 0 (so every LOO leaves ≥1
treated). The self-consistency test
(`test_jackknife_full_design_stratum_aggregation_formula_magnitude`)
was rewritten from a weak finite-positive check to a real
recomputation of the Rust & Rao formula on the returned 6-entry
`placebo_effects` array, asserting `result.se == pytest.approx(
expected, rel=1e-12)`.
Coverage MC (`benchmarks/data/sdid_coverage.json`) is unchanged:
the `stratified_survey` DGP spreads its 32 treated units across
PSUs 2 and 3 within stratum 1 and PSUs 0 and 1 within stratum 0,
so every LOO is defined there too. The previously-reported jackknife
anti-conservatism (α=0.05 rejection = 0.45, SE/trueSD = 0.46) is
the documented few-PSU limitation (1 effective DoF per stratum
with `n_h = 2`), not the P0 silent-skip bug.
P1 (Code Quality — get_loo_effects_df on survey jackknife):
`SyntheticDiDResults.get_loo_effects_df()` assumes a length-N
unit-indexed `placebo_effects` array (first n_control are control-
LOO, next n_treated are treated-LOO). Survey-jackknife fits return
a flat PSU-level replicate array of variable length; joining onto
the fit-time `control_unit_ids + treated_unit_ids` would mislabel
PSU replicates as unit-level effects.
Fix: `get_loo_effects_df()` now raises `NotImplementedError` with a
targeted message pointing to `result.placebo_effects` for the raw
PSU-level array and REGISTRY §SyntheticDiD "Note (survey + jackknife
composition)" for the aggregation formula. New regression
`test_get_loo_effects_df_raises_on_survey_jackknife` asserts the
raise on a survey fit. Non-survey and pweight-only jackknife fits
continue to use `get_loo_effects_df()` as before (unit-level LOO).
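
The guard described above can be sketched with a toy results object. `ToyResults`, its `survey_jackknife` flag, and the `loo_effects_rows` method are hypothetical stand-ins that return plain tuples; they are not the `SyntheticDiDResults` API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ToyResults:
    """Hypothetical stand-in for a results object; not the library class."""

    placebo_effects: Sequence[float]
    control_unit_ids: List[str]
    treated_unit_ids: List[str]
    survey_jackknife: bool = False  # assumed flag name

    def loo_effects_rows(self) -> List[Tuple[str, str, float]]:
        if self.survey_jackknife:
            # PSU-level replicates cannot be joined onto unit ids.
            raise NotImplementedError(
                "Unit-level LOO table is unavailable for survey jackknife; "
                "inspect placebo_effects for the raw PSU-level array."
            )
        unit_ids = self.control_unit_ids + self.treated_unit_ids
        if len(unit_ids) != len(self.placebo_effects):
            raise ValueError("placebo_effects is not unit-indexed")
        n_control = len(self.control_unit_ids)
        # First n_control entries are control-LOO, the rest treated-LOO.
        return [
            (uid, "control" if i < n_control else "treated", effect)
            for i, (uid, effect) in enumerate(zip(unit_ids, self.placebo_effects))
        ]
```

Raising early, rather than returning a mislabeled table, keeps a variable-length PSU array from being silently read as unit-level effects.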
P3 (Documentation — stale default variance_method note):
`docs/methodology/REGISTRY.md:L1569` default-variance-method note
rewritten to reflect that all three variance methods now support
full survey designs (removing "full design supported on bootstrap
only" language) and to recommend bootstrap specifically on surveys
with few PSUs per stratum.
Branch also rebased onto current origin/main to pick up PR #356
(agent-profile-panel) and PR #361 — the R1 Maintainability finding
about "unrelated API deletions" was a stale-base-drift artifact
(my branch was created before #356 merged). After rebase the diff
against main shows only SDID-survey changes.
Verification
------------
pytest tests/test_survey_phase5.py
tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact}
→ 87 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Renames the `PR #352` references in `CHANGELOG.md` to `PR #355` (the SDID survey-bootstrap restoration merged 2026-04-24).
- PR #355 was authored on branch `sdid-bootstrap-survey`, which used `PR #352` as an internal placeholder; the placeholder was never updated at merge. The actual GitHub PR #352 (HAD Phase 3: pre-test diagnostics (qug_test, stute_test, yatchew_hr_test, composite workflow)) is an unrelated PR, so the existing `### Added (PR #352)` / `### Changed (PR #352)` section headers mislead release-notes readers into conflating two unrelated feature sets.
- This change touches `CHANGELOG.md` only. The same placeholder appears in ~34 other line-hits across `diff_diff/synthetic_did.py`, `diff_diff/utils.py`, `docs/methodology/REGISTRY.md`, `docs/methodology/survey-theory.md`, `docs/survey-roadmap.md`, `docs/tutorials/16_survey_did.ipynb`, `TODO.md`, and `diff_diff/guides/llms-full.txt`; these are accepted as cosmetic drift. Git blame is authoritative for authorship; maintaining accurate PR references across the wider codebase creates the same placeholder-rot bug class this fix addresses.
- A check of `tests/` and `docs/` found zero stale baselines from before PR #349 (Fix SyntheticDiD bootstrap p-value dispatch and SE formula), i.e. p ≈ 0.49, SE 0.158X, and zero stale `NotImplementedError` claims on SDID + survey + bootstrap paths from before PR #355 (Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition). Regression tests are forward-looking; tutorial 18 prose is tolerance-guarded by the cell-26 `305 <= att <= 320` assert; tutorial 16 cell 40 was updated in PR #355 itself.
Methodology references (required if estimator / math changes)
Validation
`CHANGELOG.md` edit has no code impact.
Security / privacy