Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355 by igerber · Pull Request #361 · igerber/diff-diff

igerber · 2026-04-24T18:46:45Z

Summary

Renames three PR #352 references in CHANGELOG.md to PR #355 (the SDID survey-bootstrap restoration merged 2026-04-24).
Background: authoring on branch sdid-bootstrap-survey used PR #352 as an internal placeholder; the placeholder was never updated at merge. The actual GitHub PR HAD Phase 3: pre-test diagnostics (qug_test, stute_test, yatchew_hr_test, composite workflow) #352 is an unrelated PR (HAD Phase 3: pre-test diagnostics), so the existing ### Added (PR #352) / ### Changed (PR #352) section headers mislead release-notes readers into conflating two unrelated feature sets.
Scope locked to CHANGELOG.md only. The same placeholder appears in ~34 other line-hits across diff_diff/synthetic_did.py, diff_diff/utils.py, docs/methodology/REGISTRY.md, docs/methodology/survey-theory.md, docs/survey-roadmap.md, docs/tutorials/16_survey_did.ipynb, TODO.md, and diff_diff/guides/llms-full.txt — accepted as cosmetic drift. Git blame is authoritative for authorship; maintaining accurate PR references across the wider codebase creates the same placeholder-rot bug class this fix addresses.
Audit side-product: grep sweep across tests/ and docs/ found zero stale pre-PR-Fix SyntheticDiD bootstrap p-value dispatch and SE formula #349 bootstrap baselines (p ≈ 0.49, SE 0.158X) and zero stale pre-PR-Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355 NotImplementedError claims on SDID + survey + bootstrap paths. Regression tests are forward-looking; tutorial 18 prose is tolerance-guarded by the cell-26 305 <= att <= 320 assert; tutorial 16 cell 40 was updated in PR Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355 itself.

Methodology references (required if estimator / math changes)

Method name(s): N/A — no methodology, code, or math changes. Documentation-only fix.
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: No test changes. Three-line CHANGELOG.md edit has no code impact.
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

PR #355 (SDID survey-bootstrap restoration, merged 2026-04-24) was authored on branch sdid-bootstrap-survey using PR #352 as an internal placeholder; the placeholder was never updated at merge. The actual PR #352 is an unrelated PR (HAD Phase 3: pre-test diagnostics), so the release-notes section headers (`### Added (PR #352)` / `### Changed (PR #352)`) currently mislead readers into conflating two unrelated feature sets. This commit fixes the three CHANGELOG line-hits only. PR-number placeholders elsewhere in the codebase (code comments, REGISTRY methodology notes, survey-roadmap.md, tutorial 16, TODO, llms-full.txt) are accepted cosmetic drift — git blame is authoritative for authorship. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-24T18:49:01Z

Overall assessment

✅ Looks good

Executive summary

Review scope is changelog-only: the diff renames three PR references in CHANGELOG.md:L32-L37 and does not touch executable code.
Affected methods: none. No estimator, math, weighting, variance/SE, identification assumption, or default behavior changed, so there are no in-scope P0/P1 methodology findings.
Cross-checking the referenced SDID survey-bootstrap registry note shows a remaining provenance mismatch: CHANGELOG.md now says PR #355, while docs/methodology/REGISTRY.md:L1562-L1566 and L1627-L1627 still say PR #352.
That mismatch is documentation-only and non-blocking (P3); no runtime, inference, performance, or security risk is introduced by this PR.

Methodology

No findings. Severity: —. Impact: Affected methods: none; CHANGELOG.md:L32-L37 only changes PR labels, and the SDID survey-bootstrap methodology text in docs/methodology/REGISTRY.md:L1562-L1572 is unchanged. Concrete fix: None.

Code Quality

No findings. Severity: —. Impact: No executable code, API surface, or inference path changed. Concrete fix: None.

Performance

No findings. Severity: —. Impact: Changelog-only edits do not affect runtime or memory behavior. Concrete fix: None.

Maintainability

No findings. Severity: —. Impact: No code-level maintainability issue was introduced. Concrete fix: None.

Tech Debt

No findings. Severity: —. Impact: The PR does not add implementation debt or require a TODO-tracked deferral. Concrete fix: None.

Security

No findings. Severity: —. Impact: No secrets, dependencies, auth paths, or data-handling surfaces changed. Concrete fix: None.

Documentation/Tests

Severity: P3. Impact: CHANGELOG.md:L32-L37 now attributes the SDID survey-bootstrap restoration to PR #355, but the linked methodology registry still uses PR #352 at docs/methodology/REGISTRY.md:L1562-L1566 and L1627-L1627, so repository docs now disagree on provenance. No runtime behavior or test coverage is affected. Concrete fix: Update the matching registry references to the same PR number as the changelog, or add an explicit note if the registry is intentionally preserving the older label.

…ck get_loo_effects_df on survey jackknife P0 (Methodology — survey jackknife silently skipping undefined LOO): The Rust & Rao (1996) stratified jackknife formula `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` requires every PSU-LOO `τ̂_{(h,j)}` to be defined. The previous implementation silently skipped PSUs whose deletion removed all treated units (or zeroed control ω_eff mass, or raised in the estimator) while still applying the full `(n_h-1)/n_h` factor, under-scaling variance on designs where treated units pack into a single PSU. Fix: `_jackknife_se_survey` now tracks any undefined replicate in a contributing stratum (n_h ≥ 2) and short-circuits to `SE=NaN` with a targeted `UserWarning` naming the stratum / PSU / reason (deletion removes all treated, kept ω_eff zero, kept treated survey mass zero, estimator raised, estimator returned non-finite). Partial LOOs are still returned in `placebo_effects` for debugging; users needing a variance estimator that accommodates PSU-deletion infeasibility should use `variance_method="bootstrap"`. Silent stratum-level skip for `n_h < 2` is preserved (canonical lonely-PSU handling matching R `survey::svyjkn`). New regression `test_jackknife_full_design_undefined_replicate_returns_nan` exercises the fix on the original `sdid_survey_data_full_design` fixture (treated all in stratum 0 PSU 0 → LOO PSU 0 removes all treated) and asserts both the `UserWarning` match and `np.isnan(se)`. The existing jackknife tests that asserted finite SE now use a new `sdid_survey_data_jk_well_formed` fixture where treated units are spread across two PSUs within stratum 0 (so every LOO leaves ≥1 treated). The self-consistency test (`test_jackknife_full_design_stratum_aggregation_formula_magnitude`) was rewritten from a flaccid finite-positive check to a real recomputation of the Rust & Rao formula on the returned 6-entry `placebo_effects` array, asserting `result.se == pytest.approx( expected, rel=1e-12)`. Coverage MC (`benchmarks/data/sdid_coverage.json`) is unchanged: the `stratified_survey` DGP spreads its 32 treated units across PSUs 2 and 3 within stratum 1 and PSUs 0 and 1 within stratum 0, so every LOO is defined there too. The previously-reported jackknife anti-conservatism (α=0.05 rejection = 0.45, SE/trueSD = 0.46) is the documented few-PSU limitation (1 effective DoF per stratum with `n_h = 2`), not the P0 silent-skip bug. P1 (Code Quality — get_loo_effects_df on survey jackknife): `SyntheticDiDResults.get_loo_effects_df()` assumes a length-N unit-indexed `placebo_effects` array (first n_control are control- LOO, next n_treated are treated-LOO). Survey-jackknife fits return a flat PSU-level replicate array of variable length; joining onto the fit-time `control_unit_ids + treated_unit_ids` would mislabel PSU replicates as unit-level effects. Fix: `get_loo_effects_df()` now raises `NotImplementedError` with a targeted message pointing to `result.placebo_effects` for the raw PSU-level array and REGISTRY §SyntheticDiD "Note (survey + jackknife composition)" for the aggregation formula. New regression `test_get_loo_effects_df_raises_on_survey_jackknife` asserts the raise on a survey fit. Non-survey and pweight-only jackknife fits continue to use `get_loo_effects_df()` as before (unit-level LOO). P3 (Documentation — stale default variance_method note): `docs/methodology/REGISTRY.md:L1569` default-variance-method note rewritten to reflect that all three variance methods now support full survey designs (removing "full design supported on bootstrap only" language) and to recommend bootstrap specifically on surveys with few PSUs per stratum. Branch also rebased onto current origin/main to pick up PR #356 (agent-profile-panel) and PR #361 — the R1 Maintainability finding about "unrelated API deletions" was a stale-base-drift artifact (my branch was created before #356 merged). After rebase the diff against main shows only SDID-survey changes. Verification ------------ pytest tests/test_survey_phase5.py tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact} → 87 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 24, 2026

igerber merged commit e28080d into main Apr 24, 2026
2 of 3 checks passed

igerber deleted the sdid-bootstrap-changelog-pr-number-fix branch April 24, 2026 19:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355#361

Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355#361
igerber merged 1 commit intomainfrom
sdid-bootstrap-changelog-pr-number-fix

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 24, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall assessment

Executive summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant