Skip to content

Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355#361

Merged
igerber merged 1 commit intomainfrom
sdid-bootstrap-changelog-pr-number-fix
Apr 24, 2026
Merged

Fix CHANGELOG: rename survey-bootstrap PR placeholder #352 to #355#361
igerber merged 1 commit intomainfrom
sdid-bootstrap-changelog-pr-number-fix

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 24, 2026

Summary

  • Renames three PR #352 references in CHANGELOG.md to PR #355 (the SDID survey-bootstrap restoration merged 2026-04-24).
  • Background: authoring on branch sdid-bootstrap-survey used PR #352 as an internal placeholder; the placeholder was never updated at merge. The actual GitHub PR HAD Phase 3: pre-test diagnostics (qug_test, stute_test, yatchew_hr_test, composite workflow) #352 is an unrelated PR (HAD Phase 3: pre-test diagnostics), so the existing ### Added (PR #352) / ### Changed (PR #352) section headers mislead release-notes readers into conflating two unrelated feature sets.
  • Scope locked to CHANGELOG.md only. The same placeholder appears in ~34 other line-hits across diff_diff/synthetic_did.py, diff_diff/utils.py, docs/methodology/REGISTRY.md, docs/methodology/survey-theory.md, docs/survey-roadmap.md, docs/tutorials/16_survey_did.ipynb, TODO.md, and diff_diff/guides/llms-full.txt — accepted as cosmetic drift. Git blame is authoritative for authorship; maintaining accurate PR references across the wider codebase creates the same placeholder-rot bug class this fix addresses.
  • Audit side-product: grep sweep across tests/ and docs/ found zero stale pre-PR-Fix SyntheticDiD bootstrap p-value dispatch and SE formula #349 bootstrap baselines (p ≈ 0.49, SE 0.158X) and zero stale pre-PR-Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355 NotImplementedError claims on SDID + survey + bootstrap paths. Regression tests are forward-looking; tutorial 18 prose is tolerance-guarded by the cell-26 305 <= att <= 320 assert; tutorial 16 cell 40 was updated in PR Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition #355 itself.

Methodology references (required if estimator / math changes)

  • Method name(s): N/A — no methodology, code, or math changes. Documentation-only fix.
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: No test changes. Three-line CHANGELOG.md edit has no code impact.
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

PR #355 (SDID survey-bootstrap restoration, merged 2026-04-24) was
authored on branch sdid-bootstrap-survey using PR #352 as an internal
placeholder; the placeholder was never updated at merge. The actual
PR #352 is an unrelated PR (HAD Phase 3: pre-test diagnostics), so the
release-notes section headers (`### Added (PR #352)` / `### Changed
(PR #352)`) currently mislead readers into conflating two unrelated
feature sets.

This commit fixes the three CHANGELOG line-hits only. PR-number
placeholders elsewhere in the codebase (code comments, REGISTRY
methodology notes, survey-roadmap.md, tutorial 16, TODO, llms-full.txt)
are accepted cosmetic drift — git blame is authoritative for authorship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall assessment

✅ Looks good

Executive summary

  • Review scope is changelog-only: the diff renames three PR references in CHANGELOG.md:L32-L37 and does not touch executable code.
  • Affected methods: none. No estimator, math, weighting, variance/SE, identification assumption, or default behavior changed, so there are no in-scope P0/P1 methodology findings.
  • Cross-checking the referenced SDID survey-bootstrap registry note shows a remaining provenance mismatch: CHANGELOG.md now says PR #355, while docs/methodology/REGISTRY.md:L1562-L1566 and L1627-L1627 still say PR #352.
  • That mismatch is documentation-only and non-blocking (P3); no runtime, inference, performance, or security risk is introduced by this PR.

Methodology

  • No findings. Severity: —. Impact: Affected methods: none; CHANGELOG.md:L32-L37 only changes PR labels, and the SDID survey-bootstrap methodology text in docs/methodology/REGISTRY.md:L1562-L1572 is unchanged. Concrete fix: None.

Code Quality

  • No findings. Severity: —. Impact: No executable code, API surface, or inference path changed. Concrete fix: None.

Performance

  • No findings. Severity: —. Impact: Changelog-only edits do not affect runtime or memory behavior. Concrete fix: None.

Maintainability

  • No findings. Severity: —. Impact: No code-level maintainability issue was introduced. Concrete fix: None.

Tech Debt

  • No findings. Severity: —. Impact: The PR does not add implementation debt or require a TODO-tracked deferral. Concrete fix: None.

Security

  • No findings. Severity: —. Impact: No secrets, dependencies, auth paths, or data-handling surfaces changed. Concrete fix: None.

Documentation/Tests

  • Severity: P3. Impact: CHANGELOG.md:L32-L37 now attributes the SDID survey-bootstrap restoration to PR #355, but the linked methodology registry still uses PR #352 at docs/methodology/REGISTRY.md:L1562-L1566 and L1627-L1627, so repository docs now disagree on provenance. No runtime behavior or test coverage is affected. Concrete fix: Update the matching registry references to the same PR number as the changelog, or add an explicit note if the registry is intentionally preserving the older label.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 24, 2026
@igerber igerber merged commit e28080d into main Apr 24, 2026
2 of 3 checks passed
@igerber igerber deleted the sdid-bootstrap-changelog-pr-number-fix branch April 24, 2026 19:56
igerber added a commit that referenced this pull request Apr 24, 2026
…ck get_loo_effects_df on survey jackknife

P0 (Methodology — survey jackknife silently skipping undefined LOO):
The Rust & Rao (1996) stratified jackknife formula `SE² =
Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` requires every
PSU-LOO `τ̂_{(h,j)}` to be defined. The previous implementation
silently skipped PSUs whose deletion removed all treated units (or
zeroed control ω_eff mass, or raised in the estimator) while still
applying the full `(n_h-1)/n_h` factor, under-scaling variance on
designs where treated units pack into a single PSU.

Fix: `_jackknife_se_survey` now tracks any undefined replicate in a
contributing stratum (n_h ≥ 2) and short-circuits to `SE=NaN` with a
targeted `UserWarning` naming the stratum / PSU / reason (deletion
removes all treated, kept ω_eff zero, kept treated survey mass zero,
estimator raised, estimator returned non-finite). Partial LOOs are
still returned in `placebo_effects` for debugging; users needing a
variance estimator that accommodates PSU-deletion infeasibility
should use `variance_method="bootstrap"`. Silent stratum-level skip
for `n_h < 2` is preserved (canonical lonely-PSU handling matching
R `survey::svyjkn`).

New regression `test_jackknife_full_design_undefined_replicate_returns_nan`
exercises the fix on the original `sdid_survey_data_full_design`
fixture (treated all in stratum 0 PSU 0 → LOO PSU 0 removes all
treated) and asserts both the `UserWarning` match and `np.isnan(se)`.

The existing jackknife tests that asserted finite SE now use a new
`sdid_survey_data_jk_well_formed` fixture where treated units are
spread across two PSUs within stratum 0 (so every LOO leaves ≥1
treated). The self-consistency test
(`test_jackknife_full_design_stratum_aggregation_formula_magnitude`)
was rewritten from a flaccid finite-positive check to a real
recomputation of the Rust & Rao formula on the returned 6-entry
`placebo_effects` array, asserting `result.se == pytest.approx(
expected, rel=1e-12)`.

Coverage MC (`benchmarks/data/sdid_coverage.json`) is unchanged:
the `stratified_survey` DGP spreads its 32 treated units across
PSUs 2 and 3 within stratum 1 and PSUs 0 and 1 within stratum 0,
so every LOO is defined there too. The previously-reported jackknife
anti-conservatism (α=0.05 rejection = 0.45, SE/trueSD = 0.46) is
the documented few-PSU limitation (1 effective DoF per stratum
with `n_h = 2`), not the P0 silent-skip bug.

P1 (Code Quality — get_loo_effects_df on survey jackknife):
`SyntheticDiDResults.get_loo_effects_df()` assumes a length-N
unit-indexed `placebo_effects` array (first n_control are control-
LOO, next n_treated are treated-LOO). Survey-jackknife fits return
a flat PSU-level replicate array of variable length; joining onto
the fit-time `control_unit_ids + treated_unit_ids` would mislabel
PSU replicates as unit-level effects.

Fix: `get_loo_effects_df()` now raises `NotImplementedError` with a
targeted message pointing to `result.placebo_effects` for the raw
PSU-level array and REGISTRY §SyntheticDiD "Note (survey + jackknife
composition)" for the aggregation formula. New regression
`test_get_loo_effects_df_raises_on_survey_jackknife` asserts the
raise on a survey fit. Non-survey and pweight-only jackknife fits
continue to use `get_loo_effects_df()` as before (unit-level LOO).

P3 (Documentation — stale default variance_method note):
`docs/methodology/REGISTRY.md:L1569` default-variance-method note
rewritten to reflect that all three variance methods now support
full survey designs (removing "full design supported on bootstrap
only" language) and to recommend bootstrap specifically on surveys
with few PSUs per stratum.

Branch also rebased onto current origin/main to pick up PR #356
(agent-profile-panel) and PR #361 — the R1 Maintainability finding
about "unrelated API deletions" was a stale-base-drift artifact
(my branch was created before #356 merged). After rebase the diff
against main shows only SDID-survey changes.

Verification
------------
pytest tests/test_survey_phase5.py
       tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact}
→ 87 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant