
Revise survey theory doc for accuracy and precision#278

Merged
igerber merged 3 commits into main from revise-survey-theory-doc
Apr 6, 2026

Conversation

Owner

@igerber igerber commented Apr 6, 2026

Summary

  • Soften overclaiming language in Sections 1.1-1.5 based on external review and primary source verification
  • Add "design-based" terminological disambiguation (survey sampling design vs treatment-assignment design)
  • Fix software gap claims: acknowledge did/csdid/did_multiplegt_dyn cluster support while clarifying the real gap (strata + PSU + FPC jointly)
  • Fix DEFF terminology inconsistency: Section 7 formula is Kish weighting effect (deff_w), not full design effect
  • Rename "Horvitz-Thompson consistency" to "Design consistency" (formula is Hájek/self-normalized)
  • Add Ye et al. (2025) and Athey & Imbens (2022) references; update cross-references in survey-roadmap.md
  • Reframe "non-informative sampling" to "Estimand dependence on weights" (removes model-based concept from design-based context)
  • Clarify weight normalization semantics (raw vs normalized) and N vs N_hat in Section 4.3
  • Add RCS vs panel note and Stata singleunit(centered) alignment
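
The deff_w fix above turns on a standard formula. As a minimal sketch (the function name is illustrative, not part of the package's SurveyMetadata API), Kish's weighting-only design effect is deff_w = n · Σw_i² / (Σw_i)², equivalently 1 + CV² of the weights; it captures variance inflation from unequal weighting alone, which is why it is not the full design effect (it ignores clustering and stratification):

```python
import numpy as np

def kish_weighting_effect(w):
    """Kish's approximate design effect from unequal weighting alone:
    deff_w = n * sum(w_i^2) / (sum(w_i))^2 = 1 + CV^2 of the weights.
    Not the full design effect: clustering/stratification are ignored."""
    w = np.asarray(w, dtype=float)
    n = w.size
    return n * np.sum(w**2) / np.sum(w)**2

# Equal weights give deff_w = 1 (no inflation from weighting).
print(kish_weighting_effect([2.0, 2.0, 2.0, 2.0]))  # 1.0
# Unequal weights inflate it: 4 * 28 / 64 = 1.75.
print(kish_weighting_effect([1.0, 1.0, 1.0, 5.0]))  # 1.75
```

Note that the result is invariant to rescaling the weights, which is consistent with only relative weights mattering for variance inflation.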

Methodology references (required if estimator / math changes)

  • Method name(s): N/A — documentation-only changes, no code or equations modified
  • Paper / source link(s): Athey & Imbens (2022, JoE 226(1)); Ye, Bilinski & Lee (2025, HSORM); Roth et al. (2023, JoE 235(2)) verified via primary text
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: No test changes (documentation only)
  • Backtest / simulation / notebook evidence (if applicable): N/A
  • All claims verified against primary sources: Roth et al. (2023) full PDF read (44 pages), Ye et al. (2025) PMC full text, the R did package's CRAN docs and source code, and the Stata csdid help file

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Soften overclaiming language, fix terminology inconsistencies, and
narrow novelty claims based on external review and primary source
verification. Core theory (Sections 4-6) unchanged.

Key changes:
- Disambiguate "design-based" (survey sampling vs treatment assignment)
- Acknowledge did/csdid/did_multiplegt_dyn cluster support while
  clarifying the real gap (strata + PSU + FPC jointly)
- Fix DEFF terminology (Kish weighting effect vs full design effect)
- Rename Horvitz-Thompson to Design consistency (Hájek estimator)
- Add Ye et al. (2025) and Athey & Imbens (2022) references
- Reframe "non-informative sampling" paragraph
- Clarify weight normalization semantics and N vs N_hat
- Add RCS vs panel note and Stata singleunit(centered) alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
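
The "Design consistency" rename reflects that the formula is self-normalized. A hypothetical sketch (function names illustrative, not from the codebase) of why the Hájek form differs from a raw Horvitz-Thompson mean:

```python
import numpy as np

def hajek_mean(y, w):
    """Self-normalized (Hajek) weighted mean: sum(w*y) / sum(w).
    Needs no external population size and is invariant to rescaling w."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

def ht_mean(y, w, N):
    """Horvitz-Thompson mean: estimated total divided by a known N.
    Sensitive to the scale of w and to getting N right."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / N

y = np.array([1.0, 2.0, 3.0])
w = np.array([10.0, 20.0, 30.0])   # inverse inclusion probabilities
print(hajek_mean(y, w))       # 2.333..., regardless of weight scale
print(hajek_mean(y, 5 * w))   # identical: the scale of w cancels
print(ht_mean(y, w, N=60))    # agrees here only because N equals sum(w)
```

The self-normalization is what makes "Design consistency" the accurate label: the estimator targets a ratio of estimated totals, not a Horvitz-Thompson total divided by the true N.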

github-actions bot commented Apr 6, 2026

Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 findings. The PR is docs-only, and I did not find a changed claim that implies a code-side methodology regression in the touched surface.
  • One changed methodology sentence misstates the source trail: docs/methodology/survey-theory.md#L68 says Callaway & Sant’Anna (2021) cite Athey & Imbens (2022), but the paper discusses the 2018 working-paper version. (sciencedirect.com)
  • The newly added Ye/Bilinski/Lee reference in docs/methodology/survey-theory.md#L777 has the wrong DOI. (pubmed.ncbi.nlm.nih.gov)
  • The new software-gap wording in docs/methodology/survey-theory.md#L104 overstates didimputation’s weight support; the official did_imputation() interface includes a wname estimation-weight argument. (rdrr.io)
  • The new deff_w terminology is conceptually better, but the prose now implies a SurveyMetadata field name the code does not expose; the current API is still survey_metadata.design_effect in diff_diff/survey.py#L708.

Methodology

  • Severity: P2. Impact: docs/methodology/survey-theory.md#L68 says Callaway & Sant’Anna acknowledge design-based inference by citing Athey & Imbens (2022). The Callaway & Sant’Anna paper actually discusses Athey & Imbens as a 2018 working paper; 2022 is the later Journal of Econometrics publication year. In a methodology doc, that is a source-attribution error. Concrete fix: change the prose to “Athey & Imbens (2018; later published in 2022)” or equivalent. (sciencedirect.com)
  • Severity: P2. Impact: The new reference at docs/methodology/survey-theory.md#L777 lists DOI 10.1007/s10742-025-00366-3, but the Ye/Bilinski/Lee article is 10.1007/s10742-025-00364-7. Broken citation metadata undermines the “primary source verified” framing. Concrete fix: update the DOI and align the reference with the published Springer/PubMed record. (pubmed.ncbi.nlm.nih.gov)
  • Severity: P2. Impact: The new sentence at docs/methodology/survey-theory.md#L104 says didimputation does not accept probability weights at all, but the official did_imputation() docs expose wname for estimation weights. Even if the intended point is “no survey-design variance / no documented strata-PSU-FPC support,” the current wording overstates the gap. Concrete fix: qualify the sentence accordingly instead of saying the package does not accept weights at all. (rdrr.io)

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings in the changed surface.

Security

  • No findings.

Documentation/Tests

- Fix Athey & Imbens citation: C&S 2021 cites the 2018 working paper,
  not the 2022 publication directly
- Fix Ye et al. DOI: correct to 10.1007/s10742-025-00364-7
- Acknowledge didimputation accepts estimation weights via wname
- Align deff_w prose with actual API field SurveyMetadata.design_effect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner Author

igerber commented Apr 6, 2026

/ai-review


github-actions bot commented Apr 6, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 3a5d1d92e125225eed18d8f06b79b8e5c52e575d


Overall Assessment

✅ Looks good

No unmitigated P0 or P1 findings in the changed surface.

Executive Summary

  • Re-review outcome: the earlier external-source accuracy issues around the Athey/Imbens date trail, didimputation weight support, and the Ye/Bilinski/Lee DOI are addressed in the revised text. (nber.org)
  • Re-review outcome: the earlier SurveyMetadata.design_effect naming mismatch is also resolved; the revised prose now matches the current API, registry wording, and summary display at docs/methodology/survey-theory.md:648-649, docs/methodology/REGISTRY.md:2445-2447, diff_diff/survey.py:692-708, and diff_diff/results.py:42.
  • Remaining issues are documentation-accuracy issues, not code-side methodology defects: two P2s and one P3.
  • No estimator code, variance code, weighting code, or defaults were changed in this PR.

Methodology

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings in the changed surface. No new TODO.md tracking looks necessary for these docs-only issues.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: docs/methodology/survey-theory.md:102-106 contains a double negative: “Neither eventstudyinteract ... does not accept probability weights.” The intended meaning is clear from context, but the sentence is grammatically ambiguous in a section that is otherwise trying to be more precise. Concrete fix: rewrite as either “eventstudyinteract does not accept probability weights” or “Nor does eventstudyinteract accept probability weights.”
  • No test gap finding for a docs-only PR.

- Align roadmap: "superpopulation quantity" → "finite-population quantity"
- Fix double negative in didimputation/eventstudyinteract sentence
- Clarify N_hat: after normalization sum(w_i) = n; scaling is
  variance-equivalent because only relative weights affect sandwich meat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
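
The "variance-equivalent" claim in the commit message above can be checked numerically. This is an illustrative sketch, not the package's implementation: rescaling sampling weights by any positive constant, such as normalizing so that Σw_i = n, changes neither the WLS point estimate nor the weighted sandwich variance, because the scale cancels between bread and meat:

```python
import numpy as np

# Simulated data for the check (arbitrary DGP, seed fixed for reproducibility).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
w_raw = rng.uniform(0.5, 3.0, size=n)
w_norm = w_raw * n / w_raw.sum()        # normalized so sum(w_norm) == n

def wls_beta(X, y, w):
    # Weighted least squares: (X'WX)^{-1} X'Wy
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

def sandwich_var(X, y, w):
    # bread @ meat @ bread with an HC0-style weighted meat.
    # Scaling w by c scales each bread factor by 1/c and the meat by c^2,
    # so the product is invariant: only relative weights matter.
    b = wls_beta(X, y, w)
    e = y - X @ b
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    meat = X.T @ ((w**2 * e**2)[:, None] * X)
    return bread @ meat @ bread

print(np.allclose(wls_beta(X, y, w_raw), wls_beta(X, y, w_norm)))          # True
print(np.allclose(sandwich_var(X, y, w_raw), sandwich_var(X, y, w_norm)))  # True
```

This is the algebra behind preferring the normalized-weights convention: it pins down a readable scale (Σw_i = n) without changing any estimate or standard error.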
Owner Author

igerber commented Apr 6, 2026

/ai-review


github-actions bot commented Apr 6, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: aace57b7664eeaa22b1f7ff00e59e316c8730fca


Overall Assessment

Looks good. No unmitigated P0 or P1 findings in the changed surface.

Executive Summary

Methodology

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings. No new TODO.md tracking looks necessary for this docs-only change.

Security

  • No findings.

Documentation/Tests

  • Severity: P3
    Impact: docs/methodology/survey-theory.md#L99 overstates csdid’s documented interface by saying it accepts pweight for point estimation. The package help says weights are taken as iweights and treated internally as pweights, and the ado syntax still declares [iw]; a reader could infer unsupported literal Stata syntax from the current sentence. The rest of the sentence about wild bootstrap and clustering is consistent with the help text. Concrete fix: change “accepts pweight” to “accepts sampling weights via iweight syntax, treated internally as pweights” or simply “accepts sampling weights.” (github.com)

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 6, 2026
@igerber igerber merged commit 7d25baf into main Apr 6, 2026
@igerber igerber deleted the revise-survey-theory-doc branch April 6, 2026 21:41