
Revise survey theory doc for accuracy and precision#278

Merged
igerber merged 3 commits into main from revise-survey-theory-doc
Apr 6, 2026

Conversation

Owner

@igerber igerber commented Apr 6, 2026

Summary

  • Soften overclaiming language in Sections 1.1-1.5 based on external review and primary source verification
  • Add "design-based" terminological disambiguation (survey sampling design vs treatment-assignment design)
  • Fix software gap claims: acknowledge did/csdid/did_multiplegt_dyn cluster support while clarifying the real gap (strata + PSU + FPC jointly)
  • Fix DEFF terminology inconsistency: Section 7 formula is Kish weighting effect (deff_w), not full design effect
  • Rename "Horvitz-Thompson consistency" to "Design consistency" (formula is Hájek/self-normalized)
  • Add Ye et al. (2025) and Athey & Imbens (2022) references; update cross-references in survey-roadmap.md
  • Reframe "non-informative sampling" to "Estimand dependence on weights" (removes model-based concept from design-based context)
  • Clarify weight normalization semantics (raw vs normalized) and N vs N_hat in Section 4.3
  • Add RCS vs panel note and Stata singleunit(centered) alignment
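
The deff_w fix above turns on a standard formula. As a minimal sketch (the function name is illustrative, not part of the package's SurveyMetadata API), Kish's weighting-only design effect is deff_w = n · Σw_i² / (Σw_i)², equivalently 1 + CV² of the weights; it captures variance inflation from unequal weighting alone, which is why it is not the full design effect (it ignores clustering and stratification):

```python
import numpy as np

def kish_weighting_effect(w):
    """Kish's approximate design effect from unequal weighting alone:
    deff_w = n * sum(w_i^2) / (sum(w_i))^2 = 1 + CV^2 of the weights.
    Not the full design effect: clustering/stratification are ignored."""
    w = np.asarray(w, dtype=float)
    n = w.size
    return n * np.sum(w**2) / np.sum(w)**2

# Equal weights give deff_w = 1 (no inflation from weighting).
print(kish_weighting_effect([2.0, 2.0, 2.0, 2.0]))  # 1.0
# Unequal weights inflate it: 4 * 28 / 64 = 1.75.
print(kish_weighting_effect([1.0, 1.0, 1.0, 5.0]))  # 1.75
```

Note that the result is invariant to rescaling the weights, which is consistent with only relative weights mattering for variance inflation.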

Methodology references (required if estimator / math changes)

  • Method name(s): N/A — documentation-only changes, no code or equations modified
  • Paper / source link(s): Athey & Imbens (2022, JoE 226(1)); Ye, Bilinski & Lee (2025, HSORM); Roth et al. (2023, JoE 235(2)) verified via primary text
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: No test changes (documentation only)
  • Backtest / simulation / notebook evidence (if applicable): N/A
  • All claims verified against primary sources: Roth et al. (2023) full PDF read (44 pages), Ye et al. (2025) PMC full text, the R did package's CRAN docs and source code, and the Stata csdid help file

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Soften overclaiming language, fix terminology inconsistencies, and
narrow novelty claims based on external review and primary source
verification. Core theory (Sections 4-6) unchanged.

Key changes:
- Disambiguate "design-based" (survey sampling vs treatment assignment)
- Acknowledge did/csdid/did_multiplegt_dyn cluster support while
  clarifying the real gap (strata + PSU + FPC jointly)
- Fix DEFF terminology (Kish weighting effect vs full design effect)
- Rename Horvitz-Thompson to Design consistency (Hájek estimator)
- Add Ye et al. (2025) and Athey & Imbens (2022) references
- Reframe "non-informative sampling" paragraph
- Clarify weight normalization semantics and N vs N_hat
- Add RCS vs panel note and Stata singleunit(centered) alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
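
The "Design consistency" rename reflects that the formula is self-normalized. A hypothetical sketch (function names illustrative, not from the codebase) of why the Hájek form differs from a raw Horvitz-Thompson mean:

```python
import numpy as np

def hajek_mean(y, w):
    """Self-normalized (Hajek) weighted mean: sum(w*y) / sum(w).
    Needs no external population size and is invariant to rescaling w."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

def ht_mean(y, w, N):
    """Horvitz-Thompson mean: estimated total divided by a known N.
    Sensitive to the scale of w and to getting N right."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / N

y = np.array([1.0, 2.0, 3.0])
w = np.array([10.0, 20.0, 30.0])   # inverse inclusion probabilities
print(hajek_mean(y, w))       # 2.333..., regardless of weight scale
print(hajek_mean(y, 5 * w))   # identical: the scale of w cancels
print(ht_mean(y, w, N=60))    # agrees here only because N equals sum(w)
```

The self-normalization is what makes "Design consistency" the accurate label: the estimator targets a ratio of estimated totals, not a Horvitz-Thompson total divided by the true N.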

github-actions bot commented Apr 6, 2026

Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 findings. The PR is docs-only, and I did not find a changed claim that implies a code-side methodology regression in the touched surface.
  • One changed methodology sentence misstates the source trail: docs/methodology/survey-theory.md#L68 says Callaway & Sant’Anna (2021) cite Athey & Imbens (2022), but the paper discusses the 2018 working-paper version. (sciencedirect.com)
  • The newly added Ye/Bilinski/Lee reference in docs/methodology/survey-theory.md#L777 has the wrong DOI. (pubmed.ncbi.nlm.nih.gov)
  • The new software-gap wording in docs/methodology/survey-theory.md#L104 overstates didimputation’s weight support; the official did_imputation() interface includes a wname estimation-weight argument. (rdrr.io)
  • The new deff_w terminology is conceptually better, but the prose now implies a SurveyMetadata field name the code does not expose; the current API is still survey_metadata.design_effect in diff_diff/survey.py#L708.

Methodology

  • Severity: P2. Impact: docs/methodology/survey-theory.md#L68 says Callaway & Sant’Anna acknowledge design-based inference by citing Athey & Imbens (2022). The Callaway & Sant’Anna paper actually discusses Athey & Imbens as a 2018 working paper; 2022 is the later Journal of Econometrics publication year. In a methodology doc, that is a source-attribution error. Concrete fix: change the prose to “Athey & Imbens (2018; later published in 2022)” or equivalent. (sciencedirect.com)
  • Severity: P2. Impact: The new reference at docs/methodology/survey-theory.md#L777 lists DOI 10.1007/s10742-025-00366-3, but the Ye/Bilinski/Lee article is 10.1007/s10742-025-00364-7. Broken citation metadata undermines the “primary source verified” framing. Concrete fix: update the DOI and align the reference with the published Springer/PubMed record. (pubmed.ncbi.nlm.nih.gov)
  • Severity: P2. Impact: The new sentence at docs/methodology/survey-theory.md#L104 says didimputation does not accept probability weights at all, but the official did_imputation() docs expose wname for estimation weights. Even if the intended point is “no survey-design variance / no documented strata-PSU-FPC support,” the current wording overstates the gap. Concrete fix: qualify the sentence accordingly instead of saying the package does not accept weights at all. (rdrr.io)

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings in the changed surface.

Security

  • No findings.

Documentation/Tests

- Fix Athey & Imbens citation: C&S 2021 cites the 2018 working paper,
  not the 2022 publication directly
- Fix Ye et al. DOI: correct to 10.1007/s10742-025-00364-7
- Acknowledge didimputation accepts estimation weights via wname
- Align deff_w prose with actual API field SurveyMetadata.design_effect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner Author

igerber commented Apr 6, 2026

/ai-review


github-actions bot commented Apr 6, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 3a5d1d92e125225eed18d8f06b79b8e5c52e575d


Overall Assessment

✅ Looks good

No unmitigated P0 or P1 findings in the changed surface.

Executive Summary

  • Re-review outcome: the earlier external-source accuracy issues around the Athey/Imbens date trail, didimputation weight support, and the Ye/Bilinski/Lee DOI are addressed in the revised text. (nber.org)
  • Re-review outcome: the earlier SurveyMetadata.design_effect naming mismatch is also resolved; the revised prose now matches the current API, registry wording, and summary display at docs/methodology/survey-theory.md:648-649, docs/methodology/REGISTRY.md:2445-2447, diff_diff/survey.py:692-708, and diff_diff/results.py:42.
  • Remaining issues are documentation-accuracy issues, not code-side methodology defects: two P2s and one P3.
  • No estimator code, variance code, weighting code, or defaults were changed in this PR.

Methodology

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings in the changed surface. No new TODO.md tracking looks necessary for these docs-only issues.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: docs/methodology/survey-theory.md:102-106 contains a double negative: “Neither eventstudyinteract ... does not accept probability weights.” The intended meaning is clear from context, but the sentence is grammatically ambiguous in a section that is otherwise trying to be more precise. Concrete fix: rewrite as either “eventstudyinteract does not accept probability weights” or “Nor does eventstudyinteract accept probability weights.”
  • No test gap finding for a docs-only PR.

- Align roadmap: "superpopulation quantity" → "finite-population quantity"
- Fix double negative in didimputation/eventstudyinteract sentence
- Clarify N_hat: after normalization sum(w_i) = n; scaling is
  variance-equivalent because only relative weights affect sandwich meat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
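
The "variance-equivalent" claim in the commit message above can be checked numerically. This is an illustrative sketch, not the package's implementation: rescaling sampling weights by any positive constant, such as normalizing so that Σw_i = n, changes neither the WLS point estimate nor the weighted sandwich variance, because the scale cancels between bread and meat:

```python
import numpy as np

# Simulated data for the check (arbitrary DGP, seed fixed for reproducibility).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
w_raw = rng.uniform(0.5, 3.0, size=n)
w_norm = w_raw * n / w_raw.sum()        # normalized so sum(w_norm) == n

def wls_beta(X, y, w):
    # Weighted least squares: (X'WX)^{-1} X'Wy
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

def sandwich_var(X, y, w):
    # bread @ meat @ bread with an HC0-style weighted meat.
    # Scaling w by c scales each bread factor by 1/c and the meat by c^2,
    # so the product is invariant: only relative weights matter.
    b = wls_beta(X, y, w)
    e = y - X @ b
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    meat = X.T @ ((w**2 * e**2)[:, None] * X)
    return bread @ meat @ bread

print(np.allclose(wls_beta(X, y, w_raw), wls_beta(X, y, w_norm)))          # True
print(np.allclose(sandwich_var(X, y, w_raw), sandwich_var(X, y, w_norm)))  # True
```

This is the algebra behind preferring the normalized-weights convention: it pins down a readable scale (Σw_i = n) without changing any estimate or standard error.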
Owner Author

igerber commented Apr 6, 2026

/ai-review


github-actions bot commented Apr 6, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: aace57b7664eeaa22b1f7ff00e59e316c8730fca


Overall Assessment

Looks good. No unmitigated P0 or P1 findings in the changed surface.

Executive Summary

Methodology

Code Quality

  • No findings in the changed surface.

Performance

  • No findings in the changed surface.

Maintainability

  • No findings in the changed surface.

Tech Debt

  • No findings. No new TODO.md tracking looks necessary for this docs-only change.

Security

  • No findings.

Documentation/Tests

  • Severity: P3
    Impact: docs/methodology/survey-theory.md#L99 overstates csdid’s documented interface by saying it accepts pweight for point estimation. The package help says weights are taken as iweights and treated internally as pweights, and the ado syntax still declares [iw]; a reader could infer unsupported literal Stata syntax from the current sentence. The rest of the sentence about wild bootstrap and clustering is consistent with the help text. Concrete fix: change “accepts pweight” to “accepts sampling weights via iweight syntax, treated internally as pweights” or simply “accepts sampling weights.” (github.com)

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 6, 2026
@igerber igerber merged commit 7d25baf into main Apr 6, 2026
@igerber igerber deleted the revise-survey-theory-doc branch April 6, 2026 21:41