docs: add JOSS paper and bibliography by igerber · Pull Request #297 · igerber/diff-diff

igerber · 2026-04-12T18:22:04Z

Summary

Add paper.md and paper.bib for JOSS (Journal of Open Source Software) submission
Paper covers 16 estimators, survey-weighted inference, R validation, and practitioner tooling
19 BibTeX entries sourced from docs/methodology/REGISTRY.md
588 words (within JOSS 250-1000 range)

Methodology references (required if estimator / math changes)

N/A - no methodology changes (documentation only)

Validation

Tests added/updated: No test changes (documentation only)
Citation key consistency verified: 19 [@key] references in paper.md match 19 entries in paper.bib with no orphans
Word count verified within JOSS 250-1000 range
YAML frontmatter matches JOSS required structure
"16 estimators" claim verified against diff_diff/__init__.py

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

JOSS submission for diff-diff: 16 estimators, survey-weighted inference, R validation to machine precision, and practitioner tooling under a unified scikit-learn-style API. 19 BibTeX entries sourced from docs/methodology/REGISTRY.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-12T18:28:25Z

Overall Assessment

✅ Looks good

This is a docs-only PR. I did not find any P0/P1 implementation issues because no estimator, weighting, variance, inference, or identification code changed.

Executive Summary

The paper overstates survey support: not every estimator currently provides design-based survey inference, and ChaisemartinDHaultfoeuille explicitly rejects survey_design.
The abstract-level claim that point estimates and standard errors are validated to machine precision is broader than the documented benchmark scope; some validated SE gaps are sub-percent, not machine-precision.
The repeated “16 estimators” claim is inconsistent with current repo docs/registry, which separately document both TripleDifference and StaggeredTripleDifference and elsewhere describe the library as having 17 estimators.
The ETWFE entry should cite Wooldridge (2025) as the primary source, with Wooldridge (2023) as the nonlinear companion.

Methodology

Severity P2. In paper.md:27, the paper says all estimators accept SurveyDesign “for design-based variance estimation.” The compatibility matrix says support depth varies by estimator docs/choosing_estimator.rst:683, ChaisemartinDHaultfoeuille is marked unsupported docs/choosing_estimator.rst:732, and the implementation raises NotImplementedError when survey_design is passed diff_diff/chaisemartin_dhaultfoeuille.py:566. Impact: the paper overstates current survey-inference coverage. Concrete fix: rewrite this as “the library exposes survey_design broadly, with estimator-specific support documented in the compatibility matrix,” or summarize the major exceptions (dCDH, pweight-only estimators, bootstrap-only survey paths).
Severity P2. paper.md:29 claims point estimates and standard errors are validated against R “to machine precision.” The benchmark summary shows exact ATT parity for core estimators but not universal machine-precision SE parity; e.g. SyntheticDiD has a 0.3% SE relative difference docs/benchmarks.rst:94, and survey-estimator validation reports SE gaps up to 0.77% and 0.36% docs/benchmarks.rst:777, docs/benchmarks.rst:789. Impact: the summary overclaims validation precision relative to repo evidence. Concrete fix: narrow the statement to exact point-estimate parity plus exact/sub-percent SE parity for the specific validated estimators, matching the more careful wording later in the paper.
Severity P3. The ETWFE method description in paper.md:64 cites only Wooldridge (2023), but the Methodology Registry lists Wooldridge (2025) as the primary WooldridgeDiD source and Wooldridge (2023) as secondary docs/methodology/REGISTRY.md:1221, docs/methodology/REGISTRY.md:1223. Impact: the paper underattributes the primary ETWFE source material. Concrete fix: add the 2025 ETWFE paper to paper.bib and cite both 2025 and 2023 where ETWFE is described.

Code Quality

No findings. The diff only adds paper.md and paper.bib; no executable code changed.

Performance

No findings. No runtime or algorithmic paths changed.

Maintainability

Severity P2. The paper hard-codes “16 estimators” in paper.md:24, paper.md:50, and paper.md:58, but current repo docs also describe the library as having 17 estimators docs/llms.txt:5, docs/practitioner_decision_tree.rst:384, and the Methodology Registry has separate entries for both TripleDifference and StaggeredTripleDifference docs/methodology/REGISTRY.md:1461, docs/methodology/REGISTRY.md:1559. Impact: this PR adds another divergent estimator count and makes the JOSS paper inconsistent with existing docs. Concrete fix: define one counting rule and use it everywhere, or drop the numeric claim and describe method families instead.

Tech Debt

No findings. I did not find an existing TODO.md item that mitigates the documentation-accuracy issues above.

Security

No findings. Docs-only PR; no secrets, credentials, or new execution surfaces are introduced.

Documentation/Tests

No findings on citation integrity. I verified that paper.md cites 19 unique keys and paper.bib defines 19 BibTeX entries, with no orphaned or missing citation keys.

- Fix estimator count: 16 → 17 (includes StaggeredTripleDifference) - Soften survey claim: "All estimators" → "Most estimators" (dCDH raises NotImplementedError for survey_design) - Narrow validation claim: machine-precision point estimates, exact or sub-percent SE parity (SyntheticDiD has 0.3% SE gap) - Add Wooldridge (2025) as primary ETWFE citation alongside (2023) - Add ORCID to CITATION.cff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-12T18:32:33Z

/ai-review

github-actions · 2026-04-12T18:38:14Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 788a51df10f4ab16083149b88379c08f4b425536

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This is a docs-only PR, and the prior methodology-adjacent issues from the earlier review are resolved or reduced to minor documentation precision.

Executive Summary

No estimator, weighting, variance, inference, or identification code changed.
The prior ETWFE source issue is resolved: paper.md:L63 now cites both Wooldridge sources, matching docs/methodology/REGISTRY.md:L1219-L1223.
The prior validation-precision overclaim is resolved: paper.md:L29-L30 now matches the benchmark evidence in docs/benchmarks.rst:L88-L97 and docs/benchmarks.rst:L771-L794.
The prior headline estimator-count mismatch is resolved from 16 to 17, but one P3 taxonomy inconsistency remains in the key-features paragraph.
Bibliography integrity is good: paper.md cites 20 unique keys and paper.bib defines the same 20 entries, with no orphans.

Methodology

No findings. The changed text now aligns with the Methodology Registry and benchmark docs on ETWFE sourcing and validation scope.

Code Quality

No findings. Only documentation and citation metadata changed.

Performance

No findings. No runtime paths changed.

Maintainability

Severity P3. Impact: the “17 estimators” taxonomy in paper.md:L57-L66 still mixes estimators with diagnostics/sensitivity tools. The Methodology Registry classifies HonestDiD and PreTrendsPower under Diagnostics & Sensitivity, not estimator sections, in docs/methodology/REGISTRY.md:L7-L30. Concrete fix: change the phrasing to “17 estimators plus diagnostics/sensitivity tools,” or move Honest DiD and pre-trends power out of the estimator-count paragraph.

Tech Debt

No findings. I did not find a TODO.md entry that changes the status of the remaining documentation-only issue.

Security

No findings. No secrets, dependencies, or execution surfaces were added.

Documentation/Tests

Severity P3. Impact: paper.md:L27-L29 and paper.md:L74-L76 make absolute comparative claims about survey support across “any language” without a supporting citation or qualification. This does not affect library behavior, but it is likely to attract avoidable JOSS reviewer pushback. Concrete fix: qualify these as “to our knowledge” or add a comparative source if one exists.
No findings on structural validity. CITATION.cff and the paper.md front matter both parse as valid YAML.

igerber added the ready-for-ci Triggers CI test workflows label Apr 12, 2026

igerber merged commit 86a0c4b into main Apr 12, 2026

igerber deleted the docs/joss-paper branch April 12, 2026 18:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add JOSS paper and bibliography#297

docs: add JOSS paper and bibliography#297
igerber merged 2 commits intomainfrom
docs/joss-paper

igerber commented Apr 12, 2026

Uh oh!

github-actions bot commented Apr 12, 2026

Uh oh!

igerber commented Apr 12, 2026

Uh oh!

github-actions bot commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 12, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Apr 12, 2026

Uh oh!

igerber commented Apr 12, 2026

Uh oh!

github-actions bot commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant