
Add v3.0 announcement carousel#283

Merged
igerber merged 3 commits into main from v3-announcement-carousel
Apr 7, 2026

Conversation

@igerber (Owner) commented Apr 7, 2026

Summary

  • LinkedIn PDF carousel (8 slides) announcing diff-diff v3.0's survey design-based inference
  • Warm ivory (#FDFBF7) / burnt sienna (#B45309) paper aesthetic — visual break from previous carousels
  • Generation script at carousel/generate_v3_carousel.py using FPDF2 + matplotlib (same toolchain as existing carousels)

Slide content

  1. Hook — "The first DiD library with design-based survey inference"
  2. Competitive gap table (Strata, FPC, Replicate Wts: diff-diff vs R did vs Stata csdid)
  3. DEFF visualization — paired CI bars showing naive vs design-based conclusions
  4. Binder (1983) variance formula applied to DiD influence functions
  5. Feature cards — TSL, replicate weights, survey-aware bootstrap
  6. Validation — API, NHANES, RECS cross-validated against R (< 1e-10)
  7. Code example — CallawaySantAnna + SurveyDesign (validated against actual API signatures)
  8. CTA — pip install, GitHub link
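Slide 3's paired-CI comparison can be sketched with matplotlib. The ATT point estimate, naive SE, and DEFF below are made-up illustrative numbers, chosen so the two intervals lead to different conclusions; they are not values from the carousel or the library.

```python
# Sketch of the slide-3 idea: the same ATT with a naive SE vs a design-based
# SE inflated by the design effect (DEFF). All numbers are illustrative.
import math
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

att, se_naive, deff = -0.042, 0.018, 2.4
se_design = se_naive * math.sqrt(deff)  # SE scales with sqrt(DEFF)

fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar([0], [att], yerr=[1.96 * se_naive], fmt="o", capsize=6, label="naive (iid)")
ax.errorbar([1], [att], yerr=[1.96 * se_design], fmt="o", capsize=6, label="design-based")
ax.axhline(0, linestyle="--", linewidth=1)  # naive CI excludes 0; design-based does not
ax.set_xticks([0, 1], ["naive", "design-based"])
ax.set_ylabel("ATT with 95% CI")
ax.legend()
fig.savefig("deff_ci.png", dpi=150)
```

With these numbers the naive 95% CI is roughly [-0.077, -0.007] while the design-based CI is roughly [-0.097, 0.013], which is exactly the "different conclusions" contrast the slide visualizes.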

Claim verification

  • Competitive gap sourced from docs/methodology/survey-theory.md Section 1.3 and actual R/Stata package docs
  • Precision badges sourced from tests/test_survey_real_data.py:40 ("observed gaps < 1e-10")
  • 7 estimators validated against R golden values (DiD, TWFE, CS, Imputation, Stacked, SA, DDD)
  • Bacon asterisked as diagnostic throughout
  • "All packages support cluster-robust inference" footnote added to gap table
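The "< 1e-10" precision badge corresponds to a max-absolute-gap parity check against R reference output. A minimal sketch of that shape follows; the `max_abs_gap` helper and the numbers are hypothetical stand-ins, not the actual tests/test_survey_real_data.py code or the project's golden values.

```python
# Illustrative shape of a cross-validation parity check: compare Python
# estimates against R reference values and require agreement below 1e-10.

def max_abs_gap(py_values, r_values):
    """Largest elementwise absolute difference between two result vectors."""
    assert len(py_values) == len(r_values)
    return max(abs(p - r) for p, r in zip(py_values, r_values))

# Placeholder vectors standing in for (ATT, SE) pairs from both stacks.
py_att = [0.1234567890123456, -0.0456789012345678]
r_att  = [0.1234567890123456, -0.0456789012345678]

assert max_abs_gap(py_att, r_att) < 1e-10, "observed gaps < 1e-10"
```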

Methodology references

  • N/A — no estimator or math changes; carousel generation script only

Validation

  • Tests added/updated: No test changes
  • PDF generated and visually reviewed (8 pages, 90KB)

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

LinkedIn PDF carousel (8 slides) announcing diff-diff v3.0's headline
feature: design-based variance estimation for complex surveys across
all estimators. Warm ivory/burnt sienna paper aesthetic.

Slides: claim hook, competitive gap table, DEFF CI visualization,
Binder theorem equation, feature cards, R validation, code example, CTA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot commented Apr 7, 2026

Overall Assessment
⚠️ Needs changes. No P0s, but there are two unmitigated P1 methodology-claim mismatches in the carousel copy.

Executive Summary

  • Slides 1 and 5 overstate survey-feature coverage by pairing TSL, replicate weights, and survey bootstrap with “All 16 estimators,” while the published support matrix documents estimator-specific limits.
  • Slide 4 presents the Binder/influence-function argument as if it applies to all modern DiD estimators, but the source methodology explicitly excludes SyntheticDiD and TROP from that smooth-functional / IF chain.
  • The CallawaySantAnna code example and the listed replicate-weight methods appear aligned with the current API and survey docs.
  • No security or performance issues stood out in the new generator; the blockers are claim accuracy in the generated content.

Methodology

  • Severity P1. Affected methods: SyntheticDiD, TROP. Location: slide-4 reasoning text at carousel/generate_v3_carousel.py:509 and the surrounding slide block at carousel/generate_v3_carousel.py:459; compare docs/methodology/survey-theory.md:356 through line 368, plus docs/methodology/REGISTRY.md:1298 and docs/methodology/REGISTRY.md:1752.
    Impact: The slide says “Modern DiD estimators are smooth functionals of F” and that their IFs are well-defined/design-independent without qualification. The methodology docs explicitly carve out SyntheticDiD and TROP as non-smooth/bootstrap-only cases, so the public mathematical justification is broader than the documented theory.
    Concrete fix: Reword slide 4 to scope the Binder argument to IF-amenable / analytical-survey estimators, or add an explicit exception sentence that SyntheticDiD and TROP use Rao-Wu survey bootstrap rather than Binder/TSL linearization; then regenerate carousel/diff-diff-v3-carousel.pdf.

  • Severity P1. Affected methods: SyntheticDiD, TROP, WooldridgeDiD, BaconDecomposition, and the library-wide survey-support claim. Location: slide-1 teaser block at carousel/generate_v3_carousel.py:347, slide-5 feature cards at carousel/generate_v3_carousel.py:564, and slide-5 callout at carousel/generate_v3_carousel.py:599; compare the support matrix at docs/choosing_estimator.rst:602 through line 703 and the shipped-count note at docs/survey-roadmap.md:50 through line 54.
    Impact: The docs are clear that all estimators accept survey_design, but not all support the same variance paths. The carousel collapses those distinctions by putting TSL, replicate weights, survey bootstrap, and “All 16 estimators” into one message, even though the current docs say replicate weights are only on 12/16, SyntheticDiD/TROP use bootstrap-only full-design survey support, WooldridgeDiD has analytical TSL only, and BaconDecomposition is diagnostic-only.
    Concrete fix: Replace the blanket “All 16 estimators” wording with either the true per-feature counts or a qualified statement such as “survey support across 16 estimators, with estimator-specific variance paths,” and expand the asterisk text to cover the documented exceptions before regenerating the PDF.
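For context on the scoping fix, the Binder (1983) linearization the slide leans on can be sketched as follows, in standard survey-inference notation rather than the docs' own. For a smooth functional with a well-defined influence function, the design-based variance comes from applying the survey variance estimator to the weighted influence-function total:

```latex
% Binder (1983) linearization, sketched. For a smooth functional
% \theta = T(F) with influence function \psi(\cdot):
\hat\theta - \theta \;\approx\; \frac{1}{\hat N}\sum_{i \in s} w_i\, \psi(y_i),
\qquad \hat N = \sum_{i \in s} w_i,
% so the design-based variance of \hat\theta is obtained by applying the
% survey variance estimator (stratified, clustered, with FPC) to the
% weighted influence-function total:
\widehat{V}(\hat\theta) \;=\;
\widehat{V}_{\mathrm{design}}\!\Big(\tfrac{1}{\hat N}\textstyle\sum_{i \in s} w_i\, \psi(y_i)\Big).
```

When $T$ is not a smooth functional of $F$, no such $\psi$ exists, which is why the carve-out routes non-smooth estimators to a replication method such as the Rao-Wu bootstrap instead of the Binder/TSL path.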

Code Quality

  • No findings in scope.

Performance

  • No findings in scope.

Maintainability

  • No findings in scope.

Tech Debt

  • No findings. TODO.md does not currently track the two methodology-copy mismatches above, so they are unmitigated under the project’s deferred-work policy.

Security

  • No findings in scope.

Documentation/Tests

Path to Approval

  1. Update slide 4 in carousel/generate_v3_carousel.py:459 so the Binder/IF argument is explicitly scoped to IF-based analytical-survey estimators, or explicitly carve out SyntheticDiD and TROP, then regenerate carousel/diff-diff-v3-carousel.pdf.
  2. Update the slide 1 / slide 5 messaging in carousel/generate_v3_carousel.py:347, carousel/generate_v3_carousel.py:564, and carousel/generate_v3_carousel.py:599 so it matches the support matrix in docs/choosing_estimator.rst:602, then regenerate the PDF.

- Slide 4: "Most modern DiD estimators" (not all); add 4th bullet
  carving out SyntheticDiD/TROP as Rao-Wu bootstrap-only
  (survey-theory.md Section 4.2a)
- Slide 1: "Survey support across all 16 estimators*" with asterisk
  noting variance paths vary by estimator
- Slide 5: Expanded asterisk covering TSL/Rao-Wu split and Bacon
- Slide 6: "validated against R reference implementations" (not
  "golden values") to avoid implying all 7 used survey::svyglm()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 7, 2026

/ai-review

@github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 7b7278ec933e613749e3b848edc7157fdf20f257


Overall Assessment

⚠️ Needs changes — the re-review resolves the earlier slide-4 theory-scope blocker, but one unmitigated P1 methodology/support-matrix mismatch remains in slide 5.

Executive Summary

  • Re-review result: the prior slide-4 Binder/IF overreach appears resolved. The revised copy now explicitly carves out SyntheticDiD and TROP from the IF/Binder path.
  • The earlier slide-6 validation-copy concern also appears resolved: “R reference implementations” is materially narrower and better aligned than implying seven direct survey::svyglm() parity checks.
  • One P1 remains on slide 5: the asterisk says TSL for 14 estimators, but the documented support matrix lists only 13 analytical/TSL-capable estimators.
  • Because the same slide still pairs “Replicate Weights” with “Survey support across all 16 estimators,” the safest fix is to use exact per-feature counts or a purely qualitative support-matrix qualifier.
  • No estimator code, inference code, NaN handling, or parameter-propagation paths changed in this PR, so the edge-case checklist did not surface new production-code risks.

Methodology

Code Quality

  • No findings in scope. This PR does not touch estimator or inference implementation, and I did not see the known statistical anti-patterns in the added generator.

Performance

  • No findings in scope. The script is offline asset generation and introduces no production-path performance risk.

Maintainability

  • No findings in scope.

Tech Debt

  • No mitigating tracked item applies here. I did not find a matching entry under TODO.md:51, so the methodology mismatch above remains unmitigated.

Security

  • No findings in scope.

Documentation/Tests

  • No additional findings. The revised validation wording looks materially better aligned with the current test surface. I did not re-render the PDF in this read-only review; this assessment is based on the checked-in generator and the committed artifact.

Path to Approval

  1. Update the slide-5 callout/asterisk in carousel/generate_v3_carousel.py:601 and carousel/generate_v3_carousel.py:606 so it matches the documented matrix exactly. If you keep counts, they need to reflect the shipped docs (TSL on 13 estimators; replicate weights only where documented; SyntheticDiD/TROP bootstrap-only; BaconDecomposition diagnostic-only).
  2. Regenerate carousel/diff-diff-v3-carousel.pdf from the corrected script so the binary matches the fixed copy.

AI review caught that "TSL for 14" should be 13 (SyntheticDiD,
TROP, and Bacon excluded). Replace with qualitative statement
to avoid count mismatches with the support matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 7, 2026

/ai-review

@github-actions bot commented Apr 7, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: c18f7e4b695ecac27fa9d7fc026765081ba20f7a


Overall Assessment

Looks good — no unmitigated P0 or P1 findings. The prior methodology concerns called out in the earlier AI review appear resolved in the current carousel copy.

Executive Summary

  • The previous slide-5 blocker is resolved: the script no longer hard-codes an unsupported TSL estimator count and now points readers to the estimator-specific support matrix.
  • The previous slide-4 Binder/IF scope concern is also resolved: SyntheticDiD and TROP are now explicitly carved out to Rao-Wu bootstrap rather than being implied to follow the IF/Binder path.
  • The slide-1 “first DiD library” hook is directly supported by the survey methodology note.
  • The validation slide is materially aligned with the current test surface: real-data parity against R survey plus broader R reference-implementation coverage.
  • No estimator, weighting, variance/SE, identification, or default-behavior code changed in this PR, so the edge-case checklist did not surface new production correctness risks.

Methodology

Code Quality

  • No findings in scope. This PR adds a standalone asset generator and PDF; it does not modify estimator or inference code.

Performance

  • No findings in scope. The new code runs only for offline carousel generation and does not affect library runtime paths.

Maintainability

  • No findings in scope. For this kind of one-off generator, the slide-specific helpers and claim/source docstrings are sufficiently auditable.

Tech Debt

  • No findings in scope. I did not find a relevant unmitigated entry under TODO.md:51, and I did not identify a new issue that needs to be tracked there.

Security

  • No findings in scope. I did not see secrets, credentials, or unsafe data-handling patterns in the added script or committed artifact.

Documentation/Tests

@igerber added the ready-for-ci label (Triggers CI test workflows) on Apr 7, 2026
@igerber igerber merged commit 7021ce5 into main Apr 7, 2026
@igerber igerber deleted the v3-announcement-carousel branch April 7, 2026 16:28