
Update power analysis tutorial with simulation-based features#220

Merged
igerber merged 2 commits into main from power-analysis-notebook on Mar 21, 2026

Conversation

@igerber
Owner

@igerber igerber commented Mar 21, 2026

Summary

  • Expand power analysis tutorial to cover the full simulation API from PR #208 (Extend power analysis to all estimators + simulation-based MDE/sample size)
  • Add Section 8: Power analysis for any estimator (CallawaySantAnna, SyntheticDiD, TripleDifference examples + reference table of all 12 supported estimators)
  • Add Section 9: Simulation-based MDE search via simulate_mde() with search path inspection
  • Add Section 10: Simulation-based sample size search via simulate_sample_size() with analytical vs simulation comparison
  • Add Section 11: Custom data generators (data_generator_kwargs and fully custom data_generator)
  • Add analytical vs simulation guidance to practical recommendations
  • Update summary with new takeaways
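To illustrate the bisection idea behind simulate_mde() and simulate_sample_size(), here is a standalone sketch: the two-sample z-test power function and the helper names are hypothetical stand-ins, not the library's API, but the search loop is the same shape of monotone bisection the tutorial sections describe.

```python
import math

def power_two_sample(effect, n_per_arm, sd=1.0):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    z = effect / (sd * math.sqrt(2.0 / n_per_arm))
    z_alpha = 1.959963984540054  # Phi^{-1}(0.975)
    return phi(z - z_alpha) + phi(-z - z_alpha)

def mde_bisect(n_per_arm, target_power=0.80, lo=0.0, hi=5.0, tol=1e-6):
    """Bisect for the smallest effect whose power reaches target_power.

    Valid because power is monotone increasing in the effect size, so the
    target crossing point is unique and bisection converges to it.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_two_sample(mid, n_per_arm) >= target_power:
            hi = mid  # enough power: the MDE is at or below mid
        else:
            lo = mid  # not enough power: the MDE is above mid
    return hi

mde = mde_bisect(n_per_arm=100)  # analytical answer is about 0.396
```

simulate_sample_size() inverts the same monotone relationship in the other direction, bisecting on N for a fixed effect instead of on the effect for a fixed N.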

Methodology references (required if estimator / math changes)

  • N/A — no methodology changes, tutorial notebook only

Validation

  • Tests added/updated: No test changes (tutorial notebook)
  • Backtest / simulation / notebook evidence: Notebook executes end-to-end without errors via jupyter nbconvert --execute

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Expand the tutorial to cover the full simulation power analysis API
added in PR #208: multi-estimator support (CallawaySantAnna, SyntheticDiD,
TripleDifference), simulate_mde() bisection search, simulate_sample_size()
bisection search, custom data generators, and analytical vs simulation
guidance. Includes reference table of all 12 supported estimators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⚠️ Needs changes

The notebook update is mostly aligned with power.py and the Methodology Registry, but one new example is methodologically incorrect and can yield a rank-deficient basic DiD specification. The other issues are documentation-accuracy problems rather than library-code regressions.

Executive Summary

Methodology

  • Severity: P1. Impact: The custom DGP cell defines "treated": int(treated_unit and post) and then passes treatment="treated", time="post" into DifferenceInDifferences.fit(). But basic DiD expects a time-invariant treated-group indicator plus a separate post indicator, and the estimator internally constructs the interaction treatment * time. With the notebook’s coding, the treatment column and interaction column are identical, violating the documented DiD specification Treated_i × Post_t and creating a rank-deficient design that can drop the ATT coefficient. Concrete fix: Emit a separate ever_treated/treated_group column, keep post as the time indicator, and pass treatment="ever_treated", time="post". docs/tutorials/06_power_analysis.ipynb:L626-L634 docs/methodology/REGISTRY.md:L41-L71 diff_diff/estimators.py:L249-L259
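A minimal numpy sketch of the rank problem this finding describes, on a toy two-unit, two-period panel (not the notebook's actual cell): when the treatment column already encodes exposure (group AND post), it is identical to the interaction column the estimator constructs, and the design loses a rank.

```python
import numpy as np

# Two units x two periods, stacked long; unit 1 is the treated group.
post = np.array([0, 1, 0, 1])
ever_treated = np.array([0, 0, 1, 1])

# Buggy coding from the notebook: "treated" is 1 only when treated AND post.
treated_buggy = ever_treated * post

const = np.ones(4)
# Basic DiD design: intercept, treatment indicator, time indicator, interaction.
X_buggy = np.column_stack([const, treated_buggy, post, treated_buggy * post])
X_fixed = np.column_stack([const, ever_treated, post, ever_treated * post])

rank_buggy = np.linalg.matrix_rank(X_buggy)  # treatment col == interaction col
rank_fixed = np.linalg.matrix_rank(X_fixed)  # saturated 2x2 design, full rank
```

With the buggy coding the second and fourth columns are identical, so the matrix has rank 3 instead of 4 and the ATT coefficient is not separately identified, which is exactly why the fix is a time-invariant ever_treated column.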

  • Severity: P2. Impact: The DDD subsection says sample sizes are snapped to multiples of 8 with “one unit per cell minimum,” but the adapter actually uses n_per_cell = max(2, n_units // 8), so values below 16 are clamped to an effective N of 16. The adjacent Min N = 64 table entry comes from the registry’s sample-size-search floor, not the general simulate_power() lower bound. As written, the section gives contradictory guidance about valid DDD sample sizes. Concrete fix: Rewrite the note to distinguish the two rules explicitly: DDD simulations round to max(2, n_units // 8) * 8 with minimum effective N = 16, while simulate_sample_size() starts from the registry floor of 64. docs/tutorials/06_power_analysis.ipynb:L482-L498 docs/methodology/REGISTRY.md:L1721-L1723 tests/test_power.py:L969-L980
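The two rules being conflated can be made concrete with a small sketch (the function name is hypothetical; the max(2, n_units // 8) rule is quoted from the adapter as described above):

```python
def ddd_effective_n(n_units: int) -> int:
    """Effective DDD sample size under the adapter's rounding rule:
    split units across the 8 triple-difference cells with at least
    2 units per cell, then multiply back out."""
    return max(2, n_units // 8) * 8

# Any request whose floor-division lands at or below 2 per cell clamps
# to an effective N of 16; simulate_sample_size() separately starts its
# search at the registry floor of 64.
examples = {n: ddd_effective_n(n) for n in (8, 10, 16, 20, 24, 64)}
```

So a requested N of 10 or 20 both simulate with an effective N of 16, which is the contradiction with the "one unit per cell minimum" prose that the fix should spell out.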

Code Quality

  • No findings in scope; this PR only changes tutorial content.

Performance

  • No findings in scope; no runtime code paths changed.

Maintainability

  • No findings in scope beyond the documentation-accuracy items above.

Tech Debt

  • Severity: P3. Impact: Tutorial notebooks are still not executed in CI, which makes regressions in notebook examples easier to miss. This is already tracked, so it is not a blocker for this PR. Concrete fix: None required here; tracked in TODO.md. TODO.md:L64-L68

Security

  • No findings in scope; no secrets, unsafe shell usage, or new external-input handling were introduced.

Documentation/Tests

  • Severity: P3. Impact: The Section 8 opener says the simulation approach works with “any built-in estimator” and that “everything else is handled,” but simulate_power() auto-supports only the 12 classes in _ESTIMATOR_REGISTRY; unsupported built-ins require custom DGP plumbing, and some staggered configurations have the documented single-cohort DGP caveat. The later support table/note softens this, so this is informational rather than blocking. Concrete fix: Rephrase the opener to “any registered/supported estimator,” and point readers to the support table plus the custom-data_generator path for unsupported or special-case estimators. docs/tutorials/06_power_analysis.ipynb:L454-L498 diff_diff/power.py:L1522-L1529 docs/methodology/REGISTRY.md:L1722-L1723
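A toy sketch of why only registered classes are auto-supported (every name here except _ESTIMATOR_REGISTRY is hypothetical, and the real lookup and error behavior in power.py may differ):

```python
# Placeholder classes standing in for real estimators.
class DifferenceInDifferences: ...
class SomeUnregisteredEstimator: ...

def make_default_did_generator():
    """Placeholder factory for a default DGP."""
    return lambda n_units: None

# Only classes present in the mapping get an automatic data generator.
_ESTIMATOR_REGISTRY = {
    DifferenceInDifferences: make_default_did_generator,
}

def resolve_generator(estimator_cls, data_generator=None):
    """Mimic the lookup: an explicit data_generator wins, else the registry;
    unregistered built-ins need custom DGP plumbing."""
    if data_generator is not None:
        return data_generator
    try:
        return _ESTIMATOR_REGISTRY[estimator_cls]()
    except KeyError:
        raise TypeError(
            f"{estimator_cls.__name__} is not auto-supported; "
            "pass a custom data_generator."
        )
```

This is the distinction the suggested rewording makes: "any registered/supported estimator" works out of the box, while everything else goes through the custom-data_generator path.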

Path to Approval

  1. Required: fix the custom data_generator example so it passes a time-invariant treated-group indicator to DifferenceInDifferences.fit() and keeps post separate.
  2. Recommended while touching the same section: correct the DDD prose/table so the tutorial distinguishes DGP rounding (max(2, n_units // 8) * 8, minimum effective N = 16) from the simulate_sample_size() registry floor (min_n = 64).

- P1: Fix custom DGP example to use time-invariant `ever_treated` column
  instead of post-treatment exposure indicator (rank-deficient design)
- P2: Correct DDD prose to document actual rounding rule
  `max(2, n_units // 8)` with min effective N = 16, distinguish from
  registry search floor of 64
- P3: Change Section 8 intro from "any built-in estimator" to
  "all 12 supported estimators" with forward refs to support table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e26c0932b7dbe5b8ad86ff29778e5e95bc8e0f2a


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review blockers are resolved: the custom DGP example now uses a time-invariant treated-group indicator plus a separate post indicator, and the DDD prose now matches the registry’s effective-N rounding and simulate_sample_size() floor behavior.
  • No estimator implementation, variance/SE path, inference code, or default library behavior changed in this PR; the diff is tutorial-only.
  • Methodology cross-check against the power-analysis registry is largely clean.
  • One remaining issue is informational only: some new prose still slightly overstates the “just swap in the estimator” story for staggered configurations that the registry explicitly documents as needing DGP overrides or warnings.
  • Notebook execution is still not covered in CI, but that limitation is already tracked in TODO.md and is not blocking.

Methodology

  • Severity: P3. Impact: Section 8 and the later summary/table still say the registry effectively handles everything once you “just swap in the estimator,” but that is slightly too broad for staggered configurations such as control_group="not_yet_treated", clean_control="strict", or anticipation>0. The library intentionally warns that the default single-cohort staggered DGP does not match those designs, and the Methodology Registry documents that caveat. This is not a defect because the deviation is documented, but the tutorial prose can still mislead readers about default applicability. Concrete fix: optionally soften the wording to say the registry covers all 12 registered estimators, while some staggered configurations still require data_generator_kwargs or a custom data_generator. Locations: docs/tutorials/06_power_analysis.ipynb:L454-L456, docs/tutorials/06_power_analysis.ipynb:L719-L728, diff_diff/power.py:L270-L328, docs/methodology/REGISTRY.md:L1722-L1723

Code Quality

  • No findings in scope; this PR only updates tutorial content.

Performance

  • No findings in scope; no runtime code paths changed.

Maintainability

  • No findings in scope.

Tech Debt

  • No new findings in scope.

Security

  • No findings in scope; no new secret exposure, shelling, or external-input handling was introduced.

Documentation/Tests

  • Severity: P3. Impact: tutorial notebooks are still not executed in CI, so documentation regressions can slip through without automated coverage. This is already tracked and therefore non-blocking for this PR. Concrete fix: none required here; tracked in TODO.md. Location: TODO.md:L64-L68

@igerber igerber merged commit 215fff5 into main Mar 21, 2026
@igerber igerber deleted the power-analysis-notebook branch March 21, 2026 13:37