
Update power analysis tutorial with simulation-based features#220

Merged
igerber merged 2 commits into main from power-analysis-notebook on Mar 21, 2026

Conversation

@igerber
Owner

@igerber igerber commented Mar 21, 2026

Summary

  • Expand power analysis tutorial to cover the full simulation API from PR #208 (Extend power analysis to all estimators + simulation-based MDE/sample size)
  • Add Section 8: Power analysis for any estimator (CallawaySantAnna, SyntheticDiD, TripleDifference examples + reference table of all 12 supported estimators)
  • Add Section 9: Simulation-based MDE search via simulate_mde() with search path inspection
  • Add Section 10: Simulation-based sample size search via simulate_sample_size() with analytical vs simulation comparison
  • Add Section 11: Custom data generators (data_generator_kwargs and fully custom data_generator)
  • Add analytical vs simulation guidance to practical recommendations
  • Update summary with new takeaways
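To illustrate the bisection idea behind simulate_mde() and simulate_sample_size(), here is a standalone sketch: the two-sample z-test power function and the helper names are hypothetical stand-ins, not the library's API, but the search loop is the same shape of monotone bisection the tutorial sections describe.

```python
import math

def power_two_sample(effect, n_per_arm, sd=1.0):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    z = effect / (sd * math.sqrt(2.0 / n_per_arm))
    z_alpha = 1.959963984540054  # Phi^{-1}(0.975)
    return phi(z - z_alpha) + phi(-z - z_alpha)

def mde_bisect(n_per_arm, target_power=0.80, lo=0.0, hi=5.0, tol=1e-6):
    """Bisect for the smallest effect whose power reaches target_power.

    Valid because power is monotone increasing in the effect size, so the
    target crossing point is unique and bisection converges to it.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_two_sample(mid, n_per_arm) >= target_power:
            hi = mid  # enough power: the MDE is at or below mid
        else:
            lo = mid  # not enough power: the MDE is above mid
    return hi

mde = mde_bisect(n_per_arm=100)  # analytical answer is about 0.396
```

simulate_sample_size() inverts the same monotone relationship in the other direction, bisecting on N for a fixed effect instead of on the effect for a fixed N.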

Methodology references (required if estimator / math changes)

  • N/A — no methodology changes, tutorial notebook only

Validation

  • Tests added/updated: No test changes (tutorial notebook)
  • Backtest / simulation / notebook evidence: Notebook executes end-to-end without errors via jupyter nbconvert --execute

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Expand the tutorial to cover the full simulation power analysis API
added in PR #208: multi-estimator support (CallawaySantAnna, SyntheticDiD,
TripleDifference), simulate_mde() bisection search, simulate_sample_size()
bisection search, custom data generators, and analytical vs simulation
guidance. Includes reference table of all 12 supported estimators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⚠️ Needs changes

The notebook update is mostly aligned with power.py and the Methodology Registry, but one new example is methodologically incorrect and can yield a rank-deficient basic DiD specification. The other issues are documentation-accuracy problems rather than library-code regressions.

Executive Summary

Methodology

  • Severity: P1. Impact: The custom DGP cell defines "treated": int(treated_unit and post) and then passes treatment="treated", time="post" into DifferenceInDifferences.fit(). But basic DiD expects a time-invariant treated-group indicator plus a separate post indicator, and the estimator internally constructs the interaction treatment * time. With the notebook’s coding, the treatment column and interaction column are identical, violating the documented DiD specification Treated_i × Post_t and creating a rank-deficient design that can drop the ATT coefficient. Concrete fix: Emit a separate ever_treated/treated_group column, keep post as the time indicator, and pass treatment="ever_treated", time="post". docs/tutorials/06_power_analysis.ipynb:L626-L634 docs/methodology/REGISTRY.md:L41-L71 diff_diff/estimators.py:L249-L259
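A minimal numpy sketch of the rank problem this finding describes, on a toy two-unit, two-period panel (not the notebook's actual cell): when the treatment column already encodes exposure (group AND post), it is identical to the interaction column the estimator constructs, and the design loses a rank.

```python
import numpy as np

# Two units x two periods, stacked long; unit 1 is the treated group.
post = np.array([0, 1, 0, 1])
ever_treated = np.array([0, 0, 1, 1])

# Buggy coding from the notebook: "treated" is 1 only when treated AND post.
treated_buggy = ever_treated * post

const = np.ones(4)
# Basic DiD design: intercept, treatment indicator, time indicator, interaction.
X_buggy = np.column_stack([const, treated_buggy, post, treated_buggy * post])
X_fixed = np.column_stack([const, ever_treated, post, ever_treated * post])

rank_buggy = np.linalg.matrix_rank(X_buggy)  # treatment col == interaction col
rank_fixed = np.linalg.matrix_rank(X_fixed)  # saturated 2x2 design, full rank
```

With the buggy coding the second and fourth columns are identical, so the matrix has rank 3 instead of 4 and the ATT coefficient is not separately identified, which is exactly why the fix is a time-invariant ever_treated column.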

  • Severity: P2. Impact: The DDD subsection says sample sizes are snapped to multiples of 8 with “one unit per cell minimum,” but the adapter actually uses n_per_cell = max(2, n_units // 8), so values below 16 are clamped to an effective N of 16. The adjacent Min N = 64 table entry comes from the registry’s sample-size-search floor, not the general simulate_power() lower bound. As written, the section gives contradictory guidance about valid DDD sample sizes. Concrete fix: Rewrite the note to distinguish the two rules explicitly: DDD simulations round to max(2, n_units // 8) * 8 with minimum effective N = 16, while simulate_sample_size() starts from the registry floor of 64. docs/tutorials/06_power_analysis.ipynb:L482-L498 docs/methodology/REGISTRY.md:L1721-L1723 tests/test_power.py:L969-L980
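The two rules being conflated can be made concrete with a small sketch (the function name is hypothetical; the max(2, n_units // 8) rule is quoted from the adapter as described above):

```python
def ddd_effective_n(n_units: int) -> int:
    """Effective DDD sample size under the adapter's rounding rule:
    split units across the 8 triple-difference cells with at least
    2 units per cell, then multiply back out."""
    return max(2, n_units // 8) * 8

# Any request whose floor-division lands at or below 2 per cell clamps
# to an effective N of 16; simulate_sample_size() separately starts its
# search at the registry floor of 64.
examples = {n: ddd_effective_n(n) for n in (8, 10, 16, 20, 24, 64)}
```

So a requested N of 10 or 20 both simulate with an effective N of 16, which is the contradiction with the "one unit per cell minimum" prose that the fix should spell out.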

Code Quality

  • No findings in scope; this PR only changes tutorial content.

Performance

  • No findings in scope; no runtime code paths changed.

Maintainability

  • No findings in scope beyond the documentation-accuracy items above.

Tech Debt

  • Severity: P3. Impact: Tutorial notebooks are still not executed in CI, which makes regressions in notebook examples easier to miss. This is already tracked, so it is not a blocker for this PR. Concrete fix: None required here; tracked in TODO.md. TODO.md:L64-L68

Security

  • No findings in scope; no secrets, unsafe shell usage, or new external-input handling were introduced.

Documentation/Tests

  • Severity: P3. Impact: The Section 8 opener says the simulation approach works with “any built-in estimator” and that “everything else is handled,” but simulate_power() auto-supports only the 12 classes in _ESTIMATOR_REGISTRY; unsupported built-ins require custom DGP plumbing, and some staggered configurations have the documented single-cohort DGP caveat. The later support table/note softens this, so this is informational rather than blocking. Concrete fix: Rephrase the opener to “any registered/supported estimator,” and point readers to the support table plus the custom-data_generator path for unsupported or special-case estimators. docs/tutorials/06_power_analysis.ipynb:L454-L498 diff_diff/power.py:L1522-L1529 docs/methodology/REGISTRY.md:L1722-L1723
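A toy sketch of why only registered classes are auto-supported (every name here except _ESTIMATOR_REGISTRY is hypothetical, and the real lookup and error behavior in power.py may differ):

```python
# Placeholder classes standing in for real estimators.
class DifferenceInDifferences: ...
class SomeUnregisteredEstimator: ...

def make_default_did_generator():
    """Placeholder factory for a default DGP."""
    return lambda n_units: None

# Only classes present in the mapping get an automatic data generator.
_ESTIMATOR_REGISTRY = {
    DifferenceInDifferences: make_default_did_generator,
}

def resolve_generator(estimator_cls, data_generator=None):
    """Mimic the lookup: an explicit data_generator wins, else the registry;
    unregistered built-ins need custom DGP plumbing."""
    if data_generator is not None:
        return data_generator
    try:
        return _ESTIMATOR_REGISTRY[estimator_cls]()
    except KeyError:
        raise TypeError(
            f"{estimator_cls.__name__} is not auto-supported; "
            "pass a custom data_generator."
        )
```

This is the distinction the suggested rewording makes: "any registered/supported estimator" works out of the box, while everything else goes through the custom-data_generator path.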

Path to Approval

  1. Required: fix the custom data_generator example so it passes a time-invariant treated-group indicator to DifferenceInDifferences.fit() and keeps post separate.
  2. Recommended while touching the same section: correct the DDD prose/table so the tutorial distinguishes DGP rounding (max(2, n_units // 8) * 8, minimum effective N = 16) from the simulate_sample_size() registry floor (min_n = 64).

- P1: Fix custom DGP example to use time-invariant `ever_treated` column
  instead of post-treatment exposure indicator (rank-deficient design)
- P2: Correct DDD prose to document actual rounding rule
  `max(2, n_units // 8)` with min effective N = 16, distinguish from
  registry search floor of 64
- P3: Change Section 8 intro from "any built-in estimator" to
  "all 12 supported estimators" with forward refs to support table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e26c0932b7dbe5b8ad86ff29778e5e95bc8e0f2a


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review blockers are resolved: the custom DGP example now uses a time-invariant treated-group indicator plus a separate post indicator, and the DDD prose now matches the registry’s effective-N rounding and simulate_sample_size() floor behavior.
  • No estimator implementation, variance/SE path, inference code, or default library behavior changed in this PR; the diff is tutorial-only.
  • Methodology cross-check against the power-analysis registry is largely clean.
  • One remaining issue is informational only: some new prose still slightly overstates the “just swap in the estimator” story for staggered configurations that the registry explicitly documents as needing DGP overrides or warnings.
  • Notebook execution is still not covered in CI, but that limitation is already tracked in TODO.md and is not blocking.

Methodology

  • Severity: P3. Impact: Section 8 and the later summary/table still say the registry effectively handles everything once you “just swap in the estimator,” but that is slightly too broad for staggered configurations such as control_group="not_yet_treated", clean_control="strict", or anticipation>0. The library intentionally warns that the default single-cohort staggered DGP does not match those designs, and the Methodology Registry documents that caveat. This is not a defect because the deviation is documented, but the tutorial prose can still mislead readers about default applicability. Concrete fix: optionally soften the wording to say the registry covers all 12 registered estimators, while some staggered configurations still require data_generator_kwargs or a custom data_generator. Locations: docs/tutorials/06_power_analysis.ipynb:L454-L456, docs/tutorials/06_power_analysis.ipynb:L719-L728, diff_diff/power.py:L270-L328, docs/methodology/REGISTRY.md:L1722-L1723

Code Quality

  • No findings in scope; this PR only updates tutorial content.

Performance

  • No findings in scope; no runtime code paths changed.

Maintainability

  • No findings in scope.

Tech Debt

  • No new findings in scope.

Security

  • No findings in scope; no new secret exposure, shelling, or external-input handling was introduced.

Documentation/Tests

  • Severity: P3. Impact: tutorial notebooks are still not executed in CI, so documentation regressions can slip through without automated coverage. This is already tracked and therefore non-blocking for this PR. Concrete fix: none required here; tracked in TODO.md. Location: TODO.md:L64-L68

@igerber igerber merged commit 215fff5 into main Mar 21, 2026
@igerber igerber deleted the power-analysis-notebook branch March 21, 2026 13:37