Extend power analysis to all estimators + simulation-based MDE/sample size #208
Conversation
Overall Assessment: ⛔ Blocker. Highest unmitigated severity is P0.
Executive Summary
Path to Approval
Methodology
Code Quality
Performance: No material findings.
Maintainability: No material findings beyond the correctness issues above.
Tech Debt: No separate tech-debt-only findings. The issues above are not tracked in TODO.md:38, so they remain unmitigated for assessment purposes.
Security: No material findings.
Documentation/Tests
Review note: I could not execute the test suite in this environment because the local Python environment is missing …
…bisection brackets

P0: Remove ContinuousDiD from power analysis registry — the DGP simulated att_slope as treatment_effect but extracted overall_att (a dose-weighted average), so the simulated truth and the evaluated estimand didn't match.
P1: Add endpoint evaluation for user-supplied effect_range/n_range in simulate_mde() and simulate_sample_size(). Previously these paths skipped evaluation entirely, leaving search_path empty and best_power at 0.0. Now warns when brackets don't contain the target power.
P2: Add regression tests for all three fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: P1.
Executive Summary
Methodology
Code Quality
Performance: No material findings in the changed code.
Maintainability: No material findings beyond the correctness issues above.
Tech Debt: No separate deferred-work findings. These issues are not tracked in the code-review debt section of TODO.md:L38, so they remain unmitigated for assessment.
Security: No material findings.
Documentation/Tests
Path to Approval
Review note: Static diff review only. I could not execute the test suite in this environment because the local Python environment is missing …
…short-circuit

- Remove TwoWayFixedEffects from power analysis registry (time="period" produces treated*period_number, not standard ATT)
- Add early return in simulate_sample_size() when lower bound already achieves target power (both explicit n_range and auto-bracket paths)
- Narrow docstring from "All" to "Most" built-in estimators
- Add regression tests for TWFE exclusion and lo-sufficient scenarios

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
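The early-return short-circuit can be sketched as below. `power_at_n` is a hypothetical callback standing in for a full power simulation at a given number of units; the real signature in the package differs. The point is only the first branch: when the lower bound of n_range already reaches the target power, the bisection never runs.

```python
def required_sample_size(power_at_n, n_range, target_power=0.8):
    """Integer bisection over n_units with an early return when the
    lower bound already achieves the target power."""
    lo, hi = n_range
    if power_at_n(lo) >= target_power:
        # Smallest candidate already meets the target; no search needed.
        return lo
    # Invariant: power(lo) < target <= power(hi) for a monotone power curve.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_at_n(mid) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi
```

For a toy monotone curve `lambda n: min(1.0, n / 100)` with target 0.8, the search over `(10, 200)` lands on 80, while `(100, 200)` returns 100 immediately via the short-circuit.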
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: P1. The two prior blockers from the last review appear resolved.
Executive Summary
Methodology
Code Quality: No material findings beyond the methodology issue above.
Performance: No material findings in the changed code.
Maintainability: No material findings.
Tech Debt: No separate deferred-work findings. The remaining P1 is not tracked under the code-review debt section of TODO.md:L38-L64, so it remains unmitigated for assessment.
Security: No material findings.
Documentation/Tests
Path to Approval
Review note: Static review only. I could not run the test suite in this environment because …
…ro-power bug

Add _sdid_dgp_kwargs that caps treatment_fraction at 0.4 so the placebo variance method has enough pseudo-controls (n_control > n_treated). Add regression test for the default SyntheticDiD path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ⛔ Blocker
Highest unmitigated severity: …
Executive Summary
Methodology
Code Quality: No material findings beyond the methodology blocker above.
Performance: No material findings in the changed code.
Maintainability: No material findings beyond the test gap below.
Tech Debt: No relevant entry under TODO.md:L38-L64 mitigates the blocker above, so it remains unmitigated for assessment.
Security: No material findings.
Documentation/Tests
Path to Approval
Review note: Static review only. I could not run …
…n cap with fail-fast validation

Remove _sdid_dgp_kwargs that silently capped treatment_fraction at 0.4. Add upfront ValueError in simulate_power() when SyntheticDiD placebo variance has n_control <= n_treated, with an actionable error message suggesting lowering treatment_fraction or using bootstrap variance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
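The fail-fast check described above might look like this sketch. The function name and signature are illustrative, not the package's actual API; the real check lives inside simulate_power(). The condition and the error message follow the commit text: the placebo variance method needs strictly more controls than treated units.

```python
def check_sdid_placebo_feasible(n_units, treatment_fraction, variance_method="placebo"):
    """Fail fast: SDID placebo variance needs more controls than treated units."""
    n_treated = int(round(n_units * treatment_fraction))
    n_control = n_units - n_treated
    if variance_method == "placebo" and n_control <= n_treated:
        raise ValueError(
            f"SyntheticDiD placebo variance requires n_control > n_treated, "
            f"got n_control={n_control}, n_treated={n_treated}. "
            "Lower treatment_fraction or use variance_method='bootstrap'."
        )
```

Failing upfront with an actionable message is preferable to the earlier silent cap, which changed the user's design without telling them.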
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: …
Executive Summary
Methodology
Code Quality
Performance: No material findings in the changed code.
Maintainability
Tech Debt: No relevant …
Security: No material findings.
Documentation/Tests
Path to Approval
Review note: Static review only. I did not execute the test suite in this environment.
…tor fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good
Highest unmitigated severity: …
Review note: Static review only. I could not execute the test suite in this environment because …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
/ai-review
…sample size

- Add estimator registry mapping all 12 estimators to appropriate DGP, fit kwargs, and result extraction profiles
- Refactor simulate_power() to use registry lookup instead of hardcoded if/elif chain (was 4 estimators, now 12 + custom fallback)
- Add simulate_mde() for bisection search over effect sizes
- Add simulate_sample_size() for bisection search over n_units
- Add SimulationMDEResults and SimulationSampleSizeResults dataclasses
- 34 new tests: registry, estimator coverage, MDE search, sample size search

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
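The registry pattern that replaces the if/elif chain can be sketched as follows. All names here (EstimatorProfile, _POWER_REGISTRY, the default extractor attributes) are invented for illustration and are not the package's actual internals; the sketch only shows the shape: one profile per estimator bundling DGP, fit kwargs, and result extraction, with a keyed lookup and an explicit error for unregistered estimators.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EstimatorProfile:
    """One registry entry: how to simulate data for, fit, and read results
    from a given estimator (names are illustrative, not the package's API)."""
    data_generator: Callable[..., Any]
    fit_kwargs: dict = field(default_factory=dict)
    result_extractor: Callable[[Any], tuple] = lambda r: (r.att, r.p_value)


_POWER_REGISTRY: dict = {}


def register(name, profile):
    """Map an estimator class name to its power-analysis profile."""
    _POWER_REGISTRY[name] = profile


def lookup(name):
    """Fetch a profile, failing loudly for unregistered estimators."""
    try:
        return _POWER_REGISTRY[name]
    except KeyError:
        raise KeyError(
            f"{name!r} has no power-analysis profile; supply a custom "
            "data_generator and result_extractor instead."
        ) from None
```

The advantage over an if/elif chain is that adding estimator number 13 is a one-line registration rather than another branch inside simulate_power().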
… API symbols

- Add `result_extractor` parameter to simulate_power, simulate_mde, and simulate_sample_size for unregistered estimators with non-standard schemas
- Fix power.rst: correct PowerAnalysis method names, example code, and add SimulationMDEResults/SimulationSampleSizeResults/simulate_mde/simulate_sample_size
- Add 4 missing symbols to docs/api/index.rst autosummary
- Add api/power.rst to doc snippet smoke tests
- Add tests for custom result_extractor and MDE forwarding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
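How a `result_extractor` hook slots into the simulation loop can be sketched as below. This is a simplified stand-in, not the package's simulate_power(): `fit_once` is a hypothetical callback that generates data and fits the estimator once, and the default extractor's attribute names are assumed. The idea from the commit is that unregistered estimators with non-standard result schemas supply their own extractor.

```python
def simulate_power_sketch(fit_once, n_sims, result_extractor=None, alpha=0.05):
    """Monte Carlo power with a pluggable result_extractor for estimators
    whose results do not expose the registered (att, p_value) schema."""
    if result_extractor is None:
        # Default assumes the registered schema (attribute names assumed here).
        result_extractor = lambda res: (res.att, res.p_value)
    rejections = 0
    for _ in range(n_sims):
        result = fit_once()
        _, p_value = result_extractor(result)
        rejections += p_value < alpha  # bool counts as 0/1
    return rejections / n_sims
```

A caller with a dict-shaped result would pass something like `result_extractor=lambda r: (r["estimate"], r["pval"])`, which is exactly the escape hatch the parameter provides.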
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Add _check_staggered_dgp_compat() that detects control_group="not_yet_treated", clean_control="strict", and anticipation>0 on staggered estimators using the automatic single-cohort DGP, and emits UserWarning with specific guidance on supplying data_generator_kwargs or a custom data_generator.

- 5 warning/no-warning unit tests + 2 slow regression tests with matching DGP
- Document limitation in REGISTRY.md PowerAnalysis section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: …
Executive Summary
Methodology
Code Quality: No material findings in the changed code beyond the methodology issue above.
Performance: No material findings.
Maintainability: No material findings.
Tech Debt: No material findings newly introduced by this diff.
Security: No material findings.
Documentation/Tests
Path to Approval
… override
Move the SyntheticDiD placebo-variance fail-fast after kwargs merge so that
data_generator_kwargs={"n_treated": N} overrides are caught. Add tests for
the override path across simulate_power, simulate_mde, and simulate_sample_size.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
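The merge-then-check ordering can be sketched as follows. The helper name and kwargs keys are invented for the sketch; the point from the commit is only the ordering: user overrides in data_generator_kwargs must be folded in before the feasibility check runs, otherwise an override like n_treated=N slips past it.

```python
def resolve_and_check(default_dgp_kwargs, data_generator_kwargs):
    """Merge user overrides into the default DGP kwargs *before* running the
    placebo-variance feasibility check, so n_treated overrides are caught."""
    merged = {**default_dgp_kwargs, **(data_generator_kwargs or {})}
    n_treated = merged["n_treated"]
    n_control = merged["n_units"] - n_treated
    if n_control <= n_treated:
        raise ValueError(
            f"SyntheticDiD placebo variance requires n_control > n_treated "
            f"(got n_control={n_control}, n_treated={n_treated}) after "
            "applying data_generator_kwargs; lower n_treated or "
            "treatment_fraction."
        )
    return merged
```

Checking before the merge would validate the defaults rather than the configuration actually used, which is exactly the bug this commit closes.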
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: Highest unmitigated severity: …
Executive Summary
Methodology
Code Quality: No material findings beyond the methodology issue above.
Performance: No material findings.
Maintainability: No material findings beyond the need to make the …
Tech Debt: No separate tech-debt-only findings. The …
Security: No material findings.
Documentation/Tests
Path to Approval
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Review note: Static review only; I could not execute …
Executive Summary
Methodology
Code Quality: No material findings beyond the correctness issues above.
Performance: No material findings.
Maintainability: No material findings beyond the need to separate "initial search guess" from "true minimum supported N" in the sample-size search path.
Tech Debt
Security: No material findings.
Documentation/Tests
Path to Approval
… warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Review note: Static review only. I could not run …
Executive Summary
Methodology
Code Quality: No material findings beyond the correctness issues above.
Performance: No material findings.
Maintainability: No material findings beyond the unresolved …
Tech Debt
Security: No material findings.
Documentation/Tests
Path to Approval
Summary
- Refactor simulate_power() from a hardcoded 4-estimator if/elif chain to a registry-based lookup supporting all estimators automatically
- Add simulate_mde() — bisection search over effect sizes to find the minimum detectable effect for any estimator
- Add simulate_sample_size() — bisection search over n_units to find the required sample size for any estimator
- Add SimulationMDEResults and SimulationSampleSizeResults dataclasses with summary(), to_dict(), to_dataframe() methods

Methodology references (required if estimator / math changes)
Validation
- tests/test_power.py — 34 new tests across 4 test classes (TestEstimatorRegistry, TestEstimatorCoverage, TestSimulateMDE, TestSimulateSampleSize)
- black, ruff check, and mypy all pass clean

Security / privacy
Generated with Claude Code