Add EPV diagnostics for propensity score logit #251
Overall Assessment
The highest-severity unmitigated findings are P1s in the new EPV/propensity-score behavior: one strict-mode contract break on the repeated-cross-section Callaway-Sant’Anna path, one weighted-EPV calculation bug, one
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Affected methods: propensity-score logit fallback and EPV diagnostics in
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Affected methods: propensity-score logit diagnostics/fallback in
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Highest unmitigated severity: P1.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment
✅ Looks good
Highest unmitigated severity: P2.
Executive summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No new issue needs
Security
No findings.
Documentation/Tests
Source-based only in this environment:
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment
✅ Looks good
Highest unmitigated severity: P2.
Executive summary
Methodology
Affected methods: propensity-score logit EPV diagnostics and fallback behavior in
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The remaining P2s are not currently tracked in TODO.md:47, so I treated them as live review items rather than accepted debt.
Security
No findings.
Documentation/Tests
Verification note: source-based only here;
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good
Highest unmitigated severity: P2.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…ault
Events Per Variable (EPV) check in solve_logit warns when the number of minority-class observations per parameter falls below the threshold (default 10, per Peduzzi et al. 1996). Affects CallawaySantAnna, TripleDifference, and StaggeredTripleDiff, which are all the estimators using logit for propensity scores.
New pscore_fallback parameter defaults to "error" instead of silently dropping covariates when logit fails. Set pscore_fallback="unconditional" for legacy behavior.
diagnose_propensity() method on CallawaySantAnna enables pre-estimation EPV assessment across cohorts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cstrings
P0: Zero-fill NaN coefficients from dropped rank-deficient columns before caching in CS panel IPW/DR paths, preventing NaN propagation on cache reuse. Matches existing pattern in StaggeredTripleDiff._compute_pscore().
P1: Restore strict-mode semantics so rank_deficient_action="error" always re-raises regardless of pscore_fallback setting. Update TripleDifference REGISTRY.md section with pscore_fallback and EPV documentation. Propagate epv_threshold and pscore_fallback through triple_difference() wrapper.
P2: Store epv_threshold on results objects for correct summary rendering. Add class docstrings for new parameters. Add regression tests for cache NaN poisoning and strict-mode interaction with pscore_fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
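The P0 item can be illustrated with a minimal sketch. It assumes the solver marks dropped rank-deficient columns with NaN coefficients; the helper name zero_fill_for_cache and the sample arrays are hypothetical and are not taken from the PR.

```python
import numpy as np


def zero_fill_for_cache(beta):
    """Return a copy of beta with NaN entries replaced by 0.0.

    Hypothetical helper: a NaN coefficient is assumed to mark a column
    that was dropped as rank-deficient, so it should contribute nothing
    when the cached coefficients are reused.
    """
    beta = np.asarray(beta, dtype=float)
    return np.where(np.isnan(beta), 0.0, beta)


# Coefficients with one dropped column (NaN in position 1).
beta = np.array([0.5, np.nan, -1.2])
X_new = np.array([[1.0, 3.0, 2.0]])

poisoned_index = X_new @ beta        # NaN propagates through the product
cached = zero_fill_for_cache(beta)
linear_index = X_new @ cached        # finite: the dropped column is ignored
```

Zero-filling before the cache write means every later cache hit sees finite coefficients, rather than relying on each reader to handle NaN.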
…e diagnostics
Fix RCS fallback handlers to re-raise when rank_deficient_action="error", matching the panel path fix.
Compute EPV on positive-weight sample only when weights have zeros (Peduzzi's rule applies to the fitted sample).
Guard diagnose_propensity() against panel=False with NotImplementedError.
Update StaggeredTripleDifference REGISTRY.md section with EPV diagnostics and pscore_fallback documentation.
Cache EPV diagnostic metadata alongside logit coefficients so cache-hit cells appear in epv_summary(show_all=True).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
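A small sketch of the positive-weight rule, assuming a binary outcome vector and optional observation weights. The function name minority_count and the sample data are hypothetical.

```python
import numpy as np


def minority_count(y, weights=None):
    """Count minority-class observations on the fitted sample.

    Hypothetical helper: rows with zero weight do not enter the fit,
    so they are excluded before counting events and non-events.
    """
    y = np.asarray(y)
    if weights is not None:
        keep = np.asarray(weights) > 0
        y = y[keep]
    n_events = int(np.sum(y == 1))
    return min(n_events, y.size - n_events)


y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
w = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])

raw = minority_count(y)              # counts all rows
fitted = minority_count(y, w)        # counts only rows with weight > 0
```

Here two of the three treated rows carry zero weight, so the raw count overstates the events available to the fit.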
…pagation
Retain worst-case (minimum) EPV across all g_c comparison cohorts for the same (g,t) cell instead of overwriting.
Cache EPV diagnostic metadata alongside logit coefficients in _compute_pscore() and propagate on cache hits, matching the CallawaySantAnna pattern.
Add REGISTRY.md note documenting the cell-level worst-case reporting convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
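The worst-case convention might look like the following sketch. The dictionary layout, the helper name record_worst_case, and the cohort values are illustrative assumptions, not the library's internal structures.

```python
def record_worst_case(epv_by_cell, cell, epv):
    """Keep the minimum EPV seen so far for a (g, t) cell.

    Hypothetical helper: a later comparison cohort only replaces the
    stored value when its EPV is lower.
    """
    previous = epv_by_cell.get(cell)
    if previous is None or epv < previous:
        epv_by_cell[cell] = epv


epv_by_cell = {}
cell = (2004, 2003)  # illustrative (g, t) pair

# Three comparison cohorts g_c contribute to the same cell.
for g_c, epv in [(2005, 14.0), (2006, 4.5), (2007, 9.0)]:
    record_worst_case(epv_by_cell, cell, epv)
```

With plain overwriting, the cell would report 9.0 (the last cohort) and hide the 4.5 that actually signals a weak fit.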
…y guard, result params
Skip propensity-score influence function correction when unconditional fallback is used (constant pscore has zero estimation uncertainty). Adds ps_fallback_used flag across all 4 IPW/DR methods (panel+RCS).
Guard diagnose_propensity() against control_group='not_yet_treated' with NotImplementedError since the control set varies per (g,t) cell.
Propagate pscore_fallback to all three results dataclasses and epv_threshold to TripleDifferenceResults for full audit trail.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
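The interaction between strict mode, pscore_fallback, and the ps_fallback_used flag can be sketched as below. This is a simplified illustration: the function name resolve_logit_failure, its signature, and the non-strict value "warn" are assumptions, and only the parameter names rank_deficient_action, pscore_fallback, and ps_fallback_used come from the commits.

```python
import numpy as np


def resolve_logit_failure(exc, n_obs, treated_share,
                          rank_deficient_action="warn",
                          pscore_fallback="error"):
    """Decide what to do after the propensity logit raised exc.

    Hypothetical sketch. Returns (pscore, ps_fallback_used) when the
    unconditional fallback applies; otherwise re-raises exc.
    """
    # Strict mode wins regardless of the pscore_fallback setting.
    if rank_deficient_action == "error":
        raise exc
    # The default pscore_fallback="error" also re-raises.
    if pscore_fallback != "unconditional":
        raise exc
    # Constant propensity: every unit gets the treated share.
    pscore = np.full(n_obs, treated_share)
    ps_fallback_used = True  # caller skips the pscore IF correction
    return pscore, ps_fallback_used


pscore, used = resolve_logit_failure(
    ValueError("logit failed"), n_obs=4, treated_share=0.25,
    pscore_fallback="unconditional",
)
```

A caller would then branch on the returned flag and omit the propensity-score influence function term, since a constant propensity contributes no estimation uncertainty.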
…f handling
Use weighted treated share (np.average with survey weights) for unconditional fallback propensity instead of raw count ratio. Applies to all 4 panel/RCS IPW/DR fallback sites.
Normalize np.inf → 0 for never-treated encoding in diagnose_propensity() to match fit()'s treatment_groups derivation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
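Both changes in this commit are small array operations. The sketch below uses made-up treatment indicators, survey weights, and first-treatment periods to show the difference; the variable names are illustrative.

```python
import numpy as np

# Treatment indicator and survey weights (illustrative values).
d = np.array([1.0, 1.0, 0.0, 0.0])
w = np.array([3.0, 1.0, 1.0, 1.0])

raw_share = d.mean()                       # raw count ratio: 2 / 4
weighted_share = np.average(d, weights=w)  # weighted share: 4 / 6

# Never-treated units encoded as np.inf are mapped to 0 so both
# encodings yield the same set of treatment groups.
first_treat = np.array([2004.0, np.inf, 2006.0])
normalized = np.where(np.isinf(first_treat), 0.0, first_treat)
```

With unequal weights the two shares differ (0.5 versus about 0.667 here), which is why the raw ratio is inconsistent with a survey-weighted logit.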
Use weighted subgroup share (np.average with survey weights) for unconditional fallback instead of raw np.mean(PA4), matching the survey-weighted logit semantics used in CS and SDDD fallback paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aveat
Add low-EPV diagnostic block to TripleDifferenceResults.summary(), matching the pattern in CS and SDDD results classes.
Document diagnose_propensity() as a raw-count heuristic that may overstate EPV vs. fit-time effective sample (missing outcomes, zero survey weights). Direct users to results.epv_diagnostics for authoritative per-cell EPV.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention)
Peduzzi et al. (1996) define EPV using independent predictor variables, not including the intercept. Change denominator from k_solve (which includes the intercept column) to n_predictors = k_solve - 1.
Also fix TripleDifference fallback warning to use correct API keyword estimation_method (not est_method).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
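The denominator change can be worked through with a short sketch. The function names and the treatment of an intercept-only model (returning infinity) are assumptions made for illustration; only k_solve, n_predictors, and the default threshold of 10 come from the commits.

```python
import warnings


def events_per_variable(n_minority, k_solve):
    """EPV with the intercept excluded from the denominator.

    Hypothetical helper: k_solve counts design-matrix columns including
    the intercept, so n_predictors = k_solve - 1.
    """
    n_predictors = k_solve - 1
    if n_predictors <= 0:
        # Assumption: an intercept-only model has no predictors to
        # constrain, so EPV is treated as unbounded.
        return float("inf")
    return n_minority / n_predictors


def check_epv(n_minority, k_solve, epv_threshold=10.0):
    """Warn when EPV falls below the threshold; return the EPV."""
    epv = events_per_variable(n_minority, k_solve)
    if epv < epv_threshold:
        warnings.warn(
            f"Low EPV: {epv:.1f} < {epv_threshold:.1f}", UserWarning
        )
    return epv


# 30 minority-class observations, intercept plus 5 predictors.
epv_excluding_intercept = events_per_variable(30, 6)   # 30 / 5
epv_including_intercept = 30 / 6                       # old denominator
```

For this example the intercept-excluded convention gives 6.0 and the old one gives 5.0; both are below the default threshold, but the two conventions can disagree near the boundary.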
DR fallback warnings now say propensity model is unconditional while outcome regression still uses covariates, instead of misleading "all covariates dropped" text. IPW warnings unchanged.
Update REGISTRY.md fallback description to distinguish IPW vs DR behavior. Fix docstrings to say "predictor variables (excluding intercept)" consistently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ention
Make all fallback warnings and docstrings method-specific: IPW says covariates dropped, DR says propensity model unconditional while outcome regression still uses covariates.
Update test comments from old intercept-inclusive arithmetic to predictor-variable counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensure epv_summary(show_all=False) returns DataFrame with correct column schema even when no entries have low EPV, across all three results classes.
Fix remaining test comments to use intercept-excluded EPV arithmetic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
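One way to guarantee a stable schema on an empty result is to pass the column list explicitly to the DataFrame constructor. The column names, the function name epv_summary_sketch, and the diagnostics layout below are hypothetical; the real epv_summary() schema is not shown in this conversation.

```python
import pandas as pd

# Hypothetical column schema for illustration only.
EPV_COLUMNS = ["group", "time", "epv", "n_minority", "n_predictors", "is_low"]


def epv_summary_sketch(diagnostics, show_all=False):
    """Build a summary frame that always carries EPV_COLUMNS.

    diagnostics is assumed to be a list of dicts keyed by EPV_COLUMNS.
    """
    rows = list(diagnostics)
    if not show_all:
        rows = [row for row in rows if row["is_low"]]
    # Passing columns= keeps the schema even when rows is empty.
    return pd.DataFrame(rows, columns=EPV_COLUMNS)


diagnostics = [
    {"group": 2004, "time": 2005, "epv": 22.0,
     "n_minority": 110, "n_predictors": 5, "is_low": False},
]

low_only = epv_summary_sketch(diagnostics)             # no low-EPV cells
everything = epv_summary_sketch(diagnostics, show_all=True)
```

Downstream code can then select columns such as low_only["epv"] without first checking whether any low-EPV cells exist.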
6a2e254 to 67bc6db
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Highest unmitigated severity:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- Events Per Variable (EPV) check in solve_logit() that warns when the number of minority-class observations per parameter falls below the threshold (default 10, per Peduzzi et al. 1996). Affects all estimators using logit for propensity scores: CallawaySantAnna, TripleDifference, StaggeredTripleDiff.
- pscore_fallback parameter defaults to "error" instead of silently dropping all covariates when logit fails. Set pscore_fallback="unconditional" for legacy behavior.
- diagnose_propensity() method on CallawaySantAnna enables pre-estimation EPV assessment across all cohorts without running the full estimation.
- results.epv_diagnostics with epv_summary() method and diagnostic block in summary() output.
- rank_deficient_action="error" always re-raises regardless of pscore_fallback setting.
Methodology references (required if estimator / math changes)
Validation
- tests/test_linalg.py (7 EPV unit tests)
- tests/test_staggered.py (12 integration tests including cache NaN regression, strict-mode interaction, diagnose_propensity)
- tests/test_methodology_triple_diff.py (fallback test updated)
- tests/test_survey_staggered_ddd.py (fallback test updated)
Security / privacy
Generated with Claude Code