Fix CallawaySantAnna propensity score estimation (IRLS) #202
Switch CallawaySantAnna and TripleDifference from BFGS-based logistic regression to IRLS (Fisher scoring), matching R's glm(family=binomial). Under near-separation (e.g., large-scale covariates like population trends), BFGS converges to different coefficients than IRLS, producing inflated ATT estimates. IRLS matches the reference implementations.

Key changes:
- Add solve_logit() and _check_propensity_diagnostics() to linalg.py
- Add configurable pscore_trim parameter to CallawaySantAnna
- Fix silent DR fallback (now emits warning)
- Remove duplicated _logistic_regression() from staggered.py and triple_diff.py
- Document IRLS algorithm and diagnostics in REGISTRY.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
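For readers unfamiliar with IRLS, here is a minimal sketch of Fisher scoring for a binary logit. It is not the PR's solve_logit(): the name irls_logit, the zero starting values, and the 1e-10 weight floor are illustrative assumptions. The stopping rule imitates glm.fit's relative-deviance test with glm.control's defaults (epsilon = 1e-8, maxit = 25), and there are no rank checks, trimming, or separation diagnostics.

```python
import numpy as np


def irls_logit(X, y, max_iter=25, tol=1e-8):
    """Logistic regression via IRLS (Fisher scoring); illustrative sketch only.

    X must already contain an intercept column.
    Returns (beta, iterations, converged).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])  # assumed start; glm.fit initializes differently
    dev_old = np.inf
    for it in range(1, max_iter + 1):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        # Working weights for the canonical logit link: w = p(1 - p),
        # floored so X'WX stays invertible when fitted p approach 0 or 1.
        w = np.maximum(p * (1.0 - p), 1e-10)
        # Working response z = eta + (y - p) / w
        z = eta + (y - p) / w
        # Weighted least-squares step: solve (X'WX) beta = X'Wz
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        # Binomial deviance at the new coefficients (clipped to avoid log(0))
        p_new = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-15, 1.0 - 1e-15)
        dev = -2.0 * np.sum(y * np.log(p_new) + (1.0 - y) * np.log(1.0 - p_new))
        # Relative-deviance convergence test in the style of glm.fit
        if abs(dev - dev_old) / (abs(dev) + 0.1) < tol:
            return beta, it, True
        dev_old = dev
    return beta, max_iter, False
```

Each iteration solves a weighted least-squares problem using the exact curvature X'WX of the logit likelihood, so its fixed point is the same maximum-likelihood solution glm targets. A quasi-Newton method such as BFGS only approximates that curvature, which is one plausible reason it can stop elsewhere when the likelihood surface is nearly flat, as under near-separation.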
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Verification note: I could not run the test suite in this environment because the Python dependencies are not installed (…
…it rank_deficient_action

- Validate pscore_trim in [0, 0.5) in CallawaySantAnna.__init__ (P1)
- Add pscore_trim field to CallawaySantAnnaResults and pass from fit() (P1)
- Add rank_deficient_action parameter to solve_logit() with warn/error/silent (P2)
- Thread rank_deficient_action through staggered.py and triple_diff.py call sites (P2)
- Add tests for validation, results propagation, non-default trim, and rank_deficient_action

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
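The warn/error/silent modes for rank_deficient_action can be pictured as a small dispatch on the numerical rank of the design matrix. The sketch below is an assumption about the shape of such a check, not the PR's solve_logit(): the helper name check_design_rank and its messages are hypothetical, and it only detects collinearity rather than dropping columns.

```python
import warnings

import numpy as np


def check_design_rank(X, rank_deficient_action="warn"):
    """Hypothetical helper: detect collinear covariates before a logit fit."""
    if rank_deficient_action not in ("warn", "error", "silent"):
        raise ValueError(
            "rank_deficient_action must be 'warn', 'error' or 'silent', "
            f"got {rank_deficient_action!r}"
        )
    X = np.asarray(X, dtype=float)
    rank = np.linalg.matrix_rank(X)  # SVD-based numerical rank
    if rank < X.shape[1]:
        msg = f"design matrix is rank deficient (rank {rank} < {X.shape[1]} columns)"
        if rank_deficient_action == "error":
            raise ValueError(msg)
        if rank_deficient_action == "warn":
            warnings.warn(msg, UserWarning, stacklevel=2)
        # "silent": proceed without comment
    return rank
```

Validating the mode string up front means a typo such as "eror" fails loudly instead of quietly behaving like "silent".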
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
Verification note: I could not execute the test suite in this environment because …
Re-raise ValueError in IPW/DR except blocks when rank_deficient_action is "error" instead of silently falling back to unconditional estimation. Applies to CallawaySantAnna and TripleDifference PS paths. Add estimator-level regression tests asserting ValueError propagation with collinear covariates under error mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
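The control flow this commit describes (re-raise in error mode, otherwise fall back with a warning) can be sketched as below. estimate_pscore and _fit_pscore are hypothetical names; the stand-in fit simply raises on collinear covariates, and the unconditional fallback assigns every unit the treated share. These are illustrative assumptions, not the estimators' actual code paths.

```python
import warnings

import numpy as np


def _fit_pscore(X, d):
    """Hypothetical stand-in for the propensity model: fails on collinear X."""
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("collinear covariates in propensity score model")
    # Placeholder fitted values; a real implementation would run the logit here.
    return np.full(len(d), d.mean())


def estimate_pscore(X, d, rank_deficient_action="warn"):
    """Return (pscores, path), falling back to unconditional estimation on failure."""
    d = np.asarray(d, dtype=float)
    try:
        return _fit_pscore(np.asarray(X, dtype=float), d), "conditional"
    except (ValueError, np.linalg.LinAlgError) as exc:
        if rank_deficient_action == "error":
            # Error mode: propagate the failure instead of silently degrading.
            raise
        # Otherwise fall back, but say so rather than switching paths silently.
        warnings.warn(
            f"propensity score estimation failed ({exc}); "
            "falling back to unconditional estimation",
            UserWarning,
            stacklevel=2,
        )
        return np.full(len(d), d.mean()), "unconditional"
```

A bare `raise` inside the except block re-raises the original exception with its traceback intact, which is what the estimator-level tests would then observe.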
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
The previous blocker around …
Executive Summary
Methodology
Code Quality
No findings. The prior …
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Path to Approval
Verification note: this review is by diff inspection only; I could not execute the test suite in this sandbox because …
- Change pscore_trim validation from [0, 0.5) to (0, 0.5), since zero trimming allows exact 1.0 propensity scores through to the weight formula pscore/(1-pscore), producing inf/NaN
- Update REGISTRY.md fallback note to document error-mode re-raise
- Add pscore_trim to CallawaySantAnna and CallawaySantAnnaResults docstrings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
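To see why a zero trim is unsafe, here is a small sketch of trimming followed by the pscore/(1-pscore) control weight. The function name ipw_control_weights and the 0.01 default are hypothetical; only the open interval (0, 0.5) and the weight formula come from the commit message.

```python
import numpy as np


def ipw_control_weights(pscore, pscore_trim=0.01):
    """Hypothetical helper: trim propensity scores, then return pscore/(1-pscore)."""
    # Open interval: 0 would let p == 1.0 through unchanged, while 0.5 or more
    # would collapse (or invert) the clipping range [trim, 1 - trim].
    if not (0.0 < pscore_trim < 0.5):
        raise ValueError(f"pscore_trim must be in (0, 0.5), got {pscore_trim!r}")
    p = np.clip(np.asarray(pscore, dtype=float), pscore_trim, 1.0 - pscore_trim)
    return p / (1.0 - p)
```

With pscore_trim=0, a fitted score of exactly 1.0 survives the clip and 1.0 / (1.0 - 1.0) becomes a division by zero. NumPy returns inf (and 0/0 gives NaN) with only a RuntimeWarning, so the bad weight flows silently into the ATT unless it is rejected up front.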
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
The previous zero-trim blocker is fixed at construction time, but one approval-blocking hole remains: the new public …
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Path to Approval
Review based on diff/source inspection only; I did not execute the test suite in this sandbox.
set_params() can inject invalid pscore_trim values since it bypasses __init__ validation. Add check at fit() entry to catch 0.0, negative, and >=0.5 values before they reach IPW/DR weight formulas. Also update TripleDifference registry fallback note for error mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
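The set_params() hole follows from the scikit-learn-style convention of setting attributes directly, so checks that live only in __init__ never see the new value. A minimal sketch of the fix, validating again at fit() entry, is below; TrimmedPscoreEstimator and _validate_pscore_trim are hypothetical names, not the library's CallawaySantAnna API.

```python
import numpy as np


def _validate_pscore_trim(value):
    # NaN fails both comparisons, so it is rejected as well.
    if not (0.0 < value < 0.5):
        raise ValueError(f"pscore_trim must be in (0, 0.5), got {value!r}")


class TrimmedPscoreEstimator:
    """Hypothetical sklearn-style estimator used only to illustrate the pattern."""

    def __init__(self, pscore_trim=0.01):
        _validate_pscore_trim(pscore_trim)
        self.pscore_trim = pscore_trim

    def get_params(self):
        return {"pscore_trim": self.pscore_trim}

    def set_params(self, **params):
        # Plain setattr loop: the __init__ checks are NOT re-run here.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, pscore):
        # Re-validate at fit() entry so set_params() cannot smuggle in a bad value.
        _validate_pscore_trim(self.pscore_trim)
        trim = self.pscore_trim
        p = np.clip(np.asarray(pscore, dtype=float), trim, 1.0 - trim)
        self.weights_ = p / (1.0 - p)
        return self
```

Checking in fit() rather than in set_params() keeps set_params() generic and also covers direct attribute assignment such as est.pscore_trim = 0.0.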
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Review based on diff/source inspection only; I did not execute the test suite in this sandbox.
Track P3 from PR #202 review: CS R helpers hard-code xformla = ~ 1, so the IRLS covariate path lacks tight R-backed regression tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
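- Switch propensity score estimation from BFGS to IRLS (Fisher scoring), matching R's glm(family=binomial) algorithm
- Add solve_logit() and _check_propensity_diagnostics() to the unified linalg.py backend
- Add configurable pscore_trim parameter to CallawaySantAnna
- Remove duplicated _logistic_regression() from staggered.py and triple_diff.py

Methodology references (required if estimator / math changes)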
Validation
Security / privacy
Generated with Claude Code