Address code review feedback for CallawaySantAnna covariates #23
Merged
igerber (Owner) commented on Jan 3, 2026
- Remove unused self._covariates instance variable
- Add docstring note about SE approximation in outcome regression
- Fix empty influence function in IPW without covariates
- Add test for extreme propensity scores (near-perfect separation)
- Update class docstring with covariate adjustment example
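As a reference point for the IPW fix, here is a minimal sketch of a non-empty influence function for IPW without covariates. It assumes that the propensity score reduces to the constant share of treated units, so normalized IPW collapses to a difference in mean outcome changes. The function `ipw_att_no_covariates` and its arguments are hypothetical and are not the diff_diff API.

```python
import numpy as np


def ipw_att_no_covariates(delta_y, treated):
    """Hypothetical sketch: IPW ATT and its influence function with no covariates.

    With no covariates the propensity score is the constant treated share p,
    so normalized IPW reduces to a difference in mean outcome changes.
    """
    delta_y = np.asarray(delta_y, dtype=float)
    d = np.asarray(treated, dtype=float)
    n = delta_y.size
    p = d.mean()  # constant propensity score (share treated)

    mu1 = delta_y[d == 1].mean()  # mean change among treated units
    mu0 = delta_y[d == 0].mean()  # mean change among control units
    att = mu1 - mu0

    # One influence-function value per unit, so the array is never empty.
    psi = d * (delta_y - mu1) / p - (1 - d) * (delta_y - mu0) / (1 - p)
    se = np.sqrt(np.mean(psi**2) / n)
    return att, psi, se


rng = np.random.default_rng(0)
treated = rng.integers(0, 2, size=500)
delta_y = 1.5 * treated + rng.normal(size=500)
att, psi, se = ipw_att_no_covariates(delta_y, treated)
print(f"ATT={att:.3f}, SE={se:.3f}, len(psi)={psi.size}")
```

In this special case, the influence-function SE matches the familiar two-sample formula `sqrt(var1/n1 + var0/n0)`, which gives a simple sanity check for the no-covariate path.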
Branch updated from 6035375 to b044ae3.
- Move scattered docstring notes to docs/reviews/pr22_covariate_adjustment_review.md
- Keep actual code fixes: removed unused self._covariates, fixed IPW influence function
- Keep new test for extreme propensity scores
igerber added a commit that referenced this pull request on Apr 19, 2026
Phase 2 silent-failures audit, axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR; any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in scope.

Finding #21: `solve_ols` skip-rank-check parity (`linalg.py:369-373, 597-639`). Three parity tests in `TestSolveOLSSkipRankCheckParity` cover mixed-scale columns (norm ratio > 1e6), near-singular full-rank designs (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.

Finding #22: `compute_synthetic_weights` parity (`utils.py:1134-1199`). Three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: the Rust path is Frank-Wolfe while the Python fallback is projected gradient descent (`utils.py:1228`). Both solve the same QP but land on different simplex vertices under near-degenerate inputs.

Finding #23: TROP Rust grid-search + bootstrap parity (`trop_global.py:688-750, 966-1006`). Two parity tests in `TestTROPRustEdgeCaseParity`, marked `@pytest.mark.slow` at class level. Both are `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence) and bootstrap SE under `seed=42` (~28% divergence, caused by an RNG backend mismatch: Rust `rand` crate vs numpy `default_rng`).

Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.

TODO.md logs three P1 follow-up entries: algorithmic unification for `compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on rank-deficient Y, and TROP bootstrap RNG unification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
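Finding #21 compares backends on fitted values rather than coefficients. The numpy-only sketch below shows why fitted values are the meaningful check for a rank-deficient collinear design: two valid least-squares solutions can have different coefficients but identical fitted values. In the sketch, `np.linalg.lstsq` on the full design stands in for one backend, and a fit that drops the redundant column stands in for the other. Neither is the actual `solve_ols` code path.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Exactly collinear fourth column: the design is rank-deficient (rank 3 of 4).
X = np.column_stack([np.ones(n), x1, x2, 2.0 * x2])
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)

# "Backend A": SVD-based minimum-norm solution on the full design;
# singular values below 1e-10 * s_max are treated as zero.
beta_a, *_ = np.linalg.lstsq(X, y, rcond=1e-10)

# "Backend B": drop the redundant column, then pad its coefficient with 0.
beta_b_reduced, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)
beta_b = np.append(beta_b_reduced, 0.0)

fitted_a = X @ beta_a
fitted_b = X @ beta_b

# The coefficients differ because the x2 effect is split differently ...
print("coef gap:", np.max(np.abs(beta_a - beta_b)))
# ... but the fitted values agree at the tolerances quoted for Finding #21.
np.testing.assert_allclose(fitted_a, fitted_b, rtol=1e-6, atol=1e-8)
```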
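The root cause of Finding #22 is two different simplex-constrained solvers for the same QP, and it can be reproduced in miniature. This is an illustration under stated assumptions, not the diff_diff implementation. It minimizes `||A w - b||^2 + lam * ||w||^2` over the probability simplex twice: once with Frank-Wolfe using exact line search, and once with projected gradient descent. All helper names are hypothetical. Duplicating one donor column makes the minimizer non-unique. Both solvers then reach essentially the same objective, but they can split weight across the duplicated pair differently (compare the printed weights).

```python
import numpy as np


def objective(A, b, w, lam):
    r = A @ w - b
    return r @ r + lam * (w @ w)


def gradient(A, b, w, lam):
    return 2.0 * A.T @ (A @ w - b) + 2.0 * lam * w


def project_to_simplex(v):
    """Euclidean projection onto {w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def frank_wolfe(A, b, lam=0.0, iters=20000):
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(iters):
        g = gradient(A, b, w, lam)
        s = np.zeros_like(w)
        s[np.argmin(g)] = 1.0  # best simplex vertex for the linearized objective
        d = s - w
        Ad = A @ d
        denom = 2.0 * (Ad @ Ad + lam * (d @ d))
        if denom <= 0.0:  # zero curvature along d implies a zero FW gap: optimal
            break
        gamma = np.clip(-(g @ d) / denom, 0.0, 1.0)  # exact line search
        w = w + gamma * d
    return w


def projected_gradient(A, b, lam=0.0, iters=5000):
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + lam)  # Lipschitz constant of the gradient
    for _ in range(iters):
        w = project_to_simplex(w - gradient(A, b, w, lam) / L)
    return w


rng = np.random.default_rng(2)
n_pre, n_donors = 20, 4
A = rng.normal(size=(n_pre, n_donors)) / np.sqrt(n_pre)
A[:, 3] = A[:, 2]  # duplicated donor column: the minimizer is no longer unique
w_true = np.array([0.5, 0.2, 0.3, 0.0])
b = A @ w_true + 0.05 * rng.normal(size=n_pre)

w_fw = frank_wolfe(A, b)
w_pgd = projected_gradient(A, b)
print("FW  weights:", np.round(w_fw, 3), "objective:", objective(A, b, w_fw, 0.0))
print("PGD weights:", np.round(w_pgd, 3), "objective:", objective(A, b, w_pgd, 0.0))
```

Only the sum of the weights on the duplicated pair is identified here. A weight-level parity test can therefore fail even when both solvers are correct, while an objective-level comparison passes.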
This was referenced Apr 19, 2026
igerber added a commit that referenced this pull request on Apr 21, 2026
…t docstring

After this PR deletes the old row 87 from TODO.md, row 87 now points to a different item. Replace the row-number breadcrumb with "Silent-failures audit Finding #23 (grid-search half)", which is stable across future TODO.md reshuffles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request on Apr 21, 2026
Unify Rust TROP inner solver to SVD (close finding #23 grid-search divergence)
igerber added a commit that referenced this pull request on Apr 24, 2026
Rust and Python TROP backends produced different bootstrap standard errors for the same `seed` value. On a tiny correlated panel under `seed=42` the gap was ~28% of SE: Rust seeded `rand_xoshiro::Xoshiro256PlusPlus` per replicate, while Python's fallback consumed `numpy.random.default_rng` (PCG64), so identical seeds mapped to different bytestreams.

Canonicalize on numpy. A new `stratified_bootstrap_indices` helper in `diff_diff/bootstrap_utils.py` pre-generates per-replicate (control, treated) positional index arrays from a numpy `Generator` and hands them to both backends through the PyO3 surface. Both Rust bootstrap functions (`bootstrap_trop_variance_global`, `bootstrap_trop_variance`) now accept `control_indices` and `treated_indices` as `i64` arrays in place of `seed: u64`. Parallelism is preserved. The sampling law (stratified: controls then treated, with replacement) is unchanged.

Global-method SE is now backend-invariant under the same seed to machine precision. The prior `xfail(strict=True)` in `test_bootstrap_seed_reproducibility` is flipped to a passing `assert_allclose(atol=rtol=1e-14)` and parametrized over `[0, 42, 12345]`.

A companion `test_bootstrap_seed_reproducibility_local` is added for the local-method bootstrap. It is currently `xfail(strict=True)` because aligning the RNG exposed two separate local-method backend divergences beyond this PR's scope:
- Rust's `compute_weight_matrix` normalizes time and unit weights to sum to 1, while Python's `_compute_observation_weights` does not.
- The Python fallback's `_compute_observation_weights` (`_precomputed` branch) reads the original-panel cached `Y`/`D` instead of the bootstrap-sample arguments.

Both are tracked as follow-up rows in `TODO.md` with file:line pointers and will land in a separate methodology PR.

Closes the bootstrap half of silent-failures audit finding #23 (the grid-search half was closed in PR #348).

Reference: Athey, Imbens, Qu & Viviano (2025), "Triply Robust Panel Estimators", Algorithm 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
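The following is a minimal sketch of the canonicalization. A single numpy `Generator` draws every replicate's control and treated positions up front, and each backend consumes only the resulting `int64` arrays, so the seed never crosses into Rust. The function name matches the helper described above, but the signature, argument names, and return layout shown here are assumptions, not the code in `diff_diff/bootstrap_utils.py`.

```python
import numpy as np


def stratified_bootstrap_indices(control_pos, treated_pos, n_bootstrap, seed):
    """Hypothetical sketch: pre-draw per-replicate (control, treated) indices.

    Draws controls first, then treated, with replacement, from one numpy
    Generator, so any backend fed these arrays sees the same resamples.
    """
    rng = np.random.default_rng(seed)
    control_pos = np.asarray(control_pos, dtype=np.int64)
    treated_pos = np.asarray(treated_pos, dtype=np.int64)
    control_idx = np.empty((n_bootstrap, control_pos.size), dtype=np.int64)
    treated_idx = np.empty((n_bootstrap, treated_pos.size), dtype=np.int64)
    for b in range(n_bootstrap):
        control_idx[b] = rng.choice(control_pos, size=control_pos.size, replace=True)
        treated_idx[b] = rng.choice(treated_pos, size=treated_pos.size, replace=True)
    return control_idx, treated_idx


# Positions of control and treated units in a toy 8-unit panel.
controls = [0, 1, 2, 3, 4]
treated = [5, 6, 7]
c1, t1 = stratified_bootstrap_indices(controls, treated, n_bootstrap=3, seed=42)
c2, t2 = stratified_bootstrap_indices(controls, treated, n_bootstrap=3, seed=42)
print(c1)
print(t1)
```

Because the Rust side receives indices instead of a `seed: u64`, it can still fan replicates out in parallel without holding any RNG state of its own.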
igerber added a commit that referenced this pull request on Apr 24, 2026
…+ Python cache-fallthrough

Closes the local-method half of silent-failures audit finding #23 (RNG half closed in PR #354; grid-search half in PR #348). Two methodology fixes, both isolated to the local-method path; global is unaffected.

1. Rust weight-matrix normalization removed
------------------------------------------

`rust/src/trop.rs::compute_weight_matrix` no longer divides `time_weights` and `unit_weights` by their respective sums before the outer product. The paper's Equations 2/3 (Athey, Imbens, Qu, Viviano 2025) and the REGISTRY.md Requirements checklist (`[x] Unit weights: exp(-λ_unit × distance) (unnormalized, matching Eq. 2)`) both specify raw-exponential weights; Python's `_compute_observation_weights` was already REGISTRY-compliant. Rust's normalization inflated the effective nuclear-norm penalty relative to the data-fit term, which changed the regularization trade-off.

User-visible effect: Rust local-method ATT values may shift for fits with `lambda_nn < infinity`. For `lambda_nn = infinity` (factor model disabled) outputs are unchanged, because uniform weight scaling leaves the minimum-norm WLS argmin invariant. Rust LOOCV-selected lambdas may also shift on that boundary; both backends now converge on the same selection. The change affects both local-method Rust call sites (LOOCV at trop.rs:459, bootstrap at trop.rs:1096).

2. Python `_compute_observation_weights` cache-fallthrough removed
---------------------------------------------------------------

Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (the original-panel cache populated during the main fit) for the function arguments `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in. The helper was discarding those arguments and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (it always consumed `y_boot, d_boot`).

Test changes
------------

- `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`: flipped from `xfail(strict=True)` to a passing `assert_allclose` at `atol=1e-5` across seeds `[0, 42, 12345]`. The residual ~1e-7 gap comes from Rust `estimate_model` vs numpy `lstsq` roundoff, which accumulates differently across per-replicate bootstrap fits. A follow-up TODO row tracks moving Rust to the `solve_wls_svd` path (the same SVD helper the global method has used since PR #348) for sub-1e-14 parity.
- New `test_local_method_main_fit_parity`: parametrized over `(lambda_nn=inf, atol=1e-14)` and `(lambda_nn=0.1, atol=1e-10)`. It asserts `atol=1e-14` bit-identity for the main-fit ATT at `lambda_nn=inf` (the regression guard for the normalization fix) and `atol=1e-10` for the finite-`lambda_nn` FISTA path.

Verification
------------

Targeted regression sweep, all green:
- 9 `TestTROPRustEdgeCaseParity` tests (grid-search + global bootstrap × 3 seeds + local bootstrap × 3 seeds + local main-fit × 2 regimes)
- Full `test_rust_backend.py` suite: 92 passed
- Full `test_trop.py` suite under Rust backend: 120 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
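The normalization argument in part 1 can be checked numerically. This is an illustration only, with two stand-ins: a ridge penalty replaces the nuclear-norm term, and a 1-D vector of `exp(-lam_unit * distance)` weights replaces the time-by-unit weight matrix. All names are hypothetical. Rescaling every weight by the same constant leaves the unpenalized WLS solution unchanged, which corresponds to the `lambda_nn = infinity` case. Once a penalty is present, however, dividing the weights by their sum is equivalent to multiplying the penalty by that sum.

```python
import numpy as np


def weighted_ridge(X, y, w, penalty):
    """argmin_beta sum_i w_i * (y_i - x_i @ beta)**2 + penalty * ||beta||^2."""
    XtW = X.T * w  # scales column i of X.T by w_i
    return np.linalg.solve(XtW @ X + penalty * np.eye(X.shape[1]), XtW @ y)


rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

distance = rng.uniform(0.0, 5.0, size=n)
raw = np.exp(-0.5 * distance)   # unnormalized exponential weights
normalized = raw / raw.sum()    # weights rescaled to sum to 1

# No penalty: WLS is invariant to a uniform rescaling of the weights.
beta_raw = weighted_ridge(X, y, raw, penalty=0.0)
beta_norm = weighted_ridge(X, y, normalized, penalty=0.0)

# With a penalty: normalizing the weights equals scaling the penalty by raw.sum().
beta_raw_pen = weighted_ridge(X, y, raw, penalty=1.0)
beta_norm_pen = weighted_ridge(X, y, normalized, penalty=1.0)
beta_equiv = weighted_ridge(X, y, raw, penalty=1.0 * raw.sum())

print("unpenalized:", beta_raw, beta_norm)
print("penalized:  ", beta_raw_pen, beta_norm_pen)
```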
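The silent-failure pattern removed in part 2 is easy to reproduce in isolation. The toy class below is hypothetical: its names, the distance definition (each unit's RMS distance to the last row), and the single-matrix cache are all assumptions, not the diff_diff estimator. Its buggy method lets the cache from the main fit override the `Y` argument, so a bootstrap sample silently receives the original panel's unit weights.

```python
import numpy as np


class ObservationWeights:
    """Hypothetical toy illustrating the cache-fallthrough bug and its fix."""

    def __init__(self, lambda_unit):
        self.lambda_unit = lambda_unit
        self._precomputed = None  # cache filled during the main fit

    def fit(self, Y):
        self._precomputed = {"Y": Y}
        return self.unit_weights(Y)

    def unit_weights_buggy(self, Y):
        # Silent failure: the cached original panel overrides the argument.
        if self._precomputed is not None:
            Y = self._precomputed["Y"]
        return self._weights_from(Y)

    def unit_weights(self, Y):
        # Fixed: always use the Y the caller passed in (e.g. a bootstrap sample).
        return self._weights_from(Y)

    def _weights_from(self, Y):
        # RMS distance of each unit (row) to the last row; unnormalized exp weights.
        dist = np.sqrt(np.mean((Y - Y[-1]) ** 2, axis=1))
        return np.exp(-self.lambda_unit * dist)


rng = np.random.default_rng(4)
Y = rng.normal(size=(6, 10))
est = ObservationWeights(lambda_unit=1.0)
est.fit(Y)

Y_boot = Y[np.array([0, 0, 1, 3, 3, 5])]  # a resampled panel
stale = est.unit_weights_buggy(Y_boot)    # weights of the original panel
fresh = est.unit_weights(Y_boot)          # weights of the bootstrap sample
print(np.round(stale, 3))
print(np.round(fresh, 3))
```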
igerber added a commit that referenced this pull request on Apr 24, 2026
P3: the PR #354 [Unreleased] Fixed entry (line 21) said that local-method bit-identity SE remained blocked by the Rust-normalization and Python cache-fallthrough divergences and was "tracked as a follow-up in TODO.md." This PR adds two TROP-local Fixed entries (lines 22-27) that close exactly those divergences, so the tail sentence of the PR #354 entry is now inconsistent with the surrounding entries. It is rewritten to say that the RNG half of finding #23 is closed here (bootstrap contract), the grid-search half was closed in PR #348, and the local-method methodology half is closed by the two Fixed entries that follow in the same release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>