
Address code review feedback for CallawaySantAnna covariates#23

Merged
igerber merged 2 commits into main from claude/review-pr-22-1ulag
Jan 3, 2026

@igerber igerber commented Jan 3, 2026

  • Remove unused self._covariates instance variable
  • Add docstring note about SE approximation in outcome regression
  • Fix empty influence function in IPW without covariates
  • Add test for extreme propensity scores (near-perfect separation)
  • Update class docstring with covariate adjustment example
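
For context on the influence-function item, here is a minimal sketch of what a non-empty influence function looks like when there are no covariates. It assumes the propensity score is constant, so the IPW estimate reduces to a difference in mean outcome changes. The function name `ipw_att_no_covariates` and its inputs are hypothetical and are not taken from the library.

```python
import numpy as np

def ipw_att_no_covariates(dy_treated, dy_control):
    """Illustrative ATT, SE, and per-unit influence values (not library code)."""
    dy_treated = np.asarray(dy_treated, dtype=float)
    dy_control = np.asarray(dy_control, dtype=float)
    n_t, n_c = dy_treated.size, dy_control.size

    att = dy_treated.mean() - dy_control.mean()

    # With a constant propensity score the weights are uniform within each
    # group, so each unit contributes its centered outcome change scaled by 1/n.
    inf_treated = (dy_treated - dy_treated.mean()) / n_t
    inf_control = -(dy_control - dy_control.mean()) / n_c
    inf = np.concatenate([inf_treated, inf_control])

    # One entry per unit; the SE is the root sum of squared influence values.
    se = np.sqrt(np.sum(inf ** 2))
    return att, se, inf

rng = np.random.default_rng(0)
dy_t = rng.normal(1.0, 1.0, size=50)   # simulated outcome changes, treated
dy_c = rng.normal(0.0, 1.0, size=80)   # simulated outcome changes, control
att, se, inf = ipw_att_no_covariates(dy_t, dy_c)
```

The returned array has one entry per unit and sums to zero, which is the property downstream aggregation or bootstrap code would typically rely on.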

@igerber igerber force-pushed the claude/review-pr-22-1ulag branch from 6035375 to b044ae3 on January 3, 2026 23:18
- Move scattered docstring notes to docs/reviews/pr22_covariate_adjustment_review.md
- Keep actual code fixes: removed unused self._covariates, fixed IPW influence function
- Keep new test for extreme propensity scores
@igerber igerber merged commit c39c0f1 into main Jan 3, 2026
@igerber igerber deleted the claude/review-pr-22-1ulag branch January 3, 2026 23:38
igerber added a commit that referenced this pull request Apr 19, 2026
Phase 2 silent-failures audit — axis-G (backend parity). Closes the
coverage gap the audit flagged in three Rust-backed solver surfaces.
Test-only PR; any discovered divergences are marked `xfail(strict=True)`
and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope.

Finding #21 — `solve_ols` skip-rank-check parity (`linalg.py:369-373,
597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity`
covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank
(cond > 1e10), and rank-deficient collinear designs under
`skip_rank_check=True` on HC1. Backends agree on fitted values within
`rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.
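
A parity check of this shape can be sketched with two independent least-squares paths in numpy. This is only an illustration: `fit_svd` and `fit_qr` are hypothetical stand-ins for the two backends, not the library's `solve_ols`, and the design matrix is simulated with a column-norm ratio above 1e6.

```python
import numpy as np

def fit_svd(X, y):
    # SVD-based least squares (stand-in for one backend).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def fit_qr(X, y):
    # QR-based least squares (stand-in for the other backend).
    Q, R = np.linalg.qr(X)
    beta = np.linalg.solve(R, Q.T @ y)
    return X @ beta

rng = np.random.default_rng(0)
n = 200
# Mixed-scale design: the last column is roughly 1e7 times larger than the others.
X = np.column_stack([np.ones(n), rng.normal(size=n), 1e7 * rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 3e-7]) + rng.normal(scale=0.1, size=n)

col_norms = np.linalg.norm(X, axis=0)
fitted_a = fit_svd(X, y)
fitted_b = fit_qr(X, y)

# Same tolerances as quoted in the commit message, applied to fitted values.
np.testing.assert_allclose(fitted_a, fitted_b, rtol=1e-6, atol=1e-8)
```

Comparing fitted values rather than coefficients keeps the check meaningful on ill-conditioned designs, where individual coefficients can be poorly determined even though the projection is stable.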

Finding #22 — `compute_synthetic_weights` parity (`utils.py:1134-1199`):
three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular
`Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg
variations are `xfail(strict=True)` with a baselined ~15-80% weight
divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is
projected gradient descent (`utils.py:1228`) — same QP, different
simplex vertices under near-degenerate inputs.
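
To make the "same QP, different solution" point concrete, here is a toy sketch with two identical donor columns, so the optimum is a whole face of the simplex rather than a point. The solvers below are textbook Frank-Wolfe and projected gradient descent written for illustration. They are not the Rust or Python implementations referenced above, and the data are made up.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def objective(Y, y, w):
    r = Y @ w - y
    return float(r @ r)

def gradient(Y, y, w):
    r = Y @ w - y
    # Column-wise reduction, so identical columns get bitwise-equal gradients.
    return 2.0 * (Y * r[:, None]).sum(axis=0)

def frank_wolfe(Y, y, n_iter=200):
    w = np.full(Y.shape[1], 1.0 / Y.shape[1])
    for k in range(n_iter):
        vertex = np.zeros_like(w)
        vertex[np.argmin(gradient(Y, y, w))] = 1.0  # best simplex vertex
        gamma = 2.0 / (k + 2.0)                     # standard step schedule
        w = (1.0 - gamma) * w + gamma * vertex
    return w

def projected_gradient(Y, y, n_iter=2000):
    w = np.full(Y.shape[1], 1.0 / Y.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(Y, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        w = project_simplex(w - step * gradient(Y, y, w))
    return w

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])
Y = np.column_stack([a, a, b])   # donors 0 and 1 are identical
y = a.copy()                     # target equals the duplicated donor

w_fw = frank_wolfe(Y, y)
w_pgd = projected_gradient(Y, y)
print("FW weights :", np.round(w_fw, 4), "objective", objective(Y, y, w_fw))
print("PGD weights:", np.round(w_pgd, 4), "objective", objective(Y, y, w_pgd))
```

Both solvers reach an objective of essentially zero, but Frank-Wolfe tends to stop on a vertex while projected gradient stays symmetric across the duplicated donors. That is the kind of weight-level divergence a parity test on weights would flag even though neither answer is wrong.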

Finding #23 — TROP Rust grid-search + bootstrap parity
(`trop_global.py:688-750, 966-1006`): two parity tests in
`TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both
`xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6%
divergence), bootstrap SE under `seed=42` (~28% divergence, RNG
backend mismatch — Rust `rand` crate vs numpy `default_rng`).

Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust
  entry points (`_solve_ols_rust`, `_rust_synthetic_weights`,
  `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`);
  no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's
  `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.

TODO.md logs three P1 follow-up entries: algorithmic unification for
`compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on
rank-deficient Y, TROP bootstrap RNG unification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 21, 2026
…t docstring

After this PR deletes the old row 87 from TODO.md, row 87 now points to a
different item. Replace the row-number breadcrumb with "Silent-failures
audit Finding #23 (grid-search half)" which is stable across future
TODO.md reshuffles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 21, 2026
Unify Rust TROP inner solver to SVD (close finding #23 grid-search divergence)
igerber added a commit that referenced this pull request Apr 24, 2026
Rust and Python TROP backends produced different bootstrap standard
errors for the same `seed` value. On a tiny correlated panel under
`seed=42` the gap was ~28% of SE: Rust seeded `rand_xoshiro::
Xoshiro256PlusPlus` per replicate while Python's fallback consumed
`numpy.random.default_rng` (PCG64), so identical seeds mapped to
different bytestreams.
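
The seed-to-stream mismatch can be reproduced inside numpy alone. In this sketch `MT19937` is only a stand-in for the Rust xoshiro generator, which numpy does not ship; the point is just that one seed fed to two different bit generators yields unrelated draws.

```python
import numpy as np

seed = 42

# default_rng uses PCG64, so these two are the same stream.
draws_default = np.random.default_rng(seed).integers(0, 1000, size=32)
draws_pcg64 = np.random.Generator(np.random.PCG64(seed)).integers(0, 1000, size=32)

# A different bit generator with the same seed (stand-in for the Rust side).
draws_other = np.random.Generator(np.random.MT19937(seed)).integers(0, 1000, size=32)

print("same generator, same seed :", np.array_equal(draws_default, draws_pcg64))
print("other generator, same seed:", np.array_equal(draws_default, draws_other))
```

Any bootstrap that resamples units from such draws inherits the difference, which is why matching seeds alone cannot give matching standard errors across backends.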

Canonicalize on numpy. New `stratified_bootstrap_indices` helper in
`diff_diff/bootstrap_utils.py` pre-generates per-replicate
(control, treated) positional index arrays from a numpy `Generator`
and hands them to both backends through the PyO3 surface — both
Rust bootstrap functions (`bootstrap_trop_variance_global`,
`bootstrap_trop_variance`) now accept `control_indices` and
`treated_indices` as `i64` arrays in place of `seed: u64`. Parallelism
is preserved. Sampling law (stratified: controls then treated, with
replacement) is unchanged.
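
A rough sketch of the pre-generation idea follows. The signature, argument names, and array layout are assumptions made for illustration; the actual `stratified_bootstrap_indices` helper may differ. The sampling order matches the description above: within each replicate, controls are drawn first and treated units second, both with replacement.

```python
import numpy as np

def stratified_bootstrap_indices_sketch(rng, n_control, n_treated, n_bootstrap):
    """Hypothetical helper: per-replicate positional indices for each stratum."""
    control = np.empty((n_bootstrap, n_control), dtype=np.int64)
    treated = np.empty((n_bootstrap, n_treated), dtype=np.int64)
    for b in range(n_bootstrap):
        # Controls first, then treated, so the draw order is fixed per replicate.
        control[b] = rng.integers(0, n_control, size=n_control)
        treated[b] = rng.integers(0, n_treated, size=n_treated)
    return control, treated

rng = np.random.default_rng(42)
control_idx, treated_idx = stratified_bootstrap_indices_sketch(
    rng, n_control=8, n_treated=3, n_bootstrap=5
)

# A second generator with the same seed reproduces the same index arrays,
# which is what lets two backends consume identical resamples.
control_again, treated_again = stratified_bootstrap_indices_sketch(
    np.random.default_rng(42), n_control=8, n_treated=3, n_bootstrap=5
)
```

Because the indices are plain `int64` arrays, each replicate can still be processed independently by whichever backend receives them, so parallelism over replicates is unaffected.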

Global-method SE is now backend-invariant under the same seed to
machine precision: the prior `xfail(strict=True)` in
`test_bootstrap_seed_reproducibility` is flipped to a passing
`assert_allclose` with `atol=rtol=1e-14`, parametrized over seeds
`[0, 42, 12345]`.

A companion `test_bootstrap_seed_reproducibility_local` is added for
the local-method bootstrap. It is currently `xfail(strict=True)`
because aligning the RNG exposed two separate local-method backend
divergences beyond this PR's scope. First, Rust's
`compute_weight_matrix` normalizes time and unit weights to sum to 1,
while Python's `_compute_observation_weights` does not. Second, the
`_precomputed` branch of the Python fallback's
`_compute_observation_weights` reads the original-panel cached
`Y`/`D` instead of the bootstrap-sample arguments. Both are tracked
as follow-up rows in `TODO.md` with file:line pointers and will land
in a separate methodology PR.

Closes the bootstrap half of silent-failures audit finding #23 (the
grid-search half closed in PR #348). Reference: Athey, Imbens, Qu &
Viviano (2025), "Triply Robust Panel Estimators", Algorithm 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 24, 2026
…+ Python cache-fallthrough

Closes the local-method half of silent-failures audit finding #23
(RNG half closed in PR #354; grid-search half in PR #348). Two
methodology fixes, both isolated to the local-method path — global
is unaffected.

1. Rust weight-matrix normalization removed
   ------------------------------------------
   `rust/src/trop.rs::compute_weight_matrix` no longer divides
   `time_weights` and `unit_weights` by their respective sums before
   the outer product. The paper's Equation 2/3 (Athey, Imbens, Qu,
   Viviano 2025) and REGISTRY.md Requirements checklist
   (`[x] Unit weights: exp(-λ_unit × distance) (unnormalized,
   matching Eq. 2)`) both specify raw-exponential weights; Python's
   `_compute_observation_weights` was already REGISTRY-compliant.
   Rust's normalization inflated the effective nuclear-norm penalty
   relative to the data-fit term, changing the regularization
   trade-off. User-visible effect: Rust local-method ATT values may
   shift for fits with `lambda_nn < infinity`. For
   `lambda_nn = infinity` (factor model disabled) outputs are
   unchanged — uniform weight scaling leaves the minimum-norm WLS
   argmin invariant. Rust LOOCV-selected lambdas may also shift on
   that boundary; both backends now converge on the same selection.
   Affects both local-method Rust call sites (LOOCV at trop.rs:459,
   bootstrap at trop.rs:1096).

2. Python `_compute_observation_weights` cache-fallthrough removed
   ---------------------------------------------------------------
   Removed the `if self._precomputed is not None:` branch that
   silently substituted `self._precomputed["Y"]` / `["D"]` /
   `["time_dist_matrix"]` (original-panel cache populated during
   main fit) for the function-argument `Y, D`. Under bootstrap,
   `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled
   `boot_data` and passes them in; the helper was discarding those
   and recomputing unit distances from the original panel, so
   Python's local bootstrap resampled units but reused stale
   unit-distance weights. Rust's bootstrap was already correct
   (always consumed `y_boot, d_boot`).
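
The cache-fallthrough pattern from item 2 can be illustrated with a small stand-in class. Everything here is hypothetical: the class, the method names, and the RMS distance are assumptions chosen to show the bug shape, not the library's code.

```python
import numpy as np

class WeightHelperSketch:
    """Hypothetical stand-in for the estimator; not the library's class."""

    def __init__(self):
        self._precomputed = None

    def fit(self, Y):
        # The main fit caches the original panel (T x N outcome matrix).
        self._precomputed = {"Y": Y}

    @staticmethod
    def _rms_distance(Y, target):
        # Assumed distance: RMS gap between each unit's series and the target's.
        diff = Y - Y[:, [target]]
        return np.sqrt(np.mean(diff ** 2, axis=0))

    def distances_with_fallthrough(self, Y, target):
        # Buggy shape: the argument is silently replaced by the cached panel.
        if self._precomputed is not None:
            Y = self._precomputed["Y"]
        return self._rms_distance(Y, target)

    def distances_fixed(self, Y, target):
        # Fixed shape: always use the data that was passed in.
        return self._rms_distance(Y, target)

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 4))
helper = WeightHelperSketch()
helper.fit(Y)

boot_units = np.array([0, 2, 2, 3])   # one bootstrap resample of unit columns
Y_boot = Y[:, boot_units]

stale = helper.distances_with_fallthrough(Y_boot, target=0)
fresh = helper.distances_fixed(Y_boot, target=0)
```

Here `stale` equals the distances computed on the original panel, while `fresh` follows the resampled columns. No error is raised in either case, which is what makes this kind of fallthrough easy to miss.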

Test changes
------------
- `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::
  test_bootstrap_seed_reproducibility_local`: flipped from
  `xfail(strict=True)` to passing `assert_allclose` at `atol=1e-5`
  across seeds `[0, 42, 12345]`. Residual ~1e-7 gap is Rust
  `estimate_model` vs numpy `lstsq` roundoff that accumulates
  differently across per-replicate bootstrap fits; follow-up TODO
  row tracks unifying Rust to the `solve_wls_svd` path (same SVD
  helper the global-method uses since PR #348) for sub-1e-14
  parity.
- New `test_local_method_main_fit_parity`: parametrized over
  `(lambda_nn=inf, atol=1e-14)` and `(lambda_nn=0.1, atol=1e-10)`.
  The `lambda_nn=inf` case asserts main-fit ATT parity at
  `atol=1e-14` (the regression guard for the normalization fix); the
  `lambda_nn=0.1` case asserts `atol=1e-10` on the
  finite-`lambda_nn` FISTA path.
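
The reasoning behind the two regimes can be sketched with an ordinary weighted least-squares problem. This is a simplified analogy: a ridge penalty stands in for the nuclear-norm term, and `weighted_ls` is a hypothetical helper. The unpenalized call plays the role of the `lambda_nn=inf` regime, where only a WLS problem remains, and the penalized call plays the role of a finite `lambda_nn`.

```python
import numpy as np

def weighted_ls(X, y, w, penalty=0.0):
    """Solve min_b sum_i w_i (y_i - x_i'b)^2 + penalty * ||b||^2."""
    XtW = X.T * w                        # weight each observation
    A = XtW @ X + penalty * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

w_raw = np.exp(-0.3 * rng.uniform(0.0, 5.0, size=40))   # raw exponential weights
w_norm = w_raw / w_raw.sum()                             # normalized to sum to 1

# No penalty: rescaling the weights leaves the argmin unchanged.
beta_raw = weighted_ls(X, y, w_raw)
beta_norm = weighted_ls(X, y, w_norm)

# With a penalty: shrinking the weights makes the penalty relatively stronger.
beta_raw_pen = weighted_ls(X, y, w_raw, penalty=1.0)
beta_norm_pen = weighted_ls(X, y, w_norm, penalty=1.0)

print("unpenalized gap:", np.abs(beta_raw - beta_norm).max())
print("penalized gap  :", np.abs(beta_raw_pen - beta_norm_pen).max())
```

In the penalized case, dividing the weights by their sum is equivalent to multiplying the penalty by that sum, so the normalized fit is shrunk more. The commit message describes the same mechanism for the nuclear-norm penalty.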

Verification
------------
Targeted regression sweep — all green:
- 9 `TestTROPRustEdgeCaseParity` tests (grid-search + global
  bootstrap × 3 seeds + local bootstrap × 3 seeds + local main-fit
  × 2 regimes)
- Full `test_rust_backend.py` suite: 92 passed
- Full `test_trop.py` suite under Rust backend: 120 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 24, 2026
P3 — the PR #354 [Unreleased] Fixed entry (line 21) said local-method
bit-identity SE remained blocked by the Rust-normalization and Python
cache-fallthrough divergences and was "tracked as a follow-up in
TODO.md." With the two TROP-local Fixed entries that this PR adds
(lines 22-27) closing exactly those divergences, the PR #354 tail
sentence is now internally inconsistent with the surrounding entries.
Rewritten to say the RNG half of finding #23 is closed here (bootstrap
contract), grid-search half was closed in PR #348, and the local-
method methodology half is closed by the two Fixed entries that
follow in the same release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>