Conversation
|
Overall assessment Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…trol_units Post-audit cleanup closing finding #22 from the Phase 2 silent-failures audit. Replaces the previous "fix both backends" approach with "delete the helper and inline correctly in its single caller." Why the scope change. The audit found that Rust and Python backends of `compute_synthetic_weights` solved different QPs (Python shrink-to-uniform, Rust shrink-to-zero) with different step sizes (Python adaptive, Rust broken constant 0.1 that diverges at large Y). Closer inspection revealed: - `compute_synthetic_weights` was private (not in __all__). - Its only non-test caller was `rank_control_units` in prep.py — a user-facing diagnostic that ranks control units by trend similarity. - The `synthetic_weight` column it feeds into the `rank_control_units` DataFrame is informational only — `quality_score` is computed from RMSE + covariate distance, NOT from `synthetic_weight`. So the bug did not affect donor-selection ranking at all. - `SyntheticDiD.fit()` does NOT use this helper. It uses the Frank-Wolfe solver (`compute_sdid_unit_weights` / `_sc_weight_fw_numpy`) directly on a completely independent code path that already matches R synthdid. Maintaining two broken PGD implementations of a one-caller helper was not worth the cost. Delete the shim and inline correctly. Changes: - Delete `compute_synthetic_weights` (utils.py:1134) and `_compute_synthetic_weights_numpy` (utils.py:1202). - Remove `_rust_synthetic_weights` from the `_backend.py` import and None-fallback, and from the `__init__.py` re-export. - Remove the import from `prep.py`. - Inline a Frank-Wolfe computation in `rank_control_units` (prep.py:990) using the shared `_sc_weight_fw` dispatcher — same solver SDID uses, matching R `synthdid::sc.weight.fw`. Threads `zeta=sqrt(lambda_reg/N)` to absorb FW's (1/N) objective scaling, noise-level-scaled `min_decrease` per R convention, and `max_iter=1000` to preserve the existing cost envelope. - Short-circuit `n_control in {0, 1}` to avoid FW loop overhead on trivially-solvable cases. - Suppress non-convergence warnings inside the inline block — the caller uses the output as a heuristic score, not a statistical estimate. Tests: - Delete `TestSyntheticWeightsBackendParity` (3 tests, 2 xfailed) and the 3 direct-Rust-import tests in `test_rust_backend.py`. - Delete `TestComputeSyntheticWeightsEdgeCases` (4 tests) in test_utils.py. - Delete `test_compute_synthetic_weights` in test_estimators.py. - Add `test_extreme_Y_scale_synthetic_weight_column` in test_prep.py — regression guard asserting the `synthetic_weight` column produces a valid non-degenerate simplex vector at `Y ~ 1e9` (the exact input that the old Rust PGD mishandled by collapsing onto a single vertex). SDID invariance. Verified via bit-identical pre/post numerical baseline on a fixed-seed `SyntheticDiD.fit()` (n_units=30, n_pre=8, n_post=2, n_treated=3, n_bootstrap=20, seed=42): before: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 after: att=1.1895009159752075 se=0.2576531609311485 weight_sum=1.000000000000002e+00 weight_max=6.820411397386437e-02 Full SDID test suite: 40/40 pass before and after (test outcome diff empty). All 305 tests in the touched files pass. Rust-side `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs` is now dead code (no Python code calls it). Tracked as a low-priority cleanup follow-up in TODO.md; removal requires `maturin develop` rebuild. User-visible impact: - `rank_control_units` public signature and return schema unchanged. - `quality_score` values and ranking: unchanged. - `synthetic_weight` column: values agree with old code at ~1e-7 on default-parameter typical data; differ only in edge cases where the old code was mathematically wrong (extreme Y scale, lambda_reg > 0). - Private imports break: `from diff_diff.utils import compute_synthetic_weights`, `from diff_diff._rust_backend import compute_synthetic_weights`. These were not part of the documented stable API. Net diff: +110 / -367 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
69d3df3 to
de21ef9
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment ✅ Looks good. No unmitigated P0/P1 findings in the current diff. The prior methodology concern is addressed: the changed Executive summary
Methodology No P0/P1 findings. The changed code now correctly says
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…+ backend-parity regressions - diff_diff/utils.py: removed stale "matching R synthdid" language from the deletion-marker comment; now says the caller uses the shared FW dispatcher for a single-pass uncentered heuristic and points canonical SDID users to compute_sdid_unit_weights. - tests/test_prep.py: two new TestRankControlUnits regressions covering the surface the reviewer flagged — (a) lambda_reg > 0 produces a valid simplex and does not collapse support, (b) full-pipeline Rust/Python backend parity via patch.object(utils_mod, "HAS_RUST_BACKEND", False). 512 passed, 14 deselected across tests/test_prep.py, tests/test_utils.py, tests/test_estimators.py, tests/test_rust_backend.py; SDID fixed-seed baseline unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings in the current diff. Executive Summary
Methodology No findings. The changed weighting logic is confined to the Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
|
…_weight docstring Docstring at prep.py:841 now states synthetic_weight is an informational heuristic from a single-pass uncentered Frank-Wolfe solve, does NOT factor into quality_score ranking, and is NOT the canonical SDID unit weight. Directs users who want canonical SDID weights to SyntheticDiD.fit(). Aligns the public contract with the implementation notes already added in the prior round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings in the current diff. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Remove dead Rust compute_synthetic_weights (follow-up to PR #344)
Summary
Post-audit cleanup closing finding #22 from the Phase 2 silent-failures audit. Replaces the "fix both backends" approach with "delete the helper and inline correctly in its single caller."
Scope rationale. The audit found that Rust and Python backends of `compute_synthetic_weights` solved different QPs with different step sizes. Closer inspection revealed the function was private, had one non-test caller (`rank_control_units` in prep.py), contributed to an informational-only column (`synthetic_weight` doesn't factor into `quality_score`; the ranking is unaffected by the bug), and was totally disconnected from `SyntheticDiD.fit()` (which uses Frank-Wolfe directly via a separate code path already matching R `synthdid`). Maintaining two broken PGD implementations of a one-caller helper was not worth the cost.
What changed
SDID invariance verified.
Before/after `SyntheticDiD.fit()` on fixed-seed data:
Bit-identical. Full SDID test suite: 40/40 pass, same IDs.
User-visible impact. `rank_control_units` public signature + return schema + `quality_score` ranking all unchanged. `synthetic_weight` column values unchanged to ~1e-7 on default-parameter typical data; differ only in edge cases where the old code was mathematically wrong. Private imports break (not part of the stable API).
Rust dead code. The `compute_synthetic_weights` PyO3 binding in `rust/src/weights.rs` is now unused by Python. Logged as a low-priority cleanup follow-up in TODO.md; requires `maturin develop` rebuild to remove.
Methodology references
Validation
Security / privacy
No secrets/PII in this PR: Yes.
Generated with Claude Code