Address code review feedback for rank_control_units#12

Merged
igerber merged 1 commit into main from claude/code-review-recent-GgPBV
Jan 3, 2026

@igerber igerber commented Jan 3, 2026

  • Move import to module level for efficiency
  • Add filtering for control units missing from pivot (unbalanced panels)
  • Use nanmean for RMSE to handle missing data
  • Fix edge case scoring when all controls have similar RMSE (min-max normalization)
  • Vectorize covariate distance computation for speed
  • Extract magic numbers to named constants
  • Add tests for unbalanced panels and single control unit edge case
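
A minimal sketch of the `nanmean` RMSE and the min-max edge case listed above. The function name `rmse_scores`, the constant `RMSE_RANGE_TOLERANCE`, and the convention that a higher score means a closer match are illustrative assumptions, not taken from the PR diff.

```python
import numpy as np

# Hypothetical named constant; the PR only says magic numbers were extracted.
RMSE_RANGE_TOLERANCE = 1e-10


def rmse_scores(treated_mean, control_matrix):
    """Score each control in [0, 1] from its pre-period RMSE (1 = best fit).

    treated_mean:   shape (n_periods,)
    control_matrix: shape (n_periods, n_controls); NaN marks a missing
                    observation, as in an unbalanced panel.
    """
    sq_err = (control_matrix - treated_mean[:, None]) ** 2
    # nanmean skips missing periods instead of turning the whole RMSE into NaN.
    rmse = np.sqrt(np.nanmean(sq_err, axis=0))

    spread = np.nanmax(rmse) - np.nanmin(rmse)
    if spread < RMSE_RANGE_TOLERANCE:
        # All controls fit about equally well (or there is only one control):
        # min-max normalization would divide by ~0, so give every unit the
        # same score instead.
        return np.ones_like(rmse)
    return 1.0 - (rmse - np.nanmin(rmse)) / spread
```

With a single control unit the spread is zero, so the guard returns a score of 1 instead of a NaN from 0/0.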

@igerber igerber merged commit 4121ebb into main Jan 3, 2026
@igerber igerber deleted the claude/code-review-recent-GgPBV branch January 3, 2026 15:54
igerber added a commit that referenced this pull request Apr 18, 2026
…StageDiD

Addresses axis-C findings #8, #9, and #10 from the silent-failures audit:
three sites where a sparse factorization failure silently fell back to
dense lstsq without any user-facing signal.

- diff_diff/imputation.py:1516 (variance path: scipy.sparse.linalg.spsolve
  on (A_0' W A_0) z = A_1' w). Bare `except Exception` was swallowing
  the root cause before dense lstsq. Now emits a UserWarning identifying
  the exception type and explaining the fallback implication.
- diff_diff/two_stage.py:1647 (GMM sandwich: sparse_factorized on
  X'_{10} W X_{10} for Stage 1 normal equations). `except RuntimeError`
  was silent; now emits a UserWarning.
- diff_diff/two_stage_bootstrap.py:134 (bootstrap path: same pattern as
  above). `except RuntimeError` was silent; now emits a UserWarning.

All three are single-call sites (per fit, or per aggregation level, or
per bootstrap replicate at most a handful of times) so no aggregation
wrapper pattern is needed — one warning per fallback event is
appropriate.
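
A minimal sketch of the warn-then-fall-back pattern described above, assuming SciPy's `scipy.sparse.linalg.factorized` as the sparse entry point. The function name `solve_with_fallback`, the injectable `sparse_solver` parameter, and the warning text are hypothetical; the actual code in `diff_diff` may be structured differently.

```python
import warnings

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized


def solve_with_fallback(A, b, sparse_solver=factorized):
    """Solve A x = b with a sparse factorization, warning on dense fallback.

    `sparse_solver` is injectable only so a test can force the failure path.
    """
    A = sparse.csc_matrix(A)
    try:
        return sparse_solver(A)(b)
    except RuntimeError as exc:
        # One warning per fallback event: name the exception type and say
        # what the fallback implies for the caller.
        warnings.warn(
            f"Sparse factorization failed ({type(exc).__name__}: {exc}); "
            "falling back to dense lstsq, which may be slower and returns "
            "a minimum-norm solution if the system is rank-deficient.",
            UserWarning,
            stacklevel=2,
        )
        return np.linalg.lstsq(A.toarray(), b, rcond=None)[0]
```

Passing a `sparse_solver` that raises `RuntimeError` exercises the fallback in the same spirit as monkey-patching the sparse entry point.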

REGISTRY.md updated under ImputationDiD and TwoStageDiD.

New tests (3): monkey-patch the sparse entry point to raise a
RuntimeError, run .fit(), assert the UserWarning fires with the
expected message prefix. Works against both the variance and bootstrap
surfaces.

Axis-C baseline: 3 major silent-fallback sites (imputation, two_stage,
two_stage_bootstrap) -> 0 remaining in these files. PowerAnalysis
simulation counter (finding #11) and ContinuousDiD B-spline (#12)
still open as separate follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
… cache

Bundles the two remaining S-complexity findings from the Phase 2 audit,
closing Phase 3 execution.

Finding #12 — ContinuousDiD B-spline degenerate knot (axis C, Minor,
`continuous_did_bspline.py:153`): `bspline_derivative_design_matrix`
silently swallowed `ValueError` from `scipy.interpolate.BSpline` in the
per-basis derivative loop, leaving affected columns of the derivative
design matrix as zero with no user-visible signal. Downstream
ContinuousDiD analytical inference then fed a biased `dPsi` into SE
computation. Fix aggregates failed-basis indices and emits ONE
`UserWarning` naming them. The all-identical-knot degenerate case
(single dose value, `knots[0] == knots[-1]`) remains silently handled —
derivatives there are mathematically zero, well-defined, and always
have been.
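
A rough sketch of the aggregate-then-warn-once approach for Finding #12, assuming `scipy.interpolate.BSpline` and its `derivative()` method. The function name `derivative_design_matrix`, the injectable `spline_factory` parameter, and the warning text are hypothetical and are not the signature of `bspline_derivative_design_matrix`.

```python
import warnings

import numpy as np
from scipy.interpolate import BSpline


def derivative_design_matrix(x, knots, degree, spline_factory=BSpline):
    """Evaluate the first derivative of each B-spline basis function at x.

    `spline_factory` is injectable only so a test can force a ValueError.
    """
    x = np.asarray(x, dtype=float)
    n_basis = len(knots) - degree - 1
    out = np.zeros((len(x), n_basis))
    failed = []  # indices of basis functions whose derivative could not be built

    for j in range(n_basis):
        coef = np.zeros(n_basis)
        coef[j] = 1.0  # unit coefficient vector selects the j-th basis function
        try:
            out[:, j] = spline_factory(knots, coef, degree).derivative()(x)
        except ValueError:
            failed.append(j)  # column j stays zero; reported once below

    if failed:
        # A single aggregated warning instead of one per basis function.
        warnings.warn(
            f"B-spline derivative failed for basis indices {failed}; "
            "those columns are left as zero and downstream standard "
            "errors may be biased.",
            UserWarning,
            stacklevel=2,
        )
    return out
```

On a valid clamped knot vector no warning fires, and because the basis functions sum to one, each row of the derivative matrix sums to zero.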

Finding #28 — PowerAnalysis survey-design cache staleness (axis J,
Major, `power.py:171-180`): `_build_survey_design()` populated
`self._cached_survey_design` on first call and never invalidated.
Mutating `config.survey_design` after `__init__` silently returned the
stale cached design. Default construction is microseconds and
user-provided designs are reference copies, so the cache never earned
its cost. Fix drops the cache entirely; method now reflects live
`self.survey_design` every call.
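
A small sketch of the cache-free behavior described for Finding #28. The class name `SurveyPowerConfigSketch` and the dict-based `DEFAULT_DESIGN` are stand-ins invented for illustration; only the method name `_build_survey_design` and the `survey_design` attribute come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder for whatever the default design looks like.
DEFAULT_DESIGN = {"weights": "uniform"}


@dataclass
class SurveyPowerConfigSketch:
    survey_design: Optional[dict] = None

    def _build_survey_design(self) -> dict:
        # No cached copy: read the live attribute on every call, so a
        # mutation made after __init__ is always picked up.
        if self.survey_design is not None:
            return self.survey_design
        # Building the default is cheap, so there is nothing worth caching.
        return dict(DEFAULT_DESIGN)
```

Reassigning `survey_design` takes effect on the next call, and clearing it back to `None` falls back to the default, which mirrors the three regression tests listed below.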

Six new tests:
- `tests/test_continuous_did.py::TestBSplineDerivativeDegenerateBasis` (3):
  single-dose silent contract, `ValueError`-forced aggregate warning,
  happy-path no-warning regression.
- `tests/test_power.py::TestSurveyPowerConfigDesignStaleness` (3):
  mutate-survey_design-picks-up-new, clearing-falls-back-to-default,
  repeat-calls-equivalent regression.

REGISTRY notes added under §ContinuousDiD (edge cases) and §PowerAnalysis
(`survey_config` section).

Audit state post-PR: all 28 actionable Phase-2 findings resolved (26 in
prior PRs; #12 + #28 here). Three P1 follow-ups remain logged in
`TODO.md` from PR #337's discovered divergences (FW/PGD algorithmic
mismatch in `compute_synthetic_weights`, TROP grid-search on rank-
deficient Y, TROP bootstrap RNG unification). Those are post-audit
cleanup work, not Phase-3 scope.

No behavioral changes on clean inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>