Close axis-C/J silent-failures audit: B-spline derivative + PA survey cache#339
Close axis-C/J silent-failures audit: B-spline derivative + PA survey cache#339
Conversation
… cache Bundles the two remaining S-complexity findings from the Phase 2 audit, closing Phase 3 execution. Finding #12 — ContinuousDiD B-spline degenerate knot (axis C, Minor, `continuous_did_bspline.py:153`): `bspline_derivative_design_matrix` silently swallowed `ValueError` from `scipy.interpolate.BSpline` in the per-basis derivative loop, leaving affected columns of the derivative design matrix as zero with no user-visible signal. Downstream ContinuousDiD analytical inference then fed a biased `dPsi` into SE computation. Fix aggregates failed-basis indices and emits ONE `UserWarning` naming them. The all-identical-knot degenerate case (single dose value, `knots[0] == knots[-1]`) remains silently handled — derivatives there are mathematically zero, well-defined, and always have been. Finding #28 — PowerAnalysis survey-design cache staleness (axis J, Major, `power.py:171-180`): `_build_survey_design()` populated `self._cached_survey_design` on first call and never invalidated. Mutating `config.survey_design` after `__init__` silently returned the stale cached design. Default construction is microseconds and user-provided designs are reference copies, so the cache never earned its cost. Fix drops the cache entirely; method now reflects live `self.survey_design` every call. Six new tests: - `tests/test_continuous_did.py::TestBSplineDerivativeDegenerateBasis` (3): single-dose silent contract, `ValueError`-forced aggregate warning, happy-path no-warning regression. - `tests/test_power.py::TestSurveyPowerConfigDesignStaleness` (3): mutate-survey_design-picks-up-new, clearing-falls-back-to-default, repeat-calls-equivalent regression. REGISTRY notes added under §ContinuousDiD (edge cases) and §PowerAnalysis (`survey_config` section). Audit state post-PR: all 28 actionable Phase-2 findings resolved (26 in prior PRs; #12 + #28 here). Three P1 follow-ups remain logged in `TODO.md` from PR #337's discovered divergences (FW/PGD algorithmic mismatch in `compute_synthetic_weights`, TROP grid-search on rank- deficient Y, TROP bootstrap RNG unification). Those are post-audit cleanup work, not Phase-3 scope. No behavioral changes on clean inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
CI reviewer correctly noted that dPsi feeds both ACRT point estimates (continuous_did.py:1026-1046, bootstrap path at 1524-1561) and the analytical/bootstrap inference, not just inference alone. Updated the warning text and the REGISTRY note to say both outputs are biased when derivative construction fails. No behavior change; text-only accuracy fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Reviewer noted the previous registry note said "any other config field" could trigger the stale cache, but _build_survey_design() only depends on self.survey_design. Narrowed the note to call out survey_design reassignment specifically (replace user-supplied design or toggle between None and a user-supplied design). Also explicitly states the other config fields never influenced the returned design, to prevent misleading future triage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Reviewer noted the partial-failure test only asserted failed columns are zero and the warning fires, but didn't verify non-failed columns remain correctly populated. A regression zeroing the entire derivative matrix on any ValueError would have passed. Added a baseline comparison (un-patched bspline_derivative_design_matrix) and asserted equality on unaffected columns, plus a sanity check that at least one unaffected non-intercept column is non-zero. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings in the changed surface. On inspection, the prior P3 test-gap from the last AI review appears addressed. Executive Summary
Methodology No findings. Cross-checking the ContinuousDiD methodology reference at docs/methodology/continuous-did.md:1, the new registry note at docs/methodology/REGISTRY.md:726, the helper change at diff_diff/continuous_did_bspline.py:143, and the ACRT/bootstrapping call sites at diff_diff/continuous_did.py:1026 and diff_diff/continuous_did.py:1524 shows no change to estimands, identification assumptions, weighting, or variance/SE formulas. The power-side change in diff_diff/power.py:169 is cache semantics only, not a methodological change. Code Quality No findings. Performance No findings. Dropping Maintainability No findings. The implementation, tests, and registry notes are aligned across diff_diff/continuous_did_bspline.py:168, diff_diff/power.py:169, docs/methodology/REGISTRY.md:726, and docs/methodology/REGISTRY.md:2586. Tech Debt No findings. The Rust/TROP follow-ups referenced in the PR body are already tracked in TODO.md:86, TODO.md:87, and TODO.md:88, so they are non-blocking under the stated rubric. Security No findings. Documentation/Tests No findings. The previous re-review’s P3 test concern is addressed by the new unaffected-column baseline assertions in tests/test_continuous_did.py:191 and tests/test_continuous_did.py:206, and the survey-design staleness coverage now exercises both replacement and reset-to-default cases in tests/test_power.py:2843 and tests/test_power.py:2866. Residual limitation: I could not execute the tests in this sandbox because |
…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bundles the two remaining S-complexity findings from the Phase 2 audit — closes Phase 3 execution.
continuous_did_bspline.py:153): the per-basis derivative loop swallowed `ValueError` from `scipy.interpolate.BSpline`, leaving affected columns of the derivative design matrix as zero with no signal. ContinuousDiD analytical inference then fed a biased `dPsi` into SE computation. Fix aggregates failed-basis indices and emits one `UserWarning` naming them. The all-identical-knot degenerate case (single dose value) remains silently handled — zero derivatives are mathematically correct there.Audit state post-merge: all 28 actionable Phase-2 findings resolved. Three P1 follow-ups remain logged in `TODO.md` from PR #337 (Rust-backend algorithmic divergences: FW/PGD mismatch in `compute_synthetic_weights`, TROP grid-search on rank-deficient Y, TROP bootstrap RNG unification) — those are post-audit cleanup work.
Methodology references
Validation
Security / privacy
Confirm no secrets/PII in this PR: Yes.
Generated with Claude Code