Skip to content

Signal non-convergence in TROP alternating-minimization solvers#317

Merged
igerber merged 5 commits intomainfrom
fix/trop-convergence-warnings
Apr 18, 2026
Merged

Signal non-convergence in TROP alternating-minimization solvers#317
igerber merged 5 commits intomainfrom
fix/trop-convergence-warnings

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 18, 2026

Summary

  • trop_global._solve_global_with_lowrank at trop_global.py:448-488 now threads a converged flag through its outer alternating-minimization loop. The hard-coded range(20) inner FISTA loop (line 466) contributes a non-convergence count that is surfaced in the outer warning for diagnostic context — one consolidated warn_if_not_converged call per solver invocation.
  • trop_local._estimate_model at trop_local.py:680 now threads a converged flag through its outer alternating-min loop and calls the same helper.
  • Reuses diff_diff.utils.warn_if_not_converged (introduced in Signal non-convergence in FE imputation alternating-projection solvers #314).
  • REGISTRY entry added under the TROP section.
  • No numerical change: the returned iterate is identical to what the silent path already returned. The warning is additive.

Methodology references

  • Method: Triply Robust Panel (TROP), alternating minimization per Algorithm 1 (local) and the TWFE + low-rank global estimator. The solver math is unchanged.
  • REGISTRY updated under the TROP section describing the new signal on both paths.

Validation

  • New TestTROPConvergenceWarnings class (4 tests) in tests/test_trop.py:
    • test_global_alternating_min_warns_on_nonconvergence — forced non-convergence via max_iter=1, tol=1e-15, asserts warning
    • test_global_alternating_min_no_warning_on_convergence — convergent negative control (max_iter=500, tol=1e-6)
    • test_local_alternating_min_warns_on_nonconvergence — same for local path
    • test_local_alternating_min_no_warning_on_convergence — convergent negative control
  • Smoke test TestTROP::test_basic_fit passes unchanged.
  • Notable signal: the default TROP local config (max_iter=100, tol=1e-6) does not converge within max_iter on the simple synthetic panel fixture, so this PR will surface a previously silent non-convergence that was affecting routine user fits. The warning is the correct signal; users who want to silence it can raise max_iter or loosen tol.
  • TROP test suite is heavy by design; targeted regression on touched code paths only per the project's feedback_targeted_tests convention.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Addresses axis B findings #6 and #7 from the silent-failures audit:
trop_global.py:448 outer alternating-min loop, trop_global.py:466
hard-coded range(20) inner FISTA loop, and trop_local.py:680
alternating-minimization loop all exited silently on max_iter
exhaustion, returning the current iterate as if converged.

- trop_global._solve_global_with_lowrank: thread a converged flag through
  the outer loop; count non-convergence events from the inner FISTA and
  surface the count in the outer warning for diagnostic context. One
  warn_if_not_converged call per solver invocation.
- trop_local._estimate_model: thread a converged flag through the outer
  alternating-min loop; call warn_if_not_converged on exhaustion.
- REGISTRY updated under TROP.

New TestTROPConvergenceWarnings class (4 tests) exercises both global
and local paths with forced non-convergence (max_iter=1, tol=1e-15)
and a convergent negative control.

Notable: the default TROP local config (max_iter=100, tol=1e-6) does
not converge within max_iter on typical synthetic panels, so this PR
surfaces a previously silent non-convergence that affected routine
user fits. No numerical change in the returned iterate; the warning
is additive.

Axis-B regression-lint baseline: 5 -> 2 silent range(max_iter) loops
remaining (minor loops in honest_did/power not yet addressed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good

Highest unmitigated issue I found is P2. I did not find a P0/P1 methodology or correctness defect in the touched TROP code.

Executive Summary

  • The PR is methodologically neutral: the touched code only threads convergence state into warnings, and the new behavior is explicitly documented in docs/methodology/REGISTRY.md:L1975.
  • I do not see a change to TROP’s objective, weighting, identification assumptions, ATT construction, or bootstrap/SE logic in diff_diff/trop_global.py:L448 and diff_diff/trop_local.py:L681.
  • P2: the new warnings are emitted from low-level solver helpers that are called inside LOOCV, per-treated-observation fitting, and bootstrap loops, so one non-convergent fit can generate many duplicate warnings instead of one user-facing diagnostic.
  • P3: the new local warning tests do not match the production call contract for control_mask, and the suite only tests private helper entry points, not the user-facing fit() paths where warning fan-out occurs.
  • I could not execute targeted tests in this environment because the local Python environment is missing numpy.

Methodology

Code Quality

  • P2: warn_if_not_converged() is now attached to private solver workers at diff_diff/trop_local.py:L724 and diff_diff/trop_global.py:L498, but those workers are invoked inside internal loops: local final estimation at diff_diff/trop.py:L768, local LOOCV at diff_diff/trop_local.py:L815, global LOOCV at diff_diff/trop_global.py:L283, and global bootstrap/fixed-lambda paths at diff_diff/trop_global.py:L1206 and diff_diff/trop_global.py:L1282. Impact: any non-convergent configuration can now emit many duplicate UserWarnings, which is noisy, slows Python fallback paths, and becomes a hard failure in environments that escalate warnings even though the intended contract is “warn and return the current iterate.” Concrete fix: suppress warning emission inside these internal solver calls (for example via a warn_on_nonconvergence flag or accumulated counter) and emit one consolidated warning per top-level operation (fit, fixed-λ fit, or LOOCV wrapper), including counts/context in that single message.

Performance

  • No additional findings beyond the P2 warning fan-out above.

Maintainability

  • No additional findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3: The new local tests use a 1-D unit-level mask at tests/test_trop.py:L4007, but _estimate_model() documents and production code use an observation-level control mask at diff_diff/trop_local.py:L625 and diff_diff/trop.py:L567. Impact: the tests rely on NumPy broadcasting and do not cover the real fit() call shape, so they would miss regressions in the actual warning-propagation path. Concrete fix: change the local tests to pass control_mask = (D == 0) and add fit()-level tests for both local and global methods that record warnings and assert the intended user-facing warning count/message on the Python path.
  • Execution note: targeted tests were not run here because the local Python environment is missing numpy.

AI review on PR #317 flagged a P2 fan-out problem: the new warnings
fired from low-level solver helpers that are called inside LOOCV,
per-treated-observation, and bootstrap loops. A single non-convergent
configuration could emit dozens to hundreds of duplicate warnings,
which is noisy, slow on Python fallback paths, and a hard failure in
environments that escalate warnings to errors.

Fix pattern: add an optional _nonconvergence_tracker: list kwarg to
_solve_global_with_lowrank, _solve_global_model, _fit_global_with_fixed_lambda,
_estimate_model, and _fit_with_fixed_lambda. When provided, the solver
appends non-convergence events to the tracker instead of warning
directly. Each top-level caller (LOOCV, bootstrap, Rao-Wu bootstrap,
per-treated-observation fit) supplies a tracker and emits a single
consolidated warning summarizing the count of non-converged fits.

Six call sites wrapped:
- trop.py:768 local per-treated-observation main fit loop
- trop_local.py:815 local LOOCV
- trop_local.py:1044 local bootstrap
- trop_local.py:1199 local Rao-Wu bootstrap
- trop_global.py:283 global LOOCV
- trop_global.py:1048 global bootstrap
- trop_global.py:1226 global Rao-Wu bootstrap

Also addresses the P3 test-shape finding: the convergence tests now
pass observation-level `control_mask = (D == 0)` matching the
production call contract at trop.py:567 and trop_local.py:625 (not
the unit-level mask I had earlier). Plus a new fit()-level test
`test_local_fit_emits_single_aggregate_warning` that pins the
aggregate-per-call warning contract.

Smoke check: test_basic_fit previously emitted one per-observation
warning; it now emits a single aggregate like "TROP local
per-treated-observation fit: 15 of 15 fits did not converge".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 810d8629817bfee09fb55bfb15d7ac61b8ce10bc


Overall Assessment

✅ Looks good

Highest unmitigated issue I found is P3. I do not see a P0/P1 methodology or correctness defect in the touched TROP code.

Executive Summary

  • The affected methods are the TROP local alternating-minimization solver _estimate_model() and the global low-rank alternating-minimization solver _solve_global_with_lowrank(), but the diff is warning-only: I do not see a change to Equation 2 weighting, Algorithm 1 updates, Equation 5 LOOCV scoring, ATT construction, or bootstrap SE logic in diff_diff/trop_local.py:L680-L734 and diff_diff/trop_global.py:L407-L524.
  • The behavior change is explicitly documented in the TROP Registry note at docs/methodology/REGISTRY.md:L1975-L1975, so under the review rubric this is P3-informational rather than a methodology defect.
  • The prior re-review concerns appear addressed: loop-heavy paths now aggregate non-convergence warnings instead of warning per inner fit, and the local tests now use the production observation-level control_mask = D == 0 contract in diff_diff/trop.py:L749-L795, diff_diff/trop_local.py:L811-L855, diff_diff/trop_global.py:L278-L309, and tests/test_trop.py:L4004-L4047.
  • Remaining issue is P3 only: the new tests cover direct solver warnings and local fit() aggregation, but not the new wrapper-level aggregation paths in LOOCV/bootstrap for both methods. I also could not execute the targeted suite here because pytest is unavailable in this environment.

Methodology

Code Quality

No findings. The previous warning fan-out concern appears resolved by the new tracker plumbing in diff_diff/trop.py:L749-L795, diff_diff/trop_local.py:L811-L855, and diff_diff/trop_global.py:L278-L309.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

AI review on PR #317 flagged that my earlier fit()-level test only covered
the per-treated-observation aggregation path, not the LOOCV or bootstrap
wrapper paths. A regression in _nonconvergence_tracker plumbing for those
paths could slip through.

- test_local_fit_emits_single_aggregate_warning: expanded to assert per-obs,
  LOOCV, and bootstrap warnings each appear at most once per .fit().
- test_global_fit_emits_single_aggregate_warning: new test mirroring the
  local one for method="global" (LOOCV + bootstrap paths).

Both use n_bootstrap=2, minimal lambda grid, and max_iter=1/tol=1e-15 to
keep cost low: ~3.4s for all 6 TROP convergence tests combined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3d21dd87a5c06d61b023b1daf65529ea227d422a


Overall Assessment

✅ Looks good

Highest unmitigated finding is P3.

Executive Summary

  • The affected methods are TROP local alternating minimization and the TROP global low-rank alternating-minimization solver, plus their warning-aggregation wrappers in point estimation, LOOCV, and bootstrap paths.
  • I do not see a methodology or numerical-output change in the touched estimator code. The PR adds non-convergence signaling only, and that behavior is explicitly documented in the Methodology Registry.
  • The warning-plumbing itself is coherent: direct solver calls still warn, while wrapper-managed loops aggregate inner non-convergence into a single user-facing warning.
  • The prior re-review concern is mostly addressed, but one P3 test gap remains: the new fit()-level tests are backend-dependent and do not reliably exercise the changed Python LOOCV/global-bootstrap aggregation paths when Rust is available.
  • I could not run the targeted tests here because pytest is not installed in this environment.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

AI review on PR #317 flagged that my fit()-level tests did not force
HAS_RUST_BACKEND=False, so in Rust-enabled environments they could pass
without exercising the Python aggregation code they were intended to
cover. The earlier <= 1 assertion also would not catch a dropped
warning.

Changes:
- patch.object(sys.modules[...], "HAS_RUST_BACKEND", False) across
  diff_diff.trop, diff_diff.trop_local, diff_diff.trop_global so the
  LOOCV and bootstrap paths route through the Python aggregation wrappers.
  (Uses sys.modules to bypass the name collision between the trop()
  convenience function and the trop module at diff_diff.trop.)
- Tightened assertions: per-treated-observation and bootstrap aggregation
  are called exactly once per fit() so assert == 1. LOOCV is called
  multiple times by the coordinate-descent grid refinement in trop.py,
  so the per-call single-emission contract is verified via message
  format ("N of M per-observation fits") on every LOOCV aggregate
  rather than by global occurrence count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 71e4153e9d3217667b73939d591f43e362316700


Overall Assessment

✅ Looks good

Highest unmitigated finding is P3.

Executive Summary

  • The affected methods are the TROP local alternating-minimization solver _estimate_model() and the TROP global low-rank alternating-minimization solver _solve_global_with_lowrank(), plus their LOOCV/bootstrap/per-treated-observation warning aggregation wrappers in diff_diff/trop_local.py:604, diff_diff/trop_global.py:407, and diff_diff/trop.py:740.
  • Cross-checking the added registry note in docs/methodology/REGISTRY.md:1975 against the touched solver code shows a warning-only change: no estimand, weighting, identification check, or SE/inference math changed.
  • The prior re-review concern about backend-dependent wrapper tests appears addressed: the new fit-level tests explicitly patch HAS_RUST_BACKEND=False before asserting the Python LOOCV/bootstrap aggregation behavior in tests/test_trop.py:4051 and tests/test_trop.py:4111.
  • One minor diagnostic issue remains in the new local aggregate warning: its denominator counts all treated cells, including treated cells skipped because the outcome is missing, so the reported non-convergence rate can be understated.
  • Verification is static only; pytest is not installed in this environment.

Methodology

Code Quality

  • P3 Impact: The new local per-treated-observation aggregate warning in diff_diff/trop.py:750 and diff_diff/trop.py:787 uses len(treated_observations) as the denominator, but the loop skips treated cells with non-finite outcomes before calling _estimate_model(). In panels with missing treated outcomes, the warning can report a lower non-convergence rate than the fits actually attempted. Concrete fix: track attempted solver calls separately and use that count in the aggregate warning message.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify a new deferrable item that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

…e warning

AI review noted that the per-treated-observation aggregate warning used
len(treated_observations) as the denominator, but the loop skips cells
with non-finite outcomes before calling _estimate_model(). On panels with
missing treated outcomes, the reported non-convergence rate would be
understated because attempted-but-failed fits were compared against a
total that included never-attempted cells.

Track n_fits_attempted separately and use that as the denominator.
Report is now "X of N-attempted fits did not converge" rather than
"X of N-treated-cells fits did not converge".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 41b6a5f38d21303577b917bbe04b999c4f7bea3e


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings.

Executive Summary

Methodology

Code Quality

Performance

  • No findings. The added tracking and aggregate warnings are negligible relative to the existing alternating-minimization, LOOCV, and bootstrap work.

Maintainability

  • No findings. Warning emission stays centralized through warn_if_not_converged(), which keeps the new behavior consistent with the project’s broader silent-failure audit pattern (diff_diff/utils.py:68).

Tech Debt

  • No findings. I did not identify a new deferrable item that needs TODO.md tracking, and no existing TODO.md entry is being used to mask a correctness issue in this diff (TODO.md:51).

Security

  • No findings. The PR adds warning/reporting logic, tests, and a registry note only; there is no new secrets, deserialization, subprocess, or user-input handling surface.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 18, 2026
@igerber igerber merged commit 306ed99 into main Apr 18, 2026
23 of 24 checks passed
@igerber igerber deleted the fix/trop-convergence-warnings branch April 18, 2026 20:05
igerber added a commit that referenced this pull request Apr 18, 2026
…odo config

Packages five merged PRs since v3.1.2 as patch release 3.1.3:

- #311 Replicate-weight variance and PSU-level bootstrap for dCDH — new
  variance_method="replicate" (BRR / Fay / JK1 / JKn / SDR) and PSU-level
  multiplier bootstrap, with df-aware inference and group-level PSU map.
- #321 Zenodo DOI auto-minting config — .zenodo.json + top-level LICENSE
  so the next GitHub Release mints a concept + versioned DOI automatically.
- #319 Silent sparse->dense lstsq fallback signaling in ImputationDiD and
  TwoStageDiD — emits ConvergenceWarning instead of switching paths silently.
- #317 Non-convergence signaling in TROP alternating-minimization solvers,
  including LOOCV and bootstrap aggregation. Top-level warning aggregation.
- #320 /bump-version skill now updates CITATION.cff; single RELEASE_DATE
  resolved upfront and threaded through all date-bearing files.

Version strings bumped in diff_diff/__init__.py, pyproject.toml,
rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff
(version: 3.1.3, date-released: 2026-04-18). CHANGELOG populated with
Added / Fixed / Changed sections and comparison-link footer.

Per project SemVer convention, minor bumps are reserved for new estimators
or new module-level API; additive extensions to existing estimators (like
PR #311's new variance_method values) are patch-level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 18, 2026
CI review on #322 flagged that the 3.1.3 entries for PR #319 (sparse->dense
lstsq fallback) and PR #317 (TROP non-convergence) claimed ConvergenceWarning,
but the actual implementations emit UserWarning (imputation.py, two_stage.py,
utils.py, trop.py, and the REGISTRY.md contract all use UserWarning). Users
filtering warnings by category would be misled.

Same factual error was in the 3.1.2 entries I wrote in PR #316 for PR #314
and PR #315. Fixing both entries in this PR — CHANGELOG is a living doc and
the warning-category drift is actionable-ly misleading.

No code or test changes; CHANGELOG-only edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant