trop: methodology-review tracker promotion + test_methodology_trop.py #491
Closes the Athey, Imbens, Qu & Viviano (2025) Triply Robust Panel Estimators (arXiv:2508.21536) primary-source review on the methodology tracker. PR-A (paper review on file at docs/methodology/papers/athey-2025-review.md) was previously merged as #443; this PR is the F.L.I.P. consolidation: new tests/test_methodology_trop.py with paper-equation-numbered Verified Components walk-through (10 classes, 36 tests covering Eq. 2 nuclear-norm prox / FISTA / weighted-prox, Eq. 3 unit + time weights, Eqs. 4-5 + Algorithm 1 LOOCV with two-stage cycling, Corollary 1 three-condition unbiasedness, Theorem 5.1 MC-ranking realisation, Section 2.2 DID + MC reductions, Eq. 13 + Algorithm 2 per-(i, t) estimation, Algorithm 3 stratified pairs bootstrap, Section 3 / Eq. 6 factor-DGP recovery, plus a TestTROPDeviations class locking 11 documented library deviations).

Migrated from tests/test_trop.py: TestMethodologyVerification (5 tests -> TestTROPEquation6FactorDGPRecovery), four paper-conformance tests + one weighted-solver convergence test from TestPaperConformanceFixes (distributed across the new equation-numbered classes), three prox / FISTA / weighted-objective tests from TestTROPNuclearNormSolver (-> TestTROPNuclearNormProx), plus a cycling-convergence test from TestCyclingSearch and the factor-DGP smoke from TestTROPvsSDID. The TestPaperConformanceFixes and TestTROPvsSDID shells are deleted; TestTROPNuclearNormSolver retains its single defensive test_zero_weights_no_division_error.

METHODOLOGY_REVIEW.md TROP row promoted In Progress -> Complete (paper method="local") with merge date 2026-05-24, full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (PR #473), ContinuousDiD (PR #476), DCDH (PR #481), WooldridgeDiD (PR #486).
The methodology promotion is scoped to the paper-aligned method="local" path (paper Algorithm 2); method="global" is a library-side efficiency adaptation per REGISTRY and stays defensively covered in tests/test_trop.py::TestTROPGlobalMethod.

Documented deviations: Gap #5 (unnormalised weights match Eq. 2, not Section 5 sum-to-one) — locked by a direct kernel-weight inspection test against TROP._compute_observation_weights; Gap #9 (control / pre-treatment cell drops supported beyond paper's balanced-panel assumption); rank selection is implicit via nuclear-norm soft-thresholding (no discrete rank_selection constructor parameter — corrects an earlier REGISTRY overclaim that listed cv / ic / elbow methods); lambda_nn=inf -> 1e10 internal sentinel with original-value storage on results.

Outstanding Concerns (deferred): Equation 14 covariate extension (TROP.fit() has no `covariates` kwarg; non-support locked by TestTROPDeviations::test_covariates_not_supported via inspect.signature to guard against future **kwargs) and Theorem 8.1 deferred until use cases motivate. SC / SDID reductions are claimed by the paper under "specific (omega, theta) weight choices" that the paper text does not provide; cross-language anchor deferred until paper-author code clarifies the weight map. Eq. 10 direct numerical reconstruction deferred — requires exposing the internal per-(i, t) theta / omega weight vectors. R parity deferred ("forthcoming" per the paper).

Methodology sign-off scope: paper-aligned identification ingredients (Eq. 2 prox + Eq. 3 weights + Eqs. 4-5 LOOCV + Algorithms 1-3 + Corollary 1 single-draw sanity checks + Eq. 6 simulation recovery + DID reduction + documented deviations) are directly locked. Theorem 5.1 is verified as a simulation sanity check (TROP RMSE < DID RMSE under LOOCV-tuned weights), NOT a direct fixed-weight conditional-bias-bound lock.
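The covariate non-support lock described above works by inspecting the `fit()` signature rather than by calling it. A minimal sketch of that style of guard follows. `ToyEstimator`, `ToyEstimatorWithKwargs`, and `accepts_covariates` are hypothetical names invented for illustration; they are not the library's `TROP` class or its actual test.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in for an estimator whose fit() takes no covariates."""

    def fit(self, data, outcome, treatment, unit, time):
        return self


class ToyEstimatorWithKwargs:
    """Hypothetical variant showing why the **kwargs check matters."""

    def fit(self, data, **kwargs):
        return self


def accepts_covariates(fit_method):
    """Return True if `fit_method` could receive a `covariates` argument."""
    params = inspect.signature(fit_method).parameters
    if "covariates" in params:
        return True
    # A **kwargs catch-all would silently swallow covariates=..., so it counts too.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


print(accepts_covariates(ToyEstimator.fit))            # False
print(accepts_covariates(ToyEstimatorWithKwargs.fit))  # True
```

Checking for `VAR_KEYWORD` as well as the explicit name is what keeps such a lock meaningful if a later refactor adds a catch-all `**kwargs`.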
The Matrix Completion reduction is verified as code-path activation (effective_rank > 0 + beats DID baseline), NOT equivalence against an independent MC reference. Plain (non-accelerated) prox-gradient objective monotonicity is locked; the shipped accelerated FISTA outer loop does NOT guarantee per-step monotonicity (Nesterov momentum gives O(1/k^2) but not monotonicity) and is not directly tested.

REGISTRY.md ## TROP section gains a Verified Components expansion: 10 ticked requirements + four **Note:** / **Note (paper resolution):** / **Note (deferral):** annotations consolidating the deviation surface.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
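For readers unfamiliar with the nuclear-norm prox referenced in this description, the following is a minimal NumPy sketch of singular-value soft-thresholding, under the assumption that a plain (unweighted) prox is enough to show the idea. The function name `soft_threshold_svd` and the toy matrix are illustrative only; this is not the library's solver, which also handles weights and acceleration.

```python
import numpy as np


def soft_threshold_svd(matrix, tau):
    """Prox of tau * nuclear norm: shrink each singular value by tau, floor at 0."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    low_rank = (u * s_shrunk) @ vt  # scale columns of u, then recombine
    effective_rank = int(np.sum(s_shrunk > 0))
    return low_rank, effective_rank


# Toy matrix with singular values 3, 1 and 0.5.
panel = np.diag([3.0, 1.0, 0.5])

shrunk, rank = soft_threshold_svd(panel, tau=2.0)
print(rank)  # 1: only the singular value 3 survives the threshold

# A very large threshold (for example a 1e10 sentinel) zeroes every singular value.
zeroed, rank_sentinel = soft_threshold_svd(panel, tau=1e10)
print(rank_sentinel)  # 0
```

The sketch also shows why rank selection can be implicit: the threshold decides how many singular values survive, so no discrete rank parameter is needed.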
|
Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…scope + plain prox-grad wording

Address 3 P3 documentation-scope findings from CI codex on PR #491 (verdict was ✅ Looks good; no P0/P1).

P3 - Gap #5 paper-review inconsistency: METHODOLOGY_REVIEW.md and REGISTRY.md marked weight normalization as a "paper resolution", but the paper review at docs/methodology/papers/athey-2025-review.md still said "open source-ambiguity item ... do not promote until resolved against the source". Updated paper-review file with explicit "resolution status" header noting Gap #5 is resolved as a LIBRARY-SIDE choice (deliberate deviation from Section 5 sum-to-one), locked by TestTROPDeviations::test_unnormalized_weights_match_eq2. The source-side ambiguity remains open pending paper-author clarification; the unnormalized form is now the documented library contract.

P3 - Eq. 13 / Algorithm 2 general-assignment overclaim: the wording "general assignment patterns (Section 6.1) including staggered adoption and heterogeneous treatment effects" was too broad — the implementation rejects non-absorbing D-matrices via the absorbing-state validator in trop_local.py. Narrowed to "absorbing-state staggered adoption / heterogeneous per-cell effects (paper Remark 6.1)" + explicit "Section 6.1 non-absorbing / on-off / switching assignment patterns are OUT OF SCOPE" with pointer to the rejection-contract test (TestTROPDeviations::test_non_absorbing_treatment_rejected_with_value_error).

P3 - "FISTA monotonicity" wording in the Eq. 10 note: the test explicitly says it only verifies plain prox-gradient monotonicity and that the accelerated FISTA outer loop is not monotone (Nesterov momentum). Replaced "FISTA monotonicity" with "plain prox-gradient monotonicity — NOT the shipped accelerated FISTA outer loop, which uses Nesterov momentum and does not guarantee per-step monotonicity" in METHODOLOGY_REVIEW Outstanding Concerns + REGISTRY Eq. 10 note.

No source-code changes to diff_diff/trop*.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…HANGELOG wording

Address 3 P3 findings from CI codex R2 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P3 - Note label overclaim: REGISTRY/CHANGELOG used "Note (paper resolution)" but the resolution is library-side only (the paper-side ambiguity remains open per the updated paper-review file). Renamed to "Note (library-side choice)" in both REGISTRY and CHANGELOG; expanded the note text to make the open paper-side status explicit and add a "will be revisited" line for when the paper-author reference lands.

P3 - Provenance docstrings reference deleted classes: 10 references to TestPaperConformanceFixes / TestTROPvsSDID / TestMethodologyVerification formerly pointed at `tests/test_trop.py::ClassName::method` as if the classes still existed there. Rewrote each provenance pointer to explicitly say "pre-migration class X in test_trop.py (deleted in the methodology-promotion PR)" so future readers don't try to navigate to a non-existent class. Also tightened the TestTROPDeviations docstring: removed two "Tests cover" bullets (bootstrap-failure 5% warning, FISTA convergence warning) that overclaimed in-class coverage — those defensive tests live in test_trop.py and are cross-referenced; the "Cross-references for context" header is now explicit that these are NOT locked by tests in TestTROPDeviations. Added the three actual in-class tests that were missing from the cover list (lambda inf grid rejection, n_bootstrap < 2 rejection, safe_inference NaN-propagation).

P3 - CHANGELOG "prox / FISTA / weighted-prox" wording: aligned with the methodology docs' narrower phrasing — "Eq. 2 soft-threshold SVD prox / plain prox-gradient monotonicity on a toy setup / weighted-prox solver (the shipped accelerated FISTA outer loop is NOT directly tested for per-step monotonicity because Nesterov momentum does not guarantee it)". The earlier shorthand was the same FISTA-scope overclaim that R1 P3 fixed in the methodology docs.

No source-code changes to diff_diff/trop*.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Address single P3 from CI codex R3 on PR #491 (verdict was ✅ Looks good; no P0/P1/P2).

P3 - Paper-review cross-reference drift: docs/methodology/papers/athey-2025-review.md L14 hard-coded "**Note (paper resolution):**" as the REGISTRY label, but my R2 fix renamed the registry note to "**Note (library-side choice):**". Replaced the hard-coded label reference with a structural pointer ("the weight-normalization note in the ## TROP block of REGISTRY.md") so the cross-reference stays correct if the label is renamed again.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…e numbers

Address 2 P3 findings from CI codex R4 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P3 - Test-name cross-ref drift: METHODOLOGY_REVIEW and REGISTRY cited `TestTROPDeviations::test_non_absorbing_treatment_rejected_with_value_error` as the lock for the Section 6.1 non-absorbing rejection contract, but the actual implemented test is named `test_event_style_d_rejected_with_value_error`. Event-style D is one specific non-absorbing pattern, and the same absorbing-state validator catches all 1→0 transitions. Updated both doc references to point to the actual test name and clarified the scope (the validator covers all non-absorbing patterns, not just event-style).

P3 - Stale REGISTRY line-number references: 7 `REGISTRY L22xx/L23xx` hard-coded references across METHODOLOGY_REVIEW.md and tests/test_methodology_trop.py (in METHODOLOGY_REVIEW Verified Components + Deviations sections, plus several TestTROPDeviations test docstrings) drift as REGISTRY.md gets edited. Replaced each with the semantic section name (e.g., `REGISTRY ## TROP "λ_nn=∞ implementation" edge-case note`, `REGISTRY ## TROP "Inference CI distribution" note`, etc.) so future edits to REGISTRY.md cannot orphan these cross-refs.

No source-code changes to diff_diff/trop*.py. 36 methodology tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
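The point that one validator catches every 1→0 transition, event-style or otherwise, can be illustrated with a short sketch. It assumes a hypothetical layout in which each row of the D-matrix is one unit's 0/1 treatment path over time; `validate_absorbing` is an invented name, not the validator in trop_local.py.

```python
def validate_absorbing(d_matrix):
    """Raise ValueError if any unit's treatment indicator switches from 1 back to 0.

    Assumes `d_matrix` is a list of per-unit rows, each a list of 0/1 flags
    ordered by time period (a hypothetical layout chosen for this sketch).
    """
    for unit_index, row in enumerate(d_matrix):
        for t in range(1, len(row)):
            if row[t - 1] == 1 and row[t] == 0:
                raise ValueError(
                    f"unit {unit_index}: treatment turns off at period {t}; "
                    "only absorbing-state assignment is supported"
                )


staggered = [
    [0, 0, 1, 1],  # treated from period 2 onward
    [0, 0, 0, 1],  # treated from period 3 onward
    [0, 0, 0, 0],  # never treated
]
event_style = [
    [0, 1, 0, 0],  # treated only in period 1, then switches off
    [0, 0, 0, 0],
]

validate_absorbing(staggered)  # passes silently

try:
    validate_absorbing(event_style)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```

Because the check looks only at adjacent-period transitions, on-off and switching patterns are rejected by the same rule as the event-style case.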
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…epresentation" wording

Address 2 P3 findings from CI codex R5 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P3 - Paper version pinning: paper review on file is v2-pinned (arXiv:2508.21536v2) but the current arXiv version is v3. Added an explicit "Version-pinning note (2026-05-25)" to the paper review file acknowledging that the methodology promotion ships against v2 and that a formal v2-vs-v3 delta-check has NOT been performed. Articulated the action item: refresh the review against the most recent arXiv version when the paper-author reference implementation lands ("forthcoming") and re-validate the verified-component checklist. This is honest deferral rather than an unverified "no methodology changes" claim.

P3 - Eq. 10 wording drift: my new prose called Eq. 10 a "paper-side asymptotic identity" but the paper review on file describes it as a "balancing representation / decomposition" (per paper Section 5.2). Replaced "asymptotic identity" with "balancing representation" / "balancing representation / decomposition" across tests/test_methodology_trop.py (module + class docstrings), METHODOLOGY_REVIEW.md (Verified Components Eq. 2 bullet), and docs/methodology/REGISTRY.md (Eq. 10 note). CHANGELOG already used "balancing-decomposition pointer" — no change there.

No source-code changes to diff_diff/trop*.py. 36 methodology tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Address single P3 from CI codex R6 on PR #491 (verdict was ✅ Looks good; no P0/P1/P2).

P3 - Paper-version audit trail: codex surfaced a specific v2→v3 structural change (v3 adds Section 5.1.1 "Conditions for exact unbiasedness" subsection not present in v2). My R5 version-pinning note acknowledged the general v3 gap but didn't cite this specific change. Updated the note to:
- Cite the codex finding (v3 adds 5.1.1 subsection)
- Note this APPEARS to be restructuring (Corollary 1 material moved into its own subsection), not a methodological change to the underlying three balance conditions
- Make the action item more specific: confirm v3's 5.1.1 is the same set of three balance conditions as v2's Corollary 1 when the paper-author reference implementation lands

This is honest deferral with a concrete refresh-time check. I attempted a WebFetch of the arXiv abstract page to do the v2-vs-v3 diff inline, but the HTML page only exposes the submission history (size delta only) not the actual PDF/TeX diff — a real diff requires downloading both PDFs which is outside the methodology-promotion scope.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Address single P2 from CI codex R7 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P2 - Weighted-prox monotonicity overclaim: METHODOLOGY_REVIEW prose said the weighted non-uniform prox path "monotonically decreases the weighted objective" but:
- The test (test_local_nonuniform_weights_objective) only asserts obj_final <= obj_init + 1e-10 (final-vs-initial check), NOT per-iteration monotonicity
- The shipped solver _weighted_nuclear_norm_solve uses FISTA / Nesterov acceleration which is allowed to have transient per-step objective increases (already acknowledged in the TestTROPNuclearNormProx class docstring + the test_plain_prox_gradient_objective_decreases scope note)

Softened the METHODOLOGY_REVIEW Eq. 2 bullet to:
- "at-or-below initialisation (final-vs-initial check ... NOT per-iteration monotonicity — the accelerated FISTA loop is allowed to have transient per-step increases)"

Added a "Scope note" to the test docstring explicitly distinguishing this final-vs-initial check from the plain prox-gradient per-step monotonicity verified by test_plain_prox_gradient_objective_decreases.

No source-code changes to diff_diff/trop*.py. 6 TestTROPNuclearNormProx tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
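The difference between the two checks named above (per-step monotonicity versus final-vs-initial) can be shown on a toy problem. This is a sketch under stated assumptions: it uses an L1-penalised quadratic in two coordinates rather than the weighted nuclear-norm objective, and every name in it is illustrative, not taken from the library or its tests.

```python
def soft(v, t):
    """Scalar soft-threshold: prox of t * |.|."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


A = [1.0, 100.0]     # curvature per coordinate (ill-conditioned on purpose)
B = [1.0, 1.0]       # centre of the quadratic
LAM = 0.1            # L1 penalty weight
STEP = 1.0 / max(A)  # 1 / Lipschitz constant of the smooth part


def objective(x):
    smooth = sum(0.5 * a * (xi - b) ** 2 for a, xi, b in zip(A, x, B))
    return smooth + LAM * sum(abs(xi) for xi in x)


def prox_step(x):
    """One proximal-gradient step taken from the point x."""
    grad = [a * (xi - b) for a, xi, b in zip(A, x, B)]
    return [soft(xi - STEP * g, STEP * LAM) for xi, g in zip(x, grad)]


def plain_prox_gradient(x0, n_iter):
    x, history = list(x0), [objective(x0)]
    for _ in range(n_iter):
        x = prox_step(x)
        history.append(objective(x))
    return history


def fista(x0, n_iter):
    """Accelerated variant: the prox step is taken from an extrapolated point y."""
    x, y, t = list(x0), list(x0), 1.0
    history = [objective(x0)]
    for _ in range(n_iter):
        x_new = prox_step(y)
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
        history.append(objective(x))
    return history


x0 = [5.0, -5.0]
plain = plain_prox_gradient(x0, 200)
accel = fista(x0, 200)

# Per-step monotonicity: the claim that holds for the plain iteration.
plain_monotone = all(b <= a + 1e-9 for a, b in zip(plain, plain[1:]))
# Final-vs-initial: the weaker check that suits the accelerated loop.
accel_final_le_init = accel[-1] <= accel[0] + 1e-10
```

With step size 1/L the plain iteration is a descent method, so the per-step assertion is safe. The momentum step carries no such guarantee, which is why only the endpoint comparison is asserted for the accelerated history.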
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Address single P2 from CI codex R8 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P2 - "TWFE-DiD" wording vs actual comparator class: the methodology test uses `_fit_did()` which instantiates `DifferenceInDifferences` on `outcome ~ treat * post_flag`, but the METHODOLOGY_REVIEW + REGISTRY prose said "TWFE-DiD". The library distinguishes `DifferenceInDifferences` (basic 2×2 DiD design) from `TwoWayFixedEffects` (separate explicit-FE class). On the balanced two-period block-assignment panel used in the DID-reduction test, the two coincide numerically (and paper Section 2.2 uses "DID/TWFE" interchangeably for this special case), but the prose overstated by naming the `TwoWayFixedEffects` class. Softened wording across all surfaces:
- `_fit_did()` docstring now explains it's the basic 2×2 DiD estimator (not `TwoWayFixedEffects`)
- METHODOLOGY_REVIEW + REGISTRY DID-reduction bullets now say "library's basic-DiD estimator (`DifferenceInDifferences` ... numerically coincides with TWFE on balanced two-period block-assignment panels per paper Section 2.2's 'DID/TWFE' interchangeable framing — though `TwoWayFixedEffects` is a separate explicit-FE class)"
- Test class + Theorem 5.1 docstrings: "TWFE-DiD" → "the DiD benchmark"

No source-code changes to diff_diff/trop*.py. TestTROPSpecialCases + TestTROPTheorem51TripleRobustness still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No new findings. Existing TROP follow-ups in
Security
No findings.
Documentation/Tests
No additional findings beyond the P2 above. The new methodology file is careful about what is directly verified versus only smoke-checked or deferred; I just could not execute the suite in this environment because
|
…IEW + REGISTRY

Address single P2 from CI codex R9 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P2 - Version-pinning visibility: the v2→v3 paper-version gap surfaced as a P3 in R5 (general acknowledgement) and R6 (specific Section 5.1.1 delta noted in paper review file). R9 escalates because the paper review honestly admits the gap but the TROP entry in METHODOLOGY_REVIEW and REGISTRY still says "Complete" without a visible caveat, making the methodology sign-off version-ambiguous at the exact sections newly promoted (Eqs. 2-3, Algorithms 1-3, Section 2.2, Section 5.2-5.3, Section 6.1-6.2, Theorem 5.1, Corollary 1, Appendix Theorem 8.1).

Added a visible "**Version Pinning**" subsection to METHODOLOGY_REVIEW TROP entry immediately below the Status row, and a "**Note (version pinning):**" annotation to REGISTRY ## TROP section immediately below the Primary source line. Both:
- Anchor the promotion on arXiv:2508.21536v2 explicitly
- List the current version (v3, submitted 2026-02-09)
- Enumerate the sections covered by the promotion that have NOT been v2→v3 delta-checked
- Cite the known v3 structural change (Section 5.1.1 added)
- Action-item to refresh when paper-author reference implementation lands

Status row also updated to "**Complete** (paper `method="local"`, version-pinned to arXiv v2 — see Version Pinning below)" so the caveat is visible even from a quick scan of the status table.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings beyond the two registry inconsistencies above.
Tech Debt
No new tracked tech-debt items. The two registry inconsistencies above are not currently recorded in
Security
No findings.
Documentation/Tests
No additional findings.
|
…or-loadings overclaim

Address 2 P2 internal-inconsistency findings from CI codex R10 on PR #491 (verdict was ✅ Looks good; no P0/P1). Both predate this PR but became visible after the methodology-promotion checklist cleanup made the rest of the ## TROP block inconsistent.

P2 - Stale rank-selection prose: REGISTRY ## TROP block had three locations claiming rank-selection automation that contradict the new checklist:
1. Assumption checks bullet: "Warns if estimated rank seems too high/low relative to panel dimensions" — no such warning exists in the shipped code (only effective_rank is computed/stored). Removed.
2. Edge cases bullet: "Rank selection: automatic via cross-validation, information criterion, or elbow" — same overclaim as the previously-corrected checklist line. Replaced with the actual contract (implicit via nuclear-norm soft-thresholding; effective_rank is a diagnostic only; no rank_selection constructor parameter) + back-reference to the corrected Requirements checklist bullet + explicit note that this was corrected in the 2026-05-24 promotion PR.

P2 - "Returns factor loadings and scores" overclaim: the Requirements checklist had `- [x] Returns factor loadings and scores for interpretation`, but TROPResults only exposes factor_matrix (n_periods × n_units) and effective_rank — no separate loading / score fields. Rewrote the checklist item to describe the actual API surface (`factor_matrix` and `effective_rank`) and flag the overclaim as a 2026-05-24 promotion-PR correction.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. The prior rerun findings are fixed, and the remaining issues I found are documentation/test overclaims rather than shipped estimator defects. Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
|
… unbalanced-panel xref

Address 1 P2 + 1 P3 from CI codex R11 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P2 - "DID reduction verified" overclaim: the test test_did_reduction_lambda_nn_inf_uniform_weights uses a 10-period panel (n_pre=6, n_post=4) and compares TROP to DifferenceInDifferences, but `_fit_did()`'s own docstring notes that the library's basic DiD only coincides with TWFE on a balanced two-period panel. So the multi-period test is a friendly-DGP sanity check, NOT a direct algebraic-equivalence lock of paper Section 2.2's DID/TWFE reduction. Softened METHODOLOGY_REVIEW + REGISTRY wording from "DID reduction verified" to "DiD benchmark sanity check (NOT a direct algebraic-equivalence proof)" + explicit caveat that "a direct Section 2.2 reduction lock (true 2-period block-assignment panel where basic DiD is the algebraic target, OR a comparison against TwoWayFixedEffects with explicit unit FE) is deferred". The test remains useful as a friendly-DGP smoke check, but the prose now matches the evidence.

P3 - Wrong unbalanced-panel test cross-reference: prose cited tests/test_trop.py::TestDMatrixValidation for "missing-treated-cell and thinner-donor-support coverage", but TestDMatrixValidation only covers absorbing-state validation (L776-910). The actual unbalanced-panel regressions are in TestPR110FeedbackRound8 (3 tests: test_unbalanced_panel_d_matrix_validation, ..._real_violation_still_caught, ..._multiple_missing_periods). Repointed the cross-references in both METHODOLOGY_REVIEW and REGISTRY; also dropped the "thinner-donor-support" claim since no dedicated regression exists for that surface.

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
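For context on what "basic DiD is the algebraic target" means on a true two-period block-assignment panel, here is a small worked sketch of the 2×2 estimator. The helper `did_2x2` and the toy numbers are hypothetical; this is neither the library's `DifferenceInDifferences` class nor the test's `_fit_did()` helper.

```python
def mean(values):
    return sum(values) / len(values)


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 DiD: change in treated means minus change in control means."""
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )


# Toy two-period panel: every unit gains 1.0 over time, treated units gain 2.0 more.
treated_pre, treated_post = [3.0, 5.0], [6.0, 8.0]
control_pre, control_post = [1.0, 2.0], [2.0, 3.0]

att = did_2x2(treated_pre, treated_post, control_pre, control_post)
print(att)  # 2.0
```

The treated mean moves from 4.0 to 7.0 and the control mean from 1.5 to 2.5, so the estimate is 3.0 - 1.0 = 2.0. A multi-period panel has no single pre/post pair of this kind, which is the reason the 10-period test is described as a sanity check rather than an equivalence lock.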
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…Section 2.2 wording

Address 1 P2 + 1 P3 from CI codex R12 on PR #491 (verdict was ✅ Looks good; no P0/P1).

P2 - Version-pinning note carrying AI-sourced v3 details: the 2026-05-25 version-pinning notes recorded specific v3-paper claims ("Section 5.1.1 Conditions for exact unbiasedness added") attributed to CI codex rather than a primary-source audit. Reviewer noted that turning unverified AI claims into methodology-tracker facts weakens the source-material standard for a Complete-promoted entry. Stripped the AI-sourced specifics from all three notes (METHODOLOGY_REVIEW, REGISTRY, paper-review file) and reduced to bare "review pinned to v2; no v3 PDF delta-check performed; action item to refresh on next substantive release". This is honest deferral without surfacing unverified specifics.

P3 - Section 2.2 residual wording overclaim: even after R11 softening, residual prose in the module docstring + test docstring + tracker summary still used "TWFE-clean (no factor structure, no time trends)" language. The DGP helper _make_no_factor_panel actually has additive time_fe = 0.2*t (not "no time trends"). Aligned with the already-correct "benchmark sanity check" framing:
- Module docstring: "TWFE-clean panel" → "no-interactive-FE panel (additive unit + time effects only)"; "verified as an algebraic match" → "benchmark sanity check"
- test_did_reduction docstring: "exact in expectation" → "benchmark sanity check, not an algebraic-equivalence proof"; explicit note that the multi-period DGP is the canonical comparator but not the algebraic target
- METHODOLOGY_REVIEW: "TWFE-clean ... no time trends" → "no-interactive-FE ... additive unit + time effects only"

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good - no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…-factor DGP

Address single P3 from CI codex R13 on PR #491 (verdict was ✅ Looks good; no P0/P1/P2).

P3 - effective_rank > 0 assertion locks solver artifact on no-factor DGP: test_factor_matrix_consistent_with_treatment_effects builds a panel with additive unit + time effects + iid noise only (no interactive factor structure). Per the paper's framework, additive FE are absorbed by alpha_i / beta_t, so a near-zero L_hat is methodologically correct on this DGP. The R6 addition of `assert results.effective_rank > 0` was meant to guard the "non-triviality" claim that R5 codex flagged, but on a no-factor DGP this could lock a solver artifact (e.g., regularisation under-shrinkage) rather than the intended low-rank behavior.

Removed the effective_rank > 0 assertion from test_factor_matrix_consistent_with_treatment_effects; replaced with a docstring note explaining why no rank claim is made (test DGP has no interactive factors). Updated REGISTRY Eq. 10 note to drop the "non-triviality" claim and explicitly state the test is a structural pointer only (shape + finiteness + treatment_effects existence), with the rationale why effective_rank is not asserted on this DGP.

If a future test wants to assert non-triviality of L_hat as Eq. 10 evidence, it would need an explicit interactive-factor DGP — that test is deferred (the test surface here remains the structural pointer per the original promotion plan).

No source-code changes to diff_diff/trop*.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
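The reasoning that additive effects leave nothing for a low-rank component to explain can be checked by hand. The sketch below uses double-demeaning as a simple stand-in for absorbing alpha_i and beta_t; that substitution is an assumption made for illustration and is not how the shipped solver works. All names and numbers are hypothetical, apart from the 0.2*t time effect borrowed from the description of the no-factor DGP helper.

```python
def double_demean(matrix):
    """Remove row means and column means, then add back the grand mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    grand = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]


def max_abs(matrix):
    return max(abs(v) for row in matrix for v in row)


unit_fe = [0.0, 1.0, 3.0]              # alpha_i, one per unit (rows)
time_fe = [0.2 * t for t in range(4)]  # beta_t, additive time effect (columns)
loadings = [1.0, -1.0, 2.0]            # gamma_i, used only in the interactive panel
factor = [0.5, -0.5, 1.0, 0.0]         # f_t

additive = [[a + b for b in time_fe] for a in unit_fe]
interactive = [
    [a + b + g * f for b, f in zip(time_fe, factor)]
    for a, g in zip(unit_fe, loadings)
]

residual_additive = max_abs(double_demean(additive))        # zero up to rounding
residual_interactive = max_abs(double_demean(interactive))  # clearly non-zero
```

On the additive panel the residual vanishes, so any positive rank reported there would come from the solver rather than the data. The interactive panel leaves the term (gamma_i - mean gamma)(f_t - mean f), which is the kind of structure a non-triviality assertion would need.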
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ective_rank" docstring claims

Address single P3 from CI codex R14 on PR #491 (verdict was ✅ Looks good; no P0/P1/P2).

P3 - Stale rank/non-triviality docstrings: after the R13 fix removed the `effective_rank > 0` assertion from test_factor_matrix_consistent_with_treatment_effects, several docstrings still described the test as checking "non-triviality" / "non-zero effective_rank", which the code intentionally no longer does. The inconsistency could mislead future readers into thinking the old rank-based claim is still locked. Updated five locations to match the actual assertions:
- Module docstring Eq. 10 line: "non-triviality" → "shape + finiteness ... + treatment_effects populated with finite entries"
- TestTROPNuclearNormProx class docstring: similar correction + explicit "does NOT assert non-triviality of L_hat magnitude"
- test_factor_matrix_consistent_with_treatment_effects docstring: explicit "**No non-triviality / magnitude claim**" with rationale
- test_weighted_nuclear_norm_objective_recovers_att docstring: "non-zero effective_rank" → "non-negative effective_rank" (matches the actual `assert effective_rank >= 0` in the body) with explicit note that the active code path is verified by positive-ATT recovery, not by a non-zero rank claim
- The remaining "non-trivial" / "non-triviality" mentions in the file are now either (a) about a different test surface, (b) in a NEGATION clarifying what the test does NOT assert, or (c) on the MC reduction test where the claim IS source-grounded (interactive-factor DGP)

No source-code changes to diff_diff/trop*.py. 6 TestTROPNuclearNormProx tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
`method="local"`) with full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (HeterogeneousAdoptionDiD methodology-review-tracker promotion: In Progress -> Complete #473) / ContinuousDiD (ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete #476) / DCDH (ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion: In Progress -> Complete #481) / WooldridgeDiD (PR-B: WooldridgeDiD tracker promotion + methodology bundle #486) precedents.
`tests/test_methodology_trop.py` (10 paper-equation-anchored classes, 36 tests): Eq. 2 nuclear-norm prox + plain prox-gradient monotonicity + weighted-solver; Eq. 3 unit-distance untreated-only mask + time-decay formula (direct hand-computed assertions); Eqs. 4-5 / Algorithm 1 LOOCV control-set semantics + cycling-search convergence; Corollary 1 single-draw sanity checks across three balance conditions; Theorem 5.1 MC-ranking realisation (TROP RMSE < DID RMSE on factor-confounded DGP); Section 2.2 DID + MC reductions; Eq. 13 / Algorithm 2 per-(i, t) estimation; Algorithm 3 stratified pairs bootstrap; Section 3 / Eq. 6 factor-DGP recovery; plus a `TestTROPDeviations` class locking 11 documented library deviations.
`tests/test_trop.py` cleanup: 9 methodology-relevant tests migrated to the new file (`TestMethodologyVerification`, paper-conformance subset of `TestPaperConformanceFixes`, 3 of 4 `TestTROPNuclearNormSolver` tests, `TestCyclingSearch::test_cycling_search_converges`, `TestTROPvsSDID::test_trop_handles_factor_dgp`); the `TestPaperConformanceFixes` and `TestTROPvsSDID` shells are deleted.
Methodology references
`method="local"` — paper Algorithm 2. `method="global"` is a library-side efficiency adaptation (REGISTRY) and stays defensively covered in `tests/test_trop.py::TestTROPGlobalMethod`.
Validation
Security / privacy
Generated with Claude Code