Skip to content

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Jan 18, 2026

Implement performance optimizations for the TROP estimator while preserving exact methodology from Athey, Imbens, Qu & Viviano (2025). Target: 5-10x speedup.

Key optimizations:

  • Vectorized data matrix construction using pandas pivot() instead of iterrows
  • Pre-computed structures (_precompute_structures) for distance matrices, time distances, and observation lists reused across LOOCV iterations
  • Vectorized unit distance computation with symmetric matrix storage
  • Optimized observation weights using pre-computed distance matrices
  • Vectorized alternating minimization with matrix operations instead of nested loops for alpha, beta, and L updates
  • LOOCV optimization using shared pre-computed structures

Added TestOptimizationEquivalence test class with 5 tests verifying:

  • Pre-computed structures consistency
  • Vectorized alternating minimization convergence
  • Weight computation correctness
  • Pivot vs iterrows equivalence
  • Reproducibility with seed

All 33 tests pass (28 original + 5 new equivalence tests).

Implement performance optimizations for the TROP estimator while preserving
exact methodology from Athey, Imbens, Qu & Viviano (2025). Target: 5-10x speedup.

Key optimizations:
- Vectorized data matrix construction using pandas pivot() instead of iterrows
- Pre-computed structures (_precompute_structures) for distance matrices,
  time distances, and observation lists reused across LOOCV iterations
- Vectorized unit distance computation with symmetric matrix storage
- Optimized observation weights using pre-computed distance matrices
- Vectorized alternating minimization with matrix operations instead of
  nested loops for alpha, beta, and L updates
- LOOCV optimization using shared pre-computed structures

Added TestOptimizationEquivalence test class with 5 tests verifying:
- Pre-computed structures consistency
- Vectorized alternating minimization convergence
- Weight computation correctness
- Pivot vs iterrows equivalence
- Reproducibility with seed

All 33 tests pass (28 original + 5 new equivalence tests).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Jan 18, 2026

PR #76 Review: Optimize TROP estimator performance with vectorized operations

Reviewer: Claude Code
Date: 2026-01-18
PR: #76
Branch: perf/trop-optimizationmain
Files Changed: 2 (+567/-128 lines)


Summary

This PR optimizes the TROP (Triply Robust Panel) estimator implementation with vectorized operations and pre-computed data structures. The changes significantly improve computational efficiency while maintaining methodological correctness according to the Athey, Imbens, Qu & Viviano (2025) paper.


1. Methodology Validation

1.1 Paper Alignment ✅

The implementation correctly follows the methodology from Athey, Imbens, Qu & Viviano (2025) "Triply Robust Panel Estimators":

Equation 2 (Objective Function) - Page 7:
The implementation correctly minimizes the weighted objective with nuclear norm penalty ||L||_*. The _estimate_model() method (lines 993-1109) implements alternating minimization between:

  • Fixed effects α, β (weighted least squares)
  • Factor matrix L (soft-thresholding SVD for nuclear norm proximal operator)

Equation 3 (Distance-Based Weights) - Page 7:

# Time weights: θ_s = exp(-λ_time × |t - s|)
time_weights = np.exp(-lambda_time * self._precomputed["time_dist_matrix"][t, :])

# Unit weights: ω_j = exp(-λ_unit × dist(j, i))
unit_weights[j] = np.exp(-lambda_unit * dist)

The _compute_observation_weights() method (lines 829-930) correctly implements the exponential distance-based weights.

Equation 5 (LOOCV Criterion) - Page 8:
The _loocv_score_obs_specific() method (lines 1111-1205) correctly computes:

Q(λ) = Σ_{j,s: D_js=0} [τ̂_js^loocv(λ)]²

Algorithm 1 & 2 (Pages 9, 27):
The per-observation estimation loop (lines 731-755) correctly implements Algorithm 2, computing observation-specific weights and fitting separate models for each treated (i,t) observation.

1.2 Issues Identified

Minor: The unit distance computation in _compute_all_unit_distances() (lines 467-526) uses a nested loop over unit pairs. While this is a one-time computation, it could be further optimized using scipy.spatial.distance.cdist for larger panels. However, the current approach is correct and the O(n²) complexity is acceptable since it runs once per fit.


2. Code Quality

2.1 Strengths ✅

  1. Clean abstractions: Pre-computed structures are encapsulated in _precompute_structures() (lines 400-465), improving readability and maintainability.

  2. Vectorized operations: The alternating minimization in _estimate_model() uses numpy broadcasting effectively:

    R_minus_beta = R - beta[:, np.newaxis]  # (n_periods, n_units)
    weighted_R_minus_beta = W_masked * R_minus_beta
    alpha_numerator = np.sum(weighted_R_minus_beta, axis=0)  # (n_units,)
  3. Efficient data reshaping: Pivot-based matrix construction (lines 631-642) replaces O(n) iterrows loop:

    Y = (
        data.pivot(index=time, columns=unit, values=outcome)
        .reindex(index=all_periods, columns=all_units)
        .values
    )
  4. Proper handling of edge cases: Division-by-zero protection with safe_unit_denom and safe_time_denom (lines 1062-1063).

2.2 Suggestions

  1. Type hints completeness: The _precomputed dictionary could benefit from a TypedDict or dataclass definition for better IDE support and documentation.

  2. Magic number: The LOOCV subsample size max_loocv = min(100, len(control_obs)) (line 1171) should be a configurable parameter or class constant with documentation explaining the choice.


3. Security

3.1 Assessment ✅

No security concerns identified:

  • No external data fetching or network calls
  • No file I/O beyond standard pandas operations
  • No shell command execution
  • Input validation present for required columns (lines 612-614)
  • No pickle/deserialization of untrusted data

4. Documentation

4.1 Strengths ✅

  1. Comprehensive docstrings: The _precompute_structures() method has clear documentation explaining its purpose and the structures it creates.

  2. Paper references: Appropriate citations to Athey et al. (2025) with equation numbers.

  3. Test class documentation: TestOptimizationEquivalence includes clear docstrings explaining what each test verifies.

4.2 Suggestions

  1. Algorithm reference: The vectorized alternating minimization section could benefit from inline comments referencing the specific update equations from the paper.

5. Performance

5.1 Optimization Analysis ✅

The PR achieves its performance goals through:

Optimization Location Complexity Change
Pivot-based matrix construction fit() lines 631-642 O(n) → O(1)
Pre-computed distance matrices _precompute_structures() Computed once, reused across LOOCV
Vectorized alternating minimization _estimate_model() O(n×m) loops → numpy broadcasting
Pre-computed time distances time_dist_matrix Avoids recomputation per observation

5.2 Memory Considerations

The pre-computed structures add memory overhead:

  • unit_dist_matrix: O(n_units²)
  • time_dist_matrix: O(n_periods²)

This is acceptable for typical panel sizes but could be documented as a trade-off.


6. Maintainability

6.1 Strengths ✅

  1. Single responsibility: _precompute_structures() is a focused method that handles all pre-computation.

  2. Backward compatibility: The _compute_observation_weights() method includes fallback logic (lines 904-928) for when pre-computed structures are unavailable (bootstrap/jackknife cases).

  3. Comprehensive test coverage: The new TestOptimizationEquivalence class (lines 965-1230) includes:

    • test_precomputed_structures_consistency - Validates pre-computed matrices
    • test_vectorized_alternating_minimization - Tests FE recovery
    • test_vectorized_weights_computation - Validates weight formula
    • test_pivot_vs_iterrows_equivalence - Ensures reshaping equivalence
    • test_reproducibility_with_seed - Tests determinism

6.2 Suggestions

  1. Consider extracting constants:
    DEFAULT_LOOCV_MAX_SAMPLES = 100
    CONVERGENCE_TOL_SVD = 1e-10

7. Test Coverage

7.1 Added Tests ✅

The PR adds 5 new tests in TestOptimizationEquivalence:

  1. test_precomputed_structures_consistency - Verifies time/unit distance matrices are symmetric, diagonal-zero, and correctly computed.

  2. test_vectorized_alternating_minimization - Tests that vectorized FE estimation recovers true effects (correlation > 0.95).

  3. test_vectorized_weights_computation - Validates that weights follow Equation 3: exp(-λ × distance).

  4. test_pivot_vs_iterrows_equivalence - Ensures pivot-based reshaping produces identical matrices to iterrows.

  5. test_reproducibility_with_seed - Confirms deterministic results with same seed.

7.2 Suggestions

Consider adding a performance benchmark test that verifies the optimization provides speedup on larger datasets.


8. Verdict

Recommendation: APPROVE

This PR successfully optimizes the TROP estimator while maintaining methodological correctness. The implementation:

  1. ✅ Correctly implements Athey et al. (2025) methodology
  2. ✅ Uses vectorized operations for significant speedup
  3. ✅ Pre-computes reusable structures to avoid redundant computation
  4. ✅ Includes comprehensive equivalence tests
  5. ✅ Maintains backward compatibility
  6. ✅ Has proper documentation and docstrings

Minor Suggestions (Non-blocking)

  1. Extract magic number 100 for LOOCV samples to a configurable parameter
  2. Consider TypedDict for _precomputed structure
  3. Add performance benchmark test

References

  • Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. Working Paper. https://arxiv.org/abs/2508.21536
    • Equation 2 (page 7): TROP objective function
    • Equation 3 (page 7): Distance-based weights
    • Equation 5 (page 8): LOOCV criterion
    • Algorithm 1 & 2 (pages 9, 27): Estimation procedures

…ct constants

Based on PR #76 code review recommendations:

1. **Issue 1.2 - Vectorize unit distance computation**: Replace nested Python
   loop in _compute_all_unit_distances() with fully vectorized numpy operations
   using broadcasting. Eliminates O(n²) Python loop in favor of optimized
   numpy/BLAS operations.

2. **Add TypedDict for _precomputed structure**: Add _PrecomputedStructures
   TypedDict for better IDE support and documentation of the pre-computed
   data structures.

3. **Extract magic numbers to class constants**:
   - DEFAULT_LOOCV_MAX_SAMPLES = 100 (configurable via max_loocv_samples param)
   - CONVERGENCE_TOL_SVD = 1e-10 (SVD truncation tolerance)

4. **Add max_loocv_samples parameter**: Make LOOCV subsample size configurable
   instead of hardcoded 100. Updated __init__, get_params, and docstrings.

5. **Add algorithm reference comments**: Enhanced alternating minimization
   documentation with paper equation references (Equation 2, Algorithm 1).

All 33 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber igerber merged commit b46f34a into main Jan 18, 2026
4 checks passed
@igerber igerber deleted the perf/trop-optimization branch January 18, 2026 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants