Update roadmap with current implementation limitations by igerber · Pull Request #21 · igerber/diff-diff

igerber · 2026-01-03T22:36:50Z

Add Priority 1 section documenting features that are partially implemented or have known limitations in existing estimators:

- CallawaySantAnna bootstrap inference (`n_bootstrap` raises `NotImplementedError`)
- CallawaySantAnna covariate adjustment (parameter accepted but unused)
- MultiPeriodDiD wild bootstrap (warns and falls back to analytical)
- `DifferenceInDifferences.predict()` (raises `NotImplementedError`)
- SyntheticDiD robustness (silent bootstrap failures)

Also add:

- Quick overview table for at-a-glance status
- Goodman-Bacon decomposition to the usability section
- Code quality & technical debt section
- Future considerations for alternative inference methods
- Updated visualization and formula interface status
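The Priority 1 items above mix three failure modes: a parameter that raises `NotImplementedError`, one that warns and falls back to analytical inference, and one that is accepted but silently ignored. The sketch below contrasts the first two and shows a guard against the third. `ToyEstimator` and its parameters are hypothetical and only echo the names in the list; this is not the diff-diff API.

```python
import warnings


class ToyEstimator:
    """Hypothetical estimator used only to illustrate failure modes; not diff-diff's API."""

    def __init__(self, n_bootstrap=0, covariates=None, inference="analytical"):
        if n_bootstrap > 0:
            # Loud failure: the caller learns immediately that bootstrap is unavailable.
            raise NotImplementedError("bootstrap inference is not implemented yet")
        if covariates is not None:
            # Rejecting the argument avoids the "accepted but unused" silent failure,
            # where results look covariate-adjusted but are not.
            raise NotImplementedError("covariate adjustment is not implemented yet")
        if inference == "wild_bootstrap":
            # Soft failure: warn, then fall back to analytical standard errors.
            warnings.warn(
                "wild bootstrap is not supported; falling back to analytical SEs",
                UserWarning,
                stacklevel=2,
            )
            inference = "analytical"
        self.inference = inference


# Usage: the fallback is visible to the caller instead of silent.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est = ToyEstimator(inference="wild_bootstrap")
print(est.inference, len(caught))  # analytical 1
```

Raising is the safer default for an accepted-but-unused parameter: a warn-and-fallback can still be missed in batch runs, while an ignored argument gives no signal at all.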


Reorganize priorities based on what practitioners actually need:

1.0 Blockers (essential for credibility):
- Honest DiD / Rambachan-Roth sensitivity analysis
- CallawaySantAnna covariate adjustment
- API documentation site

1.0 Target (strengthen release):
- Goodman-Bacon decomposition
- Power analysis tools
- CallawaySantAnna bootstrap inference

Post-1.0 (future versions):
- Sun-Abraham, Borusyak-Jaravel-Spiess, ML extensions

Demoted to technical debt:
- `predict()` method (rarely needed)
- MultiPeriodDiD wild bootstrap (edge case)

Added a clear rationale for each feature, explaining why it matters to practitioners and how it compares to the R ecosystem.

Phase 2 silent-failures audit, axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR: any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in scope.

- Finding #21 (`solve_ols` skip-rank-check parity, `linalg.py:369-373, 597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity` covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank designs (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.
- Finding #22 (`compute_synthetic_weights` parity, `utils.py:1134-1199`): three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and `lambda_reg` variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: the Rust path is Frank-Wolfe and the Python fallback is projected gradient descent (`utils.py:1228`), so both solve the same QP but land on different simplex vertices under near-degenerate inputs.
- Finding #23 (TROP Rust grid-search + bootstrap parity, `trop_global.py:688-750, 966-1006`): two parity tests in `TestTROPRustEdgeCaseParity`, marked `@pytest.mark.slow` at class level. Both are `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence) and bootstrap SE under `seed=42` (~28% divergence, caused by an RNG backend mismatch between the Rust `rand` crate and numpy `default_rng`).

Plan governance:

- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.
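Finding #21 compares backends on fitted values rather than coefficients. One reason that choice is robust: in a rank-deficient design the OLS coefficients are not unique, but the fitted values (the projection of y onto the column space of X) are. The sketch below uses two NumPy routines as hypothetical stand-ins for the two backends; `fit_min_norm` and `fit_drop_collinear` are illustrative names, not diff-diff's `solve_ols` or its Rust path.

```python
import numpy as np


def fit_min_norm(X, y):
    # Stand-in "backend A": SVD-based least squares. With rcond=None, singular values
    # below eps * max(n, p) * s_max are treated as zero, giving the minimum-norm solution.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


def fit_drop_collinear(X, y):
    # Stand-in "backend B": greedily keep columns that raise the numerical rank,
    # solve on that subset, and set coefficients of dropped columns to zero.
    keep = []
    for j in range(X.shape[1]):
        if np.linalg.matrix_rank(X[:, keep + [j]]) == len(keep) + 1:
            keep.append(j)
    sol, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    beta = np.zeros(X.shape[1])
    beta[keep] = sol
    return beta, X @ beta


rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# The fourth column is an exact linear combination of the second and third: rank 3, not 4.
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

beta_a, fitted_a = fit_min_norm(X, y)
beta_b, fitted_b = fit_drop_collinear(X, y)

print(np.allclose(beta_a, beta_b))  # False: coefficients depend on how rank is resolved
# Fitted values are unique, so a parity check at the Finding #21 tolerances passes.
np.testing.assert_allclose(fitted_a, fitted_b, rtol=1e-6, atol=1e-8)
```

A coefficient-level parity test on this design would fail even though both "backends" are correct, which is why fitted values are the more meaningful comparison for the rank-deficient case.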
`TODO.md` logs three P1 follow-up entries:

- algorithmic unification for `compute_synthetic_weights` (FW vs PGD)
- TROP grid-search divergence on rank-deficient Y
- TROP bootstrap RNG unification

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
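Finding #22 attributes the weight divergence to an algorithmic difference rather than a bug: Frank-Wolfe on one side, projected gradient descent on the other, and a degenerate QP with many optimal points. The sketch below assumes a plain least-squares objective over the probability simplex (no `lambda_reg` term) and a toy problem with two identical columns. The functions are hypothetical illustrations, not diff-diff's `compute_synthetic_weights`.

```python
import numpy as np


def objective(A, b, w):
    r = A @ w - b
    return float(r @ r)


def frank_wolfe(A, b, n_iter=500, tol=1e-12):
    # Minimize ||A w - b||^2 over the probability simplex with Frank-Wolfe.
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iter):
        g = 2.0 * A.T @ (A @ w - b)
        s = np.zeros_like(w)
        s[np.argmin(g)] = 1.0          # linear minimizer is a vertex; ties go to the lowest index
        d = s - w
        gap = -g @ d                   # Frank-Wolfe duality gap
        Ad = A @ d
        denom = 2.0 * (Ad @ Ad)
        if gap <= tol or denom <= 0.0:
            break
        gamma = min(1.0, gap / denom)  # exact line search for a quadratic objective
        w = w + gamma * d
    return w


def project_to_simplex(v):
    # Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def projected_gradient(A, b, n_iter=500):
    # Same problem, solved by projected gradient descent with step 1/L.
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    lipschitz = 2.0 * np.linalg.norm(A, 2) ** 2  # L = 2 * sigma_max(A)^2
    for _ in range(n_iter):
        g = 2.0 * A.T @ (A @ w - b)
        w = project_to_simplex(w - g / lipschitz)
    return w


# Columns 0 and 1 are identical, so every split of weight between them is optimal.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.0])

w_fw = frank_wolfe(A, b)
w_pgd = projected_gradient(A, b)
print(np.round(w_fw, 6), np.round(w_pgd, 6))
print(objective(A, b, w_fw), objective(A, b, w_pgd))
```

Both solvers reach an objective of essentially zero, yet Frank-Wolfe lands on the vertex that puts all weight on the first tied column (because `np.argmin` returns the first index), while projected gradient descent, starting from uniform weights, keeps the tied columns symmetric at 0.5 each. A weight-level parity test can therefore fail between two correct solvers, which is consistent with the FW vs PGD unification entry logged above.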
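Finding #23's bootstrap divergence under `seed=42` comes from the two backends drawing from different random number generators. A seed fixes the stream of one generator algorithm only, so the same seed fed to two algorithms yields unrelated resamples. The sketch below uses NumPy's PCG64-backed `default_rng` and its legacy MT19937 `RandomState` as stand-ins for the Rust `rand` crate and numpy `default_rng`; `bootstrap_se` is a hypothetical helper, not TROP's bootstrap.

```python
import numpy as np


def bootstrap_se(x, rng, n_boot=500):
    # Nonparametric bootstrap SE of the mean; `rng` only needs a .choice() method.
    n = x.size
    means = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)
        means[i] = x[idx].mean()
    return means.std(ddof=1)


x = np.random.default_rng(0).normal(size=50)

# Same seed, different generator algorithms: PCG64 (default_rng) vs MT19937 (RandomState).
se_pcg = bootstrap_se(x, np.random.default_rng(42))
se_mt = bootstrap_se(x, np.random.RandomState(42))
se_pcg_again = bootstrap_se(x, np.random.default_rng(42))

print(se_pcg == se_pcg_again)  # True: same algorithm and seed reproduce the stream
print(se_pcg, se_mt)           # close, but not equal: the seed does not pin draws across algorithms
```

Both values are valid Monte Carlo estimates of the same SE, so they differ only by simulation noise. Exact cross-backend agreement would require both paths to consume one shared stream, for example resample indices generated on one side and passed to the other; that is one possible route for the RNG unification entry, not necessarily the one the project will take.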

claude added 2 commits January 3, 2026 22:28

igerber merged commit 1774ffd into main Jan 3, 2026

igerber deleted the claude/update-roadmap-doc-u8rcF branch January 3, 2026 22:37

igerber mentioned this pull request Apr 19, 2026

Add axis-G Rust vs Python backend parity edge-case tests #337

Merged
