Mark slow tests and exclude by default for faster local iteration by igerber · Pull Request #201 · igerber/diff-diff

igerber · 2026-03-15T20:21:27Z

Summary

Add @pytest.mark.slow to Sun-Abraham bootstrap tests (~696s), TROP rust-backend parity tests (~98s), and all TROP tests (module-level pytestmark)
Set addopts in pyproject.toml to exclude slow tests by default, reducing local pytest from ~17min to ~4min
Update CI workflows to pass -m '' so all tests still run in CI
Vectorize SA bootstrap resampling loop: pre-compute unit→row index mapping and replace Python for loop with np.repeat
Update CLAUDE.md to reflect new slow-test convention
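As a rough illustration of the marker convention described above (not the PR's actual test code), the sketch below marks one test as slow and leaves another unmarked. The test names and the `marker_names` helper are hypothetical. The exact `addopts` value is not quoted in this PR; a setting along the lines of `-m "not slow"` is assumed here. Because pytest applies `addopts` before the command-line arguments, a later `-m ''` should override that filter and select everything.

```python
import pytest

# To mark every test in a module, a module-level assignment can be used instead:
#     pytestmark = pytest.mark.slow


@pytest.mark.slow
def test_expensive_bootstrap():
    # Hypothetical stand-in for a long-running bootstrap test.
    draws = [i * i for i in range(1000)]
    assert len(draws) == 1000


def test_fast_path():
    # Unmarked, so it stays in the default (fast) run.
    assert 1 + 1 == 2


def marker_names(func):
    """Return the names of pytest marks attached directly to a test function."""
    return [mark.name for mark in getattr(func, "pytestmark", [])]
```

Under the assumed configuration, plain `pytest` would deselect `test_expensive_bootstrap`, `pytest -m ''` would run both tests, and `pytest -m slow` would run only the marked one.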

Methodology references (required if estimator / math changes)

Method name(s): N/A — no methodology changes
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None. The SA bootstrap vectorization is a pure performance refactor producing identical results (same RNG path, same index selection, same unit ID assignment).

Validation

Tests added/updated: tests/test_sun_abraham.py, tests/test_rust_backend.py, tests/test_trop.py (marker additions only)
Full suite passes with pytest -m '' (1773 collected, 1710 passed, 63 skipped)
Default pytest run: 1656 collected, 117 slow deselected, all pass in ~4min

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Add @pytest.mark.slow to Sun-Abraham bootstrap tests (~696s), TROP parity tests (~98s), and all TROP tests. Set addopts to exclude slow tests by default, reducing local test time from ~17min to ~4min. CI workflows updated to pass -m '' to run all tests. Also vectorize SA bootstrap resampling loop (pre-compute unit-to-row index mapping, replace Python loop with np.repeat).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-15T20:26:58Z

Overall Assessment

✅ Looks good

Executive Summary

P3 only: internal docs and config were updated for the new slow-test default (CLAUDE.md:L125-L127, pyproject.toml:L73-L80), but the public Development section in README.md:L2761-L2768 still says to run plain pytest, which now skips the slow-marked suites by default.
The only methodology-adjacent code change is the Sun-Abraham bootstrap refactor, and I did not find a source-material mismatch: it still implements unit-level pairs bootstrap, consistent with diff_diff/sun_abraham.py:L999-L1028, docs/methodology/REGISTRY.md:L681-L684, and docs/methodology/REGISTRY.md:L1763-L1763.
I did not see new NaN/SE anti-patterns in the touched estimator path; aggregate inference still goes through safe_inference() / compute_effect_bootstrap_stats() in diff_diff/sun_abraham.py:L599-L638 and diff_diff/sun_abraham.py:L1097-L1117.
The pytest workflow coverage in this repo still opts back into the full suite where needed via -m '' in rust-test.yml:L135-L180.
I could not execute pytest in this sandbox because pytest is not installed here, so this review is from diff inspection and local source review.

Methodology

Severity: None. Impact: Affected method is Sun-Abraham’s optional pairs bootstrap. The refactor in diff_diff/sun_abraham.py:L999-L1028 preserves unit resampling with replacement and within-unit panel blocks, which matches the registry’s documented bootstrap method in docs/methodology/REGISTRY.md:L681-L684 and docs/methodology/REGISTRY.md:L1763-L1763. Concrete fix: None.

Code Quality

Severity: None. Impact: No new inline inference computation, partial NaN guard, or parameter-plumbing anti-pattern appears in the changed estimator path; the file still uses helper-based inference in diff_diff/sun_abraham.py:L599-L638 and diff_diff/sun_abraham.py:L1097-L1117. Concrete fix: None.

Performance

Severity: None. Impact: Precomputing per-unit row indices/counts outside the bootstrap loop removes repeated boolean scans while preserving sampled row order in diff_diff/sun_abraham.py:L1003-L1022. Concrete fix: None.
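The idea of precomputing per-unit row indices can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in diff_diff/sun_abraham.py: the function names, the `unit_ids` array, and the dictionary-based index are all hypothetical. The baseline rescans `unit_ids` once per sampled unit; the alternative builds the unit-to-row mapping once and assigns the new unit IDs with a single `np.repeat` call.

```python
import numpy as np


def resample_rows_loop(unit_ids, sampled_units):
    """Baseline: one boolean scan of unit_ids per sampled unit."""
    row_parts, id_parts = [], []
    for new_id, unit in enumerate(sampled_units):
        rows = np.flatnonzero(unit_ids == unit)
        row_parts.append(rows)
        id_parts.append(np.full(rows.size, new_id))
    return np.concatenate(row_parts), np.concatenate(id_parts)


def build_unit_index(unit_ids):
    """Pre-compute unit -> row positions once, outside the bootstrap loop."""
    # A stable sort keeps each unit's rows in their original order.
    order = np.argsort(unit_ids, kind="stable")
    units, starts, counts = np.unique(
        unit_ids[order], return_index=True, return_counts=True
    )
    return {
        u: order[s:s + c]
        for u, s, c in zip(units.tolist(), starts.tolist(), counts.tolist())
    }


def resample_rows_vectorized(unit_index, sampled_units):
    """Look up precomputed rows per draw; assign new unit IDs with np.repeat."""
    rows_per_draw = [unit_index[u] for u in sampled_units.tolist()]
    counts = np.array([rows.size for rows in rows_per_draw])
    rows = np.concatenate(rows_per_draw)
    new_ids = np.repeat(np.arange(len(rows_per_draw)), counts)
    return rows, new_ids


# Small unbalanced panel: unit 10 has 2 rows, unit 20 has 2, unit 30 has 3.
unit_ids = np.array([10, 10, 20, 30, 30, 30, 20])
sampled = np.array([30, 10, 30])  # units drawn with replacement

rows_loop, ids_loop = resample_rows_loop(unit_ids, sampled)
rows_vec, ids_vec = resample_rows_vectorized(build_unit_index(unit_ids), sampled)
```

Both versions return the same row positions in the same order, so as long as the units are drawn with the same RNG calls, the resampled panel should be identical. Each draw gets a fresh ID, so a unit sampled twice is treated as two distinct units in the bootstrap sample.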

Maintainability

Severity: None. Impact: The slow-test policy uses standard pytest markers/config plus explicit CI overrides, which keeps behavior understandable in pyproject.toml:L73-L80 and rust-test.yml:L135-L180. Concrete fix: None.

Tech Debt

Severity: None. Impact: I did not identify a new deferrable issue that should have been added to TODO.md. Concrete fix: None.

Security

Severity: None. Impact: The diff changes test config, internal docs, and an in-memory bootstrap loop only; I did not see a new secret-handling or unsafe-command surface. Concrete fix: None.

Documentation/Tests

Severity: P3. Impact: Public contributor docs still say to run plain pytest in README.md:L2761-L2768, but after pyproject.toml:L73-L80 that command excludes the slow-marked suites by default, including tests/test_trop.py:L1-L9, tests/test_sun_abraham.py:L343-L345, and tests/test_rust_backend.py:L1152-L1275. That is not a blocker, but it can mislead contributors about what their local run actually covered. Concrete fix: Update the Development docs to say pytest runs the fast/default subset, pytest -m '' runs the full suite, and pytest -m slow runs only slow tests.

igerber merged commit 1b1fa84 into main Mar 15, 2026
11 checks passed

igerber deleted the refactor/slow-test-markers branch March 15, 2026 21:47
