Reduce TROP test time in pure Python CI by igerber · Pull Request #204 · igerber/diff-diff

igerber · 2026-03-15T23:34:32Z

Summary

Replace file-level pytestmark = pytest.mark.slow in test_trop.py with per-class @pytest.mark.slow on the 5 most expensive test classes (38 tests)
Remove -m '' override from pure Python CI job so the default not slow filter takes effect
59 fast tests continue running in pure Python CI; all 97 tests still run in Rust CI

Methodology references (required if estimator / math changes)

N/A — no methodology changes

Validation

Tests added/updated: No new tests; verified collection counts match expectations:
- pytest tests/test_trop.py --collect-only -q → 59 tests (fast subset)
- pytest tests/test_trop.py --collect-only -q -m '' → 97 tests (all)
- pytest tests/test_trop.py --collect-only -q -m slow → 38 tests (slow only)
DIFF_DIFF_BACKEND=python pytest tests/test_trop.py -q → 59 passed

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Replace the file-level pytestmark = pytest.mark.slow with @pytest.mark.slow on the 5 most expensive test classes (38 tests), keeping 59 fast tests running in pure Python CI. Remove -m '' override from the CI workflow so the default 'not slow' filter takes effect. The Rust CI job continues to run all 97 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-15T23:40:06Z

Overall Assessment

✅ Looks good

Executive Summary

No estimator, weighting, variance/SE, identification, or default-behavior implementation changed. I cross-checked the affected methods against docs/methodology/REGISTRY.md, and this PR is confined to CI/docs/tests.
The test_trop.py marker split is mechanically consistent: the file-level slow mark is replaced by five class-level slow marks, leaving the rest of the TROP test surface runnable under the default not slow filter.
The main downside is coverage, not correctness: removing -m '' from the pure-Python workflow makes that job skip all slow tests, not just the re-marked TROP classes.
That broader skip removes pure-Python CI coverage for non-TROP inference checks in Callaway-Sant’Anna and Sun-Abraham, plus all end-to-end TROP method="global"/"joint" fit tests.
CLAUDE.md is slightly out of sync with the actual default collection behavior because it omits the slow Callaway SE verification test.
I could not run pytest in this environment because it is not installed, so the scope/count conclusions are from static inspection of markers and workflow config.

Methodology

No findings. Cross-check against docs/methodology/REGISTRY.md did not surface any methodology deviation because the diff only touches .github/workflows/rust-test.yml:179, tests/test_trop.py:466, and CLAUDE.md:125, not estimator code under diff_diff/.

Code Quality

No findings.

Performance

No findings. The workflow change should reduce pure-Python CI time as intended.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P2 Impact: Removing -m '' from .github/workflows/rust-test.yml:179 means the suite-wide default marker filter in pyproject.toml:77 now applies to the pure-Python fallback job. That skips every slow test there, not just the TROP classes changed in this PR. In practice, it drops pure-Python execution of the Callaway SE convergence test at tests/test_methodology_callaway.py:804 and the Sun-Abraham bootstrap suite at tests/test_sun_abraham.py:343. Because pure-Python mode also changes test scaling via tests/conftest.py:97, those fallback-specific inference paths are no longer covered in CI. Concrete fix: add a dedicated pure-Python step that explicitly runs the non-TROP slow inference tests with -m '', or scope the skip to TROP-only targets instead of relying on the global slow filter.
P2 Impact: The class-level @pytest.mark.slow on tests/test_trop.py:2721 suppresses all end-to-end pure-Python CI coverage for the public TROP.fit(..., method="global"/"joint") path. That also hides cheap contract tests such as tests/test_trop.py:2847, tests/test_trop.py:2852, and tests/test_trop.py:2866 along with the expensive optimization/bootstrap cases. The remaining fast joint-related coverage at tests/test_trop.py:2602 and tests/test_trop.py:3588 is low-level helper coverage, not fit() smoke coverage. Concrete fix: split TestTROPJointMethod into a fast smoke/API class and a slow optimization/bootstrap class, and keep at least one public method="global" fit test in the non-slow set.
P3 Impact: CLAUDE.md:125 now documents only TROP/Sun-Abraham/TROP-parity slow tests, but the default exclusion also covers the Callaway SE formula test at tests/test_methodology_callaway.py:804. Contributor guidance is therefore incomplete. Concrete fix: reword that paragraph to say “all @pytest.mark.slow tests are excluded by default” or explicitly add the Callaway test to the list.

igerber merged commit cfc4057 into main Mar 16, 2026
11 checks passed

igerber deleted the reduce-python-ci-tests branch March 16, 2026 00:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce TROP test time in pure Python CI#204

Reduce TROP test time in pure Python CI#204
igerber merged 1 commit intomainfrom
reduce-python-ci-tests

igerber commented Mar 15, 2026

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Mar 15, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant