perf(test): remove redundant integration test executions by Stevengre · Pull Request #972 · runtimeverification/mir-semantics

Stevengre · 2026-03-04T15:14:03Z

Summary

Skip LLVM backend test_exec_smir tests in CI — keep Haskell only, since it's the backend used for proving and bugs there have higher impact
Skip test_prove_termination in CI — the same 19 programs are already executed via test_exec_smir[*-haskell]

This is done via a pytest -k filter in the CI workflow only — no test code is modified, so all tests remain available for local development.

Deselected: 58 of 247 tests (39 LLVM exec_smir + 19 prove_termination)
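
The deselection counts above can be sanity-checked with a small sketch. It assumes the CI filter behaves like `not llvm and not test_prove_termination`, and it approximates each `-k` term as a case-insensitive substring match on the test ID. The program names and the `test_other` group are hypothetical. Only the counts 39, 19, and 247 come from this PR, and the assumption that each LLVM test has one Haskell twin is mine.

```python
# Hypothetical test IDs; only the group sizes (39 LLVM, 19 termination,
# 247 total) are taken from the PR description.
def build_test_ids() -> list[str]:
    ids = []
    for i in range(39):
        ids.append(f"test_exec_smir[prog{i}-llvm]")
        # Assumed: one Haskell variant per LLVM variant.
        ids.append(f"test_exec_smir[prog{i}-haskell]")
    for i in range(19):
        ids.append(f"test_prove_termination[prog{i}]")
    # Pad with unrelated tests so the total reaches 247.
    for i in range(247 - len(ids)):
        ids.append(f"test_other[{i}]")
    return ids


def is_selected(test_id: str) -> bool:
    """Approximate `-k "not llvm and not test_prove_termination"`."""
    lowered = test_id.lower()
    return "llvm" not in lowered and "test_prove_termination" not in lowered


test_ids = build_test_ids()
selected = [t for t in test_ids if is_selected(t)]
deselected = [t for t in test_ids if not is_selected(t)]
print(f"{len(deselected)} deselected, {len(selected)} selected of {len(test_ids)}")
```

Under these assumptions the sketch reports 58 deselected and 189 selected tests, which matches the numbers in the summary.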

Risk Analysis

LLVM backend regression will be missed in the previous test, which should be handled by future test framework refactoring. But if we don't add new exec_smir test or add new exec_smir test with llvm to update expected files, the result is the same as we run CI before.
test_prove_termination only uses prove-rs to validate termination, which is equivalent to the comparison against the current expected files. I believe there is no risk in removing these tests this way.

Expected CI time reduction

~2h37m → ~1h20m (based on this run)
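
For reference, the arithmetic behind the estimate can be written out as a short sketch. The two durations are the approximate figures quoted above; the helper name is illustrative only.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an h/m duration to whole minutes."""
    return hours * 60 + minutes


before = to_minutes(2, 37)  # ~2h37m, current CI time quoted in this PR
after = to_minutes(1, 20)   # ~1h20m, expected CI time quoted in this PR
saved = before - after

print(f"saved ~{saved} min ({saved / before:.0%} of the original run)")
```

With the quoted figures this comes to roughly 77 minutes, or about half of the original run.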

Test plan

Integration tests pass with the -k filter applied
Verify CI time improvement

Resolves #971 (Phase 1)

Add TEST_ARGS to the CI integration test step to skip:
- test_exec_smir[*-llvm]: keep Haskell backend only, since it's the backend used for proving and bugs there have higher impact
- test_prove_termination: the same 19 programs are already executed via test_exec_smir[*-haskell]

This deselects 58 of 247 tests (39 LLVM exec + 19 prove_termination) without modifying any test code; tests remain available for local use.

Expected CI time reduction: ~2h37m → ~1h20m.

Resolves #971 (Phase 1)

The parentheses in the -k expression were interpreted by the shell inside the docker exec / make pipeline. Rewrite the filter to avoid parentheses: "not llvm" is sufficient since only test_exec_smir tests have "llvm" in their test IDs.
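
The equivalence claimed here can be illustrated with a minimal sketch. The parenthesized form `not (test_exec_smir and llvm)` is an assumption about what the original expression looked like, since the PR text does not show it, and the sample test IDs are hypothetical. Each term is modeled as a substring match on the test ID.

```python
def original_filter(test_id: str) -> bool:
    # Hypothetical parenthesized form: not (test_exec_smir and llvm)
    return not ("test_exec_smir" in test_id and "llvm" in test_id)


def simplified_filter(test_id: str) -> bool:
    # Parenthesis-free form: not llvm
    return "llvm" not in test_id


# Hypothetical IDs in which only exec_smir tests mention "llvm".
sample_ids = [
    "test_exec_smir[prog-llvm]",
    "test_exec_smir[prog-haskell]",
    "test_prove_termination[prog]",
    "test_other[case]",
]
agree = all(original_filter(t) == simplified_filter(t) for t in sample_ids)

# The two forms diverge as soon as another test has "llvm" in its ID.
counterexample = "test_other[case-llvm]"
differs = original_filter(counterexample) != simplified_filter(counterexample)

print(f"agree on current IDs: {agree}; diverge on counterexample: {differs}")
```

The simplification holds only while no other test ID contains "llvm". A future test with such an ID would be deselected silently by the shorter filter.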

mariaKt · 2026-03-05T22:54:39Z

LLVM backend regression will be missed in the previous test, which should be handled by future test framework refactoring. But if we don't add new exec_smir test or add new exec_smir test with llvm to update expected files, the result is the same as we run CI before.

I am not sure I understand what you mean here, could you clarify?

Stevengre · 2026-03-06T03:01:50Z

LLVM backend regression will be missed in the previous test, which should be handled by future test framework refactoring.

I think the exec_smir test is only there to check that mir-semantics works on the LLVM backend. If we assume the backends are correct, all we need is to keep a few quick tests for the LLVM backend to make sure our semantics can run on it. That's what I mean by refactoring. But it's just a thought for now.

But if we don't add new exec_smir test or add new exec_smir test with llvm to update expected files, the result is the same as we run CI before.

Existing tests have been validated by both backends. If we assume that the backends are correct, the semantics can only cause problems when new rules introduce nondeterminism (nd). In that case, the Haskell backend will produce output that differs from the expected file, and CI will show errors.

@mariaKt I don't think this description is enough. Please let me know if you have more questions.

dkcumming · 2026-03-06T05:55:27Z

If we assume the backends are correct

Is it true that we can assume the backends are correct? I thought the main way that we were finding regressions in the backends was from the test suites of the semantics using them. I thought the main way @jberthold was finding out about problems in the haskell backend was from the KEVM test suite, and I feel that when Pi Squared changed the LLVM backend in the past we noticed it in our semantics tests. @ehildenb @palinatolmach what do you think? I am interested in the speed up, but is this the right way to go? It feels to me the solution we would really like is to have #853 implemented - but I don't know how likely that is

Stevengre force-pushed the jh/reduce-integration-test-time branch 2 times, most recently from 448018a to eb489e6 Compare March 4, 2026 15:16

Stevengre marked this pull request as draft March 4, 2026 15:17

Stevengre self-assigned this Mar 4, 2026

Stevengre added 2 commits March 4, 2026 23:22

fix(ci): quote pytest -k expression in TEST_ARGS

871776b

Stevengre requested review from dkcumming, ehildenb and mariaKt March 5, 2026 02:36

Stevengre marked this pull request as ready for review March 5, 2026 02:36

Stevengre requested a review from palinatolmach March 6, 2026 02:58

Stevengre wants to merge 3 commits into master from jh/reduce-integration-test-time


3 participants
