perf(test): remove redundant integration test executions#972
perf(test): remove redundant integration test executions#972
Conversation
Add TEST_ARGS to the CI integration test step to skip: - test_exec_smir[*-llvm]: keep Haskell backend only, since it's the backend used for proving and bugs there have higher impact - test_prove_termination: the same 19 programs are already executed via test_exec_smir[*-haskell] This deselects 58 of 247 tests (39 LLVM exec + 19 prove_termination) without modifying any test code — tests remain available for local use. Expected CI time reduction: ~2h37m → ~1h20m. Resolves #971 (Phase 1)
448018a to
eb489e6
Compare
The parentheses in the -k expression were interpreted by the shell inside the docker exec / make pipeline. Rewrite the filter to avoid parentheses: "not llvm" is sufficient since only test_exec_smir tests have "llvm" in their test IDs.
I am not sure I understand what you mean here, could you clarify? |
I thin
Existing tests have been validated by both backends. If we assume that the backends are correct, the semantics may only cause problem because new rules will introduce nd. This case, haskell backend will produce different expected file and will show errors in CI. @mariaKt I don't this description is enough. Please let me know if you have more questions. |
Is it true that we can assume the backends are correct? I thought the main way that we were finding regressions in the backends was from the test suites of the semantics using them. I thought the main way @jberthold was finding out about problems in the haskell backend was from the KEVM test suite, and I feel that when Pi Squared changed the LLVM backend in the past we noticed it in our semantics tests. @ehildenb @palinatolmach what do you think? I am interested in the speed up, but is this the right way to go? It feels to me the solution we would really like is to have #853 implemented - but I don't know how likely that is |
Summary
test_exec_smirtests in CI — keep Haskell only, since it's the backend used for proving and bugs there have higher impacttest_prove_terminationin CI — the same 19 programs are already executed viatest_exec_smir[*-haskell]This is done via a pytest
-kfilter in the CI workflow only — no test code is modified, so all tests remain available for local development.Deselected: 58 of 247 tests (39 LLVM exec_smir + 19 prove_termination)
Risk Analysis
exec_smirtest or add newexec_smirtest with llvm to update expected files, the result is the same as we run CI before.test_prove_terminationjust usesprove-rsto validate the termination, which is the same as the comparison with the current expected files. I believe that there is no risk to remove them in this way.Expected CI time reduction
~2h37m → ~1h20m (based on this run)
Test plan
-kfilter appliedResolves #971 (Phase 1)