go/oasis-test-runner: parallel job execution runs the same scenarios multiple times or misses some scenarios #3104

Closed
tjanez opened this issue Jul 13, 2020 · 2 comments · Fixed by #3107

Comments

@tjanez
Member

tjanez commented Jul 13, 2020

SUMMARY

To get test results faster, scenario execution can be parallelized by specifying the --parallel.job_count and --parallel.job_index flags, as is done in our Buildkite CI script:

${BUILDKITE_PARALLEL_JOB_COUNT:+--parallel.job_count ${BUILDKITE_PARALLEL_JOB_COUNT}} \
${BUILDKITE_PARALLEL_JOB:+--parallel.job_index ${BUILDKITE_PARALLEL_JOB}} \

However, the oasis-test-runner currently doesn't partition the scenarios correctly among the parallel jobs.

ISSUE TYPE
  • Bug Report
ACTUAL RESULTS

For example, running:

.buildkite/scripts/test_e2e.sh --test e2e/runtime/byzantine/.* --parallel.job_count 3 --parallel.job_index 2

resulted in the following two scenarios being run:

e2e/runtime/byzantine/executor-honest
e2e/runtime/byzantine/executor-straggler

Fuller output:

+ WORKDIR=/home/tadej/Oasis/oasis-core
+ runtime_target=default
+ [[ '' == \i\n\t\e\l\-\s\g\x ]]
+ [[ '' != '' ]]
+ node_binary=/home/tadej/Oasis/oasis-core/go/oasis-node/oasis-node
+ test_runner_binary=/home/tadej/Oasis/oasis-core/go/oasis-test-runner/oasis-test-runner
+ [[ '' != '' ]]
+ ias_mock=true
+ set +x
+ /home/tadej/Oasis/oasis-core/go/oasis-test-runner/oasis-test-runner --basedir.no_cleanup --e2e.node.binary /home/tadej/Oasis/oasis-core/go/oasis-node/oasis-node --e2e/runtime.client.binary_dir /home/tadej/Oasis/oasis-core/target/default/debug --e2e/runtime.runtime.binary_dir /home/tadej/Oasis/oasis-core/target/default/debug --e2e/runtime.runtime.loader /home/tadej/Oasis/oasis-core/target/default/debug/oasis-core-runtime-loader --e2e/runtime.tee_hardware '' --e2e/runtime.ias.mock=true --remote-signer.binary /home/tadej/Oasis/oasis-core/go/oasis-remote-signer/oasis-remote-signer --log.level info --test 'e2e/runtime/byzantine/.*' --parallel.job_count 3 --parallel.job_index 2
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:51:34.823787934Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/merge-wrong run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:51:34.823817532Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/merge-honest run_id=0
level=info module=test-runner caller=root.go:367 ts=2020-07-13T10:51:34.823826608Z msg="running test case" test=e2e/runtime/byzantine/executor-honest run_id=0
[ ... output trimmed ...]
level=info module=test-runner caller=root.go:425 ts=2020-07-13T10:51:34.823832891Z msg="passed test case" test=e2e/runtime/byzantine/executor-honest run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:51:34.823838913Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/merge-straggler run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:51:34.823844475Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/executor-wrong run_id=0
level=info module=test-runner caller=root.go:367 ts=2020-07-13T10:51:34.823850193Z msg="running test case" test=e2e/runtime/byzantine/executor-straggler run_id=0
[ ... output trimmed ...]
level=info module=test-runner caller=root.go:425 ts=2020-07-13T10:51:34.823855891Z msg="passed test case" test=e2e/runtime/byzantine/executor-straggler run_id=0

Running the same command again resulted in the following two scenarios being run:

e2e/runtime/byzantine/executor-straggler
e2e/runtime/byzantine/merge-wrong

Fuller output:

+ WORKDIR=/home/tadej/Oasis/oasis-core
+ runtime_target=default
+ [[ '' == \i\n\t\e\l\-\s\g\x ]]
+ [[ '' != '' ]]
+ node_binary=/home/tadej/Oasis/oasis-core/go/oasis-node/oasis-node
+ test_runner_binary=/home/tadej/Oasis/oasis-core/go/oasis-test-runner/oasis-test-runner
+ [[ '' != '' ]]
+ ias_mock=true
+ set +x
+ /home/tadej/Oasis/oasis-core/go/oasis-test-runner/oasis-test-runner --basedir.no_cleanup --e2e.node.binary /home/tadej/Oasis/oasis-core/go/oasis-node/oasis-node --e2e/runtime.client.binary_dir /home/tadej/Oasis/oasis-core/target/default/debug --e2e/runtime.runtime.binary_dir /home/tadej/Oasis/oasis-core/target/default/debug --e2e/runtime.runtime.loader /home/tadej/Oasis/oasis-core/target/default/debug/oasis-core-runtime-loader --e2e/runtime.tee_hardware '' --e2e/runtime.ias.mock=true --remote-signer.binary /home/tadej/Oasis/oasis-core/go/oasis-remote-signer/oasis-remote-signer --log.level info --test 'e2e/runtime/byzantine/.*' --parallel.job_count 3 --parallel.job_index 2
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:52:12.53867671Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/executor-honest run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:52:12.538721487Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/executor-wrong run_id=0
level=info module=test-runner caller=root.go:367 ts=2020-07-13T10:52:12.538734912Z msg="running test case" test=e2e/runtime/byzantine/executor-straggler run_id=0
[ ... output trimmed ...]
level=info module=test-runner caller=root.go:425 ts=2020-07-13T10:52:12.538745432Z msg="passed test case" test=e2e/runtime/byzantine/executor-straggler run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:52:12.538757833Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/merge-straggler run_id=0
level=info module=test-runner caller=root.go:352 ts=2020-07-13T10:52:12.538768428Z msg="skipping test case (assigned to different parallel job)" test=e2e/runtime/byzantine/merge-honest run_id=0
level=info module=test-runner caller=root.go:367 ts=2020-07-13T10:52:12.538779146Z msg="running test case" test=e2e/runtime/byzantine/merge-wrong run_id=0
[ ... output trimmed ...]
level=info module=test-runner caller=root.go:425 ts=2020-07-13T10:52:12.538789464Z msg="passed test case" test=e2e/runtime/byzantine/merge-wrong run_id=0
EXPECTED RESULTS

Running the same command for parallel job execution should always result in the same set of scenarios being run.

@tjanez tjanez added c:testing Category: testing c:bug Category: bug labels Jul 13, 2020
@tjanez tjanez self-assigned this Jul 13, 2020
@ptrus
Member

ptrus commented Jul 14, 2020

This is probably because we iterate over a map of scenarios (Go deliberately randomizes map iteration order).

See:

for scName, scenario := range common.GetScenarios() {
	var match bool
	match, err = regexp.MatchString(name, scName)
	if err != nil {
		return fmt.Errorf("root: bad scenario name regexp: %w", err)
	}
	if match {
		matched[scenario] = true
		anyMatched = true
	}

I think this just means that scenarios will be grouped into different parallel jobs, not that it will miss/skip some cases. If we want to ensure a stable order, one possible fix (I think) is sorting this slice once it's built:

toRun = append(toRun, scenario)

@kostko
Member

kostko commented Jul 14, 2020 via email


3 participants