Test getting started twice - causing it to report as failed when it passed #1942

@branfosj

This looks like #1901, but I'm running ReFrame 3.5.3, so the fix (#1904) is included.

During a ReFrame run, some tests are reported as failing during the sanity check with a file not found error. However, this is caused by the test being run twice. The test passes the first time and fails on the second run: the cleanup of the first run removes the stage directory, so it no longer exists when the second run checks it. I'm also seeing the test counts showing 482/469.

I'm using the async execution policy. I have max_jobs set on the partition.

About 1% of my tests fail with this issue on each run, but the affected tests and partitions differ from run to run.

All the items in the log relating to this test in this partition:

  ('2020b-gompi-osu_reduce_scatter-benchmark', 'bluebear:cascadelake', 'none') -> []
  ('2020b-gompi-osu_reduce_scatter-benchmark', 'bluebear:cascadelake', 'none') -> []
[2021-04-22T02:51:36] info: reframe: [----------] started processing 2020b-gompi-osu_reduce_scatter-benchmark (2020b gompi osu_reduce_scatter Benchmark)
[2021-04-22T02:51:36] info: reframe: [ RUN      ] 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark: Entering stage: setup
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Setting up test paths
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Created stage directory '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark' [clean_stagedir: True]
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Created output directory '/rds/bear-sysadmin/tools/apps-reframe-testing/output/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Setting up job 'rfm_2020b-gompi-osu_reduce_scatter-benchmark_job' (scheduler: 'slurm', launcher: 'local')
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile_wait
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Generating the run script
[2021-04-22T02:51:40] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Spawned run job (id=32949782)
[2021-04-22T02:51:43] info: reframe: [----------] finished processing 2020b-gompi-osu_reduce_scatter-benchmark (2020b gompi osu_reduce_scatter Benchmark)
[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_complete
[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_wait
[2021-04-22T02:56:22] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: sanity
[2021-04-22T02:56:22] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: performance
[2021-04-22T02:56:22] info: reframe: [       OK ] ( 45/469) 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none [compile: 0.006s run: 285.079s total: 286.143s]
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile_wait
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Generating the run script
[2021-04-22T02:56:54] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Spawned run job (id=32949858)
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: cleanup
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Copying test files to output directory
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Removing stage directory
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_complete
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_wait
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: sanity
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: caught builtins.FileNotFoundError: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T02:58:02] info: reframe: [     FAIL ] (379/469) 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none [compile: 0.006s run: 70.905s total: 385.847s]
[2021-04-22T02:58:02] info: reframe: ==> test failed during 'sanity': test staged in '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T05:13:36] info: reframe: FAILURE INFO for 2020b-gompi-osu_reduce_scatter-benchmark
[2021-04-22T05:13:36] info: reframe:   * Stage directory: /rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
[2021-04-22T05:13:36] info: reframe:   * Rerun with '-n 2020b-gompi-osu_reduce_scatter-benchmark -p none --system bluebear:cascadelake -r'
[2021-04-22T05:13:36] info: reframe:   * Reason: file not found error: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
FileNotFoundError: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T05:13:36] info: reframe:                     [2020b-gompi-osu_reduce_scatter-benchmark, none, bluebear:cascadelake]
2020b-gompi-osu_reduce_scatter-benchmark
$ ls -l /rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
ls: cannot access '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark': No such file or directory

$ ls -l /rds/bear-sysadmin/tools/apps-reframe-testing/output/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
total 49
-rw-rw-r-- 1 branfosj build 47646 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.err
-rw-rw-r-- 1 branfosj build   388 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.out
-rwxrwxr-- 1 branfosj build   610 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.sh
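
To find which other test cases are affected in a given run, the log can be scanned for cases that enter the `run` stage more than once. The following is a minimal sketch, not part of ReFrame: it assumes the debug line format shown in the log above, and the function name `find_repeated_stages` and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Assumed line format (taken from the debug log above):
# [<timestamp>] debug: <test> on <partition> using <environ>: Entering stage: <stage>
STAGE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] debug: (?P<case>.+? on \S+ using \S+): "
    r"Entering stage: (?P<stage>\w+)$"
)


def find_repeated_stages(lines, stage="run"):
    """Return {test case: count} for cases that entered `stage` more than once."""
    counts = Counter()
    for line in lines:
        match = STAGE_RE.match(line.strip())
        if match and match.group("stage") == stage:
            counts[match.group("case")] += 1
    return {case: n for case, n in counts.items() if n > 1}


# Lines copied from the log excerpt in this issue.
sample = [
    "[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on "
    "bluebear:cascadelake using none: Entering stage: run",
    "[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on "
    "bluebear:cascadelake using none: Entering stage: run_complete",
    "[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on "
    "bluebear:cascadelake using none: Entering stage: run",
]

for case, n in find_repeated_stages(sample).items():
    print(f"{case}: entered 'run' {n} times")
```

To check a full log, pass an open file object instead of `sample`; any case listed was submitted more than once during that run.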
