Closed as not planned
Description
This looks like #1901, but I'm running ReFrame 3.5.3, so the fix (#1904) is included.
During a ReFrame run, some tests are reported as failing during the sanity check with a file not found error. However, this is caused by the test being run twice. The test passes the first time and fails on the second run: the first run cleans up the stage directory, so it no longer exists when the second run checks it. I'm also seeing the test counts showing 482/469.
I'm using the async execution policy. I have max_jobs set on the partition.
I see about 1% of my tests fail each time with this issue, but it is different tests from different partitions failing each time.
All the items in the log relating to this test in this partition:
('2020b-gompi-osu_reduce_scatter-benchmark', 'bluebear:cascadelake', 'none') -> []
('2020b-gompi-osu_reduce_scatter-benchmark', 'bluebear:cascadelake', 'none') -> []
[2021-04-22T02:51:36] info: reframe: [----------] started processing 2020b-gompi-osu_reduce_scatter-benchmark (2020b gompi osu_reduce_scatter Benchmark)
[2021-04-22T02:51:36] info: reframe: [ RUN ] 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark: Entering stage: setup
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Setting up test paths
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Created stage directory '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark' [clean_stagedir: True]
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Created output directory '/rds/bear-sysadmin/tools/apps-reframe-testing/output/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T02:51:36] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Setting up job 'rfm_2020b-gompi-osu_reduce_scatter-benchmark_job' (scheduler: 'slurm', launcher: 'local')
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile_wait
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run
[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Generating the run script
[2021-04-22T02:51:40] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Spawned run job (id=32949782)
[2021-04-22T02:51:43] info: reframe: [----------] finished processing 2020b-gompi-osu_reduce_scatter-benchmark (2020b gompi osu_reduce_scatter Benchmark)
[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_complete
[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_wait
[2021-04-22T02:56:22] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: sanity
[2021-04-22T02:56:22] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: performance
[2021-04-22T02:56:22] info: reframe: [ OK ] ( 45/469) 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none [compile: 0.006s run: 285.079s total: 286.143s]
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: compile_wait
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run
[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Generating the run script
[2021-04-22T02:56:54] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Spawned run job (id=32949858)
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: cleanup
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Copying test files to output directory
[2021-04-22T02:57:30] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Removing stage directory
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_complete
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_wait
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: sanity
[2021-04-22T02:58:02] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: caught builtins.FileNotFoundError: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T02:58:02] info: reframe: [ FAIL ] (379/469) 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none [compile: 0.006s run: 70.905s total: 385.847s]
[2021-04-22T02:58:02] info: reframe: ==> test failed during 'sanity': test staged in '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T05:13:36] info: reframe: FAILURE INFO for 2020b-gompi-osu_reduce_scatter-benchmark
[2021-04-22T05:13:36] info: reframe: * Stage directory: /rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
[2021-04-22T05:13:36] info: reframe: * Rerun with '-n 2020b-gompi-osu_reduce_scatter-benchmark -p none --system bluebear:cascadelake -r'
[2021-04-22T05:13:36] info: reframe: * Reason: file not found error: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
FileNotFoundError: [Errno 2] No such file or directory: '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark'
[2021-04-22T05:13:36] info: reframe: [2020b-gompi-osu_reduce_scatter-benchmark, none, bluebear:cascadelake]
2020b-gompi-osu_reduce_scatter-benchmark
$ ls -l /rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
ls: cannot access '/rds/bear-sysadmin/tools/apps-reframe-testing/stage/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark': No such file or directory
$ ls -l /rds/bear-sysadmin/tools/apps-reframe-testing/output/bluebear/cascadelake/none/2020b-gompi-osu_reduce_scatter-benchmark
total 49
-rw-rw-r-- 1 branfosj build 47646 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.err
-rw-rw-r-- 1 branfosj build 388 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.out
-rwxrwxr-- 1 branfosj build 610 Apr 22 02:57 rfm_2020b-gompi-osu_reduce_scatter-benchmark_job.sh
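
In the log above, the test enters the `run` stage twice (02:51:37 and 02:56:51) within a single ReFrame invocation. To find other affected tests, a rough sketch like the following could scan the debug log for test cases that enter the same stage more than once. It assumes the debug line format shown above; `find_repeated_stages` and the sample lines are illustrative and not part of ReFrame.

```python
import re
from collections import defaultdict

# Matches debug lines of the form seen in the log above, e.g.
# "[<timestamp>] debug: <test> on <system:partition> using <environ>: Entering stage: run"
STAGE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] debug: "
    r"(?P<case>.+? on \S+ using \S+): "
    r"Entering stage: (?P<stage>\w+)$"
)


def find_repeated_stages(lines, stage="run"):
    """Return {test case: count} for cases entering `stage` more than once."""
    counts = defaultdict(int)
    for line in lines:
        match = STAGE_RE.match(line.strip())
        # Exact comparison, so 'run_complete' and 'run_wait' are not counted as 'run'.
        if match and match.group("stage") == stage:
            counts[match.group("case")] += 1
    return {case: n for case, n in counts.items() if n > 1}


# A few lines copied from the log in this report.
sample = [
    "[2021-04-22T02:51:37] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run",
    "[2021-04-22T02:55:25] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run_complete",
    "[2021-04-22T02:56:51] debug: 2020b-gompi-osu_reduce_scatter-benchmark on bluebear:cascadelake using none: Entering stage: run",
]

for case, n in find_repeated_stages(sample).items():
    print(f"{case}: entered 'run' {n} times")
```

For a full log, the same function can be called on an open file object, since it only iterates over lines. A case with a count above 1 only means the stage was entered more than once; it does not explain why the scheduler resubmitted the test.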