Tasks in hold are silently aborted when all the currently running tasks fail #507

@vkarak

This is relevant to the async execution policy. Here is an example:

Command line: ./bin/reframe -C config/cscs.py -c cscs-checks/mch/alltoallv.py --exec-policy=async -r
Reframe version: 2.15-dev1
Launched by user: karakasv
Launched on host: keschln-0003
Reframe paths
=============
    Check prefix      :
    Check search path : 'cscs-checks/mch/alltoallv.py'
    Stage dir prefix     : /users/karakasv/Devel/reframe/stage/
    Output dir prefix    : /users/karakasv/Devel/reframe/output/
    Perf. logging prefix : /users/karakasv/Devel/reframe/perflogs
[==========] Running 6 check(s)
[==========] Started on Thu Oct 11 14:52:30 2018

[----------] started processing AlltoallvTest_default (AlltoallvTest_default)
[ RUN      ] AlltoallvTest_default on kesch:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_default (AlltoallvTest_default)

[----------] started processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)
[ RUN      ] AlltoallvTest_nocomm on kesch:cn using PrgEnv-gnu
[     HOLD ] AlltoallvTest_nocomm on kesch:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomm (AlltoallvTest_nocomm)

[----------] started processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)
[ RUN      ] AlltoallvTest_nocomp on kesch:cn using PrgEnv-gnu
[     HOLD ] AlltoallvTest_nocomp on kesch:cn using PrgEnv-gnu
[----------] finished processing AlltoallvTest_nocomp (AlltoallvTest_nocomp)

[----------] started processing HaloExchangeTest_default (HaloExchangeTest_default)
[ RUN      ] HaloExchangeTest_default on kesch:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_default on kesch:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_default (HaloExchangeTest_default)

[----------] started processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)
[ RUN      ] HaloExchangeTest_nocomm on kesch:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_nocomm on kesch:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomm (HaloExchangeTest_nocomm)

[----------] started processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)
[ RUN      ] HaloExchangeTest_nocomp on kesch:cn using PrgEnv-gnu
[     HOLD ] HaloExchangeTest_nocomp on kesch:cn using PrgEnv-gnu
[----------] finished processing HaloExchangeTest_nocomp (HaloExchangeTest_nocomp)

[----------] waiting for spawned checks to finish
[     FAIL ] AlltoallvTest_default on kesch:cn using PrgEnv-gnu
[----------] all spawned checks have finished

[  FAILED  ] Ran 6 test case(s) from 6 check(s) (1 failure(s))
[==========] Finished on Thu Oct 11 14:53:54 2018

==============================================================================
SUMMARY OF FAILURES
------------------------------------------------------------------------------
FAILURE INFO for AlltoallvTest_default
  * System partition: kesch:cn
  * Environment: PrgEnv-gnu
  * Stage directory: /users/karakasv/Devel/reframe/stage/kesch/cn/PrgEnv-gnu/AlltoallvTest_default
  * Job type: batch job (id=94772)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: poll
  * Reason: caught framework exception: (jobid=94772) job cancelled because it was blocked due to a perhaps non-recoverable reason: ReqNodeNotAvail,  UnavailableNodes:keschcn-[0001-0011]

------------------------------------------------------------------------------

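Although the only running test fails, none of the tasks in hold is ever tried. They are silently dropped, while the final report still claims that 6 test cases ran with a single failure.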
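To make the expected behavior concrete, here is a minimal sketch of a slot-limited scheduler. It is not ReFrame's actual implementation: `run_async`, `poll`, `max_jobs`, and `resume_held_on_failure` are hypothetical names used only for illustration, and the assumption that held tasks are only released when a running task succeeds is a guess at the cause, not a confirmed diagnosis.

```python
from collections import deque


def run_async(tasks, max_jobs, poll, resume_held_on_failure=True):
    """Run `tasks` with at most `max_jobs` in flight (illustrative only).

    `poll(task)` is a hypothetical callback that returns 'ok' or 'fail'
    once the task has finished. Returns the recorded results and the list
    of tasks that were never started.
    """
    running = deque()
    held = deque()
    results = {}

    # Submission phase: start tasks while slots are free, hold the rest.
    for task in tasks:
        if len(running) < max_jobs:
            running.append(task)
        else:
            held.append(task)

    # Polling phase: a finished task frees its slot.
    while running:
        task = running.popleft()
        outcome = poll(task)
        results[task] = outcome

        # Expected behavior: release a held task whether the finished
        # task passed or failed. With resume_held_on_failure=False the
        # slot is only refilled on success, so a failure can strand
        # everything that is still in hold.
        if held and (outcome == 'ok' or resume_held_on_failure):
            running.append(held.popleft())

    return results, list(held)


def fail_first(task):
    # Hypothetical poll callback: only the first task fails.
    return 'fail' if task == 'a' else 'ok'


fixed_results, fixed_stranded = run_async(['a', 'b', 'c'], 1, fail_first)
buggy_results, buggy_stranded = run_async(['a', 'b', 'c'], 1, fail_first,
                                          resume_held_on_failure=False)
```

With a single slot and a failing first task, the default call runs all three tasks and leaves nothing in hold, whereas the `resume_held_on_failure=False` call records only the failure of `'a'` and returns `'b'` and `'c'` as never started, which matches the symptom in the log above.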
