OpenPBS job queue hang  #1930

@cblackworth-anl

When running jobs with OpenPBS 20.0.1, we hit a problem that looks partly related to #1301 and #1473.

Our PBS installation currently has job history enabled. When we run jobs through PBS with ReFrame, the run errors out because of this:

[CMD] 'qstat -f 1015.sch1 1016.sch1 1017.sch1 1018.sch1'
[  FAILED  ] Ran 0/2 test case(s) from 3 check(s) (0 failure(s))
[==========] Finished on Wed Apr 14 16:32:55 2021
/usr/bin/reframe: run session stopped: job scheduler error: qstat failed with exit code 35 (standard error follows):
qstat: 1015.sch1 Job has finished, use -x or -H to obtain historical job information
qstat: 1016.sch1 Job has finished, use -x or -H to obtain historical job information
qstat: 1017.sch1 Job has finished, use -x or -H to obtain historical job information
qstat: 1018.sch1 Job has finished, use -x or -H to obtain historical job information

If we disable job history support (not ideal), the job just hangs indefinitely, regardless of whether the PBS jobs fail or run successfully.
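
To illustrate the failure mode, here is a minimal sketch of how a caller could recognise the qstat error above and requery with `-x`, as the error message itself suggests. This is not ReFrame's actual scheduler code. The function names `finished_job_ids` and `history_qstat_cmd` are hypothetical, and the regular expression assumes the stderr wording stays exactly as shown in the log.

```python
import re

# Matches the stderr lines from the log above, e.g.
# "qstat: 1015.sch1 Job has finished, use -x or -H to obtain historical job information"
# Assumption: the message wording is stable across OpenPBS versions.
FINISHED_RE = re.compile(r'^qstat: (\S+) Job has finished')


def finished_job_ids(stderr):
    """Return the job IDs that qstat reported as already finished (hypothetical helper)."""
    ids = []
    for line in stderr.splitlines():
        match = FINISHED_RE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def history_qstat_cmd(job_ids):
    """Build a qstat command that also reports finished (history) jobs (hypothetical helper)."""
    # -x is the option named in the error message; -f requests the full listing.
    return ['qstat', '-x', '-f', *job_ids]
```

With helpers like these, a poller could treat "Job has finished" as a completed job rather than a scheduler error, or simply pass `-x` on every query when job history is enabled.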

Reframe version:
reframe --version 3.5.0

SLE 15 SP2
lmod
OpenPBS 20.0.1
