Heterogeneous job ids can't be matched by ReFrame's polling #2359

@rsarm

Description

It looks like #2356 exposes an issue in ReFrame with the Slurm scheduler. The problem comes from the job ID format used for heterogeneous jobs, which looks like 12345+0, 12345+1, etc. Because of the +N suffix, the polling cannot match the job ID in the sacct output:

Entering stage: run_complete
Polling 1 task(s) in 'daint:gpu'
[CMD] 'sacct -S 2021-12-23 -P -j 35642692 -o jobid,state,exitcode,end,nodelist'
[S] slurm: Job state not matched (stdout follows)
JobID|State|ExitCode|End|NodeList
35642692+0|PENDING|0:0|Unknown|None assigned
35642692+1|PENDING|0:0|Unknown|None assigned

After the job exits, ReFrame keeps polling indefinitely, since the job state is never matched.
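
To illustrate the mismatch, here is a minimal sketch run against the sacct output above. It is not ReFrame's actual code: the two patterns and the `states_by_job` helper are hypothetical, and the strict pattern only assumes that the matcher expects a purely numeric job ID followed directly by the field separator. The tolerant pattern shows one possible way to accept the optional +N component suffix.

```python
import re

# sacct output copied from the log above
SACCT_OUTPUT = """\
JobID|State|ExitCode|End|NodeList
35642692+0|PENDING|0:0|Unknown|None assigned
35642692+1|PENDING|0:0|Unknown|None assigned
"""

# Hypothetical strict pattern: numeric job ID immediately followed by '|'
STRICT = re.compile(r'^(?P<jobid>\d+)\|(?P<state>[^|]+)\|', re.MULTILINE)

# Hypothetical tolerant pattern: also accepts an optional '+N' suffix,
# as reported for the components of a heterogeneous job
HET = re.compile(
    r'^(?P<jobid>\d+)(?:\+(?P<comp>\d+))?\|(?P<state>[^|]+)\|',
    re.MULTILINE
)


def states_by_job(output, pattern):
    """Collect the states reported for each base job ID."""
    states = {}
    for match in pattern.finditer(output):
        states.setdefault(match.group('jobid'), []).append(match.group('state'))
    return states


strict_states = states_by_job(SACCT_OUTPUT, STRICT)  # no line matches
het_states = states_by_job(SACCT_OUTPUT, HET)        # both components match
```

With the strict pattern no line matches, which is consistent with the "Job state not matched" message. With the tolerant pattern both components are grouped under the base job ID, so how to combine their states into a single job state would still need to be decided.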
