Description
After talking to Miguel, I realised that the way our sacct backend is implemented is not optimal. We always use sacct to query the job state, which is generally not a good idea. We do this because it is the only reliable way to retrieve both the job state and the exit code; with squeue or scontrol you might miss completed jobs. However, we can still optimise the sacct backend by relying more on squeue or scontrol: the default status query should be done with squeue or scontrol, and only if we cannot retrieve the job information should we issue an sacct command to check whether the job has completed. In fact, this behaviour is simply an extension of the current squeue-only backend.
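The proposed polling order could be sketched roughly as follows. This is only an illustration, not the current backend code: `run_cmd` and `query_job` are hypothetical names, and the sketch assumes that a job which squeue no longer knows about shows up as either a non-zero exit status or empty output.

```python
import subprocess


def run_cmd(argv):
    """Run a command and return (returncode, stdout)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def query_job(jobid, run=run_cmd):
    """Return (state, exitcode) for a job (hypothetical helper).

    exitcode is None while squeue still lists the job; both values are
    None if neither command reports anything for it.
    """
    # Default query: ask squeue for the job state only (no header).
    rc, out = run(['squeue', '-h', '-j', str(jobid), '-o', '%T'])
    lines = out.strip().splitlines()
    if rc == 0 and lines:
        return lines[0].strip(), None

    # Assumption: squeue could not find the job, so it may have completed.
    # Fall back to sacct for the final state and the exit code.
    rc, out = run(['sacct', '-n', '-P', '-X', '-j', str(jobid),
                   '-o', 'State,ExitCode'])
    lines = out.strip().splitlines()
    if rc != 0 or not lines:
        return None, None

    state, exitcode = lines[0].split('|')
    return state, exitcode
```

With this ordering, sacct is only invoked once a job has disappeared from squeue, instead of on every status poll.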
Implementing this change correctly won't be entirely straightforward, because we would need to invert the current logic. Currently, our base Slurm backend relies on sacct, and the squeue backend is simply an extension of the sacct-based one. With what I propose here, this relationship must be reversed (the sacct backend should become an extension of the squeue one) in order to match the new logic.
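A minimal sketch of the inverted hierarchy, under stated assumptions: the class names `SqueueBackend` and `SacctBackend`, the `_on_missing` hook, and the injected `squeue`/`sacct` callables are all hypothetical and only stand in for the real backends. The base class's choice to treat a vanished job as completed is likewise an assumption made for illustration.

```python
class SqueueBackend:
    """Hypothetical base backend: only sees jobs that squeue still lists."""

    def __init__(self, squeue):
        # squeue: callable mapping a job id to a state string, or None
        # when the job is no longer listed.
        self._squeue = squeue

    def poll(self, jobid):
        state = self._squeue(jobid)
        if state is None:
            return self._on_missing(jobid)
        return state

    def _on_missing(self, jobid):
        # Assumption: without accounting, a job that has disappeared is
        # treated as completed, and its exit code is unknown.
        return 'COMPLETED'


class SacctBackend(SqueueBackend):
    """Hypothetical extension: consults sacct only when squeue has no answer."""

    def __init__(self, squeue, sacct):
        super().__init__(squeue)
        # sacct: callable mapping a job id to its final state string.
        self._sacct = sacct

    def _on_missing(self, jobid):
        return self._sacct(jobid)
```

In this arrangement the sacct-specific behaviour is confined to a single overridden hook, while the regular polling path is shared with the squeue-only backend.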