Minimise Slurm DB hit rate #508

@vkarak

After talking to Miguel, it is clear that the way our sacct backend is implemented is not optimal. We always use sacct to query the job state, which puts unnecessary load on the Slurm database. We do this because sacct is the only reliable way to retrieve both the job state and the exit code; with squeue or scontrol we might miss completed jobs. However, we can still optimise the sacct backend by relying more on squeue or scontrol: the default status query should be done with squeue or scontrol, and only if the job information cannot be retrieved should we issue an sacct command to check whether the job has completed. In fact, this behaviour is simply an extension of the current squeue-only backend.
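As a rough illustration of the proposed query order, here is a minimal sketch. It is not the existing backend code: `query_job`, `run_cmd` and the injectable `run` parameter are hypothetical names, and the choice of `squeue -h -j <id> -o %T` and `sacct -n -P -X -j <id> -o State,ExitCode` is an assumption about which fields we would want.

```python
import subprocess


def run_cmd(argv):
    """Run a command and return (exit_status, stdout)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def query_job(jobid, run=run_cmd):
    """Return (state, exitcode, source); hypothetical helper, not real API.

    `exitcode` is None when the answer came from squeue, and `state` is
    None when neither command knows the job.
    """
    # Default query: squeue, which does not touch the accounting database.
    rc, out = run(['squeue', '-h', '-j', str(jobid), '-o', '%T'])
    lines = out.strip().splitlines()
    if rc == 0 and lines:
        return lines[0], None, 'squeue'

    # squeue has no record of the job: fall back to a single sacct query.
    rc, out = run(['sacct', '-n', '-P', '-X', '-j', str(jobid),
                   '-o', 'State,ExitCode'])
    lines = out.strip().splitlines()
    if rc != 0 or not lines:
        return None, None, 'sacct'

    # With -P the two requested fields are separated by '|'.
    state, exitcode = lines[0].rsplit('|', 1)
    return state, exitcode, 'sacct'
```

With this ordering, a job that is still pending or running costs only squeue calls; sacct is hit once squeue stops reporting the job.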

Implementing this change correctly won't be entirely straightforward, because we would need to invert the current logic. Currently, our base Slurm backend relies on sacct, and the squeue backend is simply an extension of it. With what I propose here, this relationship must be reversed: the sacct backend should become an extension of the squeue one, in order to match the new logic.
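The inverted hierarchy could look roughly like the sketch below. The class and method names (`SqueueBackend`, `SacctBackend`, `poll`, `_on_missing`) are hypothetical and do not correspond to the current classes; the actual queries are left as stubs.

```python
class SqueueBackend:
    """Hypothetical base backend that only knows about squeue."""

    def _query_squeue(self, jobid):
        # Stub: a real backend would run squeue and return the job state,
        # or None if squeue no longer reports the job.
        raise NotImplementedError

    def _on_missing(self, jobid):
        # Assumption: with squeue alone we cannot tell how the job ended,
        # so this sketch reports a placeholder state.
        return 'UNKNOWN'

    def poll(self, jobid):
        state = self._query_squeue(jobid)
        return state if state is not None else self._on_missing(jobid)


class SacctBackend(SqueueBackend):
    """Hypothetical extension that asks sacct only when squeue has no answer."""

    def _query_sacct(self, jobid):
        # Stub: a real backend would run sacct here.
        raise NotImplementedError

    def _on_missing(self, jobid):
        return self._query_sacct(jobid)
```

The base class owns the polling logic and the sacct backend overrides only the "job not found" hook, which mirrors the description of the sacct backend as an extension of the squeue one.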
