[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

jigold · 2024-03-14T13:01:50Z

What happened?

This is a follow up issue to a comment in the job groups PR #14282. The queries for the scheduler and canceller both pull the job groups that are schedulable/cancellable and then makes one of two queries depending on if the job is always run. This can be confusing that there's two similar queries and requires multiple calls to the database. There's a question here on whether it is possible to come up with a more efficient query given the indices we have or do we need to create a new index to support this. In case it's hard to find the exact comment in the PR, I pasted an example thread below:

Example comment from Dan: https://github.com/hail-is/hail/pull/14170/files#r1476847018

This is my concrete critqiue: the two SQL queries are nearly the same, but they are long enough to
make seeing that challenging.

SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s AND job_group_id = %s AND state = 'Ready' AND always_run = 0
LIMIT %s;

SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s AND job_group_id = %s AND state = 'Ready' AND always_run = 0 AND cancelled = 1
LIMIT %s;

They only differ in the cancelled condition. As a reader, I'd prefer code that revealed that
similarity and used names to suggest what the difference was doing, and maybe a comment, if there is
no better way to say it, indicaing what you indicated in your GitHub comment.

if job_group['cancelled']:
    where_job_needs_cancelling = ''  # every job in a cancelled group needs cancelling
else:
    where_job_needs_cancelling = 'AND jobs.cancelled'  # jobs.cancelled means child of a failed job
query_for_jobs_to_be_cancelled = f"""
SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s
  AND job_group_id = %s
  AND state = 'Ready'
  AND NOT always_run
  {where_job_needs_cancelling}
LIMIT %s;
"""
async for record in self.db.select_and_fetchall(
    query_for_jobs_to_be_cancelled,
    (job_group['batch_id'], job_group['job_group_id'], remaining.value),
):
    yield record

Other similar threads:

Version

0.2.128

Relevant log output

No response

jigold added needs-triage A brand new issue that needs triaging. batch labels Mar 14, 2024

daniel-goldstein added Epic enhancement and removed needs-triage A brand new issue that needs triaging. Epic labels Apr 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

jigold commented Mar 14, 2024

[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

Comments

jigold commented Mar 14, 2024

What happened?

Version

Relevant log output