Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[batch] Consider rewriting scheduler and canceller queries to be more efficient #14412

Open
jigold opened this issue Mar 14, 2024 · 0 comments
Open

Comments

@jigold
Copy link
Collaborator

jigold commented Mar 14, 2024

What happened?

This is a follow up issue to a comment in the job groups PR #14282. The queries for the scheduler and canceller both pull the job groups that are schedulable/cancellable and then makes one of two queries depending on if the job is always run. This can be confusing that there's two similar queries and requires multiple calls to the database. There's a question here on whether it is possible to come up with a more efficient query given the indices we have or do we need to create a new index to support this. In case it's hard to find the exact comment in the PR, I pasted an example thread below:

Example comment from Dan: https://github.com/hail-is/hail/pull/14170/files#r1476847018

This is my concrete critqiue: the two SQL queries are nearly the same, but they are long enough to
make seeing that challenging.

SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s AND job_group_id = %s AND state = 'Ready' AND always_run = 0
LIMIT %s;
SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s AND job_group_id = %s AND state = 'Ready' AND always_run = 0 AND cancelled = 1
LIMIT %s;

They only differ in the cancelled condition. As a reader, I'd prefer code that revealed that
similarity and used names to suggest what the difference was doing, and maybe a comment, if there is
no better way to say it, indicaing what you indicated in your GitHub comment.

if job_group['cancelled']:
    where_job_needs_cancelling = ''  # every job in a cancelled group needs cancelling
else:
    where_job_needs_cancelling = 'AND jobs.cancelled'  # jobs.cancelled means child of a failed job
query_for_jobs_to_be_cancelled = f"""
SELECT jobs.batch_id, jobs.job_id
FROM jobs FORCE INDEX(jobs_batch_id_state_always_run_cancelled)
WHERE batch_id = %s
  AND job_group_id = %s
  AND state = 'Ready'
  AND NOT always_run
  {where_job_needs_cancelling}
LIMIT %s;
"""
async for record in self.db.select_and_fetchall(
    query_for_jobs_to_be_cancelled,
    (job_group['batch_id'], job_group['job_group_id'], remaining.value),
):
    yield record

Other similar threads:

Version

0.2.128

Relevant log output

No response

@jigold jigold added needs-triage A brand new issue that needs triaging. batch labels Mar 14, 2024
@daniel-goldstein daniel-goldstein added Epic enhancement and removed needs-triage A brand new issue that needs triaging. Epic labels Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants