Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[batch] Searching in the batches UI is slow for users with many batches #14599

Closed
daniel-goldstein opened this issue Jul 2, 2024 · 1 comment · Fixed by #14629, #14648 or #14649
Closed

Comments

@daniel-goldstein
Copy link
Contributor

daniel-goldstein commented Jul 2, 2024

What happened?

The Batch UI should only show the most recent 50 batches that the user has submitted, but I suspect there is a flaw in the search query that is doing a full scan on one of the main tables. Searching for user = ci in the Batch UI nearly times out.

First thing I would do here is get the SQL query that is run for user = ci (can hopefully do this in ipython) and run an EXPLAIN against the database. That should hopefully expose a WHERE condition that could be re-written to use an existing index or at worst add an index for.

Version

0.2.131

Relevant log output

No response

@daniel-goldstein daniel-goldstein added needs-triage A brand new issue that needs triaging. batch good-first-project labels Jul 2, 2024
@ehigham ehigham self-assigned this Jul 8, 2024
@daniel-goldstein daniel-goldstein removed the needs-triage A brand new issue that needs triaging. label Jul 9, 2024
hail-ci-robot pushed a commit that referenced this issue Jul 31, 2024
This PR optimises the V2 batches query along with its associated
subqueries.
Listing batches for the CI user now executes in 1s were previously it
had taken ~14s.
Resolves #14599

---------

Co-authored-by: Chris Llanwarne <cjllanwarne@users.noreply.github.com>
@ehigham ehigham reopened this Jul 31, 2024
@ehigham
Copy link
Member

ehigham commented Jul 31, 2024

Reopening as the new query causes a large regression for when a user isn't specified

hail-ci-robot pushed a commit that referenced this issue Aug 1, 2024
@ehigham ehigham reopened this Aug 1, 2024
hail-ci-robot pushed a commit that referenced this issue Aug 8, 2024
#14629 improved the speed of listing a user's batches but introduced a
large
regression for listing all batches readble by that user. This change
fixes that
regression by making use index hints and `STRAIGHT_JOIN`s.

The index hint tells MySQL to never consider the index `batches_deleted`
as it
has very low cardinality. In some forms of this query, the planner tries
to use
it to its peril.

A problem in query 0 with #14629 (see below) was that fewer filters on
batches
made the optimiser consider joins in a suboptimal order - it did a table
scan
on `job_groups` first then sorted the results by to `batches.id DESC`
instead
of doing an index scan on `batches` in reverse.

Using `STRAIGHT_JOIN`s instead of `INNER JOIN` mades the optimiser start
from
`batches` and read its index in reverse before considering other tables
in
subsequent joins. From the
[documentation](https://dev.mysql.com/doc/refman/8.4/en/join.html):

> STRAIGHT_JOIN is similar to JOIN, except that the left table is always
read
before the right table. This can be used for those (few) cases for which
the
  join optimizer processes the tables in a suboptimal order.

This is advantageous for a couple of reasons:
- We want to list newer batches first
- For this query, the `batches` table has more applicables indexes
- We want the variable to order by to be in the primary key of the first
  table so we can read the index in reverse

Before and after timings, collected by running the query 5 times, then
using
profiles gathered by MySQL.
```
+-------+---------------------------------------------------*
| query |  description                                      |                                                                                                                                                                                                                                                         
+-------+---------------------------------------------------+
|     0 | All batches accessible to user `ci`               |
|     1 | All batches accessible to user `ci` owned by `ci` |
+-------+---------------------------------------------------*

+-------+--------+--------------------------------------------------------+------------+------------+
| query | branch | timings                                                |    mean    |    stdev   |                                                                                                                                                                                                                                             
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |  main  | 0.05894400,0.05207850,0.07067875,0.06281800,0.060250   | 0.06095385 | 0.00602255 |
|     1 |  main  | 14.1106150,12.2619323,13.8442850,12.0749633,14.0297822 | 13.2643156 | 0.90087263 |
+-------+--------+--------------------------------------------------------+------------+------------+
|     0 |   PR   | 0.04717375,0.04974350,0.04312150,0.04070850,0.04193650 | 0.04453675 | 0.00339069 |
|     1 |   PR   | 0.04423925,0.03967550,0.03935425,0.04056875,0.05286850 | 0.04334125 | 0.00507140 |
+-------+--------+--------------------------------------------------------+------------+------------+
```

I'm hopeful that this won't introduce regressions for most use cases.
While I
haven't benchmarked other queries, the MySQL client does feel more
responsive
for a wider array of users. One notable exception is for the user
`dking` who
owns 3.7x more batches than has access to, of which all have been
deleted. I
don't think this is a common enough use case to make this query even
more
complicated than it already is.

Resolves #14599
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment