New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[18.01] Optimize count query in grids (Part 1). #5574

Merged
merged 1 commit into from Feb 27, 2018

Conversation

2 participants
@jmchilton
Member

jmchilton commented Feb 21, 2018

Many grid operations execute a complex query twice - once to produce the total number of rows and once to iterate through the selected parts of the data. This can be more optimal in many ways - this starts us down the path to addressing that by doing two things.

The first thing it does is eliminates the query all together if paging is enabled but the user has opted to see "all" the rows. There was never a reason to execute the query in that case I don't think.

The second thing it does is move the calculation of the count query until after the data has been generated - this isn't more optimal by itself but it allows us to potentially calculate the count as part of the main query. I've implemented this in a broken way @ ea40472 but I realized it was too broad an approach and doesn't work if certain kinds of joins are used in the initial query - but a more targetted approach on a per-grid basis could work I think and moving this code after allow us to potentially do that more easily in subsequent commits.

Additionally I added a note - that I think it would be best to just return None as the number of pages and issue a second query to the same endpoint with the same arguments but with count_only=true added as a query request parameter so we can render the main grid faster and then render the paging elements after the second query finishes. This would feel more responsive I think.

Optimize count query in grids (part 1).
Many grid operations execute a complex query twice - once to produce the total number of rows and once to iterate through the selected parts of the data. This can be more optimal in many ways - this starts us down the path to addressing that by doing two things.

The first thing it does is eliminates the query all together if paging is enabled but the user has opted to see "all" the rows. There was never a reason to execute the query in that case I don't think.

The second thing it does is move the calculation of the count query until after the data has been generated - this isn't more optimal by itself but it allows us to potentially calculate the count as part of the main query. I've implemented this in a broken way @ ea40472 but I realized it was too broad an approach and doesn't work if certain kinds of joins are used in the initial query - but a more targetted approach on a per-grid basis could work I think and moving this code after allow us to potentially do that more easily in subsequent commits.

Additionally I added a note - that I think it would be best to just return None as the number of pages and issue a second query to the same endpoint with the same arguments but with count_only=true added as a query request parameter so we can render the main grid faster and then render the paging elements after the second query finishes. This would feel more responsive I think.

@martenson martenson merged commit 50cd159 into galaxyproject:release_18.01 Feb 27, 2018

6 checks passed

api test Build finished. 351 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 171 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 79 tests run, 4 skipped, 0 failed.
Details
selenium test Build finished. 118 tests run, 3 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details

@martenson martenson added this to Merged in Performance Feb 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment