New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[18.01] Optimize public grid database interactions. #5514

Merged
merged 1 commit into from Feb 13, 2018

Conversation

4 participants
@jmchilton
Member

jmchilton commented Feb 13, 2018

For all four grid types (histories, viz, workflows, and pages), this fetches the community rating and any annotations in the initial SQL query. This eliminates at least two extra queries per response element in the grid. See comment in the history controller for why I am fairly confident this is a good idea for annotations but tags are less obvious - these grids still go back to the postgres server multiple times per rendered item to fetch tags.

I'm confident we should either subqueryload, joinedload, or upgrade to sqlalchemy 2.2 and selectinload the tags as well - but I'm not sure which without being able to hack on a usegalaxy.org.

This also brings in less of the user model (only username instead of all of it) to reduce over-the-wire transmission of unneeded data from postgres to Galaxy.

For workflows we were LEFT OUTER JOIN-ing on the steps of the latest workflow - so we were bringing back a lot of extra rows for data that was completely unused. In light of this, it makes perfect sense to me why published workflows were the slowest of these and I suspect they will all be equally performant after this change (modulo the number of rows in the tables and the number of rows rendered).

xref #5473 (this should resolve the performance issues by and large - we should still render more quickly and stick a spinner on the page).

@@ -82,8 +83,8 @@ class PageAllPublishedGrid(grids.Grid):
)
def build_initial_query(self, trans, **kwargs):
# Join so that searching history.user makes sense.
return trans.sa_session.query(self.model_class).join(model.User.table)
# See optimization description comments and TODO for tags in matching pulblic histories query.

This comment has been minimized.

@nsoranzo

nsoranzo Feb 13, 2018

Member

s/pulblic/public/

@@ -212,8 +213,8 @@ class VisualizationAllPublishedGrid(grids.Grid):
)
def build_initial_query(self, trans, **kwargs):
# Join so that searching history.user makes sense.
return trans.sa_session.query(self.model_class).join(model.User.table)
# See optimization description comments and TODO for tags in matching pulblic histories query.

This comment has been minimized.

@nsoranzo

nsoranzo Feb 13, 2018

Member

s/pulblic/public/

@@ -138,8 +138,10 @@ class StoredWorkflowAllPublishedGrid(grids.Grid):
]
def build_initial_query(self, trans, **kwargs):
# Join so that searching stored_workflow.user makes sense.
return trans.sa_session.query(self.model_class).join(model.User.table)
# See optimization description comments and TODO for tags in matching pulblic histories query.

This comment has been minimized.

@nsoranzo

nsoranzo Feb 13, 2018

Member

s/pulblic/public/

@guerler

This comment has been minimized.

Contributor

guerler commented Feb 13, 2018

Thanks @jmchilton.

Optimize public grid database interactions.
For all four grid types (histories, viz, workflows, and pages), this computes rating average on the server and prefetches any annotations. This eliminates at least two extra queries for response element in the grid. See comment in the history controller for why I am fairly confident this is a good idea for annotations but tags are less obvious - these grids still go back to the postgres server multiple times per rendered item to render tags.

I'm confident we should either subqueryload, joinedload, or upgrade to sqlalchemy 2.2 and selectinload (http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#sqlalchemy.orm.selectinload) the tags as well - but I'm not sure which without being able to hack on a usegalaxy.org.

This also brings in less of the user model (only username instead of all of it) to reduce over-the-wire transmission of unneeded data from postgres to Galaxy.

For workflows we were LEFT OUTER JOIN-ing on the steps of the latest workflow - so we were bringing back a lot of extra rows for data that was completely unused. In light of this, it makes perfect sense to me why published workflows were the slowest of these and I suspect they will all be equally performant after this change (modulo the number of rows in the tables and the number of rows rendered).
@martenson

This comment has been minimized.

Member

martenson commented Feb 13, 2018

lgtm, thanks for stabbing this (does that phrase work?)

Especially for published lists like this we should implement a cache, I suspect there is a handful of changes a day (if any) to the results of this query.

@martenson martenson merged commit ed23cc6 into galaxyproject:release_18.01 Feb 13, 2018

6 checks passed

api test Build finished. 351 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 171 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 79 tests run, 4 skipped, 0 failed.
Details
selenium test Build finished. 118 tests run, 3 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details

@martenson martenson added this to Merged in Performance Feb 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment