Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constraint information is not being cached #101

Closed
atrigent opened this issue Nov 8, 2016 · 9 comments
Closed

Constraint information is not being cached #101

atrigent opened this issue Nov 8, 2016 · 9 comments

Comments

@atrigent
Copy link

atrigent commented Nov 8, 2016

In RedshiftDialect, there are three methods, _get_all_relation_info, _get_all_column_info, and _get_all_constraint_info, which query information about the whole cluster. These methods are called by several other methods which pass their info_cache argument down to one of the above methods. These methods, in turn, are called by other methods which, in most cases, pass down all of their keyword arguments. The one exception is _get_redshift_constraints - this method is never called with the keyword arguments of its caller, and therefore never gets the info_cache object that was passed in by SQLAlchemy. The result of this is that constraint information is never cached. When running something like alembic which does a lot of reflection, this can slow things down considerably because the dialect winds up querying information about the entire database every time alembic asks for anything.

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

Thanks for catching the lack of caching on _get_redshift_constraints. Sounds like an easy optimization.

When running something like alembic which does a lot of reflection, this can slow things down considerably because the dialect winds up querying information about the entire database every time alembic asks for anything

I believe there's a fundamental constraint in SQLAlchemy's API for reflection that allows caching only within the context of reflecting a single table. I originally wrote the reflection code in this dialect with bulk queries like _get_all_relation_info hoping that we could reconstruct the entire database schema with just a few queries, but I don't believe this is possible.

I do hope that using cache_info for the constraints query will speed things up a bit, but reflection across many tables is likely going to continue to be slow.

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

PR for this in #102

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

Also see #103 for more thoughts on persisting schema info across multiple tables.

@atrigent
Copy link
Author

atrigent commented Nov 8, 2016

I think your approach of doing just a few queries actually does work. You're right that the reflection methods that make up the public interface of the dialect generally require a table name, and that caching the result of those methods by itself won't reduce the number of queries that you have to make, but caching the result of your _get_all_* methods, which return information about the whole database, does definitely speed things up considerably. With the change you made in PR #102, a test that we have for making sure that our migrations are up to date goes from 2+ minutes to about 3 seconds.

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

a test that we have for making sure that our migrations are up to date goes from 2+ minutes to about 3 seconds

Wow. That is much more improvement than I expected.

@atrigent
Copy link
Author

atrigent commented Nov 8, 2016

Well, previously that query in _get_all_constraint_info was being run over and over and over again. Now it only gets run once.

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

Thank you again for inspecting the code enough to figure out what was going on. I look forward to seeing whether this significantly improves some of my own use cases.

@atrigent
Copy link
Author

atrigent commented Nov 8, 2016

And thank you for the quick response. :)

It's an especially large gain for us because we often have a bunch of stale schemas in this cluster. Therefore, the _get_all_constraint_info query slows down slightly, and the cumulative effect of running that slightly slower query over and over again can cause the test that I mentioned to take 2 minutes or longer. I guess you probably won't notice as much of a speedup, but probably still some.

@jklukas
Copy link
Member

jklukas commented Nov 8, 2016

This change is now merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants