Increase performance for table_exists? #3867

Merged
merged 1 commit into rails:master on Dec 5, 2011

Conversation

jadeforrest
Contributor

At New Relic, we have hundreds of thousands of tables, and our migrations took 30 minutes without a similar patch. This cuts it down to a more reasonable amount of time. The more tables you have, the more efficiency gain you derive from the patch.

The rescue false part is ugly, but necessary as far as I can tell. I don't know of a cross-database statement you can make that will work without trapping and relying on errors.

Tested on MySQL and SQLite, but I believe this should work across any database.
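
For context, the idea in the patch is roughly the following; a minimal sketch at the abstract-adapter level, not the verbatim diff, and the exact SQL text is an assumption:

```ruby
# Instead of loading and scanning the full table list, probe the one table
# with a query that can never return rows; any error from the database is
# treated as "table does not exist".
def table_exists?(table_name)
  select_value("SELECT 1 FROM #{quote_table_name(table_name)} WHERE 1=0")
  true
rescue
  false
end
```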

@jfernandez

+1

@tenderlove
Member

Isn't the tables list cached? If it's a problem of the array being too long, can we just convert it to a hash and do tables.key?(table_name)?

I'm hesitant to add this patch because it could mask actual errors coming from the adapter.
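
For illustration, the hash-lookup alternative being suggested would look something like this; `@table_lookup` is a hypothetical instance variable, not existing Rails code:

```ruby
# Build a name -> true index once from the cached table list, then do O(1)
# membership checks instead of scanning the array on every call.
def table_exists?(table_name)
  @table_lookup ||= Hash[tables.map { |t| [t, true] }]
  @table_lookup.key?(table_name.to_s)
end
```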

@jadeforrest
Contributor Author

The list is cached, but that doesn't help if you truly have a lot of tables, or if you have tables being generated dynamically. Migrations, for example, do not get the caching benefit.

And this would be a performance improvement for anyone.

One possibility is the adapters could override with their own, database-dependent way of determining whether a table exists. That would preserve the performance improvement but allow you to avoid swallowing errors (which is ugly).
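
As an illustration of that override idea, a PostgreSQL-flavoured version could consult the catalog directly and never need a rescue; the query below is an assumption, not code from this pull request:

```ruby
# In the PostgreSQL adapter: ask information_schema whether the table exists,
# so no exception has to be swallowed.
def table_exists?(table_name)
  select_value(<<-SQL).to_i > 0
    SELECT COUNT(*)
    FROM information_schema.tables
    WHERE table_name = #{quote(table_name.to_s)}
  SQL
end
```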

@rkbodenner

The problem is doing a "SHOW TABLES", which takes up to 30 minutes in production, per shard. Different adapters might have specific exception classes that we could look for, but then it wouldn't work for all adapters.

@tenderlove
Member

@jadeforrest can we push this down to the mysql adapter(s)? I'd be more comfortable / happy with it if we rescued only expected exceptions. Also, is this technique faster on SQLite3 when it has many tables?
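
A sketch of what pushing it down might look like in the mysql2 adapter, rescuing only the driver's error class; illustrative, not the code that was merged:

```ruby
# mysql2 adapter: same single-table probe, but only the driver's own error
# class is rescued, so unrelated failures still surface.
def table_exists?(table_name)
  select_value("SELECT 1 FROM #{quote_table_name(table_name)} WHERE 1=0")
  true
rescue Mysql2::Error
  false
end
```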

@jadeforrest
Contributor Author

@tenderlove It will be faster on any database, by a huge amount.

We can push it into the mysql adapters if you prefer, but in my opinion the base case is pretty broken. Imagine a database with a million tables. Do you want to ask the database whether that one table exists, or pull down the entire list of a million tables, store that whole list, and query against it?

That said, I totally understand your distaste for swallowing the exceptions. I see two possibilities: either we add a comment encouraging the database adapters to override this with a db-specific version that does not swallow the exception, or we implement this only in the adapters.

Which sounds better to you?

@tenderlove
Member

@jadeforrest good point. Let's leave your pull request as is for now. If people complain, we can revert or fix it.

tenderlove added a commit that referenced this pull request on Dec 5, 2011
Increase performance for table_exists?
@tenderlove merged commit 988061d into rails:master on Dec 5, 2011
@jenseng
Contributor

jenseng commented Dec 6, 2011

I've confirmed this is fine for Postgres (9.0+ anyway) ... the query planner applies the false condition filter before it ever scans any rows, so execution time should always be extremely fast (<0.1ms in my tests)
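
One way to reproduce that check from a Rails console; the table name below is a placeholder and the expected plan shape is only indicative:

```ruby
# Ask PostgreSQL for the plan of the probe query. With a constant-false
# predicate, the plan should show a one-time false filter and no scan of the
# table (e.g. a "Result" node with "One-Time Filter: false").
puts ActiveRecord::Base.connection.select_values(
  "EXPLAIN SELECT 1 FROM some_large_table WHERE 1=0"
)
```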

@jadeforrest
Contributor Author

Thank you for testing it on Postgres!

