-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize data sources sql for MySQL #45504
Optimize data sources sql for MySQL #45504
Conversation
This does appear to help a lot. I think it's reasonable compromise between the various MySQL bugs we're fighting against. |
Hello! Just to say on an upgrade from 5.2 to 6 this issue has bitten us quite badly. Our application for better or for worse stores batch experiment results in MySQL and each batch has a different schema, which we use a different table before. We're halfway through a MySQL 5.6->5.7 migration and some databases have hundreds of thousands of tables in them. Deploying this fix as a monkey patch resolved the issue, thanks! Note that only our 5.6 databases were affected. The 5.7 hosts which had not switched off |
Hi - revisiting this since it hit us again just after upgrading one of the last 5.6 servers to 5.7. The application immediately started experiencing latency issues and the query produced by the patch was taking in the region of 4 seconds. Taking a closer look at the patch, the new query being produced looked like this: SELECT table_name FROM information_schema.tables
WHERE table_schema = database()
AND table_name = 'my_table'
AND table_name IN (
SELECT table_name FROM information_schema.tables
WHERE table_schema = database()
)
AND table_type = 'BASE TABLE' In this case the inner select was causing the original performance issue and removing it fixed it. This seems safe since the Perhaps there's a reason for it to be there that was an intended performance benefit that I'm not seeing. Any comments would be useful, especially since it's in |
Although maybe this goes back to what I suggested in #45503 that maybe the behavior should be configurable |
@mfoo, I am having trouble following your report and am not sure what you're asking for. What version of Rails is there an issue? Are you asking us if we're willing backport this fix to 6-0-stable / 6-1-stable? Or are you asking for something else? |
Hi, sorry for the delay. No backporting is necessary, it's not the Rails project's fault that we're behind on version updates! The MySQL 5.7 server in question has around one million innodb tables in it. This makes the subquery method extremely slow. We have not disabled I acknowledge that this is a fairly high number of tables (although innodb permits up to 4 billion) and I don't think Rails/ActiveRecord should necessarily be making changes to support such an unusual use-case, but potentially we can find a solution that works everywhere. Just for clarity, the sequence of events we experienced was that the monkey-patch here (moving from a join to a subquery) fixed some serious performance regressions in Rails 6 on MySQL 5.6, but it introduced performance regressions for us on 5.7 on instances with more tables. We just didn't initially notice because the instances using 5.7 were smaller. Upon upgrading the remaining 5.6 instances we experienced a performance regression and had to monkey-patch out the inner select. I think I understand the inconsistency in MySQL < 8 raised in #39712, but what I don't understand is why it's a problem for the Rails project. What's the use case where Rails needs to know about the existence of tables that it can't read data from? With my current understanding it feels more correct to revert back to the simple version without either the join or the subquery (which is very fast): SELECT table_name FROM information_schema.tables WHERE table_schema = database() AND table_name = 'some_table' and to intentionally rely on MySQL's behaviour (hiding tables you don't have SELECT permission granted for). |
Fixes #45503.
This uses queries like
SELECT table_name FROM information_schema.tables WHERE table_schema = database() AND table_name = "schema_migrations" AND table_name IN (SELECT table_name FROM information_schema.tables WHERE table_schema = database())
(note theIN
).With these changes for 1000 runs I got:
72s
information_schema.tables
bytable_name
. #39712:1.53s
6.8s
@blowfishpro Can you verify these changes improved the case you provided?