MM-23369: Allow mysql to choose a better index #14119
When the ORDER BY clause contains a column that is also in the WHERE clause and is part of an index, MySQL tries to use that specific index to avoid sorting. It does so in spite of the fact that there may be other indexes which are better for scanning the table and then doing a sort.

Essentially, MySQL scans a lot of rows just to avoid sorting, whereas it could have scanned far fewer rows and done the sorting in no time.

To fix this, we also add to the ORDER BY clause the other columns that are part of the preferred index. This causes no change in the results, because those columns are constrained by an equality check in the WHERE clause, but it lets MySQL use the right index: it now sees that it has to order by the other columns too, so it uses the other index to scan and then does the sorting.

This does not affect smaller tables, because the LIMIT is always 1 and MySQL stops sorting the moment it gets the first row, so sorting adds no real overhead.

Therefore, this seems like an optimal fix.

References:
https://dev.mysql.com/doc/refman/5.7/en/table-scan-avoidance.html
https://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index
https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
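The claim that the extra ORDER BY columns do not change the results can be checked with a small in-memory sketch. It is only an illustration, not code from this PR: the `post` struct, the helper functions, the sample rows, and the descending sort direction are all assumptions. Once the rows are filtered on equal `ChannelId` and `DeleteAt` values, those two columns are constant, so sorting by all three columns gives the same order as sorting by `CreateAt` alone.

```go
package main

import (
	"fmt"
	"sort"
)

// post is a hypothetical, trimmed-down stand-in for a row of the posts table.
type post struct {
	ChannelId string
	DeleteAt  int64
	CreateAt  int64
}

// filter keeps the rows that match the equality conditions of the WHERE clause.
func filter(rows []post, channelId string, deleteAt int64) []post {
	var out []post
	for _, r := range rows {
		if r.ChannelId == channelId && r.DeleteAt == deleteAt {
			out = append(out, r)
		}
	}
	return out
}

// byCreateAt mimics ORDER BY CreateAt DESC on a copy of rows.
func byCreateAt(rows []post) []post {
	out := append([]post(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].CreateAt > out[j].CreateAt
	})
	return out
}

// byAllColumns mimics ORDER BY ChannelId, DeleteAt, CreateAt DESC on a copy of rows.
func byAllColumns(rows []post) []post {
	out := append([]post(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.ChannelId != b.ChannelId {
			return a.ChannelId < b.ChannelId
		}
		if a.DeleteAt != b.DeleteAt {
			return a.DeleteAt < b.DeleteAt
		}
		return a.CreateAt > b.CreateAt
	})
	return out
}

func main() {
	rows := []post{
		{"c1", 0, 100}, {"c2", 0, 300}, {"c1", 0, 250}, {"c1", 50, 400}, {"c1", 0, 175},
	}
	matched := filter(rows, "c1", 0)
	// With LIMIT 1, only the first row of each ordering matters.
	fmt.Println(byCreateAt(matched)[0] == byAllColumns(matched)[0]) // prints "true"
}
```

The sketch says nothing about which index MySQL picks; it only illustrates why the longer ORDER BY is safe for correctness.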
@lieut-data - Thought you would be interested in this one.
Makes a ton of sense to me -- great research!
store/sqlstore/post_store.go
Outdated
// Adding ChannelId and DeleteAt order columns
// to let mysql choose the "idx_posts_channel_id_delete_at_create_at" index always.
// See MM-23369.
OrderBy("ChannelId, DeleteAt, CreateAt " + sort).
nit: might we write this as follows to minimize the "manual SQL" and maximize Squirrel's flexibility?
- OrderBy("ChannelId, DeleteAt, CreateAt " + sort).
+ OrderBy("ChannelId", "DeleteAt", "CreateAt " + sort).
Done
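For readers unfamiliar with the suggestion above: Squirrel's `OrderBy` takes a variadic list of strings, so each ORDER BY term can be passed as its own argument. The sketch below does not call Squirrel; `orderByClause` is a hypothetical stand-in that simply joins its terms with ", ", on the assumption that this mirrors how the terms end up in the generated SQL. The `sort` value is also assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// orderByClause is a hypothetical stand-in, not Squirrel's API.
// It joins each ORDER BY term with ", " to show that passing the terms
// separately yields the same clause as one hand-written string.
func orderByClause(terms ...string) string {
	if len(terms) == 0 {
		return ""
	}
	return "ORDER BY " + strings.Join(terms, ", ")
}

func main() {
	sort := "DESC" // assumed sort direction

	// One hand-assembled string, as in the first version of the change.
	single := orderByClause("ChannelId, DeleteAt, CreateAt " + sort)
	// One argument per column, as suggested in review.
	split := orderByClause("ChannelId", "DeleteAt", "CreateAt "+sort)

	fmt.Println(split)
	fmt.Println(single == split)
}
```

Passing the columns separately keeps the comma handling out of hand-written SQL, which is the flexibility the review comment refers to.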
LGTM
Thanks @agnivade 👍 This will be tested post-merge...
/cherry-pick release-5.23
* MM-23369: Allow mysql to choose a better index
* Added a comment to clarify things in code
* Incorporating review comments
Co-authored-by: Agniva De Sarker <agnivade@yahoo.co.in>
Ticket Link
https://mattermost.atlassian.net/browse/MM-23369