-
Notifications
You must be signed in to change notification settings - Fork 820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"SELECT DISTINCT" on all queries is a performance sucker #4923
Comments
@tractorcow Is this an API change? It's currently assigned against 4.0.0 stable |
It will result in API breakages as many parts of the ORM rely on distinct to enforce unique rows. e.g. left join on one_to_many relations. |
@tractorcow OK moving this out of the 4.0 milestone, doesn't seem realistic at this point. |
In 4.0 we have made a lot of effort in reducing the need for distinct. For instance, public function applyRelation($relation, $linearOnly = false) $linearOnly = true will ensure that distinct isn't necessary. In 5.x I suggest switching distinct to false by default and seeing if we can live with that as the new status quo. DataQuery::initialiseQuery() calls setDistinct(true), which should be off instead. |
As a SRE/DevOps I am in very much in favour of removing as many Example from a DESCRIBE of a random query with distinct:
Without distinct:
Notice how the top row loses the From the MySQL docs:
(sorry about the close/repoen of the issue, it was an tab-enter mistake) |
DISTINCT could be removed from queries that don't add 1-to-many joins, notably those that don't use the dot-syntax in filters/sorts that reference has_many or many_many relations. A lot of queries would be covered by this restriction, so it would be a useful optimisation. Additionally, we could replace DISTINCT by GROUP BY [all, the fields] if we wanted, but would that actually help? Finally, the filters/sorts on related data could potentially be refactored into subqueries, but again, I don't know if a subquery would have any benefit there. If @stojg or others have any views on those 2nd questions, please post :-) |
I don't think a group by would help.. the following is sort of how it works according to MySQL
https://dev.mysql.com/doc/refman/5.6/en/internal-temporary-tables.html Subqueries could possibly work, but as per above documentation.. so it is a rather tricky thing
|
@sminnee you can see in DataQuery::applyRelation() I have added a flag $linearOnly which causes an error in case a join is added that would require a distinct. You could possibly refactor this logic to conditionally add At the moment you can ONLY sort on linear relations. It's just a matter of applying the same logic to conditions as well. ;) All we really need is the free performance when we know it's safe. |
That may mean that this wouldn't work:
Moving from |
My recommendation for this, as a minor release change:
The specific queries called aren't part if our public API, only the data returned by them. So I don't think this will be a breaking change. We'd need to test that, of course. |
I feel like a method that adds a join and a group-by in a single atomic action would be a great idea. An aggregate API (such as |
Ran into this issue while I was trying to work out why some queries were taking a while. If I hack the core to turn off DISTINCT by default it makes a big different, queries I tested (which can be simple I can't seem to work out a nice way to turn this off per query - looks like an all or nothing approach via an extension. Is this still being pursued? |
I'd rather address and resolve all instances where using distinct is absolutely necessary, and resolve those. Using distinct on all queries to solve poorly formed queries is lazy.
Possible implementation (from Sam)
My recommendation for this, as a minor release change:
enableGroupedDistinct()
that adds a clause along the lines ofGROUP BY 1,2,3,4,5...
. This will be preferred over the DISTINCT clauseapplyRelation()
, ensure thatenableGroupDistinct()
is called.The specific queries called aren't part if our public API, only the data returned by them. So I don't think this will be a breaking change. We'd need to test that, of course.
The text was updated successfully, but these errors were encountered: