Merge `LIMIT` on top of `DISTINCT`
On databases which support `DISTINCT ON` (like PostgreSQL) we skip the `rewriteDistinct` phase and the queries already collapse nicely to a single comprehension. In many cases the `DISTINCT ON` can still be rewritten to a `DISTINCT` later in the code generator (if the columns match up).

On databases that do not support `DISTINCT ON` (like MySQL) we have to eliminate these operations early on (in `rewriteDistinct`) and replace them with either `DISTINCT` (where possible) or `GROUP BY`. The catch is that we have to inject an artificial subquery boundary on top of a `DISTINCT` to prevent mappings from being applied across the `DISTINCT` (which could change the set of columns that determine distinctness). This subquery boundary then prevents the `Take` operation from being merged into the existing comprehension in `mergeToComprehensions`. The solution is to push the operations `Take` and `Drop`, which always preserve distinctness, down under `Subquery.AboveDistinct` in `reorderOperations`.

The test case (`q7` in `AggregateTest.testDistinct`) also triggers a bug when running on the special H2Rownum test profile: when `resolveZipJoins` uses `rownumStyle=true` you can end up with a `Subquery.BelowRownum` boundary between a `Distinct` and its enclosing `Bind`, in which case `rewriteDistinct` doesn't perform the rewriting. The solution is to ensure that there is always a `Bind` below the boundary in `resolveZipJoins` and to create an identity `Bind` where necessary to preserve this invariant (which should hold in all phases between `forceOuterBinds` and `mergeToComprehensions`).
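To illustrate the `Take`/`Drop` reordering, here is a minimal sketch on a toy AST. The node classes (`Table`, `Distinct`, `AboveDistinct`, `Take`, `Drop`) and the `Reorder.pushDown` function are hypothetical stand-ins invented for this example; they are not Slick's real AST classes or the actual code in `reorderOperations`. `AboveDistinct` plays the role of the artificial subquery boundary marked `Subquery.AboveDistinct`.

```scala
// Toy AST: hypothetical stand-ins, NOT Slick's real node classes.
sealed trait Node
final case class Table(name: String) extends Node
final case class Distinct(from: Node) extends Node
// Stands in for the subquery boundary marked Subquery.AboveDistinct.
final case class AboveDistinct(child: Node) extends Node
final case class Take(from: Node, count: Long) extends Node
final case class Drop(from: Node, count: Long) extends Node

object Reorder {
  // Rewrite children first, then move a Take or Drop that sits directly
  // on top of the boundary to just below it. Limiting or skipping rows
  // cannot introduce duplicates, so the result stays distinct.
  def pushDown(n: Node): Node = n match {
    case Take(from, k) =>
      pushDown(from) match {
        case AboveDistinct(c) => AboveDistinct(Take(c, k))
        case other            => Take(other, k)
      }
    case Drop(from, k) =>
      pushDown(from) match {
        case AboveDistinct(c) => AboveDistinct(Drop(c, k))
        case other            => Drop(other, k)
      }
    case Distinct(from)   => Distinct(pushDown(from))
    case AboveDistinct(c) => AboveDistinct(pushDown(c))
    case t: Table         => t
  }
}
```

In this sketch, `Reorder.pushDown(Take(AboveDistinct(Distinct(Table("t"))), 5))` yields `AboveDistinct(Take(Distinct(Table("t")), 5))`, so the `Take` ends up adjacent to the `Distinct`, where a later merging step could fold both into one comprehension.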
Showing with 23 additions and 4 deletions.