Unnecessary remote exchange for DISTINCT #11251
Comments
@martint, do you intend to change the default for use_mark_distinct?
There are many queries that fail or perform worse due to lack of parallelism when use_mark_distinct is off. We can't change the default yet.
Currently for MarkDistinct, the exchange is added here:

    presto/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddExchanges.java, line 295 (at 054c7d2)
    presto/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddExchanges.java, line 251 (at 054c7d2)

There was a similar issue for window functions, resolved by #12122. This prevents us from using grouped execution efficiently.
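For context, MarkDistinct implements COUNT(DISTINCT ...) by tagging each row with a boolean that is true only for the first occurrence of each (grouping key, distinct column) pair; the downstream aggregation then counts only the flagged rows. A minimal single-process sketch of that semantics (illustrative only, not Presto's actual operator):

```python
def mark_distinct(rows, key_cols):
    """Tag each row with True the first time its key-column tuple is seen.

    Mimics the MarkDistinct operator's output: COUNT(DISTINCT col) GROUP BY
    custkey becomes a count of rows whose (custkey, col) pair is flagged.
    """
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        is_first = key not in seen
        seen.add(key)
        yield {**row, "$distinct": is_first}


def count_distinct(rows):
    """COUNT(DISTINCT col) GROUP BY custkey over the mark-distinct output."""
    counts = {}
    for row in mark_distinct(rows, ("custkey", "col")):
        counts.setdefault(row["custkey"], 0)
        if row["$distinct"]:
            counts[row["custkey"]] += 1
    return counts


rows = [
    {"custkey": 1, "col": "a"},
    {"custkey": 1, "col": "a"},  # duplicate, flagged False
    {"custkey": 1, "col": "b"},
    {"custkey": 2, "col": "a"},
]
# count_distinct(rows) -> {1: 2, 2: 1}
```

In the distributed plan, the `seen` state is what must be partitioned across nodes, which is why the planner inserts the exchange discussed in this issue.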
I recently found that queries like this are very common. Currently, the high-level plan for such a query requires an extra shuffle at every step, which is very expensive. In theory, the input could be partitioned once. I think the concern @martint expressed does not apply in this particular case, since at the final step the aggregation is partitioned on the same key anyway. The proposal is to change the condition here:

    presto/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddExchanges.java, line 297 (at c386452)

so that the exchange is skipped when the parent's partitioning requirement is already satisfied (of course, with all the safety checks). The additional check effectively verifies that if the parent partitioning requirement is satisfied, there is no need to change the partitioning just for the mark distinct. @wenleix, @rschlussel, thoughts?
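A hypothetical sketch of the proposed check, with illustrative names rather than Presto's actual planner API: only insert the exchange when the child's existing partitioning does not already satisfy the parent's requirement.

```python
# Illustrative only: function and parameter names are assumptions, not
# Presto's AddExchanges API. Partitionings are modeled as sets of columns.

def needs_exchange_for_mark_distinct(child_partitioning, parent_requirement):
    """Return True only if repartitioning is actually required.

    A requirement is satisfied when the data is already hash-partitioned
    on a subset of the required columns: rows with equal values of the
    required columns then necessarily sit on the same node.
    """
    if parent_requirement is None:  # parent imposes no requirement:
        return True                 # keep today's behavior and shuffle
    # Already partitioned on (a subset of) what the parent needs, so no
    # re-shuffle is needed just for the mark distinct.
    return not child_partitioning.issubset(parent_requirement)
```

For example, if the child is already partitioned on {custkey} and the final aggregation requires partitioning on {custkey}, the sketch skips the extra exchange; if the child is partitioned on an unrelated key, the exchange stays.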
So this is indeed an interesting problem. For convenience, we will label the following query shapes.

Query-Single:

    SELECT custkey, COUNT(DISTINCT col)
    FROM ...
    GROUP BY custkey

Query-Many:

    SELECT custkey,
           COUNT(DISTINCT col1),
           COUNT(DISTINCT col2),
           ...
           COUNT(DISTINCT col100)
    FROM ...
    GROUP BY custkey

We will discuss two different scenarios: over an unbucketed table vs. a bucketed table.

Unbucketed Table

When the underlying table is unbucketed and … So, for … In such a case, the discussion in this issue about not inserting an exchange won't help. We should change to use …

Bucketed Table

When the input is already partitioned on … On the other hand, one could also argue whether we should just set …

Other thoughts

Adaptive execution? For example, we could always start the query with … I found that, in general, this problem has many resemblances to join/aggregation skew and the question of whether to insert an additional exchange, and it might be best solved in the same framework.
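The trade-off above can be pictured with a small simulation (illustrative Python, not Presto code): partitioning once on custkey needs only one shuffle, but a hot custkey concentrates all of its distinct-set state on a single node, while partitioning on (custkey, col) spreads that state across the cluster at the cost of a second shuffle before the final aggregation.

```python
# Toy model of per-node state under the two partitioning choices.
# All names and numbers are illustrative assumptions.
from collections import defaultdict


def state_per_node(rows, part_cols, num_nodes):
    """Hash-partition rows and return the distinct-set size held by each node."""
    state = defaultdict(set)
    for row in rows:
        node = hash(tuple(row[c] for c in part_cols)) % num_nodes
        state[node].add((row["custkey"], row["col"]))
    return {node: len(pairs) for node, pairs in state.items()}


# One hot customer with many distinct values:
rows = [{"custkey": 42, "col": i} for i in range(1000)]

by_custkey = state_per_node(rows, ["custkey"], 4)       # one shuffle total
by_pair = state_per_node(rows, ["custkey", "col"], 4)   # needs a 2nd shuffle later

# Partitioning on custkey alone puts all 1000 pairs on one node;
# partitioning on (custkey, col) spreads them out.
assert len(by_custkey) == 1 and max(by_custkey.values()) == 1000
assert max(by_pair.values()) < 1000
```

This is exactly the memory-skew concern: the extra exchange buys balanced state at the price of an extra shuffle.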
But anyhow, the latest …

Again, if the downstream stage is an aggregation on custkey, … The only reason I can see is memory-usage skew: some custkeys may have many more distinct values in the counted column than others.
After an offline discussion with @rschlussel, we came to the conclusion that the shuffles are mainly needed to take care of memory skew when the aggregation column's cardinality is low. I think nothing has to be changed related to …
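One way to picture why the shuffle matters when cardinality is low (a toy model with hypothetical numbers, not Presto behavior): hash-partitioning on a low-cardinality key can engage only as many nodes as there are key values, concentrating both work and memory on a few machines.

```python
# Toy model: how many nodes receive any data when hash-partitioning on a key.

def nodes_used(keys, num_nodes):
    """Count the nodes that receive at least one row."""
    return len({hash(k) % num_nodes for k in keys})


# 3 distinct key values on a 100-node cluster: at most 3 nodes do any work,
# no matter how large the input is.
assert nodes_used(range(3), 100) <= 3

# A high-cardinality key can engage the whole cluster.
assert nodes_used(range(10_000), 100) == 100
```

With only a handful of groups, every node holding one of those groups must absorb that group's entire distinct-value state, which is the skew the extra exchange guards against.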
@arhimondr / @wenleix: if we run multiple DISTINCT queries on Presto views made from multiple UNION ALLs, the remote exchange for all the distincts happens in a single stage.
On tables, we get multiple stages for mark distinct.
Is this a bug?
Is this intentional, to avoid skew and low parallelism? If so, what's the long-term plan?
For reference, this is how the plan looks without DISTINCT: …