Optimizing hash generation makes simple select distinct blocking #17328

Open
kaikalur opened this issue Feb 22, 2022 · 1 comment
Comments

@kaikalur
Contributor

kaikalur commented Feb 22, 2022

A common ad-hoc/exploratory query is:

SELECT DISTINCT c1, c2 .. FROM table WHERE ... LIMIT N;

However, with optimize_hash_generation enabled, the plan adds projections a) for hash computation in the scan step and b) for the results after the final distinct. With hash generation disabled, the plan is simply a distributed scan plus a hash table, with no blocking operations in between, so results show up sooner (even if the query doesn't complete). This makes the query very useful, since users can see mostly complete results early. Often the limit is far higher than the actual number of distinct values, which makes this behavior especially valuable.

So I think we should disable optimizing hash generation for DistinctLimit.
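
Until the planner changes, a possible per-session workaround is to turn the optimization off before running the exploratory query. This is only a minimal sketch: `optimize_hash_generation` is the session property named above, while the table `events`, the columns `c1` and `c2`, the filter, and the limit value are hypothetical placeholders.

```sql
-- Disable precomputed hash projections for this session only.
SET SESSION optimize_hash_generation = false;

-- Hypothetical exploratory query: table, columns, filter, and limit are placeholders.
SELECT DISTINCT c1, c2
FROM events
WHERE c1 IS NOT NULL
LIMIT 1000;
```

Whether rows actually stream out earlier depends on the plan that is produced. Comparing the EXPLAIN output with the property on and off should show whether the extra projections are gone.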

CC: @mbasmanova @rongrong @nlaptev

@kaikalur
Contributor Author

root cause: #17631
